Tabular Data - E2E - Basic Cloud Storage
Why do this?
You want to understand how to create Projects in Encord that use Tabular Data (CSV files). This example assumes your Tabular Data is stored in cloud storage.
Pros and Cons
Tabular data currently supports CONSENSUS Projects only.
Import/Register Data
We’re going to register our tiny dataset of CSV files.
Create Integration
Select your cloud provider.
Download Data
Download and extract the contents of the e2e-tabular-data.zip file.
Modify JSON
Modify the tabular-data.json file that you extracted from the e2e-tabular-data.zip file.
- Open the tabular-data.json file and replace <file-path> with the file path to the data stored in your cloud storage.
The tabular-data.json file includes the file path and title for each CSV file. It does NOT include clientMetadata.
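If you prefer to script the placeholder replacement rather than edit the file by hand, the following is a minimal sketch. It treats the JSON as plain text and swaps in your storage location wherever <file-path> appears. The function names and the example storage prefix are hypothetical, and the sketch assumes the placeholder appears literally as <file-path> in the file.

```python
from pathlib import Path

PLACEHOLDER = "<file-path>"  # literal placeholder named in the step above


def fill_file_path(json_text: str, storage_prefix: str) -> str:
    """Return json_text with every <file-path> placeholder replaced."""
    if PLACEHOLDER not in json_text:
        # Guard against running the script twice on an already edited file.
        raise ValueError(f"no {PLACEHOLDER} placeholder found in the JSON text")
    return json_text.replace(PLACEHOLDER, storage_prefix)


def update_json_file(json_path: Path, storage_prefix: str) -> None:
    """Rewrite the JSON file in place with the placeholder filled in."""
    text = json_path.read_text(encoding="utf-8")
    json_path.write_text(fill_file_path(text, storage_prefix), encoding="utf-8")


# Example call (hypothetical bucket and folder; substitute your own):
# update_json_file(Path("tabular-data.json"), "s3://my-bucket/e2e-tabular-data")
```

Because the replacement is purely textual, the prefix you pass must match exactly how the path continues after the placeholder in your copy of the file (for example, with or without a trailing slash).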
Create a Mirrored Dataset
Create a mirrored Dataset called E2E - Tabular Data - Dataset using the UI. Using mirrored Datasets is a simple way to sync data from folders to Datasets. Mirrored Datasets provide no method of curating or managing your data.
If you want to add more data to your Dataset, add the new entries to the JSON file and then re-import the JSON file. The data is automatically added to your Dataset and Project.
Register/Import Data
Use the tabular-data.json file, from the e2e-tabular-data.zip file, to register/import the data to the mirrored Dataset.
Create Ontology
For this step you need the following:
- genre-options.csv and platform-options.csv from the e2e-tabular-data.zip file.
- One video_game_annotation_X.csv from the e2e-tabular-data.zip file.
- tabular_create_ontology.py script. You create this.
The tabular_create_ontology.py script does the following:
- Creates the Ontology based on the structure of any of the video_game_annotation_X.csv files.
- Creates feature mapping for the genre column using genre-options.csv.
- Creates feature mapping for the platform column using platform-options.csv.
E2E - Tabular Data - Ontology appears in your Ontology list after running the script.
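As a rough illustration of the input-handling half of such a script, the sketch below reads the column names from an annotation CSV and the allowed values from the two option files, then pairs each column with its options. It deliberately stops short of calling the Encord SDK, so creating the Ontology itself is not shown. The helper names are hypothetical, and the sketch assumes each option file lists one option per row in its first column with no header row. Adjust it if your files are laid out differently.

```python
import csv
import io
from typing import Dict, List, Optional


def read_header(csv_text: str) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def read_options(csv_text: str) -> List[str]:
    """Return unique, non-empty first-column values in file order.

    Assumption: one option per row, no header row.
    """
    seen: Dict[str, None] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip():
            seen.setdefault(row[0].strip(), None)
    return list(seen)


def build_column_plan(
    header: List[str], options_by_column: Dict[str, List[str]]
) -> Dict[str, Optional[List[str]]]:
    """Map each column to its allowed options, or None when it has none."""
    return {column: options_by_column.get(column) for column in header}


# Example wiring (file names from the step above; column names assumed
# to be "genre" and "platform"):
# header = read_header(open("video_game_annotation_X.csv", encoding="utf-8").read())
# plan = build_column_plan(header, {
#     "genre": read_options(open("genre-options.csv", encoding="utf-8").read()),
#     "platform": read_options(open("platform-options.csv", encoding="utf-8").read()),
# })
```

The resulting plan is a plain dictionary, which keeps the CSV parsing separate from whatever SDK calls your script then uses to build the Ontology.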
Create Project
Create a CONSENSUS Project after you have created the mirrored Dataset, registered/imported the CSV files, and created the Ontology.
- Tabular data currently supports CONSENSUS Projects only.
- An AGENT block must be the first block for tabular data.
- The AGENT block and AGENT pathway MUST use the exact names specified below.
- Name: E2E - Tabular Data - Project
- Agent name: Pre-label
- Agent pathway: Labelled
Run the Agent script
The tabular_run_agent.py script populates tasks in the AGENT block in your workflow.
Create the following Python scripts. Both scripts must be in the same directory.
- tabular_run_agent.py
- tabular_utils.py
After creating the scripts, run the tabular_run_agent.py script.
After running the script, tasks that were in the AGENT stage are now in the CONSENSUS - ANNOTATE stage.
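The core loop of an agent script like this can be sketched as follows. It shows why the agent name and pathway must match exactly: the script looks tasks up by stage name and routes them by pathway name. The two Protocol classes are hypothetical stand-ins for the SDK objects. The sketch assumes the agent stage exposes get_tasks() and each task exposes proceed(pathway_name=...), so confirm the actual calls against the Encord SDK reference before relying on it.

```python
from typing import Iterable, Protocol

AGENT_STAGE_NAME = "Pre-label"   # agent name required by the Project setup
AGENT_PATHWAY_NAME = "Labelled"  # agent pathway required by the Project setup


class Task(Protocol):
    """Hypothetical stand-in for a workflow task in the AGENT stage."""

    def proceed(self, pathway_name: str) -> None: ...


class AgentStageLike(Protocol):
    """Hypothetical stand-in for the AGENT stage of the workflow."""

    def get_tasks(self) -> Iterable[Task]: ...


def advance_all(stage: AgentStageLike, pathway_name: str = AGENT_PATHWAY_NAME) -> int:
    """Send every task waiting in the stage down the pathway; return the count."""
    moved = 0
    for task in stage.get_tasks():
        # Any pre-labelling work for the task would run here, before it moves on.
        task.proceed(pathway_name=pathway_name)
        moved += 1
    return moved
```

Keeping the names in constants at the top of the script makes a mismatch with the Project workflow easy to spot and fix.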
Annotate Tabular Project
Annotation of CSV files depends on your Ontology. Our Ontology, E2E - Tabular Data - Ontology, uses text regions, but your annotators and reviewers use drop downs.
In this section, you’ll see the following Collaborators:
- Annotators labelling data
- Reviewers reviewing labels created by Annotators
- Team Manager managing the Annotators and Reviewers
- Project Admin managing the Project and exporting labels
Prepare to Label
Team Manager or Project Admin
The Team Manager or Project Admin can prioritize certain data to be labelled and reviewed first. Let’s prioritize two data units to be labelled first by setting the priority for those files to 75.
Set Priority to 75
Annotators
Annotators can sort by priority, search by file name, or filter by Dataset or Issue Status.
Label Data
Team Manager or Project Admin
The Team Manager or Project Admin can monitor the performance and progress of the annotation team.
Annotators
Annotators use drop downs to select the genre and platform for each row.
Review Labels
Team Manager or Project Admin
The Team Manager or Project Admin can monitor the performance and progress of the review team.
Review Labels
Reviewers verify that the labels are correct.
When there is an issue with labels/classifications, Reviewers can:
- Reject the task and add a comment about why a task was rejected. Rejected tasks go back to the person who added the labels/classifications.
- Edit labels directly using the Edit labels button and then approve the task.
Export Labels
Only Project Admins can export labels from Encord.
Project Admin
Export the labels from the Project Labels page.