Tabular data Projects work a little differently from typical Projects in Encord. For each row, Annotators and Reviewers select answers from the options defined for designated columns. You can designate multiple columns for selection.
Modify the following script example to create your Ontology.
| Items of Interest | Notes |
|---|---|
| READ_ONLY_COLUMNS | Specifies the columns you want your Annotators and Reviewers to see in the Label Editor. Column indices start at 0. Omit any columns in your CSV that you do not want your Annotators and Reviewers to see. |
| ANNOTATION_COLUMNS | Specifies the columns your Annotators and Reviewers use to label data. Annotators and Reviewers select answers from a drop-down in these columns. Specify the options available to Annotators and Reviewers using the files in MAPPING_FIELD_OPTION_PATHS. These are single-column files with one option on each row. |
| ONTOLOGY_NAME | Specifies the name for your Ontology. |
| OBJECT_NAME | Specifies the name of the text region for each row in your CSV file. The script applies a label to each row in your CSV file using this text region. |
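To see how the index-based settings translate into column names, the following minimal sketch mirrors the `csv_df.columns[...]` lookup used in the script below. The CSV contents and option file paths are made up for illustration. The final check reflects the script's behavior of raising an error when an annotation column has no entry in MAPPING_FIELD_OPTION_PATHS.

```python
import io

import pandas as pd

# Hypothetical CSV contents, used only for illustration.
SAMPLE_CSV = io.StringIO(
    "title,year,publisher,genre,platform\n"
    "Example Game,2001,Example Studio,,\n"
)

READ_ONLY_COLUMNS = [0, 1, 2]   # zero-based column indices
ANNOTATION_COLUMNS = [3, 4]
MAPPING_FIELD_OPTION_PATHS = {  # keys must be annotation column names
    "genre": "/file/path/to/genre-options.csv",
    "platform": "/file/path/to/platform-options.csv",
}

csv_df = pd.read_csv(SAMPLE_CSV)
readonly_columns = csv_df.columns[READ_ONLY_COLUMNS].tolist()
annotation_columns = csv_df.columns[ANNOTATION_COLUMNS].tolist()

# Annotation columns that have no options file configured.
missing = [c for c in annotation_columns if c not in MAPPING_FIELD_OPTION_PATHS]

print(readonly_columns)
print(annotation_columns)
print(missing)
```

Running a check like this against your own CSV header before creating the Ontology can help confirm that the indices point at the columns you intend.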
tabular_create_ontology script
    import pandas as pd
    from encord.objects import OntologyStructure, Shape, TextAttribute
    from encord.objects.attributes import RadioAttribute
    from encord.user_client import EncordUserClient

    # --- Configuration ---
    ENCORD_SSH_KEY = "/Users/chris-encord/ssh-private-key.txt"  # Replace with the file path to your SSH private key
    TASK_CSV_PATH = "/file/path/to/video_game_annotation_1.csv"  # Replace with the file path to any of the video_game_annotation_X.csv files
    READ_ONLY_COLUMNS = [0, 1, 2]
    ANNOTATION_COLUMNS = [3, 4]

    # Replace these paths with actual mapping column name > options file
    MAPPING_FIELD_OPTION_PATHS = {
        "genre": "/file/path/to/genre-options.csv",
        "platform": "/file/path/to/platform-options.csv",
    }
    ONTOLOGY_NAME = "E2E - Tabular Data - Ontology"
    OBJECT_NAME = "Game Row"


    def parse_csv():
        # Translate the configured column indices into column names.
        csv_df = pd.read_csv(TASK_CSV_PATH)
        readonly_columns = csv_df.columns[READ_ONLY_COLUMNS].tolist()
        mapping_columns = csv_df.columns[ANNOTATION_COLUMNS].tolist()
        return mapping_columns, readonly_columns


    def create_ontology(text_attribute_names, radio_option_names):
        ontology_structure = OntologyStructure()
        text_object = ontology_structure.add_object(name=OBJECT_NAME, shape=Shape.TEXT)

        # Read-only columns become text attributes on the text object.
        for attribute in text_attribute_names:
            text_object.add_attribute(TextAttribute, attribute)

        # Annotation columns become required radio attributes with one option per row of the options file.
        for column_name in radio_option_names:
            options_path = MAPPING_FIELD_OPTION_PATHS.get(column_name)
            if options_path is None:
                raise ValueError(f"No options file defined for column '{column_name}'")

            options = pd.read_csv(options_path).iloc[:, 0].dropna().astype(str).tolist()

            radio_attribute = text_object.add_attribute(RadioAttribute, column_name, required=True)
            for option in options:
                radio_attribute.add_option(option)

        user_client = EncordUserClient.create_with_ssh_private_key(
            ssh_private_key_path=ENCORD_SSH_KEY,
            domain="https://api.encord.com",
        )
        return user_client.create_ontology(ONTOLOGY_NAME, structure=ontology_structure)


    if __name__ == "__main__":
        mapping_columns, readonly_columns = parse_csv()
        ontology = create_ontology(readonly_columns, mapping_columns)
        print(f"Created ontology {ontology.title}, id: {ontology.ontology_hash}")
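The script reads each options file with `pd.read_csv(...).iloc[:, 0]`. With pandas defaults, the first line of a CSV file is treated as a header rather than data, so it does not become an option. The sketch below applies the same expression to hypothetical file contents to show which values would end up as options.

```python
import io

import pandas as pd

# Hypothetical contents of a single-column options file.
# With pandas defaults, the first line ("genre") is read as the header.
OPTIONS_CSV = io.StringIO("genre\nAction\nPuzzle\nStrategy\n")

# Same expression the ontology script uses to load options.
options = pd.read_csv(OPTIONS_CSV).iloc[:, 0].dropna().astype(str).tolist()
print(options)
```

If your options files have no header line, the first option would be consumed as the header under these defaults, so it may be worth checking the loaded list before creating the Ontology.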
The tabular_run_agent.py script populates tasks in the AGENT block in your workflow. Create the following Python scripts. Both scripts must be in the same directory.
tabular_run_agent.py
tabular_utils.py
After creating the scripts, run the tabular_run_agent.py script. Once the script has run, tasks that were in the AGENT stage move to the CONSENSUS - ANNOTATE stage.
| Items of Interest | Notes |
|---|---|
| AGENT_STAGE | Specifies the name of the AGENT block in your Tabular Data Project. This name must exactly match the name of the AGENT block in your Project. |
| AGENT_PATHWAY | Specifies the name of the Pathway in your AGENT block. This name must exactly match the name of the pathway in the AGENT block in your Project. |
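Because both names must match exactly, a quick local check can help catch differences in case or stray whitespace before you run the agent. The sketch below is plain Python and does not call the Encord SDK. The helper `find_name_mismatch` and the names passed to it are hypothetical, and you would supply the actual stage or pathway names from your own Project.

```python
from typing import List, Optional


def find_name_mismatch(configured: str, actual_names: List[str]) -> Optional[str]:
    """Return a hint when `configured` is not an exact match, else None."""
    if configured in actual_names:
        return None
    # Look for near misses that differ only by case or surrounding whitespace.
    for name in actual_names:
        if name.strip().casefold() == configured.strip().casefold():
            return f"'{configured}' does not exactly match '{name}'"
    return f"'{configured}' not found among {actual_names}"


# Hypothetical names, for illustration only.
print(find_name_mismatch("Pre-label", ["Pre-label", "Review"]))
print(find_name_mismatch("pre-label ", ["Pre-label", "Review"]))
```

The first call reports no problem, while the second flags a near miss that an exact-match lookup would reject.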
    from typing import Annotated
    from pathlib import Path
    import os

    from encord_agents.tasks import Runner
    from encord.objects.ontology_labels_impl import LabelRowV2
    from encord.project import Project
    from encord_agents.tasks.dependencies import dep_asset
    from encord_agents.core.dependencies import Depends
    from encord.objects.common import Shape

    from tabular_utils import parse_csv_and_add_objects

    # --- Configuration ---
    ENCORD_SSH_KEY = "/Users/chris-encord/ssh-private-key.txt"  # Replace with the file path to your SSH private key
    PROJECT_HASH = "00000000-0000-0000-0000-000000000000"  # Replace with unique Project ID of the tabular data Project
    AGENT_STAGE = "Pre-label"
    AGENT_PATHWAY = "Labelled"

    # Inject into environment so Encord Agents can pick it up
    os.environ["ENCORD_SSH_KEY_FILE"] = ENCORD_SSH_KEY

    runner = Runner(project_hash=PROJECT_HASH)


    @runner.stage(stage=AGENT_STAGE)
    def agent_logic(
        lr: LabelRowV2,
        project: Project,
        asset: Annotated[Path, Depends(dep_asset)],
    ):
        ontology = project.ontology_structure
        # Check for an empty Ontology before indexing; objects[0] would raise IndexError otherwise.
        if not ontology.objects:
            raise Exception("No objects found")
        text_object = ontology.objects[0]
        if text_object.shape is not Shape.TEXT:
            raise Exception("Text object required")

        # Label each CSV row, then send the task down the configured pathway.
        parse_csv_and_add_objects(text_object, lr, asset)
        return AGENT_PATHWAY


    if __name__ == "__main__":
        runner.run()
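The agent relies on `parse_csv_and_add_objects` from tabular_utils.py. The sketch below is not that helper. It only illustrates, in plain Python, one way a "text region for each row" could be computed, assuming a region is a pair of character offsets into the CSV text. That representation and the function name `row_text_ranges` are assumptions made for this example, and the sketch does not handle quoted fields that contain line breaks.

```python
from typing import List, Tuple


def row_text_ranges(csv_text: str, skip_header: bool = True) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets for each non-empty line."""
    ranges = []
    offset = 0
    for i, line in enumerate(csv_text.splitlines(keepends=True)):
        content = line.rstrip("\r\n")
        if content and not (skip_header and i == 0):
            ranges.append((offset, offset + len(content)))
        offset += len(line)  # include the line ending in the running offset
    return ranges


# Hypothetical CSV text, for illustration only.
sample = "title,genre\nExample Game,\nOther Game,\n"
for start, end in row_text_ranges(sample):
    print((start, end), repr(sample[start:end]))
```

Slicing the original text with each range returns the row it covers, which makes it straightforward to verify the offsets by eye.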