Once all the videos are re-encoded and you have created an Ontology and a Dataset, you are ready to create an Annotate Project. After you create the Project, create your Data Groups; your team can then start annotating your data.
Creating Data Groups requires mapping your data units to the layout used during annotation and review. Currently, the mapping uses the File ID (UUID) that Encord assigns to each data unit.
To find the File ID/UUID of your data units, use storage_folder.list_items. The following script gets the file name and File ID of each data unit in a folder and saves the output to a JSON file and a CSV file.
List File Name and File ID
from encord import EncordUserClient
import json
import csv

# --- Configuration ---
SSH_PATH = "/Users/chris-encord/ssh-private-key.txt"  # Replace with the file path to your SSH private key
FOLDER_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Folder ID

# Output file paths
JSON_OUTPUT_PATH = "/file/path/to/save/file_mapping.json"  # Update this as required
CSV_OUTPUT_PATH = "/file/path/to/save/file_mapping.csv"  # Update this as required

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
    # For US platform users use "https://api.us.encord.com"
    domain="https://api.encord.com",
)

# Get the storage folder by its ID
# (searching folders by name with an ID string would not match the folder)
storage_folder = user_client.get_storage_folder(FOLDER_ID)

# List all data units in the folder
items = list(storage_folder.list_items())

# Create a list of dicts for structured output
file_data = [
    {
        "file_id": str(item.uuid),  # Convert UUID to string
        "file_name": item.name,
        "file_type": item.item_type,
    }
    for item in items
]

# --- Save to JSON File ---
with open(JSON_OUTPUT_PATH, "w") as f:
    json.dump(file_data, f, indent=4)

# --- Save to CSV File ---
with open(CSV_OUTPUT_PATH, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file_id", "file_name", "file_type"])
    writer.writeheader()
    writer.writerows(file_data)

print(f"Saved output to:\n- {JSON_OUTPUT_PATH}\n- {CSV_OUTPUT_PATH}")
Use the output file from the Map Data Units for Data Groups section to map File IDs to their corresponding layout for Data Groups.
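One way to work with that output is to load the JSON file into a lookup keyed by file name, so that each layout slot can be filled by name instead of by copying IDs by hand. The following is a minimal sketch, assuming the JSON has the structure written by the listing script (a list of objects with file_id, file_name, and file_type). The helper name build_name_to_id and the sample records, including the file_type values, are hypothetical.

```python
import json
from uuid import UUID


def build_name_to_id(records):
    """Map each file name to its File ID, rejecting duplicate names."""
    lookup = {}
    for record in records:
        name = record["file_name"]
        if name in lookup:
            raise ValueError(f"Duplicate file name: {name}")
        # UUID() validates the ID string; strip() guards against stray whitespace
        lookup[name] = UUID(record["file_id"].strip())
    return lookup


# Hypothetical sample in the same shape as file_mapping.json.
# With a real file, load it with json.load() from JSON_OUTPUT_PATH instead.
sample_json = """
[
  {"file_id": "11111111-1111-1111-1111-111111111111",
   "file_name": "00001_normalized.mp4", "file_type": "video"},
  {"file_id": "00000000-0000-0000-0000-000000000000",
   "file_name": "clustered_event_log_01.txt", "file_type": "plain_text"}
]
"""

records = json.loads(sample_json)
name_to_id = build_name_to_id(records)
print(name_to_id["00001_normalized.mp4"])
```

Rejecting duplicate names is a design choice in this sketch: if two files in the folder share a name, the name alone cannot identify which File ID belongs in a layout slot.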
Use the script in this section to create Data Groups, add those Data Groups to a Dataset, and add the Dataset to a Project.
The script creates Data Groups with five data units in the following layout:
+-------------------------------------------+
|                 text file                 |
+------------------+------------------------+
|     video 1      |        video 2         |
+------------------+------------------------+
|     video 3      |        video 4         |
+------------------+------------------------+
To create Data Groups, map the File ID of each data unit to a key in the Data Group layout.
Refer to the following:
from uuid import UUID

# --- Group definitions (name + UUIDs) ---
groups = [
    {
        "name": "group-001",
        "uuids": {
            "instructions": UUID("00000000-0000-0000-0000-000000000000"),  # Replace with File ID of clustered_event_log_01.txt
            "top-left": UUID("11111111-1111-1111-1111-111111111111"),  # Replace with File ID of 00001_normalized.mp4
            "top-right": UUID("22222222-2222-2222-2222-222222222222"),  # Replace with File ID of 00002_normalized.mp4
            "bottom-left": UUID("33333333-3333-3333-3333-333333333333"),  # Replace with File ID of 00009.mp4
            "bottom-right": UUID("44444444-4444-4444-4444-444444444444"),  # Replace with File ID of 00011_normalized.mp4
        },
    },
    {
        "name": "group-002",
        "uuids": {
            "instructions": UUID("55555555-5555-5555-5555-555555555555"),  # Replace with File ID of clustered_event_log_02.txt
            "top-left": UUID("66666666-6666-6666-6666-666666666666"),  # Replace with File ID of 00012.mp4
            "top-right": UUID("77777777-7777-7777-7777-777777777777"),  # Replace with File ID of 00020.mp4
            "bottom-left": UUID("88888888-8888-8888-8888-888888888888"),  # Replace with File ID of 00030.mp4
            "bottom-right": UUID("99999999-9999-9999-9999-999999999999"),  # Replace with File ID of 00033.mp4
        },
    },
    {
        "name": "group-003",
        "uuids": {
            "instructions": UUID("12312312-3123-1231-2312-312312312312"),  # Replace with File ID of clustered_event_log_03.txt
            "top-left": UUID("23232323-2323-2323-2323-232323232323"),  # Replace with File ID of 00034.mp4
            "top-right": UUID("31313131-3131-3131-3131-313131313131"),  # Replace with File ID of 00035_normalized.mp4
            "bottom-left": UUID("45645645-6456-4564-5645-645645645645"),  # Replace with File ID of 00038_normalized.mp4
            "bottom-right": UUID("56565656-6565-5656-6565-656565656565"),  # Replace with File ID of 00045.mp4
        },
    },
    # More groups...
]
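Because every group has to supply a File ID for each slot in the layout, it can help to check the definitions before sending anything to Encord. The following is a minimal sketch and not part of the Encord SDK: check_group and EXPECTED_KEYS are hypothetical names, and the five keys are taken from the layout used in this guide.

```python
from uuid import UUID

# Slot keys used by the layout in this guide
EXPECTED_KEYS = {"instructions", "top-left", "top-right", "bottom-left", "bottom-right"}


def check_group(group):
    """Return a list of problems found in one group definition (empty if none)."""
    problems = []
    keys = set(group["uuids"])
    for key in sorted(EXPECTED_KEYS - keys):
        problems.append(f"{group['name']}: missing '{key}'")
    for key in sorted(keys - EXPECTED_KEYS):
        problems.append(f"{group['name']}: unexpected '{key}'")
    for key, value in group["uuids"].items():
        if not isinstance(value, UUID):
            problems.append(f"{group['name']}: '{key}' is not a UUID object")
    return problems


good = {
    "name": "group-001",
    "uuids": {
        "instructions": UUID("00000000-0000-0000-0000-000000000000"),
        "top-left": UUID("11111111-1111-1111-1111-111111111111"),
        "top-right": UUID("22222222-2222-2222-2222-222222222222"),
        "bottom-left": UUID("33333333-3333-3333-3333-333333333333"),
        "bottom-right": UUID("44444444-4444-4444-4444-444444444444"),
    },
}

# Deliberately incomplete: three slots missing, one ID left as a plain string
bad = {
    "name": "group-bad",
    "uuids": {
        "instructions": UUID("55555555-5555-5555-5555-555555555555"),
        "top-left": "66666666-6666-6666-6666-666666666666",
    },
}

for problem in check_group(good) + check_group(bad):
    print(problem)
```

Note that UUID() itself raises ValueError for a malformed string, such as an ID with a trailing space, so wrapping each File ID in UUID(...) already catches copy-and-paste errors when the definitions are loaded.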
Run this script to create Data Groups:
from uuid import UUID

from encord.constants.enums import DataType
from encord.objects.metadata import DataGroupMetadata
from encord.orm.storage import DataGroupCustom, StorageItemType
from encord.user_client import EncordUserClient

# --- Configuration ---
SSH_PATH = "/Users/chris-encord/ssh-private-key.txt"  # Replace with the file path to your SSH key
FOLDER_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Folder ID
DATASET_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Dataset ID
PROJECT_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Project ID

# --- Connect to Encord ---
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
    # For US platform users use "https://api.us.encord.com"
    domain="https://api.encord.com",
)

folder = user_client.get_storage_folder(FOLDER_ID)

# --- Reusable layout and settings ---
# The text file takes the top 20% of the view; the four videos share the rest in a 2x2 grid
layout = {
    "direction": "column",
    "first": {"type": "data_unit", "key": "instructions"},
    "second": {
        "direction": "column",
        "first": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "top-left"},
            "second": {"type": "data_unit", "key": "top-right"},
            "splitPercentage": 50,
        },
        "second": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "bottom-left"},
            "second": {"type": "data_unit", "key": "bottom-right"},
            "splitPercentage": 50,
        },
        "splitPercentage": 50,
    },
    "splitPercentage": 20,
}

# The text file is shown for reference only
settings = {"tile_settings": {"instructions": {"is_read_only": True}}}

# --- Group definitions (name + UUIDs) ---
groups = [
    {
        "name": "group-001",
        "uuids": {
            "instructions": UUID("00000000-0000-0000-0000-000000000000"),  # Replace with File ID of clustered_event_log_01.txt
            "top-left": UUID("11111111-1111-1111-1111-111111111111"),  # Replace with File ID of 00001_normalized.mp4
            "top-right": UUID("22222222-2222-2222-2222-222222222222"),  # Replace with File ID of 00002_normalized.mp4
            "bottom-left": UUID("33333333-3333-3333-3333-333333333333"),  # Replace with File ID of 00009.mp4
            "bottom-right": UUID("44444444-4444-4444-4444-444444444444"),  # Replace with File ID of 00011_normalized.mp4
        },
    },
    {
        "name": "group-002",
        "uuids": {
            "instructions": UUID("55555555-5555-5555-5555-555555555555"),  # Replace with File ID of clustered_event_log_02.txt
            "top-left": UUID("66666666-6666-6666-6666-666666666666"),  # Replace with File ID of 00012.mp4
            "top-right": UUID("77777777-7777-7777-7777-777777777777"),  # Replace with File ID of 00020.mp4
            "bottom-left": UUID("88888888-8888-8888-8888-888888888888"),  # Replace with File ID of 00030.mp4
            "bottom-right": UUID("99999999-9999-9999-9999-999999999999"),  # Replace with File ID of 00033.mp4
        },
    },
    {
        "name": "group-003",
        "uuids": {
            "instructions": UUID("12312312-3123-1231-2312-312312312312"),  # Replace with File ID of clustered_event_log_03.txt
            "top-left": UUID("23232323-2323-2323-2323-232323232323"),  # Replace with File ID of 00034.mp4
            "top-right": UUID("31313131-3131-3131-3131-313131313131"),  # Replace with File ID of 00035_normalized.mp4
            "bottom-left": UUID("45645645-6456-4564-5645-645645645645"),  # Replace with File ID of 00038_normalized.mp4
            "bottom-right": UUID("56565656-6565-5656-6565-656565656565"),  # Replace with File ID of 00045.mp4
        },
    },
    # More groups...
]

# Create the data groups
for g in groups:
    group = folder.create_data_group(
        DataGroupCustom(
            name=g["name"],
            layout=layout,
            layout_contents=g["uuids"],
            settings=settings,
        )
    )
    print(f"✅ Created group '{g['name']}' with UUID {group}")

# Add all the data groups in a folder to a Dataset
group_items = folder.list_items(item_types=[StorageItemType.GROUP])
d = user_client.get_dataset(DATASET_ID)
d.link_items([item.uuid for item in group_items])

# Get the Project and list its label rows, including the children of each Data Group
# (this step assumes the Dataset is already attached to the Project)
p = user_client.get_project(PROJECT_ID)
rows = p.list_label_rows_v2(include_children=True)

# Label Rows of Data Groups use DataGroupMetadata for the layout to Annotate and Review
for row in rows:
    if row.data_type == DataType.GROUP:
        row.initialise_labels()
        assert isinstance(row.metadata, DataGroupMetadata)
        print(row.metadata.children)
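The layout in the script is a nested dictionary in which each node is either a split (with first and second children) or a data_unit leaf with a key. A short recursive walk, sketched below, lists the keys the layout expects so they can be compared with the keys in each group's uuids. The helper layout_keys is hypothetical and not part of the Encord SDK; the layout is copied from the script above.

```python
def layout_keys(node):
    """Collect the keys of all data_unit leaves, visiting 'first' before 'second'."""
    if node.get("type") == "data_unit":
        return [node["key"]]
    return layout_keys(node["first"]) + layout_keys(node["second"])


layout = {
    "direction": "column",
    "first": {"type": "data_unit", "key": "instructions"},
    "second": {
        "direction": "column",
        "first": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "top-left"},
            "second": {"type": "data_unit", "key": "top-right"},
            "splitPercentage": 50,
        },
        "second": {
            "direction": "row",
            "first": {"type": "data_unit", "key": "bottom-left"},
            "second": {"type": "data_unit", "key": "bottom-right"},
            "splitPercentage": 50,
        },
        "splitPercentage": 50,
    },
    "splitPercentage": 20,
}

keys = layout_keys(layout)
print(keys)
# ['instructions', 'top-left', 'top-right', 'bottom-left', 'bottom-right']
```

Comparing set(keys) with set(g["uuids"]) for each group before creating it is one way to catch a misspelled or missing slot early.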
How you annotate the videos depends on your Ontology. The Ontology used in this example, E2E Data Groups, uses classifications.
In this section, you’ll see the following Collaborators:
Annotators labelling data
Reviewers reviewing labels created by Annotators
Team Manager managing the Annotators and Reviewers
Project Admin managing the Project and exporting labels
1. Prepare to Label
Team Manager or Project Admin
The Team Manager or Project Admin can prioritize certain data to be labelled and reviewed first. Let’s prioritize a few Data Groups to be labelled first by setting the priority for those files to 75.
Set Priority to 75
Annotators
Annotators can configure the Annotate Label Editor to label data more efficiently.
2. Label Data
Team Manager or Project Admin
The Team Manager or Project Admin can monitor the performance and progress of the annotation team.
Annotators
Annotators use the text file to determine if the Prediction and Summary for each video are correct.
Use hotkeys to speed up your annotation process.
3. Review Labels
Team Manager or Project Admin
The Team Manager or Project Admin can monitor the performance and progress of the review team.
Reviewers
Reviewers verify that the labels are correct.
You can approve the labels/classifications on a task one at a time (from the left panel) or all at once (using the Approve all button).
When there is an issue with labels/classifications, Reviewers can:
Reject the task and add a comment explaining why it was rejected. Rejected tasks go back to the person who added the labels/classifications.
Edit labels directly using the Edit labels button and then approve the task.
4. Export Labels
Only Project Admins can export labels from Encord.