Encord's multimodal platform and configurable label editor support data curation and annotation workloads across many media types and annotation configurations. Follow the end-to-end walkthrough below to learn how to set up an Audio annotation workload in which each task lets you directly compare two audio files. This is useful, for example, when rating two audio files against each other, or in RLHF flows.
Custom metadata can store any information which is not the data itself in a key-value format. Custom metadata is often used to describe information useful in curation and management at scale. Encord also uses metadata to create annotation specific layouts in the editor. You set up both using a metadata schema.
To handle your custom metadata schema across multiple teams within the same Workspace, we recommend using namespacing for metadata keys in the schema. This ensures that different teams can define and manage their own metadata schema without conflicts. For example, team A could use video.description, while team B could use audio.description. Another example could be TeamName.MetadataKey. This approach maintains clarity and avoids key collisions across departments.
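The namespacing convention above can be sketched in a few lines of plain Python. This is an illustration only: `namespaced_key` and `find_collisions` are hypothetical helpers, not part of the Encord SDK, and the team names are taken from the example in the text.

```python
def namespaced_key(namespace: str, key: str) -> str:
    """Join a namespace (for example a team name) and a key with a period."""
    if not namespace or not key:
        raise ValueError("namespace and key must both be non-empty")
    return f"{namespace}.{key}"


def find_collisions(keys_by_team: dict[str, list[str]]) -> set[str]:
    """Return every key that is defined by more than one team."""
    first_owner: dict[str, str] = {}
    collisions: set[str] = set()
    for team, keys in keys_by_team.items():
        for key in keys:
            if key in first_owner and first_owner[key] != team:
                collisions.add(key)
            first_owner.setdefault(key, team)
    return collisions


# Two teams that both want a "description" key.
plain = {"video": ["description"], "audio": ["description"]}
namespaced = {
    team: [namespaced_key(team, key) for key in keys]
    for team, keys in plain.items()
}

print(find_collisions(plain))       # the un-namespaced keys clash
print(find_collisions(namespaced))  # video.description and audio.description do not
```

Running a check like this before editing a shared schema is one way to catch clashes early; the resulting keys can then be passed to add_scalar as usual.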
Metadata schema keys support letters (a-z, A-Z), numbers (0-9), blank spaces ( ), hyphens (-), underscores (_), and periods (.). Metadata schema keys are case sensitive.
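As a rough local pre-check, the character rules above can be expressed as a regular expression. This is a sketch, not Encord's own validation: `is_valid_metadata_key` is a hypothetical helper, and the platform may apply further rules (such as length limits) that are not covered here.

```python
import re

# Letters, digits, blank spaces, hyphens, underscores, and periods only.
_ALLOWED_KEY = re.compile(r"[A-Za-z0-9 ._-]+")


def is_valid_metadata_key(key: str) -> bool:
    """Return True if the key uses only the characters listed above."""
    return _ALLOWED_KEY.fullmatch(key) is not None


for candidate in ["audio.description", "encord-layout-group", "bad/key", ""]:
    print(repr(candidate), is_valid_metadata_key(candidate))

# Keys are case sensitive, so these are two distinct keys.
print(len({"Audio.Description", "audio.description"}))
```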
Use add_scalar to add a scalar key to your metadata schema.
Scalar Key | Description | Display Benefits
boolean | Binary data type with values "true" or "false". | Filtering by binary values
datetime | ISO 8601 formatted date and time. | Filtering by time and date
number | Numeric data type supporting float values. | Filtering by numeric values
uuid | UUIDv4 formatted unique identifier for a data unit. | Filtering by customer specified unique identifier
varchar | Textual data type. Formally string. string can be used as an alias for varchar, but we STRONGLY RECOMMEND that you use varchar. | Displaying data correctly in custom Label Editor layouts and filtering by string.
text | Text data with unlimited length (example: transcripts for audio). Formally long_string. long_string can be used as an alias for text, but we STRONGLY RECOMMEND that you use text. | Storing and filtering large amounts of text.
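To make the scalar types concrete, the sketch below builds one sample value per type using only the Python standard library. The key names mirror the metadata_1 to metadata_6 placeholders used in the schema script in this guide; the values themselves, and the assumption that they are sent as plain JSON, are illustrative.

```python
import json
import uuid
from datetime import datetime, timezone

sample_metadata = {
    # boolean
    "metadata_1": True,
    # datetime: ISO 8601 string
    "metadata_2": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc).isoformat(),
    # number: float values are supported
    "metadata_3": 0.75,
    # uuid: UUIDv4 string
    "metadata_4": str(uuid.uuid4()),
    # varchar: short string
    "metadata_5": "speaker-a",
    # text: long string, for example a transcript
    "metadata_6": "Full transcript of the audio clip goes here.",
}

print(json.dumps(sample_metadata, indent=2))
```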
Use add_enum and add_enum_options to add an enum and enum options to your metadata schema.
Key | Description | Display Benefits
enum | Enumerated type with predefined set of values. | Facilitates categorical filtering and data validation
Use add_embedding to add an embedding to your metadata schema.
Incorrectly specifying a data type in the schema can cause errors when filtering your data in Index or Active. If you encounter errors while filtering, verify your schema is correct. If your schema has errors, correct the errors, re-import the schema, and then re-sync your Active Project.
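One way to catch type mismatches before they surface as filtering errors is to check values locally against the types you declared. The following is a minimal sketch under stated assumptions: the schema is represented as a plain dict of key to scalar type name, `find_type_mismatches` is a hypothetical helper rather than an Encord SDK call, and the checks approximate the descriptions in the table above rather than Encord's actual validation. Note that `datetime.fromisoformat` only accepts a trailing "Z" from Python 3.11 onwards.

```python
import uuid
from datetime import datetime


def _is_iso_datetime(value) -> bool:
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def _is_uuid4(value) -> bool:
    if not isinstance(value, str):
        return False
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False


CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "datetime": _is_iso_datetime,
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "uuid": _is_uuid4,
    "varchar": lambda v: isinstance(v, str),
    "text": lambda v: isinstance(v, str),
}


def find_type_mismatches(schema: dict, metadata: dict) -> list:
    """Return the sorted keys whose values do not match the declared type.

    Keys that are missing from the schema are skipped.
    """
    return sorted(
        key
        for key, value in metadata.items()
        if key in schema and not CHECKS[schema[key]](value)
    )


schema = {"metadata_1": "boolean", "metadata_3": "number", "metadata_4": "uuid"}
metadata = {"metadata_1": "true", "metadata_3": 1.5, "metadata_4": "not-a-uuid"}
print(find_type_mismatches(schema, metadata))
```

Here the string "true" is flagged because the key was declared as boolean, and "not-a-uuid" is flagged because it does not parse as a UUIDv4.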
# Import dependencies
from encord import EncordUserClient
from encord.metadata_schema import MetadataSchema

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH
)

# Create the schema
metadata_schema = user_client.metadata_schema()

# Add display parameters for data appearing in custom editor layouts
metadata_schema.add_scalar("encord-layout-group", data_type="varchar")
metadata_schema.add_scalar("encord-editor-grid-position", data_type="varchar")

# Add various metadata fields
metadata_schema.add_scalar("metadata_1", data_type="boolean")
metadata_schema.add_scalar("metadata_2", data_type="datetime")
metadata_schema.add_scalar("metadata_3", data_type="number")
metadata_schema.add_scalar("metadata_4", data_type="uuid")
metadata_schema.add_scalar("metadata_5", data_type="varchar")
metadata_schema.add_scalar("metadata_6", data_type="text")

# Add an enum field
metadata_schema.add_enum("my-enum", values=["enum-value-01", "enum-value-02", "enum-value-03"])

# Add embedding fields
metadata_schema.add_embedding('my-test-active-embedding', size=512)
metadata_schema.add_embedding('my-test-index-embedding', size=<values-from-1-to-4096>)

# Save the schema
metadata_schema.save()

# Print the schema for verification
print(metadata_schema)
For a list of supported file formats for each data type, see the Encord documentation on supported data.
Waveform generation for long audio files can cause lag. To avoid this, generate the waveform offline and upload it with the audio file. For more information, see the Encord documentation.
The following is an example JSON file for uploading two audio files to Encord. Both files include clientMetadata to ensure they display correctly in custom Label Editor layouts.
The "encord-layout-group" key determines which files are shown together: files with the same "encord-layout-group" value are displayed simultaneously. Both files in the example appear in the Label Editor at the same time because they share the same "encord-layout-group" value.
The "encord-editor-grid-position" key, set to either A or B, specifies the exact position of each file within the label editor.
One audio file includes audioMetadata and one does not.
We strongly recommend including audioMetadata with each audio file when importing audio files at scale, because it significantly improves import speed. When the audioMetadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.
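The structure described above can be sketched by generating the JSON from Python. Treat this as an assumption-laden illustration rather than the authoritative file format: clientMetadata, audioMetadata, "encord-layout-group", and "encord-editor-grid-position" come from this guide, while the top-level "audio" list, the "objectUrl" field, the field names inside audioMetadata, and the "pair-001" group value are assumptions that should be checked against Encord's JSON format documentation. `audio_entry` is a hypothetical helper.

```python
import json


def audio_entry(object_url, layout_group, grid_position, audio_metadata=None):
    """Build one audio file entry with the layout keys used by the Label Editor."""
    if grid_position not in ("A", "B"):
        raise ValueError("grid position must be 'A' or 'B'")
    entry = {
        "objectUrl": object_url,  # assumed field name
        "clientMetadata": {
            "encord-layout-group": layout_group,
            "encord-editor-grid-position": grid_position,
        },
    }
    if audio_metadata is not None:
        entry["audioMetadata"] = audio_metadata
    return entry


spec = {
    "audio": [  # assumed top-level key
        audio_entry(
            "<object-url-1>",
            "pair-001",
            "A",
            # Assumed audioMetadata fields; values must describe the real file.
            audio_metadata={
                "duration": 23.5,
                "file_size": 2800000,
                "mime_type": "audio/mp3",
                "sample_rate": 44100,
                "bit_depth": 24,
                "codec": "mp3",
                "num_channels": 2,
            },
        ),
        # Same layout group, other grid position, no audioMetadata.
        audio_entry("<object-url-2>", "pair-001", "B"),
    ]
}

print(json.dumps(spec, indent=2))
```

Generating the file from a helper like this makes it harder to give two paired files different "encord-layout-group" values by accident.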
To ensure smoother uploads and faster completion times, and avoid hitting absolute file limits, we recommend adding smaller batches of data. Limit uploads to 100 videos or up to 1,000 images at a time. You can also create multiple Datasets, all of which can be linked to a single Project. Familiarize yourself with our limits and best practices for data import/registration before adding data to Encord.
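A simple way to follow the batching advice is to split the list of files into fixed-size chunks and register each chunk separately. The sketch below uses only the standard library; `batched` is a hypothetical helper, and the batch size of 100 is borrowed from the video figure above because no audio-specific limit is stated here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder URLs; replace with the object URLs of your own files.
urls = [f"<object-url-{i}>" for i in range(250)]
batch_sizes = [len(batch) for batch in batched(urls, 100)]
print(batch_sizes)
```

Each batch can then be written to its own JSON file and registered in a separate upload job.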
Navigate to the Files section of Index in the Encord platform.
Click into a Folder.
Click + Upload files.
A dialog appears.
Click Import from cloud data.
We recommend turning on the Ignore individual file errors feature. This ensures that individual file errors do not lead to the whole upload process being aborted.
Click Add JSON or CSV files to add a JSON or CSV file specifying cloud data that is to be added.
To use your data in Encord, it must be uploaded to the Encord Files storage. Once uploaded, your data can be reused across multiple Projects. The files themselves contain no labels or annotations: Files stores your data, while Projects store your labels. The following script creates a new folder in Files and uses your AWS integration to register data in that folder. It works for all file types.
If the script returns "Upload is still in progress, try again later!", check the upload status again later to see whether the upload has finished.
Ensure that you:
Replace <private_key_path> with the path to your private key.
Replace <integration_title> with the title of the integration you want to use.
Replace <folder_name> with the folder name. The scripts assume that the specified folder name is unique.
Replace A folder to store my files with a meaningful description for your folder.
Replace "my": "folder_metadata" with any metadata you want to add to the folder.
The script has several possible outputs:
"Upload is still in progress, try again later!": The registration has not finished. Run this script again later to check if the data registration has finished.
"Upload completed": The registration completed. If any files failed to upload, the URLs are listed.
"Upload failed": The entire registration failed, and not just individual files. Ensure your JSON file is formatted correctly.
# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus  # Ensure correct import path

# Instantiate user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify the integration you want to use
integrations = user_client.get_cloud_integrations()
integration_idx = [i.title for i in integrations].index("<integration_title>")
integration = integrations[integration_idx].id

# Create a storage folder
folder_name = "<folder_name>"
folder_description = "A folder to store my files"
folder_metadata = {"my": "folder_metadata"}
storage_folder = user_client.create_storage_folder(
    folder_name, folder_description, client_metadata=folder_metadata
)

# Initiate cloud data registration
upload_job_id = storage_folder.add_private_data_to_folder_start(
    integration_id=integration,
    private_files="path/to/json/file.json",
    ignore_errors=True,
)

# Check upload status
res = storage_folder.add_private_data_to_folder_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")

if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed")
    if res.unit_errors:
        print("The following URLs failed to upload:")
        for e in res.unit_errors:
            print(e.object_urls)
else:
    print(f"Upload failed: {res.errors}")
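Rather than re-running the script by hand while the registration is pending, the status check can be wrapped in a small retry loop. The sketch below is deliberately independent of the Encord SDK so it can be run and tested on its own: `wait_for_upload` is a hypothetical helper, `Status` is a local stand-in for LongPollingStatus, and the attempt count and delay are arbitrary defaults.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

S = TypeVar("S")


def wait_for_upload(
    check_status: Callable[[], S],
    pending_value: S,
    max_attempts: int = 10,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> S:
    """Call check_status until it stops returning pending_value.

    Returns the last status seen, which is still pending_value if
    max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = pending_value
    for attempt in range(max_attempts):
        status = check_status()
        if status != pending_value:
            return status
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return status


class Status(Enum):
    """Local stand-in for LongPollingStatus, used only for this demo."""
    PENDING = "PENDING"
    DONE = "DONE"
    ERROR = "ERROR"


# Simulate a job that is pending twice and then finishes.
responses = iter([Status.PENDING, Status.PENDING, Status.DONE])
final = wait_for_upload(lambda: next(responses), Status.PENDING, sleep=lambda s: None)
print(final)
```

With the script above, check_status could be a lambda that returns storage_folder.add_private_data_to_folder_get_result(upload_job_id, timeout_seconds=5).status, with LongPollingStatus.PENDING passed as pending_value.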
Navigate to the Datasets section under the Annotate heading.
Click the Dataset you want to attach data to.
Click +Attach existing files.
If the files you want have not been uploaded into Encord yet, click +Upload files to upload new files.
Select the folders containing the files you want to attach to the Dataset. To select individual files, double-click a folder to see its contents, and select the files you want to add to the Dataset.
Click Attach data to attach the selected files to the Dataset.
In the Encord platform, select Projects under Annotate.
Click the + New annotation project button to create a new Project.
Give the Project a meaningful title and description.
An optional Project tags drop-down is visible. Project tags are useful for categorizing and finding your Projects. Select as many tags as are relevant for your Project.
Click the Attach ontology button.
Select the Ontology you created in STEP 5 and click the Attach button.
Click OK to attach the Ontology to the Project.
Click the Attach datasets button.
Select the Dataset you created in STEP 4 and click the Attach button.
Click OK to attach the Dataset(s) to the Project.
Ensure the default Workflow shown suits your needs.
Click Create project to finish creating the Project.
More than a single file appears in the Label Editor. Switch back and forth between the files you are labeling or reviewing using the Annotate from this tile icon.
Label audio files using the Classification specified in your Ontology.
To label audio files:
We recommend using hotkeys to speed up and streamline your labeling process. For this example, the Classification hotkeys are 2 and Q, but yours might be different.
Go to Project > [Your Project Title] > Queue > Annotate. The Project Annotate Queue appears with a list of audio files for labeling.
Click Initiate next to a file from the Project Queue. Two audio files appear in the Label Editor.
Press 2 to select the Winner Classification. The options for the Classification appear.
Press Q to select Yes.
Click and then drag the sections of the audio file that appears at the top of the Label Editor.
Press N to save the Classifications on the audio file.
Click the Annotate from this tile icon to switch to annotating the bottom audio file.
Press Q to select Yes.
Click and then drag the sections of the audio file that appears at the bottom of the Label Editor.
Press N to save the Classifications on the audio file.
Click Submit. The next set of audio files for annotation appears.
Approving tasks moves the tasks to the next stage in a workflow. In this example, that is Complete.
Rejecting tasks sends the tasks back to the first stage in a workflow. In this example, that is Annotate.