Custom metadata, also known as client metadata, is supplementary information you can add to all data imported/registered with Encord. It is provided in the form of a Python dictionary, as shown in the examples. This metadata serves several key functions:
Before you can filter your data or create a Collection based on your data’s custom metadata, the custom metadata must exist in your Annotate Project.
This content applies to custom metadata (clientMetadata), which is the metadata associated with individual data units. This is distinct from videoMetadata that is used to specify video parameters when using Strict client-only access. It is also distinct from patient metadata in DICOM files.
Custom metadata (clientMetadata) is accessed by specifying the dataset using the <dataset_hash>. All Projects that have the specified Dataset attached contain custom metadata.
While not required, we strongly recommend importing a metadata schema before importing custom metadata into Encord. The process we recommend:
Import a metadata schema. If a metadata schema already exists, you can import metadata. You can run a small piece of code to verify that a metadata schema exists.
BEST PRACTICE: If you want to use Index or Active with your video data, we STRONGLY RECOMMEND using key frames, custom metadata, and custom embeddings. When specifying key frames set the sampling_rate to 0. This imports only the first frame and any key frames you specify in the video. This can significantly speed up the import of your data into Active and Index and help you to focus on only data you identify as critical.
The following entries provide some guidance for the examples.
Template: Provides the proper JSON format to import videos into Encord. This template provides examples from the most basic to the most complex.
Data: Imports videos into Encord.
Why would I do this?
You ONLY want to add labels and classifications to your data.
You DO NOT want to use Index or Active.
Key Frames: Imports videos with an Encord title and specifies key frames (frames of interest) for Active and Index.
Why would I do this?
You ONLY want to see frames that you deem critical in Active and Index.
You want to significantly improve the time to import videos into Active and Index.
config is optional when specifying key frames for Active and Index:
Specifying a sampling_rate of 0 only imports the first frame and all key frames of your video into Active and Index.
"config": { "sampling_rate": "<samples-per-second>", "keyframe_mode": "frame" or "seconds", },
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
Custom Metadata: Imports videos with an Encord title, specifies key frames (frames of interest), and custom metadata for Active and Index.
Custom metadata can be applied to the entire video or individual frames in the video.
Why would I do this?
Importing custom metadata allows you to filter your data in Active and Index to make it easier to find the data you want to focus on. This speeds up creating Collections and by extension Datasets.
Specifying key frames means you ONLY want to see frames that you deem critical in Active and Index AND you want to significantly improve the time to import videos into Active and Index.
config is optional when specifying key frames for Active and Index:
Specifying a sampling_rate of 0 only imports the first frame and all key frames of your video into Active and Index.
"config": { "sampling_rate": "<samples-per-second>", "keyframe_mode": "frame" or "seconds", },
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
Embeddings: Imports videos with an Encord title, specifies key frames (frames of interest), custom metadata, and custom embeddings for Active and Index. This example includes the following custom metadata types: boolean, varchar, datetime, uuid, number.
Why would I do this?
Importing custom embeddings allows you to use scatter plots to examine your data AND allows you to use similarity search and natural language searches. Index supports embedding dimensions 1 to 4096, while Active supports embedding dimensions 1 to 2000.
Importing custom metadata allows you to filter your data in Active and Index to make it easier to find the data you want to focus on. This speeds up creating Collections and by extension Datasets.
Specifying key frames means you ONLY want to see frames that you deem critical in Active and Index AND you want to significantly improve the time to import videos into Active and Index.
config is optional when specifying custom embeddings for Active and Index:
Specifying a sampling_rate of 0 only imports the first frame and all key frames of your video into Active and Index.
"config": { "sampling_rate": "<samples-per-second>", "keyframe_mode": "frame" or "seconds", },
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
Video Metadata: Imports videos with the videoMetadata flag. When the videoMetadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.
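The optional config settings described in the entries above can be sketched in Python as follows. This is a minimal illustration, not the SDK's own helper: the function name build_encord_config and its validation rules are hypothetical, and the nesting of config under "$encord" is taken from the SDK example later on this page.

```python
from typing import Any, Dict

VALID_KEYFRAME_MODES = {"frame", "seconds"}


def build_encord_config(sampling_rate: float = 1, keyframe_mode: str = "frame") -> Dict[str, Any]:
    """Return a "$encord" block that holds only the optional config settings.

    Hypothetical helper for illustration; the defaults mirror the documented
    ones (1 sample per second, keyframe_mode "frame").
    """
    if sampling_rate < 0:
        raise ValueError("sampling_rate must be 0 or greater")
    if keyframe_mode not in VALID_KEYFRAME_MODES:
        raise ValueError(f"keyframe_mode must be one of {sorted(VALID_KEYFRAME_MODES)}")
    return {
        "$encord": {
            "config": {
                "sampling_rate": sampling_rate,
                "keyframe_mode": keyframe_mode,
            }
        }
    }


# Documented defaults: 1 sample per second, key frames given as frame numbers.
default_block = build_encord_config()

# sampling_rate 0: only the first frame and the key frames you specify are imported.
keyframes_only_block = build_encord_config(sampling_rate=0)

print(default_block)
print(keyframes_only_block)
```

Rejecting an unknown keyframe_mode before the import starts is a design choice of this sketch; it surfaces typos locally instead of after an upload.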
The following is an example JSON file for uploading two audio files to Encord.
Template: Imports audio files with an Encord title, and with custom metadata. Custom metadata only appears in the Encord UI in Active and Index as an option to filter your data.
Audio Metadata: Imports one audio file with the audiometadata flag. When the audiometadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.
The following is an example JSON file for uploading PDFs to Encord.
Template: Imports PDFs with an Encord title, and with custom metadata. Custom metadata only appears in the Encord UI in Active and Index as an option to filter your data.
Data: Imports two PDFs with no title or custom metadata.
Custom Metadata: Imports two PDFs with a title and custom metadata.
The following is an example JSON file for uploading text files to Encord.
Template: Imports text files with an Encord title, and with custom metadata. Custom metadata only appears in the Encord UI in Active and Index as an option to filter your data.
Data: Imports two text files with no title or custom metadata.
Custom Metadata: Imports two text files with a title and custom metadata.
For detailed information about the JSON file format used for import, go here.
The JSON structure for single images parallels that of videos.
Template: Provides the proper JSON format to import images into Encord.
Examples:
Data: Imports the images only.
Custom Metadata: Imports images with an Encord title for the images and with custom metadata for each image. Custom metadata only appears in Active and Index as an option to filter your data. This example includes the following custom metadata types: boolean, varchar, datetime, uuid, number.
Embeddings: Imports images with an Encord title, custom metadata, and custom embeddings for each image. This example includes the following custom metadata types: boolean, varchar, datetime, uuid, number.
Image Metadata: Imports images with image metadata. This improves the import speed for your images.
For detailed information about the JSON file format used for import, go here.
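The custom metadata types listed above (boolean, varchar, datetime, uuid, number) map naturally onto Python values. The sketch below shows one way to turn them into a JSON-safe clientMetadata dictionary. The key names are hypothetical, and serializing datetime values as ISO 8601 strings and UUIDs as plain strings is an assumption of this sketch, so check your metadata schema for the exact format expected.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any


def to_json_safe(value: Any) -> Any:
    """Convert datetime and UUID values to strings; leave other values as-is."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, uuid.UUID):
        return str(value)
    return value


# Hypothetical keys, one per custom metadata type.
raw_metadata = {
    "reviewed": True,  # boolean
    "camera": "front-left",  # varchar
    "captured_at": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),  # datetime
    "session_id": uuid.UUID("12345678-1234-5678-1234-567812345678"),  # uuid
    "temperature_c": 21.5,  # number
}

client_metadata = {key: to_json_safe(value) for key, value in raw_metadata.items()}
print(json.dumps({"clientMetadata": client_metadata}, indent=2))
```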
Image groups are collections of images that are processed as one annotation task.
Images within image groups remain unaltered, meaning that images of different sizes and resolutions can form an image group without the loss of data.
Image groups do NOT require ‘write’ permissions to your cloud storage.
Custom metadata is defined per image group, not per image. See our documentation here to learn how to add clientMetadata to images in an image group.
If skip_duplicate_urls is set to true, all URLs exactly matching existing image groups in the dataset are skipped.
The position of each image within the sequence needs to be specified in the key (objectUrl_{position_number}).
Template: Provides the proper JSON format to import image groups into Encord.
Examples:
Data: Imports the image groups only.
Custom Metadata: Imports image groups with an Encord title for the image groups and with custom metadata for each image. Custom metadata only appears in Active and Index as an option to filter your data. This example includes the following custom metadata types: boolean, varchar, datetime, uuid, number.
For detailed information about the JSON file format used for import, go here.
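The positional key convention for image groups (objectUrl_{position_number}) can be sketched as follows. The helper build_image_group is hypothetical, the URLs are placeholders, and starting positions at 0 as well as the exact placement of title and clientMetadata inside each entry are assumptions; compare the output with the Template before using it.

```python
import json
from typing import Any, Dict, List, Optional


def build_image_group(
    title: str,
    urls: List[str],
    client_metadata: Optional[Dict[str, Any]] = None,
    create_video: bool = False,
) -> Dict[str, Any]:
    """Build one entry for the image_groups array from an ordered list of URLs."""
    if not urls:
        raise ValueError("an image group needs at least one URL")
    entry: Dict[str, Any] = {"title": title}
    if create_video:
        # Image sequences are image groups with the createVideo flag set to true.
        entry["createVideo"] = True
    for position, url in enumerate(urls):
        # Assumption: positions start at 0.
        entry[f"objectUrl_{position}"] = url
    if client_metadata is not None:
        # Custom metadata is defined once per image group.
        entry["clientMetadata"] = client_metadata
    return entry


payload = {
    "image_groups": [
        build_image_group(
            "group-1",
            ["https://example.com/a.jpg", "https://example.com/b.jpg"],
            client_metadata={"site": "A"},
        )
    ],
    "skip_duplicate_urls": True,
}
print(json.dumps(payload, indent=2))
```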
Image sequences are collections of images that are processed as one annotation task and represented as a video.
Images within image sequences may be altered as images of varying sizes and resolutions are made to match that of the first image in the sequence.
Creating Image sequences from cloud storage requires ‘write’ permissions, as new files have to be created in order to be read as a video.
Each object in the image_groups array with the createVideo flag set to true represents a single image sequence.
Custom client metadata is defined per image sequence, not per image.
If skip_duplicate_urls is set to true, all URLs exactly matching existing image sequences in the dataset are skipped.
The only difference between adding image groups and image sequences using a JSON file is that image sequences require the createVideo flag to be set to true. Both use the key image_groups.
The position of each image within the sequence needs to be specified in the key (objectUrl_{position_number}).
Encord supports up to 32,767 entries (21:50 minutes) for a single image sequence. We recommend up to 10,000 to 15,000 entries for a single image sequence for best performance. If you need a longer sequence, we recommend using video instead of an image sequence.
Template: Provides the proper JSON format to import image groups into Encord.
Examples:
Data: Imports the image groups only.
Custom Metadata: Imports image groups and custom metadata. This example includes the following custom metadata types: boolean, varchar, datetime, uuid, number.
For detailed information about the JSON file format used for import, go here.
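Before building a JSON file for an image sequence, it can help to check the number of entries against the limits stated above. The following is a minimal sketch. The function name and the returned status strings are hypothetical, and the upper end of the recommended range (15,000) is used as the performance threshold.

```python
MAX_SEQUENCE_ENTRIES = 32_767  # documented hard limit for a single image sequence
RECOMMENDED_MAX_ENTRIES = 15_000  # upper end of the recommended range


def check_sequence_length(entry_count: int) -> str:
    """Classify an image sequence by its number of entries."""
    if entry_count < 1:
        raise ValueError("an image sequence needs at least one entry")
    if entry_count > MAX_SEQUENCE_ENTRIES:
        return "too long: use a video instead"
    if entry_count > RECOMMENDED_MAX_ENTRIES:
        return "allowed, but above the recommended size"
    return "ok"


for count in (9_000, 20_000, 40_000):
    print(count, check_sequence_length(count))
```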
Each dicom_series element can contain one or more DICOM series.
Each series requires a title and at least one object URL, as shown in the example below.
If skip_duplicate_urls is set to true, all object URLs exactly matching existing DICOM files in the dataset will be skipped.
Custom metadata is distinct from patient metadata, which is included in the .dcm file and does not have to be specified during the upload to Encord.
The following is an example JSON for uploading three DICOM series belonging to a study. Each title and object URL correspond to individual DICOM series.
The first series contains only a single object URL, as it is composed of a single file.
The second series contains 3 object URLs, as it is composed of three separate files.
The third series contains 2 object URLs, as it is composed of two separate files.
For each DICOM upload, an additional DicomSeries file is created. This file represents the series file-set. Only DicomSeries are displayed in the Encord application.
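The three-series study described above can be sketched in Python as follows. The keys dicom_series, title, and skip_duplicate_urls come from the text. The objectUrl_{position} key names are an assumption borrowed from the image group convention, and the titles and URLs are placeholders.

```python
import json
from typing import Dict, List


def build_dicom_series(title: str, urls: List[str]) -> Dict[str, str]:
    """Build one DICOM series entry; each series needs a title and at least one URL."""
    if not title:
        raise ValueError("each series requires a title")
    if not urls:
        raise ValueError("each series requires at least one object URL")
    series = {"title": title}
    for position, url in enumerate(urls):
        # Assumption: object URL keys follow the objectUrl_{position} pattern.
        series[f"objectUrl_{position}"] = url
    return series


payload = {
    "dicom_series": [
        # First series: a single file.
        build_dicom_series("series-1", ["https://example.com/s1/file-0.dcm"]),
        # Second series: three separate files.
        build_dicom_series(
            "series-2",
            [f"https://example.com/s2/file-{i}.dcm" for i in range(3)],
        ),
        # Third series: two separate files.
        build_dicom_series(
            "series-3",
            [f"https://example.com/s3/file-{i}.dcm" for i in range(2)],
        ),
    ],
    "skip_duplicate_urls": True,
}
print(json.dumps(payload, indent=2))
```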
We recommend importing custom metadata together with your data, because doing so can save significant time when importing at scale. However, you can also import custom metadata for data that already exists in Encord.
Importing with Custom Embeddings
You can import custom embeddings with custom metadata. When importing custom embeddings with custom metadata keep the following in mind:
config is optional when importing your custom embeddings:
"config": { "sampling_rate": "<samples-per-second>", "keyframe_mode": "frame" or "seconds",},
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
Specifying a sampling_rate of 0 only imports the first frame and all keyframes of your video into Index.
Examples
# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle
from encord.orm.storage import StorageFolder, StorageItem, StorageItemType, FoldersSortBy

# Authentication
SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

updates = {
    # Imports custom metadata
    "<data-hash-1>": {"metadata-1": "value", "metadata-2": "value"},
    "<data-hash-2>": {"metadata-1": "value", "metadata-2": "value"},
    # Imports custom metadata and specifies key frames for Active and Index.
    # Frames are keyed by frame number, as in the entries below
    # (a Python set of frame numbers cannot be serialized to JSON).
    "<data-hash-3>": {
        "metadata-1": "value",
        "metadata-2": "value",
        "$encord": {
            "frames": {"111": {}, "113": {}, "117": {}, "119": {}}
        },
    },
    # Imports custom metadata and specifies key frames, custom metadata on frames,
    # and the custom embeddings for those key frames
    "<data-hash-4>": {
        "metadata-1": "value",
        "metadata-2": "value",
        "$encord": {
            "frames": {
                "<frame-number-or-seconds-1>": {
                    "metadata-1": "value",
                    "metadata-2": "value",
                    "<my-embedding>": [1.0, 2.0, 3.0],
                },
                "<frame-number-or-seconds-2>": {
                    "metadata-1": "value",
                    "metadata-2": "value",
                    "<my-embedding>": [1.0, 2.0, 3.0],
                },
            }
        },
    },
    # Imports custom metadata and specifies config, key frames,
    # and the custom embeddings for those key frames
    "<data-hash-5>": {
        "metadata-1": "value",
        "metadata-2": "value",
        "$encord": {
            "config": {
                # Replace with <samples-per-second>. VIDEO ONLY (optional, default = 1 sample/second)
                "sampling_rate": 1,
                # "frame" or "seconds". VIDEO ONLY (optional, default = "frame")
                "keyframe_mode": "frame",
            },
            "frames": {
                "<frame-number-or-seconds-1>": {
                    "metadata-1": "value",
                    "metadata-2": "value",
                    "<my-embedding>": [1.0, 2.0, 3.0],
                },
                "<frame-number-or-seconds-2>": {
                    "metadata-1": "value",
                    "metadata-2": "value",
                    "<my-embedding>": [1.0, 2.0, 3.0],
                },
            },
        },
    },
}

# Use the Bundle context manager
with Bundle() as bundle:
    # Update the storage items based on the dictionary
    for item_uuid, metadata_update in updates.items():
        item = user_client.get_storage_item(item_uuid=item_uuid)

        # Make a copy of the current metadata and update it with the new metadata
        curr_metadata = item.client_metadata.copy()
        curr_metadata.update(metadata_update)

        # Update the item with the new metadata and bundle
        item.update(client_metadata=curr_metadata, bundle=bundle)
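When the per-frame entries come from your own embedding model, it can be convenient to generate the "$encord" frames block instead of writing it by hand. The sketch below assumes the nesting shown in the example above. The helper build_frame_entries is hypothetical, and the dimension limits are the ones stated on this page (1 to 4096 for Index, 1 to 2000 for Active).

```python
from typing import Any, Dict, List

INDEX_MAX_DIMENSIONS = 4096
ACTIVE_MAX_DIMENSIONS = 2000


def build_frame_entries(
    embedding_name: str,
    embeddings_by_frame: Dict[int, List[float]],
    max_dimensions: int = ACTIVE_MAX_DIMENSIONS,
) -> Dict[str, Any]:
    """Build a "$encord" block with one embedding per key frame."""
    frames: Dict[str, Any] = {}
    for frame, vector in embeddings_by_frame.items():
        if not 1 <= len(vector) <= max_dimensions:
            raise ValueError(
                f"frame {frame}: embedding has {len(vector)} dimensions, "
                f"expected 1 to {max_dimensions}"
            )
        # Frame keys are strings, matching the example above.
        frames[str(frame)] = {embedding_name: [float(x) for x in vector]}
    return {"$encord": {"frames": frames}}


block = build_frame_entries(
    "my-embedding",
    {111: [1.0, 2.0, 3.0], 113: [0.5, 0.25, 0.125]},
)
print(block)
```

Using the stricter Active limit as the default keeps the same embeddings usable in both Active and Index; pass max_dimensions=INDEX_MAX_DIMENSIONS if you only use Index.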
Before importing custom metadata to Encord, first import a metadata schema.
We strongly recommend that you upload your custom metadata to Folders, instead of importing using Datasets. Importing custom metadata to data in folders allows you to filter your data in Index by custom metadata.
After importing or updating custom metadata, verify that it was applied correctly by listing the data units together with their custom metadata. Do not rely on a print command added directly after the import or update.
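One way to structure that verification is to compare the metadata you intended to write with the metadata you read back afterwards. The sketch below uses plain dictionaries so it runs on its own. In practice, the actual dictionary would be filled from data units re-fetched from Encord; the function name and the sample IDs are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

Metadata = Dict[str, Any]


def find_mismatches(
    expected: Dict[str, Metadata],
    actual: Dict[str, Metadata],
) -> Dict[str, Tuple[Metadata, Optional[Metadata]]]:
    """Return {item_id: (expected, actual)} for every item whose metadata differs."""
    mismatches = {}
    for item_id, expected_metadata in expected.items():
        actual_metadata = actual.get(item_id)  # None if the item was not found
        if actual_metadata != expected_metadata:
            mismatches[item_id] = (expected_metadata, actual_metadata)
    return mismatches


expected = {
    "item-1": {"weather": "rain", "reviewed": True},
    "item-2": {"weather": "sun", "reviewed": False},
}
# Stand-in for metadata re-fetched after the import.
actual = {
    "item-1": {"weather": "rain", "reviewed": True},
    "item-2": {"weather": "sun"},
}

mismatches = find_mismatches(expected, actual)
print(mismatches)
```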
Import custom metadata to all data units in a Folder
This code allows you to update ALL custom metadata on ALL data units in a Folder in Index. This code OVERWRITES all existing custom metadata on a data unit.
# Import dependencies
from encord import EncordUserClient
from encord.orm.storage import StorageFolder, StorageItem, StorageItemType, FoldersSortBy

# Authentication
SSH_PATH = "<file-path-to-ssh-private-key-file>"
FOLDER_HASH = "<unique-folder-id>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

folder = user_client.get_storage_folder(FOLDER_HASH)
items = folder.list_items()

# Overwrite the custom metadata on every item in the folder
for item in items:
    item.update(client_metadata={"metadata-1": "value", "metadata-2": "value"})
The Specific Data Units code enables you to update custom metadata for specific data units in Index. It does not overwrite all existing custom metadata on a data unit. Instead, it updates metadata that matches existing keys with new values and adds any new custom metadata keys to the data unit without affecting other existing metadata.
The All data units in a Project code updates the custom metadata for all data units in the specified Project. Replace the client_metadata with the metadata you want to update.
# Import dependencies
from encord import EncordUserClient

# Authentication
SSH_PATH = "<private_key_path>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Define a dictionary with item UUIDs and their respective metadata updates.
# Each key must be a different data unit ID.
updates = {
    "<data-unit-id-1>": {"metadata": "metadata-value"},
    "<data-unit-id-2>": {"metadata": False},
    "<data-unit-id-3>": {"metadata": "metadata-value"},
    # "<data-unit-id-4>": {"metadata": True},
}

# Update the storage items based on the dictionary
for item_uuid, metadata_update in updates.items():
    item = user_client.get_storage_item(item_uuid=item_uuid)

    # Make a copy of the current metadata and update it with the new metadata
    curr_metadata = item.client_metadata.copy()
    curr_metadata.update(metadata_update)

    # Update the item with the new metadata
    item.update(client_metadata=curr_metadata)
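The difference between merging into existing custom metadata and overwriting it comes down to ordinary Python dictionary behavior. The following standalone sketch uses hypothetical keys and values and makes no calls to Encord.

```python
# Custom metadata currently stored on a data unit (hypothetical values).
existing = {"location": "berlin", "reviewed": False}

# The update you want to apply.
update = {"reviewed": True, "weather": "rain"}

# Merge: copy the current metadata, then apply the update on top of it.
# Matching keys get new values, new keys are added, other keys are untouched.
merged = existing.copy()
merged.update(update)

# Overwrite: only the keys in the update survive.
overwritten = dict(update)

print(merged)       # {'location': 'berlin', 'reviewed': True, 'weather': 'rain'}
print(overwritten)  # {'reviewed': True, 'weather': 'rain'}
```

Copying before updating leaves the original dictionary unchanged, which makes it easy to compare the metadata before and after the update.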
Bulk import custom metadata to all data units in a Folder
This code allows you to update custom metadata on all data units in a Folder in Index. This code OVERWRITES all existing custom metadata on a data unit.
Using bundle allows you to update up to 1000 label rows at a time.
# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle
from encord.orm.storage import StorageFolder, StorageItem, StorageItemType, FoldersSortBy

# Authentication
SSH_PATH = "<ssh-private-key>"
FOLDER_HASH = "<unique-folder-id>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

folder = user_client.get_storage_folder(FOLDER_HASH)
items = folder.list_items()

# Use the Bundle context manager
with Bundle() as bundle:
    for item in items:
        # Update each item with client metadata
        item.update(client_metadata={"metadata-1": "value", "metadata-2": False}, bundle=bundle)
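If a Folder holds more items than the stated limit of 1000 per bundle, one option is to split the items into batches yourself and open one bundle per batch. The sketch below shows only the batching step in plain Python. The helper name batched is hypothetical, and whether the SDK already splits large bundles for you is not covered here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

BUNDLE_LIMIT = 1000  # limit stated on this page


def batched(items: Iterable[T], size: int = BUNDLE_LIMIT) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# 2500 stand-in items are split into batches of 1000, 1000, and 500.
batch_sizes = [len(batch) for batch in batched(range(2500))]
print(batch_sizes)  # [1000, 1000, 500]
```

Each batch could then be processed inside its own with Bundle() as bundle: block.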
Bulk custom metadata import on specific data units
This code allows you to update custom metadata on specific data units in a Folder in Index. This code DOES NOT OVERWRITE all existing custom metadata on a data unit. It overwrites the values of keys that already exist and adds any new keys to the data unit.
Using bundle allows you to update up to 1000 label rows at a time.
# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle
from encord.orm.storage import StorageFolder, StorageItem, StorageItemType, FoldersSortBy

# Authentication
SSH_PATH = "<ssh-private-key>"
FOLDER_HASH = "<unique-folder-id>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

folder = user_client.get_storage_folder(FOLDER_HASH)

# Keys are data unit IDs (UUID strings). Each key must be a different data unit ID.
updates = {
    "<data-unit-id-1>": {"metadata-1": "metadata-value"},
    "<data-unit-id-2>": {"metadata-2": False},
    # "<data-unit-id-3>": {"metadata-1": "metadata-value"},
    # "<data-unit-id-4>": {"metadata-2": True},
}

# Use the Bundle context manager
with Bundle() as bundle:
    for storage_item in folder.list_items():
        # Skip items that have no entry in the updates dictionary
        update = updates.get(str(storage_item.uuid))
        if update is None:
            continue

        # Merge the update into a copy of the current metadata so existing keys are kept
        curr_metadata = storage_item.client_metadata.copy()
        curr_metadata.update(update)
        storage_item.update(client_metadata=curr_metadata, bundle=bundle)
The following script adds clientMetadata to all images / frames in a specified Image Group.
Ensure that you:
Replace <private_key_path> with the file path to your private SSH key.
Replace <image-group-id> with the File ID (UUID) of the target Image Group.
Customize the _get_metadata_for_image function with the clientMetadata you want to add. To add unique metadata for each image, make the function dynamic by passing additional variables.
from encord import EncordUserClient
from encord.http.bundle import Bundle

# Initialize the SDK client
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Replace <image-group-id> with the File ID of the image group
image_group_uuid = "<image-group-id>"


# Function to define metadata for each image. Can be made dynamic by passing variables.
def _get_metadata_for_image(image):
    return {
        "somekindof_string": "string",
        "somekindof_number": "number",
    }


# Fetch the uploaded image group
uploaded_image_group = user_client.get_storage_item(image_group_uuid)

# Retrieve and update metadata for each image in the group
frame_items = uploaded_image_group.get_child_items()
with Bundle() as bundle:
    for frame_item in frame_items:
        # Update client metadata for each image
        frame_item.update(client_metadata=_get_metadata_for_image(frame_item), bundle=bundle)

# Re-fetch and verify updates
updated_frame_items = uploaded_image_group.get_child_items()
for updated_frame_item in updated_frame_items:
    expected_metadata = _get_metadata_for_image(updated_frame_item)
    assert updated_frame_item.client_metadata == expected_metadata

print("Client metadata successfully added and verified for all images in the Image Group.")
We strongly recommend that you upload your custom metadata to Folders, instead of importing using Datasets. Importing custom metadata to data in Folders allows you to filter your data in Index by custom metadata.
The following code lists the custom metadata of all data units in the specified Dataset. The code prints the custom metadata along with the data unit’s index within the dataset.
# Import dependencies
from encord import EncordUserClient
from encord.client import DatasetAccessSettings

# Authenticate with Encord using the path to your private key
client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify a dataset to read or write metadata to
dataset = client.get_dataset("<dataset_hash>")

# Fetch the dataset's metadata
dataset.set_access_settings(DatasetAccessSettings(fetch_client_metadata=True))

# Read the metadata of all data units in the dataset.
for data_unit, data_row in enumerate(dataset.data_rows):
    print(f"{data_row.client_metadata} - Data Unit: {data_unit}")
Before importing custom metadata to Encord, first import a metadata schema.
We strongly recommend that you import your custom metadata to Folders, instead of importing to Datasets. Importing custom metadata to data in folders allows you to filter your data in Index by custom metadata.
Import custom metadata (clientMetadata) to all data units in a Dataset
The following code adds the same custom metadata (clientMetadata) to each data unit in the specified Dataset. The code prints the custom metadata along with the data unit’s index within the Dataset, so that you can verify that the custom metadata was set correctly.
# Import dependencies
from encord import EncordUserClient
from encord.client import DatasetAccessSettings

# Authenticate with Encord using the path to your private key
client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify a dataset to read or write metadata to
dataset = client.get_dataset("<dataset_hash>")

# Fetch the Dataset's metadata
dataset.set_access_settings(DatasetAccessSettings(fetch_client_metadata=True))

# Add metadata to all data units in the Dataset.
# Replace {"my": "metadata"} with the metadata you want to add
for data_unit, data_row in enumerate(dataset.data_rows):
    data_row.client_metadata = {"my": "metadata"}
    data_row.save()
    print(f"{data_row.client_metadata} - Data Unit: {data_unit}")
keyframes is reserved for use with frames of interest in videos. Specifying keyframes on specific frames ensures that those frames import into Index and Active. That means frames specified using keyframes are available to filter your frames and for calculating embeddings on your data.
You can include keyframes while importing your videos or after you import your videos.
Import keyframes to Specific Data Units (Folder):
This code allows you to import keyframes on specific videos in Index. This code DOES NOT OVERWRITE all existing custom metadata on a data unit. It overwrites the values of keys that already exist and adds any new keys to the data unit.
# Import dependencies
from encord import EncordUserClient

# Authentication
SSH_PATH = "<private_key_path>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Define a dictionary with item UUIDs and their keyframes updates.
# Each key must be a different data unit ID. The frame numbers below are
# placeholders: replace them with the frame numbers of your own key frames.
updates = {
    "<data-unit-id-1>": {"keyframes": [10, 20, 30, 40, 50]},
    "<data-unit-id-2>": {"keyframes": [10, 20, 30, 40, 50]},
    "<data-unit-id-3>": {"keyframes": [10, 20, 30, 40, 50]},
    "<data-unit-id-4>": {"keyframes": [10, 20, 30, 40, 50]},
}

# Update the storage items based on the dictionary
for item_uuid, metadata_update in updates.items():
    item = user_client.get_storage_item(item_uuid=item_uuid)

    # Make a copy of the current metadata and update it with the new metadata
    curr_metadata = item.client_metadata.copy()
    curr_metadata.update(metadata_update)

    # Update the item with the new metadata
    item.update(client_metadata=curr_metadata)
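Frame numbers collected from several sources often contain duplicates or arrive unsorted. The sketch below cleans a list of frame numbers before it is merged into a data unit's custom metadata under the reserved keyframes key. The helper build_keyframes_update and the sample metadata are hypothetical, and sorting and de-duplicating are conveniences of this sketch rather than documented requirements.

```python
from typing import Any, Dict, List


def build_keyframes_update(frame_numbers: List[int]) -> Dict[str, List[int]]:
    """Return a {"keyframes": [...]} update with sorted, unique frame numbers."""
    for frame in frame_numbers:
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(frame, bool) or not isinstance(frame, int) or frame < 0:
            raise ValueError(f"invalid frame number: {frame!r}")
    return {"keyframes": sorted(set(frame_numbers))}


# Existing custom metadata on a video (hypothetical).
existing_metadata: Dict[str, Any] = {"camera": "front-left"}

# Merge the keyframes into a copy so that other keys are kept.
merged = existing_metadata.copy()
merged.update(build_keyframes_update([119, 111, 113, 111]))

print(merged)  # {'camera': 'front-left', 'keyframes': [111, 113, 119]}
```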
Once your custom metadata is imported to a Folder, you can create Collections based on your custom metadata and then create Datasets and Projects based on the Collections.
To create a Dataset from an Index Collection:
Log in to the Encord platform.
The landing page for the Encord platform appears.
Go to Index > Files.
The All folders page appears with a list of all folders in Encord.
Click into a folder.
The landing page for the folder appears and the View in Explorer button is enabled.
Click the View in Explorer button.
The Index Explorer page appears.
Search, sort, and filter your data until you have the subset of the data you need.
Select one or more of the images in the Explorer workspace.
A ribbon appears at the top of the Explorer workspace.
Click Select all to select all the images in the subset.
Click Add to a Collection.
Click New Collection.
Specify a meaningful title and description for the Collection.
The title specified here is applied as a tag/label to every selected image.
Click Collections.
The Collections page appears.
Select the checkbox for the Collection to create a Dataset.
Click Create Dataset.
The Create Dataset dialog appears.
Specify meaningful content for the following:
Dataset Title
Dataset Description
If your Collection includes images from a group or sequence, select Split image groups/sequences to extract the images from the groups or sequences and add each image separately to the Dataset.
Once your custom metadata is included in your Annotate Project (Folder or Dataset), you can create Collections based on your custom metadata and then send those Collections to Annotate.