Encord enables the use of custom embeddings for images, image sequences, image groups, and individual video frames. Custom embeddings allow you to incorporate your own feature representations into Encord’s platform, enhancing capabilities like similarity search, visualizations, and data filtering. This flexibility supports more advanced workflows and deeper insights tailored to your specific use cases.
Support for videos (in their entirety) is coming soon.
We currently support embeddings with 1 to 4096 dimensions for Index and 1 to 2000 dimensions for Active, following on from our in-house CLIP embeddings.
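Because Index and Active accept different maximum sizes, it can help to check a vector's length before importing it. The following is a minimal sketch of such a check. MAX_DIMS and check_embedding_size are hypothetical names used for illustration and are not part of the Encord SDK; the limits are the ones stated above.

```python
from typing import Sequence

# Limits taken from the text above; the names here are illustrative only.
MAX_DIMS = {"index": 4096, "active": 2000}


def check_embedding_size(vector: Sequence[float], target: str = "index") -> int:
    """Return the vector's dimension, or raise ValueError if it is out of range."""
    if target not in MAX_DIMS:
        raise ValueError(f"unknown target {target!r}; expected one of {sorted(MAX_DIMS)}")
    dim = len(vector)
    if not 1 <= dim <= MAX_DIMS[target]:
        raise ValueError(
            f"{target} supports 1 to {MAX_DIMS[target]} dimensions, got {dim}"
        )
    return dim
```

Calling check_embedding_size(vector, "active") before an import would surface an oversized vector locally instead of at import time.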
To bring your custom embeddings into Encord, you first need to create a key in your metadata schema. After the key is in your schema, you can import your custom embeddings.
A key is required in your custom metadata schema for your embeddings. You can use any string as the key for your embeddings. We strongly recommend that you use a string that is meaningful. If you do not include a key in your metadata schema, your imported embeddings are treated as strings.
Embedding key names can contain alphanumeric (a-z, A-Z, 0-9) characters, hyphens, and underscores.
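The naming rule can be checked locally before you touch the schema. This is a minimal sketch that assumes the allowed digits are the full 0-9 range; is_valid_embedding_key is a hypothetical helper, not an Encord SDK function.

```python
import re

# Assumption: letters, digits 0-9, hyphens, and underscores are the only allowed characters.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_embedding_key(key: str) -> bool:
    """Return True if the key is non-empty and uses only the allowed characters."""
    return _KEY_PATTERN.fullmatch(key) is not None
```

A key such as my-test-index-embedding passes this check, while a key containing a space or a dot does not.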
Use add_embedding to add an embedding to your metadata schema.
    # Import dependencies
    from encord import EncordUserClient
    from encord.metadata_schema import MetadataSchema

    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH
    )

    # Create the schema
    metadata_schema = user_client.metadata_schema()

    # Add embedding fields
    metadata_schema.add_embedding('my-test-active-embedding', size=512)
    metadata_schema.add_embedding('my-test-index-embedding', size=<values-from-1-to-4096>)

    # Save the schema
    metadata_schema.save()

    # Print the schema for verification
    print(metadata_schema)
With the key in the custom metadata schema ready, we can now import our embeddings. Custom embedding sizes are flexible and can be set anywhere between 1 and 4096 for Index, and between 1 and 2000 for Active. You can import embeddings after you have added your data or during your data registration.
Your key frames (frames specified with or without embeddings) always appear in Index, regardless of what sampling rate you specify.
This JSON file imports embeddings while registering your data with Index from a cloud integration. config is optional when importing your custom embeddings:

    "config": {
        "sampling_rate": "<samples-per-second>",
        "keyframe_mode": "frame" or "seconds",
    },
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
Specifying a sampling_rate of 0 only imports the first frame and all keyframes of your video into Index.
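To make the optional config concrete, here is a minimal sketch of a helper that assembles the per-video "$encord" entry and includes "config" only when you pass a value, so the defaults described above apply otherwise. build_video_embedding_metadata is a hypothetical name, not part of the Encord SDK. The nested layout follows the update dictionary used in this guide, and rejecting negative sampling rates is an assumption.

```python
from typing import Dict, Optional, Sequence, Union

VALID_KEYFRAME_MODES = ("frame", "seconds")


def build_video_embedding_metadata(
    key: str,
    frames: Dict[Union[int, str], Sequence[float]],
    sampling_rate: Optional[float] = None,
    keyframe_mode: Optional[str] = None,
) -> dict:
    """Build a "$encord" metadata entry holding per-frame embeddings for one video."""
    config = {}
    if sampling_rate is not None:
        # Assumption: 0 is allowed (first frame plus keyframes), negatives are not.
        if sampling_rate < 0:
            raise ValueError("sampling_rate must be 0 or greater")
        config["sampling_rate"] = sampling_rate
    if keyframe_mode is not None:
        if keyframe_mode not in VALID_KEYFRAME_MODES:
            raise ValueError(f"keyframe_mode must be one of {VALID_KEYFRAME_MODES}")
        config["keyframe_mode"] = keyframe_mode

    encord_entry = {
        # Frame keys are written as strings; embedding values are coerced to floats.
        "frames": {
            str(frame): {key: [float(v) for v in vector]}
            for frame, vector in frames.items()
        }
    }
    if config:
        encord_entry["config"] = config
    return {"$encord": encord_entry}
```

Leaving both optional arguments unset produces an entry without "config", which relies on the default sampling rate and keyframe mode.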
    # Import dependencies
    from encord import EncordUserClient
    from encord.http.bundle import Bundle
    from encord.orm.storage import StorageFolder, StorageItem, StorageItemType, FoldersSortBy

    # Authentication
    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH,
    )

    updates = {
        "<data-hash-1>": {
            "$encord": {
                "frames": {
                    "<frame-number-1>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    },
                    "<frame-number-2>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    }
                }
            }
        },
        "<data-hash-2>": {
            "$encord": {
                "config": {
                    "sampling_rate": <samples-per-second>,  # VIDEO ONLY (optional default = 1 sample/second)
                    "keyframe_mode": "frame" or "seconds",  # VIDEO ONLY (optional default = "frame")
                },
                "frames": {
                    "<frame-number-1>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    },
                    "<frame-number-2>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    }
                }
            }
        },
    }

    # Use the Bundle context manager
    with Bundle() as bundle:
        # Update the storage items based on the dictionary
        for item_uuid, metadata_update in updates.items():
            item = user_client.get_storage_item(item_uuid=item_uuid)

            # Make a copy of the current metadata and update it with the new metadata
            curr_metadata = item.client_metadata.copy()
            curr_metadata.update(metadata_update)

            # Update the item with the new metadata and bundle
            item.update(client_metadata=curr_metadata, bundle=bundle)
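One detail of the update loop is worth noting: Python's dict.update only merges at the top level, so if an item's existing metadata already has a "$encord" entry, the new entry replaces it as a whole rather than being combined with it. Whether that matters depends on your data. If you want to keep previously stored frames, a recursive merge is one option. The sketch below is plain Python; merge_metadata is a hypothetical helper and not part of the Encord SDK.

```python
import copy
from typing import Any, Dict


def merge_metadata(current: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which nested dicts are merged key by key, not replaced."""
    merged = copy.deepcopy(current)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend and merge their keys.
            merged[key] = merge_metadata(merged[key], value)
        else:
            # Lists (such as embedding vectors) and scalars are overwritten.
            merged[key] = copy.deepcopy(value)
    return merged
```

In the loop, this would mean computing curr_metadata = merge_metadata(item.client_metadata, metadata_update) in place of the copy-and-update pair. Embedding lists themselves are still overwritten, so re-importing a frame replaces its vector.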
Custom embeddings for images, text files, PDFs, and audio files use the same format as importing custom metadata.
    # Import dependencies
    from encord import EncordUserClient
    from encord.http.bundle import Bundle

    # Authentication
    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH,
    )

    # Define a dictionary with item UUIDs and their respective metadata updates
    updates = {
        "<data-ID-1>": {"<my-embedding>": [1.0, 2.0, 3.0]},
        "<data-ID-2>": {"<my-embedding>": [1.0, 2.0, 3.0]}
    }

    # Use the Bundle context manager
    with Bundle() as bundle:
        # Update the storage items based on the dictionary
        for item_uuid, metadata_update in updates.items():
            item = user_client.get_storage_item(item_uuid=item_uuid)

            # Make a copy of the current metadata and update it with the new metadata
            curr_metadata = item.client_metadata.copy()
            curr_metadata.update(metadata_update)

            # Update the item with the new metadata and bundle
            item.update(client_metadata=curr_metadata, bundle=bundle)
Once you import custom embeddings to your data (during cloud data registration or to existing data in Encord), there is a bit more setup required in Index and Active. These steps can only be performed from the UI.