Encord supports custom embeddings for images, image sequences, image groups, and individual video frames. Custom embeddings let you bring your own feature representations into Encord and use them for similarity search, visualizations such as embedding plots, and data filtering, so you can build workflows tailored to your specific use cases.
Support for embeddings of entire videos (rather than individual frames) is coming soon.
To bring your custom embeddings into Encord, you first need to create a key in your metadata schema. After the key is in your schema, you can import your custom embeddings.
To use custom embeddings in Index:
Add an embedding key to your metadata schema.
Upload your embeddings.
Select your custom embeddings from the Embeddings view.
Before you can use embedding plots with your custom embeddings, you need to configure your top-level Folder in Files.
Your embeddings require a key in your custom metadata schema. The key can be any string that follows the naming rules below, but we strongly recommend a meaningful one.
If you do not include a key in your metadata schema, your imported embeddings are treated as strings.
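Embedding key names can contain alphanumeric characters (a-z, A-Z, 0-9), hyphens, and underscores.
Use add_embedding to add an embedding key to your metadata schema. The size argument sets the number of dimensions in each embedding vector; for Index embeddings, size can be from 1 to 4096.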
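Before adding a key to your schema, you can check it against the naming rule locally. The following is a minimal sketch: is_valid_embedding_key is a hypothetical helper, not part of the Encord SDK, and it only encodes the allowed characters listed above (it does not check length or any other constraint Encord may apply).

```python
import re

# Hypothetical helper: encodes only the character rule stated above
# (ASCII letters, digits 0-9, hyphens, and underscores; at least one character).
_EMBEDDING_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_embedding_key(key: str) -> bool:
    """Return True if every character of `key` is allowed and `key` is non-empty."""
    return _EMBEDDING_KEY_PATTERN.fullmatch(key) is not None


print(is_valid_embedding_key("my-test-index-embedding"))  # True
print(is_valid_embedding_key("my embedding"))  # False (contains a space)
```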
```python
# Import dependencies
from encord import EncordUserClient
from encord.metadata_schema import MetadataSchema

SSH_PATH = "<file-path-to-ssh-private-key>"
# Replace with the dimension of your embeddings (any value from 1 to 4096)
INDEX_EMBEDDING_SIZE = 512

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH
)

# Load the metadata schema
metadata_schema: MetadataSchema = user_client.metadata_schema()

# Add embedding fields; size is the number of dimensions in each embedding vector
metadata_schema.add_embedding("my-test-active-embedding", size=512)
metadata_schema.add_embedding("my-test-index-embedding", size=INDEX_EMBEDDING_SIZE)

# Save the schema
metadata_schema.save()

# Print the schema for verification
print(metadata_schema)
```
To import custom embeddings for images, add each embedding to the storage item's client metadata under your embedding key:

```python
# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle

# Authentication
SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Define a dictionary with item UUIDs and their respective metadata updates.
# Replace <my-embedding> with the embedding key from your metadata schema.
updates = {
    "<data-ID-1>": {"<my-embedding>": [1.0, 2.0, 3.0]},
    "<data-ID-2>": {"<my-embedding>": [1.0, 2.0, 3.0]},
}

# Use the Bundle context manager to batch the item updates
with Bundle() as bundle:
    # Update the storage items based on the dictionary
    for item_uuid, metadata_update in updates.items():
        item = user_client.get_storage_item(item_uuid=item_uuid)

        # Make a copy of the current metadata and update it with the new metadata
        curr_metadata = item.client_metadata.copy()
        curr_metadata.update(metadata_update)

        # Update the item with the new metadata and bundle
        item.update(client_metadata=curr_metadata, bundle=bundle)
```
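If your embeddings come from your own model, you might build the updates dictionary for images programmatically. The following is a minimal sketch using a hypothetical build_image_updates helper (not part of the Encord SDK). It assumes each vector should have exactly the length you passed to add_embedding(size=...); check that assumption against your schema.

```python
from typing import Dict, List, Sequence


def build_image_updates(
    embeddings: Dict[str, Sequence[float]],
    embedding_key: str,
    size: int,
) -> Dict[str, Dict[str, List[float]]]:
    """Map storage item UUIDs to {embedding_key: vector} metadata updates."""
    updates: Dict[str, Dict[str, List[float]]] = {}
    for item_uuid, vector in embeddings.items():
        # Assumption: vector length must match the size declared in the schema
        if len(vector) != size:
            raise ValueError(f"{item_uuid}: expected {size} values, got {len(vector)}")
        # Convert to plain Python floats (e.g. from NumPy) so the metadata is JSON-serializable
        updates[item_uuid] = {embedding_key: [float(v) for v in vector]}
    return updates


print(build_image_updates({"<data-ID-1>": [0.1, 0.2, 0.3]}, "my-embedding", size=3))
# {'<data-ID-1>': {'my-embedding': [0.1, 0.2, 0.3]}}
```

Failing early on a length mismatch keeps a malformed vector from being written to an item's metadata.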
To import custom embeddings for frames in image sequences, image groups, and videos, place the embeddings under $encord > frames, keyed by frame number. For videos, the optional config entry sets the sampling rate and keyframe mode:

```python
# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle

# Authentication
SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Replace <my-embedding> with the embedding key from your metadata schema
updates = {
    # Frame-level embeddings without a config entry
    "<data-hash-1>": {
        "$encord": {
            "frames": {
                "<frame-number-1>": {
                    "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding with float values
                },
                "<frame-number-2>": {
                    "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding with float values
                },
            }
        }
    },
    # Video with the optional sampling configuration
    "<data-hash-2>": {
        "$encord": {
            "config": {
                "sampling_rate": 1,  # VIDEO ONLY: samples per second (optional, default = 1)
                "keyframe_mode": "frame",  # VIDEO ONLY: "frame" or "seconds" (optional, default = "frame")
            },
            "frames": {
                "<frame-number-1>": {
                    "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding with float values
                },
                "<frame-number-2>": {
                    "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding with float values
                },
            },
        }
    },
}

# Use the Bundle context manager to batch the item updates
with Bundle() as bundle:
    # Update the storage items based on the dictionary
    for item_uuid, metadata_update in updates.items():
        item = user_client.get_storage_item(item_uuid=item_uuid)

        # Make a copy of the current metadata and update it with the new metadata
        curr_metadata = item.client_metadata.copy()
        curr_metadata.update(metadata_update)

        # Update the item with the new metadata and bundle
        item.update(client_metadata=curr_metadata, bundle=bundle)
```
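Because dict.update is shallow, the copy-and-update step in the frame-level example replaces any existing $encord entry on the item as a whole, including frame embeddings stored under another key. If you want to keep existing entries, you can merge at the frame level instead. The following is a minimal sketch using a hypothetical merge_frame_embeddings helper (not part of the Encord SDK); it stores frame numbers as string keys, an assumption based on JSON object keys always being strings.

```python
import copy
from typing import Any, Dict, Mapping, Optional, Sequence


def merge_frame_embeddings(
    current_metadata: Mapping[str, Any],
    embedding_key: str,
    frame_embeddings: Mapping[int, Sequence[float]],
    config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a copy of current_metadata with per-frame embeddings merged under "$encord"."""
    merged = copy.deepcopy(dict(current_metadata))  # never mutate the caller's metadata
    encord_section = merged.setdefault("$encord", {})
    if config is not None:
        # VIDEO ONLY: e.g. {"sampling_rate": 1, "keyframe_mode": "frame"}
        encord_section.setdefault("config", {}).update(config)
    frames = encord_section.setdefault("frames", {})
    for frame_number, vector in frame_embeddings.items():
        # Keep other keys already stored on this frame; add or overwrite only embedding_key
        frame_entry = frames.setdefault(str(frame_number), {})
        frame_entry[embedding_key] = [float(v) for v in vector]
    return merged


existing = {"$encord": {"frames": {"0": {"other-embedding": [0.5]}}}}
print(merge_frame_embeddings(existing, "my-embedding", {0: [1.0, 2.0, 3.0]}))
```

In the update loop, you could compute curr_metadata with this helper from item.client_metadata instead of calling copy() and update().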
Before you can perform filtering, use similarity searches, or use embedding plots with your custom embeddings, you need to configure your top-level Folder in Files.
To configure Folders for Embedding Plots:
Go to Index > Files.
A list of Folders available to you appears on the My Files page.
Do one of the following:
Select the check box for the Folder.
Click into the Folder.
The Upgrade Folder button appears.
Click Upgrade Folder.
The Folder upgrades dialog appears.
Expand the Custom Embeddings drop-down.
Select a custom embedding from the list.
Click Add.
The custom embedding appears under Custom Embeddings.
You can add multiple embeddings. Only one embedding can be active in Index at a time.
Expand your selected custom embedding.
Select any of the following:
Similarity search
Compute UMAP Embedding Reduction
Compute Advanced Quality Metrics
Click Save and process changes.
A dialog appears informing you that the folder upgrade was successful.
You are now ready to use your custom embeddings.
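The Similarity search option in the steps above lets Index find items whose embeddings are close to a query item. The metric Index uses is not stated here; as an assumption, the following sketch uses cosine similarity, a common choice for comparing embeddings. The cosine_similarity and most_similar helpers are hypothetical and not part of the Encord SDK.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], embeddings: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (item_id, score) pairs with the highest cosine similarity to query."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
print([item_id for item_id, _ in most_similar([1.0, 0.0], embeddings, k=2)])  # ['a', 'c']
```

This brute-force scan is fine for illustration; large collections are usually searched with an approximate nearest-neighbor index instead.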
Upgrade your top-level folder before trying to view embedding plots.
Encord Index provides embedding plots: two-dimensional visualizations of high-dimensional embedding data. Projecting the embeddings down to two dimensions makes them easier to inspect while preserving much of the structure and patterns in the original data.
Embedding plots help you identify noteworthy clusters, inspect outliers, and exclude unwanted samples.
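The Compute UMAP Embedding Reduction option indicates that Index projects embeddings to two dimensions with UMAP, a non-linear method. To illustrate the general idea of dimensionality reduction with only NumPy, the following sketch swaps in PCA, a simpler linear technique; it is not how Index computes its plots. The pca_2d helper is hypothetical.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project an (n_items, dim) array onto its top two principal components."""
    if embeddings.ndim != 2 or min(embeddings.shape) < 2:
        raise ValueError("need a 2D array with at least 2 items and 2 dimensions")
    # Center the data so the principal directions pass through the mean
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


coords = pca_2d(np.random.default_rng(0).normal(size=(100, 512)))
print(coords.shape)  # (100, 2)
```

To approximate the UMAP projection offline, the umap-learn package provides umap.UMAP(n_components=2).fit_transform(embeddings).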
Use Custom Embedding Plots
Images with similar embeddings tend to cluster in the same regions of the plot. Drawing a rectangle on the plot selects the data points inside it, so you can quickly isolate those samples and explore what they have in common.
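Selecting a rectangular region amounts to keeping the points whose 2D plot coordinates fall inside the rectangle. The following sketch uses a hypothetical select_in_rectangle helper and made-up coordinates to illustrate the idea; it does not reflect how the Explorer page implements selection.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def select_in_rectangle(points: Dict[str, Point], corner_a: Point, corner_b: Point) -> List[str]:
    """Return IDs of points inside the rectangle spanned by two opposite corners (inclusive)."""
    # Sort each axis so the corners can be given in any order
    x_min, x_max = sorted((corner_a[0], corner_b[0]))
    y_min, y_max = sorted((corner_a[1], corner_b[1]))
    return [
        item_id
        for item_id, (x, y) in points.items()
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


points = {"img-1": (0.5, 0.5), "img-2": (2.0, 3.0), "img-3": (1.0, 1.0)}
print(select_in_rectangle(points, (0.0, 0.0), (1.0, 1.0)))  # ['img-1', 'img-3']
```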
Hover over clusters or individual data points on the plot to visually check frames.
When you select a region, the Explorer page updates to show only the selected data. You can then perform various actions on the selected group: