Custom Embeddings Support

We support embeddings for images, image sequences, image groups, and individual frames in videos.

Support for videos (in their entirety) is coming soon.

Use Custom Embeddings in Index

To bring your custom embeddings into Encord, you first need to create a key in your metadata schema. After the key is in your schema, you can import your custom embeddings.

To use custom embeddings in Index:

  1. Create a new embedding type in your Schema.
  2. Upload your embeddings.
  3. Select your custom embeddings from the Embeddings view.
Before you can use embedding plots with your custom embeddings, you need to configure your root Folder in Files.

Step 1: Create a New Embedding Type

Your embeddings need a key in your custom metadata schema. The key can be any string that follows the naming rules below, but we strongly recommend a meaningful name.

If you do not include a key in your metadata schema, your imported embeddings are treated as strings.

Embedding key names can contain alphanumeric characters (a-z, A-Z, 0-9), hyphens, and underscores.
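You can check a key name against this rule before you add it to your schema. The following is a minimal sketch that assumes the rule is exactly "one or more ASCII letters, digits, hyphens, or underscores". `is_valid_embedding_key` is a hypothetical helper, not part of the Encord SDK.

```python
import re

# Assumed rule: one or more ASCII letters, digits, hyphens, or underscores
_EMBEDDING_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

def is_valid_embedding_key(key: str) -> bool:
    """Return True if `key` uses only the characters allowed for embedding key names."""
    # fullmatch requires the whole string to match, not just a prefix
    return _EMBEDDING_KEY_PATTERN.fullmatch(key) is not None

print(is_valid_embedding_key("my-embedding"))  # True
print(is_valid_embedding_key("my embedding"))  # False (contains a space)
```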

In this example, the custom embedding key is my-embedding.

Create key in custom metadata schema
# Import dependencies
from encord import EncordUserClient

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH
)

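# Get the current schema (None if no schema exists yet) and work on a copy
current_metadata_schema = user_client.get_client_metadata_schema() or {}
new_metadata_schema = dict(current_metadata_schema)

# Register the custom embedding key and save the schema
new_metadata_schema["my-embedding"] = "embedding"
user_client.set_client_metadata_schema(new_metadata_schema)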
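If you script this step, a small guard can prevent you from overwriting a key that already holds a different type. This is a minimal sketch. It assumes the schema is a plain `dict` that maps key names to type strings, as in the snippet above, and that the schema can be `None` before any keys exist. `add_embedding_key` is a hypothetical helper, not an Encord SDK function.

```python
import re
from typing import Dict, Optional

def add_embedding_key(schema: Optional[Dict[str, str]], key: str) -> Dict[str, str]:
    """Return a copy of `schema` with `key` registered as an embedding.

    Raises ValueError if the key name is invalid or already used for another type.
    """
    # Assumed naming rule: letters, digits, hyphens, and underscores only
    if re.fullmatch(r"[A-Za-z0-9_-]+", key) is None:
        raise ValueError(f"invalid embedding key name: {key!r}")
    new_schema = dict(schema or {})  # copy so the caller's dict is untouched
    existing = new_schema.get(key)
    if existing is not None and existing != "embedding":
        raise ValueError(f"key {key!r} already exists with type {existing!r}")
    new_schema[key] = "embedding"
    return new_schema

schema = add_embedding_key({"reviewed": "boolean"}, "my-embedding")
print(schema)  # {'reviewed': 'boolean', 'my-embedding': 'embedding'}
```

You would then pass the returned dictionary to the schema-setting call shown above.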

Verify that the key is in the schema using the following:

Verify Schema
# Import dependencies
from encord import EncordUserClient

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH
)

schema = user_client.get_client_metadata_schema()

print(schema)

Step 2: Upload Embeddings

With the key in the custom metadata schema ready, we can now import our embeddings.

Index supports custom embeddings with 1 to 4096 dimensions.
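Before you upload, you can check your vectors locally. This is a minimal sketch that assumes "1 to 4096" refers to the number of dimensions. It also assumes that all vectors stored under one key must have the same length, which this page does not state. `validate_embeddings` is a hypothetical helper.

```python
import math
from typing import Mapping, Sequence

MIN_DIM, MAX_DIM = 1, 4096  # supported range of embedding dimensions

def validate_embeddings(embeddings: Mapping[str, Sequence[float]]) -> int:
    """Check every vector and return their shared dimension.

    `embeddings` maps a data hash to its vector. Raises ValueError on problems.
    """
    dims = {len(vec) for vec in embeddings.values()}
    if len(dims) != 1:
        # Assumption: all vectors stored under one key have the same length
        raise ValueError(f"expected one vector length, found {sorted(dims)}")
    dim = dims.pop()
    if not MIN_DIM <= dim <= MAX_DIM:
        raise ValueError(f"dimension {dim} is outside {MIN_DIM}..{MAX_DIM}")
    for data_hash, vec in embeddings.items():
        # Reject NaN, infinity, and non-numeric entries
        if not all(isinstance(x, (int, float)) and math.isfinite(x) for x in vec):
            raise ValueError(f"non-finite or non-numeric value in {data_hash}")
    return dim

print(validate_embeddings({"<data-hash-1>": [1.0, 2.0, 3.0]}))  # 3
```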

You can import embeddings after you have imported your data or during your data import.

Your key frames (frames specified with or without embeddings) always appear in Index, regardless of what sampling rate you specify.

Import while importing videos

Use a JSON file to import embeddings while importing your data into Index from a cloud integration.

config is optional when importing your custom embeddings:

"config": {
    "sampling_rate": "<samples-per-second>",
    "keyframe_mode": "<frame-or-seconds>"
},

keyframe_mode accepts either "frame" or "seconds".
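If you generate the import file from a script, you can build config as a Python dictionary and serialize it with `json.dumps`. This avoids hand-written JSON mistakes such as trailing commas. This is a minimal sketch. `make_config` is a hypothetical helper, and it assumes that `sampling_rate` is a non-negative number and that `keyframe_mode` is one of the two values listed above.

```python
import json

def make_config(sampling_rate: float = 1, keyframe_mode: str = "frame") -> dict:
    """Build the optional `config` object for a video embedding import.

    Defaults mirror the documented behavior when config is omitted.
    """
    if sampling_rate < 0:
        raise ValueError("sampling_rate must be >= 0")
    if keyframe_mode not in ("frame", "seconds"):
        raise ValueError('keyframe_mode must be "frame" or "seconds"')
    return {"config": {"sampling_rate": sampling_rate, "keyframe_mode": keyframe_mode}}

print(json.dumps(make_config(sampling_rate=2), indent=4))
```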

If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.

Specifying a sampling_rate of 0 only imports the first frame and all keyframes of your video into Index.
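The sampling rules above can be modeled with a small function that lists the frame indices that would appear in Index. This function only models the described behavior. It is not Encord's implementation. It assumes that frames are sampled at evenly spaced timestamps starting at frame 0, that the video frame rate (`fps`) is known, and that keyframes are given as frame indices. `frames_in_index` is hypothetical.

```python
from typing import Iterable, List

def frames_in_index(num_frames: int, fps: float, sampling_rate: float = 1,
                    keyframes: Iterable[int] = ()) -> List[int]:
    """Illustrative model: sorted frame indices that would appear in Index."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    if num_frames <= 0:
        return []
    selected = {0}  # the first frame is always included
    if sampling_rate > 0:
        step = fps / sampling_rate  # frames between consecutive samples
        t = 0.0
        while round(t) < num_frames:
            selected.add(round(t))
            t += step
    # Keyframes always appear, whatever the sampling rate
    selected.update(k for k in keyframes if 0 <= k < num_frames)
    return sorted(selected)

# 10-second video at 30 fps: default rate plus one keyframe, then rate 0
print(frames_in_index(300, fps=30, sampling_rate=1, keyframes=[45]))
print(frames_in_index(300, fps=30, sampling_rate=0, keyframes=[45, 120]))  # [0, 45, 120]
```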

Import to Videos already in Index

Import on specific images

Importing custom embeddings for images uses the same format as importing custom metadata.

# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle

# Authentication
SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Map each item UUID (data hash) to its metadata update.
# The key must match the embedding key in your metadata schema.
updates = {
    "<data-hash-1>": {"my-embedding": [1.0, 2.0, 3.0]},
    "<data-hash-2>": {"my-embedding": [1.0, 2.0, 3.0]},
}

# Use the Bundle context manager
with Bundle() as bundle:
    # Update the storage items based on the dictionary
    for item_uuid, metadata_update in updates.items():
        item = user_client.get_storage_item(item_uuid=item_uuid)

        # Make a copy of the current metadata and update it with the new metadata
        curr_metadata = item.client_metadata.copy()
        curr_metadata.update(metadata_update)

        # Update the item with the new metadata and bundle
        item.update(client_metadata=curr_metadata, bundle=bundle)
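The `updates` dictionary above is written by hand. In practice, you usually generate it from model output. That output often arrives as NumPy arrays or other sequence types that are not plain Python floats. This is a minimal sketch that assumes you already have a mapping from data hash to vector. `build_updates` is a hypothetical helper, and its `key` must match the embedding key registered in your schema.

```python
from typing import Dict, Iterable, List, Mapping

def build_updates(vectors: Mapping[str, Iterable[float]],
                  key: str = "my-embedding") -> Dict[str, Dict[str, List[float]]]:
    """Convert {data_hash: vector} into the {data_hash: {key: [floats]}} shape used above."""
    # float() turns ints, NumPy scalars, and similar values into plain Python floats
    return {data_hash: {key: [float(x) for x in vec]} for data_hash, vec in vectors.items()}

updates = build_updates({"<data-hash-1>": (1, 2, 3)})
print(updates)  # {'<data-hash-1>': {'my-embedding': [1.0, 2.0, 3.0]}}
```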

Step 3: Select your Custom Embeddings

You DO NOT need to re-index your data in Index for your embeddings to appear. For more information on re-indexing, refer to our documentation.

After you import your custom embeddings they are available for:

  • Filtering using custom embeddings

  • Similarity searches using your custom embeddings

  • Embedding view and 2D plots with selection-based workflows


Filtering with Custom Embeddings

  1. Click Filter. The Filter tab appears.

  2. Click Add filter. A menu appears.

  3. Click Custom Embeddings from the menu.

  4. Select your custom embedding to filter your data.

  5. Select True to display images, frames, or videos with the custom embeddings.
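Conceptually, selecting True keeps only the items whose metadata contains a value for the chosen embedding key. The sketch below models this on plain dictionaries shaped like the `client_metadata` used in Step 2. It illustrates what the filter means, not how Index evaluates it. `items_with_embedding` is hypothetical.

```python
from typing import List, Mapping

def items_with_embedding(metadata_by_item: Mapping[str, Mapping[str, object]],
                         key: str) -> List[str]:
    """Return the IDs of items whose metadata has a non-empty value for `key`."""
    # A missing key, None, or an empty list all count as "no embedding"
    return [item_id for item_id, metadata in metadata_by_item.items()
            if metadata.get(key)]

items = {
    "img-a": {"my-embedding": [0.1, 0.2]},
    "img-b": {"caption": "a cat"},
}
print(items_with_embedding(items, "my-embedding"))  # ['img-a']
```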

Similarity Searches with Custom Embeddings

  1. Click the Embeddings icon in the Explorer. The Embeddings screen appears.

  2. Select the embedding you want to use from the Select custom embeddings menu.

  3. Click the Grid icon.

  4. Hover over an image or frame with the custom embedding.

  5. Click the Similarity Search icon. Images and frames are sorted by similarity.
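A similarity search orders items by how close each item's embedding is to the embedding of the item you selected. This page does not say which metric Index uses. The sketch below assumes cosine similarity and uses plain Python lists. `rank_by_similarity` is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_similarity(query_id: str,
                       embeddings: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (item_id, similarity) for every other item, most similar first."""
    query = embeddings[query_id]
    scores = [(item_id, cosine_similarity(query, vec))
              for item_id, vec in embeddings.items() if item_id != query_id]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print([item for item, _ in rank_by_similarity("a", embeddings)])  # ['b', 'c']
```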

Adjust Similarity Search Distance

  1. Click the Embeddings icon in the Explorer. The Embeddings screen appears.

  2. Select the embedding you want to use from the Select custom embeddings menu.

  3. Click the Grid icon.

  4. Hover over an image or frame with the custom embedding.

  5. Click the Similarity Search icon. Images and frames are sorted by similarity, and a Distance filter appears.

  6. Adjust the Distance filter slider to change the similarity search results.
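You can think of the Distance slider as a cutoff on how far a result may be from the query embedding. This is a minimal sketch that assumes cosine distance (1 minus cosine similarity). This page does not document the actual distance measure or the slider's scale in Index. `within_distance` is hypothetical.

```python
import math
from typing import Dict, List, Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 for the same direction, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return 1.0 - dot / norm if norm else 1.0

def within_distance(query: Sequence[float], embeddings: Dict[str, Sequence[float]],
                    max_distance: float) -> List[str]:
    """Return IDs of items at most `max_distance` from `query`, nearest first."""
    distances = {item_id: cosine_distance(query, vec) for item_id, vec in embeddings.items()}
    kept = [item_id for item_id, d in distances.items() if d <= max_distance]
    return sorted(kept, key=distances.__getitem__)

embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(within_distance([1.0, 0.0], embeddings, max_distance=0.1))  # ['a', 'b']
```

Moving the cutoff up admits more distant items. Moving it down keeps only the closest matches.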

Embedding View with Index

Encord Index includes embedding plots: two-dimensional visualizations that present complex, high-dimensional data in a form that is easier to read. The plot reduces the dimensionality of the data while aiming to preserve its structure and patterns, so items that are similar in the original embedding space tend to appear close together.

Embedding plots help you identify noteworthy clusters, inspect outliers, and exclude unwanted samples.
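To illustrate what reducing dimensionality while preserving structure means, the sketch below projects vectors onto their first two principal components (PCA). It computes them with power iteration in pure Python. PCA is a swapped-in example technique: this page does not say which reduction Index applies. `pca_2d` is a hypothetical helper, not part of Encord.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

def _remove_component(v: List[float], u: Optional[List[float]]) -> List[float]:
    """Subtract the projection of v onto unit vector u (no-op if u is None)."""
    if u is None:
        return v
    dot = sum(a * b for a, b in zip(v, u))
    return [a - dot * b for a, b in zip(v, u)]

def _unit(v: List[float]) -> List[float]:
    """Scale v to length 1 (returned unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else v

def _top_eigenvector(cov: List[List[float]], exclude: Optional[List[float]] = None,
                     iters: int = 500, seed: int = 0) -> List[float]:
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix,
    restricted to the subspace orthogonal to `exclude` when it is given."""
    d = len(cov)
    rng = random.Random(seed)
    v = _unit(_remove_component([rng.uniform(-1.0, 1.0) for _ in range(d)], exclude))
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        w = _remove_component(w, exclude)
        if math.sqrt(sum(x * x for x in w)) < 1e-12:
            break  # v is (numerically) in the null space, which is still valid
        v = _unit(w)
    return v

def pca_2d(vectors: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project each vector onto the first two principal components of the set."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # d x d covariance matrix of the centered data
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    pc1 = _top_eigenvector(cov)
    pc2 = _top_eigenvector(cov, exclude=pc1, seed=1)  # orthogonal to pc1
    return [(sum(a * b for a, b in zip(c, pc1)), sum(a * b for a, b in zip(c, pc2)))
            for c in centered]

points = pca_2d([[0.0, 0.0, 1.0], [1.0, 0.1, 0.9], [5.0, 4.8, 0.2], [5.2, 5.1, 0.0]])
```

Similar input vectors map to nearby 2D points. This is the property that makes clusters visible on the plot.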

Before you can use embedding plots with your custom embeddings, you need to configure your root Folder in Files.

To configure Folders for Embedding Plots:

  1. Go to Index > Files.

    A list of Folders available to you appears on the My Files page.

  2. Do one of the following:

    • Select the check box for the Folder.
    • Click into the Folder.

    The Upgrade Folder button appears.

  3. Click Upgrade Folder.

    The Folder upgrades dialog appears.

  4. Expand the Custom Embeddings drop-down.

  5. Select a custom embedding from the list.

  6. Click Add.

    The custom embedding appears under Custom Embeddings.

    You can add multiple embeddings. Only one embedding can be active in Index at a time.

  7. Expand your selected custom embedding.

  8. Select Embedding reduction.

  9. Click Save and process changes.

    A dialog appears informing you that the folder upgrade was successful.

    You are now ready to view Embedding Plots using your custom embeddings.

Use Custom Embedding Plots

Notice how images cluster in certain regions of the plot. By drawing a rectangle on the plot, you can quickly isolate and analyze the data points inside it. This makes it easier to explore what those samples have in common.
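Selecting a rectangular region amounts to keeping the points whose 2D coordinates fall inside the box you draw. This is a minimal sketch of that test on plain coordinates. `select_in_rect` is hypothetical and does not describe how Index implements selection.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def select_in_rect(points: Dict[str, Point], corner_a: Point, corner_b: Point) -> List[str]:
    """Return IDs of 2D points inside the rectangle spanned by two opposite corners."""
    # Sort each axis so the corners can be given in any order
    x_min, x_max = sorted((corner_a[0], corner_b[0]))
    y_min, y_max = sorted((corner_a[1], corner_b[1]))
    return [item_id for item_id, (x, y) in points.items()
            if x_min <= x <= x_max and y_min <= y <= y_max]

plot = {"img-a": (0.2, 0.3), "img-b": (0.8, 0.9), "img-c": (0.25, 0.1)}
print(select_in_rect(plot, (0.0, 0.0), (0.5, 0.5)))  # ['img-a', 'img-c']
```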

Hover over clusters or individual data points on the plot to visually check frames.

When you select a region, the Explorer page updates to show the data in that region. You can then act on the selected group, for example:

  • Use Collections to tag and group images.
  • Create similar subsets and compare them.