Custom Embeddings

TL;DR

Do you already know what you are doing and only want to look over a Jupyter Notebook example? We provide one here.

Active's Default Embeddings

Active currently calculates and displays embeddings using purpose-built embedding models. These models perform well across a wide range of tasks, but highly specialized tasks call for a highly specialized model. The embeddings generated by our purpose-built models are used for natural language search (not supported for custom embeddings), image similarity search, and the embeddings view (where you view reduced embeddings).

Custom Embeddings Support

We currently support embeddings of dimension 512, matching our in-house CLIP embeddings.

We support embeddings for images, image sequences and image groups. Support for Videos is coming soon.

Use Custom Embeddings in Active

To use custom embeddings in Active:

  1. Add an Embedding type to your custom metadata schema.
  2. Upload your embeddings as custom metadata.
  3. Create an Annotate Project.
  4. Select your custom embeddings as you import the Project to Active.

Step 1: Create an Embedding Field

A key is required in your custom metadata schema for your embeddings. You can use any string as the key for your embeddings. We strongly recommend that you use a string that is meaningful.

This example uses "embeddings" as the key.

# Import dependencies
from encord import EncordUserClient

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH
)

current_metadata_schema = user_client.get_client_metadata_schema()
new_metadata_schema = current_metadata_schema
new_metadata_schema["embeddings"] = "embedding"
user_client.set_client_metadata_schema(new_metadata_schema)
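
The schema update above can also be written defensively, so that an existing key is never silently retyped. The following is a minimal sketch, not part of the Encord SDK: with_embedding_key is a hypothetical helper, and it assumes the schema behaves like a plain dictionary of key names to type-name strings (as in the snippet above) and that an account without a schema may return None or an empty dictionary.

```python
from typing import Dict, Optional


def with_embedding_key(
    schema: Optional[Dict[str, str]], key: str = "embeddings"
) -> Dict[str, str]:
    """Return a copy of `schema` that also maps `key` to the "embedding" type."""
    # Assumption: a missing schema may be None or an empty dict.
    updated = dict(schema or {})
    existing = updated.get(key)
    if existing is not None and existing != "embedding":
        # Refuse to change the type of a key that is already in use.
        raise ValueError(f"Key {key!r} already exists with type {existing!r}")
    updated[key] = "embedding"
    return updated


# A plain dictionary stands in for a real schema; "location" is illustrative.
new_schema = with_embedding_key({"location": "text"})
```

The returned dictionary could then be passed to set_client_metadata_schema in place of new_metadata_schema. Because the helper works on a copy, the schema fetched from the platform is left unchanged until you explicitly set it.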

Verify that the key is in the schema using the following:

# Import dependencies
from encord import EncordUserClient

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH
)

# Fetch the schema and confirm that the embeddings key is present
schema = user_client.get_client_metadata_schema()

print(schema)

Step 2: Upload Embeddings

With the custom metadata schema ready, we can now import our embeddings.

Embeddings MUST be of dimension 512.

Embeddings use the following format when uploading:


client_metadata = {
    "<embedding_key>": {
        <frame_number>: [float, ...]
    }
}

For example:

client_metadata = {
    "embeddings": {
        0: [0.1, 0.2, 0.3, ..., -0.1]
    }
}
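
Because embeddings must be of dimension 512, it can help to check each vector before building the payload. The following is a minimal sketch in plain Python, not part of the Encord SDK: make_client_metadata and EMBEDDING_DIM are hypothetical names, and the finite-value check is an added assumption rather than a documented requirement.

```python
import math
from typing import Dict, List, Sequence

EMBEDDING_DIM = 512  # dimension required for custom embeddings


def make_client_metadata(
    vector: Sequence[float], key: str = "embeddings", frame: int = 0
) -> Dict[str, Dict[int, List[float]]]:
    """Build a {key: {frame: [floats]}} payload for a single frame."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(
            f"Expected {EMBEDDING_DIM} values, got {len(vector)}"
        )
    # Convert to built-in floats so NumPy scalars and similar types work.
    values = [float(v) for v in vector]
    # Assumption: NaN or infinite values are not meaningful embeddings.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("Embedding contains NaN or infinite values")
    return {key: {frame: values}}


# Placeholder vector of zeros, used only to show the resulting shape.
client_metadata = make_client_metadata([0.0] * EMBEDDING_DIM)
```

Failing early on a wrong-sized vector keeps a malformed embedding from being uploaded alongside valid ones.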

Bulk import on specific data units

This code allows you to update embeddings on specific data units in a Folder in Index. It does not replace all existing embeddings on a data unit: it overwrites embeddings for keys that already have values and adds any new embeddings to the data unit.

Using a Bundle allows you to update up to 10,000 label rows at a time.


# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle
from encord.orm.storage import StorageFolder, StorageItem, StorageItemType, FoldersSortBy

# Configuration
SSH_PATH = "<file-path-to-ssh-private-key>"
FOLDER_HASH = "<unique-folder-id>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

folder = user_client.get_storage_folder(FOLDER_HASH)
updates = {
    "<data-unit-id>": {"<embedding_key>": {
         <frame_number>: [float, ...]
         }
     },
    "<data-unit-id>": {"<embedding_key>": {
         <frame_number>: [float, ...]
         }
     },
    "<data-unit-id>": {"<embedding_key>": {
         <frame_number>: [float, ...]
         }
     },
    "<data-unit-id>": {"<embedding_key>": {
         <frame_number>: [float, ...]
         }
     }
}

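# Use the Bundle context manager to batch the updates
with Bundle() as bundle:
    for storage_item in folder.list_items():
        # storage_item.uuid may be a UUID object; the keys of `updates` are strings
        update = updates.get(str(storage_item.uuid))
        if update is None:
            # Skip items in the folder that have no embeddings to upload
            continue
        storage_item.update(client_metadata=update, bundle=bundle)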
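
The updates dictionary used above is usually generated rather than typed by hand. The following is a minimal sketch in plain Python, not part of the Encord SDK: build_updates is a hypothetical helper that assumes your model output is available as (data unit ID, frame number, vector) rows and groups them into the nested structure shown above.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[str, int, Sequence[float]]  # (data unit ID, frame number, vector)


def build_updates(
    rows: Iterable[Row], key: str = "embeddings", dim: int = 512
) -> Dict[str, Dict[str, Dict[int, List[float]]]]:
    """Group rows into {data_unit_id: {key: {frame: [floats]}}}."""
    updates: Dict[str, Dict[str, Dict[int, List[float]]]] = {}
    for data_unit_id, frame, vector in rows:
        if len(vector) != dim:
            raise ValueError(
                f"{data_unit_id} frame {frame}: expected {dim} values, "
                f"got {len(vector)}"
            )
        # Create the per-data-unit entry on first use, then add the frame.
        per_unit = updates.setdefault(data_unit_id, {key: {}})
        per_unit[key][frame] = [float(v) for v in vector]
    return updates


# Placeholder IDs and constant vectors, for illustration only.
rows = [
    ("unit-a", 0, [0.1] * 512),
    ("unit-a", 1, [0.2] * 512),
    ("unit-b", 0, [0.3] * 512),
]
updates = build_updates(rows)
```

Rows that share a data unit ID end up under the same entry, one frame number per vector, which matches the shape the upload loop expects.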

Step 3: Create a Project in Annotate

The following task outlines the basics of creating an Annotate Project. For detailed instructions, refer to the documentation here.

To create an Annotate Project:

  1. Go to Annotate > Projects.
    The Annotate Projects list appears.

  2. Click New annotation project.
    The Create New Project page appears.

  3. Provide a meaningful title and description.

  4. Select an Ontology.

  5. Select one or more Datasets.

  6. Load a Workflow template.

  7. Add collaborators to the Project.

  8. Click Create Project.

Step 4: Import Project with Custom Embeddings

Before you can use your custom embeddings in Encord Active Projects, you need to import the custom embeddings. This is performed while you import your Annotate Project into Active.

For existing Active Projects, you can import custom embeddings if the Project was imported to the Data or Labels stage. If the Project was imported to the Metrics & Embeddings stage, you must delete the Project in Active and re-import it with your custom embeddings.

Note: After updating your embeddings, sync your Active Project to automatically apply the new embeddings.

To import a Project with custom embeddings:

  1. Log in to the Encord platform.
    The landing page for the Encord platform appears.

  2. Create a Project (Annotation Project or Training Project) in Encord Annotate.

  3. Click Active from the main menu.
    The landing page for Active appears.

  4. Click the Import Annotate Project button.
    The Import an Annotate Project to Encord Active dialog appears.

  5. Click the Import button for the Annotate Project you want to import.
    The Confirm Project Import dialog appears.

  6. Select the custom embeddings you want to import to the Project, under Metrics & Embeddings.

  7. Select Import under Metrics & Embeddings.

  8. Click Proceed.
    The Annotate Project imports with your custom embeddings.

End-to-End Custom Embeddings Example

We provide an end-to-end example using a Jupyter Notebook here.