At least one data integration is required to upload cloud data to Encord. Encord can integrate with cloud service providers including AWS S3, GCP Cloud Storage, Azure Blob Storage, and Open Telekom Cloud OSS.

Any files you upload to Encord must be stored in folders. Click here to learn how to create a folder to store your files.


Import cloud data to Files

Step 1: Create a JSON or CSV File for Import

Before importing your cloud data to Encord, you must first create a JSON or CSV file specifying the files you want to import.

JSON Format

We provide helpful scripts and examples that automatically generate JSON and CSV files for all the files in a folder or bucket within your cloud storage. This makes importing large datasets easier and more efficient.

The JSON file format is a JSON object with top-level keys specifying the type of data and object URLs of the files you want to upload to Encord. You can add one data type at a time, or combine multiple data types in one JSON.

The supported top-level keys are: videos, audio, image_groups, images, and dicom_series. The details for each data format are given in the sections below.

Add the "skip_duplicate_urls": true flag at the top level to make the uploads idempotent. Skipping URLs can help speed up large upload operations. Since previously processed assets do not have to be uploaded again, you can simply retry the failed operation without editing the upload specification file. The flag’s default value isfalse.

Encord enforces the following upload limits for each JSON file used for file uploads:

  • Up to 1 million URLs
  • A maximum of 500,000 items (e.g. images, image groups, videos, DICOMs)
  • URLs can be up to 16 KB in size

Optimal upload chunking can vary depending on your data type and the amount of associated metadata. For tailored recommendations, contact Encord support. We recommend starting with smaller uploads and gradually increasing the size based on how quickly jobs are processed. Generally, smaller chunks are reflected in the platform more quickly.

CSV Format

In the CSV file format, the column headers specify which type of data is being uploaded. You can add a single data type at a time, or combine multiple data types in a single CSV file (see the example after the notes below).

Details for each data format are given in the sections below.

Encord supports up to 10,000 entries for upload in the CSV file.
  • Object URLs can’t contain whitespace.
  • For backwards compatibility reasons, a single column CSV is supported. A file with the single ObjectUrl column is interpreted as a request for video upload. If your objects are of a different type (for example, images), this error displays: “Expected a video, got a file of type XXX”.
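
For illustration, a minimal single-column CSV (interpreted as a video upload, as noted above) might look like this, with placeholder URLs:

ObjectUrl
<object url_1>
<object url_2>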

Step 2: Import your cloud data

We recommend uploading smaller batches of data: limit uploads to 100 videos and up to 1000 images at a time. Familiarize yourself with our limits and best practices for data import before uploading data to Encord.
  1. Navigate to the Files section of Index in the Encord platform.
  2. Click into a Folder.
  3. Click + Upload files. A dialog appears.
  4. Click Import from cloud data.
We recommend turning on the Ignore individual file errors feature. This ensures that individual file errors do not cause the entire upload process to be aborted.
  5. Click Add JSON or CSV files to add a JSON or CSV file specifying the cloud data to be added.
You can also upload your data directly in the Datasets screen. Click here for instructions.

Custom metadata

Custom metadata can only be added through JSON uploads in the Encord Platform or via the Encord SDK.

Custom metadata, also known as client metadata, is supplementary information you can add to all data imported into Encord. It is provided in the form of a Python dictionary (or JSON object), as shown in the examples. Client metadata helps you organize, filter, and retrieve your data more efficiently.

You can optionally add custom metadata per data item in the clientMetadata field of your JSON file, as shown in the examples.
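
For example, a minimal JSON entry with clientMetadata might look like the following (the metadata keys and values are illustrative):

{
  "videos": [
    {
      "objectUrl": "<object url_1>",
      "clientMetadata": {
        "site_location": "Algiers",
        "project_phase": "foundation"
      }
    }
  ],
  "skip_duplicate_urls": true
}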

We enforce a 10MB limit on the custom metadata for each data item. Internally, we store custom metadata as a PostgreSQL jsonb type. Read the relevant PostgreSQL documentation about the jsonb type and its behavior. For example, the jsonb type does not preserve key order or duplicate keys.

Metadata schema

Metadata schemas, including custom embeddings, can only be imported through the Encord SDK.

Based on your Data Discoverability Strategy, you need to create a metadata schema. The schema provides a method of organization for your custom metadata. Encord supports:

  • Scalars: Methods for filtering.
  • Enums: Methods with options for filtering.
  • Embeddings: Methods for embedding plot visualization, similarity search, and natural language search.
Metadata Schema keys support letters (a-z, A-Z), numbers (0-9), blank spaces ( ), hyphens (-), underscores (_), and periods (.).

Custom metadata

Custom metadata refers to any additional information you attach to files, allowing for better data curation and management based on your specific needs. It can include any details relevant to your workflow, helping you organize, filter, and retrieve data more efficiently. For example, for a video of a construction site, custom metadata could include fields like "site_location": "Algiers", "project_phase": "foundation", or "weather_conditions": "sunny". This enables more precise tracking and management of your data.

Before importing any files with custom metadata to Encord, we recommend that you import a metadata schema. Encord uses metadata schemas to validate custom metadata uploaded to Encord and to instruct Index and Active how to display your metadata.

To handle your custom metadata schema across multiple teams within the same organization, we recommend using namespacing for metadata keys in the schema. This ensures that different teams can define and manage their own metadata schema without conflicts. For example, team A could use video.description, while team B could use audio.description. Another example could be TeamName.MetadataKey. This approach maintains clarity and avoids key collisions across departments.

Metadata schema table

Metadata Schema keys support letters (a-z, A-Z), numbers (0-9), and blank spaces ( ), hyphens (-), underscores (_), and periods (.). Metadata schema keys are case sensitive.

Use add_scalar to add a scalar key to your metadata schema.

Scalar Key | Description | Display Benefits
boolean | Binary data type with values “true” or “false”. | Filtering by binary values
datetime | ISO 8601 formatted date and time. | Filtering by time and date
number | Numeric data type supporting float values. | Filtering by numeric values
uuid | Customer-specified unique identifier for a data unit. | Filtering by customer-specified unique identifier
varchar | Textual data type. Formerly string. string can be used as an alias for varchar, but we STRONGLY RECOMMEND that you use varchar. | Filtering by string
text | Text data with unlimited length (example: transcripts for audio). Formerly long_string. long_string can be used as an alias for text, but we STRONGLY RECOMMEND that you use text. | Storing and filtering large amounts of text

Use add_enum and add_enum_options to add an enum and enum options to your metadata schema.

Key | Description | Display Benefits
enum | Enumerated type with predefined set of values. | Facilitates categorical filtering and data validation

Use add_embedding to add an embedding to your metadata schema.

Key | Description | Display Benefits
embedding | 512-dimension embeddings for Active; 1 to 4096 dimensions for Index. | Filtering by embeddings, similarity search, 2D scatter plot visualization (Coming Soon)

Incorrectly specifying a data type in the schema can cause errors when filtering your data in Index or Active. If you encounter errors while filtering, verify your schema is correct. If your schema has errors, correct the errors, re-import the schema, and then re-sync your Active Project.

Import your metadata schema to Encord
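
The following is a minimal sketch of creating and importing a metadata schema with the SDK. It assumes the user_client.metadata_schema() accessor and the add_scalar, add_enum, add_enum_options, add_embedding, and save methods with the parameter names shown; the key names are illustrative, so check the SDK reference for the exact signatures in your version.

# Import dependencies
from encord import EncordUserClient

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Fetch the metadata schema for your organization
metadata_schema = user_client.metadata_schema()

# Scalar keys (see the scalar key table above); key names are illustrative
metadata_schema.add_scalar("site_location", data_type="varchar")
metadata_schema.add_scalar("captured_at", data_type="datetime")

# Enum key with its options
metadata_schema.add_enum("weather_conditions", values=["sunny", "cloudy", "rainy"])
metadata_schema.add_enum_options("weather_conditions", values=["snowy"])

# Embedding key (512 dimensions for Active; 1 to 4096 for Index)
metadata_schema.add_embedding("my-embedding", size=512)

# Save the schema to import it into Encord
metadata_schema.save()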

Verify your schema

After importing your schema to Encord, we recommend that you verify the import was successful. Run the following code to verify that your metadata schema imported and that the schema is correct.
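
A minimal sketch, assuming the same metadata_schema() accessor as above; printing the schema lets you review the registered keys and their types:

# Import dependencies
from encord import EncordUserClient

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Fetch the metadata schema and print it to review the imported keys and types
metadata_schema = user_client.metadata_schema()
print(metadata_schema)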

Update custom metadata (JSON)

When updating custom metadata using a JSON file, you MUST specify "skip_duplicate_urls": true and "upsert_metadata": true.

Specifying the "skip_duplicate_urls": true and "upsert_metadata": true flags in the JSON file means the import does the following:

  • New files (and the custom metadata for those files) import into Encord.

  • Existing files have their existing custom metadata overwritten with the custom metadata specified in the JSON file.

To update custom metadata with a JSON file:

  1. Create an upload JSON file with the updated custom metadata. Include the "skip_duplicate_urls": true and "upsert_metadata": true flags.
  • Custom metadata updates require "skip_duplicate_urls": true to function. It does not work if "skip_duplicate_urls": false.
  • Only custom metadata for pre-existing files is updated. Any new files present in the JSON are uploaded.
Update custom metadata example
{
  "videos": [
    {
      "objectUrl": "<object url_1>"
    },
    {
      "objectUrl": "<object url_2>",
      "title": "my-custom-video-title.mp4",
      "clientMetadata": {"optional": "metadata"}
    }
  ],
  "skip_duplicate_urls": true,
  "upsert_metadata": true
}
  2. Start a new file upload to Encord using the new JSON file.

Custom Embeddings

Metadata schemas, including custom embeddings, can only be imported through the Encord SDK.

Encord enables the use of custom embeddings for images, image sequences, image groups, and individual video frames.

To learn how to use custom embeddings in Encord, see our documentation here.

Step 1: Create a New Embedding Type

A key is required in your custom metadata schema for your embeddings. You can use any string as the key, but we strongly recommend using a meaningful string.

If you do not include a key in your metadata schema, your imported embeddings are treated as strings.

Embedding key names can contain alphanumeric characters (a-z, A-Z, 0-9), hyphens, and underscores.

Use add_embedding to add an embedding to your metadata schema.

Key | Description | Display Benefits
embedding | 512-dimension embeddings for Active; 1 to 4096 dimensions for Index. | Filtering by embeddings, similarity search, 2D scatter plot visualization (Coming Soon)
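
As a minimal sketch (assuming the metadata_schema() accessor and a size parameter for add_embedding; the key name is illustrative):

# Import dependencies
from encord import EncordUserClient

SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Add an embedding key to the metadata schema and save the change
metadata_schema = user_client.metadata_schema()
metadata_schema.add_embedding("my-embedding", size=512)  # key name is illustrative
metadata_schema.save()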

Step 2: Upload Embeddings

With the key added to your custom metadata schema, you can now import your embeddings.

Active embeddings MUST be of dimension 512. Index supports custom embeddings with dimensions ranging from 1 to 4096.

You can import embeddings after you have imported your data or during your data import.

Your key frames (frames specified with or without embeddings) always appear in Index, regardless of what sampling rate you specify.
Embedding key names can contain alphanumeric characters (a-z, A-Z, 0-9), hyphens, and underscores.

Import while importing videos

You can include embeddings in your upload JSON file while importing your data into Index from a cloud integration.

config is optional when importing your custom embeddings:

"config": {
    "sampling_rate": "<samples-per-second>",
    "keyframe_mode": "frame" or "seconds",
},

If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.

Specifying a sampling_rate of 0 only imports the first frame and all keyframes of your video into Index.

Import to Videos already in Index

Import on specific images

The custom embeddings format for images follows the same format as importing custom metadata.

# Import dependencies
from encord import EncordUserClient
from encord.http.bundle import Bundle

# Authentication
SSH_PATH = "<file-path-to-ssh-private-key>"

# Authenticate with Encord using the path to your private key
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
)

# Define a dictionary with item UUIDs and their respective metadata updates
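# The embedding key (<my-embedding>) must match the embedding key defined in your metadata schema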
updates = {
    "<data-hash-1>": {"<my-embedding>": [1.0, 2.0, 3.0]},
    "<data-hash-2>": {"<my-embedding>": [1.0, 2.0, 3.0]}
}

# Use the Bundle context manager
with Bundle() as bundle:
    # Update the storage items based on the dictionary
    for item_uuid, metadata_update in updates.items():
        item = user_client.get_storage_item(item_uuid=item_uuid)

        # Make a copy of the current metadata and update it with the new metadata
        curr_metadata = item.client_metadata.copy()
        curr_metadata.update(metadata_update)

        # Update the item with the new metadata and bundle
        item.update(client_metadata=curr_metadata, bundle=bundle)


Check Data Upload Status

You can check the progress of the processing job by clicking the bell icon in the top right corner of the Encord app.

  • A spinning progress indicator shows that the processing job is still in progress.
  • If successful, the processing completes with a green tick icon.
  • If unsuccessful, a red cross icon appears.

If the upload is unsuccessful, ensure that:

  • Your provider permissions are set correctly.
  • The object data format is supported.
  • The upload JSON or CSV file is correctly formatted.

Check which files failed to upload by clicking the Export icon to download a CSV log file. Every row in the CSV corresponds to a file that failed to upload.

You only see failed uploads if the Ignore individual file errors toggle was not enabled for your data upload.

Helpful Scripts and Examples

Use the following examples and helpful scripts to quickly create JSON and CSV files formatted for the dataset creation process by constructing the object URLs of files under a specified path in your private storage.