We provide helpful scripts and examples that automatically generate JSON and CSV files for all the files in a folder or bucket within your cloud storage. This makes importing large datasets easier and more efficient.
The JSON file format is a JSON object with top-level keys specifying the type of data and object URLs of the files you want to upload to Encord. You can add one data type at a time, or combine multiple data types in one JSON. The supported top-level keys are: videos, audio, image_groups, images, and dicom_series. The details for each data format are given in the sections below.
See our tips for increasing the speed of file registration here.
Add the "skip_duplicate_urls": true flag at the top level to make the uploads idempotent. Skipping URLs can help speed up large upload operations. Since previously processed assets do not have to be uploaded again, you can simply retry the failed operation without editing the upload specification file. The flag's default value is false.
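As a rough illustration of the structure described above, the following sketch builds an upload specification with two top-level data keys and the skip_duplicate_urls flag. The bucket name and object paths are placeholders, not real resources.

```python
import json

# Placeholder URLs: replace with object URLs from your own storage.
upload_spec = {
    "videos": [
        {"objectUrl": "https://my-bucket.s3.eu-west-2.amazonaws.com/videos/video23.mp4"},
    ],
    "images": [
        {
            "objectUrl": "https://my-bucket.s3.eu-west-2.amazonaws.com/images/image23.jpg",
            "title": "image23.jpg",
        },
    ],
    # Lets a retry skip URLs that were already registered.
    "skip_duplicate_urls": True,
}

upload_json = json.dumps(upload_spec, indent=2)
print(upload_json)
```

Python's True is serialized as the JSON literal true, so the printed text can be saved directly as the upload file.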
Encord enforces the following upload limits for each JSON file used for file registration:
Up to 1 million URLs
A maximum of 500,000 items (e.g. images, image groups, videos, DICOMs)
URLs can be up to 16 KB in size
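A pre-flight check along these lines can catch oversized files before you submit them. This is a sketch only: it assumes "16 KB" means 16 * 1024 bytes, counts every key that starts with objectUrl as one URL, and only inspects the top-level keys named earlier in this document. The function name check_upload_limits is hypothetical.

```python
MAX_URLS = 1_000_000
MAX_ITEMS = 500_000
MAX_URL_BYTES = 16 * 1024  # assumption: "16 KB" is read as 16 * 1024 bytes

# Top-level data keys named in this document.
DATA_KEYS = ("videos", "audio", "image_groups", "images", "dicom_series")


def check_upload_limits(spec: dict) -> list:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    items = [item for key in DATA_KEYS for item in spec.get(key, [])]
    # Covers both objectUrl and objectUrl_0, objectUrl_1, ...
    urls = [
        value
        for item in items
        for key, value in item.items()
        if key.startswith("objectUrl")
    ]
    if len(items) > MAX_ITEMS:
        problems.append(f"{len(items)} items exceeds {MAX_ITEMS}")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs exceeds {MAX_URLS}")
    for url in urls:
        if len(url.encode("utf-8")) > MAX_URL_BYTES:
            problems.append(f"URL longer than {MAX_URL_BYTES} bytes: {url[:60]}...")
    return problems
```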
Optimal upload chunking can vary depending on your data type and the amount of associated metadata. For tailored recommendations, contact Encord support. We recommend starting with smaller uploads and gradually increasing the size based on how quickly jobs are processed. Generally, smaller chunks result in faster data reflection within the platform.
Each object in the videos array is a JSON object with the key objectUrl specifying the full URL of where to find the video resource. The title field is optional. If omitted, the video file path and name are used as the default title. For example, if the file is located at https://encord-solutions-bucket.s3.eu-west-2.amazonaws.com/path/to/my/bucket/video23.mp4, the title defaults to /path/to/my/bucket/video23.mp4.
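The default title described above corresponds to the path component of the object URL. The sketch below reproduces that result with Python's standard library. It mirrors the documented behavior and is not Encord's own implementation.

```python
from urllib.parse import urlparse


def default_title(object_url: str) -> str:
    """Return the URL's path component, which the text describes as the default title."""
    return urlparse(object_url).path


url = "https://encord-solutions-bucket.s3.eu-west-2.amazonaws.com/path/to/my/bucket/video23.mp4"
print(default_title(url))  # /path/to/my/bucket/video23.mp4
```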
videoMetadata must be specified when a Strict client-only access integration is used. In all other cases, videoMetadata is optional, but including it significantly reduces import times.
| Key or Flag | Required? | Default value |
| --- | --- | --- |
| "objectUrl" | Yes | |
| "title" | No | The file's path + title |
| "videoMetadata" | No | |
| "clientMetadata" | No | |
| "createVideo" | No | false |
Keys / Flags that are not required can be omitted from the JSON file entirely.
The JSON format allows you to specify videoMetadata for video files. videoMetadata is essential information used by the Label Editor and is crucial for aligning annotations to the correct frame.
When the videoMetadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.
videoMetadata must be specified when a Strict client-only access integration is used. In all other cases, videoMetadata is optional.
width / height: Dimensions of the video (in pixels).
file_size: The size of the file (in bytes).
mime_type: Specifies the file type extension according to the MIME standard.
When videos are supplied with video metadata, Encord assumes the metadata to be correct, and our servers neither download nor pre-process your data. This can be particularly useful for customers with strict data compliance concerns.
One way to find the necessary metadata is to run the following commands in your terminal:
ffmpeg -i 'video_title.mp4' to retrieve fps, duration, width, and height.
ls -l 'video_title.mp4' to retrieve the file size.
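Once you have those values, they can be assembled into a videos entry. The sketch below takes fps, duration, width, and height as arguments (read from the ffmpeg output) and fills in the file size and MIME type from a local copy of the file. The key names fps and duration are assumptions based on the fields the ffmpeg command is said to retrieve. The helper name build_video_entry is hypothetical.

```python
import mimetypes
import os


def build_video_entry(object_url, local_path, fps, duration, width, height):
    """Build one `videos` entry with videoMetadata from a local copy of the file."""
    mime_type, _ = mimetypes.guess_type(local_path)
    return {
        "objectUrl": object_url,
        "videoMetadata": {
            # Values read from the ffmpeg output ("fps" and "duration" are assumed key names).
            "fps": fps,
            "duration": duration,
            "width": width,
            "height": height,
            # Same number as the size column printed by `ls -l`.
            "file_size": os.path.getsize(local_path),
            "mime_type": mime_type,
        },
    }
```

Because Encord uses supplied metadata without validation, double-check the values against the ffmpeg output before uploading.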
Each object in the audio file array is a JSON object with the key objectUrl specifying the full URL of where to find the audio resource. The title field is optional. If omitted, the audio file path and name are used as the default title. For example, if the file is located at https://encord-solutions-bucket.s3.eu-west-2.amazonaws.com/path/to/my/bucket/song23.mp3, the title defaults to /path/to/my/bucket/song23.mp3.
Audio metadata is distinct from client metadata. clientMetadata allows you to add metadata that can be used for filtering your data in Index. You can use text to import transcripts for your audio file.
audioMetadata must be specified when a Strict client-only access integration is used. In all other cases, audioMetadata is optional, but including it significantly reduces import times.
| Key or Flag | Required? | Default value |
| --- | --- | --- |
| "objectUrl" | Yes | |
| "title" | No | The file's path + title |
| "clientMetadata" | No | |
| "audioMetadata" | No | |
Keys / Flags that are not required can be omitted from the JSON file entirely.
The JSON format allows you to specify audioMetadata for audio files. This is optional information.
When the audioMetadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and do not store the file on our servers. It is crucial that the metadata you provide is accurate.
Each object in the PDF array is a JSON object with the key objectUrl specifying the full URL of where to find the PDF. The title field is optional. If omitted, the PDF path and name are used as the default title. For example, if the file is located at https://encord-solutions-bucket.s3.eu-west-2.amazonaws.com/path/to/my/bucket/my-document.pdf, the title defaults to /path/to/my/bucket/my-document.pdf.
PDF metadata is distinct from client metadata. clientMetadata allows you to add metadata that can be used for filtering your data in Index.
pdfMetadata must be specified when a Strict client-only access integration is used. In all other cases, pdfMetadata is optional, but including it significantly reduces import times.
| Key or Flag | Required? | Default value |
| --- | --- | --- |
| "objectUrl" | Yes | |
| "title" | No | The file's path + title |
| "clientMetadata" | No | |
| "pdfMetadata" | No | |
Keys / Flags that are not required can be omitted from the JSON file entirely.
The JSON format allows you to specify pdfMetadata for documents. This is optional information.
When the pdfMetadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and do not store the file on our servers. It is crucial that the metadata you provide is accurate.
Each object in the text file array is a JSON object with the key objectUrl specifying the full URL of where to find the text file. The title field is optional. If omitted, the text file path and name are used as the default title. For example, if the file is located at https://encord-solutions-bucket.s3.eu-west-2.amazonaws.com/path/to/my/bucket/my-file.html, the title defaults to /path/to/my/bucket/my-file.html.
Text files include .txt, .html, .md, .xml, and more.
Text metadata is distinct from client metadata. clientMetadata allows you to add metadata that can be used for filtering your data in Index.
textMetadata must be specified when a Strict client-only access integration is used. In all other cases, textMetadata is optional, but including it significantly reduces import times.
| Key or Flag | Required? | Default value |
| --- | --- | --- |
| "objectUrl" | Yes | |
| "title" | No | The file's path + title |
| "clientMetadata" | No | |
| "textMetadata" | No | |
Keys / Flags that are not required can be omitted from the JSON file entirely.
The JSON format allows you to specify textMetadata for documents. This is optional information.
When the textMetadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and do not store the file on our servers. It is crucial that the metadata you provide is accurate.
The JSON structure for single images parallels that of videos. The title field is optional. If omitted, the image file path and name are used as the default title. For example, if the file is located at https://encord-solutions-bucket.s3.eu-west-2.amazonaws.com/path/to/my/bucket/image23.jpg, the title defaults to /path/to/my/bucket/image23.jpg.
| Key or Flag | Required? | Default value |
| --- | --- | --- |
| "objectUrl" | Yes | |
| "title" | No | The file's path + title |
| "imageMetadata"* | No | |
| "clientMetadata" | No | |
| "createVideo" | No | false |
imageMetadata must be specified when a Strict client-only access integration is used. In all other cases, imageMetadata is optional, but including it significantly reduces import times.
Keys / Flags that are not required can be omitted from the JSON file entirely.
The JSON format allows you to specify imageMetadata for image files. imageMetadata contains essential information used by the Label Editor and is crucial for aligning annotations to the correct image properties.
When the imageMetadata flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation and do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.
imageMetadata must be specified when a Strict client-only access integration is used. In all other cases, imageMetadata is optional.
Image groups are collections of images that are processed as one annotation task.
Images within image groups remain unaltered, meaning that images of different sizes and resolutions can form an image group without the loss of data.
Image groups do not require ‘write’ permissions to your cloud storage.
Custom client metadata is defined per image group, not per image. See our documentation here to learn how to add clientMetadata to images in an image group.
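| Key or Flag | Required? | Default value | Note |
| --- | --- | --- | --- |
| "objectUrl_{position_number}" | Yes | | {position_number} is the number the file occupies in the sequence, starting from 0 |
| "title" | No | | |
| "clientMetadata" | No | | |
| "createVideo" | No | false | |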
The position of each image within the sequence needs to be specified in the key - e.g. objectUrl_{position_number} as seen in the example below.
Keys / Flags that are not required can be omitted from the JSON file entirely.
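The positional keys can be generated rather than typed by hand. The sketch below numbers the URLs from 0 in list order. The helper name image_group_entry and the bucket URLs are hypothetical.

```python
def image_group_entry(title, urls, create_video=False):
    """Build one `image_groups` entry with objectUrl_{position_number} keys."""
    entry = {"title": title, "createVideo": create_video}
    # Positions start from 0 and follow the order of `urls`.
    for position, url in enumerate(urls):
        entry[f"objectUrl_{position}"] = url
    return entry


spec = {
    "image_groups": [
        image_group_entry(
            "Group 1",
            [
                "https://my-bucket.s3.eu-west-2.amazonaws.com/frames/frame1.jpg",
                "https://my-bucket.s3.eu-west-2.amazonaws.com/frames/frame2.jpg",
            ],
        )
    ]
}
print(spec)
```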
Custom metadata (clientMetadata) can be added to individual frames in an image group. However, the frames must first be imported into Index, after which you can create an image group from the frames using the SDK.
Image sequences are collections of images that are processed as one annotation task and represented as a video.
Images within image sequences may be altered, as images of varying sizes and resolutions are made to match that of the first image in the sequence.
Creating Image sequences from cloud storage requires ‘write’ permissions, as new files have to be created in order to be read as a video.
Each object in the image_groups array with the createVideo flag set to true represents a single image sequence.
Custom client metadata is defined per image sequence, not per image.
The only difference between adding image groups and image sequences via a JSON is that image sequences require the createVideo flag to be set to true. Both use the key image_groups.
| Key or Flag | Required? | Default value |
| --- | --- | --- |
| "objectUrl" | Yes | |
| "title" | No | |
| "clientMetadata" | No | |
| "createVideo" | No | false |
The position of each image within the sequence needs to be specified in the key, e.g. objectUrl_{position_number}. See the example below.
Keys / Flags that are not required can be omitted from the JSON file entirely.
Each dicom_series element can contain one or more DICOM series.
Each series requires a title and at least one object URL, as shown in the following example.
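| Key or Flag | Required? | Default value | Note |
| --- | --- | --- | --- |
| "objectUrl_{position_number}" | Yes | | {position_number} is the number the file occupies in the sequence, starting from 0 |
| "title" | Yes | | |
| "clientMetadata" | No | | |
| "createVideo" | No | true (change this to false for image groups) | |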
Keys / Flags that are not required, such as clientMetadata, can be omitted from the JSON file entirely. clientMetadata is distinct from patient metadata, which is included in the .dcm file and does not have to be specified during the upload to Encord.
The following is an example JSON for uploading three DICOM series belonging to a study. Each title and object URL correspond to individual DICOM series.
The first series contains only a single object URL, as it is composed of a single file.
The second series contains 3 object URLs, as it is composed of three separate files.
The third series contains 2 object URLs, as it is composed of two separate files.
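A specification with that shape (one, three, and two object URLs) could be generated as follows. This is a sketch: the base URL, series titles, and file names are placeholders, and the helper dicom_series_entry is hypothetical.

```python
import json

BASE = "https://my-bucket.s3.eu-west-2.amazonaws.com/study1"  # placeholder location


def dicom_series_entry(title, urls):
    """Build one `dicom_series` entry: a title plus objectUrl_{n} keys starting at 0."""
    entry = {"title": title}
    for position, url in enumerate(urls):
        entry[f"objectUrl_{position}"] = url
    return entry


dicom_spec = {
    "dicom_series": [
        dicom_series_entry("Series 1", [f"{BASE}/series1/file1.dcm"]),
        dicom_series_entry("Series 2", [f"{BASE}/series2/file{i}.dcm" for i in (1, 2, 3)]),
        dicom_series_entry("Series 3", [f"{BASE}/series3/file{i}.dcm" for i in (1, 2)]),
    ]
}
print(json.dumps(dicom_spec, indent=2))
```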
For each DICOM upload, an additional DicomSeries file is created. This file represents the series file-set. Only DicomSeries are displayed in the Encord application.
In the CSV file format, the column headers specify which type of data is being uploaded. You can add a single file format at a time, or combine multiple data types in a single CSV file. Details for each data format are given in the sections below.
Encord supports up to 10,000 entries for upload in the CSV file.
Object URLs can’t contain whitespace.
For backwards compatibility reasons, a single column CSV is supported. A file with the single ObjectUrl column is interpreted as a request for video upload. If your objects are of a different type (for example, images), this error displays: “Expected a video, got a file of type XXX”.
A CSV file containing videos should contain two columns with the following mandatory column headings:
‘ObjectURL’ and ‘Video title’. All headings are case-insensitive.
The ‘ObjectURL’ column containing the objectUrl. This field is mandatory for each file, as it specifies the full URL of the video resource.
The ‘Video title’ column containing the video_title. If left blank, the original file name is used.
In the example below files 1, 2 and 4 will be assigned the names in the title column, while file 3 will keep its original file name.
| ObjectUrl | Video title |
| --- | --- |
| path/to/storage-location/frame1.mp4 | Video 1 |
| path/to/storage-location/frame2.mp4 | Video 2 |
| path/to/storage-location/frame3.mp4 | |
| path/to/storage-location/frame4.mp4 | Video 3 |
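The same rows can be written as an actual CSV file with Python's csv module. This sketch writes to an in-memory buffer. The paths are the placeholder values from the example and would need to be full object URLs in practice.

```python
import csv
import io

rows = [
    ("path/to/storage-location/frame1.mp4", "Video 1"),
    ("path/to/storage-location/frame2.mp4", "Video 2"),
    ("path/to/storage-location/frame3.mp4", ""),  # blank title: original file name is used
    ("path/to/storage-location/frame4.mp4", "Video 3"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ObjectUrl", "Video title"])
writer.writerows(rows)

video_csv = buffer.getvalue()
print(video_csv)
```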
Single images
A CSV file containing single images should contain two columns with the following mandatory headings:
‘ObjectURL’ and ‘Image title’. All headings are case-insensitive.
The ‘ObjectURL’ column containing the objectUrl. This field is mandatory for each file, as it specifies the full URL of the image resource.
The ‘Image title’ column containing the image_title. If left blank, the original file name is used.
In the example below files 1, 2 and 4 will be assigned the names in the title column, while file 3 will keep its original file name.
A CSV file containing image groups should contain three columns with the following mandatory headings:
‘ObjectURL’, ‘Image group title’, and ‘Create video’. All three headings are case-insensitive.
The ‘ObjectURL’ column containing the objectUrl. This field is mandatory for each file, as it specifies the full URL of the resource.
The ‘Image group title’ column containing the image_group_title. This field is mandatory, as it determines which image group a file will be assigned to.
In the example below the first two URLs are grouped together into ‘Group 1’, while the following two files are grouped together into ‘Group 2’.
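To make the grouping rule concrete, the sketch below parses a small CSV of this shape and collects the URLs under each image group title. The file contents are illustrative placeholders. 'Create video' is set to false on the assumption that these rows are image groups rather than image sequences.

```python
import csv
import io
from collections import defaultdict

image_group_csv = """ObjectUrl,Image group title,Create video
path/to/storage-location/frame1.jpg,Group 1,false
path/to/storage-location/frame2.jpg,Group 1,false
path/to/storage-location/frame3.jpg,Group 2,false
path/to/storage-location/frame4.jpg,Group 2,false
"""

# Rows that share a title end up in the same image group.
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(image_group_csv)):
    groups[row["Image group title"]].append(row["ObjectUrl"])

for title, urls in groups.items():
    print(title, urls)
```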
A CSV file containing image sequences should contain three columns with the following mandatory headings: ‘ObjectURL’, ‘Image group title’, and ‘Create video’. All three headings are case-insensitive.
The ‘ObjectURL’ column containing the objectUrl. This field is mandatory for each file, as it specifies the full URL of the resource.
The ‘Image group title’ column containing the image_group_title. This field is mandatory, as it determines which image sequence a file will be assigned to. The dimensions of the image sequence are determined by the first file in the sequence.
The ‘Create video’ column. This can be left blank, as the default value is ‘true’.
In the example below the first two URLs are grouped together into ‘Sequence 1’, while the second two files are grouped together into ‘Sequence 2’.
| ObjectUrl | Image group title | Create video |
| --- | --- | --- |
| path/to/storage-location/frame1.jpg | Sequence 1 | true |
| path/to/storage-location/frame2.jpg | Sequence 1 | true |
| path/to/storage-location/frame3.jpg | Sequence 2 | true |
| path/to/storage-location/frame4.jpg | Sequence 2 | true |
Image groups and image sequences are only distinguished by the presence of the ‘Create video’ column.
Image sequences require ‘write’ permissions against your storage bucket to save the compressed video.
DICOM
A CSV file containing DICOM files should contain two columns with the following mandatory headings: ‘ObjectURL’ and ‘Dicom title’. Both headings are case-insensitive.
The ‘ObjectURL’ column containing the objectUrl. This field is mandatory for each file, as it specifies the full URL of the resource.
The ‘Series title’ column containing the dicom_title. When two files are given the same title they are grouped into the same DICOM series. If left blank, the original file name is used.
In the example below the first two files are grouped into ‘dicom series 1’, the next two files are grouped into ‘dicom series 2’, while the final file will remain separated as ‘dicom series 3’.
You can upload multiple file types with a single CSV file by using a new header each time there is a change of file type. Three headings will be required if image sequences are included.
Since the 'Create video' column defaults to true, all files that are not image sequences must contain the value false.
The example below shows a CSV file for the following:
To ensure smoother uploads and faster completion times, and avoid hitting absolute file limits, we recommend adding smaller batches of data. Limit uploads to 100 videos or up to 1,000 images at a time. You can also create multiple Datasets, all of which can be linked to a single Project. Familiarize yourself with our limits and best practices for data import/registration before adding data to Encord.
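One way to follow this recommendation is to split a long list of entries into several smaller upload specifications. The sketch below uses the suggested batch size of 100 videos. The URLs are placeholders, and the helper name batches is hypothetical.

```python
def batches(entries, batch_size):
    """Yield consecutive slices of `entries` with at most `batch_size` items each."""
    for start in range(0, len(entries), batch_size):
        yield entries[start:start + batch_size]


# 250 placeholder video entries.
video_entries = [
    {"objectUrl": f"https://my-bucket.s3.eu-west-2.amazonaws.com/videos/video{i}.mp4"}
    for i in range(250)
]

# One upload specification per batch of up to 100 videos.
video_specs = [
    {"videos": batch, "skip_duplicate_urls": True}
    for batch in batches(video_entries, 100)
]
print([len(spec["videos"]) for spec in video_specs])  # [100, 100, 50]
```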
Navigate to the Files section of Index in the Encord platform.
Click into a Folder.
Click + Upload files.
A dialog appears.
Click Import from cloud data.
We recommend turning on the Ignore individual file errors feature. This ensures that individual file errors do not lead to the whole upload process being aborted.
Click Add JSON or CSV files to add a JSON or CSV file specifying cloud data that is to be added.
You can also register your data directly in the Datasets screen. Click here for instructions.
Custom metadata can only be added through JSON uploads in the Encord Platform or using the Encord SDK.
Custom metadata, also known as client metadata, is supplementary information you can add to all data imported into Encord. It is provided in the form of a Python dictionary, as shown in examples. Client metadata serves several key functions:
You can optionally add some custom metadata per data item in the clientMetadata field (examples show how this is done) of your JSON file.
We enforce a 10MB limit on the custom metadata for each data item. Internally, we store custom metadata as a PostgreSQL jsonb type. Read the relevant PostgreSQL documentation about the jsonb type and its behaviors. For example, jsonb type does not preserve key order or duplicate keys.
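A local size check can flag oversized metadata before upload. In this sketch, "10MB" is assumed to mean 10 * 1024 * 1024 bytes, and the size is measured on the UTF-8 encoded JSON text, which may differ from how Encord measures it. The last lines illustrate the duplicate-key point: like jsonb, Python's JSON parser keeps only the last value for a repeated key.

```python
import json

MAX_METADATA_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def metadata_size_ok(client_metadata: dict) -> bool:
    """Check the serialized size of one item's clientMetadata against the limit."""
    encoded = json.dumps(client_metadata).encode("utf-8")
    return len(encoded) <= MAX_METADATA_BYTES


print(metadata_size_ok({"site_location": "Algiers", "weather_conditions": "sunny"}))

# A repeated key collapses to its last value when parsed.
deduplicated = json.loads('{"phase": "foundation", "phase": "framing"}')
print(deduplicated)  # {'phase': 'framing'}
```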
Metadata schemas, including custom embeddings, can only be imported through the Encord SDK.
Based on your Data Discoverability Strategy, you need to create a metadata schema. The schema provides a method of organization for your custom metadata. Encord supports:
Scalars: Methods for filtering.
Enums: Methods with options for filtering.
Embeddings: Method for embedding plot visualization, similarity search, and natural language search.
Metadata Schema keys support letters (a-z, A-Z), numbers (0-9), blank spaces ( ), hyphens (-), underscores (_), and periods (.).
Custom metadata refers to any additional information you attach to files, allowing for better data curation and management based on your specific needs. It can include any details relevant to your workflow, helping you organize, filter, and retrieve data more efficiently. For example, for a video of a construction site, custom metadata could include fields like "site_location": "Algiers", "project_phase": "foundation", or "weather_conditions": "sunny". This enables more precise tracking and management of your data.
Before importing any files with custom metadata to Encord, we recommend that you import a metadata schema. Encord uses metadata schemas to validate custom metadata uploaded to Encord and to instruct Index and Active how to display your metadata.
To handle your custom metadata schema across multiple teams within the same organization, we recommend using namespacing for metadata keys in the schema. This ensures that different teams can define and manage their own metadata schema without conflicts. For example, team A could use video.description, while team B could use audio.description. Another example could be TeamName.MetadataKey. This approach maintains clarity and avoids key collisions across departments.
Metadata Schema keys support letters (a-z, A-Z), numbers (0-9), and blank spaces ( ), hyphens (-), underscores (_), and periods (.). Metadata schema keys are case sensitive.
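The character rule and the namespacing suggestion can be checked locally before you build a schema. This is a sketch of such a check, not Encord's validation logic. The function names are hypothetical.

```python
import re

# Letters, digits, blank spaces, hyphens, underscores, and periods.
SCHEMA_KEY_PATTERN = re.compile(r"[A-Za-z0-9 ._-]+")


def is_valid_schema_key(key: str) -> bool:
    """Return True if `key` uses only the supported characters."""
    return SCHEMA_KEY_PATTERN.fullmatch(key) is not None


def namespaced_key(namespace: str, key: str) -> str:
    """Join a namespace (for example a team name) and a key with a period."""
    full_key = f"{namespace}.{key}"
    if not is_valid_schema_key(full_key):
        raise ValueError(f"Unsupported characters in schema key: {full_key!r}")
    return full_key


print(namespaced_key("video", "description"))  # video.description
print(is_valid_schema_key("bad/key"))          # False
```

Keys are case sensitive, so video.Description and video.description would be two different schema keys.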
Use add_scalar to add a scalar key to your metadata schema.
| Scalar Key | Description | Display Benefits |
| --- | --- | --- |
| boolean | Binary data type with values "true" or "false". | Filtering by binary values |
| datetime | ISO 8601 formatted date and time. | Filtering by time and date |
| number | Numeric data type supporting float values. | Filtering by numeric values |
| uuid | UUIDv4 formatted unique identifier for a data unit. | Filtering by customer specified unique identifier |
| varchar | Textual data type. Formerly string. string can be used as an alias for varchar, but we STRONGLY RECOMMEND that you use varchar. | Filtering by string |
| text | Text data with unlimited length (example: transcripts for audio). Formerly long_string. long_string can be used as an alias for text, but we STRONGLY RECOMMEND that you use text. | Storing and filtering large amounts of text |
Use add_enum and add_enum_options to add an enum and enum options to your metadata schema.
| Key | Description | Display Benefits |
| --- | --- | --- |
| enum | Enumerated type with predefined set of values. | Facilitates categorical filtering and data validation |
Use add_embedding to add an embedding to your metadata schema.
Incorrectly specifying a data type in the schema can cause errors when filtering your data in Index or Active. If you encounter errors while filtering, verify your schema is correct. If your schema has errors, correct the errors, re-import the schema, and then re-sync your Active Project.
    # Import dependencies
    from encord import EncordUserClient
    from encord.metadata_schema import MetadataSchema

    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH
    )

    # Create the schema
    metadata_schema = user_client.metadata_schema()

    # Add various metadata fields
    metadata_schema.add_scalar("metadata_1", data_type="boolean")
    metadata_schema.add_scalar("metadata_2", data_type="datetime")
    metadata_schema.add_scalar("metadata_3", data_type="number")
    metadata_schema.add_scalar("metadata_4", data_type="uuid")
    metadata_schema.add_scalar("metadata_5", data_type="varchar")
    metadata_schema.add_scalar("metadata_6", data_type="text")

    # Add an enum field
    metadata_schema.add_enum("my-enum", values=["enum-value-01", "enum-value-02", "enum-value-03"])

    # Add embedding fields
    metadata_schema.add_embedding('my-test-active-embedding', size=512)
    metadata_schema.add_embedding('my-test-index-embedding', size=<values-from-1-to-4096>)

    # Save the schema
    metadata_schema.save()

    # Print the schema for verification
    print(metadata_schema)
After importing your schema to Encord we recommend that you verify that the import is successful. Run the following code to verify your metadata schema imported and that the schema is correct.
    # Import dependencies
    from encord import EncordUserClient
    from encord.metadata_schema import MetadataSchema

    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH
    )

    # Create the schema
    metadata_schema = user_client.metadata_schema()

    # Print the schema for verification
    print(metadata_schema)
When updating custom metadata using a JSON file, you MUST specify "skip_duplicate_urls": true and "upsert_metadata": true.
Specifying the "skip_duplicate_urls": true and "upsert_metadata": true flags in the JSON file does the following:
New files are registered with Encord, and custom metadata for those files is added.
Existing files have their existing custom metadata overwritten with the custom metadata specified in the JSON file.
To update custom metadata with a JSON file:
Create a registration JSON file with the updated custom metadata. Include the "skip_duplicate_urls": true and "upsert_metadata": true flags.
Custom metadata updates require "skip_duplicate_urls": true to function. It does not work if "skip_duplicate_urls": false.
Only custom metadata for pre-existing files is updated. Any new files present in the JSON are uploaded.
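A registration file for a metadata update might be generated as sketched below. The object URL and metadata values are placeholders, and the helper has_upsert_flags is a hypothetical local check, not part of any Encord tooling.

```python
import json

update_spec = {
    "videos": [
        {
            "objectUrl": "https://my-bucket.s3.eu-west-2.amazonaws.com/videos/video23.mp4",
            # New custom metadata that overwrites what is stored for this file.
            "clientMetadata": {"site_location": "Algiers", "project_phase": "foundation"},
        }
    ],
    # Both flags must be true for a metadata update.
    "skip_duplicate_urls": True,
    "upsert_metadata": True,
}


def has_upsert_flags(spec: dict) -> bool:
    """Return True only if both flags needed for a metadata update are set to true."""
    return spec.get("skip_duplicate_urls") is True and spec.get("upsert_metadata") is True


print(has_upsert_flags(update_spec))
update_json = json.dumps(update_spec, indent=2)
```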
A key is required in your custom metadata schema for your embeddings. You can use any string as the key for your embeddings. We strongly recommend that you use a string that is meaningful.
If you do not include a key in your metadata schema, your imported embeddings are treated as strings.
Embedding key names can contain alphanumeric (a-z, A-Z, 0-9) characters, hyphens, and underscores.
Use add_embedding to add an embedding to your metadata schema.
    # Import dependencies
    from encord import EncordUserClient
    from encord.metadata_schema import MetadataSchema

    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH
    )

    # Create the schema
    metadata_schema = user_client.metadata_schema()

    # Add embedding fields
    metadata_schema.add_embedding('my-test-active-embedding', size=512)
    metadata_schema.add_embedding('my-test-index-embedding', size=<values-from-1-to-4096>)

    # Save the schema
    metadata_schema.save()

    # Print the schema for verification
    print(metadata_schema)
With the key in the custom metadata schema ready, we can now import our embeddings.
Custom embedding sizes are flexible and can be set anywhere between 1 and 4096.
You can import embeddings after you have added your data or during your data registration.
Your key frames (frames specified with or without embeddings) always appear in Index, regardless of what sampling rate you specify.
Embedding key names can contain alphanumeric (a-z, A-Z, 0-9) characters, hyphens, and underscores.
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
Specifying a sampling_rate of 0 only imports the first frame and all keyframes of your video into Index.
    # Import dependencies
    from encord import EncordUserClient
    from encord.http.bundle import Bundle

    # Authentication
    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH,
    )

    # Define a dictionary with item UUIDs and their respective metadata updates
    updates = {
        "<data-ID-1>": {"<my-embedding>": [1.0, 2.0, 3.0]},
        "<data-ID-2>": {"<my-embedding>": [1.0, 2.0, 3.0]},
    }

    # Use the Bundle context manager
    with Bundle() as bundle:
        # Update the storage items based on the dictionary
        for item_uuid, metadata_update in updates.items():
            item = user_client.get_storage_item(item_uuid=item_uuid)

            # Make a copy of the current metadata and update it with the new metadata
            curr_metadata = item.client_metadata.copy()
            curr_metadata.update(metadata_update)

            # Update the item with the new metadata and bundle
            item.update(client_metadata=curr_metadata, bundle=bundle)
This JSON file imports embeddings while registering your data with Index from a cloud integration.
config is optional when importing your custom embeddings:

    "config": {
      "sampling_rate": "<samples-per-second>",
      "keyframe_mode": "frame" or "seconds",
    },
If config is not specified, the sampling_rate is 1 frame per second, and the keyframe_mode is frame.
Specifying a sampling_rate of 0 only imports the first frame and all keyframes of your video into Index.
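The nested structure that carries per-frame embeddings and the optional config block can be assembled programmatically. The sketch below follows the "$encord" layout used in the SDK example in this section. The helper name is hypothetical, and the check that each vector length equals the schema size is an assumption about how sizes are enforced.

```python
def encord_embedding_metadata(key, frame_embeddings, size,
                              sampling_rate=None, keyframe_mode=None):
    """Build client metadata holding per-frame embeddings under the "$encord" key."""
    if not 1 <= size <= 4096:
        raise ValueError("Embedding size must be between 1 and 4096")

    frames = {}
    for frame_number, vector in frame_embeddings.items():
        # Assumption: every vector must match the size declared in the schema.
        if len(vector) != size:
            raise ValueError(f"Frame {frame_number}: expected {size} values, got {len(vector)}")
        frames[str(frame_number)] = {key: [float(value) for value in vector]}

    encord = {"frames": frames}

    # config is optional; only emit the settings that were supplied.
    config = {}
    if sampling_rate is not None:
        config["sampling_rate"] = sampling_rate
    if keyframe_mode is not None:
        if keyframe_mode not in ("frame", "seconds"):
            raise ValueError('keyframe_mode must be "frame" or "seconds"')
        config["keyframe_mode"] = keyframe_mode
    if config:
        encord["config"] = config

    return {"$encord": encord}


example = encord_embedding_metadata(
    "my-embedding", {0: [1.0, 2.0, 3.0], 30: [4.0, 5.0, 6.0]}, size=3,
    sampling_rate=1, keyframe_mode="frame",
)
print(example)
```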
    # Import dependencies
    from encord import EncordUserClient
    from encord.http.bundle import Bundle
    from encord.orm.storage import StorageFolder, StorageItem, StorageItemType, FoldersSortBy

    # Authentication
    SSH_PATH = "<file-path-to-ssh-private-key>"

    # Authenticate with Encord using the path to your private key
    user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path=SSH_PATH,
    )

    updates = {
        "<data-hash-1>": {
            "$encord": {
                "frames": {
                    "<frame-number-1>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    },
                    "<frame-number-2>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    }
                }
            }
        },
        "<data-hash-2>": {
            "$encord": {
                "config": {
                    "sampling_rate": <samples-per-second>,  # VIDEO ONLY (optional default = 1 sample/second)
                    "keyframe_mode": "frame" or "seconds",  # VIDEO ONLY (optional default = "frame")
                },
                "frames": {
                    "<frame-number-1>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    },
                    "<frame-number-2>": {
                        "<my-embedding>": [1.0, 2.0, 3.0],  # custom embedding ("embedding") with float values
                    }
                }
            }
        },
    }

    # Use the Bundle context manager
    with Bundle() as bundle:
        # Update the storage items based on the dictionary
        for item_uuid, metadata_update in updates.items():
            item = user_client.get_storage_item(item_uuid=item_uuid)

            # Make a copy of the current metadata and update it with the new metadata
            curr_metadata = item.client_metadata.copy()
            curr_metadata.update(metadata_update)

            # Update the item with the new metadata and bundle
            item.update(client_metadata=curr_metadata, bundle=bundle)
To speed up file registration with Encord, you can include metadata for each file in the upload JSON. This metadata is used directly without additional validation and is not stored on our servers. Ensuring accuracy in the metadata you provide is essential to maintain precise labels.
The metadata referenced here is distinct from clientMetadata and serves a different purpose. Documentation for clientMetadata can be found here.
imageMetadata for images:
mimeType: MIME type of the image (e.g., image/jpeg).
fileSize: Size of the file in bytes.
width: Width of the image in pixels.
height: Height of the image in pixels.
audioMetadata for audio files:
duration_seconds (float): Audio duration in seconds.
file_size (int): Size of the audio file in bytes.
mime_type (str): MIME type (e.g., audio/mpeg, audio/wav).
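For WAV files, the three audioMetadata fields listed above can be derived with Python's standard library alone. This is a sketch limited to uncompressed WAV input; other formats such as MP3 would need a different tool to read the duration. The function name is hypothetical.

```python
import os
import wave


def wav_audio_metadata(path: str) -> dict:
    """Compute audioMetadata fields for an uncompressed WAV file."""
    with wave.open(path, "rb") as wav_file:
        # Duration is the number of audio frames divided by frames per second.
        duration_seconds = wav_file.getnframes() / wav_file.getframerate()
    return {
        "duration_seconds": duration_seconds,
        "file_size": os.path.getsize(path),
        "mime_type": "audio/wav",
    }
```

The returned dictionary can be placed under the audioMetadata key of an audio entry. Since the supplied values are used without validation, compute them from the exact file you register.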
You can check the progress of the processing job by clicking the bell icon in the top right corner of the Encord app.
A spinning progress indicator shows that the processing job is still in progress.
If successful, the processing completes with a green tick icon.
If unsuccessful, there is a red cross icon, as seen below.
If the upload is unsuccessful, ensure that:
Your provider permissions are set correctly
The object data format is supported
The upload JSON or CSV file is correctly formatted.
Check which files failed to upload by clicking the Export icon to download a CSV log file. Every row in the CSV corresponds to a file which failed to be uploaded.
You only see failed uploads if the Ignore individual file errors toggle was not enabled during cloud data registration.
Use the following examples and helpful scripts to quickly learn how to create JSON and CSV files formatted for uploading cloud data to Encord, by constructing the URLs from the specified path in your private storage.
Legacy: those without regions or those with S3-<region> in the URL
AWS best practice is to use Virtual-hosted style. Path-style is planned to be deprecated, and the legacy URLs are already deprecated.
We support Virtual-hosted style, Path-style, and S3 protocol object URLs. We recommend you use Virtual-hosted style object URLs wherever possible.
Object URLs can be found in the Properties tab of the object in question. Navigate to AWS S3 > bucket > object > Properties to find the Object URL.
Here's a Python script which creates a JSON file for single images by constructing the URLs from the specified path in a given S3 bucket. You'll need to configure the following variables to match your setup.
region: the AWS region where your S3 bucket is.
aws_profile: the name of the profile in the AWS ~/.aws/credentials file. See AWS Credentials Documentation to properly set up the credentials file.
bucket_name: the name of your S3 bucket you want to pull files from.
s3_directory: the path to the directory in the S3 bucket where your files are stored.
In this Amazon S3 Virtual-hosted style URLs example, my-bucket is the bucket name,
us-west-2 is the region, and images/dogs is the S3 directory:
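For illustration, a Virtual-hosted style URL with these values can be assembled as sketched below. The object keys are hard-coded placeholders: listing the bucket contents (for example with an AWS SDK and the aws_profile credentials) is left out, and the helper name is hypothetical.

```python
import json
from urllib.parse import quote


def s3_virtual_hosted_url(bucket_name: str, region: str, key: str) -> str:
    """Build a Virtual-hosted style object URL for an S3 object key."""
    # quote() leaves "/" unescaped by default, so the directory structure is kept.
    return f"https://{bucket_name}.s3.{region}.amazonaws.com/{quote(key)}"


# Hypothetical object keys under the images/dogs directory.
keys = ["images/dogs/dog1.jpg", "images/dogs/dog2.jpg"]

images_spec = {
    "images": [
        {"objectUrl": s3_virtual_hosted_url("my-bucket", "us-west-2", key)}
        for key in keys
    ]
}
print(json.dumps(images_spec, indent=2))
# First URL: https://my-bucket.s3.us-west-2.amazonaws.com/images/dogs/dog1.jpg
```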
The following Python script generates a JSON file for uploading cloud data to Encord, specifically for single images stored in a designated GCP Storage bucket. The resulting JSON file includes only images.
To run this script, you must have gsutil installed.
Before using the script, make sure to:
Specify your bucket name in the bucket_name variable.
Decide which GCP authentication method to use. Scripts for 3 options are provided.