Upload Cloud Data
This documentation is only relevant for customers with early access to Encord Files. Contact support@encord.com to learn more and gain access. If you do not have access to Files, see our documentation on uploading cloud data. Data from your private cloud can be uploaded to Files, or directly to a Dataset.
At least one data integration is required to upload cloud data to Encord. Encord can integrate with a range of cloud service providers.
Upload cloud data to Files
- Navigate to the Files section of Index in the Encord platform.
- Click into a Folder.
- Click + Upload files. A dialog appears.
- Click Import from cloud data.
Upload cloud data to Datasets
- Select the Dataset you want to upload data to.
- Click +Upload files.
- Select a folder to store the files in, or create a new folder.
- Select the Import from private cloud tab and select the integration you want to use.
- Click Add JSON or CSV files to upload a JSON or CSV file specifying the cloud data that is to be added to the Dataset. Turn on the Ignore individual file errors toggle to ignore errors caused by files not supported by Encord.
- Click Import to add your cloud data to the Dataset.
Check upload status
You can check the progress of the processing job by clicking the bell icon in the top right.
A spinning progress indicator shows that the processing job is still in progress.
- If successful, the processing completes with a green tick icon.
- If unsuccessful, the processing completes with a red cross icon. In this case, check that your provider permissions are set correctly, that the object data format is supported, and that the JSON or CSV file is correctly formatted.
Check which files failed to upload by clicking the Export icon to download a CSV log file. Every row in the CSV corresponds to a file that failed to upload.
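As a sketch of how the downloaded error log might be inspected programmatically — note the column names used here (objectUrl, error) are assumptions for illustration, not a documented schema:

```python
import csv
import io

# Hypothetical error-log contents; the real CSV's columns may differ.
log_text = "objectUrl,error\nhttps://example.com/a.mp4,Unsupported file type\n"

def failed_urls(csv_text):
    """Return the URL from every row of the error log, one per failed file."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["objectUrl"] for row in rows]

print(failed_urls(log_text))  # one entry per failed file
```

Collecting the failed URLs this way lets you build a follow-up upload specification containing only the files that need to be retried.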
Specify cloud data
To upload private cloud data, you must supply either a JSON or CSV file, specifying the URLs of all the files you want to add. Click Add JSON or CSV files when uploading cloud data to add a JSON or CSV file.
JSON Format
The JSON file format is a JSON object with top-level keys specifying the type of data and object URLs of the files you want to upload to Encord. You can add one data type at a time, or combine multiple data types in one JSON.
The supported top-level keys are: videos, audio, image_groups, images, and dicom_series. The details for each data format are given in the sections below.
A skip_duplicate_urls key can be included and set to true, ensuring that all object URLs that exactly match existing files in the Dataset are skipped.
Encord enforces the following upload limits for each JSON file used for file uploads:
- Up to 1 million URLs
- A maximum of 500,000 items (e.g. images, image groups, videos, DICOMs)
- URLs can be up to 16 KB in size
Optimal upload chunking can vary depending on your data type and the amount of associated metadata. For tailored recommendations, contact Encord support. We recommend starting with smaller uploads and gradually increasing the size based on how quickly jobs are processed. Generally, smaller chunks result in faster data reflection within the platform.
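Following the advice above, a minimal sketch of splitting a large list of URLs into several smaller upload specifications — the chunk size of 1,000 is an illustrative starting point, not an Encord recommendation:

```python
import json

def chunk_upload_specs(urls, chunk_size=1000):
    """Split a large list of video URLs into smaller JSON upload specs.
    Smaller specs are generally processed and reflected in the platform
    faster than one very large upload."""
    specs = []
    for i in range(0, len(urls), chunk_size):
        spec = {
            "videos": [{"objectUrl": u} for u in urls[i:i + chunk_size]],
            "skip_duplicate_urls": True,  # makes retrying a failed chunk safe
        }
        specs.append(json.dumps(spec))
    return specs

# Example: 2,500 illustrative URLs become three upload specifications.
specs = chunk_upload_specs([f"https://example.com/video_{n}.mp4" for n in range(2500)])
print(len(specs))
```

Because skip_duplicate_urls is set, a chunk that partially fails can simply be re-submitted without editing the specification file.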
Client metadata & skip duplicate URLs
You can optionally add custom client metadata per data item in the clientMetadata field (the examples show how this is done). Client metadata is separate from video_metadata, and is intended as an arbitrary store of data you want to associate with a file.
We enforce a 10MB limit on the client metadata for each data item. Internally, client metadata is stored as a PostgreSQL jsonb type. Read the relevant PostgreSQL documentation about the jsonb type and its behaviors. For example, the jsonb type does not preserve key order or duplicate keys.
Add the "skip_duplicate_urls": true flag at the top level to make uploads idempotent. Skipping URLs already in the Dataset can help speed up large upload operations: since previously processed assets do not have to be uploaded again, you can simply retry a failed operation without editing the upload specification file. The flag's default value is false.
Update client metadata
To update client metadata:
- Create an upload JSON file with the updated client metadata. Include the "skip_duplicate_urls": true and "upsert_metadata": true flags.
- Client metadata updates require "skip_duplicate_urls": true to function. Updates do not work if "skip_duplicate_urls" is false.
- Only client metadata for pre-existing files is updated. Any new files present in the JSON are uploaded.
{
"videos": [
{
"objectUrl": "<object url_1>"
},
{
"objectUrl": "<object url_2>",
"title": "my-custom-video-title.mp4",
"clientMetadata": {"optional": "metadata"}
}
],
"skip_duplicate_urls": true,
"upsert_metadata": true
}
- Start a new file upload to Encord using the new JSON file.
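The update specification shown in the steps above can also be generated programmatically. A minimal sketch, where the URL and metadata values are illustrative:

```python
import json

def build_metadata_update(url_to_metadata):
    """Build an upload JSON that updates client metadata for existing files.
    skip_duplicate_urls plus upsert_metadata means URLs already in the
    Dataset get their metadata updated rather than being re-uploaded."""
    return {
        "videos": [
            {"objectUrl": url, "clientMetadata": meta}
            for url, meta in url_to_metadata.items()
        ],
        "skip_duplicate_urls": True,
        "upsert_metadata": True,
    }

# Illustrative URL and metadata.
spec = build_metadata_update({"https://example.com/video.mp4": {"reviewed": True}})
print(json.dumps(spec, indent=2))
```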
When using a Multi-Region Access Point
When using a Multi-Region Access Point for your AWS S3 buckets, specify objects using the ARN of the Multi-Region Access Point followed by the object name. The following example demonstrates how to specify video files from a Multi-Region Access Point.
{
"videos": [
{
"objectUrl": "Multi-Region-Access-Point-ARN + <object name_1>"
},
{
"objectUrl": "Multi-Region-Access-Point-ARN + <object name_2>",
"title": "my-custom-video-title.mp4",
"clientMetadata": {"optional": "metadata"}
}
],
"skip_duplicate_urls": true
}
CSV Format
In the CSV file format, the column headers specify which type of data is being uploaded. You can add a single data type at a time, or combine multiple data types in a single CSV file.
Details for each data format are given in the sections below.
- Object URLs can’t contain whitespace.
- For backwards-compatibility reasons, a single-column CSV is supported. A file with the single ObjectUrl column is interpreted as a request for video upload. If your objects are of a different type (for example, images), this error displays: “Expected a video, got a file of type XXX”.
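The single-column format described above can be produced with the standard csv module. This sketch writes the ObjectUrl header and rejects URLs containing whitespace, per the note above (the URLs are illustrative):

```python
import csv
import io

def build_video_csv(urls):
    """Write a single-column CSV with the ObjectUrl header, which Encord
    interprets as a video upload request."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["ObjectUrl"])
    for url in urls:
        # Object URLs cannot contain whitespace.
        if any(ch.isspace() for ch in url):
            raise ValueError(f"object URL contains whitespace: {url!r}")
        writer.writerow([url])
    return buf.getvalue()

print(build_video_csv(["https://example.com/a.mp4", "https://example.com/b.mp4"]))
```

For non-video data types, use the appropriate column headers described in the sections below instead of the bare ObjectUrl column.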
Help Scripts and Examples
Use the following examples and helper scripts to learn how to create JSON and CSV files formatted for the dataset creation process, constructing the URLs from the specified path in your private storage.