Private cloud integration
Before adding your cloud data to a dataset, you need to integrate your cloud storage with Encord. Please see the Data integrations section to learn how to create integrations for AWS S3, Azure Blob, GCP Storage, or Open Telekom Cloud.
To add your cloud-stored data, turn on the Private cloud toggle in the Upload data part of the data creation flow.

To add your cloud data:
- Select the relevant integration using the Select integration drop-down.
- Upload an appropriately formatted JSON or CSV file specifying the data you would like to add to the dataset. Click the upload rectangle or drag the file into it.
Please see below for how to format an appropriate JSON or CSV file.
Your stored objects may contain files that are not supported by Encord and that may produce errors on upload. If this is the case, turn on the Ignore individual file errors toggle.
Once the JSON or CSV file is uploaded, click the Create dataset button. The data is then fetched from your cloud storage and processed asynchronously. This processing fetches the metadata and other file information needed to render the files correctly and to check for any frame rate inconsistencies. We do not store your files in any way.
You can check the progress of the processing job by clicking the notification bell in the top right. A spinning progress indicator shows that the job is still in progress. If successful, the processing completes with a green tick; if not, with a red cross. In case of failure, check that your provider permissions are set correctly, that the object data format is supported, and that the JSON or CSV file is correctly formatted.
JSON format
The JSON file is a JSON object whose top-level keys specify the type of data and the object URLs of the content you wish to add to the dataset. The object URLs must not contain any whitespace. You can add one data type at a time, or combine multiple data types in one JSON file according to your preferences or development flows. The supported top-level keys are `videos`, `image_groups`, `images`, and `dicom_series`. The format for each is described in detail below.
You can optionally add custom client metadata per data item in the `clientMetadata` field; see the examples below for how to add it. It is important to know that we enforce a 10MB limit on the client metadata per data item. This metadata is stored internally as a PostgreSQL `jsonb` type, so please read the relevant PostgreSQL docs about the `jsonb` type and its behaviours. For example, the `jsonb` type does not preserve key order or duplicate keys.
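Because the 10MB limit applies to each data item's `clientMetadata`, it can be worth validating the serialized size before upload. A minimal sketch (the helper name and limit constant are ours; the limit value is the one stated above):

```python
import json

MAX_CLIENT_METADATA_BYTES = 10 * 1024 * 1024  # 10MB limit per data item


def metadata_within_limit(client_metadata: dict) -> bool:
    """Check the UTF-8 serialized size of a clientMetadata payload."""
    size = len(json.dumps(client_metadata).encode("utf-8"))
    return size <= MAX_CLIENT_METADATA_BYTES


print(metadata_within_limit({"optional": "metadata"}))  # small payloads pass
```

Note that the serialized size is only an approximation of how the metadata is stored server-side, since `jsonb` normalizes key order and duplicates.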
Add the top-level `"skip_duplicate_urls": true` flag to make uploads idempotent. Skipping URLs already in the dataset can help complete large upload operations that were interrupted by an unstable network or similar issues: since previously processed assets do not have to be uploaded again, you can simply retry the failed operation without editing the upload specification file. See the example in the Videos section below; the exact semantics are discussed in the relevant sections. The flag defaults to `false` and is currently only supported for JSON uploads.
Videos
Each object in the `videos` array is a JSON object with the key `objectUrl` specifying the full URL of the video resource. The `title` field is optional; if not specified, the file name of the video is used.

In the sample below we add the `skip_duplicate_urls` flag and set it to `true`. When set to `true`, videos that have been previously uploaded to the dataset with the same object URL are skipped; the flag defaults to `false`.
{
    "videos": [
        {
            "objectUrl": "<object url>"
        },
        {
            "objectUrl": "<object url>",
            "title": "my-custom-video-title.mp4",
            "clientMetadata": {"optional": "metadata"}
        }
    ],
    "skip_duplicate_urls": true
}
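A specification file in this shape can be generated from a plain list of object URLs rather than written by hand. A hedged sketch (the helper name and example URLs are illustrative):

```python
import json


def build_videos_upload(urls, skip_duplicates=True):
    """Assemble a 'videos' upload specification from a list of object URLs."""
    return {
        "videos": [{"objectUrl": url} for url in urls],
        "skip_duplicate_urls": skip_duplicates,
    }


spec = build_videos_upload(["<object url 1>", "<object url 2>"])
print(json.dumps(spec, indent=4))
```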
Single Images
The JSON structure for `images` parallels that of `videos`. The `title` field is optional; if not specified, the file name of the image is used. If `skip_duplicate_urls` is set to `true`, images that have been previously uploaded to the dataset with the same object URL are skipped; the flag defaults to `false`. See the sample below.
{
    "images": [
        {
            "objectUrl": "<object url>"
        },
        {
            "objectUrl": "<object url>",
            "title": "my-custom-image-title.jpeg",
            "clientMetadata": {"optional": "metadata"}
        }
    ]
}
Image groups
Image groups are a set of images that are processed as one annotation task. Encord supports two formats for representing image groups. The first is the "native" or "original" representation, where images are presented unaltered; images of different sizes and resolutions can form one image group, and no data is lost. The second is the video representation, sometimes also known as an image sequence. For further details, see the relevant editor documentation. For how to select the original or video representation when uploading from private cloud, consult the documentation below.
Each object in the `image_groups` array represents an individual image group. In all cases, you must specify the `title` of the group and the position of each image in the group by naming the keys according to the sequence number, e.g. `objectUrl_#{sequence_number}`. The `objectUrl_#{sequence_number}` keys need to be in order for the upload to succeed, as shown in the sample below.

The `createVideo` argument specifies whether an image group is created with the video representation. It is an optional parameter in the JSON format: leave the parameter out, or include it and set it to `true`, to use the video representation; include it and set it explicitly to `false` to use the original images.

In the CSV format, the `Create video` column is mandatory. If the value is left blank in a given row, it defaults to `true`; set it explicitly to `false` to opt out of the video representation. CSV details are provided below. Image groups without a video representation do not require write permissions to your private bucket.

If `skip_duplicate_urls` is set to `true`, image groups where all object URLs exactly match an existing image group in the dataset are skipped; the flag defaults to `false`. The custom client metadata is per image group, not per image. See the sample below.
{
    "image_groups": [
        {
            "title": "<title 1>",
            "createVideo": true,
            "objectUrl_0": "<object url>"
        },
        {
            "title": "<title 2>",
            "createVideo": false,
            "objectUrl_0": "<object url>",
            "objectUrl_1": "<object url>",
            "objectUrl_2": "<object url>",
            "clientMetadata": {"optional": "metadata"}
        }
    ]
}
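Because the `objectUrl_#{sequence_number}` keys must appear in order, generating them programmatically avoids off-by-one mistakes. A sketch (the helper name, titles, and URLs are placeholders):

```python
import json


def build_image_group(title, urls, create_video=False):
    """Build one image_groups entry with sequenced objectUrl_N keys."""
    group = {"title": title, "createVideo": create_video}
    for i, url in enumerate(urls):
        group[f"objectUrl_{i}"] = url
    return group


spec = {"image_groups": [build_image_group("<title 1>", ["<url a>", "<url b>"])]}
print(json.dumps(spec, indent=4))
```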
DICOM
Like `image_groups`, the `dicom_series` elements require a title and sequenced object URLs. If `skip_duplicate_urls` is set to `true`, DICOM series where all object URLs exactly match an existing DICOM series in the dataset are skipped; the flag defaults to `false`. The custom client metadata is per DICOM series. See the sample below.
{
    "dicom_series": [
        {
            "title": "<title 1>",
            "objectUrl_0": "<object url>"
        },
        {
            "title": "<title 2>",
            "objectUrl_0": "<object url>",
            "objectUrl_1": "<object url>",
            "objectUrl_2": "<object url>",
            "clientMetadata": {"optional": "metadata"}
        }
    ]
}
CSV format
The CSV file should contain three columns with the following headings: `ObjectUrl`, `Image group title`, and `Create video`. `ObjectUrl` is used for all data modalities and specifies the URL of the resource. The object URLs are from your cloud provider and cannot contain any whitespace.

The other two columns must be present in the CSV in all cases, but only need to contain a value when creating an image group. `Image group title` is the name of the group to which the image is assigned; you can leave this column blank in other cases. The `Create video` column specifies whether the image group is created with the video representation. Again, the column heading is necessary in all cases, but a value is only needed for image groups. The behavior defaults to `true` (use the video representation); set the value to `false` for all images in an image group to use the original representation.

Note that the video representation requires write permissions on your storage bucket to save the compressed video. The original image representation does not require any permissions beyond reading individual objects.
Here is the format if there are 3 videos, and 3 images split into 2 image groups:

| ObjectUrl | Image group title | Create video |
|---|---|---|
| <object url> | | |
| <object url> | | |
| <object url> | | |
| <object url> | <title 1> | true |
| <object url> | <title 1> | true |
| <object url> | <title 2> | false |
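The same layout can be produced with Python's `csv` module; a sketch with placeholder URLs (the output file name is ours):

```python
import csv

rows = [
    # (ObjectUrl, Image group title, Create video)
    ("<object url>", "", ""),  # a video or single image: group columns blank
    ("<object url>", "", ""),
    ("<object url>", "<title 1>", "true"),
    ("<object url>", "<title 1>", "true"),
    ("<object url>", "<title 2>", "false"),
]

with open("upload_spec.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # The three column headings must be present in all cases.
    writer.writerow(["ObjectUrl", "Image group title", "Create video"])
    writer.writerows(rows)
```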
See below for examples for each of the providers we support.
Examples and Helpful Scripts
Use the following examples and helpful scripts to quickly learn how to create JSON and CSV files formatted for the dataset creation process, by constructing the URLs from the specified path in your private storage.
AWS S3
AWS S3 object URLs can follow a few set patterns:
- Virtual-hosted style:
https://<bucket-name>.s3.<region>.amazonaws.com/<key-name>
- Path-style:
https://s3.<region>.amazonaws.com/<bucket-name>/<key-name>
- S3 protocol:
s3://<bucket-name>/<key-name>
- Legacy: URLs without a region, or with
s3-<region>
in the URL
AWS best practice is to use the virtual-hosted style. Path-style is planned for deprecation, and the legacy URLs are already deprecated.
We support virtual-hosted style, path-style, and S3 protocol object URLs. We recommend you use virtual-hosted style object URLs wherever possible.
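If your tooling produces `s3://` URLs, they can be rewritten into the recommended virtual-hosted style. A small sketch (the function name is ours; it assumes the bucket lives in the region you pass in):

```python
def to_virtual_hosted_url(s3_url: str, region: str) -> str:
    """Convert an s3://bucket/key URL to a virtual-hosted style URL."""
    prefix = "s3://"
    if not s3_url.startswith(prefix):
        raise ValueError(f"not an S3 protocol URL: {s3_url}")
    # Split off the bucket name; everything after the first slash is the key.
    bucket, _, key = s3_url[len(prefix):].partition("/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


print(to_virtual_hosted_url("s3://my-bucket/videos/clip.mp4", "eu-west-2"))
# → https://my-bucket.s3.eu-west-2.amazonaws.com/videos/clip.mp4
```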
Object URLs can be found in the Properties tab of the object in question. Navigate to AWS S3 > bucket > object > Properties to find the Object URL.

Here is an example of a JSON file with two images, two videos, and three image groups
{
    "images": [
        {
            "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Image1.png"
        },
        {
            "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Image2.png"
        }
    ],
    "videos": [
        {
            "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Cooking.mp4"
        },
        {
            "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Oranges.mp4"
        }
    ],
    "image_groups": [
        {
            "title": "apple-samsung-light",
            "createVideo": true,
            "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(32).jpg",
            "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(33).jpg",
            "objectUrl_2": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(34).jpg",
            "objectUrl_3": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(35).jpg"
        },
        {
            "title": "apple-samsung-dark",
            "createVideo": true,
            "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(32).jpg",
            "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(33).jpg",
            "objectUrl_2": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(34).jpg",
            "objectUrl_3": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(35).jpg"
        },
        {
            "title": "apple-ios-light",
            "createVideo": false,
            "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(32).jpg",
            "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(33).jpg"
        }
    ]
}
Here are the video and image group object URLs from the example above in a CSV file:

| ObjectUrl | Image group title | Create video |
|---|---|---|
| https://cord-dev.s3.eu-west-2.amazonaws.com/Cooking.mp4 | | |
| https://cord-dev.s3.eu-west-2.amazonaws.com/Oranges.mp4 | | |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(32).jpg | apple-samsung-light | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(33).jpg | apple-samsung-light | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(34).jpg | apple-samsung-light | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(35).jpg | apple-samsung-light | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(32).jpg | apple-samsung-dark | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(33).jpg | apple-samsung-dark | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(34).jpg | apple-samsung-dark | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(35).jpg | apple-samsung-dark | true |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(32).jpg | apple-ios-light | false |
| https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(33).jpg | apple-ios-light | false |
Here's a Python script which creates a JSON file for single images by constructing the URLs from the specified path in a given S3 bucket. You'll need to configure the following variables to match your setup:
- region: the AWS region of the resource you intend to use. For S3, it's the region where your bucket is.
- aws_profile: the name of the profile in the AWS ~/.aws/credentials file. See the AWS credentials documentation to set up the credentials file properly.
- bucket_name: the name of the S3 bucket you want to pull files from.
- s3_directory: the path to the directory inside the S3 bucket where your files are stored. Include all slashes except the final one. For example:
# my file is at my-bucket/some_top_level_dir/video_files/my_video.mp4
# then set s3 directory as follows
s3_directory = 'some_top_level_dir/video_files'
And the script itself:
import json

import boto3

region = 'FILL_ME_IN'
aws_profile = 'FILL_ME_IN'
bucket_name = 'FILL_ME_IN'
s3_directory = 'FILL_ME_IN'

domain = f's3.{region}.amazonaws.com'
root_url = f'https://{domain}/{bucket_name}'

# Use the named profile so the S3 resource picks up the right credentials.
session = boto3.Session(profile_name=aws_profile)
s3 = session.resource('s3')
bucket = s3.Bucket(bucket_name)

images = []
for object_summary in bucket.objects.all():
    # Keep only objects that sit directly inside s3_directory.
    split_key = object_summary.key.split('/')
    if len(split_key) >= 2 and '/'.join(split_key[0:-1]) == s3_directory:
        object_url = f'{root_url}/{object_summary.key}'
        images.append({'objectUrl': object_url})

outer_json_dict = {
    "images": images
}

# Slashes in s3_directory would make an invalid file name, so replace them.
with open(f"upload_images_{s3_directory.replace('/', '_')}.json", 'w') as output_file:
    json.dump(outer_json_dict, output_file, indent=4)
Azure blob
{
    "videos": [
        {
            "objectUrl": "https://myaccount.blob.core.windows.net/myblob"
        },
        {
            "objectUrl": "https://myaccount.blob.core.windows.net/mycontainer/myblob.jpg"
        },
        {
            "objectUrl": "https://myaccount.blob.core.windows.net/mycontainer/myblobs/myblob.jpg"
        }
    ],
    "image_groups": [
        {
            "title": "image_group_1",
            "objectUrl_0": "https://myaccount.blob.core.windows.net/mycontainer/myblob1.jpg",
            "objectUrl_1": "https://myaccount.blob.core.windows.net/mycontainer/myblob2.jpg"
        },
        {
            "title": "image_group2",
            "objectUrl_0": "https://myaccount.blob.core.windows.net/mycontainer/myblob3.jpg",
            "objectUrl_1": "https://myaccount.blob.core.windows.net/mycontainer/myblob4.jpg"
        }
    ]
}
GCP storage
{
    "videos": [
        {
            "objectUrl": "gs://example-url/object.mp4"
        }
    ],
    "image_groups": [
        {
            "title": "image_group_1",
            "objectUrl_0": "https://storage.cloud.google.com/example-image-bucket/object_1.jpg",
            "objectUrl_1": "https://storage.cloud.google.com/example-image-bucket/object_2.jpg"
        },
        {
            "title": "image_group_2",
            "objectUrl_0": "https://storage.cloud.google.com/example-image-bucket/object_3.jpg",
            "objectUrl_1": "https://storage.cloud.google.com/example-image-bucket/object_4.jpg"
        }
    ]
}
Open Telekom Cloud OSS
{
    "dicom_series": [
        {
            "title": "OPEN_TELEKOM_DICOM_SERIES",
            "objectUrl_0": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-0",
            "objectUrl_1": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-1",
            "objectUrl_2": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-2",
            "objectUrl_3": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-3"
        }
    ]
}