
Private cloud integration

Before adding your cloud data to a dataset, you need to integrate your cloud storage with Encord. Please see the Data integrations section to learn how to create integrations for AWS S3, Azure blob, GCP storage, or Open Telekom Cloud.

To add your cloud-stored data, switch on the Private cloud toggle in the Upload data part of the data creation flow.

To add your cloud data:

  1. Select the relevant integration using the 'Select integration' drop-down.
  2. Upload an appropriately formatted JSON or CSV file specifying the data you would like to add to the dataset.

The sections below describe how to format the JSON or CSV file. Once the file is ready and an integration is selected, click the upload rectangle or drag the file into it.

Your stored objects may include files which are not supported by Encord and which may produce errors on upload. If this is the case, switch on the 'Ignore individual file errors' toggle.

Once the JSON or CSV file is uploaded, click the Create dataset button. The data will now be fetched from your cloud storage and processed asynchronously. This processing involves fetching appropriate metadata and other file information to help us render the files appropriately and to check for any framerate inconsistencies. We do not store your files in any way.

You can check the progress of the processing job by clicking the notification bell in the top right. A spinning progress indicator shows that the processing job is still in progress. If successful, the processing will complete with a green tick; if not, there will be a red cross. If the job fails, please check that your provider permissions have been set correctly, that the object data format is supported, and that the JSON or CSV file is correctly formatted.

JSON format

The JSON file format is a JSON object with top-level keys specifying the type of data and the object URLs of the content you wish to add to the dataset. The object URLs cannot contain any whitespace. You can add one data type at a time, or combine multiple data types in one JSON file according to your preferences or development flows. The supported top-level keys are videos, image_groups, images, and dicom_series. The format for each is described in detail below.
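The two rules above (supported top-level keys, no whitespace in object URLs) can be checked before uploading. The following is a minimal sketch, not an Encord tool: the function name find_spec_problems and the example.com URLs are hypothetical, and it assumes every field whose name starts with objectUrl holds a URL.

```python
# Top-level keys listed in the paragraph above.
SUPPORTED_KEYS = {"videos", "image_groups", "images", "dicom_series"}


def find_spec_problems(spec: dict) -> list[str]:
    """Return human-readable problems found in an upload spec (hypothetical helper)."""
    problems = []
    for key, entries in spec.items():
        if key not in SUPPORTED_KEYS:
            problems.append(f"unsupported top-level key: {key}")
            continue
        for index, entry in enumerate(entries):
            for field, value in entry.items():
                # Assumption: fields named objectUrl or objectUrl_<n> hold URLs.
                if not field.startswith("objectUrl"):
                    continue
                if any(ch.isspace() for ch in value):
                    problems.append(f"{key}[{index}].{field} contains whitespace")
    return problems


# Placeholder URLs for illustration only.
spec = {
    "videos": [{"objectUrl": "https://example.com/a.mp4"}],
    "images": [{"objectUrl": "https://example.com/my image.png"}],
}
print(find_spec_problems(spec))
```

An empty list means the sketch found nothing to complain about; it does not guarantee that the upload will succeed.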

Videos

Each object in the videos array is a JSON object with the key objectUrl specifying the full URL of where to find the video resource. See the sample below.

{
  "videos": [
    {
      "objectUrl": "<object url>"
    },
    {
      "objectUrl": "<object url>"
    }
  ]
}

Single Images

The JSON structure for images parallels that of videos. See the sample below.

{
  "images": [
    {
      "objectUrl": "<object url>"
    },
    {
      "objectUrl": "<object url>"
    }
  ]
}

Image Sequences

Each object in the image_groups array represents a titled sequence of images. You must specify the title of the sequence and the position of each image within it by naming the keys after the sequence number, e.g. objectUrl_#{sequence_number}. See the sample below.

{
  "image_groups": [
    {
      "title": "<title 1>",
      "objectUrl_0": "<object url>"
    },
    {
      "title": "<title 2>",
      "objectUrl_0": "<object url>",
      "objectUrl_1": "<object url>",
      "objectUrl_2": "<object url>"
    }
  ]
}
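Numbering the keys by hand is error-prone for long sequences. As a rough sketch, a helper like the one below could generate an entry from an ordered list of URLs. The name make_image_group and the example.com URLs are hypothetical, and the sketch assumes numbering starts at 0, as in the sample above.

```python
import json


def make_image_group(title: str, object_urls: list[str]) -> dict:
    """Build one image_groups entry with objectUrl_0, objectUrl_1, ... keys."""
    group = {"title": title}
    # The list order decides each image's position in the sequence.
    for position, url in enumerate(object_urls):
        group[f"objectUrl_{position}"] = url
    return group


# Placeholder URLs for illustration only.
group = make_image_group(
    "my-sequence",
    ["https://example.com/frame-a.jpg", "https://example.com/frame-b.jpg"],
)
print(json.dumps({"image_groups": [group]}, indent=2))
```

The same key pattern appears in the dicom_series sample, so the helper's output could plausibly be placed under that key as well.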

DICOM

Like image_groups, the dicom_series elements require a title and a sequenced object URL. See the sample below.

{
  "dicom_series": [
    {
      "title": "<title 1>",
      "objectUrl_0": "<object url>"
    },
    {
      "title": "<title 2>",
      "objectUrl_0": "<object url>",
      "objectUrl_1": "<object url>",
      "objectUrl_2": "<object url>"
    }
  ]
}

CSV format

The CSV file should have two columns, each with a heading. The ObjectUrl column contains the object URLs. The Image group title column contains the image group name if the object is an image; otherwise leave it blank. The object URLs come from your cloud provider and cannot contain any whitespace. Here is the format for 3 videos and 3 images split into 2 image groups:

ObjectUrl,Image group title
<object url>,
<object url>,
<object url>,
<object url>,<title 1>
<object url>,<title 1>
<object url>,<title 2>
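A file in this layout can be produced with Python's standard csv module. This is a minimal sketch under some assumptions: the column headings are taken from the table above, the file is comma-delimited, and the function name write_upload_csv and the example.com URLs are hypothetical.

```python
import csv
import io


def write_upload_csv(handle, videos, image_groups):
    """Write video URLs (blank title) and image group URLs (with title) as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["ObjectUrl", "Image group title"])
    for url in videos:
        writer.writerow([url, ""])  # videos leave the title column blank
    for title, urls in image_groups.items():
        for url in urls:
            writer.writerow([url, title])


# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_upload_csv(
    buffer,
    videos=["https://example.com/video1.mp4"],
    image_groups={"title-1": ["https://example.com/img1.jpg", "https://example.com/img2.jpg"]},
)
print(buffer.getvalue())
```

When writing to a real file, open it with newline='' as the csv module documentation recommends, so that line endings are not doubled on Windows.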

See below for examples for each of the providers we support.

Examples and Helpful Scripts

Use the following examples and helpful scripts to quickly learn how to create JSON and CSV files formatted for the dataset creation process, by constructing the URLs from the specified path in your private storage.

AWS S3

AWS S3 object URLs can follow a few set patterns:

  • Virtual-hosted style: https://<bucket-name>.s3.<region>.amazonaws.com/<key-name>
  • Path-style: https://s3.<region>.amazonaws.com/<bucket-name>/<key-name>
  • S3 protocol: s3://<bucket-name>/<key-name>
  • Legacy: those without regions or those with s3-<region> in the URL

AWS best practice is to use Virtual-hosted style. Path-style is planned to be deprecated and the legacy URLs are already deprecated.

We support Virtual-hosted style, Path-style and S3 protocol object URLs. We recommend you use Virtual-hosted style object URLs wherever possible.
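To make the relationship between the three supported styles concrete, here is a small sketch that builds each form from a bucket name, region, and key. The function name s3_object_urls is hypothetical, and the sketch assumes the key is already free of whitespace (it does no URL encoding).

```python
def s3_object_urls(bucket: str, region: str, key: str) -> dict:
    """Return the three supported URL styles for one S3 object."""
    if any(ch.isspace() for ch in bucket + key):
        raise ValueError("object URLs cannot contain whitespace")
    return {
        "virtual_hosted": f"https://{bucket}.s3.{region}.amazonaws.com/{key}",
        "path_style": f"https://s3.{region}.amazonaws.com/{bucket}/{key}",
        # The lowercase s3:// scheme is the form AWS tooling uses for S3 URIs.
        "s3_protocol": f"s3://{bucket}/{key}",
    }


urls = s3_object_urls("cord-dev", "eu-west-2", "Image1.png")
print(urls["virtual_hosted"])
```

For the bucket used in the examples in this section, the virtual-hosted entry matches the objectUrl values shown in the sample files.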

Object URLs can be found in the Properties tab of the object in question. Navigate to AWS S3 > bucket > object > Properties to find the Object URL.

Here is an example of a JSON file with two images, two videos, and three image groups:

{
  "images": [
    {
      "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Image1.png"
    },
    {
      "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Image2.png"
    }
  ],
  "videos": [
    {
      "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Cooking.mp4"
    },
    {
      "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Oranges.mp4"
    }
  ],
  "image_groups": [
    {
      "title": "apple-samsung-light",
      "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(32).jpg",
      "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(33).jpg",
      "objectUrl_2": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(34).jpg",
      "objectUrl_3": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(35).jpg"
    },
    {
      "title": "apple-samsung-dark",
      "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(32).jpg",
      "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(33).jpg",
      "objectUrl_2": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(34).jpg",
      "objectUrl_3": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(35).jpg"
    },
    {
      "title": "apple-ios-light",
      "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(32).jpg",
      "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(33).jpg"
    }
  ]
}

Here are the video and image group URLs from the same example in a CSV file:

ObjectUrl,Image group title
https://cord-dev.s3.eu-west-2.amazonaws.com/Cooking.mp4,
https://cord-dev.s3.eu-west-2.amazonaws.com/Oranges.mp4,
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(32).jpg,apple-samsung-light
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(33).jpg,apple-samsung-light
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(34).jpg,apple-samsung-light
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(35).jpg,apple-samsung-light
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(32).jpg,apple-samsung-dark
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(33).jpg,apple-samsung-dark
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(34).jpg,apple-samsung-dark
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(35).jpg,apple-samsung-dark
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(32).jpg,apple-ios-light
https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(33).jpg,apple-ios-light
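If you already have a JSON spec, the CSV rows can be derived from it rather than typed by hand. The sketch below is one hypothetical way to do that (spec_to_csv_rows is not an Encord function, and the example.com URLs are placeholders). It assumes only videos and image_groups are carried over, mirroring the CSV example above, and it sorts the objectUrl_<n> keys numerically to preserve sequence order.

```python
import re


def spec_to_csv_rows(spec: dict) -> list[tuple[str, str]]:
    """Turn a JSON upload spec into (ObjectUrl, Image group title) rows."""
    # Videos have no image group, so their title column is blank.
    rows = [(video["objectUrl"], "") for video in spec.get("videos", [])]
    for group in spec.get("image_groups", []):
        # Numeric sort so that objectUrl_10 follows objectUrl_9, not objectUrl_1.
        url_keys = sorted(
            (key for key in group if re.fullmatch(r"objectUrl_\d+", key)),
            key=lambda key: int(key.split("_")[1]),
        )
        rows.extend((group[key], group["title"]) for key in url_keys)
    return rows


# Placeholder spec for illustration only.
spec = {
    "videos": [{"objectUrl": "https://example.com/v.mp4"}],
    "image_groups": [
        {
            "title": "g1",
            "objectUrl_10": "https://example.com/c.jpg",
            "objectUrl_2": "https://example.com/b.jpg",
            "objectUrl_0": "https://example.com/a.jpg",
        }
    ],
}
for row in spec_to_csv_rows(spec):
    print(row)
```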

Here's a Python script which creates a JSON file for single images by constructing the URLs from the specified path in a given S3 bucket. You'll need to configure the following variables to match your setup.

  1. region: needs to be the AWS resource region you intend to use. For S3, it's the region where your bucket is.
  2. aws_profile: is the name of the profile in the AWS ~/.aws/credentials file. See AWS Credentials Documentation to set up the credentials file properly.
  3. bucket_name: the name of your S3 bucket you want to pull files from.
  4. s3_directory: the path to the directory where your files are stored inside the S3 bucket. Include all slashes except the trailing one. For example:

     # my file is at my-bucket/some_top_level_dir/video_files/my_video.mp4
     # then set s3 directory as follows
     s3_directory = 'some_top_level_dir/video_files'

And the script itself:

import json

import boto3

region = 'FILL_ME_IN'
aws_profile = 'FILL_ME_IN'
bucket_name = 'FILL_ME_IN'
s3_directory = 'FILL_ME_IN'

# Path-style root URL for the bucket.
domain = f's3.{region}.amazonaws.com'
root_url = f'https://{domain}/{bucket_name}'

# Create the S3 resource from the profile session so that aws_profile is actually used.
session = boto3.Session(profile_name=aws_profile, region_name=region)
s3 = session.resource('s3')
bucket = s3.Bucket(bucket_name)

images = []
# List only the keys under s3_directory rather than the whole bucket.
for object_summary in bucket.objects.filter(Prefix=f'{s3_directory}/'):
    split_key = object_summary.key.split('/')
    # Keep files directly inside s3_directory; skip nested folders and folder placeholder keys.
    if split_key[-1] and '/'.join(split_key[0:-1]) == s3_directory:
        object_url = f'{root_url}/{object_summary.key}'
        images.append({'objectUrl': object_url})

outer_json_dict = {
    "images": images
}

# Replace slashes so the directory path does not end up in the local file path.
output_name = f"upload_images_{s3_directory.replace('/', '_')}.json"
with open(output_name, 'w') as output_file:
    json.dump(outer_json_dict, output_file, indent=4)
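The directory check in the script above only keeps objects that sit directly inside s3_directory; files in nested sub-directories are skipped. The sketch below isolates that rule as a pure function so it can be tried without AWS credentials. The name is_direct_child is hypothetical, and unlike the original condition it also rejects folder placeholder keys that end in a slash.

```python
def is_direct_child(key: str, directory: str) -> bool:
    """True if the object key names a file directly inside the given directory."""
    parent, separator, name = key.rpartition("/")
    # A key without any slash sits at the bucket root, not inside a directory.
    return separator == "/" and name != "" and parent == directory


directory = "some_top_level_dir/video_files"
for key in [
    "some_top_level_dir/video_files/my_video.mp4",
    "some_top_level_dir/video_files/nested/other.mp4",
    "some_top_level_dir/video_files/",
    "my_video.mp4",
]:
    print(key, is_direct_child(key, directory))
```

Only the first key is selected. To include nested files as well, you would need a different rule, such as a prefix match on the directory path.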

Azure blob

Pardon our dust! Please contact hello@encord.com to find out more.

GCP storage

Pardon our dust! Please contact hello@encord.com to find out more.

Open Telekom Cloud OSS

{
  "dicom_series": [
    {
      "title": "OPEN_TELEKOM_DICOM_SERIES",
      "objectUrl_0": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-0",
      "objectUrl_1": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-1",
      "objectUrl_2": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-2",
      "objectUrl_3": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-3"
    }
  ]
}