Private cloud integration

Before adding your cloud data to a Dataset, you need to integrate your cloud storage with Encord.

See the Data integrations section to learn how to create an integration for your cloud storage provider.


To add your cloud data to a Dataset:

  1. Turn on the Import from integration toggle in the Create dataset part of the data creation flow when creating a new Dataset.

  2. Select the relevant integration using the Select integration drop-down.

  3. Upload an appropriately formatted JSON or CSV file specifying the data you want to add to the Dataset. Your cloud storage may contain files that Encord does not support, which can cause errors on upload. Turn on the Ignore individual file errors toggle to skip these files.

👍

Tip

We recommend turning on the Ignore individual file errors feature. This ensures that individual file errors do not lead to the whole upload process being aborted.

👍

Tip

We recommend setting the expiration time for pre-signed URLs in your cloud storage settings to longer than the time it takes to complete an annotation task. See your cloud service provider's documentation for details.

  4. Click Add data.

ℹ️

Note

The data will be fetched from your cloud storage and processed asynchronously. This involves fetching appropriate metadata and other file information to help us render the files appropriately and to check for any framerate inconsistencies. We do not store your files in any way.

Checking upload status

You can check the progress of the processing job by clicking the icon in the top-right corner of the screen.
A spinning progress indicator shows that the processing job is still running.

  • If successful, the processing completes with a success icon.
  • If unsuccessful, an error icon is shown. Ensure that your provider permissions have been set correctly, that the object data format is supported, and that the JSON or CSV file is correctly formatted.

To see which files failed to upload, click the error icon to download a CSV log file. Each row in the CSV corresponds to a file that failed to upload.

ℹ️

Note

You will only see failed uploads if the Ignore individual file errors toggle was not enabled when uploading your data.


Creating a Dataset using cloud data

To create a Dataset using data from your private cloud, you will need to upload either a JSON or CSV file, specifying the URLs of all the files you'd like to add.

👍

Tip

We recommend uploading files in batches of no more than 2GB, so that each upload completes within 3 hours.
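
If you know the size of each object, you can plan batches ahead of time. The sketch below is one possible approach, not an Encord tool: a hypothetical `batch_by_size` helper that greedily groups `(object URL, size in bytes)` pairs so that no batch exceeds the recommended limit. It assumes "2GB" means 2,000,000,000 bytes, the more conservative reading.

```python
from __future__ import annotations

# Assumption: "2GB" is read as 2,000,000,000 bytes (decimal, the smaller reading).
MAX_BATCH_BYTES = 2_000_000_000


def batch_by_size(files: list[tuple[str, int]], max_bytes: int = MAX_BATCH_BYTES) -> list[list[str]]:
    """Greedily group (object_url, size_in_bytes) pairs into batches of at most max_bytes.

    Files keep their original order. A single file larger than max_bytes
    ends up alone in its own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for url, size in files:
        # Close the current batch if adding this file would exceed the limit
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(url)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Object sizes can come from your storage provider's listing, for example the `size` attribute of boto3 `ObjectSummary` objects for S3.
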

JSON format

The JSON file format is a JSON object whose top-level keys specify the type of data and the object URLs of the content you want to add to the Dataset. Object URLs must not contain any whitespace. You can add one data type at a time, or combine multiple data types in one JSON file, depending on your preferences or development workflow. The supported data keys are videos, images, image_groups, dicom_series, and nifti_files; the skip_duplicate_urls flag can also be set at the top level. The details for each data format are given in the sections below.
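
Because a malformed specification only surfaces as an error after upload, it can help to check the file locally first. The following is a minimal sketch with a hypothetical `find_spec_problems` helper; the set of accepted keys is taken from the examples in this document and is an assumption, not an authoritative schema.

```python
from __future__ import annotations

# Assumption: the data keys used in this document's examples, plus the
# top-level skip_duplicate_urls flag.
DATA_KEYS = {"videos", "images", "image_groups", "dicom_series", "nifti_files"}
TOP_LEVEL_FLAGS = {"skip_duplicate_urls"}


def find_spec_problems(spec: dict) -> list[str]:
    """Return a list of problems found in a parsed JSON upload spec (empty if none)."""
    problems = []
    for key, value in spec.items():
        if key in TOP_LEVEL_FLAGS:
            continue
        if key not in DATA_KEYS:
            problems.append(f"unknown top-level key: {key!r}")
            continue
        for i, item in enumerate(value):
            for field, url in item.items():
                # Single files use "objectUrl"; groups and series use "objectUrl_<n>"
                if field == "objectUrl" or field.startswith("objectUrl_"):
                    if any(ch.isspace() for ch in url):
                        problems.append(f"{key}[{i}].{field} contains whitespace")
    return problems
```

For example, `find_spec_problems(json.load(open("upload.json")))` returns an empty list when no problems are found.
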

❗️

CRITICAL INFORMATION

Encord supports up to 10,000 entries in the JSON file when uploading data to Encord.
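
To stay within this limit, a large specification can be split into several files. This sketch assumes that one entry means one element of a data array (a video, an image, an image group, a DICOM series, or a NIfTI file); the `split_spec` helper is hypothetical. Top-level flags such as skip_duplicate_urls are copied into every chunk.

```python
from __future__ import annotations

# Assumption: these are the data arrays that count towards the entry limit.
DATA_KEYS = ("videos", "images", "image_groups", "dicom_series", "nifti_files")
MAX_ENTRIES = 10_000


def split_spec(spec: dict, max_entries: int = MAX_ENTRIES) -> list[dict]:
    """Split a parsed JSON upload spec into specs with at most max_entries items each."""
    # Everything that is not a data array (e.g. skip_duplicate_urls) goes into every chunk
    flags = {k: v for k, v in spec.items() if k not in DATA_KEYS}
    # Flatten all items into one ordered list of (data key, item) pairs
    items = [(key, item) for key in DATA_KEYS for item in spec.get(key, [])]

    chunks = []
    for start in range(0, len(items), max_entries):
        chunk = dict(flags)
        for key, item in items[start:start + max_entries]:
            chunk.setdefault(key, []).append(item)
        chunks.append(chunk)
    return chunks
```

Each returned dict can be written to its own JSON file and uploaded separately.
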

Videos

Each object in the videos array is a JSON object with the key objectUrl specifying the full URL of where to find the video resource. The title field is optional. If not specified, the video's file name will be used.

  • Video metadata (separate from client metadata) may be specified for videos. See Video metadata below.

  • If skip_duplicate_urls is set to true, all object URLs that exactly match existing videos in the Dataset will be skipped.

Key or flag           | Required? | Default value
"objectUrl"           | Yes       |
"title"               | No        | <file title>
"clientMetadata"      | No        |
"skip_duplicate_urls" | No        | false
"createVideo"         | No        | false

ℹ️

Note

Keys / Flags that aren't required can be omitted from the JSON file entirely.

{
  "videos": [
    {
      "objectUrl": "<object url_1>"
    },
    {
      "objectUrl": "<object url_2>",
      "title": "my-custom-video-title.mp4",
      "clientMetadata": {"optional": "metadata"}
    }
  ],
  "skip_duplicate_urls": true
}
Video metadata

The JSON format allows you to specify videoMetadata for video files. videoMetadata contains essential information used by the Label Editor and is crucial for aligning annotations to the correct frame.

❗️

CRITICAL INFORMATION

When the videoMetadata key is present in the JSON file, we use the supplied metadata directly without performing any additional validation, and do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.

ℹ️

Note

videoMetadata must be specified when a Strict client-only access integration is used. In all other cases videoMetadata is optional.

Example JSON including video metadata
{
    "videos": [
      {
        "objectUrl": "video_file.mp4",
        "videoMetadata": {
            "fps": 23.98,
            "duration": 29.09,
            "width": 1280,
            "height": 720,
            "file_size": 5468354,
            "mime_type": "video/mp4"
        }
      }
    ]
  }

  • fps: Frames per second.
  • duration: Duration of the video (in seconds).
  • width / height: Dimensions of the video (in pixels).
  • file_size: The size of the file (in bytes).
  • mime_type: The MIME type of the file (for example, video/mp4).

When videos are supplied with video metadata, Encord assumes the metadata to be correct and our servers will neither download nor pre-process your data. This may be a particularly useful feature for customers with strict data compliance concerns.

One way to find the necessary metadata is to run the following commands in your terminal:

  • ffmpeg -i 'video_title.mp4' to retrieve the fps, duration, width, and height.
  • ls -l 'video_title.mp4' to retrieve the file size in bytes.
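
As an alternative to reading the ffmpeg output by hand, the following sketch uses ffprobe, which ships with FFmpeg, to collect the same values as JSON and map them onto the videoMetadata keys. It assumes ffprobe is installed and on your PATH, that the first video stream is the one you want, and that you supply the MIME type yourself; the helper names are hypothetical. Because the metadata is used without validation, fps and duration are kept at full precision rather than rounded.

```python
import json
import subprocess
from fractions import Fraction


def video_metadata_from_probe(probe: dict, mime_type: str = "video/mp4") -> dict:
    """Map parsed `ffprobe -of json` output onto the videoMetadata keys."""
    if not probe.get("streams"):
        raise ValueError("ffprobe reported no video stream")
    stream = probe["streams"][0]  # first (selected) video stream
    fmt = probe["format"]
    return {
        # r_frame_rate is a fraction string such as "24000/1001"
        "fps": float(Fraction(stream["r_frame_rate"])),
        "duration": float(fmt["duration"]),  # seconds
        "width": int(stream["width"]),  # pixels
        "height": int(stream["height"]),  # pixels
        "file_size": int(fmt["size"]),  # bytes
        "mime_type": mime_type,
    }


def probe_video(path: str) -> dict:
    """Run ffprobe on a local file and return its parsed JSON output."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=width,height,r_frame_rate:format=duration,size",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `video_metadata_from_probe(probe_video("video_title.mp4"))` returns a dict that can be used as the videoMetadata value for that file.
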

Single images

The JSON structure for single images parallels that of videos.

  • The title field is optional.
  • If not specified, the file name of the image will be used.
  • If skip_duplicate_urls is set to true, images that have been previously uploaded to the dataset with the same object URL will be skipped.
  • Image metadata (separate from client metadata) may be specified for images. See Image metadata below.

Key or flag           | Required? | Default value
"objectUrl"           | Yes       |
"title"               | No        | <file title>
"clientMetadata"      | No        |
"skip_duplicate_urls" | No        | false
"createVideo"         | No        | false

ℹ️

Note

Keys / Flags that are not required can be omitted from the JSON file entirely.

{
  "images": [
    {
      "objectUrl": "<object url>"
    },
    {
      "objectUrl": "<object url>",
      "title": "my-custom-image-title.jpeg",
      "clientMetadata": {"optional": "metadata"}
    }
  ]
}
Image metadata

The JSON format allows you to specify imageMetadata for image files. imageMetadata contains essential information used by the Label Editor and is crucial for aligning annotations to the correct image properties.

❗️

CRITICAL INFORMATION

When the imageMetadata key is present in the JSON file, we use the supplied metadata directly without performing any additional validation, and do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.

ℹ️

Note

imageMetadata must be specified when a Strict client-only access integration is used. In all other cases, imageMetadata is optional.

Example JSON including image metadata
{
  "images": [
    {
      "objectUrl": "s3://my_image.jpg",
      "imageMetadata": {
        "mimeType": "image/jpg",
        "fileSize": 124,
        "width": 640,
        "height": 480
      }
    }
  ]
}
  • objectUrl: URL or path to the image file.
  • mimeType: The MIME type of the image file (e.g., image/jpg, image/png).
  • fileSize: The size of the image file in bytes.
  • width: The width of the image in pixels.
  • height: The height of the image in pixels.
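
For PNG files, the width and height can be read from the file header without third-party libraries, because the PNG format stores them in the IHDR chunk immediately after the 8-byte signature. The sketch below uses a hypothetical `png_image_metadata` helper and works on local files only; for other formats such as JPEG, an imaging library (for example, Pillow) is a simpler option.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_image_metadata(path: str) -> dict:
    """Build an imageMetadata object for a local PNG file (standard library only).

    Layout of the first 24 bytes of a PNG file:
      0-7   signature
      8-11  length of the IHDR chunk
      12-15 chunk type b"IHDR"
      16-23 width and height as big-endian unsigned 32-bit integers
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return {
        "mimeType": "image/png",
        "fileSize": os.path.getsize(path),  # bytes
        "width": width,  # pixels
        "height": height,  # pixels
    }
```
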

Image groups

  • Image groups are collections of images that are processed as one annotation task.
  • Images within image groups remain unaltered, meaning that images of different sizes and resolutions can form an image group without the loss of data.
  • Image groups do not require 'write' permissions to your cloud storage.
  • Custom client metadata is defined per image group, not per image.
  • If skip_duplicate_urls is set to true, all URLs exactly matching existing image groups in the dataset will be skipped.

Key or flag           | Required? | Default value
"objectUrl"           | Yes       |
"title"               | Yes       | <file title>
"clientMetadata"      | No        |
"skip_duplicate_urls" | No        | false
"createVideo"         | Yes       | true (change this to false for image groups)

ℹ️

Note

The position of each image within the group must be specified in the key, e.g. objectUrl_{position_number}, as shown in the example below.

ℹ️

Note

Keys / Flags that aren't required can be omitted from the JSON file entirely.

ℹ️

Note

Set the "createVideo" flag to false for image groups.

{
  "image_groups": [
    {
      "title": "<title 1>",
      "createVideo": false,
      "objectUrl_0": "<object url>"
    },
    {
      "title": "<title 2>",
      "createVideo": false,
      "objectUrl_0": "<object url>",
      "objectUrl_1": "<object url>",
      "objectUrl_2": "<object url>",
      "clientMetadata": {"optional": "metadata"}
    }
  ]
}
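
Because the numeric suffix of each objectUrl_{position_number} key determines the image order, it is convenient to generate these keys from an ordered list of URLs. This is a minimal sketch; `image_group_entry` is a hypothetical helper, and setting `create_video=True` produces an entry for the image sequences described below instead of an image group.

```python
def image_group_entry(title, urls, create_video=False, client_metadata=None):
    """Build one element of the "image_groups" array.

    The index of each URL in `urls` becomes the suffix of its key
    (objectUrl_0, objectUrl_1, ...), which fixes the image order.
    create_video=False gives an image group; True gives an image sequence.
    """
    if not urls:
        raise ValueError("an image group needs at least one object URL")
    entry = {"title": title, "createVideo": create_video}
    for position, url in enumerate(urls):
        entry[f"objectUrl_{position}"] = url
    if client_metadata is not None:
        entry["clientMetadata"] = client_metadata
    return entry
```

Collect the entries in a list and use it as the value of the image_groups key.
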

Image sequences

  • Image sequences are collections of images that are processed as one annotation task and represented as a video.
  • Images within image sequences may be altered, because images of varying sizes and resolutions are made to match the first image in the sequence.
  • Creating Image sequences from cloud storage requires 'write' permissions, as new files have to be created in order to be read as a video.
  • Each object in the image_groups array with the createVideo flag set to true represents a single image sequence.
  • Custom client metadata is defined per image sequence, not per image.
  • If skip_duplicate_urls is set to true, all URLs exactly matching existing image sequences in the Dataset are skipped.

👍

Tip

The only difference between adding image groups and image sequences via a JSON is that image sequences require the createVideo flag to be set to true. Both use the key image_groups.

Key or flag           | Required? | Default value
"objectUrl"           | Yes       |
"title"               | Yes       | <file title>
"clientMetadata"      | No        |
"skip_duplicate_urls" | No        | false
"createVideo"         | Yes       | true

ℹ️

Note

The position of each image within the sequence must be specified in the key, e.g. objectUrl_{position_number}. See the example below.

ℹ️

Note

Keys / Flags that are not required can be omitted from the JSON file entirely.

{
  "image_groups": [
    {
      "title": "<title 1>",
      "createVideo": true,
      "objectUrl_0": "<object url>"
    },
    {
      "title": "<title 2>",
      "createVideo": true,
      "objectUrl_0": "<object url>",
      "objectUrl_1": "<object url>",
      "objectUrl_2": "<object url>",
      "clientMetadata": {"optional": "metadata"}
    }
  ]
}

DICOM

ℹ️

Note

Ensure your DICOM files and metadata follow the format outlined in the official DICOM specification.

  • The dicom_series array can contain one or more DICOM series; each element represents one series.
  • Each series requires a title and at least one object URL, as shown in the example below.
  • If skip_duplicate_urls is set to true, all object URLs exactly matching existing DICOM files in the Dataset are skipped.

Key or flag           | Required? | Default value
"objectUrl"           | Yes       |
"title"               | Yes       | <file title>
"clientMetadata"      | No        |
"skip_duplicate_urls" | No        | false
"createVideo"         | Yes       | false

ℹ️

Note

Keys / Flags that are not required, such as clientMetadata, can be omitted from the JSON file entirely. clientMetadata is distinct from patient metadata, which is included in the .dcm file and does not have to be specified during the upload to Encord.

The following is an example JSON for uploading three DICOM series belonging to a study. Each element, a title together with its object URLs, corresponds to one DICOM series.

  • The first series contains only a single object URL, as it is composed of a single file.
  • The second series contains 3 object URLs, as it is composed of three separate files.
  • The third series contains 2 object URLs, as it is composed of two separate files.

{
  "dicom_series": [
    {
      "title": "<series-1>",
      "objectUrl_0": "https://my-bucket/.../study1-series1-file.dcm"
    },
    {
      "title": "<series-2>",
      "objectUrl_0": "https://my-bucket/.../study1-series2-file1.dcm",
      "objectUrl_1": "https://my-bucket/.../study1-series2-file2.dcm",
      "objectUrl_2": "https://my-bucket/.../study1-series2-file3.dcm"
    },
    {
      "title": "<series-3>",
      "objectUrl_0": "https://my-bucket/.../study1-series3-file1.dcm",
      "objectUrl_1": "https://my-bucket/.../study1-series3-file2.dcm"
    }
  ]
}


NIfTI

  • Each series requires a title and at least one object URL.
  • If skip_duplicate_urls is set to true, all object URLs exactly matching existing NIfTI files in the Dataset are skipped.

Key or flag           | Required? | Default value
"objectUrl"           | Yes       |
"title"               | Yes       | <file title>
"clientMetadata"      | No        |
"skip_duplicate_urls" | No        | false
"createVideo"         | Yes       | false

The following is an example JSON file for uploading two NIfTI files to Encord.

{
  "nifti_files": [
    {
      "title": "<file-1>",
      "objectUrl_0": "https://my-bucket/.../nifti-file1.nii"
    },
    {
      "title": "<file-2>",
      "objectUrl_0": "https://my-bucket/.../nifti-file2.nii.gz"
    }
  ]
}


Multiple file types

You can upload multiple file types using a single JSON file. The example below shows 1 image, 2 videos, 2 image sequences, and 1 image group.

ℹ️

Note

Keys / Flags that are not required can be omitted from the JSON file entirely.


{
  "images": [
    {
      "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Image1.png"
    }
  ],
  "videos": [
    {
      "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Cooking.mp4"
    },
    {
      "objectUrl": "https://cord-dev.s3.eu-west-2.amazonaws.com/Oranges.mp4"
    }
  ],
  "image_groups": [
    {
      "title": "apple-samsung-light",
      "createVideo": true,
      "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(32).jpg",
      "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(33).jpg",
      "objectUrl_2": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(34).jpg",
      "objectUrl_3": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/1-Samsung-S4-Light+Environment/1+(35).jpg"
    },
    {
      "title": "apple-samsung-dark",
      "createVideo": true,
      "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(32).jpg",
      "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(33).jpg",
      "objectUrl_2": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(34).jpg",
      "objectUrl_3": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/2-samsung-S4-Dark+Environment/2+(35).jpg"
    },
    {
      "title": "apple-ios-light",
      "createVideo": false,
      "objectUrl_0": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(32).jpg",
      "objectUrl_1": "https://cord-dev.s3.eu-west-2.amazonaws.com/food-dataset/Apple/3-IOS-4-Light+Environment/3+(33).jpg"
    }
  ]
}


Client metadata & skip duplicate URLs

You can optionally add custom client metadata to each data item in the clientMetadata field, as shown in the examples above. Client metadata is separate from video and image metadata, and is intended as an arbitrary store of data you would like to associate with a particular file.

Client metadata is limited to 10MB per data item. Internally, it is stored as a PostgreSQL jsonb value, so jsonb behavior applies: for example, key order is not preserved, and if a key is duplicated, only the last value is kept. See the PostgreSQL documentation on the jsonb type for details.
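
To catch oversized metadata before uploading, you can measure its serialized size locally. This sketch assumes the limit applies to the metadata serialized as UTF-8 JSON and reads "10MB" as 10,000,000 bytes; how Encord measures the size exactly is not stated here, so treat the result as a conservative estimate. The helper names are hypothetical.

```python
import json

# Assumption: "10MB" is read as 10,000,000 bytes of UTF-8 encoded JSON.
MAX_CLIENT_METADATA_BYTES = 10_000_000


def client_metadata_size(metadata: dict) -> int:
    """Size in bytes of the metadata serialized as compact UTF-8 JSON."""
    text = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_client_metadata(metadata: dict) -> None:
    """Raise ValueError if the serialized metadata exceeds the assumed limit."""
    size = client_metadata_size(metadata)
    if size > MAX_CLIENT_METADATA_BYTES:
        raise ValueError(f"clientMetadata is {size} bytes, above the 10MB limit")
```
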

Add the "skip_duplicate_urls": true flag at the top level to make uploads idempotent. Skipping URLs already in the Dataset can help speed up large upload operations: since previously processed assets don't have to be uploaded again, you can simply retry a failed operation without editing the upload specification file. The flag's default value is false.

ℹ️

Note

These features are currently only supported for JSON uploads.

When using a Multi-Region Access Point

When using a Multi-Region Access Point for your AWS S3 buckets, objects are specified using the ARN of the Multi-Region Access Point followed by the object name. The following example shows how video files from a Multi-Region Access Point would be specified.

👍

Tip

We provide a number of scripts to create a JSON file for uploading cloud data in the Helpful Scripts and Examples section below. The AWS example supports a Multi-Region Access Point.

{
  "videos": [
    {
      "objectUrl": "Multi-Region-Access-Point-ARN + <object name_1>"
    },
    {
      "objectUrl": "Multi-Region-Access-Point-ARN + <object name_2>",
      "title": "my-custom-video-title.mp4",
      "clientMetadata": {"optional": "metadata"}
    }
  ],
  "skip_duplicate_urls": true
}

CSV format

In the CSV file format, the column headers specify which type of data is being uploaded. You can add a single data type at a time, or combine multiple data types in a single CSV file.

❗️

CRITICAL INFORMATION

Encord supports up to 10,000 entries in the CSV file when uploading data to Encord.

🚧

Caution

  • Object URLs cannot contain whitespace.
  • For backwards compatibility, a single-column CSV is supported. A file with only the ObjectUrl column is interpreted as a video upload. If your objects are of a different type (for example, images), the following error is displayed: "Expected a video, got a file of type XXX".

Videos

A CSV file containing videos should contain two columns with the following mandatory column headings:
'ObjectURL' and 'Video title'. All headings are case-insensitive.

  • The 'ObjectURL' column contains the objectUrl. This field is mandatory for each file, as it specifies the full URL of the video resource.

  • The 'Video title' column contains the video_title. If left blank, the original file name is used.

In the example below, files 1, 2, and 4 are assigned the names in the title column, while file 3 keeps its original file name.

ObjectUrl,Video title
https://storage/frame1.mp4,Video 1
https://storage/frame2.mp4,Video 2
https://storage/frame3.mp4,
https://storage/frame4.mp4,Video 3

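
A CSV like the one above can be generated with Python's standard csv module. This is a minimal sketch with a hypothetical `videos_csv` helper that takes `(object URL, title)` pairs, uses an empty title to keep the original file name, and rejects URLs containing whitespace.

```python
import csv
import io


def videos_csv(rows):
    """Return CSV text for a video upload from (object_url, title) pairs."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ObjectUrl", "Video title"])
    for url, title in rows:
        # Object URLs must not contain whitespace
        if any(ch.isspace() for ch in url):
            raise ValueError(f"object URL contains whitespace: {url!r}")
        writer.writerow([url, title])  # an empty title keeps the original file name
    return buf.getvalue()
```

Write the returned text to a .csv file and upload it as described earlier in this section.
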
Single images

A CSV file containing single images MUST contain two columns with the following mandatory headings:
'ObjectURL' and 'Image title'. All headings are case-insensitive.

  • The 'ObjectURL' column contains the objectUrl. This field is mandatory for each file, as it specifies the full URL of the image resource.

  • The 'Image title' column contains the image_title. If left blank, the original file name is used.

In the following example files 1, 2 and 4 are assigned the names in the title column, while file 3 keeps its original file name.

ObjectUrl,Image title
https://storage/frame1.jpg,Image 1
https://storage/frame2.jpg,Image 2
https://storage/frame3.jpg,
https://storage/frame4.jpg,Image 3

Image groups

A CSV file containing image groups MUST contain three columns with the following mandatory headings:
'ObjectURL', 'Image group title', and 'Create video'. All three headings are case-insensitive.

  • The 'ObjectURL' column contains the objectUrl. This field is mandatory for each file, as it specifies the full URL of the resource.

  • The 'Image group title' column contains the image_group_title. This field is mandatory, as it determines which image group a file will be assigned to.

  • The 'Create video' column, which must be set to 'false' for image groups.

In the following example, the first two URLs are grouped together into 'Group 1', while the next two files are grouped together into 'Group 2'.

ObjectUrl,Image group title,Create video
https://storage/frame1.jpg,Group 1,false
https://storage/frame2.jpg,Group 1,false
https://storage/frame3.jpg,Group 2,false
https://storage/frame4.jpg,Group 2,false

ℹ️

Note

Image groups do not require 'write' permissions.

Image sequences

A CSV file containing image sequences MUST contain three columns with the following mandatory headings: 'ObjectURL', 'Image group title', and 'Create video'. All three headings are case-insensitive.

  • The 'ObjectURL' column contains the objectUrl. This field is mandatory for each file, as it specifies the full URL of the resource.

  • The 'Image group title' column containing the image_group_title. This field is mandatory, as it determines which image sequence a file will be assigned to. The dimensions of the image sequence are determined by the first file in the sequence.

  • The 'Create video' column. This can be left blank, as the default value is 'true'.

In the example below the first two URLs are grouped together into 'Sequence 1', while the second two files are grouped together into 'Sequence 2'.

ObjectUrl,Image group title,Create video
https://storage/frame1.jpg,Sequence 1,true
https://storage/frame2.jpg,Sequence 1,true
https://storage/frame3.jpg,Sequence 2,true
https://storage/frame4.jpg,Sequence 2,true

👍

Tip

Image groups and image sequences are distinguished only by the value in the 'Create video' column: false for image groups, and true (or blank) for image sequences.

ℹ️

Note

Image sequences require 'write' permissions against your storage bucket to save the compressed video.

DICOM

A CSV file containing DICOM files MUST contain two columns with the following headings: 'ObjectURL' and 'Series title'. Both headings are case-insensitive.

  • The 'ObjectURL' column contains the objectUrl. This field is mandatory for each file, as it specifies the full URL of the resource.

  • The 'Series title' column contains the dicom_title. When two files are given the same title they are grouped into the same DICOM series. If left blank, the original file name is used.

In the following example the first two files are grouped into 'dicom series 1', the next two files are grouped into 'dicom series 2', while the final file will remain separated as 'dicom series 3'.

ObjectUrl,Series title
https://storage/frame1.dcm,dicom series 1
https://storage/frame2.dcm,dicom series 1
https://storage/frame3.dcm,dicom series 2
https://storage/frame4.dcm,dicom series 2
https://storage/frame5.dcm,dicom series 3

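
The grouping rule above, where files sharing a series title form one DICOM series, maps directly onto the objectUrl_{n} keys of the JSON format. The following sketch converts such a CSV into a JSON dicom_series specification, which can be useful if you need JSON-only features such as client metadata. The `dicom_csv_to_json_spec` helper is hypothetical, and using the file name as the series title when the title is blank is an assumption that mirrors the CSV rule.

```python
import csv
import io


def dicom_csv_to_json_spec(csv_text: str) -> dict:
    """Group rows of an (ObjectUrl, Series title) CSV into a JSON "dicom_series" spec.

    Rows sharing a series title become one series; URLs keep their row order.
    Headings are matched case-insensitively.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    series = {}  # title -> list of URLs, in first-seen order
    for row in reader:
        row = {k.strip().lower(): (v or "").strip() for k, v in row.items()}
        url, title = row["objecturl"], row["series title"]
        # Assumption: a blank title falls back to the file name
        title = title or url.rsplit("/", 1)[-1]
        series.setdefault(title, []).append(url)

    entries = []
    for title, urls in series.items():
        entry = {"title": title}
        entry.update({f"objectUrl_{i}": u for i, u in enumerate(urls)})
        entries.append(entry)
    return {"dicom_series": entries}
```
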
NIfTI

A CSV file containing NIfTI files MUST contain two columns with the following headings: 'ObjectURL' and 'NIfTI title'. Both headings are case-insensitive.

  • The 'ObjectURL' column contains the objectUrl. This field is mandatory for each file, as it specifies the full URL of the resource.

  • The 'NIfTI title' column contains the title of the NIfTI file. If left blank, the original file name is used.

The following example shows how to format the CSV file to upload two NIfTI files to Encord.

ObjectUrl,NIfTI title
https://storage/niftifile1.nii.gz,Brain Image 1
https://storage/niftifile2.nii,Brain Image 2

Multiple file types

You can upload multiple file types with a single CSV file by using a new header each time there is a change of file type. Three headings will be required if image sequences are included.

🚧

Caution

Since the 'Create video' column defaults to "true", all files that aren't image sequences must contain the value "false".

The following example shows a CSV file for the following:

  • Two image sequences composed of 2 files each.
  • One image group composed of 2 files.
  • One single image.
  • One video.

ObjectUrl,Image group title,Create video
https://storage/frame1.jpg,Sequence 1,true
https://storage/frame2.jpg,Sequence 1,true
https://storage/frame3.jpg,Sequence 2,true
https://storage/frame4.jpg,Sequence 2,true
https://storage/frame5.jpg,Group 1,false
https://storage/frame6.jpg,Group 1,false
ObjectUrl,Image title,Create video
https://storage/frame1.jpg,Image 1,false
ObjectUrl,Video title,Create video
https://storage/video.mp4,Video 1,false
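
Because a blank 'Create video' cell is read as true, it is safer to always write the flag explicitly. This sketch with a hypothetical `multi_type_csv` helper writes several sections, each starting with its own header row, and emits true or false for every row.

```python
import csv
import io


def multi_type_csv(sections):
    """Return CSV text for a mixed upload.

    `sections` is a list of (title_heading, rows) pairs, where title_heading is
    e.g. "Image group title", "Image title" or "Video title", and rows are
    (object_url, title, create_video) triples.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for title_heading, rows in sections:
        # A new header row marks each change of file type
        writer.writerow(["ObjectUrl", title_heading, "Create video"])
        for url, title, create_video in rows:
            # Always write the flag explicitly: a blank cell would mean true
            writer.writerow([url, title, "true" if create_video else "false"])
    return buf.getvalue()
```
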

Helpful Scripts and Examples

Use the following scripts to create JSON and CSV files to upload your cloud data to Encord.

AWS S3
Follow these steps to create a JSON file for videos or images by constructing URLs to files in a specific S3 bucket.
  1. Get an AWS Access Key

    • Sign in to the AWS Management Console.
    • Navigate to the IAM (Identity and Access Management) service.
    • Create a new IAM user (or select an existing user).
    • Assign the necessary permissions (e.g., S3 access) to the user.
    • Generate an access key for the user, which includes an AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
  2. Specify the Credentials in a Local File

    • Create a file named credentials.env in a secure location on your local machine.
    • Add the following content to the credentials.env file, replacing YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY with your actual AWS access key values:
      export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
      export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
      
  3. Source the Credentials File in Your Shell

    • Open a terminal window.
    • Navigate to the directory where your credentials.env file is located.
    • Source the credentials file to set the environment variables:
      source credentials.env
      
  4. Run the Script

    • With the environment variables set, you can now run the script below.
    • Execute the script:
      python your_script_name.py
      

The following Python script creates a JSON file for videos or images by constructing URLs to files in a specific S3 bucket. Ensure that you:

  • Replace <bucket-region> with the AWS region where your bucket is located.
  • Replace <aws-profile> with the name of the profile in your AWS ~/.aws/credentials file (see the AWS Credentials Documentation for information on setting up your credentials file), or set AWS_PROFILE to None to use the credentials exported as environment variables in the steps above.
  • Replace <s3-bucket-name> with the name of the S3 bucket you want to upload files from.
  • Replace <s3-directory> with the path to the directory where your files are stored inside the S3 bucket. Include all slashes except for the final slash. For example the file my-bucket/some_top_level_dir/video_files/my_video.mp4 is in the S3 directory some_top_level_dir/video_files.
  • Replace <data-modality> with the modality of the files you want to upload. This can only be videos or images.
  • (If using a Multi-region access point) replace <global-access-point> with the ARN of the multi-region access point.

import json

import boto3

REGION = "<bucket-region>"
AWS_PROFILE = "<aws-profile>"  # Set to None to use the AWS_* environment variables instead
BUCKET_NAME = "<s3-bucket-name>"
S3_DIRECTORY = "<s3-directory>"
DATA_MODALITY = "<data-modality>"  # "videos" or "images"
GLOBAL_ENDPOINT = "<global-access-point>"  # Optional, set to None if not using

# AWS S3 domain and root URL
DOMAIN = f's3.{REGION}.amazonaws.com'
ROOT_URL = GLOBAL_ENDPOINT if GLOBAL_ENDPOINT else f'https://{DOMAIN}/{BUCKET_NAME}'

# AWS session and S3 resource. The resource is created from the session so that
# the selected profile (or the environment variables) is actually used.
session = boto3.Session(profile_name=AWS_PROFILE, region_name=REGION)
s3 = session.resource('s3')
bucket = s3.Bucket(BUCKET_NAME)


# Function to generate JSON upload specification
def generate_upload_spec(bucket, bucket_name, s3_directory, data_modality, root_url):
    if data_modality not in ('videos', 'images'):
        raise ValueError("data_modality must be 'videos' or 'images'")

    files = []
    # List only the keys under the directory prefix instead of the whole bucket
    for object_summary in bucket.objects.filter(Prefix=f'{s3_directory}/'):
        key_split = object_summary.key.split('/')
        key_path = "/".join(key_split[:-1])

        # Keep files directly inside s3_directory; skip sub-directories and
        # "folder" placeholder keys that end with a slash
        if key_path == s3_directory and key_split[-1]:
            object_url = f'{root_url}/{object_summary.key}'
            files.append({'objectUrl': object_url})

    # Create the JSON structure based on data modality
    outer_json_dict = {data_modality: files}

    # Write the JSON to a file
    output_filename = f'{bucket_name}-{s3_directory.replace("/", "_")}.json'
    with open(output_filename, 'w') as output_file:
        json.dump(outer_json_dict, output_file, indent=4)

    print(f'JSON upload specification file created: {output_filename}')


# Run the function with provided configuration
generate_upload_spec(bucket, BUCKET_NAME, S3_DIRECTORY, DATA_MODALITY, ROOT_URL)

Azure blob
{
    "videos": [
        {
            "objectUrl": "https://myaccount.blob.core.windows.net/myblob"
        },
        {
            "objectUrl": "https://myaccount.blob.core.windows.net/mycontainer/myblob.jpg"
        },
        {
            "objectUrl": "https://myaccount.blob.core.windows.net/mycontainer/myblobs/myblob.jpg"
        }
    ],
    "image_groups": [
      {
        "title": "image_group_1",
        "objectUrl_0": "https://myaccount.blob.core.windows.net/mycontainer/myblob1.jpg",
        "objectUrl_1": "https://myaccount.blob.core.windows.net/mycontainer/myblob2.jpg"
      },
      {
        "title": "image_group2",
        "objectUrl_0": "https://myaccount.blob.core.windows.net/mycontainer/myblob3.jpg",
        "objectUrl_1": "https://myaccount.blob.core.windows.net/mycontainer/myblob4.jpg"
      }
    ]
}
GCP storage
{
    "videos": [
        {
            "objectUrl": "gs://example-url/object.mp4"
        }
    ],
    "image_groups": [
      {
        "title": "image_group_1",
        "objectUrl_0": "https://storage.cloud.google.com/example-image-bucket/object_1.jpg",
        "objectUrl_1": "https://storage.cloud.google.com/example-image-bucket/object_2.jpg"
        
      },
      {
        "title": "image_group_2",
        "objectUrl_0": "https://storage.cloud.google.com/example-image-bucket/object_3.jpg",
        "objectUrl_1": "https://storage.cloud.google.com/example-image-bucket/object_4.jpg"
      }
    ]
}
Open Telekom Cloud OSS
{
  "dicom_series": [
    {
      "title": "OPEN_TELEKOM_DICOM_SERIES",
      "objectUrl_0": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-0",
      "objectUrl_1": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-1",
      "objectUrl_2": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-2",
      "objectUrl_3": "https://bucket-name.obs.eu-de.otc.t-systems.com/dicom-file-3"
    }
  ]
}