Using cloud storage data in Encord is a multi-step process:

1. Set up your cloud storage so Encord can access your data.
2. Create a cloud storage integration on Encord to link to your cloud storage.
3. Create a JSON or CSV file to import your data.
4. Create a Dataset.
5. Perform the registration using the JSON or CSV file.
Before you can do anything with the Encord platform and cloud storage, you need to configure your cloud storage to work with Encord. Once the integration between Encord and your cloud storage is complete, you can then use your data in Encord.
To integrate with Open Telekom Cloud (OTC), you need to:

1. On the Encord platform, enter the Access key ID and Secret access key. Both are located in the access key file generated when the user was created. (If the access key has been misplaced, a new one can be created from the IAM User menu.)
2. Optionally, check the box to enable Strict client-only access (server-side media features will not be available) if you would like Encord to sign URLs but refrain from downloading any media files onto Encord servers. Read more about this feature here.
3. Click the Create button at the bottom of the pop-up. The integration now appears in the list of integrations in the ‘Integrations’ tab.
All types of data (videos, images, image groups, image sequences, and DICOM) from a private cloud are added to a Dataset in the same way, by using a JSON or CSV file. The file includes links to all images, image groups, videos and DICOM files in your cloud storage.
Encord enforces the following upload limits for each JSON file used for file registration:
Optimal upload chunking can vary depending on your data type and the amount of associated metadata. For tailored recommendations, contact Encord support. We recommend starting with smaller uploads and gradually increasing the size based on how quickly jobs are processed. Generally, smaller chunks result in faster data reflection within the platform.
Use client metadata (`clientMetadata`) to specify key frames, custom metadata, and custom embeddings. For more information, go here, or go here for information on using the SDK. For detailed information about the JSON file format used for import, go here.
The information provided about each of the following data types is designed to get you up and running as quickly as possible without going too deeply into the why or how. Look at the template for each data type, then the examples, and adjust the examples to suit your needs.
If `skip_duplicate_urls` is set to `true`, all object URLs that exactly match existing images/videos in the dataset are skipped.

Audio files
The following is an example JSON file for uploading two audio files to Encord.
Provide audio metadata using the `audiometadata` flag. When the `audiometadata` flag is present in the JSON file, we directly use the supplied metadata without performing any additional validation, and we do not store the file on our servers. To guarantee accurate labels, it is crucial that the metadata you provide is accurate.

Text files
The following is an example JSON file for uploading text files to Encord.
Single images
For detailed information about the JSON file format used for import go here.
The JSON structure for single images parallels that of videos.
Template: Provides the proper JSON format to import images into Encord.
Examples:
Data: Imports the images only.
Image Metadata: Imports images with image metadata. This improves the import speed for your images.
Image groups
For detailed information about the JSON file format used for import go here.
If `skip_duplicate_urls` is set to `true`, all URLs exactly matching existing image groups in the dataset are skipped. The position of each image within the group is determined by its object URL key (`objectUrl_{position_number}`).

Template: Provides the proper JSON format to import image groups into Encord.
Examples:
Image sequences
For detailed information about the JSON file format used for import go here.
Each `image_groups` entry with the `createVideo` flag set to `true` represents a single image sequence. If `skip_duplicate_urls` is set to `true`, all URLs exactly matching existing image sequences in the dataset are skipped. Image sequences differ from image groups only in that they require the `createVideo` flag to be set to `true`; both use the key `image_groups`. The position of each image within the sequence is determined by its object URL key (`objectUrl_{position_number}`).

Template: Provides the proper JSON format to import image sequences into Encord.
Examples:
DICOM
For detailed information about the JSON file format used for import go here.
Each `dicom_series` element can contain one or more DICOM series. If `skip_duplicate_urls` is set to `true`, all object URLs exactly matching existing DICOM files in the dataset are skipped. A series title does not need to match the name of the `.dcm` file and does not have to be specified during the upload to Encord. The following is an example JSON for uploading three DICOM series belonging to a study. Each title and object URL corresponds to an individual DICOM series.
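As a rough sketch of that structure (the `dicom_series`, title, and `objectUrl_{position_number}` keys follow the format described above; the file names and series titles are placeholders, and any additional fields should be taken from the JSON format reference), the registration file could be built and written out like this:

```python
import json

# Three DICOM series belonging to one study; each entry is one series,
# and its slices are listed as objectUrl_0, objectUrl_1, ... in order.
dicom_spec = {
    "dicom_series": [
        {
            "title": "Series 1",
            "objectUrl_0": "path/to/storage-location/series1-slice1.dcm",
            "objectUrl_1": "path/to/storage-location/series1-slice2.dcm",
        },
        {
            "title": "Series 2",
            "objectUrl_0": "path/to/storage-location/series2-slice1.dcm",
        },
        {
            "title": "Series 3",
            "objectUrl_0": "path/to/storage-location/series3-slice1.dcm",
        },
    ]
}

# Write the specification to the JSON file used for registration.
with open("dicom-registration.json", "w") as f:
    json.dump(dicom_spec, f, indent=2)
```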
For each DICOM upload, an additional `DicomSeries` file is created. This file represents the series file-set. Only `DicomSeries` files are displayed in the Encord application.
Multiple file types
You can upload multiple file types using a single JSON file. The example below shows 1 image, 2 videos, 2 image sequences, and 1 image group.
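As a sketch of how such a combined file could be assembled: the `image_groups`, `createVideo`, and `objectUrl_{position_number}` keys follow the format described above, while the top-level `images` and `videos` keys, the per-item `title` key, and the `skip_duplicate_urls` option are assumptions to verify against the JSON format reference.

```python
import json

# Combined registration file: 1 image, 2 videos, 2 image sequences,
# and 1 image group. Image sequences and image groups both use the
# "image_groups" key; createVideo distinguishes them (true = sequence).
spec = {
    "images": [
        {"objectUrl": "path/to/storage-location/image1.jpg"},
    ],
    "videos": [
        {"objectUrl": "path/to/storage-location/video1.mp4"},
        {"objectUrl": "path/to/storage-location/video2.mp4"},
    ],
    "image_groups": [
        {
            "title": "Sequence 1",  # "title" is assumed to name the sequence
            "createVideo": True,
            "objectUrl_0": "path/to/storage-location/seq1-frame1.jpg",
            "objectUrl_1": "path/to/storage-location/seq1-frame2.jpg",
        },
        {
            "title": "Sequence 2",
            "createVideo": True,
            "objectUrl_0": "path/to/storage-location/seq2-frame1.jpg",
            "objectUrl_1": "path/to/storage-location/seq2-frame2.jpg",
        },
        {
            "title": "Group 1",
            "createVideo": False,
            "objectUrl_0": "path/to/storage-location/group1-frame1.jpg",
            "objectUrl_1": "path/to/storage-location/group1-frame2.jpg",
        },
    ],
    # Assumed to be a top-level option; see the JSON format reference.
    "skip_duplicate_urls": True,
}

with open("registration.json", "w") as f:
    json.dump(spec, f, indent=2)
```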
In the CSV file format, the column headers specify which type of data is being uploaded. You can add a single file type at a time, or combine multiple data types in a single CSV file.
Details for each data format are given in the sections below.
A CSV file containing only an `ObjectUrl` column is interpreted as a request for video upload. If your objects are of a different type (for example, images), this error displays: “Expected a video, got a file of type XXX”.

Videos
A CSV file containing videos should contain two columns with the following mandatory column headings:
‘ObjectURL’ and ‘Video title’. All headings are case-insensitive.
The ‘ObjectURL’ column contains the `objectUrl`. This field is mandatory for each file, as it specifies the full URL of the video resource.
The ‘Video title’ column contains the `video_title`. If left blank, the original file name is used.
In the example below, files 1, 2, and 4 are assigned the names in the title column, while file 3 keeps its original file name.
ObjectUrl | Video title |
---|---|
path/to/storage-location/frame1.mp4 | Video 1 |
path/to/storage-location/frame2.mp4 | Video 2 |
path/to/storage-location/frame3.mp4 | |
path/to/storage-location/frame4.mp4 | Video 3 |
Single images
A CSV file containing single images should contain two columns with the following mandatory headings:
‘ObjectURL’ and ‘Image title’. All headings are case-insensitive.
The ‘ObjectURL’ column contains the `objectUrl`. This field is mandatory for each file, as it specifies the full URL of the image resource.
The ‘Image title’ column contains the `image_title`. If left blank, the original file name is used.
In the example below, files 1, 2, and 4 are assigned the names in the title column, while file 3 keeps its original file name.
ObjectUrl | Image title |
---|---|
path/to/storage-location/frame1.jpg | Image 1 |
path/to/storage-location/frame2.jpg | Image 2 |
path/to/storage-location/frame3.jpg | |
path/to/storage-location/frame4.jpg | Image 3 |
Image groups
A CSV file containing image groups should contain three columns with the following mandatory headings:
‘ObjectURL’, ‘Image group title’, and ‘Create video’. All three headings are case-insensitive.
The ‘ObjectURL’ column contains the `objectUrl`. This field is mandatory for each file, as it specifies the full URL of the resource.
The ‘Image group title’ column contains the `image_group_title`. This field is mandatory, as it determines which image group a file is assigned to.
In the example below, the first two URLs are grouped into ‘Group 1’, while the following two files are grouped into ‘Group 2’.
ObjectUrl | Image group title | Create video |
---|---|---|
path/to/storage-location/frame1.jpg | Group 1 | false |
path/to/storage-location/frame2.jpg | Group 1 | false |
path/to/storage-location/frame3.jpg | Group 2 | false |
path/to/storage-location/frame4.jpg | Group 2 | false |
Image sequences
A CSV file containing image sequences should contain three columns with the following mandatory headings: ‘ObjectURL’, ‘Image group title’, and ‘Create video’. All three headings are case-insensitive.
The ‘ObjectURL’ column contains the `objectUrl`. This field is mandatory for each file, as it specifies the full URL of the resource.
The ‘Image group title’ column contains the `image_group_title`. This field is mandatory, as it determines which image sequence a file is assigned to. The dimensions of the image sequence are determined by the first file in the sequence.
The ‘Create video’ column. This can be left blank, as the default value is ‘true’.
In the example below, the first two URLs are grouped into ‘Sequence 1’, while the following two files are grouped into ‘Sequence 2’.
ObjectUrl | Image group title | Create video |
---|---|---|
path/to/storage-location/frame1.jpg | Sequence 1 | true |
path/to/storage-location/frame2.jpg | Sequence 1 | true |
path/to/storage-location/frame3.jpg | Sequence 2 | true |
path/to/storage-location/frame4.jpg | Sequence 2 | true |
DICOM
A CSV file containing DICOM files should contain two columns with the following mandatory headings: ‘ObjectURL’ and ‘Series title’. Both headings are case-insensitive.
The ‘ObjectURL’ column contains the `objectUrl`. This field is mandatory for each file, as it specifies the full URL of the resource.
The ‘Series title’ column contains the `dicom_title`. When two files are given the same title, they are grouped into the same DICOM series. If left blank, the original file name is used.
In the example below, the first two files are grouped into ‘dicom series 1’, the next two files are grouped into ‘dicom series 2’, and the final file remains separate as ‘dicom series 3’.
ObjectUrl | Series title |
---|---|
path/to/storage-location/frame1.dcm | dicom series 1 |
path/to/storage-location/frame2.dcm | dicom series 1 |
path/to/storage-location/frame3.dcm | dicom series 2 |
path/to/storage-location/frame4.dcm | dicom series 2 |
path/to/storage-location/frame5.dcm | dicom series 3 |
Multiple file types
You can upload multiple file types with a single CSV file by using a new header each time there is a change of file type. Three headings will be required if image sequences are included.
Because the ‘Create video’ value for image sequences is `true`, all files that are not image sequences must contain the value `false`.
The example below shows a CSV file for 2 image sequences, 1 image group, 1 single image, and 1 video:
ObjectUrl | Image group title | Create video |
---|---|---|
path/to/storage-location/frame1.jpg | Sequence 1 | true |
path/to/storage-location/frame2.jpg | Sequence 1 | true |
path/to/storage-location/frame3.jpg | Sequence 2 | true |
path/to/storage-location/frame4.jpg | Sequence 2 | true |
path/to/storage-location/frame5.jpg | Group 1 | false |
path/to/storage-location/frame6.jpg | Group 1 | false |
ObjectUrl | Image title | Create video |
path/to/storage-location/frame1.jpg | Image 1 | false |
ObjectUrl | Video title | Create video |
full/storage/path/video.mp4 | Video 1 | false |
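As a sketch using only Python's standard csv module, the combined file above could be generated as follows; note the fresh header row before each change of file type and the `false` value in ‘Create video’ for every row that is not an image sequence.

```python
import csv

# Rows for a multi-type CSV: a new header row precedes each change of
# file type, and non-sequence rows carry "false" in "Create video".
rows = [
    ["ObjectUrl", "Image group title", "Create video"],
    ["path/to/storage-location/frame1.jpg", "Sequence 1", "true"],
    ["path/to/storage-location/frame2.jpg", "Sequence 1", "true"],
    ["path/to/storage-location/frame3.jpg", "Sequence 2", "true"],
    ["path/to/storage-location/frame4.jpg", "Sequence 2", "true"],
    ["path/to/storage-location/frame5.jpg", "Group 1", "false"],
    ["path/to/storage-location/frame6.jpg", "Group 1", "false"],
    ["ObjectUrl", "Image title", "Create video"],
    ["path/to/storage-location/frame1.jpg", "Image 1", "false"],
    ["ObjectUrl", "Video title", "Create video"],
    ["full/storage/path/video.mp4", "Video 1", "false"],
]

with open("registration.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```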
To use your data in Encord, it must be added to the Encord Files storage. Once registered, your files can be reused across multiple Projects; the files themselves contain no labels or annotations. Files stores your data, while Projects store your labels. The following script creates a new folder in Files and uses your OTC integration to register the data in your cloud storage to that folder. It works for all file types.
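A minimal sketch of such a script is shown below, assuming a recent version of the encord Python SDK; the Storage API method names (`get_cloud_integrations`, `create_storage_folder`, `add_private_data_to_folder_start`, `add_private_data_to_folder_get_result`) and their parameters should be verified against the SDK reference for your version. The step numbering in the comments matches the note about Step 5 further down, and the placeholders are the ones listed after the script.

```python
from encord import EncordUserClient

# Step 1: Authenticate with your private key.
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Step 2: Find the cloud storage integration by its title.
integration_id = next(
    integration.id
    for integration in user_client.get_cloud_integrations()
    if integration.title == "<integration_title>"
)

# Step 3: Create a folder in Files to hold the registered data.
folder = user_client.create_storage_folder(
    "<folder_name>",
    description="A folder to store my files",
    client_metadata={"my": "folder_metadata"},
)

# Step 4: Start the registration job from the JSON specification.
upload_job_id = folder.add_private_data_to_folder_start(
    integration_id=integration_id,
    private_files="path/to/json/file.json",
)
print(f"upload_job_id={upload_job_id}")

# Step 5: Check the registration result. A short timeout returns quickly;
# the job may still be in progress, in which case check again later.
result = folder.add_private_data_to_folder_get_result(
    upload_job_id, timeout_seconds=5
)
print(result.status)
```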
If “Upload is still in progress, try again later!” is returned, use the script further below to check the upload status and see whether the upload has finished.

Ensure that you:
- Replace `<private_key_path>` with the path to your private key.
- Replace `<integration_title>` with the title of the integration you want to use.
- Replace `<folder_name>` with the folder name. The scripts assume that the specified folder name is unique.
- Replace `path/to/json/file.json` with the path to a JSON file specifying which cloud storage files should be uploaded.
- Replace `A folder to store my files` with a meaningful description for your folder.
- Replace `"my": "folder_metadata"` with any metadata you want to add to the folder.

The script has several possible outputs; they are listed after the status-check script below.
If Step 5 returns "Upload is still in progress, try again later!", run the following code to query the Encord server again. Ensure that you replace `<upload_job_id>` with the value output by the previous script. In the example above, `upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727`.
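A minimal sketch of such a status check, under the same SDK assumptions as above; `find_storage_folders` is used here to look the folder up by its (assumed unique) name, so verify that method against the SDK reference.

```python
from encord import EncordUserClient

user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Look the folder up by name; the name is assumed to be unique.
folders = list(user_client.find_storage_folders(search="<folder_name>"))
folder = folders[0]

# Ask the Encord server for the current status of the registration job.
result = folder.add_private_data_to_folder_get_result(
    upload_job_id="<upload_job_id>",
    timeout_seconds=5,
)
print(result.status)
```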
The script has several possible outputs:
“Upload is still in progress, try again later!”: The registration has not finished. Run this script again later to check if the data registration has finished.
“Upload completed”: The registration completed. If any files failed to upload, the URLs are listed.
“Upload failed”: The entire registration failed, and not just individual files. Ensure your JSON file is formatted correctly.
Omitting the `timeout_seconds` argument from the `add_private_data_to_dataset_get_result()` method performs status checks until the upload has finished.

The following example creates a Dataset called “Houses” that expects data hosted on OTC.
Ensure that you replace `<private_key_path>` with the file path for your private key.
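A minimal sketch, assuming `StorageLocation.OTC` is available in your version of the encord SDK:

```python
from encord import EncordUserClient
from encord.orm.dataset import StorageLocation

user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Create a Dataset that expects data hosted on OTC.
dataset = user_client.create_dataset(
    dataset_title="Houses",
    dataset_type=StorageLocation.OTC,
)
print(dataset.dataset_hash)
```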
Now that you have registered your data and created a Dataset, it is time to add your files to the Dataset. The following scripts add all files in a specified folder to a Dataset.
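A minimal sketch of such a script is shown below; `list_items` and `link_items` are assumptions based on the current Storage API, so verify them against the SDK reference. Replace the placeholders as described in the list that follows.

```python
from encord import EncordUserClient

user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Look the Storage folder up by name; the name is assumed to be unique.
folders = list(user_client.find_storage_folders(search="<folder_name>"))
folder = folders[0]

# Collect the UUIDs of all items in the folder.
item_uuids = [item.uuid for item in folder.list_items()]

# Add the items to the Dataset as data units.
dataset = user_client.get_dataset("<dataset_hash>")
dataset.link_items(item_uuids)
```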
Ensure that you:

- Replace `<private_key_path>` with the path to your private key.
- Replace `<folder_name>` with the name of your Storage folder.
- Replace `<dataset_hash>` with the hash of the Dataset you want to add the data units to.

After adding your files to the Dataset, verify that all the files you expect to be there made it into the Dataset.
The following script prints the URLs of all the files in a Dataset. Ensure that you:
- Replace `<private_key_path>` with the path to your private key.
- Replace `<dataset_hash>` with the hash of your Dataset.
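A minimal sketch, assuming each data row's `file_link` attribute holds the registered object URL for cloud-hosted data:

```python
from encord import EncordUserClient

user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

dataset = user_client.get_dataset("<dataset_hash>")

# Print the title and registered URL of every data unit in the Dataset.
for data_row in dataset.data_rows:
    print(data_row.title, data_row.file_link)
```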