Each dataset is identified by a unique <dataset_hash>: an ID that can be found within the dataset's URL in the Encord web-app.

You can add data to datasets in two ways, depending on where your data is stored. You can either:
- Add data to Encord-hosted storage, and add it to a dataset from there.
- Add data hosted on a private cloud server directly to a dataset.
Uploading data to Encord-hosted storage
As mentioned above, the <dataset_hash> specifies which dataset data is added to in all of the code samples below. The following sections show how to upload videos, single images, image groups, and image sequences.
Uploading videos
Use the upload_video() method to upload a video to Encord-hosted storage and add it to the dataset specified by the <dataset_hash>.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to upload your video(s) to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Upload the video to the dataset by specifying the file path to the video
dataset.upload_video("path/to/your/video.mp4")
Uploading single images
Use the upload_image() method to upload a single image to Encord-hosted storage and add it to the dataset specified by the <dataset_hash>.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to upload your image(s) to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Upload the image to the dataset by specifying the file path to the image
dataset.upload_image("path/to/your/image.jpeg")
Uploading image groups & image sequences
Tip
Confused about the difference between image groups and image sequences? Click here to learn more!
Use the create_image_group() method to combine images into image groups or image sequences, upload them to Encord-hosted storage, and add them to the dataset specified by the <dataset_hash>.
Image groups
Image groups are created using the create_image_group() method with create_video=False as an argument, as shown below.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to upload your image group(s) to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Create the image group from images specified by the file paths. The create_video flag must be set to False
dataset.create_image_group(
    [
        "path/to/your/img1.jpeg",
        "path/to/your/img2.jpeg",
    ],
    create_video=False
)
Note
Images in an image group will be assigned a data_sequence number, which is based on the order of the files listed in the argument to create_image_group() above. If the ordering is important, make sure to provide a list with filenames in the correct order.
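If your images live in a directory, one way to guarantee the ordering is to sort the file paths before passing them in. The snippet below is a minimal sketch; the image directory path is a hypothetical placeholder.
# Import dependencies
import os
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to upload your image group(s) to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Collect the image paths and sort them so the data_sequence numbers follow filename order
image_dir = "path/to/your/images"
image_paths = sorted(
    os.path.join(image_dir, file_name)
    for file_name in os.listdir(image_dir)
    if file_name.endswith(".jpeg")
)
# Create the image group from the sorted file paths
dataset.create_image_group(image_paths, create_video=False)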
Image sequences
Image sequences are created using the create_image_group() method.
Note
create_video is set to True by default and can therefore be omitted when creating an image sequence.
Tip
Learn the difference between image groups and image sequences here.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to add the image sequence to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Create the image sequence from images specified by the file paths. The create_video flag should be set to True
dataset.create_image_group(
    [
        "path/to/your/img1.jpeg",
        "path/to/your/img2.jpeg",
    ],
    create_video=True
)
Note
Image sequences are composed of images with the same resolution. If img1.jpeg and img2.jpeg from the example above have shapes [1920, 1080] and [1280, 720] respectively, they will each end up in their own image sequence.
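If you want to know ahead of time how your images will be split, you can group them by resolution yourself before uploading. The snippet below is a minimal sketch that assumes the Pillow library is installed.
# Import dependencies
from collections import defaultdict
from PIL import Image
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to add the image sequences to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Group the image paths by resolution so each create_image_group() call only receives same-sized images
image_paths = ["path/to/your/img1.jpeg", "path/to/your/img2.jpeg"]
groups_by_resolution = defaultdict(list)
for path in image_paths:
    with Image.open(path) as image:
        groups_by_resolution[image.size].append(path)
# Create one image sequence per resolution
for paths in groups_by_resolution.values():
    dataset.create_image_group(paths, create_video=True)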
Adding data from private cloud storage
Adding private cloud storage data to a dataset involves the following three steps:
- Obtain a list of all available cloud integrations.
- Get your integration's ID.
- Upload your cloud data to the dataset.
Tip
If you don't need a full tutorial, check out our SDK recipe for uploading data from a cloud server.
1. Retrieve a list of available cloud integrations using the user_client.get_cloud_integrations() method.
Tip
Data integrations have to be created in the Encord web-app. See our documentation here to learn how to do so.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Print a list of the integrations available to you
integrations = user_client.get_cloud_integrations()
print("Integration Options:")
print(integrations)
"Integration Options:"
id: "integration-id-1"
title: "example-aws-integration"
id: "integration-id-2"
title: "example-gcp-integration"
id: "integration-id-3"
title: "example-otc-integration"
2. To obtain your integration's ID, replace EXAMPLE-TITLE in the code below with the title of the integration (obtained in step 1) that you wish to use for your data upload.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Select the integration with the title EXAMPLE-TITLE, and assign its ID to the variable 'integration'
integrations = user_client.get_cloud_integrations()
integration_idx = [i.title for i in integrations].index("EXAMPLE-TITLE")
integration = integrations[integration_idx].id
3. The final step is uploading data from your private cloud by substituting MY-INTEGRATION-ID in the code samples below with the integration ID you obtained in step 2.
Note
A JSON file specifying which files from your private cloud should be uploaded is required in this step. Click here to see an example JSON for each file type, and to learn more.
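If you want to generate the JSON file programmatically, the snippet below is a minimal, illustrative sketch. The "videos" and "objectUrl" keys shown are an assumption based on the format for videos stored on AWS S3; the exact schema depends on your file type and cloud provider, so treat the JSON documentation linked above as authoritative.
# Import dependencies
import json
# Build a specification for a single video stored in a private S3 bucket (illustrative keys and URL)
upload_spec = {
    "videos": [
        {"objectUrl": "https://my-bucket.s3.amazonaws.com/path/to/your/video.mp4"}
    ]
}
# Write the specification to the JSON file referenced in the upload code below
with open("path/to/json/file.json", "w") as json_file:
    json.dump(upload_spec, json_file)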
There are two methods of uploading your data:
- Method A: Useful when testing data uploads, or when dealing with a small number of files. Once you start this process the upload status is queried continuously, and you won't be able to do anything else until it finishes running.
- Method B: Useful when dealing with large numbers of files, or with large files. This method only initializes the upload job, allowing you to use your computer for other purposes while the data upload runs in the background. You have to manually query our servers for status updates on your data upload.
Uploading your data - Method A
Once you start this process the upload status will be constantly queried, and you won't be able to do anything else until it finishes running.
In the sample code below:
- Replace MY-INTEGRATION-ID with the integration ID you obtained in step 2.
- Replace path/to/json/file.json with the path to your JSON file.
Tip
The dataset hash can be found within the URL once a dataset has been selected:
app.encord.com/projects/view/<dataset_hash>/summary
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to add your data to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Specify the integration by substituting the integration ID
integration = "MY-INTEGRATION-ID"
# The lines below start the upload job and wait for it to finish - an exception is raised if errors occur
response = dataset.add_private_data_to_dataset(
    integration, "path/to/json/file.json"
)
Uploading your data - Method B
This method only initializes the upload job, allowing you to use your computer for other purposes while the data upload runs in the background. You'll have to manually query our servers for status updates for your data upload.
In the sample code below:
- Replace MY-INTEGRATION-ID with the integration ID you obtained in step 2.
- Replace path/to/json/file.json with the path to your JSON file.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to add your data to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Specify the integration by substituting the integration ID
integration = "MY-INTEGRATION-ID"
# Start the data upload and define the upload job's ID
upload_job_id = dataset.add_private_data_to_dataset_start(
    integration, "path/to/json/file.json"
)
Used as shown below, the add_private_data_to_dataset_get_result() method performs one quick status check.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to add your data to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Perform a status check on the upload job by specifying the upload job ID
print(
    dataset.add_private_data_to_dataset_get_result(
        "<upload_job_id>",
        timeout_seconds=0,
    )
)
status: "upload-status"
data_hashes_with_titles: "data-hashes"
errors: "will-display-any-errors"
units_pending_count: "number-of-files-pending-upload"
units_done_count: "number-of-files-uploaded"
units_error_count: "number-of-errors"
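If you prefer to poll on your own schedule rather than block, you can repeat the quick check in a loop. The snippet below is a minimal sketch; the PENDING status value from LongPollingStatus and the 30-second interval are assumptions.
# Import dependencies
import time
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to add your data to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Poll the upload job until it is no longer pending, sleeping between checks
res = dataset.add_private_data_to_dataset_get_result("<upload_job_id>", timeout_seconds=0)
while res.status == LongPollingStatus.PENDING:
    time.sleep(30)
    res = dataset.add_private_data_to_dataset_get_result("<upload_job_id>", timeout_seconds=0)
print(res.status)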
Omitting the "timeout_seconds=0" argument from the add_private_data_to_dataset_get_result() method performs status checks until the status upload has finished.
The output will be a list of data hashes and titles for successful data uploads.
# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to add your data to using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Calling add_private_data_to_dataset_get_result() without the timeout_seconds argument waits for the job to finish
upload_job_id = "<upload_job_id>"
res = dataset.add_private_data_to_dataset_get_result(upload_job_id)
if res.status == LongPollingStatus.DONE:
    upload_result = res.data_hashes_with_titles
else:
    raise Exception(res.errors)  # custom error handling can be specified here

# Print the data hash and title of each successful upload
print(upload_result)
data_hashes_with_titles: "data-hashes"
Reading and updating data
To inspect the data within a dataset, use the Dataset.data_rows() method, which returns a list of DataRows. Check our documentation for the DataRow class for information on which fields can be accessed and updated.
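The snippet below is a minimal sketch of iterating over a dataset's data rows. The uid and title fields are assumed from the DataRow reference, and whether data_rows is accessed as an attribute or called as a method may depend on your SDK version.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to inspect using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Iterate over the data rows and print identifying information for each data unit
for data_row in dataset.data_rows:
    print(data_row.uid, data_row.title)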
Deleting data from a dataset
Note
While it's possible to delete data from a dataset, entire datasets can't be deleted via the SDK or the API as this is a significant and irreversible operation. Please use our web-app to delete a dataset.
Use the dataset.delete_data() method to delete data units from a dataset.
In the code sample below replace <video1_data_hash> and <image_group1_data_hash> with the hashes for the data units you would like to remove from the dataset. If the data unit being removed is saved on Encord-hosted storage, the corresponding file will be deleted.
Note
Please ensure that the list only contains videos or image groups belonging to the dataset used for initialization. Any videos or image groups that do not belong to this dataset will be ignored.
# Import dependencies
from encord import EncordUserClient
# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")
# Specify the dataset you want to delete data from by using its dataset hash
dataset = user_client.get_dataset("<dataset_hash>")
# Specify which files are to be deleted by including their data hashes in the list
dataset.delete_data(
    [
        "<video1_data_hash>",
        "<image_group1_data_hash>",
    ]
)