Add local data to Datasets

All Datasets are identified using a unique ID called a <dataset_hash>, which can be found in the Encord platform.

👍

Tip

To learn how to import cloud data into Encord, see our documentation here.


Upload private cloud data

All types of data (videos, images, image groups, image sequences, and DICOM) from a private cloud are added to a Dataset in the exact same way.

Use the script below to upload your private cloud data to a specified Dataset.

👍

Tip

If the following script returns "Upload is still in progress, try again later!", check the upload status at a later time.


# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Instantiate user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")

# Specify the dataset you want to upload data to by replacing <dataset_hash> with the dataset hash
dataset = user_client.get_dataset("<dataset_hash>")

# Specify the integration you want to upload data from by replacing <integration_title> with the integration title
integrations = user_client.get_cloud_integrations()
integration_idx = [i.title for i in integrations].index("<integration_title>")
integration = integrations[integration_idx].id

# Initiate cloud data upload. Replace path/to/json/file.json with the path to your JSON file
upload_job_id = dataset.add_private_data_to_dataset_start(
    integration, "path/to/json/file.json"
)

# timeout_seconds determines how long the call waits for the upload to finish before returning the current upload status
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")


if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed without errors")
else:
    print(f"Errors: {res.errors}")

Example output:

add_private_data_to_dataset job started with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
SDK process can be terminated, this will not affect successful job execution.
You can follow the progress in the web app via notifications.
add_private_data_to_dataset job completed with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
Execution result: DatasetDataLongPolling(status=<LongPollingStatus.DONE: 'DONE'>, data_hashes_with_titles=[DatasetDataInfo(data_hash='cd42333d-8014-46q7-837b-5bf68b9b5', title='funny_image.jpg')], errors=[], units_pending_count=0, units_done_count=1, units_error_count=0)
Upload completed without errors

Check data upload

If the code returns "Upload is still in progress, try again later!", run the following code to query the Encord server again. Replace <upload_job_id> with the upload_job_id printed by the previous code. In the example above, upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.

# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Instantiate user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path="<private_key_path>")

# Specify the dataset the upload was started for by replacing <dataset_hash> with the dataset hash
dataset = user_client.get_dataset("<dataset_hash>")

# Replace <upload_job_id> with the upload_job_id printed by the previous script
upload_job_id = "<upload_job_id>"

# Check upload status
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")

if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed without errors")
else:
    print(f"Errors: {res.errors}")

👍

Tip

Omitting the timeout_seconds argument from the add_private_data_to_dataset_get_result() method makes it keep checking the status until the upload has finished.
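
If you prefer to keep a short timeout and retry a bounded number of times, the status check can be wrapped in a small loop. The following is a minimal sketch, not part of the Encord SDK: wait_for_upload and its get_result parameter are hypothetical names, and get_result is assumed to be a zero-argument callable that wraps dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5). The demonstration at the end uses stand-in objects instead of a real upload job.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable


def wait_for_upload(
    get_result: Callable[[], Any],
    max_attempts: int = 10,
    pause_seconds: float = 2.0,
) -> Any:
    """Call get_result() until the status is no longer PENDING or attempts run out.

    get_result is a hypothetical callable, assumed to wrap
    dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5).
    """
    result = get_result()
    for _ in range(max_attempts - 1):
        # Compare by name so the sketch works with LongPollingStatus or a plain string
        if getattr(result.status, "name", result.status) != "PENDING":
            break
        time.sleep(pause_seconds)
        result = get_result()
    return result


# Demonstration with stand-in results (no Encord connection required)
_fake_results = iter(
    [
        SimpleNamespace(status="PENDING"),
        SimpleNamespace(status="PENDING"),
        SimpleNamespace(status="DONE"),
    ]
)
final = wait_for_upload(lambda: next(_fake_results), max_attempts=5, pause_seconds=0)
print(final.status)  # DONE
```

With a real job, you would pass something like lambda: dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5) as get_result and then inspect the status and errors of the returned result as in the script above.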


Local data

Uploading videos

Use the upload_video() method to upload a video to a Dataset specified using the <dataset_hash>.


# Import dependencies
from encord import Dataset, EncordUserClient

# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
    )

# Specify the Dataset you want to upload your video(s) to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
    )

# Upload the video to the Dataset by specifying the file path to the video
dataset.upload_video(
    "path/to/your/video.mp4"
    )
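
To upload several videos, upload_video() can be called once per file. The sketch below is one possible way to do this for a folder. The helper name upload_videos_from_folder and the default file suffixes are assumptions made for illustration, not part of the Encord SDK. The dataset argument is expected to be the object returned by user_client.get_dataset().

```python
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def upload_videos_from_folder(
    dataset,
    folder: str,
    suffixes: Iterable[str] = (".mp4", ".mov"),  # assumed suffixes, adjust as needed
) -> Tuple[List[str], Dict[str, str]]:
    """Upload every matching file in folder; return uploaded names and failures."""
    wanted = {suffix.lower() for suffix in suffixes}
    uploaded: List[str] = []
    failed: Dict[str, str] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip sub-folders and files with other extensions
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        try:
            dataset.upload_video(str(path))
            uploaded.append(path.name)
        except Exception as exc:
            # Record the error and continue so one bad file does not stop the batch
            failed[path.name] = str(exc)
    return uploaded, failed
```

Calling uploaded, failed = upload_videos_from_folder(dataset, "path/to/your/videos") returns the file names that were uploaded and a mapping of file name to error message for any that failed.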


Uploading single images

Use the upload_image() method to upload a single image to a Dataset specified using the <dataset_hash>.


# Import dependencies
from encord import Dataset, EncordUserClient

# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
    )

# Specify the Dataset you want to upload your images to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
    )

# Upload the image to the Dataset by specifying the file path to the image
dataset.upload_image(
    "path/to/your/image.jpeg"
    )


Uploading image groups & image sequences

👍

Tip

Confused about the difference between image groups and image sequences? Click here to learn more!

Use the create_image_group() method to combine images into image groups and image sequences, and add them to a Dataset.


Image groups

Image groups are created using the create_image_group() method with create_video=False as an argument. Specify the file paths of each image you want to include in the image group in the script below.

👍

Tip

Images in an image group are assigned a data_sequence number, which is based on the order of the files listed in the argument to create_image_group(). If the ordering is important to you, make sure that your filenames are listed in the correct order.


# Import dependencies
from encord import Dataset, EncordUserClient

# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
    )

# Specify the Dataset you want to upload your image group to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
    )

# Create the image group. Include the paths of all images that are to be included in the image group.
# The create_video flag must be set to False
dataset.create_image_group(
    [
        "path/to/your/img1.jpeg",
        "path/to/your/img2.jpeg",
    ],
    create_video=False
)
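
Because the data_sequence number follows the order of the list, it can help to sort the file paths before passing them to create_image_group(). A plain alphabetical sort places img10.jpeg before img2.jpeg, so the sketch below sorts numeric parts as numbers. The natural_sort helper is a hypothetical name used for illustration and is not part of the Encord SDK.

```python
import re
from typing import List


def natural_sort(paths: List[str]) -> List[str]:
    """Sort paths so that numeric parts compare as numbers (img2 before img10)."""

    def key(path: str):
        # re.split with a capturing group alternates text and digit chunks
        return [
            int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", path)
        ]

    return sorted(paths, key=key)


image_paths = natural_sort(
    ["frames/img10.jpeg", "frames/img2.jpeg", "frames/img1.jpeg"]
)
print(image_paths)
# ['frames/img1.jpeg', 'frames/img2.jpeg', 'frames/img10.jpeg']
```

The sorted list can then be passed as the first argument to dataset.create_image_group().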


Image sequences

Image sequences are created using the create_image_group() method. Image sequences can only be composed of images that have the same dimensions. Images with different dimensions are made into separate image sequences. Learn more about image sequences here.

ℹ️

Note

create_video is set to True by default and can therefore be omitted when creating an image sequence.

👍

Tip

Learn the difference between image groups and image sequences here.


# Import dependencies
from encord import EncordUserClient

# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
    )

# Specify the Dataset you want to upload your image sequence to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
    )

# Create the image sequence. Include the paths of all images that are to be included in the image sequence.
# The create_video flag must be set to True (the default)
dataset.create_image_group(
    [
        "path/to/your/img1.jpeg",
        "path/to/your/img2.jpeg",
    ],
    create_video=True
)

ℹ️

Note

Image sequences are composed of images with the same resolution. If img1.jpeg and img2.jpeg are of shape [1920, 1080] and [1280, 720], respectively, each ends up in its own image sequence.
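
To anticipate how many image sequences a set of files would produce, you can group the files by resolution before uploading. The following is a minimal sketch that runs locally and does not call the Encord SDK. The group_by_size helper is a hypothetical name, and the sizes mapping is assumed to be something you build yourself, for example with an image library of your choice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Size = Tuple[int, int]  # (width, height)


def group_by_size(sizes: Dict[str, Size]) -> Dict[Size, List[str]]:
    """Group image paths by resolution, keeping the original order within each group."""
    groups: Dict[Size, List[str]] = defaultdict(list)
    for path, size in sizes.items():
        groups[size].append(path)
    return dict(groups)


# Example sizes (illustrative values only)
sizes = {
    "path/to/your/img1.jpeg": (1920, 1080),
    "path/to/your/img2.jpeg": (1280, 720),
    "path/to/your/img3.jpeg": (1920, 1080),
}
for size, paths in group_by_size(sizes).items():
    print(size, paths)
```

In this example, img1.jpeg and img3.jpeg share a resolution and fall into one group, while img2.jpeg is on its own, which mirrors the behavior described in the note above.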


Uploading DICOM series

In the following script, replace path/to/your/dicom-img1.jpeg and the other example file paths with the paths to the files you want to include in your DICOM series.

# Import dependencies
from encord import Dataset, EncordUserClient

# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
    )

# Specify the Dataset you want to upload your DICOM files to. Replace <dataset_hash> with the hash of your Dataset
dataset = user_client.get_dataset(
    "<dataset_hash>"
    )

# Add a DICOM series to the Dataset by specifying the file path to all files to include.
dataset.create_dicom_series(
    [
        "path/to/your/dicom-img1.jpeg",
        "path/to/your/dicom-img2.jpeg",
        "path/to/your/dicom-img3.jpeg"
    ]
)

Reading and updating data

To inspect the data within a Dataset, use the data_rows property of the Dataset class. data_rows returns a list of DataRow objects. See our documentation for the DataRow class for information on which fields can be accessed and updated.
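
As a rough illustration, the sketch below collects an identifier and a title from each data row. It assumes that each DataRow exposes uid and title attributes. Check the DataRow documentation to confirm the field names before relying on them. The summarize_data_rows helper is a hypothetical name, and the demonstration uses stand-in objects rather than a real Dataset.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def summarize_data_rows(data_rows: Iterable) -> List[Tuple[str, str]]:
    """Return (uid, title) pairs; uid and title are assumed DataRow attributes."""
    return [(row.uid, row.title) for row in data_rows]


# Stand-in rows for demonstration; with the SDK you would pass dataset.data_rows
example_rows = [
    SimpleNamespace(uid="example-uid-1", title="funny_image.jpg"),
    SimpleNamespace(uid="example-uid-2", title="video.mp4"),
]
for uid, title in summarize_data_rows(example_rows):
    print(uid, title)
```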