Create Datasets

Creating a Dataset and adding files to a Dataset are two distinct steps. Adding data to an existing Dataset is covered separately.
Datasets cannot be deleted using the SDK or the API. Use the Encord platform to delete Datasets.

The following example creates a Dataset called “Houses” that expects data hosted on AWS S3.

  • Substitute <private_key_path> with the file path for your private key.
  • Replace “Houses” with the name you want your Dataset to have.

Storage location      StorageLocation method argument   Represented by
AWS S3                AWS                               1
GCP                   GCP                               2
Azure blob            AZURE                             3
Open Telekom Cloud    OTC                               4
Encord storage        CORD_STORAGE                      0
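
The script for the “Houses” example is not reproduced in this section. The following is a minimal sketch of it, assuming the same create_dataset keyword arguments used in the label-row script on this page (dataset_title, dataset_type) and the AWS value from the table above. The helper name create_dataset_and_get_hash is hypothetical and only keeps the illustration easy to reuse.

```python
def create_dataset_and_get_hash(user_client, title, storage_location):
    """Create an empty Dataset and return its hash (hypothetical helper)."""
    # This only creates the Dataset; adding files to it is a separate step
    response = user_client.create_dataset(
        dataset_title=title,
        dataset_type=storage_location,
    )
    return response.dataset_hash


def main():
    # Imported here so the helper above can be reused without loading the SDK
    from encord.orm.dataset import StorageLocation
    from encord.user_client import EncordUserClient

    # Authenticate with Encord using the path to your private key
    user_client = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path="<private_key_path>"
    )

    # "Houses" expects data hosted on AWS S3
    dataset_hash = create_dataset_and_get_hash(
        user_client, "Houses", StorageLocation.AWS
    )
    print(dataset_hash)


if __name__ == "__main__":
    main()
```

To target a different storage location, pass another StorageLocation member from the table, for example StorageLocation.GCP.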

Create a Dataset from Label Rows

Use the following script to create a new Dataset from the label rows of a specific Project.

  • Replace <private_key_path> with the path to your private key.
  • Replace <project_hash> with the hash of the Project containing the data units you want to create a new Dataset from.
  • Replace My new Dataset with the name you want to give your new Dataset.

If create_backing_folder is True, a mirrored Dataset is created. A mirrored Dataset stays in sync with the content of its backing Folder.

# Import dependencies
from encord.orm.dataset import StorageLocation
from encord.user_client import EncordUserClient


# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify a Project
project = user_client.get_project("<project_hash>")

# Get the UUIDs of the items to be added to the new Dataset
item_uuids = [lr.backing_item_uuid for lr in project.list_label_rows_v2() if "subset_me" in lr.data_title]

# Create new Dataset and link the items
response = user_client.create_dataset(
    dataset_title="My new Dataset",
    dataset_type=StorageLocation.CORD_STORAGE,
    create_backing_folder=False
)
dataset = user_client.get_dataset(response.dataset_hash)
dataset.link_items(item_uuids)

List existing Datasets

Use the get_datasets() method of EncordUserClient to list the Datasets available to the authenticated user.

The following example fetches all Datasets available to the user. Substitute <private_key_path> with the file path for your private key.

The Dataset hash can be found in the URL once a Dataset has been selected: app.encord.com/projects/view/<dataset_hash>/summary or app.us.encord.com/projects/view/<dataset_hash>/summary

The type attribute in the output refers to the Dataset's StorageLocation, as listed in the storage location table above.
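
The listing script is not reproduced in this section. The following is a minimal sketch, assuming get_datasets() can be called without arguments and returns an iterable with one entry per Dataset. The exact shape of each entry depends on the SDK version, so the sketch only prints the entries as returned. The helper name describe_datasets is hypothetical.

```python
def describe_datasets(user_client):
    """Return one printable line per Dataset entry (hypothetical helper)."""
    # Assumption: get_datasets() yields one entry per accessible Dataset
    return [str(entry) for entry in user_client.get_datasets()]


def main():
    # Imported here so the helper above can be reused without loading the SDK
    from encord.user_client import EncordUserClient

    # Authenticate with Encord using the path to your private key
    user_client = EncordUserClient.create_with_ssh_private_key(
        ssh_private_key_path="<private_key_path>"
    )

    # Print every Dataset entry available to the user
    for line in describe_datasets(user_client):
        print(line)


if __name__ == "__main__":
    main()
```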