Create Datasets using the SDK

First, choose where your data will be hosted by selecting the appropriate StorageLocation.

ℹ️

Note

Creating a Dataset and adding data to it are two distinct steps. See the documentation on adding data to an existing Dataset for the second step.

ℹ️

Note

Datasets cannot be deleted using the SDK or the API. Please use the Encord platform to delete Datasets.

The following example creates a Dataset called “Example Title” that expects data hosted on AWS S3. Replace <private_key_path> with the file path to your private key.


# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import StorageLocation

# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Create a dataset by specifying a title as well as a storage location
dataset = user_client.create_dataset(
    "Example Title", StorageLocation.AWS
)

# Print the dataset; the example output follows
print(dataset)

Example output:

{
    "title": "Example Title",
    "type": 1,
    "dataset_hash": "<dataset_hash>",
    "user_hash": "<user_hash>",
}

If your data is hosted with a different cloud provider, replace AWS in StorageLocation.AWS with the corresponding value listed below.

| Storage location | StorageLocation argument |
| --- | --- |
| AWS S3 | AWS |
| GCP | GCP |
| Azure Blob | AZURE |
| Open Telekom Cloud | OTC |
| Encord storage | CORD_STORAGE |

👍

Tip

To upload data from local storage to Encord-hosted storage, use StorageLocation.CORD_STORAGE as the storage location. See the documentation on uploading data to Encord-hosted storage for details.
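
When the storage location varies between environments, it can help to wrap dataset creation in a small function. The following is a minimal sketch, not part of the SDK: the helper name create_dataset_and_get_hash is hypothetical, and it assumes the value returned by create_dataset supports key access, as the printed example output suggests.

```python
def create_dataset_and_get_hash(user_client, title, storage_location):
    """Create a dataset and return its dataset hash.

    `user_client` is an authenticated EncordUserClient and
    `storage_location` is a StorageLocation value such as StorageLocation.GCP.
    """
    dataset = user_client.create_dataset(title, storage_location)
    # Assumption: the response supports key access, as the printed
    # example output (a dict-like object with "dataset_hash") suggests.
    return dataset["dataset_hash"]


# Usage with a real client (requires the encord package and a valid key):
# from encord.orm.dataset import StorageLocation
# dataset_hash = create_dataset_and_get_hash(
#     user_client, "Example Title", StorageLocation.GCP
# )
```

Passing the client in as an argument keeps authentication separate from dataset creation, so the same helper can be reused for any storage location in the table.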


List existing Datasets

Use the EncordUserClient method get_datasets() to query and list the Datasets available to the authenticated user.

In the example below, a user authenticates with Encord and then fetches all Datasets available to them. Substitute <private_key_path> with the file path for your private key.

👍

Tip

The dataset hash can be found within the URL once a dataset has been selected:
app.encord.com/projects/view/<dataset_hash>/summary


# Import dependencies
from encord import EncordUserClient

# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(ssh_private_key_path='<private_key_path>')

# List existing datasets
datasets = user_client.get_datasets()
print(datasets)

Example output:

[
    {
        "dataset": DatasetInfo(
                dataset_hash="<dataset_hash>",
                user_hash="<user_hash>",
                title="Example title",
                description="Example description ... ",
                type=0,  # encord.orm.dataset.StorageLocation
                created_at=datetime.datetime(...),
                last_edited_at=datetime.datetime(...)
            ),
        "user_role": DatasetUserRole.ADMIN
    },
    # ...
]

The type attribute in the output refers to the StorageLocation used when a dataset was created.
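
As a rough illustration of reading the type attribute, the sketch below groups dataset titles by their storage type value. The helper name titles_by_storage_type is hypothetical, and the code assumes each entry returned by get_datasets() has the shape shown in the example output (a "dataset" key holding an object with title and type attributes).

```python
from collections import defaultdict


def titles_by_storage_type(user_client):
    """Group dataset titles by the integer value of their `type` attribute."""
    grouped = defaultdict(list)
    for entry in user_client.get_datasets():
        info = entry["dataset"]  # DatasetInfo, as in the example output
        # int() normalises the value whether it is a plain int or an enum member
        grouped[int(info.type)].append(info.title)
    return dict(grouped)
```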

👍

Tip

EncordUserClient.get_datasets() has multiple optional arguments that allow you to query datasets with specific characteristics. For example, if you only want datasets with titles starting with “Validation”, you could use user_client.get_datasets(title_like="Validation%"). Other keyword arguments, such as created_before or edited_after, may also be of interest.
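
The title_like filter can be combined with the result structure shown in the example output to build a quick lookup of matching datasets. This is a minimal sketch: the helper name find_datasets_by_title_prefix is hypothetical, and it assumes each returned entry exposes dataset_hash and title on its "dataset" value, as in that output.

```python
def find_datasets_by_title_prefix(user_client, prefix):
    """Return {dataset_hash: title} for datasets whose title starts with `prefix`."""
    # The trailing % is the wildcard used in the title_like example above
    entries = user_client.get_datasets(title_like=f"{prefix}%")
    return {e["dataset"].dataset_hash: e["dataset"].title for e in entries}


# Usage with an authenticated client:
# validation_sets = find_datasets_by_title_prefix(user_client, "Validation")
```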