Add files to Datasets

Note: When using the legacy method add_private_data_to_dataset_start to upload files directly to a Dataset, you must specify a StorageFolder object or a folder UUID.

Once your data is uploaded to Files, it can be attached to multiple Datasets. The following script adds all files in a specified folder to a Dataset.

  • Replace <private_key_path> with the path to your private key.
  • Replace <folder_name> with the name of the Storage folder that contains the files you want to add.
  • Replace <dataset_hash> with the hash of the Dataset you want to add the data units to.

Note: Files added to the folder after the script runs are not automatically added to the Dataset.

from encord import EncordUserClient

# Authentication
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

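# Find the storage folder by name. The search may return partial matches,
# so keep only folders whose name matches exactly.
folder_name = "<folder_name>"  # Replace with your folder's name
folders = [
    folder
    for folder in user_client.find_storage_folders(search=folder_name, page_size=1000)  # Default page_size is 100.
    if folder.name == folder_name
]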

dataset = user_client.get_dataset("<dataset_hash>")

# Ensure the folder was found
if folders:
    storage_folder = folders[0]

    # List all items in the folder
    items = list(storage_folder.list_items())

    # Collect all item UUIDs
    item_uuids = [item.uuid for item in items]

    # Print the retrieved items
    for item in items:
        print(f"UUID: {item.uuid}, Name: {item.name}, Type: {item.item_type}")

    # Link all items at once if there are any
    if item_uuids:
        dataset.link_items(item_uuids)
else:
    print(f"Folder '{folder_name}' not found.")
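
Because files added to the folder later are not linked automatically, one option is to rerun the script periodically and link only the items that are not yet in the Dataset. The following is a minimal sketch of that filtering step, not part of the Encord SDK: select_unlinked is a hypothetical helper, and the UUIDs in the example are stand-ins generated locally for illustration.

```python
from typing import Iterable, List, Set
from uuid import UUID, uuid4


def select_unlinked(folder_item_uuids: Iterable[UUID], linked_uuids: Set[UUID]) -> List[UUID]:
    """Return folder item UUIDs that are not yet linked.

    Hypothetical helper: keeps the original order and drops duplicates.
    """
    seen: Set[UUID] = set()
    unlinked: List[UUID] = []
    for item_uuid in folder_item_uuids:
        if item_uuid not in linked_uuids and item_uuid not in seen:
            seen.add(item_uuid)
            unlinked.append(item_uuid)
    return unlinked


if __name__ == "__main__":
    # Stand-in UUIDs; in the script above these would come from item_uuids.
    first, second, third = uuid4(), uuid4(), uuid4()
    already_linked = {first}  # Assumption: collected from the Dataset beforehand.
    new_uuids = select_unlinked([first, second, third], already_linked)
    print(f"{len(new_uuids)} item(s) to link")
```

In the script above, the result would be passed to dataset.link_items(...) in place of item_uuids. How the set of already linked UUIDs is collected depends on your SDK version, so that part is left as an assumption here.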