The following script prints the storage locations of all files in a Dataset. For data added from private cloud storage, it prints the cloud storage location. For data uploaded from local storage, it prints the Encord storage location. Knowing where your files are stored lets you cross-verify that all data from a cloud bucket has been added to the Dataset.

To learn how to view the storage locations of all files in a Project, see our documentation here.

In the following script, ensure that you:

  • Replace <private_key_path> with the path to your private key.
  • Replace <dataset_hash> with the hash of the Dataset whose file storage locations you want to print.
# Import dependencies
from encord import Dataset, EncordUserClient

# Instantiate client
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Print URLs of all files in the Dataset
dataset_level_file_links = []
dataset: Dataset = user_client.get_dataset("<dataset_hash>")
for data in dataset.list_data_rows():
    dataset_level_file_links.append(data.file_link)
print(dataset_level_file_links)
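To cross-verify a cloud bucket against a Dataset, you can compare the `dataset_level_file_links` list printed above with a listing of the bucket's objects. The sketch below shows one way to do this. It assumes that you have already obtained the bucket's object URLs yourself, for example from your cloud provider's CLI or SDK. It also assumes that those URLs use exactly the same format as the Dataset's `file_link` values. The helper `compare_bucket_to_dataset` and the example URLs are hypothetical and are not part of the Encord SDK.

```python
def compare_bucket_to_dataset(bucket_object_urls, dataset_file_links):
    """Hypothetical helper: compare a bucket listing with a Dataset's file links.

    Assumes both inputs use the same URL format, so exact string matching works.
    Returns (missing, extra):
      missing -- URLs in the bucket listing that are not registered in the Dataset
      extra   -- file links in the Dataset that do not appear in the bucket listing
                 (for example, files uploaded from local storage)
    """
    bucket = set(bucket_object_urls)
    dataset = set(dataset_file_links)
    missing = sorted(bucket - dataset)
    extra = sorted(dataset - bucket)
    return missing, extra


if __name__ == "__main__":
    # Hypothetical example values. In practice, bucket_urls would come from your
    # cloud provider's object listing, and dataset_links from dataset_level_file_links.
    bucket_urls = [
        "gs://example-bucket/images/0001.jpg",
        "gs://example-bucket/images/0002.jpg",
        "gs://example-bucket/images/0003.jpg",
    ]
    dataset_links = [
        "gs://example-bucket/images/0001.jpg",
        "gs://example-bucket/images/0003.jpg",
    ]
    missing, extra = compare_bucket_to_dataset(bucket_urls, dataset_links)
    print("Missing from Dataset:", missing)
    print("Not found in bucket listing:", extra)
```

Set differences keep the comparison simple and fast for large buckets. If your bucket listing and the Dataset's file links use different URL styles, normalize both lists to a common form before comparing.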