Data uploaded before the release of Index is stored in Mirrored Datasets. For details, see the Mirrored Datasets section below.

Datasets are subsets of your files that can be attached to one or more Projects for annotation. Datasets are created from files you upload to Encord.

Creating Datasets

  1. Click the New dataset button in the Datasets section in Annotate.
  2. Give your Dataset a meaningful title and description. A clear title and description keep your data organized.
     To create a Mirrored Dataset, turn on the Looking to create a mirrored dataset? toggle.
  3. Click Create dataset to create the Dataset.

Attach data

After a Dataset has been created, you can attach data.

We recommend uploading data in smaller batches: limit each upload to at most 100 videos and 1000 images. You can create multiple Datasets and link all of them to a single Project. Familiarize yourself with our limits and best practices for data import before uploading data to Encord.
  1. Navigate to the Datasets section under the Annotate heading.
  2. Click the Dataset you want to add data to.
  3. Click +Attach existing files.
     If the files you want have not been uploaded into Encord yet, click +Upload files to upload new files.
  4. Select the folders containing the files you want to attach to the Dataset. To select individual files, double-click a folder to see its contents, and select the files you want to add to the Dataset.
  5. Click Attach data to attach the selected files to the Dataset.
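
The batch-size recommendation above can be planned before uploading. The following is a minimal sketch, not part of Encord's tooling: the function name plan_upload_batches and the extension lists are assumptions made for illustration, and only the limits of 100 videos and 1000 images come from the text.

```python
from pathlib import Path

# Limits taken from the recommendation above.
MAX_VIDEOS_PER_BATCH = 100
MAX_IMAGES_PER_BATCH = 1000

# Assumed extension lists, for illustration only.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def plan_upload_batches(paths):
    """Group file paths into batches that stay within both limits."""
    batches = []
    current = {"videos": [], "images": []}
    for path in paths:
        suffix = Path(path).suffix.lower()
        if suffix in VIDEO_EXTENSIONS:
            kind, limit = "videos", MAX_VIDEOS_PER_BATCH
        elif suffix in IMAGE_EXTENSIONS:
            kind, limit = "images", MAX_IMAGES_PER_BATCH
        else:
            continue  # this sketch skips files it cannot classify
        if len(current[kind]) >= limit:
            # The current batch is full for this file type: start a new one.
            batches.append(current)
            current = {"videos": [], "images": []}
        current[kind].append(path)
    if current["videos"] or current["images"]:
        batches.append(current)
    return batches
```

Each returned batch can then be uploaded or attached separately, for example into its own Dataset, since several Datasets can be linked to one Project.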


Upload cloud data to Datasets

We recommend uploading files in batches of no more than 2 GB, so that an upload does not exceed 3 hours.
  1. Create a Dataset.
  2. Select the Dataset you want to upload data to.
  3. Click +Upload files.
  4. Select a folder to store the files in, or create a new folder.
  5. Select the Import from private cloud tab and select the integration you want to use.
  6. Click Add JSON or CSV files to upload a JSON or CSV file specifying the cloud data that is to be added to the Dataset. Turn on the Ignore individual file errors toggle to ignore errors caused by files not supported by Encord.
     We recommend enabling the Ignore individual file errors toggle. This ensures that the entire upload does not fail if only one file cannot be added.
  7. Click Import to add your cloud data to the Dataset.
The data is fetched from your cloud storage and processed asynchronously. This involves fetching appropriate metadata and other file information to help us render the files appropriately and to check for any framerate inconsistencies. We do not store your files in any way.
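
The JSON files used in the steps above can be generated in batches that respect the 2 GB recommendation. This is a sketch under stated assumptions: the key names "videos", "images" and "objectUrl" are assumed here and should be checked against Encord's JSON format documentation, and the helper names and the example URLs are hypothetical.

```python
import json

MAX_BATCH_BYTES = 2 * 1024**3  # 2 GB, per the recommendation above


def build_import_specs(objects, max_bytes=MAX_BATCH_BYTES):
    """Split (url, kind, size_bytes) tuples into one JSON-ready dict per batch.

    kind must be "videos" or "images". The key names used in the output
    are assumptions, not a confirmed schema.
    """
    specs = []
    current = {"videos": [], "images": []}
    count, total = 0, 0
    for url, kind, size in objects:
        if count and total + size > max_bytes:
            # Adding this file would exceed the limit: close the batch.
            specs.append({k: v for k, v in current.items() if v})
            current = {"videos": [], "images": []}
            count, total = 0, 0
        current[kind].append({"objectUrl": url})
        count += 1
        total += size
    if count:
        specs.append({k: v for k, v in current.items() if v})
    return specs


def spec_to_json(spec):
    """Serialize one batch so it can be saved as a .json file."""
    return json.dumps(spec, indent=2)
```

A single file larger than the limit ends up in a batch of its own. Each serialized batch would then be uploaded separately using the Add JSON or CSV files step.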

Mirrored Datasets

Mirrored Datasets provide a continuity solution that retains the organization of data prior to the release of Index. With the transition to Index, all existing data within Datasets has been transferred to Files in the form of Mirrored Datasets. Mirrored Datasets can be managed using both the Files and Datasets sections of the Encord platform.

For example, moving a file named “chicken.mp4” from a Mirrored Dataset titled “Animal videos” to another Mirrored Dataset called “Chicken videos” results in “chicken.mp4” being visible in all Projects associated with “Chicken videos”.
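
The effect of the move in this example can be modelled with plain sets. The sketch below is a toy model, not Encord's implementation: it assumes that a move removes the file from the source Dataset, and the function names and Project names are hypothetical.

```python
def move_file(datasets, filename, source, target):
    """Move a file between two datasets (each dataset is a set of file names)."""
    datasets[source].remove(filename)  # raises KeyError if the file is absent
    datasets[target].add(filename)


def visible_files(project_datasets, datasets, project):
    """Return every file a project can see through its attached datasets."""
    files = set()
    for dataset_title in project_datasets[project]:
        files |= datasets[dataset_title]
    return files


datasets = {
    "Animal videos": {"chicken.mp4", "cow.mp4"},
    "Chicken videos": set(),
}
# Hypothetical Projects, each attached to one Mirrored Dataset.
project_datasets = {
    "Project A": ["Animal videos"],
    "Project B": ["Chicken videos"],
}

move_file(datasets, "chicken.mp4", "Animal videos", "Chicken videos")
```

After the move, visible_files(project_datasets, datasets, "Project B") contains "chicken.mp4", because "Project B" is attached to "Chicken videos".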


Entity relationships

The following points summarize how Datasets relate to other entities in Encord.

  • Projects bring together Ontologies, Datasets, Workflows, and collaborators.
  • A Project can have multiple Datasets attached to it, but only one Ontology.
  • One Ontology can be attached to multiple Projects.
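
These relationships can be written down as a small data model. The classes below are an illustrative sketch only and are not Encord SDK types: the class and field names are assumptions, chosen to mirror the three points above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ontology:
    title: str


@dataclass
class Dataset:
    title: str


@dataclass
class Project:
    title: str
    ontology: Ontology  # exactly one Ontology per Project
    datasets: List[Dataset] = field(default_factory=list)  # any number of Datasets
    collaborators: List[str] = field(default_factory=list)

    def attach_dataset(self, dataset: Dataset) -> None:
        """Attach a Dataset, ignoring one that is already attached."""
        if dataset not in self.datasets:
            self.datasets.append(dataset)


# One Ontology shared by two Projects; one Project with two Datasets.
shared_ontology = Ontology("Animals")
project_a = Project("Project A", shared_ontology)
project_b = Project("Project B", shared_ontology)
project_a.attach_dataset(Dataset("Animal videos"))
project_a.attach_dataset(Dataset("Chicken videos"))
```

The single ontology field, as opposed to a list, is what encodes the rule that a Project has only one Ontology.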

Roles and permissions

Collaborator permissions can be set in the Team section of the Dataset Settings.

Permission       Admin   Viewer
View dataset     Yes     Yes
Add data         Yes     No
Adjust settings  Yes     No

Manage Datasets

Use the Datasets tab in the navigation bar to manage your Datasets.

Click a Dataset to:

  • Upload additional files to the Dataset.
  • Remove files from the Dataset.
  • Manage who has access to the Dataset.

The dashboard is split into two tabs:


Data tab

Use the Data tab to upload new files and manage existing files.

  • A - Click and select (or drag-and-drop) files into the area highlighted below to upload files to a Dataset.

  • B - Manage files contained in the Dataset.

    • Edit the filename by clicking the Edit icon.
    • Select a file by clicking the checkbox next to the file name.
    • Select a file and click the Delete button to delete the file from a Dataset.
    • Re-encode a file by selecting the file and clicking the Re-encode(auto) button.

Settings tab

Team

The Team pane shows a list of collaborators on the Dataset.

  • Invite collaborators by clicking the + Invite collaborators button and adding their emails.
  • New collaborators are assigned the ‘Viewer’ role by default. A ‘Viewer’ cannot make changes to the Dataset; only an ‘Admin’ can.
  • Collaborators can be upgraded to ‘Admin’ using the 3 dots to the right of their name.
  • Click the Delete icon to delete a collaborator.
An ‘Admin’ cannot be reverted to a ‘Viewer’. To change an Admin back to a Viewer, delete the collaborator and re-invite them.
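
The role rules in this section can be summarized as a small state model. This is a sketch for illustration only, not how Encord implements roles: the DatasetTeam class and its method names are hypothetical, and the example email address is made up.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "Viewer"
    ADMIN = "Admin"


class DatasetTeam:
    """Toy model of the Team pane: invite, upgrade, delete."""

    def __init__(self):
        self._roles = {}

    def invite(self, email):
        # New collaborators are assigned the Viewer role by default.
        self._roles.setdefault(email, Role.VIEWER)

    def promote(self, email):
        # Upgrading is one-way: there is deliberately no demote method.
        if email not in self._roles:
            raise KeyError(email)
        self._roles[email] = Role.ADMIN

    def remove(self, email):
        self._roles.pop(email, None)

    def role_of(self, email):
        return self._roles.get(email)

    def can_modify(self, email):
        # Only an Admin can make changes to the Dataset.
        return self._roles.get(email) is Role.ADMIN


team = DatasetTeam()
team.invite("ana@example.com")
team.promote("ana@example.com")
# The only way back to Viewer is to delete and re-invite the collaborator.
team.remove("ana@example.com")
team.invite("ana@example.com")
```

After the final two calls the collaborator is a Viewer again, which mirrors the delete and re-invite workaround described above.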

Linked Projects

The Projects pane shows a list of Projects using the Dataset.

Click View to navigate to that Project.


Danger zone (Delete Datasets)

Use the Danger zone pane to delete Datasets.

Click the Delete dataset button to delete the entire Dataset. You are prompted to type the word ‘delete’ into the resulting pop-up to delete the Dataset.

Deleting a Dataset cannot be undone. Make sure you want to perform this action before continuing.

Joining Datasets in your Org

Organization Admins can search for and join any Datasets that exist within the Organization.

  1. Navigate to Datasets under the Annotate heading in the Encord platform.
  2. Select the All Encord datasets tab.
  3. Find the Dataset you want to join.
  4. Click Join dataset to join the Dataset.

When an Organization Admin joins a Dataset, they are automatically assigned the Admin user role for that Dataset.

Datasets can be filtered by Dataset owner.
See all Datasets you belong to by clicking the Filter by search bar and selecting My datasets only.