Data Management and Curation

Index Get Started

STEP 1: Set up your Org

Add Users

An overview of all your Organization’s users and user roles is found on the Users tab of your Organization.

User roles

Organizations have several kinds of users.

  • Internal: Users that your Organization directly employs. Can be either Member or Admin.
  • External: Users who are not directly employed by your Organization, or who are employed on a contract basis. This includes external annotation teams.
  • Workforce: Users belonging to a Workforce Organization that are added to another Organization’s Project or Dataset.

Internal users can have the Member OR Admin role in your Organization. The following table outlines permissions of both internal user roles.

| Admin | Member |
| --- | --- |
| Executive privileges over the Organization such as adding and removing users, and the ability to view all Projects in the Organization. | No administrative privileges over your Organization. Can only view Projects they create, or have been invited to. |

In addition to having a user role within the Organization, all users have distinct roles in Projects, Datasets, and Ontologies.

Adding and removing users

Only Organization Admins have the ability to add users to, or remove users from, the Organization.

Users belonging to your Organization are listed and managed on the Users tab of your Organization dashboard. The Users tab displays by default when you navigate to your Organization.

To add new users to your Organization:

  1. Click the + Add user button. A dialog appears.
  2. Type the email addresses of the users you want to add.
  3. Select the role you want the users to have.
  4. Click Add to add the users to your Organization.

Add User Groups

User groups are collections of members that are grouped together, allowing them to be added to Projects, Datasets, and Ontologies collectively. User groups are managed on the Groups tab of your Organization’s dashboard.

Create user groups

To create a user group:

  1. Navigate to the Groups tab of your Organization.
  2. Click + Create group. A dialog appears.
  3. Give your group a meaningful name and description.
  4. Search for and select users to include in the group.
  5. Click Add to add the selected users to the group. Users can be removed by clicking the delete icon next to the user.
  6. Click Create group to create the user group.

Add Project Tags

Project tags serve as a labeling system that helps to categorize, group, and filter Projects within your Organization. Project tags are created and managed in the Project tags tab of the Organization’s dashboard.

Project tags are applied to Projects on the Project level, not from your Organization dashboard.
The Project tags tab is only visible to Organization Admins, and only Organization Admins can create Project tags.

Create Project tags

Project tags must be created before they can be added to a Project.

  1. Click + New project tag on the Project tags tab.
  2. Give the new Project tag a name.
  3. Press Enter to create the tag.

All Project tag names must be unique.

STEP 2: Data Discoverability Strategy

Index is purpose-built to help you find the best data in your data lake quickly and easily. Using Index effectively requires some up-front planning before you even touch the Encord platform. To get the quickest ROI from Index, you need a Data Discoverability Strategy, which helps you curate your data in the most efficient manner.

Index allows you to visually inspect your data, but if you have billions (yes, billions with a B) of data units, you cannot visually inspect every one of them. Index provides a number of ways to sort and filter your data. To turn that lake of data into something manageable at scale and speed, focus on exactly the things that are critical for you. Building a Data Discoverability Strategy helps you achieve that.

Accelerators

Key Frames (video only)

Imports frames of interest into Index. You can import key frames together with frames sampled at a rate you specify (default 1 FPS), or you can import only key frames.

How does this help?

You ensure that critical data is imported into Index. This “pre-filters” your data so that the data available from your videos is already of high quality. You also control the number of frames imported into Index, which can significantly speed up video imports.

  • Sampling rate + key frames: Control how much of a video imports while ensuring critical frames are included.
  • Key frames only: Only frames you deem critical import into Index. This can significantly reduce video import time into Index.

What do I need to do?

You DO NOT need to make a Metadata Schema to specify key frames when importing videos.

  1. Decide on the key frames you want for each video.
  2. Specify those key frames when importing the videos. Refer to the docs for AWS, GCP, Azure, and OTC to specify key frames during video imports. When importing local data, or if you want to specify key frames after you import your videos, go here.
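
To get a feel for how the two import modes affect frame counts, the following is a minimal sketch rather than Encord's actual import logic. It assumes that sampling keeps every N-th frame starting at frame 0, where N is the video frame rate divided by the sampling rate. The function name `frames_to_import` and its parameters are hypothetical.

```python
def frames_to_import(total_frames, video_fps, key_frames,
                     sampling_fps=1.0, key_frames_only=False):
    """Estimate which frame indices would be imported (illustrative model only)."""
    if total_frames < 0 or video_fps <= 0 or sampling_fps <= 0:
        raise ValueError("frame count must be >= 0 and rates must be positive")
    # Key frames are always kept, as long as they exist in the video.
    selected = {f for f in key_frames if 0 <= f < total_frames}
    if not key_frames_only:
        # Assumption: sampling keeps every `step`-th frame, starting at frame 0.
        step = max(1, round(video_fps / sampling_fps))
        selected.update(range(0, total_frames, step))
    return sorted(selected)


# A 10-second clip at 30 FPS with three key frames.
sampled_plus_keys = frames_to_import(300, 30, [13, 17, 127])
keys_only = frames_to_import(300, 30, [13, 17, 127], key_frames_only=True)
print(len(sampled_plus_keys), "frames with sampling,", len(keys_only), "with key frames only")
```

Comparing the two counts for a representative video gives a rough idea of how much import volume the key-frames-only mode saves.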

Custom Metadata

Provides custom filtering criteria for ALL data that has custom metadata.

How does this help?

You can filter your data on the criteria that are important to you and your use cases.

  • Want to filter based on your company's UUID for the data? No problem.
  • Want to add transcription data (through custom metadata) and search or filter based on the transcription? Easy peasy.
  • Need to specify priority on the data and then filter on that priority? Custom metadata supports that.
  • Want to filter based on time and date stamps? You can.

What do I need to do?

  1. Decide on the filtering criteria you need. You can always update this later.
  2. Create a metadata schema in Encord.
  3. Create a Folder in Index.
  4. While importing, add custom metadata to each data unit you want to filter using your criteria. Refer to the docs for AWS, GCP, Azure, and OTC. When importing local data, or if you want to apply custom metadata after importing your data, go here.

Custom Embeddings

Provides a visualization mechanism for finding patterns and similarities in your data.

How does this help?

  • Embedding plots provide at-a-glance analysis for massive amounts of data.
  • Data similarity search across your data.
  • Natural language search across your data.

What do I need to do?

  1. Decide on the embeddings you need. You can always update this later.
  2. Create a metadata schema in Encord.
  3. Create a Folder in Index.
  4. While importing, add custom embeddings to each data unit you want to use custom embeddings with. Refer to the docs for AWS, GCP, Azure, and OTC. When importing local data, or if you want to apply custom metadata after importing your data, go here.

STEP 3: Create a Cloud Integration

Select your cloud provider.

STEP 4: Create Metadata Schema

Based on your Data Discoverability Strategy, you need to create a metadata schema. The schema provides a method of organization for your custom metadata. Encord supports:

  • Scalars: Methods for filtering.
  • Enums: Methods with options for filtering.
  • Embeddings: Method for embedding plot visualization, similarity search, and natural language search.

Custom metadata

Custom metadata refers to any additional information you attach to files, allowing for better data curation and management based on your specific needs. It can include any details relevant to your workflow, helping you organize, filter, and retrieve data more efficiently. For example, for a video of a construction site, custom metadata could include fields like "site_location": "Algiers", "project_phase": "foundation", or "weather_conditions": "sunny". This enables more precise tracking and management of your data.

Before importing any files with custom metadata to Encord, we recommend that you import a metadata schema. Encord uses metadata schemas to validate custom metadata uploaded to Encord and to instruct Index and Active how to display your metadata.

Metadata schema table

Use add_scalar to add a scalar key to your metadata schema.

| Scalar Key | Description | Display Benefits |
| --- | --- | --- |
| boolean | Binary data type with values “true” or “false”. | Filtering by binary values |
| datetime | ISO 8601 formatted date and time. | Filtering by time and date |
| number | Numeric data type supporting float values. | Filtering by numeric values |
| uuid | Customer specified unique identifier for a data unit. | Filtering by customer specified unique identifier |
| varchar | Textual data type. Formerly string. string can be used as an alias for varchar, but we STRONGLY RECOMMEND that you use varchar. | Filtering by string |
| text | Text data with unlimited length (example: transcripts for audio). Formerly long_string. long_string can be used as an alias for text, but we STRONGLY RECOMMEND that you use text. | Storing and filtering large amounts of text |

Use add_enum and add_enum_options to add an enum and enum options to your metadata schema.

| Key | Description | Display Benefits |
| --- | --- | --- |
| enum | Enumerated type with predefined set of values. | Facilitates categorical filtering and data validation |

Use add_embedding to add an embedding to your metadata schema.

| Key | Description | Display Benefits |
| --- | --- | --- |
| embedding | 512 dimension embeddings for Active, 1 to 4096 for Index. | Filtering by embeddings, similarity search, 2D scatter plot visualization (Coming Soon) |
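
Before attaching embeddings to your data, it can help to check their dimensions locally against the limits in the table above. The following is a minimal sketch that does not use the Encord SDK. The helper name `check_embedding` and its `target` parameter are hypothetical.

```python
import math

ACTIVE_DIM = 512                         # Active expects 512-dimension embeddings
INDEX_MIN_DIM, INDEX_MAX_DIM = 1, 4096   # Index accepts 1 to 4096 dimensions


def check_embedding(vector, target="index"):
    """Return True if `vector` has a valid length and only finite numeric values."""
    for x in vector:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            return False
        if not math.isfinite(x):
            return False
    if target == "active":
        return len(vector) == ACTIVE_DIM
    if target == "index":
        return INDEX_MIN_DIM <= len(vector) <= INDEX_MAX_DIM
    raise ValueError(f"unknown target: {target!r}")


clip_like = [0.1] * 512   # placeholder values, not a real embedding
wide = [0.1] * 768
print(check_embedding(clip_like, "active"), check_embedding(wide, "active"), check_embedding(wide, "index"))
```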

Incorrectly specifying a data type in the schema can cause errors when filtering your data in Index or Active. If you encounter errors while filtering, verify your schema is correct. If your schema has errors, correct the errors, re-import the schema, and then re-sync your Active Project.
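
One way to catch type mismatches before they surface as filtering errors is to check your metadata values locally against the types you declared. The following is a minimal sketch that does not use the Encord SDK. The plain-dict schema representation, the list-of-options form for enums, and the `find_type_errors` helper are assumptions made for illustration.

```python
from datetime import datetime
from uuid import UUID


def _is_iso_datetime(value):
    if not isinstance(value, str):
        return False
    # Older Python versions do not parse a trailing "Z", so normalize it.
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def _is_uuid(value):
    if not isinstance(value, str):
        return False
    try:
        UUID(value)
        return True
    except ValueError:
        return False


# One check per scalar key from the metadata schema table.
CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "datetime": _is_iso_datetime,
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "uuid": _is_uuid,
    "varchar": lambda v: isinstance(v, str),
    "text": lambda v: isinstance(v, str),
}


def find_type_errors(schema, metadata):
    """Return a list of human-readable problems; an empty list means no mismatch."""
    errors = []
    for key, value in metadata.items():
        expected = schema.get(key)
        if expected is None:
            errors.append(f"{key}: not defined in schema")
        elif isinstance(expected, list):  # hypothetical enum form: list of options
            if value not in expected:
                errors.append(f"{key}: {value!r} is not one of {expected}")
        elif expected not in CHECKS:
            errors.append(f"{key}: unknown schema type {expected!r}")
        elif not CHECKS[expected](value):
            errors.append(f"{key}: expected {expected}, got {value!r}")
    return errors


schema = {
    "site_location": "varchar",
    "captured_at": "datetime",
    "priority": "number",
    "reviewed": "boolean",
    "project_phase": ["foundation", "framing", "finishing"],
}
good = {"site_location": "Algiers", "captured_at": "2024-05-01T12:30:00Z",
        "priority": 2, "reviewed": False, "project_phase": "foundation"}
bad = {"priority": "high", "captured_at": "yesterday",
       "project_phase": "demolition", "camera": "A7"}

print(find_type_errors(schema, good))
for problem in find_type_errors(schema, bad):
    print(problem)
```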

Import your metadata schema to Encord

Verify your schema

After importing your schema to Encord, we recommend that you verify that the import succeeded. Run the following code to verify that your metadata schema imported and that the schema is correct.

STEP 5: Create a Folder in Index

You must create a folder in Index to store your files.

  1. Navigate to Files under the Index heading in the Encord platform.
  2. Click the + New folder button to create a new folder. A dialog to create a new folder appears.
  3. Give the folder a meaningful name and description.
  4. Click Create to create the folder. The folder is listed in Files.

STEP 6: Create JSON or CSV for Import

To import files from cloud storage into Encord, you must create a JSON or CSV file specifying the files you want to upload.

Find helpful scripts for creating JSON and CSV files for the data upload process here.

All types of data (videos, images, image groups, image sequences, and DICOM) from a private cloud are added to a Dataset in the same way, by using a JSON or CSV file. The file includes links to all of the images, image groups, videos and DICOM files in your cloud storage.
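
As a rough illustration of generating such a file, the sketch below builds a JSON document from lists of cloud URLs. The field names `videos`, `images`, `objectUrl`, and `clientMetadata` are assumptions about the import format and should be checked against the import documentation for your cloud provider. The helper `build_import_spec` and the example URLs are hypothetical.

```python
import json


def build_import_spec(video_urls, image_urls, metadata_by_url=None):
    """Build a dict describing the files to import, with optional custom metadata."""
    metadata_by_url = metadata_by_url or {}

    def entry(url):
        item = {"objectUrl": url}  # assumed field name
        if url in metadata_by_url:
            item["clientMetadata"] = metadata_by_url[url]  # assumed field name
        return item

    return {
        "videos": [entry(u) for u in video_urls],
        "images": [entry(u) for u in image_urls],
    }


video = "https://example-bucket.example.com/site_a/walkthrough.mp4"  # placeholder URL
image = "https://example-bucket.example.com/site_a/frame_001.jpg"    # placeholder URL

spec = build_import_spec(
    video_urls=[video],
    image_urls=[image],
    metadata_by_url={video: {"site_location": "Algiers", "project_phase": "foundation"}},
)
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Writing `spec_json` to a file gives you a candidate upload specification to review before importing.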

For a list of supported file formats for each data type, go here.

Encord supports file names of up to 300 characters for any file or video you upload.
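
You can screen for over-long file names before building your import file. This is a minimal sketch that assumes the 300-character limit applies to the final path component of each URL or POSIX-style path. The helper names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

MAX_FILE_NAME_LENGTH = 300


def file_name(url_or_path):
    """Return the last path component of a URL or POSIX-style path."""
    path = unquote(urlparse(url_or_path).path)
    return PurePosixPath(path).name


def names_too_long(urls_or_paths, limit=MAX_FILE_NAME_LENGTH):
    """Return the entries whose file name exceeds `limit` characters."""
    return [u for u in urls_or_paths if len(file_name(u)) > limit]


candidates = [
    "s3://example-bucket/site_a/clip.mp4",                 # placeholder URL
    "s3://example-bucket/site_a/" + "x" * 301 + ".mp4",    # deliberately too long
]
print(len(names_too_long(candidates)), "file name(s) exceed the limit")
```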

STEP 7: Import your data

Import Cloud Data

We recommend uploading smaller batches of data: limit uploads to 100 videos and up to 1000 images at a time. Familiarize yourself with our limits and best practices for data import before uploading data to Encord.

  1. Navigate to the Files section of Index in the Encord platform.
  2. Click into a Folder.
  3. Click + Upload files. A dialog appears.
  4. Click Import from cloud data.

    We recommend turning on the Ignore individual file errors feature. This ensures that individual file errors do not cause the whole upload process to abort.
  5. Click Add JSON or CSV files to add a JSON or CSV file specifying the cloud data to add.
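
To follow the batch-size recommendation above when you have many files, you can split your URL lists into batches and create one JSON or CSV file per batch. This is a minimal sketch that treats the video and image limits as separate per-upload caps, which is one reading of the recommendation. The helper names are hypothetical.

```python
MAX_VIDEOS_PER_UPLOAD = 100
MAX_IMAGES_PER_UPLOAD = 1000


def batched(items, size):
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_uploads(video_urls, image_urls):
    """Split the URL lists into upload-sized batches."""
    return {
        "video_batches": list(batched(video_urls, MAX_VIDEOS_PER_UPLOAD)),
        "image_batches": list(batched(image_urls, MAX_IMAGES_PER_UPLOAD)),
    }


# Placeholder URLs for illustration only.
videos = [f"s3://example-bucket/videos/{i}.mp4" for i in range(250)]
images = [f"s3://example-bucket/images/{i}.jpg" for i in range(2500)]

plan = plan_uploads(videos, images)
print([len(b) for b in plan["video_batches"]], [len(b) for b in plan["image_batches"]])
```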

Import Local Data

We recommend uploading smaller batches of data: limit uploads to 100 videos and up to 1000 images at a time. Familiarize yourself with our limits and best practices for data import before uploading data to Encord.

  1. Navigate to the Files section of Index in the Encord platform.
  2. Click into a Folder.
  3. Click + Upload files. A dialog appears.
  4. Click one of the following:

    • Upload: Upload images, videos, and audio files.
    • Batch images as: Upload image batches as image groups or image sequences.
    • DICOM/NifTi: Upload DICOM or NifTi series.
  5. Click Upload after selecting your images or series.

    Your files upload into the Folder in Encord.

STEP 8: Create a Collection using Index

A Collection is a container for data units (images or videos) that you can use to group your data units together.

Creation of a Collection involves filtering and sorting your data. Once you have selected a smaller group of images, videos or audio files, create a Collection.

  1. Log in to the Encord platform. The landing page for the Encord platform appears.

  2. Go to Index > Files. The All folders page appears with a list of all folders in Encord.

  3. Click into a Folder. The landing page for the Folder appears and the Explorer button is enabled.

  4. Click the Explorer button. The Index Explorer page appears.

  5. Search, sort, and filter your data until you have the subset of the data you need.

  6. Select one or more of the images/frames in the Explorer workspace. A ribbon appears at the top of the Explorer workspace.

    Selecting a video frame selects the entire video. Specific frames from a video cannot be selected.

  7. Click Select all to select all the images in the subset.

  8. Click Add to a Collection.

  9. Click New Collection.

  10. Specify a meaningful title and description for the Collection.

    The title specified here is applied as a tag/label to every selected image.

  11. Click Collections to verify the Collection appears in the Collections list.

STEP 9: Create a Dataset from a Collection

Once you have a Collection, you can create a Dataset from your Collection.