This walkthrough takes you through the complete applied AI workflow in Encord: ingesting data, curating a training set, setting up an annotation Project, labeling with quality control, and exporting labels for training. By the end, you’ll have a working pattern you can adapt for your own use case.

What you’ll need

  • An Encord account with Admin or Member access
  • Data files in cloud storage (AWS S3, GCP, or Azure) or available for local upload
  • A clear labeling schema (the classes and attributes you want to annotate)
If you don’t have data ready, you can use local uploads to get started quickly.

Step 1: Register your data

In: Index
  1. Navigate to Index in the Encord platform.
  2. Create a Folder to organize your files (e.g. training-data/v1).
  3. Register your data using one of the following methods:
    • Cloud storage — provide a JSON file of cloud URIs pointing to your files in AWS, GCP, or Azure
    • Cloud sync — connect a cloud folder and sync automatically as new files are added
    • Local upload — drag and drop files directly into the platform
Once registered, your files are indexed and available for curation. Encord does not copy your data — files stay in your own storage. See Work with Data for detailed instructions.
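For the cloud-storage method, you supply a JSON file listing the URIs of your files. As a rough illustration, the snippet below builds such a file programmatically. The key names ("images", "objectUrl") and the bucket paths are assumptions for illustration only; check the Work with Data documentation for the exact schema required for your cloud provider.

```python
import json

# Hypothetical S3 paths -- replace with your own object URIs.
uris = [
    "s3://my-bucket/training-data/v1/frame_0001.jpg",
    "s3://my-bucket/training-data/v1/frame_0002.jpg",
]

# Assumed structure for illustration; the real schema is provider-specific
# and documented in Work with Data.
spec = {"images": [{"objectUrl": u} for u in uris]}

with open("register_spec.json", "w") as f:
    json.dump(spec, f, indent=2)
```

Generating the spec file from a bucket listing keeps large registrations reproducible and easy to diff between data versions.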

Step 2: Curate your dataset

In: Index
Before annotating, invest time in curation. This ensures you annotate the right data, not just all of it.
  1. Open your Folder in Index and explore your data visually.
  2. Use Embedding plots to visualize the distribution of your data. Look for:
    • Dense clusters — likely duplicates or over-represented conditions
    • Sparse regions — edge cases or rare conditions worth prioritizing
  3. Run duplicate detection to identify and remove near-identical samples.
  4. Use quality metric filters to remove corrupt, blurry, or low-quality files.
  5. Use natural language search or metadata filters to find specific conditions (e.g. “night driving”, “small objects”, or specific sensor IDs).
  6. Select the samples you want to annotate and save them as a Collection.
Aim for a collection that is diverse and representative of the conditions your model will encounter in production — not just whatever data was easiest to collect.
See Collections for instructions.
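Duplicate detection (step 3 above) conceptually boils down to comparing embeddings: two samples whose embedding vectors are nearly parallel are almost certainly near-identical. The platform does this for you; the sketch below just shows the idea, assuming you have embedding vectors as plain Python lists.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def near_duplicates(embeddings, threshold=0.98):
    # Flag index pairs whose embeddings are almost identical.
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

The threshold is a judgment call: too low and you discard legitimate variation, too high and duplicates slip through.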

Step 3: Create a Dataset

In: Annotate
  1. Navigate to Datasets under Annotate.
  2. Click + New dataset.
  3. Give the Dataset a meaningful name and description.
  4. Add your curated Collection from Index to the Dataset.
The Dataset is now available to attach to annotation Projects.

Step 4: Create an Ontology

In: Annotate
An Ontology defines your labeling schema — the classes, attributes, and relationships that annotators will use.
  1. Navigate to Ontologies under Annotate.
  2. Click + New ontology.
  3. Add the object classes and classification types your model needs.
  4. For each class, add any nested attributes (e.g. for a vehicle class: type, color, occlusion level).
Keep your ontology focused. Every attribute you add increases annotation time and complexity. Only include what your model actually needs.
See Create Ontologies for guidance.
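It can help to draft the schema as data before building it in the Ontology editor, so you can review and count attributes up front. The structure below is our own planning format, not Encord's internal representation; the class and attribute names are examples.

```python
# Illustrative draft of a labeling schema, sketched as a plain dict
# before building the real Ontology in the Encord UI.
schema = {
    "objects": [
        {
            "name": "vehicle",
            "attributes": [
                {"name": "type", "options": ["car", "truck", "bus"]},
                {"name": "occlusion", "options": ["none", "partial", "heavy"]},
            ],
        },
        {"name": "pedestrian", "attributes": []},
    ],
    "classifications": [
        {"name": "weather", "options": ["clear", "rain", "night"]},
    ],
}

def attribute_count(schema):
    # Every attribute adds annotation time -- keep this number small.
    return sum(len(o["attributes"]) for o in schema["objects"])
```

Reviewing `attribute_count` before creating the Ontology is a cheap way to enforce the "only what your model needs" rule.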

Step 5: Create an annotation Project

In: Annotate
  1. Navigate to Projects under Annotate.
  2. Click + New annotation project.
  3. Configure the Project:
    • Title and description — give the Project a clear name
    • Ontology — attach the Ontology you created in Step 4
    • Dataset — attach the Dataset from Step 3
    • Workflow — select or build a Workflow (e.g. Annotate → Review → Complete)
    • Collaborators — invite annotators, reviewers, and team managers
  4. Click Create project.
See Create a Project for the full setup guide.

Step 6: Label your data

In: Annotate — Label Editor
With the Project created, annotators can start labeling:
  1. Open the Project and navigate to the Queue tab.
  2. Click Start task to open the Label Editor on the highest-priority task, or Initiate to open a specific task.
  3. Use the Label Editor tools to annotate each asset:
    • Use SAM 2 for fast, accurate segmentation with a single click
    • Use Interpolation to propagate bounding boxes or polygons across video frames
    • Use classification tools to assign frame-level or object-level attributes
  4. Submit the task when complete.
If Task Agents are configured, pre-labels from AI models will already be present when the annotator opens the task — reducing labeling time significantly.
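To make the interpolation step concrete: propagating a bounding box between two keyframes is, at its simplest, linear interpolation of the box coordinates. The Label Editor handles this for you; the sketch below only illustrates the concept, using an assumed `[x, y, w, h]` box format.

```python
def interpolate_boxes(kf_start, kf_end, box_start, box_end):
    """Linearly interpolate [x, y, w, h] boxes between two keyframes.

    A conceptual sketch of what video interpolation does; real tooling
    may use more sophisticated motion models.
    """
    span = kf_end - kf_start
    boxes = {}
    for f in range(kf_start, kf_end + 1):
        t = (f - kf_start) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[f] = [s + t * (e - s) for s, e in zip(box_start, box_end)]
    return boxes
```

Annotating only keyframes and interpolating the frames in between is what makes video labeling tractable at scale.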

Step 7: Review and QA

In: Annotate — Review stage
Tasks submitted by annotators move to the Review stage in your Workflow.
  1. Reviewers open tasks from the Queue and inspect the labels.
  2. For each task, the reviewer either:
    • Approves — the task moves to the next stage or completion
    • Rejects — the task is returned to the annotator with issue notes
  3. Reviewers can raise Issues on specific labels or frames to flag problems for discussion.
Monitor overall quality using the Analytics tab: track approval rates, rejection rates, time per task, and open issues by annotator or reviewer.
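The Analytics tab surfaces these numbers for you. If you prefer to track them in your own tooling, the per-annotator approval rate is a simple aggregation; the record format below (a list of annotator/outcome pairs) is hypothetical.

```python
from collections import defaultdict

def approval_rates(review_events):
    """Compute per-annotator approval rate.

    `review_events` is a hypothetical list of (annotator, outcome)
    tuples, where outcome is "approved" or "rejected". In practice
    these figures come from the Project's Analytics tab.
    """
    stats = defaultdict(lambda: {"approved": 0, "total": 0})
    for annotator, outcome in review_events:
        stats[annotator]["total"] += 1
        if outcome == "approved":
            stats[annotator]["approved"] += 1
    return {a: s["approved"] / s["total"] for a, s in stats.items()}
```

A persistently low approval rate usually points to unclear ontology definitions rather than a careless annotator; revisit the schema before assigning blame.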

Step 8: Export labels

In: Annotate
Once tasks are complete and approved:
  1. Click Export in the Project.
  2. Choose your export format:
    • JSON — Encord’s native format, full fidelity
    • COCO — standard format for object detection and segmentation
  3. Specify which workflow stage to export from (typically Complete).
  4. Optionally save a label version for reproducibility.
  5. Click Export.
Your labels are now ready for your training pipeline. See Export Labels for the full export workflow.
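Before feeding an export into training, it is worth sanity-checking it. For a COCO export, counting annotations per category catches empty or wildly imbalanced classes early. This uses only the standard COCO fields (`categories`, `annotations`, `category_id`).

```python
import json
from collections import Counter

def annotations_per_category(coco_path):
    # Sanity-check a COCO export: how many labels does each class have?
    with open(coco_path) as f:
        coco = json.load(f)
    names = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    return dict(counts)
```

A class with zero or very few annotations at this stage is a curation problem to fix now, not a training problem to discover later.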

Step 9: Train and evaluate

Take your exported labels into your training pipeline. After training and deploying your model, import its predictions back into Active to evaluate performance:
  1. Import model predictions into Active.
  2. Review automatic metrics (mAP, mAR, F1 Score).
  3. Use embedding plots and filters to find where the model underperforms.
  4. Create a new Collection of high-value samples — data where the model fails or is uncertain.
  5. Send the Collection back to Annotate (Step 3) and repeat.
This loop — curate, annotate, train, evaluate, repeat — is the core of applied AI development.
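Step 4 of the loop above, selecting high-value samples, can be sketched as a simple filter: keep anything the model got wrong, plus anything in an uncertain confidence band. The prediction record format here ("id", "confidence", "correct") is hypothetical; in practice you would derive it from the metrics and filters in Active.

```python
def select_high_value(predictions, band=(0.3, 0.7)):
    """Pick sample ids where the model is wrong or uncertain.

    `predictions` is a hypothetical list of dicts with "id",
    "confidence", and "correct" keys.
    """
    lo, hi = band
    return [
        p["id"]
        for p in predictions
        if not p["correct"] or lo <= p["confidence"] <= hi
    ]
```

Annotating these samples first gives a far better return per label than annotating uniformly at random, which is the whole point of the curate/annotate/train/evaluate loop.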

What’s next