Applied AI is an iterative process. Data doesn’t flow through a pipeline once — it cycles continuously between collection, labeling, training, deployment, and back again. This page maps the full data lifecycle in Encord and explains what happens at each stage.

Overview

Ingest → Organize → Curate → Annotate → Review → Export → Train → Evaluate → (repeat)
Each stage connects directly to the next within Encord. You don’t need to move files between systems — the same data registered in Index flows into Annotate for labeling and into Active for evaluation.

Stage 1: Ingest

Tool: Index

The first step is registering your data with Encord. Encord supports:
  • Cloud storage (AWS S3, GCP Cloud Storage, Azure Blob Storage) — register by providing a JSON list of file URIs; files remain in your bucket
  • Cloud sync — automatically sync a cloud folder so new files are registered as they arrive
  • Local upload — upload files directly for smaller datasets or quick experimentation
After registration, files are indexed and made available for curation, annotation, and evaluation. See Work with Data for setup instructions.
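For cloud registration, the payload is a JSON list of file URIs. A minimal sketch of building one is below; the `"images"`/`"objectUrl"` key names mirror the general shape of Encord's cloud-integration JSON but are an assumption here, so check the Work with Data docs for the exact schema your storage provider requires.

```python
import json

def build_registration_json(uris):
    """Group object-store URIs into a registration payload.

    NOTE: the "images"/"objectUrl" layout is an assumption about the
    expected schema, not a verified Encord format.
    """
    return json.dumps(
        {"images": [{"objectUrl": uri} for uri in uris]},
        indent=2,
    )

uris = [
    "s3://my-bucket/scans/scan_001.png",
    "s3://my-bucket/scans/scan_002.png",
]
registration_json = build_registration_json(uris)
print(registration_json)
```

Because the files themselves stay in your bucket, this payload only carries references; Encord reads the objects through the cloud integration you configured.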

Stage 2: Organize

Tool: Index

Once ingested, data is organized into Folders — logical containers that group related files. Folders support:
  • Nested hierarchies to mirror your project or domain structure
  • Access controls to restrict who can view or modify data
  • Metadata attachment for filtering and downstream use
At this stage you also define any custom metadata you want to attach — sensor IDs, collection dates, geographic tags, or domain-specific fields. Metadata is used throughout Index for filtering and curation, and is passed through to Annotate and Active. See Custom Metadata for schema setup.
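To make the filtering use-case concrete, here is a small stand-in for metadata-driven selection. The field names (`sensor_id`, `captured_at`, `region`) and the in-memory records are purely illustrative, not an Encord metadata schema:

```python
# Hypothetical per-file metadata records; field names are
# illustrative, not an Encord schema.
records = {
    "scan_001.png": {"sensor_id": "cam-A", "region": "north"},
    "scan_002.png": {"sensor_id": "cam-B", "region": "south"},
    "scan_003.png": {"sensor_id": "cam-A", "region": "north"},
}

def filter_by(records, **criteria):
    """Return filenames whose metadata matches every criterion."""
    return [
        name for name, meta in records.items()
        if all(meta.get(k) == v for k, v in criteria.items())
    ]

print(filter_by(records, sensor_id="cam-A"))
```

The same idea applies downstream: because metadata is passed through to Annotate and Active, a field you define here can later scope an annotation batch or slice an evaluation.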

Stage 3: Curate

Tool: Index

Curation is the process of selecting which data to annotate. Annotating everything is rarely optimal — curation helps you focus effort on the most valuable data.

What to do at this stage

  • Explore embeddings — visualize your dataset in 2D embedding space. Identify dense clusters (likely duplicates or over-represented conditions) and sparse regions (edge cases you need more of)
  • Remove duplicates — use Encord’s duplicate detection to eliminate near-identical samples before annotating them
  • Filter by quality — use off-the-shelf quality metrics to remove blurry, corrupt, or overexposed samples
  • Search for edge cases — use natural language search or similarity search to find specific conditions you know your model struggles with
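The duplicate-removal step above can be sketched as a greedy pass over embedding vectors: keep a sample only if it is not too similar to anything already kept. This is a minimal illustration of the idea, not Encord's actual duplicate-detection algorithm, and the 0.98 cosine threshold is an arbitrary example value:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def deduplicate(embeddings, threshold=0.98):
    """Greedy near-duplicate removal over (name, vector) pairs."""
    kept = []
    for name, vec in embeddings:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((name, vec))
    return [name for name, _ in kept]

embeddings = [
    ("a.png", [1.0, 0.0, 0.0]),
    ("b.png", [0.999, 0.01, 0.0]),  # near-duplicate of a.png
    ("c.png", [0.0, 1.0, 0.0]),
]
print(deduplicate(embeddings))  # b.png is dropped
```

In practice the embeddings come from the same model that powers the 2D visualization, so dense clusters in the plot correspond to high pairwise similarities here.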

Collections

Save your curated selection as a Collection. Collections are named, versioned subsets of your data that can be:
  • Sent to Annotate as an annotation batch
  • Exported directly as a dataset
  • Shared with teammates for review
See Collections for full documentation.

Stage 4: Annotate

Tool: Annotate

Annotation turns raw data into labeled training data. In Encord, annotation is organized around Projects, which bring together:
  • A Dataset (one or more collections of data files)
  • An Ontology (the labeling schema — classes, attributes, and relationships)
  • A Workflow (the stages a task passes through before completion)
  • Collaborators (annotators, reviewers, and managers)

What to do at this stage

  1. Create or select an ontology — define the classes and attributes your model needs
  2. Create a dataset from your curated collection
  3. Set up a project with an appropriate workflow (e.g. Annotate → Review → Complete)
  4. Assign and prioritize tasks — use the Queue to manage task distribution
  5. Label data using the Label Editor, with AI assistance where available
  6. Review and QA — reviewers approve, reject, or raise issues on submitted tasks
See Create a Project to get started.
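A workflow like the example in step 3 behaves as a small state machine: submitting moves a task from Annotate to Review, approval completes it, and rejection sends it back. The toy sketch below uses the stage names from the text, but the transition logic is an illustrative simplification of how workflow stages behave, not Encord's implementation:

```python
# Toy state machine for the example workflow
# Annotate -> Review -> Complete (with a reject loop).
TRANSITIONS = {
    ("Annotate", "submit"): "Review",
    ("Review", "approve"): "Complete",
    ("Review", "reject"): "Annotate",  # rejected tasks return to annotators
}

def advance(stage, action):
    """Apply an action to a task's current stage."""
    try:
        return TRANSITIONS[(stage, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid from stage {stage!r}")

stage = "Annotate"
for action in ["submit", "reject", "submit", "approve"]:
    stage = advance(stage, action)
print(stage)  # Complete
```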

Stage 5: Export

Tool: Annotate

Once annotation is complete, labels are exported for use in training. Encord supports export in:
  • JSON (Encord format) — full fidelity, including all attributes and metadata
  • COCO — standard format for object detection and segmentation
  • Custom formats via SDK — transform labels programmatically using the Python SDK
You can export:
  • All labels in a project
  • Labels from a specific workflow stage
  • Labels for a selected subset of tasks
Label versions can also be saved — snapshots of your labels at a point in time — for reproducibility and regression tracking. See Export Labels for the full export workflow.
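The "custom formats via SDK" option usually amounts to a transform like the one below: mapping an exported bounding-box record to COCO's absolute-pixel `[x, y, width, height]` convention. The input layout (relative `x`/`y`/`w`/`h`) is a simplified stand-in for an exported label record, not the exact Encord JSON schema:

```python
def to_coco_bbox(label, image_w, image_h, ann_id, image_id, category_id):
    """Convert a relative-coordinate box to a COCO annotation dict.

    NOTE: the input keys (x, y, w, h as fractions of image size) are
    an illustrative assumption, not the verified export schema.
    """
    x = label["x"] * image_w
    y = label["y"] * image_h
    w = label["w"] * image_w
    h = label["h"] * image_h
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],       # COCO: absolute pixels, top-left origin
        "area": w * h,
        "iscrowd": 0,
    }

ann = to_coco_bbox(
    {"x": 0.25, "y": 0.5, "w": 0.1, "h": 0.2},
    image_w=1000, image_h=800,
    ann_id=1, image_id=7, category_id=3,
)
print(ann["bbox"])  # [250.0, 400.0, 100.0, 160.0]
```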

Stage 6: Train and deploy

Tool: your ML infrastructure

Take your exported labels into your training pipeline and train your model. This step happens outside Encord, in your own infrastructure. After training and deployment, you will have model predictions on new data — and those predictions feed back into Stage 7.

Stage 7: Evaluate

Tool: Active

Import your model’s predictions into Encord Active to evaluate performance against ground truth labels.

What to do at this stage

  • Compare predictions to ground truth — see where your model agrees and disagrees with human labels
  • Review automatic metrics — mAP, mAR, F1 Score, precision, recall by class
  • Find failure modes — identify underperforming clusters, edge cases, and underrepresented classes
  • Surface labeling errors — Active can detect labels that are likely mistakes by comparing them to model outputs
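The per-class metrics listed above follow directly from counts of true positives, false positives, and false negatives. A plain-Python sketch of the definitions (the example counts are invented for illustration):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented example counts per class: (TP, FP, FN)
counts = {"car": (80, 20, 10), "bicycle": (15, 5, 30)}
for cls, (tp, fp, fn) in counts.items():
    p, r, f1 = prf(tp, fp, fn)
    print(f"{cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

A class like `bicycle` here has decent precision but poor recall — exactly the kind of per-class gap that points you toward the curation step in the next section.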

Closing the loop

Once you’ve identified where the model fails, use Active to:
  1. Create a Collection of the high-value samples — data where the model is uncertain, wrong, or underrepresented
  2. Send the collection back to Annotate for re-labeling or additional annotation
  3. Merge the new labels with your existing dataset
  4. Retrain and evaluate again
This feedback loop is what separates teams that improve their models continuously from those that don’t. See Active Overview for full documentation.
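Step 1 above — selecting high-value samples where the model is uncertain — can be sketched as ranking by how close a confidence score sits to the decision boundary. This is a minimal active-learning-style illustration, not Encord Active's actual selection logic:

```python
def most_uncertain(predictions, k):
    """Return the k filenames whose binary confidence is closest to 0.5.

    predictions: {filename: confidence in [0, 1]}
    """
    ranked = sorted(predictions, key=lambda f: abs(predictions[f] - 0.5))
    return ranked[:k]

# Invented example scores
preds = {"a.png": 0.97, "b.png": 0.52, "c.png": 0.08, "d.png": 0.45}
print(most_uncertain(preds, 2))  # ['b.png', 'd.png']
```

The resulting list is what you would save as a Collection and send back to Annotate in steps 1 and 2.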

Lifecycle at a glance

| Stage    | Tool             | Key action                              |
| -------- | ---------------- | --------------------------------------- |
| Ingest   | Index            | Register data from cloud storage        |
| Organize | Index            | Create folders, attach metadata         |
| Curate   | Index            | Filter, deduplicate, build collections  |
| Annotate | Annotate         | Label with human + AI; review and QA    |
| Export   | Annotate         | Export labels in JSON or COCO           |
| Train    | Your infra       | Train and deploy your model             |
| Evaluate | Active           | Import predictions, find failure modes  |
| Loop     | Index + Annotate | Curate high-value data, re-annotate     |

Where to go next