Applied AI is an iterative process. Data doesn’t flow through a pipeline once — it cycles continuously between collection, labeling, training, deployment, and back again. This page maps the full data lifecycle in Encord and explains what happens at each stage.

Overview

Ingest → Organize → Curate → Annotate → Review → Export → Train → Evaluate → (repeat)
Each stage connects directly to the next within Encord. You don’t need to move files between systems — the same data registered in Index flows into Annotate for labeling and into Active for evaluation.

Stage 1: Ingest

Tool: Index

The first step is registering your data with Encord. Encord supports:
  • Cloud storage (AWS S3, GCP Cloud Storage, Azure Blob Storage) — register by providing a JSON list of file URIs; files remain in your bucket
  • Cloud sync — automatically sync a cloud folder so new files are registered as they arrive
  • Local upload — upload files directly for smaller datasets or quick experimentation
After registration, files are indexed and made available for curation, annotation, and evaluation. See Work with Data for setup instructions.
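For cloud registration, the payload is a JSON list of file URIs. A minimal sketch of building one is below; the `"images"`/`"objectUrl"` key names mirror the general shape of Encord's cloud-integration JSON but are an assumption here, so check the Work with Data docs for the exact schema your storage provider requires.

```python
import json

def build_registration_json(uris):
    """Group object-store URIs into a registration payload.

    NOTE: the "images"/"objectUrl" layout is an assumption about the
    expected schema, not a verified Encord format.
    """
    return json.dumps(
        {"images": [{"objectUrl": uri} for uri in uris]},
        indent=2,
    )

uris = [
    "s3://my-bucket/scans/scan_001.png",
    "s3://my-bucket/scans/scan_002.png",
]
registration_json = build_registration_json(uris)
print(registration_json)
```

Because the files themselves stay in your bucket, this payload only carries references; Encord reads the objects through the cloud integration you configured.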

Stage 2: Organize

Tool: Index

Once ingested, data is organized into Folders — logical containers that group related files. Folders support:
  • Nested hierarchies to mirror your project or domain structure
  • Access controls to restrict who can view or modify data
  • Metadata attachment for filtering and downstream use
At this stage you also define any custom metadata you want to attach — sensor IDs, collection dates, geographic tags, or domain-specific fields. Metadata is used throughout Index for filtering and curation, and is passed through to Annotate and Active. See Custom Metadata for schema setup.
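To make the filtering use-case concrete, here is a small stand-in for metadata-driven selection. The field names (`sensor_id`, `captured_at`, `region`) and the in-memory records are purely illustrative, not an Encord metadata schema:

```python
# Hypothetical per-file metadata records; field names are
# illustrative, not an Encord schema.
records = {
    "scan_001.png": {"sensor_id": "cam-A", "region": "north"},
    "scan_002.png": {"sensor_id": "cam-B", "region": "south"},
    "scan_003.png": {"sensor_id": "cam-A", "region": "north"},
}

def filter_by(records, **criteria):
    """Return filenames whose metadata matches every criterion."""
    return [
        name for name, meta in records.items()
        if all(meta.get(k) == v for k, v in criteria.items())
    ]

print(filter_by(records, sensor_id="cam-A"))
```

The same idea applies downstream: because metadata is passed through to Annotate and Active, a field you define here can later scope an annotation batch or slice an evaluation.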

Stage 3: Curate

Tool: Index

Curation is the process of selecting which data to annotate. Annotating everything is rarely optimal — curation helps you focus effort on the most valuable data.

What to do at this stage

  • Explore embeddings — visualize your dataset in 2D embedding space. Identify dense clusters (likely duplicates or over-represented conditions) and sparse regions (edge cases you need more of)
  • Remove duplicates — use Encord’s duplicate detection to eliminate near-identical samples before annotating them
  • Filter by quality — use off-the-shelf quality metrics to remove blurry, corrupt, or overexposed samples
  • Search for edge cases — use natural language search or similarity search to find specific conditions you know your model struggles with
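The duplicate-removal step above can be sketched as a greedy pass over embedding vectors: keep a sample only if it is not too similar to anything already kept. This is a minimal illustration of the idea, not Encord's actual duplicate-detection algorithm, and the 0.98 cosine threshold is an arbitrary example value:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def deduplicate(embeddings, threshold=0.98):
    """Greedy near-duplicate removal over (name, vector) pairs."""
    kept = []
    for name, vec in embeddings:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((name, vec))
    return [name for name, _ in kept]

embeddings = [
    ("a.png", [1.0, 0.0, 0.0]),
    ("b.png", [0.999, 0.01, 0.0]),  # near-duplicate of a.png
    ("c.png", [0.0, 1.0, 0.0]),
]
print(deduplicate(embeddings))  # b.png is dropped
```

In practice the embeddings come from the same model that powers the 2D visualization, so dense clusters in the plot correspond to high pairwise similarities here.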

Collections

Save your curated selection as a Collection. Collections are named, versioned subsets of your data that can be:
  • Sent to Annotate as an annotation batch
  • Exported directly as a dataset
  • Shared with teammates for review
See Collections for full documentation.

Stage 4: Annotate

Tool: Annotate

Annotation turns raw data into labeled training data. In Encord, annotation is organized around Projects, which bring together:
  • A Dataset (one or more collections of data files)
  • An Ontology (the labeling schema — classes, attributes, and relationships)
  • A Workflow (the stages a task passes through before completion)
  • Collaborators (annotators, reviewers, and managers)

What to do at this stage

  1. Create or select an ontology — define the classes and attributes your model needs
  2. Create a dataset from your curated collection
  3. Set up a project with an appropriate workflow (e.g. Annotate → Review → Complete)
  4. Assign and prioritize tasks — use the Queue to manage task distribution
  5. Label data using the Label Editor, with AI assistance where available
  6. Review and QA — reviewers approve, reject, or raise issues on submitted tasks
See Create a Project to get started.
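A workflow like the example in step 3 behaves as a small state machine: submitting moves a task from Annotate to Review, approval completes it, and rejection sends it back. The toy sketch below uses the stage names from the text, but the transition logic is an illustrative simplification of how workflow stages behave, not Encord's implementation:

```python
# Toy state machine for the example workflow
# Annotate -> Review -> Complete (with a reject loop).
TRANSITIONS = {
    ("Annotate", "submit"): "Review",
    ("Review", "approve"): "Complete",
    ("Review", "reject"): "Annotate",  # rejected tasks return to annotators
}

def advance(stage, action):
    """Apply an action to a task's current stage."""
    try:
        return TRANSITIONS[(stage, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid from stage {stage!r}")

stage = "Annotate"
for action in ["submit", "reject", "submit", "approve"]:
    stage = advance(stage, action)
print(stage)  # Complete
```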

Stage 5: Export

Tool: Annotate

Once annotation is complete, labels are exported for use in training. Encord supports export in:
  • JSON (Encord format) — full fidelity, including all attributes and metadata
  • COCO — standard format for object detection and segmentation
  • Custom formats via SDK — transform labels programmatically using the Python SDK
You can export:
  • All labels in a project
  • Labels from a specific workflow stage
  • Labels for a selected subset of tasks
Label versions can also be saved — snapshots of your labels at a point in time — for reproducibility and regression tracking. See Export Labels for the full export workflow.
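The "custom formats via SDK" option usually amounts to a transform like the one below: mapping an exported bounding-box record to COCO's absolute-pixel `[x, y, width, height]` convention. The input layout (relative `x`/`y`/`w`/`h`) is a simplified stand-in for an exported label record, not the exact Encord JSON schema:

```python
def to_coco_bbox(label, image_w, image_h, ann_id, image_id, category_id):
    """Convert a relative-coordinate box to a COCO annotation dict.

    NOTE: the input keys (x, y, w, h as fractions of image size) are
    an illustrative assumption, not the verified export schema.
    """
    x = label["x"] * image_w
    y = label["y"] * image_h
    w = label["w"] * image_w
    h = label["h"] * image_h
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "bbox": [x, y, w, h],       # COCO: absolute pixels, top-left origin
        "area": w * h,
        "iscrowd": 0,
    }

ann = to_coco_bbox(
    {"x": 0.25, "y": 0.5, "w": 0.1, "h": 0.2},
    image_w=1000, image_h=800,
    ann_id=1, image_id=7, category_id=3,
)
print(ann["bbox"])  # [250.0, 400.0, 100.0, 160.0]
```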

Stage 6: Train and deploy

Tool: your ML infrastructure

Take your exported labels into your training pipeline and train your model. This step happens outside Encord, in your own infrastructure. After training and deployment, you will have model predictions on new data — and those predictions feed back into Stage 7.

Stage 7: Evaluate

Tool: Active

Import your model’s predictions into Encord Active to evaluate performance against ground truth labels.

What to do at this stage

  • Compare predictions to ground truth — see where your model agrees and disagrees with human labels
  • Review automatic metrics — mAP, mAR, F1 Score, precision, recall by class
  • Find failure modes — identify underperforming clusters, edge cases, and underrepresented classes
  • Surface labeling errors — Active can detect labels that are likely mistakes by comparing them to model outputs
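The per-class metrics listed above follow directly from counts of true positives, false positives, and false negatives. A plain-Python sketch of the definitions (the example counts are invented for illustration):

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from TP/FP/FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented example counts per class: (TP, FP, FN)
counts = {"car": (80, 20, 10), "bicycle": (15, 5, 30)}
for cls, (tp, fp, fn) in counts.items():
    p, r, f1 = prf(tp, fp, fn)
    print(f"{cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

A class like `bicycle` here has decent precision but poor recall — exactly the kind of per-class gap that points you toward the curation step in the next section.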

Closing the loop

Once you’ve identified where the model fails, use Active to:
  1. Create a Collection of the high-value samples — data where the model is uncertain, wrong, or underrepresented
  2. Send the collection back to Annotate for re-labeling or additional annotation
  3. Merge the new labels with your existing dataset
  4. Retrain and evaluate again
This feedback loop is what separates teams that improve their models continuously from those that don’t. See Active Overview for full documentation.
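Step 1 above — selecting high-value samples where the model is uncertain — can be sketched as ranking by how close a confidence score sits to the decision boundary. This is a minimal active-learning-style illustration, not Encord Active's actual selection logic:

```python
def most_uncertain(predictions, k):
    """Return the k filenames whose binary confidence is closest to 0.5.

    predictions: {filename: confidence in [0, 1]}
    """
    ranked = sorted(predictions, key=lambda f: abs(predictions[f] - 0.5))
    return ranked[:k]

# Invented example scores
preds = {"a.png": 0.97, "b.png": 0.52, "c.png": 0.08, "d.png": 0.45}
print(most_uncertain(preds, 2))  # ['b.png', 'd.png']
```

The resulting list is what you would save as a Collection and send back to Annotate in steps 1 and 2.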

Lifecycle at a glance

| Stage    | Tool             | Key action                              |
| -------- | ---------------- | --------------------------------------- |
| Ingest   | Index            | Register data from cloud storage        |
| Organize | Index            | Create folders, attach metadata         |
| Curate   | Index            | Filter, deduplicate, build collections  |
| Annotate | Annotate         | Label with human + AI; review and QA    |
| Export   | Annotate         | Export labels in JSON or COCO           |
| Train    | Your infra       | Train and deploy your model             |
| Evaluate | Active           | Import predictions, find failure modes  |
| Loop     | Index + Annotate | Curate high-value data, re-annotate     |

Where to go next