Overview
Stage 1: Ingest
Tool: Index

The first step is registering your data with Encord. Encord supports:
- Cloud storage (AWS S3, GCP Cloud Storage, Azure Blob Storage) — register by providing a JSON list of file URIs; files remain in your bucket
- Cloud sync — automatically sync a cloud folder so new files are registered as they arrive
- Local upload — upload files directly for smaller datasets or quick experimentation
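As a rough sketch, a JSON list of file URIs for cloud registration can be assembled in a few lines of Python. The `objectUrl` field name and overall shape here are assumptions for illustration; check Encord's cloud integration docs for the exact schema your storage provider requires.

```python
import json

# Hypothetical helper: build a JSON registration spec from S3 object URIs.
# The "images"/"objectUrl" field names are illustrative, not a guaranteed schema.
def build_registration_spec(uris):
    return {"images": [{"objectUrl": uri} for uri in uris]}

uris = [
    "s3://my-bucket/frames/0001.jpg",
    "s3://my-bucket/frames/0002.jpg",
]
spec = build_registration_spec(uris)
print(json.dumps(spec, indent=2))
```

Because the files stay in your bucket, the spec carries only references, never the image bytes themselves.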
Stage 2: Organize
Tool: Index

Once ingested, data is organized into Folders — logical containers that group related files. Folders support:
- Nested hierarchies to mirror your project or domain structure
- Access controls to restrict who can view or modify data
- Metadata attachment for filtering and downstream use
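To make the metadata point concrete, here is a minimal sketch of attaching key/value metadata to file records and filtering on it. The record shape is invented for illustration; in Encord, metadata is attached per data unit and queried through Index.

```python
# Hypothetical file records with attached metadata (shape is illustrative).
files = [
    {"name": "a.jpg", "metadata": {"camera": "front", "weather": "rain"}},
    {"name": "b.jpg", "metadata": {"camera": "rear", "weather": "clear"}},
    {"name": "c.jpg", "metadata": {"camera": "front", "weather": "clear"}},
]

def filter_by_metadata(records, **criteria):
    """Keep records whose metadata matches every key/value pair."""
    return [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in criteria.items())
    ]

front_clear = filter_by_metadata(files, camera="front", weather="clear")
```

Metadata you attach here pays off downstream: the same keys drive curation filters and dataset slicing.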
Stage 3: Curate
Tool: Index

Curation is the process of selecting which data to annotate. Annotating everything is rarely optimal — curation helps you focus effort on the most valuable data.
What to do at this stage
- Explore embeddings — visualize your dataset in 2D embedding space. Identify dense clusters (likely duplicates or over-represented conditions) and sparse regions (edge cases you need more of)
- Remove duplicates — use Encord’s duplicate detection to eliminate near-identical samples before annotating them
- Filter by quality — use off-the-shelf quality metrics to remove blurry, corrupt, or overexposed samples
- Search for edge cases — use natural language search or similarity search to find specific conditions you know your model struggles with
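The deduplication step above can be sketched as a similarity check over embeddings: flag pairs whose cosine similarity exceeds a threshold. Encord computes embeddings for you; the vectors and threshold below are toy stand-ins.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Stand-in embeddings; real ones are high-dimensional and model-generated.
embeddings = {
    "img_1": [0.9, 0.1, 0.0],
    "img_2": [0.89, 0.11, 0.01],  # near-duplicate of img_1
    "img_3": [0.0, 0.2, 0.95],
}

THRESHOLD = 0.99  # pairs above this are treated as near-duplicates
names = list(embeddings)
duplicates = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if cosine(embeddings[a], embeddings[b]) > THRESHOLD
]
```

In the 2D embedding view, these near-duplicate pairs are the dense clusters; the sparse regions are where similarity search for edge cases earns its keep.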
Collections
Save your curated selection as a Collection. Collections are named, versioned subsets of your data that can be:
- Sent to Annotate as an annotation batch
- Exported directly as a dataset
- Shared with teammates for review
Stage 4: Annotate
Tool: Annotate

Annotation turns raw data into labeled training data. In Encord, annotation is organized around Projects, which bring together:
- A Dataset (one or more collections of data files)
- An Ontology (the labeling schema — classes, attributes, and relationships)
- A Workflow (the stages a task passes through before completion)
- Collaborators (annotators, reviewers, and managers)
What to do at this stage
- Create or select an ontology — define the classes and attributes your model needs
- Create a dataset from your curated collection
- Set up a project with an appropriate workflow (e.g. Annotate → Review → Complete)
- Assign and prioritize tasks — use the Queue to manage task distribution
- Label data using the Label Editor, with AI assistance where available
- Review and QA — reviewers approve, reject, or raise issues on submitted tasks
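The workflow in the steps above behaves like a small state machine: approving a review advances the task, rejecting it sends the task back to annotation. This sketch uses the example Annotate → Review → Complete stages; real Encord workflows are configured in the project, not coded like this.

```python
# Allowed transitions for a simple Annotate -> Review -> Complete workflow.
TRANSITIONS = {
    ("Annotate", "submit"): "Review",
    ("Review", "approve"): "Complete",
    ("Review", "reject"): "Annotate",  # rejected tasks return to the annotator
}

def advance(stage, action):
    """Move a task to its next stage for a given action."""
    return TRANSITIONS[(stage, action)]

# One task's journey: submitted, rejected once, resubmitted, approved.
stage = "Annotate"
stage = advance(stage, "submit")
stage = advance(stage, "reject")
stage = advance(stage, "submit")
stage = advance(stage, "approve")
```

Modeling it this way makes the QA loop explicit: a task only reaches Complete through a reviewer's approval.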
Stage 5: Export
Tool: Annotate

Once annotation is complete, labels are exported for use in training. Encord supports export in:
- JSON (Encord format) — full fidelity, including all attributes and metadata
- COCO — standard format for object detection and segmentation
- Custom formats via SDK — transform labels programmatically using the Python SDK
You can scope an export to:
- All labels in a project
- Labels from a specific workflow stage
- Labels for a selected subset of tasks
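To illustrate the custom-format path, here is a hypothetical transform from a simplified label structure into COCO-style annotations. The input shape is invented for this sketch; real Encord exports are richer, and the SDK docs define the actual label row structure.

```python
# Simplified labels (invented shape): one box per entry.
labels = [
    {"image": "0001.jpg", "class": "car", "bbox": [10, 20, 50, 30]},
    {"image": "0001.jpg", "class": "person", "bbox": [70, 15, 20, 40]},
]

# Assign stable integer IDs to categories and images, as COCO requires.
categories = sorted({l["class"] for l in labels})
cat_ids = {name: i + 1 for i, name in enumerate(categories)}
images = sorted({l["image"] for l in labels})
img_ids = {name: i + 1 for i, name in enumerate(images)}

coco = {
    "images": [{"id": img_ids[n], "file_name": n} for n in images],
    "categories": [{"id": cat_ids[n], "name": n} for n in categories],
    "annotations": [
        {
            "id": i + 1,
            "image_id": img_ids[l["image"]],
            "category_id": cat_ids[l["class"]],
            "bbox": l["bbox"],  # COCO convention: [x, y, width, height]
        }
        for i, l in enumerate(labels)
    ],
}
```

The same pattern generalizes to any target format: read the export, remap IDs and geometry, and write the structure your training code expects.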
Stage 6: Train and deploy
Your ML infrastructure

Take your exported labels into your training pipeline and train your model. This step happens outside Encord, in your own infrastructure. After training and deploying, you will have model predictions on new data, which feed back into Stage 7.
Stage 7: Evaluate
Tool: Active

Import your model’s predictions into Encord Active to evaluate performance against ground truth labels.
What to do at this stage
- Compare predictions to ground truth — see where your model agrees and disagrees with human labels
- Review automatic metrics — mAP, mAR, F1 Score, precision, recall by class
- Find failure modes — identify underperforming clusters, edge cases, and underrepresented classes
- Surface labeling errors — Active can detect labels that are likely mistakes by comparing them to model outputs
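The per-class precision, recall, and F1 metrics listed above reduce to a few counts once predictions have been matched to ground truth (e.g. by IoU). The counts below are toy numbers; Active computes the matching and the metrics for you.

```python
# Toy matched-prediction counts per class: true positives, false
# positives, false negatives. Real counts come from prediction matching.
counts = {
    "car":    {"tp": 80, "fp": 10, "fn": 20},
    "person": {"tp": 45, "fp": 30, "fn": 5},
}

def prf(tp, fp, fn):
    """Precision, recall, and F1 from matched counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

metrics = {cls: prf(**c) for cls, c in counts.items()}
```

Reading the split per class is the point: here "person" has high recall but poor precision (many false positives), a failure mode a single aggregate score would hide.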
Closing the loop
Once you’ve identified where the model fails, use Active to:
- Create a Collection of the high-value samples — data where the model is uncertain, wrong, or underrepresented
- Send the collection back to Annotate for re-labeling or additional annotation
- Merge the new labels with your existing dataset
- Retrain and evaluate again
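Selecting the high-value samples for that loop is, at its simplest, least-confidence sampling: rank predictions by how close their confidence sits to the decision boundary and take the top of the list. The scores and budget below are illustrative; Active surfaces this ranking through its quality metrics.

```python
# Hypothetical per-image prediction confidences from the deployed model.
predictions = [
    {"image": "0101.jpg", "confidence": 0.97},
    {"image": "0102.jpg", "confidence": 0.51},
    {"image": "0103.jpg", "confidence": 0.62},
    {"image": "0104.jpg", "confidence": 0.93},
]

def select_uncertain(preds, budget):
    """Return the `budget` samples the model is least confident about."""
    ranked = sorted(preds, key=lambda p: abs(p["confidence"] - 0.5))
    return [p["image"] for p in ranked[:budget]]

to_relabel = select_uncertain(predictions, budget=2)
```

Those selected images become the Collection you send back to Annotate, so each retraining round spends labeling budget where the model is weakest.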
Lifecycle at a glance
| Stage | Tool | Key action |
|---|---|---|
| Ingest | Index | Register data from cloud storage |
| Organize | Index | Create folders, attach metadata |
| Curate | Index | Filter, deduplicate, build collections |
| Annotate | Annotate | Label with human + AI; review and QA |
| Export | Annotate | Export labels in JSON or COCO |
| Train | Your infra | Train and deploy your model |
| Evaluate | Active | Import predictions, find failure modes |
| Loop | Index + Annotate | Curate high-value data, re-annotate |
Where to go next
- Annotation and Curation — detailed guide to labeling and dataset curation
- End-to-End Walkthrough — a complete worked example
- Work with Data — data ingestion and registration
- Active Overview — model evaluation and active learning

