
# Data Lifecycle

> How data moves through Encord from initial ingestion through curation, annotation, evaluation, and export — and how to close the loop between models and data.

Applied AI is an iterative process. Data doesn't flow through a pipeline once — it cycles continuously between collection, labeling, training, deployment, and back again. This page maps the full data lifecycle in Encord and explains what happens at each stage.

***

## Overview

```
Ingest → Organize → Curate → Annotate → Review → Export → Train → Evaluate → (repeat)
```

Each stage connects directly to the next within Encord. You don't need to move files between systems — the same data registered in Index flows into Annotate for labeling and into Active for evaluation.

***

## Stage 1: Ingest

**Tool: Index**

The first step is registering your data with Encord. Encord supports:

* **Cloud storage** (AWS S3, GCP Cloud Storage, Azure Blob Storage) — register by providing a JSON list of file URIs; files remain in your bucket
* **Cloud sync** — automatically sync a cloud folder so new files are registered as they arrive
* **Local upload** — upload files directly for smaller datasets or quick experimentation

After registration, files are indexed and made available for curation, annotation, and evaluation.

See [Work with Data](/platform-documentation/Curate/add-files/index-register-cloud-data-cloud-sync) for setup instructions.
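
As a rough illustration of the "JSON list of file URIs" mentioned above, the sketch below groups cloud URIs by media type and serializes them. The `build_registration_payload` helper, the extension mapping, the bucket name, and the `images` / `videos` / `objectUrl` key names are assumptions made for illustration; the linked setup instructions describe the exact schema your integration expects.

```python
import json
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to media type; extend as needed.
MEDIA_TYPES = {
    ".jpg": "images", ".jpeg": "images", ".png": "images",
    ".mp4": "videos", ".mov": "videos",
}

def build_registration_payload(uris):
    """Group cloud URIs by media type into a JSON-serializable dict.

    The key names used here are assumptions, not a confirmed schema.
    """
    payload = {}
    for uri in uris:
        suffix = PurePosixPath(uri).suffix.lower()
        media_type = MEDIA_TYPES.get(suffix)
        if media_type is None:
            raise ValueError(f"Unsupported file type: {uri}")
        payload.setdefault(media_type, []).append({"objectUrl": uri})
    return payload

# Example URIs; the bucket and paths are placeholders.
uris = [
    "s3://example-bucket/site-a/frame_0001.jpg",
    "s3://example-bucket/site-a/clip_01.mp4",
]
payload = build_registration_payload(uris)
print(json.dumps(payload, indent=2))
```

Because the files stay in your bucket, a payload like this only carries references, so it stays small even for large datasets.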

***

## Stage 2: Organize

**Tool: Index**

Once ingested, data is organized into **Folders** — logical containers that group related files. Folders support:

* Nested hierarchies to mirror your project or domain structure
* Access controls to restrict who can view or modify data
* Metadata attachment for filtering and downstream use

At this stage you also define any **custom metadata** you want to attach — sensor IDs, collection dates, geographic tags, or domain-specific fields. Metadata is used throughout Index for filtering and curation, and is passed through to Annotate and Active.

See [Custom Metadata](/platform-documentation/Curate/custom-metadata/index-metadata-schema) for schema setup.
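
Checking metadata before attaching it can catch typos and type mismatches early. The following is a minimal sketch of such a check in plain Python; the `METADATA_SCHEMA` dictionary, the field names, and the `validate_metadata` helper are hypothetical and do not represent Encord's metadata schema format.

```python
from datetime import date

# Hypothetical schema: field name -> expected Python type.
METADATA_SCHEMA = {
    "sensor_id": str,
    "collection_date": date,
    "region": str,
}

def validate_metadata(record, schema=METADATA_SCHEMA):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems

good = {"sensor_id": "cam-07", "collection_date": date(2024, 5, 1), "region": "eu-west"}
bad = {"sensor_id": 7, "region": "eu-west", "weather": "rain"}

print(validate_metadata(good))  # []
print(validate_metadata(bad))
```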

***

## Stage 3: Curate

**Tool: Index**

Curation is the process of selecting which data to annotate. Annotating everything is rarely optimal — curation helps you focus effort on the most valuable data.

### What to do at this stage

* **Explore embeddings** — visualize your dataset in 2D embedding space. Identify dense clusters (likely duplicates or over-represented conditions) and sparse regions (edge cases you need more of)
* **Remove duplicates** — use Encord's duplicate detection to eliminate near-identical samples before annotating them
* **Filter by quality** — use off-the-shelf quality metrics to remove blurry, corrupt, or overexposed samples
* **Search for edge cases** — use natural language search or similarity search to find specific conditions you know your model struggles with
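
To make the duplicate-removal idea concrete, here is a small sketch of near-duplicate detection over embedding vectors using cosine similarity. It is a generic technique shown for illustration, not a description of how Encord's duplicate detection works; the function names, the example vectors, and the `0.98` threshold are all assumptions.

```python
import math

def cosine_similarity(a, b):
    # Assumes both vectors are non-zero and have the same length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_near_duplicates(embeddings, threshold=0.98):
    """Greedy pass: keep an item unless it is too similar to one already kept.

    Returns (kept_ids, dropped), where dropped maps each removed id to the
    kept id it duplicates. Quadratic in the number of items, so this is a
    sketch rather than something to run on a large dataset.
    """
    kept, dropped = [], {}
    for item_id, vector in embeddings.items():
        match = next(
            (k for k in kept if cosine_similarity(vector, embeddings[k]) >= threshold),
            None,
        )
        if match is None:
            kept.append(item_id)
        else:
            dropped[item_id] = match
    return kept, dropped

# Toy 3-dimensional embeddings; real embeddings have many more dimensions.
embeddings = {
    "img_001": [1.0, 0.0, 0.0],
    "img_002": [0.999, 0.01, 0.0],  # nearly identical to img_001
    "img_003": [0.0, 1.0, 0.0],
}
kept, dropped = find_near_duplicates(embeddings)
print(kept)     # ['img_001', 'img_003']
print(dropped)  # {'img_002': 'img_001'}
```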

### Collections

Save your curated selection as a **Collection**. Collections are named, versioned subsets of your data that can be:

* Sent to Annotate as an annotation batch
* Exported directly as a dataset
* Shared with teammates for review

See [Collections](/platform-documentation/Curate/curation-basics#collections) for full documentation.

***

## Stage 4: Annotate

**Tool: Annotate**

Annotation turns raw data into labeled training data. In Encord, annotation is organized around **Projects**, which bring together:

* A **Dataset** (one or more collections of data files)
* An **Ontology** (the labeling schema — classes, attributes, and relationships)
* A **Workflow** (the stages a task passes through before completion)
* **Collaborators** (annotators, reviewers, and managers)

### What to do at this stage

1. **Create or select an ontology** — define the classes and attributes your model needs
2. **Create a dataset** from your curated collection
3. **Set up a project** with an appropriate workflow (e.g. Annotate → Review → Complete)
4. **Assign and prioritize tasks** — use the Queue to manage task distribution
5. **Label data** using the Label Editor, with AI assistance where available
6. **Review and QA** — reviewers approve, reject, or raise issues on submitted tasks
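
The example workflow in step 3 (Annotate → Review → Complete) can be thought of as a small state machine in which a rejected task returns to annotation. The sketch below models that idea in plain Python; the `Stage` enum, the action names, and the `advance` helper are hypothetical and are not part of the Encord SDK.

```python
from enum import Enum

class Stage(Enum):
    ANNOTATE = "Annotate"
    REVIEW = "Review"
    COMPLETE = "Complete"

# Allowed transitions: (current stage, action) -> next stage.
TRANSITIONS = {
    (Stage.ANNOTATE, "submit"): Stage.REVIEW,
    (Stage.REVIEW, "approve"): Stage.COMPLETE,
    (Stage.REVIEW, "reject"): Stage.ANNOTATE,
}

def advance(stage, action):
    """Return the next stage, or raise if the action is not allowed here."""
    try:
        return TRANSITIONS[(stage, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not allowed in stage {stage.value}") from None

# A task that is rejected once before being approved.
stage = Stage.ANNOTATE
history = [stage]
for action in ["submit", "reject", "submit", "approve"]:
    stage = advance(stage, action)
    history.append(stage)

print(" -> ".join(s.value for s in history))
# Annotate -> Review -> Annotate -> Review -> Complete
```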

See [Create a Project](/platform-documentation/GettingStarted/gettingstarted-create-project) to get started.

***

## Stage 5: Export

**Tool: Annotate**

Once annotation is complete, labels are exported for use in training.

Encord supports export in:

* **JSON** (Encord format) — full fidelity, including all attributes and metadata
* **COCO** — standard format for object detection and segmentation
* **Custom formats** via SDK — transform labels programmatically using the Python SDK

You can export:

* All labels in a project
* Labels from a specific workflow stage
* Labels for a selected subset of tasks

Label versions can also be saved — snapshots of your labels at a point in time — for reproducibility and regression tracking.

See [Export Labels](/platform-documentation/Annotate/annotate-projects/annotate-manage-annotation-projects#export-labels) for the full export workflow.
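
As an illustration of a custom-format transform, the sketch below converts bounding boxes into YOLO-style text lines (`class_id x_center y_center width height`, normalized to the image size). The input record layout is a simplified, hypothetical structure, not Encord's JSON export schema, and `to_yolo_lines` is an illustrative helper rather than an SDK function.

```python
def to_yolo_lines(record, class_ids):
    """Convert one image's boxes to YOLO-style lines.

    Assumes each bbox is [x, y, width, height] in pixels, with (x, y)
    being the top-left corner.
    """
    lines = []
    for obj in record["objects"]:
        x, y, w, h = obj["bbox"]
        x_center = (x + w / 2) / record["width"]
        y_center = (y + h / 2) / record["height"]
        norm_w = w / record["width"]
        norm_h = h / record["height"]
        lines.append(
            f"{class_ids[obj['class']]} "
            f"{x_center:.6f} {y_center:.6f} {norm_w:.6f} {norm_h:.6f}"
        )
    return lines

# Hypothetical, simplified label record for a single image.
record = {
    "file_name": "frame_0001.jpg",
    "width": 1000,
    "height": 500,
    "objects": [
        {"class": "car", "bbox": [100, 50, 200, 100]},
        {"class": "person", "bbox": [500, 250, 100, 50]},
    ],
}
class_ids = {"car": 0, "person": 1}

yolo_lines = to_yolo_lines(record, class_ids)
print("\n".join(yolo_lines))
# 0 0.200000 0.200000 0.200000 0.200000
# 1 0.550000 0.550000 0.100000 0.100000
```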

***

## Stage 6: Train and deploy

**Your ML infrastructure**

Take your exported labels into your training pipeline and train your model. This step happens outside Encord, in your own infrastructure.

After training and deploying, you will have model predictions on new data, which feed into Stage 7.

***

## Stage 7: Evaluate

**Tool: Active**

Import your model's predictions into Encord Active to evaluate performance against ground truth labels.

### What to do at this stage

* **Compare predictions to ground truth** — see where your model agrees and disagrees with human labels
* **Review automatic metrics** — mAP, mAR, F1 Score, precision, recall by class
* **Find failure modes** — identify underperforming clusters, edge cases, and underrepresented classes
* **Surface labeling errors** — Active can detect labels that are likely mistakes by comparing them to model outputs
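
For reference, per-class precision, recall, and F1 can be computed from matched prediction and ground-truth pairs as sketched below. This is a simplified illustration that assumes predictions have already been matched to ground-truth objects (for example by an overlap threshold); it does not reproduce how Active computes its metrics, and `per_class_metrics` is a hypothetical helper.

```python
from collections import Counter

def per_class_metrics(pairs):
    """pairs: iterable of (ground_truth, prediction); None marks a missing side.

    (None, "car") is a spurious prediction; ("person", None) is a missed object.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in pairs:
        if truth is not None and truth == pred:
            tp[truth] += 1
        else:
            if pred is not None:
                fp[pred] += 1
            if truth is not None:
                fn[truth] += 1

    metrics = {}
    for cls in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[cls] + fp[cls]
        actual = tp[cls] + fn[cls]
        precision = tp[cls] / predicted if predicted else 0.0
        recall = tp[cls] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

pairs = [
    ("car", "car"), ("car", "car"), ("car", "truck"),
    ("truck", "truck"), (None, "car"), ("person", None),
]
metrics = per_class_metrics(pairs)
for cls, values in metrics.items():
    print(cls, {name: round(v, 3) for name, v in values.items()})
```

In this toy data the `person` class has zero recall because its only instance was missed, which is the kind of per-class gap the stage is meant to surface.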

### Closing the loop

Once you've identified where the model fails, use Active to:

1. Create a **Collection** of the high-value samples: data where the model is uncertain or wrong, or that belongs to underrepresented classes
2. Send the collection back to **Annotate** for re-labeling or additional annotation
3. Merge the new labels with your existing dataset
4. Retrain and evaluate again
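
The selection and merge steps above can be sketched in a few lines of Python. This is an illustrative outline only: the record fields, the `0.5` confidence threshold, and the `select_for_relabeling` and `merge_labels` helpers are assumptions, not Encord functionality.

```python
def select_for_relabeling(predictions, confidence_threshold=0.5):
    """Pick samples where the model is wrong or unsure.

    Returns a list of (sample_id, reason) tuples.
    """
    selected = []
    for p in predictions:
        wrong = p["predicted"] != p["ground_truth"]
        uncertain = p["confidence"] < confidence_threshold
        if wrong or uncertain:
            selected.append((p["id"], "wrong" if wrong else "uncertain"))
    return selected

def merge_labels(existing, new):
    """Merge label dicts keyed by sample id; new labels take precedence."""
    return {**existing, **new}

# Hypothetical evaluation records.
predictions = [
    {"id": "img_001", "ground_truth": "car", "predicted": "car", "confidence": 0.95},
    {"id": "img_002", "ground_truth": "truck", "predicted": "car", "confidence": 0.80},
    {"id": "img_003", "ground_truth": "person", "predicted": "person", "confidence": 0.40},
]
selected = select_for_relabeling(predictions)
print(selected)  # [('img_002', 'wrong'), ('img_003', 'uncertain')]

existing = {"img_001": "car", "img_002": "truck"}
relabeled = {"img_002": "car", "img_004": "person"}
merged = merge_labels(existing, relabeled)
print(merged)  # {'img_001': 'car', 'img_002': 'car', 'img_004': 'person'}
```

Letting new labels override old ones is one possible merge policy; keeping both versions and resolving conflicts in review is another.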

This feedback loop lets each annotation round target the model's known weaknesses instead of adding more of the data it already handles well.

See [Active Overview](/platform-documentation/Validation/active-how-to/active-model-predictions-eval) for full documentation.

***

## Lifecycle at a glance

| Stage    | Tool             | Key action                             |
| -------- | ---------------- | -------------------------------------- |
| Ingest   | Index            | Register data from cloud storage       |
| Organize | Index            | Create folders, attach metadata        |
| Curate   | Index            | Filter, deduplicate, build collections |
| Annotate | Annotate         | Label with human + AI; review and QA   |
| Export   | Annotate         | Export labels in JSON or COCO          |
| Train    | Your infra       | Train and deploy your model            |
| Evaluate | Active           | Import predictions, find failure modes |
| Loop     | Index + Annotate | Curate high-value data, re-annotate    |

***

## Where to go next

* [Annotation and Curation](/solutions-documentation/applied-ai/annotation-and-curation) — detailed guide to labeling and dataset curation
* [End-to-End Walkthrough](/solutions-documentation/applied-ai/end-to-end-walkthrough) — a complete worked example
* [Work with Data](/platform-documentation/Curate/add-files/index-register-cloud-data-cloud-sync) — data ingestion and registration
* [Active Overview](/platform-documentation/Validation/active-how-to/active-model-predictions-eval) — model evaluation and active learning