# End-to-End Walkthrough

> A complete walkthrough of building an applied AI data pipeline in Encord — from raw data to exported labels ready for training.

This walkthrough takes you through the complete applied AI workflow in Encord: ingesting data, curating a training set, setting up an annotation Project, labeling with quality control, and exporting labels for training.

By the end, you'll have a working pattern you can adapt for your own use case.

***

## What you'll need

* An Encord account with Admin or Member access
* Data files in cloud storage (AWS S3, GCP, or Azure) or available for local upload
* A clear labeling schema (the classes and attributes you want to annotate)

If you don't have data ready, you can use local uploads to get started quickly.

***

## Step 1: Register your data

**In: Index**

1. Navigate to **Index** in the Encord platform.
2. Create a **Folder** to organize your files (e.g. `training-data/v1`).
3. Register your data using one of the following methods:
   * **Cloud storage** — provide a JSON file of cloud URIs pointing to your files in AWS, GCP, or Azure
   * **Cloud sync** — connect a cloud folder and sync automatically as new files are added
   * **Local upload** — drag and drop files directly into the platform

Once registered, your files are indexed and available for curation. With cloud storage and cloud sync, Encord does not copy your data; files stay in your own storage.
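If you register cloud data with a JSON file, you can generate that file from a list of URIs instead of writing it by hand. The sketch below assumes a payload with top-level `images` and `videos` lists whose entries carry an `objectUrl` field, modeled on Encord's upload JSON; confirm the exact schema, and any extra keys your integration needs, in the Encord documentation before uploading. The function name `build_registration_payload`, the extension lists, and the bucket paths are illustrative.

```python
from __future__ import annotations

import json
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Illustrative extension lists; extend them for the file types you register.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".webm"}


def build_registration_payload(uris: list[str]) -> tuple[dict, list[str]]:
    """Group cloud URIs into image and video entries.

    Assumption: the "images"/"videos" keys and the "objectUrl" field mirror
    Encord's upload JSON; check the current docs for the exact schema.
    Returns the payload and the URIs skipped as unknown file types.
    """
    payload: dict[str, list[dict[str, str]]] = {"images": [], "videos": []}
    skipped: list[str] = []
    for uri in uris:
        # Use only the URL path so query strings (e.g. signed URLs) don't affect the suffix.
        suffix = PurePosixPath(urlparse(uri).path).suffix.lower()
        if suffix in IMAGE_EXTS:
            payload["images"].append({"objectUrl": uri})
        elif suffix in VIDEO_EXTS:
            payload["videos"].append({"objectUrl": uri})
        else:
            skipped.append(uri)
    return payload, skipped


if __name__ == "__main__":
    uris = [
        "s3://my-bucket/training-data/v1/frame_0001.jpg",
        "s3://my-bucket/training-data/v1/drive_01.mp4",
        "s3://my-bucket/training-data/v1/README.txt",
    ]
    payload, skipped = build_registration_payload(uris)
    print(json.dumps(payload, indent=2))  # save this output as the .json file you upload
    print("skipped:", skipped)
```

Reporting skipped URIs, rather than silently dropping them, makes it easier to spot files that were never registered.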

See [Work with Data](/platform-documentation/Curate/add-files/index-register-cloud-data-cloud-sync) for detailed instructions.

***

## Step 2: Curate your dataset

**In: Index**

Before annotating, invest time in curation so that you annotate the right data, not simply all of the data you have.

1. Open your Folder in Index and explore your data visually.
2. Use **Embedding plots** to visualize the distribution of your data. Look for:
   * Dense clusters — likely duplicates or over-represented conditions
   * Sparse regions — edge cases or rare conditions worth prioritizing
3. Run **duplicate detection** to identify and remove near-identical samples.
4. Use **quality metric filters** to remove corrupt, blurry, or low-quality files.
5. Use **natural language search** or **metadata filters** to find specific conditions (e.g. "night driving", "small objects", or specific sensor IDs).
6. Select the samples you want to annotate and save them as a **Collection**.

<Tip>
  Aim for a collection that is diverse and representative of the conditions your model will encounter in production — not just whatever data was easiest to collect.
</Tip>
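Index runs duplicate detection for you. To illustrate the idea behind it, here is a generic sketch of embedding-based near-duplicate removal: two files whose embedding vectors point in almost the same direction (high cosine similarity) are treated as duplicates, and only the first one is kept. This is not Encord's implementation; the 0.95 threshold, the function names, and the toy embeddings are assumptions for illustration.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def deduplicate(embeddings: dict[str, list[float]], threshold: float = 0.95) -> list[str]:
    """Greedy near-duplicate removal.

    Walks items in insertion order and keeps an item only if its similarity to
    every item kept so far is below `threshold`. This is O(n^2), which is fine
    for a sketch but not for millions of files.
    """
    kept: list[str] = []
    for item_id, vec in embeddings.items():
        if all(cosine_similarity(vec, embeddings[k]) < threshold for k in kept):
            kept.append(item_id)
    return kept


if __name__ == "__main__":
    # Made-up 3-D embeddings; real image embeddings have hundreds of dimensions.
    embeddings = {
        "frame_001.jpg": [0.90, 0.10, 0.00],
        "frame_002.jpg": [0.89, 0.11, 0.00],  # nearly identical to frame_001
        "frame_350.jpg": [0.10, 0.20, 0.97],
    }
    print(deduplicate(embeddings))
```

Lowering the threshold removes more aggressively, at the risk of discarding genuinely different samples from a dense cluster.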

See [Collections](/platform-documentation/Curate/curation-basics#collections) for instructions.

***

## Step 3: Create a Dataset

**In: Annotate**

1. Navigate to **Data** > **Datasets**.
2. Click **+ New dataset**.
3. Give the Dataset a meaningful name and description.
4. Add your curated Collection from Index to the Dataset.

The Dataset is now available to attach to annotation Projects.

***

## Step 4: Create an Ontology

**In: Annotate**

An Ontology defines your labeling schema — the classes, attributes, and relationships that annotators will use.

1. Navigate to **Ontologies**.
2. Click **+ New ontology**.
3. Add the object classes and classification types your model needs.
4. For each class, add any nested attributes (e.g. for a `vehicle` class: `type`, `color`, `occlusion level`).

<Tip>
  Keep your ontology focused. Every attribute you add increases annotation time and complexity. Only include what your model actually needs.
</Tip>
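To see why every attribute adds work, it helps to treat the Ontology as a schema that each label must satisfy. The sketch below is a hypothetical in-memory model built around the `vehicle` example above; the `Ontology`, `ObjectClass`, and `Attribute` classes and the label dictionary shape are illustrative, not Encord's SDK classes or its ontology JSON format. Each attribute is one more field an annotator must fill in and a reviewer must check.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    options: list[str]  # allowed values, like a radio-button attribute


@dataclass
class ObjectClass:
    name: str
    shape: str  # e.g. "bounding_box" or "polygon"
    attributes: list[Attribute] = field(default_factory=list)


@dataclass
class Ontology:
    objects: list[ObjectClass]

    def validate(self, label: dict) -> list[str]:
        """Return the problems found in one label; an empty list means it conforms."""
        cls = next((o for o in self.objects if o.name == label.get("class")), None)
        if cls is None:
            return [f"unknown class: {label.get('class')!r}"]
        errors: list[str] = []
        if label.get("shape") != cls.shape:
            errors.append(f"expected shape {cls.shape!r}, got {label.get('shape')!r}")
        allowed = {a.name: a.options for a in cls.attributes}
        for name, value in label.get("attributes", {}).items():
            if name not in allowed:
                errors.append(f"unknown attribute: {name!r}")
            elif value not in allowed[name]:
                errors.append(f"invalid value {value!r} for {name!r}")
        return errors


# Hypothetical ontology mirroring the vehicle example in the text.
VEHICLE_ONTOLOGY = Ontology(
    objects=[
        ObjectClass(
            name="vehicle",
            shape="bounding_box",
            attributes=[
                Attribute("type", ["car", "truck", "bus", "motorcycle"]),
                Attribute("color", ["white", "black", "red", "blue", "other"]),
                Attribute("occlusion_level", ["none", "partial", "heavy"]),
            ],
        )
    ]
)

if __name__ == "__main__":
    label = {
        "class": "vehicle",
        "shape": "bounding_box",
        "attributes": {"type": "boat", "occlusion_level": "partial"},
    }
    print(VEHICLE_ONTOLOGY.validate(label))
```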

See [Create Ontologies](/platform-documentation/Annotate/annotate-ontologies/annotate-create-ontologies) for guidance.

***

## Step 5: Create an annotation Project

**In: Annotate**

1. Navigate to **Projects**.
2. Click **+ New annotation project**.
3. Configure the Project:
   * **Title and description** — give the Project a clear name
   * **Ontology** — attach the Ontology you created in Step 4
   * **Dataset** — attach the Dataset from Step 3
   * **Workflow** — select or build a Workflow (e.g. Annotate → Review → Complete)
   * **Collaborators** — invite annotators, reviewers, and team managers
4. Click **Create project**.

See [Create a Project](/platform-documentation/GettingStarted/gettingstarted-create-project) for the full setup guide.

***

## Step 6: Label your data

**In: Annotate — Label Editor**

With the Project created, annotators can start labeling:

1. Open the Project and navigate to the **Queue** tab.
2. Click **Start task** to open the Label Editor on the highest-priority task, or **Initiate** to open a specific task.
3. Use the Label Editor tools to annotate each asset:
   * Use **SAM 3** for fast, accurate segmentation with a single click
   * Use **Interpolation** to propagate bounding boxes or polygons across video frames
   * Use classifications to label frame-level properties, and set nested attributes on individual objects
4. Submit the task when complete.
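Interpolation saves time because you draw a box only on a few keyframes and let the editor fill in the frames between them. The simplest form is linear interpolation of each box coordinate, sketched below under the assumption that boxes are `(x, y, w, h)` tuples; the Label Editor's own interpolation may use a different method, and `interpolate_boxes` is a hypothetical helper, not an Encord API.

```python
from __future__ import annotations

# (x, y, w, h): top-left corner plus width and height, in any consistent unit.
Box = tuple[float, float, float, float]


def interpolate_boxes(keyframes: dict[int, Box]) -> dict[int, Box]:
    """Linearly interpolate a box for every frame between consecutive keyframes.

    `keyframes` maps frame index -> manually drawn box. Frames outside the
    first..last keyframe range are not filled.
    """
    frames = sorted(keyframes)
    result: dict[int, Box] = {}
    for start, end in zip(frames, frames[1:]):
        b0, b1 = keyframes[start], keyframes[end]
        for f in range(start, end):
            t = (f - start) / (end - start)  # 0.0 at start, approaching 1.0 at end
            result[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    if frames:
        result[frames[-1]] = keyframes[frames[-1]]
    return result


if __name__ == "__main__":
    # Hypothetical car drawn on frame 0 and frame 4; frames 1-3 are generated.
    keyframes = {0: (0.0, 0.0, 10.0, 10.0), 4: (8.0, 4.0, 10.0, 10.0)}
    for frame, box in interpolate_boxes(keyframes).items():
        print(frame, box)
```

Linear interpolation assumes steady motion between keyframes, so adding a keyframe wherever the object changes speed or direction keeps the generated boxes tight.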

If Agents are configured, pre-labels from AI models are already present when the annotator opens the task, so annotators correct predictions instead of drawing every label from scratch.

***

## Step 7: Review and QA

**In: Annotate — Review stage**

Tasks submitted by annotators move to the Review stage in your Workflow.

1. Reviewers open tasks from the Queue and inspect the labels.
2. For each task, the reviewer either:
   * **Approves** — the task moves to the next stage or completion
   * **Rejects** — the task is returned to the annotator with issue notes
3. Reviewers can raise **Issues** on specific labels or frames to flag problems for discussion.
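The approve/reject routing described above behaves like a small state machine. Here is a sketch of one for the Annotate → Review → Complete workflow used as the example in Step 5. The stage names, action strings, `Task` class, and `apply_action` function are hypothetical; real Workflows can contain more stages than this.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    ANNOTATE = "annotate"
    REVIEW = "review"
    COMPLETE = "complete"


# Hypothetical transition table for an Annotate -> Review -> Complete workflow.
TRANSITIONS = {
    (Stage.ANNOTATE, "submit"): Stage.REVIEW,
    (Stage.REVIEW, "approve"): Stage.COMPLETE,
    (Stage.REVIEW, "reject"): Stage.ANNOTATE,  # back to the annotator
}


@dataclass
class Task:
    name: str
    stage: Stage = Stage.ANNOTATE
    issues: list[str] = field(default_factory=list)


def apply_action(task: Task, action: str, note: str | None = None) -> Task:
    """Move a task to its next stage, recording the reviewer's note on rejection."""
    key = (task.stage, action)
    if key not in TRANSITIONS:
        raise ValueError(f"cannot {action!r} a task in stage {task.stage.value!r}")
    if action == "reject" and note:
        task.issues.append(note)
    task.stage = TRANSITIONS[key]
    return task


if __name__ == "__main__":
    task = Task("drive_01.mp4")
    apply_action(task, "submit")
    apply_action(task, "reject", note="Bounding box on frame 120 is too loose")
    apply_action(task, "submit")
    apply_action(task, "approve")
    print(task.stage, task.issues)
```

Keeping transitions in a table makes invalid moves, such as approving a task that was never submitted, fail loudly instead of silently skipping review.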

Monitor overall quality using the **Analytics** tab: track approval rates, rejection rates, time per task, and open issues by annotator or reviewer.
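The **Analytics** tab computes these figures for you. If you also track review outcomes elsewhere, the arithmetic is straightforward; the sketch below assumes a hypothetical list of review events with `annotator`, `outcome`, and `seconds` fields, which is not an Encord export format.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def review_summary(events: list[dict]) -> dict[str, dict[str, float]]:
    """Per-annotator approval rate, rejection rate, and mean seconds per task.

    Each event is assumed to look like
    {"annotator": str, "outcome": "approved" | "rejected", "seconds": float}.
    """
    outcomes: dict[str, Counter] = defaultdict(Counter)
    seconds: dict[str, list[float]] = defaultdict(list)
    for e in events:
        outcomes[e["annotator"]][e["outcome"]] += 1
        seconds[e["annotator"]].append(e["seconds"])

    summary = {}
    for annotator, counts in outcomes.items():
        reviewed = counts["approved"] + counts["rejected"]
        summary[annotator] = {
            "approval_rate": counts["approved"] / reviewed if reviewed else 0.0,
            "rejection_rate": counts["rejected"] / reviewed if reviewed else 0.0,
            "mean_seconds": sum(seconds[annotator]) / len(seconds[annotator]),
        }
    return summary


if __name__ == "__main__":
    events = [
        {"annotator": "ana", "outcome": "approved", "seconds": 30},
        {"annotator": "ana", "outcome": "rejected", "seconds": 50},
        {"annotator": "ben", "outcome": "approved", "seconds": 20},
    ]
    for name, stats in review_summary(events).items():
        print(name, stats)
```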

***

## Step 8: Export labels

**In: Annotate**

Once tasks are complete and approved:

1. Click **Export** in the Project.
2. Choose your export format:
   * **JSON** — Encord's native format, full fidelity
   * **COCO** — standard format for object detection and segmentation
3. Specify which workflow stage to export from (typically *Complete*).
4. Optionally save a **label version** for reproducibility.
5. Click **Export**.
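Encord exports COCO directly, so you don't need to convert labels yourself. To show what the COCO option produces, this sketch maps a hypothetical, flattened label record with normalized bounding boxes (`x`, `y`, `w`, `h` as fractions of the image size, top-left origin) onto COCO's `images`, `annotations`, and `categories` lists, where `bbox` is `[x, y, width, height]` in pixels. The input shape and the `to_coco` function are assumptions, not Encord's JSON export schema.

```python
from __future__ import annotations


def to_coco(records: list[dict]) -> dict:
    """Convert simplified label records into a COCO detection dictionary.

    Assumed record shape:
    {"file_name": str, "width": int, "height": int,
     "objects": [{"class": str, "bbox": {"x": float, "y": float, "w": float, "h": float}}]}
    with bbox values normalized to 0..1.
    """
    categories: dict[str, int] = {}
    images, annotations = [], []
    ann_id = 1
    for image_id, rec in enumerate(records, start=1):
        w, h = rec["width"], rec["height"]
        images.append({"id": image_id, "file_name": rec["file_name"], "width": w, "height": h})
        for obj in rec["objects"]:
            # Assign category ids in order of first appearance, starting at 1.
            cat_id = categories.setdefault(obj["class"], len(categories) + 1)
            b = obj["bbox"]
            bbox = [b["x"] * w, b["y"] * h, b["w"] * w, b["h"] * h]  # to pixels
            annotations.append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": cat_id,
                "bbox": bbox,
                "area": bbox[2] * bbox[3],
                "iscrowd": 0,
            })
            ann_id += 1
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in categories.items()],
    }
```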

Your labels are now ready for your training pipeline.

See [Export Labels](/platform-documentation/Annotate/annotate-projects/annotate-manage-annotation-projects#export-labels) for the full export workflow.

***

## Step 9: Train and evaluate

Take your exported labels into your training pipeline. After training and deploying your model, import its predictions back into **Active** to evaluate performance:

1. Import model predictions into Active.
2. Review automatic metrics (mAP, mAR, F1 Score).
3. Use embedding plots and filters to find where the model underperforms.
4. Create a new Collection of high-value samples — data where the model fails or is uncertain.
5. Send the Collection back to Annotate (Step 3) and repeat.
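Active reports these metrics automatically. To make the numbers less opaque, this sketch shows how precision, recall, and F1 fall out of matching predicted boxes to ground-truth boxes by intersection over union (IoU) for a single image and class. It assumes `(x, y, w, h)` boxes, greedy matching in descending confidence order, and an IoU threshold of 0.5, which are common conventions; Active's exact matching rules may differ, and mAP and mAR additionally average over confidence thresholds and classes, which is not shown.

```python
from __future__ import annotations

Box = tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(
    predictions: list[tuple[Box, float]], ground_truth: list[Box], iou_threshold: float = 0.5
) -> tuple[float, float, float]:
    """Greedily match predictions (highest confidence first) to unmatched ground truth."""
    matched: set[int] = set()
    tp = 0
    for box, _conf in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(ground_truth):
            if i in matched:
                continue
            score = iou(box, gt)
            if score > best_iou:
                best_iou, best_idx = score, i
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Unmatched predictions are false positives and unmatched ground-truth boxes are false negatives; samples with many of either are good candidates for the next Collection.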

This loop — curate, annotate, train, evaluate, repeat — is the core of applied AI development.

***

## What's next

* [Data Lifecycle](/solutions-documentation/applied-ai/data-lifecycle) — a deeper explanation of each stage
* [Annotation and Curation](/solutions-documentation/applied-ai/annotation-and-curation) — annotation tools, AI assistance, and curation workflows
* [Active Overview](/platform-documentation/Validation/active-how-to/active-model-predictions-eval) — model evaluation and active learning
* [SDK documentation](/sdk-documentation) — automate any step in this workflow programmatically