Annotation and Curation

Annotation and curation are the core activities in any applied AI data pipeline. This page explains how Encord supports both — from labeling individual assets to curating representative datasets at scale.

Annotation

Supported data types

Encord Annotate supports labeling across the following data modalities:

Modality	Annotation types
Images	Bounding boxes, polygons, polylines, keypoints, bitmasks, object primitives, classifications
Video	All image annotation types, plus object tracking across frames
Audio	Transcription, sequence labeling, classification
Text / Documents	Entity recognition, classification, structured extraction
DICOM	Bounding boxes, polygons, and classifications on medical imaging series
Point clouds	3D bounding boxes, cuboids, and segmentation

Annotation tools

The Encord Label Editor provides purpose-built tools for each data type:

SAM 2 (Segment Anything Model 2) — segment and classify objects in images and video with a single click; up to 10x faster than manual polygon drawing
Interpolation — automatically propagate annotations across video frames using object tracking; up to 6x faster video labeling
Bitmask brushes — freehand pixel-level segmentation for irregular shapes
Polyline and polygon tools — precise boundary annotation for objects with defined edges
Keypoint tools — skeletal and pose annotation for human and object tracking
Classification tools — frame-level and object-level attribute assignment
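To make the idea of propagating annotations between frames concrete, here is a minimal sketch of keyframe interpolation for a bounding box. It uses plain linear interpolation, which is a simplification and not a description of how the Label Editor tracks objects internally. The `Box` type and the `interpolate_boxes` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical type for this sketch)."""
    x: float
    y: float
    w: float
    h: float


def interpolate_boxes(start_frame: int, start: Box, end_frame: int, end: Box) -> dict[int, Box]:
    """Linearly interpolate a box for every frame between two labeled keyframes."""
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")
    span = end_frame - start_frame
    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[frame] = Box(
            x=start.x + t * (end.x - start.x),
            y=start.y + t * (end.y - start.y),
            w=start.w + t * (end.w - start.w),
            h=start.h + t * (end.h - start.h),
        )
    return boxes


# Two manually labeled keyframes, three frames filled in between them.
track = interpolate_boxes(0, Box(0, 0, 10, 10), 4, Box(40, 20, 10, 10))
```

An annotator would label only frames 0 and 4 here; the frames in between are generated and can then be corrected where the object's motion is not linear.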

AI-assisted labeling

Encord integrates with leading AI models to automate and accelerate the labeling process:

SAM 3 — interactive segmentation, built natively into the Label Editor
GPT-4o and Gemini — multimodal pre-labeling, classification, and structured extraction
LLaMA 3.2 — text and document processing
YOLO and custom models — import your own model predictions as pre-labels for human review

Agents allow you to configure automated pre-labeling pipelines that run before tasks reach human annotators. See Agent Configuration for setup.
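Before model predictions can be used as pre-labels, they usually have to be converted into the coordinate convention of the labeling tool. As an illustration, the sketch below converts one line of a YOLO-format text prediction (class id, then normalized center x, center y, width, height, and optionally a confidence score) into a top-left pixel box. The output dictionary layout and the function name `yolo_line_to_box` are assumptions made for this example, not Encord's import format.

```python
from typing import Optional, TypedDict


class PixelBox(TypedDict):
    """Hypothetical pre-label record; the field names are assumptions."""
    class_id: int
    x: float
    y: float
    w: float
    h: float
    confidence: Optional[float]


def yolo_line_to_box(line: str, image_width: int, image_height: int) -> PixelBox:
    """Convert 'class cx cy w h [conf]' (normalized) to a top-left pixel box."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(parts)}")
    cx, cy, w, h = (float(value) for value in parts[1:5])
    return {
        "class_id": int(parts[0]),
        "x": (cx - w / 2) * image_width,   # left edge in pixels
        "y": (cy - h / 2) * image_height,  # top edge in pixels
        "w": w * image_width,
        "h": h * image_height,
        "confidence": float(parts[5]) if len(parts) > 5 else None,
    }


box = yolo_line_to_box("0 0.5 0.5 0.25 0.5 0.9", image_width=640, image_height=480)
```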

Review and quality control

Annotation quality is enforced through configurable review workflows:

Annotators complete a task and submit it
The task moves to a Review stage, where a reviewer approves or rejects it
Rejected tasks are returned to the annotator with issue notes
Once approved, tasks move to the next stage or completion
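The review steps above can be read as a small state machine. The following sketch models it with a single review stage that leads straight to completion; real workflows are configurable and may contain further stages. The `Stage` and `Task` names are hypothetical and do not correspond to Encord SDK classes.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    ANNOTATE = "annotate"
    REVIEW = "review"
    COMPLETE = "complete"


class Task:
    """A labeling task moving through annotate -> review -> complete."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.stage = Stage.ANNOTATE
        self.issue_notes: list[str] = []

    def submit(self) -> None:
        """Annotator finishes the task and sends it to review."""
        if self.stage is not Stage.ANNOTATE:
            raise ValueError("only tasks in the annotate stage can be submitted")
        self.stage = Stage.REVIEW

    def review(self, approved: bool, note: Optional[str] = None) -> None:
        """Reviewer approves the task or rejects it with an issue note."""
        if self.stage is not Stage.REVIEW:
            raise ValueError("only tasks in the review stage can be reviewed")
        if approved:
            self.stage = Stage.COMPLETE
        else:
            if note:
                self.issue_notes.append(note)
            self.stage = Stage.ANNOTATE  # returned to the annotator


task = Task("task-1")
task.submit()
task.review(approved=False, note="Box on frame 12 is too loose")
task.submit()
task.review(approved=True)
```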

Issues can be raised on specific labels or frames to flag ambiguities, errors, or edge cases. Issues are tracked within the project and can be assigned for resolution.

Consensus labeling assigns the same task to multiple annotators independently. Disagreements are surfaced automatically, enabling inter-annotator agreement measurement and conflict resolution. See Consensus Workflows.
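As a rough illustration of what measuring agreement can look like for a single classification task, the sketch below computes the fraction of annotator pairs that chose the same label, plus a majority vote that reports ties as unresolved. This is one simple agreement measure among many and is not necessarily the one Encord uses; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Optional


def pairwise_agreement(labels: dict[str, str]) -> float:
    """Fraction of annotator pairs that assigned the same label to a task."""
    pairs = list(combinations(labels.values(), 2))
    if not pairs:
        raise ValueError("at least two annotators are required")
    return sum(a == b for a, b in pairs) / len(pairs)


def majority_label(labels: dict[str, str]) -> Optional[str]:
    """Return the most common label, or None when the top labels are tied."""
    counts = Counter(labels.values()).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: needs manual conflict resolution
    return counts[0][0]


# Annotator id -> label chosen for the same task (made-up example data).
votes = {"annotator_a": "cat", "annotator_b": "cat", "annotator_c": "dog"}
agreement = pairwise_agreement(votes)
winner = majority_label(votes)
```

A task with low agreement or no majority is a natural candidate for an issue or an expert review pass.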

Curation

What is data curation?

Curation is the process of selecting, organizing, and filtering your data to ensure your training set is representative, balanced, and high quality. Poor curation leads to biased models, missed edge cases, and wasted annotation effort. Encord Index provides the curation tools applied AI teams need to work with large datasets before and after annotation.

Exploring and filtering data

Index gives you multiple ways to explore and segment your data:

Embedding plots — visualize your dataset in 2D embedding space to identify clusters, outliers, and coverage gaps
Natural language search — find images or video frames matching a text description using semantic search
Metadata filtering — filter by custom metadata fields, quality metrics, file properties, or annotation status
Similarity search — find visually similar items to a selected sample
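Similarity search over embeddings typically comes down to ranking items by a distance or similarity measure. The sketch below ranks items by cosine similarity to a selected sample using only the standard library. The embedding values, item ids, and function names are made up for illustration, and the similarity measure Index actually uses is not stated here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_id: str, embeddings: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k items most similar to the query item, best match first."""
    query = embeddings[query_id]
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in embeddings.items()
        if item_id != query_id  # never return the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2D embeddings; real embeddings have hundreds of dimensions.
embeddings = {"img_a": [1.0, 0.0], "img_b": [0.9, 0.1], "img_c": [0.0, 1.0]}
neighbors = most_similar("img_a", embeddings, k=2)
```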

Detecting data quality issues

Encord computes off-the-shelf quality metrics automatically for your data:

Duplicate detection — identify near-identical images that inflate dataset size without adding diversity
Outlier detection — surface anomalous or low-quality samples (blur, overexposure, corrupt frames)
Class imbalance — visualize the distribution of annotated classes to identify underrepresented categories
Label error detection — compare annotations against model predictions to surface likely labeling mistakes
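Two of the checks above are easy to sketch in a few lines: the class distribution of a labeled set, and a simple label error heuristic that flags items where a confident model prediction disagrees with the human annotation. The data layout, the `min_confidence` threshold, and the function names are assumptions for this example rather than Encord's metric definitions.

```python
from collections import Counter


def class_distribution(labels: list[str]) -> dict[str, float]:
    """Share of each class among the annotated labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def likely_label_errors(
    annotations: dict[str, str],
    predictions: dict[str, tuple[str, float]],
    min_confidence: float = 0.9,
) -> list[str]:
    """Flag items where a confident prediction contradicts the annotation."""
    flagged = []
    for item_id, label in annotations.items():
        if item_id not in predictions:
            continue  # no prediction to compare against
        predicted_label, confidence = predictions[item_id]
        if predicted_label != label and confidence >= min_confidence:
            flagged.append(item_id)
    return flagged


distribution = class_distribution(["car", "car", "car", "bike"])
suspects = likely_label_errors(
    annotations={"img_1": "car", "img_2": "bike", "img_3": "car"},
    predictions={"img_1": ("car", 0.98), "img_2": ("car", 0.95), "img_3": ("bike", 0.55)},
)
```

Flagged items are candidates for review, not confirmed mistakes: the model can be confidently wrong too.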

Collections

A Collection is a saved subset of your data — a named group of data units selected through filtering, search, or manual selection. Collections are the primary unit of action in Encord’s curation workflow:

Send a collection to Annotate as a new annotation batch
Export a collection as a training dataset
Re-index a collection to refresh embeddings after changes
Share a collection with teammates for review or discussion

See Collections for full documentation.

Closing the loop

Curation is not a one-time activity. As models improve and production data changes, you need to continuously:

Import model predictions back into Encord Active
Identify failure modes — classes or conditions where the model underperforms
Surface the data associated with those failures
Create a collection of high-value samples for re-annotation
Send that collection to Annotate
Retrain and evaluate

This loop is the core of applied AI data operations. See Data Lifecycle for the full picture.
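The failure-mode steps of this loop can be sketched as a small selection routine: compute a per-class error rate from evaluation records, pick the classes that exceed a threshold, and gather the misclassified items from those classes as a re-annotation candidate set. The record layout, the `max_error_rate` threshold, and the function names are hypothetical; in practice the selection would be done with the platform's own filters and saved as a collection.

```python
from collections import defaultdict

# Each record is (item_id, ground_truth_label, predicted_label).
Record = tuple[str, str, str]


def per_class_error_rate(records: list[Record]) -> dict[str, float]:
    """Fraction of misclassified items for each ground-truth class."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for _, truth, predicted in records:
        totals[truth] += 1
        if predicted != truth:
            errors[truth] += 1
    return {name: errors[name] / totals[name] for name in totals}


def select_for_reannotation(records: list[Record], max_error_rate: float = 0.2) -> list[str]:
    """Item ids of misclassified samples from classes above the error threshold."""
    rates = per_class_error_rate(records)
    weak_classes = {name for name, rate in rates.items() if rate > max_error_rate}
    return [
        item_id
        for item_id, truth, predicted in records
        if truth in weak_classes and predicted != truth
    ]


records = [
    ("img_a", "car", "car"),
    ("img_b", "car", "car"),
    ("img_c", "bike", "car"),
    ("img_d", "bike", "bike"),
]
rates = per_class_error_rate(records)
to_relabel = select_for_reannotation(records)
```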