Annotation and curation are the core activities in any applied AI data pipeline. This page explains how Encord supports both — from labeling individual assets to curating representative datasets at scale.

Annotation

Supported data types

Encord Annotate supports labeling across all major data modalities:
Modality | Annotation types
Images | Bounding boxes, polygons, polylines, keypoints, bitmasks, object primitives, classifications
Video | All image types, plus object tracking across frames
Audio | Transcription, sequence labeling, classification
Text / Documents | Entity recognition, classification, structured extraction
DICOM | Bounding boxes, polygons, and classifications on medical imaging series
Point clouds | 3D bounding boxes, cuboids, and segmentation

Annotation tools

The Encord Label Editor provides purpose-built tools for each data type:
  • SAM 2 (Segment Anything Model 2) — segment and classify objects in images and video with a single click; up to 10x faster than manual polygon drawing
  • Interpolation — automatically propagate annotations across video frames using object tracking; up to 6x faster video labeling
  • Bitmask brushes — freehand pixel-level segmentation for irregular shapes
  • Polyline and polygon tools — precise boundary annotation for objects with defined edges
  • Keypoint tools — skeletal and pose annotation for human and object tracking
  • Classification tools — frame-level and object-level attribute assignment
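To make the idea of propagating annotations between frames concrete, here is a minimal sketch that linearly interpolates a bounding box between two keyframes. This is a simplification for illustration only: the Label Editor's interpolation relies on object tracking, and the `Box` type and `interpolate_boxes` function are hypothetical names, not part of any Encord API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical type for this sketch)."""
    x: float
    y: float
    w: float
    h: float


def interpolate_boxes(start_frame: int, start: Box, end_frame: int, end: Box) -> dict[int, Box]:
    """Linearly interpolate a box for every frame between two keyframes, inclusive."""
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")
    span = end_frame - start_frame
    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[frame] = Box(
            x=start.x + t * (end.x - start.x),
            y=start.y + t * (end.y - start.y),
            w=start.w + t * (end.w - start.w),
            h=start.h + t * (end.h - start.h),
        )
    return boxes


# Two manually labeled keyframes; frames 1 to 3 are filled in automatically.
boxes = interpolate_boxes(0, Box(10, 20, 50, 40), 4, Box(30, 20, 50, 60))
print(boxes[2])  # Box(x=20.0, y=20.0, w=50.0, h=50.0)
```

Linear interpolation only works well for smooth motion between keyframes; tracking-based propagation handles direction changes and occlusion that this sketch does not.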

AI-assisted labeling

Encord integrates with leading AI models to automate and accelerate the labeling process:
  • SAM 2 — interactive segmentation, built natively into the Label Editor
  • GPT-4o and Gemini — multimodal pre-labeling, classification, and structured extraction
  • LLaMA 3.2 — text and document processing
  • YOLO and custom models — import your own model predictions as pre-labels for human review
Task Agents and Editor Agents allow you to configure automated pre-labeling pipelines that run before tasks reach human annotators. See Agent Configuration for setup.
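As an illustration of preparing your own model predictions as pre-labels, the sketch below converts one line of a YOLO-style text prediction (class index, then normalized center x, center y, width, height) into a top-left-anchored box record. The optional sixth confidence column, the output dictionary layout, and the `yolo_line_to_prelabel` name are assumptions made for this example; they are not Encord's import format.

```python
from typing import Optional


def yolo_line_to_prelabel(
    line: str, class_names: list[str], min_confidence: float = 0.5
) -> Optional[dict]:
    """Convert 'class cx cy w h [conf]' (normalized) into a hypothetical pre-label record."""
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(parts)}")
    class_id = int(parts[0])
    cx, cy, w, h = (float(value) for value in parts[1:5])
    # Assumption: a missing confidence column is treated as full confidence.
    confidence = float(parts[5]) if len(parts) == 6 else 1.0
    if confidence < min_confidence:
        return None  # too uncertain to be worth a reviewer's time
    return {
        "class": class_names[class_id],
        "x": cx - w / 2,  # convert the box center to its top-left corner
        "y": cy - h / 2,
        "w": w,
        "h": h,
        "confidence": confidence,
        "needs_review": True,  # pre-labels are still checked by a human
    }


names = ["car", "pedestrian"]
prelabel = yolo_line_to_prelabel("1 0.5 0.5 0.25 0.5 0.9", names)
skipped = yolo_line_to_prelabel("0 0.5 0.5 0.25 0.5 0.2", names)
print(prelabel, skipped)
```

Dropping low-confidence predictions before review is a design choice: it trades some recall for less clean-up work by annotators.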

Review and quality control

Annotation quality is enforced through configurable review workflows:
  1. Annotators complete a task and submit it
  2. The task moves to a Review stage, where a reviewer approves or rejects it
  3. Rejected tasks are returned to the annotator with issue notes
  4. Once approved, tasks move to the next stage or completion
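The four steps above can be read as a small state machine. The sketch below models it in plain Python to show which transitions are allowed; the `Stage` and `Task` names are hypothetical and do not reflect how Encord implements or exposes workflows.

```python
from enum import Enum


class Stage(Enum):
    ANNOTATE = "annotate"
    REVIEW = "review"
    COMPLETE = "complete"


class Task:
    """A labeling task moving through a hypothetical annotate -> review -> complete workflow."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.stage = Stage.ANNOTATE
        self.issue_notes: list[str] = []

    def _require(self, expected: Stage) -> None:
        if self.stage is not expected:
            raise ValueError(
                f"task {self.task_id} is in {self.stage.value}, expected {expected.value}"
            )

    def submit(self) -> None:
        """Annotator finishes the task and sends it to review."""
        self._require(Stage.ANNOTATE)
        self.stage = Stage.REVIEW

    def approve(self) -> None:
        """Reviewer accepts the labels."""
        self._require(Stage.REVIEW)
        self.stage = Stage.COMPLETE

    def reject(self, note: str) -> None:
        """Reviewer returns the task to the annotator with an issue note."""
        self._require(Stage.REVIEW)
        self.issue_notes.append(note)
        self.stage = Stage.ANNOTATE


task = Task("task-001")
task.submit()
task.reject("Box on frame 12 misses the left wheel")
task.submit()
task.approve()
print(task.stage.value, task.issue_notes)
```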
Issues can be raised on specific labels or frames to flag ambiguities, errors, or edge cases. Issues are tracked within the project and can be assigned for resolution.

Consensus labeling assigns the same task to multiple annotators independently. Disagreements are surfaced automatically, enabling inter-annotator agreement measurement and conflict resolution. See Consensus Workflows.
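One common way to quantify agreement between annotators on a bounding box is intersection over union (IoU). The sketch below flags annotator pairs whose boxes overlap less than a threshold. It is an illustrative assumption about how disagreement could be measured, not a description of Encord's consensus scoring; `iou`, `find_disagreements`, and the 0.5 threshold are chosen for this example.

```python
from itertools import combinations

Box = tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    overlap_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    overlap_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = overlap_w * overlap_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def find_disagreements(
    boxes_by_annotator: dict[str, Box], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return annotator pairs whose boxes for the same object have IoU below the threshold."""
    flagged = []
    for first, second in combinations(sorted(boxes_by_annotator), 2):
        score = iou(boxes_by_annotator[first], boxes_by_annotator[second])
        if score < threshold:
            flagged.append((first, second, score))
    return flagged


labels = {
    "alice": (0, 0, 10, 10),
    "bob": (0, 0, 10, 10),
    "carol": (5, 5, 10, 10),
}
for first, second, score in find_disagreements(labels):
    print(f"{first} vs {second}: IoU {score:.2f}")
```

For classification labels, a chance-corrected statistic such as Cohen's kappa is a more usual choice than IoU.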

Curation

What is data curation?

Curation is the process of selecting, organizing, and filtering your data to ensure your training set is representative, balanced, and high quality. Poor curation leads to biased models, missed edge cases, and wasted annotation effort. Encord Index provides the curation tools applied AI teams need to work with large datasets before and after annotation.

Exploring and filtering data

Index gives you multiple ways to explore and segment your data:
  • Embedding plots — visualize your dataset in 2D embedding space to identify clusters, outliers, and coverage gaps
  • Natural language search — find images or video frames matching a text description using semantic search
  • Metadata filtering — filter by custom metadata fields, quality metrics, file properties, or annotation status
  • Similarity search — find items that are visually similar to a selected sample
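Similarity search over embeddings usually comes down to ranking items by a distance or similarity measure. The sketch below uses cosine similarity on tiny hand-written vectors to show the idea. The embedding values, item IDs, and function names are made up for illustration, and nothing here reflects how Index computes or stores embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    query_id: str, embeddings: dict[str, list[float]], k: int = 2
) -> list[tuple[str, float]]:
    """Return the k items most similar to the query item, best match first."""
    query = embeddings[query_id]
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in embeddings.items()
        if item_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2D embeddings; real image embeddings have hundreds of dimensions.
embeddings = {
    "img_a": [1.0, 0.0],
    "img_b": [0.9, 0.1],
    "img_c": [0.0, 1.0],
    "img_d": [-1.0, 0.0],
}
neighbors = most_similar("img_a", embeddings, k=2)
print([item_id for item_id, _ in neighbors])  # ['img_b', 'img_c']
```

A brute-force scan like this is fine for small sets; large datasets typically need an approximate nearest-neighbor index.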

Detecting data quality issues

Encord computes off-the-shelf quality metrics automatically for your data:
  • Duplicate detection — identify near-identical images that inflate dataset size without adding diversity
  • Outlier detection — surface anomalous or low-quality samples (blur, overexposure, corrupt frames)
  • Class imbalance — visualize the distribution of annotated classes to identify underrepresented categories
  • Label error detection — compare annotations against model predictions to surface likely labeling mistakes
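Two of the checks above are simple enough to sketch in a few lines: finding underrepresented classes from label counts, and flagging items where a confident model prediction contradicts the annotation. The thresholds, data layout, and function names below are assumptions for illustration and are not Encord's metric definitions.

```python
from collections import Counter


def underrepresented_classes(labels: list[str], min_share: float = 0.1) -> list[str]:
    """Return classes that make up less than min_share of all labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sorted(name for name, count in counts.items() if count / total < min_share)


def likely_label_errors(
    annotations: dict[str, str],
    predictions: dict[str, tuple[str, float]],
    min_confidence: float = 0.9,
) -> list[str]:
    """Return item IDs where a confident prediction disagrees with the human label."""
    flagged = []
    for item_id, label in annotations.items():
        if item_id not in predictions:
            continue
        predicted_class, confidence = predictions[item_id]
        if predicted_class != label and confidence >= min_confidence:
            flagged.append(item_id)
    return sorted(flagged)


labels = ["car"] * 18 + ["truck"] * 1 + ["bus"] * 1
print(underrepresented_classes(labels))  # ['bus', 'truck']

annotations = {"img_1": "car", "img_2": "truck", "img_3": "car"}
predictions = {
    "img_1": ("car", 0.98),
    "img_2": ("car", 0.95),  # confident disagreement: worth a second look
    "img_3": ("bus", 0.40),  # disagreement, but the model is unsure
}
print(likely_label_errors(annotations, predictions))  # ['img_2']
```

A flagged item is only a candidate: the model may be the one that is wrong, so these lists feed a review queue instead of automatic correction.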

Collections

A Collection is a saved subset of your data — a named group of data units selected through filtering, search, or manual selection. Collections are the primary unit of action in Encord’s curation workflow:
  • Send a collection to Annotate as a new annotation batch
  • Export a collection as a training dataset
  • Re-index a collection to refresh embeddings after changes
  • Share a collection with teammates for review or discussion
See Collections for full documentation.
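Conceptually, a collection is a name plus the IDs of the data units that matched some selection. The sketch below models that with a plain dataclass and a metadata filter. The `Collection` class, the `build_collection` helper, and the metadata fields are all hypothetical and are not the Encord SDK.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Collection:
    """A named, saved subset of data units (hypothetical model for this sketch)."""
    name: str
    data_unit_ids: list[str] = field(default_factory=list)


def build_collection(
    name: str, data_units: list[dict], predicate: Callable[[dict], bool]
) -> Collection:
    """Select the data units that satisfy the predicate and store their IDs."""
    return Collection(name, [unit["id"] for unit in data_units if predicate(unit)])


# Made-up data units with made-up metadata fields.
data_units = [
    {"id": "img_1", "blur": 0.05, "annotated": False, "site": "warehouse"},
    {"id": "img_2", "blur": 0.70, "annotated": False, "site": "warehouse"},
    {"id": "img_3", "blur": 0.10, "annotated": True, "site": "warehouse"},
    {"id": "img_4", "blur": 0.02, "annotated": False, "site": "dock"},
]

# Sharp, unlabeled warehouse images: a candidate annotation batch.
batch = build_collection(
    "warehouse-unlabeled-sharp",
    data_units,
    lambda unit: unit["site"] == "warehouse" and not unit["annotated"] and unit["blur"] < 0.3,
)
print(batch)
```

Storing IDs instead of copies keeps a collection cheap to create and share, which matches its role as a handle for later actions such as export or annotation.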

Closing the loop

Curation is not a one-time activity. As models improve and production data changes, you need to continuously:
  1. Import model predictions back into Encord Active
  2. Identify failure modes — classes or conditions where the model underperforms
  3. Surface the data associated with those failures
  4. Create a collection of high-value samples for re-annotation
  5. Send that collection to Annotate
  6. Retrain and evaluate
This loop is the core of applied AI data operations. See Data Lifecycle for the full picture.
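Steps 2 to 4 of the loop can be sketched as: compute a per-class metric from predictions, pick the classes that fall below a target, and gather the misclassified samples for re-annotation. The record layout, the 0.8 recall target, and the function names are assumptions for illustration only.

```python
from collections import defaultdict

Record = tuple[str, str, str]  # (item_id, true_class, predicted_class)


def per_class_recall(records: list[Record]) -> dict[str, float]:
    """Fraction of items of each true class that the model predicted correctly."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for _, true_class, predicted_class in records:
        totals[true_class] += 1
        if predicted_class == true_class:
            hits[true_class] += 1
    return {name: hits[name] / totals[name] for name in totals}


def samples_for_reannotation(records: list[Record], min_recall: float = 0.8) -> list[str]:
    """Collect misclassified items from classes whose recall is below the target."""
    recall = per_class_recall(records)
    weak_classes = {name for name, value in recall.items() if value < min_recall}
    return sorted(
        item_id
        for item_id, true_class, predicted_class in records
        if true_class in weak_classes and predicted_class != true_class
    )


records = [
    ("img_1", "car", "car"),
    ("img_2", "car", "car"),
    ("img_3", "car", "car"),
    ("img_4", "pedestrian", "pedestrian"),
    ("img_5", "pedestrian", "car"),
    ("img_6", "pedestrian", "cyclist"),
]
print(per_class_recall(records))
print(samples_for_reannotation(records))  # ['img_5', 'img_6']
```

The resulting ID list plays the role of the high-value collection in step 4; in practice you would also pull in visually similar unlabeled data, not only the known failures.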
