Annotation
Supported data types
Encord Annotate supports labeling across all major data modalities:

| Modality | Annotation types |
|---|---|
| Images | Bounding boxes, polygons, polylines, keypoints, bitmasks, object primitives, classifications |
| Video | All image types, plus object tracking across frames |
| Audio | Transcription, sequence labeling, classification |
| Text / Documents | Entity recognition, classification, structured extraction |
| DICOM | Bounding boxes, polygons, and classifications on medical imaging series |
| Point clouds | 3D bounding boxes, cuboids, and segmentation |
Annotation tools
The Encord Label Editor provides purpose-built tools for each data type:
- SAM 2 (Segment Anything Model 2) — segment and classify objects in images and video with a single click; up to 10x faster than manual polygon drawing
- Interpolation — automatically propagate annotations across video frames using object tracking; up to 6x faster video labeling
- Bitmask brushes — freehand pixel-level segmentation for irregular shapes
- Polyline and polygon tools — precise boundary annotation for objects with defined edges
- Keypoint tools — skeletal and pose annotation for human and object tracking
- Classification tools — frame-level and object-level attribute assignment
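To make the idea of propagating annotations between frames concrete, here is a minimal sketch that fills in boxes between two manually annotated keyframes using plain linear interpolation. It is an illustration only, not Encord's implementation (which the list above describes as tracking-based). The `Box` type and `interpolate_boxes` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical schema)."""
    x: float
    y: float
    w: float
    h: float


def interpolate_boxes(start_frame: int, start: Box, end_frame: int, end: Box) -> dict:
    """Return a box for every frame from start_frame to end_frame, inclusive.

    Each coordinate moves in a straight line between the two keyframes.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")

    span = end_frame - start_frame
    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[frame] = Box(
            x=start.x + t * (end.x - start.x),
            y=start.y + t * (end.y - start.y),
            w=start.w + t * (end.w - start.w),
            h=start.h + t * (end.h - start.h),
        )
    return boxes
```

With keyframes at frames 0 and 4, the annotator draws two boxes and the three frames in between are filled in automatically. Linear interpolation only suits roughly constant motion; faster or erratic objects need more keyframes or a real tracker.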
AI-assisted labeling
Encord integrates with leading AI models to automate and accelerate the labeling process:
- SAM 2 — interactive segmentation, built natively into the Label Editor
- GPT-4o and Gemini — multimodal pre-labeling, classification, and structured extraction
- LLaMA 3.2 — text and document processing
- YOLO and custom models — import your own model predictions as pre-labels for human review
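Importing your own predictions as pre-labels usually starts with a format conversion. The sketch below parses one line of a YOLO-style text file (`class cx cy w h`, optionally followed by a confidence, all coordinates normalized to 0-1) and converts the center-based box to a top-left-based box. The output dictionary is a hypothetical intermediate schema, not the format Encord expects; consult the Encord SDK documentation for the actual import calls.

```python
def yolo_line_to_prelabel(line: str, class_names: list) -> dict:
    """Convert one YOLO-style prediction line into a top-left normalized box.

    Expected input: "class_id cx cy w h [confidence]" with values in 0-1.
    The returned keys are an assumed schema for illustration only.
    """
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(parts)}: {line!r}")

    class_id = int(parts[0])
    cx, cy, w, h = (float(value) for value in parts[1:5])
    confidence = float(parts[5]) if len(parts) == 6 else None

    return {
        "label": class_names[class_id],
        "x": cx - w / 2,  # left edge, normalized
        "y": cy - h / 2,  # top edge, normalized
        "w": w,
        "h": h,
        "confidence": confidence,
    }
```

Keeping the confidence alongside each box lets reviewers prioritize low-confidence pre-labels during human review.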
Review and quality control
Annotation quality is enforced through configurable review workflows:
- Annotators complete a task and submit it
- The task moves to a Review stage, where a reviewer approves or rejects it
- Rejected tasks are returned to the annotator with issue notes
- Once approved, tasks move to the next stage or completion
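The review loop above can be pictured as a small state machine. The following is a minimal sketch assuming a single review stage. `Stage`, `Task`, `submit`, and `review` are hypothetical names for illustration and do not correspond to Encord's workflow API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stage(Enum):
    ANNOTATE = "annotate"
    REVIEW = "review"
    COMPLETE = "complete"


@dataclass
class Task:
    id: str
    stage: Stage = Stage.ANNOTATE
    issues: list = field(default_factory=list)  # notes left by reviewers


def submit(task: Task) -> None:
    """Annotator finishes the task and sends it to review."""
    if task.stage is not Stage.ANNOTATE:
        raise ValueError(f"task {task.id} is not in the annotate stage")
    task.stage = Stage.REVIEW


def review(task: Task, approved: bool, issue: Optional[str] = None) -> None:
    """Reviewer approves the task or rejects it back to the annotator."""
    if task.stage is not Stage.REVIEW:
        raise ValueError(f"task {task.id} is not in the review stage")
    if approved:
        task.stage = Stage.COMPLETE
    else:
        if issue:
            task.issues.append(issue)
        task.stage = Stage.ANNOTATE
```

A rejected task returns to `Stage.ANNOTATE` with its issue notes attached, so the annotator can see what to fix before resubmitting. A multi-stage workflow would replace `Stage.COMPLETE` with the next configured stage.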
Curation
What is data curation?
Curation is the process of selecting, organizing, and filtering your data to ensure your training set is representative, balanced, and high quality. Poor curation leads to biased models, missed edge cases, and wasted annotation effort. Encord Index provides the curation tools applied AI teams need to work with large datasets before and after annotation.
Exploring and filtering data
Index gives you multiple ways to explore and segment your data:
- Embedding plots — visualize your dataset in 2D embedding space to identify clusters, outliers, and coverage gaps
- Natural language search — find images or video frames matching a text description using semantic search
- Metadata filtering — filter by custom metadata fields, quality metrics, file properties, or annotation status
- Similarity search — find visually similar items to a selected sample
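Similarity search is commonly built on embedding vectors: items whose embeddings point in similar directions are treated as visually similar. The sketch below ranks items by cosine similarity to a selected sample. It assumes embeddings are already available as plain lists of floats keyed by item ID; the function names are hypothetical and this is not a description of how Index computes results internally.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, embeddings: dict, k: int = 3) -> list:
    """Return up to k (item_id, score) pairs most similar to the query item."""
    query = embeddings[query_id]
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in embeddings.items()
        if item_id != query_id  # do not return the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

This brute-force scan is fine for small sets; at dataset scale an approximate nearest-neighbor index would typically replace the linear search.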
Detecting data quality issues
Encord computes off-the-shelf quality metrics automatically for your data:
- Duplicate detection — identify near-identical images that inflate dataset size without adding diversity
- Outlier detection — surface anomalous or low-quality samples (blur, overexposure, corrupt frames)
- Class imbalance — visualize the distribution of annotated classes to identify underrepresented categories
- Label error detection — compare annotations against model predictions to surface likely labeling mistakes
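As an illustration of label error detection, the sketch below compares each annotated box with a model-predicted box and flags items where a confident prediction disagrees with the annotation, either on class or on location (measured by intersection over union). The data layout, thresholds, and function names are assumptions made for this example, not Encord's metric definitions.

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def likely_label_errors(pairs: list, min_confidence: float = 0.8, min_iou: float = 0.5) -> list:
    """Return IDs of items where a confident prediction contradicts the annotation.

    Each element of pairs is (item_id, annotation, prediction), where
    annotation has keys "label" and "box", and prediction additionally
    has "confidence". This layout is hypothetical.
    """
    flagged = []
    for item_id, annotation, prediction in pairs:
        if prediction["confidence"] < min_confidence:
            continue  # ignore predictions the model itself is unsure about
        overlap = iou(annotation["box"], prediction["box"])
        if annotation["label"] != prediction["label"] or overlap < min_iou:
            flagged.append(item_id)
    return flagged
```

Flagged items are candidates for human review, not confirmed mistakes: the model may be the one that is wrong.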
Collections
A Collection is a saved subset of your data — a named group of data units selected through filtering, search, or manual selection. Collections are the primary unit of action in Encord’s curation workflow:
- Send a collection to Annotate as a new annotation batch
- Export a collection as a training dataset
- Re-index a collection to refresh embeddings after changes
- Share a collection with teammates for review or discussion
Closing the loop
Curation is not a one-time activity. As models improve and production data changes, you need to continuously:
- Import model predictions back into Active
- Identify failure modes — classes or conditions where the model underperforms
- Surface the data associated with those failures
- Create a collection of high-value samples for re-annotation
- Send that collection to Annotate
- Retrain and evaluate
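The middle steps of this loop (finding failure modes and surfacing the data behind them) can be sketched for a simple classification case. The example below computes per-class accuracy from evaluation results, picks out classes that fall below a threshold, and collects the misclassified sample IDs as candidates for a re-annotation collection. The record layout and function names are hypothetical, and the threshold is an arbitrary illustrative value.

```python
from collections import defaultdict


def underperforming_classes(results: list, min_accuracy: float = 0.8) -> dict:
    """Map each class whose accuracy is below min_accuracy to that accuracy.

    Each result is a dict with keys "id", "true", and "pred" (assumed layout).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in results:
        totals[record["true"]] += 1
        if record["pred"] == record["true"]:
            correct[record["true"]] += 1

    accuracy = {label: correct[label] / totals[label] for label in totals}
    return {label: acc for label, acc in accuracy.items() if acc < min_accuracy}


def samples_for_reannotation(results: list, weak_classes) -> list:
    """Return IDs of misclassified samples belonging to the weak classes."""
    return [
        record["id"]
        for record in results
        if record["true"] in weak_classes and record["pred"] != record["true"]
    ]
```

The resulting ID list is what you would save as a collection and send to Annotate before retraining.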
Where to go next
- Annotate Overview — full annotation platform documentation
- Index Overview — data management and curation documentation
- Active Overview — model evaluation and active learning
- Data Lifecycle — the full data pipeline from ingestion to export
- End-to-End Walkthrough — a complete example

