Platform Architecture

Encord is a cloud-native platform built to handle petabytes of multimodal data and large annotation workforces. This page explains the core architectural components, how your data moves through the system, and which deployment model is right for your organization.

Core platform components

Encord consists of four tightly integrated product areas:

Component	Purpose
Index	Data ingestion, organization, search, and curation
Annotate	Annotation, review, QA workflows, and human feedback
Active	Model evaluation, analytics, and active learning
Agents	Automation, AI-assisted labeling, and model integrations

All four components share a common data layer: data registered once in Index is immediately accessible in Annotate and Active, without duplication or re-ingestion.

Data architecture

How data is stored

Encord operates on a bring-your-own-storage model. Your files remain in your cloud storage (AWS S3, GCP Cloud Storage, or Azure Blob Storage) at all times. Encord registers references to those files, not copies of them. This means:

Raw data never leaves your storage environment
You retain full ownership and control of your files
Encord stores only metadata, labels, annotations, and embeddings

The exception is Encord-managed storage, where files are uploaded directly to Encord’s GCP-hosted storage. This is suitable for teams without existing cloud infrastructure or for low-sensitivity data.
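The "references, not copies" idea can be pictured with a small sketch. This is not Encord's SDK or internal schema: `FileReference` and `make_reference` are hypothetical names, and the URI conventions (`s3://`, `gs://`, and `https://<account>.blob.core.windows.net/...`) are the providers' usual addressing forms, assumed here for illustration only.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class FileReference:
    """Hypothetical record: a pointer to a file plus metadata, never the bytes."""
    uri: str
    provider: str
    metadata: dict = field(default_factory=dict)


def make_reference(uri, metadata=None):
    """Build a reference from a cloud storage URI without reading the file."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        provider = "aws"
    elif parsed.scheme == "gs":
        provider = "gcp"
    elif parsed.scheme == "https" and parsed.netloc.endswith(".blob.core.windows.net"):
        provider = "azure"
    else:
        raise ValueError(f"Unrecognized storage URI: {uri}")
    # Only the location and descriptive metadata are kept; file contents stay put.
    return FileReference(uri=uri, provider=provider, metadata=dict(metadata or {}))


ref = make_reference("s3://my-bucket/images/cat.jpg", {"camera": "front"})
print(ref.provider, ref.uri)
```

The record holds a location and metadata only, which mirrors the split described above: files stay in your bucket while labels and metadata live on the platform side.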

Data registration models

Standard integration (client cloud storage)

Your data stays in your private cloud bucket. Encord is granted read access via a service account or IAM role. Files are streamed to the platform for display and annotation but never persisted outside your environment.

Strict client external integration

For highly regulated environments, Encord can operate in a mode where all data access is mediated through your own infrastructure. Data is never cached or proxied through Encord servers.

Encord-managed storage

Files are uploaded to and stored within Encord’s GCP environment. Suitable for teams that do not have existing cloud storage or who need the simplest possible setup.

See Data Integrations for setup instructions.

Deployment models

Encord offers three deployment options to match your security and infrastructure requirements.

Cloud (Encord-managed)

The default deployment. Encord manages all infrastructure on Google Cloud Platform. Your data stays in your own storage bucket; Encord accesses it only to serve annotations.

Fastest to deploy
No infrastructure management required
SOC 2 Type II, HIPAA, and GDPR compliant
Regular updates and patches managed by Encord

VPC (Virtual Private Cloud)

Encord is deployed within your own cloud environment (AWS, GCP, or Azure). All compute and application servers run inside your network boundary.

All traffic stays within your network
You control firewall rules, logging, and auditing
Requires coordination with Encord for initial setup

Contact support to discuss VPC deployment options.

On-premise / air-gapped

For environments with strict data residency or security requirements (including classified, regulated, or offline environments), Encord can be deployed with no outbound internet access.

No data leaves your internal network
Suitable for defense, healthcare, and regulated industries
Requires dedicated support engagement

Contact support to discuss air-gapped deployment.
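As a rough way to read the three options side by side, the sketch below maps two yes/no requirements to a deployment model. The function name, parameters, and the order of the checks are assumptions made for illustration; the actual choice involves more factors and a conversation with Encord.

```python
from enum import Enum


class Deployment(Enum):
    CLOUD = "Cloud (Encord-managed)"
    VPC = "VPC (Virtual Private Cloud)"
    AIR_GAPPED = "On-premise / air-gapped"


def suggest_deployment(no_outbound_internet: bool, compute_in_own_network: bool) -> Deployment:
    """Hypothetical helper: pick the least restrictive option that meets the needs."""
    if no_outbound_internet:
        # Offline or classified environments need the air-gapped option.
        return Deployment.AIR_GAPPED
    if compute_in_own_network:
        # Compute and application servers must run inside your network boundary.
        return Deployment.VPC
    # Otherwise the default, Encord-managed deployment applies.
    return Deployment.CLOUD


print(suggest_deployment(no_outbound_internet=False, compute_in_own_network=True).value)
```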

Supported data types

Encord supports the following data types across Index, Annotate, and Active:

Modality	Formats
Images	`jpg`, `png`, `tiff`
Video	`mp4`, `mov`, `webm`, `matroska`, `3gp`, `m4a`, `mj2`
Audio	`mp3`, `wav`, `m4a`, `flac`, `mp4`, `eac3`
DICOM	Standard DICOM series and studies
Documents	PDF, HTML, plain text
Point clouds	LAS, PCD, and related 3D formats

Index supports up to 10,000,000 images per Folder (with unlimited Folders) and videos up to 2 hours long at 30 fps. Contact Encord for guidance on larger volumes.
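The table and limits above can be turned into a quick pre-upload check. The sketch below is illustrative only: `FORMATS` copies the backticked names from the table as written, `modalities_for` is a hypothetical helper, and matching on the file suffix is an assumption (for example, it does not map a `.mkv` suffix to `matroska`).

```python
from pathlib import Path

# Format names copied from the table above, keyed by modality.
FORMATS = {
    "Images": {"jpg", "png", "tiff"},
    "Video": {"mp4", "mov", "webm", "matroska", "3gp", "m4a", "mj2"},
    "Audio": {"mp3", "wav", "m4a", "flac", "mp4", "eac3"},
}

# Stated video limit: 2 hours at 30 frames per second.
MAX_VIDEO_FRAMES = 2 * 60 * 60 * 30


def modalities_for(filename: str) -> list:
    """Return every modality whose format list contains the file's suffix."""
    suffix = Path(filename).suffix.lower().lstrip(".")
    return [name for name, formats in FORMATS.items() if suffix in formats]


print(modalities_for("clip.mp4"))
print(MAX_VIDEO_FRAMES)
```

Because `mp4` and `m4a` appear under both Video and Audio, the helper returns a list rather than a single modality; the caller has to decide how the file should be treated.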

Scalability

Encord’s architecture is designed to scale horizontally. As data volumes and team sizes grow:

Compute resources scale automatically with workload
Annotation queues support millions of tasks across distributed teams
Embedding and analytics pipelines process large datasets asynchronously
API rate limits and throughput can be raised for high-volume enterprise customers
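To make "process large datasets asynchronously" concrete, here is a generic worker-pool sketch using Python's `asyncio`. It does not describe Encord's internal pipeline: `embed` is a placeholder that stands in for real embedding work, and `process_all` and `concurrency` are names chosen for this example.

```python
import asyncio


async def embed(item: str) -> int:
    """Placeholder for real asynchronous work, such as computing an embedding."""
    await asyncio.sleep(0)  # yield control, as real I/O would
    return len(item)


async def worker(queue: asyncio.Queue, results: dict) -> None:
    """Pull items off the queue until cancelled."""
    while True:
        item = await queue.get()
        try:
            results[item] = await embed(item)
        finally:
            queue.task_done()


async def process_all(items, concurrency: int = 4) -> dict:
    """Process items with a fixed number of concurrent workers."""
    queue = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)
    results = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(concurrency)]
    await queue.join()  # wait until every queued item has been handled
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


print(asyncio.run(process_all(["a.jpg", "bb.png"])))
```

The caller enqueues work and collects results once the queue drains, so throughput can be adjusted by changing the number of workers rather than the calling code.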

Where to go next

Security and Compliance — certifications, access controls, and data governance
Scaling and Operations — workforce structure, QA, and project management at scale
Data Integrations — connecting AWS, GCP, and Azure storage
Supported Data — full list of supported file formats
