Encord is a cloud-native platform built to handle petabytes of multimodal data and large annotation workforces. This page explains the core architectural components, how your data moves through the system, and which deployment model is right for your organization.

Core platform components

Encord consists of four tightly integrated product areas:
Component | Purpose
--------- | -------
Index     | Data ingestion, organization, search, and curation
Annotate  | Annotation, review, QA workflows, and human feedback
Active    | Model evaluation, analytics, and active learning
Agents    | Automation, AI-assisted labeling, and model integrations
All four components share a common data layer — data registered once in Index is immediately accessible in Annotate and Active, without duplication or re-ingestion.

Data architecture

How data is stored

By default, Encord operates on a bring-your-own-storage model. Your files remain in your own cloud storage (AWS S3, Google Cloud Storage, or Azure Blob Storage). Encord registers references to those files, not copies of them. This means:
  • Raw data never leaves your storage environment
  • You retain full ownership and control of your files
  • Encord only stores metadata, labels, annotations, and embeddings
The exception is Encord-managed storage, where files are uploaded directly to Encord’s GCP-hosted storage. This is suitable for teams without existing cloud infrastructure or for low-sensitivity data.
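The reference-based model can be pictured with a small sketch. It is not the Encord SDK: `FileReference`, `PlatformRecord`, and `register` are hypothetical names, and the accepted URI prefixes are an assumption made for illustration. The point is only that the platform-side record holds a pointer plus derived data (metadata, labels, embeddings) and has no field for the file's bytes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileReference:
    """Pointer to a file that stays in the customer's bucket (hypothetical type)."""
    uri: str    # e.g. "s3://my-bucket/scans/0001.png"
    title: str


@dataclass
class PlatformRecord:
    """What the platform side holds: a reference plus derived data, never the bytes."""
    reference: FileReference
    metadata: dict = field(default_factory=dict)
    labels: list = field(default_factory=list)
    embedding: list = field(default_factory=list)


def register(uri: str, title: str, **metadata) -> PlatformRecord:
    """Register a file by URI only; the file content is never read or copied."""
    # Assumed, illustrative prefixes for S3, Google Cloud Storage, and HTTPS-addressed blobs.
    allowed_prefixes = ("s3://", "gs://", "https://")
    if not uri.startswith(allowed_prefixes):
        raise ValueError(f"unsupported storage URI: {uri}")
    return PlatformRecord(FileReference(uri, title), metadata=dict(metadata))


record = register("s3://my-bucket/scans/0001.png", "scan 0001", site="clinic-a")
record.labels.append({"class": "lesion", "bbox": [10, 20, 50, 60]})
```

Deleting such a record would remove the labels and metadata held by the platform but would leave the object in the bucket untouched, which is the ownership property described above.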

Data registration models

Standard integration (client cloud storage): Your data stays in your private cloud bucket. Encord is granted read access via a service account or IAM role. Files are streamed to the platform for display and annotation but never persisted outside your environment.

Strict client external integration: For highly regulated environments, Encord can operate in a mode where all data access is mediated through your own infrastructure. Data is never cached or proxied through Encord servers.

Encord-managed storage: Files are uploaded to and stored within Encord’s GCP environment. Suitable for teams that do not have existing cloud storage or who need the simplest possible setup.

See Data Integrations for setup instructions.
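For the standard integration on AWS, read access is typically expressed as an IAM permissions policy attached to the role that Encord assumes. The sketch below builds a minimal read-only policy document in Python. The bucket name and the helper `build_read_only_policy` are hypothetical, and the exact set of actions Encord requires is not stated here, so treat `s3:GetObject` as an assumption and follow Data Integrations for the authoritative setup.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Return an IAM policy document that allows reading objects in one bucket.

    Assumption: object reads (s3:GetObject) are all that is needed. The real
    integration may require more; check the Data Integrations instructions.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadObjectsOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # The "/*" suffix scopes the grant to objects inside the bucket.
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


policy = build_read_only_policy("my-data-bucket")  # hypothetical bucket name
print(json.dumps(policy, indent=2))
```

Because the policy contains no write or delete actions, a role carrying only this policy can stream files for display but cannot modify or remove them.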

Deployment models

Encord offers three deployment options to match your security and infrastructure requirements.

Cloud (Encord-managed)

The default deployment. Encord manages all infrastructure on Google Cloud Platform. Your data stays in your own storage bucket; Encord only accesses it to serve annotations.
  • Fastest to deploy
  • No infrastructure management required
  • SOC 2 Type II, HIPAA, and GDPR compliant
  • Regular updates and patches managed by Encord

VPC (Virtual Private Cloud)

Encord is deployed within your own cloud environment (AWS, GCP, or Azure). All compute and application servers run inside your network boundary.
  • All traffic stays within your network
  • You control firewall rules, logging, and auditing
  • Requires coordination with Encord for initial setup
Contact support to discuss VPC deployment options.

On-premise / air-gapped

For environments with strict data residency or security requirements — including classified, regulated, or offline environments — Encord can be deployed with no outbound internet access.
  • No data leaves your internal network
  • Suitable for defense, healthcare, and regulated industries
  • Requires dedicated support engagement
Contact support to discuss air-gapped deployment.
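
The choice between the three options can be summarized as a short decision helper. This is a simplified sketch of the criteria described above, not an official selection tool: the function name and its two boolean inputs are assumptions, and a real decision would also weigh compliance, cost, and support requirements.

```python
from enum import Enum


class Deployment(Enum):
    CLOUD = "Cloud (Encord-managed)"
    VPC = "VPC (Virtual Private Cloud)"
    AIR_GAPPED = "On-premise / air-gapped"


def suggest_deployment(*, outbound_internet_allowed: bool,
                       must_run_in_own_network: bool) -> Deployment:
    """Map two coarse requirements onto the three deployment models."""
    # No outbound internet access rules out everything except air-gapped.
    if not outbound_internet_allowed:
        return Deployment.AIR_GAPPED
    # Compute and application servers must sit inside the customer's network boundary.
    if must_run_in_own_network:
        return Deployment.VPC
    # Otherwise the default, Encord-managed cloud deployment applies.
    return Deployment.CLOUD


print(suggest_deployment(outbound_internet_allowed=True,
                         must_run_in_own_network=False).value)
```

The checks run from most to least restrictive, so an environment with no outbound access is always mapped to the air-gapped option regardless of the second flag.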

Supported data types

Encord supports the following data types across Index, Annotate, and Active:
Modality     | Formats
------------ | -------
Images       | jpg, png, tiff
Video        | mp4, mov, webm, matroska, 3gp, m4a, mj2
Audio        | mp3, wav, m4a, flac, mp4, eac3
DICOM        | Standard DICOM series and studies
Documents    | PDF, HTML, plain text
Point clouds | LAS, PCD, and related 3D formats
Index supports up to 10,000,000 images per Folder (with unlimited Folders) and videos up to 2 hours at 30 fps. Contact Encord for guidance on larger volumes.
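
As a rough planning aid, the limits above can be turned into a pre-upload check. The sketch below is illustrative only: the helper names are hypothetical, the format names from the table are used as plain lookup keys, and reading "2 hours at 30 fps" as a budget of 216,000 frames is an assumption rather than a documented rule.

```python
# Limits quoted in the text above.
MAX_IMAGES_PER_FOLDER = 10_000_000
MAX_VIDEO_SECONDS = 2 * 60 * 60                        # 2 hours
REFERENCE_FPS = 30
MAX_VIDEO_FRAMES = MAX_VIDEO_SECONDS * REFERENCE_FPS   # 216,000 frames

# Format names copied from the table; using them as lookup keys is an assumption.
FORMATS = {
    "images": {"jpg", "png", "tiff"},
    "video": {"mp4", "mov", "webm", "matroska", "3gp", "m4a", "mj2"},
    "audio": {"mp3", "wav", "m4a", "flac", "mp4", "eac3"},
}


def modalities_for(fmt: str) -> list[str]:
    """Return every modality whose format list contains fmt (some appear twice)."""
    key = fmt.lower().lstrip(".")
    return sorted(name for name, fmts in FORMATS.items() if key in fmts)


def video_fits(duration_seconds: float, fps: float) -> bool:
    """Assumption: the video limit is treated as a total frame budget."""
    return duration_seconds * fps <= MAX_VIDEO_FRAMES


def folders_needed(image_count: int) -> int:
    """Minimum number of Folders needed to hold image_count images."""
    return -(-image_count // MAX_IMAGES_PER_FOLDER)    # ceiling division


print(modalities_for("m4a"), video_fits(90 * 60, 30), folders_needed(25_000_000))
```

Note that mp4 and m4a are listed under both Video and Audio, so a format name alone does not always determine the modality.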

Scalability

Encord’s architecture is designed to scale horizontally. As data volumes and team sizes grow:
  • Compute resources scale automatically with workload
  • Annotation queues support millions of tasks across distributed teams
  • Embedding and analytics pipelines process large datasets asynchronously
  • API rate limits and throughput can be raised for high-volume enterprise customers
