Watch our video tutorial on creating AWS S3 integrations, or follow the step-by-step guide below for even more details!

Video Tutorial - Creating an AWS integration

Navigate to the Encord app and select Settings from the drop-down on the top right. Open the Integrations tab.

To create an AWS S3 integration, click the + Add integration button in the Integrations tab of the Encord web-app.

ℹ️

Note

Strict client-only access can be added in step 4 below.

Select AWS S3 at the top of the chooser.

ℹ️

Note

It's essential you don't close this window until you have finished the whole integration process.

In order to integrate with AWS S3, you will need to:

  1. Create a permission policy for your resources that will allow appropriate access to Encord.
  2. Create a role for Encord and attach the policy so that Encord can access those resources.
  3. Enable cross-origin resource sharing (CORS), which allows Encord to access those resources from a web browser.
  4. Create the integration.
  5. Test the integration.

ℹ️

Note

Your S3 bucket should be set to block all public access.


1. Create a permission policy

Log in to your AWS account. Navigate to your Identity and Access Management (IAM) dashboard and go to 'Policies' on the left-hand side.

ℹ️

Note

Policies can also be created using this AWS tool.

  • Click on 'Create policy' and then click on 'JSON'.

In Encord, copy the JSON from step 1 of the integration window shown below, and paste it into the AWS policy JSON editor opened in the previous step. Replace the arn:aws:s3YourBucket value for "Resource" with your bucket's Amazon Resource Name (ARN). The ARN can be found in the 'Properties' tab of your S3 bucket.

👍

Tip

If you don't expect to be creating image groups, the s3:PutObject action can be removed. However, this will also prevent the re-encoding of videos, so we advise against removing this action.

Click the Next: Tags button to add any tags according to your organization's resource tagging policy. Encord does not require any tags to function. Click the Next: Review button to proceed to the final step.

Give your policy a descriptive name (we will use it in the next step) and click the Create policy button. You now have a policy to apply to Encord once it has a defined role.


2. Create a role for Encord

  • Go to 'Roles' on the left-hand side and click the Create role button.

  • Select AWS Account as the 'Trusted entity type' and, under the 'An AWS account' section, select Another AWS account.

From the 'Integrations' window in the Encord app, copy the Encord AWS account ID as well as the External ID shown below, and paste them into the relevant areas of the AWS trusted entity creation form. You have to check Require external ID under 'Options' in the form to reveal the External ID entry form. Click Next.

Attach the policy we created in step 1 and click the Next button. Give your role a descriptive name and click the Create role button. This is the role Encord will use to access this S3 bucket.

Now we need to let the Encord platform know the details of this role. In the AWS Console, click on the role you just created and copy the Role ARN as shown below.

Paste the ARN into the second entry area of step 2 in the Encord integration window, as shown below. The text after the final / is your role name - paste it into the first entry area above the ARN.

Now that the role is set up, the next step is to enable Cross-origin resource sharing (CORS) on your S3 bucket to ensure that data can successfully be loaded in your browser while using the Encord app.

ℹ️

Note

Correctly setting up the CORS permission is a critical step in completing your S3 integration. Read on for detailed instructions.


3. Allow Cross-origin resource sharing (CORS)

Expand the third section in the integrations window. It will look something like this:

Copy the CORS JSON policy. Navigate to your S3 bucket and go to the 'Permissions' tab. Click Edit under the 'CORS Policy' heading and paste the JSON into the CORS editor. Click Save when you're done.


4. Create the integration

Optionally, check the box to enable Strict client-only access if you would like Encord to sign URLs but refrain from downloading any media files onto Encord servers. Server-side media features will not be available while this option is enabled. Read more about this feature here.

Give your integration a name (if you haven't already) and click the Create button at the bottom of the pop-up. The integration will now appear in the list of integrations in the 'Integrations' tab.

👍

Tip

We have a few helpful scripts and examples to get you started creating datasets from your Amazon S3 bucket.

5. Test the integration

To test that Encord can sync with your S3 bucket, click on the icon shown below.

Enter the URL of an object you would like to test the connection with and click Check Encord can access this URI. If the test is successful, a green tick appears next to both Encord infrastructure and This machine, as seen in the screenshot for a GCP test below (the process for AWS is identical).

🚧

Caution

Due to the way AWS handles data, both tests may fail for newly created AWS S3 buckets, causing you to see the 'Something went wrong' message in the label editor when trying to load data from that bucket. It can take up to 24 hours for the issue to resolve itself, after which you can start labeling.

ℹ️

Note

This test checks whether Encord is able to assume the role defined for it. It does not necessarily confirm that Encord can access your buckets. If the test passes but data onboarding still fails, check that Encord has bucket permissions and that the object URLs are correct.

Create a Multi-Region Access Point integration

Using Multi-Region Access Points requires you to do a few things differently when setting up an AWS integration.

  1. When creating a permission policy for your Multi-Region Access Point in AWS, list the ARN of the Multi-Region Access Point as well as the ARNs of all constituent buckets in the JSON.
Example JSON
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": [
                "Your-Multi-Access-Point-ARN/*",
                "Bucket-1-ARN/*",
                "Bucket-2-ARN/*",
                "Bucket-3-ARN/*"
            ]
        }
    ]
}

  2. Make sure you create a CORS policy for every bucket that is included in your Multi-Region Access Point.

  3. When uploading data to a dataset using the Multi-Region Access Point integration, make sure your JSON file is formatted correctly for use with a Multi-Region Access Point, as documented here.
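
For readers who manage IAM with Terraform (covered in the next section), point 1 might be sketched roughly as follows. The variable names mrap_arn and mrap_bucket_arns, the resource label, and the policy name are all hypothetical placeholders. Only the two S3 actions and the "<ARN>/*" resource pattern are taken from the example JSON above.

```hcl
# Hypothetical variables: these names are placeholders, not Encord or AWS conventions.
variable "mrap_arn" {
  description = "ARN of the Multi-Region Access Point"
  type        = string
}

variable "mrap_bucket_arns" {
  description = "ARNs of every bucket behind the Multi-Region Access Point"
  type        = list(string)
}

resource "aws_iam_policy" "encord_mrap_policy" {
  name = "encord-mrap-policy" # placeholder name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["s3:PutObject", "s3:GetObject"]
        # One "<arn>/*" entry for the access point, plus one per constituent bucket.
        Resource = concat(
          ["${var.mrap_arn}/*"],
          [for arn in var.mrap_bucket_arns : "${arn}/*"]
        )
      }
    ]
  })
}
```

With three bucket ARNs in mrap_bucket_arns, this should render the same four-entry "Resource" list as the example JSON.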


Terraforming your AWS S3 Integration

This guide is intended only as a supplement to the excellent Terraform documentation provided by HashiCorp here.

Please note that it needs to be performed in conjunction with the Encord App Integration setup steps described here.

Please do not just copy and paste the code below. Instead, use it as a template for Terraforming your Private Cloud Integration.

To integrate with Encord, you will need to create:

  1. An S3 Bucket
  2. An IAM Policy
  3. An IAM Role
  4. A CORS Policy

Below are some examples of how this might look:

Declaring your Terraform providers

In the example below, we use HashiCorp's AWS provider (aws) and the tfvars utility, which allows us to neatly define values to pass into variables.

We also define an alias as well as a Region for the AWS provider, which needs to match the location in which you want your bucket to be provisioned.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.1.0"
    }
    tfvars = {
      source  = "innovationnorway/tfvars"
      version = "0.0.1"
    }
  }
}

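provider "aws" {
  # Because this configuration is aliased, resources only use it when they set
  # `provider = aws.default`; remove the alias to make it the default configuration.
  alias  = "default"
  region = var.aws_region
}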

Declaring your variables

In your variables.tf file, you will need to define the variables into which you want to pass values. An example is below:

variable "bucket_name" {
  description = "Name of the AWS S3 Bucket"
  type        = string
}

variable "policy_name" {
  description = "Name of the IAM Policy"
  type        = string
}

variable "role_name" {
  description = "Name of the IAM Role"
  type        = string
}

variable "external_aws_account_id" {
  description = "Account ID of the external AWS account you're connecting to - default value 312435012576 for Encord"
  type        = string
  default     = "312435012576"
}

variable "external_id" {
  description = "External account id - this is unique to your integration and can be found in the integration setup modal"
  type        = string
}

variable "aws_region" {
  description = "AWS Region in which bucket should be provisioned"
  type        = string
  default     = "eu-west-2"
}

Defining your variables in a .tfvars file to avoid having to manually edit the Terraform files

Since we've opted to use tfvars, create a corresponding .tfvars file and specify the values to pass into the variables you just defined:

bucket_name             = "encord-test-bucket"
policy_name             = "encord-test-policy"
role_name               = "encord-test-role"
external_aws_account_id = "312435012576" # Encord's AWS account ID; the same for every integration
external_id             = "external-id"  # Unique to each integration; copy it from the integration setup modal in the Encord application
aws_region              = "eu-west-2"    # Change this to the region in which your bucket is to be created

Creating the required resources

The resources you need to create include:

  1. The AWS S3 bucket itself
  2. The AWS Bucket CORS Policy to allow Cross Origin Resource Sharing with the Encord domains
  3. The IAM Role
  4. The IAM Policy
  5. The IAM Policy attachment that binds the Role to the Policy

Defining the Bucket and Bucket CORS Policy:

resource "aws_s3_bucket" "bucket_name" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_cors_configuration" "bucket_cors_policy" {
  # Reference the bucket resource so Terraform creates the bucket before the CORS rule
  bucket = aws_s3_bucket.bucket_name.id
  cors_rule {
    allowed_headers = [
      "*"
    ]

    # PUT is only needed if you intend to re-encode videos or work with image groups
    allowed_methods = ["GET", "PUT"]

    allowed_origins = [
      "https://app.encord.com",
      "https://api.encord.com",
      "https://dicom.encord.com"
    ]
    max_age_seconds = 3600
  }
}
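
The note at the start of this guide asks that the bucket block all public access. One way to express that in Terraform is the AWS provider's aws_s3_bucket_public_access_block resource. The sketch below is an optional addition rather than part of Encord's required setup, and the resource label bucket_public_access is a placeholder.

```hcl
# Blocks every form of public access on the bucket defined above.
resource "aws_s3_bucket_public_access_block" "bucket_public_access" {
  bucket = aws_s3_bucket.bucket_name.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```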

Defining the IAM Policy:

resource "aws_iam_policy" "encord-test-policy" {
  name        = var.policy_name
  path        = "/"
  description = "video testing S3 policy"
  policy      = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "${aws_s3_bucket.bucket_name.arn}/*"
    }
  ]
}
POLICY
}

Defining the IAM Role:

resource "aws_iam_role" "encord-test-role" {
  name               = var.role_name
  path               = "/"
  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${var.external_aws_account_id}:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "${var.external_id}"
        }
      }
    }
  ]
}
POLICY
}
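
The two heredoc policies above can alternatively be written with the AWS provider's aws_iam_policy_document data source, which builds the JSON for you so that a stray comma or quote cannot slip into the policy string. This is a sketch of an equivalent formulation, not a required change. The data source labels encord_trust and encord_access are placeholders.

```hcl
# Trust policy: lets the Encord AWS account assume the role, but only
# when the matching external ID is supplied.
data "aws_iam_policy_document" "encord_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.external_aws_account_id}:root"]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.external_id]
    }
  }
}

# Permission policy: read and write objects in the bucket.
data "aws_iam_policy_document" "encord_access" {
  statement {
    effect    = "Allow"
    actions   = ["s3:GetObject", "s3:PutObject"]
    resources = ["${aws_s3_bucket.bucket_name.arn}/*"]
  }
}
```

To use them, set policy = data.aws_iam_policy_document.encord_access.json on the aws_iam_policy resource and assume_role_policy = data.aws_iam_policy_document.encord_trust.json on the aws_iam_role resource.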

Attaching the Role to the Policy:

resource "aws_iam_policy_attachment" "encord-test-policy-policy-attachment" {
  policy_arn = aws_iam_policy.encord-test-policy.arn
  roles      = [aws_iam_role.encord-test-role.name] # reference the role so Terraform creates it before attaching
  name       = "${var.policy_name}-policy-attachment"
}
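
Note that aws_iam_policy_attachment manages a policy's attachments exclusively: across the whole AWS account, every user, role, and group the policy is attached to must be declared in that single resource. If the policy could ever be attached elsewhere, the AWS provider's aws_iam_role_policy_attachment resource is a commonly used alternative. A minimal sketch, with a placeholder resource label:

```hcl
# Attaches the policy to this one role without taking ownership of the
# policy's other attachments.
resource "aws_iam_role_policy_attachment" "encord_role_policy" {
  role       = aws_iam_role.encord-test-role.name
  policy_arn = aws_iam_policy.encord-test-policy.arn
}
```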

Before applying any changes, run terraform plan to preview the changes and check you are happy with them.

Once your Terraform has been applied, return to the Encord application, and test your integration.
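
The Encord integration window asks for the role name and the Role ARN (step 2 of the console walkthrough). As a convenience, Terraform outputs can print both after an apply. The output names below are placeholders, and the block could live in any .tf file in the same module.

```hcl
output "encord_role_arn" {
  description = "Role ARN to paste into step 2 of the Encord integration window"
  value       = aws_iam_role.encord-test-role.arn
}

output "encord_role_name" {
  description = "Role name to paste into the entry area above the ARN"
  value       = aws_iam_role.encord-test-role.name
}
```

After terraform apply completes, running terraform output shows both values.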


The entire resources file s3-resources.tf now looks like this:

resource "aws_s3_bucket" "bucket_name" {
  bucket = var.bucket_name
}

resource "aws_s3_bucket_cors_configuration" "bucket_cors_policy" {
  # Reference the bucket resource so Terraform creates the bucket before the CORS rule
  bucket = aws_s3_bucket.bucket_name.id
  cors_rule {
    allowed_headers = [
      "*"
    ]

    # PUT is only needed if you intend to re-encode videos or work with image groups
    allowed_methods = ["GET", "PUT"]

    allowed_origins = [
      "https://app.encord.com",
      "https://api.encord.com",
      "https://dicom.encord.com"
    ]
    max_age_seconds = 3600
  }
}


resource "aws_iam_policy" "encord-test-policy" {
  name        = var.policy_name
  path        = "/"
  description = "video testing S3 policy"
  policy      = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject"
      ],
      "Resource": "${aws_s3_bucket.bucket_name.arn}/*"
    }
  ]
}
POLICY
}


resource "aws_iam_role" "encord-test-role" {
  name               = var.role_name
  path               = "/"
  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::${var.external_aws_account_id}:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "${var.external_id}"
        }
      }
    }
  ]
}
POLICY
}


resource "aws_iam_policy_attachment" "encord-test-policy-policy-attachment" {
  policy_arn = aws_iam_policy.encord-test-policy.arn
  roles      = [aws_iam_role.encord-test-role.name] # reference the role so Terraform creates it before attaching
  name       = "${var.policy_name}-policy-attachment"
}