AWS S3
Watch our video tutorial on creating AWS S3 integrations, or follow the step-by-step guide below for even more details.
In order to integrate with AWS S3, you must:
- Create a new AWS integration in Encord.
- Create a permission policy for your resources to allow Encord the necessary access.
- Create a role for Encord and attach the policy so that Encord can access those resources.
- Activate Cross-origin resource sharing to allow Encord to access those resources from a web browser.
- Test the integration to ensure it works.
Create an S3 bucket to store your files if you have not done so already. The S3 bucket must have STS available and enabled.
- Set your bucket permissions to block all public access.
- Ensure that the Storage Class of all files is set to ‘S3 Standard’.
1. Start setting up the AWS integration
- In the Integrations section of the Encord platform, click +New integration to create a new integration and select AWS.
Do not close this tab or window until you have finished the integration process. We advise opening AWS in a separate tab.
- Give your integration a meaningful title.
2. Create a permission policy
- In Encord, copy the JSON from Step 2 of the integration.
- In AWS, navigate to Identity and Access Management (IAM) and select Policies.
- Click Create policy to create a new policy.
- Select JSON as the Policy editor.
- Paste the JSON you copied from Encord into the Policy editor, replacing the arn:aws:s3YourBucket value for Resource with your bucket’s Amazon Resource Name (ARN). The ARN can be found in the Properties tab of your S3 bucket. When pasting your bucket ARN into the JSON policy editor, ensure that the Resource value ends in /*. Click the Next button to continue.
s3:PutObject is needed for features that require write permissions, including re-encoding data and creating image sequences.
- Add any tags according to your Organization’s resource tagging policy, and give your policy a descriptive name (used when creating a role for Encord). Click Create policy to finish creating your policy.
3. Create a role for Encord
- In AWS, navigate to Roles and click the Create role button.
- For Trusted entity type select AWS Account and in the An AWS Account section select Another AWS account.
- In Encord, copy the Encord AWS account ID from Step 3 of the integration and paste it into the Account ID field in AWS. In AWS, check Require external ID under Options to reveal the External ID field.
- Navigate back to Encord and click Generate and copy to copy an External ID.
- In AWS, paste the External ID you generated into the External ID field and click Next.
- Select the IAM policy you created in Step 2 and click Next to attach it to the role.
- Give your role a descriptive name and click the Create role button.
- Copy the Role ARN and the name of the role you just created.
- In Encord, paste the name of the role and the Role ARN into Step 3 of the integration.
4. Allow Cross-origin resource sharing (CORS)
- In Encord, expand Step 4 of the integration. Copy the CORS JSON policy.
- Navigate to the Permissions tab of your S3 bucket. Scroll to the bottom of the page and click Edit in the Cross-origin resource sharing (CORS) heading.
- Paste the JSON into the editor that pops up. Click Save changes to finish setting up CORS.
- Navigate back to Encord and click Create to finish the integration setup.
5. Test the integration
- Click the Run a test button on the integration.
- Paste the URL of any object in the bucket and click Check Encord can access this URL. If the test is successful, a green tick appears next to Encord infrastructure and This machine.
Failing to set a cache-control header can result in the Cache policy not set error when testing the integration.
Uploading AWS data
Navigate to the Upload cloud data page for guidance on how to upload files stored in AWS.
Create a Multi-Region Access Point integration
Using Multi-Region Access Points requires you to do a few things differently when setting up an AWS integration.
- When creating a permission policy for your Multi-Region Access Point in AWS, make sure to list the ARN of the Multi-Region Access Point, as well as the ARNs of all constituent buckets, in the JSON.
- Make sure you create a CORS policy for every bucket that is included in your Multi-Region Access Point.
- When uploading data to a dataset using the Multi-Region Access Point integration, make sure your JSON file is formatted correctly for use with a Multi-Region Access Point, as documented here.
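If you manage IAM with Terraform (see the Terraforming section of this guide), the first point above could be sketched roughly as follows. This is only an illustration under stated assumptions: the variable names mrap_arn and mrap_bucket_arns, the resource name, and the policy name are hypothetical, the action list is a guess based on the permissions mentioned in this guide, and the /object/* suffix for objects behind the access point is an assumption to verify against the AWS documentation. The JSON copied from Encord remains the authoritative source for the policy contents.

```hcl
# Hypothetical inputs: the ARN of the Multi-Region Access Point and the
# ARNs of every bucket behind it.
variable "mrap_arn" {
  type = string
}

variable "mrap_bucket_arns" {
  type = list(string)
}

locals {
  # The access point itself, objects reached through it (assumed suffix),
  # and the objects in each constituent bucket.
  mrap_policy_resources = concat(
    [var.mrap_arn, "${var.mrap_arn}/object/*"],
    [for arn in var.mrap_bucket_arns : "${arn}/*"]
  )
}

resource "aws_iam_policy" "encord_mrap_access" {
  name = "encord-mrap-access" # hypothetical name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        # Assumed actions; use the ones in the JSON provided by Encord.
        Action   = ["s3:GetObject", "s3:PutObject"]
        Resource = local.mrap_policy_resources
      }
    ]
  })
}
```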
Performance enhancements
Caching can be enabled on the parent folder of the bucket containing the objects you want to label to improve the speed at which each video frame is displayed in the Label Editor. This is done by setting a cache-control header. Failing to set a cache-control header on all objects can lead to data loading slowly on our platform.
Terraforming your AWS S3 Integration
This guide is intended only as a supplement to the excellent Terraform documentation provided by HashiCorp here.
Please note that it needs to be performed in conjunction with the Encord App Integration setup steps described here.
Please do not just copy and paste the examples below. Instead, use them as a template for Terraforming your Private Cloud Integration.
To integrate with Encord, you will need to create:
- An S3 Bucket
- An IAM Policy
- An IAM Role
- A CORS Policy
Below are some examples of how this might look:
Declaring your Terraform providers
In the example below, we’re using HashiCorp’s AWS provider aws and the tfvars utility that allows us to neatly define values to pass into variables.
We also define an alias as well as a Region for the AWS provider, which needs to match the location in which you want your bucket to be provisioned.
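A minimal sketch of such a provider declaration is shown here. The alias encord, the version constraints, and the variable name aws_region are assumptions chosen for illustration, not values prescribed by Encord.

```hcl
terraform {
  required_version = ">= 1.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 4.0"
    }
  }
}

# Aliased AWS provider. The region must match the location in which the
# bucket should be provisioned.
provider "aws" {
  alias  = "encord"       # hypothetical alias
  region = var.aws_region # hypothetical variable, declared in variables.tf
}
```

Resources that should use this configuration then select it with provider = aws.encord.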
Declaring your variables
In your variables.tf file, you will need to define the variables into which you want to pass values. An example is below:
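The following is one possible set of declarations. All variable names here are hypothetical and only need to match the names used in your own resource definitions.

```hcl
variable "aws_region" {
  type        = string
  description = "Region in which the bucket is provisioned."
}

variable "bucket_name" {
  type        = string
  description = "Name of the S3 bucket that stores the files to label."
}

variable "encord_account_id" {
  type        = string
  description = "Encord AWS account ID, copied from Step 3 of the integration."
}

variable "encord_external_id" {
  type        = string
  description = "External ID generated in Step 3 of the integration."
  sensitive   = true
}

variable "encord_origins" {
  type        = list(string)
  description = "Origins allowed by CORS, taken from the CORS JSON in Step 4."
}
```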
Defining your variables in a .tfvars file to avoid having to manually edit the Terraform files
Since we’ve opted to use tfvars, we need to create a corresponding .tfvars file and specify the values we wish to pass into the variables we just defined:
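As an illustration, a terraform.tfvars file (a name Terraform loads automatically) might look like this. Every value is a placeholder, and the variable names are the hypothetical ones assumed for variables.tf. The origin shown is an assumption, so replace it with the origins listed in the CORS JSON from Step 4 of the integration.

```hcl
aws_region         = "eu-west-2"
bucket_name        = "my-encord-data-bucket"
encord_account_id  = "<Encord AWS account ID from Step 3>"
encord_external_id = "<External ID generated in Step 3>"

# Assumed origin; copy the real list from the Encord CORS JSON.
encord_origins = ["https://app.encord.com"]
```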
Creating the required resources
The resources you need to create include:
- The AWS S3 bucket itself
- The AWS Bucket CORS Policy to allow Cross Origin Resource Sharing with the Encord domains
- The IAM Role
- The IAM Policy
- The IAM Policy attachment that binds the Role to the Policy
Defining the Bucket, Bucket ACL, and Bucket CORS Policy:
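A hedged sketch of these resources follows. It uses a public access block rather than a bucket ACL, because newly created buckets have ACLs disabled by default and this guide requires all public access to be blocked. The resource names, the aws.encord provider alias, the variables, and the CORS rule values are assumptions. The CORS JSON shown in Step 4 of the integration is the authoritative source for the rule.

```hcl
# Assumes variables "bucket_name" and "encord_origins" are declared in
# variables.tf and that an aliased provider "aws.encord" exists.
resource "aws_s3_bucket" "encord" {
  provider = aws.encord
  bucket   = var.bucket_name
}

# Block all public access to the bucket.
resource "aws_s3_bucket_public_access_block" "encord" {
  provider = aws.encord
  bucket   = aws_s3_bucket.encord.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Allow the Encord web application to read objects from a browser.
resource "aws_s3_bucket_cors_configuration" "encord" {
  provider = aws.encord
  bucket   = aws_s3_bucket.encord.id

  cors_rule {
    allowed_headers = ["*"]   # assumed
    allowed_methods = ["GET"] # assumed
    allowed_origins = var.encord_origins
    max_age_seconds = 3600 # assumed
  }
}
```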
Defining the IAM Policy:
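One way this could look is sketched here. The policy name, resource name, and action list are assumptions. Use the actions from the JSON in Step 2 of the integration, keeping in mind that s3:PutObject is only needed for features that require write permissions.

```hcl
# Assumes variable "bucket_name" is declared in variables.tf and that an
# aliased provider "aws.encord" exists.
resource "aws_iam_policy" "encord_access" {
  provider    = aws.encord
  name        = "encord-s3-access" # hypothetical name
  description = "Allows Encord to access objects in the integration bucket."

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        # Assumed actions; copy the real list from the Encord policy JSON.
        Action = ["s3:GetObject", "s3:PutObject"]
        # The Resource value must end in /* so that it covers the objects.
        Resource = "arn:aws:s3:::${var.bucket_name}/*"
      }
    ]
  })
}
```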
Attaching the Role to the Policy:
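The sketch below defines a role that the Encord AWS account can assume with the External ID, then attaches the policy to it. Resource names and variables are hypothetical, and aws_iam_policy.encord_access stands for whichever IAM policy resource you defined for the previous step.

```hcl
# Assumes variables "encord_account_id" and "encord_external_id" are
# declared in variables.tf and that an aliased provider "aws.encord" exists.
resource "aws_iam_role" "encord" {
  provider = aws.encord
  name     = "encord-integration-role" # hypothetical name

  # Trust policy: another AWS account (Encord) with a required External ID.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = "sts:AssumeRole"
        Principal = {
          AWS = "arn:aws:iam::${var.encord_account_id}:root"
        }
        Condition = {
          StringEquals = {
            "sts:ExternalId" = var.encord_external_id
          }
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "encord" {
  provider = aws.encord
  role     = aws_iam_role.encord.name
  # Hypothetical resource name; point this at your own aws_iam_policy.
  policy_arn = aws_iam_policy.encord_access.arn
}

# The Role ARN to paste into Step 3 of the integration in Encord.
output "encord_role_arn" {
  value = aws_iam_role.encord.arn
}
```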
Before applying any changes, run terraform plan to preview the changes and check you are happy with them.
Once your Terraform has been applied, return to the Encord application, and test your integration.
The entire resources file s3-resources.tf now looks like this: