Watch our video tutorial on creating AWS S3 integrations, or follow the step-by-step guide below for even more details.
Video Tutorial - Creating an AWS integration
In order to integrate with AWS S3, you must:
Create an S3 bucket to store your files if you have not done so already. AWS Security Token Service (STS) must be available and enabled in the Region where the bucket is located.
Do not close this tab or window until you have finished the integration process. We advise opening AWS in a separate tab.
In Encord, copy the JSON from Step 2 of the integration.
For example:
In AWS, navigate to Identity and Access Management (IAM) and select Policies.
Click Create policy to create a new policy.
Select JSON as the Policy editor and paste in the JSON you copied from Encord. Replace the arn:aws:s3YourBucket value for Resource with your bucket’s Amazon Resource Name (ARN). The ARN can be found in the Properties tab of your S3 bucket. When pasting your bucket ARN into the JSON policy editor, ensure that the Resource value ends in /*. Click the Next button to continue.
s3:PutObject is needed for features that require write permissions, including re-encoding data and creating image sequences.
s3:ListBucket is OPTIONAL. Cloud Synced Folders requires s3:ListBucket read permissions to sync data stored in your buckets to Encord Cloud Synced Folders.
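As a rough illustration of the policy shape described above, the following Python sketch builds a policy document for a single bucket. The JSON shown in Step 2 of the Encord integration remains the authoritative version. The function name, the bucket name my-example-bucket, and the inclusion of s3:GetObject are assumptions made for this example. The sketch attaches s3:ListBucket to the bucket ARN itself, because AWS treats it as a bucket-level action, while object actions use the ARN ending in /*.

```python
import json


def build_bucket_policy(bucket_name, allow_write=True, allow_list=False):
    """Return an IAM policy document (as a dict) for one S3 bucket.

    Hypothetical helper for illustration only; compare the result with the
    JSON that Encord displays before using it.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"

    # Object-level actions apply to the objects in the bucket, hence "/*".
    object_actions = ["s3:GetObject"]  # assumed read permission
    if allow_write:
        object_actions.append("s3:PutObject")

    statements = [
        {
            "Effect": "Allow",
            "Action": object_actions,
            "Resource": f"{bucket_arn}/*",
        }
    ]

    # s3:ListBucket is a bucket-level action, so it targets the bucket ARN.
    if allow_list:
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            }
        )

    return {"Version": "2012-10-17", "Statement": statements}


# "my-example-bucket" is a placeholder bucket name.
policy = build_bucket_policy("my-example-bucket", allow_list=True)
print(json.dumps(policy, indent=2))
```

Leaving allow_list at its default of False omits the second statement, which matches the optional status of s3:ListBucket.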
In Encord, copy the Encord AWS account ID from Step 3 of the integration and paste it into the Account ID field in AWS. In AWS, check Require external ID under Options to reveal the External ID field.
Navigate back to Encord and click Generate and copy to copy an External ID, then paste it into the External ID field in AWS.
Give your role a descriptive name and click the Create role button.
Copy the Role ARN and the name of the role you just created.
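For reference, the role created in the steps above carries a trust policy that lets another AWS account assume it only when the matching External ID is supplied. The Python sketch below shows roughly what such a trust policy looks like. The function name, the account ID 123456789012, and the External ID value are placeholders invented for this example; use the values shown in Encord.

```python
import json


def build_trust_policy(encord_account_id, external_id):
    """Return a role trust policy that requires an External ID.

    Hypothetical helper for illustration; the AWS console generates the
    equivalent document when "Require external ID" is checked.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{encord_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The role can only be assumed when this External ID is sent.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values, not real Encord identifiers.
trust = build_trust_policy("123456789012", "example-external-id")
print(json.dumps(trust, indent=2))
```

You can compare this shape with the Trust relationships tab of the role in IAM to confirm that both the account ID and the External ID were entered correctly.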
For customers working with teams in India
Some users based in India find the Encord App deployment dedicated to Indian users more reliable. Instruct your team based in India to use https://app.in.encord.com, and add that domain to the permitted origins in your bucket's CORS settings.
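A bucket CORS configuration that permits the India domain might be assembled as in the sketch below. Only https://app.in.encord.com comes from this guide; the function name, the https://app.encord.com origin, and the choice of GET as the only allowed method are assumptions for illustration, so check them against the CORS policy that Encord provides.

```python
import json


def build_cors_rules(origins):
    """Return an S3 CORS configuration (a list of rules) for the given origins.

    Hypothetical helper; the allowed methods and headers are assumptions.
    """
    return [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET"],
            "AllowedOrigins": list(origins),
            "ExposeHeaders": [],
        }
    ]


cors_rules = build_cors_rules(
    [
        "https://app.encord.com",     # assumed default Encord App origin
        "https://app.in.encord.com",  # India deployment named in this guide
    ]
)
print(json.dumps(cors_rules, indent=2))
```

The printed JSON has the shape accepted by the CORS editor on the Permissions tab of an S3 bucket.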
Integration tests might temporarily be unsuccessful due to AWS data processing delays after setup. These delays can take up to 24 hours to resolve, after which labeling can begin.
Add Cache-Control headers to your AWS folders or objects, following the AWS instructions here. Adding Cache-Control headers significantly increases the speed at which your files load in the Label Editor.
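If you prefer to set the header programmatically rather than through the AWS console, one common approach is to copy each object onto itself with replaced metadata. The sketch below only builds the arguments for such a copy. The function name, the one-day max-age value, and the example bucket and key are assumptions; this guide does not prescribe a specific Cache-Control value.

```python
def cache_control_copy_args(bucket, key, content_type, max_age_seconds=86400):
    """Build keyword arguments for an in-place S3 copy that sets Cache-Control.

    Hypothetical helper for illustration. Because the metadata is replaced
    rather than merged, the content type must be supplied again.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        # Copy the object onto itself.
        "CopySource": {"Bucket": bucket, "Key": key},
        "CacheControl": f"max-age={max_age_seconds}",
        "ContentType": content_type,
        # REPLACE tells S3 to use the metadata given in this request.
        "MetadataDirective": "REPLACE",
    }


# Placeholder bucket, key, and content type.
args = cache_control_copy_args("my-example-bucket", "videos/clip.mp4", "video/mp4")
print(args["CacheControl"])
```

With boto3 installed and credentials configured, these arguments could be passed as boto3.client("s3").copy_object(**args). Any other metadata on the object would also need to be included in the request, since the REPLACE directive discards metadata that is not supplied.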
This test confirms Encord can assume the specified role, but does not guarantee bucket access. If data onboarding is unsuccessful despite passing the test, verify Encord’s bucket permissions and the accuracy of your object URLs.
Not setting up the Cache-Control header in Step 5 can result in the "Cache policy not set" error when testing the integration.
Navigate to the Register cloud data page for guidance on how to register files stored in AWS.
Using Multi-Region Access Points requires you to do a few things differently when setting up an AWS integration.
Example JSON
Make sure you create a CORS policy for every bucket that is included in your Multi-Region Access Point.
When uploading data to a dataset using the Multi-Region Access Point integration, make sure your JSON file is formatted correctly for use with a Multi-Region Access Point, as documented here.
This guide is intended only as a supplement to the excellent Terraform documentation provided by HashiCorp here.
Note that it must be followed in conjunction with the Encord App Integration setup steps described here.
Do not simply copy and paste the examples below. Instead, use them as a template for Terraforming your Private Cloud Integration.
To integrate with Encord, you will need to create:
Below are some examples of how this might look:
In the example below, we use HashiCorp’s AWS provider, aws, and the tfvars utility, which allows us to neatly define values to pass into variables.
We also define an alias as well as a Region for the AWS provider, which needs to match the location in which you want your bucket to be provisioned.
In your variables.tf file, you will need to define the variables into which you want to pass values. An example is below:
Use a .tfvars file to avoid having to manually edit the Terraform files.
Since we’ve opted to use tfvars, we need to create a corresponding .tfvars file and specify some values we wish to pass into the variables you just defined:
The resources you need to create include:
Before applying any changes, run terraform plan to preview the changes and check you are happy with them.
Once your Terraform has been applied, return to the Encord application, and test your integration.
The entire resources file, s3-resources.tf, now looks like this: