Direct access integrations enable you to register data to Encord without requiring Encord to sign the URLs. Instead, you provide direct, pre-signed URLs when using a direct access integration. Direct access integrations support all data modalities.
Direct Access integrations are often paired with Strict client-only access for enhanced security when supplying private cloud data. This combination ensures the highest level of data protection: Strict client-only access prevents Encord from accessing your data, while Direct Access keeps Encord from signing the URLs, maintaining your data’s confidentiality. However, when Strict client-only access is enabled, all features relying on data conversion cannot be used.
Direct access integrations cannot be used to access data with cloud provider user-based access controls.

Use cases

  1. Public Dataset Aggregation: Researchers or analysts may need to aggregate public datasets from various sources for analysis or modeling purposes. Direct access integrations allow them to seamlessly gather data from publicly accessible sources without the need for authentication, simplifying the data acquisition process.
  2. Third-Party Data Integration: Businesses often rely on third-party data providers for enriching their datasets or enhancing analytical insights. With direct access integrations, organizations can easily incorporate data from external sources by providing pre-signed URLs, streamlining the integration process and enabling timely access to valuable data.
  3. Private Data Access: Organizations may have proprietary datasets stored in private cloud environments with restricted access controls, such as IP allowlists. Direct access integrations with Strict client-only access enable authorized users to directly upload and access this data in Encord without compromising security or violating access policies.
  4. Data protection: Direct access paired with Strict client-only access ensures compliance with stringent security requirements and regulatory standards, at the expense of some of Encord’s more advanced features.

Creating a Direct Access integration

  1. In the Integrations section, click Add integration to create a new integration.
  2. Select Direct Access from the list of possible integration types.
  3. Click Create.
After a Direct Access integration is created, it can be used to register data.
CORS configurations must be set up if you intend to add data from a cloud provider. Links to provider-specific CORS setup guides are listed below.
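As an illustration, the following sketch uses boto3 to apply a CORS rule to an AWS S3 bucket so that the Encord web app can fetch objects through pre-signed URLs. The bucket name and the allowed origin (https://app.encord.com) are assumptions for this example; refer to the provider-specific CORS guides for the exact values your setup requires.

# A sketch: configure CORS on an S3 bucket for use with Encord.
# The bucket name and allowed origin below are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_cors(
    Bucket="encord-integration",  # replace with your bucket name
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedOrigins": ["https://app.encord.com"],  # assumed Encord app origin
                "MaxAgeSeconds": 3600,
            }
        ]
    },
)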

Uploading Direct Access Data

Step 1: Create a Direct Access Integration

Step 2: Create a JSON File for Data Registration

Create a JSON file to specify the files you want to register with Encord. The objectUrl values in the JSON file must be pre-signed.
When a Direct Access integration is used with Strict client-only access, the JSON file must include metadata specifying file information for each URL. If audioMetadata, imageMetadata, or videoMetadata flags are present in the JSON file, the system relies solely on the provided metadata, performs no additional validation, and does not store the file on our servers. Ensuring the accuracy of the provided metadata is essential.
{
  "images": [
    {
      "objectUrl": "https://encord-integration.s3.eu-west-2.amazonaws.com/images/image_file_001.jpg",
      "title": "my-custom-image-file-001.jpg",
      "imageMetadata": {
        "mimeType": "image/jpeg",
        "fileSize": 124,
        "width": 640,
        "height": 480
      }
    },
    {
      "objectUrl": "https://encord-integration.s3.eu-west-2.amazonaws.com/images/image_file_002.jpg",
      "title": "my-custom-image-file-002.jpg",
      "imageMetadata": {
        "mimeType": "image/jpeg",
        "fileSize": 124,
        "width": 640,
        "height": 480
      }
    },
    {
      "objectUrl": "https://encord-integration.s3.eu-west-2.amazonaws.com/images/image_file_003.jpg",
      "title": "my-custom-image-file-003.jpg",
      "imageMetadata": {
        "mimeType": "image/jpeg",
        "fileSize": 124,
        "width": 640,
        "height": 480
      }
    },
    {
      "objectUrl": "https://encord-integration.s3.eu-west-2.amazonaws.com/images/image_file_004.jpg",
      "title": "my-custom-image-file-004.jpg",
      "imageMetadata": {
        "mimeType": "image/jpeg",
        "fileSize": 124,
        "width": 640,
        "height": 480
      }
    }
  ],
  "skip_duplicate_urls": true
}
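A registration JSON like the one above can also be generated programmatically. The sketch below is one way to do it with boto3 and Pillow, assuming your images sit in an S3 bucket: it pre-signs each object URL, reads each image once to record accurate width and height, and writes the registration file. The bucket, prefix, and output path are illustrative assumptions.

# A sketch: build a Direct Access registration JSON for images stored in S3.
# Assumes boto3 and Pillow are installed; bucket, prefix, and output path are illustrative.
import io
import json

import boto3
from PIL import Image

BUCKET = "encord-integration"      # illustrative bucket name
PREFIX = "images/"                 # illustrative key prefix
EXPIRY_SECONDS = 7 * 24 * 60 * 60  # pre-signed URLs must remain valid while Encord uses them

s3 = boto3.client("s3")
entries = []

for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if not key.lower().endswith((".jpg", ".jpeg")):
            continue

        # Pre-sign the object URL so Encord can read it without signing it itself
        url = s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BUCKET, "Key": key},
            ExpiresIn=EXPIRY_SECONDS,
        )

        # Read the image once to record accurate dimensions for imageMetadata
        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()
        width, height = Image.open(io.BytesIO(body)).size

        entries.append(
            {
                "objectUrl": url,
                "title": key.rsplit("/", 1)[-1],
                "imageMetadata": {
                    "mimeType": "image/jpeg",
                    "fileSize": obj["Size"],
                    "width": width,
                    "height": height,
                },
            }
        )

with open("registration.json", "w") as f:
    json.dump({"images": entries, "skip_duplicate_urls": True}, f, indent=2)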
Step 3: Import your Direct Access Data

To ensure smoother uploads and faster completion times, and avoid hitting absolute file limits, we recommend adding smaller batches of data. Limit uploads to 100 videos or up to 1,000 images at a time. You can also create multiple Datasets, all of which can be linked to a single Project. Familiarize yourself with our limits and best practices for data import/registration before adding data to Encord.
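If you have a large registration JSON, one way to stay within these limits is to split it into smaller files before uploading, for example (a sketch assuming the JSON follows the image format shown above; file names are illustrative):

# A sketch: split a large registration JSON into batches of at most 1,000 images.
import json

BATCH_SIZE = 1000

with open("registration.json") as f:
    spec = json.load(f)

images = spec.get("images", [])
for start in range(0, len(images), BATCH_SIZE):
    batch = {
        "images": images[start : start + BATCH_SIZE],
        "skip_duplicate_urls": spec.get("skip_duplicate_urls", True),
    }
    with open(f"registration_batch_{start // BATCH_SIZE + 1}.json", "w") as out:
        json.dump(batch, out, indent=2)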
  1. Navigate to the Files section of Index in the Encord platform.
  2. Click into a Folder.
  3. Click + Upload files. A dialog appears.
  4. Click Import from cloud data.
We recommend turning on the Ignore individual file errors feature. This ensures that individual file errors do not abort the entire upload process.
  5. Click Add JSON or CSV files to add a JSON or CSV file specifying the cloud data to add.
To use your data in Encord, it must be registered in the Encord Files storage. Once registered, your data can be reused across multiple Projects; the files themselves contain no labels or annotations. Files stores your data, while Projects store your labels. The following script creates a new folder in Files and uses your Direct Access integration to register data in that folder. It works for all file types.
If Upload is still in progress, try again later! is returned, run the script again later to check whether the upload has finished.
Ensure that you:
  • Replace <private_key_path> with the path to your private key.
  • Replace <integration_title> with the title of the integration you want to use.
  • Replace <folder_name> with the folder name. The script assumes that the specified folder name is unique.
  • Replace path/to/json/file.json with the path to a JSON file specifying which cloud storage files should be uploaded.
  • Replace A folder to store my files with a meaningful description for your folder.
  • Replace "my": "folder_metadata" with any metadata you want to add to the folder.
The script has several possible outputs:
  • “Upload is still in progress, try again later!”: The registration has not finished. Run this script again later to check if the data registration has finished.
  • “Upload completed”: The registration completed. If any files failed to upload, the URLs are listed.
  • “Upload failed”: The entire registration failed, and not just individual files. Ensure your JSON file is formatted correctly.

# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Instantiate user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify the integration you want to use
integrations = user_client.get_cloud_integrations()
integration_idx = [i.title for i in integrations].index("<integration_title>")
integration = integrations[integration_idx].id

# Create a storage folder
folder_name = "<folder_name>"
folder_description = "A folder to store my files"
folder_metadata = {"my": "folder_metadata"}
storage_folder = user_client.create_storage_folder(
    folder_name, folder_description, client_metadata=folder_metadata
)

# Initiate cloud data registration
upload_job_id = storage_folder.add_private_data_to_folder_start(
    integration_id=integration, private_files="path/to/json/file.json", ignore_errors=True
)

# Check upload status
res = storage_folder.add_private_data_to_folder_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")

if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed")
    if res.unit_errors:
        print("The following URLs failed to upload:")
        for e in res.unit_errors:
            print(e.object_urls)
else:
    print(f"Upload failed: {res.errors}")