Data Grouping (Data Groups) lets you assign individual files to groups so that they are easier to annotate and review. Data Groups unlock multi-tile and multi-modal functionality.

Data Groups are similar to image groups in Encord, with two differences: Data Groups can include any data type (images, videos, audio files, text files, PDFs), and Data Groups support default and custom layouts for annotation and review.

You can use Data Groups in Consensus and non-Consensus Projects.

Data Group Layouts

The order of data units in a Data Group determines how they are arranged in the Label Editor. Encord supports the following layouts:

  • Grid (default)
  • Custom Layouts

Grid Layout

+------------------+------------------------+
|   data unit 1    |      data unit 2       |
+------------------+------------------------+
|   data unit 3    |      data unit 4       |
+------------------+------------------------+
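
As a rough illustration of how list order could translate into grid positions, the sketch below assigns each data unit a (row, column) cell in row-major order. The two-column width matches the diagram above but is an assumption, since the Label Editor decides the actual grid dimensions. `grid_positions` is a hypothetical helper and is not part of the Encord SDK.

```python
from typing import Dict, List, Tuple


def grid_positions(data_units: List[str], columns: int = 2) -> Dict[str, Tuple[int, int]]:
    """Map each data unit to a (row, column) cell, filling rows left to right."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    # divmod(index, columns) yields (index // columns, index % columns),
    # which is exactly (row, column) for row-major filling.
    return {unit: divmod(index, columns) for index, unit in enumerate(data_units)}


positions = grid_positions(["data unit 1", "data unit 2", "data unit 3", "data unit 4"])
for unit, (row, col) in positions.items():
    print(f"{unit}: row {row}, column {col}")
```

Reordering the input list changes which cell each data unit lands in, which is the practical consequence of the ordering rule described above.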

Custom Layout Example

+-------------------------------------------+
|              text file                    |
+------------------+------------------------+
|   data unit 1    |      data unit 2       |
+------------------+------------------------+
|   data unit 3    |      data unit 4       |
+------------------+------------------------+
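
One way to reason about a custom layout like the one above is as a tree of splits whose leaves are tiles. The sketch below models that idea in plain Python. The dictionary keys (`direction`, `first`, `second`, `split_percentage`, `tile`) and the helpers `tile`, `split`, and `tile_keys` are hypothetical names chosen for illustration; they are not a confirmed Encord schema, so check the SDK reference for the actual custom layout format.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def tile(key: str) -> Node:
    """A leaf of the layout tree: one tile showing one data unit."""
    return {"tile": key}


def split(direction: str, first: Node, second: Node, split_percentage: int = 50) -> Node:
    """An inner node. In this sketch, "column" stacks first above second,
    and "row" places first to the left of second."""
    return {
        "direction": direction,
        "first": first,
        "second": second,
        "split_percentage": split_percentage,  # share of space given to `first`
    }


# Text file across the top, then a 2 x 2 grid of data units below it.
layout = split(
    "column",
    tile("text file"),
    split(
        "column",
        split("row", tile("data unit 1"), tile("data unit 2")),
        split("row", tile("data unit 3"), tile("data unit 4")),
    ),
    split_percentage=20,
)


def tile_keys(node: Node) -> List[str]:
    """Collect tile keys in reading order (top to bottom, left to right)."""
    if "tile" in node:
        return [node["tile"]]
    return tile_keys(node["first"]) + tile_keys(node["second"])


print(tile_keys(layout))
```

Listing the tile keys this way makes it easy to confirm that every tile in the layout has a file assigned to it before the Data Group is created.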

Create Data Groups

At scale, we recommend using the SDK to create your Data Groups, specify the layout for your Data Groups, and add Data Groups to Datasets and Projects.

Each of the code examples does the following:

  1. Specifies the data units to add to a Data Group.

  2. Creates the Data Groups and specifies the layout in the Label Editor.

  3. Adds the Data Groups to a Dataset.

  4. Retrieves the Data Group label rows from a Project that the Dataset is attached to.

from uuid import UUID

from encord.constants.enums import DataType
from encord.objects.metadata import DataGroupMetadata
from encord.orm.storage import DataGroupGrid, StorageItemType
from encord.user_client import EncordUserClient

# --- Configuration ---
SSH_PATH = "/Users/chris-encord/ssh-private-key.txt"  # Replace with the file path to your SSH key
FOLDER_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Folder ID
DATASET_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Dataset ID
PROJECT_ID = "00000000-0000-0000-0000-000000000000"  # Replace with the Project ID

# --- Connect to Encord ---
user_client: EncordUserClient = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path=SSH_PATH,
    # For US platform users use "https://api.us.encord.com"
    domain="https://api.encord.com",
)

folder = user_client.get_storage_folder(FOLDER_ID)

# --- Group definitions (name + UUID list) ---
groups = [
    {
        "name": "group-grid-001",
        "uuids": [
            UUID("00000000-0000-0000-0000-000000000000"), # Replace with File ID. This data unit appears first in the grid.
            UUID("11111111-1111-1111-1111-111111111111"), # Replace with File ID. This data unit appears second in the grid.
            UUID("22222222-2222-2222-2222-222222222222"), # Replace with File ID. This data unit appears third in the grid.
            UUID("33333333-3333-3333-3333-333333333333"), # Replace with File ID. This data unit appears fourth in the grid.
        ],
    },
    {
        "name": "group-grid-002",
        "uuids": [
            UUID("44444444-4444-4444-4444-444444444444"), # Replace with File ID. This data unit appears first in the grid.
            UUID("55555555-5555-5555-5555-555555555555"), # Replace with File ID. This data unit appears second in the grid.
            UUID("66666666-6666-6666-6666-666666666666"), # Replace with File ID. This data unit appears third in the grid.
            UUID("77777777-7777-7777-7777-777777777777"), # Replace with File ID. This data unit appears fourth in the grid.
        ],
    },
    {
        "name": "group-grid-003",
        "uuids": [
            UUID("88888888-8888-8888-8888-888888888888"), # Replace with File ID. This data unit appears first in the grid.
            UUID("99999999-9999-9999-9999-999999999999"), # Replace with File ID. This data unit appears second in the grid.
            UUID("12312312-3123-1231-2312-312312312312"), # Replace with File ID. This data unit appears third in the grid.
            UUID("45645645-6456-4564-5645-645645645645"), # Replace with File ID. This data unit appears fourth in the grid.
        ],
    },
    # Add more groups as needed...
]

# --- Create the data groups using default grid layout ---
for g in groups:
    group = folder.create_data_group(
        DataGroupGrid(
            name=g["name"],
            layout_contents=g["uuids"],
        )
    )
    print(f"✅ Created group '{g['name']}' with UUID {group}")

# --- Add all the data groups in a folder to a dataset ---
group_items = folder.list_items(item_types=[StorageItemType.GROUP])
d = user_client.get_dataset(DATASET_ID)
d.link_items([item.uuid for item in group_items])

# --- Retrieve and inspect data group label rows ---
p = user_client.get_project(PROJECT_ID)
rows = p.list_label_rows_v2(include_children=True)

for row in rows:
    if row.data_type == DataType.GROUP:
        row.initialise_labels()
        assert isinstance(row.metadata, DataGroupMetadata)
        print(row.metadata.children)
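
Before running a script like the one above at scale, it can help to sanity-check the group definitions locally. The sketch below flags repeated group names, empty groups, and File IDs repeated within a group. Whether Encord rejects any of these cases is not stated here, so treat the checks as assumptions about what you probably want to catch early. `find_group_problems` is a hypothetical helper, not part of the Encord SDK.

```python
from collections import Counter
from typing import Any, Dict, List
from uuid import UUID


def find_group_problems(groups: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in the group definitions."""
    problems: List[str] = []

    # Group names that appear more than once.
    name_counts = Counter(g["name"] for g in groups)
    for name, count in name_counts.items():
        if count > 1:
            problems.append(f"duplicate group name: {name}")

    for g in groups:
        if not g["uuids"]:
            problems.append(f"{g['name']}: no data units listed")
        # File IDs repeated inside a single group.
        repeated = [str(u) for u, c in Counter(g["uuids"]).items() if c > 1]
        if repeated:
            problems.append(f"{g['name']}: repeated File IDs {repeated}")

    return problems


# Placeholder definitions in the same shape as the `groups` list above.
example_groups = [
    {"name": "group-grid-001", "uuids": [UUID(int=1), UUID(int=2)]},
    {"name": "group-grid-001", "uuids": [UUID(int=3), UUID(int=3)]},
]
problems = find_group_problems(example_groups)
for problem in problems:
    print(problem)
```

An empty result means none of these checks fired; it does not guarantee that the File IDs exist in the Folder.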