Dataset
Access dataset-related data and manipulate the dataset.
class encord.dataset.Dataset(client, orm_dataset)
property dataset_hash: str
Get the dataset hash (that is, the Dataset ID).
Return Type:
str
property title: str
Gets the title of the dataset.
Return Type:
str
property description: str
Gets the description of the Dataset.
Return Type:
str
property backing_folder_uuid: Optional[UUID]
The unique identifier for the Storage folder associated with this Dataset.
Return Type:
Optional[UUID]
property storage_location: encord.orm.dataset.StorageLocation
Gets the storage location for the Dataset.
-
CORD_STORAGE = 0
-
AWS = 1
-
GCP = 2
-
AZURE = 3
-
OTC = 4
-
NEW_STORAGE = -99
This is a placeholder for a new storage location that is not yet supported by your SDK version. Update your SDK to the latest version.
Return Type:
StorageLocation
property data_rows: List[encord.orm.dataset.DataRow]
Part of the response of this property can be configured by the [encord.dataset.Dataset.set_access_settings()] method.
dataset.set_access_settings(DatasetAccessSettings(fetch_client_metadata=True))
print(dataset.data_rows)
Return Type:
List[DataRow]
list_data_rows
Gets dataset rows (pointers to data, labels).
list_data_rows(title_eq=None, title_like=None, created_before=None, created_after=None, data_types=None)
Parameters:
-
title_eq (Optional[str]) – optional exact title row filter
-
title_like (Optional[str]) – optional fuzzy title row filter; SQL syntax
-
created_before (Union[str, datetime, None]) – optional datetime row filter
-
created_after (Union[str, datetime, None]) – optional datetime row filter
-
data_types (Optional[List[DataType]]) – optional data types row filter
Return type:
List[DataRow]
Returns:
A list of DataRow objects that match the filters.
Raises:
-
[AuthorisationError] – If the dataset API key is invalid.
-
[ResourceNotFoundError] – If no dataset exists by the specified dataset EntityId.
-
[UnknownError] – If an error occurs while retrieving the dataset.
def list_data_rows(
self,
title_eq: Optional[str] = None,
title_like: Optional[str] = None,
created_before: Optional[Union[str, datetime]] = None,
created_after: Optional[Union[str, datetime]] = None,
data_types: Optional[List[DataType]] = None,
) -> List[DataRow]:
"""
Retrieve dataset rows (pointers to data, labels).
Args:
title_eq: optional exact title row filter
title_like: optional fuzzy title row filter; SQL syntax
created_before: optional datetime row filter
created_after: optional datetime row filter
data_types: optional data types row filter
Returns:
List[DataRow]: A list of DataRows object that match the filter
Raises:
AuthorisationError: If the dataset API key is invalid.
ResourceNotFoundError: If no dataset exists by the specified dataset EntityId.
UnknownError: If an error occurs while retrieving the dataset.
"""
return self._client.list_data_rows(title_eq, title_like, created_before, created_after, data_types)
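A minimal usage sketch of these filters. The Dataset is stubbed here so the call shape can be shown without an Encord session; a real instance would come from your Encord user client instead.

```python
from datetime import datetime, timedelta

# Stub standing in for encord.dataset.Dataset; a real Dataset queries
# Encord's backend, while the stub simply echoes the filters it was given.
class _StubDataset:
    def list_data_rows(self, title_eq=None, title_like=None,
                       created_before=None, created_after=None, data_types=None):
        return {"title_like": title_like, "created_after": created_after}

dataset = _StubDataset()
filters = dataset.list_data_rows(
    title_like="%.mp4",                                # SQL LIKE syntax
    created_after=datetime.now() - timedelta(days=7),  # datetime or ISO string
)
```

Filters that are left as `None` are simply not applied, so any combination of the five parameters is valid.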
refetch_data
The Dataset class only fetches its properties once. Use this function if you suspect that those properties are stale.
refetch_data()
Return type:
None
get_dataset
This function is exposed for convenience. You are encouraged to use the property accessors instead.
get_dataset()
Return type:
encord.orm.dataset.Dataset
def get_dataset(self) -> OrmDataset:
"""
This function is exposed for convenience. You are encouraged to use the property accessors instead.
"""
return self._client.get_dataset()
set_access_settings
Specifies access settings for a dataset.
set_access_settings(dataset_access_settings, *, refetch_data=True)
Parameters:
-
dataset_access_settings (DatasetAccessSettings) – The access settings to use going forward
-
refetch_data (bool) – Whether a refetch_data() call should follow the update of the dataset access settings.
Return type:
None
def set_access_settings(self, dataset_access_settings: DatasetAccessSettings, *, refetch_data: bool = True) -> None:
"""
Args:
dataset_access_settings: The access settings to use going forward
refetch_data: Whether a `refetch_data()` call should follow the update of the dataset access settings.
"""
self._client.set_access_settings(dataset_access_settings)
if refetch_data:
self.refetch_data()
add_users
Adds users to dataset.
If the user was already added, this operation succeeds but the user_role is unchanged. The existing user_role is reflected in the DatasetUser instance.
add_users(user_emails, user_role)
Parameters:
-
user_emails (List[str]) – list of user emails to be added
-
user_role (DatasetUserRole) – the user role to assign to all users
Return type:
List[DatasetUser]
def add_users(self, user_emails: List[str], user_role: DatasetUserRole) -> List[DatasetUser]:
"""
Add users to dataset. If the user was already added, this operation will succeed but the `user_role` will be
unchanged. The existing `user_role` will be reflected in the `DatasetUser` instance.
Args:
user_emails: list of user emails to be added
user_role: the user role to assign to all users
"""
return self._client.add_users(user_emails, user_role)
upload_video
Uploads one or more videos to Encord storage.
upload_video(file_path, cloud_upload_settings=CloudUploadSettings(max_retries=None, backoff_factor=None, allow_failures=False), title=None)
Parameters:
-
file_path (str) – path to video e.g. ‘/home/user/data/video.mp4’
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
-
title (Optional[str]) – The video title. If unspecified, this will be the file name. This title should include an extension. For example “encord_video.mp4”.
Returns:
Boolean
Raises:
[UploadOperationNotSupportedError] – If trying to upload to external datasets (for example, S3/GCP/Azure).
def upload_video(
self,
file_path: str,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload video to Encord storage.
Args:
file_path: path to video e.g. '/home/user/data/video.mp4'
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The video title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_video.mp4".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.upload_video(file_path, cloud_upload_settings=cloud_upload_settings, title=title)
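A hypothetical usage sketch of upload_video. The Dataset is stubbed so the call shape can be shown offline; parameter names follow the signature above.

```python
# Stub standing in for encord.dataset.Dataset; the real call uploads the
# file to Encord storage and returns a boolean success flag.
class _StubDataset:
    def upload_video(self, file_path, cloud_upload_settings=None, title=None):
        return True

ok = _StubDataset().upload_video(
    "/home/user/data/video.mp4",
    title="encord_video.mp4",  # the title should include an extension
)
```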
create_image_group
Creates an image group in Encord storage.
Choose this type of image upload for sequential images; otherwise, use the [Dataset.upload_image()] function.
Parameters:
-
file_paths (Iterable[str]) – A list of paths to images, for example, [‘/home/user/data/img1.png’, ‘/home/user/data/img2.png’]
-
DEPRECATED: max_workers (Optional[int]) – This argument is ignored
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
-
title (Optional[str]) – The title of the image group. If unspecified, a randomly generated title is created for you. This title should NOT include an extension. For example “encord_image_group”.
-
create_video (bool) – A flag specifying how image groups are stored. If True, a compressed video is created from the image group. True was the previous default behavior. If False, the images are saved as a sequence of images.
Return type:
Boolean
Returns:
Boolean
Raises:
[UploadOperationNotSupportedError] – If trying to upload to external datasets (for example, S3/GCP/Azure)
def create_image_group(
self,
file_paths: Iterable[str],
max_workers: Optional[int] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
*,
create_video: bool = True,
):
"""
Create an image group in Encord storage. Choose this type of image upload for sequential images. Else, you can
choose the :meth:`.Dataset.upload_image` function.
Args:
file_paths: a list of paths to images, e.g.
['/home/user/data/img1.png', '/home/user/data/img2.png']
max_workers:
DEPRECATED: This argument will be ignored
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the image group. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "encord_image_group".
create_video:
A flag specifying how image groups are stored. If `True`, a compressed video will be created from
the image group. `True` was the previous default behavior. If `False`, the images
are saved as a sequence of images.
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_image_group(
file_paths,
cloud_upload_settings=cloud_upload_settings,
title=title,
create_video=create_video,
)
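When the images are sequential, collecting the frame paths in name order matters before calling create_image_group. A small sketch; the create_image_group call itself is left as a comment because it needs a live Dataset.

```python
import tempfile
from pathlib import Path

# Create three dummy frame files, then gather them in name order,
# as you would for a real image-group upload.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"frame_{i:03d}.png").touch()
    file_paths = sorted(str(p) for p in Path(tmp).glob("frame_*.png"))
    # dataset.create_image_group(file_paths, create_video=False)
```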
create_dicom_series
Uploads a DICOM series to Encord storage.
create_dicom_series(file_paths, cloud_upload_settings=CloudUploadSettings(max_retries=None, backoff_factor=None, allow_failures=False), title=None)
Parameters:
-
file_paths (List[str]) – A list of paths to DICOM files, for example, [‘/home/user/data/DICOM_1.dcm’, ‘/home/user/data/DICOM_2.dcm’].
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
-
title (Optional[str]) – The title of the DICOM series. If unspecified, a randomly generated title is created for you. This title should NOT include an extension. For example “dicom_series”.
Return type:
Boolean
Returns:
Boolean
Raises:
[UploadOperationNotSupportedError] – If trying to upload to external datasets (for example, S3/GCP/Azure)
def create_dicom_series(
self,
file_paths: List[str],
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload a DICOM series to Encord storage
Args:
file_paths: a list of paths to DICOM files, e.g.
['/home/user/data/DICOM_1.dcm', '/home/user/data/DICOM_2.dcm']
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the DICOM series. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "dicom_series".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_dicom_series(file_paths, cloud_upload_settings=cloud_upload_settings, title=title)
upload_image
Uploads a single image to Encord storage.
Tip
If your images are sequential, we recommend creating an image group using the [Dataset.create_image_group()] function. For more information, compare https://docs.encord.com/docs/annotate-images and https://docs.encord.com/docs/annotate-videos.
upload_image(file_path, title=None, cloud_upload_settings=CloudUploadSettings(max_retries=None, backoff_factor=None, allow_failures=False))
Parameters:
-
file_path (Union[Path, str]) – The file path to the image
-
title (Optional[str]) – The image title. If unspecified, this will be the file name. This title should include an extension. For example “encord_image.png”.
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
Return type:
Image
def upload_image(
self,
file_path: Union[Path, str],
title: Optional[str] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
) -> Image:
"""
Upload a single image to Encord storage. If your images are sequential we recommend creating an image group via
the :meth:`.Dataset.create_image_group` function. For more information please compare
https://docs.encord.com/docs/annotate-images and https://docs.encord.com/docs/annotate-videos
Args:
file_path: The file path to the image
title: The image title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_image.png".
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
"""
return self._client.upload_image(file_path, title, cloud_upload_settings)
delete_image_group
Deletes an image group in Encord storage.
delete_image_group(data_hash)
Parameters:
data_hash (str) – the hash of the image group to delete
def delete_image_group(self, data_hash: str):
"""
Delete an image group in Encord storage.
Args:
data_hash: the hash of the image group to delete
"""
return self._client.delete_image_group(data_hash)
delete_data
Deletes a video/image group from a Dataset.
delete_data(data_hashes)
Parameters:
data_hashes (List[str]) – list of hashes of the videos/image groups to delete; all should belong to the same Dataset
def delete_data(self, data_hashes: List[str]):
"""
Delete a video/image group from a dataset.
Args:
data_hashes: list of hash of the videos/image_groups you'd like to delete, all should belong to the same
dataset
"""
return self._client.delete_data(data_hashes)
add_private_data_to_dataset
Appends data hosted on a private cloud to an existing dataset.
Tip
For a more complete example of safe uploads, refer to our documentation.
add_private_data_to_dataset(integration_id, private_files, ignore_errors=False)
Parameters:
-
integration_id (str) – The EntityId of the cloud integration you wish to use.
-
private_files (Union[str, Dict, Path, TextIO]) – A str path or Path object to a json file, json str or python dictionary of the files you wish to add
-
ignore_errors (bool) – When set to True, this will prevent individual errors from stopping the upload process.
Return type:
AddPrivateDataResponse
Returns:
add_private_data_response – List of DatasetDataInfo objects containing data_hash and title.
def add_private_data_to_dataset(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> AddPrivateDataResponse:
"""
Append data hosted on a private cloud to an existing dataset.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
add_private_data_response List of DatasetDataInfo objects containing data_hash and title
"""
return self._client.add_private_data_to_dataset(integration_id, private_files, ignore_errors)
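private_files accepts a Python dictionary as well as a JSON string or file path. The sketch below builds one; the key names ("videos", "objectUrl") are assumptions modelled on Encord's private-cloud upload format, so check the current documentation for the exact schema.

```python
import json

# Hypothetical private_files payload for an AWS S3 integration;
# key names are illustrative assumptions, not a confirmed schema.
private_files = {
    "videos": [
        {"objectUrl": "s3://my-bucket/videos/video_1.mp4"},
        {"objectUrl": "s3://my-bucket/videos/video_2.mp4"},
    ],
}
# The same structure can also be supplied as a JSON string:
private_files_json = json.dumps(private_files)
# dataset.add_private_data_to_dataset(integration_id, private_files, ignore_errors=True)
```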
add_private_data_to_dataset_start
Appends data hosted on a private cloud to an existing dataset.
This method initializes the upload in Encord’s backend. Once the upload job ID has been returned, you can exit the terminal while the job continues uninterrupted.
You can check upload job status at any point using the add_private_data_to_dataset_get_result() method. This can be done in a separate python session to the one where the upload was initialized.
add_private_data_to_dataset_start(integration_id, private_files, ignore_errors=False)
Parameters:
-
integration_id (str) – The EntityId of the cloud integration you wish to use.
-
private_files (Union[str, Dict, Path, TextIO]) – A str path or Path object to a json file, json str or python dictionary of the files you wish to add
-
ignore_errors (bool) – When set to True, this will prevent individual errors from stopping the upload process.
Return type:
str
Returns:
upload_job_id - UUID Identifier of upload job. This id enables the user to track the job progress using the SDK or Encord platform.
def add_private_data_to_dataset_start(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> str:
"""
Append data hosted on a private cloud to an existing dataset.
This method initializes the upload in Encord's backend.
Once the upload id has been returned, you can exit the terminal
while the job continues uninterrupted.
You can check upload job status at any point using
the :meth:`add_private_data_to_dataset_get_result` method.
This can be done in a separate python session to the one
where the upload was initialized.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
str
`upload_job_id` - UUID Identifier of upload job.
This id enables the user to track the job progress via SDK, or web app.
"""
return self._client.add_private_data_to_dataset_start(integration_id, private_files, ignore_errors)
add_private_data_to_dataset_get_result
Gets data upload status, performing a long-polling process for up to timeout_seconds.
add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=604800)
Parameters:
-
upload_job_id (str) – UUID Identifier of upload job. This id enables the user to track the job progress using the SDK or web-app.
-
timeout_seconds (int) – Number of seconds the method waits for a response. If timeout_seconds == 0, only a single status check is performed and the response is returned immediately.
Return type:
DatasetDataLongPolling
Returns:
DatasetDataLongPolling: Response containing details about job status, errors and progress.
def add_private_data_to_dataset_get_result(
self,
upload_job_id: str,
timeout_seconds: int = 7 * 24 * 60 * 60, # 7 days
) -> DatasetDataLongPolling:
"""
Fetch data upload status, perform long polling process for `timeout_seconds`.
Args:
upload_job_id:
UUID Identifier of upload job. This id enables the user to track the job progress via SDK, or web app.
timeout_seconds:
Number of seconds the method will wait while waiting for a response.
If `timeout_seconds == 0`, only a single checking request is performed.
Response will be immediately returned.
Returns:
DatasetDataLongPolling
Response containing details about job status, errors and progress.
"""
return self._client.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds)
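The start/get_result pair implements a long-polling workflow. The sketch below models that loop with a stubbed status source; the status strings are placeholders, not the SDK's actual DatasetDataLongPolling values.

```python
import time

# Generic long-poll loop: `check_status` stands in for calling
# add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=0),
# i.e. a single status check per iteration.
def poll_until_done(check_status, timeout_seconds=60.0, interval=0.01):
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = check_status()
        if status in ("DONE", "ERROR") or time.monotonic() >= deadline:
            return status
        time.sleep(interval)

states = iter(["PENDING", "PENDING", "DONE"])  # stubbed backend responses
result = poll_until_done(lambda: next(states))
```

Because the job runs server-side, the same poll can be issued from a separate Python session using only the upload_job_id.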
DEPRECATED - update_data_item
DEPRECATED: Updates a data item.
Use the individual setter properties of the respective [encord.orm.dataset.DataRow] instance instead. These can be retrieved using the [Dataset.data_rows()] function.
update_data_item(data_hash, new_title)
Parameters:
-
data_hash (str) – Data hash of the item being updated
-
new_title (str) – String containing the new title of the data item being updated.
Return type:
Boolean
Returns:
Returns a boolean for whether the update was successful.
def update_data_item(self, data_hash: str, new_title: str) -> bool:
"""
DEPRECATED: Use the individual setter properties of the respective :class:`encord.orm.dataset.DataRow`
instance instead. These can be retrieved via the :meth:`.Dataset.data_rows` function.
Update a data item
Args:
data_hash: str
Data hash of the item being updated
new_title:
String containing the new title of the data item being updated
Returns:
Returns a boolean for whether the update was successful
"""
return self._client.update_data_item(data_hash, new_title)
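The deprecation note above recommends the DataRow setter properties instead. A stub DataRow sketches the intended pattern; the `title` setter shown is an assumption based on that note, and the real DataRow also persists the change to Encord's backend.

```python
# Stub standing in for encord.orm.dataset.DataRow; only the local
# property mechanics are shown here.
class _StubDataRow:
    def __init__(self, title):
        self._title = title

    @property
    def title(self):
        return self._title

    @title.setter
    def title(self, value):
        self._title = value  # the real setter also saves the new title remotely

row = _StubDataRow("old_name.mp4")
row.title = "new_name.mp4"
```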
re_encode_data
Launches an async task that can re-encode a list of videos.
re_encode_data(data_hashes)
Parameters:
data_hashes (List[str]) – list of hashes of the videos to re-encode; all should belong to the same Dataset.
Returns:
EntityId(integer) of the async task launched.
def re_encode_data(self, data_hashes: List[str]):
"""
Launches an async task that can re-encode a list of videos.
Args:
data_hashes: list of hash of the videos you'd like to re_encode, all should belong to the same
dataset
Returns:
EntityId(integer) of the async task launched.
"""
return self._client.re_encode_data(data_hashes)
re_encode_data_status
Returns the status of an existing async task which is aimed at re-encoding videos.
re_encode_data_status(job_id)
Parameters:
job_id (int) – ID of the async task that was launched to re-encode the videos.
Return type:
ReEncodeVideoTask
Returns:
Object containing the status of the task, along with info about the new encoded videos in case the task has been completed.
def re_encode_data_status(self, job_id: int):
"""
Returns the status of an existing async task which is aimed at re-encoding videos.
Args:
job_id: id of the async task that was launched to re-encode the videos
Returns:
ReEncodeVideoTask: Object containing the status of the task, along with info about the new encoded videos
in case the task has been completed
"""
return self._client.re_encode_data_status(job_id)
run_ocr
Returns an optical character recognition (OCR) result for a given image group.
run_ocr(image_group_id)
Parameters:
image_group_id (str) – the ID of the image group in this dataset to run OCR on.
Return type:
List[ImageGroupOCR]
Returns:
Returns a list of ImageGroupOCR objects representing the text and corresponding coordinates found in each frame of the image group.
def run_ocr(self, image_group_id: str) -> List[ImageGroupOCR]:
"""
Returns an optical character recognition result for a given image group
Args:
image_group_id: the id of the image group in this dataset to run OCR on
Returns:
Returns a list of ImageGroupOCR objects representing the text and corresponding coordinates
found in each frame of the image group
"""
return self._client.run_ocr(image_group_id)
get_cloud_integrations
Gets the cloud integration information for a dataset.
get_cloud_integrations()
Return type:
List[CloudIntegration]
def get_cloud_integrations(self) -> List[CloudIntegration]:
return self._client.get_cloud_integrations()
list_groups
List all user groups that belong to a specified Dataset.
list_groups()
Return type:
Iterable[DatasetGroup]
def list_groups(self) -> Iterable[DatasetGroup]:
    dataset_hash = convert_to_uuid(self.dataset_hash)
    page = self._client.list_groups(dataset_hash)
    yield from page.results
add_group
Add a user group to a specified Dataset.
add_group(group_hash, user_role)
Parameters:
- group_hash: A group hash, or a list of group hashes, to be added to the Dataset.
- user_role: The Dataset user role to assign to the user group. Dataset user roles are either Admin or User. See DatasetUserRole for more information.
Return type:
None
def add_group(self, group_hash, user_role):
    if isinstance(group_hash, UUID):
        group_hash = [group_hash]
    self._client.add_groups(self.dataset_hash, group_hash, user_role)
remove_group
Remove one or multiple groups from a specified Dataset.
remove_group(group_hash)
Parameters:
- group_hash: A group hash, or a list of group hashes, to be removed from the Dataset.
Return type:
None
def remove_group(self, group_hash):
    if isinstance(group_hash, UUID):
        group_hash = [group_hash]
    dataset_hash = convert_to_uuid(self.dataset_hash)
    self._client.remove_groups(dataset_hash, group_hash)
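Both add_group and remove_group normalise a single UUID into a list before calling the client. The same pattern in isolation, as a self-contained sketch:

```python
from uuid import UUID, uuid4

# Accept either one group hash or a list of them, always returning a list,
# mirroring the isinstance check in the method bodies above.
def normalise_group_hashes(group_hash):
    if isinstance(group_hash, UUID):
        return [group_hash]
    return list(group_hash)

single = uuid4()
batch = normalise_group_hashes(single)             # wraps the single UUID
many = normalise_group_hashes([uuid4(), uuid4()])  # lists pass through
```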
Source
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List, Optional, TextIO, Union
from encord.client import EncordClientDataset
from encord.constants.enums import DataType
from encord.http.utils import CloudUploadSettings
from encord.orm.cloud_integration import CloudIntegration
from encord.orm.dataset import AddPrivateDataResponse, DataRow
from encord.orm.dataset import Dataset as OrmDataset
from encord.orm.dataset import (
DatasetAccessSettings,
DatasetDataLongPolling,
DatasetUser,
DatasetUserRole,
Image,
ImageGroupOCR,
StorageLocation,
)
class Dataset:
"""
Access dataset related data and manipulate the dataset.
"""
def __init__(self, client: EncordClientDataset, orm_dataset: OrmDataset):
self._client = client
self._dataset_instance = orm_dataset
@property
def dataset_hash(self) -> str:
"""
Get the dataset hash (i.e. the Dataset ID).
"""
return self._dataset_instance.dataset_hash
@property
def title(self) -> str:
return self._dataset_instance.title
@property
def description(self) -> str:
return self._dataset_instance.description
@property
def storage_location(self) -> StorageLocation:
return self._dataset_instance.storage_location
@property
def data_rows(self) -> List[DataRow]:
"""
Part of the response of this function can be configured by the
:meth:`encord.dataset.Dataset.set_access_settings` method.
.. code::
dataset.set_access_settings(DatasetAccessSettings(fetch_client_metadata=True))
print(dataset.data_rows)
"""
return self._dataset_instance.data_rows
def list_data_rows(
self,
title_eq: Optional[str] = None,
title_like: Optional[str] = None,
created_before: Optional[Union[str, datetime]] = None,
created_after: Optional[Union[str, datetime]] = None,
data_types: Optional[List[DataType]] = None,
) -> List[DataRow]:
"""
Retrieve dataset rows (pointers to data, labels).
Args:
title_eq: optional exact title row filter
title_like: optional fuzzy title row filter; SQL syntax
created_before: optional datetime row filter
created_after: optional datetime row filter
data_types: optional data types row filter
Returns:
List[DataRow]: A list of DataRows object that match the filter
Raises:
AuthorisationError: If the dataset API key is invalid.
ResourceNotFoundError: If no dataset exists by the specified dataset EntityId.
UnknownError: If an error occurs while retrieving the dataset.
"""
return self._client.list_data_rows(title_eq, title_like, created_before, created_after, data_types)
def refetch_data(self) -> None:
"""
The Dataset class will only fetch its properties once. Use this function if you suspect the state of those
properties to be dirty.
"""
self._dataset_instance = self._client.get_dataset()
def get_dataset(self) -> OrmDataset:
"""
This function is exposed for convenience. You are encouraged to use the property accessors instead.
"""
return self._client.get_dataset()
def set_access_settings(self, dataset_access_settings: DatasetAccessSettings, *, refetch_data: bool = True) -> None:
"""
Args:
dataset_access_settings: The access settings to use going forward
refetch_data: Whether a `refetch_data()` call should follow the update of the dataset access settings.
"""
self._client.set_access_settings(dataset_access_settings)
if refetch_data:
self.refetch_data()
def add_users(self, user_emails: List[str], user_role: DatasetUserRole) -> List[DatasetUser]:
"""
Add users to dataset. If the user was already added, this operation will succeed but the `user_role` will be
unchanged. The existing `user_role` will be reflected in the `DatasetUser` instance.
Args:
user_emails: list of user emails to be added
user_role: the user role to assign to all users
"""
return self._client.add_users(user_emails, user_role)
def upload_video(
self,
file_path: str,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload video to Encord storage.
Args:
file_path: path to video e.g. '/home/user/data/video.mp4'
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The video title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_video.mp4".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.upload_video(file_path, cloud_upload_settings=cloud_upload_settings, title=title)
def create_image_group(
self,
file_paths: Iterable[str],
max_workers: Optional[int] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
*,
create_video: bool = True,
):
"""
Create an image group in Encord storage. Choose this type of image upload for sequential images. Else, you can
choose the :meth:`.Dataset.upload_image` function.
Args:
file_paths: a list of paths to images, e.g.
['/home/user/data/img1.png', '/home/user/data/img2.png']
max_workers:
DEPRECATED: This argument will be ignored
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the image group. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "encord_image_group".
create_video:
A flag specifying how image groups are stored. If `True`, a compressed video will be created from
the image group. `True` was the previous default behavior. If `False`, the images
are saved as a sequence of images.
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_image_group(
file_paths,
cloud_upload_settings=cloud_upload_settings,
title=title,
create_video=create_video,
)
def create_dicom_series(
self,
file_paths: List[str],
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload a DICOM series to Encord storage
Args:
file_paths: a list of paths to DICOM files, e.g.
['/home/user/data/DICOM_1.dcm', '/home/user/data/DICOM_2.dcm']
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the DICOM series. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "dicom_series".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_dicom_series(file_paths, cloud_upload_settings=cloud_upload_settings, title=title)
def upload_image(
self,
file_path: Union[Path, str],
title: Optional[str] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
) -> Image:
"""
Upload a single image to Encord storage. If your images are sequential we recommend creating an image group via
the :meth:`.Dataset.create_image_group` function. For more information please compare
https://docs.encord.com/docs/annotate-images and https://docs.encord.com/docs/annotate-videos
Args:
file_path: The file path to the image
title: The image title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_image.png".
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
"""
return self._client.upload_image(file_path, title, cloud_upload_settings)
def delete_image_group(self, data_hash: str):
"""
Delete an image group in Encord storage.
Args:
data_hash: the hash of the image group to delete
"""
return self._client.delete_image_group(data_hash)
def delete_data(self, data_hashes: List[str]):
"""
Delete a video/image group from a dataset.
Args:
data_hashes: list of hash of the videos/image_groups you'd like to delete, all should belong to the same
dataset
"""
return self._client.delete_data(data_hashes)
def add_private_data_to_dataset(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> AddPrivateDataResponse:
"""
Append data hosted on a private cloud to an existing dataset.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
add_private_data_response List of DatasetDataInfo objects containing data_hash and title
"""
return self._client.add_private_data_to_dataset(integration_id, private_files, ignore_errors)
def add_private_data_to_dataset_start(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> str:
"""
Append data hosted on a private cloud to an existing dataset.
This method initializes the upload in Encord's backend.
Once the upload id has been returned, you can exit the terminal
while the job continues uninterrupted.
You can check upload job status at any point using
the :meth:`add_private_data_to_dataset_get_result` method.
This can be done in a separate python session to the one
where the upload was initialized.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
str
`upload_job_id` - UUID Identifier of upload job.
This id enables the user to track the job progress via SDK, or web app.
"""
return self._client.add_private_data_to_dataset_start(integration_id, private_files, ignore_errors)
def add_private_data_to_dataset_get_result(
self,
upload_job_id: str,
timeout_seconds: int = 7 * 24 * 60 * 60, # 7 days
) -> DatasetDataLongPolling:
"""
Fetch data upload status, perform long polling process for `timeout_seconds`.
Args:
upload_job_id:
UUID Identifier of upload job. This id enables the user to track the job progress via SDK, or web app.
timeout_seconds:
Number of seconds the method will wait while waiting for a response.
If `timeout_seconds == 0`, only a single checking request is performed.
Response will be immediately returned.
Returns:
DatasetDataLongPolling
Response containing details about job status, errors and progress.
"""
return self._client.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds)
def update_data_item(self, data_hash: str, new_title: str) -> bool:
"""
DEPRECATED: Use the individual setter properties of the respective :class:`encord.orm.dataset.DataRow`
instance instead. These can be retrieved via the :meth:`.Dataset.data_rows` function.
Update a data item
Args:
data_hash: str
Data hash of the item being updated
new_title:
String containing the new title of the data item being updated
Returns:
Returns a boolean for whether the update was successful
"""
return self._client.update_data_item(data_hash, new_title)
def re_encode_data(self, data_hashes: List[str]):
"""
Launches an async task that can re-encode a list of videos.
Args:
data_hashes: list of hash of the videos you'd like to re_encode, all should belong to the same
dataset
Returns:
EntityId(integer) of the async task launched.
"""
return self._client.re_encode_data(data_hashes)
def re_encode_data_status(self, job_id: int):
"""
Returns the status of an existing async task which is aimed at re-encoding videos.
Args:
job_id: id of the async task that was launched to re-encode the videos
Returns:
ReEncodeVideoTask: Object containing the status of the task, along with info about the new encoded videos
in case the task has been completed
"""
return self._client.re_encode_data_status(job_id)
def run_ocr(self, image_group_id: str) -> List[ImageGroupOCR]:
"""
Returns an optical character recognition result for a given image group
Args:
image_group_id: the id of the image group in this dataset to run OCR on
Returns:
Returns a list of ImageGroupOCR objects representing the text and corresponding coordinates
found in each frame of the image group
"""
return self._client.run_ocr(image_group_id)
def get_cloud_integrations(self) -> List[CloudIntegration]:
return self._client.get_cloud_integrations()