# Dataset

## Dataset Objects

```python theme={"dark"}
class Dataset()
```

Access dataset-related data and manipulate the dataset.

#### dataset\_hash

```python theme={"dark"}
@property
def dataset_hash() -> str
```

Get the dataset hash (i.e. the Dataset ID).

**Returns**:

* `str` - The dataset hash.

#### title

```python theme={"dark"}
@property
def title() -> str
```

Get the title of the dataset.

**Returns**:

* `str` - The title of the dataset.

#### description

```python theme={"dark"}
@property
def description() -> str
```

Get the description of the dataset.

**Returns**:

* `str` - The description of the dataset.

#### storage\_location

```python theme={"dark"}
@property
def storage_location() -> StorageLocation
```

Get the storage location of the dataset.

**Returns**:

* `StorageLocation` - The storage location of the dataset.

#### backing\_folder\_uuid

```python theme={"dark"}
@property
def backing_folder_uuid() -> Optional[UUID]
```

Get the UUID of the backing folder.

**Returns**:

* `Optional[UUID]` - The UUID of the backing folder, if any.

#### data\_rows

```python theme={"dark"}
@property
def data_rows() -> List[DataRow]
```

Get the data rows of the dataset.

Part of the response of this function can be configured by the
[set\_access\_settings()](/sdk-documentation/sdk-references/dataset#set_access_settings) method.

```python theme={"dark"}
dataset.set_access_settings(DatasetAccessSettings(fetch_client_metadata=True))
print(dataset.data_rows)
```

**Returns**:

* `List[DataRow]` - A list of DataRow objects.

#### list\_data\_rows

```python theme={"dark"}
def list_data_rows(title_eq: Optional[str] = None,
                   title_like: Optional[str] = None,
                   created_before: Optional[Union[str, datetime]] = None,
                   created_after: Optional[Union[str, datetime]] = None,
                   data_types: Optional[List[DataType]] = None,
                   data_hashes: Optional[List[str]] = None) -> List[DataRow]
```

Retrieve dataset rows (pointers to data, labels).

**Arguments**:

* `title_eq` - Optional exact title row filter.
* `title_like` - Optional fuzzy title row filter; SQL syntax.
* `created_before` - Optional datetime row filter.
* `created_after` - Optional datetime row filter.
* `data_types` - Optional data types row filter.
* `data_hashes` - Optional list of individual data unit hashes to include.

**Returns**:

* `List[DataRow]` - A list of DataRow objects that match the filter.

**Raises**:

* `AuthorisationError` - If the dataset API key is invalid.
* `ResourceNotFoundError` - If no dataset exists by the specified dataset EntityId.
* `UnknownError` - If an error occurs while retrieving the dataset.
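The filters can be combined in a single call. The following is a minimal sketch that builds the keyword arguments separately so they can be inspected before use. The helper name `recent_rows_filters` is hypothetical, and wrapping the fragment in `%` assumes that `title_like` follows SQL `LIKE` wildcard semantics, as the argument description suggests.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def recent_rows_filters(
    days: int,
    title_fragment: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for list_data_rows() (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    filters: Dict[str, Any] = {"created_after": now - timedelta(days=days)}
    if title_fragment is not None:
        # Assumption: `%` matches any run of characters, as in SQL LIKE.
        filters["title_like"] = f"%{title_fragment}%"
    return filters


# Usage, given an initialized Dataset instance named `dataset`:
# rows = dataset.list_data_rows(**recent_rows_filters(7, "cam_front"))
```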

#### refetch\_data

```python theme={"dark"}
def refetch_data() -> None
```

Refetch the dataset properties.

The Dataset class will only fetch its properties once. Use this function if you suspect the state of those
properties to be outdated.
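As a minimal sketch of that pattern, the hypothetical helper below forces a refetch before reading a property. It relies only on `refetch_data()` and the `title` property documented on this page and accepts any object that provides them.

```python
def read_fresh_title(dataset) -> str:
    """Return the dataset title after forcing a refetch (hypothetical helper)."""
    # Properties are fetched once and then reused, so refresh them first when
    # another user or process may have changed the dataset in the meantime.
    dataset.refetch_data()
    return dataset.title
```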

#### get\_dataset

```python theme={"dark"}
def get_dataset() -> OrmDataset
```

Get the dataset instance.

This function is exposed for convenience. You are encouraged to use the property accessors instead.

**Returns**:

* `OrmDataset` - The dataset instance.

#### set\_access\_settings

```python theme={"dark"}
def set_access_settings(dataset_access_settings: DatasetAccessSettings,
                        *,
                        refetch_data: bool = True) -> None
```

Set access settings for the dataset.

**Arguments**:

* `dataset_access_settings` - The access settings to use going forward.
* `refetch_data` - Whether a `refetch_data()` call should follow the update of the dataset access settings.

#### add\_users

```python theme={"dark"}
def add_users(user_emails: List[str],
              user_role: DatasetUserRole) -> List[DatasetUser]
```

Add users to the dataset.

If a user already has access to the dataset, the operation succeeds but their `user_role` remains unchanged. The
existing `user_role` is reflected in the returned `DatasetUser` instance.

**Arguments**:

* `user_emails` - List of user emails to be added.
* `user_role` - The user role to assign to all users.

**Returns**:

* `List[DatasetUser]` - A list of DatasetUser instances reflecting the added users.
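Because one role is applied to every email in the call, it can help to clean the list first. The sketch below is illustrative only: `normalize_emails` and `add_users_once` are hypothetical helpers, and the cleaning is limited to trimming whitespace and dropping exact duplicates (no assumption is made about how the platform treats letter case).

```python
from typing import Iterable, List


def normalize_emails(emails: Iterable[str]) -> List[str]:
    """Trim whitespace and drop empty or duplicate entries, keeping order."""
    seen = set()
    result = []
    for email in emails:
        cleaned = email.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


def add_users_once(dataset, emails: Iterable[str], user_role):
    """Call add_users() with a cleaned email list (hypothetical helper)."""
    unique = normalize_emails(emails)
    if not unique:
        # Nothing to add, so skip the API call entirely.
        return []
    return dataset.add_users(unique, user_role)
```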

#### list\_groups

```python theme={"dark"}
def list_groups() -> Iterable[DatasetGroup]
```

List all groups that have access to the dataset.

**Returns**:

* `Iterable[DatasetGroup]` - An iterable of DatasetGroup instances.

#### add\_group

```python theme={"dark"}
def add_group(group_hash: Union[List[UUID], UUID],
              user_role: DatasetUserRole) -> None
```

Add one or more groups to the dataset.

**Arguments**:

* `group_hash` - A single group hash, or a list of group hashes, to be added.
* `user_role` - The user role that the group will be given.

**Returns**:

None
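Since `group_hash` accepts either one `UUID` or a list of them, group identifiers stored as strings need converting first. A minimal sketch, assuming the identifiers are available as UUID strings; `to_uuids` is a hypothetical helper, not part of the SDK.

```python
from typing import Iterable, List, Union
from uuid import UUID


def to_uuids(values: Union[str, UUID, Iterable[Union[str, UUID]]]) -> List[UUID]:
    """Convert one identifier or several into a list of UUID objects."""
    if isinstance(values, (str, UUID)):
        values = [values]
    return [v if isinstance(v, UUID) else UUID(v) for v in values]


# Usage, given an initialized Dataset named `dataset` and a DatasetUserRole `role`:
# dataset.add_group(to_uuids(group_id_strings), role)
```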

#### remove\_group

```python theme={"dark"}
def remove_group(group_hash: Union[List[UUID], UUID]) -> None
```

Remove one or more groups from the dataset.

**Arguments**:

* `group_hash` - A single group hash, or a list of group hashes, to be removed.

**Returns**:

None

#### upload\_video

```python theme={"dark"}
def upload_video(
        file_path: Union[str, Path],
        cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
        title: Optional[str] = None,
        folder: Optional[Union[UUID, StorageFolder]] = None) -> Video
```

Upload a video to Encord storage.

**Arguments**:

* `file_path` - Path to the video, e.g., '/home/user/data/video.mp4'.
* `cloud_upload_settings` - Settings for uploading data into the cloud. Change this object to overwrite the default values.
* `title` - The video title. If unspecified, this will be the file name. This title should include an extension. For example: "encord\_video.mp4".
* `folder` - When uploading to a non-mirror dataset, you have to specify the folder to store the file in. This can be either a [StorageFolder](/sdk-documentation/sdk-references/storage#storagefolder) instance or the UUID of the folder.

**Returns**:

* `Video` - An object describing the created video, see [Video](/sdk-documentation/sdk-references/orm.dataset#video).

**Raises**:

* `UploadOperationNotSupportedError` - If trying to upload to external datasets (e.g., S3/GCP/Azure).
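Uploading several local videos is a loop over `upload_video()`. The sketch below is one way to do it, under stated assumptions: `find_videos` and `upload_videos` are hypothetical helpers, and the default suffix list is only an illustration, not a statement of which formats are supported. Each title is set to the file name so that it includes the extension.

```python
from pathlib import Path
from typing import List, Sequence


def find_videos(directory: Path, suffixes: Sequence[str] = (".mp4", ".mov")) -> List[Path]:
    """Return video files directly inside `directory`, sorted by path."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix.lower() in wanted
    )


def upload_videos(dataset, directory: Path, folder=None) -> list:
    """Upload each video found, using the file name as the title."""
    uploaded = []
    for path in find_videos(directory):
        # `folder` is passed through unchanged; it is needed for non-mirror datasets.
        uploaded.append(dataset.upload_video(path, title=path.name, folder=folder))
    return uploaded
```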

#### create\_image\_group

```python theme={"dark"}
def create_image_group(
        file_paths: Iterable[Union[str, Path]],
        max_workers: Optional[int] = None,
        cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
        title: Optional[str] = None,
        *,
        create_video: bool = True,
        folder: Optional[Union[UUID,
                               StorageFolder]] = None) -> List[ImageGroup]
```

Create an image group in Encord storage.

Choose this type of upload for sequential images. For individual images, use the [upload\_image()](/sdk-documentation/sdk-references/dataset#upload_image) function instead.

**Arguments**:

* `file_paths` - A list of paths to images, e.g., \['/home/user/data/img1.png', '/home/user/data/img2.png'].
* `max_workers` - DEPRECATED: This argument will be ignored.
* `cloud_upload_settings` - Settings for uploading data into the cloud. Change this object to overwrite the default values.
* `title` - The title of the image group. If unspecified, this will be randomly generated for you. This title should NOT include an extension. For example, "encord\_image\_group".
* `create_video` - A flag specifying how image groups are stored. If `True` (the default, which matches the earlier behavior), a compressed video is created from the images. If `False`, the images are saved as a sequence of images.
* `folder` - When uploading to a non-mirror dataset, you have to specify the folder to store the file in. This can be either a [StorageFolder](/sdk-documentation/sdk-references/storage#storagefolder) instance or the UUID of the folder.

**Returns**:

* `List[ImageGroup]` - A list containing the object(s) describing the created data unit(s). See [ImageGroup](/sdk-documentation/sdk-references/orm.dataset#imagegroup).

**Raises**:

* `UploadOperationNotSupportedError` - If trying to upload to external datasets (e.g., S3/GCP/Azure).
* `InvalidArgumentError` - If the folder is specified, but the dataset is a mirror dataset.
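For sequential images, the order of `file_paths` is worth controlling, because plain string sorting places `img10.png` before `img2.png`. The sketch below sorts paths by the numbers embedded in their file names. It assumes that the order passed to `create_image_group()` determines the frame order, which this page does not state explicitly; `natural_key` and `ordered_frames` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Iterable, List, Union


def natural_key(path: Union[str, Path]) -> list:
    """Sort key that compares digit runs in the file name as integers."""
    # re.split with a capture group alternates text, digits, text, digits, ...
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def ordered_frames(paths: Iterable[Union[str, Path]]) -> List[Path]:
    """Return the paths sorted in natural (human) order."""
    return sorted((Path(p) for p in paths), key=natural_key)


# Usage, given an initialized Dataset instance named `dataset`:
# dataset.create_image_group(ordered_frames(paths), title="sequence_01", create_video=False)
```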

#### create\_dicom\_series

```python theme={"dark"}
def create_dicom_series(
        file_paths: Union[Collection[str], Collection[Path],
                          Collection[Union[Path, str]]],
        cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
        title: Optional[str] = None,
        folder: Optional[Union[UUID, StorageFolder]] = None) -> Dict
```

Upload a DICOM series to Encord storage.

**Arguments**:

* `file_paths` - A collection of paths to DICOM files. Example: \['/home/user/data/DICOM\_1.dcm', '/home/user/data/DICOM\_2.dcm']
* `cloud_upload_settings` - Settings for uploading data into the cloud. Change this object to overwrite the default values.
* `title` - The title of the DICOM series. If unspecified, a random title will be generated. This title should NOT include an extension. Example: "encord\_image\_group"
* `folder` - When uploading to a non-mirror dataset, specify the folder to store the file in. This can be either a `StorageFolder` instance or the UUID of the folder.

**Returns**:

* `Dict` - A dictionary describing the created series.

**Raises**:

* `UploadOperationNotSupportedError` - If trying to upload to external datasets (e.g., S3/GCP/Azure).
* `InvalidArgumentError` - If the folder is specified, but the dataset is a mirror dataset.

#### upload\_image

```python theme={"dark"}
def upload_image(
        file_path: Union[Path, str],
        title: Optional[str] = None,
        cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
        folder: Optional[Union[UUID, StorageFolder]] = None) -> Image
```

Upload a single image to Encord storage. For sequential images, consider creating an image group
using the [create\_image\_group()](/sdk-documentation/sdk-references/dataset#create_image_group) function.

**Arguments**:

* `file_path` - The file path to the image.
* `title` - The image title. If unspecified, the file name will be used. This title should include an extension. Example: "encord\_image.png"
* `cloud_upload_settings` - Settings for uploading data into the cloud. Change this object to overwrite the default values.
* `folder` - When uploading to a non-mirror dataset, specify the folder to store the file in. This can be either a `StorageFolder` instance or the UUID of the folder.

**Returns**:

* `Image` - An object describing the uploaded image.

#### link\_items

```python theme={"dark"}
def link_items(
    item_uuids: List[UUID],
    duplicates_behavior: DataLinkDuplicatesBehavior = DataLinkDuplicatesBehavior
    .SKIP
) -> List[DataRow]
```

Link storage items to the dataset, creating new data rows.

**Arguments**:

* `item_uuids` - List of item UUIDs to link to the dataset.
* `duplicates_behavior` - The behavior to follow when encountering duplicates. Defaults to `SKIP`. See also [DataLinkDuplicatesBehavior](/sdk-documentation/sdk-references/orm.dataset#datalinkduplicatesbehavior).

**Returns**:

* `List[DataRow]` - A list of DataRow objects representing the linked items.
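When linking many storage items, the calls can be split into batches and the returned rows collected. This is a sketch only: `chunked` and `link_in_batches` are hypothetical helpers, and the default batch size of 500 is an arbitrary illustration, not a documented limit.

```python
from typing import Iterator, List, Sequence
from uuid import UUID


def chunked(items: Sequence[UUID], size: int) -> Iterator[List[UUID]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def link_in_batches(dataset, item_uuids: Sequence[UUID], batch_size: int = 500) -> list:
    """Call link_items() once per batch and gather all returned data rows."""
    rows = []
    for batch in chunked(item_uuids, batch_size):
        # The default duplicates_behavior (SKIP) applies to each call.
        rows.extend(dataset.link_items(batch))
    return rows
```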

#### delete\_data

```python theme={"dark"}
def delete_data(data_hashes: Union[List[str], str])
```

Delete one or more videos or image groups from a dataset.

**Arguments**:

* `data_hashes` - A single hash, or a list of hashes, of the videos/image\_groups you'd like to delete. All should belong to the same dataset.

#### add\_private\_data\_to\_dataset

```python theme={"dark"}
def add_private_data_to_dataset(
        integration_id: str,
        private_files: Union[str, Dict, Path, TextIO],
        ignore_errors: bool = False) -> AddPrivateDataResponse
```

Append data hosted on a private cloud to an existing dataset.

For a more complete example of safe uploads, see the guide in our docs:
[Adding data from a private cloud](https://python.docs.encord.com/tutorials/datasets.html#adding-data-from-a-private-cloud).

**Arguments**:

* `integration_id` - The `EntityId` of the cloud integration you wish to use.
* `private_files` - A path to a JSON file, JSON string, Python dictionary, or a `Path` object containing the files you wish to add.
* `ignore_errors` - When set to `True`, prevent individual errors from stopping the upload process.

**Returns**:

* `AddPrivateDataResponse` - A response containing a list of DatasetDataInfo objects, each with a data\_hash and title.
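When `private_files` is passed as a Python dictionary, it can be assembled in code. The sketch below is a hedged illustration: the key names `videos`, `images`, and `objectUrl` reflect the JSON layout described in the private-cloud guide as best understood here and should be verified against that guide. `build_private_files` is a hypothetical helper.

```python
import json
from typing import Dict, Iterable, List


def build_private_files(
    video_urls: Iterable[str] = (),
    image_urls: Iterable[str] = (),
) -> Dict[str, List[Dict[str, str]]]:
    """Assemble a `private_files` dictionary (key names are assumptions)."""
    spec: Dict[str, List[Dict[str, str]]] = {}
    videos = [{"objectUrl": url} for url in video_urls]
    images = [{"objectUrl": url} for url in image_urls]
    # Only include sections that actually contain entries.
    if videos:
        spec["videos"] = videos
    if images:
        spec["images"] = images
    return spec


# Usage, given an initialized Dataset named `dataset` and your own integration ID:
# spec = build_private_files(video_urls=["https://example-bucket.example.com/clip.mp4"])
# dataset.add_private_data_to_dataset(integration_id, spec, ignore_errors=True)
# The same structure can be written to a file with json.dumps(spec).
```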

#### add\_private\_data\_to\_dataset\_start

```python theme={"dark"}
def add_private_data_to_dataset_start(
        integration_id: str,
        private_files: Union[str, Dict, Path, TextIO],
        ignore_errors: bool = False,
        *,
        folder: Optional[Union[StorageFolder, UUID]] = None) -> str
```

Append data hosted on a private cloud to an existing dataset.

This method initializes the upload in Encord's backend.
Once the upload ID has been returned, you can exit the terminal
while the job continues uninterrupted.

You can check upload job status at any point using
the [add\_private\_data\_to\_dataset\_get\_result()](/sdk-documentation/sdk-references/dataset#add_private_data_to_dataset_get_result) method.
This can be done in a separate Python session to the one
where the upload was initialized.

**Arguments**:

* `integration_id` - The `EntityId` of the cloud integration you wish to use.
* `private_files` - A path to a JSON file, JSON string, Python dictionary, or a `Path` object containing the files you wish to add.
* `ignore_errors` - When set to `True`, prevent individual errors from stopping the upload process.
* `folder` - When uploading to a non-mirror dataset, specify the folder to store the file in. This can be either a `StorageFolder` instance or the UUID of the folder.

**Returns**:

* `str` - UUID identifier of the upload job. Use this ID to track the job's progress through the SDK or the web app.

#### add\_private\_data\_to\_dataset\_get\_result

```python theme={"dark"}
def add_private_data_to_dataset_get_result(
        upload_job_id: str,
        timeout_seconds: int = 7 * 24 * 60 * 60) -> DatasetDataLongPolling
```

Fetch the data upload status, long polling for up to `timeout_seconds`.

**Arguments**:

* `upload_job_id` - UUID identifier of the upload job. Use this ID to track the job's progress through the SDK or the web app.
* `timeout_seconds` - Number of seconds the method waits for a response. If `timeout_seconds == 0`, a single status check is performed and the response is returned immediately.

**Returns**:

* `DatasetDataLongPolling` - A response containing details about job status, errors, and progress.
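Instead of one very long wait, the job can be polled in shorter intervals so that progress can be reported between checks. The sketch below makes two assumptions that are not confirmed on this page: that the response exposes a `status` attribute, and that the pending state is named `PENDING`. `wait_for_upload` is a hypothetical helper.

```python
def wait_for_upload(dataset, upload_job_id: str, poll_seconds: int = 60, max_polls: int = 10):
    """Poll the upload job until it is no longer pending or max_polls is reached."""
    if max_polls < 1:
        raise ValueError("max_polls must be at least 1")
    result = None
    for _ in range(max_polls):
        # Each call long-polls for at most `poll_seconds` before returning.
        result = dataset.add_private_data_to_dataset_get_result(
            upload_job_id, timeout_seconds=poll_seconds
        )
        # Assumption: `status` is an enum member or string whose pending
        # value is named "PENDING".
        status = getattr(result.status, "name", str(result.status))
        if status != "PENDING":
            break
    return result


# Usage, given an initialized Dataset named `dataset`:
# job_id = dataset.add_private_data_to_dataset_start(integration_id, private_files)
# result = wait_for_upload(dataset, job_id, poll_seconds=30)
```

Because the upload job keeps running in the backend, the same helper can be called later from a separate Python session with the stored job ID.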

#### update\_data\_item

```python theme={"dark"}
@deprecated(version="0.1.192", alternative="encord.storage.StorageItem.update")
def update_data_item(data_hash: str, new_title: str) -> bool
```

DEPRECATED: Use [update()](/sdk-documentation/sdk-references/storage#update) to update the
`encord.storage.StorageItem.name` of the underlying [StorageItem](/sdk-documentation/sdk-references/orm.storage#storageitem).

Update a data item.

**Arguments**:

* `data_hash` - Data hash of the item being updated.
* `new_title` - New title of the data item being updated.

**Returns**:

* `bool` - Whether the update was successful.

#### re\_encode\_data

```python theme={"dark"}
def re_encode_data(data_hashes: List[str])
```

Launch an async task that can re-encode a list of videos.

**Arguments**:

* `data_hashes` - List of hashes of the videos you'd like to re-encode. All should belong to the same dataset.

**Returns**:

Entity ID of the async task launched.

#### re\_encode\_data\_status

```python theme={"dark"}
def re_encode_data_status(job_id: int)
```

Returns the status of an existing async task aimed at re-encoding videos.

**Arguments**:

* `job_id` - ID of the async task that was launched to re-encode the videos.

**Returns**:

Object containing the status of the task, along with info about the new encoded videos
in case the task has been completed.

#### run\_ocr

```python theme={"dark"}
def run_ocr(image_group_id: str) -> List[ImageGroupOCR]
```

Returns an optical character recognition result for a given image group.

**Arguments**:

* `image_group_id` - The ID of the image group in this dataset to run OCR on.

**Returns**:

* `List[ImageGroupOCR]` - A list of ImageGroupOCR objects representing the text and corresponding coordinates found in each frame of the image group.

#### get\_cloud\_integrations

```python theme={"dark"}
@deprecated(version="0.1.154",
            alternative="EncordUserClient.get_cloud_integrations")
def get_cloud_integrations() -> List[CloudIntegration]
```

Retrieve a list of cloud integrations configured in Encord.

**Returns**:

* `List[CloudIntegration]` - A list of CloudIntegration objects representing the configured cloud integrations.
