Dataset
Access dataset-related data and manipulate the dataset.
class encord.dataset.Dataset(client, orm_dataset)
property dataset_hash: str
Get the dataset hash (that is, the Dataset ID).
Return Type:
str
property title: str
Gets the title of the dataset.
Return Type:
str
property description: str
Gets the description of the Dataset.
Return Type:
str
property backing_folder_uuid: Optional[UUID]
The unique identifier for the Storage folder associated with this Dataset.
Return Type:
Optional[UUID]
property storage_location: encord.orm.dataset.StorageLocation
Gets the storage location for the Dataset.
-
CORD_STORAGE = 0
-
AWS = 1
-
GCP = 2
-
AZURE = 3
-
OTC = 4
-
NEW_STORAGE = -99
This is a placeholder for a new storage location that is not yet supported by your SDK version. Update your SDK to the latest version.
Return Type:
StorageLocation
property data_rows: List[encord.orm.dataset.DataRow]
Part of the response of this property can be configured by the [encord.dataset.Dataset.set_access_settings()] method.
dataset.set_access_settings(DatasetAccessSettings(fetch_client_metadata=True))
print(dataset.data_rows)
Return Type:
List[DataRow]
list_data_rows
Gets dataset rows (pointers to data, labels).
list_data_rows(title_eq=None, title_like=None, created_before=None, created_after=None, data_types=None)
Parameters:
-
title_eq (Optional[str]) – optional exact title row filter
-
title_like (Optional[str]) – optional fuzzy title row filter; SQL syntax
-
created_before (Union[str, datetime, None]) – optional datetime row filter
-
created_after (Union[str, datetime, None]) – optional datetime row filter
-
data_types (Optional[List[DataType]]) – optional data types row filter
Return type:
List[DataRow]
Returns:
A list of DataRow objects that match the filters.
Raises:
-
[AuthorisationError] – If the dataset API key is invalid.
-
[ResourceNotFoundError] – If no dataset exists by the specified dataset EntityId.
-
[UnknownError] – If an error occurs while retrieving the dataset.
def list_data_rows(
self,
title_eq: Optional[str] = None,
title_like: Optional[str] = None,
created_before: Optional[Union[str, datetime]] = None,
created_after: Optional[Union[str, datetime]] = None,
data_types: Optional[List[DataType]] = None,
) -> List[DataRow]:
"""
Retrieve dataset rows (pointers to data, labels).
Args:
title_eq: optional exact title row filter
title_like: optional fuzzy title row filter; SQL syntax
created_before: optional datetime row filter
created_after: optional datetime row filter
data_types: optional data types row filter
Returns:
List[DataRow]: A list of DataRows object that match the filter
Raises:
AuthorisationError: If the dataset API key is invalid.
ResourceNotFoundError: If no dataset exists by the specified dataset EntityId.
UnknownError: If an error occurs while retrieving the dataset.
"""
return self._client.list_data_rows(title_eq, title_like, created_before, created_after, data_types)
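A minimal usage sketch of these filters. The Dataset is stubbed here so the call shape can be shown without an Encord session; a real instance would come from your Encord user client instead.

```python
from datetime import datetime, timedelta

# Stub standing in for encord.dataset.Dataset; a real Dataset queries
# Encord's backend, while the stub simply echoes the filters it was given.
class _StubDataset:
    def list_data_rows(self, title_eq=None, title_like=None,
                       created_before=None, created_after=None, data_types=None):
        return {"title_like": title_like, "created_after": created_after}

dataset = _StubDataset()
filters = dataset.list_data_rows(
    title_like="%.mp4",                                # SQL LIKE syntax
    created_after=datetime.now() - timedelta(days=7),  # datetime or ISO string
)
```

Filters that are left as `None` are simply not applied, so any combination of the five parameters is valid.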
refetch_data
The Dataset class only fetches its properties once. Use this function if you suspect that those properties are stale.
refetch_data()
Return type:
None
get_dataset
This function is exposed for convenience. You are encouraged to use the property accessors instead.
get_dataset()
Return type:
encord.orm.dataset.Dataset
def get_dataset(self) -> OrmDataset:
"""
This function is exposed for convenience. You are encouraged to use the property accessors instead.
"""
return self._client.get_dataset()
set_access_settings
Specifies access settings for a dataset.
set_access_settings(dataset_access_settings, *, refetch_data=True)
Parameters:
-
dataset_access_settings (DatasetAccessSettings) – The access settings to use going forward
-
refetch_data (bool) – Whether a refetch_data() call should follow the update of the dataset access settings.
Return type:
None
def set_access_settings(self, dataset_access_settings: DatasetAccessSettings, *, refetch_data: bool = True) -> None:
"""
Args:
dataset_access_settings: The access settings to use going forward
refetch_data: Whether a `refetch_data()` call should follow the update of the dataset access settings.
"""
self._client.set_access_settings(dataset_access_settings)
if refetch_data:
self.refetch_data()
add_users
Adds users to dataset.
If the user was already added, this operation succeeds but the user_role is unchanged. The existing user_role is reflected in the DatasetUser instance.
add_users(user_emails, user_role)
Parameters:
-
user_emails (List[str]) – list of user emails to be added
-
user_role (DatasetUserRole) – the user role to assign to all users
Return type:
List[DatasetUser]
def add_users(self, user_emails: List[str], user_role: DatasetUserRole) -> List[DatasetUser]:
"""
Add users to dataset. If the user was already added, this operation will succeed but the `user_role` will be
unchanged. The existing `user_role` will be reflected in the `DatasetUser` instance.
Args:
user_emails: list of user emails to be added
user_role: the user role to assign to all users
"""
return self._client.add_users(user_emails, user_role)
upload_video
Uploads one or more videos to Encord storage.
upload_video(file_path, cloud_upload_settings=CloudUploadSettings(max_retries=None, backoff_factor=None, allow_failures=False), title=None)
Parameters:
-
file_path (str) – path to video e.g. ‘/home/user/data/video.mp4’
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
-
title (Optional[str]) – The video title. If unspecified, this will be the file name. This title should include an extension. For example “encord_video.mp4”.
Returns:
Boolean
Raises:
[UploadOperationNotSupportedError] – If trying to upload to external datasets (for example, S3/GCP/Azure).
def upload_video(
self,
file_path: str,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload video to Encord storage.
Args:
file_path: path to video e.g. '/home/user/data/video.mp4'
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The video title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_video.mp4".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.upload_video(file_path, cloud_upload_settings=cloud_upload_settings, title=title)
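A hypothetical usage sketch of upload_video. The Dataset is stubbed so the call shape can be shown offline; parameter names follow the signature above.

```python
# Stub standing in for encord.dataset.Dataset; the real call uploads the
# file to Encord storage and returns a boolean success flag.
class _StubDataset:
    def upload_video(self, file_path, cloud_upload_settings=None, title=None):
        return True

ok = _StubDataset().upload_video(
    "/home/user/data/video.mp4",
    title="encord_video.mp4",  # the title should include an extension
)
```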
create_image_group
Creates an image group in Encord storage.
Choose this type of image upload for sequential images; otherwise, use the [Dataset.upload_image()] function.
Parameters:
-
file_paths (Iterable[str]) – A list of paths to images, for example, [‘/home/user/data/img1.png’, ‘/home/user/data/img2.png’]
-
DEPRECATED: max_workers (Optional[int]) – This argument is ignored
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
-
title (Optional[str]) – The title of the image group. If unspecified, a randomly generated title is created for you. This title should NOT include an extension. For example “encord_image_group”.
-
create_video (bool) – A flag specifying how image groups are stored. If True, a compressed video is created from the image group. True was the previous default behavior. If False, the images are saved as a sequence of images.
Return type:
Boolean
Returns:
Boolean
Raises:
[UploadOperationNotSupportedError] – If trying to upload to external datasets (for example, S3/GCP/Azure)
def create_image_group(
self,
file_paths: Iterable[str],
max_workers: Optional[int] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
*,
create_video: bool = True,
):
"""
Create an image group in Encord storage. Choose this type of image upload for sequential images. Else, you can
choose the :meth:`.Dataset.upload_image` function.
Args:
file_paths: a list of paths to images, e.g.
['/home/user/data/img1.png', '/home/user/data/img2.png']
max_workers:
DEPRECATED: This argument will be ignored
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the image group. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "encord_image_group".
create_video:
A flag specifying how image groups are stored. If `True`, a compressed video will be created from
the image group. `True` was the previous default behavior. If `False`, the images
are saved as a sequence of images.
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_image_group(
file_paths,
cloud_upload_settings=cloud_upload_settings,
title=title,
create_video=create_video,
)
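When the images are sequential, collecting the frame paths in name order matters before calling create_image_group. A small sketch; the create_image_group call itself is left as a comment because it needs a live Dataset.

```python
import tempfile
from pathlib import Path

# Create three dummy frame files, then gather them in name order,
# as you would for a real image-group upload.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"frame_{i:03d}.png").touch()
    file_paths = sorted(str(p) for p in Path(tmp).glob("frame_*.png"))
    # dataset.create_image_group(file_paths, create_video=False)
```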
create_dicom_series
Uploads a DICOM series to Encord storage.
create_dicom_series(file_paths, cloud_upload_settings=CloudUploadSettings(max_retries=None, backoff_factor=None, allow_failures=False), title=None)
Parameters:
-
file_paths (List[str]) – A list of paths to DICOM files, for example, [‘/home/user/data/DICOM_1.dcm’, ‘/home/user/data/DICOM_2.dcm’].
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
-
title (Optional[str]) – The title of the DICOM series. If unspecified, a randomly generated title is created for you. This title should NOT include an extension. For example “dicom_series”.
Return type:
Boolean
Returns:
Boolean
Raises:
[UploadOperationNotSupportedError] – If trying to upload to external datasets (for example, S3/GCP/Azure)
def create_dicom_series(
self,
file_paths: List[str],
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload a DICOM series to Encord storage
Args:
file_paths: a list of paths to DICOM files, e.g.
['/home/user/data/DICOM_1.dcm', '/home/user/data/DICOM_2.dcm']
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the DICOM series. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "dicom_series".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_dicom_series(file_paths, cloud_upload_settings=cloud_upload_settings, title=title)
upload_image
Uploads a single image to Encord storage.
Tip
If your images are sequential, we recommend creating an image group using the [Dataset.create_image_group()] function. For more information, compare https://docs.encord.com/docs/annotate-images and https://docs.encord.com/docs/annotate-videos.
upload_image(file_path, title=None, cloud_upload_settings=CloudUploadSettings(max_retries=None, backoff_factor=None, allow_failures=False))
Parameters:
-
file_path (Union[Path, str]) – The file path to the image
-
title (Optional[str]) – The image title. If unspecified, this will be the file name. This title should include an extension. For example “encord_image.png”.
-
cloud_upload_settings (CloudUploadSettings) – Settings for uploading data into the cloud. Change this object to overwrite the default values.
Return type:
Image
def upload_image(
self,
file_path: Union[Path, str],
title: Optional[str] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
) -> Image:
"""
Upload a single image to Encord storage. If your images are sequential we recommend creating an image group via
the :meth:`.Dataset.create_image_group` function. For more information please compare
https://docs.encord.com/docs/annotate-images and https://docs.encord.com/docs/annotate-videos
Args:
file_path: The file path to the image
title: The image title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_image.png".
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
"""
return self._client.upload_image(file_path, title, cloud_upload_settings)
delete_image_group
Deletes an image group in Encord storage.
delete_image_group(data_hash)
Parameters:
data_hash (str) – the hash of the image group to delete
def delete_image_group(self, data_hash: str):
"""
Delete an image group in Encord storage.
Args:
data_hash: the hash of the image group to delete
"""
return self._client.delete_image_group(data_hash)
delete_data
Deletes a video/image group from a Dataset.
delete_data(data_hashes)
Parameters:
data_hashes (List[str]) – list of hashes of the videos/image groups to delete; all should belong to the same Dataset
def delete_data(self, data_hashes: List[str]):
"""
Delete a video/image group from a dataset.
Args:
data_hashes: list of hash of the videos/image_groups you'd like to delete, all should belong to the same
dataset
"""
return self._client.delete_data(data_hashes)
add_private_data_to_dataset
Appends data hosted on a private cloud to an existing dataset.
Tip
For a more complete example of safe uploads, refer to our documentation.
add_private_data_to_dataset(integration_id, private_files, ignore_errors=False)
Parameters:
-
integration_id (str) – The EntityId of the cloud integration you wish to use.
-
private_files (Union[str, Dict, Path, TextIO]) – A str path or Path object to a json file, json str or python dictionary of the files you wish to add
-
ignore_errors (bool) – When set to True, this will prevent individual errors from stopping the upload process.
Return type:
AddPrivateDataResponse
Returns:
add_private_data_response – List of DatasetDataInfo objects containing data_hash and title.
def add_private_data_to_dataset(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> AddPrivateDataResponse:
"""
Append data hosted on a private cloud to an existing dataset.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
add_private_data_response List of DatasetDataInfo objects containing data_hash and title
"""
return self._client.add_private_data_to_dataset(integration_id, private_files, ignore_errors)
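private_files accepts a Python dictionary as well as a JSON string or file path. The sketch below builds one; the key names ("videos", "objectUrl") are assumptions modelled on Encord's private-cloud upload format, so check the current documentation for the exact schema.

```python
import json

# Hypothetical private_files payload for an AWS S3 integration;
# key names are illustrative assumptions, not a confirmed schema.
private_files = {
    "videos": [
        {"objectUrl": "s3://my-bucket/videos/video_1.mp4"},
        {"objectUrl": "s3://my-bucket/videos/video_2.mp4"},
    ],
}
# The same structure can also be supplied as a JSON string:
private_files_json = json.dumps(private_files)
# dataset.add_private_data_to_dataset(integration_id, private_files, ignore_errors=True)
```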
add_private_data_to_dataset_start
Appends data hosted on a private cloud to an existing dataset.
This method initializes the upload in Encord’s backend. Once the upload job ID has been returned, you can exit the terminal while the job continues uninterrupted.
You can check upload job status at any point using the add_private_data_to_dataset_get_result() method. This can be done in a separate python session to the one where the upload was initialized.
add_private_data_to_dataset_start(integration_id, private_files, ignore_errors=False)
Parameters:
-
integration_id (str) – The EntityId of the cloud integration you wish to use.
-
private_files (Union[str, Dict, Path, TextIO]) – A str path or Path object to a json file, json str or python dictionary of the files you wish to add
-
ignore_errors (bool) – When set to True, this will prevent individual errors from stopping the upload process.
Return type:
str
Returns:
upload_job_id - UUID Identifier of upload job. This id enables the user to track the job progress using the SDK or Encord platform.
def add_private_data_to_dataset_start(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> str:
"""
Append data hosted on a private cloud to an existing dataset.
This method initializes the upload in Encord's backend.
Once the upload id has been returned, you can exit the terminal
while the job continues uninterrupted.
You can check upload job status at any point using
the :meth:`add_private_data_to_dataset_get_result` method.
This can be done in a separate python session to the one
where the upload was initialized.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
str
`upload_job_id` - UUID Identifier of upload job.
This id enables the user to track the job progress via SDK, or web app.
"""
return self._client.add_private_data_to_dataset_start(integration_id, private_files, ignore_errors)
add_private_data_to_dataset_get_result
Gets data upload status, performing a long-polling process for up to timeout_seconds.
add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=604800)
Parameters:
-
upload_job_id (str) – UUID Identifier of upload job. This id enables the user to track the job progress using the SDK or web-app.
-
timeout_seconds (int) – Number of seconds the method waits for a response. If timeout_seconds == 0, only a single status check is performed and the response is returned immediately.
Return type:
DatasetDataLongPolling
Returns:
DatasetDataLongPolling: Response containing details about job status, errors and progress.
def add_private_data_to_dataset_get_result(
self,
upload_job_id: str,
timeout_seconds: int = 7 * 24 * 60 * 60, # 7 days
) -> DatasetDataLongPolling:
"""
Fetch data upload status, perform long polling process for `timeout_seconds`.
Args:
upload_job_id:
UUID Identifier of upload job. This id enables the user to track the job progress via SDK, or web app.
timeout_seconds:
Number of seconds the method will wait while waiting for a response.
If `timeout_seconds == 0`, only a single checking request is performed.
Response will be immediately returned.
Returns:
DatasetDataLongPolling
Response containing details about job status, errors and progress.
"""
return self._client.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds)
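The start/get_result pair implements a long-polling workflow. The sketch below models that loop with a stubbed status source; the status strings are placeholders, not the SDK's actual DatasetDataLongPolling values.

```python
import time

# Generic long-poll loop: `check_status` stands in for calling
# add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=0),
# i.e. a single status check per iteration.
def poll_until_done(check_status, timeout_seconds=60.0, interval=0.01):
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = check_status()
        if status in ("DONE", "ERROR") or time.monotonic() >= deadline:
            return status
        time.sleep(interval)

states = iter(["PENDING", "PENDING", "DONE"])  # stubbed backend responses
result = poll_until_done(lambda: next(states))
```

Because the job runs server-side, the same poll can be issued from a separate Python session using only the upload_job_id.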
DEPRECATED - update_data_item
DEPRECATED: Updates a data item.
Use the individual setter properties of the respective [encord.orm.dataset.DataRow] instance instead. These can be retrieved using the [Dataset.data_rows()] function.
update_data_item(data_hash, new_title)
Parameters:
-
data_hash (str) – Data hash of the item being updated
-
new_title (str) – String containing the new title of the data item being updated.
Return type:
Boolean
Returns:
Returns a boolean for whether the update was successful.
def update_data_item(self, data_hash: str, new_title: str) -> bool:
"""
DEPRECATED: Use the individual setter properties of the respective :class:`encord.orm.dataset.DataRow`
instance instead. These can be retrieved via the :meth:`.Dataset.data_rows` function.
Update a data item
Args:
data_hash: str
Data hash of the item being updated
new_title:
String containing the new title of the data item being updated
Returns:
Returns a boolean for whether the update was successful
"""
return self._client.update_data_item(data_hash, new_title)
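The deprecation note above recommends the DataRow setter properties instead. A stub DataRow sketches the intended pattern; the `title` setter shown is an assumption based on that note, and the real DataRow also persists the change to Encord's backend.

```python
# Stub standing in for encord.orm.dataset.DataRow; only the local
# property mechanics are shown here.
class _StubDataRow:
    def __init__(self, title):
        self._title = title

    @property
    def title(self):
        return self._title

    @title.setter
    def title(self, value):
        self._title = value  # the real setter also saves the new title remotely

row = _StubDataRow("old_name.mp4")
row.title = "new_name.mp4"
```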
re_encode_data
Launches an async task that can re-encode a list of videos.
re_encode_data(data_hashes)
Parameters:
data_hashes (List[str]) – list of hashes of the videos to re-encode; all should belong to the same Dataset.
Returns:
EntityId(integer) of the async task launched.
def re_encode_data(self, data_hashes: List[str]):
"""
Launches an async task that can re-encode a list of videos.
Args:
data_hashes: list of hash of the videos you'd like to re_encode, all should belong to the same
dataset
Returns:
EntityId(integer) of the async task launched.
"""
return self._client.re_encode_data(data_hashes)
re_encode_data_status
Returns the status of an existing async task which is aimed at re-encoding videos.
re_encode_data_status(job_id)
Parameters:
job_id (int) – ID of the async task that was launched to re-encode the videos.
Return type:
ReEncodeVideoTask
Returns:
Object containing the status of the task, along with info about the new encoded videos in case the task has been completed.
def re_encode_data_status(self, job_id: int):
"""
Returns the status of an existing async task which is aimed at re-encoding videos.
Args:
job_id: id of the async task that was launched to re-encode the videos
Returns:
ReEncodeVideoTask: Object containing the status of the task, along with info about the new encoded videos
in case the task has been completed
"""
return self._client.re_encode_data_status(job_id)
run_ocr
Returns an optical character recognition (OCR) result for a given image group.
run_ocr(image_group_id)
Parameters:
image_group_id (str) – the ID of the image group in this dataset to run OCR on.
Return type:
List[ImageGroupOCR]
Returns:
Returns a list of ImageGroupOCR objects representing the text and corresponding coordinates found in each frame of the image group.
def run_ocr(self, image_group_id: str) -> List[ImageGroupOCR]:
"""
Returns an optical character recognition result for a given image group
Args:
image_group_id: the id of the image group in this dataset to run OCR on
Returns:
Returns a list of ImageGroupOCR objects representing the text and corresponding coordinates
found in each frame of the image group
"""
return self._client.run_ocr(image_group_id)
get_cloud_integrations
Gets the cloud integration information for a dataset.
get_cloud_integrations()
Return type:
List[CloudIntegration]
def get_cloud_integrations(self) -> List[CloudIntegration]:
return self._client.get_cloud_integrations()
list_groups
List all user groups that belong to a specified Dataset.
list_groups()
Return type:
Iterable[DatasetGroup]
def list_groups(self) -> Iterable[DatasetGroup]:
    dataset_hash = convert_to_uuid(self.dataset_hash)
    page = self._client.list_groups(dataset_hash)
    yield from page.results
add_group
Add a user group to a specified Dataset.
add_group(group_hash, user_role)
Parameters:
- group_hash: A group hash, or a list of group hashes, to be added to the Dataset.
- user_role: The Dataset user role to assign to the user group. Dataset user roles are either Admin or User. See DatasetUserRole for more information.
Return type:
None
def add_group(self, group_hash, user_role):
    if isinstance(group_hash, UUID):
        group_hash = [group_hash]
    self._client.add_groups(self.dataset_hash, group_hash, user_role)
remove_group
Remove one or multiple groups from a specified Dataset.
remove_group(group_hash)
Parameters:
- group_hash: A group hash, or a list of group hashes, to be removed from the Dataset.
Return type:
None
def remove_group(self, group_hash):
    if isinstance(group_hash, UUID):
        group_hash = [group_hash]
    dataset_hash = convert_to_uuid(self.dataset_hash)
    self._client.remove_groups(dataset_hash, group_hash)
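Both add_group and remove_group normalise a single UUID into a list before calling the client. The same pattern in isolation, as a self-contained sketch:

```python
from uuid import UUID, uuid4

# Accept either one group hash or a list of them, always returning a list,
# mirroring the isinstance check in the method bodies above.
def normalise_group_hashes(group_hash):
    if isinstance(group_hash, UUID):
        return [group_hash]
    return list(group_hash)

single = uuid4()
batch = normalise_group_hashes(single)             # wraps the single UUID
many = normalise_group_hashes([uuid4(), uuid4()])  # lists pass through
```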
Source
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List, Optional, TextIO, Union
from encord.client import EncordClientDataset
from encord.constants.enums import DataType
from encord.http.utils import CloudUploadSettings
from encord.orm.cloud_integration import CloudIntegration
from encord.orm.dataset import AddPrivateDataResponse, DataRow
from encord.orm.dataset import Dataset as OrmDataset
from encord.orm.dataset import (
DatasetAccessSettings,
DatasetDataLongPolling,
DatasetUser,
DatasetUserRole,
Image,
ImageGroupOCR,
StorageLocation,
)
class Dataset:
"""
Access dataset related data and manipulate the dataset.
"""
def __init__(self, client: EncordClientDataset, orm_dataset: OrmDataset):
self._client = client
self._dataset_instance = orm_dataset
@property
def dataset_hash(self) -> str:
"""
Get the dataset hash (i.e. the Dataset ID).
"""
return self._dataset_instance.dataset_hash
@property
def title(self) -> str:
return self._dataset_instance.title
@property
def description(self) -> str:
return self._dataset_instance.description
@property
def storage_location(self) -> StorageLocation:
return self._dataset_instance.storage_location
@property
def data_rows(self) -> List[DataRow]:
"""
Part of the response of this function can be configured by the
:meth:`encord.dataset.Dataset.set_access_settings` method.
.. code::
dataset.set_access_settings(DatasetAccessSettings(fetch_client_metadata=True))
print(dataset.data_rows)
"""
return self._dataset_instance.data_rows
def list_data_rows(
self,
title_eq: Optional[str] = None,
title_like: Optional[str] = None,
created_before: Optional[Union[str, datetime]] = None,
created_after: Optional[Union[str, datetime]] = None,
data_types: Optional[List[DataType]] = None,
) -> List[DataRow]:
"""
Retrieve dataset rows (pointers to data, labels).
Args:
title_eq: optional exact title row filter
title_like: optional fuzzy title row filter; SQL syntax
created_before: optional datetime row filter
created_after: optional datetime row filter
data_types: optional data types row filter
Returns:
List[DataRow]: A list of DataRows object that match the filter
Raises:
AuthorisationError: If the dataset API key is invalid.
ResourceNotFoundError: If no dataset exists by the specified dataset EntityId.
UnknownError: If an error occurs while retrieving the dataset.
"""
return self._client.list_data_rows(title_eq, title_like, created_before, created_after, data_types)
def refetch_data(self) -> None:
"""
The Dataset class will only fetch its properties once. Use this function if you suspect the state of those
properties to be dirty.
"""
self._dataset_instance = self._client.get_dataset()
def get_dataset(self) -> OrmDataset:
"""
This function is exposed for convenience. You are encouraged to use the property accessors instead.
"""
return self._client.get_dataset()
def set_access_settings(self, dataset_access_settings: DatasetAccessSettings, *, refetch_data: bool = True) -> None:
"""
Args:
dataset_access_settings: The access settings to use going forward
refetch_data: Whether a `refetch_data()` call should follow the update of the dataset access settings.
"""
self._client.set_access_settings(dataset_access_settings)
if refetch_data:
self.refetch_data()
def add_users(self, user_emails: List[str], user_role: DatasetUserRole) -> List[DatasetUser]:
"""
Add users to dataset. If the user was already added, this operation will succeed but the `user_role` will be
unchanged. The existing `user_role` will be reflected in the `DatasetUser` instance.
Args:
user_emails: list of user emails to be added
user_role: the user role to assign to all users
"""
return self._client.add_users(user_emails, user_role)
def upload_video(
self,
file_path: str,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload video to Encord storage.
Args:
file_path: path to video e.g. '/home/user/data/video.mp4'
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The video title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_video.mp4".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.upload_video(file_path, cloud_upload_settings=cloud_upload_settings, title=title)
def create_image_group(
self,
file_paths: Iterable[str],
max_workers: Optional[int] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
*,
create_video: bool = True,
):
"""
Create an image group in Encord storage. Choose this type of image upload for sequential images. Else, you can
choose the :meth:`.Dataset.upload_image` function.
Args:
file_paths: a list of paths to images, e.g.
['/home/user/data/img1.png', '/home/user/data/img2.png']
max_workers:
DEPRECATED: This argument will be ignored
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the image group. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "encord_image_group".
create_video:
A flag specifying how image groups are stored. If `True`, a compressed video will be created from
the image group. `True` was the previous default behavior. If `False`, the images
are saved as a sequence of images.
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_image_group(
file_paths,
cloud_upload_settings=cloud_upload_settings,
title=title,
create_video=create_video,
)
def create_dicom_series(
self,
file_paths: List[str],
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
title: Optional[str] = None,
):
"""
Upload a DICOM series to Encord storage
Args:
file_paths: a list of paths to DICOM files, e.g.
['/home/user/data/DICOM_1.dcm', '/home/user/data/DICOM_2.dcm']
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
title:
The title of the DICOM series. If unspecified this will be randomly generated for you. This title should
NOT include an extension. For example "dicom_series".
Returns:
Bool.
Raises:
UploadOperationNotSupportedError: If trying to upload to external
datasets (e.g. S3/GCP/Azure)
"""
return self._client.create_dicom_series(file_paths, cloud_upload_settings=cloud_upload_settings, title=title)
def upload_image(
self,
file_path: Union[Path, str],
title: Optional[str] = None,
cloud_upload_settings: CloudUploadSettings = CloudUploadSettings(),
) -> Image:
"""
Upload a single image to Encord storage. If your images are sequential we recommend creating an image group via
the :meth:`.Dataset.create_image_group` function. For more information please compare
https://docs.encord.com/docs/annotate-images and https://docs.encord.com/docs/annotate-videos
Args:
file_path: The file path to the image
title: The image title. If unspecified, this will be the file name. This title should include an extension.
For example "encord_image.png".
cloud_upload_settings:
Settings for uploading data into the cloud. Change this object to overwrite the default values.
"""
return self._client.upload_image(file_path, title, cloud_upload_settings)
def delete_image_group(self, data_hash: str):
"""
Delete an image group in Encord storage.
Args:
data_hash: the hash of the image group to delete
"""
return self._client.delete_image_group(data_hash)
def delete_data(self, data_hashes: List[str]):
"""
Delete a video/image group from a dataset.
Args:
data_hashes: list of hash of the videos/image_groups you'd like to delete, all should belong to the same
dataset
"""
return self._client.delete_data(data_hashes)
def add_private_data_to_dataset(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> AddPrivateDataResponse:
"""
Append data hosted on a private cloud to an existing dataset.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
add_private_data_response List of DatasetDataInfo objects containing data_hash and title
"""
return self._client.add_private_data_to_dataset(integration_id, private_files, ignore_errors)
def add_private_data_to_dataset_start(
self,
integration_id: str,
private_files: Union[str, Dict, Path, TextIO],
ignore_errors: bool = False,
) -> str:
"""
Append data hosted on a private cloud to an existing dataset.
This method initializes the upload in Encord's backend.
Once the upload id has been returned, you can exit the terminal
while the job continues uninterrupted.
You can check upload job status at any point using
the :meth:`add_private_data_to_dataset_get_result` method.
This can be done in a separate python session to the one
where the upload was initialized.
Args:
integration_id:
The `EntityId` of the cloud integration you wish to use.
private_files:
A `str` path or `Path` object to a json file, json str or python dictionary of the files you wish to add
ignore_errors:
When set to `True`, this will prevent individual errors from stopping the upload process.
Returns:
str
`upload_job_id` - UUID Identifier of upload job.
This id enables the user to track the job progress via SDK, or web app.
"""
return self._client.add_private_data_to_dataset_start(integration_id, private_files, ignore_errors)
def add_private_data_to_dataset_get_result(
self,
upload_job_id: str,
timeout_seconds: int = 7 * 24 * 60 * 60, # 7 days
) -> DatasetDataLongPolling:
"""
Fetch data upload status, perform long polling process for `timeout_seconds`.
Args:
upload_job_id:
UUID Identifier of upload job. This id enables the user to track the job progress via SDK, or web app.
timeout_seconds:
Number of seconds the method will wait while waiting for a response.
If `timeout_seconds == 0`, only a single checking request is performed.
Response will be immediately returned.
Returns:
DatasetDataLongPolling
Response containing details about job status, errors and progress.
"""
return self._client.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds)
def update_data_item(self, data_hash: str, new_title: str) -> bool:
"""
DEPRECATED: Use the individual setter properties of the respective :class:`encord.orm.dataset.DataRow`
instance instead. These can be retrieved via the :meth:`.Dataset.data_rows` function.
Update a data item
Args:
data_hash: str
Data hash of the item being updated
new_title:
String containing the new title of the data item being updated
Returns:
Returns a boolean for whether the update was successful
"""
return self._client.update_data_item(data_hash, new_title)
def re_encode_data(self, data_hashes: List[str]):
"""
Launches an async task that can re-encode a list of videos.
Args:
data_hashes: list of hash of the videos you'd like to re_encode, all should belong to the same
dataset
Returns:
EntityId(integer) of the async task launched.
"""
return self._client.re_encode_data(data_hashes)
def re_encode_data_status(self, job_id: int):
"""
Returns the status of an existing async task which is aimed at re-encoding videos.
Args:
job_id: id of the async task that was launched to re-encode the videos
Returns:
ReEncodeVideoTask: Object containing the status of the task, along with info about the new encoded videos
in case the task has been completed
"""
return self._client.re_encode_data_status(job_id)
def run_ocr(self, image_group_id: str) -> List[ImageGroupOCR]:
"""
Returns an optical character recognition result for a given image group
Args:
image_group_id: the id of the image group in this dataset to run OCR on
Returns:
Returns a list of ImageGroupOCR objects representing the text and corresponding coordinates
found in each frame of the image group
"""
return self._client.run_ocr(image_group_id)
def get_cloud_integrations(self) -> List[CloudIntegration]:
return self._client.get_cloud_integrations()