Upload private cloud data
All types of data (videos, images, image groups, image sequences, and DICOM) from a private cloud are added to a Dataset in the exact same way.
Use the script below to upload your private cloud data to a specified Dataset.
- Replace <dataset_hash> with the ID of the Dataset you want to upload your data to.
- Replace <integration_title> with the title of the integration you want to use. You can see all available integrations in the Encord platform, or list them using the SDK, as shown in the sketch after this list.
- Replace path/to/json/file.json with the path to your JSON file.
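To list the available integrations with the SDK, the following minimal sketch (assuming you authenticate with your SSH private key, exactly as in the upload script below) prints the title of every cloud integration on your account:

from encord import EncordUserClient

# Authenticate with Encord. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Print the title of each cloud integration available to you
for integration in user_client.get_cloud_integrations():
    print(integration.title)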
The script has several possible outputs:
- "Upload is still in progress, try again later!": The upload has not finished. Run this script again later to check if the upload has finished.
- "Upload completed": The upload completed. If any files failed to upload, the URLs are listed.
- "Upload failed": The entire upload failed, and not just individual files. Ensure your JSON file is formatted correctly.
# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Instantiate the user client. Replace <private_key_path> with the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify the Dataset you want to upload data to by replacing <dataset_hash> with the Dataset hash
dataset = user_client.get_dataset("<dataset_hash>")

# Specify the integration used to access your data by replacing <integration_title> with the integration title
integrations = user_client.get_cloud_integrations()
integration_idx = [i.title for i in integrations].index("<integration_title>")
integration = integrations[integration_idx].id

# Initiate the cloud data upload. Replace path/to/json/file.json with the path to your JSON file
upload_job_id = dataset.add_private_data_to_dataset_start(
    integration, "path/to/json/file.json", ignore_errors=True
)

# timeout_seconds determines how long the code waits after initiating the upload before checking the upload status
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")

if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed")
    # List any individual files that failed to upload
    if res.data_unit_errors:
        print("The following URLs failed to upload:")
        for e in res.data_unit_errors:
            print(e.object_urls)
else:
    print(f"Upload failed: {res.errors}")
Example output:
add_private_data_to_dataset job started with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
SDK process can be terminated, this will not affect successful job execution.
You can follow the progress in the web app via notifications.
add_private_data_to_dataset job completed with upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
Execution result: DatasetDataLongPolling(status=<LongPollingStatus.DONE: 'DONE'>, data_hashes_with_titles=[DatasetDataInfo(data_hash='cd42333d-8014-46q7-837b-5bf68b9b5', title='funny_image.jpg')], errors=[], units_pending_count=0, units_done_count=1, units_error_count=0)
Upload completed
Check data upload
If the code returns "Upload is still in progress, try again later!", run the following code to query the Encord server again. Ensure that you replace <upload_job_id> with the job ID output by the previous script. In the example above, upload_job_id=c4026edb-4fw2-40a0-8f05-a1af7f465727.
The script has several possible outputs:
- "Upload is still in progress, try again later!": The upload has not finished. Run this script again later to check if the upload has finished.
- "Upload completed": The upload completed. If any files failed to upload, the URLs are listed.
- "Upload failed": The entire upload failed, and not just individual files. Ensure your JSON file is formatted correctly.
# Import dependencies
from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Replace <upload_job_id> with the job ID output by the previous script
upload_job_id = "<upload_job_id>"

# Authenticate with Encord using the path to your private key
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)

# Specify the Dataset you uploaded data to by replacing <dataset_hash> with the Dataset hash
dataset = user_client.get_dataset("<dataset_hash>")

# Check the status of the upload job. timeout_seconds determines how long to wait before returning the current status
res = dataset.add_private_data_to_dataset_get_result(upload_job_id, timeout_seconds=5)
print(f"Execution result: {res}")

if res.status == LongPollingStatus.PENDING:
    print("Upload is still in progress, try again later!")
elif res.status == LongPollingStatus.DONE:
    print("Upload completed")
    # List any individual files that failed to upload
    if res.data_unit_errors:
        print("The following URLs failed to upload:")
        for e in res.data_unit_errors:
            print(e.object_urls)
else:
    print(f"Upload failed: {res.errors}")
Tip
If you omit the timeout_seconds argument from the add_private_data_to_dataset_get_result() method, it performs status checks until the upload has finished.
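For example, the following minimal sketch (assuming the same placeholders as the scripts above) blocks until the upload job reaches a final state before reporting the result:

from encord import EncordUserClient
from encord.orm.dataset import LongPollingStatus

# Authenticate and open the Dataset, as in the scripts above
user_client = EncordUserClient.create_with_ssh_private_key(
    ssh_private_key_path="<private_key_path>"
)
dataset = user_client.get_dataset("<dataset_hash>")

# No timeout_seconds: the call keeps checking until the upload job has finished
res = dataset.add_private_data_to_dataset_get_result("<upload_job_id>")

if res.status == LongPollingStatus.DONE:
    print("Upload completed")
else:
    print(f"Upload failed: {res.errors}")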