De-identifying DICOM Files
DICOM files may contain sensitive and personally identifiable information (PII) about patients, including their name, date of birth, and medical record number. In these cases, it is essential to anonymize each file to protect patient privacy and to comply with legal and ethical regulations related to healthcare data.
In this tutorial you will learn how to anonymize / de-identify DICOM files in two steps: setting up the de-identification function, and then de-identifying your files by calling that function with your evaluation criteria.
Finally, you will learn how to interpret the JSON output of the de-identification process.
Set up the de-identification function
Adjust the de-identification function found here to suit your needs.
De-identifying
The Python code below is used to add criteria as well as call the de-identification function.
Criteria are used to evaluate each file and can be adapted to suit your needs. Any number of criteria can be used.
Setting evaluation criteria
Evaluation criteria are conditions that determine whether a file will be de-identified or not. Criteria can take many forms, but will always return either ‘true’ or ‘false’.
There are two distinct criteria functions:
- “SaveDeidentifiedDicomConditionNotSubstr” returns ‘true’ if the first argument (PRIMARY in the example above) is not contained in the second argument (ImageType in the example above). In plain English, the example above checks whether the file’s ImageType does not contain the word ‘PRIMARY’, and returns ‘true’ if this condition is fulfilled.
- “SaveDeidentifiedDicomConditionIn” returns ‘true’ if the first argument ([“ct”,“pt”,“nm”,“mr”,“mg”,“pt”] in the example above) is contained in the second argument (Modality in the example above). In plain English, the example above checks whether any of the strings in the list is contained within the file’s Modality. If any one of them is, the function returns ‘true’.
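The behaviour of the two criteria described above can be sketched in plain Python. This is only an illustration of the logic, not the SDK's implementation: the helper names `condition_not_substr` and `condition_in`, the sample tag values, and the case-insensitive matching in `condition_in` are all assumptions made for this sketch.

```python
from typing import Sequence


def condition_not_substr(value: str, tag_value: str) -> bool:
    """Return True when `value` does NOT occur inside `tag_value`."""
    return value not in tag_value


def condition_in(values: Sequence[str], tag_value: str) -> bool:
    """Return True when any entry of `values` occurs inside `tag_value`.

    Assumption: matching ignores case, so "ct" matches a Modality of "CT".
    """
    tag = tag_value.lower()
    return any(v.lower() in tag for v in values)


# Hypothetical tag values read from a single DICOM file.
image_type = "DERIVED\\SECONDARY"
modality = "CT"

# True: "PRIMARY" does not appear in the ImageType value.
passes_image_type = condition_not_substr("PRIMARY", image_type)

# True: "ct" appears in the Modality value.
passes_modality = condition_in(["ct", "pt", "nm", "mr", "mg"], modality)
```

Each criterion yields a single boolean per file, which is what makes it possible to combine any number of them.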
Output
This section explains the output of the de-identification function. The table below describes each key in the output file.
Key | Description | Notes |
---|---|---|
url | A URL to the DICOM file | |
StudyInstanceUID | The study’s ID | |
StudyInstanceUID_deid | A converted version of the study’s ID used by Encord | Identical to StudyInstanceUID, unless StudyInstanceUID was invalid |
SeriesInstanceUID | The series’ ID | |
SeriesInstanceUID_deid | A converted version of the series’ ID used by Encord | Identical to SeriesInstanceUID, unless SeriesInstanceUID was invalid |
save_conditions_evaluations | Contains a list of conditions | These need to be met in order for a file to be de-identified |
condition | Contains a given condition’s details | A condition is satisfied when the value fulfils the condition_type |
value | The elements being evaluated by the condition_type | Can be thought of as the ‘answer’ to a condition |
condition_type | The type of condition being evaluated | Can be thought of as the ‘question’ to a condition |
dicom_tag | The DICOM tag on which a condition is being evaluated | |
save_condition | Whether the condition must be true or false for the file to pass | |
save_condition_series_agg | The series as a whole evaluated as true or false | For this to be true, all save_condition values need to be true. For this to be false, all save_condition values need to be false. |
save_disabling_urls_series_agg | A list of files which did not meet the required save_condition | Only relevant if the series didn’t pass |
url_deid | The URL of the new, de-identified file | |
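As a rough illustration of how the keys in the table might be read programmatically, the sketch below summarizes one series-level record. The flat layout of the record, the sample values, and the helper name `summarize_series` are assumptions made for this example; the real output file may nest these keys differently.

```python
import json

# Hypothetical sample record; layout and values are assumptions for illustration.
sample_output = json.loads("""
{
  "SeriesInstanceUID": "1.2.3.4",
  "SeriesInstanceUID_deid": "1.2.3.4",
  "save_condition_series_agg": false,
  "save_disabling_urls_series_agg": ["https://example.com/a.dcm"]
}
""")


def summarize_series(record: dict) -> str:
    """Report whether a series passed and, if not, which files blocked it."""
    series = record.get("SeriesInstanceUID_deid", "<unknown series>")
    if record.get("save_condition_series_agg"):
        return f"{series}: passed"
    # Only relevant when the series did not pass.
    blocked = record.get("save_disabling_urls_series_agg", [])
    return f"{series}: not saved; failing files: {', '.join(blocked)}"


summary = summarize_series(sample_output)
print(summary)
```

Checking save_condition_series_agg first mirrors the table: save_disabling_urls_series_agg only matters when the series as a whole did not pass.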