@CloseChoice CloseChoice commented Oct 28, 2025

Supports #7804.
Adds support for the DICOM file format.

This PR closely follows PR #7815 and PR #7325.
Notable differences:
While making sure that we can load all of pydicom's test data, I ran into the force=True parameter, which this PR supports explicitly. It makes it possible to attempt loading corrupted DICOM files, and this is covered by explicit tests.
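
For context on what force=True works around: a DICOM Part 10 file starts with a 128-byte preamble followed by the four bytes `DICM`, and pydicom's `dcmread` raises `InvalidDicomError` when that header is missing unless `force=True` is passed. The snippet below is a stdlib-only sketch of that prefix check, not the code in this PR or in pydicom; `has_dicom_prefix` and the sample byte strings are hypothetical names and data used only for illustration.

```python
import io

DICOM_PREAMBLE_LENGTH = 128  # bytes of preamble at the start of a Part 10 file
DICOM_MAGIC = b"DICM"        # four-byte prefix that follows the preamble


def has_dicom_prefix(stream) -> bool:
    """Return True if the stream starts with a 128-byte preamble plus b"DICM"."""
    start = stream.tell()
    header = stream.read(DICOM_PREAMBLE_LENGTH + len(DICOM_MAGIC))
    stream.seek(start)  # restore the position so a real reader can start over
    return (
        len(header) == DICOM_PREAMBLE_LENGTH + len(DICOM_MAGIC)
        and header[DICOM_PREAMBLE_LENGTH:] == DICOM_MAGIC
    )


# Hypothetical in-memory samples: one with the prefix, one without.
well_formed = io.BytesIO(b"\x00" * 128 + b"DICM" + b"\x02\x00\x00\x00")
headerless = io.BytesIO(b"\x08\x00\x05\x00\x0a\x00\x00\x00ISO_IR 100")

print(has_dicom_prefix(well_formed))
print(has_dicom_prefix(headerless))
```

A file like the second sample is the kind of input that would presumably need `Dicom(force=True)` to be decoded at all.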

A dataset containing all of pydicom's test data is available on Hugging Face and can be loaded from this branch with the following script:

from datasets import ClassLabel, Features, load_dataset
from datasets.features import Dicom

features = Features({
    "dicom": Dicom(force=True),  # necessary to be able to load one corrupted file
    "label": ClassLabel(num_classes=2)
})

ds = load_dataset("TobiasPitters/dicom-sample-dataset",
                  features=features)

error_count = 0

for item in ds["test"]:
    dicom = item["dicom"]

    try:
        print(f"Type: {type(dicom)}")
        if hasattr(dicom, 'PatientID'):
            print(f"PatientID: {dicom.PatientID}")
        if hasattr(dicom, 'StudyInstanceUID'):
            print(f"StudyInstanceUID: {dicom.StudyInstanceUID}")
        if hasattr(dicom, 'Modality'):
            print(f"Modality: {dicom.Modality}")
    except Exception as e:
        error_count += 1
        print(e)

print(f"Finished processing with {error_count} errors.")
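
The repeated hasattr checks in the script above could also be folded into a small helper. This is only a sketch: `summarize` is a hypothetical name, and the `SimpleNamespace` object is a stand-in for a decoded DICOM object so that the snippet runs without pydicom. It relies only on `getattr` with a default, which also works on pydicom datasets.

```python
from types import SimpleNamespace


def summarize(record, fields=("PatientID", "StudyInstanceUID", "Modality")):
    """Collect the requested attributes, using None for any that are absent."""
    return {name: getattr(record, name, None) for name in fields}


# Stand-in for a decoded DICOM object (hypothetical values, no pydicom needed).
fake_dicom = SimpleNamespace(PatientID="anon-001", Modality="CT")

summary = summarize(fake_dicom)
for name, value in summary.items():
    print(f"{name}: {value}")
```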

todo:

  • add docs (will do so soon)

@CloseChoice CloseChoice marked this pull request as ready for review October 28, 2025 11:35
@CloseChoice CloseChoice changed the title from "Add pydicom support" to "Add DICOM support" Oct 28, 2025