Data model

IDC relies on the DICOM data model for organizing images and image-derived data. At the same time, IDC includes certain attributes and data types that are outside of the DICOM data model. The Entity-Relationship (E-R) diagram and examples below summarize a simplified view of the IDC data model (the notation used in this E-R diagram is explained in the Mermaid documentation).

erDiagram
    COLLECTION ||--o{ CASE : contains
    CASE ||--o{ STUDY : contains
    STUDY ||--o{ SERIES : contains
    SERIES ||--o{ INSTANCE : contains
    ANALYSIS_RESULT ||--o{ SERIES : adds
    ANALYSIS_RESULT }o--o{ COLLECTION : spans
    CASE |o--o| CLINICAL_DATA : "may have"
    PROGRAM ||--o{ COLLECTION : contains

    PROGRAM {
        string program PK
    }

    COLLECTION {
        string collection_id 
        string source_doi
    }
    CASE {
        string PatientID
    }
    STUDY {
        string StudyInstanceUID
    }
    SERIES {
        string SeriesInstanceUID
    }
    INSTANCE {
        string SOPInstanceUID
    }
    ANALYSIS_RESULT {
        string analysis_result_id 
        string source_doi
    }
    CLINICAL_DATA {
        string CaseID
    }




IDC content is organized in Collections: groups of DICOM files that were collected through a particular research activity.

Collections are organized into Programs, which group related collections, or collections that were contributed under the same funding initiative or consortium. For example, the TCGA program contains the TCGA-GBM, TCGA-BRCA, and other collections. You will see Collections nested under Programs in the upper left section of the IDC Portal. You will also see the list of collections that meet the filter criteria in the top table on the right-hand side of the portal interface.

Individual DICOM files included in a collection contain attributes that organize content according to the DICOM data model.

Each collection contains data for one or more cases, or patients. Data for an individual patient is organized in DICOM studies, each of which groups the images collected in a single imaging exam/encounter, in a given session. Studies are composed of DICOM series, which in turn consist of DICOM instances. Each DICOM instance corresponds to a single file on disk. As an example, in radiology imaging, individual instances would correspond to image slices in multi-slice acquisitions, and in digital pathology you will see a separate file/instance for each resolution layer of the image pyramid. When using the IDC Portal, you will never encounter individual instances; you will only see them if you download data to your computer.
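The hierarchy described above can be sketched as a nested grouping of per-instance records. This is only an illustration, not IDC code: the flat record layout, the `build_hierarchy` helper, and every identifier value below are hypothetical. Only the attribute names (`collection_id`, `PatientID`, `StudyInstanceUID`, `SeriesInstanceUID`, `SOPInstanceUID`) are taken from the E-R diagram.

```python
from collections import defaultdict

# Attribute names taken from the E-R diagram, ordered from the top of the
# hierarchy (collection) down to a single file (instance).
FIELDS = (
    "collection_id",
    "PatientID",
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "SOPInstanceUID",
)

# Hypothetical per-instance rows; all identifier values are made up.
raw_rows = [
    ("collection_a", "case_1", "1.2.3", "1.2.3.1", "1.2.3.1.1"),
    ("collection_a", "case_1", "1.2.3", "1.2.3.1", "1.2.3.1.2"),
    ("collection_a", "case_1", "1.2.3", "1.2.3.2", "1.2.3.2.1"),
    ("collection_a", "case_2", "1.2.4", "1.2.4.1", "1.2.4.1.1"),
]
records = [dict(zip(FIELDS, row)) for row in raw_rows]


def build_hierarchy(rows):
    """Nest flat rows as collection -> case -> study -> series -> [instances]."""
    tree = defaultdict(
        lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    )
    for row in rows:
        series = tree[row["collection_id"]][row["PatientID"]][
            row["StudyInstanceUID"]
        ][row["SeriesInstanceUID"]]
        series.append(row["SOPInstanceUID"])
    return tree


hierarchy = build_hierarchy(records)

# Walk the tree and report how many instances (files) each series holds.
for collection_id, cases in hierarchy.items():
    for patient_id, studies in cases.items():
        for study_uid, series_map in studies.items():
            for series_uid, instances in series_map.items():
                print(collection_id, patient_id, study_uid, series_uid, len(instances))
```

Each leaf list holds the instances of one series, which mirrors the point that a series is the smallest unit shown in the portal while the individual files only appear after download.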

Analysis results collections are an important concept in IDC. They contain analysis results that were not contributed as part of any specific collection. Such analysis results might be contributed by investigators unrelated to those who submitted the analyzed images, and may span images across multiple collections.
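To make the "spans multiple collections" relationship concrete, the sketch below groups series-level records by analysis result and lists the collections each one touches. It rests on assumptions that are not stated in the text: that each series added by an analysis result is recorded together with the collection of the images it analyzes, and that series from the original submission carry no `analysis_result_id`. The record layout, the `collections_per_analysis_result` helper, and all identifier values are hypothetical.

```python
from collections import defaultdict

# Hypothetical series-level rows. analysis_result_id is None for series that
# were contributed as part of the original collection (an assumption made
# for this sketch).
series_rows = [
    {"SeriesInstanceUID": "1.2.3.1", "collection_id": "collection_a", "analysis_result_id": None},
    {"SeriesInstanceUID": "1.2.3.9", "collection_id": "collection_a", "analysis_result_id": "result_x"},
    {"SeriesInstanceUID": "1.2.5.9", "collection_id": "collection_b", "analysis_result_id": "result_x"},
    {"SeriesInstanceUID": "1.2.5.8", "collection_id": "collection_b", "analysis_result_id": "result_y"},
]


def collections_per_analysis_result(rows):
    """Map each analysis result to the set of collections its series belong to."""
    spans = defaultdict(set)
    for row in rows:
        result_id = row["analysis_result_id"]
        if result_id is not None:
            spans[result_id].add(row["collection_id"])
    return dict(spans)


spans = collections_per_analysis_result(series_rows)
for result_id, collection_ids in sorted(spans.items()):
    print(result_id, sorted(collection_ids))
```

In this made-up data, `result_x` adds series to two collections while `result_y` adds series to only one, which is the many-to-many relationship drawn between `ANALYSIS_RESULT` and `COLLECTION` in the diagram.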