PanNuke

Dataset Information

The PanNuke dataset is a semi-automatically generated pathology dataset for nuclear instance segmentation, comprehensively covering nuclear labels of 19 different tissue types. The dataset contains a total of 7,904 images and 205,343 annotated nuclei, each with an instance segmentation mask and corresponding cell type labels (tumor epithelial cells, inflammatory cells, connective tissue cells). These images are sourced from the TCGA and GTEx projects.

The significance of the PanNuke dataset lies in its provision of a large-scale, diverse benchmark for training and evaluating machine learning models for nuclear detection, segmentation, and classification. This is crucial for the development of automated digital pathology tools that can help pathologists diagnose cancer and other diseases more accurately and efficiently, thereby improving patient outcomes. Additionally, the PanNuke dataset provides researchers with a valuable resource for studying the morphology and distribution characteristics of nuclei in different tissues and disease states. This can lead to a deeper understanding of disease mechanisms and provide clues for discovering new therapeutic targets and biomarkers.

Dataset Meta Information

Dimensions	Modality	Task Type	Anatomical Structures	Number of Categories	Data Volume	File Format
2D	Pathology	Segmentation	19 structures including Kidneys, Liver, etc.	6	7904	.npy

Resolution Details

Dataset Statistics	size
min	(224, 224)
median	(224, 224)
max	(224, 224)

Label Information Statistics

The dataset contains 47,055 labeled nucleus images with instance segmentation masks, mainly including 5 categories of nuclei and 1 image without nuclei.

Category	Count	Percentage
Neoplastic	20,414	43.38%
Non-Neoplastic Epithelial	8,380	17.81%
Inflammatory	9,840	20.69%
Connective	5,374	11.42%
Dead	2,547	5.41%
Non-Nuclei	500	1.06%

Visualization

Real annotation examples.

File Structure

The official dataset consists of three parts: fold 1, fold 2, and fold 3. The file structure of each part is as follows. The images folder contains image data, and the masks folder contains segmentation annotations.

Dataset
│
├── fold 1
│   ├── images
│   │   ├── fold 1
│   │   │  ├── images.npy
│   │   │  ├── types.npy
│   │   ├── GTVnd.nii.gz
│   │   ├── GTVp.nii.gz
│   └── masks
│   │   ├── fold 1
│   │   │  ├── masks.npy
│   │   ├── by-nc-sa.md
│   │   ├── README.md
│   └── README.md
├── fold2
│   ...
├── fold3
│   ...

Authors and Institutions

Jevgenij Gamper (Department of Computer Science, University of Warwick, UK)

Navid Alemi Koohbanani (Department of Computer Science, University of Warwick, UK)

Ksenija Benes (Department of Pathology, The Royal Wolverhampton NHS Trust, UK)

Simon Graham (Department of Computer Science, University of Warwick, UK)

Mostafa Jahanifar (R&D Department, NRP Co., Tehran, Iran)

Seyyed Ali Khurram (Department of Clinical Dentistry, University of Sheffield, UK)

Ayesha Azam (Department of Pathology, University Hospitals Coventry and Warwickshire, UK)

Katherine Hewitt (Department of Pathology, University Hospitals Coventry and Warwickshire, UK)

Nasir Rajpoot (Department of Computer Science, University of Warwick, UK)

Source Information

Official Website: https://warwick.ac.uk/fac/sci/dcs/research/tia/data/pannuke

Download Link: https://warwick.ac.uk/fac/sci/dcs/research/tia/data/pannuke

Article Address: https://arxiv.org/pdf/2003.10778v7

Publication Date: 2020-03

Citation

@article{gamper2020pannuke,
  title={Pannuke dataset extension, insights and baselines},
  author={Gamper, Jevgenij and Koohbanani, Navid Alemi and Benes, Ksenija and Graham, Simon and Jahanifar, Mostafa and Khurram, Syed Ali and Azam, Ayesha and Hewitt, Katherine and Rajpoot, Nasir},
  journal={arXiv preprint arXiv:2003.10778},
  year={2020}
}

Original introduction article is here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PanNuke.md

PanNuke.md

PanNuke

Dataset Information

Dataset Meta Information

Resolution Details

Label Information Statistics

Visualization

File Structure

Authors and Institutions

Source Information

Citation

Files

PanNuke.md

Latest commit

History

PanNuke.md

File metadata and controls

PanNuke

Dataset Information

Dataset Meta Information

Resolution Details

Label Information Statistics

Visualization

File Structure

Authors and Institutions

Source Information

Citation