Dataset Directory

Robust, well-annotated data is indispensable for developing effective machine learning tools for the clinical environment. With the Dataset Directory, we are easing some of the burden on industry by connecting machine learning practitioners with accessible and meaningful datasets for their projects.

The list below includes organizations to contact about their datasets, as well as datasets that can be downloaded directly from their websites. To suggest a dataset that is not listed or to report a broken link, email DSI@acr.org.

Cancer Genome Atlas Kidney Renal Papillary Cell Carcinoma

The Cancer Genome Atlas Kidney Renal Papillary Cell Carcinoma (TCGA-KIRP) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

26667 Instances

Cancer Genome Atlas Breast Invasive Carcinoma

The Cancer Genome Atlas Breast Invasive Carcinoma (TCGA-BRCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

230167 Instances

Cancer Genome Atlas Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma

The Cancer Genome Atlas Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (TCGA-CESC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

19135 Instances

Cancer Genome Atlas Colon Adenocarcinoma

The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). 

The Cancer Imaging Archive

8387 Instances

Cancer Genome Atlas Esophageal Carcinoma

The Cancer Genome Atlas Esophageal Carcinoma (TCGA-ESCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

20593 Instances

Cancer Genome Atlas Kidney Chromophobe

The Cancer Genome Atlas Kidney Chromophobe (TCGA-KICH) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

9221 Instances

Cancer Genome Atlas Liver Hepatocellular Carcinoma

The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

125397 Instances

Cancer Genome Atlas Low Grade Glioma

The Cancer Genome Atlas Low Grade Glioma (TCGA-LGG) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). 

The Cancer Imaging Archive

241183 Instances

Cancer Genome Atlas Lung Adenocarcinoma

The Cancer Genome Atlas Lung Adenocarcinoma (TCGA-LUAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

48931 Instances

Cancer Genome Atlas Ovarian Cancer

The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

536662 Instances

Cancer Genome Atlas Prostate Adenocarcinoma

The Cancer Genome Atlas Prostate Adenocarcinoma (TCGA-PRAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

16790 Instances

Cancer Genome Atlas Rectum Adenocarcinoma

The Cancer Genome Atlas Rectum Adenocarcinoma (TCGA-READ) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). 

The Cancer Imaging Archive

1786 Instances

Cancer Genome Atlas Sarcoma

The Cancer Genome Atlas Sarcoma (TCGA-SARC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

5653 Instances

Cancer Genome Atlas Stomach Adenocarcinoma

The Cancer Genome Atlas Stomach Adenocarcinoma (TCGA-STAD) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA). 

The Cancer Imaging Archive

43908 Instances

Cancer Genome Atlas Thyroid Cancer

The Cancer Genome Atlas Thyroid Cancer (TCGA-THCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

2780 Instances

Cancer Genome Atlas Urothelial Bladder Carcinoma

The Cancer Genome Atlas Urothelial Bladder Carcinoma (TCGA-BLCA) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

78429 Instances

Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma

The Cancer Genome Atlas Uterine Corpus Endometrial Carcinoma (TCGA-UCEC) data collection is part of a larger effort to build a research community focused on connecting cancer phenotypes to genotypes by providing clinical images matched to subjects from The Cancer Genome Atlas (TCGA).

The Cancer Imaging Archive

71674 Instances

ChestX-ray14

Frontal-view chest X-ray images labeled for 14 common thoracic disease conditions.

National Institutes of Health

112129 Instances

Digital Database for Screening Mammography

The Digital Database for Screening Mammography (DDSM) is a resource for use by the mammographic image analysis research community. The primary purpose of the database is to facilitate sound research in the development of computer algorithms to aid in screening. Secondary purposes of the database may include the development of algorithms to aid in diagnosis and the development of teaching or training aids.

University of South Florida

55899 Instances

MURA

MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal.

Stanford ML Group

40561 Instances

Osteoarthritis Initiative

The Osteoarthritis Initiative (OAI) is a multi-center, longitudinal, prospective observational study of knee osteoarthritis (OA). The overall aim of the OAI is to develop a public domain research resource to facilitate the scientific evaluation of biomarkers for osteoarthritis as potential surrogate endpoints for disease onset and progression.

NIMH Data Archive

8892 Instances

Pediatric Bone Age

This dataset comes from a contest held at RSNA 2017 to correctly identify the age of a child from an X-ray of their hand. It is hosted on Kaggle, which makes it easier to experiment with and to use for educational demos. It may also prompt new ideas for building smarter models for X-ray images.

Stanford

14236 Instances