The Biggest Genome-and-Health Database Ever Built Just Opened - and 86 Percent of It Is People Medicine Used to Ignore

535,000 whole genomes linked to health records, from 747,000 volunteers - most of them from communities genomic research historically skipped. Precision medicine just got a dataset that looks like the country.

The largest collection of human genomes ever linked to real medical histories opened to researchers this week, and the headline number is not its size. It is who is in it [1].

The NIH's All of Us program released 535,000 whole-genome sequences, containing more than 1.3 billion genetic variants, tied to nearly 482,000 electronic health records and drawn from more than 747,000 volunteers [1]. It is bigger than anything before it, and it was built deliberately against the grain of the field's history: more than 86 percent of participants come from communities medical research has long overlooked, including racial and ethnic minorities, older adults, women, people with disabilities, and rural Americans [1].

Who is in the dataset

Genomic research has historically skewed about 80 to 90 percent toward participants of European ancestry. All of Us inverts that pattern [1].

Data

Historical genomic datasets, European ancestry: about 85 percent of participants
All of Us, underrepresented communities: 86 percent of participants

Why that inversion matters in a doctor's office: drug dosing, disease-risk scores, and diagnostic reference ranges are only as good as the populations they were derived from. A risk score trained on one ancestry group misfires on others - quietly, at scale, for decades. A dataset in which long-overlooked groups form the majority means the next generation of those tools can work for the patients who were an asterisk in the last one [1][2].

The honest caveat: the program built this while under real budget pressure, and coverage notes its future funding is not assured [2]. That is a fight for another day. What exists now cannot be unbuilt - the sequences are sequenced, the records linked, the doors open to researchers. Half a million genomes that finally look like the country are on the table, and every clinic in America eventually inherits what gets learned from them.


Sources