The largest collection of human genomes ever linked to real medical histories opened to researchers this week, and the headline number is not its size. It is who is in it [1].
The NIH's All of Us program released 535,000 whole-genome sequences tied to nearly 482,000 electronic health records, drawn from more than 747,000 volunteers, carrying over 1.3 billion genetic variants [1]. Bigger than anything before it - and built deliberately backwards from the field's history: more than 86 percent of participants come from communities medical research has long overlooked - racial and ethnic minorities, older adults, women, people with disabilities, rural Americans [1].
Data
| Dataset | Group | Share of participants |
|---|---|---|
| Historical genomic datasets | European ancestry | 85 percent |
| All of Us | Underrepresented communities | 86 percent |
Why that inversion matters in a doctor's office: drug dosing, disease-risk scores, and diagnostic reference ranges are only as good as the people they were derived from. A risk score trained on one ancestry group misfires on others - quietly, at scale, for decades. A dataset in which the long-overlooked are the majority means the next generation of those tools can work for the patients who were an asterisk in the last one [1][2].
The honest caveat: the program built this while under real budget pressure, and coverage notes that its future funding is not assured [2]. That is a fight for another day. What exists now cannot be unbuilt - the genomes are sequenced, the records linked, the doors open to researchers. Half a million genomes that finally look like the country are on the table, and every clinic in America eventually inherits what gets learned from them.