Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies.

Title: Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies.
Publication Type: Journal Article
Year of Publication: 2018
Authors: Wojcik, GL; Fuchsberger, C; Taliun, D; Welch, R; Martin, AR; Shringarpure, S; Carlson, CS; Abecasis, G; Kang, HMin; Boehnke, M; Bustamante, CD; Gignoux, CR; Kenny, EE
Journal: G3 (Bethesda)
Volume: 8
Issue: 10
Pagination: 3255-3267
Date Published: 2018-10-03
ISSN: 2160-1836
Keywords: Computational Biology; Databases, Nucleic Acid; Ethnic Groups; Genetic Association Studies; Genetics, Population; Genome-Wide Association Study; Humans; Linkage Disequilibrium; Models, Genetic; Polymorphism, Single Nucleotide; Reproducibility of Results; Selection, Genetic
Abstract

The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites, using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy, and it demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts to single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
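
As a rough illustration of the evaluation metric named in the abstract, the sketch below computes a mean imputed r² across untyped sites, taking r² at each site to be the squared Pearson correlation between true genotypes (coded 0/1/2) and imputed dosages. This is an assumption about the metric's usual definition, not the authors' pipeline: the function names are hypothetical, the imputation and leave-one-out steps are not reproduced, and scoring zero-variance sites as 0 is a choice made here for simplicity.

```python
def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        # Assumption for this sketch: a site with no variation scores 0.
        return 0.0
    return (cov * cov) / (var_t * var_d)


def mean_imputed_r2(sites):
    """Average r² over (true_genotypes, imputed_dosages) pairs, one per untyped site."""
    scores = [squared_correlation(t, d) for t, d in sites]
    if not scores:
        raise ValueError("no sites to evaluate")
    return sum(scores) / len(scores)


# Toy data: one perfectly imputed site and one uninformative site.
toy_sites = [
    ([0, 1, 2, 1], [0.0, 1.0, 2.0, 1.0]),  # dosages match genotypes exactly
    ([0, 1, 2, 1], [1.0, 1.0, 1.0, 1.0]),  # constant dosage carries no information
]
toy_mean = mean_imputed_r2(toy_sites)
```

In a leave-one-out setting, the dosages for each held-out sample would come from imputing that sample against the remaining reference haplotypes using only the candidate tag SNPs; the averaging step itself would stay the same.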

DOI: 10.1534/g3.118.200502
Alternate Journal: G3 (Bethesda)
PubMed ID: 30131328
PubMed Central ID: PMC6169386
Grant List:
U01 HG007417 / HG / NHGRI NIH HHS / United States
U01 HG007419 / HG / NHGRI NIH HHS / United States
R01 HG000376 / HG / NHGRI NIH HHS / United States
U01 HG007376 / HG / NHGRI NIH HHS / United States
S10 OD018522 / OD / NIH HHS / United States
U01 HG009080 / HG / NHGRI NIH HHS / United States
U01 DK062370 / DK / NIDDK NIH HHS / United States
T32 HG000044 / HG / NHGRI NIH HHS / United States
T32 GM007790 / GM / NIGMS NIH HHS / United States