Title | On the cross-population generalizability of gene expression prediction models. |
Publication Type | Journal Article |
Year of Publication | 2020 |
Authors | Keys, KL, Mak, ACY, White, MJ, Eckalbar, WL, Dahl, AW, Mefford, J, Mikhaylova, AV, Contreras, MG, Elhawary, JR, Eng, C, Hu, D, Huntsman, S, Oh, SS, Salazar, S, LeNoir, MA, Ye, JC, Thornton, TA, Zaitlen, N, Burchard, EG, Gignoux, CR |
Journal | PLoS Genet |
Volume | 16 |
Issue | 8 |
Pagination | e1008927 |
Date Published | 2020-08 |
ISSN | 1553-7404 |
Keywords | African Americans; Gene Expression Profiling; Genome-Wide Association Study; Humans; Models, Genetic; Quantitative Trait Loci; Reference Standards; RNA-Seq; Transcriptome |
Abstract | The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction. |
DOI | 10.1371/journal.pgen.1008927 |
Alternate Journal | PLoS Genet |
PubMed ID | 32797036 |
PubMed Central ID | PMC7449671 |
Grant List | R01 HL117004 / HL / NHLBI NIH HHS / United States; TL4 GM118986 / GM / NIGMS NIH HHS / United States; P60 MD006902 / MD / NIMHD NIH HHS / United States; R01 ES015794 / ES / NIEHS NIH HHS / United States; T34 GM008574 / GM / NIGMS NIH HHS / United States; R01 HL128439 / HL / NHLBI NIH HHS / United States; T32 HG000044 / HG / NHGRI NIH HHS / United States; K01 HL140218 / HL / NHLBI NIH HHS / United States; R56 HG010297 / HG / NHGRI NIH HHS / United States; U01 HG007419 / HG / NHGRI NIH HHS / United States; U01 HG009080 / HG / NHGRI NIH HHS / United States; R01 HL135156 / HL / NHLBI NIH HHS / United States; R21 ES024844 / ES / NIEHS NIH HHS / United States; R01 HG010297 / HG / NHGRI NIH HHS / United States; R01 HL104608 / HL / NHLBI NIH HHS / United States; R01 HL141992 / HL / NHLBI NIH HHS / United States; K12 GM081266 / GM / NIGMS NIH HHS / United States; R00 HL135403 / HL / NHLBI NIH HHS / United States; R01 MD010443 / MD / NIMHD NIH HHS / United States; UL1 GM118985 / GM / NIGMS NIH HHS / United States; RL5 GM118984 / GM / NIGMS NIH HHS / United States |