%0 Journal Article
%J Eur J Hum Genet
%D 2021
%T Polygenic risk modeling with latent trait-related genetic components.
%A Aguirre, Matthew
%A Tanigawa, Yosuke
%A Venkataraman, Guhan Ram
%A Tibshirani, Rob
%A Hastie, Trevor
%A Rivas, Manuel A
%X

Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual's genetic risk for a trait across DeGAs components, with examples for body mass index (BMI) and myocardial infarction (heart attack) in 337,151 white British individuals in the UK Biobank, with replication in a further set of 25,486 non-British white individuals. We find that BMI polygenic risk factorizes into components related to fat-free mass, fat mass, and overall health indicators like physical activity. Most individuals with high dPRS for BMI have strong contributions from both a fat-mass component and a fat-free mass component, whereas a few "outlier" individuals have strong contributions from only one of the two components. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.

%B Eur J Hum Genet
%8 2021 Feb 08
%G eng
%1 https://www.ncbi.nlm.nih.gov/pubmed/33558700?dopt=Abstract
%R 10.1038/s41431-021-00813-0

%0 Journal Article
%J Science
%D 2021
%T Population sequencing data reveal a compendium of mutational processes in the human germ line.
%A Seplyarskiy, Vladimir B
%A Soldatov, Ruslan A
%A Koch, Evan
%A McGinty, Ryan J
%A Goldmann, Jakob M
%A Hernandez, Ryan D
%A Barnes, Kathleen
%A Correa, Adolfo
%A Burchard, Esteban G
%A Ellinor, Patrick T
%A McGarvey, Stephen T
%A Mitchell, Braxton D
%A Vasan, Ramachandran S
%A Redline, Susan
%A Silverman, Edwin
%A Weiss, Scott T
%A Arnett, Donna K
%A Blangero, John
%A Boerwinkle, Eric
%A He, Jiang
%A Montgomery, Courtney
%A Rao, D C
%A Rotter, Jerome I
%A Taylor, Kent D
%A Brody, Jennifer A
%A Chen, Yii-Der Ida
%A de Las Fuentes, Lisa
%A Hwu, Chii-Min
%A Rich, Stephen S
%A Manichaikul, Ani W
%A Mychaleckyj, Josyf C
%A Palmer, Nicholette D
%A Smith, Jennifer A
%A Kardia, Sharon L R
%A Peyser, Patricia A
%A Bielak, Lawrence F
%A O'Connor, Timothy D
%A Emery, Leslie S
%A Gilissen, Christian
%A Wong, Wendy S W
%A Kharchenko, Peter V
%A Sunyaev, Shamil
%K Algorithms
%K CpG Islands
%K DNA Damage
%K DNA Demethylation
%K DNA Mutational Analysis
%K DNA Replication
%K Genetic Variation
%K Genome, Human
%K Germ Cells
%K Germ-Line Mutation
%K Humans
%K Long Interspersed Nucleotide Elements
%K Mutagenesis
%K Oocytes
%K Transcription, Genetic
%X

Biological mechanisms underlying human germline mutations remain largely unknown. We statistically decompose variation in the rate and spectra of mutations along the genome using volume-regularized nonnegative matrix factorization. The analysis of a sequencing dataset (TOPMed) reveals nine processes that explain the variation in mutation properties between loci. We provide a biological interpretation for seven of these processes. We associate one process with bulky DNA lesions that are resolved asymmetrically with respect to transcription and replication. Two processes track direction of replication fork and replication timing, respectively. We identify a mutagenic effect of active demethylation primarily acting in regulatory regions and a mutagenic effect of long interspersed nuclear elements. We localize a mutagenic process specific to oocytes from population sequencing data. This process appears transcriptionally asymmetric.

%B Science
%V 373
%P 1030-1035
%8 2021 08 27
%G eng
%N 6558
%1 https://www.ncbi.nlm.nih.gov/pubmed/34385354?dopt=Abstract
%R 10.1126/science.aba7408

%0 Journal Article
%J Cell
%D 2021
%T Population-scale tissue transcriptomics maps long non-coding RNAs to complex disease.
%A de Goede, Olivia M
%A Nachun, Daniel C
%A Ferraro, Nicole M
%A Gloudemans, Michael J
%A Rao, Abhiram S
%A Smail, Craig
%A Eulalio, Tiffany Y
%A Aguet, Francois
%A Ng, Bernard
%A Xu, Jishu
%A Barbeira, Alvaro N
%A Castel, Stephane E
%A Kim-Hellmuth, Sarah
%A Park, YoSon
%A Scott, Alexandra J
%A Strober, Benjamin J
%A Brown, Christopher D
%A Wen, Xiaoquan
%A Hall, Ira M
%A Battle, Alexis
%A Lappalainen, Tuuli
%A Im, Hae Kyung
%A Ardlie, Kristin G
%A Mostafavi, Sara
%A Quertermous, Thomas
%A Kirkegaard, Karla
%A Montgomery, Stephen B
%K Coronary Artery Disease
%K Diabetes Mellitus, Type 1
%K Diabetes Mellitus, Type 2
%K Disease
%K Gene Expression Profiling
%K Genetic Variation
%K Humans
%K Inflammatory Bowel Diseases
%K Multifactorial Inheritance
%K Organ Specificity
%K Population
%K Quantitative Trait Loci
%K RNA, Long Noncoding
%K Transcriptome
%X

Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations to body mass index.

%B Cell
%V 184
%P 2633-2648.e19
%8 2021 05 13
%G eng
%N 10
%1 https://www.ncbi.nlm.nih.gov/pubmed/33864768?dopt=Abstract
%R 10.1016/j.cell.2021.03.050

%0 Journal Article
%J Nat Med
%D 2020
%T Phenome-based approach identifies RIC1-linked Mendelian syndrome through zebrafish models, biobank associations and clinical studies.
%A Unlu, Gokhan
%A Qi, Xinzi
%A Gamazon, Eric R
%A Melville, David B
%A Patel, Nisha
%A Rushing, Amy R
%A Hashem, Mais
%A Al-Faifi, Abdullah
%A Chen, Rui
%A Li, Bingshan
%A Cox, Nancy J
%A Alkuraya, Fowzan S
%A Knapik, Ela W
%K Abnormalities, Multiple
%K Animals
%K Behavior, Animal
%K Biological Specimen Banks
%K Chondrocytes
%K Disease Models, Animal
%K Extracellular Matrix
%K Fibroblasts
%K Guanine Nucleotide Exchange Factors
%K Humans
%K Models, Biological
%K Musculoskeletal System
%K Osteogenesis
%K Phenomics
%K Phenotype
%K Procollagen
%K Protein Transport
%K Secretory Pathway
%K Syndrome
%K Zebrafish
%K Zebrafish Proteins
%X

Discovery of genotype-phenotype relationships remains a major challenge in clinical medicine. Here, we combined three sources of phenotypic data to uncover a new mechanism for rare and common diseases resulting from collagen secretion deficits. Using a zebrafish genetic screen, we identified the ric1 gene as being essential for skeletal biology. Using a gene-based phenome-wide association study (PheWAS) in the EHR-linked BioVU biobank, we show that reduced genetically determined expression of RIC1 is associated with musculoskeletal and dental conditions. Whole-exome sequencing identified individuals homozygous-by-descent for a rare variant in RIC1 and, through a guided clinical re-evaluation, it was discovered that they share signs with the BioVU-associated phenome. We named this new Mendelian syndrome CATIFA (cleft lip, cataract, tooth abnormality, intellectual disability, facial dysmorphism, attention-deficit hyperactivity disorder) and revealed further disease mechanisms. This gene-based, PheWAS-guided approach can accelerate the discovery of clinically relevant disease phenome and associated biological mechanisms.

%B Nat Med
%V 26
%P 98-109
%8 2020 01
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/31932796?dopt=Abstract
%R 10.1038/s41591-019-0705-y

%0 Journal Article
%J PLoS Genet
%D 2020
%T A phenome-wide association study of 26 mendelian genes reveals phenotypic expressivity of common and rare variants within the general population.
%A Tcheandjieu, Catherine
%A Aguirre, Matthew
%A Gustafsson, Stefan
%A Saha, Priyanka
%A Potiny, Praneetha
%A Haendel, Melissa
%A Ingelsson, Erik
%A Rivas, Manuel A
%A Priest, James R
%K Alagille Syndrome
%K Alleles
%K Biological Variation, Population
%K DiGeorge Syndrome
%K European Continental Ancestry Group
%K Female
%K Gene Frequency
%K Genetic Association Studies
%K Genetic Predisposition to Disease
%K Genetic Testing
%K Genetic Variation
%K Genome-Wide Association Study
%K Humans
%K Male
%K Marfan Syndrome
%K Noonan Syndrome
%K Phenotype
%K Polymorphism, Single Nucleotide
%K United Kingdom
%X

The clinical evaluation of a genetic syndrome relies upon recognition of a characteristic pattern of signs or symptoms to guide targeted genetic testing for confirmation of the diagnosis. However, individuals displaying a single phenotype of a complex syndrome may not meet criteria for clinical diagnosis or genetic testing. Here, we present a phenome-wide association study (PheWAS) approach to systematically explore the phenotypic expressivity of common and rare alleles in genes associated with four well-described syndromic diseases (Alagille (AS), Marfan (MS), DiGeorge (DS), and Noonan (NS) syndromes) in the general population. Using human phenotype ontology (HPO) terms, we systematically mapped 60 phenotypes related to AS, MS, DS and NS in 337,198 unrelated white British individuals from the UK Biobank (UKBB) based on their hospital admission records, self-administered questionnaires, and physiological measurements. We performed logistic regression adjusting for age, sex, and the first 5 genetic principal components, for each phenotype and each variant in the target genes (JAG1, NOTCH2, FBN1, PTPN11 and RAS-opathy genes, and genes in the 22q11.2 locus) and performed a gene burden test. Overall, we observed multiple phenotype-genotype correlations, such as the association between variation in JAG1, FBN1, PTPN11 and SOS2 with diastolic and systolic blood pressure; and pleiotropy among multiple variants in syndromic genes. For example, rs11066309 in PTPN11 was significantly associated with a lower body mass index, an increased risk of hypothyroidism and a smaller size for gestational age, all in concordance with NS-related phenotypes. Similarly, rs589668 in FBN1 was associated with an increase in body height and blood pressure, and a reduced body fat percentage as observed in Marfan syndrome. Our findings suggest that the spectrum of associations of common and rare variants in genes involved in syndromic diseases can be extended to individual phenotypes within the general population.

%B PLoS Genet
%V 16
%P e1008802
%8 2020 11
%G eng
%N 11
%1 https://www.ncbi.nlm.nih.gov/pubmed/33226994?dopt=Abstract
%R 10.1371/journal.pgen.1008802

%0 Journal Article
%J Nature
%D 2020
%T A positively selected FBN1 missense variant reduces height in Peruvian individuals.
%A Asgari, Samira
%A Luo, Yang
%A Akbari, Ali
%A Belbin, Gillian M
%A Li, Xinyi
%A Harris, Daniel N
%A Selig, Martin
%A Bartell, Eric
%A Calderon, Roger
%A Slowikowski, Kamil
%A Contreras, Carmen
%A Yataco, Rosa
%A Galea, Jerome T
%A Jimenez, Judith
%A Coit, Julia M
%A Farroñay, Chandel
%A Nazarian, Rosalynn M
%A O'Connor, Timothy D
%A Dietz, Harry C
%A Hirschhorn, Joel N
%A Guio, Heinner
%A Lecca, Leonid
%A Kenny, Eimear E
%A Freeman, Esther E
%A Murray, Megan B
%A Raychaudhuri, Soumya
%K Body Height
%K Female
%K Fibrillin-1
%K Gene Frequency
%K Genome-Wide Association Study
%K Heredity
%K Humans
%K Indians, South American
%K Male
%K Microfibrils
%K Mutation, Missense
%K Peru
%K Selection, Genetic
%X

On average, Peruvian individuals are among the shortest in the world. Here we show that Native American ancestry is associated with reduced height in an ethnically diverse group of Peruvian individuals, and identify a population-specific, missense variant in the FBN1 gene (E1297G) that is significantly associated with lower height. Each copy of the minor allele (frequency of 4.7%) reduces height by 2.2 cm (4.4 cm in homozygous individuals). To our knowledge, this is the largest effect size known for a common height-associated variant. FBN1 encodes the extracellular matrix protein fibrillin 1, which is a major structural component of microfibrils. We observed less densely packed fibrillin-1-rich microfibrils with irregular edges in the skin of individuals who were homozygous for G1297 compared with individuals who were homozygous for E1297. Moreover, we show that the E1297G locus is under positive selection in non-African populations, and that the E1297 variant shows subtle evidence of positive selection specifically within the Peruvian population. This variant is also significantly more frequent in coastal Peruvian populations than in populations from the Andes or the Amazon, which suggests that short stature might be the result of adaptation to factors that are associated with the coastal environment in Peru.

%B Nature
%V 582
%P 234-239
%8 2020 06
%G eng
%N 7811
%1 https://www.ncbi.nlm.nih.gov/pubmed/32499652?dopt=Abstract
%R 10.1038/s41586-020-2302-0

%0 Journal Article
%J Genome Biol
%D 2020
%T PSCAN: Spatial scan tests guided by protein structures improve complex disease gene discovery and signal variant detection.
%A Tang, Zheng-Zheng
%A Sliwoski, Gregory R
%A Chen, Guanhua
%A Jin, Bowen
%A Bush, William S
%A Li, Bingshan
%A Capra, John A
%X

Germline disease-causing variants are generally more spatially clustered in protein 3-dimensional structures than benign variants. Motivated by this tendency, we develop a fast and powerful protein-structure-based scan (PSCAN) approach for evaluating gene-level associations with complex disease and detecting signal variants. We validate PSCAN's performance on synthetic data and two real data sets for lipid traits and Alzheimer's disease. Our results demonstrate that PSCAN performs competitively with existing gene-level tests while increasing power and identifying more specific signal variant sets. Furthermore, PSCAN enables generation of hypotheses about the molecular basis for the associations in the context of protein structures and functional domains.

%B Genome Biol
%V 21
%P 217
%8 2020 08 26
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/32847609?dopt=Abstract
%R 10.1186/s13059-020-02121-0

%0 Journal Article
%J Cell
%D 2019
%T Personalized Medicine and the Power of Electronic Health Records.
%A Abul-Husn, Noura S
%A Kenny, Eimear E
%X

Personalized medicine has largely been enabled by the integration of genomic and other data with electronic health records (EHRs) in the United States and elsewhere. Increased EHR adoption across various clinical settings and the establishment of EHR-linked population-based biobanks provide unprecedented opportunities for the types of translational and implementation research that drive personalized medicine. We review advances in the digitization of health information and the proliferation of genomic research in health systems and provide insights into emerging paths for the widespread implementation of personalized medicine.

%B Cell
%V 177
%P 58-69
%8 2019 Mar 21
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/30901549?dopt=Abstract
%R 10.1016/j.cell.2019.02.039

%0 Journal Article
%J Am J Hum Genet
%D 2019
%T Phenome-wide Burden of Copy-Number Variation in the UK Biobank.
%A Aguirre, Matthew
%A Rivas, Manuel A
%A Priest, James
%X

Copy-number variations (CNVs) represent a significant proportion of the genetic differences between individuals, and many CNVs associate causally with syndromic disease and clinical outcomes. Here, we characterize the landscape of copy-number variation and its phenome-wide effects in a sample of 472,228 array-genotyped individuals from the UK Biobank. In addition to population-level selection effects against genic loci conferring high mortality, we describe genetic burden from potentially pathogenic and previously uncharacterized CNV loci across more than 3,000 quantitative and dichotomous traits, with separate analyses for common and rare classes of variation. Specifically, we highlight the effects of CNVs at two well-known syndromic loci 16p11.2 and 22q11.2, previously uncharacterized variation at 9p23, and several genic associations in the context of acute coronary artery disease and high body mass index. Our data constitute a deeply contextualized portrait of population-wide burden of copy-number variation, as well as a series of dosage-mediated genic associations across the medical phenome.

%B Am J Hum Genet
%V 105
%P 373-383
%8 2019 Aug 01
%G eng
%N 2
%1 https://www.ncbi.nlm.nih.gov/pubmed/31353025?dopt=Abstract
%R 10.1016/j.ajhg.2019.07.001

%0 Journal Article
%J Elife
%D 2019
%T Polygenic adaptation on height is overestimated due to uncorrected stratification in genome-wide association studies.
%A Sohail, Mashaal
%A Maier, Robert M
%A Ganna, Andrea
%A Bloemendal, Alex
%A Martin, Alicia R
%A Turchin, Michael C
%A Chiang, Charleston Wk
%A Hirschhorn, Joel
%A Daly, Mark J
%A Patterson, Nick
%A Neale, Benjamin
%A Mathieson, Iain
%A Reich, David
%A Sunyaev, Shamil R
%X

Genetic predictions of height differ among human populations and these differences have been interpreted as evidence of polygenic adaptation. These differences were first detected using SNPs genome-wide significantly associated with height, and shown to grow stronger when large numbers of sub-significant SNPs were included, leading to excitement about the prospect of analyzing large fractions of the genome to detect polygenic adaptation for multiple traits. Previous studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the analyses in the UK Biobank, a much more homogeneously designed study. We show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population stratification. More generally, our results imply that typical constructions of polygenic scores are sensitive to population stratification and that population-level differences should be interpreted with caution.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

%B Elife
%V 8
%8 2019 Mar 21
%G eng
%1 https://www.ncbi.nlm.nih.gov/pubmed/30895926?dopt=Abstract
%R 10.7554/eLife.39702

%0 Journal Article
%J Science
%D 2018
%T Phenotype risk scores identify patients with unrecognized Mendelian disease patterns.
%A Bastarache, Lisa
%A Hughey, Jacob J
%A Hebbring, Scott
%A Marlo, Joy
%A Zhao, Wanke
%A Ho, Wanting T
%A Van Driest, Sara L
%A McGregor, Tracy L
%A Mosley, Jonathan D
%A Wells, Quinn S
%A Temple, Michael
%A Ramirez, Andrea H
%A Carroll, Robert
%A Osterman, Travis
%A Edwards, Todd
%A Ruderfer, Douglas
%A Velez Edwards, Digna R
%A Hamid, Rizwan
%A Cogan, Joy
%A Glazer, Andrew
%A Wei, Wei-Qi
%A Feng, QiPing
%A Brilliant, Murray
%A Zhao, Zhizhuang J
%A Cox, Nancy J
%A Roden, Dan M
%A Denny, Joshua C
%K Databases, Genetic
%K DNA Mutational Analysis
%K Electronic Health Records
%K Exome
%K Genetic Association Studies
%K Genetic Diseases, Inborn
%K Genetic Predisposition to Disease
%K Genetic Variation
%K Humans
%K Phenotype
%K Risk Factors
%X

Genetic association studies often examine features independently, potentially missing subpopulations with multiple phenotypes that share a single cause. We describe an approach that aggregates phenotypes on the basis of patterns described by Mendelian diseases. We mapped the clinical features of 1204 Mendelian diseases into phenotypes captured from the electronic health record (EHR) and summarized this evidence as phenotype risk scores (PheRSs). In an initial validation, PheRS distinguished cases and controls of five Mendelian diseases. Applying PheRS to 21,701 genotyped individuals uncovered 18 associations between rare variants and phenotypes consistent with Mendelian diseases. In 16 patients, the rare genetic variants were associated with severe outcomes such as organ transplants. PheRS can augment rare-variant interpretation and may identify subsets of patients with distinct genetic causes for common diseases.

%B Science
%V 359
%P 1233-1239
%8 2018 03 16
%G eng
%N 6381
%1 https://www.ncbi.nlm.nih.gov/pubmed/29590070?dopt=Abstract
%R 10.1126/science.aal4043

%0 Journal Article
%J Nat Genet
%D 2017
%T Population- and individual-specific regulatory variation in Sardinia.
%A Pala, Mauro
%A Zappala, Zachary
%A Marongiu, Mara
%A Li, Xin
%A Davis, Joe R
%A Cusano, Roberto
%A Crobu, Francesca
%A Kukurba, Kimberly R
%A Gloudemans, Michael J
%A Reinier, Frederic
%A Berutti, Riccardo
%A Piras, Maria G
%A Mulas, Antonella
%A Zoledziewska, Magdalena
%A Marongiu, Michele
%A Sorokin, Elena P
%A Hess, Gaelen T
%A Smith, Kevin S
%A Busonero, Fabio
%A Maschio, Andrea
%A Steri, Maristella
%A Sidore, Carlo
%A Sanna, Serena
%A Fiorillo, Edoardo
%A Bassik, Michael C
%A Sawcer, Stephen J
%A Battle, Alexis
%A Novembre, John
%A Jones, Chris
%A Angius, Andrea
%A Abecasis, Gonçalo R
%A Schlessinger, David
%A Cucca, Francesco
%A Montgomery, Stephen B
%K Alternative Splicing
%K Chromosome Mapping
%K Family Health
%K Female
%K Gene Expression Profiling
%K Genetic Predisposition to Disease
%K Genetic Variation
%K Genetics, Population
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Italy
%K Male
%K Polymorphism, Single Nucleotide
%K Quantitative Trait Loci
%K Transcription Initiation Site
%X

Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.

%B Nat Genet
%V 49
%P 700-707
%8 2017 May
%G eng
%N 5
%1 https://www.ncbi.nlm.nih.gov/pubmed/28394350?dopt=Abstract
%R 10.1038/ng.3840