%0 Journal Article %J Nat Genet %D 2021 %T Genetics of 35 blood and urine biomarkers in the UK Biobank. %A Sinnott-Armstrong, Nasa %A Tanigawa, Yosuke %A Amar, David %A Mars, Nina %A Benner, Christian %A Aguirre, Matthew %A Venkataraman, Guhan Ram %A Wainberg, Michael %A Ollila, Hanna M %A Kiiskinen, Tuomo %A Havulinna, Aki S %A Pirruccello, James P %A Qian, Junyang %A Shcherbina, Anna %A Rodriguez, Fatima %A Assimes, Themistocles L %A Agarwala, Vineeta %A Tibshirani, Robert %A Hastie, Trevor %A Ripatti, Samuli %A Pritchard, Jonathan K %A Daly, Mark J %A Rivas, Manuel A %K Biological Specimen Banks %K Biomarkers %K Cardiovascular Diseases %K Diabetes Mellitus, Type 2 %K DNA Copy Number Variations %K Genetic Pleiotropy %K HLA Antigens %K Humans %K Linkage Disequilibrium %K Liver-Specific Organic Anion Transporter 1 %K Mendelian Randomization Analysis %K Polymorphism, Single Nucleotide %K Proteins %K Renal Insufficiency, Chronic %K Serine Endopeptidases %K United Kingdom %X

Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.

%B Nat Genet %V 53 %P 185-194 %8 2021 02 %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/33462484?dopt=Abstract %R 10.1038/s41588-020-00757-z

%0 Journal Article %J Eur J Hum Genet %D 2021 %T Polygenic risk modeling with latent trait-related genetic components. %A Aguirre, Matthew %A Tanigawa, Yosuke %A Venkataraman, Guhan Ram %A Tibshirani, Rob %A Hastie, Trevor %A Rivas, Manuel A %X

Polygenic risk models have led to significant advances in understanding complex diseases and their clinical presentation. While polygenic risk scores (PRS) can effectively predict outcomes, they do not generally account for disease subtypes or pathways which underlie within-trait diversity. Here, we introduce a latent factor model of genetic risk based on components from Decomposition of Genetic Associations (DeGAs), which we call the DeGAs polygenic risk score (dPRS). We compute DeGAs using genetic associations for 977 traits and find that dPRS performs comparably to standard PRS while offering greater interpretability. We show how to decompose an individual's genetic risk for a trait across DeGAs components, with examples for body mass index (BMI) and myocardial infarction (heart attack) in 337,151 white British individuals in the UK Biobank, with replication in a further set of 25,486 non-British white individuals. We find that BMI polygenic risk factorizes into components related to fat-free mass, fat mass, and overall health indicators like physical activity. Most individuals with high dPRS for BMI have strong contributions from both a fat-mass component and a fat-free mass component, whereas a few "outlier" individuals have strong contributions from only one of the two components. Overall, our method enables fine-scale interpretation of the drivers of genetic risk for complex traits.

%B Eur J Hum Genet %8 2021 Feb 08 %G eng %1 https://www.ncbi.nlm.nih.gov/pubmed/33558700?dopt=Abstract %R 10.1038/s41431-021-00813-0

%0 Journal Article %J Am J Hum Genet %D 2020 %T Assessing Digital Phenotyping to Enhance Genetic Studies of Human Diseases. %A DeBoever, Christopher %A Tanigawa, Yosuke %A Aguirre, Matthew %A McInnes, Greg %A Lavertu, Adam %A Rivas, Manuel A %K Asthma %K Databases, Factual %K Disease %K Female %K Genetics, Medical %K Genome-Wide Association Study %K Genotype %K Humans %K Male %K Neoplasms %K Phenotype %K United Kingdom %X

Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining together both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.

%B Am J Hum Genet %V 106 %P 611-622 %8 2020 05 07 %G eng %N 5 %1 https://www.ncbi.nlm.nih.gov/pubmed/32275883?dopt=Abstract %R 10.1016/j.ajhg.2020.03.007

%0 Journal Article %J PLoS Genet %D 2020 %T A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank. %A Qian, Junyang %A Tanigawa, Yosuke %A Du, Wenfei %A Aguirre, Matthew %A Chang, Chris %A Tibshirani, Robert %A Rivas, Manuel A %A Hastie, Trevor %K Algorithms %K Asthma %K Biological Specimen Banks %K Body Height %K Body Mass Index %K Cholesterol %K Cohort Studies %K Genetics, Population %K Genome-Wide Association Study %K Genotype %K Humans %K Logistic Models %K Phenotype %K Polymorphism, Single Nucleotide %K Proportional Hazards Models %K United Kingdom %X

The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has been proved to be an effective method for simultaneous variable selection and estimation. However, the large scale and ultrahigh dimension seen in the UK Biobank pose new challenges for applying the lasso method, as many existing algorithms and their implementations are not scalable to large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including those that are larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and optimizes for single nucleotide polymorphism (SNP) datasets. It currently supports ℓ1-penalized linear model, logistic regression, Cox model, and also extends to the elastic net with ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.

%B PLoS Genet %V 16 %P e1009141 %8 2020 10 %G eng %N 10 %1 https://www.ncbi.nlm.nih.gov/pubmed/33095761?dopt=Abstract %R 10.1371/journal.pgen.1009141

%0 Journal Article %J PLoS Genet %D 2020 %T A phenome-wide association study of 26 mendelian genes reveals phenotypic expressivity of common and rare variants within the general population. %A Tcheandjieu, Catherine %A Aguirre, Matthew %A Gustafsson, Stefan %A Saha, Priyanka %A Potiny, Praneetha %A Haendel, Melissa %A Ingelsson, Erik %A Rivas, Manuel A %A Priest, James R %K Alagille Syndrome %K Alleles %K Biological Variation, Population %K DiGeorge Syndrome %K European Continental Ancestry Group %K Female %K Gene Frequency %K Genetic Association Studies %K Genetic Predisposition to Disease %K Genetic Testing %K Genetic Variation %K Genome-Wide Association Study %K Humans %K Male %K Marfan Syndrome %K Noonan Syndrome %K Phenotype %K Polymorphism, Single Nucleotide %K United Kingdom %X

The clinical evaluation of a genetic syndrome relies upon recognition of a characteristic pattern of signs or symptoms to guide targeted genetic testing for confirmation of the diagnosis. However, individuals displaying a single phenotype of a complex syndrome may not meet criteria for clinical diagnosis or genetic testing. Here, we present a phenome-wide association study (PheWAS) approach to systematically explore the phenotypic expressivity of common and rare alleles in genes associated with four well-described syndromic diseases (Alagille (AS), Marfan (MS), DiGeorge (DS), and Noonan (NS) syndromes) in the general population. Using human phenotype ontology (HPO) terms, we systematically mapped 60 phenotypes related to AS, MS, DS and NS in 337,198 unrelated white British individuals from the UK Biobank (UKBB) based on their hospital admission records, self-administered questionnaires, and physiological measurements. We performed logistic regression adjusting for age, sex, and the first 5 genetic principal components, for each phenotype and each variant in the target genes (JAG1, NOTCH2, FBN1, PTPN11 and RAS-opathy genes, and genes in the 22q11.2 locus) and performed a gene burden test. Overall, we observed multiple phenotype-genotype correlations, such as the association between variation in JAG1, FBN1, PTPN11 and SOS2 with diastolic and systolic blood pressure; and pleiotropy among multiple variants in syndromic genes. For example, rs11066309 in PTPN11 was significantly associated with a lower body mass index, an increased risk of hypothyroidism and a smaller size for gestational age, all in concordance with NS-related phenotypes. Similarly, rs589668 in FBN1 was associated with an increase in body height and blood pressure, and a reduced body fat percentage as observed in Marfan syndrome. Our findings suggest that the spectrum of associations of common and rare variants in genes involved in syndromic diseases can be extended to individual phenotypes within the general population.

%B PLoS Genet %V 16 %P e1008802 %8 2020 11 %G eng %N 11 %1 https://www.ncbi.nlm.nih.gov/pubmed/33226994?dopt=Abstract %R 10.1371/journal.pgen.1008802

%0 Journal Article %J Nat Commun %D 2019 %T Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology. %A Tanigawa, Yosuke %A Li, Jiehan %A Justesen, Johanne M %A Horn, Heiko %A Aguirre, Matthew %A DeBoever, Christopher %A Chang, Chris %A Narasimhan, Balasubramanian %A Lage, Kasper %A Hastie, Trevor %A Park, Chong Y %A Bejerano, Gill %A Ingelsson, Erik %A Rivas, Manuel A %X

Population-based biobanks with genomic and dense phenotype data provide opportunities for generating effective therapeutic hypotheses and understanding the genomic role in disease predisposition. To characterize latent components of genetic associations, we apply truncated singular value decomposition (DeGAs) to matrices of summary statistics derived from genome-wide association analyses across 2,138 phenotypes measured in 337,199 White British individuals in the UK Biobank study. We systematically identify key components of genetic associations and the contributions of variants, genes, and phenotypes to each component. As an illustration of the utility of the approach to inform downstream experiments, we report putative loss of function variants, rs114285050 (GPR151) and rs150090666 (PDE3B), that substantially contribute to obesity-related traits and experimentally demonstrate the role of these genes in adipocyte biology. Our approach to dissect components of genetic associations across the human phenome will accelerate biomedical hypothesis generation by providing insights on previously unexplored latent structures.

%B Nat Commun %V 10 %P 4064 %8 2019 Sep 06 %G eng %N 1 %1 https://www.ncbi.nlm.nih.gov/pubmed/31492854?dopt=Abstract %R 10.1038/s41467-019-11953-9

%0 Journal Article %J Bioinformatics %D 2019 %T Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics. %A McInnes, Gregory %A Tanigawa, Yosuke %A DeBoever, Chris %A Lavertu, Adam %A Olivieri, Julia Eve %A Aguirre, Matthew %A Rivas, Manuel A %X

SUMMARY: Large biobanks linking phenotype to genotype have led to an explosion of genetic association studies across a wide range of phenotypes. Sharing the knowledge generated by these resources with the scientific community remains a challenge due to patient privacy and the vast amount of data. Here, we present Global Biobank Engine (GBE), a web-based tool that enables exploration of the relationship between genotype and phenotype in biobank cohorts, such as the UK Biobank. GBE supports browsing for results from genome-wide association studies, phenome-wide association studies, gene-based tests and genetic correlation between phenotypes. We envision GBE as a platform that facilitates the dissemination of summary statistics from biobanks to the scientific and clinical communities.

AVAILABILITY AND IMPLEMENTATION: GBE currently hosts data from the UK Biobank and can be found freely available at biobankengine.stanford.edu.

%B Bioinformatics %V 35 %P 2495-2497 %8 2019 Jul 15 %G eng %N 14 %1 https://www.ncbi.nlm.nih.gov/pubmed/30520965?dopt=Abstract %R 10.1093/bioinformatics/bty999

%0 Journal Article %J Am J Hum Genet %D 2019 %T Phenome-wide Burden of Copy-Number Variation in the UK Biobank. %A Aguirre, Matthew %A Rivas, Manuel A %A Priest, James %X

Copy-number variations (CNVs) represent a significant proportion of the genetic differences between individuals and many CNVs associate causally with syndromic disease and clinical outcomes. Here, we characterize the landscape of copy-number variation and their phenome-wide effects in a sample of 472,228 array-genotyped individuals from the UK Biobank. In addition to population-level selection effects against genic loci conferring high mortality, we describe genetic burden from potentially pathogenic and previously uncharacterized CNV loci across more than 3,000 quantitative and dichotomous traits, with separate analyses for common and rare classes of variation. Specifically, we highlight the effects of CNVs at two well-known syndromic loci 16p11.2 and 22q11.2, previously uncharacterized variation at 9p23, and several genic associations in the context of acute coronary artery disease and high body mass index. Our data constitute a deeply contextualized portrait of population-wide burden of copy-number variation, as well as a series of dosage-mediated genic associations across the medical phenome.

%B Am J Hum Genet %V 105 %P 373-383 %8 2019 Aug 01 %G eng %N 2 %1 https://www.ncbi.nlm.nih.gov/pubmed/31353025?dopt=Abstract %R 10.1016/j.ajhg.2019.07.001