%0 Journal Article
%J Nat Genet
%D 2021
%T Genetics of 35 blood and urine biomarkers in the UK Biobank.
%A Sinnott-Armstrong, Nasa
%A Tanigawa, Yosuke
%A Amar, David
%A Mars, Nina
%A Benner, Christian
%A Aguirre, Matthew
%A Venkataraman, Guhan Ram
%A Wainberg, Michael
%A Ollila, Hanna M
%A Kiiskinen, Tuomo
%A Havulinna, Aki S
%A Pirruccello, James P
%A Qian, Junyang
%A Shcherbina, Anna
%A Rodriguez, Fatima
%A Assimes, Themistocles L
%A Agarwala, Vineeta
%A Tibshirani, Robert
%A Hastie, Trevor
%A Ripatti, Samuli
%A Pritchard, Jonathan K
%A Daly, Mark J
%A Rivas, Manuel A
%K Biological Specimen Banks
%K Biomarkers
%K Cardiovascular Diseases
%K Diabetes Mellitus, Type 2
%K DNA Copy Number Variations
%K Genetic Pleiotropy
%K HLA Antigens
%K Humans
%K Linkage Disequilibrium
%K Liver-Specific Organic Anion Transporter 1
%K Mendelian Randomization Analysis
%K Polymorphism, Single Nucleotide
%K Proteins
%K Renal Insufficiency, Chronic
%K Serine Endopeptidases
%K United Kingdom
%X

Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n = 363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1 s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n = 135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.

%B Nat Genet
%V 53
%P 185-194
%8 2021 02
%G eng
%N 2
%1 https://www.ncbi.nlm.nih.gov/pubmed/33462484?dopt=Abstract
%R 10.1038/s41588-020-00757-z

%0 Journal Article
%J PLoS Genet
%D 2020
%T A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank.
%A Qian, Junyang
%A Tanigawa, Yosuke
%A Du, Wenfei
%A Aguirre, Matthew
%A Chang, Chris
%A Tibshirani, Robert
%A Rivas, Manuel A
%A Hastie, Trevor
%K Algorithms
%K Asthma
%K Biological Specimen Banks
%K Body Height
%K Body Mass Index
%K Cholesterol
%K Cohort Studies
%K Genetics, Population
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Logistic Models
%K Phenotype
%K Polymorphism, Single Nucleotide
%K Proportional Hazards Models
%K United Kingdom
%X

The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has proved to be an effective method for simultaneous variable selection and estimation. However, the large scale and ultrahigh dimensionality of the UK Biobank pose new challenges for applying the lasso, as many existing algorithms and their implementations do not scale to such large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including datasets larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and is optimized for single nucleotide polymorphism (SNP) datasets. It currently supports the ℓ1-penalized linear model, logistic regression and Cox model, and also extends to the elastic net with an ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.

%B PLoS Genet
%V 16
%P e1009141
%8 2020 10
%G eng
%N 10
%1 https://www.ncbi.nlm.nih.gov/pubmed/33095761?dopt=Abstract
%R 10.1371/journal.pgen.1009141