Publications
Adjusting for Principal Components of Molecular Phenotypes Induces Replicating False Positives. Genetics 211, 1179-1189 (2019).
Allelic Heterogeneity at the CRP Locus Identified by Whole-Genome Sequencing in Multi-ancestry Cohorts. Am J Hum Genet 106, 112-120 (2020).
Assessing Digital Phenotyping to Enhance Genetic Studies of Human Diseases. Am J Hum Genet 106, 611-622 (2020).
Association between Smoking History and Tumor Mutation Burden in Advanced Non-Small Cell Lung Cancer. Cancer Res 81, 2566-2573 (2021).
Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans. Genome Med 9, 98 (2017).
A common TCN1 loss-of-function variant is associated with lower vitamin B12 concentration in African Americans. Blood 131, 2859-2863 (2018).
A common variant in PNPLA3 is associated with age at diagnosis of NAFLD in patients from a multi-ethnic biobank. J Hepatol 72, 1070-1081 (2020).
Components of genetic associations across 2,138 phenotypes in the UK Biobank highlight adipocyte biology. Nat Commun 10, 4064 (2019).
Covariate selection for association screening in multiphenotype genetic studies. Nat Genet 49, 1789-1795 (2017).
A cross-population atlas of genetic associations for 220 human phenotypes. Nat Genet 53, 1415-1424 (2021).
Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 186, 753-761 (2017).
Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet 52, 969-983 (2020).
Efficient Estimation and Applications of Cross-Validated Genetic Predictions to Polygenic Risk Scores and Linear Mixed Models. J Comput Biol 27, 599-612 (2020).
Error-prone bypass of DNA lesions during lagging-strand replication is a common source of germline and cancer mutations. Nat Genet 51, 36-41 (2019).
Evidence for secondary-variant genetic burden and non-random distribution across biological modules in a recessive ciliopathy. Nat Genet 52, 1145-1150 (2020).
A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank. PLoS Genet 16, e1009141 (2020).
FasTag: Automatic text classification of unstructured medical narratives. PLoS One 15, e0234647 (2020).
Genetic diagnoses in epilepsy: The impact of dynamic exome analysis in a pediatric cohort. Epilepsia 61, 249-258 (2020).
Genetic identification of a common collagen disease in Puerto Ricans via identity-by-descent mapping in a health system. Elife 6, (2017).
A genetic stochastic process model for genome-wide joint analysis of biomarker dynamics and disease susceptibility with longitudinal data. Genet Epidemiol 41, 620-635 (2017).
Genome sequencing analysis identifies Epstein-Barr virus subtypes associated with high risk of nasopharyngeal carcinoma. Nat Genet 51, 1131-1136 (2019).
Global Biobank Engine: enabling genotype-phenotype browsing for biobank summary statistics. Bioinformatics 35, 2495-2497 (2019).
Graphical analysis for phenome-wide causal discovery in genotyped population-scale biobanks. Nat Commun 12, 350 (2021).
Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat Med 25, 911-919 (2019).
Impact of admixture and ancestry on eQTL analysis and GWAS colocalization in GTEx. Genome Biol 21, 233 (2020).
Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements. Nat Genet 52, 1346-1354 (2020).
Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies. G3 (Bethesda) 8, 3255-3267 (2018).
Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions. Am J Epidemiol 186, 771-777 (2017).
Integration of multiomic annotation data to prioritize and characterize inflammation and immune-related risk variants in squamous cell lung cancer. Genet Epidemiol 45, 99-114 (2021).