%0 Journal Article
%J Nat Commun
%D 2021
%T Graphical analysis for phenome-wide causal discovery in genotyped population-scale biobanks.
%A Amar, David
%A Sinnott-Armstrong, Nasa
%A Ashley, Euan A
%A Rivas, Manuel A
%K Biological Specimen Banks
%K Cardiovascular Diseases
%K Causality
%K Computer Simulation
%K Gene Regulatory Networks
%K Genetic Pleiotropy
%K Genetic Variation
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Mendelian Randomization Analysis
%K Models, Theoretical
%K Multifactorial Inheritance
%K Phenotype
%K Risk Factors
%X

Causal inference via Mendelian randomization requires making strong assumptions about horizontal pleiotropy, where genetic instruments are connected to the outcome not only through the exposure. Here, we present causal Graphical Analysis Using Genetics (cGAUGE), a pipeline that overcomes these limitations using instrument filters with provable properties. This is achievable by identifying conditional independencies while examining multiple traits. cGAUGE also uses ExSep (Exposure-based Separation), a novel test for the existence of causal pathways that does not require selecting instruments. In simulated data we illustrate how cGAUGE can reduce the empirical false discovery rate by up to 30%, while retaining the majority of true discoveries. On 96 complex traits from 337,198 subjects from the UK Biobank, our results cover expected causal links and many new ones that were previously suggested by correlation-based observational studies. Notably, we identify multiple risk factors for cardiovascular disease, including red blood cell distribution width.

%B Nat Commun
%V 12
%P 350
%8 2021 01 13
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/33441555?dopt=Abstract
%R 10.1038/s41467-020-20516-2

%0 Journal Article
%J Am J Hum Genet
%D 2020
%T Assessing Digital Phenotyping to Enhance Genetic Studies of Human Diseases.
%A DeBoever, Christopher
%A Tanigawa, Yosuke
%A Aguirre, Matthew
%A McInnes, Greg
%A Lavertu, Adam
%A Rivas, Manuel A
%K Asthma
%K Databases, Factual
%K Disease
%K Female
%K Genetics, Medical
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Male
%K Neoplasms
%K Phenotype
%K United Kingdom
%X

Population-scale biobanks that combine genetic data and high-dimensional phenotyping for a large number of participants provide an exciting opportunity to perform genome-wide association studies (GWAS) to identify genetic variants associated with diverse quantitative traits and diseases. A major challenge for GWAS in population biobanks is ascertaining disease cases from heterogeneous data sources such as hospital records, digital questionnaire responses, or interviews. In this study, we use genetic parameters, including genetic correlation, to evaluate whether GWAS performed using cases in the UK Biobank ascertained from hospital records, questionnaire responses, and family history of disease implicate similar disease genetics across a range of effect sizes. We find that hospital record and questionnaire GWAS largely identify similar genetic effects for many complex phenotypes and that combining together both phenotyping methods improves power to detect genetic associations. We also show that family history GWAS using cases ascertained on family history of disease agrees with combined hospital record and questionnaire GWAS and that family history GWAS has better power to detect genetic associations for some phenotypes. Overall, this work demonstrates that digital phenotyping and unstructured phenotype data can be combined with structured data such as hospital records to identify cases for GWAS in biobanks and improve the ability of such studies to identify genetic associations.

%B Am J Hum Genet
%V 106
%P 611-622
%8 2020 05 07
%G eng
%N 5
%1 https://www.ncbi.nlm.nih.gov/pubmed/32275883?dopt=Abstract
%R 10.1016/j.ajhg.2020.03.007

%0 Journal Article
%J Nature
%D 2020
%T A brief history of human disease genetics.
%A Claussnitzer, Melina
%A Cho, Judy H
%A Collins, Rory
%A Cox, Nancy J
%A Dermitzakis, Emmanouil T
%A Hurles, Matthew E
%A Kathiresan, Sekar
%A Kenny, Eimear E
%A Lindgren, Cecilia M
%A MacArthur, Daniel G
%A North, Kathryn N
%A Plon, Sharon E
%A Rehm, Heidi L
%A Risch, Neil
%A Rotimi, Charles N
%A Shendure, Jay
%A Soranzo, Nicole
%A McCarthy, Mark I
%K Animals
%K Genetic Testing
%K Genetic Variation
%K Genomics
%K Genotype
%K Humans
%K Phenotype
%K Rare Diseases
%X

A primary goal of human genetics is to identify DNA sequence variants that influence biomedical traits, particularly those related to the onset and progression of human disease. Over the past 25 years, progress in realizing this objective has been transformed by advances in technology, foundational genomic resources and analytical tools, and by access to vast amounts of genotype and phenotype data. Genetic discoveries have substantially improved our understanding of the mechanisms responsible for many rare and common diseases and driven development of novel preventative and therapeutic strategies. Medical innovation will increasingly focus on delivering care tailored to individual patterns of genetic predisposition.

%B Nature
%V 577
%P 179-189
%8 2020 01
%G eng
%N 7789
%1 https://www.ncbi.nlm.nih.gov/pubmed/31915397?dopt=Abstract
%R 10.1038/s41586-019-1879-7

%0 Journal Article
%J PLoS Genet
%D 2020
%T A fast and scalable framework for large-scale and ultrahigh-dimensional sparse regression with application to the UK Biobank.
%A Qian, Junyang
%A Tanigawa, Yosuke
%A Du, Wenfei
%A Aguirre, Matthew
%A Chang, Chris
%A Tibshirani, Robert
%A Rivas, Manuel A
%A Hastie, Trevor
%K Algorithms
%K Asthma
%K Biological Specimen Banks
%K Body Height
%K Body Mass Index
%K Cholesterol
%K Cohort Studies
%K Genetics, Population
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Logistic Models
%K Phenotype
%K Polymorphism, Single Nucleotide
%K Proportional Hazards Models
%K United Kingdom
%X

The UK Biobank is a very large, prospective population-based cohort study across the United Kingdom. It provides unprecedented opportunities for researchers to investigate the relationship between genotypic information and phenotypes of interest. Multiple regression methods, compared with genome-wide association studies (GWAS), have already been shown to greatly improve the prediction performance for a variety of phenotypes. In high-dimensional settings, the lasso, since its first proposal in statistics, has proved to be an effective method for simultaneous variable selection and estimation. However, the large scale and ultrahigh dimension seen in the UK Biobank pose new challenges for applying the lasso method, as many existing algorithms and their implementations are not scalable to large applications. In this paper, we propose a computational framework called batch screening iterative lasso (BASIL) that can take advantage of any existing lasso solver and easily build a scalable solution for very large data, including those that are larger than the memory size. We introduce snpnet, an R package that implements the proposed algorithm on top of glmnet and optimizes for single nucleotide polymorphism (SNP) datasets. It currently supports the ℓ1-penalized linear model, logistic regression, and Cox model, and also extends to the elastic net with ℓ1/ℓ2 penalty. We demonstrate results on the UK Biobank dataset, where we achieve competitive predictive performance for all four phenotypes considered (height, body mass index, asthma, high cholesterol) using only a small fraction of the variants compared with other established polygenic risk score methods.

%B PLoS Genet
%V 16
%P e1009141
%8 2020 10
%G eng
%N 10
%1 https://www.ncbi.nlm.nih.gov/pubmed/33095761?dopt=Abstract
%R 10.1371/journal.pgen.1009141

%0 Journal Article
%J PLoS Genet
%D 2020
%T Modeling epistasis in mice and yeast using the proportion of two or more distinct genetic backgrounds: Evidence for "polygenic epistasis".
%A Rau, Christoph D
%A Gonzales, Natalia M
%A Bloom, Joshua S
%A Park, Danny
%A Ayroles, Julien
%A Palmer, Abraham A
%A Lusis, Aldons J
%A Zaitlen, Noah
%K Alleles
%K Animals
%K Epistasis, Genetic
%K Evolution, Molecular
%K Genotype
%K Humans
%K Mice
%K Models, Genetic
%K Multifactorial Inheritance
%K Phenotype
%K Quantitative Trait Loci
%K Saccharomyces cerevisiae
%K Selection, Genetic
%X

BACKGROUND: The majority of quantitative genetic models used to map complex traits assume that alleles have similar effects across all individuals. Significant evidence suggests, however, that epistatic interactions modulate the impact of many alleles. Nevertheless, identifying epistatic interactions remains computationally and statistically challenging. In this work, we address some of these challenges by developing a statistical test for polygenic epistasis that determines whether the effect of an allele is altered by the global genetic ancestry proportion from distinct progenitors.

RESULTS: We applied our method to data from mice and yeast. For the mice, we observed 49 significant genotype-by-ancestry interaction associations across 14 phenotypes as well as over 1,400 Bonferroni-corrected genotype-by-ancestry interaction associations for mouse gene expression data. For the yeast, we observed 92 significant genotype-by-ancestry interactions across 38 phenotypes. Given this evidence of epistasis, we test for and observe evidence of rapid selection pressure on ancestry specific polymorphisms within one of the cohorts, consistent with epistatic selection.

CONCLUSIONS: Unlike our prior work in human populations, we observe widespread evidence of ancestry-modified SNP effects, perhaps reflecting the greater divergence present in crosses using mice and yeast.

%B PLoS Genet
%V 16
%P e1009165
%8 2020 10
%G eng
%N 10
%1 https://www.ncbi.nlm.nih.gov/pubmed/33104702?dopt=Abstract
%R 10.1371/journal.pgen.1009165

%0 Journal Article
%J Curr Protoc Hum Genet
%D 2019
%T Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits.
%A Weissenkampen, J Dylan
%A Jiang, Yu
%A Eckert, Scott
%A Jiang, Bibo
%A Li, Bingshan
%A Liu, Dajiang J
%K Algorithms
%K Genetic Predisposition to Disease
%K Genome, Human
%K Genome-Wide Association Study
%K Genotype
%K High-Throughput Nucleotide Sequencing
%K Humans
%K Multifactorial Inheritance
%K Phenotype
%K Polymorphism, Single Nucleotide
%K Whole Exome Sequencing
%K Whole Genome Sequencing
%X

With the advent of Next Generation Sequencing (NGS) technologies, whole genome and whole exome DNA sequencing has become affordable for routine genetic studies. Coupled with improved genotyping arrays and genotype imputation methodologies, it is increasingly feasible to obtain rare genetic variant information in large datasets. Such datasets allow researchers to gain a more complete understanding of the genetic architecture of complex traits caused by rare variants. State-of-the-art statistical methods for the statistical genetics analysis of sequence-based association, including efficient algorithms for association analysis in biobank-scale datasets, gene-association tests, meta-analysis, fine mapping methods that integrate functional genomic datasets, and phenome-wide association studies (PheWAS), are reviewed here. These methods are expected to be highly useful for next generation statistical genetics analysis in the era of precision medicine. © 2019 by John Wiley & Sons, Inc.

%B Curr Protoc Hum Genet
%V 101
%P e83
%8 2019 04
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/30849219?dopt=Abstract
%R 10.1002/cphg.83

%0 Journal Article
%J Nat Commun
%D 2019
%T A multi-task convolutional deep neural network for variant calling in single molecule sequencing.
%A Luo, Ruibang
%A Sedlazeck, Fritz J
%A Lam, Tak-Wah
%A Schatz, Michael C
%K Base Sequence
%K Computational Biology
%K DNA Mutational Analysis
%K Genome, Human
%K Genome-Wide Association Study
%K Genomics
%K Genotype
%K Genotyping Techniques
%K Humans
%K INDEL Mutation
%K Nanopores
%K Neural Networks, Computer
%K Polymorphism, Single Nucleotide
%K Sequence Analysis, DNA
%K Software
%X

The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source ( https://github.com/aquaskyline/Clairvoyante ), with modules to train, utilize and visualize the model.

%B Nat Commun
%V 10
%P 998
%8 2019 03 01
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/30824707?dopt=Abstract
%R 10.1038/s41467-019-09025-z

%0 Journal Article
%J Nat Commun
%D 2019
%T Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection.
%A Schoech, Armin P
%A Jordan, Daniel M
%A Loh, Po-Ru
%A Gazal, Steven
%A O'Connor, Luke J
%A Balick, Daniel J
%A Palamara, Pier F
%A Finucane, Hilary K
%A Sunyaev, Shamil R
%A Price, Alkes L
%K Algorithms
%K Alleles
%K Biological Specimen Banks
%K Gene Frequency
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Models, Genetic
%K Polymorphism, Single Nucleotide
%K Quantitative Trait, Heritable
%K Selection, Genetic
%K United Kingdom
%X

Understanding the role of rare variants is important in elucidating the genetic basis of human disease. Negative selection can cause rare variants to have larger per-allele effect sizes than common variants. Here, we develop a method to estimate the minor allele frequency (MAF) dependence of SNP effect sizes. We use a model in which per-allele effect sizes have variance proportional to [p(1 - p)]^α, where p is the MAF and negative values of α imply larger effect sizes for rare variants. We estimate α for 25 UK Biobank diseases and complex traits. All traits produce negative α estimates, with best-fit mean of -0.38 (s.e. 0.02) across traits. Despite larger rare variant effect sizes, rare variants (MAF < 1%) explain less than 10% of total SNP-heritability for most traits analyzed. Using evolutionary modeling and forward simulations, we validate the α model of MAF-dependent trait effects and assess plausible values of relevant evolutionary parameters.

%B Nat Commun
%V 10
%P 790
%8 2019 02 15
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/30770844?dopt=Abstract
%R 10.1038/s41467-019-08424-6

%0 Journal Article
%J Nat Genet
%D 2017
%T Covariate selection for association screening in multiphenotype genetic studies.
%A Aschard, Hugues
%A Guillemot, Vincent
%A Vilhjalmsson, Bjarni
%A Patel, Chirag J
%A Skurnik, David
%A Ye, Chun J
%A Wolpin, Brian
%A Kraft, Peter
%A Zaitlen, Noah
%K Algorithms
%K Genetic Association Studies
%K Genetic Variation
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Models, Genetic
%K Multivariate Analysis
%K Phenotype
%K Reproducibility of Results
%K Sample Size
%X

Testing for associations in big data faces the problem of multiple comparisons, wherein true signals are difficult to detect on the background of all associations queried. This difficulty is particularly salient in human genetic association studies, in which phenotypic variation is often driven by numerous variants of small effect. The current strategy to improve power to identify these weak associations consists of applying standard marginal statistical approaches and increasing study sample sizes. Although successful, this approach does not leverage the environmental and genetic factors shared among the multiple phenotypes collected in contemporary cohorts. Here we developed covariates for multiphenotype studies (CMS), an approach that improves power when correlated phenotypes are measured on the same samples. Our analyses of real and simulated data provide direct evidence that correlated phenotypes can be used to achieve increases in power to levels often surpassing the power gained by a twofold increase in sample size.

%B Nat Genet
%V 49
%P 1789-1795
%8 2017 Dec
%G eng
%N 12
%1 https://www.ncbi.nlm.nih.gov/pubmed/29038595?dopt=Abstract
%R 10.1038/ng.3975

%0 Journal Article
%J Nat Genet
%D 2017
%T Estimating the selective effects of heterozygous protein-truncating variants from human exome data.
%A Cassa, Christopher A
%A Weghorn, Donate
%A Balick, Daniel J
%A Jordan, Daniel M
%A Nusinow, David
%A Samocha, Kaitlin E
%A O'Donnell-Luria, Anne
%A MacArthur, Daniel G
%A Daly, Mark J
%A Beier, David R
%A Sunyaev, Shamil R
%K Algorithms
%K Animals
%K Bayes Theorem
%K Exome
%K Gene Frequency
%K Genetic Predisposition to Disease
%K Genetic Variation
%K Genome-Wide Association Study
%K Genotype
%K Heterozygote
%K Humans
%K Mice, Knockout
%K Models, Genetic
%K Mutation
%K Selection, Genetic
%K Sequence Analysis, DNA
%X

The evolutionary cost of gene loss is a central question in genetics and has been investigated in model organisms and human cell lines. In humans, tolerance of the loss of one or both functional copies of a gene is related to the gene's causal role in disease. However, estimates of the selection and dominance coefficients in humans have been elusive. Here we analyze exome sequence data from 60,706 individuals to make genome-wide estimates of selection against heterozygous loss of gene function. Using this distribution of selection coefficients for heterozygous protein-truncating variants (PTVs), we provide corresponding Bayesian estimates for individual genes. We find that genes under the strongest selection are enriched in embryonic lethal mouse knockouts, Mendelian disease-associated genes, and regulators of transcription. Screening by essentiality, we find a large set of genes under strong selection that are likely to have crucial functions but have not yet been thoroughly characterized.

%B Nat Genet
%V 49
%P 806-810
%8 2017 May
%G eng
%N 5
%1 https://www.ncbi.nlm.nih.gov/pubmed/28369035?dopt=Abstract
%R 10.1038/ng.3831

%0 Journal Article
%J Nature
%D 2017
%T Genetic effects on gene expression across human tissues.
%A Battle, Alexis
%A Brown, Christopher D
%A Engelhardt, Barbara E
%A Montgomery, Stephen B
%K Alleles
%K Chromosomes, Human
%K Disease
%K Female
%K Gene Expression Profiling
%K Gene Expression Regulation
%K Genetic Variation
%K Genome, Human
%K Genotype
%K Humans
%K Male
%K Organ Specificity
%K Quantitative Trait Loci
%X

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

%B Nature
%V 550
%P 204-213
%8 2017 Oct 11
%G eng
%N 7675
%1 https://www.ncbi.nlm.nih.gov/pubmed/29022597?dopt=Abstract
%R 10.1038/nature24277

%0 Journal Article
%J Elife
%D 2017
%T Genetic identification of a common collagen disease in puerto ricans via identity-by-descent mapping in a health system.
%A Belbin, Gillian Morven
%A Odgis, Jacqueline
%A Sorokin, Elena P
%A Yee, Muh-Ching
%A Kohli, Sumita
%A Glicksberg, Benjamin S
%A Gignoux, Christopher R
%A Wojcik, Genevieve L
%A Van Vleck, Tielman
%A Jeff, Janina M
%A Linderman, Michael
%A Schurmann, Claudia
%A Ruderfer, Douglas
%A Cai, Xiaoqiang
%A Merkelson, Amanda
%A Justice, Anne E
%A Young, Kristin L
%A Graff, Misa
%A North, Kari E
%A Peters, Ulrike
%A James, Regina
%A Hindorff, Lucia
%A Kornreich, Ruth
%A Edelmann, Lisa
%A Gottesman, Omri
%A Stahl, Eli Ea
%A Cho, Judy H
%A Loos, Ruth Jf
%A Bottinger, Erwin P
%A Nadkarni, Girish N
%A Abul-Husn, Noura S
%A Kenny, Eimear E
%K Adolescent
%K Adult
%K Aged
%K Child
%K Collagen Diseases
%K Female
%K Fibrillar Collagens
%K Genotype
%K Heterozygote
%K Hispanic Americans
%K Homozygote
%K Humans
%K Male
%K Middle Aged
%K Molecular Epidemiology
%K Multigene Family
%K Musculoskeletal Diseases
%K New York City
%K Pedigree
%K Whole Genome Sequencing
%K Young Adult
%X

Achieving confidence in the causality of a disease locus is a complex task that often requires supporting data from both statistical genetics and clinical genomics. Here we describe a combined approach to identify and characterize a genetic disorder that leverages distantly related patients in a health system and population-scale mapping. We utilize genomic data to uncover components of distant pedigrees, in the absence of recorded pedigree information, in the multi-ethnic Bio biobank in New York City. By linking to medical records, we discover a locus associated with both elevated genetic relatedness and extreme short stature. We link the gene with a little-known genetic disease, previously thought to be rare and recessive. We demonstrate that disease manifests in both heterozygotes and homozygotes, indicating a common collagen disorder impacting up to 2% of individuals of Puerto Rican ancestry, leading to a better understanding of the continuum of complex and Mendelian disease.

%B Elife
%V 6
%8 2017 09 12
%G eng
%1 https://www.ncbi.nlm.nih.gov/pubmed/28895531?dopt=Abstract
%R 10.7554/eLife.25060

%0 Journal Article
%J Nature
%D 2017
%T The impact of rare variation on gene expression across tissues.
%A Li, Xin
%A Kim, Yungil
%A Tsang, Emily K
%A Davis, Joe R
%A Damani, Farhan N
%A Chiang, Colby
%A Hess, Gaelen T
%A Zappala, Zachary
%A Strober, Benjamin J
%A Scott, Alexandra J
%A Li, Amy
%A Ganna, Andrea
%A Bassik, Michael C
%A Merker, Jason D
%A Hall, Ira M
%A Battle, Alexis
%A Montgomery, Stephen B
%K Bayes Theorem
%K Female
%K Gene Expression Profiling
%K Genetic Variation
%K Genome, Human
%K Genomics
%K Genotype
%K Humans
%K Male
%K Models, Genetic
%K Organ Specificity
%K Sequence Analysis, RNA
%X

Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.

%B Nature
%V 550
%P 239-243
%8 2017 10 11
%G eng
%N 7675
%1 https://www.ncbi.nlm.nih.gov/pubmed/29022581?dopt=Abstract
%R 10.1038/nature24267

%0 Journal Article
%J Nat Genet
%D 2017
%T Population- and individual-specific regulatory variation in Sardinia.
%A Pala, Mauro
%A Zappala, Zachary
%A Marongiu, Mara
%A Li, Xin
%A Davis, Joe R
%A Cusano, Roberto
%A Crobu, Francesca
%A Kukurba, Kimberly R
%A Gloudemans, Michael J
%A Reinier, Frederic
%A Berutti, Riccardo
%A Piras, Maria G
%A Mulas, Antonella
%A Zoledziewska, Magdalena
%A Marongiu, Michele
%A Sorokin, Elena P
%A Hess, Gaelen T
%A Smith, Kevin S
%A Busonero, Fabio
%A Maschio, Andrea
%A Steri, Maristella
%A Sidore, Carlo
%A Sanna, Serena
%A Fiorillo, Edoardo
%A Bassik, Michael C
%A Sawcer, Stephen J
%A Battle, Alexis
%A Novembre, John
%A Jones, Chris
%A Angius, Andrea
%A Abecasis, Gonçalo R
%A Schlessinger, David
%A Cucca, Francesco
%A Montgomery, Stephen B
%K Alternative Splicing
%K Chromosome Mapping
%K Family Health
%K Female
%K Gene Expression Profiling
%K Genetic Predisposition to Disease
%K Genetic Variation
%K Genetics, Population
%K Genome-Wide Association Study
%K Genotype
%K Humans
%K Italy
%K Male
%K Polymorphism, Single Nucleotide
%K Quantitative Trait Loci
%K Transcription Initiation Site
%X

Genetic studies of complex traits have mainly identified associations with noncoding variants. To further determine the contribution of regulatory variation, we combined whole-genome and transcriptome data for 624 individuals from Sardinia to identify common and rare variants that influence gene expression and splicing. We identified 21,183 expression quantitative trait loci (eQTLs) and 6,768 splicing quantitative trait loci (sQTLs), including 619 new QTLs. We identified high-frequency QTLs and found evidence of selection near genes involved in malarial resistance and increased multiple sclerosis risk, reflecting the epidemiological history of Sardinia. Using family relationships, we identified 809 segregating expression outliers (median z score of 2.97), averaging 13.3 genes per individual. Outlier genes were enriched for proximal rare variants, providing a new approach to study large-effect regulatory variants and their relevance to traits. Our results provide insight into the effects of regulatory variants and their relationship to population history and individual genetic risk.

%B Nat Genet
%V 49
%P 700-707
%8 2017 May
%G eng
%N 5
%1 https://www.ncbi.nlm.nih.gov/pubmed/28394350?dopt=Abstract
%R 10.1038/ng.3840