%0 Journal Article
%J Nat Genet
%D 2019
%T Genome sequencing analysis identifies Epstein-Barr virus subtypes associated with high risk of nasopharyngeal carcinoma.
%A Xu, Miao
%A Yao, Youyuan
%A Chen, Hui
%A Zhang, Shanshan
%A Cao, Su-Mei
%A Zhang, Zhe
%A Luo, Bing
%A Liu, Zhiwei
%A Li, Zilin
%A Xiang, Tong
%A He, Guiping
%A Feng, Qi-Sheng
%A Chen, Li-Zhen
%A Guo, Xiang
%A Jia, Wei-Hua
%A Chen, Ming-Yuan
%A Zhang, Xiao
%A Xie, Shang-Hang
%A Peng, Roujun
%A Chang, Ellen T
%A Pedergnana, Vincent
%A Feng, Lin
%A Bei, Jin-Xin
%A Xu, Rui-Hua
%A Zeng, Mu-Sheng
%A Ye, Weimin
%A Adami, Hans-Olov
%A Lin, Xihong
%A Zhai, Weiwei
%A Zeng, Yi-Xin
%A Liu, Jianjun
%X

Epstein-Barr virus (EBV) infection is ubiquitous worldwide and is associated with multiple cancers, including nasopharyngeal carcinoma (NPC). The importance of EBV viral genomic variation in NPC development and its striking epidemic in southern China has been poorly explored. Through large-scale genome sequencing of 270 EBV isolates and two-stage association study of EBV isolates from China, we identify two non-synonymous EBV variants within BALF2 that are strongly associated with the risk of NPC (odds ratio (OR) = 8.69, P = 9.69 × 10 for SNP 162476_C; OR = 6.14, P = 2.40 × 10 for SNP 163364_T). The cumulative effects of these variants contribute to 83% of the overall risk of NPC in southern China. Phylogenetic analysis of the risk variants reveals a unique origin in Asia, followed by clonal expansion in NPC-endemic regions. Our results provide novel insights into the NPC endemic in southern China and also enable the identification of high-risk individuals for NPC prevention.

%B Nat Genet
%V 51
%P 1131-1136
%8 2019 Jul
%G eng
%N 7
%1 https://www.ncbi.nlm.nih.gov/pubmed/31209392?dopt=Abstract
%R 10.1038/s41588-019-0436-5

%0 Journal Article
%J J Am Stat Assoc
%D 2017
%T The Generalized Higher Criticism for Testing SNP-Set Effects in Genetic Association Studies.
%A Barnett, Ian
%A Mukherjee, Rajarshi
%A Lin, Xihong
%X

It is of substantial interest to study the effects of genes, genetic pathways, and networks on the risk of complex diseases. These genetic constructs each contain multiple SNPs, which are often correlated and function jointly, and might be large in number. However, only a sparse subset of SNPs in a genetic construct is generally associated with the disease of interest. In this article, we propose the generalized higher criticism (GHC) to test for the association between an SNP set and a disease outcome. The higher criticism is a test traditionally used in high-dimensional signal detection settings when marginal test statistics are independent and the number of parameters is very large. However, these assumptions do not always hold in genetic association studies, due to linkage disequilibrium among SNPs and the finite number of SNPs in an SNP set in each genetic construct. The proposed GHC overcomes the limitations of the higher criticism by allowing for arbitrary correlation structures among the SNPs in an SNP-set, while performing accurate analytic p-value calculations for any finite number of SNPs in the SNP-set. We obtain the detection boundary of the GHC test. We compared empirically using simulations the power of the GHC method with existing SNP-set tests over a range of genetic regions with varied correlation structures and signal sparsity. We apply the proposed methods to analyze the CGEM breast cancer genome-wide association study. Supplementary materials for this article are available online.

%B J Am Stat Assoc
%V 112
%P 64-76
%8 2017
%G eng
%N 517
%1 https://www.ncbi.nlm.nih.gov/pubmed/28736464?dopt=Abstract
%R 10.1080/01621459.2016.1192039