Analysis in case-control sequencing association studies with different sequencing depths.
Chen, S, Lin, X
2020 07 01
With the advent of next-generation sequencing, investigators have access to higher-quality sequencing data. However, sequencing all samples in a study with next-generation sequencing can still be prohibitively expensive. One potential remedy is to combine next-generation sequencing data from cases with publicly available sequencing data for controls, but the quality of the sequence data, for example the sequencing depth, may differ systematically between the sequenced study cases and the publicly available controls. We propose a regression calibration (RC)-based method and a maximum-likelihood method for conducting an association study with such a combined sample that account for differential sequencing errors between cases and controls. The methods allow adjustment for covariates, such as confounders arising from population stratification. Both methods control the type I error rate and, when sequencing depths are sufficiently high (even if they differ between cases and controls), have power comparable to that of an analysis using the true genotypes. We show that, under certain circumstances, the RC method permits analysis with standard software using the naive variance estimate, which closely approximates the true variance in practice. We evaluate the performance of the proposed methods in simulation studies and apply them to a combined data set of exome-sequenced acute lung injury cases and healthy controls from the 1000 Genomes Project.
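The general idea behind regression calibration can be illustrated with a small sketch: the unobserved genotype G is replaced by its expected value given the observed reads, E[G | reads], and that calibrated value is then used in place of G in the association model. The sketch rests on simplifying assumptions that are ours, not the paper's: reads are independent, every base is misread with the same error rate, and genotypes follow Hardy-Weinberg proportions with a known allele frequency. The function names (`genotype_prior`, `read_likelihood`, `expected_dosage`) and parameters (`maf`, `err`) are hypothetical, and this is not the authors' implementation.

```python
"""Regression-calibration sketch: replace the unobserved genotype G
(0, 1 or 2 copies of the alternate allele) by its posterior mean given reads.

Assumptions (illustrative, not taken from the paper): reads are independent,
every base is misread with the same error rate `err`, and genotypes follow
Hardy-Weinberg proportions with a known alternate-allele frequency `maf`.
"""
from math import comb


def genotype_prior(maf: float) -> tuple:
    """Hardy-Weinberg probabilities P(G=0), P(G=1), P(G=2)."""
    q = 1.0 - maf
    return (q * q, 2.0 * maf * q, maf * maf)


def read_likelihood(n_alt: int, depth: int, g: int, err: float) -> float:
    """Binomial probability of n_alt alternate reads out of `depth`, given genotype g."""
    # Chance that a single read shows the alternate allele for each genotype
    # (a heterozygote gives 0.5 regardless of err under symmetric errors).
    p_alt = (err, 0.5, 1.0 - err)[g]
    return comb(depth, n_alt) * p_alt**n_alt * (1.0 - p_alt) ** (depth - n_alt)


def expected_dosage(n_alt: int, depth: int, maf: float, err: float = 0.01) -> float:
    """Calibrated genotype E[G | reads], used in place of G in the association model."""
    prior = genotype_prior(maf)
    # Unnormalized posterior weights P(G = g) * P(reads | G = g).
    joint = [prior[g] * read_likelihood(n_alt, depth, g, err) for g in range(3)]
    return sum(g * w for g, w in enumerate(joint)) / sum(joint)


if __name__ == "__main__":
    # Deeply sequenced sample: 15 of 30 reads carry the alternate allele.
    print(expected_dosage(15, 30, maf=0.05))  # very close to 1 (heterozygote)
    # Shallow sample with no alternate reads keeps some residual dosage...
    print(expected_dosage(0, 4, maf=0.05))
    # ...while a deep sample with no alternate reads is essentially 0.
    print(expected_dosage(0, 30, maf=0.05))
    # No reads at all: the estimate falls back to the prior mean 2 * maf.
    print(expected_dosage(0, 0, maf=0.05))
```

At low depth the calibrated value is pulled toward the prior mean 2 * maf, so identical underlying genotypes can yield different observed values in a shallowly sequenced group and a deeply sequenced one. The calibrated dosages could then be entered, together with covariates, into an ordinary logistic regression fitted with standard software.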
Grant List
R35 CA197449 / CA / NCI NIH HHS / United States
P01 CA134294 / CA / NCI NIH HHS / United States
U01 HG009088 / HG / NHGRI NIH HHS / United States
U19 CA203654 / CA / NCI NIH HHS / United States
R01 HL113338 / HL / NHLBI NIH HHS / United States
P42 ES016454 / ES / NIEHS NIH HHS / United States
RC2 HL101779 / HL / NHLBI NIH HHS / United States