Analysis in case-control sequencing association studies with different sequencing depths.

Title: Analysis in case-control sequencing association studies with different sequencing depths.
Publication Type: Journal Article
Year of Publication: 2020
Authors: Chen, S, Lin, X
Journal: Biostatistics
Volume: 21
Issue: 3
Pagination: 577-593
Date Published: 2020-07-01
ISSN: 1468-4357
Abstract

With the advent of next-generation sequencing, investigators have access to higher-quality sequencing data. However, sequencing every sample in a study with next-generation sequencing can still be prohibitively expensive. One potential remedy is to combine next-generation sequencing data from cases with publicly available sequencing data for controls, but the quality of the sequenced data, such as sequencing depth, may differ systematically between the sequenced study cases and the publicly available controls. We propose a regression calibration (RC)-based method and a maximum-likelihood method for conducting an association study with such a combined sample by accounting for differential sequencing errors between cases and controls. The methods allow adjustment for confounding covariates, such as population stratification. Both methods control type I error and, when sequencing depths are sufficiently high but differ between cases and controls, have power comparable to an analysis based on the true genotypes. We show that, under certain circumstances, the RC method allows for analysis using standard software and the naive variance estimate, which closely approximates the true variance in practice. We evaluate the performance of the proposed methods using simulation studies and apply them to a combined data set of exome-sequenced acute lung injury cases and healthy controls from the 1000 Genomes Project.
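The general idea behind regression calibration, replacing an unobserved covariate with its conditional expectation given the observed data, can be illustrated with a minimal sketch. This is not the authors' estimator. The function name `expected_genotype`, the binomial read model, the Hardy-Weinberg prior, and the per-read error rate `err` are all assumptions made for illustration only. The sketch computes the expected genotype at one biallelic site from read counts, and shows that the estimate moves from the prior mean toward the called genotype as depth increases.

```python
import math


def expected_genotype(alt_reads, depth, maf, err=0.01):
    """Posterior mean E[G | reads] for one sample at one biallelic site.

    Illustrative assumptions (not taken from the paper):
      - each read shows the alternate allele with probability
        err, 0.5, or 1 - err when the true genotype G is 0, 1, or 2;
      - G follows Hardy-Weinberg proportions with minor allele
        frequency maf, where 0 < maf < 1 and 0 < err < 1.
    """
    p_alt = (err, 0.5, 1.0 - err)
    prior = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)

    # Unnormalised log posterior for G = 0, 1, 2. The binomial
    # coefficient is the same for every genotype, so it is omitted.
    log_post = [
        math.log(prior[g])
        + alt_reads * math.log(p_alt[g])
        + (depth - alt_reads) * math.log(1.0 - p_alt[g])
        for g in range(3)
    ]

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return sum(g * w for g, w in enumerate(weights)) / total


# With no reads the estimate is the prior mean 2 * maf; with many reads
# it approaches the genotype implied by the alternate-read fraction.
no_reads = expected_genotype(alt_reads=0, depth=0, maf=0.2)
low_depth = expected_genotype(alt_reads=1, depth=2, maf=0.2)
high_depth = expected_genotype(alt_reads=15, depth=30, maf=0.2)
```

Under these assumptions, an expected genotype computed separately for cases and controls with their respective depths could be used in place of a hard genotype call as the covariate in a standard logistic regression. Whether that matches the calibration step in the paper would need to be checked against the full text.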

DOI: 10.1093/biostatistics/kxy073
Alternate Journal: Biostatistics
PubMed ID: 30590456
PubMed Central ID: PMC7308042
Grant List:
R35 CA197449 / CA / NCI NIH HHS / United States
P01 CA134294 / CA / NCI NIH HHS / United States
U01 HG009088 / HG / NHGRI NIH HHS / United States
U19 CA203654 / CA / NCI NIH HHS / United States
R01 HL113338 / HL / NHLBI NIH HHS / United States
P42 ES016454 / ES / NIEHS NIH HHS / United States
RC2 HL101779 / HL / NHLBI NIH HHS / United States