Submitted by ja607 on
Title | A multi-task convolutional deep neural network for variant calling in single molecule sequencing. |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Luo, R, Sedlazeck, FJ, Lam, T-W, Schatz, MC |
Journal | Nat Commun |
Volume | 10 |
Issue | 1 |
Pagination | 998 |
Date Published | 2019 03 01 |
ISSN | 2041-1723 |
Keywords | Base Sequence, Computational Biology, DNA Mutational Analysis, Genome, Human, Genome-Wide Association Study, Genomics, Genotype, Genotyping Techniques, Humans, INDEL Mutation, Nanopores, Neural Networks, Computer, Polymorphism, Single Nucleotide, Sequence Analysis, DNA, Software |
Abstract | The accurate identification of DNA sequence variants is an important, but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves 99.67, 95.78, 90.53% F1-score on 1KP common variants, and 98.65, 92.57, 87.26% F1-score for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source ( https://github.com/aquaskyline/Clairvoyante ), with modules to train, utilize and visualize the model. |
DOI | 10.1038/s41467-019-09025-z |
Alternate Journal | Nat Commun |
PubMed ID | 30824707 |
PubMed Central ID | PMC6397153 |
Grant List | R01 HG006677 / HG / NHGRI NIH HHS / United States UM1 HG008898 / HG / NHGRI NIH HHS / United States |