%0 Journal Article
%J Nat Commun
%D 2019
%T A multi-task convolutional deep neural network for variant calling in single molecule sequencing.
%A Luo, Ruibang
%A Sedlazeck, Fritz J
%A Lam, Tak-Wah
%A Schatz, Michael C
%K Base Sequence
%K Computational Biology
%K DNA Mutational Analysis
%K Genome, Human
%K Genome-Wide Association Study
%K Genomics
%K Genotype
%K Genotyping Techniques
%K Humans
%K INDEL Mutation
%K Nanopores
%K Neural Networks, Computer
%K Polymorphism, Single Nucleotide
%K Sequence Analysis, DNA
%K Software
%X

The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. To address this, we developed Clairvoyante, a multi-task five-layer convolutional neural network model that predicts variant type (SNP or indel), zygosity, alternative allele, and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves F1-scores of 99.67%, 95.78%, and 90.53% on 1KP common variants, and 98.65%, 92.57%, and 87.26% for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows that Clairvoyante is sample agnostic, and it finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize, and visualize the model.

%B Nat Commun
%V 10
%P 998
%8 2019 03 01
%G eng
%N 1
%1 https://www.ncbi.nlm.nih.gov/pubmed/30824707?dopt=Abstract
%R 10.1038/s41467-019-09025-z