Combining DNA Methylation with Deep Learning Improves Sensitivity and Accuracy of Eukaryotic Genome Annotation

Loading...
Thumbnail Image
Can’t use the file because of accessibility barriers? Contact us

Date

2020-04

Journal Title

Journal ISSN

Volume Title

Publisher

[Bloomington, Ind.] : Indiana University

Abstract

The genome assembly process has significantly decreased in computational complexity since the advent of third-generation long-read technologies. However, genome annotations still require significant manual effort from scientists to produce trust-worthy annotations required for most bioinformatic analyses. Current methods for automatic eukaryotic annotation rely on sequence homology, structure, or repeat detection, and each method requires a separate tool, making the workflow for a final product a complex ensemble. Beyond the nucleotide sequence, one important component of genetic architecture is the presence of epigenetic marks, including DNA methylation. However, no automatic annotation tools currently use this valuable information. As methylation data becomes more widely available from nanopore sequencing technology, tools that take advantage of patterns in this data will be in demand. The goal of this dissertation was to improve the annotation process by developing and training a recurrent neural network (RNN) on trusted annotations to recognize multiple classes of elements from both the reference sequence and DNA methylation. We found that our proposed tool, RNNotate, detected fewer coding elements than GlimmerHMM and Augustus, but those predictions were more often correct. When predicting transposable elements, RNNotate was more accurate than both Repeat-Masker and RepeatScout. Additionally, we found that RNNotate was significantly less sensitive when trained and run without DNA methylation, validating our hypothesis. To our best knowledge, we are not only the first group to use recurrent neural networks for eukaryotic genome annotation, but we also innovated in the data space by utilizing DNA methylation patterns for prediction.

Description

Thesis (Ph.D.) - Indiana University, School of Informatics, Computing, and Engineering, 2020

Keywords

genome annotation, deep learning, rnn, dna methylation, epigenetics

Citation

Journal

DOI

Link(s) to data and video for this item

Relation

Rights

Type

Doctoral Dissertation