Mining Microbial Genomes from Datasets on the Sequence Read Archive

Date

2020-01-15

Abstract

The declining cost of genome sequencing and the growing volume of genetic data have made genomics increasingly dependent on computational analysis. High-performance computing (HPC) clusters are necessary to process the large datasets generated by genomic projects; however, many biologists lack experience working with HPC systems, which limits their ability to address their research questions. The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center that addresses this need by providing training through workshops, bioinformatics support for projects, and access to compute resources. As a byproduct of supporting research projects, we develop open-source workflows and make them available to the community. Here we present a workflow that helps researchers mine the Sequence Read Archive (SRA) to identify environments and datasets that potentially contain genomes of interest, and to identify genomes closely related to them. As a proof of concept, we tested the workflow on two genomes, selected to demonstrate its flexibility in generating results in formats amenable to further downstream analysis, depending on the research question. The pipeline is available on GitHub (https://github.com/NCGAS/CEWiT-REU-Identifying-datasets-in-SRA-using-Jetstream) and as a pre-installed workflow on the XSEDE Jetstream cloud computing infrastructure.

Keywords

NCGAS, sequencing, database, sequence read archive

Citation

Papudeshi B, Leffler H, Ganapaneni S, Sanders SA, Ganote C, and Doak TG. (2020). Mining Microbial Genomes from Datasets on the Sequence Read Archive. Plant and Animal Genome 2020, San Diego, California. Available at http://hdl.handle.net/2022/25300.

Rights

Except where otherwise noted, the contents of this presentation are copyright of the Trustees of Indiana University. This license includes the following terms: you are free to share (to copy, distribute, and transmit the work) and to remix (to adapt the work) under the following conditions. Attribution: you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.

Type

Presentation