Browsing by Author "Papudeshi, Bhavya"
Now showing 1 - 20 of 27
Item Bioinformatic analysis using Jetstream, a cloud computing environment (Plant and Animal Genomics 2018, 2018-01-15) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Fischer, Jeremy; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) assists researchers in addressing the scientific challenges of understanding and analyzing the wealth of gene sequence information now available. This includes on-boarding biology professionals who lack the necessary computational background to run their analyses on high-performance computing systems. Virtual machines help with the transition to command line use, software installation, and running analyses in the Linux environment (the environment used by most high-performance computing clusters). Jetstream (https://jetstream-cloud.org/) is a cloud computing resource that provides access to preconfigured virtual machines, making the transition relatively effortless and flattening the learning curve needed to get results from experiments that otherwise produce an untenable amount of data. Currently, over 14% of all usage allocations on Jetstream are for biology other than protein folding, the majority of this being some sort of genomic analysis. NCGAS currently hosts over 123 genome analysis and bioinformatics software titles on Jetstream as preconfigured virtual machines. In this digital tool and resources workshop, we will demonstrate how to set up Jetstream accounts, start a preconfigured virtual machine, and run genomic analysis on this virtual machine (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php).
Jetstream also provides environments for prototyping and publishing tailored workflows that give researchers access to interactive computing and data analysis resources on demand.

Item Compute resources available to the research community for microbiome analysis (Plant and Animal Genome XXVII, 2019-01-16) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center tasked with assisting biologists in getting access to the computational resources they need in order to analyze genomic data. To support microbiome analysis, NCGAS provides preconfigured virtual machines (VMs) to identify taxa in 16S amplicon sequencing, and to identify both taxa and functions from whole genome metagenomes. Additionally, a pipeline to reconstruct genomes from metagenomes, to examine the role of specific microbes in a community, is available as a preconfigured VM hosting Anvi’o (https://ncgas.org/Blog_Posts/Running%20Anvio%20on%20Jetstream.php). Jetstream, a cloud computing resource, is easy to use and flattens the learning curve for using the Linux operating system and for installing bioinformatics software. Jetstream provides an environment for both prototyping and publishing tailored workflows. Through an NCGAS allocation, a researcher can get access to Jetstream, and to other national compute clusters with more memory and support for parallel processing. These compute resources have Globus Connect subscriptions, which assist in transferring terabytes of data quickly.
In this workshop, we will demonstrate how to get an NCGAS allocation, set up a Jetstream account, spin up a preconfigured virtual machine for microbiome analysis (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php), and transfer data between compute clusters using Globus (https://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php).

Item Coupling metagenomics with high-performance computing to mine the Sequence Read Archive (SRA) to analyze Pseudomonas phage PAK-P1 (Center of Excellence for Women & Technology, 2019-04-12) Ganapaneni, Sruthi; Leffler, Haley; Papudeshi, Bhavya; Sanders, Sheri; Doak, Thomas

Item Genome and transcriptome analysis of fish tapeworm Nippotaenia percotti through scientific collaboration between research labs and national cyberinfrastructure. (2020-01-13) Papudeshi, Bhavya; Chafin, Tyler K; Sanders, Sheri A; Ganote, Carrie; Reshetnikov, Andrey N; Sokolov, Sergey G; Doak, Thomas; Pummill, Jeff F; Douglas, Marlis R.; Douglas, Michael E.
Tapeworms (Cestoda) are endoparasites infecting all vertebrates. Most (>50%) of their described diversity is within the clade Cyclophyllidea, a common, chronic source of anthropogenic infection. We report the first sequenced genome for the tapeworm Nippotaenia percotti (Nippotaeniidea, the putative sister group to Cyclophyllidea), which is host-specific to a fish in the Amur River. The genome was derived for comparative purposes, to explore evolutionary change in functional gene loci of immunological import. Pooled individuals were sequenced on Illumina (HiSeq 2000) and PacBio (RSII), with additional RNAseq on the HiSeq 2500. Hybrid assemblies were completed in SPAdes with long-read scaffolding in LINKS. The assembly was further improved using Redundans and Pilon, generating 3,410 contigs at an N50 of 209,561 bp. Transcriptomes were assembled using a combined de novo approach (CDTA) with multiple assemblers and k-mers.
Assembled transcripts were combined using EvidentialGene, producing 28,226 assembled transcripts at an N50 of 2,290 bp, then annotated using Trinotate. The assembled genome was annotated using MAKER, identifying 30,671 genes, using our assembled transcriptome and genomes of closely related cestodes. Gene evolution was examined using 15 cestode genomes from the WormBase Parasite database, with the MCL algorithm identifying 16,099 orthologous gene clusters. Gene loss/gain was assessed by contrasting gene clusters with the cestode phylogenetic tree constructed with core genes identified by BUSCO, using IQ-Tree. Nippotaenia percotti’s genome provides a baseline for future investigations into candidate-gene families potentially involved with anthropogenic infection and would also support improvements in tapeworm treatment and control.

Item The Genome of Fish Tapeworm Nippotaenia percotti as a Potential Bookmark for Gene Loci that Facilitates Anthropogenic Infection. (Plant and Animal Genome XXVII, 2019-01-14) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom; Chafin, Tyler; Reshetnikov, Andrey; Sokolov, Sergey; Pummill, Jeff; Douglas, Marlis; Douglas, Michael
Tapeworms (Cestoda) are endoparasites infecting all vertebrates. Most (>50%) of their described diversity is within the clade Cyclophyllidea, a common, chronic source of anthropogenic infection. We report the first sequenced genome for the tapeworm Nippotaenia percotti (Nippotaeniidea, the putative sister group to Cyclophyllidea), which is host-specific to a fish in the Amur River. The genome was derived for comparative purposes, to explore evolutionary change in functional gene loci of immunological import. Pooled individuals were sequenced on Illumina (HiSeq 2000) and PacBio (RSII), with additional RNAseq on the HiSeq 2500. Hybrid assemblies were completed in SPAdes with long-read scaffolding in LINKS.
The assembly was further improved using Redundans and Pilon, generating 3,410 contigs at an N50 of 209,561 bp. Transcriptomes were assembled using a combined de novo approach (CDTA) with multiple assemblers and k-mers. Assembled transcripts were combined using EvidentialGene, producing 28,226 assembled transcripts at an N50 of 2,290 bp, then annotated using Trinotate. The assembled genome was annotated using MAKER, identifying 30,671 genes, using our assembled transcriptome and genomes of closely related cestodes. Gene evolution was examined using 15 cestode genomes from the WormBase Parasite database, with the MCL algorithm identifying 16,099 orthologous gene clusters. Gene loss/gain was assessed by contrasting gene clusters with the cestode phylogenetic tree constructed with core genes identified by BUSCO, using IQ-Tree. Nippotaenia percotti’s genome provides a baseline for future investigations into candidate-gene families potentially involved with anthropogenic infection and would also support improvements in tapeworm treatment and control.

Item Introduction to Metagenomics (2018-10-23) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Mining Microbial Genomes from Datasets on the Sequence Read Archive (2020-01-15) Papudeshi, Bhavya; Leffler, Haley; Ganapaneni, Sruthi; Sanders, Sheri; Ganote, Carrie; Doak, Thomas
The declining costs of genome sequencing and growing amounts of genetic data have allowed the field of genomics to become more integrated with computational analysis. The use of high performance clusters (HPC) is necessary to compute the large amounts of data in genomic projects; however, many biologists lack background experience in working with HPC systems, which limits their ability to best address their research questions. The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center that focuses on filling this need by providing training through workshops, bioinformatics support on projects, and access to compute resources.
As a byproduct of helping research projects, we develop open source workflows and make them available to the community. Here we present a workflow that assists researchers in mining the Sequence Read Archive (SRA) to identify environments/datasets potentially containing genomes of interest, and to identify their closely related genomes. As a proof of concept, we tested the workflow with two genomes, selected to ensure that the workflow is flexible enough to generate results in formats amenable to further downstream analysis, based on the research question. The pipeline is available through GitHub (https://github.com/NCGAS/CEWiT-REU-Identifying-datasets-in-SRA-using-Jetstream) and as a pre-installed workflow on the XSEDE Jetstream cloud computing infrastructure.

Item Mining the Sequence Read Archive to identify crAssphage, a ubiquitous inhabitant of the human microbiome, in dog and pig samples (Center of Excellence for Women & Technology, 2019-04-12) Leffler, Haley; Ganapaneni, Sruthi; Papudeshi, Bhavya; Sanders, Sheri; Doak, Thomas

Item National Center for Genome Analysis Support (NCGAS) use and development of Tripal Genome Browsers on XSEDE’s Jetstream (Plant and Animal Genomics 2018, 2018-01-14) Sanders, Sheri; Ganote, Carrie; Papudeshi, Bhavya; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) helps the biological community analyze, understand, and make use of the vast amount of genomic information now available. To this end, NCGAS develops and supports genome browsers for several genomics projects, using Tripal as its front end for the last year. Since the adoption of Tripal, we’ve modified some of the tools to develop features our users find useful. We will introduce our groups’ projects and give a demo of these tools in our browsers.
These tools include: 1) Modifications to the tripal_blast module to link BLAST reports to external JBrowse sites through URL manipulation, allowing visualization against potentially any browser with publicly available data with which to build a BLAST database. 2) A GUI-based web tool to spin up new JBrowse instances, with on-the-fly GUI-based track addition/removal, which allows for more flexible community visualization. 3) Finally, we will demo the virtual machine image built on the XSEDE cloud (Jetstream) with Tripal, JBrowse, and these tools installed, allowing for free genome browser hosting with minimal command line use. It is our hope that these tools will reduce the learning curve required to make use of Tripal genome visualization tools.

Item National Center for Genome Analysis Support (NCGAS): Genomics and other Science in the NSF-Funded Jetstream Cloud (2020-01-13) Doak, Thomas; Sanders, Sheri; Ganote, Carrie; Papudeshi, Bhavya; Fischer, Jeremy; Hancock, David Y.
The National Center for Genome Analysis Support (NCGAS) is an NSF-funded (NSF-1445604) center that helps all NSF-funded researchers doing genomics research. Genomics includes transcriptomics, metagenomics, genome annotation, etc. Our support includes providing access to large memory computing, maintaining curated sets of genomics applications, providing one-on-one consultation, and creating educational opportunities. A resource that we have come to rely on for providing these services is the NSF-funded Jetstream Cloud, maintained by Indiana University (led by the Indiana University Pervasive Technology Institute (PTI)) and the University of Texas at Austin's Texas Advanced Computing Center (TACC). Additionally, we leverage Globus data transfer tools.
Globus at the University of Chicago is responsible for integrating Jetstream with the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE), and for integrating Globus data movement and management tools, as well as Globus-based secure user authentication. With a focus on ease of use and broad accessibility, Jetstream is designed for those who have not previously used high performance computing and software resources: researchers who need more than desktop-strength computing but less than full-scale High Performance Computing (HPC). Jetstream features a web-based user interface based on the popular Atmosphere cloud computing environment (developed by CyVerse), extended to support science and engineering research generally. The system is particularly geared toward 21st-century workforce development at small colleges and universities, especially historically black colleges and universities, minority serving institutions, tribal colleges, and higher education institutions in EPSCoR States. Jetstream provides a library of virtual machines designed to do discipline-specific scientific analysis, but researchers can also develop their own VMs, with their own software sets, or sets specialized to a particular task. These VMs can be both saved and shared with collaborators. Currently there are 19 genomics VMs, including RStudio instances with Bioconductor, ready-made genome browsers with JBrowse/Tripal, and metagenomic tools like QIIME2 and Anvi’o. Biology and molecular biology researchers are the largest users of Jetstream. NCGAS has found VMs extremely useful in education and workshops: we develop class-specific VMs, with all the applications needed, then clone them, so that each student has their own VM to work on (making courses easy to scale). In addition to on-demand VMs, persistent science gateways can be established using template VMs NCGAS has built. These can be used to provide services to collaborators or to the world.
Users can easily create Galaxy servers on Jetstream: each server comes preconfigured with hundreds of tools and commonly used reference datasets; once running, researchers can use it or customize it. Many NCGAS users establish genome browsers, specific to their organism, that are shared with small sets of collaborating researchers but can also be shared with the world. Jetstream is accessed via an allocation process at XSEDE; a startup allocation is typically approved within a day.

Item Navigating High Performance Computing (HPC) Resources (2018-10-04) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom

Item Navigating the Sequence Read Archive to identify crAssphage, an ubiquitous inhabitant of the human microbiome (Jim Holland Summer Science Research Program Poster Session, 2019-07-14) Cai, Jasmine X.; Weathers, Jania G.; Leffler, Haley; Ganapaneni, Sruthi; Papudeshi, Bhavya; Sanders, Sheri; Doak, Thomas G.
The declining costs of genome sequencing and growing amounts of genetic data are driving the field of genomics to become more integrated with computational analysis. The use of high performance clusters (HPC) is necessary to compute the large amounts of data in genomic projects. However, many biologists lack background experience in working with HPC systems, which limits their ability to best address their research questions. The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center that focuses on filling this gap by providing training through workshops, bioinformatics support on projects, and access to compute resources. As a byproduct of helping on research projects, we develop open source workflows and make them available to the community. Here we present a workflow that assists researchers in mining the Sequence Read Archive (SRA) to identify other environments/datasets that potentially contain a genome of interest, and to identify their closely related genomes.
As a proof of concept, we used two genomes to test the workflow. We selected these two different genomes to ensure that the workflow is flexible enough to generate results in formats that aid further downstream analysis, based on the research question. The pipeline will be made available to the research community, with documentation, through Jetstream, an NSF cloud computing platform.

Item NCGAS 2020 Annual User Survey (2020-10-26) Wernert, Julie; Sanders, Sheri; Papudeshi, Bhavya; Jankowski, Harmony

Item NCGAS makes robust transcriptome analysis easier with a readily usable workflow following de novo assembly best practices (Plant and Animal Genomics 2018, 2018-01-17) Sanders, Sheri; Ganote, Carrie; Papudeshi, Bhavya; Mockaitis, Keithanne; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) assists research groups with de novo transcriptome assembly. Best practices for such analyses include sample pooling, running multiple assembler algorithms with multiple parameters, combining the assemblies, and filtering out redundant or erroneously assembled transcripts. These combined de novo transcriptome assemblies can put a technical burden on genomic researchers who may not be fully computationally trained on efficient use of HPC clusters. NCGAS has created a workflow template to move client data through 19 parallelized assemblies using four software packages (Trinity, SOAP-denovo, transABySS, and Velvet Oases) and multiple k-mers. The transcripts are then combined and filtered using EviGenes to output putative transcripts and alternative forms in a replicable manner. The process is semi-automated but flexible enough to allow researchers to adjust parameters if they desire. While the workflow was designed for IU machines and XSEDE’s Bridges, allocations on these machines are available to any genomics researcher in the US, and the job scripts can be easily adjusted for other job handlers/clusters.
This workflow provides a low bar for entry into robust transcriptome assembly that follows best practices, while also providing a replicable means of filtering large numbers of transcripts into a draft version of a transcriptome. Scripts can be found at https://github.com/NCGAS/IndianaUniversity/tree/master/Transcriptome_Workflow_Mason.

Item NCGAS Makes Robust Transcriptome Assembly Even Easier with Added Features to an Accessible de novo Transcriptome Assembly Workflow (Plant and Animal Genome XXVII, 2019-01-12) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) assists research groups with de novo transcriptome assembly. Following best practice for combined de novo transcriptome assemblies can put a technical burden on genomic researchers who may not be fully computationally trained on efficient use of HPC clusters or the variety of available software packages. NCGAS has created a workflow template to move RNAseq data through 19 parallelized assemblies using four software packages (Trinity, SOAP-denovo, transABySS, and Velvet Oases) and multiple k-mers. The transcripts are then combined and filtered using EviGenes to output putative transcripts and alternative forms in a replicable manner. The process is semi-automated but flexible enough to allow researchers to adjust parameters if they desire. This workflow provides a low bar for entry into robust transcriptome assembly that follows best practices, while also providing a replicable means of filtering large numbers of transcripts into a draft version of a transcriptome. We will highlight the main workflow in this demo but will concentrate on the features added to the workflow in the last year, including annotation via Trinotate, differential expression handling, and the automated creation of a table of assembly metrics via BUSCO and QUAST for each sub-assembly.
As this workflow has now been adopted by several groups, we will also discuss available training and current implementations of the tool.

Item Population Genetics of Tree Swallows, in Collaboration with NCGAS (Plant and Animal Genome XXVII, 2019-01-14) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom; Mansfield, Charles; Tseng, Chi Yen; Custer, Thomas; Custer, Christine; Matson, Cole
The National Center for Genome Analysis Support (NCGAS) provides training and computational resources in an effort to train biologists to approach historically difficult, non-model problems with large biological data sets. For example, our collaborators at Baylor University work with the Tree Swallow (Tachycineta bicolor), using RNAseq data in population genetics and toxicology. Working with NCGAS, they assembled a de novo transcriptome for the Tree Swallow, for which no genome is available. Variant calling using the transcriptome identified 66,169 single nucleotide polymorphisms (SNPs) across 144 samples. They were then able to identify phylogeographic structuring across the Great Lakes Region, including accurate grouping of populations distributed across smaller geographic scales (e.g. along the Maumee River). SNPs were also used to assess population heterozygosity and genetic diversity. This project required large-scale data handling, large memory machines to assemble the transcriptome, and advanced Linux skills to manage the data and analyses. NCGAS provided the computational resources and training on the Linux environment and data management.
Further assistance was provided in consultation and problem solving, leading to a high level of independence and competency of the graduate student researcher.

Item Pushing the limits of job flexibility on HPC (Galaxy Community Conference 2017, 2017-06-30) Ganote, Carrie; Sanders, Sheri; Doak, Tom; Brokaw, Cicada; Papudeshi, Bhavya; Haas, Brian; Bankapur, Asma; Tickle, Tim; Blood, Phil

Item Reconstruction of Metagenome-Assembled Microbial Genomes from a Micro-biome (Poster) (Plant and Animal Genomics 2018, 2018-01-15) Papudeshi, Bhavya
Microbiome/host interactions describe characteristics that affect the host health; shotgun metagenomics sequences microbiome samples, allowing us to analyze their taxonomic and metabolic potential. Reconstruction of metagenome fragments into genomes (called metagenome-assembled genomes) facilitates linking function to taxa within microbial symbionts. Reconstruction of genomes sorts assembled sequences into bins, each characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity, may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects with variable sequencing platforms, diversity, and environment, using a set of parameters to select for optimal assembly and binning tools. We evaluated 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat) for four projects (105 metagenomes). We find that SPAdes assembled more contigs (143,718 ± 124) of longer length (N50 = 1632 ± 108 bp), incorporated the most sequences (19.65%), and had low chimera levels (microbial richness and evenness were maintained across assembly). SPAdes assembly was responsive to biological and technological variations within the projects. The MetaBat binning tool produced bins characteristic of a genome, with less GC variation (standard deviation 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75).
MetaBat extracted 115 bins, of which 66 were identified as quality reconstructed metagenome-assembled genomes with genus-specific sequences. In conclusion, we present a set of biologically relevant parameters to select for optimal assembly and binning tools. SPAdes and MetaBat reconstructed quality metagenome-assembled genomes for the four projects included in this study.

Item RNA-Seq Demo on Galaxy (2018-10-25) Ganote, Carrie; Papudeshi, Bhavya; Sanders, Sheri; Doak, Tom

Item Summary of annual NCGAS user survey (Summer 2018) (2018-09-10) Wernert, Julie; Doak, Thomas G.; Sanders, Sheri; Ganote, Carrie; Papudeshi, Bhavya