Presentations
Permanent link for this collection: https://hdl.handle.net/2022/14534
Browsing Presentations by Author "Doak, Tom"
Now showing 1 - 20 of 25
Item Bioinformatic analysis using Jetstream, a cloud computing environment (Plant and Animal Genomics 2018, 2018-01-15) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Fischer, Jeremy; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) assists researchers in addressing the scientific challenges of understanding and analyzing the wealth of gene sequence information now available. This includes on-boarding biology professionals who lack the necessary computational background to run their analyses on high-performance computing systems. Virtual machines help with the transition to command line use, software installation, and running analysis in the Linux environment (as used by most high-performance computing clusters). Jetstream (https://jetstream-cloud.org/) is a cloud computing resource that provides access to preconfigured virtual machines, making the transition relatively effortless and flattening the learning curve needed to get results from experiments that otherwise produce an untenable amount of data. Currently, over 14% of all allocations of usage on Jetstream are for biology other than protein folding – the majority of this being some sort of genomic analysis. NCGAS currently hosts over 123 genome analysis and bioinformatics software titles on Jetstream as preconfigured virtual machines. In this digital tool and resources workshop, we will demonstrate how to set up Jetstream accounts, start a preconfigured virtual machine, and run genomic analysis on this virtual machine (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php).
Jetstream also provides environments for prototyping and publishing tailored workflows that give researchers access to interactive computing and data analysis resources on demand.

Item Cluster Quick Guide (2016-08-03) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Cluster Quick Start (2017-07-12) Sanders, Sheri; Doak, Tom; Ganote, Carrie

Item Compute resources available to the research community for microbiome analysis (Plant and Animal Genome XXVII, 2019-01-16) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center tasked with assisting biologists in getting access to the computational resources they need in order to analyze genomic data. To support microbiome analysis, NCGAS provides preconfigured virtual machines (VMs) to identify taxa in 16S amplicon sequencing, and to identify both taxa and functions from whole genome metagenomes. Additionally, a pipeline to reconstruct genomes from metagenomes, to examine the role of specific microbes in a community, is available as a preconfigured VM hosting Anvi’o (https://ncgas.org/Blog_Posts/Running%20Anvio%20on%20Jetstream.php). Jetstream, a cloud computing resource, is easy to use and flattens the learning curve for using the Linux operating system and for installing bioinformatics software. Jetstream provides an environment for both prototyping and publishing tailored workflows. Through an NCGAS allocation, a researcher can get access to Jetstream and to other national compute clusters with more memory and support for parallel processing. These compute resources have Globus Connect subscriptions, which assist in transferring terabytes of data quickly.
In this workshop, we will demonstrate how to get an NCGAS allocation, set up a Jetstream account, spin up a preconfigured virtual machine for microbiome analysis (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php), and transfer data between compute clusters using Globus (https://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php).

Item Data Movement and Management (2016-08-04) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item deNovo Transcriptome Assembly (2017-07-13) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item The Genome of Fish Tapeworm Nippotaenia percotti as a Potential Bookmark for Gene Loci that Facilitates Anthropogenic Infection. (Plant and Animal Genome XXVII, 2019-01-14) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom; Chafin, Tyler; Reshetnikov, Andrey; Sokolov, Sergey; Pummil, Jeff; Douglas, Marlis; Douglas, Michael
Tapeworms (Cestoda) are endoparasites infecting all vertebrates. Most (>50%) of their described diversity is within the clade Cyclophyllidea, a common, chronic source of anthropogenic infection. We report the first sequenced genome for the tapeworm Nippotaenia percotti (Nippotaeniidea - the putative sister group to Cyclophyllidea), which is host-specific to a fish in the Amur River. The genome was derived for comparative purposes, to explore evolutionary change in functional gene loci of immunological import. Pooled individuals were sequenced on Illumina (HiSeq 2000) and PacBio (RSII), with additional RNAseq on the HiSeq 2500. Hybrid assemblies were completed in SPAdes with long-read scaffolding in LINKS. The assembly was further improved using Redundans and Pilon, generating 3,410 contigs at an N50 of 209,561 bp. Transcriptomes were assembled using a combined de novo approach (CDTA) with multiple assemblers and k-mers. Assembled transcripts were combined using EvidentialGene, producing 28,226 assembled transcripts at an N50 of 2,290 bp, then annotated using Trinotate.
The assembled genome was annotated using MAKER, identifying 30,671 genes, using our assembled transcriptome and genomes of closely related cestodes. Gene evolution was examined using 15 cestode genomes from the WormBase Parasite database, with the MCL algorithm identifying 16,099 orthologous gene clusters. Gene loss/gain was assessed by contrasting gene clusters with the cestode phylogenetic tree constructed with core genes identified by BUSCO, using IQ-Tree. Nippotaenia percotti’s genome provides a baseline for future investigations into candidate-gene families potentially involved with anthropogenic infection and may also support improvements in tapeworm treatment and control.

Item Introduction to Metagenomics (2018-10-23) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Moving Forward in Bioinformatics (2016-11-30) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Moving Off GUIs: A Guide to What’s Next (2016-10-27) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item The National Center for Genome Analysis Support (2017-05-15) Ganote, Carrie; Doak, Tom; Blood, Phil; Sanders, Sheri

Item National Center for Genome Analysis Support (NCGAS) use and development of Tripal Genome Browsers on XSEDE’s Jetstream (Plant and Animal Genomics 2018, 2018-01-14) Sanders, Sheri; Ganote, Carrie; Papudeshi, Bhavya; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) helps the biological community analyze, understand, and make use of the vast amount of genomic information now available. To this end, NCGAS develops and supports genome browsers for several genomics projects, using Tripal as its front end for the last year. Since the adoption of Tripal, we’ve modified some of the tools to develop features our users find useful. We will introduce our groups’ projects and give a demo of these tools in our browsers.
These tools include: 1) Modifications to the tripal_blast module to link BLAST reports to external JBrowse sites through URL manipulation, allowing visualization against potentially any browser with publicly available data with which to build a BLAST database. 2) A GUI-based web tool to spin up new JBrowse instances, with on-the-fly GUI-based track addition/removal, which allows for more flexible community visualization. 3) Finally, we will demo the virtual machine image built on the XSEDE cloud (Jetstream) with Tripal, JBrowse, and these tools installed - allowing for free genome browser hosting with minimal command line use. It is our hope that these tools will reduce the learning curve required to make use of Tripal genome visualization tools.

Item Navigating High Performance Computing (HPC) Resources (2017-04-12) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Navigating High Performance Computing (HPC) Resources (2018-10-04) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom

Item NCGAS makes robust transcriptome analysis easier with a readily usable workflow following de novo assembly best practices (Plant and Animal Genomics 2018, 2018-01-17) Sanders, Sheri; Ganote, Carrie; Papudeshi, Bhavya; Mockaitis, Keithanne; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) assists research groups with de novo transcriptome assembly. Best practices for such analyses include sample pooling, running multiple assembler algorithms with multiple parameters, combining the assemblies, and filtering redundant or erroneously assembled transcripts. These combined de novo transcriptome assemblies can put a technical burden on genomic researchers who may not be fully computationally trained on efficient use of HPC clusters. NCGAS has created a workflow template to move client data through 19 parallelized assemblies using four software packages (Trinity, SOAP-denovo, transABySS, and VelvetOases) and multiple k-mers.
The transcripts are then combined and filtered using EviGenes to output putative transcripts and alternative forms in a replicable manner. The process is semi-automated but flexible enough to allow researchers to adjust parameters if they desire. While the workflow was designed for IU machines and XSEDE’s Bridges, allocations on these machines are available to any genomics researcher in the US, and the job scripts can be easily adjusted for other job handlers/clusters. This workflow provides a low bar for entry into robust transcriptome assembly that follows best practices, while also providing a replicable means of filtering large numbers of transcripts into a draft version of a transcriptome. Scripts can be found at https://github.com/NCGAS/IndianaUniversity/tree/master/Transcriptome_Workflow_Mason.

Item NCGAS Makes Robust Transcriptome Assembly Even Easier with Added Features to an Accessible de novo Transcriptome Assembly Workflow (Plant and Animal Genome XXVII, 2019-01-12) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) assists research groups with de novo transcriptome assembly. Following best practices for combined de novo transcriptome assemblies can put a technical burden on genomic researchers who may not be fully computationally trained on efficient use of HPC clusters or the variety of available software packages. NCGAS has created a workflow template to move RNAseq data through 19 parallelized assemblies using four software packages (Trinity, SOAP-denovo, transABySS, and Velvet Oases) and multiple k-mers. The transcripts are then combined and filtered using EviGenes to output putative transcripts and alternative forms in a replicable manner. The process is semi-automated but flexible enough to allow researchers to adjust parameters if they desire.
This workflow provides a low bar for entry into robust transcriptome assembly that follows best practices, while also providing a replicable means of filtering large numbers of transcripts into a draft version of a transcriptome. We will highlight the main workflow in this demo but will concentrate on the additional features added to the workflow in the last year, including annotation via Trinotate, differential expression handling, and the automated creation of a table of assembly metrics via BUSCO and Quast for each sub-assembly. As this workflow has now been adopted by several groups, we will also discuss available training and current implementations of the tool.

Item NCGAS: National Resources for Computationally Intensive Bioinformatics (PEARC 2017, 2017-07) Doak, Tom; Ganote, Carrie; Sanders, Sheri; Hallock, Barb

Item Population Genetics of Tree Swallows, in Collaboration with NCGAS (Plant and Animal Genome XXVII, 2019-01-14) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom; Mansfield, Charles; Tseng, Chi Yen; Custer, Thomas; Custer, Christine; Matson, Cole
The National Center for Genome Analysis Support (NCGAS) provides training and computational resources in an effort to train biologists to approach historically difficult, non-model problems with large biological data sets. For example, our collaborators at Baylor University work with the Tree Swallow (Tachycineta bicolor), using RNAseq data in population genetics and toxicology. Working with NCGAS, they built a de novo transcriptome assembly for the Tree Swallow, for which there is no genome. Variant calling using the transcriptome identified 66,169 single nucleotide polymorphisms (SNPs) across 144 samples. They were then able to identify phylogeographic structuring across the Great Lakes Region, including accurate grouping of populations distributed across smaller geographic scales (e.g., along the Maumee River). SNPs were also used to assess population heterozygosity and genetic diversity.
This project required large-scale data handling, large-memory machines to assemble the transcriptome, and advanced Linux skills to manage the data and analyses. NCGAS provided the computational resources and training on the Linux environment and data management. Further assistance was provided in consultation and problem solving - leading to a high level of independence and competency of the graduate student researcher.

Item Pushing the limits of job flexibility on HPC (Galaxy Community Conference 2017, 2017-06-30) Ganote, Carrie; Sanders, Sheri; Doak, Tom; Brokaw, Cicada; Papudeshi, Bhavya; Haas, Brian; Bankapur, Asma; Tickle, Tim; Blood, Phil

Item RNA-Seq Demo on Galaxy (2018-10-25) Ganote, Carrie; Papudeshi, Bhavya; Sanders, Sheri; Doak, Tom