Presentations
Permanent link for this collection: https://hdl.handle.net/2022/14534
Browsing Presentations by Author "Doak, Thomas"
- ACI-REF Mission: User Sensitivity 101 (2015-09-02). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas
- Automating work in Galaxy (2015-08-12). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas
- Coupling metagenomics with high-performance computing to mine the Sequence Read Archive (SRA) to analyze Pseudomonas phage PAK-P1 (Center of Excellence for Women & Technology, 2019-04-12). Ganapaneni, Sruthi; Leffler, Haley; Papudeshi, Bhavya; Sanders, Sheri; Doak, Thomas
- Developing a workflow for bioacoustic recording devices and frog call analysis within Jetstream (Center of Excellence for Women & Technology, 2019-04-12). Foran, Eliza; Anderson, Jazzly; Slayton, Thomas; Guido, Emmanuel; Doak, Thomas; Sanders, Sheri
- Galaxy for Data Provenance (2014-07-16). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas
- Genome and transcriptome analysis of fish tapeworm Nippotaenia percotti through scientific collaboration between research labs and national cyberinfrastructure (2020-01-13). Papudeshi, Bhavya; Chafin, Tyler K; Sanders, Sheri A; Ganote, Carrie; Reshetnikov, Andrey N; Sokolov, Sergey G; Doak, Thomas; Pummill, Jeff F; Douglas, Marlis R.; Douglas, Michael E.
Tapeworms (Cestoda) are endoparasites that infect all vertebrates. Most (>50%) of their described diversity lies within the clade Cyclophyllidea, a common, chronic source of anthropogenic infection. We report the first sequenced genome for the tapeworm Nippotaenia percotti (Nippotaeniidea, the putative sister group to Cyclophyllidea), which is host-specific to a fish in the Amur River. The genome was generated for comparative purposes, to explore evolutionary change in functional gene loci of immunological importance. Pooled individuals were sequenced on Illumina (HiSeq 2000) and PacBio (RSII), with additional RNA-seq on the HiSeq 2500. Hybrid assemblies were built in SPAdes, with long-read scaffolding in LINKS. The assembly was further improved with Redundans and Pilon, yielding 3,410 contigs with an N50 of 209,561 bp.
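N50, the assembly statistic quoted in this abstract, is the length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below computes it from a list of contig lengths using that standard definition. It is an illustration only, not code from the presentation, and the toy contig lengths are invented rather than taken from the N. percotti assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    hold at least half of the total number of assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases until
    # the running sum reaches half of the total.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: 300 bases in total, so half is 150.
# 100 + 80 = 180 >= 150, so the N50 is 80.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # prints 80
```

Comparing `2 * running` with `total` avoids rounding issues when the total number of bases is odd.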
Transcriptomes were assembled using a combined de novo approach (CDTA) with multiple assemblers and k-mer sizes. The assembled transcripts were merged with EvidentialGene, producing 28,226 transcripts with an N50 of 2,290 bp, and then annotated with Trinotate. The genome was annotated with MAKER, using our assembled transcriptome and the genomes of closely related cestodes, which identified 30,671 genes. Gene evolution was examined across 15 cestode genomes from the WormBase ParaSite database, with the MCL algorithm identifying 16,099 orthologous gene clusters. Gene gain and loss were assessed by comparing these clusters against a cestode phylogeny constructed in IQ-TREE from core genes identified by BUSCO. The Nippotaenia percotti genome provides a baseline for future investigations of candidate gene families potentially involved in anthropogenic infection, and may also support improvements in tapeworm treatment and control.
- Harvesting Field Station Data: Automating Raspberry Pi Sensors to Collaborative Websites, and Update (2019-09-09). Sanders, Sheri; Foran, Eliza; Guido, Emmanuel; Anderson, Jazzly; Slayton, Thomas; Doak, Thomas
Field stations increasingly use remote sensors for large-scale environmental data collection. Here we demonstrate a proof-of-concept workflow that runs from data collection by remote sensors to the presentation of summary results on a remote, and therefore fast and stable, cloud server. Environmental data are collected by Raspberry Pis at several locations and streamed through low-bandwidth messaging to the server on XSEDE's Jetstream, housed at Indiana University. The Jetstream server does all the heavy lifting: it loads the data into a database, runs automatically updated summary scripts to produce graphs, and hosts a Drupal-based website that presents the data to collaborators or the public. While our demo uses compact data, larger databases can be backed up on XSEDE's Wrangler, also housed at Indiana University.
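The abstract does not specify the message format or the database used on the Jetstream server. As a hedged illustration of how compact messages keep on-site bandwidth needs low, the sketch below packs each reading into a fixed 10-byte binary record on the sensor side, then decodes it on the server side, stores it in SQLite, and computes the kind of per-station summary a regularly re-run script could graph. The record layout, the field names, the helper functions (`pack_reading`, `ingest`, `station_means`), and the choice of SQLite are all assumptions for illustration, not the presenters' implementation.

```python
import sqlite3
import struct

# Hypothetical fixed-width record (an assumption, not the presenters' format):
# station id (uint16), Unix timestamp (uint32), temperature in hundredths of
# a degree C (int16), relative humidity in tenths of a percent (uint16).
# "!" selects network byte order with no padding, so each record is 10 bytes.
RECORD = struct.Struct("!HIhH")


def pack_reading(station_id, timestamp, temp_c, humidity_pct):
    """Sensor side (e.g. a Raspberry Pi): encode one reading as 10 bytes."""
    return RECORD.pack(station_id, timestamp,
                       round(temp_c * 100), round(humidity_pct * 10))


def ingest(conn, payload):
    """Server side: decode one message and store it as a database row."""
    station_id, ts, temp_centi, hum_deci = RECORD.unpack(payload)
    conn.execute(
        "INSERT INTO readings (station, ts, temp_c, humidity_pct) "
        "VALUES (?, ?, ?, ?)",
        (station_id, ts, temp_centi / 100, hum_deci / 10),
    )
    conn.commit()


def station_means(conn):
    """A summary a periodically re-run script might graph: mean temperature per station."""
    return conn.execute(
        "SELECT station, AVG(temp_c) FROM readings GROUP BY station ORDER BY station"
    ).fetchall()


# Usage with made-up readings from two hypothetical stations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (station INTEGER, ts INTEGER, temp_c REAL, humidity_pct REAL)"
)
for msg in (
    pack_reading(1, 1568000000, 21.5, 55.0),
    pack_reading(1, 1568000600, 22.5, 54.2),
    pack_reading(2, 1568000000, 18.25, 70.1),
):
    ingest(conn, msg)

print(station_means(conn))  # prints [(1, 22.0), (2, 18.25)]
```

Fixed-width binary records trade readability for size: the same reading sent as JSON text would be several times larger. A real deployment would also need a transport between the sensors and the server and some error handling, both of which this sketch omits.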
The end product is automatic aggregation and backup of sensor data onto a stable website that requires neither an in-house server nor high bandwidth on-site. This setup is packaged as a ready-to-use, publicly available Jetstream image, so researchers can use their own sensors and R code for custom graphs with very little individual setup. Alternatively, the setup can be used to house and display larger-scale databases of other data types, such as audio recordings or photographs. Future work will develop the ability to "pick up" data via drone fly-overs and to aggregate citizen-science data from multiple sites.
- Intro to Bioinformatics - Assembling a Transcriptome (2013-06-13). Ganote, Carrie L.; Doak, Thomas
- Intro to Using Galaxy for Bioinformatics (2013-09-17). Ganote, Carrie L.; Doak, Thomas
- Introduction to Galaxy 2015 (2015-08-12). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas
- Mining Microbial Genomes from Datasets on the Sequence Read Archive (2020-01-15). Papudeshi, Bhavya; Leffler, Haley; Ganapaneni, Sruthi; Sanders, Sheri; Ganote, Carrie; Doak, Thomas
The declining cost of genome sequencing and the growing volume of genetic data have allowed the field of genomics to become more closely integrated with computational analysis. High-performance computing (HPC) clusters are necessary to process the large amounts of data in genomics projects; however, many biologists lack experience working with HPC systems, which limits their ability to address their research questions. The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center that fills this need by providing training through workshops, bioinformatics support for projects, and access to compute resources. As a byproduct of supporting research projects, we develop open-source workflows and make them available to the community.
Here we present a workflow that helps researchers mine the Sequence Read Archive (SRA) to identify environments or datasets likely to contain genomes of interest, and to identify closely related genomes. As a proof of concept, we tested the workflow with two genomes, chosen to show that it can flexibly generate results in formats amenable to further downstream analysis, depending on the research question. The pipeline is available on GitHub (https://github.com/NCGAS/CEWiT-REU-Identifying-datasets-in-SRA-using-Jetstream) and as a pre-installed workflow on the XSEDE Jetstream cloud computing infrastructure.
- Mining the Sequence Read Archive to identify crAssphage, a ubiquitous inhabitant of the human microbiome, in dog and pig samples (Center of Excellence for Women & Technology, 2019-04-12). Leffler, Haley; Ganapaneni, Sruthi; Papudeshi, Bhavya; Sanders, Sheri; Doak, Thomas
- Moving Large Data to Galaxy (2014-07-16). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas
- The National Center for Genome Analysis Support (2014-11-16). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas; Barnett, William
- National Center for Genome Analysis Support (NCGAS): Genomics and other Science in the NSF-Funded Jetstream Cloud (2020-01-13). Doak, Thomas; Sanders, Sheri; Ganote, Carrie; Papudeshi, Bhavya; Fischer, Jeremy; Hancock, David Y.
The National Center for Genome Analysis Support (NCGAS) is an NSF-funded (NSF-1445604) center that helps all NSF-funded researchers doing genomics research. Here genomics includes transcriptomics, metagenomics, genome annotation, and related fields. Our support includes providing access to large-memory computing, maintaining curated sets of genomics applications, providing one-on-one consultation, and creating educational opportunities.
A resource we have come to rely on for these services is the NSF-funded Jetstream cloud, maintained by Indiana University; the project is led by the Indiana University Pervasive Technology Institute (PTI) together with the University of Texas at Austin's Texas Advanced Computing Center (TACC). We also leverage Globus data transfer tools. Globus at the University of Chicago is responsible for integrating Jetstream with the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE), and for integrating Globus data movement and management tools and Globus-based secure user authentication. With a focus on ease of use and broad accessibility, Jetstream is designed for researchers who have not previously used high-performance computing (HPC) and software resources, and who need more than desktop-strength computing but less than full-scale HPC. Jetstream's web-based user interface is based on the popular Atmosphere cloud computing environment developed by CyVerse, extended to support science and engineering research generally. The system is particularly geared toward 21st-century workforce development at small colleges and universities, especially historically Black colleges and universities, minority-serving institutions, tribal colleges, and higher-education institutions in EPSCoR states. Jetstream provides a library of virtual machines (VMs) designed for discipline-specific scientific analysis, and researchers can also build their own VMs with their own software sets or with software specialized for a particular task. These VMs can be saved and shared with collaborators. There are currently 19 genomics VMs, including RStudio instances with Bioconductor, ready-made genome browsers with JBrowse/Tripal, and metagenomic tools such as QIIME 2 and Anvi’o. Biology and molecular biology researchers are Jetstream's largest user group.
NCGAS has found VMs extremely useful in education and workshops: we build a class-specific VM with all the applications needed and then clone it, so that each student has their own VM to work on, which makes courses easy to scale. In addition to on-demand VMs, persistent science gateways can be established from template VMs that NCGAS has built; these can provide services to collaborators or to the public. Users can easily create Galaxy servers on Jetstream: each server comes preconfigured with hundreds of tools and commonly used reference datasets, and once it is running, researchers can use it as is or customize it. Many NCGAS users set up organism-specific genome browsers that they share with small groups of collaborating researchers, though these can also be made public. Jetstream is accessed through an XSEDE allocation; a startup allocation is typically approved within a day.
- The National Center for Genomic Analysis Support: creating a national cyberinfrastructure environment for genomics researchers (2015-05-16). Barnett, William; Doak, Thomas; Wu, Le-Shin; Ganote, Carrie
- Quality Control and Assessment of RNA-Seq Data (2014-07-22). Ganote, Carrie; Podicheti, Ram; Wu, Le-Shin; Doak, Thomas
- RNA-Seq Demo on Galaxy (2014-07-16). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas
- RNA-Seq Demo on Galaxy (2015-08-12). Ganote, Carrie; Wu, Le-Shin; Doak, Thomas
- Using National Cyberinfrastructure (2014-09-04). Ganote, Carrie; Doak, Thomas