Browsing by Author "Sanders, Sheri"
Item Bioinformatic analysis using Jetstream, a cloud computing environment (Plant and Animal Genomics 2018, 2018-01-15) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Fischer, Jeremy; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) assists researchers in addressing the scientific challenges of understanding and analyzing the wealth of gene sequence information now available. This includes on-boarding biology professionals who lack the computational background needed to run their analyses on high-performance computing systems. Virtual machines help with the transition to command-line use, software installation, and running analyses in the Linux environment (which most high-performance computing clusters use). Jetstream (https://jetstream-cloud.org/) is a cloud computing resource that provides access to preconfigured virtual machines, making the transition relatively effortless and flattening the learning curve needed to get results from experiments that otherwise produce an untenable amount of data. Currently, over 14% of all usage allocations on Jetstream are for biology other than protein folding, the majority of this being some sort of genomic analysis. NCGAS currently hosts over 123 genome analysis and bioinformatics software titles on Jetstream as preconfigured virtual machines. In this digital tool and resources workshop, we will demonstrate how to set up Jetstream accounts, start a preconfigured virtual machine, and run genomic analysis on this virtual machine (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php).
Jetstream also provides environments for prototyping and publishing tailored workflows that give researchers access to interactive computing and data analysis resources on demand.

Item Cluster Quick Guide (2016-08-03) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Cluster Quick Start (2017-07-12) Sanders, Sheri; Doak, Tom; Ganote, Carrie

Item Compute resources available to the research community for microbiome analysis (Plant and Animal Genome XXVII, 2019-01-16) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Doak, Tom
The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center tasked with assisting biologists in getting access to the computational resources they need to analyze genomic data. To support microbiome analysis, NCGAS provides preconfigured virtual machines (VMs) to identify taxa in 16S amplicon sequencing, and to identify both taxa and functions from whole-genome metagenomes. Additionally, a pipeline to reconstruct genomes from metagenomes, to examine the role of specific microbes in a community, is available as a preconfigured VM hosting Anvi’o (https://ncgas.org/Blog_Posts/Running%20Anvio%20on%20Jetstream.php). Jetstream, a cloud computing resource, is easy to use and flattens the learning curve for using the Linux operating system and for installing bioinformatics software. Jetstream provides an environment for both prototyping and publishing tailored workflows. Through an NCGAS allocation, a researcher can get access to Jetstream and to other national compute clusters with more memory and support for parallel processing. These compute resources have Globus Connect subscriptions, which assist in transferring terabytes of data quickly.
In this workshop, we will demonstrate how to get an NCGAS allocation, set up a Jetstream account, spin up a preconfigured virtual machine for microbiome analysis (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php), and transfer data between compute clusters using Globus (https://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php).

Item Coupling metagenomics with high-performance computing to mine the Sequence Read Archive (SRA) to analyze Pseudomonas phage PAK-P1 (Center of Excellence for Women & Technology, 2019-04-12) Ganapaneni, Sruthi; Leffler, Haley; Papudeshi, Bhavya; Sanders, Sheri; Doak, Thomas

Item Data Movement and Management (2016-08-04) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item deNovo assembly and annotation of Ambystoma laterale and Ambystoma jeffersonianum transcriptomes: the first steps toward investigating polyploid salamander expression (2017-06-25) Sanders, Sheri; Pfrender, Michael
Polyploid Ambystoma salamanders are interesting as a stable polyploid vertebrate system. Polyploidy in vertebrates is not well understood, and the unique salamander complex features non-recombining genomes with varying levels of ploidy and varying contributions from parental species. We assembled and annotated the transcriptomes of the two major contributors to this complex, Ambystoma jeffersonianum and Ambystoma laterale, in preparation for investigating how the genomes respond to diverse polyploidy. We assembled each transcriptome using four assemblers (Velvet, SOAPdeNOVO, Trinity, and TransAbyss). We then curated and annotated both transcriptomes using tr2aacds and Trinotate. Orthologs and differentially expressed genes between the two parental species were identified to establish a baseline against which to compare the polyploids. Additionally, we developed an immune inventory for the two transcriptomes.
The annotation will be useful in the next steps as we investigate the polyploid salamander libraries to determine the effect of polyploidy on expression.

Item deNovo Transcriptome Assembly (2017-07-13) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Developing a workflow for bioacoustic recording devices and frog call analysis within Jetstream (Center of Excellence for Women & Technology, 2019-04-12) Foran, Eliza; Anderson, Jazzly; Slayton, Thomas; Guido, Emmanuel; Doak, Thomas; Sanders, Sheri

Item Finding Effector III Genes in phytophthora infestans and magnaporthe oryzae Using Machine Learning (2021-08-6) Campbell, Christine; Cooper, Lyric; Snapp-Childs, Winona; Sanders, Sheri

Item The Genome of Fish Tapeworm Nippotaenia percotti as a Potential Bookmark for Gene Loci that Facilitates Anthropogenic Infection. (Plant and Animal Genome XXVII, 2019-01-14) Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom; Chafin, Tyler; Reshetnikov, Andrey; Sokolov, Sergey; Pummil, Jeff; Douglas, Marlis; Douglas, Michael
Tapeworms (Cestoda) are endoparasites infecting all vertebrates. Most (>50%) of their described diversity is within the clade Cyclophyllidea, a common, chronic source of anthropogenic infection. We report the first sequenced genome for the tapeworm Nippotaenia percotti (Nippotaeniidea, the putative sister group to Cyclophyllidea), which is host-specific to a fish in the Amur River. The genome was derived for comparative purposes, to explore evolutionary change in functional gene loci of immunological import. Pooled individuals were sequenced on Illumina (HiSeq 2000) and PacBio (RSII), with additional RNAseq on the HiSeq 2500. Hybrid assemblies were completed in SPAdes with long-read scaffolding in LINKS. The assembly was further improved using Redundans and Pilon, generating 3,410 contigs at an N50 of 209,561 bp. Transcriptomes were assembled using a combined de novo approach (CDTA) with multiple assemblers and k-mers.
Assembled transcripts were combined using EvidentialGene, producing 28,226 assembled transcripts at an N50 of 2,290 bp, then annotated using Trinotate. The assembled genome was annotated using MAKER, identifying 30,671 genes, using our assembled transcriptome and genomes of closely related cestodes. Gene evolution was examined using 15 cestode genomes from the WormBase Parasite database, with the MCL algorithm identifying 16,099 orthologous gene clusters. Gene loss/gain was assessed by contrasting gene clusters with the cestode phylogenetic tree constructed with core genes identified by BUSCO, using IQ-Tree. Nippotaenia percotti’s genome provides a baseline for future investigations into candidate-gene families potentially involved with anthropogenic infection and could also support improvements in tapeworm treatment and control.

Item Harvesting Field Station Data: Automating Data Flow from Raspberry Pi Sensors to Collaborative Websites (Annual Meeting of the Organization of Biological Field Stations, 2018-09-22) Sanders, Sheri; Guido, Emmanuel; Anderson, Jazzly; Slayton, Thomas; Doak, Thomas G.
Field stations increasingly leverage remote sensors for large-scale environmental data collection. Here we demonstrate a proof-of-concept workflow that runs from data collection by remote sensors to presentation of summary results on a remote, and therefore fast and stable, cloud server. Environmental data is collected via Raspberry Pis in several locations, and the data is streamed to the server on XSEDE's Jetstream, housed in part at Indiana University, through low-bandwidth messaging. The Jetstream cloud server does all the heavy lifting: exporting the data into a database, running automatically updating summary scripts to produce graphs, and hosting a Drupal-based website to present the data to collaborators or the public.
While we use compact data in our demo, larger databases can be backed up on XSEDE's Wrangler, a large-scale storage server also housed in part at Indiana University. The end product is automatic aggregation and backup of sensor data onto a stable website that does not require an in-house server or large bandwidth on-site. This workflow is packaged into a ready-to-use and publicly available Jetstream image, meaning researchers could use their own sensors and R code for custom graphs with very little setup. Alternatively, the image can be used to house and display larger-scale databases from other data types, such as audio recordings or photography. Future work will focus on developing the ability to "pick up" data via drone fly-over and on aggregating citizen science data from multiple sites.

Item Harvesting Field Station Data: Automating Raspberry Pi Sensors to Collaborative Websites, and Update (2019-09-09) Sanders, Sheri; Foran, Eliza; Guido, Emmanuel; Anderson, Jazzly; Slayton, Thomas; Doak, Thomas
Field stations increasingly leverage remote sensors for large-scale environmental data collection. Here we demonstrate a proof-of-concept workflow that runs from data collection by remote sensors to presentation of summary results on a remote, and therefore fast and stable, cloud server. Environmental data is collected via Raspberry Pis in several locations, and the data is streamed to the server on XSEDE's Jetstream, housed at Indiana University, through low-bandwidth messaging. The Jetstream server does all the heavy lifting: exporting the data into a database, running automatically updating summary scripts to produce graphs, and hosting a Drupal-based website to present the data to collaborators or the public. While we use compact data in our demo, larger databases can be backed up on XSEDE's Wrangler, also housed at Indiana University.
The end product is automatic aggregation and backup of sensor data onto a stable website that does not require an in-house server or large bandwidth on-site. This setup is packaged into a ready-to-use and publicly available Jetstream image, meaning researchers could use their own sensors and R code for custom graphs with very little individual setup. Alternatively, the setup can be used to house and display larger-scale databases from other data types, such as audio recordings or photography. Future work will focus on developing the ability to "pick up" data via drone fly-over and on aggregating citizen science data from multiple sites.

Item Harvesting Field Station Data; Raspberry Pi Sensors to Jetstream Databases (2018-07-24) Anderson, Jazzly; Slayton, Thomas; Guido, Emmanuel; Doak, Thomas; Sanders, Sheri; Walker, Tony
This paper was given at the PEARC18 conference in Pittsburgh, PA on July 24, 2018.

Item Introduction to Metagenomics (2018-10-23) Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Mining Microbial Genomes from Datasets on the Sequence Read Archive (2020-01-15) Papudeshi, Bhavya; Leffler, Haley; Ganapanei, Sruthi; Sanders, Sheri; Ganote, Carrie; Doak, Thomas
The declining costs of genome sequencing and growing amounts of genetic data have allowed the field of genomics to become more integrated with computational analysis. The use of high-performance computing (HPC) clusters is necessary to process the large amounts of data in genomic projects; however, many biologists lack background experience in working with HPC systems, which limits their ability to best address their research questions. The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center that focuses on filling this need by providing training through workshops, bioinformatics support on projects, and access to compute resources. As a byproduct of helping research projects, we develop open-source workflows and make them available to the community.
Here we present a workflow that assists researchers in mining the Sequence Read Archive (SRA) to identify environments or datasets potentially containing genomes of interest, and to identify their closely related genomes. As a proof of concept, we tested the workflow with two genomes, selected to ensure that the workflow is flexible enough to generate results in formats amenable to further downstream analysis, depending on the research question. The pipeline is available through GitHub (https://github.com/NCGAS/CEWiT-REU-Identifying-datasets-in-SRA-using-Jetstream) and as a pre-installed workflow on the XSEDE Jetstream cloud computing infrastructure.

Item Mining the Sequence Read Archive to identify crAssphage, a ubiquitous inhabitant of the human microbiome, in dog and pig samples (Center of Excellence for Women & Technology, 2019-04-12) Leffler, Haley; Ganapaneni, Sruthi; Papudeshi, Bhavya; Sanders, Sheri; Doak, Thomas

Item Moving Forward in Bioinformatics (2016-11-30) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item Moving Off GUIs: A Guide to What’s Next (2016-10-27) Sanders, Sheri; Ganote, Carrie; Doak, Tom

Item The National Center for Genome Analysis Support (2017-05-15) Ganote, Carrie; Doak, Tom; Blood, Phil; Sanders, Sheri