Browsing by Author "Ganote, Carrie"
Now showing 1 - 20 of 60
ACI-REF Mission: User Sensitivity 101 (2015-09-02)
Ganote, Carrie; Wu, Le-Shin; Doak, Thomas

Automating work in Galaxy (2015-08-12)
Ganote, Carrie; Wu, Le-Shin; Doak, Thomas

Bioinformatic analysis using Jetstream, a cloud computing environment (Plant and Animal Genomics 2018, 2018-01-15)
Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Fischer, Jeremy; Doak, Tom

The National Center for Genome Analysis Support (NCGAS) assists researchers in addressing the scientific challenges of understanding and analyzing the wealth of gene sequence information now available. This includes onboarding biology professionals who lack the computational background needed to run their analyses on high-performance computing systems. Virtual machines ease the transition to command-line use, software installation, and running analyses in the Linux environment used on most high-performance computing clusters. Jetstream (https://jetstream-cloud.org/) is a cloud computing resource that provides access to preconfigured virtual machines, making the transition relatively effortless and flattening the learning curve for getting results from experiments that otherwise produce an untenable amount of data. Currently, over 14% of all Jetstream allocations are for biology other than protein folding, most of it some form of genomic analysis. NCGAS currently hosts over 123 genome analysis and bioinformatics software titles on Jetstream as preconfigured virtual machines. In this digital tools and resources workshop, we will demonstrate how to set up Jetstream accounts, start a preconfigured virtual machine, and run genomic analyses on that virtual machine (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php).
Jetstream also provides environments for prototyping and publishing tailored workflows that give researchers on-demand access to interactive computing and data analysis resources.

Cluster Quick Guide (2016-08-03)
Sanders, Sheri; Ganote, Carrie; Doak, Tom

Cluster Quick Start (2017-07-12)
Sanders, Sheri; Doak, Tom; Ganote, Carrie

Compute resources available to the research community for microbiome analysis (Plant and Animal Genome XXVII, 2019-01-16)
Papudeshi, Bhavya; Sanders, Sheri; Ganote, Carrie; Doak, Tom

The National Center for Genome Analysis Support (NCGAS) is an NSF-funded center tasked with helping biologists access the computational resources they need to analyze genomic data. To support microbiome analysis, NCGAS provides preconfigured virtual machines (VMs) that identify taxa from 16S amplicon sequencing, and both taxa and functions from whole-genome metagenomes. Additionally, a pipeline that reconstructs genomes from metagenomes, to examine the role of specific microbes in a community, is available as a preconfigured VM hosting Anvi’o (https://ncgas.org/Blog_Posts/Running%20Anvio%20on%20Jetstream.php). Jetstream, a cloud computing resource, is easy to use and flattens the learning curve for using the Linux operating system and installing bioinformatics software. Jetstream provides an environment for both prototyping and publishing tailored workflows. Through an NCGAS allocation, a researcher can get access to Jetstream and to other national compute clusters that offer more memory and support parallel processing. These compute resources have Globus Connect subscriptions, which help transfer terabytes of data quickly.
In this workshop, we will demonstrate how to get an NCGAS allocation, set up a Jetstream account, spin up a preconfigured virtual machine for microbiome analysis (https://ncgas.org/Blog_Posts/Getting%20Started%20on%20Jetstream.php), and transfer data between compute clusters using Globus (https://ncgas.org/Blog_Posts/Getting%20Started%20with%20Globus.php).

Computing challenges in working with genomics-scale data (2014-07-23)
Wu, Le-Shin; Ganote, Carrie

Introduction to the computing challenges in working with genomics-scale data and their possible solutions.

Cyberinfrastructure resources enabling creation of the loblolly pine reference transcriptome (ACM, New York, NY, 2015-07-26)
Wu, Le-Shin; Ganote, Carrie; Doak, Thomas; Barnett, William K.; Mockaitis, Keithanne; Stewart, Craig A.

Today's genomics technologies generate more sequence data than ever before, at substantially lower cost, serving researchers across biological disciplines in transformative ways. Building transcriptome assemblies from RNA sequencing reads is one application of next-generation sequencing (NGS) that has played a central role in biological discovery in both model and non-model organisms, with and without whole-genome sequence references. A major limitation in building transcriptome references is no longer generating the sequencing data itself, but the computing infrastructure and expertise needed to assemble, analyze, and manage the data. Here we describe a currently available resource dedicated to these goals, and its use for extensive RNA assembly of up to 1.3 billion reads representing the massive transcriptome of loblolly pine, using four major assembly software packages. The Mason cluster, an XSEDE second-tier resource at Indiana University, provides the fast CPU cycles, large memory, and high I/O throughput needed for large-scale genomics research.
The National Center for Genome Analysis Support, or NCGAS, provides technical support in using HPC systems, bioinformatic support for choosing the appropriate method to analyze a given dataset, and practical assistance in running computations. We demonstrate that a sufficient supercomputing resource and good workflow design are essential to large eukaryotic genomics and transcriptomics projects such as the complex loblolly pine transcriptome, whose gene expression data inform annotation and functional interpretation of the largest genome sequence reference to date.

Cyberinfrastructure Resources for Genomics Research (2014-10)
Hallock, Barbara; Ganote, Carrie; Pespeni, Melissa

New DNA sequencing technologies are generating more sequence data, faster and more cheaply. But there is a catch: the reads are shorter and the base calls have higher error rates, so the computational challenge of assembling a full genome from the sequence data is also greater. In this poster, we examine cyberinfrastructure resources available to researchers undertaking genomics work, and present a case study illustrating how one lab currently uses these resources.

Data Movement and Management (2016-08-04)
Sanders, Sheri; Ganote, Carrie; Doak, Tom

deNovo Transcriptome Assembly (2017-07-13)
Sanders, Sheri; Ganote, Carrie; Doak, Tom

Galaxy based BLAST submission to distributed national high throughput computing resources (2013-03)
Hayashi, Soichi; Gesing, Sandra; Quick, Rob; Teige, Scott; Ganote, Carrie; Wu, Le-shin; Prout, Elizabeth

To assist the bioinformatics community in leveraging the national cyberinfrastructure, the National Center for Genome Analysis Support (NCGAS), along with Indiana University's High Throughput Computing (HTC) group, has engineered a method for using Galaxy to submit BLAST jobs to the Open Science Grid (OSG).
OSG is a collaboration of resource providers that use opportunistic cycles at more than 100 universities and research centers in the US. BLAST jobs make up a significant portion of the research conducted on NCGAS resources; moving jobs that suit an HTC environment to the national cyberinfrastructure would relieve load on NCGAS resources and provide a cost-effective way to get more cycles to meet the unmet needs of bioinformatics researchers. To date, researchers have tackled this issue by purchasing additional resources or enlisting collaborators doing the same type of research, while HTC experts have focused on expanding the resources available to science workflows that have historically been HTC-friendly. In this paper, we bring together expertise from both areas to show how a bioinformatics researcher using their normal interface, Galaxy, can seamlessly access the OSG, which routinely supplies researchers with millions of compute hours daily. Efficient use of these resources will supply additional compute time to researchers and help meet the as-yet-unmet need for BLAST computing cycles.

Galaxy Deployment on Heterogenous Hardware (2014-07-01)
Ganote, Carrie; Hayashi, Soichi

Indiana University, like many institutions, houses a heterogeneous mixture of compute resources. In addition to university resources, the National Center for Genome Analysis Support, the Extreme Science and Engineering Discovery Environment, and the Open Science Grid all provide resources to biologists with NSF affiliations. Such a diverse mixture of compute power and services could be applied to the equally diverse set of problems and needs in the bioinformatics field. Some software, such as phylogenetic tree-building programs, is well suited to large numbers of fast CPUs. De novo assembly problems need machines with plenty of RAM to spare.
Alignment and mapping problems, in which each input is a separate invocation, lend themselves perfectly to high-throughput, heavily distributed compute systems. Galaxy is a web interface that mediates between the biologist and the underlying hardware and software; in an ideal setup, Galaxy would delegate work to the best-suited underlying infrastructure. We present an instance of Galaxy at Indiana University, installed and maintained by NCGAS, that takes advantage of a variety of compute resources to increase utilization and efficiency. The OSG is a distributed grid through which BLAST jobs can be run. IU, NCGAS, and XSEDE jointly support Mason, a system with 512 GB of memory per node. For IU users, Big Red 2 is the first university-owned petaFLOPS machine. Connecting these resources to Galaxy and using the best tool for each job yields the best performance and utilization: everyone wins.

Galaxy Deployments at Indiana University (2015-04-16)
Ganote, Carrie

A short overview of the different Galaxy instances at IU.

Galaxy for Data Provenance (2014-07-16)
Ganote, Carrie; Wu, Le-Shin; Doak, Thomas

Genome and transcriptome analysis of fish tapeworm Nippotaenia percotti through scientific collaboration between research labs and national cyberinfrastructure (2020-01-13)
Papudeshi, Bhavya; Chafin, Tyler K; Sanders, Sheri A; Ganote, Carrie; Reshetnikov, Andrey N; Sokolov, Sergey G; Doak, Thomas; Pummill, Jeff F; Douglas, Marlis R.; Douglas, Michael E.

Tapeworms (Cestoda) are endoparasites infecting all vertebrates. Most (>50%) of their described diversity lies within the clade Cyclophyllidea, a common, chronic source of anthropogenic infection. We report the first sequenced genome for the tapeworm Nippotaenia percotti (Nippotaeniidea, the putative sister group to Cyclophyllidea), which is host-specific to a fish in the Amur River. The genome was derived for comparative purposes, to explore evolutionary change in functional gene loci of immunological import.
Pooled individuals were sequenced on Illumina (HiSeq 2000) and PacBio (RSII), with additional RNA-seq on the HiSeq 2500. Hybrid assemblies were completed in SPAdes with long-read scaffolding in LINKS. The assembly was further improved using Redundans and Pilon, yielding 3,410 contigs with an N50 of 209,561 bp. Transcriptomes were assembled using a combined de novo approach (CDTA) with multiple assemblers and k-mer sizes. Assembled transcripts were combined using EvidentialGene, producing 28,226 transcripts with an N50 of 2,290 bp, which were then annotated using Trinotate. The assembled genome was annotated using MAKER, with our assembled transcriptome and the genomes of closely related cestodes, identifying 30,671 genes. Gene evolution was examined using 15 cestode genomes from the WormBase ParaSite database, with the MCL algorithm identifying 16,099 orthologous gene clusters. Gene loss and gain were assessed by contrasting gene clusters with a cestode phylogenetic tree built in IQ-TREE from core genes identified by BUSCO. The Nippotaenia percotti genome provides a baseline for future investigations into candidate gene families potentially involved in anthropogenic infection, and could also support improvements in tapeworm treatment and control.

The Genome of Fish Tapeworm Nippotaenia percotti as a Potential Bookmark for Gene Loci that Facilitates Anthropogenic Infection (Plant and Animal Genome XXVII, 2019-01-14)
Sanders, Sheri; Papudeshi, Bhavya; Ganote, Carrie; Doak, Tom; Chafin, Tyler; Reshetnikov, Andrey; Sokolov, Sergey; Pummil, Jeff; Douglas, Marlis; Douglas, Michael

Tapeworms (Cestoda) are endoparasites infecting all vertebrates. Most (>50%) of their described diversity lies within the clade Cyclophyllidea, a common, chronic source of anthropogenic infection. We report the first sequenced genome for the tapeworm Nippotaenia percotti (Nippotaeniidea, the putative sister group to Cyclophyllidea), which is host-specific to a fish in the Amur River.
The genome was derived for comparative purposes, to explore evolutionary change in functional gene loci of immunological import. Pooled individuals were sequenced on Illumina (HiSeq 2000) and PacBio (RSII), with additional RNA-seq on the HiSeq 2500. Hybrid assemblies were completed in SPAdes with long-read scaffolding in LINKS. The assembly was further improved using Redundans and Pilon, yielding 3,410 contigs with an N50 of 209,561 bp. Transcriptomes were assembled using a combined de novo approach (CDTA) with multiple assemblers and k-mer sizes. Assembled transcripts were combined using EvidentialGene, producing 28,226 transcripts with an N50 of 2,290 bp, which were then annotated using Trinotate. The assembled genome was annotated using MAKER, with our assembled transcriptome and the genomes of closely related cestodes, identifying 30,671 genes. Gene evolution was examined using 15 cestode genomes from the WormBase ParaSite database, with the MCL algorithm identifying 16,099 orthologous gene clusters. Gene loss and gain were assessed by contrasting gene clusters with a cestode phylogenetic tree built in IQ-TREE from core genes identified by BUSCO. The Nippotaenia percotti genome provides a baseline for future investigations into candidate gene families potentially involved in anthropogenic infection, and could also support improvements in tapeworm treatment and control.

Introducing CAFE: Computational Analysis of (gene) Family Evolution (Plant and Animal Genomics 2018, 2018-01-15)
Ganote, Carrie; Mendes, Fabio; Henschel, Robert; Hahn, Matthew; Fulton, Ben

Comparison of whole genomes has revealed large and frequent changes in the size of gene families, the result of gene duplication and loss. Comparative genomic analyses allow us to identify large-scale patterns of change and to make inferences about the role of natural selection in gene gain and loss.
But genome assemblies constructed from these data are often fragmented and incomplete, leading to annotation errors, especially in the number of genes present in a genome. To make these analyses possible, we have developed a stochastic birth-and-death model of gene family evolution, implemented in the software package CAFE, which is robust to less-than-ideal assemblies. Applying this method to multiple whole genomes from many groups has revealed remarkable patterns of gene gain and loss, including gene movement among chromosomes (especially sex chromosomes) and polymorphic copy-number variants under local selection, and has provided novel methods for genome assembly that estimate gene number more accurately. We will describe the application of CAFE to genome sets and illustrate the conclusions possible from CAFE analysis. The demonstration will use a publicly available VM running CAFE, posted on the Jetstream cloud.

Introduction to Galaxy 2015 (2015-08-12)
Ganote, Carrie; Wu, Le-Shin; Doak, Thomas

Introduction to genomics software use on high performance computing systems (2014-07-22)
Wu, Le-Shin; Ganote, Carrie

Introduction to running genomics software on high-performance computing systems.
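The Nippotaenia percotti abstracts above summarize assembly contiguity with N50 (209,561 bp for the 3,410-contig genome assembly; 2,290 bp for the transcript set). As a hedged illustration of that metric, and not code from the authors' pipeline, the following Python sketch computes N50 under its usual definition: sort lengths from longest to shortest and return the length at which the running total first reaches half of all assembled bases. The function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or transcript) lengths.

    Hypothetical helper for illustration: N50 is the length L such that
    sequences of length >= L together contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid rounding issues on odd totals.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy example: 54 bases in total, and 10 + 9 + 8 = 27 reaches half.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

Because N50 depends only on the length distribution, two assemblies with the same total size can differ sharply in N50; it says nothing about whether contigs are correctly joined.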
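The CAFE abstract above describes a stochastic birth-and-death model of gene family size. A minimal sketch of such a process, assuming the simplification of a single rate λ shared by gene gain (duplication) and gene loss, is a Gillespie-style simulation: each of the n copies in a family independently duplicates or is lost at rate λ, so events occur at total rate 2λn, and a family that reaches size 0 stays extinct. This is an illustrative simulation, not CAFE's inference code or interface; `simulate_family_size` and its parameters are hypothetical.

```python
import random


def simulate_family_size(n0, lam, t_max, rng):
    """Simulate one gene family's size after time t_max (hypothetical helper).

    Linear birth-death process with equal per-copy gain and loss rate `lam`.
    Size 0 is absorbing: an extinct family cannot regain copies.
    """
    if n0 < 0 or lam <= 0 or t_max < 0:
        raise ValueError("need n0 >= 0, lam > 0, t_max >= 0")
    n, t = n0, 0.0
    while n > 0:
        # Waiting time to the next event is exponential with rate 2 * lam * n.
        t += rng.expovariate(2 * lam * n)
        if t > t_max:
            break
        # With equal gain and loss rates, each event is a gain with probability 1/2.
        n += 1 if rng.random() < 0.5 else -1
    return n


if __name__ == "__main__":
    rng = random.Random(42)
    sizes = [simulate_family_size(5, 0.1, 1.0, rng) for _ in range(10_000)]
    # With equal rates the expected size stays at n0, so this mean should be near 5.
    print(sum(sizes) / len(sizes))
```

With gains and losses equally likely, the expected family size stays at its starting value even though individual families drift and some go extinct. CAFE itself works in the other direction, estimating λ from observed family sizes across a species tree; this sketch only simulates the forward process.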