National Center for Genome Analysis Support (NCGAS): Genomics and other Science in the NSF-Funded Jetstream Cloud

The National Center for Genome Analysis Support (NCGAS) is an NSF-funded (NSF-1445604) center that supports all NSF-funded researchers doing genomics research, including transcriptomics, metagenomics, and genome annotation. Our support includes providing access to large-memory computing, maintaining curated sets of genomics applications, providing one-on-one consultation, and creating educational opportunities. A resource that we have come to rely on for providing these services is the NSF-funded Jetstream Cloud, maintained by Indiana University (led by the Indiana University Pervasive Technology Institute (PTI)) and the University of Texas at Austin's Texas Advanced Computing Center (TACC). Additionally, we leverage Globus data transfer tools. Globus at the University of Chicago is responsible for integrating Jetstream with the NSF-funded Extreme Science and Engineering Discovery Environment (XSEDE), and for integrating Globus data movement and management tools, as well as Globus-based secure user authentication. With a focus on ease of use and broad accessibility, Jetstream is designed for those who have not previously used high performance computing and software resources: researchers who need more than desktop-strength computing but less than full-scale High Performance Computing (HPC). Jetstream features a web-based user interface based on the popular Atmosphere cloud computing environment, developed by CyVerse and extended to support science and engineering research generally. The system is particularly geared toward 21st-century workforce development at small colleges and universities, especially historically black colleges and universities, minority serving institutions, tribal colleges, and higher education institutions in EPSCoR states.
Jetstream provides a library of virtual machines (VMs) designed for discipline-specific scientific analysis, but researchers can also develop their own VMs, with their own software sets or sets specialized to a particular task. These VMs can be both saved and shared with collaborators. Currently there are 19 genomics VMs, including RStudio instances with Bioconductor, ready-made genome browsers with JBrowse/Tripal, and metagenomic tools like QIIME2 and Anvi’o. Biology and molecular biology researchers are the largest group of Jetstream users. NCGAS has found VMs extremely useful in education and workshops: we develop class-specific VMs with all the applications needed, then clone them so that each student has their own VM to work on, which makes courses easy to scale. In addition to on-demand VMs, persistent science gateways can be established using template VMs that NCGAS has built. These can be used to provide services to collaborators or to the world. Users can easily create Galaxy servers on Jetstream: each server comes preconfigured with hundreds of tools and commonly used reference datasets, and once it is running, researchers can use it as is or customize it. Many NCGAS users establish genome browsers specific to their organism; these are typically shared with small sets of collaborating researchers but can also be shared with the world. Jetstream is accessed via an allocation process at XSEDE; a startup allocation is typically approved within a day.



Jetstream, NCGAS, Cloud Computing


Doak TG, Sanders SA, Ganote C, Papudeshi B, Fischer J, Hancock DY. (2020). National Center for Genome Analysis Support (NCGAS): Genomics and other Science in the NSF-Funded Jetstream Cloud. Plant and Animal Genome 2020, San Diego, California. Available at


