Abstract:
Indiana University, like many institutions, houses a heterogeneous mixture of compute resources. In addition to university resources, the National Center for Genome Analysis Support (NCGAS), the Extreme Science and Engineering Discovery Environment (XSEDE), and the Open Science Grid (OSG) all provide resources to biologists with NSF affiliations. Such a diverse mixture of compute power and services could be applied to the equally diverse set of problems and needs in the bioinformatics field.
Many software suites, such as phylogenetic tree-building algorithms, are well suited to large numbers of fast CPUs. De novo assembly problems need a machine with large amounts of RAM. Alignment and mapping problems in which each input is a separate invocation lend themselves to high-throughput, heavily distributed compute systems. Galaxy is a web interface that mediates between the biologist and the underlying hardware and software; in an ideal setup, Galaxy would delegate work to the best-suited underlying infrastructure.
We present an instance of Galaxy at Indiana University, installed and maintained by NCGAS, that takes advantage of a variety of compute resources to increase utilization and efficiency. The OSG is a distributed grid through which BLAST jobs can be run. IU, NCGAS, and XSEDE jointly support Mason, a 512 GB/node system. For IU users, Big Red 2 is the first university-owned petaFLOPS machine. Connecting these resources to Galaxy and sending each job to the resource that suits it best improves both performance and utilization; everyone wins.
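The delegation idea described above can be illustrated with a minimal sketch. It is not the configuration used in the talk and does not use Galaxy's own API: the workload categories, the destination labels (`osg`, `mason`, `bigred2`, `local`), and the `Job` type are all hypothetical names chosen for illustration. Sending CPU-bound work to Big Red 2, and falling back to a local destination for non-IU users, are assumptions rather than details stated in the abstract.

```python
from dataclasses import dataclass

# Hypothetical workload categories mapped to hypothetical destination labels.
# None of these strings are real Galaxy identifiers.
DESTINATIONS = {
    "high_throughput": "osg",   # many independent invocations, e.g. BLAST
    "high_memory": "mason",     # e.g. de novo assembly
    "cpu_bound": "bigred2",     # e.g. phylogenetic tree building (assumed)
}
DEFAULT_DESTINATION = "local"   # assumed fallback, not from the abstract


@dataclass
class Job:
    tool: str               # tool name, informational only
    workload: str           # one of the categories above
    iu_user: bool = False   # Big Red 2 is described as being for IU users


def choose_destination(job: Job) -> str:
    """Return the destination label that best matches the job's workload."""
    destination = DESTINATIONS.get(job.workload, DEFAULT_DESTINATION)
    # Assumption: jobs from non-IU users cannot be sent to Big Red 2.
    if destination == "bigred2" and not job.iu_user:
        return DEFAULT_DESTINATION
    return destination
```

For example, `choose_destination(Job("blast", "high_throughput"))` returns `"osg"`, while a CPU-bound job from a non-IU user falls back to `"local"`.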
Description:
Talk presented at the Galaxy Community Conference 2014, June 30 - July 2, 2014. Video is available at:
https://wiki.galaxyproject.org/Events/GCC2014/Abstracts/Talks#Galaxy_Deployment_on_Heterogenous_Hardware