Peer-reviewed Publications

  • Item
    Security standards compliance and ease of use of high performance computing systems in clinical research
    (2021-07) Link, Matthew; Shankar, Anurag; Hancock, David Y.; Henschel, Robert; Michael, Scott; Stewart, Craig A.
    Precision health research and personalized health therapies involve analysis of protected health information. In 2007, Indiana University established the ability to analyze protected health information (HIPAA alignment) as the minimal and default security level for its research high performance computing (HPC) systems and research storage systems. This resulted in a drastic increase in the use of IU HPC systems by clinical researchers. Security levels were later upgraded to FISMA Low as a default. We recommend that, within the US, FISMA (Federal Information Security Modernization Act) Low compliance be the default minimal level of security for large-scale HPC systems. This would facilitate precision medicine research and enable higher education HPC resources to be used in response to future civil health emergencies.
  • Item
    ImageX: New and improved Image Explorer for astronomical images and beyond
    (2016-08-08) Hayashi, Soichi; Gopu, Arvind; Kotulla, Ralf; Young, Michael D.
  • Item
    StarDock: Shipping Customized Computing Environments to the Data
    (2016-08-08) Young, Michael D.; Hayashi, Soichi; Gopu, Arvind
  • Item
    Trident: Scalable Compute Archives - Workflows, Visualization, and Analysis
    (2016-08-08) Gopu, Arvind; Hayashi, Soichi; Young, Michael D.; Kotulla, Ralf; Henschel, Robert; Harbeck, Daniel
  • Item
    IQ-Wall: An Open Standard for Tiled Video Walls that Balances Flexibility, Usability, Performance, and Cost
    (2016-06-29) Boyles, Michael; Gniady, Tassie; Wernert, Eric; Eller, Chris; Reagan, David; Rogers, Jeff
    Tiled video walls are engaging, useful, and pervasive in our everyday environment. They can be especially attractive to higher education institutions looking to spur innovation in teaching, research, and collaboration. However, if not thoughtfully designed, video walls can be expensive, difficult to maintain, and provide only limited functionality. Indiana University has been working with video walls for more than 10 years, using them in a variety of settings to support faculty, staff, and students in a broad range of research, education, community engagement, and creative activities. This experience has led to the development of an open hardware and software standard for video walls that provides flexibility, usability, maintainability, and the lowest possible costs while still maintaining good performance and high visual quality. In this paper, we share the motivations and technical details behind this open standard, as well as the lessons learned in building and supporting tiled video walls as multi-purpose displays.
  • Item
    Building a Chemical-Protein Interactome on the Open Science Grid
    (2015-03-15) Quick, Rob E.; Meroueh, Samy; Hayashi, Soichi; Rynge, Mats; Teige, Scott; Xu, David; Wang, Bo
    The Structural Protein-Ligand Interactome (SPLINTER) project predicts the interaction of thousands of small molecules with thousands of proteins. These interactions are predicted using the three-dimensional structure of the bound complex between each pair of protein and compound that is predicted by molecular docking. These docking runs consist of millions of individual short jobs each lasting only minutes. However, computing resources to execute these jobs (which cumulatively take tens of millions of CPU hours) are not readily or easily available in a cost effective manner. By looking to National Cyberinfrastructure resources, and specifically the Open Science Grid (OSG), we have been able to harness CPU power for researchers at the Indiana University School of Medicine to provide a quick and efficient solution to their unmet computing needs. Using the job submission infrastructure provided by the OSG, the docking data and simulation executable were sent to more than 100 universities and research centers worldwide. These opportunistic resources provided millions of CPU hours in a matter of days, greatly reducing docking simulation time for the research group. The overall impact of this approach allows researchers to identify small molecule candidates for individual proteins, or new protein targets for existing FDA-approved drugs and biologically active compounds.
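As a rough illustration of why this workload suits opportunistic high-throughput computing, the arithmetic of many short jobs spread over many cores can be sketched as follows. The job count, per-job runtime, and core count below are assumed numbers for illustration, not figures from the paper.

```python
# Back-of-the-envelope sizing for a docking campaign of the kind
# SPLINTER runs on the OSG. All numbers are illustrative assumptions.
jobs = 5_000_000          # millions of short, independent docking jobs
minutes_per_job = 10      # each lasting only minutes

cpu_hours = jobs * minutes_per_job / 60   # total compute demand

# With tens of thousands of opportunistic cores running concurrently,
# the whole campaign completes in roughly a day of wall-clock time.
concurrent_cores = 40_000
wall_clock_days = cpu_hours / concurrent_cores / 24
```

Because each job is independent, there is no coordination cost: adding opportunistic cores shrinks wall-clock time almost linearly, which is why "millions of CPU hours in a matter of days" is achievable.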
  • Item
    Broadside Love: A Comparison of Reading with Digital Tools versus Deep Knowledge in the Ballads of Samuel Pepys
    (Iter and the Arizona Center for Medieval and Renaissance Studies, 2014) Gniady, Tassie
    This essay explores the ways in which one portion of the ballads, those having to do with Love Pleasant (a category Pepys created and which was the largest in his collection), deal with the notion of love as typified in cheap print. This comparative analysis is done through the use of digital tools and slow/deep reading. I explore what digital textual analysis brings to the table when dealing with a large, but pre-selected, dataset whose elements should share many common features; how false data can be identified and winnowed out if one is just beginning work on broadside ballads; and, finally, what is the best way to interleave digital tools with slow reading.
  • Item
    Performance Optimization for the Trinity RNA-Seq Assembler
    (Springer, 2015-09) Wagner, Michael; Fulton, Ben; Henschel, Robert
    Utilizing the enormous computing resources of high performance computing systems is anything but a trivial task. Performance analysis tools are designed to assist developers in this challenging task by helping to understand the application behavior and identify critical performance issues. In this paper we share our efforts and experiences in analyzing and optimizing Trinity, a well-established framework for the de novo reconstruction of transcriptomes from RNA-seq reads. In doing so, we aim to reflect all aspects of the ongoing performance engineering: the identification of optimization targets, the code improvements resulting in a 20% overall runtime reduction, as well as the challenges we encountered getting there.
  • Item
    Big Data on Ice: The Forward Observer System for In-flight Synthetic Aperture Radar Processing
    (2015) Knepper, Richard; Link, Matthew R.; Standish, Matthew
    We introduce the Forward Observer system, which is designed to provide data assurance in field data acquisition while receiving significant amounts (several terabytes per flight) of Synthetic Aperture Radar data during flights over the polar regions, which pose unique requirements for data collection and processing systems. Under polar conditions in the field and given the difficulty and expense of collecting data, data retention is absolutely critical. Our system provides a storage and analysis cluster with software that connects to field instruments via standard protocols, replicates data to multiple stores automatically as soon as it is written, and provides pre-processing of data so that initial visualizations are available immediately after collection, where they can provide feedback to researchers in the aircraft during the flight.
  • Item
    Programmable Immersive Peripheral Environmental System (PIPES): A Prototype Control System for Environmental Feedback Devices
    Frend, Chauncey; Boyles, Michael J.
    This paper describes an environmental feedback device (EFD) control system aimed at simplifying the VR development cycle. Programmable Immersive Peripheral Environmental System (PIPES) affords VR developers a custom approach to programming and controlling EFD behaviors while relaxing the required knowledge and expertise of electronic systems. PIPES has been implemented for the Unity engine and features EFD control using the Arduino integrated development environment. PIPES was installed and tested on two VR systems, a large-format CAVE system and an Oculus Rift HMD system. A photocell-based end-to-end latency experiment was conducted to measure latency within the system. This work extends previously unpublished prototypes of a similar design. Development and experiments described in this paper are part of the VR community goal to understand and apply environment effects to VEs that ultimately add to users' perceived presence.
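The analysis behind a photocell-based end-to-end latency measurement can be sketched in a few lines: pair each software trigger timestamp with the time the photocell detected the corresponding screen change, and average the differences. The timestamp pairs below are hypothetical sample data, not measurements from the paper.

```python
# Hypothetical sample data (milliseconds): when the software triggered
# a stimulus, and when the photocell detected the screen change.
trigger_ms  = [0.0, 100.0, 200.0, 300.0]
detected_ms = [42.1, 143.5, 239.8, 341.2]

# End-to-end latency is the per-event difference between detection
# and trigger; the mean summarizes the system's latency.
latencies = [d - t for t, d in zip(trigger_ms, detected_ms)]
mean_latency_ms = sum(latencies) / len(latencies)
```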
  • Item
    Cyberinfrastructure, Science Gateways, Campus Bridging, and Cloud Computing
    (IGI Global, 2014-07) Stewart, Craig A.; Knepper, Richard D.; Link, Matthew R.; Pierce, Marlon; Wernert, Eric A.; Wilkins-Diehr, Nancy
    Computers accelerate our ability to achieve scientific breakthroughs. As technology evolves and new research needs come to light, the role for cyberinfrastructure as “knowledge” infrastructure continues to expand. This article defines and discusses cyberinfrastructure and the related topics of science gateways and campus bridging; identifies future challenges in cyberinfrastructure; and discusses challenges and opportunities related to the evolution of cyberinfrastructure, “big data” (datacentric, data-enabled, and data-intensive research and data analytics), and cloud computing.
  • Item
    Conducting K-12 Outreach to Evoke Early Interest in IT, Science, and Advanced Technology
    (ACM, 2012-07-16) Kallback-Rose, Kristy; Antolovic, Danko; Ping, Robert; Seiffert, Kurt; Stewart, Craig A.; Miller, Therese
    The Indiana University Pervasive Technology Institute has engaged for several years in K-12 Education, Outreach and Training (EOT) events related to technology in general and computing in particular. In each event we strive to positively influence children’s perception of science and technology. We view K-12 EOT as a channel for technical professionals to engage young people in the pursuit of scientific and technical understanding. Our goal is for students to see these subjects as interesting, exciting, and worth further pursuit. By providing opportunities for pre-college students to engage in science, technology, engineering and mathematics (STEM) activities first hand, we hope to influence their choices of careers and field-of-study later in life. In this paper we give an account of our experiences with providing EOT: we describe several of our workshops and events; we provide details regarding techniques that we found to be successful in working with both students and instructors; we discuss program costs and logistics; and we describe our plans for the future.
  • Item
    Ink, Paper, Scissors: Experiments in Cutting Campus Printing Costs
    (ACM, 2012) Husk, Malinda J.
    Universities are always looking for ways to economize, both because of rising costs and because of growing awareness of ecological issues. Printing is a common target. Indiana University’s Pervasive Technology Institute (PTI) compared several typefaces, looking at ink usage, paper usage, and readability. PTI chose to standardize on 11-point Times New Roman for printed documentation such as internal reports and white papers. PowerPoint presentations and other items with relatively small blocks of text are done in Century Gothic. Reports for external audiences will include a mix of fonts with deliberate mindfulness toward ink and paper usage. In short, if a message is rendered ineffective by its presentation, any ink or paper used can be considered wasted.
  • Item
    Research Data Storage Available to Researchers Throughout the U.S. via the TeraGrid
    (ACM, 2006) McCaulay, D. Scott; Link, Matthew R.
    Many faculty members at small to mid-size colleges and universities do important, high quality research that requires significant storage. In many cases, such storage requirements are difficult to meet with local resources; even when local resources suffice, data integrity is best ensured by maintenance of a remote copy. Via the nationally-funded TeraGrid, Indiana University offers researchers at colleges and universities throughout the US the opportunity to easily store up to 1 TB of data within the IU data storage system. The TeraGrid is the National Science Foundation's flagship effort to create a national research cyberinfrastructure, and one key goal of the TeraGrid is to provide facilities that improve the productivity of the US research community generally. Providing facilities that improve the capacity and reliability of research data storage is an important part of this. This paper will describe the process for storing data at IU via the TeraGrid, and will in general discuss how this capability is part of a larger TeraGrid-wide data storage strategy.
  • Item
    A high throughput workflow environment for cosmological simulations
    (ACM, 2012) Erickson, Brandon M.S.; Singh, Raminderjeet; Evrard, August E.; Becker, Matthew R.; Busha, Michael T.; Kravtsov, Andrey V.; Marru, Suresh; Pierce, Marlon; Wechsler, Risa H.
    The next generation of wide-area sky surveys offer the power to place extremely precise constraints on cosmological parameters and to test the source of cosmic acceleration. These observational programs will employ multiple techniques based on a variety of statistical signatures of galaxies and large-scale structure. These techniques have sources of systematic error that need to be understood at the percent-level in order to fully leverage the power of next-generation catalogs. Simulations of large-scale structure provide the means to characterize these uncertainties. We are using XSEDE resources to produce multiple synthetic sky surveys of galaxies and large-scale structure in support of science analysis for the Dark Energy Survey. In order to scale up our production to the level of fifty 10^10-particle simulations, we are working to embed production control within the Apache Airavata workflow environment. We explain our methods and report how the workflow has reduced production time by 40% compared to manual management.
  • Item
    Ultrascan solution modeler: integrated hydrodynamic parameter and small angle scattering computation and fitting tools
    (ACM, 2012) Brookes, Emre; Singh, Raminderjeet; Pierce, Marlon; Marru, Suresh; Demeler, Borries; Rocco, Mattia
    UltraScan Solution Modeler (US-SOMO) processes atomic and lower-resolution bead model representations of biological and other macromolecules to compute various hydrodynamic parameters, such as the sedimentation and diffusion coefficients, relaxation times and intrinsic viscosity, and small angle scattering curves, that contribute to our understanding of molecular structure in solution. Knowledge of biological macromolecules' structure aids researchers in understanding their function as a path to disease prevention and therapeutics for conditions such as cancer, thrombosis, Alzheimer's disease and others. US-SOMO provides a convergence of experimental, computational, and modeling techniques, in which detailed molecular structure and properties are determined from data obtained in a range of experimental techniques that, by themselves, give incomplete information. Our goal in this work is to develop the infrastructure and user interfaces that will enable a wide range of scientists to carry out complicated experimental data analysis techniques on XSEDE. Our user community predominantly consists of biophysics and structural biology researchers. A recent search on PubMed reports 9,205 papers in the decade referencing the techniques we support. We believe our software will provide these researchers a convenient and unique framework to refine structures, thus advancing their research. The computed hydrodynamic parameters and scattering curves are screened against experimental data, effectively pruning potential structures into equivalence classes. Experimental methods may include analytical ultracentrifugation, dynamic light scattering, small angle X-ray and neutron scattering, NMR, fluorescence spectroscopy, and others. One source of macromolecular models is X-ray crystallography. However, the conformation in solution may not match that observed in the crystal form. 
Using computational techniques, an initial fixed model can be expanded into a search space utilizing high temperature molecular dynamic approaches or stochastic methods such as Brownian dynamics. The number of structures produced can vary greatly, ranging from hundreds to tens of thousands or more. This introduces a number of cyberinfrastructure challenges. Computing hydrodynamic parameters and small angle scattering curves can be computationally intensive for each structure, and therefore cluster compute resources are essential for timely results. Input and output data sizes can vary greatly from less than 1 MB to 2 GB or more. Although the parallelization is trivial, along with data size variability there is a large range of compute sizes, ranging from one to potentially thousands of cores with compute time of minutes to hours. In addition to the distributed computing infrastructure challenges, an important concern was how to allow a user to conveniently submit, monitor and retrieve results from within the C++/Qt GUI application while maintaining a method for authentication, approval and registered publication usage throttling. Middleware supporting these design goals has been integrated into the application with assistance from the Open Gateway Computing Environments (OGCE) collaboration team. The approach was tested on various XSEDE clusters and local compute resources. This paper reviews current US-SOMO functionality and implementation with a focus on the newly deployed cluster integration.
  • Item
    Performance and quality of service of data and video movement over a 100 Gbps testbed
    (Elsevier, 2013-01) Kluge, Michael; Simms, Stephen; William, Thomas; Henschel, Robert; Georgi, Andy; Meyer, Christian; Mueller, Matthias S.; Stewart, Craig A.; Wünsch, Wolfgang; Nagel, Wolfgang E.
    Digital instruments and simulations are creating an ever-increasing amount of data. The need for institutions to acquire these data and transfer them for analysis, visualization, and archiving is growing as well. In parallel, networking technology is evolving, but at a much slower rate than our ability to create and store data. Single fiber 100 Gbps networking solutions are soon to be deployed as national infrastructure. This article describes our experiences with data movement and video conferencing across a networking testbed, using the first commercially available single fiber 100 Gbps technology. The testbed is unique in its ability to be configured for a total length of 60, 200, or 400 km, allowing for tests with varying network latency. We performed low-level TCP tests and were able to use more than 99.9% of the theoretically available bandwidth with minimal tuning efforts. We used the Lustre file system to simulate how end users would interact with a remote file system over such a high performance link. We were able to use 94.4% of the theoretically available bandwidth with a standard file system benchmark, essentially saturating the wide area network. Finally, we performed tests with H.323 video conferencing hardware and quality of service (QoS) settings, showing that the link can reliably carry a full high-definition stream. Overall, we demonstrated the practicality of 100 Gbps networking and Lustre as excellent tools for data management.
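The reported utilization figures can be converted to absolute throughput with simple unit arithmetic. This is a rough sketch using decimal units (1 Gbps = 0.125 GB/s) and ignoring protocol framing overhead, not the paper's exact accounting.

```python
# Convert the 100 Gbps link's utilization figures to GB/s.
link_gbps = 100
theoretical_GBps = link_gbps / 8           # 12.5 GB/s in decimal units

tcp_GBps    = theoretical_GBps * 0.999     # low-level TCP tests (>99.9%)
lustre_GBps = theoretical_GBps * 0.944     # Lustre file system benchmark
```

In other words, the Lustre benchmark moved roughly 11.8 GB of data per second across the wide-area link, which is what "essentially saturating" the network means in practice.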
  • Item
    TeraGrid: Analysis of Organization, System Architecture, and Middleware Enabling New Types of Applications
    (IOS Press, 2008) Catlett, Charlie; Allcock, William E.; Andrews, Phil; Aydt, Ruth; Bair, Ray; Balac, Natasha; Banister, Bryan; Barker, Trish; Bartelt, Mark; Beckman, Pete; Berman, Francine; Bertoline, Gary; Blatecky, Alan; Boisseau, Jay; Bottum, Jim; Brunett, Sharon; Bunn, Julian; Butler, Michelle; Carver, David; Cobb, John; Cockerill, Tim; Couvares, Peter F.; Dahan, Maytal; Diehl, Diana; Dunning, Thom; Foster, Ian; Gaither, Kelly; Gannon, Dennis; Goasguen, Sebastien; Grobe, Michael; Hart, Dave; Heinzel, Matt; Hempel, Chris; Huntoon, Wendy; Insley, Joseph; Jordan, Christopher; Judson, Ivan; Kamrath, Anke; Karonis, Nicholas; Kesselman, Carl; Kovatch, Patricia; Lane, Lex; Lathrop, Scott; Levine, Michael; Lifka, David; Liming, Lee; Livny, Miron; Loft, Rich; Marcusiu, Doru; Marsteller, Jim; Martin, Stuart; McCaulay, D. Scott; McGee, John; McGinnis, Laura; McRobbie, Michael; Messina, Paul; Moore, Reagan; Moore, Richard; Navarro, J.P.; Nichols, Jeff; Papka, Michael E.; Pennington, Rob; Pike, Greg; Pool, Jim; Reddy, Raghu; Reed, Dan; Rimovsky, Tony; Roberts, Eric; Roskies, Ralph; Sanielevici, Sergiu; Scott, J. Ray; Shankar, Anurag; Sheddon, Mark; Showerman, Mike; Simmel, Derek; Singer, Abe; Skow, Dane; Smallen, Shava; Smith, Warren; Song, Carol; Stevens, Rick; Stewart, Craig A.; Stock, Robert B.; Stone, Nathan; Towns, John; Urban, Tomislav; Vildibill, Mike; Walker, Edward; Welch, Von; Wilkins-Diehr, Nancy; Williams, Roy; Winkler, Linda; Zhao, Lan; Zimmerman, Ann
    TeraGrid is a national-scale computational science facility supported through a partnership among thirteen institutions, with funding from the US National Science Foundation [1]. Initially created through a Major Research Equipment Facilities Construction (MREFC [2]) award in 2001, the TeraGrid facility began providing production computing, storage, visualization, and data collections services to the national science, engineering, and education community in January 2004. In August 2005 NSF funded a five-year program to operate, enhance, and expand the capacity and capabilities of the TeraGrid facility to meet the growing needs of the science and engineering community through 2010. This paper describes TeraGrid in terms of the structures, architecture, technologies, and services that are used to provide national-scale, open cyberinfrastructure. The focus of the paper is specifically on the technology approach and use of middleware for the purposes of discussing the impact of such approaches on scientific use of computational infrastructure. While there are many individual science success stories, we do not focus on these in this paper. Similarly, there are many software tools and systems deployed in TeraGrid but our coverage is of the basic system middleware and is not meant to be exhaustive of all technology efforts within TeraGrid. We look in particular at growth and events during 2006 as the user population expanded dramatically and reached an initial “tipping point” with respect to adoption of new “grid” capabilities and usage modalities.
  • Item
    Wide Area Filesystem Performance using Lustre on the TeraGrid
    (2007-06) Simms, Stephen C.; Pike, Gregory G.; Balog, Douglas
    Today’s scientific applications demand computational resources that can be provided only by parallel clusters of computers. Storage subsystems have responded to the increased demand for high-throughput disk access by moving to network attached storage. Emerging Cyberinfrastructure strategies are leading to geographically distributed computing resources such as the National Science Foundation’s TeraGrid. One feature of the TeraGrid is a dedicated national network with WAN bandwidth on the same scale as machine room bandwidth. A natural next step for storage is to export file systems across wide area networks to be available on diverse resources. In this paper we detail our testing with the Lustre file system across the TeraGrid network. On a single 10 Gbps WAN link we achieved single host performance approaching 700 MB/s for single file writes and 1GB/s for two simultaneous file writes with minimal tuning.
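The quoted throughput numbers imply the following link utilization, sketched here with decimal units (1 Gbps = 125 MB/s) and no allowance for protocol overhead; the percentages are derived for illustration and do not appear in the paper.

```python
# Utilization implied by the reported Lustre WAN results on a
# single 10 Gbps link.
link_MBps = 10 * 1000 / 8             # 1250 MB/s theoretical ceiling

single_write_util = 700 / link_MBps   # one file write:  ~56% of the link
dual_write_util   = 1000 / link_MBps  # two file writes: ~80% of the link
```

The jump from one to two simultaneous writes suggests a single client stream, not the link, was the bottleneck, consistent with the "minimal tuning" framing in the abstract.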