Technical reports (not peer-reviewed)
http://hdl.handle.net/2022/13010
2015-05-25T18:08:09Z
Results of 2013 Survey of Parallel Computing Needs Focusing on NSF-funded Researchers - Dataset and Analyses
http://hdl.handle.net/2022/19924
Stewart, Craig A.; Arenson, Andrew; Fischer, Jeremy; Link, Matthew R.; Michael, Scott A.; Wernert, Julie
2015-05-01T00:00:00Z
Results of 2013 Survey of Parallel Computing Needs Focusing on NSF-funded Researchers
http://hdl.handle.net/2022/19906
Stewart, Craig A.; Arenson, Andrew; Fischer, Jeremy; Link, Matthew R.; Michael, Scott A.; Wernert, Julie A.
The field of supercomputing is experiencing rapid change in system structure, programming models, and software environments in response to advances in application requirements and in the underlying enabling technologies. Traditional parallel programming approaches have relied on static resource allocation and task scheduling through programming interfaces such as MPI and OpenMP. These methods are reaching their efficiency and scalability limits on emerging classes of systems, spurring the creation of innovative dynamic strategies and software tools, including advanced runtime system software and the programming interfaces that use it. To accelerate adoption of these next-generation methods, Indiana University is investigating the creation of a single supported Reconfigurable Execution Framework Testbed (REFT) to be used by parallel application algorithm developers as well as researchers in advanced tools for parallel computing. These investigations are funded by National Science Foundation Award Number 1205518 to Indiana University, with Thomas Sterling as Principal Investigator and Maciej Brodowicz, Matthew R. Link, Andrew Lumsdaine, and Craig Stewart as Co-Principal Investigators. As a starting point in this research, we proposed to assess needs in parallel computing in general, and needs for software tools and testbeds in particular, within the NSF-funded research community. As one set of data toward understanding these needs, we conducted a survey of researchers funded by the National Science Foundation. Because researchers funded by what is now the Division of Advanced Cyberinfrastructure, researchers funded by the other divisions of the Computer and Information Sciences and Engineering Directorate, and researchers funded by the remainder of the NSF may well have distinct needs, we surveyed these populations separately. This report describes the survey methods and summarizes the results.
The data sets and copies of SPSS descriptive statistics describing the data are available online at http://hdl.handle.net/2022/19924.
2015-05-14T00:00:00Z
Sustained software for cyberinfrastructure – analyses of successful efforts with a focus on NSF-funded software
http://hdl.handle.net/2022/19807
Stewart, Craig A.; Barnett, William K.; Wernert, Eric A.; Wernert, Julie A.; Welch, Von; Knepper, Richard
Reliable software that provides needed functionality is essential to a comprehensive, balanced, and flexible distributed cyberinfrastructure (CI) that, in turn, supports science and engineering applications. The purpose of this study was to understand what factors lead to software projects being well sustained over the long run, focusing on software created with funding from the US National Science Foundation (NSF) and/or used by researchers funded by the NSF. We surveyed NSF-funded researchers and performed in-depth studies of software projects that have been sustained over many years. Successful projects generally used open-source licenses and employed good software engineering and testing practices; however, many projects that have not been well sustained also meet these criteria. The features that distinguished successful projects were deeply committed leadership and some sort of user forum or conference held at least annually. In some cases, software project leaders have employed multiple financial strategies over the course of a decades-old software project. Such well-sustained software is used in major distributed CI projects that serve thousands of users, and it is critical to the operation of major distributed CI facilities in the US. The findings of our study identify characteristics of software that is relevant to the NSF-supported research community and that has been sustained over many years.
Univariate Analysis and Normality Test Using SAS, Stata, and SPSS
http://hdl.handle.net/2022/19742
Park, Hun Myoung
Descriptive statistics provide important information about the variables to be analyzed. The mean, median, and mode measure the central tendency of a variable. Measures of dispersion include the variance, standard deviation, range, and interquartile range (IQR). Researchers may draw a histogram, stem-and-leaf plot, or box plot to see how a variable is distributed.
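As a plain illustration of these summaries (the sample data are invented, not taken from the paper), the same statistics can be computed with Python's standard library:

```python
import statistics

# Hypothetical sample; all values are invented for illustration.
data = [2, 3, 3, 5, 7, 8, 9, 12, 13, 14]

mean = statistics.mean(data)            # central tendency
median = statistics.median(data)
mode = statistics.mode(data)

variance = statistics.variance(data)    # dispersion (sample variance)
stdev = statistics.stdev(data)
value_range = max(data) - min(data)

# Interquartile range: Q3 - Q1 from the quartile cut points
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
```

SAS PROC UNIVARIATE, Stata's summarize, and SPSS Descriptives report the same quantities; the quartile values can differ slightly across packages because several interpolation methods are in common use.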
Statistical methods are based on various underlying assumptions. One common assumption is that a random variable is normally distributed. In many statistical analyses, normality is conveniently assumed without any empirical evidence or test, yet it is critical to many methods: when the assumption is violated, interpretation and inference may not be reliable or valid.
The t-test and ANOVA (Analysis of Variance) compare group means under the assumption that the variable of interest follows a normal probability distribution; otherwise, the comparison is not meaningful. Figure 1 illustrates the standard normal probability distribution and a bimodal distribution. How can you compare the means of these two random variables?
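To make the group-mean comparison concrete, here is a minimal sketch (invented data, standard library only) of the Welch form of the t statistic that such a test computes:

```python
import math
import statistics

# Two hypothetical groups; values are invented for illustration.
group_a = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8]
group_b = [5.9, 6.1, 5.8, 6.0, 6.2, 5.7]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# Welch's t statistic: difference in means over its standard error.
# A large |t| suggests the group means differ; obtaining a p-value
# additionally requires the t distribution's degrees of freedom.
t = (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)
```

The interpretation of t as evidence about the means is what depends on the normality assumption the text describes; for the bimodal case, the mean itself is a poor summary of either group.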
There are two ways of testing normality (Table 1). Graphical methods visualize the distribution of a random variable or the differences between an empirical distribution and a theoretical distribution (e.g., the standard normal distribution). Numerical methods present summary statistics such as skewness and kurtosis, or conduct statistical tests of normality. Graphical methods are intuitive and easy to interpret, while numerical methods provide objective ways of examining normality.
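As a sketch of the numerical route (the formulas are standard; the data and function names are invented for illustration), sample skewness and excess kurtosis can be computed directly, and both should be near zero for a roughly normal sample:

```python
import statistics

def skewness(xs):
    """Third standardized moment; 0 for a symmetric sample."""
    m, s, n = statistics.mean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    m, s, n = statistics.mean(xs), statistics.pstdev(xs), len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3

symmetric = [1, 2, 3, 4, 5, 6, 7]        # symmetric: skewness exactly 0
right_skewed = [1, 1, 1, 2, 2, 3, 10]    # long right tail: skewness > 0
```

Formal tests such as Shapiro-Wilk (available in SAS, Stata, and SPSS) turn these kinds of departures from normality into a test statistic and p-value.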