Technical reports (not peer-reviewed)
http://hdl.handle.net/2022/13010
2015-04-26T01:24:32Z
Sustained software for cyberinfrastructure – analyses of successful efforts with a focus on NSF-funded software
http://hdl.handle.net/2022/19807
Stewart, Craig A.; Barnett, William K.; Wernert, Eric A.; Wernert, Julie A.; Welch, Von; Knepper, Richard
Reliable software that provides needed functionality is essential for an effective, comprehensive, balanced, and flexible distributed cyberinfrastructure (CI) that, in turn, supports science and engineering applications. The purpose of this study was to understand what factors lead to software projects being well sustained over the long run, focusing on software created with funding from the US National Science Foundation (NSF) and/or used by researchers funded by the NSF. We surveyed NSF-funded researchers and performed in-depth studies of software projects that have been sustained over many years. Successful projects generally used open-source software licenses and followed good software engineering and testing practices. However, many projects that have not been well sustained over time also meet these criteria. The features that stood out about successful projects were deeply committed leadership and some form of user forum or conference held at least annually. In some cases, software project leaders have employed multiple financial strategies over the course of a decades-old software project. Such well-sustained software is used in major distributed CI projects that support thousands of users, and this software is critical to the operation of major distributed CI facilities in the US. The findings of our study identify some characteristics of software that is relevant to the NSF-supported research community and has been sustained over many years.
Univariate Analysis and Normality Test Using SAS, Stata, and SPSS
http://hdl.handle.net/2022/19742
Park, Hun Myoung
Descriptive statistics provide important information about the variables to be analyzed. Mean, median, and mode measure the central tendency of a variable. Measures of dispersion include variance, standard deviation, range, and interquartile range (IQR). Researchers may draw a histogram, stem-and-leaf plot, or box plot to see how a variable is distributed.
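As a rough illustration of the measures named above, the following Python sketch computes them with the standard library's statistics module. It is not taken from the report, which works in SAS, Stata, and SPSS; the sample values are hypothetical, and the quartile convention (Python's default "exclusive" method) is an assumption that may differ from what those packages report.

```python
import statistics

# Hypothetical sample; the values are made up for illustration only.
sample = [2, 4, 4, 5, 7, 9, 10, 12]

# Measures of central tendency
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)

# Measures of dispersion
variance = statistics.variance(sample)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(sample)      # square root of the sample variance
value_range = max(sample) - min(sample)

# Quartiles with Python's default "exclusive" method (an assumption;
# statistical packages offer several quartile definitions).
q1, q2, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, sd={std_dev:.3f}, range={value_range}, IQR={iqr}")
```

Because quartile definitions vary, the IQR from this sketch may not match the figure a given statistical package prints for the same data.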
Statistical methods are based on various underlying assumptions. One common assumption is that a random variable is normally distributed. In many statistical analyses, normality is conveniently assumed without any empirical evidence or test, yet it is critical in many statistical methods. When this assumption is violated, interpretation and inference may not be reliable or valid.
The t-test and ANOVA (Analysis of Variance) compare group means, assuming a variable of interest follows a normal probability distribution. Otherwise, these methods do not make much sense. Figure 1 illustrates the standard normal probability distribution and a bimodal distribution. How can you compare means of these two random variables?
There are two ways of testing normality (Table 1). Graphical methods visualize the distributions of random variables or differences between an empirical distribution and a theoretical distribution (e.g., the standard normal distribution). Numerical methods present summary statistics such as skewness and kurtosis, or conduct statistical tests of normality. Graphical methods are intuitive and easy to interpret, while numerical methods provide objective ways of examining normality.
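The numerical approach can be sketched in Python as follows. This is a minimal illustration, not the report's own procedure: it computes moment-based skewness and kurtosis and, as one possible test of normality, the Jarque-Bera statistic, whose asymptotic chi-square distribution with 2 degrees of freedom has the closed-form upper tail exp(-JB/2). The choice of test, the function names, and the simulated samples are all assumptions made for this example.

```python
import math
import random


def skewness_kurtosis(data):
    """Return moment-based skewness and kurtosis (kurtosis is 3 for a normal)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    skew, kurt = skewness_kurtosis(data)
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-square upper tail with 2 df
    return jb, p_value


# Simulated (hypothetical) samples: one normal, one bimodal mixture.
rng = random.Random(0)
normal_sample = [rng.gauss(0, 1) for _ in range(2000)]
bimodal_sample = [
    rng.gauss(-3, 1) if rng.random() < 0.5 else rng.gauss(3, 1)
    for _ in range(2000)
]

for name, sample in [("normal", normal_sample), ("bimodal", bimodal_sample)]:
    skew, kurt = skewness_kurtosis(sample)
    jb, p = jarque_bera(sample)
    print(f"{name}: skewness={skew:.3f}, kurtosis={kurt:.3f}, JB={jb:.2f}, p={p:.4g}")
```

A bimodal sample like the one simulated here is roughly symmetric, so its skewness is near zero, but its kurtosis falls well below 3, which is what drives the large test statistic.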
Regression Models for Ordinal and Nominal Dependent Variables Using SAS, Stata, LIMDEP, and SPSS
http://hdl.handle.net/2022/19741
Park, Hun Myoung
A categorical variable here refers to a variable that is binary, ordinal, or nominal. Event count data are discrete (categorical) but often treated as continuous variables. When a dependent variable is categorical, the ordinary least squares (OLS) method can no longer produce the best linear unbiased estimator (BLUE); that is, OLS is biased and inefficient. Consequently, researchers have developed various regression models for categorical dependent variables. The nonlinearity of categorical dependent variable models makes it difficult to fit the models and interpret their results.
Regression Models for Binary Dependent Variables Using Stata, SAS, R, LIMDEP, and SPSS
http://hdl.handle.net/2022/19740
Park, Hun Myoung
A categorical variable here refers to a variable that is binary, ordinal, or nominal. Event count data are discrete (categorical) but often treated as continuous variables. When a dependent variable is categorical, the ordinary least squares (OLS) method can no longer produce the best linear unbiased estimator (BLUE); that is, OLS is biased and inefficient. Consequently, researchers have developed various regression models for categorical dependent variables. The nonlinearity of categorical dependent variable models makes it difficult to fit the models and interpret their results.
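One concrete symptom of applying OLS to a binary dependent variable can be sketched in Python. The example below fits a straight line to a 0/1 outcome and shows fitted values falling outside the [0, 1] interval, then contrasts this with a logistic transform, which keeps any linear index inside (0, 1). The data and the index coefficients are hypothetical, the logistic curve shown is not a fitted model, and none of this is taken from the report, which uses Stata, SAS, R, LIMDEP, and SPSS.

```python
import math

# Hypothetical data: x is a predictor, y is a binary (0/1) outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Closed-form simple OLS: slope = Sxy / Sxx, intercept = y_bar - slope * x_bar.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Fitted values of this linear probability model are not bounded by 0 and 1.
fitted = [intercept + slope * xi for xi in x]
out_of_range = [(xi, round(f, 3)) for xi, f in zip(x, fitted) if f < 0 or f > 1]
print("OLS fitted values outside [0, 1]:", out_of_range)


def logistic(z):
    """Map any real-valued linear index to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative index with made-up coefficients (not estimated from the data).
bounded = [logistic(-3.0 + 0.6 * xi) for xi in x]
print("Logistic probabilities:", [round(p, 3) for p in bounded])
```

Estimating the coefficients of such a nonlinear model requires iterative maximum likelihood rather than a closed-form solution, which is part of why these models are harder to fit and interpret.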