Office of the Vice President for Information Technology/University Information Technology Services
http://hdl.handle.net/2022/356
2015-04-01T03:26:43Z

Software in Science: a Report of Outcomes of the 2014 National Science Foundation Software Infrastructure for Sustained Innovation (SI2) Meeting
http://hdl.handle.net/2022/19760
2015-03-31T22:45:51Z
2015-03-31T00:00:00Z
Plale, Beth; Jones, Matt; Thain, Douglas
The second annual NSF Software Infrastructure for Sustained Innovation (SI2) PI meeting took place in Arlington, VA, on February 24-25, 2014. It was hosted by Beth Plale, Indiana University; Douglas Thain, University of Notre Dame; and Matt Jones, National Center for Ecological Analysis and Synthesis.
This report captures the challenges and outcomes that emerged from the meeting across the four topic areas discussed: i) Attribution and Citation, ii) Reproducibility, Reusability, and Preservation, iii) Project/Software Sustainability, and iv) Career Paths. The report is an academic synthesis, with credit to all the participants and to the notetakers, who took prodigious notes and synthesized the results on which the conclusions of this report are based.
Univariate Analysis and Normality Test Using SAS, Stata, and SPSS
http://hdl.handle.net/2022/19742
2015-03-27T16:03:52Z
Park, Hun Myoung
Descriptive statistics provide important information about the variables to be analyzed. Mean, median, and mode measure the central tendency of a variable. Measures of dispersion include variance, standard deviation, range, and interquartile range (IQR). Researchers may draw a histogram, stem-and-leaf plot, or box plot to see how a variable is distributed.
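As a rough illustration of the measures listed above, the following Python sketch computes them with the standard library. The report itself works in SAS, Stata, and SPSS; Python is used here only for illustration. The function name `describe` and the sample data are hypothetical and do not come from the report.

```python
import statistics


def describe(values):
    """Return common measures of central tendency and dispersion.

    Hypothetical helper for illustration; not taken from the report.
    """
    # Quartile cut points; Python's default is the "exclusive" method.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


sample = [2, 4, 4, 5, 7, 9, 11]  # made-up data
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Statistical packages define quartiles in slightly different ways, so the IQR reported by another package for the same data may differ from the value computed here.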
Statistical methods rest on various underlying assumptions. One common assumption is that a random variable is normally distributed. Normality is often conveniently assumed without any empirical evidence or test, yet it is critical in many statistical methods. When this assumption is violated, interpretation and inference may not be reliable or valid.
The t-test and ANOVA (Analysis of Variance) compare group means, assuming a variable of interest follows a normal probability distribution. Otherwise, these methods do not make much sense. Figure 1 illustrates the standard normal probability distribution and a bimodal distribution. How can you compare means of these two random variables?
There are two ways of testing normality (Table 1). Graphical methods visualize the distributions of random variables or differences between an empirical distribution and a theoretical distribution (e.g., the standard normal distribution). Numerical methods present summary statistics such as skewness and kurtosis, or conduct statistical tests of normality. Graphical methods are intuitive and easy to interpret, while numerical methods provide objective ways of examining normality.
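The numerical approach can be sketched in a few lines of Python. The example below computes moment-based skewness and kurtosis and combines them into the Jarque-Bera statistic, which is one normality test chosen here for illustration; the abstract does not say which tests the report covers. The function name `moments_normality` and the data are hypothetical.

```python
import math


def moments_normality(values):
    """Return (skewness, kurtosis, Jarque-Bera statistic, asymptotic p-value).

    Uses population (biased) central moments. Hypothetical helper for
    illustration; not taken from the report.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurtosis = m4 / m2 ** 2     # 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality, JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return skewness, kurtosis, jb, p_value


data = [1, 2, 3, 4, 5]  # made-up data
skew, kurt, jb, p = moments_normality(data)
print(f"skewness={skew:.4f} kurtosis={kurt:.4f} JB={jb:.4f} p={p:.4f}")
```

The p-value relies on a large-sample approximation, so it should be read with caution for a sample as small as this one.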
Regression Models for Ordinal and Nominal Dependent Variables Using SAS, Stata, LIMDEP, and SPSS
http://hdl.handle.net/2022/19741
2015-03-25T22:30:28Z
Park, Hun Myoung
A categorical variable here refers to a variable that is binary, ordinal, or nominal. Event count data are discrete (categorical) but often treated as continuous variables. When a dependent variable is categorical, the ordinary least squares (OLS) method can no longer produce the best linear unbiased estimator (BLUE); that is, OLS is biased and inefficient. Consequently, researchers have developed various regression models for categorical dependent variables. The nonlinearity of categorical dependent variable models makes it difficult to fit the models and interpret their results.
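One way to see why OLS is a poor fit for a categorical dependent variable is to fit a straight line to a binary outcome and inspect the fitted values. The Python sketch below does this with the closed-form simple regression formulas. The binary case is used only because it is the simplest categorical case; the data and the function name `ols_fit` are hypothetical and do not come from the report.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [0, 1, 2, 3, 4, 5, 6, 7]   # made-up predictor
y = [0, 0, 0, 0, 1, 1, 1, 1]   # made-up binary outcome
intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]
print(f"intercept={intercept:.4f} slope={slope:.4f}")
print("fitted values:", [round(f, 4) for f in fitted])
```

With these data the fitted values at the extremes of `x` fall below 0 and above 1, so they cannot be read as probabilities. This is one practical motivation for the nonlinear models the abstract refers to.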
Regression Models for Binary Dependent Variables Using Stata, SAS, R, LIMDEP, and SPSS
http://hdl.handle.net/2022/19740
2015-03-26T20:19:09Z
Park, Hun Myoung
A categorical variable here refers to a variable that is binary, ordinal, or nominal. Event count data are discrete (categorical) but often treated as continuous variables. When a dependent variable is categorical, the ordinary least squares (OLS) method can no longer produce the best linear unbiased estimator (BLUE); that is, OLS is biased and inefficient. Consequently, researchers have developed various regression models for categorical dependent variables. The nonlinearity of categorical dependent variable models makes it difficult to fit the models and interpret their results.
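To make the fitting and interpretation difficulty concrete, here is a minimal Python sketch of a binary logit model with one predictor, estimated by Newton-Raphson iterations on the log-likelihood. It is an illustration under stated assumptions, not the report's procedure: the data are made up, the function name `fit_logit` is hypothetical, and a fixed number of iterations stands in for a proper convergence check.

```python
import math


def fit_logit(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Hypothetical helper for illustration; assumes the data are not
    perfectly separable, so that finite estimates exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient (score) of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Negative Hessian (information matrix) entries, weights p(1 - p).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


xs = [0, 1, 2, 3, 4, 5, 6, 7]   # made-up predictor
ys = [0, 0, 1, 0, 1, 0, 1, 1]   # made-up binary outcome
b0, b1 = fit_logit(xs, ys)
probs = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in xs]
print(f"b0={b0:.4f} b1={b1:.4f} odds ratio={math.exp(b1):.4f}")
```

Unlike an OLS slope, `b1` is a change in log odds: a one-unit increase in `x` multiplies the odds of `y = 1` by `exp(b1)`, while the change in probability depends on where along the curve it is evaluated.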