Technical reports (not peer-reviewed)

Recent Submissions

  • Item
    Explore True Performance Using Application Benchmark for the Next Generation HPC Systems: First NSF EAGER SPEC HPG Workshop Report
    (2019-09-12) Henschel, Robert; Li, Junjie; Eigenmann, Rudolf; Chandrasekaran, Sunita
  • Item
    NSF 2017 Annual Report for EAGER: Open XD Metrics on Demand Value Analytics
    (2017-04-02) Link, Matthew R.; Borner, Katy; Furlani, Thomas; Gallo, Steven; Henschel, Robert; DeLeon, Robert; Fulton, Ben; Yearke, Tom
  • Item
    White Paper: Lustre WAN over 100Gbps
    (2016-02) Thota, Abhinav; Henschel, Robert; Simms, Stephen
    This work is an international collaboration with Rheinisch-Westfälische Technische Hochschule Aachen (RWTH), Germany, and the Center for Information Services and High Performance Computing (ZIH) at Technische Universität Dresden, Germany, to analyze the effect of a high-bandwidth, high-latency link on the I/O patterns of scientific applications using the 100Gbps transatlantic link.
  • Item
    IU PTI/UITS Research Technologies Annual Report: FY 2014
    (2015-11-10) Stewart, Craig A.; Miller, Therese
    This Fiscal Year 2014 (FY2014) report outlines accomplishments by the IU community using IU's cyberinfrastructure, as they relate to several IU Bicentennial Strategic Plan goals and ongoing principles of excellence. The report includes research and discovery highlights.
  • Item
    Use of IU parallel computing resources and high performance file systems July 2013 to Dec 2014
    (2015-08-13) Link, Matthew R.; Henschel, Robert; Hancock, David Y.; Stewart, Craig A.
    The paper discusses the contributions of Big Red II and Data Capacitor II and their impact on IU's research and creative output.
  • Item
    Use of IU advanced computational systems – parallelism, job mixes, and queue wait times
    (2015-08-12) Link, Matthew R.; Henschel, Robert; Hancock, David Y.; Stewart, Craig A.
    The report presents updated information on usage of Big Red II and Karst for the first half of FY 2015.
  • Item
    Results of 2013 Survey of Parallel Computing Needs Focusing on NSF-funded Researchers
    (2015-05-14) Stewart, Craig A.; Arenson, Andrew; Fischer, Jeremy; Link, Matthew R.; Michael, Scott A.; Wernert, Julie A.
    The field of supercomputing is experiencing a rapid change in system structure, programming models, and software environments in response to advances in application requirements and in underlying enabling technologies. Traditional parallel programming approaches have relied on static resource allocation and task scheduling through programming interfaces such as MPI and OpenMP. These methods are reaching their efficiency and scalability limits on the new emerging classes of systems, spurring the creation of innovative dynamic strategies and software tools, including advanced runtime system software and programming interfaces that use them. To accelerate adoption of these next-generation methods, Indiana University is investigating the creation of a single supported Reconfigurable Execution Framework Testbed (REFT) to be used by parallel application algorithm developers as well as researchers in advanced tools for parallel computing. These investigations are funded by the National Science Foundation Award Number 1205518 to Indiana University, with Thomas Sterling as Principal Investigator and Maciej Brodowicz, Matthew R. Link, Andrew Lumsdaine, and Craig Stewart as Co-Principal Investigators. As a starting point in this research, we proposed to assess needs in parallel computing in general, and needs for software tools and testbeds in particular, within the NSF-funded research community. As one set of data toward understanding these needs, we conducted a survey of researchers funded by the National Science Foundation. Because of the strong possibility of distinct needs among researchers funded by what is now the Division of Advanced Cyberinfrastructure, researchers funded by the other divisions of the Computer and Information Sciences and Engineering Directorate, and researchers funded by the remainder of the NSF, we surveyed these populations separately. This report describes the survey methods and summarizes the results. The data sets and copies of SPSS descriptive statistics describing the data are available online at http://hdl.handle.net/2022/19924.
  • Item
    Univariate Analysis and Normality Test Using SAS, Stata, and SPSS
    Park, Hun Myoung
    Descriptive statistics provide important information about variables to be analyzed. Mean, median, and mode measure central tendency of a variable. Measures of dispersion include variance, standard deviation, range, and interquartile range (IQR). Researchers may draw a histogram, stem-and-leaf plot, or box plot to see how a variable is distributed. Statistical methods are based on various underlying assumptions. One common assumption is that a random variable is normally distributed. In many statistical analyses, normality is often conveniently assumed without any empirical evidence or test. But normality is critical in many statistical methods. When this assumption is violated, interpretation and inference may not be reliable or valid. The t-test and ANOVA (Analysis of Variance) compare group means, assuming a variable of interest follows a normal probability distribution. Otherwise, these methods do not make much sense. Figure 1 illustrates the standard normal probability distribution and a bimodal distribution. How can you compare means of these two random variables? There are two ways of testing normality (Table 1). Graphical methods visualize the distributions of random variables or differences between an empirical distribution and a theoretical distribution (e.g., the standard normal distribution). Numerical methods present summary statistics such as skewness and kurtosis, or conduct statistical tests of normality. Graphical methods are intuitive and easy to interpret, while numerical methods provide objective ways of examining normality.
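    The report demonstrates these checks in SAS, Stata, and SPSS; as a rough companion, here is a minimal sketch of the same workflow in base R, using a simulated variable x (the data and names are illustrative, not from the report):

      # Simulated sample; replace with the variable to be checked.
      set.seed(42)
      x <- rnorm(200, mean = 50, sd = 10)

      # Graphical methods: histogram and normal Q-Q plot.
      hist(x, main = "Histogram of x")
      qqnorm(x); qqline(x)

      # Numerical method: Shapiro-Wilk test of normality.
      # A small p-value suggests departure from normality.
      shapiro.test(x)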
  • Item
    Regression Models for Ordinal and Nominal Dependent Variables Using SAS, Stata, LIMDEP, and SPSS
    Park, Hun Myoung
    A categorical variable here refers to a variable that is binary, ordinal, or nominal. Event count data are discrete (categorical) but often treated as continuous variables. When a dependent variable is categorical, the ordinary least squares (OLS) method can no longer produce the best linear unbiased estimator (BLUE); that is, OLS is biased and inefficient. Consequently, researchers have developed various regression models for categorical dependent variables. The nonlinearity of categorical dependent variable models makes it difficult to fit the models and interpret their results.
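    The report itself works in SAS, Stata, LIMDEP, and SPSS; as a rough R analogue of the two model families it covers, here is a minimal sketch assuming the MASS and nnet packages, with hypothetical data and variable names (x, rating, choice):

      library(MASS)   # polr: proportional-odds (ordered) logit
      library(nnet)   # multinom: multinomial logit

      # Hypothetical data: an ordinal rating and a nominal choice.
      set.seed(1)
      d <- data.frame(
        x      = rnorm(300),
        rating = factor(sample(c("low", "mid", "high"), 300, TRUE),
                        levels = c("low", "mid", "high"), ordered = TRUE),
        choice = factor(sample(c("car", "bus", "bike"), 300, TRUE))
      )

      # Ordered logit for the ordinal dependent variable.
      summary(polr(rating ~ x, data = d, Hess = TRUE))

      # Multinomial logit for the nominal dependent variable.
      summary(multinom(choice ~ x, data = d))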
  • Item
    Regression Models for Binary Dependent Variables Using Stata, SAS, R, LIMDEP, and SPSS
    Park, Hun Myoung
    A categorical variable here refers to a variable that is binary, ordinal, or nominal. Event count data are discrete (categorical) but often treated as continuous variables. When a dependent variable is categorical, the ordinary least squares (OLS) method can no longer produce the best linear unbiased estimator (BLUE); that is, OLS is biased and inefficient. Consequently, researchers have developed various regression models for categorical dependent variables. The nonlinearity of categorical dependent variable models makes it difficult to fit the models and interpret their results.
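    Since R is among the packages this report covers, here is a minimal sketch of logit and probit fits in base R; the simulated data and names (x, y) are illustrative, not the report's own example:

      # Hypothetical binary outcome generated from a logistic model.
      set.seed(2)
      x <- rnorm(500)
      y <- rbinom(500, 1, plogis(-0.5 + 1.2 * x))

      # Logit model: coefficients are on the log-odds scale.
      logit <- glm(y ~ x, family = binomial(link = "logit"))
      summary(logit)

      # Probit alternative; exponentiated logit coefficients give odds ratios.
      probit <- glm(y ~ x, family = binomial(link = "probit"))
      exp(coef(logit))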
  • Item
    Linear Regression Models for Panel Data Using SAS, Stata, LIMDEP, and SPSS
    Park, Hun Myoung
    Panel (or longitudinal) data are cross-sectional and time-series. There are multiple entities, each of which has repeated measurements at different time periods. U.S. Census Bureau’s Census 2000 data at the state or county level are cross-sectional but not time-series, while annual sales figures of Apple Computer Inc. for the past 20 years are time-series but not cross-sectional. If annual sales data of IBM, LG, Siemens, Microsoft, and AT&T during the same periods are also available, they are panel data. The cumulative General Social Survey (GSS), American National Election Studies (ANES), and Current Population Survey (CPS) data are not panel data in the sense that individual respondents vary across survey years. Panel data may have group effects, time effects, or both, which are analyzed by fixed-effects and random-effects models.
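    The report covers these estimators in SAS, Stata, LIMDEP, and SPSS; as a rough R analogue, here is a minimal sketch assuming the plm package, with a simulated firm-year panel (all names and numbers are illustrative):

      library(plm)  # panel-data estimators

      # Hypothetical balanced panel: 10 firms observed over 10 years.
      set.seed(3)
      d <- data.frame(
        firm = rep(paste0("f", 1:10), each = 10),
        year = rep(2001:2010, times = 10),
        x    = rnorm(100)
      )
      d$y <- 2 + 0.8 * d$x + rep(rnorm(10), each = 10) + rnorm(100)

      # Fixed-effects ("within") and random-effects estimators.
      fe <- plm(y ~ x, data = d, index = c("firm", "year"), model = "within")
      re <- plm(y ~ x, data = d, index = c("firm", "year"), model = "random")

      # Hausman test to help choose between them.
      phtest(fe, re)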
  • Item
    Hypothesis Testing and Statistical Power of a Test
    Park, Hun Myoung
    How powerful is my study (test)? How many observations do I need to have for what I want to get from the study? You may want to know the statistical power of a test to detect a meaningful effect, given sample size, test size (significance level), and standardized effect size. You may also want to determine the minimum sample size required to get a significant result, given statistical power, test size, and standardized effect size. These analyses examine the sensitivity of statistical power and sample size to other components, enabling researchers to use research resources efficiently. This document summarizes the basics of hypothesis testing and statistical power analysis, and then illustrates how to conduct these analyses using SAS 9, Stata 10, and G*Power 3.
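    The document walks through these computations in SAS 9, Stata 10, and G*Power 3; base R's power.t.test answers the same two questions, as in this minimal sketch (the standardized effect size of 0.5 is an arbitrary illustration):

      # Power of a two-sample t-test, given n per group,
      # standardized effect size (delta/sd), and test size.
      power.t.test(n = 64, delta = 0.5, sd = 1, sig.level = 0.05)

      # Minimum n per group for 80% power at the same effect size.
      power.t.test(power = 0.80, delta = 0.5, sd = 1, sig.level = 0.05)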
  • Item
    Estimating Multilevel Models using SPSS, Stata, SAS and R
    Albright, Jeremy J.; Marinova, Dani M.
    Multilevel data are pervasive in the social sciences. Students may be nested within schools, voters within districts, or workers within firms, to name a few examples. Statistical methods that explicitly take into account hierarchically structured data have gained popularity in recent years, and there now exist several special-purpose statistical programs designed specifically for estimating multilevel models (e.g. HLM, MLwiN). In addition, the increasing use of multilevel models (also known as hierarchical linear and mixed effects models) has led general purpose packages such as SPSS, Stata, SAS, and R to introduce their own procedures for handling nested data. Nonetheless, researchers may face two challenges when attempting to determine the appropriate syntax for estimating multilevel/mixed models with general purpose software. First, many users from the social sciences come to multilevel modeling with a background in regression models, whereas much of the software documentation utilizes examples from experimental disciplines [because multilevel modeling methodology evolved out of ANOVA methods for analyzing experiments with random effects (Searle, Casella, and McCulloch, 1992)]. Second, notation for multilevel models is often inconsistent across disciplines (Ferron 1997). The purpose of this document is to demonstrate how to estimate multilevel models using SPSS, Stata, SAS, and R. It first seeks to clarify the vocabulary of multilevel models by defining what is meant by fixed effects, random effects, and variance components. It then compares the model building notation frequently employed in applications from the social sciences with the more general matrix notation found in much of the software documentation. The syntax for centering variables and estimating multilevel models is then presented for each package.
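    Since R is among the packages the document covers, here is a minimal random-intercept sketch assuming the lme4 package, with simulated students-within-schools data (all names and numbers are illustrative):

      library(lme4)  # mixed-effects models

      # Hypothetical nested data: 25 students in each of 20 schools.
      set.seed(4)
      d <- data.frame(
        school = factor(rep(1:20, each = 25)),
        ses    = rnorm(500)
      )
      d$score <- 50 + 3 * d$ses + rep(rnorm(20, sd = 5), each = 25) + rnorm(500)

      # Random intercept per school: a fixed slope for ses plus a
      # school-level variance component for the intercept.
      m <- lmer(score ~ ses + (1 | school), data = d)
      summary(m)

      # A random slope for ses would be written (1 + ses | school).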
  • Item
    Confirmatory Factor Analysis using Amos, LISREL, Mplus, SAS/STAT CALIS
    (2009) Albright, Jeremy J.; Park, Hun Myoung
    Factor analysis is a statistical method used to find a small set of unobserved variables (also called latent variables, or factors) which can account for the covariance among a larger set of observed variables (also called manifest variables). A factor is an unobservable variable that is assumed to influence observed variables. Scores on multiple tests may be indicators of intelligence (Spearman, 1904); political liberties and popular sovereignty may measure the quality of a country’s democracy (Bollen, 1980); or issue emphases in election manifestos may signify a political party’s underlying ideology (Gabel & Huber, 2000). Factor analysis is also used to assess the reliability and validity of measurement scales (Carmines & Zeller, 1979).
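    The report demonstrates CFA in Amos, LISREL, Mplus, and SAS/STAT CALIS; as a rough analogue, here is a minimal one-factor sketch assuming the R package lavaan, with simulated indicators (the intelligence example echoes Spearman, but the data and names are fabricated for illustration):

      library(lavaan)  # latent-variable models

      # Simulate three observed indicators of one latent factor.
      set.seed(5)
      f <- rnorm(300)  # latent factor scores (unobserved in practice)
      d <- data.frame(test1 = 0.8 * f + rnorm(300, sd = 0.6),
                      test2 = 0.7 * f + rnorm(300, sd = 0.7),
                      test3 = 0.6 * f + rnorm(300, sd = 0.8))

      # One-factor model: =~ reads "is measured by".
      model <- 'intelligence =~ test1 + test2 + test3'

      fit <- cfa(model, data = d)
      summary(fit, fit.measures = TRUE, standardized = TRUE)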
  • Item
    Comparing Group Means: T-tests and One-way ANOVA Using Stata, SAS, R, and SPSS
    (2009) Park, Hun Myoung
    T-tests and analysis of variance (ANOVA) are widely used statistical methods to compare group means. For example, the independent sample t-test enables you to compare annual personal income between rural and urban areas and examine the difference in the grade point average (GPA) between male and female students. Using the paired t-test, you can also compare the change in outcomes before and after a treatment is applied. For a t-test, the mean of a variable to be compared should be substantively interpretable. Technically, the left-hand side (LHS) variable to be tested should be interval or ratio scaled (continuous), whereas the right-hand side (RHS) variable should be binary (categorical). The t-test can also compare the proportions of binary variables. The mean of a binary variable is the proportion or percentage of success of the variable. When sample size is large, the t-test and the z-test for comparing proportions produce almost the same answer.
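    Since R is among the packages this report covers, here is a minimal base-R sketch of the two comparisons, using simulated incomes for the report's rural/urban example (the numbers are fabricated for illustration):

      # Hypothetical outcome measured in two groups.
      set.seed(6)
      d <- data.frame(
        group  = rep(c("rural", "urban"), each = 100),
        income = c(rnorm(100, 40000, 8000), rnorm(100, 45000, 8000))
      )

      # Independent-samples t-test (Welch's version by default in R).
      t.test(income ~ group, data = d)

      # One-way ANOVA generalizes the comparison to three or more groups.
      summary(aov(income ~ group, data = d))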
  • Item
    Storage Briefing: Trends and IU
    (2013-08) Floyd, Mike J.; Seiffert, Kurt; Stewart, Craig A.; Turner, George; Cromwell, Dennis; Hancock, Dave; Kallback-Rose, Kristy; Link, Matthew R.; Simms, Steve; Williams, Troy
    Storage technology is in a period of accelerating change. Rapid developments in flash memory and cloud services are the most visible indicators. Storage use faces growing regulatory and funding-agency compliance requirements, along with increasing bundling of software systems with storage options. This briefing presents information on industry, regulatory, and usage trends at Indiana University (IU), along with comparisons of storage services with Purdue and other Committee on Institutional Cooperation (CIC) schools. The sections of this document are largely independent of each other, so readers may focus on areas of interest.
  • Item
    IndianaMap Cadastral Cloud Implementation
    (2013-09) Bodenhamer, David J
    The purpose of this project is to evaluate the technical issues, opportunities, and costs associated with deployment and geoprocessing of IndianaMap Cadastral data in Amazon Cloud Platforms.