HARP: A MACHINE LEARNING FRAMEWORK ON TOP OF THE COLLECTIVE COMMUNICATION LAYER FOR THE BIG DATA SOFTWARE STACK
Loading...
Can’t use the file because of accessibility barriers? Contact us with the title of the item, permanent link, and specifics of your accommodation need.
Date
2017-05
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
[Bloomington, Ind.] : Indiana University
Permanent Link
Abstract
Almost every field of science is now undergoing a data-driven revolution requiring analyzing massive datasets. Machine learning algorithms are widely used to find meaning in a given dataset and discover properties of complex systems. At the same time, the landscape of computing has evolved towards computers exhibiting many-core architectures of increasing complexity. However, there is no simple and unified programming framework allowing for these machine learning applications to exploit these new machines’ parallel computing capability. Instead, many efforts focus on specialized ways to speed up individual algorithms. In this thesis, the Harp framework, which uses collective communication techniques, is prototyped to improve the performance of data movement and provides high-level APIs for various synchronization patterns in iterative computation. In contrast to traditional parallelization strategies that focus on handling high volume training data, a less known challenge is that the high dimensional model is also in high volume and difficult to synchronize. As an extension of the Hadoop MapReduce system, Harp includes a collective communication layer and a set of programming interfaces. Iterative machine learning algorithms can be parallelized through efficient synchronization methods utilizing both inter-node and intra-node parallelism. The usability and efficiency of Harp’s approach is validated on applications such as K-means Clustering, Multi-Dimensional Scaling, Latent Dirichlet Allocation and Matrix Factorization. The results show that these machine learning applications can achieve high parallel performance on Harp.
Description
Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2017
Keywords
MACHINE LEARNING, COLLECTIVE COMMUNICATION, BIG DATA
Citation
Journal
DOI
Link(s) to data and video for this item
Relation
Rights
Type
Doctoral Dissertation