Big Data Analytics in Static and Streaming Provenance
Date
2016-04
Publisher
[Bloomington, Ind.] : Indiana University
Abstract
With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces the relationships of entities over time, thus providing a unique view of the behavior under study over time. However, provenance can be overwhelming in both volume and complexity; the now forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data arriving at a high rate. While the former is sufficient for answering queries that are given before the application starts (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that can reduce the high dimensionality while effectively supporting mining tasks such as clustering, classification, and association rule mining; the temporal representation can also be applied to streaming provenance. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.
Description
Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016
Keywords
Big Data provenance, stream processing, data mining, data representation, data visualization
Type
Doctoral Dissertation