Provenance as Essential Infrastructure for Data Lakes
Abstract
The Data Lake is emerging as a Big Data storage and management solution that can store any type of data at scale and execute data transformations for analysis. This flexibility in storage, however, increases the risk of Data Lakes becoming data swamps. In this paper, we show how provenance contributes to data management within a Data Lake infrastructure. We study the challenges of integrating provenance and propose a reference architecture for provenance usage in a Data Lake. Finally, we discuss the applicability of our tools in the proposed architecture.
Series and Number:
Indiana University Computer Science Technical Reports; TR725
Rights
This work is protected by copyright unless stated otherwise.