Provenance as Essential Infrastructure for Data Lakes

Loading...
Thumbnail Image
Can’t use the file because of accessibility barriers? Contact us

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

The Data Lake is emerging as a Big Data storage and management solution which can store any type of data at scale and execute data transformations for analysis. Higher flexibility in storage increases the risk of Data Lakes becoming data swamps. In this paper we show how provenance contributes to data management within a Data Lake infrastructure. We study provenance integration challenges and propose a reference architecture for provenance usage in a Data Lake. Finally we discuss the applicability of our tools in the proposed architecture.

Series and Number:

Indiana University Computer Science Technical Reports; TR725

EducationalLevel:

Is Based On:

Target Name:

Teaches:

Table of Contents

Description

Keywords

Citation

Journal

DOI

Link(s) to data and video for this item

Relation

Rights

This work is protected by copyright unless stated otherwise.

Type