The Data Capsule for Non-Consumptive Research: Final Report

Digital texts with access and use protections form a unique and fast-growing collection of materials. Growing equally quickly is the development of text and data mining algorithms that process large text-based collections to explore their content computationally. There is a strong need for research to establish the foundations for secure computational and data technologies that can ensure a non-consumptive environment for use-protected texts, such as the copyrighted works in the HathiTrust Digital Library. The development of a secure computation and data environment for non-consumptive research at the HathiTrust Research Center (HTRC) is funded by a grant from the Alfred P. Sloan Foundation. In this research, researchers at HTRC and the University of Michigan are developing a “data capsule framework” founded on the principle of “trust but verify”. The project has produced a novel experimental framework that permits analytical investigation of a corpus but prohibits data from leaving the capsule. The HTRC Data Capsule is both a system architecture and a set of policies that enable a researcher to carry out and directly control computational investigations over the protected content of the HathiTrust digital repository.
HathiTrust Research Center, HTRC Data Capsule, non-consumptive research
Technical Report