dc.contributor.author Plale, Beth en
dc.contributor.author Zeng, Jiaan en
dc.contributor.author McDonald, Robert en
dc.contributor.author Chen, Miao en
dc.date.accessioned 2014-09-17T13:28:31Z en
dc.date.available 2014-09-17T13:28:31Z en
dc.date.issued 2014-09-10 en
dc.identifier.uri https://hdl.handle.net/2022/18936 en
dc.description.abstract The first mode of access by the community of digital humanities and informatics researchers and educators to the copyrighted content of the HathiTrust digital repository will be through statistical and aggregated information extracted from the copyrighted texts. But can the HathiTrust Research Center support scientific research that allows researchers to carry out their own analysis and extract their own information? This question is the focus of a 3-year, $606,000 grant from the Alfred P. Sloan Foundation (Plale, Prakash 2011-2014), which has resulted in a novel experimental framework that permits analytical investigation of a corpus but prohibits data from leaving the capsule. The HTRC Data Capsule is both a system architecture and a set of policies that enable computational investigation over the protected content of the HathiTrust digital repository, carried out and controlled directly by a researcher. It builds on the foundational security principles of the Data Capsules developed by A. Prakash of the University of Michigan, which allow privileged access to sensitive data while restricting the channels through which that data can be released. Ongoing work extends the HTRC Data Capsule to give researchers more compute power. The new thrust, HT-DC Cloud, extends the existing security guarantees and features so that researchers can carry out compute-heavy tasks, such as LDA topic modeling, on large-scale compute resources. The HTRC Data Capsule works by giving each researcher their own virtual machine (VM) that runs within the HTRC domain. The researcher can configure the VM as they would their own desktop, installing their own tools. Once configuration is complete, the VM switches into a "secure" mode, in which network and other data channels are restricted in exchange for access to the protected data. Results are emailed to the researcher. In this talk we discuss the motivations for the HTRC Data Capsule, its successes, and its challenges.
HTRC Data Capsule runs at Indiana University. See more at http://d2i.indiana.edu/non-consumptive-research en
dc.language.iso en_US en
dc.publisher Indiana University Digital Collections Services en
dc.relation.isversionof Click on the link below in the "External Files" section to play this video. en
dc.relation.uri https://purl.dlib.indiana.edu/iudl/media/b49425mg29 en
dc.subject Text analysis en
dc.subject Databases en
dc.title HathiTrust Research Center Data Capsule v1.0: An Overview of Functionality en
dc.type Presentation en
dc.altmetrics.display false en
