Reliable Access to Massive Restricted Texts: Experience-based Evaluation

dc.contributor.author: Peng, Zong
dc.contributor.author: Plale, Beth A
dc.date.accessioned: 2025-02-20T16:41:18Z
dc.date.available: 2025-02-20T16:41:18Z
dc.date.issued: 2019-04-05
dc.description.abstract: Libraries are seeing growing numbers of digitized textual corpora that frequently come with restrictions on their content. Large corpora for computational analysis, while of interest to scholars, can be cumbersome because of the combination of size, granularity of access, and access restrictions. Efficient management of such a collection for general access, especially under failures, depends on the primary storage system. In this paper, we identify the requirements for managing a massive text corpus for computational analysis and use them as the basis for evaluating candidate storage solutions. The study is based on the 5.9 billion page collection of the HathiTrust digital library. Our findings led to the choice of Cassandra 3.x for the primary back end store, which is currently deployed at the HathiTrust Research Center.
dc.identifier.citation: Peng, Zong, and Plale, Beth A. "Reliable Access to Massive Restricted Texts: Experience-based Evaluation." Concurrency and Computation: Practice and Experience, vol. on-line version, 2019-04-05, https://doi.org/10.1002/cpe.5255.
dc.identifier.issn: 1532-0626
dc.identifier.other: BRITE 6502
dc.identifier.uri: https://hdl.handle.net/2022/31947
dc.language.iso: en
dc.relation.isversionof: https://doi.org/10.1002/cpe.5255
dc.relation.isversionof: http://arxiv.org/pdf/1903.00771
dc.relation.journal: Concurrency and Computation: Practice and Experience
dc.title: Reliable Access to Massive Restricted Texts: Experience-based Evaluation
