
dc.contributor.author Christie, Jennifer
dc.contributor.author Kloster, David
dc.contributor.author Walsh, John
dc.date.accessioned 2020-12-15T15:27:02Z
dc.date.available 2020-12-15T15:27:02Z
dc.date.issued 2020-11-04
dc.identifier.uri https://hdl.handle.net/2022/26003
dc.description.abstract The HathiTrust Digital Library (HTDL) was founded in 2008 with just over 2 million volumes in its collection. Today it holds over 17 million volumes, ranging from 6th-century psalters to 21st-century academic texts. Its diverse contents include government documents, academic journal articles, and monographs from all the disciplines one would find represented in a typical academic research library. While the majority of materials are in English, there are many volumes in German, French, Spanish, Italian, Arabic, Chinese, Russian, and Latin. Researchers can perform text analysis on the contents of the HTDL using the tools and data sets provided by the HathiTrust Research Center (HTRC). The HTRC, based at IU Bloomington, develops infrastructure, tools, and services to support text data mining of the HTDL corpus. These include off-the-shelf web-based text analysis tools, a secure data capsule computing environment for analysis of rights-restricted content, and the HTRC Extracted Features Data Set, which provides volume-level and page-level word counts and other metadata for the entire corpus. This presentation will discuss the current contents of the HTDL collection and its benefits as a data source, and provide examples of existing research facilitated by HTDL collections and HTRC resources. It will also give an overview of the various HTRC text analysis tools and the different options for analyzing public domain and copyrighted material. en
dc.language.iso en en
dc.publisher Indiana University Digital Collections Services en
dc.relation.isversionof Click the link below to play this video en
dc.relation.uri https://purl.dlib.indiana.edu/iudl/media/415p295d1h
dc.subject Text data mining en
dc.subject Computational linguistics en
dc.subject Digital libraries en
dc.title The HathiTrust Research Center (HTRC): Mining the 17 Million Volumes of the HathiTrust Digital Library en
dc.type Presentation en

