Historical Documentation

To maintain the digital provenance of this project, this page presents the documentation and project details as they were conceived and implemented between 2007 and 2009.

LSTA Grant Proposal

  • Link to PDF that needs to be hosted somewhere!

Encoding Overview

The Indiana University Digital Library Program received a grant from the Institute of Museum and Library Services, under the provisions of the Library Services and Technology Act (LSTA) administered by the Indiana State Library, to digitize and encode a 102-year run of The Indiana Magazine of History (IMH). The journal features historical articles, critical essays, research notes, annotated primary documents, reviews, and notices.

Encoding the IMH came with its own set of challenges because of the variety of content types included within the journal and the changes in structure over its 102-year print history. In addition to scholarly articles, tables of contents, indexes, and other text types commonly found in journals, the text of the IMH contains reprints of primary source materials ranging from letters and diaries to election results. The presence of tabular data and highly structured text such as poetry posed structural difficulties, while foreign languages and a proliferation of proper names created the need for focused semantic encoding.

For the purpose of this project, the semantic encoding focused on article types (scholarly articles, book reviews, and editorial materials), article features (diaries, letters, and bibliographies), and place names. Structural encoding focused on basic print conventions for serials, such as page breaks and bylines, as well as lists, tables, block quotations, and footnotes.

Encoding every feature needed to represent the print text faithfully would have been prohibitively expensive, and it was also unnecessary because facsimile page images are provided. For these reasons, certain structural elements, such as columns and verse, and certain semantic elements, such as personal names and back matter indexes, were not encoded.

Issue-level encoding, following the Guidelines for Electronic Text Encoding and Interchange, version P4, was employed to maintain the conceptual integrity of the print journal.

Bibliographic information about a TEI-encoded text is captured in the <teiHeader>. Each TEI.2 file can have only one header, and any given portion of text can only have one header that applies to it. Like most journals, articles in the IMH have two sets of bibliographic metadata: issue-level and article-level. The journal also contains book review articles, consisting of multiple reviews, each with its own set of metadata. In order to provide readers with issue-level browse access and article-level search access (e.g., search by article title or author), bibliographic metadata was captured for the issue and for each of the articles, including discrete book reviews, within the issue by way of independent headers.
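The relationship between the single issue-level header and the independent article-level headers can be illustrated with a minimal Python sketch. The XML below is a hypothetical, heavily simplified stand-in for the project's files: the sample titles, the div ids, and the use of the n attribute to point each independent header at its article are all assumptions made for illustration, not the project's documented conventions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified issue file: a single <teiHeader> describes the issue.
ISSUE_XML = """
<TEI.2>
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Indiana Magazine of History, sample issue</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div id="art1" type="article"><head>A Sample Article</head></div>
      <div id="rev1" type="review"><head>A Sample Book Review</head></div>
    </body>
  </text>
</TEI.2>
"""

# Hypothetical independent headers, one per article or review. Using the
# n attribute to point at a <div> id is an assumed linking convention.
IHS_XML = """
<ihs>
  <teiHeader n="art1">
    <fileDesc>
      <titleStmt><title>A Sample Article</title><author>Sample Author</author></titleStmt>
    </fileDesc>
  </teiHeader>
  <teiHeader n="rev1">
    <fileDesc>
      <titleStmt><title>A Sample Book Review</title><author>Sample Reviewer</author></titleStmt>
    </fileDesc>
  </teiHeader>
</ihs>
"""


def header_summary(header):
    """Pull the title and author names out of one <teiHeader>."""
    title_stmt = header.find("fileDesc/titleStmt")
    return {
        "title": title_stmt.findtext("title", default="").strip(),
        "authors": [a.text.strip() for a in title_stmt.findall("author") if a.text],
    }


def build_indexes(issue_root, ihs_root):
    """Return issue-level metadata plus a per-article metadata lookup."""
    issue = header_summary(issue_root.find("teiHeader"))
    articles = {}
    for header in ihs_root.findall("teiHeader"):
        target = header.get("n")
        # Keep only headers whose target <div> actually exists in the issue.
        if issue_root.find(f".//div[@id='{target}']") is not None:
            articles[target] = header_summary(header)
    return issue, articles


issue, articles = build_indexes(ET.fromstring(ISSUE_XML), ET.fromstring(IHS_XML))
print(issue["title"])
for div_id, meta in articles.items():
    print(div_id, meta["title"], meta["authors"])
```

In this sketch the issue header would feed an issue-level browse list, while the per-article lookup would feed a search index keyed on title and author.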

The IMH was outsourced for digitization and encoding, but it underwent a second iteration of in-house encoding after our quality control process revealed a number of errors and inconsistencies in the outsourced encoding. Currently, place name corrections and article-level subject normalization are ongoing. A phase two release of the IMH online will include more robust place name and topical subject access.

For more detailed information about encoding particulars and challenges we faced, please refer to the Papers and Presentations section of the web site.

Metadata Overview

The Text Encoding Initiative (TEI) guidelines and the digital library community of practice offered a number of potential methods for representing article-level bibliographic metadata, including TEI Corpus, Metadata Object Description Schema (MODS), and article-level TEI documents that link to the parent issue. After exploring these and other options, the TEI Independent Header eventually emerged as the best way to represent granular metadata for the online Indiana Magazine of History (IMH).

The auxiliary schema for the Independent Header (IHS) was developed to allow the exchange of bibliographic metadata for text collections to support the creation of indices and other aggregations. The creators of the schema did not necessarily envision the use of the Independent Header in serials encoding, but this method has a number of advantages. Utilizing the IHS in this fashion allows the TEI to function as the authoritative metadata source for the document, and allows the encoder to faithfully represent the issue-based structure of the original without compromising the unique identity of each article. It also supports our larger goal of interoperability with other text collections. Since the IHS is part of the TEI standard (P4 and earlier), the encoder does not have to extend or modify the DTD. This not only simplifies documentation needs for management and preservation, but also allows for easier reuse of content and integration with other collections.

By capturing article-level metadata using the TEI independent header, the TEI serves as the descriptive source for the IMH online. From it we can derive functionality such as page turning, as well as bibliographic metadata in other formats, such as Dublin Core (DC) and Metadata Object Description Schema (MODS), for sharing via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The Metadata Encoding and Transmission Standard (METS) is used both to manage metadata about the digital objects that comprise the IMH online (archival and derivative images, XML files, etc.) and to drive page turning functionality for METS Navigator, an open source application developed by the IU Digital Library Program.
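As an illustration of deriving another metadata format from a TEI header, the following Python sketch maps a few header fields to unqualified Dublin Core in the oai_dc container used by OAI-PMH. The crosswalk (title to dc:title, author to dc:creator, publication date to dc:date) and the sample header are assumptions chosen for brevity; they are not the project's actual mapping or data.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC)
ET.register_namespace("oai_dc", OAI_DC)

# Assumed crosswalk: path inside the TEI header -> Dublin Core element name.
CROSSWALK = [
    ("fileDesc/titleStmt/title", "title"),
    ("fileDesc/titleStmt/author", "creator"),
    ("fileDesc/publicationStmt/date", "date"),
]

# Hypothetical article-level header used only to exercise the sketch.
SAMPLE_HEADER = """
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>A Sample Article</title>
      <author>First Sample Author</author>
      <author>Second Sample Author</author>
    </titleStmt>
    <publicationStmt><date>1905</date></publicationStmt>
  </fileDesc>
</teiHeader>
"""


def tei_header_to_dc(header):
    """Build an oai_dc:dc element from one TEI header element."""
    record = ET.Element(f"{{{OAI_DC}}}dc")
    for tei_path, dc_name in CROSSWALK:
        # Repeated TEI elements (e.g. several authors) become repeated DC elements.
        for node in header.findall(tei_path):
            text = "".join(node.itertext()).strip()
            if text:
                ET.SubElement(record, f"{{{DC}}}{dc_name}").text = text
    return record


record = tei_header_to_dc(ET.fromstring(SAMPLE_HEADER))
print(ET.tostring(record, encoding="unicode"))
```

Keeping the crosswalk in a small table, rather than in code branches, makes it easier to add a second target format such as MODS from the same header.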

For more detailed information about metadata decisions and challenges, please refer to the Papers and Presentations section of the web site.

Technical Overview: XTF and Fedora

The contents of the online version of the Indiana Magazine of History (XML/TEI files, DTDs, master and derivative images, PDFs, etc.) are stored in the IU Digital Library Program's Fedora repository. Fedora is an open source software framework for managing and delivering digital collections and resources. A Journal Content Model (see diagram) was developed to reflect the hierarchical nature of the journal. Among other things, this model allows us to attach metadata, especially bibliographic metadata, at the volume, issue, and article levels.

Journal Content Model Diagram

Software for the project was developed using Java Servlet technology, JavaServer Pages, the Apache Struts Java web application framework, and Asynchronous JavaScript and XML (AJAX) technologies. TEI documents are transformed to HTML pages using Extensible Stylesheet Language Transformations (XSLT). Searching and browsing capabilities are implemented using a customized version of the open source eXtensible Text Framework (XTF) developed by the California Digital Library. Page turning functions are powered by METS Navigator, an open source METS-based page turning application developed by the IU Digital Library Program. The web site is delivered using the Tomcat application server and Apache HTTP Server software.
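To give a sense of how a METS document can drive page turning, the Python sketch below reads a structural map and resolves each page division to its image file in display order. It is a generic illustration of the METS structMap and fileSec mechanism, not METS Navigator's implementation; the sample document, file names, and the "issue" and "page" TYPE values are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical METS fragment; files and divisions are deliberately out of order.
SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="screen">
      <mets:file ID="img2"><mets:FLocat LOCTYPE="URL" xlink:href="page-002.jpg"/></mets:file>
      <mets:file ID="img1"><mets:FLocat LOCTYPE="URL" xlink:href="page-001.jpg"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="issue">
      <mets:div TYPE="page" ORDER="2"><mets:fptr FILEID="img2"/></mets:div>
      <mets:div TYPE="page" ORDER="1"><mets:fptr FILEID="img1"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def page_sequence(mets_root):
    """Return image locations sorted by the ORDER of their page divisions."""
    # Map each file ID in the file section to its location.
    files = {}
    for file_el in mets_root.findall(".//mets:file", NS):
        locat = file_el.find("mets:FLocat", NS)
        if locat is not None:
            files[file_el.get("ID")] = locat.get(f"{{{NS['xlink']}}}href")

    # Follow each page division's file pointer and record its ORDER.
    pages = []
    for div in mets_root.findall(".//mets:structMap//mets:div[@TYPE='page']", NS):
        fptr = div.find("mets:fptr", NS)
        if fptr is not None and fptr.get("FILEID") in files:
            pages.append((int(div.get("ORDER")), files[fptr.get("FILEID")]))
    return [href for _, href in sorted(pages)]


pages = page_sequence(ET.fromstring(SAMPLE_METS))
print(pages)
```

A page turner built this way only needs the ordered list and a current index to offer next and previous navigation.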