Automating JATS XML Tagging With ChatGPT

Date

2024-05-15

Publisher

[Bloomington, Ind.] : Indiana University

Abstract

While significant progress has been made in streamlining JATS XML publication workflows, efficiently converting article submission files into JATS XML galleys remains challenging for smaller publishers. The Journal Article Tag Suite (JATS) is a global standard for scholarly journal publishing, indexing, sharing, and archiving. Motivated by the advantages of XML publishing, the Indiana University open access journal publishing program has explored a number of options for expanding our use of JATS. In 2023, we began experimenting with the generative AI tool ChatGPT to assess its potential for automating the JATS conversion step in our publishing workflow. Our results demonstrated that ChatGPT can effectively tag plain-text research article content as accurate, publishable JATS. To automate XML tagging for the journal Studies in Digital Heritage (SDH), we designed several prompts that direct ChatGPT to tag each section of a research article in our specific JATS format. Guided by prompts that provided relevant XML examples, ChatGPT produced JATS-compliant tagging from plain-text article content. At the section level, the JATS produced by ChatGPT was comparable in accuracy to our vendor-produced JATS. Ultimately, this approach, along with several additional steps, yielded a publication-ready JATS galley, which we then posted to SDH. Our experiment demonstrates that large language models like ChatGPT can perform this type of work with high accuracy. However, the current token limitations of ChatGPT 3.5 necessitate a piecemeal approach, which makes the method too unwieldy for large-scale adoption at this point. Nevertheless, if the token limit were substantially increased, and if we could input all our prompts simultaneously, fully automated JATS tagging might be within reach.
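
The section-by-section prompting described in the abstract could look roughly like the following Python sketch. It is illustrative only and is not the presenter's workflow: the example XML, the prompt wording, the function names, and the word budget (a crude stand-in for a real token count) are all assumptions, and the call to ChatGPT itself is deliberately left out. The final check confirms only that a returned fragment is well-formed XML; it does not validate against the JATS DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical example of target tagging for one section; the actual
# SDH-specific examples used in the presentation are not given in the abstract.
EXAMPLE_SEC = (
    '<sec id="s1">\n'
    '  <title>Introduction</title>\n'
    '  <p>First paragraph of the section.</p>\n'
    '</sec>'
)


def build_section_prompt(section_text: str, example_xml: str = EXAMPLE_SEC) -> str:
    """Assemble a prompt pairing one JATS example with one plain-text chunk."""
    return (
        "Tag the following plain-text article section as JATS XML.\n"
        "Follow the structure of this example exactly:\n\n"
        f"{example_xml}\n\n"
        "Plain text to tag:\n\n"
        f"{section_text}\n"
    )


def split_sections(article_text: str, max_words: int = 500) -> list[str]:
    """Group blank-line-separated paragraphs into chunks under a word budget.

    The word count is an assumed approximation of the model's token limit.
    """
    chunks, current, count = [], [], 0
    for para in (p.strip() for p in article_text.split("\n\n")):
        if not para:
            continue
        words = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def is_well_formed(xml_fragment: str) -> bool:
    """Check XML well-formedness only; this is not JATS DTD validation."""
    try:
        ET.fromstring(xml_fragment)
        return True
    except ET.ParseError:
        return False
```

Each chunk returned by `split_sections` would be passed to `build_section_prompt`, sent to the model separately, and the reply screened with `is_well_formed` before the fragments are assembled into a full galley. Handling one chunk at a time mirrors the piecemeal approach that the abstract attributes to the token limit.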

Keywords

Library Publishing, JATS, XML, ChatGPT, Prompt Engineering, LLM

Citation

Vaughn, M. (2024, May 15). Automating JATS XML Tagging with ChatGPT [Conference presentation]. Library Publishing Forum, Minneapolis, MN.

Type

Presentation