Automatically Dating Classical Chinese Texts: Preliminary Study on Biji and Buddhist Texts

Date

2022-04-22

Abstract

In recent years, a growing body of literature has applied computational methods to the study of language change. These studies report good performance in automatically identifying when a text was written (Popescu and Strapparava, 2015), tracing semantic change (Schlechtweg et al., 2020), and even discovering rules underlying language change (Hamilton et al., 2016). However, such studies have been questioned for taking their results at face value (Hengchen et al., 2021), and how well the models perform across different languages and genres remains unclear. For Classical Chinese, there is a clear lack of open-access diachronic data, and lexical change across genres has seldom been addressed computationally with large data sets. In this study, we use classification tasks to approach the question of how language changes across time and across genres. Two types of texts are included: Chinese Biji and Buddhist texts. We first examine how well language models (such as n-gram models, word2vec, and transformers) can predict the date of composition of historical texts. We then ask what can be learned from the models' predictions. Finally, we analyze the results obtained and discuss future directions.
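
To make the classification framing concrete, the following is a minimal sketch of dating a text with character n-grams. It is not the authors' implementation: it assumes a multinomial naive Bayes classifier with add-one smoothing, and the class name `NgramPeriodClassifier`, the labels `period_A` and `period_B`, and the training strings are all hypothetical placeholders rather than real corpus data.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramPeriodClassifier:
    """Hypothetical multinomial naive Bayes over character n-grams."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # period label -> n-gram counts
        self.doc_counts = Counter()         # period label -> number of training texts
        self.vocab = set()                  # all n-grams seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior from the share of training texts in this period.
            score = math.log(self.doc_counts[label] / total_docs)
            # Add-one smoothing keeps unseen n-grams from zeroing the score.
            denom = sum(self.counts[label].values()) + vocab_size
            for gram in char_ngrams(text, self.n):
                score += math.log((self.counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy placeholder data (invented strings, not drawn from any corpus).
train_texts = ["abababab", "babababa", "cdcdcdcd", "dcdcdcdc"]
train_labels = ["period_A", "period_A", "period_B", "period_B"]

clf = NgramPeriodClassifier(n=2).fit(train_texts, train_labels)
predicted = clf.predict("abab")
```

Character n-grams are used in this sketch because they sidestep word segmentation, which is a nontrivial step for Classical Chinese; whether the study segments its texts or works at the character level is not stated in the abstract.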

Type

Presentation