Automatically Dating Classical Chinese Texts: Preliminary Study on Biji and Buddhist Texts
Date
2022-04-22
Abstract
In recent years, a growing body of literature has used computational methods to study language change. These studies report good performance in automatically identifying when a text was written (Popescu and Strapparava, 2015), tracing semantic change (Schlechtweg et al., 2020), and even discovering rules underlying language change (Hamilton et al., 2016). However, such studies have been criticized for taking their results at face value (Hengchen et al., 2021), and how well the models perform across different languages and genres remains unclear. For Classical Chinese, there is a clear lack of open-access diachronic data, and lexical change across genres is seldom addressed computationally with large datasets. In this study, we approach the question of how language changes across time and across genres through classification tasks. Two types of texts are included: Chinese Biji and Buddhist texts. We first examine how well language models (such as n-gram models, word2vec, and transformers) can predict the date of composition of historical texts. We then ask what can be learned from the models' predictions. Finally, we analyze the results and discuss future directions.
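To make the classification framing concrete, the following is a minimal sketch of dating a text with character n-grams, assuming a multinomial Naive Bayes model with add-one smoothing. It is not the study's actual pipeline, which the abstract does not specify. The class name `NgramPeriodClassifier`, the period labels "A" and "B", and the training strings are all hypothetical placeholders invented for illustration, not real dated texts.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramPeriodClassifier:
    """Hypothetical sketch: multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # period -> n-gram counts
        self.doc_counts = Counter()         # period -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, periods):
        for text, period in zip(texts, periods):
            grams = char_ngrams(text, self.n)
            self.counts[period].update(grams)
            self.doc_counts[period] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_period, best_score = None, -math.inf
        for period, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior of the period plus smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[period] / total_docs)
            for gram in char_ngrams(text, self.n):
                score += math.log((counter[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_period, best_score = period, score
        return best_period


# Placeholder strings standing in for texts from two periods (not real data).
train_texts = ["甲乙甲乙丙", "甲乙丙甲乙", "丁戊丁戊己", "戊己丁戊丁"]
train_periods = ["A", "A", "B", "B"]

clf = NgramPeriodClassifier(n=2).fit(train_texts, train_periods)
print(clf.predict("甲乙甲"))
print(clf.predict("丁戊己"))
```

Character n-grams are used here because they avoid the word segmentation step that Classical Chinese would otherwise require; a word2vec or transformer classifier would replace the counting step with learned representations.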
Type
Presentation