Word embeddings and semantic shifts in historical Spanish: Methodological considerations
Date
2021-08-25
Authors
Hai Hu, Patrícia Amaral, Sandra Kübler
Journal
Digital Scholarship in the Humanities
Abstract
Word embeddings have recently been applied to detect and explore changes in word meaning in large historical corpora. While word embeddings are useful in many Natural Language Processing tasks, a number of questions concerning the accuracy and applicability of these methods to historical data still need to be addressed. Little work exists on the stability and replicability of these embeddings, especially on small corpora, which are common in historical work. It also remains unclear whether methods used to evaluate embeddings on contemporary data can be applied to historical data sets. Our overarching goal is to use word embeddings to investigate semantic shifts in the history of Spanish. In the work presented here, we focus on the methodological questions that arise. First, we examine the stability and applicability of three commonly used word embedding models on a small corpus of medieval and classical Spanish. Comparing our results with a study of the word algo as a test case, we show that a rank-averaging method can produce more stable results from the embeddings. We corroborate previous theoretical work while demonstrating the applicability of our method when training word embeddings on small corpora for the analysis of semantic change. Second, we investigate how best to evaluate different embedding models. We show that an existing analogy test cannot be used without modification. Our new analogy test, consisting of roughly ten thousand questions for medieval and classical Spanish, will be released with the article.
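The abstract does not spell out the rank-averaging procedure or the scoring rule of the analogy test. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that several embedding models are trained on the same corpus (for example, with different random seeds), that each word's rank in a target's nearest-neighbor list is averaged across those runs, and that analogy questions are scored with the common 3CosAdd vector-offset rule (argmax over d of cos(d, b - a + c)). Skipping out-of-vocabulary questions is also an assumption. All function names are hypothetical, and the toy vectors and neighbor words are invented for illustration; they are not data or results from the study.

```python
"""Hypothetical sketch: rank-averaged nearest neighbors across several
embedding runs, plus a 3CosAdd analogy scorer. Toy vectors only."""
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, List[float]]  # word -> vector (one trained run)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def neighbor_ranks(model: Model, target: str) -> Dict[str, int]:
    """Rank every other word by similarity to `target` (1 = most similar)."""
    sims = {w: cosine(model[target], vec) for w, vec in model.items() if w != target}
    ordered = sorted(sims, key=lambda w: (-sims[w], w))  # ties broken alphabetically
    return {w: i + 1 for i, w in enumerate(ordered)}


def rank_averaged_neighbors(models: List[Model], target: str, k: int) -> List[str]:
    """Average each word's neighbor rank over all runs; return the k best."""
    rank_maps = [neighbor_ranks(m, target) for m in models]
    shared = set.intersection(*(set(r) for r in rank_maps))  # words present in every run
    avg = {w: mean(r[w] for r in rank_maps) for w in shared}
    return sorted(avg, key=lambda w: (avg[w], w))[:k]


def solve_analogy(model: Model, a: str, b: str, c: str) -> str:
    """3CosAdd: a is to b as c is to ?, i.e. argmax_d cos(d, b - a + c)."""
    query = [vb - va + vc for va, vb, vc in zip(model[a], model[b], model[c])]
    candidates = [w for w in model if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(model[w], query))


def analogy_accuracy(model: Model, questions: List[Tuple[str, str, str, str]]) -> float:
    """Share of answerable questions solved; questions with unknown words are skipped."""
    answerable = [q for q in questions if all(w in model for w in q)]
    if not answerable:
        return 0.0
    correct = sum(solve_analogy(model, a, b, c) == d for a, b, c, d in answerable)
    return correct / len(answerable)


if __name__ == "__main__":
    # Three invented "runs": the single-run top neighbor of "algo" is unstable.
    runs = [
        {"algo": [1.0, 0.0], "cosa": [0.9, 0.1], "poco": [0.7, 0.7], "rey": [0.0, 1.0]},
        {"algo": [1.0, 0.0], "cosa": [0.6, 0.8], "poco": [0.8, 0.2], "rey": [0.1, 1.0]},
        {"algo": [1.0, 0.0], "cosa": [0.95, 0.2], "poco": [0.5, 0.5], "rey": [0.2, 1.0]},
    ]
    print([rank_averaged_neighbors([m], "algo", 1) for m in runs])  # [['cosa'], ['poco'], ['cosa']]
    print(rank_averaged_neighbors(runs, "algo", 2))  # ['cosa', 'poco']

    toy = {"hombre": [1, 0, 0], "mujer": [1, 1, 0], "rey": [1, 0, 1],
           "reina": [1, 1, 1], "casa": [0.2, 0, 0.9]}
    questions = [("hombre", "mujer", "rey", "reina"),
                 ("hombre", "mujer", "príncipe", "princesa")]  # second is skipped (OOV)
    print(analogy_accuracy(toy, questions))  # 1.0
```

In the toy runs, the single top neighbor of "algo" changes from run to run, while the averaged ranking yields one ordering. This is the kind of instability that averaging is meant to dampen on small corpora. A real setup would load trained models (for instance, word2vec or fastText vectors) instead of hand-written dictionaries.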
Description
This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review.
Citation
Hai Hu, Patrícia Amaral, Sandra Kübler, Word embeddings and semantic shifts in historical Spanish: Methodological considerations, Digital Scholarship in the Humanities, 2021, fqab050, https://doi.org/10.1093/llc/fqab050
Type
Article