Word embeddings and semantic shifts in historical Spanish: Methodological considerations

dc.contributor.author: Hu, Hai
dc.contributor.author: Amaral, Patrícia
dc.contributor.author: Kübler, Sandra
dc.date.accessioned: 2022-03-15T14:35:28Z
dc.date.available: 2022-03-15T14:35:28Z
dc.date.issued: 2021-08-25
dc.description: This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review.
dc.description.abstract: Word embeddings have recently been applied to detect and explore changes in word meaning in large historical corpora. While word embeddings are useful in many Natural Language Processing tasks, a number of questions concerning the accuracy and applicability of these methods to historical data still need to be addressed. There is little literature on the stability and replicability of these embeddings, especially on small corpora, which are common in historical work. It also remains unclear whether methods used to evaluate embeddings on contemporary data can also be used for historical data sets. Our overarching goal is to use word embeddings to investigate semantic shifts in the history of Spanish. In the work presented here, we focus on the methodological questions that arise. First, we examine the stability and applicability of three commonly used word embedding models on a small corpus of medieval and classical Spanish. Comparing our results with a study of the word algo, which serves as a test case, we show that a rank-averaging method can produce more stable results from the embeddings. We corroborate previous theoretical work while demonstrating the applicability of our method when training word embeddings on small corpora for the analysis of semantic change. Second, we investigate how best to evaluate different embedding models. We show that an existing analogy test cannot be used without modification. Our new analogy test, consisting of roughly ten thousand questions for medieval and classical Spanish, will be released with the article.
dc.identifier.citation: Hai Hu, Patrícia Amaral, Sandra Kübler, Word embeddings and semantic shifts in historical Spanish: Methodological considerations, Digital Scholarship in the Humanities, 2021, fqab050, https://doi.org/10.1093/llc/fqab050
dc.identifier.doi: https://doi.org/10.1093/llc/fqab050
dc.identifier.uri: https://hdl.handle.net/2022/27372
dc.language.iso: en
dc.publisher: Digital Scholarship in the Humanities
dc.relation.isversionof: https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqab050/6357326?redirectedFrom=fulltext
dc.rights: This work may be protected by copyright unless otherwise stated.
dc.title: Word embeddings and semantic shifts in historical Spanish: Methodological considerations
dc.type: Article

Files

Original bundle

Name: Hu_Amaral_Kuebler_2021.pdf
Size: 832.3 KB
Format: Adobe Portable Document Format