Word embeddings and semantic shifts in historical Spanish: Methodological considerations

Publisher

Digital Scholarship in the Humanities

Abstract

Word embeddings have recently been applied to detect and explore changes in word meaning on large historical corpora. While word embeddings are useful in many Natural Language Processing tasks, a number of questions need to be addressed concerning the accuracy and applicability of these methods for historical data. There is scarce literature on the stability and replicability of these embeddings, especially on small corpora, which are common in historical work. It also remains unclear whether methods used to evaluate embeddings in contemporary data can also be used for historical data sets. Our overarching goal is to use word embeddings for investigating semantic shifts in the history of Spanish. In the work presented here, we focus on the methodological questions that arise. First, we examine the stability and applicability of three commonly used word embedding models on a small corpus of medieval and classical Spanish. Comparing our results with a study on the word algo as a test case, we show that a rank-averaging method can produce more stable results from the embeddings. We corroborate previous theoretical work while demonstrating the applicability of our method when training word embeddings on small corpora for the analysis of semantic change. Second, we investigate how best to evaluate different embedding models. We show that an existing analogy test cannot be used without modification. Our new analogy test, consisting of roughly ten thousand questions for medieval and classical Spanish, will be released with the article.

Description

This is a pre-copyedited, author-produced version of an article accepted for publication in Digital Scholarship in the Humanities following peer review.

Citation

Hai Hu, Patrícia Amaral, Sandra Kübler, Word embeddings and semantic shifts in historical Spanish: Methodological considerations, Digital Scholarship in the Humanities, 2021, fqab050, https://doi.org/10.1093/llc/fqab050

Rights

This work may be protected by copyright unless otherwise stated.