Modeling semantic change with word embeddings using small historical corpora
Date
2021-03-03
Publisher
Indiana University Digital Collections Services
Abstract
Word embeddings have recently been applied to detect and explore changes in word meaning in historical corpora (Hamilton et al., 2016; Rodda et al., 2017; Hellrich, 2019). While word embeddings are useful in many Natural Language Processing tasks, a number of questions remain concerning the stability, accuracy and applicability of these methods for historical data. Previous studies mostly used exceptionally large corpora such as Google Books (Hamilton et al., 2016). However, little has been published on the stability and replicability of these embeddings, especially on small corpora, which are common in historical work. It also remains unclear whether methods used to evaluate embeddings on contemporary data can be applied to historical data sets.
In the work presented here, we focus on three methodological questions:
1. How replicable and stable are the results of different word embedding models?
2. How do we determine the accuracy of different embedding models on our historical data?
3. Given the low-resource setting, can we find (enough) meaningful words in the embeddings to draw conclusions about semantic change? Do our findings correspond to prior knowledge?
We experimented with a historical corpus of medieval and classical Spanish that is an order of magnitude smaller than those used in previous studies, and obtained word embeddings using three commonly used word embedding models: SGNS (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and SVDPPMI (Levy et al., 2015). We compare the results of the different models and present the solutions we developed to address the challenges we encountered.
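One common way to quantify the stability raised in the first question is to train the same model more than once and compare each word's nearest neighbors across runs. The sketch below is illustrative only and is not taken from the presentation: the function names, the toy two-dimensional vectors and the choice of Jaccard overlap over the top-k cosine neighbors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, word, k):
    """Return the set of the k words most similar to `word`."""
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    # Sort by descending similarity; break ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {other for _, other in scored[:k]}


def neighbor_overlap(run_a, run_b, word, k=10):
    """Jaccard overlap of a word's top-k neighbors in two training runs."""
    a = nearest_neighbors(run_a, word, k)
    b = nearest_neighbors(run_b, word, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical toy vectors standing in for two runs of the same model.
run_a = {
    "rey": [1.0, 0.0],
    "reina": [0.9, 0.1],
    "corte": [0.8, 0.3],
    "casa": [0.0, 1.0],
    "villa": [0.1, 0.9],
}
run_b = {
    "rey": [1.0, 0.0],
    "reina": [0.9, 0.2],
    "corte": [0.1, 1.0],
    "casa": [0.0, 1.0],
    "villa": [0.7, 0.3],
}

print(neighbor_overlap(run_a, run_b, "rey", k=2))
```

An overlap near 1 means the two runs agree on a word's neighborhood, while a low value suggests that conclusions about that word may depend on the particular run. Averaging the score over a vocabulary sample would give one rough per-model stability figure.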
Keywords
Linguistics, Semantic change, Text analysis
Type
Presentation