Modeling semantic change with word embeddings using small historical corpora

Date

2021-03-03

Publisher

Indiana University Digital Collections Services

Abstract

Word embeddings have recently been applied to detect and explore changes in word meaning in historical corpora (Hamilton et al., 2016; Rodda et al., 2017; Hellrich, 2019). While word embeddings are useful in many Natural Language Processing tasks, a number of questions remain about the stability, accuracy, and applicability of these methods for historical data. Previous studies mostly relied on exceptionally large corpora such as Google Books (Hamilton et al., 2016). However, the literature on the stability and replicability of these embeddings is scarce, especially for small corpora, which are common in historical work. It also remains unclear whether methods used to evaluate embeddings on contemporary data can be applied to historical data sets. In the work presented here, we focus on three methodological questions: How replicable and stable are the results of different word embedding models? How do we determine the accuracy of different embedding models on our historical data? Given the low-resource situation, can we find enough meaningful words in the embeddings to draw conclusions about semantic change, and do our findings correspond to prior knowledge? We experimented with a historical corpus of medieval and classical Spanish that is an order of magnitude smaller than those used in previous studies, and obtained word embeddings using three commonly used models: SGNS (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and SVDPPMI (Levy et al., 2015). We compare the results of the different models and present the solutions we developed to address the challenges we encountered.
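
As an illustration of the first question, one common way to quantify stability is to train the same model twice and check how much a word's nearest neighbors overlap between the two runs. The sketch below is a minimal example under that assumption; it is not necessarily the measure used in this presentation. The function names, the toy vectors, and the choice of k are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, word, k):
    """Return the set of the k words most similar to `word`."""
    target = embeddings[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    # Sort by descending similarity; break ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {other for _, other in scored[:k]}


def neighbor_overlap(run_a, run_b, word, k=2):
    """Jaccard overlap of the top-k neighbors of `word` in two runs."""
    a = nearest_neighbors(run_a, word, k)
    b = nearest_neighbors(run_b, word, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy vectors (hypothetical, not taken from the corpus in the abstract).
run_a = {
    "rey": [1.0, 0.1],
    "reina": [0.9, 0.2],
    "conde": [0.8, 0.3],
    "pan": [0.1, 1.0],
    "vino": [0.2, 0.9],
}
# Same geometry with the axes swapped: neighbors are unchanged.
run_b = {word: [vec[1], vec[0]] for word, vec in run_a.items()}
# A run in which "rey" has drifted toward the food words.
run_c = dict(run_a, rey=[0.1, 1.0])

stable = neighbor_overlap(run_a, run_b, "rey")
unstable = neighbor_overlap(run_a, run_c, "rey")
```

Comparing neighbor sets rather than raw vectors is a deliberate choice here: separately trained embedding spaces are generally not aligned with one another, so the coordinates themselves are not directly comparable across runs, while neighbor rankings are.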

Keywords

Linguistics, Semantic change, Text analysis

Type

Presentation