Modeling semantic change with word embeddings using small historical corpora

dc.contributor.author: Hu, Hai
dc.contributor.author: Amaral, Patrícia
dc.contributor.author: Kübler, Sandra
dc.date.accessioned: 2021-03-05T21:32:04Z
dc.date.available: 2021-03-05T21:32:04Z
dc.date.issued: 2021-03-03
dc.description.abstract: Word embeddings have recently been applied to detect and explore changes in word meaning in historical corpora (Hamilton et al., 2016; Rodda et al., 2017; Hellrich, 2019). While word embeddings are useful in many Natural Language Processing tasks, a number of questions remain concerning the stability, accuracy, and applicability of these methods for historical data. Previous studies mostly used exceptionally large corpora such as Google Books (Hamilton et al., 2016). However, there is little literature on the stability and replicability of these embeddings, especially on small corpora, which are common in historical work. It also remains unclear whether methods used to evaluate embeddings on contemporary data can be used for historical data sets. In the work presented here, we focus on three methodological questions: How replicable and stable are the results of different word embedding models? How do we determine the accuracy of different embedding models on our historical data? Given the low-resource situation, can we find (enough) meaningful words in the embeddings to draw conclusions about semantic change, and do our findings correspond to prior knowledge? We experimented with a historical corpus of medieval and classical Spanish that is an order of magnitude smaller than those used in previous studies, and obtained word embeddings using three commonly used word embedding models: SGNS (Mikolov et al., 2013), GloVe (Pennington et al., 2014), and SVDPPMI (Levy et al., 2015). We compare the results of the different models and present the solutions we developed to address the challenges we encountered.
dc.identifier.uri: https://hdl.handle.net/2022/26310
dc.language.iso: en
dc.publisher: Indiana University Digital Collections Services
dc.relation.isversionof: Click on the link below to play this video
dc.relation.uri: https://purl.dlib.indiana.edu/iudl/media/425k327w0x
dc.subject: Linguistics
dc.subject: Semantic change
dc.subject: Text analysis
dc.title: Modeling semantic change with word embeddings using small historical corpora
dc.type: Presentation

Files

Original bundle

Name: sem change -- IU brown bag 20210303[33].pdf
Size: 1.51 MB
Format: Adobe Portable Document Format