Extensions to Embedding Regression: Models for Context-Specific Description and Inference

Date

2024-02-02

Publisher

Indiana University Workshop in Methods

Abstract

Social scientists commonly seek to make statements about how word use varies with circumstances, including time, partisan identity, or some other document-level covariate. For example, researchers might wish to know how Republicans and Democrats diverge in their understanding of the term “immigration.” Building on the success of pretrained language models, we introduce the a la carte on text (conText) embedding regression model for this purpose. This fast and simple method produces valid vector representations of how words are used, and thus what words “mean,” in different contexts. We show that it outperforms slower, more complicated alternatives and works well even with very few documents. The model also allows for hypothesis testing and statements about statistical significance. Finally, we extend the model to non-English languages and demonstrate applications in those languages.

Description

Arthur Spirling is the Class of 1987 Professor of Politics at Princeton University. He received bachelor’s and master’s degrees from the London School of Economics, and a master’s degree and PhD from the University of Rochester. Previously, he served on the faculties of Harvard University and New York University. Spirling’s research centers on quantitative methods for analyzing political behavior, especially institutional development and the use of text-as-data. His work on these subjects has appeared in outlets such as the American Political Science Review, the American Journal of Political Science, and the Journal of the American Statistical Association. He is currently working on problems at the intersection of data science and social science, including those related to machine learning and large language models. He previously won teaching and mentoring awards at Harvard and NYU, along with the “Emerging Scholar” prize from the Society for Political Methodology.

Type

Presentation