DEFENDING AGAINST AUTHORSHIP ATTRIBUTION ATTACKS WITH LARGE LANGUAGE MODELS
Date
2025-06
Authors
Publisher
[Bloomington, Ind.] : Indiana University
Permanent Link
Abstract
In today’s digital era, individuals leave significant digital footprints through their writing, whether on social media or on their employer’s devices. These digital footprints pose a serious challenge for identity protection: authorship attribution techniques can identify the author of an unsigned document with high accuracy. This threat is especially acute for those who must speak publicly while safeguarding their anonymity, including whistleblowers, journalists, activists, and individuals living under oppressive regimes.
Defenses against authorship attribution attacks rely on altering an individual’s writing style, making it unlinkable to their prior work while maintaining meaning and fluency. Despite extensive efforts at automation, existing techniques rarely match the effectiveness of manual interventions and make significant technical demands of individuals seeking to obfuscate their writing style.
This dissertation investigates the use of large language models (LLMs) as an effective defense against authorship attribution attacks. These models are user-friendly and respond directly to natural language prompts, making them particularly accessible for privacy-conscious individuals. Through extensive experiments, this dissertation reproduces both established automated and manual circumvention strategies with LLMs.
The results confirm that, with the right prompts, LLMs can offer significant protection from authorship attribution attacks. A simple “write differently” prompt on lightweight LLMs produces semantically faithful, inconspicuous text while driving attribution models’ performance down to near-chance levels. Surprisingly, open-weight models with just 8–9 billion parameters consistently outperform far larger closed-source models. Furthermore, this research overturns assumptions about in-context learning, showing that adding context, such as personas, exemplars, or extended demonstrations, often harms rather than helps defensive performance.
These findings advance our understanding of how LLMs can frustrate stylometric fingerprinting while providing actionable guidance for those who need anonymization most, yet may struggle to access its benefits. At the same time, by bridging theory and practice, this dissertation delivers a practical solution to defend against authorship attribution attacks.
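To make the “write differently” strategy described in the abstract concrete, the sketch below shows one way such a prompt could be issued to a small open-weight instruction-tuned model using the Hugging Face transformers chat pipeline. The model name, prompt wording, and decoding settings are illustrative assumptions, not the dissertation’s exact configuration.

```python
from transformers import pipeline

# Any ~8B-parameter open-weight instruction-tuned model can be substituted here;
# this model name is purely illustrative.
rewriter = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
)

def write_differently(text: str) -> str:
    """Ask the model to rewrite `text` in a different style while preserving its meaning."""
    messages = [
        {
            "role": "user",
            "content": (
                "Rewrite the following passage so that it keeps the same meaning "
                "but is written differently:\n\n" + text
            ),
        }
    ]
    # Sampling keeps successive rewrites varied rather than deterministic.
    output = rewriter(messages, max_new_tokens=512, do_sample=True)
    # Recent versions of the text-generation pipeline return the full chat,
    # with the model's reply as the last message.
    return output[0]["generated_text"][-1]["content"]

if __name__ == "__main__":
    print(write_differently("The quarterly report was prepared under considerable time pressure."))
```

This is only a minimal sketch of the prompting setup; the dissertation’s experiments evaluate such rewrites against attribution models for semantic fidelity, inconspicuousness, and attribution accuracy.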
Description
Thesis (Ph.D.) - Indiana University, Department of Information and Library Science, 2025
Keywords
Authorship Attribution, Stylometry, Style, Privacy, Large Language Model, Natural Language Processing
Type
Doctoral Dissertation