DEFENDING AGAINST AUTHORSHIP ATTRIBUTION ATTACKS WITH LARGE LANGUAGE MODELS

dc.contributor.advisor: Riddell, Allen B.
dc.contributor.author: Wang, Haining
dc.date.accessioned: 2025-06-30T13:16:57Z
dc.date.available: 2025-06-30T13:16:57Z
dc.date.issued: 2025-06
dc.description: Thesis (Ph.D.) - Indiana University, Department of Information and Library Science, 2025
dc.description.abstract: In today’s digital era, individuals leave significant digital footprints through their writing, whether on social media or on their employer’s devices. These digital footprints pose a serious challenge for identity protection: authorship attribution techniques can identify the author of an unsigned document with high accuracy. This threat is especially acute for those who must speak publicly while safeguarding their anonymity, including whistleblowers, journalists, activists, and individuals living under oppressive regimes. Defenses against authorship attribution attacks rely on altering an individual’s writing style, making it unlinkable to their prior work while maintaining meaning and fluency. Despite extensive efforts at automation, existing techniques rarely match the effectiveness of manual interventions and make significant technical demands of individuals seeking to obfuscate their writing style. This dissertation investigates the use of large language models (LLMs) as an effective defense against authorship attribution attacks. These models are user-friendly and respond directly to natural language prompts, making them particularly accessible for privacy-conscious individuals. Through extensive experiments, this dissertation reproduces both established automated and manual circumvention strategies with LLMs. The results confirm that, with the right prompts, LLMs can offer significant protection from authorship attribution attacks. A simple “write differently” prompt on lightweight LLMs produces semantically faithful, inconspicuous text while driving attribution models’ performance down to near-chance levels. Surprisingly, open-weight models with just 8–9 billion parameters consistently outperform far larger closed-source models. Furthermore, this research overturns assumptions about in-context learning, showing that adding context, such as personas, exemplars, or extended demonstrations, often harms rather than helps defensive performance. These findings advance our understanding of how LLMs can frustrate stylometric fingerprinting while providing actionable guidance for those who need anonymization most, yet may struggle to access its benefits. At the same time, by bridging theory and practice, this dissertation delivers a practical solution to defend against authorship attribution attacks.
dc.identifier.uri: https://hdl.handle.net/2022/33626
dc.language.iso: en_US
dc.publisher: [Bloomington, Ind.] : Indiana University
dc.subject: Authorship Attribution
dc.subject: Stylometry
dc.subject: Style
dc.subject: Privacy
dc.subject: Large Language Model
dc.subject: Natural Language Processing
dc.title: DEFENDING AGAINST AUTHORSHIP ATTRIBUTION ATTACKS WITH LARGE LANGUAGE MODELS
dc.type: Doctoral Dissertation
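
As an illustration of the “write differently” prompting strategy described in the abstract, the sketch below sends a single rewrite request to a locally hosted open-weight model through an OpenAI-compatible chat endpoint. The model name, endpoint URL, and exact prompt wording are illustrative assumptions, not the dissertation’s reported configuration.

# Minimal sketch (not the dissertation's exact setup): a "write differently"
# rewrite request against a locally hosted open-weight model served behind an
# OpenAI-compatible endpoint (e.g., vLLM or Ollama). Model name, endpoint, and
# prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local inference server
    api_key="local",                      # placeholder; local servers often ignore it
)

def obfuscate(text: str, model: str = "llama-3.1-8b-instruct") -> str:
    """Ask the model to rewrite `text` in a different style, preserving meaning."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Rewrite the following passage so it reads as if written by a "
                "different author. Write differently, but keep the meaning and "
                "keep the text fluent:\n\n" + text
            ),
        }],
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(obfuscate("The quarterly report was delayed because the data pipeline failed twice."))

In the dissertation’s framing, the rewritten text would then be scored against an authorship attribution model: protection is judged by how far attribution performance drops toward chance while the output remains semantically faithful and fluent.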

Files

Original bundle

Name: thesis_final.pdf
Size: 211.25 MB
Format: Adobe Portable Document Format