DEFENDING AGAINST AUTHORSHIP ATTRIBUTION ATTACKS WITH LARGE LANGUAGE MODELS
dc.contributor.advisor | Riddell, Allen B.
dc.contributor.author | Wang, Haining
dc.date.accessioned | 2025-06-30T13:16:57Z
dc.date.available | 2025-06-30T13:16:57Z
dc.date.issued | 2025-06
dc.description | Thesis (Ph.D.) - Indiana University, Department of Information and Library Science, 2025
dc.description.abstract | In today’s digital era, individuals leave significant digital footprints through their writing, whether on social media or on their employer’s devices. These digital footprints pose a serious challenge for identity protection: authorship attribution techniques can identify the author of an unsigned document with high accuracy. This threat is especially acute for those who must speak publicly while safeguarding their anonymity, including whistleblowers, journalists, activists, and individuals living under oppressive regimes. Defenses against authorship attribution attacks rely on altering an individual’s writing style, making it unlinkable to their prior work while maintaining meaning and fluency. Despite extensive efforts at automation, existing techniques rarely match the effectiveness of manual interventions and make significant technical demands of individuals seeking to obfuscate their writing style. This dissertation investigates the use of large language models (LLMs) as an effective defense against authorship attribution attacks. These models are user-friendly and respond directly to natural language prompts, making them particularly accessible for privacy-conscious individuals. Through extensive experiments, this dissertation reproduces both established automated and manual circumvention strategies with LLMs. The results confirm that, with the right prompts, LLMs can offer significant protection from authorship attribution attacks. A simple “write differently” prompt on lightweight LLMs produces semantically faithful, inconspicuous text while driving attribution models’ performance down to near-chance levels. Surprisingly, open-weights models with just 8–9 billion parameters consistently outperform far larger closed-source models. Furthermore, this research overturns assumptions about in-context learning, showing that adding context, such as personas, exemplars, or extended demonstrations, often harms rather than helps defensive performance. These findings advance our understanding of how LLMs can frustrate stylometric fingerprinting while providing actionable guidance for those who need anonymization most, yet may struggle to access its benefits. At the same time, by bridging theory and practice, this dissertation delivers a practical solution to defend against authorship attribution attacks.
dc.identifier.uri | https://hdl.handle.net/2022/33626
dc.language.iso | en_US
dc.publisher | [Bloomington, Ind.] : Indiana University
dc.subject | Authorship Attribution
dc.subject | Stylometry
dc.subject | Style
dc.subject | Privacy
dc.subject | Large Language Model
dc.subject | Natural Language Processing
dc.title | DEFENDING AGAINST AUTHORSHIP ATTRIBUTION ATTACKS WITH LARGE LANGUAGE MODELS
dc.type | Doctoral Dissertation |
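
The abstract’s headline finding is that a short natural-language instruction to “write differently,” given to a lightweight open-weight model, preserves meaning while pushing attribution models toward chance-level accuracy. As a rough illustration only, the sketch below shows how such a prompt might be issued through the Hugging Face transformers library; the model checkpoint, prompt wording, and decoding settings are illustrative assumptions and should not be read as the dissertation’s exact experimental setup.

```python
# Minimal sketch of the "write differently" prompting defense described in the
# abstract. The checkpoint, prompt wording, and decoding settings below are
# illustrative assumptions, not the dissertation's reported configuration.
from transformers import pipeline

# Any open-weight ~8B instruction-tuned chat model; this checkpoint is one example.
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
)

original = "Paste the passage whose authorship should be obscured here."

messages = [{
    "role": "user",
    "content": (
        "Rewrite the following text in a different writing style. "
        "Keep the meaning intact and the prose fluent.\n\n" + original
    ),
}]

# Recent transformers versions apply the model's chat template automatically
# when a list of role/content messages is passed to a text-generation pipeline.
outputs = generator(messages, max_new_tokens=512, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"][-1]["content"])
```

In practice, a rewrite like this would be judged along the two axes the abstract names: whether the output remains semantically faithful and inconspicuous, and whether attribution models’ accuracy on it actually falls toward chance.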