Learning Your Identity and Disease from Research Papers: Information Leaks in Genome Wide Association Study

dc.contributor.author: Wang, Rui; Li, Yong; Wang, Xiaofeng; Tang, Haixu; Zhou, Xiaoyong
dc.date.accessioned: 2025-11-13T20:31:47Z
dc.date.available: 2025-11-13T20:31:47Z
dc.date.issued: 2009-08
dc.description.abstract: Genome-wide association studies (GWAS) aim to discover associations between genetic variations, particularly single-nucleotide polymorphisms (SNPs), and common diseases, and are widely recognized as one of the most important and active areas of biomedical research. Equally well known are the privacy implications of such studies, which were brought into the limelight by the recent attack proposed by Homer et al. Homer's attack demonstrates that it is possible to identify a participant in a GWAS by analyzing the allele frequencies of a large number of SNPs. Our research found, unfortunately, that this threat has been significantly understated. In this paper, we demonstrate that individuals can actually be identified from even a relatively small set of statistics, such as those routinely published in GWAS papers. We present two attacks. The first extends Homer's attack with a much more powerful test statistic, based on the correlations among different SNPs as described by the coefficient of determination ($r^2$). This attack can determine the presence of an individual in a GWAS from the statistics related to a couple of hundred SNPs. The second attack can lead to complete disclosure of hundreds of the participants' SNPs by analyzing the information derived from the published statistics. We also found that these attacks can succeed even when the precision of the statistics is low and part of the data is missing, which limits the effectiveness of such simple defenses. We evaluated our attacks on real human genomes from the International HapMap Project and concluded that such threats are completely realistic.
dc.identifier.uri: https://hdl.handle.net/2022/34523
dc.relation.ispartofseries: Indiana University Computer Science Technical Reports; TR680
dc.rights: This work is protected by copyright unless stated otherwise.
dc.rights.uri:
dc.title: Learning Your Identity and Disease from Research Papers: Information Leaks in Genome Wide Association Study

Files

Original bundle

Name: TR680.pdf
Size: 692.81 KB
Format: Adobe Portable Document Format