Adaptive gPCA: A method for structured dimensionality reduction

Abstract

When working with large biological data sets, exploratory analysis is an important first step for understanding the latent structure and for generating hypotheses to be tested in subsequent analyses. However, when the number of variables is large compared to the number of samples, standard methods such as principal components analysis give results that are unstable and difficult to interpret. To mitigate these problems, we have developed a method that allows the analyst to incorporate side information about the relationships between the variables in a way that encourages similar variables to have similar loadings on the principal axes. This leads to a low-dimensional representation of the samples that describes the latent structure and whose axes are interpretable in terms of groups of closely related variables. The method is derived by placing a prior on the data that encodes the relationships between the variables and then carrying the analysis through on the posterior distributions of the samples. We show that our method does well at reconstructing true latent structure in simulated data, and we also demonstrate it on a data set investigating the effects of antibiotics on the composition of bacteria in the human gut.
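
The prior-then-posterior idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the Gaussian model, the function name posterior_mean_pca, the variance parameters sigma1_sq and sigma2_sq, and the similarity matrix Q are all assumptions chosen for illustration. Under those assumptions, each sample is shrunk toward the structure encoded in Q, and ordinary PCA is then run on the shrunken (posterior mean) data.

```python
import numpy as np

def posterior_mean_pca(X, Q, sigma1_sq=1.0, sigma2_sq=1.0, k=2):
    """Illustrative sketch only: PCA on posterior means under a Gaussian prior.

    Assumed model (hypothetical, not taken from the record above):
        x_i = mu_i + e_i,  mu_i ~ N(0, sigma1_sq * Q),  e_i ~ N(0, sigma2_sq * I),
    where Q is a p x p positive semi-definite matrix describing how similar
    the variables are to one another.
    """
    X = np.asarray(X, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Xc = X - X.mean(axis=0)          # center each variable
    p = Q.shape[0]

    # Posterior mean of mu_i given x_i: A (A + B)^{-1} x_i,
    # with A = sigma1_sq * Q (prior covariance) and B = sigma2_sq * I (noise).
    prior_cov = sigma1_sq * Q
    shrink = prior_cov @ np.linalg.inv(prior_cov + sigma2_sq * np.eye(p))
    M = Xc @ shrink.T                # rows are posterior means

    # Ordinary PCA of the posterior means via the SVD.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    scores = U[:, :k] * s[:k]        # sample coordinates on the first k axes
    axes = Vt[:k].T                  # variable loadings, one column per axis
    return scores, axes

# Toy usage: four variables forming two groups of closely related pairs.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 4))
Q_demo = np.array([[1.0, 0.9, 0.0, 0.0],
                   [0.9, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.9],
                   [0.0, 0.0, 0.9, 1.0]])
scores_demo, axes_demo = posterior_mean_pca(X_demo, Q_demo, k=2)
```

In this sketch, a larger sigma2_sq relative to sigma1_sq pulls the loadings of related variables closer together, while a very small sigma2_sq approaches ordinary PCA on the centered data.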

Citation

Fukuyama, Julia A. "Adaptive gPCA: A method for structured dimensionality reduction." Annals of Applied Statistics, 2017-02-01.

Journal

Annals of Applied Statistics
