msCRUSH: Fast Tandem Mass Spectral Clustering Using Locality Sensitive Hashing

Date

2018-12-04

Abstract

Large-scale proteomics projects often generate massive and highly redundant tandem mass spectra (MS/MS). Spectral clustering algorithms can reduce the redundancy in these datasets and thus speed up database searching for peptide identification, a major bottleneck in proteomic data analysis. The key challenge of spectral clustering is to reduce the redundancy in MS/MS spectral data while retaining sufficient sensitivity to identify peptides from the clustered spectra. In this paper, we present the software msCRUSH, which implements a novel spectral clustering algorithm based on the locality sensitive hashing (LSH) technique. When tested on a large-scale proteomic dataset consisting of 23.6 million spectra (including 14.4 million spectra of charge 2+), msCRUSH runs 6.9 to 11.3 times faster than the state-of-the-art spectral clustering software, PRIDE Cluster, while achieving higher clustering sensitivity and comparable accuracy. Using the consensus spectra reported by msCRUSH, the commonly used database search engines MSGF+ and Mascot identify 3% and 1% more unique peptides, respectively, than they do from the raw MS/MS spectra at the same peptide-level false discovery rate (1% FDR). msCRUSH is implemented in C++ and is released as open-source software.
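The abstract names locality sensitive hashing as the core of the clustering step but does not describe the hashing scheme itself. The C++ sketch below illustrates the general idea under loudly stated assumptions; it is not msCRUSH's actual implementation. It assumes peaks are binned into 1.0005 Da bins up to 2000 m/z, uses random-hyperplane (signed random projection) LSH, a standard hash family for cosine similarity, buckets spectra by precursor charge plus hash signature, and greedily clusters within each bucket against a cosine threshold. The names `Spectrum`, `binSpectrum`, `cosine`, `HyperplaneLSH`, and `clusterSpectra` are hypothetical, and consensus-spectrum generation is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <map>
#include <random>
#include <utility>
#include <vector>

// Hypothetical minimal spectrum: precursor charge plus (m/z, intensity) peaks.
struct Spectrum {
  int charge;
  std::vector<std::pair<double, double>> peaks;
};

// Bin peaks into a dense vector (assumed 1.0005 Da bins up to maxMz), then
// L2-normalize so that a plain dot product equals cosine similarity.
std::vector<double> binSpectrum(const Spectrum& s, double binWidth = 1.0005,
                                double maxMz = 2000.0) {
  std::vector<double> v(static_cast<std::size_t>(maxMz / binWidth) + 1, 0.0);
  for (const auto& [mz, intensity] : s.peaks) {
    if (mz < 0.0 || mz > maxMz) continue;  // drop peaks outside the binned range
    v[static_cast<std::size_t>(mz / binWidth)] += intensity;
  }
  double norm = 0.0;
  for (double x : v) norm += x * x;
  norm = std::sqrt(norm);
  if (norm > 0.0)
    for (double& x : v) x /= norm;
  return v;
}

// Cosine similarity of two already-normalized vectors of equal length.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
  double dot = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) dot += a[i] * b[i];
  return dot;
}

// Random-hyperplane LSH: bit i of the signature records which side of
// Gaussian hyperplane i the vector falls on. Similar vectors tend to agree.
class HyperplaneLSH {
 public:
  HyperplaneLSH(std::size_t dim, int nBits, unsigned seed)
      : planes_(static_cast<std::size_t>(nBits), std::vector<double>(dim)) {
    assert(nBits > 0 && nBits <= 64);  // signature must fit in 64 bits
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    for (auto& p : planes_)
      for (double& x : p) x = gauss(rng);
  }
  std::uint64_t signature(const std::vector<double>& v) const {
    std::uint64_t sig = 0;
    for (std::size_t i = 0; i < planes_.size(); ++i) {
      double d = 0.0;
      for (std::size_t j = 0; j < v.size(); ++j) d += planes_[i][j] * v[j];
      if (d >= 0.0) sig |= (std::uint64_t{1} << i);
    }
    return sig;
  }

 private:
  std::vector<std::vector<double>> planes_;
};

// Bucket spectra by (charge, signature); inside each bucket, assign each
// spectrum to the first cluster whose representative (first member) has
// cosine >= threshold, else open a new cluster. Returns a cluster id per input.
std::vector<int> clusterSpectra(const std::vector<Spectrum>& spectra, int nBits,
                                double threshold, unsigned seed = 42) {
  std::vector<int> label(spectra.size(), -1);
  if (spectra.empty()) return label;
  std::vector<std::vector<double>> vecs;
  for (const auto& s : spectra) vecs.push_back(binSpectrum(s));
  HyperplaneLSH lsh(vecs[0].size(), nBits, seed);
  std::map<std::pair<int, std::uint64_t>, std::vector<std::size_t>> buckets;
  for (std::size_t i = 0; i < spectra.size(); ++i)
    buckets[{spectra[i].charge, lsh.signature(vecs[i])}].push_back(i);
  int nextId = 0;
  for (const auto& entry : buckets) {
    std::vector<std::size_t> reps;  // representative spectrum of each cluster
    std::vector<int> repIds;        // cluster id for each representative
    for (std::size_t i : entry.second) {
      for (std::size_t r = 0; r < reps.size() && label[i] < 0; ++r)
        if (cosine(vecs[i], vecs[reps[r]]) >= threshold) label[i] = repIds[r];
      if (label[i] < 0) {
        reps.push_back(i);
        repIds.push_back(nextId++);
        label[i] = repIds.back();
      }
    }
  }
  return label;
}
```

For example, two identical charge-2+ spectra always receive the same signature and fall into one cluster, while the same peaks recorded at charge 3+ land in a separate bucket. Only spectra that share a bucket are compared, which is where the speedup over all-pairs comparison comes from. LSH schemes commonly use several independent hash tables so that truly similar spectra split by one table can still meet in another; this sketch uses a single table for brevity.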

Description

This record is for a postprint of an article published by ACS in Journal of Proteome Research on 2018-12-04; the version of record is available at https://doi.org/10.1021/acs.jproteome.8b00448.

Citation

Wang, Lei, et al. "msCRUSH: Fast Tandem Mass Spectral Clustering Using Locality Sensitive Hashing." Journal of Proteome Research, vol. 18, no. 1, 2018-12-04, https://doi.org/10.1021/acs.jproteome.8b00448.

Journal

Journal of Proteome Research
