Accurate Estimation of False Discovery Rates for Protein and Proteoform Identification in Top Down Proteomics


Date

Journal Title

Journal ISSN

Volume Title

Publisher

Forthcoming

Abstract

Within the last five years, top down proteomics (TDP) has emerged as a high-throughput technique for protein identification as well as for the characterization and quantitation of thousands of modified proteoforms. Here, a framework for calculating an accurate false discovery rate (FDR) at both the protein and proteoform levels was used to evaluate local dependencies when aggregating results from replicate LC-MS/MS runs searched with different search modes and parameters. We find that proteoform identifications are not statistically independent of one another, and that correcting the FDR locally within a given LC-MS/MS run is not sufficient to control the FDR globally across a large experiment. A series of corrections previously used in genomics was implemented to address these issues, producing a global FDR calculation that scales well. The validity of the system is assessed by analyzing two previously published experimental datasets. A new web-based TDPORTAL provides access to high-performance computation and supports all steps necessary to create a set of results using the accurate and scalable FDR estimation described here. In addition, a new application called TOP DOWN VIEWER enables viewing, analyzing, and sharing result sets via .tdReport files and is available at http://topdownviewer.northwestern.edu.
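The claim that per-run FDR correction does not control FDR across a large experiment can be illustrated with a toy target-decoy simulation. This is a minimal sketch under stated assumptions, not the authors' method: the `Hit` record, the helpers `filter_at_fdr`, `fdr_estimate`, and `make_run`, and all scores are hypothetical; FDR is estimated with the simple decoy/target ratio; and the code does not reproduce the genomics-derived corrections or the TDPORTAL implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    run: str          # LC-MS/MS run the match came from
    proteoform: str   # identifier of the matched proteoform (or decoy entry)
    score: float      # search-engine score; higher means a better match
    is_decoy: bool    # True if the match is to a decoy sequence


def filter_at_fdr(hits, alpha):
    """Accept the largest top-scoring prefix of hits whose decoy/target
    ratio is <= alpha (simple target-decoy estimate; ties not handled
    specially). Returns the accepted hits, targets and decoys alike."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    targets = decoys = accepted = 0
    for i, hit in enumerate(ranked, start=1):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= alpha:
            accepted = i
    return ranked[:accepted]


def fdr_estimate(hits):
    """Estimated FDR over *unique* identifications: #decoys / #targets."""
    targets = {h.proteoform for h in hits if not h.is_decoy}
    decoys = {h.proteoform for h in hits if h.is_decoy}
    return len(decoys) / len(targets) if targets else 0.0


def make_run(name, n_true=10):
    """Hypothetical run: the same n_true real proteoforms recur in every
    run, while the false target and the decoys differ from run to run."""
    hits = [Hit(name, f"PF{i}", 100.0 - i, False) for i in range(n_true)]
    hits.append(Hit(name, f"{name}_false", 85.0, False))
    hits.append(Hit(name, f"{name}_decoy_a", 84.0, True))
    hits.append(Hit(name, f"{name}_decoy_b", 10.0, True))
    hits.append(Hit(name, f"{name}_decoy_c", 9.0, True))
    return hits


if __name__ == "__main__":
    runs = [make_run(f"run{k}") for k in range(1, 4)]
    accepted = [filter_at_fdr(r, alpha=0.10) for r in runs]
    for k, acc in enumerate(accepted, start=1):
        print(f"run{k}: local FDR estimate = {fdr_estimate(acc):.3f}")
    pooled = [h for acc in accepted for h in acc]
    print(f"pooled: global FDR estimate = {fdr_estimate(pooled):.3f}")
```

Each simulated run passes its local 10% threshold with an estimate of 1/11 (about 0.091), yet the pooled set of unique identifications has an estimate of 3/13 (about 0.231). The ten genuine proteoforms collapse to ten unique entries when runs are combined, while every run contributes its own false target and decoy, so errors accumulate faster than true identifications.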

Description

Click on the PURL link below in the "External Files" section to download the dataset.

Keywords

False Discovery Rate, Top Down Proteomics, Statistical Dependency, Molecular Levels, Protein Identification, Proteoform, Search Engine, Multiple Hypothesis Testing

Citation

Forthcoming

Journal

DOI

Link(s) to data and video for this item

Rights

Creative Commons Attribution 4.0 license

Type

Dataset

Collections