A Description of the Clinical Proteomics Tumor Analysis Consortium (CPTAC) Common Data Analysis Pipeline

Published

Author(s)

Jeri S. Roth, Paul A. Rudnick, Sanford Markey, Yuri Mirokhin, Xinjian Yan, Dmitrii Tchekhovskoi, Stephen Stein, Nathan J. Edwards, Ratna R. Thangudu, Karen A. Ketchum, Christopher R. Kinsinger, Mehdi Mesri, Henry Rodriguez

Abstract

The Clinical Proteomic Tumor Analysis Consortium (CPTAC) has produced large proteomics datasets from the mass spectrometric interrogation of tumor samples previously studied by The Cancer Genome Atlas (TCGA) program. The availability of the genomic and proteomic data is enabling the proteogenomic study of these samples for both reference (i.e., those for which perfect sequence matches are found in canonical sequence databases) and non-reference protein markers of cancer. The CPTAC labs have focused on colon, breast, and ovarian tissues in the first round of analyses. Spectra from these datasets are produced from 2D LC-MS/MS analyses and represent deep coverage of these samples. To reduce the variability introduced by disparate data analysis platforms, the CPTAC Common Data Analysis Platform (CDAP) was created. The CDAP produces both peptide-spectrum-match (PSM)-level reports and protein reports from a set of tools that perform the following analysis steps: (1) peak-picking and quantitative data extraction, (2) database searching, (3) gene-based protein parsimony, and (4) FDR-based filtering. The pipeline also produces localization scores for the phospho-peptide enrichment studies using PhosphoRS. Quantitative information for each of the datasets is specific to the sample processing (i.e., label-free or 4-plex iTRAQ), and PSM and protein reports contain the spectrum-level or gene-level ("rolled-up") precursor peak areas or iTRAQ™ reporter ion log-ratios. The reports are available in simple tab-delimited formats and, for the PSM reports, in mzIdentML. The goal of the CDAP is to provide standard, uniform reports for all of the CPTAC data, enabling comparisons between different samples and cancer types as well as across the major ‘omics fields.
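As an illustration of the gene-level "roll-up" mentioned in the abstract, the following minimal sketch sums spectrum-level precursor peak areas per gene from a tab-delimited PSM report. It is not the CDAP implementation: the column names (`Gene`, `Peptide`, `PrecursorArea`), the example rows, and the choice of summation as the aggregation rule are all assumptions made here for illustration, since the abstract does not specify the report schema or how areas are combined.

```python
import csv
import io
from collections import defaultdict


def roll_up_peak_areas(tsv_text, gene_col="Gene", area_col="PrecursorArea"):
    """Sum spectrum-level precursor peak areas per gene.

    Column names are hypothetical; summation is an assumed roll-up rule.
    """
    totals = defaultdict(float)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        area = row[area_col].strip()
        if not area:
            # Skip PSMs with no quantitative value recorded.
            continue
        totals[row[gene_col]] += float(area)
    return dict(totals)


# Hypothetical tab-delimited PSM report (made-up values, not CPTAC data).
EXAMPLE_REPORT = (
    "Gene\tPeptide\tPrecursorArea\n"
    "TP53\tPEPTIDEA\t1000.0\n"
    "TP53\tPEPTIDEB\t500.0\n"
    "KRAS\tPEPTIDEC\t250.0\n"
)

gene_areas = roll_up_peak_areas(EXAMPLE_REPORT)
print(gene_areas)
```

A real report would be read from a file rather than an inline string, and a different aggregation (for example a median of log-ratios for iTRAQ data) might be more appropriate depending on the quantitation type.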
Citation
ACS Journal of Proteome Research
Volume
15
Issue
3

Keywords

proteomics data resource, bioinformatics, cancer, CPTAC, data analysis pipeline

Citation

Roth, J., Rudnick, P., Markey, S., Mirokhin, Y., Yan, X., Tchekhovskoi, D., Stein, S., Edwards, N., Thangudu, R., Ketchum, K., Kinsinger, C., Mesri, M. and Rodriguez, H. (2016), A Description of the Clinical Proteomics Tumor Analysis Consortium (CPTAC) Common Data Analysis Pipeline, ACS Journal of Proteome Research, [online], https://doi.org/10.1021/acs.jproteome.5b01091 (Accessed April 16, 2024)
Created February 9, 2016, Updated October 12, 2021