Data-Driven and Peak-Based Feature Selection In Serum Protein Mass Spectrometry
Walter S. Liggett Jr., Peter E. Barker, O. J. Semmes, L. H. Cazares
Consider functional canonical correlation analysis (CCA) applied to disjoint sections of lengthy protein mass spectra for the purpose of finding long-distance correlation structure. The relations between the CCA weight functions, which are derived from the data, and spectral peaks, which can be traced to individual proteins, provide a basis for interpreting the structure. The data analyzed consist of repeated measurements of a human serum standard by surface-enhanced laser desorption/ionization (SELDI) time-of-flight (TOF) mass spectrometry. There are 88 spectra obtained from 11 protein chips each with 8 spots. The data-analysis goal is insight into the sample preparation step in such spectrometry, a step that involves the protein chip. We see that variation in this step has an outsized effect on a few proteins. We obtain this insight through interpretation of the long-distance correlation structure and through comparison of spectral variation from chip to chip with variation from spot to spot on single chips.
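To make the idea of correlating disjoint spectral sections concrete, here is a minimal sketch using ordinary ridge-regularized multivariate CCA in NumPy. It is a simplified stand-in, not the functional CCA described above. The function name `first_canonical_pair`, the `ridge` parameter, and the synthetic data are all assumptions for illustration; only the 11-chip by 8-spot layout (88 spectra) comes from the abstract.

```python
import numpy as np


def first_canonical_pair(X, Y, ridge=1e-3):
    """Leading canonical correlation and weight vectors for two data blocks.

    X: (n, p) intensities from one spectral section.
    Y: (n, q) intensities from a disjoint section of the same n spectra.
    ridge: hypothetical regularization added to each covariance diagonal.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + ridge * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + ridge * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten each block with a Cholesky factor: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the leading singular vectors back to weights on the original variables.
    a = np.linalg.solve(Lx.T, U[:, 0])
    b = np.linalg.solve(Ly.T, Vt[0])
    return s[0], a, b


# Synthetic illustration only: 11 chips x 8 spots = 88 spectra, with a shared
# chip-level effect on one variable in each section (an assumed scenario).
rng = np.random.default_rng(0)
chip_effect = np.repeat(rng.normal(size=11), 8)
X = rng.normal(scale=0.3, size=(88, 5))
Y = rng.normal(scale=0.3, size=(88, 4))
X[:, 2] += chip_effect
Y[:, 1] += chip_effect

rho, a, b = first_canonical_pair(X, Y)
print("leading canonical correlation:", rho)
print("weights, section 1:", a)
print("weights, section 2:", b)
```

In this toy setting, large weights mark the variables that carry the shared chip-level variation, which mirrors how the weight functions are related to spectral peaks in the analysis described above.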
biomarker validation, SELDI-TOF, serum proteomics
Liggett Jr, W., Barker, P., Semmes, O. and Cazares, L. Data-Driven and Peak-Based Feature Selection In Serum Protein Mass Spectrometry, Clinical Chemistry (Accessed June 1, 2023)