We present an approach to construct a classification rule based on the mass spectrometry data provided by the organizers of the "Classification Competition on Clinical Mass Spectrometry Proteomic Diagnosis Data." Before constructing a classification rule, we attempted to pre-process the data and to select features of the spectra that were likely due to true biological signals (i.e., peptides/proteins). As a result, we selected a set of 92 features. To construct the classification rule, we considered eight methods for selecting a subset of the features, combined with seven classification methods. The performance of the resulting 56 combinations was evaluated by using a cross-validation procedure with 1000 re-sampled data sets. The best result, as indicated by the lowest overall misclassification rate, was obtained by using the whole set of 92 features as the input for a support-vector machine (SVM) with a linear kernel. This method was therefore used to construct the classification rule. For the training data set, the total error rate for the classification rule, as estimated by using leave-one-out cross-validation, was equal to 0.16, with the sensitivity and specificity equal to 0.87 and 0.82, respectively.
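
The leave-one-out estimate of the total error rate, sensitivity, and specificity described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the 5-feature toy set stands in for the 92 selected features, and a simple nearest-centroid rule is swapped in for the linear-kernel SVM so that the example needs only the Python standard library. The function names (`nearest_centroid_predict`, `leave_one_out`) and the coding of cases as 1 and controls as 0 are assumptions made for illustration.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean feature vector is closest.

    Stand-in classifier (assumption); the paper uses a linear-kernel SVM.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(X, y, predict):
    """Return (error_rate, sensitivity, specificity).

    Each sample is held out once, the rule is rebuilt on the remaining
    samples, and the held-out sample is classified. Label 1 = case,
    label 0 = control (assumed coding).
    """
    tp = tn = fp = fn = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        pred = predict(train_X, train_y, X[i])
        if y[i] == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    error_rate = (fp + fn) / len(X)
    sensitivity = tp / (tp + fn)  # fraction of cases correctly classified
    specificity = tn / (tn + fp)  # fraction of controls correctly classified
    return error_rate, sensitivity, specificity


# Synthetic toy data (assumption): 20 controls and 20 cases, 5 features each.
rng = random.Random(0)
n_features = 5
controls = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(20)]
cases = [[rng.gauss(1.5, 1.0) for _ in range(n_features)] for _ in range(20)]
X = controls + cases
y = [0] * 20 + [1] * 20

error_rate, sensitivity, specificity = leave_one_out(X, y, nearest_centroid_predict)
print(f"error={error_rate:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Because `leave_one_out` takes the prediction rule as an argument, any other classifier with the same call signature, for example a wrapper around an SVM library, could be passed in without changing the evaluation loop.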

A Cross-Validation Study to Select a Classification Procedure for Clinical Diagnosis Based on Proteomic Mass Spectrometry
Hasselt University, Center for Statistics
Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 7, Issue 2, Pages –, ISSN (Online) 1544-6115, DOI: 10.2202/1544-6115.1363, March 2008
Published Online: 2008-03-24
Keywords: proteomic MALDI-TOFMS preprocessing; feature selection; two-stage cross-validation; classification for clinical diagnosis