Published by De Gruyter May 11, 2019

Reproducibility of biomarker identifications from mass spectrometry proteomic data in cancer studies

  • Yulan Liang, Adam Kelemen and Arpad Kelemen

Abstract

Reproducibility of disease signatures and clinical biomarkers in multi-omics disease analysis remains a key challenge because of a multitude of factors. Heterogeneity within a limited sample, biological factors such as environmental confounders, and inherent experimental and technical noise, compounded by inadequate statistical tools, can lead to misinterpretation of results and, subsequently, to very different biological conclusions. In this paper, we investigate biomarker reproducibility issues potentially caused by differences among statistical methods with varied distribution assumptions or marker selection criteria, using mass spectrometry proteomic ovarian tumor data. We examine the relationship between effect sizes, p values, Cauchy p values, False Discovery Rate p values, and the rank fractions of identified proteins out of thousands in the limited heterogeneous sample. We compare the markers identified by statistical single-feature selection approaches with those identified by machine learning wrapper methods. The results reveal marked differences in the protein markers selected by the different methods, with potential selection biases and false discoveries, which may be due to small effects, different distribution assumptions, and p value-type criteria versus prediction accuracies. Alternative solutions and related issues are discussed in support of reproducible findings for clinically actionable outcomes.
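To make the contrast between raw p value criteria and False Discovery Rate criteria concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the Benjamini-Hochberg step-up adjustment as the FDR method and interprets "rank fraction" as a feature's rank by raw p value divided by the number of features; both choices, the function names, and the five toy p values are illustrative assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the features sorted by ascending raw p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_fraction(p_values):
    """Rank by raw p value divided by the number of features (assumed definition)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    fractions = [0.0] * m
    for rank, i in enumerate(order, start=1):
        fractions[i] = rank / m
    return fractions


# Toy p values for five hypothetical proteins (not data from the study).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)

selected_raw = [i for i, p in enumerate(raw) if p < 0.05]
selected_fdr = [i for i, q in enumerate(adj) if q < 0.05]

print("selected by raw p < 0.05:", selected_raw)
print("selected by BH-adjusted p < 0.05:", selected_fdr)
print("rank fractions:", rank_fraction(raw))
```

In this toy case the raw threshold keeps four features while the adjusted threshold keeps two, which illustrates how the choice of selection criterion alone can change the reported marker set.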

Conflict of interest statement: The authors declare no conflicts of interest regarding this work.


Supplementary Material

The online version of this article offers supplementary material (DOI: https://doi.org/10.1515/sagmb-2018-0039).


Published Online: 2019-05-11

©2019 Walter de Gruyter GmbH, Berlin/Boston
