
A Compendium to Ensure Computational Reproducibility in High-Dimensional Classification Tasks

Markus Ruschhaupt 1, Wolfgang Huber 2, Annemarie Poustka 3, and Ulrich Mansmann 4
  • 1 German Cancer Research Center
  • 2 German Cancer Research Center, Heidelberg, Germany
  • 3 German Cancer Research Center
  • 4 University of Heidelberg

We demonstrate the concept and an implementation of a compendium for the classification of high-dimensional data from microarray gene expression profiles. A compendium is an interactive document that bundles primary data, statistical processing methods, figures, and derived data together with the textual documentation and conclusions. Interactivity allows the reader to modify and extend these components. We address the following questions: How much does the discriminatory power of a classifier depend on the choice of the algorithm that was used to identify it? Which alternative classifiers could be used just as well? How robust is the result? The answers to these questions are essential prerequisites for validation and biological interpretation of the classifiers. We show how to use this approach by examining these questions for a specific breast cancer microarray data set that was first studied by Huang et al. (2003).
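The first question in the abstract, how strongly estimated discriminatory power depends on the way a classifier is built, can be illustrated with a small cross-validation sketch. The code below is not taken from the compendium or from the MCRestimate package. It is a minimal illustration under stated assumptions: the data are synthetic, the classifier is a simple nearest-centroid rule, and all function names (`make_data`, `select_genes`, `centroid_classifier`, `cv_error`) are hypothetical. The point it shows is that gene selection is repeated inside every training fold, so the estimated misclassification rate is not biased by test samples having influenced the selection.

```python
import random
import statistics


def make_data(n_samples=40, n_genes=200, n_informative=5, shift=2.0, seed=0):
    """Synthetic expression matrix (assumption): rows are samples, columns are genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # two balanced classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            # Only the first n_informative genes differ between the classes.
            for g in range(n_informative):
                row[g] += shift
        X.append(row)
        y.append(label)
    return X, y


def select_genes(X, y, k):
    """Rank genes by a t-like score and return the indices of the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        score = abs(statistics.fmean(a) - statistics.fmean(b)) / spread
        scores.append((score, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def centroid_classifier(X, y, genes):
    """Fit class centroids on the selected genes and return a predict function."""
    centroids = {}
    for lab in (0, 1):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [statistics.fmean(r[g] for r in rows) for g in genes]

    def predict(row):
        dist = {
            lab: sum((row[g] - c) ** 2 for g, c in zip(genes, cents))
            for lab, cents in centroids.items()
        }
        return min(dist, key=dist.get)

    return predict


def cv_error(X, y, k_genes, n_folds=5, seed=1):
    """Cross-validated misclassification rate with gene selection inside each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        # Selection uses training samples only; test samples stay untouched.
        genes = select_genes(X_train, y_train, k_genes)
        predict = centroid_classifier(X_train, y_train, genes)
        errors += sum(predict(X[i]) != y[i] for i in test)
    return errors / len(X)


if __name__ == "__main__":
    X, y = make_data()
    for k in (5, 20, 100):
        print(f"top {k} genes: estimated error = {cv_error(X, y, k):.3f}")
```

Varying `k_genes`, the selection score, or the classifier in such a loop is one way to probe how much the estimated error depends on the chosen procedure, and repeating it with different fold seeds gives a rough impression of robustness.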

    • compHuang_1.0.4.tar.gz
    • compHuang_1.0.4.zip
    • MCRestimate_1.0.9.tar.gz
    • MCRestimate_1.0.9.zip


SAGMB publishes significant research on the application of statistical ideas to problems arising from computational biology. The range of topics includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies.
