Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

Volume 11, Issue 3 (Feb 2012)

Sample Size Calculations for Designing Clinical Proteomic Profiling Studies Using Mass Spectrometry

Stephen O. Nyangoma
  • Centre for Molecular Medicine, Ninewells Hospital, and University of Dundee
Stuart I. Collins
  • Cancer Research UK and University of Birmingham
Douglas G. Altman
  • University of Oxford
Philip Johnson
  • Cancer Research UK and University of Birmingham
Lucinda J. Billingham
  • Cancer Research UK and University of Birmingham
Published Online: 2012-02-10 | DOI: https://doi.org/10.1515/1544-6115.1686

In cancer clinical proteomics, MALDI and SELDI profiling are used to search for biomarkers of potentially curable early-stage disease. A sufficient number of samples must be analysed in order to detect clinically relevant differences between cancers and controls with adequate statistical power. Clinical proteomic profiling studies typically provide expression data for each peak (protein or peptide) from two or more clinically defined groups of subjects. Exposure and confounder information on each subject is usually also available, the samples are usually not from randomized subjects, and the data are usually available in replicate. At the design stage, however, covariates are not typically available and are often ignored in sample size calculations. This leads to the use of insufficient numbers of samples and reduced power when there are imbalances in the numbers of subjects between different phenotypic groups. A method is proposed for accommodating information on covariates, data imbalances and design characteristics, such as technical replication and the observational nature of these studies, in sample size calculations. It assumes knowledge of a joint distribution for the protein expression values and the covariates. When discretized covariates are considered, the effect of the covariates enters the calculations as a function of the proportions of subjects with specific attributes. This makes it relatively straightforward (even when pilot data on subject covariates are unavailable) to specify and to adjust for the effect of the expected heterogeneities. The new method suggests certain experimental designs that require fewer samples when planning a study. Analysis of data from the proteomic profiling of colorectal cancer reveals that fewer samples are needed when a study is balanced than when it is unbalanced, and when the IMAC30 chip-type is used.
The method is implemented in the R package clippda, which is available from Bioconductor at: http://www.bioconductor.org/help/bioc-views/release/bioc/html/clippda.html.
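
To give a rough sense of why imbalance and technical replication matter for sample size, the following is a minimal textbook sketch, not the method of the article and not the clippda API. It uses the standard normal-approximation formula for comparing two group means. The function name `total_sample_size` and all of its parameters are hypothetical: `delta` is the difference to detect, `sigma` the total standard deviation of a single measurement, `rho` the assumed correlation between technical replicates from the same subject, `m` the number of replicates per subject, and `p_cases` the proportion of subjects who are cases. Covariate adjustment, which is central to the article, is not modelled here.

```python
import math
from statistics import NormalDist


def total_sample_size(delta, sigma, rho, m, p_cases, alpha=0.05, power=0.8):
    """Total number of subjects for a two-sided, two-group comparison of means.

    Illustrative assumptions (not taken from the article): normal
    approximation, equal variance in both groups, and each subject
    summarised by the mean of m equally correlated technical replicates.
    """
    if not 0.0 < p_cases < 1.0:
        raise ValueError("p_cases must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf
    z_sum = z(1.0 - alpha / 2.0) + z(power)
    # Variance of a subject's replicate mean: sigma^2 * (1 + (m - 1) * rho) / m
    var_subject_mean = sigma ** 2 * (1.0 + (m - 1) * rho) / m
    # Var(difference in group means) = var_subject_mean / (N * p * (1 - p)),
    # which is smallest when p = 0.5 (a balanced design).
    n_total = z_sum ** 2 * var_subject_mean / (delta ** 2 * p_cases * (1.0 - p_cases))
    return math.ceil(n_total)


balanced = total_sample_size(delta=1.0, sigma=1.0, rho=0.5, m=2, p_cases=0.5)
unbalanced = total_sample_size(delta=1.0, sigma=1.0, rho=0.5, m=2, p_cases=0.2)
print(balanced, unbalanced)
```

Under these assumptions the required total grows as the case proportion moves away from one half, which is consistent in direction with the article's finding that balanced studies need fewer samples. Adding replicates helps only to the extent that the replicates are not perfectly correlated.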

Keywords: sample size calculations; data imbalance; heterogeneity; covariates; technical replicates; observational study; expected Fisher information; cancer; clinical proteomics; SELDI; designing a proteomic profiling experiment

About the article

Citation Information: Statistical Applications in Genetics and Molecular Biology, ISSN (Online) 1544-6115, DOI: https://doi.org/10.1515/1544-6115.1686.
