
Statistical Applications in Genetics and Molecular Biology

Editor-in-Chief: Stumpf, Michael P.H.

6 Issues per year



Sample Size Calculations for Designing Clinical Proteomic Profiling Studies Using Mass Spectrometry

Stephen O. Nyangoma¹ / Stuart I. Collins² / Douglas G. Altman³ / Philip Johnson⁴ / Lucinda J. Billingham⁵

¹Centre for Molecular Medicine, Ninewells Hospital, and University of Dundee

²Cancer Research UK and University of Birmingham

³University of Oxford

⁴Cancer Research UK and University of Birmingham

⁵Cancer Research UK and University of Birmingham

Citation Information: Statistical Applications in Genetics and Molecular Biology. Volume 11, Issue 3, ISSN (Online) 1544-6115, DOI: 10.1515/1544-6115.1686, February 2012

Publication History

Published Online: 2012-02-10

In cancer clinical proteomics, MALDI and SELDI profiling are used to search for biomarkers of potentially curable early-stage disease. A sufficient number of samples must be analysed to detect clinically relevant differences between cancers and controls with adequate statistical power. Clinical proteomic profiling studies typically yield expression data for each peak (protein or peptide) from two or more clinically defined groups of subjects. Exposure and confounder information is usually also available for each subject, and the samples usually do not come from randomized subjects. Moreover, the data are usually available in replicate. At the design stage, however, covariates are typically not available and are often ignored in sample size calculations. This leads to too few samples being used, and to reduced power, when the numbers of subjects in the different phenotypic groups are imbalanced. A method is proposed for accommodating covariates, data imbalances and design characteristics, such as technical replication and the observational nature of these studies, in sample size calculations. It assumes knowledge of the joint distribution of the protein expression values and the covariates. When discretized covariates are considered, the covariates enter the calculations as a function of the proportions of subjects with specific attributes. This makes it relatively straightforward, even when pilot data on subject covariates are unavailable, to specify and adjust for the effect of the expected heterogeneities. The new method suggests experimental designs that require fewer samples when planning a study. Analysis of data from the proteomic profiling of colorectal cancer shows that fewer samples are needed when a study is balanced than when it is unbalanced, and when the IMAC30 chip type is used.
The method is implemented in the R package clippda, available from Bioconductor at: http://www.bioconductor.org/help/bioc-views/release/bioc/html/clippda.html.
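The abstract's main points (group imbalance, association between group and covariates, and technical replication all change the number of samples required, and discretized covariates enter only through the proportions of subjects with each attribute) can be illustrated with a textbook normal-approximation calculation. The sketch below is not the clippda implementation or the paper's expected-Fisher-information method. It assumes a single peak with normally distributed (e.g. log-transformed) intensities, a linear model with a binary group indicator g and one binary covariate x, known between-subject and technical standard deviations, replicate values averaged within each subject, and a two-sided Wald z-test. It uses the standard least-squares result Var(β̂_g) = σ² / (N · p_g(1 − p_g) · (1 − ρ²)), where p_g is the proportion of cancers and ρ is the correlation between g and x implied by the cell proportions. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def residual_variance(sigma_between, sigma_technical, replicates):
    """Variance of one subject's replicate-averaged peak intensity.

    Averaging r technical replicates shrinks only the technical
    (within-subject) component; the between-subject part is unchanged.
    """
    return sigma_between ** 2 + sigma_technical ** 2 / replicates


def total_sample_size(delta, cell_props, sigma_between, sigma_technical,
                      replicates=1, alpha=0.05, power=0.8):
    """Total subjects N needed to detect a group difference `delta` in one peak.

    cell_props[(g, x)] is the expected proportion of subjects in group g
    (1 = cancer, 0 = control) with binary covariate level x (hypothetical
    input format). Proportions must sum to 1.
    """
    if not math.isclose(sum(cell_props.values()), 1.0):
        raise ValueError("cell proportions must sum to 1")
    p_g = cell_props[(1, 0)] + cell_props[(1, 1)]   # proportion of cancers
    p_x = cell_props[(0, 1)] + cell_props[(1, 1)]   # proportion with x = 1
    var_g = p_g * (1 - p_g)
    var_x = p_x * (1 - p_x)
    if var_g == 0:
        raise ValueError("both groups must be represented")
    # Squared correlation between group and covariate (0 if x is constant).
    cov = cell_props[(1, 1)] - p_g * p_x
    rho_sq = cov ** 2 / (var_g * var_x) if var_x > 0 else 0.0
    if rho_sq >= 1:
        raise ValueError("group and covariate are completely confounded")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    sigma_sq = residual_variance(sigma_between, sigma_technical, replicates)
    n = z_sum ** 2 * sigma_sq / (delta ** 2 * var_g * (1 - rho_sq))
    return math.ceil(n)


if __name__ == "__main__":
    designs = {
        "balanced 1:1": {(1, 0): 0.25, (1, 1): 0.25, (0, 0): 0.25, (0, 1): 0.25},
        "unbalanced 1:2": {(1, 0): 1 / 6, (1, 1): 1 / 6, (0, 0): 1 / 3, (0, 1): 1 / 3},
        "1:1, confounded": {(1, 1): 0.35, (1, 0): 0.15, (0, 1): 0.15, (0, 0): 0.35},
    }
    for name, props in designs.items():
        for r in (1, 3):
            n = total_sample_size(0.5, props, 1.0, 0.5, replicates=r)
            print(f"{name}, {r} replicate(s): N = {n}")
```

Under these assumptions, with no association between group and covariate (ρ = 0) the total N is proportional to 1/(p_g(1 − p_g)), which is smallest for a 1:1 design, so the balanced layout needs fewer subjects than the 1:2 one. Association between group and covariate, as can arise in non-randomized studies, inflates N by a factor 1/(1 − ρ²), and averaging more technical replicates reduces only the technical part of the variance.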

Keywords: sample size calculations; data imbalance; heterogeneity; covariates; technical replicates; observational study; expected Fisher information; cancer; clinical proteomics; SELDI; designing a proteomic profiling experiment
