A common experimental strategy using microarrays is to develop a signature of genes that respond to some treatment in a model system, and then ask whether the same genes respond in an analogous way in a more natural and uncontrolled environment. In statistical terms, the question is whether genes score similarly on some statistical test in two independent data sets. Approaches to this problem that ignore the gene/gene correlations common to all microarray data sets are known to overstate statistical confidence. Permutation approaches have been proposed to give more accurate confidence levels, but cannot be applied when sample sizes are small. Here we argue that the product moment correlation between the test statistics of the two experiments is an ideal measure of concordance between them, because confidence levels that account for intergene correlations depend on only a single number: the average squared correlation between gene pairs in the data set. The resulting null standard deviation is shown to vary by less than a factor of two across six distinct experimental data sets, suggesting that a universal constant may be used for this quantity. We show how a hidden assumption of the permutation approach may lead to incorrect p-values, whereas the analytic approach presented here is shown to be robust to violations of this assumption.
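To make the single-number dependence concrete, here is a minimal Python/NumPy sketch. It assumes the per-gene statistics from both experiments are approximately standardized, that the two experiments are independent under the null, and that both share the same intergene correlation matrix. Under those assumptions r is approximately (1/G) times the sum over genes of x_i y_i, so Var(r) is (1/G^2) times the sum over all gene pairs (i, j) of rho_ij^2; splitting that sum into its G diagonal terms and G(G - 1) off-diagonal terms gives Var(r) = 1/G + ((G - 1)/G) * rho2, where rho2 is the average squared intergene correlation. This expression is an illustrative derivation under the stated assumptions, not necessarily the exact formula used in the paper. The function names mean_sq_intergene_corr and concordance_test are hypothetical, and the normal approximation for the p-value is an added assumption.

```python
import math
import numpy as np


def mean_sq_intergene_corr(expr):
    """Average squared Pearson correlation over all gene pairs i != j.

    expr: array of shape (n_samples, n_genes). Uses the identity
    sum_ij C_ij^2 = ||Z Z^T||_F^2 / (n - 1)^2, with Z the column-standardized
    matrix, so the G x G correlation matrix is never formed.
    Caveat: with few samples this plug-in estimate is inflated (about
    1/(n - 1) even for independent genes), which makes the test conservative.
    """
    x = np.asarray(expr, dtype=float)
    n, g = x.shape
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    s = z @ z.T                                   # n x n Gram matrix
    total = np.sum(s * s) / (n - 1) ** 2          # sum of C_ij^2 over all i, j
    return (total - g) / (g * (g - 1))            # remove the G diagonal ones


def concordance_test(t1, t2, rho2):
    """Correlation between two vectors of per-gene statistics, its null SD
    under intergene correlation (average squared correlation rho2), and a
    two-sided normal-approximation p-value."""
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    g = t1.size
    r = float(np.corrcoef(t1, t2)[0, 1])
    # 1/G is the independent-genes variance; rho2 adds the correlation inflation
    sd0 = math.sqrt(1.0 / g + (g - 1) / g * rho2)
    z = r / sd0
    p = math.erfc(abs(z) / math.sqrt(2.0))        # P(|N(0,1)| > |z|)
    return r, sd0, p


# Usage on synthetic (hypothetical) data: 10 samples, 500 genes.
rng = np.random.default_rng(0)
expr = rng.standard_normal((10, 500))
t1 = rng.standard_normal(500)
t2 = 0.5 * t1 + rng.standard_normal(500)
r, sd0, p = concordance_test(t1, t2, mean_sq_intergene_corr(expr))
```

Because rho2 is passed as a plain number, a fixed constant could be supplied in place of a data-based estimate, in the spirit of the universal constant suggested above; no particular value is implied here.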
©2011 Walter de Gruyter GmbH & Co. KG, Berlin/Boston