In this paper, we introduce a new family of power transformations that contains the generalized logarithm as one of its members, just as the usual logarithm belongs to the family of Box-Cox power transformations. Although the new family was developed for analyzing gene expression data, it applies to a wider range of data whose variance depends on the mean. We study the analytical properties of the new family of transformations, as well as the mean-variance relationships that its members stabilize. We propose a methodology based on this family, which includes a simple strategy for selecting the appropriate family member for a given data set. We evaluate the finite-sample behavior of several classical and robust estimators based on this strategy through Monte Carlo simulations. Finally, we apply the proposed transformation to real genomic data to show empirically that the new methodology stabilizes the variance of these data.
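This abstract does not define the new family, so the sketch below does not implement it. It only illustrates the two known reference points of the analogy. The first is the Box-Cox transformation (y^λ - 1)/λ, which tends to log(y) as λ → 0. The second is the generalized logarithm, glog(y) = log(y + sqrt(y² + c²)) in one common parameterization, which behaves like log(2y) for large y. It stays finite for zero or negative values. The sketch also includes a small Monte Carlo check of variance stabilization. It assumes a two-component error model y = μ·exp(η) + ε, with η ~ N(0, σ_η²) and ε ~ N(0, σ_ε²). The function names, the simulation settings, and the delta-method choice of c are all hypothetical and are used only for illustration. This is not the selection strategy proposed in the paper.

```python
import math
import random
import statistics


def box_cox(y: float, lam: float) -> float:
    """Box-Cox power transformation (y**lam - 1)/lam for y > 0; lam == 0 gives log(y)."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def glog(y: float, c: float) -> float:
    """Generalized logarithm log(y + sqrt(y**2 + c**2)), defined for every real y when c > 0.

    Computed as asinh(y / c) + log(c), which is algebraically identical but
    avoids cancellation in y + sqrt(...) for large negative y.
    """
    if c <= 0:
        raise ValueError("glog requires c > 0")
    return math.asinh(y / c) + math.log(c)


def simulate_two_component(mu, sigma_eta, sigma_eps, n, rng):
    """Draw n values of y = mu * exp(eta) + eps (assumed two-component error model)."""
    return [mu * math.exp(rng.gauss(0.0, sigma_eta)) + rng.gauss(0.0, sigma_eps)
            for _ in range(n)]


def variance_by_level(levels, transform, sigma_eta=0.2, sigma_eps=50.0, n=20000, seed=1):
    """Sample variance of transform(y) at each level mu (hypothetical illustrative settings)."""
    rng = random.Random(seed)  # same seed, so every transform sees the same draws
    out = {}
    for mu in levels:
        ys = simulate_two_component(mu, sigma_eta, sigma_eps, n, rng)
        out[mu] = statistics.variance([transform(y) for y in ys])
    return out


if __name__ == "__main__":
    # Box-Cox approaches the logarithm as lam -> 0 ...
    print(box_cox(5.0, 1e-8), math.log(5.0))
    # ... and glog approaches log(2y) once y is large relative to c.
    print(glog(1e6, 1.0), math.log(2e6))

    sigma_eta, sigma_eps = 0.2, 50.0
    # Delta-method choice of c (an assumption for this sketch): under the
    # two-component model it makes Var(y) * glog'(E[y])**2 constant in E[y].
    c = sigma_eps / math.sqrt(math.exp(sigma_eta ** 2) - 1.0)
    levels = [0.0, 100.0, 1000.0, 10000.0]
    raw = variance_by_level(levels, lambda y: y)
    stab = variance_by_level(levels, lambda y: glog(y, c))
    for mu in levels:
        print(f"mu={mu:>8.0f}  var(y)={raw[mu]:>12.1f}  var(glog)={stab[mu]:.4f}")
```

Under these settings, the raw variance grows by orders of magnitude with the mean level. The variance after glog stays roughly constant, which is the kind of mean-variance relationship the abstract refers to. The asinh form was chosen only for numerical stability. It is equivalent to the logarithmic formula.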