Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions

David R. Bickel
Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, K1H 8M5, Canada

Abstract

Multiple comparison procedures that control a family-wise error rate or false discovery rate provide an achieved error rate as the adjusted p-value or q-value for each hypothesis tested. However, since achieved error rates are not understood as probabilities that the null hypotheses are true, empirical Bayes methods have been employed to estimate such posterior probabilities, called local false discovery rates (LFDRs) to emphasize that their priors are unknown and of the frequency type. The main approaches to LFDR estimation, relying either on fully parametric models to maximize likelihood or on the presence of enough hypotheses for nonparametric density estimation, lack the simplicity and generality of adjusted p-values. To begin filling the gap, this paper introduces simple methods of LFDR estimation with proven asymptotic conservatism without assuming the parameter distribution is in a parametric family. Simulations indicate that they remain conservative even for very small numbers of hypotheses. One of the proposed procedures enables interpreting the original FDR control rule in terms of LFDR estimation, thereby facilitating practical use of the former. The most conservative of the new procedures is applied to measured abundance levels of 20 proteins.
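
For readers unfamiliar with the adjusted p-values mentioned above, the following is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment, which attaches an achieved FDR level to each hypothesis. It is not one of the LFDR estimators proposed in the paper, whose definitions are not given in this abstract; the function name `bh_adjusted_p_values` and the example p-values are hypothetical and chosen only for illustration.

```python
def bh_adjusted_p_values(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative helper (hypothetical name): the adjusted value for the
    p-value of rank r among n is the minimum of p_(k) * n / k over k >= r.
    """
    n = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for four hypotheses (not data from the paper).
p = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjusted_p_values(p)
```

Rejecting every hypothesis whose adjusted value is at or below a chosen level reproduces the original FDR control rule at that level. Such adjusted values are achieved error rates rather than posterior probabilities that the null hypotheses are true, which is the gap the LFDR estimators described in the abstract are meant to address.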



SAGMB publishes significant research on the application of statistical ideas to problems arising from computational biology. The range of topics includes linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies.
