Published by De Gruyter, June 21, 2013

Simple estimators of false discovery rates given as few as one or two p-values without strong parametric assumptions

David R. Bickel

Abstract

Multiple comparison procedures that control a family-wise error rate or false discovery rate provide an achieved error rate as the adjusted p-value or q-value for each hypothesis tested. However, since achieved error rates are not understood as probabilities that the null hypotheses are true, empirical Bayes methods have been employed to estimate such posterior probabilities, called local false discovery rates (LFDRs) to emphasize that their priors are unknown and of the frequency type. The main approaches to LFDR estimation, relying either on fully parametric models to maximize likelihood or on the presence of enough hypotheses for nonparametric density estimation, lack the simplicity and generality of adjusted p-values. To begin filling the gap, this paper introduces simple methods of LFDR estimation with proven asymptotic conservatism without assuming the parameter distribution is in a parametric family. Simulations indicate that they remain conservative even for very small numbers of hypotheses. One of the proposed procedures enables interpreting the original FDR control rule in terms of LFDR estimation, thereby facilitating practical use of the former. The most conservative of the new procedures is applied to measured abundance levels of 20 proteins.


Corresponding author: David R. Bickel, Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, Ontario, K1H 8M5, Canada

The Biobase (Gentleman et al., 2004) package of R (R Development Core Team, 2008) facilitated the computations. I thank an anonymous referee for critical comments that led to clearer communication of this research and especially its motivation. This research was partially supported by the Canada Foundation for Innovation, by the Ministry of Research and Innovation of Ontario, and by the Faculty of Medicine of the University of Ottawa.

Appendix A: Additional proofs

Proof of Lemma 1

With the trivial estimator

Proposition 2 implies that any random variable of the form

is a conservative estimator of

if the random variable
converges to
in probability. The estimators
and, for any C∈[0, 1],
are of that form with
and
respectively. The convergence of
to
is guaranteed by the weak law of large numbers. Since
is the median of the random variable that has SC(•; x) as its distribution function and since SC(•; x) is an asymptotic confidence distribution in the sense of Singh et al. (2007), a sufficient condition for its convergence to
is that fixed-level confidence intervals formed by SC(•; x) degenerate to a point as N→∞ (Singh et al., 2007, Theorem 3.1). That condition is met since SC(•; x) is defined by equation (10), consistent with the confidence intervals of Clopper and Pearson (1934). Thus, the conservatism of
and
is established.
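The degeneracy condition invoked above can be checked numerically. The following is a minimal Python sketch and not the paper's own code (the paper's computations used R). The function names are hypothetical, and holding the observed proportion at one half is an arbitrary choice made only for illustration. It computes the Clopper and Pearson (1934) interval by bisection on binomial tail probabilities and reports how the interval width shrinks as N grows.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1 (log-space terms)."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    total = 0.0
    for j in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1)
                   - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(1.0, total)


def bisect_increasing(f, iters=50):
    """Approximate the root on (0, 1) of a function f that increases from
    negative to positive values (hypothetical helper for this sketch)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(x, n, level=0.95):
    """Two-sided Clopper-Pearson interval for a binomial proportion,
    given x successes in n trials (illustrative name, not from the paper)."""
    tail = (1.0 - level) / 2.0
    # Lower limit solves P(X >= x | p) = tail; that probability increases in p.
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - tail)
    # Upper limit solves P(X <= x | p) = tail; that probability decreases in p.
    upper = 1.0 if x == n else bisect_increasing(
        lambda p: tail - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    # Assumption for illustration: the observed proportion stays at 1/2.
    for n in (10, 100, 1000):
        lo, hi = clopper_pearson(n // 2, n)
        print(f"N={n:5d}  interval=({lo:.4f}, {hi:.4f})  width={hi - lo:.4f}")
```

The width falls roughly in proportion to 1/√N, which is the behavior the degeneracy condition requires of fixed-level intervals.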

Similarly, because

the conservatism of
follows from its convergence to
in probability. Since
as defined in equation (15) is a confidence posterior mean of
its convergence to
follows from the two conditions of Singh et al. (2007, Theorem 3.2):

  1. fixed-level confidence intervals formed by the asymptotic confidence distribution of

    degenerate to a point as N→∞;

  2. the confidence posterior variance

is bounded in probability.

The first condition results from the monotonicity between

and
in the integrand of equation (15), in which
is fixed, and the fact that, as argued above to establish the conservatism of
the degeneracy condition is met for SC(•; x), the asymptotic confidence distribution of
The second condition follows trivially from the fact that the domain of SC is [0, 1], thereby establishing the conservatism of

Proof of Theorem 1

Since Ψ(α)=E(ψ(Pi)|Pi≤α), the nonnegative-skewness condition implies

Ψ(α)≥median(ψ(Pi)|Pi≤α).

Thus, defining the variables

and
to be IID with Pi and P(i), respectively, for i=1, …, N,

almost surely. The monotonicity of ψ implies that, almost surely,

Because the conservatism of Ψ*(α) means lim_{N→∞} Pr(Ψ*(α)≥Ψ(α))=1,

References

Abadir, K. (2005): “The mean-median-mode inequality: counterexamples,” Economet. Theory, 21(2), 477–482.

Basu, S. and A. Dasgupta (1997): “The mean, median, and mode of unimodal distributions: a characterization,” Theor. Probab. Appl., 41(2), 210–223. doi:10.1137/S0040585X97975447

Benjamini, Y. and Y. Hochberg (1995): “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” J. Roy. Stat. Soc. B, 57, 289–300.

Bickel, D. R. (2011a): “Estimating the null distribution to adjust observed confidence levels for genome-scale screening,” Biometrics, 67, 363–370. doi:10.1111/j.1541-0420.2010.01491.x

Bickel, D. R. (2011b): Small-scale inference: Empirical Bayes and confidence methods for as few as a single comparison. Technical Report, Ottawa Institute of Systems Biology, arXiv:1104.0341.

Bickel, D. R. (2012a): “Coherent frequentism: a decision theory based on confidence sets,” Commun. Stat. Theory, 41, 1478–1496. doi:10.1080/03610926.2010.543302

Bickel, D. R. (2012b): “Empirical Bayes interval estimates that are conditionally equal to unadjusted confidence intervals or to default prior credibility intervals,” Stat. Applications Genet. Mol. Biol., 11(3), art. 7. doi:10.1515/1544-6115.1765

Clopper, C. J. and E. S. Pearson (1934): “The use of confidence or fiducial limits illustrated in the case of the binomial,” Biometrika, 26, 404–413. doi:10.1093/biomet/26.4.404

Dudoit, S. and M. J. van der Laan (2008): Multiple testing procedures with applications to genomics, New York: Springer. doi:10.1007/978-0-387-49317-6

Edwards, A. W. F. (1992): Likelihood, Baltimore: Johns Hopkins Press.

Edwards, D., L. Wang, and P. Sørensen (2012): “Network-enabled gene expression analysis,” BMC Bioinformatics, 13, art. 167. doi:10.1186/1471-2105-13-167

Efron, B. (1986): “Why isn't everyone a Bayesian,” Am. Stat., 40(1), 1–5.

Efron, B. (2004): “Large-scale simultaneous hypothesis testing: the choice of a null hypothesis,” J. Am. Stat. Assoc., 99, 96–104.

Efron, B. (2010a): “Correlated z-values and the accuracy of large-scale statistical estimates,” J. Am. Stat. Assoc., 105, 1042–1055. doi:10.1198/jasa.2010.tm09129

Efron, B. (2010b): Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, Cambridge: Cambridge University Press. doi:10.1017/CBO9780511761362

Efron, B. (2010c): “Rejoinder to comments on B. Efron, “Correlated z-values and the accuracy of large-scale statistical estimates,”” J. Am. Stat. Assoc., 105, 1067–1069. doi:10.1198/jasa.2010.tm09129

Efron, B. and R. Tibshirani (2002): “Empirical Bayes methods and false discovery rates for microarrays,” Genet. Epidemiol., 23, 70–86.

Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher (2001): “Empirical Bayes analysis of a microarray experiment,” J. Am. Stat. Assoc., 96, 1151–1160.

Fisher, R. A. (1973): Statistical methods and scientific inference, New York: Hafner Press.

Gentleman, R. C., V. J. Carey, D. M. Bates, et al. (2004): “Bioconductor: open software development for computational biology and bioinformatics,” Genome Biol., 5, R80.

Hald, A. (2007): A history of parametric statistical inference from Bernoulli to Fisher, 1713–1935, New York: Springer.

Kyburg, H. E. and C. M. Teng (2006): “Non monotonic logic and statistical inference,” Comput. Intell., 22, 26–51.

Li, X. (2009): ProData. Bioconductor.org documentation for the ProData package. http://www.bioconductor.org/packages/2.12/data/experiment/html/ProData.html.

Morris, C. N. (1983a): “Parametric empirical Bayes inference: theory and applications,” J. Am. Stat. Assoc., 78, 47–55. doi:10.1080/01621459.1983.10477920

Morris, C. N. (1983b): “Parametric empirical Bayes inference: theory and applications: rejoinder,” J. Am. Stat. Assoc., 78, 63–65. doi:10.2307/2287105

Morris, J. S., P. J. Brown, R. C. Herrick, K. A. Baggerly, and K. R. Coombes (2008): “Bayesian analysis of mass spectrometry proteomic data using wavelet-based functional mixed models,” Biometrics, 64(2), 479–489. doi:10.1111/j.1541-0420.2007.00895.x

Muralidharan, O. (2010): “An empirical Bayes mixture method for effect size and false discovery rate estimation,” Ann. Appl. Stat., 4, 422–438.

Padilla, M. and D. R. Bickel (2012): “Empirical Bayes methods corrected for small numbers of tests,” Stat. Applications Genet. Mol. Biol., 11(5), art. 4. doi:10.1515/1544-6115.1807

R Development Core Team (2008): R: a language and environment for statistical computing, Vienna, Austria: R Foundation for Statistical Computing.

Singh, K., M. Xie, and W. E. Strawderman (2007): “Confidence distribution (CD)–distribution estimator of a parameter,” IMS Lecture Notes Monograph Series, 54, 132–150. doi:10.1214/074921707000000102

Storey, J. D. (2002): “A direct approach to false discovery rates,” J. Roy. Stat. Soc. B, 64, 479–498.

Westfall, P. H. (2010): “Comment on B. Efron, “Correlated z-values and the accuracy of large-scale statistical estimates,”” J. Am. Stat. Assoc., 105, 1063–1066.

Westfall, P. H. and S. S. Young (1993): Resampling-based multiple testing, Hoboken: John Wiley & Sons.

Whittemore, A. S. (2007): “A Bayesian false discovery rate for multiple testing,” J. Appl. Stat., 34(1), 1–9. doi:10.1080/02664760600994745

Wilkinson, G. N. (1977): “On resolving the controversy in statistical inference (with discussion),” J. Roy. Stat. Soc. B, 39, 119–171.

Yuan, B. (2009): “Bayesian frequentist hybrid inference,” Ann. Stat., 37, 2458–2501.

Published Online: 2013-06-21
Published in Print: 2013-08-01

©2013 by Walter de Gruyter Berlin Boston