Abstract
The Mantel-Haenszel estimators for the common effect parameters of stratified 2×2 tables have been widely adopted in epidemiological and clinical studies for controlling the effects of confounding factors. Although the Mantel-Haenszel estimators are simple and effective, the common effect assumptions cannot be justified in general practice, and when they fail the targeted “common effect parameters” do not exist. In such settings, even if the Mantel-Haenszel estimators have desirable properties, it is unclear what they estimate and how the estimates should be interpreted. In this article, we theoretically evaluate their asymptotic behavior when the common effect assumptions are violated. We show explicitly that the Mantel-Haenszel estimators converge to weighted averages of the stratum-specific effect parameters and can therefore be interpreted as intuitive summaries of the stratum-specific effect measures. Moreover, the Mantel-Haenszel estimators approximately correspond to standardized effect measures whose standard distribution of the stratification variables is the total cohort. In addition, the ordinary sandwich-type variance estimators remain valid for quantifying the variability of the Mantel-Haenszel estimators. We conducted numerical studies based on two epidemiologic studies, of breast cancer and of schizophrenia, to evaluate the empirical properties of these estimators, and confirmed the general validity of the theoretical results.
1 Introduction
In the analysis of epidemiologic and clinical studies, the Mantel-Haenszel estimators (Mantel and Haenszel 1959; Rothman, Greenland and Lash 2008) for the common effect parameters of stratified 2×2 tables have been widely adopted for controlling the effects of confounding factors. Owing to their simplicity and high efficiency, these estimators are preferred by epidemiologists and have also become one of the standard methods in meta-analysis (Higgins and Green 2008). Although the Mantel-Haenszel estimators are effective for estimating common effect parameters, the common effect assumptions cannot be rigorously justified in practice (Greenland 1982; Mantel et al. 1977). When these assumptions are violated, it is not clear what the Mantel-Haenszel methods estimate. Greenland and Maldonado (1994) argued that the Mantel-Haenszel rate ratio estimator approximates the standardized rate ratio whose standard distribution of the stratification variables is the total cohort. They supported this claim with numerical studies, but did not provide sufficient theoretical justification.
The violation of the common effect assumptions can be regarded as a model misspecification problem. In theoretical studies, model misspecification has been investigated mainly for maximum likelihood estimators, following the landmark paper of White (1982). A generalization to estimating equation theory (Godambe 1969) was not available until recently, when Yi and Reid (2010) extended White’s (1982) asymptotic results on the behavior of maximum likelihood estimators under misspecified models. Since it is well known that the Mantel-Haenszel estimators can be regarded as locally efficient estimators of the common effect parameters under null effects (i.e., when the exposure effects are zero) through estimating equation theory (Fujii and Yanagimoto 2005; Sato 1990; Yanagimoto 1990), their asymptotic behavior can be assessed using the results of Yi and Reid (2010).
In this article, we evaluate the asymptotic behavior of the Mantel-Haenszel estimators when the common effect assumptions are violated. We show that the Mantel-Haenszel estimators can be approximately interpreted as estimators of an average exposure effect under heterogeneity of effects across strata, and that these average effects are generally good approximations to the standardized estimators under certain conditions. In addition, we discuss the validity of the ordinary variance estimators of the Mantel-Haenszel estimators under heterogeneity. We also assess their empirical properties through numerical studies based on two epidemiologic studies of breast cancer and schizophrenia.
2 Analysis of cohort studies with binary data
2.1 Mantel-Haenszel risk ratio and risk difference estimators
First, we discuss the common risk ratio and risk difference estimation for stratified analysis in cohort studies. Consider a series of K 2 × 2 tables formed by pairs of independent binomial observations
The Mantel-Haenszel estimators of the common risk ratio
where
Note that these estimating functions are unbiased under the common effects assumptions, such that
For evaluating the asymptotic behaviors of the Mantel-Haenszel estimators, it is useful to formulate two large sample schemes that are common for stratified analyses. The first, denoted as Asymptotic I, is to have a fixed number of strata K while
First, denoting
Under the Asymptotic I, we assume
and variances
where
An outline of proof is provided in Appendix. Note that
In addition, the Mantel-Haenszel estimators can also be interpreted to converge to weighted averages of stratum specific risk ratios
Second, we consider Asymptotic II. The limiting model considered here is similar to those employed by Breslow (1981) and Greenland and Robins (1985). We suppose there is a finite number of possible configurations of total sample sizes
Under the Asymptotic II, the Mantel-Haenszel estimators converge in distribution to normal distributions with means equal to
and variances
An outline of proof is provided in Appendix II. Similar to Asymptotic I,
Another concern is the asymptotic variance estimation of
2.2 Illustration: Tamoxifen use and recurrence of breast cancer
Table 1 presents part of the results of a cohort study assessing the risk of second primary cancers after adjuvant tamoxifen therapy for breast cancer (Matsuyama et al. 2000; Sato and Matsuyama 2003). In the unstratified analysis, a nearly null effect of tamoxifen was observed (crude risk ratio: 1.011, crude risk difference: 0.002). However, after stratifying by lymph node metastasis at surgery, possible preventive effects were observed in each stratum (stratum-specific risk ratios: 0.910 and 0.670; risk differences: −0.030 and −0.035). There appears to be hardly any effect modification for the risk differences, whereas some effect modification appears to exist for the risk ratios. We suppose the heterogeneous setting under Asymptotic I. The Mantel-Haenszel risk ratio estimator was 0.830 and the Mantel-Haenszel risk difference estimator was −0.033. Besides, the standardized risk ratio and risk difference with standards
In addition, we conducted simulation studies to investigate the empirical properties of the Mantel-Haenszel estimators under heterogeneity. We considered several scenarios based on the stratified dataset of Table 1, such as
As a result, in all scenarios, the means of the distributions of the Mantel-Haenszel estimates mostly agreed with the asymptotic means of the distributions of the Mantel-Haenszel estimators
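A simulation of this kind can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it uses a single scenario in which the true risks are set to the observed proportions in Table 1, far fewer replications than the 25,000 used in the study, and the standard textbook form of the Mantel-Haenszel risk ratio. The names `DESIGN`, `draw_binomial`, `large_strata_limit` and `simulate_mean` are hypothetical. The comparison value is the plug-in limit obtained by replacing each count with its expectation under fixed stratum sizes.

```python
import random

# Assumed "true" model: binomial risks set to the observed proportions in Table 1.
# Each stratum: (n_exposed, p_exposed, n_unexposed, p_unexposed).
DESIGN = [
    (1215, 368 / 1215, 760, 253 / 760),    # lymph node metastasis
    (1334, 96 / 1334, 1592, 171 / 1592),   # no lymph node metastasis
]


def draw_binomial(rng, n, p):
    """Binomial(n, p) draw as a sum of n Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))


def mh_risk_ratio(tables):
    """tables: (events_exposed, n_exposed, events_unexposed, n_unexposed)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in tables)
    den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in tables)
    return num / den


def large_strata_limit(design):
    """Plug-in limit: each count is replaced by its expectation n * p."""
    num = sum(n1 * n0 / (n1 + n0) * p1 for n1, p1, n0, p0 in design)
    den = sum(n1 * n0 / (n1 + n0) * p0 for n1, p1, n0, p0 in design)
    return num / den


def simulate_mean(design, n_rep=300, seed=2016):
    """Mean of the Mantel-Haenszel risk ratio over n_rep simulated datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        tables = [
            (draw_binomial(rng, n1, p1), n1, draw_binomial(rng, n0, p0), n0)
            for n1, p1, n0, p0 in design
        ]
        total += mh_risk_ratio(tables)
    return total / n_rep


if __name__ == "__main__":
    print("simulated mean:", round(simulate_mean(DESIGN), 3))
    print("plug-in limit: ", round(large_strata_limit(DESIGN), 3))
```

Under this assumed scenario the simulated mean should lie close to the plug-in limit even though the two stratum-specific risk ratios differ; increasing `n_rep` tightens the agreement at the cost of run time.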
Table 1: Results of a cohort study for evaluating the risk of second primary cancers after adjuvant tamoxifen therapy for breast cancer (Matsuyama et al. 2000; Sato and Matsuyama 2003).
Quantity | Lymph node metastasis at surgery: tamoxifen use | Lymph node metastasis at surgery: no use | No lymph node metastasis at surgery: tamoxifen use | No lymph node metastasis at surgery: no use |
---|---|---|---|---|
Recurrence | 368 | 253 | 96 | 171 |
No recurrence | 847 | 507 | 1,238 | 1,421 |
Total | 1,215 | 760 | 1,334 | 1,592 |
Recurrence proportion | 0.303 | 0.333 | 0.072 | 0.107 |
Risk ratio | 0.910 | | 0.670 | |
Risk difference | −0.030 | | −0.035 | |
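As a check on the figures quoted in this illustration, the following minimal Python sketch recomputes the Mantel-Haenszel risk ratio and risk difference from the counts in Table 1. It assumes the standard textbook forms of the estimators (stratum weights built from n1k n0k / Nk); the function and variable names are illustrative and are not taken from the original analysis. It also computes the risks standardized to the total cohort and the weighted-average form of the risk ratio, with weights bk n1k / Nk, which is an exact algebraic identity for the sample estimator.

```python
# Counts from Table 1, one tuple per stratum:
# (events_exposed, n_exposed, events_unexposed, n_unexposed)
STRATA = [
    (368, 1215, 253, 760),    # lymph node metastasis at surgery
    (96, 1334, 171, 1592),    # no lymph node metastasis at surgery
]


def mh_risk_ratio(strata):
    """Mantel-Haenszel risk ratio (standard textbook form)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata)
    den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata)
    return num / den


def mh_risk_difference(strata):
    """Mantel-Haenszel risk difference (standard textbook form)."""
    num = sum((a * n0 - b * n1) / (n1 + n0) for a, n1, b, n0 in strata)
    den = sum(n1 * n0 / (n1 + n0) for a, n1, b, n0 in strata)
    return num / den


def standardized_risks(strata):
    """Exposed and unexposed risks standardized to the total cohort."""
    total = sum(n1 + n0 for a, n1, b, n0 in strata)
    r1 = sum((n1 + n0) * a / n1 for a, n1, b, n0 in strata) / total
    r0 = sum((n1 + n0) * b / n0 for a, n1, b, n0 in strata) / total
    return r1, r0


def weighted_average_rr(strata):
    """Stratum-specific risk ratios averaged with weights b * n1 / N."""
    weights = [b * n1 / (n1 + n0) for a, n1, b, n0 in strata]
    ratios = [(a / n1) / (b / n0) for a, n1, b, n0 in strata]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


if __name__ == "__main__":
    r1, r0 = standardized_risks(STRATA)
    print("MH risk ratio:           ", round(mh_risk_ratio(STRATA), 3))
    print("MH risk difference:      ", round(mh_risk_difference(STRATA), 3))
    print("standardized risk ratio: ", round(r1 / r0, 3))
    print("standardized risk diff.: ", round(r1 - r0, 3))
    print("weighted-average form:   ", round(weighted_average_rr(STRATA), 3))
```

With the Table 1 counts, the sketch gives approximately 0.830 and −0.033 for the Mantel-Haenszel risk ratio and risk difference, matching the values reported in the text, and approximately 0.832 and −0.033 for the measures standardized to the total cohort.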

Results of simulations: Means of 25,000 estimates of the Mantel-Haenszel estimates (
3 Analysis of cohort studies with person-time data
3.1 Mantel-Haenszel rate ratio and rate difference estimators
We consider estimating the common rate ratio and rate difference for stratified person-time data of cohort studies. Suppose a series of K strata constructed by independent Poisson observations
where
It should be noted that these estimating functions are unbiased under the common effects assumptions, and thus consistency of the estimators follows.
Here, we consider limiting models similar to those in the previous section. The large-strata limiting model, Asymptotic I, is to have a fixed number of strata K while
Under the Asymptotic I, we assume
and variances
where
Also, under Asymptotic II, we suppose there is a finite number of possible configurations of total sample sizes
Under the Asymptotic II, the Mantel-Haenszel estimators converge in distribution to normal distributions with means equal to
and variances
These results can be obtained in the same way as Theorems 1 and 2 (see Appendix). Therefore, as in the binomial case of Section 2, when the common effect assumptions are violated, these quantities can be interpreted as expected quantities of standardized rate ratio and rate differences with the standard weight
3.2 Illustration: Mortality rates for clozapine users
Table 2 presents the results of a study of mortality rates among current and past users of clozapine, a drug used to treat schizophrenia (Rothman 2002; Walker et al. 1997). Clozapine use was thought to be associated with mortality among current users; past users therefore served as controls. After stratifying by two age groups (10–54 and 55–94 years), possible protective effects were observed in both strata (stratum-specific rate ratios: 0.448 and 0.486; rate differences: −388.7 and −2903 per 10⁵ person-years). In this study, there appears to be hardly any effect modification for the rate ratios, whereas some effect modification appears to exist for the rate differences. Here we again consider the large-strata limiting model. The Mantel-Haenszel rate ratio estimator was 0.469 and the Mantel-Haenszel rate difference estimator was −710.7 per 10⁵ person-years. Besides, the standardized rate ratio and rate difference with
Here, we also conducted simulation experiments for investigating empirical properties of the Mantel-Haenszel estimators under heterogeneity. We consider several scenarios based on the stratified dataset of Table 2, such as
In all settings, the means of the distributions of the Mantel-Haenszel estimates mostly agreed with the asymptotic means of the distributions of the Mantel-Haenszel estimators
Table 2: Results of a cohort study: Mortality rates for current and past clozapine users (Walker et al. 1997); Data from Rothman (2002, p. 154).
Quantity | Age 10–54 years: current users | Age 10–54 years: past users | Age 55–94 years: current users | Age 55–94 years: past users |
---|---|---|---|---|
Deaths | 196 | 111 | 167 | 157 |
Person-years | 62,119 | 15,763 | 6,085 | 2,780 |
Rate (per 10⁵ person-years) | 315.5 | 704.2 | 2,744 | 5,647 |
Rate ratio | 0.448 | | 0.486 | |
Rate difference (per 10⁵ person-years) | −388.7 | | −2,903 | |
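The stratum-specific figures in Table 2 and the Mantel-Haenszel rate ratio can be recomputed with a short Python sketch. It assumes the standard textbook form of the Mantel-Haenszel rate ratio for person-time data, with person-years T1k and T0k taking the place of the group sizes; the function and variable names are illustrative only.

```python
# Counts from Table 2, one tuple per age stratum:
# (deaths_current, person_years_current, deaths_past, person_years_past)
STRATA = [
    (196, 62119, 111, 15763),   # age 10-54 years
    (167, 6085, 157, 2780),     # age 55-94 years
]


def stratum_rate_ratio(a, t1, b, t0):
    """Rate ratio of current versus past users within one stratum."""
    return (a / t1) / (b / t0)


def stratum_rate_difference(a, t1, b, t0, per=1e5):
    """Rate difference within one stratum, per `per` person-years."""
    return (a / t1 - b / t0) * per


def mh_rate_ratio(strata):
    """Mantel-Haenszel rate ratio (standard textbook form)."""
    num = sum(a * t0 / (t1 + t0) for a, t1, b, t0 in strata)
    den = sum(b * t1 / (t1 + t0) for a, t1, b, t0 in strata)
    return num / den


if __name__ == "__main__":
    for stratum in STRATA:
        print(
            round(stratum_rate_ratio(*stratum), 3),
            round(stratum_rate_difference(*stratum), 1),
        )
    print("MH rate ratio:", round(mh_rate_ratio(STRATA), 3))
```

With the tabulated counts, the stratum-specific rate ratios come out as 0.448 and 0.486 and the Mantel-Haenszel rate ratio as 0.469, in line with the values quoted in the text.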

Results of simulations: Means of 25,000 estimates of the Mantel-Haenszel estimates (
4 Analysis of case-control studies
4.1 Mantel-Haenszel odds ratio estimator
We discuss the common odds ratio estimation for stratified analyses in case-control studies. Consider the same setting as in Section 2, a series of K 2 × 2 tables formed by pairs of independent binomial observations
where
Under the common effect assumption, this estimating function is unbiased, i.e.,
Under the Asymptotic I, the Mantel-Haenszel estimator converges in distribution to a normal distribution with mean equal to
and variances
where
Also, under the Asymptotic II,
and variances
where
Therefore, similar to Sections 2 and 3, the Mantel-Haenszel odds ratio estimator can also be interpreted to converge to a weighted average of stratum specific odds ratios
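The finite-sample analogue of this weighted-average interpretation is easy to verify numerically. The Python sketch below uses hypothetical counts (not data from this article) and the standard textbook form of the Mantel-Haenszel odds ratio. It checks that the estimator equals the average of the stratum-specific odds ratios with weights bk ck / Nk, and therefore always lies between the smallest and largest stratum-specific odds ratios. The names `mh_odds_ratio` and `weighted_average_or` are illustrative.

```python
# Hypothetical stratified case-control counts, one tuple per stratum:
# (a, b, c, d) = (exposed cases, unexposed cases,
#                 exposed controls, unexposed controls)
TABLES = [
    (40, 60, 20, 80),
    (30, 70, 25, 75),
]


def mh_odds_ratio(tables):
    """Mantel-Haenszel odds ratio: sum(a*d/N) / sum(b*c/N)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def stratum_odds_ratios(tables):
    """Odds ratio a*d / (b*c) within each stratum."""
    return [a * d / (b * c) for a, b, c, d in tables]


def weighted_average_or(tables):
    """Stratum-specific odds ratios averaged with weights b*c/N."""
    weights = [b * c / (a + b + c + d) for a, b, c, d in tables]
    ratios = stratum_odds_ratios(tables)
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


if __name__ == "__main__":
    print("stratum odds ratios:", [round(r, 3) for r in stratum_odds_ratios(TABLES)])
    print("MH odds ratio:      ", round(mh_odds_ratio(TABLES), 3))
    print("weighted average:   ", round(weighted_average_or(TABLES), 3))
```

Because the weights depend on the observed counts, they vary from sample to sample; the asymptotic results replace them with their limiting values.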
4.2 Numerical evaluation by simulations
4.2.1 Behaviors of the Mantel-Haenszel estimator
We assessed the empirical properties of the Mantel-Haenszel estimator
In all settings, as expected, in the common effect settings (

Results of simulations under Asymptotic I: Means of 3,600 estimates of the Mantel-Haenszel estimates (

Results of simulations under Asymptotic II: Means of 3,600 estimates of the Mantel-Haenszel estimates (
4.2.2 Variance estimation
We also assessed the validity of the variance estimators. The settings roughly mimicked the case-control datasets generated in the previous simulations. First, for the large-strata settings, we generated 2 × 2 tables (K = 2) such as
Second, for the sparse data settings, we generated 1:1 and 1:4 matched case-control datasets from two possibly heterogeneous populations. We divided the case datasets into
Simulation results under Asymptotic I: Actual SE of the Mantel-Haenszel estimator and means of the square roots of the variance estimates from Hauck’s estimator (
Actual SE | |||||
---|---|---|---|---|---|
OR1 = 0.500 | OR2 = 0.250 | 0.151 | 0.156 | 0.152 | 0.151 |
OR2 = 0.375 | 0.153 | 0.151 | 0.150 | 0.150 | |
OR2 = 0.500 | 0.148 | 0.150 | 0.149 | 0.150 | |
OR2 = 0.625 | 0.149 | 0.150 | 0.149 | 0.149 | |
OR2 = 0.750 | 0.150 | 0.151 | 0.149 | 0.149 | |
OR2 = 0.875 | 0.150 | 0.152 | 0.149 | 0.149 | |
OR2 = 1.000 | 0.148 | 0.153 | 0.149 | 0.149 | |
OR1 = 0.750 | OR2 = 0.375 | 0.151 | 0.154 | 0.150 | 0.149 |
OR2 = 0.563 | 0.150 | 0.150 | 0.148 | 0.149 | |
OR2 = 0.750 | 0.149 | 0.148 | 0.148 | 0.148 | |
OR2 = 0.938 | 0.146 | 0.148 | 0.147 | 0.148 | |
OR2 = 1.125 | 0.147 | 0.149 | 0.147 | 0.147 | |
OR2 = 1.313 | 0.150 | 0.150 | 0.147 | 0.147 | |
OR2 = 1.500 | 0.146 | 0.151 | 0.147 | 0.146 | |
OR1 = 1.000 | OR2 = 0.500 | 0.148 | 0.155 | 0.150 | 0.150 |
OR2 = 0.750 | 0.149 | 0.150 | 0.149 | 0.149 | |
OR2 = 1.000 | 0.149 | 0.149 | 0.148 | 0.148 | |
OR2 = 1.250 | 0.150 | 0.149 | 0.148 | 0.148 | |
OR2 = 1.500 | 0.150 | 0.149 | 0.147 | 0.147 | |
OR2 = 1.750 | 0.147 | 0.150 | 0.147 | 0.147 | |
OR2 = 2.000 | 0.147 | 0.152 | 0.148 | 0.146 |
Simulation results under Asymptotic II: Actual SE of the Mantel-Haenszel estimator and means of the square roots of the variance estimates from the Robins-Breslow-Greenland estimator (
1:1 matching | 1:4 matching | ||||||
---|---|---|---|---|---|---|---|
Actual SE | Actual SE | ||||||
OR1 = 0.500 | OR2 = 0.250 | 0.180 | 0.178 | 0.181 | 0.136 | 0.135 | 0.136 |
OR2 = 0.375 | 0.171 | 0.172 | 0.174 | 0.133 | 0.132 | 0.132 | |
OR2 = 0.500 | 0.171 | 0.169 | 0.171 | 0.131 | 0.130 | 0.131 | |
OR2 = 0.625 | 0.168 | 0.168 | 0.169 | 0.131 | 0.129 | 0.130 | |
OR2 = 0.750 | 0.166 | 0.166 | 0.168 | 0.128 | 0.129 | 0.129 | |
OR2 = 0.875 | 0.166 | 0.166 | 0.167 | 0.130 | 0.129 | 0.129 | |
OR2 = 1.000 | 0.169 | 0.165 | 0.167 | 0.131 | 0.129 | 0.129 | |
OR1 = 0.750 | OR2 = 0.375 | 0.169 | 0.169 | 0.170 | 0.130 | 0.132 | 0.133 |
OR2 = 0.563 | 0.170 | 0.165 | 0.166 | 0.129 | 0.129 | 0.130 | |
OR2 = 0.750 | 0.165 | 0.163 | 0.164 | 0.129 | 0.128 | 0.129 | |
OR2 = 0.938 | 0.162 | 0.162 | 0.163 | 0.129 | 0.127 | 0.128 | |
OR2 = 1.125 | 0.162 | 0.162 | 0.163 | 0.127 | 0.127 | 0.128 | |
OR2 = 1.313 | 0.160 | 0.162 | 0.163 | 0.124 | 0.127 | 0.128 | |
OR2 = 1.500 | 0.164 | 0.162 | 0.163 | 0.126 | 0.128 | 0.128 | |
OR1 = 1.000 | OR2 = 0.500 | 0.165 | 0.166 | 0.168 | 0.129 | 0.132 | 0.132 |
OR2 = 0.750 | 0.165 | 0.163 | 0.165 | 0.128 | 0.129 | 0.129 | |
OR2 = 1.000 | 0.164 | 0.162 | 0.164 | 0.126 | 0.128 | 0.128 | |
OR2 = 1.250 | 0.164 | 0.162 | 0.163 | 0.126 | 0.128 | 0.128 | |
OR2 = 1.500 | 0.163 | 0.162 | 0.164 | 0.127 | 0.128 | 0.128 | |
OR2 = 1.750 | 0.160 | 0.163 | 0.164 | 0.126 | 0.128 | 0.128 | |
OR2 = 2.000 | 0.160 | 0.163 | 0.165 | 0.125 | 0.128 | 0.129 |
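For readers who wish to reproduce variance estimates of this kind, the Robins-Breslow-Greenland estimator of the variance of the log Mantel-Haenszel odds ratio can be sketched in Python as follows. The formula is the standard published form of that estimator (Robins, Breslow and Greenland 1986); the counts are hypothetical and the function names are illustrative, so this is a sketch rather than the code used for the simulations reported here.

```python
import math


def mh_odds_ratio(tables):
    """Mantel-Haenszel odds ratio: sum(a*d/N) / sum(b*c/N)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def rbg_variance_log_or(tables):
    """Robins-Breslow-Greenland variance estimate for log(OR_MH).

    Each table is (a, b, c, d): exposed cases, unexposed cases,
    exposed controls, unexposed controls.
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n   # concordant / discordant shares
        r, s = a * d / n, b * c / n       # numerator / denominator terms
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    return (
        sum_pr / (2 * sum_r ** 2)
        + sum_ps_qr / (2 * sum_r * sum_s)
        + sum_qs / (2 * sum_s ** 2)
    )


if __name__ == "__main__":
    tables = [(40, 60, 20, 80), (30, 70, 25, 75)]   # hypothetical counts
    log_or = math.log(mh_odds_ratio(tables))
    se = math.sqrt(rbg_variance_log_or(tables))
    lower, upper = math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)
    print("OR_MH:", round(math.exp(log_or), 3))
    print("95% CI:", round(lower, 3), round(upper, 3))
```

A useful sanity check is that for a single 2 × 2 table the expression reduces to the familiar 1/a + 1/b + 1/c + 1/d variance of the log odds ratio.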
5 Concluding remarks
The Mantel-Haenszel estimators have been widely applied in epidemiological and clinical research, including meta-analysis, owing to their simplicity and efficiency. However, the common effect assumptions cannot be justified in general practice, and when they fail, the targeted “common effect parameter” does not exist. In this setting, even if the Mantel-Haenszel estimators have desirable properties, it is unclear what they estimate and how the estimates should be interpreted. Many epidemiologists and statisticians have nevertheless anticipated that they can be interpreted as some kind of average exposure effect, although no firm theoretical grounds had been given. In this study, we provided theoretical evaluations of the Mantel-Haenszel estimators when the common effect assumptions are violated, and showed that this intuition is mostly correct. These results also agree with what Greenland and Maldonado (1994) anticipated. Through a series of numerical studies, we also showed that these large-sample results hold in realistic finite-sample situations.
As related recent theoretical work, Xu and O’Quigley (2000) and Hattori and Henmi (2012) showed that the partial likelihood estimator of the Cox regression model can be interpreted as an average hazard ratio estimator even when the proportional hazards assumption is violated. According to the results of this study, the Mantel-Haenszel estimators can likewise be interpreted as follows: (i) when the common effect assumption is correct (the best scenario), they are nearly efficient estimators of the common effect parameters, and (ii) when the common effect assumption is incorrect, they are estimators of the average exposure effect across strata. Obviously, when strong effect modification exists, synthesizing the stratum-specific effect measures into a common effect is not recommended (Greenland 1982; Mantel et al. 1977). The use of the common effect estimators is appropriate, at least, in settings where effect modification is moderate. Either way, this theoretical and numerical evidence on the Mantel-Haenszel estimators provides meaningful information for practice in epidemiological and clinical research.
Funding statement: This work was supported by Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology of Japan (Grant numbers: 25280008, 15K15954).
References
Breslow, N. E. (1981). Odds ratio estimators when the data are sparse. Biometrika, 68:73–84. doi:10.1093/biomet/68.1.73
Cochran, W. G. (1954). Some methods for strengthening the common chi-square tests. Biometrics, 10:417–451. doi:10.2307/3001616
Fujii, Y., and Yanagimoto, T. (2005). Pairwise conditional score functions: a generalization of the Mantel-Haenszel estimator. Journal of Statistical Planning and Inference, 128:1–12. doi:10.1016/j.jspi.2003.09.035
Godambe, V. P. (1969). An optimum property of regular maximum likelihood estimation. Annals of Mathematical Statistics, 31:1208–1212. doi:10.1214/aoms/1177705693
Greenland, S. (1982). Interpretation and estimation of summary ratios under heterogeneity. Statistics in Medicine, 1:217–227. doi:10.1002/sim.4780010304
Greenland, S. (1987). Interpretation and choice of effect measures in epidemiologic analysis. American Journal of Epidemiology, 125:761–768. doi:10.1093/oxfordjournals.aje.a114593
Greenland, S., and Maldonado, G. (1994). The interpretation of multiplicative-model parameters as standardized parameters. Statistics in Medicine, 13:989–999. doi:10.1002/sim.4780131002
Greenland, S., and Robins, J. (1985). Estimation of a common effect parameter from sparse follow-up data. Biometrics, 41:55–68. doi:10.2307/2530643
Hattori, S., and Henmi, M. (2012). Estimation of treatment effects based on possibly misspecified Cox regression. Lifetime Data Analysis, 18:408–433. doi:10.1007/s10985-012-9222-8
Hauck, W. W. (1979). The large sample variance of the Mantel-Haenszel estimator of a common odds ratio. Biometrics, 35:817–819. doi:10.2307/2530114
Higgins, J. P. T., and Green, S. (2008). Cochrane Handbook for Systematic Reviews of Interventions. Chichester: Wiley-Blackwell. doi:10.1002/9780470712184
Mantel, N., Brown, C., and Byar, D. P. (1977). Tests for homogeneity of effect in an epidemiologic investigation. American Journal of Epidemiology, 106:125–129. doi:10.1093/oxfordjournals.aje.a112441
Mantel, N., and Haenszel, W. H. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22:719–748.
Matsuyama, Y., Tominaga, T., Nomura, Y., et al. (2000). Second cancers after adjuvant tamoxifen therapy for breast cancer in Japan. Annals of Oncology, 11:1537–1543. doi:10.1093/oxfordjournals.annonc.a010406
Nurminen, M. (1981). Asymptotic efficiency of general noniterative estimators of common relative risk. Biometrika, 68:525–530. doi:10.1093/biomet/68.2.525
Robins, J. M., Breslow, N., and Greenland, S. (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics, 42:311–323.
Rothman, K. J. (2002). Epidemiology: An Introduction. New York: Oxford University Press.
Rothman, K. J., Greenland, S., and Lash, T. L. (2008). Modern Epidemiology. 3rd Edition. Philadelphia: Lippincott Williams & Wilkins.
Sato, T. (1989). On variance estimator for the Mantel-Haenszel risk difference. Biometrics, 45:1323–1324.
Sato, T. (1990). Confidence intervals for effect parameters common in cancer epidemiology. Environmental Health Perspectives, 87:95–101. doi:10.1289/ehp.908795
Sato, T., and Matsuyama, Y. (2003). Marginal structural models as a tool for standardization. Epidemiology, 14:680–686. doi:10.1097/01.EDE.0000081989.82616.7d
Tarone, R. E. (1981). On summary estimators of relative risk. Journal of Chronic Diseases, 34:463–468. doi:10.1016/0021-9681(81)90006-0
Walker, A. M. (1985). Small sample properties of some estimators of a common hazard ratio. Applied Statistics, 34:42–48. doi:10.2307/2347883
Walker, A. M., Lanza, L. L., Arellano, F., and Rothman, K. J. (1997). Mortality in current and former users of clozapine. Epidemiology, 8:671–677. doi:10.1097/00001648-199711000-00014
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50:1–9. doi:10.2307/1912526
Xu, R., and O’Quigley, J. (2000). Estimating average regression effect under non-proportional hazards. Biostatistics, 1:423–439. doi:10.1093/biostatistics/1.4.423
Yanagimoto, T. (1990). Combining moment estimates of a parameter common through strata. Journal of Statistical Planning and Inference, 25:187–198. doi:10.1016/0378-3758(90)90065-3
Yi, G. Y., and Reid, N. (2010). A note on mis-specified estimating functions. Statistica Sinica, 20:1749–1769.
Appendix
In this appendix, we outline the proofs of the asymptotic distributions of the Mantel-Haenszel estimators. Because the Mantel-Haenszel estimating functions share a common functional form, the proofs follow essentially the same rationale. Here, we briefly describe the odds ratio case.
Asymptotic I. Taylor expansion on the Mantel-Haenszel estimating function
Because of the law of large numbers,
It accords to
Thus, the asymptotic distribution is derived.
Asymptotic II. For the sparse data limiting model
The first term of the right-hand side can be expressed as
where
The expression of the objective function is
Thus, the solution corresponds to
Therefore, the asymptotic distribution is derived.
© 2016 by De Gruyter