## 1 Introduction

In the analysis of epidemiologic and clinical studies, the Mantel-Haenszel estimators (Mantel and Haenszel 1959; Rothman, Greenland and Lash 2008) of the common effect parameters of stratified 2×2 tables have been widely adopted for controlling confounding. Owing to their simplicity and high efficiency, these estimators are preferred by epidemiologists and have also become one of the standard methods in meta-analysis (Higgins and Green 2008). Although the Mantel-Haenszel estimators are effective estimators of the common effect parameters, the common effect assumptions cannot be rigorously justified in practice (Greenland 1982; Mantel et al. 1977). When these assumptions are violated, the target parameters of the Mantel-Haenszel estimators are ill-defined, and it is unclear what they estimate. Greenland and Maldonado (1994) argued that the Mantel-Haenszel rate ratio estimator approximates the standardized rate ratio that uses the total cohort as the standard distribution of the stratification variables. They also supported its general correctness through numerical studies, although without sufficient theoretical justification.

The violation of the common effect assumptions can be regarded as a model misspecification problem. In theoretical studies, model misspecification has been widely researched, mainly for maximum likelihood estimators, following the landmark paper of White (1982). Although generalizations to estimating equation theory (Godambe 1969) have appeared only in recent studies, Yi and Reid (2010) extended White's (1982) asymptotic results on the behavior of maximum likelihood estimators under misspecified models to estimating functions. Since it is well known, through estimating equation theory, that the Mantel-Haenszel estimators are locally efficient estimators of the common effect parameters under the null (i.e., when the exposure effects are zero) (Fujii and Yanagimoto 2005; Sato 1990; Yanagimoto 1990), their asymptotic behavior can be assessed using the results of Yi and Reid (2010).

In this article, we evaluate the asymptotic behavior of the Mantel-Haenszel estimators when the common effect assumptions are violated. We show that the Mantel-Haenszel estimators can be approximately interpreted as estimators of an average exposure effect when effects are heterogeneous across strata. We also show that, under certain conditions, these average effects are generally good approximations to the standardized estimators. In addition, we discuss the validity of the ordinary variance estimators of the Mantel-Haenszel estimators under heterogeneity. Finally, we assess their empirical properties through numerical studies based on two epidemiologic studies, one of breast cancer and one of schizophrenia.

## 2 Analysis of cohort studies with binary data

### 2.1 Mantel-Haenszel risk ratio and risk difference estimators

First, we discuss estimation of the common risk ratio and risk difference in stratified analyses of cohort studies. Consider a series of *K* 2 × 2 tables formed by pairs of independent binomial observations

The Mantel-Haenszel estimators of the common risk ratio

To evaluate the asymptotic behavior of the Mantel-Haenszel estimators, it is useful to formulate two large-sample schemes that are common in stratified analyses. The first, denoted Asymptotic I, has a fixed number of strata *K* while

First, denoting

Under the Asymptotic I, we assume

where

An outline of the proof is provided in the Appendix. Note that

In addition, the Mantel-Haenszel estimators can also be interpreted as converging to weighted averages of stratum-specific risk ratios

Second, we consider Asymptotic II, the sparse-data limiting model. The limiting model considered here is similar to those employed by Breslow (1981) and Greenland and Robins (1985). We suppose that there is a finite number of possible configurations of total sample sizes *K* strata, and we denote this number by *L*. Then, the weight *G* heterogeneous risk ratios and/or risk differences among the *K* strata, i.e., *G* subsets such that the common effect assumptions hold within each subset. We denote these *G* heterogeneous effect parameters as *l*th configuration and *g*th subset and *l* = 1,2,…,*L*; *g* = 1,2,…,*G*). In addition, the nuisance parameter *l*th and *g*th subset. The corresponding *l*th and *g*th subset as

Under Asymptotic II, the Mantel-Haenszel estimators converge to normal distributions with means equal to

An outline of the proof is provided in Appendix II. Similar to Asymptotic I,

Another concern is the asymptotic variance estimation of

### 2.2 Illustration: Tamoxifen use and recurrence of breast cancer

Table 1 presents part of the results of a cohort study assessing the risk of second primary cancers after adjuvant tamoxifen therapy for breast cancer (Matsuyama et al. 2000; Sato and Matsuyama 2003). A nearly null effect of tamoxifen was observed in the unstratified analysis (crude risk ratio: 1.011; crude risk difference: 0.002). However, after stratifying by lymph node metastasis at surgery, possible preventive effects were observed in each stratum (stratum-specific risk ratios: 0.910 and 0.670; risk differences: −0.030 and −0.035). Although there appears to be little effect modification on the risk difference scale, some effect modification appears to exist on the risk ratio scale. We assume heterogeneous effects under Asymptotic I. The Mantel-Haenszel risk ratio estimate was 0.830 and the Mantel-Haenszel risk difference estimate was −0.033. In addition, the standardized risk ratio and risk difference with standards

In addition, we conducted simulation studies to investigate the empirical properties of the Mantel-Haenszel estimators under heterogeneity. We considered several scenarios based on the stratified dataset of Table 1, such as

In all scenarios, the means of the distributions of the Mantel-Haenszel estimates mostly agreed with the asymptotic means of the distributions of the Mantel-Haenszel estimators

Table 1. Results of a cohort study evaluating the risk of second primary cancers after adjuvant tamoxifen therapy for breast cancer (Matsuyama et al. 2000; Sato and Matsuyama 2003).

| | Lymph node metastasis at surgery | | No lymph node metastasis at surgery | |
|---|---|---|---|---|
| | Tamoxifen use | Non-use | Tamoxifen use | Non-use |
| Recurrence | 368 | 253 | 96 | 171 |
| No recurrence | 847 | 507 | 1,238 | 1,421 |
| Total | 1,215 | 760 | 1,334 | 1,592 |
| Recurrence proportion | 0.303 | 0.333 | 0.072 | 0.107 |
| Risk ratio | 0.910 | | 0.670 | |
| Risk difference | −0.030 | | −0.035 | |
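The Section 2.2 estimates can be checked directly against the Table 1 counts. The Python sketch below assumes the standard closed forms of the Mantel-Haenszel risk ratio and risk difference (as given, for example, in Rothman, Greenland and Lash 2008), since the corresponding formulas are not legible in this text. It also verifies that the Mantel-Haenszel risk ratio is exactly a weighted average of the stratum-specific risk ratios, and computes a standardized risk ratio assuming the total cohort as the standard, the choice attributed to Greenland and Maldonado (1994) in the Introduction. The tuple layout `(a, n1, c, n0)` and all function names are illustrative, not the article's notation.

```python
# Table 1 counts. Each stratum is (a, n1, c, n0): recurrences and total among
# tamoxifen users (a, n1), and recurrences and total among non-users (c, n0).
# This tuple layout is an illustrative choice, not the article's notation.
TABLE1 = [
    (368, 1215, 253, 760),   # lymph node metastasis at surgery
    (96, 1334, 171, 1592),   # no lymph node metastasis at surgery
]


def mh_risk_ratio(strata):
    """Mantel-Haenszel risk ratio: sum(a * n0 / N) / sum(c * n1 / N), N = n1 + n0."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


def mh_risk_difference(strata):
    """Mantel-Haenszel risk difference: stratum differences weighted by n1 * n0 / N."""
    num = sum((a * n0 - c * n1) / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(n1 * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


rr_mh = mh_risk_ratio(TABLE1)
rd_mh = mh_risk_difference(TABLE1)

# Stratum-specific risk ratios and the weights w_k = c_k * n1_k / N_k under which
# the MH risk ratio is exactly a weighted average of them.
stratum_rr = [(a / n1) / (c / n0) for a, n1, c, n0 in TABLE1]
weights = [c * n1 / (n1 + n0) for a, n1, c, n0 in TABLE1]
weighted_avg_rr = sum(w * r for w, r in zip(weights, stratum_rr)) / sum(weights)

# Standardized risk ratio, ASSUMING the total cohort of each stratum as the standard.
srr = (sum((n1 + n0) * a / n1 for a, n1, c, n0 in TABLE1)
       / sum((n1 + n0) * c / n0 for a, n1, c, n0 in TABLE1))

print("stratum risk ratios:", ", ".join(f"{r:.3f}" for r in stratum_rr))
print(f"MH risk ratio = {rr_mh:.3f}, MH risk difference = {rd_mh:.3f}")
print(f"weighted average of stratum risk ratios = {weighted_avg_rr:.3f}")
print(f"standardized risk ratio (assumed total-cohort standard) = {srr:.3f}")
```

Run as a script, it should report stratum-specific risk ratios of 0.910 and 0.670, a Mantel-Haenszel risk ratio of 0.830, and a Mantel-Haenszel risk difference of −0.033, in line with Section 2.2; under the assumed standard, the standardized risk ratio comes out close to the Mantel-Haenszel value.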

## 3 Analysis of cohort studies with person-time data

### 3.1 Mantel-Haenszel rate ratio and rate difference estimators

We consider estimation of the common rate ratio and rate difference for stratified person-time data from cohort studies. Suppose a series of *K* strata constructed from independent Poisson observations

Here, we consider limiting models similar to those of the previous section. The large-strata limiting model, Asymptotic I, has a fixed number of strata *K* while

Under the Asymptotic I, we assume

where

Also, under Asymptotic II, we suppose that there is a finite number of possible configurations of total sample sizes *K* strata, and we denote this number by *L*. Then, the weight *G* heterogeneous rate ratios and/or rate differences among the *K* strata, i.e., *G* subsets such that the common effect assumptions hold within each subset. We denote these *G* heterogeneous effect parameters as *l*th configuration and *g*th subset and *l* = 1,2,…,*L*; *g* = 1,2,…,*G*). In addition, the nuisance parameter *l*th and *g*th subset. The corresponding *l*th and *g*th subset as

Under Asymptotic II, the Mantel-Haenszel estimators converge to normal distributions with means equal to

These results can be obtained in the same way as Theorems 1 and 2 (see Appendix). Therefore, as in the binomial case of Section 2, when the common effect assumptions are violated, these quantities can be interpreted as the expected values of the standardized rate ratio and rate difference with the standard weight

### 3.2 Illustration: Mortality rates for clozapine users

Table 2 presents the results of a study of mortality rates among current and past users of clozapine, a drug used to treat schizophrenia (Rothman 2002; Walker et al. 1997). Clozapine use was thought to be associated with mortality among current users; therefore, past users served as the comparison group. After stratifying by two age groups (10–54 and 55–95 years), possible protective effects were observed in both strata (stratum-specific rate ratios: 0.448 and 0.486; rate differences: −388.7 and −2903 per 10^{5} person-years). In this study, there appears to be little effect modification on the rate ratio scale, whereas some effect modification appears to exist on the rate difference scale. Here, we again consider the large-strata limiting model. The Mantel-Haenszel rate ratio estimate was 0.469 and the Mantel-Haenszel rate difference estimate was −710.7 per 10^{5} person-years. In addition, the standardized rate ratio and rate difference with ^{5} person-years, respectively. As in the breast cancer example of Section 2, the Mantel-Haenszel estimates and the standardized estimates are approximately identical.

Here, we also conducted simulation experiments to investigate the empirical properties of the Mantel-Haenszel estimators under heterogeneity. We considered several scenarios based on the stratified dataset of Table 2, such as

In all settings, the means of the distributions of the Mantel-Haenszel estimates mostly agreed with the asymptotic means of the distributions of the Mantel-Haenszel estimators

Table 2. Results of a cohort study: mortality rates for current and past clozapine users (Walker et al. 1997); data from Rothman (2002, p. 154).

| | Age (years): 10–54 | | Age (years): 55–94 | |
|---|---|---|---|---|
| | Current | Past | Current | Past |
| Deaths | 196 | 111 | 167 | 157 |
| Person-years | 62,119 | 15,763 | 6,085 | 2,780 |
| Rate (per 10^{5} person-years) | 315.5 | 704.2 | 2,744 | 5,647 |
| Rate ratio | 0.448 | | 0.486 | |
| Rate difference (per 10^{5} person-years) | −388.7 | | −2,903 | |
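The Section 3.2 rate ratio estimate can likewise be checked against Table 2. This Python sketch assumes the usual closed form of the Mantel-Haenszel rate ratio, $\sum_k a_k T_{0k}/T_k \big/ \sum_k b_k T_{1k}/T_k$, where, in notation introduced only for this sketch, $a_k$ and $b_k$ are deaths and $T_{1k}$ and $T_{0k}$ are person-years among current and past users, with $T_k = T_{1k} + T_{0k}$. It also confirms that the estimate is a weighted average of the stratum-specific rate ratios.

```python
# Table 2 counts: (deaths a, person-years T1) for current users and
# (deaths b, person-years T0) for past users, one tuple per age stratum.
TABLE2 = [
    (196, 62119, 111, 15763),  # younger age stratum (10-54 years)
    (167, 6085, 157, 2780),    # older age stratum
]


def mh_rate_ratio(strata):
    """Mantel-Haenszel rate ratio: sum(a * T0 / T) / sum(b * T1 / T), T = T1 + T0."""
    num = sum(a * t0 / (t1 + t0) for a, t1, b, t0 in strata)
    den = sum(b * t1 / (t1 + t0) for a, t1, b, t0 in strata)
    return num / den


irr_mh = mh_rate_ratio(TABLE2)

# Stratum-specific rate ratios and the MH weights w_k = b_k * T1_k / T_k.
stratum_irr = [(a / t1) / (b / t0) for a, t1, b, t0 in TABLE2]
weights = [b * t1 / (t1 + t0) for a, t1, b, t0 in TABLE2]
weighted_avg_irr = sum(w * r for w, r in zip(weights, stratum_irr)) / sum(weights)

print("stratum rate ratios:", ", ".join(f"{r:.3f}" for r in stratum_irr))
print(f"MH rate ratio = {irr_mh:.3f} (weighted-average form: {weighted_avg_irr:.3f})")
```

It should reproduce the stratum-specific rate ratios 0.448 and 0.486 and the Mantel-Haenszel rate ratio 0.469 reported in Section 3.2.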

## 4 Analysis of case-control studies

### 4.1 Mantel-Haenszel odds ratio estimator

We discuss estimation of the common odds ratio in stratified analyses of case-control studies. Consider the same setting as in Section 2: a series of *K* 2 × 2 tables formed by pairs of independent binomial observations

Under Asymptotic I, the Mantel-Haenszel estimator converges to a normal distribution with mean equal to

where

Also, under Asymptotic II,

where *G* strata and

Therefore, as in Sections 2 and 3, the Mantel-Haenszel odds ratio estimator can also be interpreted as converging to a weighted average of stratum-specific odds ratios.
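The convergence statement above can be illustrated by simulation under Asymptotic I. The Python sketch below uses a hypothetical two-stratum case-control design (our own sample sizes, control exposure probabilities, and stratum odds ratios of 0.5 and 2.0; these are not the settings of Section 4.2), in which exposed cases and exposed controls are independent binomial counts. It compares one large-sample Mantel-Haenszel odds ratio with the value $\psi^*$ that sets the expected Mantel-Haenszel estimating function to zero; in this sketch's notation, $\psi^*$ is a weighted average of the stratum odds ratios with weights $E(b_k)E(c_k)/N_k$. Binomial draws are built from Bernoulli trials so that only the standard library is needed.

```python
import random


def binomial(n, p, rng):
    """Binomial(n, p) draw as a sum of n Bernoulli trials (standard library only)."""
    return sum(rng.random() < p for _ in range(n))


def case_exposure_prob(p0, odds_ratio):
    """Exposure probability among cases implied by control exposure p0 and a stratum OR."""
    odds = odds_ratio * p0 / (1 - p0)
    return odds / (1 + odds)


# Hypothetical design: (cases m1, controls m0, control exposure prob p0, stratum OR).
DESIGN = [(100_000, 100_000, 0.30, 0.5), (100_000, 100_000, 0.10, 2.0)]


def limiting_value(design):
    """psi* solving E[U(psi)] = 0 for U(psi) = sum_k (a_k d_k - psi b_k c_k) / N_k."""
    num = den = 0.0
    for m1, m0, p0, odds_ratio in design:
        p1 = case_exposure_prob(p0, odds_ratio)
        n = m1 + m0
        num += m1 * p1 * m0 * (1 - p0) / n   # E(a_k) E(d_k) / N_k
        den += m1 * (1 - p1) * m0 * p0 / n   # E(b_k) E(c_k) / N_k
    return num / den


def simulated_mh_odds_ratio(design, rng):
    """One Mantel-Haenszel odds ratio computed from data simulated under the design."""
    num = den = 0.0
    for m1, m0, p0, odds_ratio in design:
        a = binomial(m1, case_exposure_prob(p0, odds_ratio), rng)  # exposed cases
        c = binomial(m0, p0, rng)                                   # exposed controls
        b, d = m1 - a, m0 - c
        n = m1 + m0
        num += a * d / n
        den += b * c / n
    return num / den


psi_star = limiting_value(DESIGN)
psi_hat = simulated_mh_odds_ratio(DESIGN, random.Random(2024))
print(f"psi* = {psi_star:.3f}, simulated MH odds ratio = {psi_hat:.3f}")
```

With these hypothetical inputs $\psi^* \approx 0.873$, which lies between the two stratum odds ratios, and the simulated estimate should fall close to it; increasing the stratum sizes tightens the agreement, as Asymptotic I suggests.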

### 4.2 Numerical evaluation by simulations

#### 4.2.1 Behaviors of the Mantel-Haenszel estimator

We assessed the empirical properties of the Mantel-Haenszel estimator *K* = 2) cohort data mimicking the breast cancer study (Matsuyama et al. 2000; Sato and Matsuyama 2003) in Section 2.2, such as

Across all settings, as expected, in the common effect settings (

#### 4.2.2 Variance estimation

We also assessed the validity of the variance estimators. The settings roughly mimicked the case-control datasets generated in the previous simulations. First, for the large-strata settings, we generated 2 × 2 tables (*K* = 2) such as _{1} and OR_{2}. We set OR_{1} to 0.500, 0.750, and 1.000, and OR_{2} to 0.500, 0.750, 1.000, 1.250, 1.500, 1.750, and 2.000 times OR_{1}. We also set ), the Robins-Breslow-Greenland estimator ($\hat{V}_{RBG}$), and the bootstrap variance estimator ($\hat{V}_{boot}$) of 3,600 replications. The number of bootstrap resamples was set to 5,000. The results are presented in Table 3. Under all of the settings considered, the three estimators accurately quantified the actual SE overall.

Second, for the sparse data settings, we generated 1:1 and 1:4 matched case-control datasets under two possibly heterogeneous populations. We divided the case datasets to _{1} and OR_{2}, too. We set OR_{1} to 0.500, 0.750, and 1.000, and OR_{2} to 0.500, 0.750, 1.000, 1.250, 1.500, 1.750, and 2.000 times OR_{1}. We also set ), and the bootstrap variance estimator ($\hat{V}_{boot}$) of 3,600 replications. The number of bootstrap resamples was set to 5,000. The results are presented in Table 4. For the sparse data settings, the variance estimators accurately quantified the actual SE under all of the settings considered. Hence, the existing variance estimators appear to be generally valid for quantifying the SE of the Mantel-Haenszel estimator even when the common effect assumptions are violated.

Table 3. Simulation results under Asymptotic I: actual SE of the Mantel-Haenszel estimator and means of the square roots of the variance estimates by Hauck's estimator, the Robins-Breslow-Greenland estimator, and the bootstrap variance estimator.

| OR_{1} | OR_{2} | Actual SE | Hauck | Robins-Breslow-Greenland | Bootstrap |
|---|---|---|---|---|---|
| 0.500 | 0.250 | 0.151 | 0.156 | 0.152 | 0.151 |
| | 0.375 | 0.153 | 0.151 | 0.150 | 0.150 |
| | 0.500 | 0.148 | 0.150 | 0.149 | 0.150 |
| | 0.625 | 0.149 | 0.150 | 0.149 | 0.149 |
| | 0.750 | 0.150 | 0.151 | 0.149 | 0.149 |
| | 0.875 | 0.150 | 0.152 | 0.149 | 0.149 |
| | 1.000 | 0.148 | 0.153 | 0.149 | 0.149 |
| 0.750 | 0.375 | 0.151 | 0.154 | 0.150 | 0.149 |
| | 0.563 | 0.150 | 0.150 | 0.148 | 0.149 |
| | 0.750 | 0.149 | 0.148 | 0.148 | 0.148 |
| | 0.938 | 0.146 | 0.148 | 0.147 | 0.148 |
| | 1.125 | 0.147 | 0.149 | 0.147 | 0.147 |
| | 1.313 | 0.150 | 0.150 | 0.147 | 0.147 |
| | 1.500 | 0.146 | 0.151 | 0.147 | 0.146 |
| 1.000 | 0.500 | 0.148 | 0.155 | 0.150 | 0.150 |
| | 0.750 | 0.149 | 0.150 | 0.149 | 0.149 |
| | 1.000 | 0.149 | 0.149 | 0.148 | 0.148 |
| | 1.250 | 0.150 | 0.149 | 0.148 | 0.148 |
| | 1.500 | 0.150 | 0.149 | 0.147 | 0.147 |
| | 1.750 | 0.147 | 0.150 | 0.147 | 0.147 |
| | 2.000 | 0.147 | 0.152 | 0.148 | 0.146 |

Table 4. Simulation results under Asymptotic II: actual SE of the Mantel-Haenszel estimator and means of the square roots of the variance estimates by the Robins-Breslow-Greenland estimator and the bootstrap variance estimator.

| | | 1:1 matching | | | 1:4 matching | | |
|---|---|---|---|---|---|---|---|
| OR_{1} | OR_{2} | Actual SE | Robins-Breslow-Greenland | Bootstrap | Actual SE | Robins-Breslow-Greenland | Bootstrap |
| 0.500 | 0.250 | 0.180 | 0.178 | 0.181 | 0.136 | 0.135 | 0.136 |
| | 0.375 | 0.171 | 0.172 | 0.174 | 0.133 | 0.132 | 0.132 |
| | 0.500 | 0.171 | 0.169 | 0.171 | 0.131 | 0.130 | 0.131 |
| | 0.625 | 0.168 | 0.168 | 0.169 | 0.131 | 0.129 | 0.130 |
| | 0.750 | 0.166 | 0.166 | 0.168 | 0.128 | 0.129 | 0.129 |
| | 0.875 | 0.166 | 0.166 | 0.167 | 0.130 | 0.129 | 0.129 |
| | 1.000 | 0.169 | 0.165 | 0.167 | 0.131 | 0.129 | 0.129 |
| 0.750 | 0.375 | 0.169 | 0.169 | 0.170 | 0.130 | 0.132 | 0.133 |
| | 0.563 | 0.170 | 0.165 | 0.166 | 0.129 | 0.129 | 0.130 |
| | 0.750 | 0.165 | 0.163 | 0.164 | 0.129 | 0.128 | 0.129 |
| | 0.938 | 0.162 | 0.162 | 0.163 | 0.129 | 0.127 | 0.128 |
| | 1.125 | 0.162 | 0.162 | 0.163 | 0.127 | 0.127 | 0.128 |
| | 1.313 | 0.160 | 0.162 | 0.163 | 0.124 | 0.127 | 0.128 |
| | 1.500 | 0.164 | 0.162 | 0.163 | 0.126 | 0.128 | 0.128 |
| 1.000 | 0.500 | 0.165 | 0.166 | 0.168 | 0.129 | 0.132 | 0.132 |
| | 0.750 | 0.165 | 0.163 | 0.165 | 0.128 | 0.129 | 0.129 |
| | 1.000 | 0.164 | 0.162 | 0.164 | 0.126 | 0.128 | 0.128 |
| | 1.250 | 0.164 | 0.162 | 0.163 | 0.126 | 0.128 | 0.128 |
| | 1.500 | 0.163 | 0.162 | 0.164 | 0.127 | 0.128 | 0.128 |
| | 1.750 | 0.160 | 0.163 | 0.164 | 0.126 | 0.128 | 0.128 |
| | 2.000 | 0.160 | 0.163 | 0.165 | 0.125 | 0.128 | 0.129 |
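The Robins-Breslow-Greenland variance estimator compared in Tables 3 and 4 has a well-known closed form for the log Mantel-Haenszel odds ratio (Robins, Breslow and Greenland 1986), which the Python sketch below implements; Hauck's estimator and the bootstrap are not sketched here. The tuple layout `(a, b, c, d)` and the function name are our own. A useful sanity check is that, for a single stratum, the formula reduces to Woolf's variance $1/a + 1/b + 1/c + 1/d$.

```python
import math


def mh_odds_ratio_rbg(strata):
    """Mantel-Haenszel odds ratio and the Robins-Breslow-Greenland variance of its log.

    Each stratum is (a, b, c, d): exposed cases, unexposed cases,
    exposed controls (non-cases), unexposed controls (non-cases).
    """
    sum_r = sum_s = 0.0                  # sums of R_k = a d / N and S_k = b c / N
    sum_pr = sum_ps_qr = sum_qs = 0.0    # sums of P R, (P S + Q R), Q S
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n
        p, q = (a + d) / n, (b + c) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    var_log_or = (sum_pr / (2 * sum_r ** 2)
                  + sum_ps_qr / (2 * sum_r * sum_s)
                  + sum_qs / (2 * sum_s ** 2))
    return sum_r / sum_s, var_log_or


# Illustration only: Table 1 counts rearranged as (a, b, c, d) with recurrence as
# the outcome. These are cohort counts, so this odds ratio is not reported in the article.
table1 = [(368, 253, 847, 507), (96, 171, 1238, 1421)]
or_mh, var_log = mh_odds_ratio_rbg(table1)
print(f"MH odds ratio = {or_mh:.3f}, RBG SE of log odds ratio = {math.sqrt(var_log):.3f}")
```

Because it needs only the four cell counts per stratum, the same function applies unchanged to sparse data such as the matched sets of Table 4, where each matched set forms its own stratum.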

## 5 Concluding remarks

The Mantel-Haenszel estimators have been widely applied in epidemiological and clinical research, including meta-analysis, owing to their simplicity and efficiency. However, the correctness of the common effect assumptions cannot be justified in general practice, and when they fail, the targeted "common effect parameter" does not exist. In this setting, even if the Mantel-Haenszel estimators have desirable properties, it is unclear what they estimate and how the estimates should be interpreted. Nevertheless, many epidemiologists and statisticians have anticipated that they might be interpreted as some kind of average exposure effect, although without a firm theoretical basis. In this study, we provided theoretical evaluations of the Mantel-Haenszel estimators when the common effect assumptions are violated, and showed that these intuitions are mostly correct. These results also agree with the expectations of Greenland and Maldonado (1994). We also showed, through a series of numerical studies, that these large-sample results hold in realistic finite-sample situations.

As related recent theoretical work, Xu and O'Quigley (2000) and Hattori and Henmi (2012) showed that the partial likelihood estimator of the Cox regression model can be interpreted as an average hazard ratio estimator even when the proportional hazards assumption is violated. According to the results of this study, the Mantel-Haenszel estimators can be interpreted as follows: (i) when the common effect assumption is correct (the best scenario), they are nearly efficient estimators of the common effect parameters; and (ii) when the common effect assumption is incorrect, they can be interpreted as estimators of the average exposure effect across strata. Obviously, when strong effect modification exists, synthesizing the stratum-specific effect measures into a common effect is not recommended (Greenland 1982; Mantel et al. 1977). The use of common effect estimators is appropriate at least in settings where effect modification is moderate. In either case, these theoretical and numerical results on the Mantel-Haenszel estimators provide meaningful information for practice in epidemiological and clinical research.

## References

Breslow, N. E. (1981). Odds ratio estimators when the data are sparse. Biometrika, 68:73–84.

Cochran, W. G. (1954). Some methods for strengthening the common chi-square tests. Biometrics, 10:417–451.

Fujii, Y., and Yanagimoto, T. (2005). Pairwise conditional score functions: a generalization of the Mantel-Haenszel estimator. Journal of Statistical Planning and Inference, 128:1–12.

Godambe, V. P. (1969). An optimum property of regular maximum likelihood estimation. Annals of Mathematical Statistics, 31:1208–1212.

Greenland, S. (1982). Interpretation and estimation of summary ratios under heterogeneity. Statistics in Medicine, 1:217–227.

Greenland, S. (1987). Interpretation and choice of effect measures in epidemiologic analysis. American Journal of Epidemiology, 125:761–768.

Greenland, S., and Maldonado, G. (1994). The interpretation of multiplicative-model parameters as standardized parameters. Statistics in Medicine, 13:989–999.

Greenland, S., and Robins, J. (1985). Estimation of a common effect parameter from sparse follow-up data. Biometrics, 41:55–68.

Hattori, S., and Henmi, M. (2012). Estimation of treatment effects based on possibly misspecified Cox regression. Lifetime Data Analysis, 18:408–433.

Hauck, W. W. (1979). The large sample variance of the Mantel-Haenszel estimator of a common odds ratio. Biometrics, 35:817–819.

Higgins, J. P. T., and Green, S. (2008). Cochrane Handbook for Systematic Reviews of Interventions. Chichester: Wiley-Blackwell.

Mantel, N., Brown, C., and Byar, D. P. (1977). Tests for homogeneity of effect in an epidemiologic investigation. American Journal of Epidemiology, 106:125–129.

Mantel, N., and Haenszel, W. H. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22:719–748.

Matsuyama, Y., Tominaga, T., Nomura, Y., et al. (2000). Second cancers after adjuvant tamoxifen therapy for breast cancer in Japan. Annals of Oncology, 11:1537–1543.

Nurminen, M. (1981). Asymptotic efficiency of general noniterative estimators of common relative risk. Biometrika, 68:525–530.

Robins, J. M., Breslow, N., and Greenland, S. (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics, 42:311–323.

Rothman, K. J. (2002). Epidemiology: An Introduction. New York: Oxford University Press.

Rothman, K. J., Greenland, S., and Lash, T. L. (2008). Modern Epidemiology. 3rd Edition. Philadelphia: Lippincott Williams & Wilkins.

Sato, T. (1989). On variance estimator for the Mantel-Haenszel risk difference. Biometrics, 45:1323–1324.

Sato, T. (1990). Confidence intervals for effect parameters common in cancer epidemiology. Environmental Health Perspectives, 87:95–101.

Sato, T., and Matsuyama, Y. (2003). Marginal structural models as a tool for standardization. Epidemiology, 14:680–686.

Tarone, R. E. (1981). On summary estimators of relative risk. Journal of Chronic Diseases, 34:463–468.

Walker, A. M. (1985). Small sample properties of some estimators of a common hazard ratio. Applied Statistics, 34:42–48.

Walker, A. M., Lanza, L. L., Arellano, F., and Rothman, K. J. (1997). Mortality in current and former users of clozapine. Epidemiology, 8:671–677.

White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50:1–9.

Xu, R., and O'Quigley, J. (2000). Estimating average regression effect under non-proportional hazards. Biostatistics, 1:423–439.

Yanagimoto, T. (1990). Combining moment estimates of a parameter common through strata. Journal of Statistical Planning and Inference, 25:187–198.

Yi, G. Y., and Reid, N. (2010). A note on mis-specified estimating functions. Statistica Sinica, 20:1749–1769.

## Appendix

In this appendix, we outline the derivations of the asymptotic distributions of the Mantel-Haenszel estimators. Because the Mantel-Haenszel estimating functions share a common functional form, the proofs follow the same rationale; here we briefly describe the odds ratio case.
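For the odds ratio, the outline can be made concrete as follows. This is a heuristic sketch in notation introduced here, not necessarily the article's own: in stratum $k$, $a_k$ and $b_k$ are exposed and unexposed cases, $c_k$ and $d_k$ are exposed and unexposed controls, and we assume $a_k \sim \mathrm{Bin}(m_{1k}, \pi_{1k})$ and $c_k \sim \mathrm{Bin}(m_{0k}, \pi_{0k})$ independently, with $N_k = m_{1k} + m_{0k}$ fixed. The Mantel-Haenszel odds ratio solves

$$
U(\psi)=\sum_{k=1}^{K}\frac{a_k d_k-\psi\, b_k c_k}{N_k}=0
\quad\Longrightarrow\quad
\hat\psi_{MH}=\frac{\sum_{k} a_k d_k/N_k}{\sum_{k} b_k c_k/N_k}.
$$

When the stratum odds ratios differ, $\hat\psi_{MH}$ converges, in the spirit of White (1982) and Yi and Reid (2010), to the root $\psi^{*}$ of $E\{U(\psi)\}=0$:

$$
\psi^{*}=\frac{\sum_{k} m_{1k}m_{0k}\,\pi_{1k}(1-\pi_{0k})/N_k}{\sum_{k} m_{1k}m_{0k}\,(1-\pi_{1k})\pi_{0k}/N_k}
=\frac{\sum_{k} w_k\,\mathrm{OR}_k}{\sum_{k} w_k},
\qquad
w_k=\frac{m_{1k}m_{0k}(1-\pi_{1k})\pi_{0k}}{N_k},
\quad
\mathrm{OR}_k=\frac{\pi_{1k}(1-\pi_{0k})}{(1-\pi_{1k})\pi_{0k}}.
$$

Because $U$ is linear in $\psi$ with $\partial U/\partial\psi=-\sum_k b_k c_k/N_k$, the first-order Taylor expansion around $\psi^{*}$ is exact:

$$
0=U(\hat\psi_{MH})=U(\psi^{*})-(\hat\psi_{MH}-\psi^{*})\sum_{k}\frac{b_k c_k}{N_k}
\quad\Longrightarrow\quad
\hat\psi_{MH}-\psi^{*}=\frac{U(\psi^{*})}{\sum_{k} b_k c_k/N_k}.
$$

Replacing the denominator by its expectation, as the law of large numbers permits under Asymptotic I, gives the approximate variance

$$
\operatorname{Var}(\hat\psi_{MH})\approx\frac{\operatorname{Var}\{U(\psi^{*})\}}{\left\{\sum_{k} m_{1k}m_{0k}(1-\pi_{1k})\pi_{0k}/N_k\right\}^{2}}.
$$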

*Asymptotic I*. A Taylor expansion of the Mantel-Haenszel estimating function

By the law of large numbers,

*Asymptotic II*. For the sparse data limiting model