1 Introduction
Randomized clinical trials and pre-clinical studies often include many correlated endpoints, frequently measured on different scales. The experimental goal is to clarify not only which treatment groups differ but also for which endpoints. It is usually not clear a priori for which endpoints differences between the treatment groups can be expected. These endpoints must be identified a posteriori by the analysis, so the endpoints must be evaluated jointly rather than separately. Multiplicity adjustment must then take both the number of treatment comparisons and the number of endpoints into account, and the family-wise type I error rate (FWE) must be controlled with regard to all endpoints simultaneously. In addition, the correlations among the endpoints should be taken into account, for three reasons. First, accounting for them reduces the conservatism of the elementary test decisions. Second, effects may be erroneously ignored or masked when the endpoints are analysed separately. Third, the degree of correlation determines how much information the endpoints carry: highly correlated endpoints do not provide the same amount of information about the data as uncorrelated ones.
Hasler and Hothorn [1] described an extension of the Dunnett procedure [2] to the case of multiple, correlated endpoints, focusing on simultaneous confidence intervals (SCIs) for differences of means. In a further paper [3], the authors presented an extension of the trend tests of Williams [4] and Bretz [5], this time focusing on SCIs for ratios of means (based on Dilba et al. [6, 7]). Both procedures take the correlations of the endpoints into account, and the multiplicity adjustment covers both the number of endpoints and the number of treatment comparisons. An approximate multivariate t-distribution is used to obtain quantiles for SCIs and test decisions, or to obtain multiplicity-adjusted p-values. The FWE is maintained in the strong sense in an admissible range. The procedures assume multivariate normally distributed endpoints with possibly different scales, allowing endpoint-specific variances.
However, a further assumption is that the covariance matrices, which contain the covariances of the endpoints, are equal for all treatments. This assumption is violated when variances or correlations differ between the treatment groups. Hasler and Hothorn [1, 3] address this problem only briefly and merely suggest a solution; moreover, the suggestions made in the two papers differ slightly. In addition, many-to-one comparisons based on Dunnett [2] and trend tests based on Williams [4] and Bretz [5] are special cases of multiple contrast tests (MCTs), which allow the evaluation of a broad class of linear testing problems, such as the all-pair comparison of Tukey [8] or any user-defined contrast tests. This article extends the methods of Hasler and Hothorn [1, 3] to the general case of MCTs for multivariate normally distributed endpoints with unequal covariance matrices for the treatment groups. The former suggestions are considered in detail. In Section 2, the testing problem is formulated and approximate distributions of the test statistics are derived for several approaches. Section 3 shows results of simulations concerning the FWE. SCIs are presented in Section 4. An example is given in Section 5, followed by conclusions and a discussion in Section 6.
2 Testing problem and test procedures
2.1 Testing problem
For
Let
Of interest are the contrasts
representing linear combinations of the means
where
This means that the overall null hypothesis
2.2 Test procedures
For the case of equal covariance matrices,
where
and correlation matrix
The submatrices
where
For the case of unequal covariance matrices, i.e. there exist at least two groups
Under
according to Satterthwaite [9]. The usual strategy would now be to derive the (approximate) joint distribution of all test statistics
This procedure is referred to here as the CE procedure, indicating that contrast- and endpoint-specific degrees of freedom are used.
As an alternative to the above-mentioned approach, Hasler and Hothorn [3] suggested another version, namely the same test statistics and correlation matrix as for the CE procedure but only contrast-specific degrees of freedom. Therefore define
For each contrast, the minimum of the degrees of freedom (6) is taken over the endpoints. Hence, q different qk-variate t-distributions will be applied, and
Considering the elements of the correlation matrix belonging to CE and MIN, one can see that the correlations of MCTs in the presence of heteroscedasticity [10] are recovered for
The decision rule for testing problem (1) is to reject
mvtnorm [13, 14] of the statistical software R [15].
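As a rough illustration of how a multivariate t-distribution yields a common critical value and a multiplicity-adjusted p-value, the following R sketch calls pmvt() and qmvt() from mvtnorm. The test statistics, the correlation matrix and the degrees of freedom are hypothetical values chosen for illustration; they are not derived from the formulas of this section, and a single common degree of freedom is assumed for simplicity.

```r
# Hypothetical inputs: three one-sided test statistics, an assumed
# correlation matrix, and one common (integer) degree of freedom.
t_obs <- c(1.2, 2.6, 0.4)
R <- matrix(c(1.0, 0.5, 0.3,
              0.5, 1.0, 0.4,
              0.3, 0.4, 1.0), nrow = 3, byrow = TRUE)
df <- 20L
alpha <- 0.05

p_adj <- NA_real_
crit <- NA_real_
if (requireNamespace("mvtnorm", quietly = TRUE)) {
  # Equicoordinate quantile c with P(T_1 <= c, T_2 <= c, T_3 <= c) = 1 - alpha
  crit <- mvtnorm::qmvt(1 - alpha, tail = "lower.tail",
                        df = df, corr = R)$quantile
  # Adjusted p-value of the largest statistic: P(max T > max(t_obs))
  p_adj <- 1 - as.numeric(mvtnorm::pmvt(lower = rep(-Inf, 3),
                                        upper = rep(max(t_obs), 3),
                                        df = df, corr = R))
}
```

A hypothesis would be rejected in this simplified setting if its statistic exceeds crit, or equivalently if its adjusted p-value falls below alpha. The MIN procedure described above differs in that each contrast uses its own degree of freedom.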
3 Simulations concerning the FWE
As in the case of equal covariance matrices [1], deriving the exact joint distribution of the test statistics would be a challenging problem: the endpoints have different scales, their covariances must be estimated, and the covariance matrices are unequal. In this article, approximations based on multivariate t-distributions are used instead, and they were validated by simulations. Three and five treatment groups, respectively, were compared in a simulation study, with the first group regarded as the control. The study had different numbers of endpoints with related expected values:
mvtnorm [13, 14] and SimComp [16].
Tables 1 and 2 show the simulated
According to Xu et al. [17] and Liu et al. [18], applying multivariate t-distributions in the context of multiple endpoints and using the method of Genz and Bretz [11] may lead to slightly liberal test decisions. Partly for that reason, the degrees of freedom for the MIN procedure according to eq. (8) are defined in a conservative manner: for each contrast, the minimum of the degrees of freedom (6) is taken over the endpoints.
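The idea of estimating the FWE by simulation can be sketched in a few lines of base R. The settings below (group sizes, standard deviations, correlations, number of runs) are hypothetical and far smaller than a real study would use. The procedure shown is a plain Bonferroni adjustment of Welch t-tests, which is a stand-in for illustration and may differ from the procedures compared in the tables.

```r
set.seed(1)

# Hypothetical settings: 3 groups (first = control), 2 endpoints,
# group-specific standard deviations and correlations.
n     <- c(10, 10, 10)
sds   <- rbind(c(1, 1), c(2, 3), c(4, 2))  # rows: groups, columns: endpoints
rho   <- c(0.2, 0.5, 0.8)
alpha <- 0.05
nsim  <- 500

# Draw one group; all true means are zero, so the global null holds.
draw_group <- function(n, sd, rho) {
  corr <- matrix(c(1, rho, rho, 1), 2, 2)
  z <- matrix(rnorm(n * 2), n, 2) %*% chol(corr)
  sweep(z, 2, sd, "*")
}

# One simulation run: TRUE if at least one true null hypothesis is rejected.
one_run <- function() {
  dat <- lapply(1:3, function(i) draw_group(n[i], sds[i, ], rho[i]))
  p <- numeric(0)
  for (i in 2:3) {
    for (k in 1:2) {
      p <- c(p, t.test(dat[[i]][, k], dat[[1]][, k],
                       alternative = "greater",
                       var.equal = FALSE)$p.value)
    }
  }
  any(p < alpha / length(p))
}

# Estimated FWE: proportion of runs with at least one false rejection.
fwe_hat <- mean(replicate(nsim, one_run()))
```

With so few runs the estimate is noisy; a serious study would use many more runs and the multivariate t-based procedures themselves.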
Table 1: FWE of one-sided MCTs (Dunnett contrasts) for several numbers of treatment groups and endpoints, several correlations and procedures;
| Groups | Endpoints | Correlations | Procedures | |||
| MIN | CE | BON | HOM | |||
| 0.049 | 0.049 | 0.048 | 0.141 | |||
| 0.050 | 0.052 | 0.049 | 0.135 | |||
| 0.052 | 0.053 | 0.042 | 0.121 | |||
| 0.050 | 0.052 | 0.045 | 0.131 | |||
| 0.053 | 0.058 | 0.056 | 0.200 | |||
| 0.051 | 0.056 | 0.052 | 0.190 | |||
| 0.050 | 0.052 | 0.031 | 0.131 | |||
| 0.050 | 0.054 | 0.046 | 0.173 | |||
| 0.048 | 0.056 | 0.053 | 0.267 | |||
| 0.047 | 0.055 | 0.050 | 0.250 | |||
| 0.057 | 0.061 | 0.030 | 0.158 | |||
| 0.047 | 0.054 | 0.044 | 0.230 | |||
| 0.053 | 0.053 | 0.047 | 0.160 | |||
| 0.050 | 0.051 | 0.044 | 0.157 | |||
| 0.051 | 0.052 | 0.038 | 0.126 | |||
| 0.054 | 0.055 | 0.046 | 0.148 | |||
| 0.052 | 0.058 | 0.051 | 0.225 | |||
| 0.056 | 0.060 | 0.050 | 0.208 | |||
| 0.051 | 0.053 | 0.030 | 0.145 | |||
| 0.047 | 0.051 | 0.041 | 0.192 | |||
| 0.051 | 0.059 | 0.052 | 0.318 | |||
| 0.048 | 0.055 | 0.048 | 0.300 | |||
| 0.058 | 0.062 | 0.028 | 0.166 | |||
| 0.055 | 0.061 | 0.049 | 0.280 | |||
Table 2: FWE of one-sided MCTs (Tukey contrasts) for several numbers of treatment groups and endpoints, several correlations and procedures;
| Groups | Endpoints | Correlations | Procedures | |||
| MIN | CE | BON | HOM | |||
| 0.049 | 0.049 | 0.038 | 0.151 | |||
| 0.047 | 0.050 | 0.040 | 0.146 | |||
| 0.050 | 0.051 | 0.033 | 0.128 | |||
| 0.048 | 0.050 | 0.037 | 0.137 | |||
| 0.044 | 0.050 | 0.040 | 0.219 | |||
| 0.050 | 0.056 | 0.043 | 0.205 | |||
| 0.052 | 0.055 | 0.028 | 0.146 | |||
| 0.049 | 0.053 | 0.037 | 0.189 | |||
| 0.049 | 0.059 | 0.045 | 0.300 | |||
| 0.050 | 0.060 | 0.043 | 0.275 | |||
| 0.054 | 0.059 | 0.023 | 0.164 | |||
| 0.049 | 0.060 | 0.040 | 0.258 | |||
| 0.050 | 0.050 | 0.034 | 0.183 | |||
| 0.050 | 0.052 | 0.036 | 0.170 | |||
| 0.052 | 0.053 | 0.033 | 0.140 | |||
| 0.052 | 0.053 | 0.033 | 0.167 | |||
| 0.050 | 0.056 | 0.039 | 0.268 | |||
| 0.048 | 0.054 | 0.038 | 0.251 | |||
| 0.054 | 0.057 | 0.025 | 0.166 | |||
| 0.052 | 0.056 | 0.035 | 0.216 | |||
| 0.050 | 0.059 | 0.040 | 0.371 | |||
| 0.051 | 0.059 | 0.040 | 0.355 | |||
| 0.062 | 0.066 | 0.026 | 0.188 | |||
| 0.051 | 0.061 | 0.038 | 0.312 | |||
Although the procedures presented allow unequal covariance matrices for the treatment groups, multivariate normality is still a strong assumption. Therefore, a further simulation study was conducted to check how sensitive the proposed methods are to violations of this assumption. As in the study above, three and five treatment groups, respectively, were compared, with the first group regarded as the control. The study had different numbers of correlated log-normally distributed endpoints based on multivariate normally distributed data with related parameters:
Tables 3 and 4 show the simulated
Table 3: FWE of one-sided MCTs (Dunnett contrasts) for several numbers of treatment groups and correlated log-normally distributed endpoints, and several procedures;
| Groups | Endpoints | Procedures | |||
| MIN | CE | BON | HOM | ||
| 0.028 | 0.028 | 0.024 | 0.077 | ||
| 0.025 | 0.026 | 0.023 | 0.093 | ||
| 0.027 | 0.029 | 0.024 | 0.113 | ||
| 0.034 | 0.035 | 0.028 | 0.099 | ||
| 0.034 | 0.035 | 0.028 | 0.106 | ||
| 0.032 | 0.035 | 0.026 | 0.150 | ||
Table 4: FWE of one-sided MCTs (Tukey contrasts) for several numbers of treatment groups and correlated log-normally distributed endpoints, and several procedures;
| Groups | Endpoints | Procedures | |||
| MIN | CE | BON | HOM | ||
| 0.025 | 0.025 | 0.018 | 0.074 | ||
| 0.021 | 0.022 | 0.015 | 0.089 | ||
| 0.021 | 0.024 | 0.017 | 0.117 | ||
| 0.028 | 0.029 | 0.018 | 0.093 | ||
| 0.031 | 0.034 | 0.021 | 0.123 | ||
| 0.030 | 0.035 | 0.022 | 0.162 | ||
4 Simultaneous confidence intervals
For test decisions, as well as for the estimation of the contrasts
where
5 Example
Homma et al. [19] published summary data for five multiple, continuous endpoints of the randomized, placebo-controlled phase IIb dose-finding study of a novel anti-muscarinic agent. For a one-way layout with a zero-dose placebo group (
SimComp [16], command ermvnorm(), of the statistical software R [15]. For these data, the summary statistics are given in Table 5. The correlation matrix was not given by Homma et al. [19]. Therefore, the semi-synthetic example data are based on the same theoretical correlation matrix for all treatment groups, namely
representing plausible strongly positive (e.g. 0.8 for Uiepw and Uepd) and slightly negative (e.g. –0.3 for Mpd and Uvvpm) correlations for these multiple urinary endpoints. Note that the means and standard deviations of the generated data set are exactly the same as those of the original data set. Some changes from baseline were originally negative. The related data were multiplied by minus one, so that all endpoints have the same positive direction and higher values indicate a better treatment effect. The standard deviations per endpoint clearly differ between the treatment groups. For example, endpoint Uiepw has standard deviations 272.76 (Placebo), 72.88 (Imid0.1), 41.11 (Imid0.2), and 93.45 (Imid0.5).
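One way to generate normal data whose sample means and standard deviations match prescribed values exactly can be sketched in base R as follows. This is only an illustration of the idea and is not claimed to be the algorithm behind ermvnorm(); in particular, the helper exact_mvn() is a hypothetical name, and this sketch also forces the sample correlations to match exactly. The values used are those of Uiepw and Uepd in the placebo group, with the correlation 0.8 mentioned above.

```r
# Sketch: multivariate normal draws whose sample means, standard
# deviations and correlations equal the prescribed values exactly.
# exact_mvn() is a hypothetical helper, not a SimComp function.
exact_mvn <- function(n, mean, sd, corr) {
  p <- length(mean)
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  z <- scale(z, center = TRUE, scale = FALSE)  # exact zero column means
  z <- z %*% solve(chol(cov(z)))               # exact identity sample covariance
  x <- z %*% chol(corr)                        # impose the target correlations
  x <- sweep(x, 2, sd, "*")                    # impose the target sds
  sweep(x, 2, mean, "+")                       # impose the target means
}

set.seed(42)
corr <- matrix(c(1.0, 0.8,
                 0.8, 1.0), nrow = 2)
x <- exact_mvn(n = 95, mean = c(18.94, 38.12), sd = c(272.76, 62.58),
               corr = corr)
colnames(x) <- c("Uiepw", "Uepd")
```

Because the draws are whitened with their own sample covariance before the target structure is imposed, the summary statistics of x agree with the targets up to floating-point error regardless of the seed.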
Table 5: Summary statistics (mean (sd)) for the urinary endpoints of the data set in Homma et al. [19]
| Endpoint | Placebo | Imid0.1 | Imid0.2 | Imid0.5 |
| Iepw | 42.86 (70.17) | 59.81 (61.48) | 71.61 (43.95) | 82.19 (28.68) |
| Uiepw | 18.94 (272.76) | 57.07 (72.88) | 75.67 (41.11) | 74.20 (93.45) |
| Mpd | 1.07 (1.93) | 1.72 (2.11) | 1.59 (1.89) | 2.33 (2.20) |
| Uepd | 38.12 (62.58) | 60.29 (43.51) | 57.37 (53.28) | 62.31 (32.64) |
| Uvvpm | 2.29 (42.70) | 14.06 (37.50) | 9.89 (37.64) | 26.11 (43.79) |
| Sample size | 95 | 91 | 93 | 76 |
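These data are the same as those used in Hasler and Hothorn [3], who considered SCIs for ratios of means and contrasts related to the trend test of Williams [4]. Here, SCIs for differences of means and contrasts related to Dunnett [2] are applied. The Dunnett-type contrast matrix is given by
$$\begin{pmatrix} -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix},$$
where the rows represent the comparisons versus the placebo, and the columns represent the treatment groups. The differences of interest are therefore those between the mean of each dose group (Imid0.1, Imid0.2, Imid0.5) and the placebo mean, for each of the five endpoints, and the hypotheses to be tested are the corresponding one-sided hypotheses that a dose group mean exceeds the placebo mean.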
Table 6 shows the estimated differences to the placebo (Estimate), p-values of Dunnett tests computed separately for the endpoints and assuming homogeneous variances for the dose groups according to Homma et al. [19] (p-val. (prev.)), adjusted p-values according to the multivariate MIN procedure described (p-val. (adj.)), and related lower simultaneous confidence limits (Lower limit) for each dose and each endpoint. By definition, the multivariate procedure is more conservative than elementary Dunnett tests separately for the endpoints, each at level
Table 6: Summary of the test for the semi-synthetic example data according to Homma et al. [19]
| Dose | Endpoint | Estimate | p-val. (prev.) | p-val. (adj.) | Lower limit |
| Imid0.1 | Iepw | 16.95 | 0.0906 | 0.2978 | −8.46 |
| Imid0.1 | Uiepw | 38.13 | 0.2287 | 0.5300 | −38.12 |
| Imid0.1 | Mpd | 0.65 | 0.0791 | 0.1356 | −0.13 |
| Imid0.1 | Uepd | 22.17 | 0.0079 | 0.0312 | 1.46 |
| Imid0.1 | Uvvpm | 11.77 | 0.1367 | 0.1965 | −3.71 |
| Imid0.2 | Iepw | 28.75 | 0.0010 | 0.0063 | 6.31 |
| Imid0.2 | Uiepw | 56.73 | 0.0335 | 0.1950 | −17.81 |
| Imid0.2 | Mpd | 0.52 | 0.1920 | 0.2485 | −0.21 |
| Imid0.2 | Uepd | 19.25 | 0.0246 | 0.1147 | −3.05 |
| Imid0.2 | Uvvpm | 7.60 | 0.4653 | 0.5412 | −7.85 |
| Imid0.5 | Iepw | 39.33 | <0.0001 | <0.0001 | 18.55 |
| Imid0.5 | Uiepw | 55.26 | 0.0541 | 0.2571 | −23.42 |
| Imid0.5 | Mpd | 1.26 | 0.0002 | 0.0009 | 0.42 |
| Imid0.5 | Uepd | 24.19 | 0.0053 | 0.0086 | 4.68 |
| Imid0.5 | Uvvpm | 23.82 | 0.0006 | 0.0031 | 6.33 |
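The estimates in Table 6 and the degrees of freedom underlying the CE and MIN procedures can be retraced from the summary statistics in Table 5 with a short base R sketch. The degrees of freedom are computed with the standard Satterthwaite form for a contrast with group-specific variances; this is assumed to correspond to the formula referred to as (6), which is not reproduced here. The function name satterthwaite_df() is introduced for illustration only.

```r
# Summary statistics from Table 5 (rows: endpoints, columns: groups).
groups <- c("Placebo", "Imid0.1", "Imid0.2", "Imid0.5")
n <- c(95, 91, 93, 76)
means <- rbind(Iepw  = c(42.86, 59.81, 71.61, 82.19),
               Uiepw = c(18.94, 57.07, 75.67, 74.20),
               Mpd   = c(1.07, 1.72, 1.59, 2.33),
               Uepd  = c(38.12, 60.29, 57.37, 62.31),
               Uvvpm = c(2.29, 14.06, 9.89, 26.11))
sds <- rbind(Iepw  = c(70.17, 61.48, 43.95, 28.68),
             Uiepw = c(272.76, 72.88, 41.11, 93.45),
             Mpd   = c(1.93, 2.11, 1.89, 2.20),
             Uepd  = c(62.58, 43.51, 53.28, 32.64),
             Uvvpm = c(42.70, 37.50, 37.64, 43.79))
colnames(means) <- colnames(sds) <- groups

# Dunnett-type contrast matrix: each dose group versus placebo.
C <- rbind("Imid0.1 - Placebo" = c(-1, 1, 0, 0),
           "Imid0.2 - Placebo" = c(-1, 0, 1, 0),
           "Imid0.5 - Placebo" = c(-1, 0, 0, 1))

# Estimated differences (rows: contrasts, columns: endpoints).
est <- C %*% t(means)

# Assumed Satterthwaite degrees of freedom for one contrast and endpoint.
satterthwaite_df <- function(cvec, s, n) {
  w <- cvec^2 * s^2 / n
  sum(w)^2 / sum(w^2 / (n - 1))
}

# Contrast- and endpoint-specific degrees of freedom (as used by CE) ...
df_ce <- sapply(rownames(sds), function(k) {
  apply(C, 1, satterthwaite_df, s = sds[k, ], n = n)
})
# ... and their minimum over the endpoints for each contrast (as used by MIN).
df_min <- apply(df_ce, 1, min)
```

The matrix est agrees with the Estimate column of Table 6, and df_min contains one value per comparison with the placebo.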
The package SimComp [16] of the statistical software R [15] provides calculations concerning simultaneous tests and confidence intervals for both difference- and ratio-based contrasts of normal means for data with possibly more than one primary endpoint. The covariance matrices, which contain the covariances between the endpoints, may be assumed to be equal or possibly unequal for the different groups. (The MIN procedure is implemented for the latter case.) This package was used to re-generate and to analyse the example data. It is available at http://www.r-project.org. The input for the results of Table 6 is approximately:
SimCiDiff(data = <data object>,
          grp = "<name of the grouping variable>",
          resp = c("<name of endpoint 1>", "<name of endpoint 2>", "..."),
          type = "Dunnett",
          base = <number of the control group in alphanumerical order>,
          alternative = "greater",
          covar.equal = FALSE)
For the related p-values use the command SimTestDiff(), and for ratio-based testing and intervals SimTestRat() and SimCiRat(), respectively.
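A concrete call might look like the following sketch. The data frame, its column names and all numerical settings are hypothetical and unrelated to the example data; the endpoints are generated independently for brevity. The calls are guarded so that the script also runs when SimComp is not installed, and SimTestDiff() is assumed to accept the same arguments as SimCiDiff().

```r
set.seed(123)

# Hypothetical data: 3 groups, 2 endpoints, group-specific standard deviations.
dat <- data.frame(
  group = factor(rep(c("ctrl", "dose1", "dose2"), each = 15)),
  y1 = rnorm(45, mean = rep(c(10, 11, 12), each = 15),
             sd = rep(c(1, 2, 3), each = 15)),
  y2 = rnorm(45, mean = rep(c(50, 50, 55), each = 15),
             sd = rep(c(8, 4, 6), each = 15))
)

if (requireNamespace("SimComp", quietly = TRUE)) {
  # Lower simultaneous confidence limits, unequal covariance matrices.
  ci <- SimComp::SimCiDiff(data = dat, grp = "group", resp = c("y1", "y2"),
                           type = "Dunnett", base = 1,
                           alternative = "greater", covar.equal = FALSE)
  print(ci)

  # Corresponding multiplicity-adjusted p-values.
  pv <- SimComp::SimTestDiff(data = dat, grp = "group", resp = c("y1", "y2"),
                             type = "Dunnett", base = 1,
                             alternative = "greater", covar.equal = FALSE)
  print(pv)
}
```

Here base = 1 selects "ctrl" as the control because it is the first group level in alphanumerical order.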
6 Conclusions and discussion
For the special cases of contrasts related to Dunnett [2] and Williams [4], Hasler and Hothorn [1, 3] proposed extensions to the case of multiple endpoints. Their approaches have been generalized in this article to any MCTs and to the situation of heteroscedasticity, i.e. unequal covariance matrices
In the presence of heteroscedasticity, the CE procedure with contrast- and endpoint-specific degrees of freedom tends towards liberal test decisions, whereas the MIN procedure with (minimized) contrast-specific degrees of freedom maintains the FWE in the strong sense within an acceptable range. This was shown by simulations. Possible slight variations around the nominal FWE
If the assumption of a multivariate normal distribution for the data is not fulfilled, the procedures considered cannot be recommended without caution. The simulation results show that they yield conservative test decisions if the data follow distributions with a positive skew. In this case or for other non-normal distributions, non-parametric procedures – like those of Munzel and Brunner [22]; Bathke and Harrar [23]; Harrar and Bathke [24, 25] – should be used instead. Note that these procedures do not provide multiplicity-adjusted p-values or SCIs for each contrast-endpoint combination as they use ANOVA-type
The main advantage of usual MCTs over other testing procedures is that the correlations of the contrasts are taken into account. These correlations mainly depend on whether and which treatment means are involved in the contrasts. All treatment means, however, which are involved in the same contrast are independent. Consequently, the MIN procedure can also be used for the analysis of repeated measures by simply regarding the time points as endpoints, as long as the condition
A solution to the following problem might be a task for the future: the MIN procedure presented generally assumes unequal covariance matrices which occur if variances or correlations of some endpoints differ depending on the treatment groups. The procedure does not distinguish whether unequal correlations or unequal variances lead to unequal covariances. In practice, however, it can occur that just the variances differ, and the correlation structure stays the same for the treatment groups (or the other way round).
References
2. Dunnett CW. A multiple comparison procedure for comparing several treatments with a control. J Am Stat Assoc 1955;50:1096–121.
3. Hasler M, Hothorn LA. A multivariate Williams-type trend procedure. Stat Biopharm Res 2012;4:57–65.
4. Williams DA. A test for differences between treatment means when several dose levels are compared with a zero dose control. Biometrics 1971;27:103–17.
5. Bretz F. An extension of the Williams trend test to general unbalanced linear models. Comput Stat Data Anal 2006;50:1735–48.
6. Dilba G, Bretz F, Guiard V. Simultaneous confidence sets and confidence intervals for multiple ratios. J Stat Plann Inference 2006;136:2640–58.
7. Dilba G, Bretz F, Guiard V, Hothorn LA. Simultaneous confidence intervals for ratios with applications to the comparison of several treatments with a control. Methods Inf Med 2004;43:465–9.
8. Tukey JW. The problem of multiple comparisons. Dittoed manuscript of 396 pages. Princeton, NJ: Department of Statistics, Princeton University, 1953.
9. Satterthwaite FE. An approximate distribution of estimates of variance components. Biometrics 1946;2:110–14.
10. Hasler M, Hothorn LA. Multiple contrast tests in the presence of heteroscedasticity. Biom J 2008;50:793–800.
11. Genz A, Bretz F. Methods for the computation of multivariate t-probabilities. J Comput Graphical Stat 2002;11:950–71.
12. Bretz F, Genz A, Hothorn LA. On the numerical availability of multiple comparison procedures. Biom J 2001;43:645–56.
13. Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, et al. mvtnorm: Multivariate normal and t distributions. R package version 0.9-9994, 2012. Available at: http://CRAN.R-project.org/package=mvtnorm
15. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2012. Available at: http://www.R-project.org/, ISBN 3-900051-07-0.
16. Hasler M. SimComp: Simultaneous comparisons for multiple endpoints. R package version 1.7.0, 2012. Available at: http://CRAN.R-project.org/package=SimComp
17. Xu HY, Nuamah I, Liu JY, Lim P, Sampson A. A Dunnett-Bonferroni-based parallel gatekeeping procedure for dose-response clinical trials with multiple endpoints. Pharm Stat 2009;8:301–16.
18. Liu Y, Hsu J, Ruberg S. Partition testing in dose-response studies with multiple endpoints. Pharm Stat 2007;6:181–92.
19. Homma Y, Yamaguchi T, Yamaguchi O. A randomized, double-blind, placebo-controlled phase II dose-finding study of the novel anti-muscarinic agent imidafenacin in Japanese patients with overactive bladder. Int J Urol 2008;15:809–15.
21. Hommel G. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 1988;75:383–6.
22. Munzel U, Brunner E. Nonparametric methods in multivariate factorial designs. J Stat Plann Inference 2000;88:117–32.
23. Bathke AC, Harrar SW. Nonparametric methods in multivariate factorial designs for large number of factor levels. J Stat Plann Inference 2008;138:588–610.
24. Harrar SW, Bathke AC. Nonparametric methods for unbalanced multivariate data and many factor levels. J Multivariate Anal 2008;99:1635–64.
25. Harrar SW, Bathke AC. A modified two-factor multivariate analysis of variance: asymptotics and small sample approximations. Ann Inst Stat Math 2012;64:135–65.
26. Hasler M. Multiple contrasts for repeated measures. Int J Biostat 2013;9:49–61.
