Abstract
This paper proposes several models that describe the symmetry and asymmetry structure of each subdimension of a multiway square contingency table with ordered categories. The models analyze the subsymmetric and asymmetric structure of the table. A classical three-way categorical example is examined to illustrate the model results.
1 Introduction
Square contingency tables with the same row and column categories occur frequently in applied sciences. Such tables arise from tabulating repeated measurements of a categorical response variable. Examples include: subjects measured at two different points in time (e.g., responses before and after experiments); the decisions of two experts on the same set of subjects (e.g., the grading of the same cancer tumors by two specialists); measurements on two similar units in a sample (e.g., the grades of vision of the left and right eyes); and matched-pair experiments (e.g., the social status of fathers and sons) [1]. Several models have been proposed for square contingency tables (see, for example, [2-8]), but the symmetry (S), quasi-symmetry (QS) and marginal homogeneity (MH) models are classical and well known [9,10], and their application is straightforward. The QS model is less restrictive than the S model [11-13].
Consider an R×R square contingency table with the same row and column classifications. Let $p_{ij}$ denote the probability that an observation will fall in the ith row and jth column of the table. Bowker [14] considered the symmetry (S) model for R×R tables, defined by

$$p_{ij} = p_{ji} \quad (i \neq j).$$
The S model implies that the probability that an observation will fall in cell (i, j) of the table is equal to the probability that it falls in cell (j, i).
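As a rough illustration of the two-way S model, the sketch below computes the fitted counts under symmetry, (n_ij + n_ji)/2, and Bowker's chi-square statistic. The function name `bowker_symmetry` and the 3×3 counts are hypothetical and are not taken from this paper; skipping off-diagonal pairs with no observations is one common convention, assumed here.

```python
def bowker_symmetry(table):
    """Return (statistic, df, fitted) for the symmetry model on a square table of counts."""
    r = len(table)
    # Fitted counts under symmetry average each cell with its mirror image.
    fitted = [[(table[i][j] + table[j][i]) / 2 for j in range(r)] for i in range(r)]
    stat = 0.0
    df = 0
    for i in range(r):
        for j in range(i + 1, r):
            pair_total = table[i][j] + table[j][i]
            # Assumption: pairs with no observations are skipped and do not count toward df.
            if pair_total > 0:
                stat += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    return stat, df, fitted


# Hypothetical 3x3 counts, for illustration only (not data from this paper).
example = [[20, 10, 5],
           [4, 30, 8],
           [5, 6, 25]]
stat, df, fitted = bowker_symmetry(example)
print(f"Bowker chi-square = {stat:.3f} on {df} degrees of freedom")
```

A large statistic relative to the chi-square distribution with the reported degrees of freedom would suggest that the S model does not hold.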
A multiway contingency table is obtained when a sample of n observations is cross-classified with respect to T categorical variables having the same number of categories. Such tables are very common in panel studies and matched-pair designs. The symmetry model can also be defined for multidimensional tables.
Denote the kth categorical variable by $X_k$ (k = 1, ..., T) and consider an $R^T$ contingency table (T ≥ 3). Let $p_{i_1 \ldots i_T}$ denote the probability that an observation will fall in the $(i_1, \ldots, i_T)$th cell of the table.
Agresti [1] defined the S model as

$$p_{i_1 \ldots i_T} = p_{j_1 \ldots j_T}$$

for any permutation $(j_1, \ldots, j_T)$ of $(i_1, \ldots, i_T)$, with $i_t = 1, \ldots, R$; $t = 1, \ldots, T$.
For example, when T = 3, letting X, Y and Z denote the row, column and layer variables, the S model can be expressed as

$$p_{ijk} = p_{ikj} = p_{jik} = p_{jki} = p_{kij} = p_{kji} \quad (i, j, k = 1, \ldots, R).$$
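The simplest possible model of interest is the model of complete independence, where the joint distribution of the three variables is the product of the marginals. The corresponding hypothesis is

$$p_{ijk} = p_{i++}\, p_{+j+}\, p_{++k},$$

where $p_{i++}$, $p_{+j+}$ and $p_{++k}$ denote the one-way marginal probabilities of X, Y and Z.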
The symmetry model for multiway tables is given in general as follows:
The common schemes for representing contingency tables are based on row, column and layer variables that are independent. In three-way contingency tables, the choice of predictor and control variables is of interest to many researchers. The purpose of this paper is to give some models that represent subsymmetry and asymmetry in multiway contingency tables. We concentrate on three-dimensional tables, which are cross-classifications of observations by the levels of three categorical variables.
The models are defined in the subsymmetry and asymmetry context, taking the first variable as a control variable. The log-linear models below are often used to analyze three-dimensional tables.
Model | Terms |
---|---|
Saturated | (XYZ) |
Homogeneous associations | (XY, XZ, YZ) |
Conditional independence | (XY, XZ), (XY, YZ), (XZ, YZ) |
Joint independence | (XY, Z), (XZ, Y), (X, YZ) |
Complete independence | (X, Y, Z) |
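To make the simplest model in the list above concrete, the following sketch computes expected frequencies under complete independence (X, Y, Z), m_ijk = n_i++ n_+j+ n_++k / n², together with the likelihood ratio statistic. It uses the 27 counts of Table 1 and assumes the row labels as printed there (Baseline changing fastest down the rows); the variable names are illustrative only.

```python
import math

# Rows of Table 1 in printed order; columns are the three 8-week categories.
rows = [
    (77, 7, 1), (43, 7, 0), (3, 0, 0),
    (3, 8, 1), (17, 16, 5), (3, 8, 1),
    (1, 1, 1), (0, 2, 3), (0, 4, 3),
]
R = 3

# Assumption: Baseline (X) cycles fastest down the rows and the
# 4-week response (Y) changes every three rows, as the labels are printed.
n = [[[0] * R for _ in range(R)] for _ in range(R)]
for r, row in enumerate(rows):
    x, y = r % R, r // R
    for z, count in enumerate(row):
        n[x][y][z] = count

cells = [(i, j, k) for i in range(R) for j in range(R) for k in range(R)]
total = sum(n[i][j][k] for i, j, k in cells)

# One-way marginal totals of X, Y and Z.
nx = [sum(n[i][j][k] for j in range(R) for k in range(R)) for i in range(R)]
ny = [sum(n[i][j][k] for i in range(R) for k in range(R)) for j in range(R)]
nz = [sum(n[i][j][k] for i in range(R) for j in range(R)) for k in range(R)]

# Expected frequencies under complete independence.
m = [[[nx[i] * ny[j] * nz[k] / total ** 2 for k in range(R)]
      for j in range(R)] for i in range(R)]

# Likelihood ratio statistic; empty cells contribute zero.
g2 = 2 * sum(n[i][j][k] * math.log(n[i][j][k] / m[i][j][k])
             for i, j, k in cells if n[i][j][k] > 0)
df = R ** 3 - 1 - 3 * (R - 1)
print(f"n = {total}, G2 = {g2:.3f}, df = {df}")
```

Because complete independence treats the three variables alike, the statistic does not depend on which variable is taken as the control.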
2 Subsymmetry and asymmetry models
If we collect the triplet (X, Y, Z) for each unit in a sample of n units, the data can be summarized as a three-dimensional table. Let pijk be the probability that a unit has X = i, Y = j and Z = k. In what follows, we define some models that represent subsymmetry and asymmetry.
Parameters in the models and the corresponding symbols in design matrices are defined as:
α: row parameter (X); β: column parameter (Y);
γ: layer parameter (Z); ψ: symmetry parameter (S);
ω: sub-symmetry parameter for X×Z (B);
θ: sub-symmetry parameter for X×Y (W);
τ: conditional symmetry parameter for Y×Z (CS);
δ: inverse diagonal matrix for X×Z (SSS);
ξ: diagonal asymmetry parameter (DA);
η: upper triangle parameter (CCS);
v: main diagonal parameter (V).
Each model is in log-linear form and therefore has its associated degrees of freedom, which are determined by the number of parameters to be fitted. For instance, the degrees of freedom for Model (1) are:
Subsymmetry matrices are defined by each dimension as:
V matrix corresponds to the cells on the main diagonal for XxYxZ.
The conditional factor variables are defined for the asymmetric associations as follows:
Conditional symmetry matrix:
Upper triangle matrix:
Diagonal asymmetry matrix:
Inverse diagonal matrix:
For {YxZ / i =1,2,3},
3 Numerical example
The data in Table 1 are taken directly from Yamamoto et al. [15] and give the results for the treatment group only in randomized clinical trials conducted by a pharmaceutical company in anemic patients with cancer receiving chemotherapy. The response is the patient's hemoglobin (Hb) concentration at baseline (before treatment) and after 4 and 8 weeks of treatment. The Hb response is classified as ≥ 10 g/dl, 8-10 g/dl and < 8 g/dl. The reference ranges for hemoglobin concentration in adults are 14.0-17.5 g/dl for men and 12.3-15.3 g/dl for women.
Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
---|---|---|---|---|
≥ 10 g/dl | ≥ 10 g/dl | 77 | 7 | 1 |
8-10 g/dl | ≥ 10 g/dl | 43 | 7 | 0 |
< 8 g/dl | ≥ 10 g/dl | 3 | 0 | 0 |
≥ 10 g/dl | 8-10 g/dl | 3 | 8 | 1 |
8-10 g/dl | 8-10 g/dl | 17 | 16 | 5 |
< 8 g/dl | 8-10 g/dl | 3 | 8 | 1 |
≥ 10 g/dl | < 8 g/dl | 1 | 1 | 1 |
8-10 g/dl | < 8 g/dl | 0 | 2 | 3 |
< 8 g/dl | < 8 g/dl | 0 | 4 | 3 |
Models (1)-(10) proposed here attempt to analyze the relationship between X, Y and Z, taking “Baseline” as the control variable.
An example of the design matrix is given for Model (8) in Table 2.
X | Y | Z | Parameter | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Constant | [Y = l] | [Y = 2] | [Z = l] | [Z = 2] | S2 | S3 | S5 | DA | W2 | W3 | W5 | CCS | |||
1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | ||
1 | 2 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | |
3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 0 | 0 | 0 | 0 | ||
1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 3 | 1 | 0 | 0 | 0 | ||
1 | 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 5 | 1 | 0 | 0 | 0 |
3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | ||
1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 4 | 0 | 1 | 0 | 0 | ||
3 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 3 | 0 | 1 | 0 | 0 | |
3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 1 | 0 | 0 | ||
1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 5 | 1 | 0 | 0 | 0 | ||
1 | 2 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | |
3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 | ||
1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | ||
2 | 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | ||
1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 4 | 0 | 0 | 1 | 0 | ||
3 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 3 | 0 | 0 | 1 | 0 | |
3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 1 | 0 | ||
1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | ||
1 | 2 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | |
3 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 2 | ||
1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ||
3 | 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 2 | ||
1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ||
3 | 2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | |
3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Design matrices are generated for each model. Likelihood ratio chi-square values with the associated degrees of freedom, AIC and BIC are given in Table 3. In addition to the goodness-of-fit tests, model comparisons based on AIC and BIC give better information on which model represents the data best.
Model | Terms | Likelihood ratio chi-square | Degrees of freedom | P-value | BIC | AIC |
---|---|---|---|---|---|---|
1 | Y, Z, S1, S2, B1, B2, B3, B4, CCS, SSS, V | 18.043 | 13 | 0.156 | –51.771 | –7.957 |
2 | Y, Z, S1, S2, B1, B2, B3, B4, V, CCS | 19.531 | 14 | 0.146 | –55.658 | –8.469 |
3 | Y, Z, S1.S2, B1.B2, B3.B4, CC, CCS, V | 19.443 | 13 | 0.110 | –50.059 | –6.557 |
4 | Y, Z, S2, S3, S5, B2, B3, B5, V | 20.268 | 14 | 0.122 | –54.922 | –7.73 |
5 | Y, Z, V2, S2, S3, S5, B2, B3, B5, CS, CCS, SSS | 19.953 | 13 | 0.096 | –49.865 | –6.047 |
6 | B, Y, Z, S2, S3, S5, B2, B3, B5, W2, W3, W5, V | 12.943 | 10 | 0.227 | –40.763 | –7.057 |
7 | B, Y, S2, S3, S5, W2, W3, W5, V, CS, CCS | 20.825 | 13 | 0.076 | –48.990 | –5.175 |
8 | Y, Z, S2, S3, S5, DA, W2, W3, W5, CCS | 17.623 | 14 | 0.225 | –57.565 | –10.38 |
9 | Y, Z, S2, S3, S5, DA, W2, W5, CCS, V | 15.694 | 13 | 0.266 | –54.124 | –10.306 |
10 | B1, B2, B3, B4, B5, B6, S1, S2, S3, S4, S5, S6, W1, W2, W3, W4, W5, W6 | 13.293 | 11 | 0.275 | –45.780 | –8.707 |
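The AIC and BIC columns of Table 3 appear to be consistent with the conventions AIC = G² − 2·df and BIC = G² − df·ln(n), with n = 215 (the total count in Table 1). These formulas are not stated in the text and are an assumption inferred from the tabulated values; the sketch below recomputes both criteria from the likelihood ratio statistics and degrees of freedom under that assumption.

```python
import math

n = 215  # total number of patients in Table 1

# Model number: (likelihood ratio chi-square, degrees of freedom), from Table 3.
models = {
    1: (18.043, 13), 2: (19.531, 14), 3: (19.443, 13), 4: (20.268, 14),
    5: (19.953, 13), 6: (12.943, 10), 7: (20.825, 13), 8: (17.623, 14),
    9: (15.694, 13), 10: (13.293, 11),
}


def aic(g2, df):
    # Assumed convention: deviance penalised by twice the residual df.
    return g2 - 2 * df


def bic(g2, df, n_obs):
    # Assumed convention: deviance penalised by df times log sample size.
    return g2 - df * math.log(n_obs)


criteria = {k: (aic(g2, df), bic(g2, df, n)) for k, (g2, df) in models.items()}
best_aic = min(criteria, key=lambda k: criteria[k][0])
best_bic = min(criteria, key=lambda k: criteria[k][1])

for k, (a, b) in criteria.items():
    print(f"Model {k:2d}: AIC = {a:8.3f}  BIC = {b:8.3f}")
print(f"Smallest AIC: Model {best_aic}; smallest BIC: Model {best_bic}")
```

Under these conventions, smaller (more negative) values indicate a better trade-off between fit and parsimony.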
Parameter | Estimate | Std. Error | Z | Sig | 95% CI Lower Bound | 95% CI Upper Bound |
---|---|---|---|---|---|---|
Constant | 1.377 | 0.560 | 2.460 | 0.014 | 0.280 | 2.474 |
[Y = 1] | 1.181 | 0.376 | 3.146 | 0.002 | 0.445 | 1.918 |
[Y = 2] | 0.502 | 0.343 | 1.462 | 0.144 | –0.171 | 1.175 |
[Y = 3] | 0a | · | · | · | · | · |
[Z = 1] | 1.375 | 0.374 | 3.674 | 0.000 | 0.641 | 2.108 |
[Z = 2] | 0.576 | 0.339 | 1.702 | 0.089 | –0.087 | 1.240 |
[Z = 3] | 0a | · | · | · | · | · |
S2 | –1.026 | 0.331 | –3.097 | 0.002 | –1.675 | –0.377 |
S3 | –3.283 | 0.769 | –4.269 | 0.000 | –4.790 | –1.776 |
S5 | –0.607 | 0.422 | –1.439 | 0.150 | –1.433 | 0.220 |
W2 | –0.679 | 0.156 | –4.361 | 0.000 | –0.985 | –0.374 |
W3 | –2.298 | 0.511 | –4.499 | 0.000 | –3.299 | –1.297 |
W5 | –0.669 | 0.317 | –2.107 | 0.035 | –1.291 | –0.047 |
CCS | –0.198 | 0.404 | –0.491 | 0.624 | –0.990 | 0.593 |
DA | 0.087 | 0.089 | 0.975 | 0.329 | –0.088 | 0.261 |
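The Z and confidence-interval columns in the parameter table above can be checked from the estimates and standard errors, assuming the usual Wald construction (Z = estimate / standard error, interval = estimate ± z₀.₉₇₅ × standard error). The helper `wald_summary` below is a hypothetical name used only for this sketch; small discrepancies in the third decimal are expected because the inputs are rounded.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_summary(estimate, std_error):
    """Return Wald z, 95% confidence limits and exp(estimate) for a log-linear parameter."""
    z = estimate / std_error
    half_width = Z_975 * std_error
    return {
        "z": z,
        "lower": estimate - half_width,
        "upper": estimate + half_width,
        # exp(estimate) is the multiplicative effect on the expected frequency.
        "exp_estimate": math.exp(estimate),
    }


# Estimates and standard errors copied from the table above.
s2 = wald_summary(-1.026, 0.331)
constant = wald_summary(1.377, 0.560)
print({name: round(value, 3) for name, value in s2.items()})
```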
Odds ratio | Baseline ≥ 10 g/dl | Baseline 8-10 g/dl | Baseline < 8 g/dl |
---|---|---|---|
θ11 | 13.10 | 13.10 | 7.78 |
θ12 | 3.37 | 3.37 | 4.24 |
θ21 | 3.99 | 4.02 | 6.35 |
θ22 | 5.69 | 5.66 | 3.36 |
Parameter | Estimate | Std. Error | Z | Sig | 95% CI Lower Bound | 95% CI Upper Bound |
---|---|---|---|---|---|---|
Constant | 1.638 | 0.609 | 2.690 | 0.007 | 0.445 | 2.832 |
[Y = 1] | 1.225 | 0.380 | 3.225 | 0.001 | 0.481 | 1.970 |
[Y = 2] | 0.529 | 0.347 | 1.524 | 0.127 | –0.151 | 1.210 |
[Y = 3] | 0a | · | · | · | · | · |
[Z = 1] | 1.392 | 0.366 | 3.799 | 0.000 | 0.674 | 2.110 |
[Z = 2] | 0.590 | 0.335 | 1.762 | 0.078 | –0.066 | 1.247 |
[Z = 3] | 0a | · | · | · | · | · |
S2 | –1.252 | 0.376 | –3.332 | 0.001 | –1.988 | –0.515 |
S3 | –3.502 | 0.792 | –4.422 | 0.000 | –5.054 | –1.949 |
S5 | –0.800 | 0.458 | –1.749 | 0.080 | –1.697 | 0.096 |
W2 | –1.065 | 0.325 | –3.277 | 0.001 | –1.701 | –0.428 |
W3 | –2.642 | 0.573 | –4.607 | 0.000 | –3.765 | –1.518 |
W5 | –0.916 | 0.371 | –2.471 | 0.013 | –1.642 | –0.189 |
CCS | –0.141 | 0.420 | –0.337 | 0.736 | –0.965 | 0.682 |
DA | 0.111 | 0.093 | 1.197 | 0.231 | –0.071 | 0.294 |
V | –0.479 | 0.350 | –1.371 | 0.170 | –1.165 | 0.206 |
Odds ratio | Baseline ≥ 10 g/dl | Baseline 8-10 g/dl | Baseline < 8 g/dl |
---|---|---|---|
θ11 | 14.76 | 14.76 | 12.21 |
θ12 | 2.44 | 3.94 | 3.75 |
θ21 | 3.07 | 4.93 | 4.91 |
θ22 | 9.67 | 5.98 | 3.07 |
Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
---|---|---|---|---|
≥ 10 g/dl | ≥ 10 g/dl | 78.83 | 8.98 | 0.58 |
8-10 g/dl | ≥ 10 g/dl | 6.10 | 9.12 | 1.97 |
< 8 g/dl | ≥ 10 g/dl | 0.08 | 0.5 | 0.6 |
≥ 10 g/dl | 8-10 g/dl | 39.96 | 4.56 | 0.29 |
8-10 g/dl | 8-10 g/dl | 12.04 | 17.99 | 3.89 |
< 8 g/dl | 8-10 g/dl | 0.43 | 2.56 | 3.13 |
≥ 10 g/dl | < 8 g/dl | 4.21 | 0.56 | 0.03 |
8-10 g/dl | < 8 g/dl | 4.76 | 4.89 | 1.23 |
< 8 g/dl | < 8 g/dl | 0.59 | 3.85 | 3.25 |
Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
---|---|---|---|---|
≥ 10 g/dl | ≥ 10 g/dl | 76.19 | 10.11 | 0.66 |
8-10 g/dl | ≥ 10 g/dl | 4.84 | 9.45 | 1.51 |
< 8 g/dl | ≥ 10 g/dl | 0.06 | 0.42 | 0.64 |
≥ 10 g/dl | 8-10 g/dl | 42.44 | 3.49 | 0.23 |
8-10 g/dl | 8-10 g/dl | 14.05 | 17.04 | 4.39 |
< 8 g/dl | 8-10 g/dl | 0.39 | 2.33 | 3.59 |
≥ 10 g/dl | < 8 g/dl | 4.36 | 0.49 | 0.03 |
8-10 g/dl | < 8 g/dl | 4.02 | 5.45 | 1.18 |
< 8 g/dl | < 8 g/dl | 0.62 | 4.17 | 2.77 |
The results show that all ten models fit the data adequately, since every P-value exceeds 0.05. The smallest values of both AIC and BIC are obtained for Model (8). Note that Model (8) and Model (9) are conditional models that collapse over the baseline variable. Recall that Model (8) is
Correspondingly, letting mijk denote the expected frequencies, Model (8) is represented as
In this model representation, “Baseline” is the control variable; therefore, it is not included among the parameters.
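Model (8) tests the hypothesis $p_{ijk} = \beta_j \gamma_k \psi_2 \psi_3 \psi_5 \theta_2 \theta_3 \theta_5 \eta \xi$ and uses the frequencies of the Y×Z tables. For a subject whose hemoglobin level at baseline is ≥ 10 g/dl, the estimated odds ratio of 13.10 indicates that a level of ≥ 10 g/dl, rather than 8-10 g/dl, is 13.10 times more likely at the consecutive 4-week and 8-week measurements.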
The Hb concentration tends to decrease from baseline over the 8 weeks, since the maximum likelihood estimates are less than 1.
Similarly, under Model (9), for a patient whose Hb concentration at 4 weeks is ≥ 10 g/dl, the odds of a level ≥ 10 g/dl instead of 8-10 g/dl at 8 weeks are 14.76 times higher when the baseline level is ≥ 10 g/dl.
The odds ratios greater than one under Model (8) and Model (9) indicate that an Hb concentration of ≥ 10 g/dl is more likely to occur at baseline than after 4 and 8 weeks.
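The odds ratios discussed above can be approximately reproduced from the expected frequencies, assuming they are local odds ratios of adjacent categories, θ_ij = (m_ij · m_{i+1,j+1}) / (m_{i,j+1} · m_{i+1,j}), computed within one 3×3 layer. Both the formula and the choice of layer (the first three rows of the first table of expected frequencies) are assumptions made for this sketch; they are not stated explicitly in the text.

```python
def local_odds_ratios(table):
    """Local odds ratios of adjacent rows and columns in a two-way table."""
    n_rows, n_cols = len(table), len(table[0])
    return [
        [
            (table[i][j] * table[i + 1][j + 1]) / (table[i][j + 1] * table[i + 1][j])
            for j in range(n_cols - 1)
        ]
        for i in range(n_rows - 1)
    ]


# Assumed layer: first three rows of the first expected-frequency table.
layer = [
    [78.83, 8.98, 0.58],
    [6.10, 9.12, 1.97],
    [0.08, 0.50, 0.60],
]
theta = local_odds_ratios(layer)
for i, row in enumerate(theta, start=1):
    for j, value in enumerate(row, start=1):
        print(f"theta{i}{j} = {value:.2f}")
```

Because the tabulated expected frequencies are rounded to two decimals, the recomputed ratios agree with the reported values only approximately, most visibly for ratios that involve very small cells.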
4 Conclusions
We considered subsymmetry models for multiway square contingency tables in which the main diagonal is not of interest. The models are designed to analyze square multidimensional contingency tables with ordered categories, and the results show that they can be applied to a multiway table. We applied the models to the patients' hemoglobin concentration data set to illustrate them. The response was the patient's hemoglobin (Hb) concentration at baseline (before treatment) and after 4 weeks and 8 weeks of treatment. The primary goal was to compare the baseline levels with those at the 4th and 8th weeks, taking the baseline as a layer variable. We were interested in how the patients' Hb concentration changed from baseline over time, and in whether the transition of those concentrations was asymmetric given the concentration at baseline. The advantage of the models proposed here is that they allow the conditional odds ratios to be analyzed as well as the parameter estimates. Extensions to k-way tables are straightforward.
References
[1] Agresti A., Analysis of Ordinal Categorical Data, 2nd Edition, John Wiley, Hoboken, 2002. doi:10.1002/0471249688
[2] Yamamoto K., Tomizawa S., Analysis of Unaided Vision Data Using New Decomposition of Symmetry, American Medical Journal, 2012, 3(1), 37–42. doi:10.3844/amjsp.2012.37.42
[3] Tomizawa S., Tahata K., The Analysis of Symmetry and Asymmetry: Orthogonality of Decomposition of Symmetry into Quasi-Symmetry and Marginal Symmetry for Multi-Way Tables, Journal de la Société Française de Statistique, 2007, 148(3), 3–36.
[4] Tahata K., Tomizawa S., Orthogonal Decomposition of Point-Symmetry for Multiway Tables, Advances in Statistical Analysis, 2008, 92(3), 255–269. doi:10.1007/s10182-008-0070-5
[5] Tahata K., Tomizawa S., Generalized Linear Asymmetry Model and Decomposition of Symmetry for Multiway Contingency Tables, J Biomet Biostat., 2011, 2(4), 1–6. doi:10.4172/2155-6180.1000120
[6] Agresti A., A Simple Diagonals-Parameter Symmetry and Quasi-Symmetry Model, Statistics and Probability Letters, 1983, 1(6), 313–316. doi:10.1016/0167-7152(83)90051-2
[7] Miyamoto N., Ohtsuka W., Tomizawa S., Linear Diagonals-Parameter Symmetry and Quasi-Symmetry Models for Cumulative Probabilities in Square Contingency Tables with Ordered Categories, Biometrical Journal, 2004, 46(6), 664–674. doi:10.1002/bimj.200410066
[8] Iki K., Yamamoto K., Tomizawa S., Quasi-diagonal exponent symmetry model for square contingency tables with ordered categories, Statistics and Probability Letters, 2014, 92, 33–38. doi:10.1016/j.spl.2014.04.029
[9] Goodman L.A., Multiplicative models for square contingency tables with ordered categories, Biometrika, 1979, 66, 413–418. doi:10.1093/biomet/66.3.413
[10] Bishop Y.M.M., Fienberg S.E., Holland P.W., Discrete Multivariate Analysis: Theory and Practice, MIT Press, 1975.
[11] Caussinus H., Contribution à l'analyse statistique des tableaux de corrélation, Annales de la Faculté des Sciences de l’Université de Toulouse, 1966, 29, 77–182. doi:10.5802/afst.519
[12] McCullagh P., A Class of Parametric Models for the Analysis of Square Contingency Tables with Ordered Categories, Biometrika, 1978, 65, 413–418. doi:10.1093/biomet/65.2.413
[13] Kateri M., Agresti A., A class of ordinal quasi-symmetry models for square contingency tables, Statistics & Probability Letters, 2007, 77, 598–603. doi:10.1016/j.spl.2006.09.015
[14] Bowker A.H., A Test for Symmetry in Contingency Tables, Journal of the American Statistical Association, 1948, 43(244), 572–574. doi:10.1080/01621459.1948.10483284
[15] Yamamoto H., Iwashita T., Tomizawa S., Decomposition of Symmetry into Ordinal Quasi-Symmetry and Marginal Equimoment for Multi-way Tables, Austrian Journal of Statistics, 2007, 36(4), 291–306. doi:10.17713/ajs.v36i4.340
© 2016 Serpil Aktaş, published by De Gruyter Open
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.