# Subsymmetry and asymmetry models for multiway square contingency tables with ordered categories

Serpil Aktaş
From the journal Open Mathematics

# Abstract

This paper suggests several models that describe the symmetry and asymmetry structure of each subdimension of a multiway square contingency table with ordered categories. A classical three-way categorical example is examined to illustrate the model results. These models analyze the subsymmetric and asymmetric structure of the table.

MSC 2010: 62H17

## 1 Introduction

Square contingency tables with the same row and column categories occur frequently in applied sciences. Such tables arise from tabulating repeated measurements of a categorical response variable. Examples include subjects measured at two different points in time (e.g., responses before and after an experiment); the decisions of two experts on the same set of subjects (e.g., the grading of the same cancer tumors by two specialists); measurements on two similar units in a sample (e.g., the grades of vision of the left and right eyes); and matched-pair experiments (e.g., the social status of fathers and sons) [1]. Several models have been proposed for square contingency tables (see, for example, [2-8]), but the symmetry (S), quasi-symmetry (QS) and marginal homogeneity (MH) models are classical and well known [9,10], and their application is straightforward. The QS model is less restrictive than the S model [11-13].

Consider an $R \times R$ square contingency table with the same row and column classifications. Let $p_{ij}$ denote the probability that an observation will fall in the $i$th row and $j$th column of the table. Bowker [14] considered the symmetry (S) model for $R \times R$ tables, defined by

$$p_{ij} = p_{ji} \quad (i \neq j).$$

The S model implies that the probability that an observation will fall in cell (i, j) of the table is equal to the probability that it falls in cell (j, i).
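As a minimal sketch of how the S model can be checked on data, Bowker's statistic sums $(n_{ij} - n_{ji})^2 / (n_{ij} + n_{ji})$ over the pairs $i < j$ and is referred to a chi-square distribution. The counts below are hypothetical and serve only as an illustration; skipping empty pairs and reducing the degrees of freedom accordingly is one common convention, assumed here.

```python
from itertools import combinations


def bowker_statistic(table):
    """Return (X2, df) for Bowker's test of symmetry on a square table of counts."""
    r = len(table)
    x2, df = 0.0, 0
    for i, j in combinations(range(r), 2):
        total = table[i][j] + table[j][i]
        if total > 0:  # a pair with no observations carries no information
            x2 += (table[i][j] - table[j][i]) ** 2 / total
            df += 1
    return x2, df


# Hypothetical 3x3 counts, for illustration only (not data from this paper).
counts = [[20, 5, 1],
          [9, 15, 4],
          [3, 2, 10]]
x2, df = bowker_statistic(counts)
print(f"X2 = {x2:.4f} on {df} degrees of freedom")
```

A large value of the statistic relative to the chi-square reference distribution is evidence against symmetry.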

A multiway contingency table is obtained when a sample of $n$ observations is cross-classified with respect to $T$ categorical variables having the same number of categories. Such tables are common in panel studies and matched-pair designs. The symmetry model is defined for the multidimensional case as follows.

Denote the $k$th categorical variable by $X_k$ ($k = 1, \dots, T$) and consider an $R^T$ contingency table ($T \geq 3$). Let $p_{i_1 \dots i_T}$ denote the probability that an observation will fall in the $(i_1, \dots, i_T)$th cell of the table.

Agresti [1] defined the S model as

$$p_{i_1 \dots i_T} = p_{j_1 \dots j_T},$$

for any permutation $(j_1, \dots, j_T)$ of $(i_1, \dots, i_T)$, with $i_t = 1, \dots, R$; $t = 1, \dots, T$.

For example, when $T = 3$ and $X$, $Y$ and $Z$ denote the row, column and layer variables, the S model can be expressed as

$$p_{ijk} = p_{ikj} = p_{jik} = p_{jki} = p_{kij} = p_{kji}.$$
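To make the constraint concrete, the sketch below groups the cells of a $3 \times 3 \times 3$ table into classes that must share one probability under the three-way S model. Two cells belong to the same class when their index triples are permutations of each other, so the sorted triple is used as the class key. The variable names are illustrative only.

```python
from itertools import product

R, T = 3, 3  # number of categories and number of variables

# Cells whose indices are permutations of each other share one probability.
classes = {}
for cell in product(range(1, R + 1), repeat=T):
    classes.setdefault(tuple(sorted(cell)), []).append(cell)

n_classes = len(classes)
for key, cells in classes.items():
    print(key, len(cells))
```

With $R = T = 3$, the 27 cells collapse into 10 classes: three of size 1 (the main diagonal), six of size 3 and one of size 6.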

The simplest possible model of interest is the model of complete independence, where the joint distribution of the three variables is the product of the marginals. The corresponding hypothesis is

$$H_0: p_{ijk} = p_{i \cdot \cdot}\, p_{\cdot j \cdot}\, p_{\cdot \cdot k}.$$

The symmetry model for multiway tables is given in general as follows:

$$p_{i_1 \dots i_T} = \left(\prod_{i=1}^{T} \alpha_i\right)\left(\prod_{i=1}^{T} \alpha_i\right)\psi_{i_1 \dots i_T} \tag{1.1}$$

The common schemes for representing contingency tables are based on row, column and layer variables that are independent. In three-way contingency tables, the choice of predictor and control variables is of interest to many researchers. The purpose of this paper is to give some models that represent subsymmetry and asymmetry in multiway contingency tables. We concentrate on three-dimensional tables, which cross-classify observations by the levels of three categorical variables.

The models are defined in the subsymmetry and asymmetry context, taking the first variable as a control variable. The models below are often used to analyze three-dimensional tables.

| Model | Terms |
|---|---|
| Saturated | (XYZ) |
| Homogeneous associations | (XY, XZ, YZ) |
| Conditional independence | (XY, XZ), (XY, YZ), (XZ, YZ) |
| Joint independence | (XY, Z), (XZ, Y), (YZ, X) |
| Complete independence | (X, Y, Z) |

## 2 Subsymmetry and asymmetry models

We collect the triplet $(X, Y, Z)$ for each unit in a sample of $n$ units; the data can then be summarized as a three-dimensional table. Let $p_{ijk}$ be the probability that a unit has $X = i$, $Y = j$ and $Z = k$. In what follows, we define some models that represent subsymmetry and asymmetry.

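Model 1:

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{l=1}^{L}\omega_l\right)\delta\,\nu\,\eta,$$

$j = 1, \dots, C$; $k = 1, \dots, K$; $s = 1, 2$; $l = 1, 2, 3, 4$; $\sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 2:

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{l=1}^{L}\omega_l\right)\nu\,\eta,$$

$j = 1, \dots, C$; $k = 1, \dots, K$; $s = 1, 2$; $l = 1, 2, 3, 4$; $\sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 3:

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{l=1}^{L}\omega_l\right)\tau\,\eta\,\nu,$$

$i = 1, \dots, R$; $j = 1, \dots, C$; $k = 1, \dots, K$; $s = 1, 2$; $l = 1, 2, 3, 4$; $\sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 4:

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{l=1}^{L}\omega_l\right)\nu\,\eta,$$

$i = 1, \dots, R$; $j = 1, \dots, C$; $k = 1, \dots, K$; $s = 2, 3, 5$; $l = 2, 3, 5$; $\sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 5:

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{m=1}^{M}\theta_m\right)\tau\,\delta\,\nu,$$

$i = 1, \dots, R$; $j = 1, \dots, C$; $k = 1, \dots, K$; $s = 2, 3, 5$; $m = 2, 3, 5$; $\sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 6:

$$p_{ijk} = \left(\prod_{i=1}^{R}\alpha_i\right)\left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{m=1}^{M}\theta_m\right)\left(\prod_{l=1}^{L}\omega_l\right),$$

$i = 1, \dots, R$; $j = 1, \dots, C$; $k = 1, \dots, K$; $s = 2, 3, 5$; $m = 2, 3, 5$; $l = 2, 3, 5$; $\sum_{i=1}^{R}\alpha_i = \sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 7:

$$p_{ijk} = \left(\prod_{i=1}^{R}\alpha_i\right)\left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{m=1}^{M}\theta_m\right)\nu\,\tau\,\eta,$$

$i = 1, \dots, R$; $j = 1, \dots, C$; $k = 1, \dots, K$; $s = 2, 3, 5$; $m = 2, 3, 5$; $\sum_{i=1}^{R}\alpha_i = \sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 8:

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{m=1}^{M}\theta_m\right)\eta\,\xi,$$

$i = 1, \dots, R$; $j = 1, \dots, C$; $k = 1, \dots, K$; $s = 2, 3, 5$; $m = 2, 3, 5$; $\sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 9:

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{m=1}^{M}\theta_m\right)\eta\,\xi\,\nu,$$

$i = 1, \dots, R$; $j = 1, \dots, C$; $k = 1, \dots, K$; $s = 2, 3, 5$; $m = 2, 3, 5$; $\sum_{j=1}^{C}\beta_j = \sum_{k=1}^{K}\gamma_k = 0$.

Model 10:

$$p_{ijk} = \left(\prod_{l=1}^{L}\omega_l\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{m=1}^{M}\theta_m\right),$$

$l = 1, \dots, 6$; $s = 1, \dots, 6$; $m = 1, \dots, 6$.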

The parameters in the models and the corresponding symbols in the design matrices are defined as:

• α: row parameter (X); β: column parameter (Y);

• γ: layer parameter (Z); ψ: symmetry parameter (S);

• ω: sub-symmetry parameter for X × Z (B);

• θ: sub-symmetry parameter for X × Y (W);

• τ: conditional symmetry parameter for Y × Z (CS);

• δ: inverse diagonal matrix for X × Z (SSS);

• ξ: diagonal asymmetry parameter (DA);

• η: upper triangle parameter (CCS);

• ν: main diagonal parameter (V).

Each model is in log-linear form, so its degrees of freedom equal the number of cells in the table minus the number of parameters to be fitted. For Model (1), for instance, the degrees of freedom are:

$$27 - [1 + 2 + 2 + 2 + 4 + 1 + 1 + 1] = 13.$$
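The same bookkeeping can be written out explicitly. In the sketch below, the assignment of each count to a named term (constant, Y, Z, S, B, CCS, SSS, V) is our reading of the sum above against the terms listed for Model (1), and should be treated as an assumption.

```python
# Residual degrees of freedom = number of cells - number of fitted parameters.
n_cells = 3 ** 3  # 3 x 3 x 3 table

# Assumed mapping of the counts 1 + 2 + 2 + 2 + 4 + 1 + 1 + 1 to model terms.
model1_params = {
    "constant": 1,
    "Y": 2,
    "Z": 2,
    "S": 2,
    "B": 4,
    "CCS": 1,
    "SSS": 1,
    "V": 1,
}

n_params = sum(model1_params.values())
residual_df = n_cells - n_params
print(n_params, residual_df)
```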

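Subsymmetry matrices are defined for each pair of dimensions as:

$$\text{For } X \times Y:\; W = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}, \quad \text{for } X \times Z:\; B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}, \quad \text{for } Y \times Z:\; S = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}.$$

The V matrix marks the cells on the main diagonal of X × Y × Z. Written as three consecutive 3 × 3 blocks of nine cells each, it is

$$V = \left[\; \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},\; \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},\; \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \;\right].$$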

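The conditional factor variables are defined for the asymmetric associations as follows.

Conditional symmetry matrix:

$$\text{For } \{Y \times Z \mid i = 1, 2\}:\; CS = \begin{bmatrix} 1 & 2 & 2 \\ 3 & 1 & 2 \\ 3 & 3 & 1 \end{bmatrix}$$

Upper triangle matrix:

$$\text{For } \{Y \times Z \mid i = 3\}:\; CCS = \begin{bmatrix} 1 & 2 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$

Diagonal asymmetry matrix:

$$\text{For } \{Y \times Z \mid i = 1, 2\}:\; DA = \begin{bmatrix} 5 & 1 & 2 \\ 3 & 5 & 1 \\ 4 & 3 & 5 \end{bmatrix}$$

Inverse diagonal matrix:

$$\text{For } \{Y \times Z \mid i = 1, 2, 3\}:\; SSS = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$

Using these factors, we analyze the models by the GLM approach.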
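The patterned factor matrices above can be generated programmatically rather than typed by hand. The following is a minimal sketch for an $R \times R$ layout; the function names are ours and purely illustrative. The DA matrix is omitted because its labelling is specific to the 3 × 3 case given above.

```python
def subsymmetry_labels(r):
    """Label unordered pairs {i, j} consecutively so cells (i, j) and (j, i) match."""
    labels = [[0] * r for _ in range(r)]
    next_label = 1
    for i in range(r):
        for j in range(i, r):
            labels[i][j] = labels[j][i] = next_label
            next_label += 1
    return labels


def conditional_symmetry(r):
    """1 on the diagonal, 2 above it, 3 below it (the CS pattern)."""
    return [[1 if i == j else (2 if i < j else 3) for j in range(r)] for i in range(r)]


def upper_triangle(r):
    """1 on the diagonal, 2 above it, 0 below it (the CCS pattern)."""
    return [[1 if i == j else (2 if i < j else 0) for j in range(r)] for i in range(r)]


def inverse_diagonal(r):
    """1 on the anti-diagonal, 0 elsewhere (the SSS pattern)."""
    return [[1 if i + j == r - 1 else 0 for j in range(r)] for i in range(r)]


W = subsymmetry_labels(3)
for row in W:
    print(row)
```

For `r = 3` these functions return the W (equivalently B and S), CS, CCS and SSS matrices shown in this section.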

## 3 Numerical example

The data in Table 1 are taken directly from Yamamoto et al. [15] and give results for the treatment group only in randomized clinical trials conducted by a pharmaceutical company in anemic patients with cancer receiving chemotherapy. The response is the patient's hemoglobin (Hb) concentration at baseline (before treatment) and after 4 and 8 weeks of treatment. The Hb response is classified as ≥ 10 g/dl, 8-10 g/dl and < 8 g/dl. For reference, the normal ranges of hemoglobin concentration in adults are 14.0-17.5 g/dl for men and 12.3-15.3 g/dl for women.

Table 1

Hemoglobin concentration at baseline, 4 weeks and 8 weeks in carcinomatous anemia patients from a randomized clinical trial.

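| Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
|---|---|---|---|---|
| ≥ 10 g/dl | ≥ 10 g/dl | 77 | 7 | 1 |
| 8-10 g/dl | ≥ 10 g/dl | 43 | 7 | 0 |
| < 8 g/dl | ≥ 10 g/dl | 3 | 0 | 0 |
| ≥ 10 g/dl | 8-10 g/dl | 3 | 8 | 1 |
| 8-10 g/dl | 8-10 g/dl | 17 | 16 | 5 |
| < 8 g/dl | 8-10 g/dl | 3 | 8 | 1 |
| ≥ 10 g/dl | < 8 g/dl | 1 | 1 | 1 |
| 8-10 g/dl | < 8 g/dl | 0 | 2 | 3 |
| < 8 g/dl | < 8 g/dl | 0 | 4 | 3 |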

Models (1)-(10) proposed here attempt to analyze the relationship between X, Y and Z, taking “Baseline” as the control.

An example design matrix is given for Model (8) in Table 2.

Table 2

Design matrix of Model (8).

X Y Z Parameter
Constant [Y = l] [Y = 2] [Z = l] [Z = 2] S2 S3 S5 DA W2 W3 W5 CCS
1 1 1 0 1 0 0 0 0 5 0 0 0 0
1 2 1 1 0 0 1 1 0 0 1 0 0 0 0
3 1 1 0 0 0 0 1 0 2 0 0 0 0
1 1 0 1 1 0 1 0 0 3 1 0 0 0
1 2 2 1 0 1 0 1 0 0 0 5 1 0 0 0
3 1 0 1 0 0 0 0 1 1 1 0 0 0
1 1 0 0 1 0 0 1 0 4 0 1 0 0
3 2 1 0 0 0 1 0 0 1 3 0 1 0 0
3 1 0 0 0 0 0 0 0 5 0 1 0 0
1 1 1 0 1 0 0 0 0 5 1 0 0 0
1 2 1 1 0 0 1 1 0 0 1 1 0 0 0
3 1 1 0 0 0 0 1 0 2 1 0 0 0
1 1 0 1 1 0 1 0 0 3 0 0 0 0
2 2 2 1 0 1 0 1 0 0 0 5 0 0 0 0
3 1 0 1 0 0 0 0 1 1 0 0 0 0
1 1 0 0 1 0 0 1 0 4 0 0 1 0
3 2 1 0 0 0 1 0 0 1 3 0 0 1 0
3 1 0 0 0 0 0 0 0 5 0 0 1 0
1 1 1 0 1 0 0 0 0 0 0 1 0 1
1 2 1 1 0 0 1 1 0 0 0 0 1 0 2
3 1 1 0 0 0 0 1 0 0 0 1 0 2
1 1 0 1 1 0 1 0 0 0 0 0 1 0
3 2 2 1 0 1 0 1 0 0 0 0 0 0 1 1
3 1 0 1 0 0 0 0 1 0 0 0 1 2
1 1 0 0 1 0 0 1 0 0 0 0 0 0
3 2 1 0 0 0 1 0 0 1 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0 0 0 0 1
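The rows of Table 2 appear to follow directly from the factor matrices of Section 2: S is indexed by (Y, Z), W by (X, Y), DA enters as a numeric score for X = 1, 2, and CCS as a numeric score for X = 3. The sketch below rebuilds the design matrix under that reading; the function and variable names are ours, and the indexing convention is an inference from the table rather than something the paper states.

```python
from itertools import product

# Factor matrices from Section 2 (1-based categories, stored 0-based).
S = W = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]
DA = [[5, 1, 2], [3, 5, 1], [4, 3, 5]]
CCS = [[1, 2, 2], [0, 1, 2], [0, 0, 1]]

COLUMNS = ["Constant", "Y=1", "Y=2", "Z=1", "Z=2",
           "S2", "S3", "S5", "DA", "W2", "W3", "W5", "CCS"]


def model8_row(x, y, z):
    """Design-matrix row of Model (8) for the cell (X=x, Y=y, Z=z)."""
    s = S[y - 1][z - 1]  # Y x Z subsymmetry label
    w = W[x - 1][y - 1]  # X x Y subsymmetry label
    return [
        1,
        int(y == 1), int(y == 2),
        int(z == 1), int(z == 2),
        int(s == 2), int(s == 3), int(s == 5),
        DA[y - 1][z - 1] if x in (1, 2) else 0,  # DA applies to X = 1, 2
        int(w == 2), int(w == 3), int(w == 5),
        CCS[y - 1][z - 1] if x == 3 else 0,      # CCS applies to X = 3
    ]


design = {cell: model8_row(*cell) for cell in product((1, 2, 3), repeat=3)}
residual_df = len(design) - len(COLUMNS)
print(design[(1, 1, 1)], residual_df)
```

With 27 cells and 13 columns, this leaves 14 residual degrees of freedom, matching the value reported for Model (8).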

Design matrices are generated for each model. Likelihood ratio chi-square values with their associated degrees of freedom, AIC and BIC are given in Table 3. In addition to the goodness-of-fit tests, model comparisons based on these criteria give better information on which model represents the data best.

Table 3

Model results under various models.

| Model | Terms | Likelihood ratio chi-square | Degrees of freedom | P-value | BIC | AIC |
|---|---|---|---|---|---|---|
| 1 | Y, Z, S1, S2, B1, B2, B3, B4, CCS, SSS, V | 18.043 | 13 | 0.156 | −51.771 | −7.957 |
| 2 | Y, Z, S1, S2, B1, B2, B3, B4, V, CCS | 19.531 | 14 | 0.146 | −55.658 | −8.469 |
| 3 | Y, Z, S1, S2, B1, B2, B3, B4, CS, CCS, V | 19.443 | 13 | 0.110 | −50.059 | −6.557 |
| 4 | Y, Z, S2, S3, S5, B2, B3, B5, V | 20.268 | 14 | 0.122 | −54.922 | −7.73 |
| 5 | Y, Z, V2, S2, S3, S5, B2, B3, B5, CS, CCS, SSS | 19.953 | 13 | 0.096 | −49.865 | −6.047 |
| 6 | B, Y, Z, S2, S3, S5, B2, B3, B5, W2, W3, W5, V | 12.943 | 10 | 0.227 | −40.763 | −7.057 |
| 7 | B, Y, S2, S3, S5, W2, W3, W5, V, CS, CCS | 20.825 | 13 | 0.076 | −48.990 | −5.175 |
| 8 | Y, Z, S2, S3, S5, DA, W2, W3, W5, CCS | 17.623 | 14 | 0.225 | −57.565 | −10.38 |
| 9 | Y, Z, S2, S3, S5, DA, W2, W3, W5, CCS, V | 15.694 | 13 | 0.266 | −54.124 | −10.306 |
| 10 | B1, B2, B3, B4, B5, B6, S1, S2, S3, S4, S5, S6, W1, W2, W3, W4, W5, W6 | 13.293 | 11 | 0.275 | −45.780 | −8.707 |

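The information criteria in Table 3 appear consistent with the conventions AIC = G² − 2·df and BIC = G² − df·ln(n), where G² is the likelihood ratio chi-square and n = 215 is the total count in Table 1. This is an inference from the reported numbers, not a definition stated in the paper; the sketch below checks it for Models (1) and (2).

```python
import math


def aic(g2, df):
    """AIC in the 'G2 minus twice the residual df' convention (assumed)."""
    return g2 - 2 * df


def bic(g2, df, n):
    """BIC in the 'G2 minus df times log n' convention (assumed)."""
    return g2 - df * math.log(n)


n = 215  # total of the 27 observed counts in Table 1

# (G2, df) as reported in Table 3.
reported = {1: (18.043, 13), 2: (19.531, 14)}
criteria = {m: (aic(g2, df), bic(g2, df, n)) for m, (g2, df) in reported.items()}
for m, (a, b) in criteria.items():
    print(f"Model {m}: AIC = {a:.3f}, BIC = {b:.3f}")
```

Under both conventions, smaller (more negative) values indicate a better trade-off between fit and parsimony.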

Table 4

Parameter estimates under Model (8).

| Parameter | Estimate | Std. Error | Z | Sig. | 95% CI lower bound | 95% CI upper bound |
|---|---|---|---|---|---|---|
| Constant | 1.377 | 0.560 | 2.460 | 0.014 | 0.280 | 2.474 |
| [Y = 1] | 1.181 | 0.376 | 3.146 | 0.002 | 0.445 | 1.918 |
| [Y = 2] | 0.502 | 0.343 | 1.462 | 0.144 | −0.171 | 1.175 |
| [Y = 3] | 0ᵃ | · | · | · | · | · |
| [Z = 1] | 1.375 | 0.374 | 3.674 | 0.000 | 0.641 | 2.108 |
| [Z = 2] | 0.576 | 0.339 | 1.702 | 0.089 | −0.087 | 1.240 |
| [Z = 3] | 0ᵃ | · | · | · | · | · |
| S2 | −1.026 | 0.331 | −3.097 | 0.002 | −1.675 | −0.377 |
| S3 | −3.283 | 0.769 | −4.269 | 0.000 | −4.790 | −1.776 |
| S5 | −0.607 | 0.422 | −1.439 | 0.150 | −1.433 | 0.220 |
| W2 | −0.679 | 0.156 | −4.361 | 0.000 | −0.985 | −0.374 |
| W3 | −2.298 | 0.511 | −4.499 | 0.000 | −3.299 | −1.297 |
| W5 | −0.669 | 0.317 | −2.107 | 0.035 | −1.291 | −0.047 |
| CCS | −0.198 | 0.404 | −0.491 | 0.624 | −0.990 | 0.593 |
| DA | 0.087 | 0.089 | 0.975 | 0.329 | −0.088 | 0.261 |


Table 5

Odds ratios under Model (8).

| Odds ratio | Baseline ≥ 10 g/dl | Baseline 8-10 g/dl | Baseline < 8 g/dl |
|---|---|---|---|
| θ11 | 13.10 | 13.10 | 7.78 |
| θ12 | 3.37 | 3.37 | 4.24 |
| θ21 | 3.99 | 4.02 | 6.35 |
| θ22 | 5.69 | 5.66 | 3.36 |


Table 6

Parameter estimates under Model (9).

| Parameter | Estimate | Std. Error | Z | Sig. | 95% CI lower bound | 95% CI upper bound |
|---|---|---|---|---|---|---|
| Constant | 1.638 | 0.609 | 2.690 | 0.007 | 0.445 | 2.832 |
| [Y = 1] | 1.225 | 0.380 | 3.225 | 0.001 | 0.481 | 1.970 |
| [Y = 2] | 0.529 | 0.347 | 1.524 | 0.127 | −0.151 | 1.210 |
| [Y = 3] | 0ᵃ | · | · | · | · | · |
| [Z = 1] | 1.392 | 0.366 | 3.799 | 0.000 | 0.674 | 2.110 |
| [Z = 2] | 0.590 | 0.335 | 1.762 | 0.078 | −0.066 | 1.247 |
| [Z = 3] | 0ᵃ | · | · | · | · | · |
| S2 | −1.252 | 0.376 | −3.332 | 0.001 | −1.988 | −0.515 |
| S3 | −3.502 | 0.792 | −4.422 | 0.000 | −5.054 | −1.949 |
| S5 | −0.800 | 0.458 | −1.749 | 0.080 | −1.697 | 0.096 |
| W2 | −1.065 | 0.325 | −3.277 | 0.001 | −1.701 | −0.428 |
| W3 | −2.642 | 0.573 | −4.607 | 0.000 | −3.765 | −1.518 |
| W5 | −0.916 | 0.371 | −2.471 | 0.013 | −1.642 | −0.189 |
| CCS | −0.141 | 0.420 | −0.337 | 0.736 | −0.965 | 0.682 |
| DA | 0.111 | 0.093 | 1.197 | 0.231 | −0.071 | 0.294 |
| V | −0.479 | 0.350 | −1.371 | 0.170 | −1.165 | 0.206 |


Table 7

Odds ratios under Model (9).

| Odds ratio | Baseline ≥ 10 g/dl | Baseline 8-10 g/dl | Baseline < 8 g/dl |
|---|---|---|---|
| θ11 | 14.76 | 14.76 | 12.21 |
| θ12 | 2.44 | 3.94 | 3.75 |
| θ21 | 3.07 | 4.93 | 4.91 |
| θ22 | 9.67 | 5.98 | 3.07 |


Table 8

Expected frequencies under Model (8).

| Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
|---|---|---|---|---|
| ≥ 10 g/dl | ≥ 10 g/dl | 78.83 | 8.98 | 0.58 |
| 8-10 g/dl | ≥ 10 g/dl | 6.10 | 9.12 | 1.97 |
| < 8 g/dl | ≥ 10 g/dl | 0.08 | 0.5 | 0.6 |
| ≥ 10 g/dl | 8-10 g/dl | 39.96 | 4.56 | 0.29 |
| 8-10 g/dl | 8-10 g/dl | 12.04 | 17.99 | 3.89 |
| < 8 g/dl | 8-10 g/dl | 0.43 | 2.56 | 3.13 |
| ≥ 10 g/dl | < 8 g/dl | 4.21 | 0.56 | 0.03 |
| 8-10 g/dl | < 8 g/dl | 4.76 | 4.89 | 1.23 |
| < 8 g/dl | < 8 g/dl | 0.59 | 3.85 | 3.25 |


Table 9

Expected frequencies under Model (9).

| Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
|---|---|---|---|---|
| ≥ 10 g/dl | ≥ 10 g/dl | 76.19 | 10.11 | 0.66 |
| 8-10 g/dl | ≥ 10 g/dl | 4.84 | 9.45 | 1.51 |
| < 8 g/dl | ≥ 10 g/dl | 0.06 | 0.42 | 0.64 |
| ≥ 10 g/dl | 8-10 g/dl | 42.44 | 3.49 | 0.23 |
| 8-10 g/dl | 8-10 g/dl | 14.05 | 17.04 | 4.39 |
| < 8 g/dl | 8-10 g/dl | 0.39 | 2.33 | 3.59 |
| ≥ 10 g/dl | < 8 g/dl | 4.36 | 0.49 | 0.03 |
| 8-10 g/dl | < 8 g/dl | 4.02 | 5.45 | 1.18 |
| < 8 g/dl | < 8 g/dl | 0.62 | 4.17 | 2.77 |

The results show that all models fit the data well. The smallest values of both AIC and BIC are obtained for Model (8). Note that Models (8) and (9) are conditional models that collapse over the baseline variable. Recall that Model (8) is

$$p_{ijk} = \left(\prod_{j=1}^{C}\beta_j\right)\left(\prod_{k=1}^{K}\gamma_k\right)\left(\prod_{s=1}^{S}\psi_s\right)\left(\prod_{m=1}^{M}\theta_m\right)\eta\,\xi.$$

Correspondingly, denoting the expected frequencies by $m_{ijk}$, Model (8) is represented as

$$\log(m_{ijk}) = Y + Z + S2 + S3 + S5 + W2 + W3 + W5 + DA + CCS.$$

In this model representation, “Baseline” is the control variable; therefore, it is not included among the parameters.
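To illustrate how the log-linear form produces fitted values, the sketch below combines the Model (8) estimates from Table 4 with the covariate values of two cells taken from the design matrix in Table 2, then exponentiates the linear predictor. The dictionary layout and function name are ours; small differences from Table 8 are expected because the published estimates are rounded to three decimals.

```python
import math

# Parameter estimates copied from Table 4 (Model 8).
estimates = {
    "Constant": 1.377, "Y=1": 1.181, "Y=2": 0.502, "Z=1": 1.375, "Z=2": 0.576,
    "S2": -1.026, "S3": -3.283, "S5": -0.607,
    "W2": -0.679, "W3": -2.298, "W5": -0.669,
    "CCS": -0.198, "DA": 0.087,
}


def fitted_frequency(covariates):
    """exp(linear predictor) for one cell, given its non-zero covariates."""
    eta = sum(estimates[name] * value for name, value in covariates.items())
    return math.exp(eta)


# Non-zero design-matrix entries for cells (X, Y, Z) = (1, 1, 1) and (1, 1, 2).
m_111 = fitted_frequency({"Constant": 1, "Y=1": 1, "Z=1": 1, "DA": 5})
m_112 = fitted_frequency({"Constant": 1, "Y=1": 1, "Z=2": 1, "S2": 1, "DA": 1})
print(round(m_111, 2), round(m_112, 2))
```

These two values can be compared with the first two entries of Table 8 (78.83 and 8.98).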

Model (8) tests the hypothesis $p_{ijk} = \beta_j \gamma_k \psi_2 \psi_3 \psi_5 \theta_2 \theta_3 \theta_5\, \eta\, \xi$ and uses the Y × Z table frequencies. For a patient whose hemoglobin level at baseline is ≥ 10 g/dl, the estimated odds ratio of 13.10 indicates that a level of ≥ 10 g/dl rather than 8-10 g/dl is 13.10 times more likely at the consecutive 4- and 8-week measurements.

The Hb concentration tends to decrease from baseline through week 8, since the maximum likelihood estimates are less than 1.

Similarly, under Model (9), given that a patient's Hb concentration at 4 weeks is ≥ 10 g/dl, the odds that the patient's level at baseline is ≥ 10 g/dl are 14.76 times higher than the odds that the patient's level at 8 weeks is ≥ 10 g/dl instead of 8-10 g/dl.

The odds ratios greater than one under Models (8) and (9) indicate that an Hb concentration of ≥ 10 g/dl is more likely to occur at baseline than after 4 and 8 weeks.
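The reported odds ratios look like local odds ratios, $\theta_{ij} = m_{ij}\, m_{i+1,j+1} / (m_{i,j+1}\, m_{i+1,j})$, computed within each 3 × 3 block of expected frequencies. The paper does not state this formula, so it is an assumption here; the sketch applies it to the first three rows of Table 8 and compares the result with the first column of Table 5.

```python
def local_odds_ratios(m):
    """Local odds ratios of adjacent 2x2 sub-tables of a square table m."""
    r = len(m)
    return [
        [m[i][j] * m[i + 1][j + 1] / (m[i][j + 1] * m[i + 1][j]) for j in range(r - 1)]
        for i in range(r - 1)
    ]


# First three rows of Table 8 (expected frequencies under Model 8).
block = [
    [78.83, 8.98, 0.58],
    [6.10, 9.12, 1.97],
    [0.08, 0.50, 0.60],
]

theta = local_odds_ratios(block)
for row in theta:
    print([round(value, 2) for value in row])
```

The first row of the result is close to the reported 13.10 and 3.37. The second row is sensitive to rounding in the very small expected counts (for example 0.08), so agreement there is only approximate.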

## 4 Conclusions

We considered subsymmetry models for multiway square contingency tables in which the main diagonal is not of interest. The models are designed to analyze square multidimensional contingency tables with ordered categories. The results show that the models described here can be applied to a multiway table. To illustrate the proposed models, we applied them to a data set of patients' hemoglobin concentrations. The response was the patient's hemoglobin (Hb) concentration at baseline (before treatment) and after 4 weeks and 8 weeks of treatment. The primary goal was to compare the baseline levels with those at the 4th and 8th weeks, taking the baseline as a layer variable. We were interested in how a patient's Hb concentration changes from baseline over time, and in particular in whether the transition of those concentrations is asymmetric given the concentration at baseline. An advantage of the models proposed here is that they allow the conditional odds ratios to be analyzed as well as the parameter estimates. Extensions to k-way tables are straightforward.

### References

[1] Agresti A., Analysis of Ordinal Categorical Data, 2nd Edition, John Wiley, Hoboken, 2002.

[2] Yamamoto K., Tomizawa S., Analysis of Unaided Vision Data Using New Decomposition of Symmetry, American Medical Journal, 2012, 3(1), 37–42.

[3] Tomizawa S., Tahata K., The Analysis of Symmetry and Asymmetry: Orthogonality of Decomposition of Symmetry into Quasi-Symmetry and Marginal Symmetry for Multi-Way Tables, Journal de la Société Française de Statistique, 2007, 148(3), 3–36.

[4] Tahata K., Tomizawa S., Orthogonal Decomposition of Point-Symmetry for Multiway Tables, Advances in Statistical Analysis, 2008, 92(3), 255–269.

[5] Tahata K., Tomizawa S., Generalized Linear Asymmetry Model and Decomposition of Symmetry for Multiway Contingency Tables, J Biomet Biostat., 2011, 2(4), 1–6.

[6] Agresti A., A Simple Diagonals-Parameter Symmetry and Quasi-Symmetry Model, Statistics and Probability Letters, 1983, 1(6), 313–316.

[7] Miyamoto N., Ohtsuka W., Tomizawa S., Linear Diagonals-Parameter Symmetry and Quasi-Symmetry Models for Cumulative Probabilities in Square Contingency Tables with Ordered Categories, Biometrical Journal, 2004, 46(6), 664–674.

[8] Iki K., Yamamoto K., Tomizawa S., Quasi-diagonal exponent symmetry model for square contingency tables with ordered categories, Statistics and Probability Letters, 2014, 92, 33–38.

[9] Goodman L.A., Multiplicative models for square contingency tables with ordered categories, Biometrika, 1979, 66, 413–418.

[10] Bishop Y.M.M., Fienberg S.E., Holland P.W., Discrete Multivariate Analysis: Theory and Practice, MIT Press, 1975.

[11] Caussinus H., Contribution à l'analyse statistique des tableaux de corrélation, Annales de la Faculté des Sciences de l’Université de Toulouse, 1966, 29, 77–182.

[12] McCullagh P., A Class of Parametric Models for the Analysis of Square Contingency Tables with Ordered Categories, Biometrika, 1978, 65, 413–418.

[13] Kateri M., Agresti A., A class of ordinal quasi-symmetry models for square contingency tables, Statistics & Probability Letters, 2007, 77, 598–603.

[14] Bowker A.H., A Test for Symmetry in Contingency Tables, Journal of the American Statistical Association, 1948, 43(244), 572–574.

[15] Yamamoto H., Iwashita T., Tomizawa S., Decomposition of Symmetry into Ordinal Quasi-Symmetry and Marginal Equimoment for Multi-way Tables, Austrian Journal of Statistics, 2007, 36(4), 291–306.