BY-NC-ND 3.0 license Open Access Published by De Gruyter Open Access April 6, 2016

Subsymmetry and asymmetry models for multiway square contingency tables with ordered categories

Serpil Aktaş
From the journal Open Mathematics

Abstract

This paper suggests several models that describe the symmetry and asymmetry structure of each subdimension of a multiway square contingency table with ordered categories. A classical three-way categorical example is examined to illustrate the model results. These models analyze the subsymmetric and asymmetric structure of the table.

MSC 2010: 62H17

1 Introduction

Square contingency tables with the same row and column categories occur frequently in applied sciences. Such tables arise from tabulating repeated measurements of a categorical response variable. Examples include: subjects measured at two different points in time (e.g., responses before and after an experiment); the decisions of two experts on the same set of subjects (e.g., the grading of the same cancer tumors by two specialists); measurements on two similar units in a sample (e.g., the grades of vision of the left and right eyes); and matched-pair experiments (e.g., the social status of fathers and sons) [1]. Several models have been proposed for square contingency tables (see, for example, [2-8]), but the models of symmetry (S), quasi-symmetry (QS) and marginal homogeneity (MH) are the classical and well-known ones [9,10], and their application is straightforward. The QS model is less restrictive than the S model [11-13].

Consider an R×R square contingency table with the same row and column classifications. Let $p_{ij}$ denote the probability that an observation will fall in the ith row and jth column of the table. Bowker [14] considered the symmetry (S) model for R×R tables defined by

$$p_{ij} = p_{ji} \quad (i \neq j).$$

The S model implies that the probability that an observation will fall in cell (i, j) of the table is equal to the probability that it falls in cell (j, i).
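As an illustration of how the S model is usually checked on observed counts, the sketch below computes Bowker's chi-square statistic, $\sum_{i<j}(n_{ij}-n_{ji})^2/(n_{ij}+n_{ji})$. The formula is the standard one associated with [14] and is not spelled out in this paper; the function name `bowker_statistic` and the counts are hypothetical and serve only as an example.

```python
def bowker_statistic(table):
    """Return (X2, df) for Bowker's test of symmetry on a square table of counts."""
    r = len(table)
    x2 = 0.0
    df = 0
    for i in range(r):
        for j in range(i + 1, r):
            total = table[i][j] + table[j][i]
            if total > 0:
                # Off-diagonal pairs with no observations are skipped
                # (a common convention, assumed here).
                x2 += (table[i][j] - table[j][i]) ** 2 / total
                df += 1
    return x2, df


# Hypothetical 3x3 counts, for illustration only.
counts = [
    [20, 5, 2],
    [9, 15, 4],
    [6, 8, 10],
]
x2, df = bowker_statistic(counts)
print(round(x2, 3), df)
```

A perfectly symmetric table gives a statistic of zero, so large values relative to a chi-square distribution with `df` degrees of freedom speak against the S model.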

A multiway contingency table is obtained when a sample of n observations is cross-classified with respect to T categorical variables having the same number of categories. Such tables are common in panel studies and matched-pair designs. The symmetry model can also be defined for the multidimensional case.

Denote the kth categorical variable by $X_k$ ($k = 1, \ldots, T$) and consider an $R^T$ contingency table ($T \geq 3$). Let $p_{i_1 \ldots i_T}$ denote the probability that an observation will fall in the $(i_1, \ldots, i_T)$th cell of the table.

Agresti [1] defined the S model as

$$p_{i_1 \ldots i_T} = p_{j_1 \ldots j_T},$$

for any permutation $(j_1, \ldots, j_T)$ of $(i_1, \ldots, i_T)$, with $i_t = 1, \ldots, R$; $t = 1, \ldots, T$.

For example, when T = 3, let X, Y and Z denote the row, column and layer variables. The S model can then be expressed as

$$p_{ijk} = p_{ikj} = p_{jik} = p_{jki} = p_{kij} = p_{kji}.$$
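The permutation condition can be checked mechanically. The following sketch, with hypothetical helper names `is_completely_symmetric` and `symmetric_table`, tests whether a three-way array of probabilities is invariant under every permutation of its indices; the example table is constructed for illustration and does not come from the paper.

```python
from itertools import permutations, product


def is_completely_symmetric(p, tol=1e-12):
    """True if p[i][j][k] is equal for all permutations of (i, j, k)."""
    r = len(p)
    for i, j, k in product(range(r), repeat=3):
        for a, b, c in permutations((i, j, k)):
            if abs(p[i][j][k] - p[a][b][c]) > tol:
                return False
    return True


def symmetric_table(r):
    """Build an illustrative table whose cells depend only on the sorted index triple."""
    raw = [[[0.0] * r for _ in range(r)] for _ in range(r)]
    for i, j, k in product(range(r), repeat=3):
        key = sorted((i, j, k))
        raw[i][j][k] = 1.0 + sum(key) + key[0] * key[2]  # arbitrary positive weight
    total = sum(raw[i][j][k] for i, j, k in product(range(r), repeat=3))
    return [[[raw[i][j][k] / total for k in range(r)] for j in range(r)] for i in range(r)]


p = symmetric_table(3)
print(is_completely_symmetric(p))
```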

The simplest possible model of interest is the model of complete independence, where the joint distribution of the three variables is the product of the marginals. The corresponding hypothesis is

$$H_0: p_{ijk} = p_{i\cdot\cdot}\, p_{\cdot j\cdot}\, p_{\cdot\cdot k}.$$
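Under this hypothesis the fitted cell probabilities are simply products of the three one-way marginal proportions. A minimal sketch follows; the function name `independence_fit` and the 2×2×2 counts are hypothetical, chosen so that the table factorizes exactly.

```python
from itertools import product


def independence_fit(counts):
    """Fitted probabilities p_i.. * p_.j. * p_..k for a three-way table of counts."""
    n_x, n_y, n_z = len(counts), len(counts[0]), len(counts[0][0])
    n = sum(counts[i][j][k] for i, j, k in product(range(n_x), range(n_y), range(n_z)))
    # One-way marginal proportions for X, Y and Z.
    p_x = [sum(counts[i][j][k] for j in range(n_y) for k in range(n_z)) / n for i in range(n_x)]
    p_y = [sum(counts[i][j][k] for i in range(n_x) for k in range(n_z)) / n for j in range(n_y)]
    p_z = [sum(counts[i][j][k] for i in range(n_x) for j in range(n_y)) / n for k in range(n_z)]
    return [[[p_x[i] * p_y[j] * p_z[k] for k in range(n_z)]
             for j in range(n_y)] for i in range(n_x)]


# Hypothetical counts that factorize exactly as a[i] * b[j] * c[k].
a, b, c = [1, 2], [3, 1], [2, 2]
counts = [[[a[i] * b[j] * c[k] for k in range(2)] for j in range(2)] for i in range(2)]
fitted = independence_fit(counts)
```

Because the example counts are an exact product, the fitted probabilities reproduce the observed proportions; for real data the two differ, and the discrepancy measures the departure from complete independence.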

The symmetry model for multiway tables is given in general as follows:

$$p_{i_1 \ldots i_T} = \Bigl(\prod_{i=1}^{T} \alpha_i\Bigr)\Bigl(\prod_{i=1}^{T} \alpha_i\Bigr)\psi_{i_1 \ldots i_T} \tag{1.1}$$

Common schemes for representing contingency tables assume that the row, column and layer variables are independent. In three-way contingency tables, the choice of predictor and control variables is of interest to many researchers. The purpose of this paper is to give some models that represent the subsymmetry and asymmetry of multiway contingency tables. We concentrate on three-dimensional tables, which are a cross-classification of observations by the levels of three categorical variables.

The models are defined in the subsymmetry and asymmetry context, taking the first variable as a control variable. The models below are often used to analyze three-dimensional tables.

| Model | Terms |
|---|---|
| Saturated | (XYZ) |
| Homogeneous associations | (XY, XZ, YZ) |
| Conditional independence | (XY, XZ), (XY, YZ), (XZ, YZ) |
| Joint independence | (XY, Z), (XZ, Y), (YZ, X) |
| Complete independence | (X, Y, Z) |

2 Subsymmetry and asymmetry models

We collect the triplet (X, Y, Z) for each unit in a sample of n units, so that the data can be summarized as a three-dimensional table. Let $p_{ijk}$ be the probability that a unit has X = i, Y = j and Z = k. In what follows, we define some models that represent the subsymmetry and asymmetry.

Model 1pijk=(j=1Cβi)(k=1Kγk)(s=1Sψk)(l=1Lωl).δ.υ.ηj=1,...,C;k=1,...K;s=1,2;l=1,2,3,4.j=1Cβj=k=1Kγk=0
Model 2pijk=(j=1Cβj)(k=1Kγk)(s=1Sψs)(l=1Lωl).υ.ηj=1,...,C;k=1,...K;s=1,2;w=1,2,3,4.j=1Cβj=k=1Kγk=0
Model 3pijk=(j=1Cβj)(k=1Kγk)(s=1Sψs)(l=1Lωl).τ.η.υi=1,...,R;j=1,...C;k=1,...,K;s=1,2;l=1,2,3,4.j=1Cβj=k=1Kγk=0
Model 4pijk=(j=1Cβj)(k=1Kγk)(s=1Sψs)(l=1Lωl)υ.ηi=1,...,R;j=1,...C;k=1,...,K;s=2,3,5;l=2,3,5.j=1Cβj=k=1Kγk=0
Model 5pijk=(j=1Cβj)(k=1Kγk)(s=1Sψs)(m=1Mθm).τ.δ.ν.i=1,...,R;j=1,...C;k=1,...,K;s=2,3,5;m=2,3,5.j=1Cβj=k=1Kγk=0
Model 6pijk=(i=1Rαi)(j=1Cβj)(k=1Kγk)(s=1Sψs)(m=1Mθm)(l=1Lωl)i=1,...,R;j=1,...C;k=1,...,K;s=2,3,5;m=2,3,5;l=2,3,5.i=1Rαi=j=1Cβj=k=1Kγk=0
Model 7pijk=(i=1Rαi)(j=1Cβj)(s=1Sψs)(m=1Mθm)ν.τ.ηi=1,...,R;j=1,...C;k=1,...,K;s=2,3,5;m=2,3,5.i=1Rαi=j=1Cβj=k=1Kγk=0
Model 8pijk=(j=1Cβj)(k=1Kγk)(s=1Sψs)(m=1Mθm)η.ξi=1,...,R;j=1,...C;k=1,...,K;s=2,3,5;m=2,3,5.j=1Cβj=k=1Kγk=0
Model 9pijk=(j=1Cβj)(k=1Kγk)(s=1Sψs)(m=1Mθm)η.ξ.νi=1,...,R;j=1,...C;k=1,...,K;s=2,3,5;m=2,3,5.j=1Cβj=k=1Kγk=0
Model 10pijk=(l=1Lωs)(s=1Sψs)(m=1Mθm)i=1,...,6;s=1,...6;m=1,...,6

Parameters in the models and the corresponding symbols in design matrices are defined as:

  • α: row parameter (X);

  • β: column parameter (Y);

  • γ: layer parameter (Z);

  • ψ: symmetry parameter (S);

  • ω: sub-symmetry parameter for X×Z (B);

  • θ: sub-symmetry parameter for X×Y (W);

  • τ: conditional symmetry parameter for Y×Z (CS);

  • δ: inverse diagonal matrix for X×Z (SSS);

  • ξ: diagonal asymmetry parameter (DA);

  • η: upper triangle parameter (CCS);

  • ν: main diagonal parameter (V).

Each model is in log-linear form and therefore has its associated degrees of freedom, obtained as the number of cells in the table minus the number of parameters to be fitted. For instance, the degrees of freedom for Model (1) are:

$$27 - [1 + 2 + 2 + 2 + 4 + 1 + 1 + 1] = 13.$$
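The count for Model (1) can be reproduced in a few lines. The mapping of each number in the bracket to a term (constant, Y, Z, S, B, CCS, SSS, V) is an assumption inferred from the terms listed for Model (1) in Table 3; it is not stated explicitly in the text.

```python
# Number of cells in a 3 x 3 x 3 table.
n_cells = 3 ** 3

# Assumed number of free parameters per term of Model (1).
param_counts = {
    "constant": 1,
    "Y": 2,
    "Z": 2,
    "S": 2,    # S1, S2
    "B": 4,    # B1, B2, B3, B4
    "CCS": 1,
    "SSS": 1,
    "V": 1,
}

df_model1 = n_cells - sum(param_counts.values())
print(df_model1)
```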

Subsymmetry matrices are defined by each dimension as:

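For $X \times Y$: $W = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$; for $X \times Z$: $B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$; for $Y \times Z$: $S = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 5 \\ 3 & 5 & 6 \end{bmatrix}$.

The V matrix corresponds to the cells on the main diagonal for X×Y×Z.

$$V = [\,1\;0\;0\;0\;0\;0\;0\;0\;0\;\;0\;0\;0\;0\;1\;0\;0\;0\;0\;\;0\;0\;0\;0\;0\;0\;0\;0\;1\,]$$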
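The index pattern of the subsymmetry matrices and of V can be generated rather than typed by hand. The sketch below uses hypothetical function names; it labels each unordered pair of categories with one integer, and flags the cells with i = j = k in a flattened 3×3×3 table (the flattening order does not affect which positions are flagged).

```python
def subsymmetry_index(r):
    """Symmetric r x r matrix assigning one label to each unordered pair (i, j)."""
    index = [[0] * r for _ in range(r)]
    label = 0
    for i in range(r):
        for j in range(i, r):
            label += 1
            index[i][j] = label
            index[j][i] = label  # mirror cell shares the same label
    return index


def main_diagonal_indicator(r):
    """Flattened 0/1 vector marking the cells with i == j == k."""
    return [1 if i == j == k else 0
            for i in range(r) for j in range(r) for k in range(r)]


W = subsymmetry_index(3)
V = main_diagonal_indicator(3)
print(W)
print(V)
```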

The conditional factor variables are defined for the asymmetric associations as follows:

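Conditional symmetry matrix:

For $\{Y \times Z \mid i = 1, 2\}$, $CS = \begin{bmatrix} 1 & 2 & 2 \\ 3 & 1 & 2 \\ 3 & 3 & 1 \end{bmatrix}$,

Upper triangle matrix:

For $\{Y \times Z \mid i = 3\}$, $CCS = \begin{bmatrix} 1 & 2 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$,

Diagonal asymmetry matrix:

For $\{Y \times Z \mid i = 1, 2\}$, $DA = \begin{bmatrix} 5 & 1 & 2 \\ 3 & 5 & 1 \\ 4 & 3 & 5 \end{bmatrix}$,

Inverse diagonal matrix:

For $\{Y \times Z \mid i = 1, 2, 3\}$, $SSS = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$.

Using these factors, we analyze the models by the GLM approach.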

3 Numerical example

The data in Table 1 are taken directly from Yamamoto et al. [15] and give the results for the treatment group only in randomized clinical trials conducted by a pharmaceutical company in anemic patients with cancer receiving chemotherapy. The response is the patient's hemoglobin (Hb) concentration at baseline (before treatment) and after 4 and 8 weeks of treatment. The Hb response is classified as ≥ 10 g/dl, 8-10 g/dl and < 8 g/dl. The reference ranges for hemoglobin concentration in adults are 14.0-17.5 g/dl for men and 12.3-15.3 g/dl for women.

Table 1

Hemoglobin concentration at baseline, 4 weeks and 8 weeks in carcinomatous anemia patients from a randomized clinical trial.

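| Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
|---|---|---|---|---|
| ≥ 10 g/dl | ≥ 10 g/dl | 77 | 7 | 1 |
| 8-10 g/dl | ≥ 10 g/dl | 43 | 7 | 0 |
| < 8 g/dl | ≥ 10 g/dl | 3 | 0 | 0 |
| ≥ 10 g/dl | 8-10 g/dl | 3 | 8 | 1 |
| 8-10 g/dl | 8-10 g/dl | 17 | 16 | 5 |
| < 8 g/dl | 8-10 g/dl | 3 | 8 | 1 |
| ≥ 10 g/dl | < 8 g/dl | 1 | 1 | 1 |
| 8-10 g/dl | < 8 g/dl | 0 | 2 | 3 |
| < 8 g/dl | < 8 g/dl | 0 | 4 | 3 |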

Models (1)-(10) proposed here attempt to analyze the relationship between X, Y and Z, taking “Baseline” as the control.

An example of the design matrix is given for Model (8) in Table 2.

Table 2

Design matrix of Model (8).

XYZParameter
Constant[Y = l][Y = 2][Z = l][Z = 2]S2S3S5DAW2W3W5CCS
11101000050000
121100110010000
31100001020000
11011010031000
1221010100051000
31010000111000
11001001040100
321000100130100
31000000050100
11101000051000
121100110011000
31100001021000
11011010030000
2221010100050000
31010000110000
11001001040010
321000100130010
31000000050010
11101000000101
121100110000102
31100001000102
11011010000010
3221010100000011
31010000100012
11001001000000
321000100100000
31000000000001
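The rows of Table 2 appear to follow directly from the factor matrices of Section 2. The sketch below rebuilds one row of the Model (8) design matrix for a given cell (x, y, z). It assumes, as the table suggests, that S2, S3, S5 and W2, W3, W5 are 0/1 indicators for the corresponding levels of S and W, that DA and CCS enter as numeric covariates, and that DA applies only for X = 1, 2 and CCS only for X = 3. The function name `model8_row` is hypothetical.

```python
S = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]    # subsymmetry index for Y x Z
W = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]    # subsymmetry index for X x Y
DA = [[5, 1, 2], [3, 5, 1], [4, 3, 5]]   # diagonal asymmetry, Y x Z given X = 1, 2
CCS = [[1, 2, 2], [0, 1, 2], [0, 0, 1]]  # upper triangle, Y x Z given X = 3


def model8_row(x, y, z):
    """Design-matrix row for cell (x, y, z), using 1-based category codes.

    Column order: Constant, [Y=1], [Y=2], [Z=1], [Z=2],
    S2, S3, S5, DA, W2, W3, W5, CCS.
    """
    s = S[y - 1][z - 1]
    w = W[x - 1][y - 1]
    da = DA[y - 1][z - 1] if x in (1, 2) else 0
    ccs = CCS[y - 1][z - 1] if x == 3 else 0
    return [
        1,
        int(y == 1), int(y == 2),
        int(z == 1), int(z == 2),
        int(s == 2), int(s == 3), int(s == 5),
        da,
        int(w == 2), int(w == 3), int(w == 5),
        ccs,
    ]


design = [model8_row(x, y, z)
          for x in (1, 2, 3) for y in (1, 2, 3) for z in (1, 2, 3)]
print(len(design), len(design[0]))
```

With 27 rows and 13 columns, the matrix leaves 27 - 13 = 14 residual degrees of freedom, which agrees with the value reported for Model (8) in Table 3.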

Design matrices are generated for each model. Likelihood ratio chi-square values with the associated degrees of freedom, AIC and BIC are given in Table 3. In addition to the goodness-of-fit tests, comparing the models by AIC and BIC gives better information on which model represents the data best.
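The paper does not state which definitions of AIC and BIC are used. The tabulated values for most models appear consistent with the forms AIC = G² - 2·df and BIC = G² - df·ln(N), where G² is the likelihood ratio chi-square and N = 215 is the total of the counts in Table 1 as read here. Both the formulas and N are therefore assumptions, sketched below for Model (8), which has G² = 17.623 on 14 degrees of freedom.

```python
import math


def aic(g2, df):
    """AIC in the deviance-based form G^2 - 2 * df (assumed definition)."""
    return g2 - 2 * df


def bic(g2, df, n):
    """BIC in the deviance-based form G^2 - df * ln(n) (assumed definition)."""
    return g2 - df * math.log(n)


N = 215  # assumed sample size: sum of the observed counts in Table 1

aic_model8 = aic(17.623, 14)
bic_model8 = bic(17.623, 14, N)
print(round(aic_model8, 3), round(bic_model8, 3))
```

Under these definitions smaller (more negative) values indicate a better trade-off between fit and parsimony.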

Table 3

Model results under various models.

| Model | Terms | Likelihood ratio chi-square | Degrees of freedom | P-value | BIC | AIC |
|---|---|---|---|---|---|---|
| 1 | Y, Z, S1, S2, B1, B2, B3, B4, CCS, SSS, V | 18.043 | 13 | 0.156 | –51.771 | –7.957 |
| 2 | Y, Z, S1, S2, B1, B2, B3, B4, V, CCS | 19.531 | 14 | 0.146 | –55.658 | –8.469 |
| 3 | Y, Z, S1, S2, B1, B2, B3, B4, CS, CCS, V | 19.443 | 13 | 0.110 | –50.059 | –6.557 |
| 4 | Y, Z, S2, S3, S5, B2, B3, B5, V | 20.268 | 14 | 0.122 | –54.922 | –7.73 |
| 5 | Y, Z, V2, S2, S3, S5, B2, B3, B5, CS, CCS, SSS | 19.953 | 13 | 0.096 | –49.865 | –6.047 |
| 6 | B, Y, Z, S2, S3, S5, B2, B3, B5, W2, W3, W5, V | 12.943 | 10 | 0.227 | –40.763 | –7.057 |
| 7 | B, Y, S2, S3, S5, W2, W3, W5, V, CS, CCS | 20.825 | 13 | 0.076 | –48.990 | –5.175 |
| 8 | Y, Z, S2, S3, S5, DA, W2, W3, W5, CCS | 17.623 | 14 | 0.225 | –57.565 | –10.38 |
| 9 | Y, Z, S2, S3, S5, DA, W2, W3, W5, CCS, V | 15.694 | 13 | 0.266 | –54.124 | –10.306 |
| 10 | B1, B2, B3, B4, B5, B6, S1, S2, S3, S4, S5, S6, W1, W2, W3, W4, W5, W6 | 13.293 | 11 | 0.275 | –45.780 | –8.707 |

Table 4

Parameter estimates under Model (8).

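| Parameter | Estimate | Std. Error | Z | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| Constant | 1.377 | 0.560 | 2.460 | 0.014 | 0.280 | 2.474 |
| [Y = 1] | 1.181 | 0.376 | 3.146 | 0.002 | 0.445 | 1.918 |
| [Y = 2] | 0.502 | 0.343 | 1.462 | 0.144 | –0.171 | 1.175 |
| [Y = 3] | 0ᵃ | · | · | · | · | · |
| [Z = 1] | 1.375 | 0.374 | 3.674 | 0.000 | 0.641 | 2.108 |
| [Z = 2] | 0.576 | 0.339 | 1.702 | 0.089 | –0.087 | 1.240 |
| [Z = 3] | 0ᵃ | · | · | · | · | · |
| S2 | –1.026 | 0.331 | –3.097 | 0.002 | –1.675 | –0.377 |
| S3 | –3.283 | 0.769 | –4.269 | 0.000 | –4.790 | –1.776 |
| S5 | –0.607 | 0.422 | –1.439 | 0.150 | –1.433 | 0.220 |
| W2 | –0.679 | 0.156 | –4.361 | 0.000 | –0.985 | –0.374 |
| W3 | –2.298 | 0.511 | –4.499 | 0.000 | –3.299 | –1.297 |
| W5 | –0.669 | 0.317 | –2.107 | 0.035 | –1.291 | –0.047 |
| CCS | –0.198 | 0.404 | –0.491 | 0.624 | –0.990 | 0.593 |
| DA | 0.087 | 0.089 | 0.975 | 0.329 | –0.088 | 0.261 |

Table 5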

Odds Ratios under Model (8).

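| Odds ratio | Baseline ≥ 10 g/dl | Baseline 8-10 g/dl | Baseline < 8 g/dl |
|---|---|---|---|
| $\theta_{11}$ | 13.10 | 13.10 | 7.78 |
| $\theta_{12}$ | 3.37 | 3.37 | 4.24 |
| $\theta_{21}$ | 3.99 | 4.02 | 6.35 |
| $\theta_{22}$ | 5.69 | 5.66 | 3.36 |

Table 6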

Parameter estimates under Model (9).

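| Parameter | Estimate | Std. Error | Z | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| Constant | 1.638 | 0.609 | 2.690 | 0.007 | 0.445 | 2.832 |
| [Y = 1] | 1.225 | 0.380 | 3.225 | 0.001 | 0.481 | 1.970 |
| [Y = 2] | 0.529 | 0.347 | 1.524 | 0.127 | –0.151 | 1.210 |
| [Y = 3] | 0ᵃ | · | · | · | · | · |
| [Z = 1] | 1.392 | 0.366 | 3.799 | 0.000 | 0.674 | 2.110 |
| [Z = 2] | 0.590 | 0.335 | 1.762 | 0.078 | –0.066 | 1.247 |
| [Z = 3] | 0ᵃ | · | · | · | · | · |
| S2 | –1.252 | 0.376 | –3.332 | 0.001 | –1.988 | –0.515 |
| S3 | –3.502 | 0.792 | –4.422 | 0.000 | –5.054 | –1.949 |
| S5 | –0.800 | 0.458 | –1.749 | 0.080 | –1.697 | 0.096 |
| W2 | –1.065 | 0.325 | –3.277 | 0.001 | –1.701 | –0.428 |
| W3 | –2.642 | 0.573 | –4.607 | 0.000 | –3.765 | –1.518 |
| W5 | –0.916 | 0.371 | –2.471 | 0.013 | –1.642 | –0.189 |
| CCS | –0.141 | 0.420 | –0.337 | 0.736 | –0.965 | 0.682 |
| DA | 0.111 | 0.093 | 1.197 | 0.231 | –0.071 | 0.294 |
| V | –0.479 | 0.350 | –1.371 | 0.170 | –1.165 | 0.206 |

Table 7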

Odds Ratios under Model (9).

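| Odds ratio | Baseline ≥ 10 g/dl | Baseline 8-10 g/dl | Baseline < 8 g/dl |
|---|---|---|---|
| $\theta_{11}$ | 14.76 | 14.76 | 12.21 |
| $\theta_{12}$ | 2.44 | 3.94 | 3.75 |
| $\theta_{21}$ | 3.07 | 4.93 | 4.91 |
| $\theta_{22}$ | 9.67 | 5.98 | 3.07 |

Table 8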

Expected frequencies under Model (8).

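| Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
|---|---|---|---|---|
| ≥ 10 g/dl | ≥ 10 g/dl | 78.83 | 8.98 | 0.58 |
| ≥ 10 g/dl | 8-10 g/dl | 6.10 | 9.12 | 1.97 |
| ≥ 10 g/dl | < 8 g/dl | 0.08 | 0.5 | 0.6 |
| 8-10 g/dl | ≥ 10 g/dl | 39.96 | 4.56 | 0.29 |
| 8-10 g/dl | 8-10 g/dl | 12.04 | 17.99 | 3.89 |
| 8-10 g/dl | < 8 g/dl | 0.43 | 2.56 | 3.13 |
| < 8 g/dl | ≥ 10 g/dl | 4.21 | 0.56 | 0.03 |
| < 8 g/dl | 8-10 g/dl | 4.76 | 4.89 | 1.23 |
| < 8 g/dl | < 8 g/dl | 0.59 | 3.85 | 3.25 |

Table 9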

Expected frequencies under Model (9).

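| Baseline | 4 weeks | 8 weeks: ≥ 10 g/dl | 8 weeks: 8-10 g/dl | 8 weeks: < 8 g/dl |
|---|---|---|---|---|
| ≥ 10 g/dl | ≥ 10 g/dl | 76.19 | 10.11 | 0.66 |
| ≥ 10 g/dl | 8-10 g/dl | 4.84 | 9.45 | 1.51 |
| ≥ 10 g/dl | < 8 g/dl | 0.06 | 0.42 | 0.64 |
| 8-10 g/dl | ≥ 10 g/dl | 42.44 | 3.49 | 0.23 |
| 8-10 g/dl | 8-10 g/dl | 14.05 | 17.04 | 4.39 |
| 8-10 g/dl | < 8 g/dl | 0.39 | 2.33 | 3.59 |
| < 8 g/dl | ≥ 10 g/dl | 4.36 | 0.49 | 0.03 |
| < 8 g/dl | 8-10 g/dl | 4.02 | 5.45 | 1.18 |
| < 8 g/dl | < 8 g/dl | 0.62 | 4.17 | 2.77 |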

The results show that all models fit the data well. The smallest value for both AIC and BIC is obtained for Model (8). Note that Model (8) and Model (9) are the conditional models that collapsed the baseline variable. Recall that Model (8) is

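$$p_{ijk} = \Bigl(\prod_{j=1}^{C}\beta_j\Bigr)\Bigl(\prod_{k=1}^{K}\gamma_k\Bigr)\Bigl(\prod_{s=1}^{S}\psi_s\Bigr)\Bigl(\prod_{m=1}^{M}\theta_m\Bigr)\,\eta\,\xi.$$

Correspondingly, denoting the expected frequencies by $m_{ijk}$, Model (8) is represented as

$$\log(m_{ijk}) = \mathrm{Y} + \mathrm{Z} + \mathrm{S2} + \mathrm{S3} + \mathrm{S5} + \mathrm{W2} + \mathrm{W3} + \mathrm{W5} + \mathrm{DA} + \mathrm{CCS}.$$

In this model representation, “Baseline” is the control variable and is therefore not included among the parameters.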

Model (8) tests the hypothesis $p_{ijk} = \beta_j \gamma_k \psi_2 \psi_3 \psi_5 \theta_2 \theta_3 \theta_5\, \eta\, \xi$ and uses the frequencies of the Y×Z table. For a subject whose hemoglobin level at baseline is ≥ 10 g/dl, the odds of being at ≥ 10 g/dl rather than 8-10 g/dl at both 4 and 8 consecutive weeks are 13.10 times higher.

The Hb concentration tends to decrease from baseline throughout the 8 weeks, since the maximum likelihood estimates are less than 1.
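One way to read this statement is on the multiplicative scale. The sketch below assumes that the estimates in Table 4 are on the log scale and that the Z and confidence-interval columns are the usual Wald quantities (estimate / standard error, and estimate ± 1.96 × standard error); neither assumption is stated explicitly in the text. The function name `wald_summary` is hypothetical, and the inputs are the S and W rows of Table 4.

```python
import math

# (estimate, standard error) for the symmetry and subsymmetry terms of Model (8).
model8_terms = {
    "S2": (-1.026, 0.331),
    "S3": (-3.283, 0.769),
    "S5": (-0.607, 0.422),
    "W2": (-0.679, 0.156),
    "W3": (-2.298, 0.511),
    "W5": (-0.669, 0.317),
}


def wald_summary(estimate, std_error, z_crit=1.96):
    """Return (z, lower, upper, exp(estimate)) for one log-scale parameter."""
    z = estimate / std_error
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return z, lower, upper, math.exp(estimate)


for name, (est, se) in model8_terms.items():
    z, lower, upper, multiplier = wald_summary(est, se)
    print(f"{name}: z={z:.3f}, CI=({lower:.3f}, {upper:.3f}), exp(est)={multiplier:.3f}")
```

All six multipliers fall below 1 because the corresponding log-scale estimates are negative; small differences from the tabulated Z values come from rounding of the published estimates.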

Similarly, under Model (9), for a patient whose Hb concentration at baseline is ≥ 10 g/dl, the odds that the Hb level at 8 weeks is ≥ 10 g/dl instead of 8-10 g/dl are 14.76 times higher when the level at 4 weeks is ≥ 10 g/dl than when it is 8-10 g/dl.

The odds ratios greater than one under Model (8) and Model (9) indicate that an Hb concentration of ≥ 10 g/dl is more likely to occur at baseline than after 4 and 8 weeks.
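The conditional odds ratios can be recomputed from the expected frequencies. The sketch below takes the first three rows of Table 8 as one 3×3 layer (4 weeks by 8 weeks for a single baseline level, which is an assumption about the table layout) and forms the local odds ratios $\theta_{ab} = m_{ab}\, m_{a+1,b+1} / (m_{a,b+1}\, m_{a+1,b})$. The function name `local_odds_ratios` is hypothetical.

```python
def local_odds_ratios(layer):
    """Local odds ratios of adjacent 2 x 2 subtables of a two-way layer."""
    n_rows, n_cols = len(layer), len(layer[0])
    return [
        [
            layer[a][b] * layer[a + 1][b + 1] / (layer[a][b + 1] * layer[a + 1][b])
            for b in range(n_cols - 1)
        ]
        for a in range(n_rows - 1)
    ]


# First three rows of Table 8 (expected frequencies under Model (8)).
layer = [
    [78.83, 8.98, 0.58],
    [6.10, 9.12, 1.97],
    [0.08, 0.50, 0.60],
]

theta = local_odds_ratios(layer)
for row in theta:
    print([round(value, 2) for value in row])
```

The first two entries come out close to the 13.10 and 3.37 reported in the first column of Table 5; the remaining entries are sensitive to the two-decimal rounding of the very small expected frequencies and agree only approximately.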

4 Conclusions

We considered subsymmetry models for multiway square contingency tables in which the main diagonal is not of interest. The models are designed to analyze square multidimensional contingency tables with ordered categories. The results show that the models described here can be applied to a multiway table. To illustrate the proposed models, we applied them to a data set of patients' hemoglobin concentrations. The response was the patient's hemoglobin (Hb) concentration at baseline (before treatment) and after 4 weeks and 8 weeks of treatment. The primary goal was to compare the baseline levels with those at the 4th and 8th weeks, taking the baseline as a layer variable. We were interested in the changing status of a patient's Hb concentration from baseline over time, and in particular in whether the transition of those concentrations was asymmetric, given the concentration at baseline. An advantage of the models proposed here is that they allow the conditional odds ratios to be analyzed as well as the parameter estimates. Extensions to k-way tables are straightforward.

References

[1] Agresti A., Analysis of Ordinal Categorical Data, 2nd Edition, John Wiley, Hoboken, 2002.

[2] Yamamoto K., Tomizawa S., Analysis of Unaided Vision Data Using New Decomposition of Symmetry, American Medical Journal, 2012, 3(1), 37–42.

[3] Tomizawa S., Tahata K., The Analysis of Symmetry and Asymmetry: Orthogonality of Decomposition of Symmetry into Quasi-Symmetry and Marginal Symmetry for Multi-Way Tables, Journal de la Société Française de Statistique, 2007, 148(3), 3–36.

[4] Tahata K., Tomizawa S., Orthogonal Decomposition of Point-Symmetry for Multiway Tables, Advances in Statistical Analysis, 2008, 92(3), 255–269.

[5] Tahata K., Tomizawa S., Generalized Linear Asymmetry Model and Decomposition of Symmetry for Multiway Contingency Tables, J. Biomet. Biostat., 2011, 2(4), 1–6.

[6] Agresti A., A Simple Diagonals-Parameter Symmetry and Quasi-Symmetry Model, Statistics and Probability Letters, 1983, 1(6), 313–316.

[7] Miyamoto N., Ohtsuka W., Tomizawa S., Linear Diagonals-Parameter Symmetry and Quasi-Symmetry Models for Cumulative Probabilities in Square Contingency Tables with Ordered Categories, Biometrical Journal, 2004, 46(6), 664–674.

[8] Iki K., Yamamoto K., Tomizawa S., Quasi-diagonal exponent symmetry model for square contingency tables with ordered categories, Statistics and Probability Letters, 2014, 92, 33–38.

[9] Goodman L.A., Multiplicative models for square contingency tables with ordered categories, Biometrika, 1979, 66, 413–418.

[10] Bishop Y.M.M., Fienberg S.E., Holland P.W., Discrete Multivariate Analysis: Theory and Practice, MIT Press, 1975.

[11] Caussinus H., Contribution à l'analyse statistique des tableaux de corrélation, Annales de la Faculté des Sciences de l’Université de Toulouse, 1966, 29, 77–182.

[12] McCullagh P., A Class of Parametric Models for the Analysis of Square Contingency Tables with Ordered Categories, Biometrika, 1978, 65, 413–418.

[13] Kateri M., Agresti A., A class of ordinal quasi-symmetry models for square contingency tables, Statistics & Probability Letters, 2007, 77, 598–603.

[14] Bowker A.H., A Test for Symmetry in Contingency Tables, Journal of the American Statistical Association, 1948, 43(244), 572–574.

[15] Yamamoto H., Iwashita T., Tomizawa S., Decomposition of Symmetry into Ordinal Quasi-Symmetry and Marginal Equimoment for Multi-way Tables, Austrian Journal of Statistics, 2007, 36(4), 291–306.

Received: 2015-8-1
Accepted: 2015-11-3
Published Online: 2016-4-6
Published in Print: 2016-1-1

© 2016 Serpil Aktaş, published by De Gruyter Open

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.