This study focuses on regression models for an ordinal response variable, that is, a response with more than two ordered categories. The multinomial distribution is an extension of the binomial distribution to more than two response categories.

Consider a response variable *Y*_{i}; *i* = 1,2,…,*N* taking one of several discrete values. Let *P*(*Y*_{i} = *j*) = *π*_{ij}; *i* = 1,2,…,*N*; *j* = 1,2,…,*c* denote the probability that the response of the *i*^{th} subject falls in the *j*^{th} ordinal response category. In this study, the response variable is “energy security”, which takes the ordered values *A*, *B*, *C*, and *D*, indexed by 1, 2, 3, and 4, for 61 different countries from 6 regions of the world.

Let *P*(*Y*_{i} = *j*) = *π*_{ij}; *i* = 1,2,…,61; *j* = 1,2,3,4 denote the probability that the *i*^{th} country’s response falls in the *j*^{th} energy security grade level. For example, π_{i1} is the probability that the *i*^{th} response is in the *A* energy security grade level, and so on. Assume that the response categories are mutually exclusive and exhaustive, so that for each country *i*
$\begin{array}{}\sum _{j=1}^{c=4}{\pi}_{ij}=1.\end{array}$ Additionally, there are *N* = 61 independent trials, one per country, and each trial results in exactly 1 of the 4 mutually exclusive and exhaustive energy security grade levels. The “energy security” ordinal response variable therefore follows a multinomial probability distribution. In the *generalized linear model (GLM)* approach for the energy security ordinal response variable, the main interest is in the cumulative probability that the *i*^{th} response falls into or below the *j*^{th} energy security grade level:
$$\begin{array}{}\mathrm{P}({\mathrm{Y}}_{\mathrm{i}}\le \mathrm{j})={\pi}_{\mathrm{i}1}+\dots +{\pi}_{\mathrm{i}\mathrm{j}}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}},\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{j}=1,2,3,4\end{array}$$(1)
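As a minimal sketch of Eq.(1), the cumulative probabilities are running sums of the category probabilities. The function name and the probability values below are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
from itertools import accumulate

def cumulative_probabilities(pi_i):
    """Return [P(Y_i <= 1), ..., P(Y_i <= c)] from the category probabilities pi_i."""
    if abs(sum(pi_i) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return list(accumulate(pi_i))

# Hypothetical probabilities for grades A, B, C, D of one country (illustrative only).
pi_i = [0.1, 0.2, 0.3, 0.4]
cum = cumulative_probabilities(pi_i)
print(cum)  # the last entry, P(Y_i <= 4), equals 1 up to rounding
```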

A *GLM* for the multinomial distribution has three components: the random component, the systematic component, and the cumulative link function that connects the random and systematic components [15,21,22].

The random component identifies the ordinal response variable, energy security, and assumes that it follows a distribution from the exponential family, which includes the multinomial distribution. The systematic component specifies the covariates
$\begin{array}{}{\mathbf{x}}_{i}^{\prime}=({x}_{i1},\dots ,{x}_{ip})\end{array}$ belonging to the *i*^{th} country in the *GLM*. The parameters in the systematic component of the *GLM* for the multinomial distribution are estimated by the maximum likelihood (*ML*) method, using an iterative algorithm such as Newton-Raphson (*NR*), Fisher scoring (*FS*), or a hybrid of the two [21, 22, 23, 24].

Let G^{–1} denote a cumulative link function given in Table 2, defined as the inverse of a continuous cumulative distribution function G. The general form of a cumulative link model for the *i*^{th} response then relates the cumulative probabilities to the linear predictor, which depends on the covariates, as follows [22]:
$$\begin{array}{}{\mathrm{G}}^{-1}[\mathrm{P}({\mathrm{Y}}_{\mathrm{i}}\le \mathrm{j})]={\alpha}_{\mathrm{j}}+{{\mathbf{x}}_{\mathbf{i}}}^{{}^{\mathrm{\prime}}}\mathit{\beta}\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}};\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\mathrm{j}=1,2,\dots ,\mathrm{c}-1\end{array}$$(2)
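To make Eq.(2) concrete, the sketch below inverts the link to obtain cumulative probabilities and then differences them to recover the category probabilities. It assumes the logistic distribution for G (the cumulative logit link); the function names and all parameter values are hypothetical.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, used here as an assumed choice of G."""
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(alpha, beta, x, G=logistic_cdf):
    """Category probabilities pi_i1..pi_ic implied by G^{-1}[P(Y_i <= j)] = alpha_j + x'beta."""
    eta = sum(b * v for b, v in zip(beta, x))      # linear predictor x'beta
    cum = [G(a + eta) for a in alpha] + [1.0]      # P(Y_i <= j), with P(Y_i <= c) = 1
    probs, previous = [], 0.0
    for c_j in cum:
        probs.append(c_j - previous)               # pi_ij = P(Y_i <= j) - P(Y_i <= j-1)
        previous = c_j
    return probs

# Hypothetical values: c = 4 categories (3 intercepts) and p = 2 covariates.
alpha = [-1.0, 0.0, 1.5]   # intercepts must be increasing
beta = [0.5, -0.25]
x = [1.0, 2.0]
probs = category_probabilities(alpha, beta, x)
print(probs)
```

The intercepts must be strictly increasing so that every category probability is positive.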

Table 2 Cumulative link functions belonging to the ordinal response variable in the *GLM* from the multinomial distribution.

Let y_{ij} indicate whether *Y*_{i} = *j*, with y_{ij} = 1 if the *i*^{th} response falls in category *j* and y_{ij} = 0 otherwise, and let π_{ij} be the corresponding probability. Then the multinomial likelihood function for the cumulative link model given in Eq.(2) is as follows [22]:
$$\begin{array}{}{\displaystyle \prod _{\mathrm{i}=1}^{\mathrm{N}=61}\left(\prod _{\mathrm{j}=1}^{\mathrm{c}=4}{\pi}_{\mathrm{i}\mathrm{j}}^{{\mathrm{y}}_{\mathrm{i}\mathrm{j}}}\right)=\prod _{\mathrm{i}=1}^{\mathrm{N}=61}\left\{\prod _{\mathrm{j}=1}^{\mathrm{c}=4}{\left[\mathrm{P}({\mathrm{Y}}_{\mathrm{i}}\le \mathrm{j})-\mathrm{P}({\mathrm{Y}}_{\mathrm{i}}\le \mathrm{j}-1)\right]}^{{\mathrm{y}}_{\mathrm{i}\mathrm{j}}}\right\}}\end{array}$$(3)

In Eq.(3), the cumulative probabilities depend on the intercept parameters α_{j}; j = 1,2,…, c – 1 and on β = (β_{1}, β_{2},…, β_{p})^{′}, the parameters belonging to the covariates in the systematic component of the *GLM* for the multinomial distribution. With the conventions α_{0} = −∞ and α_{c} = +∞, so that G takes the values 0 and 1 at these limits, the multinomial log-likelihood function for the cumulative link model given in Eq.(2) is as follows [22]:
$$\begin{array}{}{\displaystyle \mathrm{L}(\mathit{\alpha},\mathit{\beta})=\sum _{\mathrm{i}=1}^{\mathrm{N}=61}\sum _{\mathrm{j}=1}^{\mathrm{c}=4}{\mathrm{y}}_{\mathrm{i}\mathrm{j}}\mathrm{log}\left[\mathrm{G}({\alpha}_{\mathrm{j}}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})-\mathrm{G}({\alpha}_{\mathrm{j}-1}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})\right]}\end{array}$$(4)
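The log-likelihood in Eq.(4) can be sketched as follows. Because exactly one y_{ij} equals 1 for each subject, the inner sum reduces to the log-probability of the observed category. The logistic choice of G, the function names, and the data are assumptions made for illustration only.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, used here as an assumed choice of G."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(alpha, beta, X, y, G=logistic_cdf):
    """Eq.(4): y[i] is the observed category of subject i, coded 1..c."""
    c = len(alpha) + 1
    total = 0.0
    for x_i, j in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x_i))
        upper = 1.0 if j == c else G(alpha[j - 1] + eta)   # G(alpha_j + x'beta), alpha_c = +inf
        lower = 0.0 if j == 1 else G(alpha[j - 2] + eta)   # G(alpha_{j-1} + x'beta), alpha_0 = -inf
        total += math.log(upper - lower)
    return total

# Hypothetical data: 5 subjects, p = 2 covariates, c = 4 categories.
X = [[1.0, 0.0], [0.5, 1.5], [-1.0, 2.0], [2.0, -0.5], [0.0, 1.0]]
y = [1, 2, 3, 4, 2]
alpha = [-1.0, 0.2, 1.4]
beta = [0.5, -0.3]
print(log_likelihood(alpha, beta, X, y))
```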

Let g denote the probability density function obtained as the derivative of the cumulative distribution function G, and let δ_{jk} denote the Kronecker delta, with δ_{jk} = 1 if j = k and δ_{jk} = 0 otherwise. Then the likelihood equations for the parameters α_{j}; j = 1,2,…, c − 1 and β = (β_{1}, β_{2}, …, β_{p})^{′} in the *GLM* for the multinomial distribution are as follows [22]:
$$\begin{array}{}{\displaystyle \frac{\mathrm{\partial}\mathrm{L}}{\mathrm{\partial}{\beta}_{\mathrm{k}}}=\sum _{\mathrm{i}=1}^{\mathrm{N}=61}\sum _{\mathrm{j}=1}^{\mathrm{c}=4}{\mathrm{y}}_{\mathrm{i}\mathrm{j}}{\mathrm{x}}_{\mathrm{i}\mathrm{k}}\frac{\mathrm{g}({\alpha}_{\mathrm{j}}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})-\mathrm{g}({\alpha}_{\mathrm{j}-1}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})}{\mathrm{G}({\alpha}_{\mathrm{j}}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})-\mathrm{G}({\alpha}_{\mathrm{j}-1}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})}=0}\end{array}$$(5)
$$\begin{array}{}{\displaystyle \frac{\mathrm{\partial}\mathrm{L}}{\mathrm{\partial}{\alpha}_{\mathrm{k}}}=\sum _{\mathrm{i}=1}^{\mathrm{N}=61}\sum _{\mathrm{j}=1}^{\mathrm{c}=4}{\mathrm{y}}_{\mathrm{i}\mathrm{j}}\frac{{\delta}_{\mathrm{j}\mathrm{k}}\mathrm{g}({\alpha}_{\mathrm{j}}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})-{\delta}_{\mathrm{j}-1,\mathrm{k}}\mathrm{g}({\alpha}_{\mathrm{j}-1}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})}{\mathrm{G}({\alpha}_{\mathrm{j}}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})-\mathrm{G}({\alpha}_{\mathrm{j}-1}+{{\mathbf{x}}_{\mathbf{i}}}^{\mathrm{\prime}}\mathit{\beta})}=0}\end{array}$$(6)
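One way to check the derivatives in Eq.(5) and Eq.(6) is to compare them with central finite differences of the log-likelihood. The sketch below does this under the assumption that G is the logistic distribution; all function names, parameter values, and data are hypothetical.

```python
import math

def G(z):
    """Logistic CDF (assumed link distribution for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

def g(z):
    """Logistic density, the derivative of G."""
    p = G(z)
    return p * (1.0 - p)

def bounds(alpha, eta, j):
    """(G, g) at alpha_j and alpha_{j-1} for observed category j, with alpha_0 = -inf, alpha_c = +inf."""
    c = len(alpha) + 1
    G_up, g_up = (1.0, 0.0) if j == c else (G(alpha[j - 1] + eta), g(alpha[j - 1] + eta))
    G_lo, g_lo = (0.0, 0.0) if j == 1 else (G(alpha[j - 2] + eta), g(alpha[j - 2] + eta))
    return G_up, g_up, G_lo, g_lo

def log_likelihood(alpha, beta, X, y):
    total = 0.0
    for x_i, j in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x_i))
        G_up, _, G_lo, _ = bounds(alpha, eta, j)
        total += math.log(G_up - G_lo)
    return total

def score(alpha, beta, X, y):
    """Analytic derivatives: Eq.(6) for the intercepts, Eq.(5) for the slopes."""
    c = len(alpha) + 1
    d_alpha, d_beta = [0.0] * len(alpha), [0.0] * len(beta)
    for x_i, j in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, x_i))
        G_up, g_up, G_lo, g_lo = bounds(alpha, eta, j)
        denom = G_up - G_lo
        for k, x_ik in enumerate(x_i):
            d_beta[k] += x_ik * (g_up - g_lo) / denom
        if j < c:                      # delta_{jk} term, k = j
            d_alpha[j - 1] += g_up / denom
        if j > 1:                      # delta_{j-1,k} term, k = j - 1
            d_alpha[j - 2] -= g_lo / denom
    return d_alpha, d_beta

def numeric_score(alpha, beta, X, y, h=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    theta, m, grad = list(alpha) + list(beta), len(alpha), []
    for k in range(len(theta)):
        up, down = theta[:], theta[:]
        up[k] += h
        down[k] -= h
        diff = log_likelihood(up[:m], up[m:], X, y) - log_likelihood(down[:m], down[m:], X, y)
        grad.append(diff / (2.0 * h))
    return grad[:m], grad[m:]

# Hypothetical data: 5 subjects, p = 2 covariates, c = 4 categories.
X = [[1.0, 0.0], [0.5, 1.5], [-1.0, 2.0], [2.0, -0.5], [0.0, 1.0]]
y = [1, 2, 3, 4, 2]
alpha, beta = [-1.0, 0.2, 1.4], [0.5, -0.3]
print(score(alpha, beta, X, y))
print(numeric_score(alpha, beta, X, y))
```

The two printed results should agree to several decimal places. The likelihood equations are satisfied at parameter values where both vectors are zero.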

The likelihood equations given in Eq.(5) and Eq.(6) can be solved by the *NR*, *FS*, or hybrid method. This study uses the hybrid method, in which iterations of the *FS* method are performed before switching to iterations of the *NR* method. If convergence is achieved before the maximum number of Fisher iterations is reached, the hybrid algorithm still continues with the *NR* method [23].

The scale parameter related to the variance of the response variable is estimated by using the *Pearson chi-square* statistic as follows [22,23]:
$$\begin{array}{}{\displaystyle {\chi}^{2}=\sum _{\mathrm{i}=1}^{\mathrm{N}=61}\sum _{\mathrm{j}=1}^{\mathrm{c}=4}\frac{{\left({\mathrm{y}}_{\mathrm{i}\mathrm{j}}-{\hat{\pi}}_{\mathrm{i}\mathrm{j}}\right)}^{2}}{{\hat{\pi}}_{\mathrm{i}\mathrm{j}}}}\end{array}$$(7)

The estimate of the scale parameter is computed as the ratio of the model Pearson chi-square statistic to the model degrees of freedom defined as N(c − 1)− p in the *GLM* for the multinomial distribution [22,25,26].
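A minimal sketch of this computation follows, using Eq.(7) and the degrees of freedom N(c − 1) − p as stated above. The indicator matrix, the fitted probabilities, and the function names are hypothetical and serve only to show the arithmetic.

```python
def pearson_chi_square(Y, P):
    """Eq.(7): sum over subjects i and categories j of (y_ij - pi_hat_ij)^2 / pi_hat_ij."""
    return sum((y_ij - p_ij) ** 2 / p_ij
               for y_i, p_i in zip(Y, P)
               for y_ij, p_ij in zip(y_i, p_i))

def scale_estimate(Y, P, n_covariates):
    """Pearson chi-square divided by the degrees of freedom N(c - 1) - p."""
    N, c = len(Y), len(Y[0])
    degrees_of_freedom = N * (c - 1) - n_covariates
    return pearson_chi_square(Y, P) / degrees_of_freedom

# Hypothetical example: 2 subjects, 4 categories, 1 covariate, equal fitted probabilities.
Y = [[1, 0, 0, 0], [0, 0, 1, 0]]            # indicators y_ij
P = [[0.25, 0.25, 0.25, 0.25]] * 2          # fitted probabilities pi_hat_ij
chi2 = pearson_chi_square(Y, P)
scale = scale_estimate(Y, P, n_covariates=1)
print(chi2, scale)
```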

In this study, the cumulative link function G^{−1} [P(Y_{i} ≤ j)] describes the functional relationship between the systematic component of the *GLM* and the cumulative probabilities of the *i*^{th} response falling into or below the *j*^{th} energy security grade level.

The cumulative link functions given in Table 2 allow the cumulative probabilities of the ordinal response variable to be linearly related to the covariates as in Eq.(2) (http://share.uoa.gr/public/Software/SPSS/SPSS22/Manuals/IBM%20SPSS%20Advanced%20Statistics.pdf). In Table 2, Φ^{−1} is the inverse of the standard normal cumulative distribution function.
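For illustration, the sketch below implements five cumulative link functions that are commonly offered by statistical software for ordinal models. That this set matches the entries of Table 2 is an assumption, since the table body is not reproduced here; the dictionary name and keys are hypothetical.

```python
import math
from statistics import NormalDist

# Each link maps a cumulative probability in (0, 1) to the real line.
LINKS = {
    "logit": lambda p: math.log(p / (1.0 - p)),
    "probit": lambda p: NormalDist().inv_cdf(p),          # inverse standard normal CDF
    "cauchit": lambda p: math.tan(math.pi * (p - 0.5)),
    "cloglog": lambda p: math.log(-math.log(1.0 - p)),    # complementary log-log
    "nloglog": lambda p: -math.log(-math.log(p)),         # negative log-log
}

for name, link in LINKS.items():
    values = [link(p) for p in (0.2, 0.5, 0.8)]
    print(name, values)
```

The logit, probit, and cauchit links are symmetric about a cumulative probability of 0.5, while the two log-log links are asymmetric.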

The information criteria (*IC*) used as goodness-of-fit test statistics for determining the best cumulative link function between the systematic component and the cumulative probabilities in the *GLM* for the ordinal response variable are given in Table 3.

Table 3 Information criteria (*IC*) used as goodness-of-fit-test statistics for comparing different cumulative link functions in the *GLM* for the ordinal response variable.

In Table 3 (https://www.ibm.com/support/knowledgecenter/SSLVMB_22.0.0/com.ibm.spss.statistics.algorithms/alg_genlin_gzlm_modeltest_goof.htm), *l* is the maximum value of the multinomial log-likelihood function given in Eq.(4) evaluated at the parameter estimates, *d* = *c* − 1 + *p* is the number of parameters in the model, and *N* is the total number of subjects. The cumulative link function with the smallest *IC* values is taken as the best one in the *GLM* for the ordinal response variable.
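The sketch below computes four widely used information criteria from *l*, *d*, and *N*: AIC, finite-sample corrected AIC (AICC), BIC, and consistent AIC (CAIC). That these are the criteria listed in Table 3 is an assumption, and the log-likelihood value and covariate count used here are hypothetical.

```python
import math

def information_criteria(l, d, N):
    """Information criteria from the maximized log-likelihood l, parameter count d, sample size N."""
    return {
        "AIC": -2.0 * l + 2.0 * d,
        "AICC": -2.0 * l + 2.0 * d * N / (N - d - 1),
        "BIC": -2.0 * l + d * math.log(N),
        "CAIC": -2.0 * l + d * (math.log(N) + 1.0),
    }

# Hypothetical inputs: c = 4 categories and p = 2 covariates give d = c - 1 + p = 5; N = 61 countries.
c, p, N = 4, 2, 61
d = c - 1 + p
ic = information_criteria(l=-50.0, d=d, N=N)
print(ic)
```

To compare link functions, the same computation would be repeated for each fitted model and the model with the smallest values preferred.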
