A novel statistical modeling of air pollution and the COVID-19 pandemic mortality data by Poisson, geometric, and negative binomial regression models with fixed and random effects

Abstract: The World Health Organization (WHO) declared coronavirus disease 2019 (COVID-19), an infectious disease that poses a fatal threat to public health, a global pandemic on March 11, 2020. In this study, the main aim is to model the impact of various air pollution causes on mortality data due to the COVID-19 pandemic by the Generalized Linear Mixed Model (GLMM) approach to make global statistical inferences about 174 WHO member countries as subjects in the six WHO regions. “Total number of deaths by these countries due to the COVID-19 pandemic” until July 27, 2022, is taken as the response variable. The explanatory variables are taken as the WHO regions and the number of deaths from air pollution causes per 100,000 population as “household air pollution from solid fuels,” “ambient particulate matter pollution,” and “ambient ozone pollution.” In this study, Poisson, geometric, and negative binomial (NB) regression models with “country” taken as fixed and random effects, as special cases of GLMM, are fitted to model the response variable in terms of the above-mentioned explanatory variables. In the Poisson, geometric, and NB regression models, the Iteratively Reweighted Least Squares parameter estimation method with the Fisher-Scoring iterative algorithm under the log-link function as canonical link function is used. In the GLMM approach, Laplace approximation is also used in the prediction of random effects. In this study, six different Poisson, geometric, and NB regression models with fixed and random effects


Introduction
The coronavirus disease 2019 (COVID-19) pandemic caused by "Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)" infection has led to severe acute respiratory diseases all over the world and has fatally affected the whole world since December 2019 [1][2][3][4][5][6][7][8]. Physical, biological, and especially chemical factors cause air pollution, harm the health of humans and other living things, and change the natural structure of the indoor and outdoor environment.
"Household air pollution (HAP) from solid fuels," "ambient particulate matter pollution," and "ambient ozone pollution" discussed in this study are the three leading air pollution causes. HAP is an important measure of indoor air pollution with respect to the smoke from traditional household solid fuel combustion during cooking and heating [9][10][11]. Ambient particulate matter pollution, measured as the annual weighted average mass concentration of particles with an aerodynamic diameter of less than 2.5 µm in 1 m³ of air, is one of the most important criteria of outdoor air pollution [12][13][14][15][16]. Ambient ozone pollution, another important measure of outdoor air pollution, is measured as the highest seasonal average of eight-hour daily maximum ozone concentrations [17][18][19]. It is known that air pollution causes especially chronic obstructive pulmonary disease and lung cancer. It is also well established that the harmful effects of air pollution increase respiratory and cardiovascular diseases in particular, as well as mortality rates, throughout the world. In light of this information, in this study, the global effect of air pollution on the COVID-19 pandemic is examined by applying generalized linear model (GLM) and generalized linear mixed model (GLMM) approaches in terms of indoor and outdoor air pollution indicators.
Ibarra-Espinosa et al. [20] investigated associations between air pollution and daily COVID-19 cases and deaths in Sao Paulo, Brazil, by negative binomial (NB) and quasi-Poisson regression methods. They indicated that even small increases in air pollution cause significant increases in cases and deaths of COVID-19. Odhiambo et al. [21] estimated the relationships between the number of patients infected with COVID-19 in Kenya, the number of people in contact with these patients, and the number of daily air travels to Kenya by the compound Poisson regression model. Oztig and Askin [22] investigated the number of individuals infected with COVID-19 in 144 countries and the mobility of individuals within these countries using the NB regression approach. Similarly, Janković et al. [23] modeled the number of COVID-19 cases and mobility trends in European countries by the same approach. Fitriani and Jaya [24] demonstrated a positive relationship between the population density and the incidence of COVID-19 cases in East Java Province by a geographically weighted NB regression model. Coker et al. [25] found a positive relationship between the number of deaths due to the COVID-19 pandemic and fine particulate matter (PM2.5) in Northern Italy by the NB regression approach. Lee et al. [26] modeled the relationships between the total number of confirmed cases of the COVID-19 pandemic and the mergers and acquisitions of 145 countries by quasi-Poisson and NB regression approaches. Wu et al. [27] revealed, by an NB mixed-effects regression approach, that an increase in fine particulate matter is associated with higher COVID-19 death rates in the United States. Szyszkowicz [28], Chuang et al. [29], Sun et al. [30], Bülbül et al. [31], Cheng et al. [32], Stieb et al. [33], Chisini et al. [34], Das et al. [35], Karmakar et al. [36], Khalilpourazari et al. [37], Muhsen et al. [38], Tanis and Karakaya [39], Tirkolaee et al. [40], Faruk et al. [41], Gürbüz and Gökçe [42], Özköse and Yavuz [43], Seçilmiş et al. [44], Shamsi et al. [45], and Joshi et al. [46] used various regression models, including NB regression, hurdle regression, Poisson regression, and zero-inflated regression with fixed and random effects, as special cases of GLM and GLMM approaches for modeling the COVID-19 pandemic data.
In light of the studies given in the literature, in this study, an advanced statistical modeling approach based on the GLM and GLMM approaches with Poisson, geometric, and NB distributions is proposed to investigate the relationships between the "mortality data due to the COVID-19 pandemic" and various "causes of air pollution" of 174 World Health Organization (WHO) member countries in the six WHO regions.

Materials
In this study, the "total number of deaths by 174 WHO member countries due to the COVID-19 pandemic" until July 27, 2022, is taken as the response variable. "WHO regions" and "the numbers of deaths from air pollution causes per 100,000 population" as "HAP from solid fuels," "ambient particulate matter pollution," and "ambient ozone pollution" are taken as the explanatory variables, as given in Table 1.
A total of 174 countries are taken in the study as subjects: 47 countries from the African Region (AFR), 21 countries from the Eastern Mediterranean Region (EMR), 49 countries from the European Region (EUR), 33 countries from the Americas Region (AMR), 9 countries from the South-East Asian Region (SEAR), and 15 countries from the Western Pacific Region (WPR), according to the WHO, as given in Table 2.

Table 1 entries for the air pollution variables (variable: description):
Deaths from HAP from solid fuels: number of deaths per 100,000 population due to the air pollution cause "HAP from solid fuels" in 2019 [48]
Deaths from ambient particulate matter pollution: number of deaths per 100,000 population due to the air pollution cause "Ambient particulate matter pollution" in 2019 [48]
Deaths from ambient ozone pollution: number of deaths per 100,000 population due to the air pollution cause "Ambient ozone pollution" in 2019 [48]
Descriptive statistics of the response variable, "total deaths due to the COVID-19 pandemic," and of the explanatory variables, "deaths from different causes of air pollution," for the 174 countries are given in Table 3. The histogram of the "total deaths due to the COVID-19 pandemic" for the 174 WHO member countries used in this study is given in Figure 1. As seen in Figure 1, 33 countries have fewer than 500 total deaths due to the COVID-19 pandemic; 13, 22, 20, 23, 41, 4, and 13 countries have "total deaths due to the COVID-19 pandemic" between 500 and 1,000; 1,000 and 2,500; 2,500 and 5,000; 5,000 and 10,000; 10,000 and 50,000; 50,000 and 100,000; and 100,000 and 250,000, respectively. Five countries have more than 250,000 total deaths due to the COVID-19 pandemic until July 27, 2022.
The histogram of the total deaths due to the COVID-19 pandemic for the six WHO regions is given in Figure 2. Countries for six WHO regions with the highest number of total deaths due to the COVID-19 pandemic until July 27, 2022, [...] Figures 1 and 2 illustrate that the response variable is highly right-skewed, both across the 174 countries all over the world and within the six WHO regions.
Histograms of the number of deaths from causes of air pollution per 100,000 population are given in Figure 3. In decreasing order, the numbers of deaths from causes of air pollution per 100,000 population are 8,217, 7,811, and 440 deaths from "household solid fuels," "ambient particulate matter," and "ambient ozone," respectively.
As can be seen from Figure 3, the top 10 countries with the highest number of deaths related to air pollution from "household solid fuels" are Somalia from EMR, Central African Republic from AFR, Papua New Guinea from WPR, Niger from AFR, Guinea-Bissau from AFR, Chad from AFR, Burundi from AFR, Mozambique from AFR, Guinea from AFR, and Madagascar from AFR with 273, 252, 230, 200, 199, 196, 187, 186, 185, and

Methods
Traditionally used linear regression models are based on the assumption that the response variable is "normally" distributed. However, in a statistical study, the response variable may not always have a normal distribution. In this case, the GLM approach, which allows the response variable to follow any of the distributions of the exponential family, can be used to analyze the data [50][51][52][53].
The GLM consists of a "random component," a "systematic component," and the "link function." The random component of the GLM can be given in the form of the exponential family as follows [50,54-58]:

\[ f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\}, \tag{1} \]

where $\theta$ is the location parameter, $\phi$ is the dispersion parameter, and $a(\phi)$, $b(\theta)$, and $c(y,\phi)$ are known functions determined by the particular distribution of the exponential family.
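As a brief worked example (a standard textbook derivation, not reproduced from the source's Table 5), the Poisson probability function with mean $\mu$ can be rewritten in exponential-family form, assuming the common parameterization $f(y;\theta,\phi)=\exp\{[y\theta-b(\theta)]/a(\phi)+c(y,\phi)\}$:

```latex
\begin{align*}
f(y;\mu) &= \frac{e^{-\mu}\mu^{y}}{y!}
          = \exp\{\, y\log\mu - \mu - \log y! \,\}, \\
\theta &= \log\mu, \qquad b(\theta) = e^{\theta}, \qquad a(\phi) = 1, \qquad c(y,\phi) = -\log y!, \\
E(Y) &= b'(\theta) = e^{\theta} = \mu, \qquad
\operatorname{Var}(Y) = a(\phi)\, b''(\theta) = e^{\theta} = \mu .
\end{align*}
```

Because $\theta = \log\mu$ in this form, the log link is the canonical link in the Poisson case.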
The systematic component, also called the "linear predictor," can be given as a linear function as follows [51,54,58,59]:

\[ \eta = X\beta, \tag{2} \]

where $X$ is the design matrix of the explanatory variables and $\beta$ is the vector of fixed-effects parameters. The link function between the systematic component and the random component of the GLM can be given as follows [52,58,60,61]:

\[ \eta = g(\mu), \qquad \mu = E(Y). \tag{3} \]

For an easier understanding of the GLM structure, these components of the GLM are presented visually in ref. [58]. The known functions $a(\phi)$, $b(\theta)$, and $c(y,\phi)$ in the structure of the exponential family given in equation (1), the canonical link functions $\eta = g(\mu)$, and the inverse link functions $\mu = g^{-1}(\eta)$ for the Poisson, geometric, and NB distributions are given in Table 5 [52,60,62,63].
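To make the three count distributions concrete, the following is a minimal sketch, assuming the usual mean-based parameterizations (the source's Table 5 is not reproduced here, so the function names, the dispersion symbol `k`, and the toy values are illustrative assumptions). It checks numerically that the variance is $\mu$ for the Poisson, $\mu + \mu^2$ for the geometric, and $\mu + \mu^2/k$ for the NB distribution, and that the geometric distribution is the NB distribution with $k = 1$.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def geometric_pmf(y, mu):
    """Geometric probability on y = 0, 1, 2, ... with mean mu."""
    p = 1.0 / (1.0 + mu)
    return p * (1.0 - p) ** y

def nb_pmf(y, mu, k):
    """Negative binomial probability with mean mu and dispersion k > 0 (assumed parameterization)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def mean_and_variance(pmf, upper=2000):
    """Sum y*p(y) and y^2*p(y) over y = 0..upper (the tail beyond is negligible here)."""
    mean = sum(y * pmf(y) for y in range(upper + 1))
    second_moment = sum(y * y * pmf(y) for y in range(upper + 1))
    return mean, second_moment - mean ** 2

MU, K = 3.0, 2.0  # toy values chosen only for illustration
pois_mean, pois_var = mean_and_variance(lambda y: poisson_pmf(y, MU))
geo_mean, geo_var = mean_and_variance(lambda y: geometric_pmf(y, MU))
nb_mean, nb_var = mean_and_variance(lambda y: nb_pmf(y, MU, K))

print("Poisson   mean/var:", round(pois_mean, 6), round(pois_var, 6))
print("Geometric mean/var:", round(geo_mean, 6), round(geo_var, 6))
print("NB        mean/var:", round(nb_mean, 6), round(nb_var, 6))
```

The extra variance terms are what allow the geometric and NB models to accommodate overdispersed counts such as the highly right-skewed mortality totals.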
The GLM is built on the "fixed-effects" terms that have constant effects on the response variable across the subjects taken into the study. On the other hand, it may be necessary to include "random-effects" terms in the model that have varying effects on the response variable across the subjects. In this situation, the GLMM is an extended version of the GLM whose "linear predictor" includes "random-effects" terms in addition to the linear function of the "fixed-effects" terms [55,64-66]. For an easier understanding of the GLMM structure, the fixed effects, the random effects, and the linear predictor consisting of these effects are presented visually in ref. [58].
The "link function" and the "inverse link function" of the GLMM consisting of the "fixed-effects" and the "random-effects" can be given as follows [55,58,64,67]:

\[ \eta = g(\mu) = X\beta + Zu, \tag{4} \]

\[ \mu = g^{-1}(\eta) = g^{-1}(X\beta + Zu), \tag{5} \]

where $X\beta$ is the fixed-effects part and $Zu$ is the random-effects part of the linear predictor. In this study, the GLMM approach with Poisson, geometric, and NB distributions is used because the response variable consists of "count data" taking non-negative integer values, and the link function is taken as the "log-link function" given as follows [55,64,67]:

\[ \log(\mu) = X\beta + Zu. \tag{6} \]

The "iteratively reweighted least squares (IRLS)" method with the "Fisher-Scoring (FS) iterative algorithm" is used for the GLM approach with the Poisson, geometric, and NB distributions. The "maximum likelihood (ML)" method with the "Laplace approximation" is used for the GLMM approach with these distributions [51,55,64,67,68].

The performances of the Poisson, geometric, and NB regression models using the IRLS method with the FS iterative algorithm, and of the Poisson, geometric, and NB mixed regression models using the ML method with the Laplace approximation under the log-link function, are compared using "information criteria" (IC) as goodness-of-fit test statistics given in the studies by Iyit et al. [58,69-73].
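The IRLS method with Fisher scoring named above can be sketched for the simplest case, a Poisson regression with an intercept and one covariate under the log link. This is a minimal illustration, not the authors' implementation: the function name, the toy data, and the convergence settings are assumptions made for the example.

```python
import math

def fit_poisson_irls(x, y, n_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by IRLS (Fisher scoring) for a Poisson response."""
    # Start from the intercept-only fit.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Log link: working weights w = mu, working response z = eta + (y - mu)/mu.
        w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # Weighted least squares normal equations for two parameters.
        s_w = sum(w)
        s_wx = sum(wi * xi for wi, xi in zip(w, x))
        s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        s_wz = sum(wi * zi for wi, zi in zip(w, z))
        s_wxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Toy data (hypothetical, not from the study).
X_TOY = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
Y_TOY = [2, 3, 6, 7, 8, 15]
b0_hat, b1_hat = fit_poisson_irls(X_TOY, Y_TOY)
fitted = [math.exp(b0_hat + b1_hat * xi) for xi in X_TOY]
print("intercept:", b0_hat, "slope:", b1_hat)
```

At convergence the score equations hold, that is, the residuals `y - fitted` sum to zero both unweighted and weighted by `x`, which is a convenient check on any IRLS fit with the canonical link.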

Results and discussion
In this study, mortality data due to the COVID-19 pandemic are modeled in terms of various causes of air pollution by GLM and GLMM approaches with Poisson, geometric, and NB distributions under the log-link function to make statistical inferences about the 174 WHO member countries in the six WHO regions. For this aim, the "total number of deaths by the 174 WHO member countries due to the COVID-19 pandemic" until July 27, 2022, is taken as the response variable. The WHO regions and the numbers of deaths from causes of air pollution per 100,000 population as "household solid fuels," "ambient particulate matter," and "ambient ozone" are taken as the explanatory variables [43,44].
In this study, Poisson, geometric, and NB regression models with "country" taken as fixed and random effects, as special cases of GLM and GLMM, are fitted to model the response variable in the aspect of the above explanatory variables to make global statistical inferences for investigating the relationships between the total number of deaths by these countries due to the COVID-19 pandemic and the mentioned causes of air pollution.
In this study, the "RStudio" program [74] is used for the statistical analysis and modeling of the data. In terms of visualization, the "ggplot" function from the "ggplot2" package is used for drawing all the graphs. The "glm" function from the "stats" package and the "glmer" function from the "lme4" package are used in the parameter estimation of the GLM and GLMM models, respectively.
Poisson, geometric, and NB regression models in the GLM approach, and also Poisson, geometric, and NB mixed regression models in the GLMM approach under the log-link function are fitted to the 174 countries' COVID-19 pandemic and causes of air pollution data given in Table 1.
First, the results of the Poisson regression model using the IRLS method with the FS iterative algorithm under the log-link function are given in Table 6.
By using the IRLS parameter estimates of the Poisson regression model for the WHO regions and causes of air pollution given in Table 6, the expected value of the COVID-19 mortality is given in equation (8).
The results of the geometric regression model using the IRLS method with the FS iterative algorithm under the log-link function are given in Table 7.
By using the IRLS parameter estimates of the geometric regression model for the WHO regions and causes of air pollution given in Table 7, the expected value of the COVID-19 mortality is given in equation (10).
The results of the NB regression model using the IRLS method with the FS iterative algorithm under the log-link function are given in Table 8.
By using the IRLS parameter estimates of the NB regression model for the WHO regions and causes of air pollution given in Table 8, the expected value of the COVID-19 mortality is given in equation (12).
The results of the Poisson mixed regression model under the log-link function with "country" taken as the random effect using the ML with the Laplace approximation method are given in Table 9.
By using the ML parameter estimates of the Poisson mixed regression model for the WHO regions and causes of air pollution given in Table 9, the expected value of the COVID-19 mortality is given in equation (14).
The results of the geometric mixed regression model under the log-link function with "country" taken as the random effect using the ML with the Laplace approximation method are given in Table 10.
By using the ML parameter estimates of the geometric mixed regression model for the WHO regions and causes of air pollution given in Table 10, the expected value of the COVID-19 mortality is given by the corresponding fitted equation.
The results of the NB mixed regression model under the log-link function with "country" taken as the random effect using the ML with the Laplace approximation method are given in Table 11.
By using the ML parameter estimates of the NB mixed regression model for the WHO regions and causes of air pollution given in Table 11, the expected value of the COVID-19 mortality is given in equation (18).
As a main statistical result from this study, the performances of the Poisson, geometric, and NB regression models in the GLM approach, and also Poisson, geometric, and NB mixed regression models in the GLMM approach under the log-link function fitted to the 174 countries' COVID-19 pandemic and causes of air pollution data are compared using the log-likelihood and IC values given in Table 12.
As can be seen from Table 12, the best fitted model for the COVID-19 pandemic and causes of air pollution data is the "NB mixed regression model" from the GLMM approach, according to the maximum value of the log-likelihood and the smallest values of the AIC, AICC, BIC, and CAIC information criteria, with -1807.756, 3633.512, 3634.610, 3635.677, and 3644.677, respectively.
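The reported log-likelihood, AIC, and AICC values can be cross-checked with the standard definitions $\mathrm{AIC} = -2\ell + 2k$ and $\mathrm{AICC} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$. In the sketch below, the parameter count `K_PARAMS = 9` is an assumption inferred from the reported numbers, not a value stated in the text, and the function names are illustrative. BIC and CAIC are left out because the exact definitions used in the cited references are not given here.

```python
def aic(log_lik, k):
    """Akaike information criterion: -2*logLik + 2*k."""
    return -2.0 * log_lik + 2.0 * k

def aicc(log_lik, k, n):
    """Small-sample corrected AIC."""
    return aic(log_lik, k) + 2.0 * k * (k + 1) / (n - k - 1)

LOG_LIK = -1807.756   # log-likelihood reported for the NB mixed regression model
N_COUNTRIES = 174     # number of countries (subjects) in the study
K_PARAMS = 9          # ASSUMPTION: inferred from the reported AIC, not stated in the text

aic_value = aic(LOG_LIK, K_PARAMS)
aicc_value = aicc(LOG_LIK, K_PARAMS, N_COUNTRIES)
print("AIC :", round(aic_value, 3))
print("AICC:", round(aicc_value, 3))
```

Under this assumed parameter count, both computed values agree with the reported 3633.512 and 3634.610 to three decimals.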
Moreover, variance and standard deviation of the random effect due to the NB mixed regression model for the COVID-19 pandemic and causes of air pollution data are determined as the smallest as given in Table 13.
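The random-effect standard deviation enters the GLMM likelihood through an integral over each country's random effect, and the Laplace approximation replaces that integral by a Gaussian approximation around the mode of the integrand. The sketch below illustrates the idea for a single Poisson observation with a normally distributed random intercept and compares it with brute-force numerical integration. All names and the toy values (`y = 5`, `b0 = 1.0`, `sigma = 1.0`) are assumptions for illustration only; this is not the procedure as implemented in the software used in the study.

```python
import math

def log_integrand(u, y, b0, sigma):
    """log[ Poisson(y | exp(b0 + u)) * Normal(u | 0, sigma^2) ]."""
    eta = b0 + u
    log_pois = y * eta - math.exp(eta) - math.lgamma(y + 1)
    log_norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2) - u ** 2 / (2.0 * sigma ** 2)
    return log_pois + log_norm

def laplace_marginal(y, b0, sigma, n_newton=50):
    """Laplace approximation to the integral of exp(log_integrand) over u."""
    u = 0.0
    for _ in range(n_newton):  # Newton iterations to locate the mode
        grad = y - math.exp(b0 + u) - u / sigma ** 2
        hess = -math.exp(b0 + u) - 1.0 / sigma ** 2
        step = grad / hess
        u -= step
        if abs(step) < 1e-12:
            break
    hess = -math.exp(b0 + u) - 1.0 / sigma ** 2
    return math.exp(log_integrand(u, y, b0, sigma)) * math.sqrt(2.0 * math.pi / -hess)

def quadrature_marginal(y, b0, sigma, lo=-10.0, hi=10.0, n=20001):
    """Trapezoidal-rule reference value for the same integral."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(log_integrand(lo + i * h, y, b0, sigma))
    return total * h

lap = laplace_marginal(5, 1.0, 1.0)
quad = quadrature_marginal(5, 1.0, 1.0)
print("Laplace:", lap, "quadrature:", quad)
```

In a full GLMM fit, this approximate marginal likelihood is evaluated for every subject and maximized jointly over the fixed-effects parameters and the random-effect variance.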
The residual graphs for the NB mixed regression model as the best fitted model due to the COVID-19 pandemic and causes of air pollution data are given in Figure 4.
In the Q-Q plot of the Pearson residuals of the model, it can be seen that the Pearson residuals lie close to the Q-Q line shown in red. From the box plot of the Pearson residuals of the model, the Pearson residuals are mainly located in the range (−0.6, 0.1), and it can easily be seen that there is no outlier.
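For readers unfamiliar with Pearson residuals in count models, the following minimal sketch shows the usual definition, $(y - \mu)/\sqrt{\operatorname{Var}(\mu)}$, for the Poisson and NB variance functions. The function names, the dispersion value `k = 2.0`, and the toy observation are assumptions; the exact residual definition used for the fitted mixed model is not stated in the text.

```python
import math

def pearson_residual_poisson(y, mu):
    """Pearson residual under the Poisson variance function Var = mu."""
    return (y - mu) / math.sqrt(mu)

def pearson_residual_nb(y, mu, k):
    """Pearson residual under the NB variance function Var = mu + mu^2 / k."""
    return (y - mu) / math.sqrt(mu + mu ** 2 / k)

# Toy observation (hypothetical): observed 120 deaths, fitted mean 100.
r_pois = pearson_residual_poisson(120, 100.0)
r_nb = pearson_residual_nb(120, 100.0, 2.0)
print("Poisson residual:", r_pois)
print("NB residual     :", r_nb)
```

The same deviation from the fitted mean yields a much smaller residual under the NB variance function, which is one reason why an overdispersed model can show residuals confined to a narrow range.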

Conclusion
In this study, global "COVID-19 pandemic" data are modeled comparatively in terms of the "causes of air pollution" using six different models in the GLM and GLMM approaches, taking "country" as the random effect. A review of the literature shows that very few studies examine the impact of the "causes of air pollution" on the "COVID-19 pandemic" as a whole, as is done in this study. This study differs from the other studies in the literature in that it evaluates the effects of various causes of air pollution on the total deaths due to the COVID-19 pandemic with six statistical modeling techniques in the GLM and GLMM approaches for the 174 countries taken from the six WHO regions. These extensive features constitute the originality of this study. In statistical terms, the performances of the Poisson, geometric, and NB regression models in the GLM approach, and also of the Poisson, geometric, and NB mixed regression models in the GLMM approach, for the COVID-19 pandemic and causes of air pollution data have not previously been compared as broadly as in this study.
As a conclusion of this study, the following global statistical inferences are obtained by using the NB mixed regression model under the "log-link" function given in equation (18), with "country" taken as the random effect and the ML with the Laplace approximation method.
The total number of deaths by the 174 WHO member countries due to the COVID-19 pandemic increases [...] times for each additional death per 100,000 population from air pollution caused by "household solid fuels," "ambient particulate matter," and "ambient ozone," respectively.
On the other hand, the total numbers of deaths due to the COVID-19 pandemic in the EMR, EUR, AMR, SEAR, and WPR are expected to be [...] $e^{4.3864}$ times higher than the total number of deaths in the AFR, taken as the reference category, respectively.
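Under a log link, a coefficient $\beta$ is interpreted multiplicatively: the expected count is multiplied by $e^{\beta}$ for a one-unit increase in a covariate, or relative to the reference category for a region indicator. The sketch below applies this to the one exponent that survives in the text above, 4.3864. Which region it belongs to is not recoverable here, and the function name is illustrative.

```python
import math

def rate_ratio(beta, delta=1.0):
    """Multiplicative change in the expected count for a change `delta` in the covariate."""
    return math.exp(beta * delta)

REGION_COEF = 4.3864  # exponent appearing in the text; its region is not identified here
ratio = rate_ratio(REGION_COEF)
print("expected deaths relative to the reference region:", ratio)
```

A coefficient of zero would give a ratio of exactly 1, meaning no difference from the reference category.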
Based on this study, the future outlook for further investigation will be to explore the effects of panel data structures for different indicators of air pollution on worldwide pandemics and natural disasters as a highlighted topic on the world agenda.

Figure 1: Histogram of the total deaths due to the COVID-19 pandemic according to the 174 WHO member countries taken in the study.

Figure 2: Histogram of the total deaths due to the COVID-19 pandemic according to the six WHO regions taken in the study.

Figure 3: Histograms of the number of deaths from causes of air pollution per 100,000 population.

In Figure 4, the scatter graph of the Pearson residuals of the NB mixed regression model for the COVID-19 pandemic data of 174 countries illustrates that the Pearson residuals are randomly dispersed around zero and fall within the range (−1, 2). From the histogram of the Pearson residuals of the model, it can be seen that the Pearson residuals are homogeneously located in the range (−1, 0.1) and gradually decrease in the range (0.1, 2).

Figure 4: The residual graphs for the NB mixed regression model due to the COVID-19 pandemic and causes of air pollution data.

Table 1: All variables taken into the study to model the relationships between the COVID-19 pandemic and air pollution causes

Table 2: WHO regions and countries taken in the study according to the total number of deaths due to the COVID-19 pandemic in decreasing order [49]

Table 3: Descriptive statistics of the variables taken in the study to model the impact of air pollution causes on the COVID-19 pandemic

Table 4: Descriptive statistics of the total deaths due to the COVID-19 pandemic according to the six WHO regions

182 deaths, respectively. The top 10 countries with the highest number of deaths related to air pollution from "ambient particulate matter" are Uzbekistan from EUR, Egypt from EMR, Qatar from EMR, Oman from EMR, Iraq from EMR, Tajikistan from EUR, Saudi Arabia from EMR, Azerbaijan from EUR, Mongolia from WPR, and Bahrain from EMR with 177, 158, 129, 128, 122, 116, 110, 109, 107, and 104 deaths, respectively. Finally, the top 10 countries with the highest number of deaths related to air pollution from "ambient ozone" are Nepal from SEAR, India from SEAR, Pakistan from EMR, Bangladesh from SEAR, Central African Republic from AFR, Afghanistan from EMR, Bahrain from EMR, China from WPR, Kyrgyzstan from EUR, and Myanmar from SEAR with 35, 19, 14, 9, 7, 6, 6, 6, 6, and 6 deaths, respectively.

Table 5: Structures of the exponential family, canonical link functions, and the inverse link functions due to the Poisson, geometric, and NB distributions in the GLM approach

Table 6: The results of the Poisson regression model for the WHO regions and causes of air pollution under the log-link function

Table 7: The results of the geometric regression model for the WHO regions and causes of air pollution under the log-link function

Table 8: The results of the NB regression model for the WHO regions and causes of air pollution under the log-link function

Table 9: The results of the Poisson mixed regression model for the WHO regions and causes of air pollution under the log-link function

Table 10: The results of the geometric mixed regression model for the WHO regions and causes of air pollution under the log-link function

Table 11: The results of the NB mixed regression model for the WHO regions and causes of air pollution under the log-link function

Table 12: Goodness-of-fit test statistics for the GLM and GLMM approaches under the log-link function due to the COVID-19 pandemic and causes of air pollution data

Table 13: Variance and standard deviation of the random effect belonging to the GLMM approach for the COVID-19 pandemic and causes of air pollution data