A NEW METHOD FOR GENERALIZING BURR AND RELATED DISTRIBUTIONS

. A new method has been proposed to generalize Burr-XII distribution, also called Burr distribution, by adding an extra parameter to an existing Burr distribution for more ﬂexibility. In this method, the exponent of the Burr distribution is modeled using a nonlinear function of the data and one additional parameter. The models of this newly introduced generalized Burr family can signiﬁcantly increase the ﬂexibility of the former Burr distribution with respect to the density and hazard rate shapes. Families expanded using the method proposed here is heavy-tailed and belongs to the maximum domain of attractions of the Frechet distribution. The method is further applied to yield three-parameter classical Pareto and generalized exponentiated distributions which shows the broader application of the proposed idea of generalization. A relevant model of the new generalized Burr family has been considered in detail, with particular emphasis on the hazard functions, stochastic orders, estimation procedures, and testing methods are derived. Finally, as empirical evidence, the new distribution is applied to the analysis of large-scale heavy-tailed network data and compared with other commonly used distributions available for ﬁtting degree distributions of networks. Experimental results suggest that the proposed Burr distribution with nonlinear exponent better ﬁts the large-scale heavy-tailed networks better than the popularly used Marhsall-Olkin generalization of Burr and exponentiated Burr distributions.


Introduction
The development of statistical distributions is one of the oldest research topic in the field of statistics.There has been a renewed interest in developing more flexible statistical distributions in recent decades.Since the seminal work by Karl Pearson in 1895 [37], several general methods have been developed for generating family of distributions.Pearson presented a systematic approach for generating statistical distribution to model non-symmetric type of data using differential equation.The Pearson system of continuous distributions is a system for which every probability density function (PDF) f (x) satisfies the differential equation of the following form: where a, b 0 , b 1 , and b 2 are the shape parameters.Different types of distributions correspond to different forms of solution to Eqn. (1.1).The form of solution of Eqn.(1.1) depends on the root of the equation b 0 + b 1 x + b 2 x 2 = 0. Pearson presented four types of distributions [14] and characterizations of Pearson type distributions are also available in the literature [4].Irving W. Burr proposed another important development in this category.In Burr's method [9], system of distributions satisfy the following differential equation: dF = F (1 − F )g(x)dx, (1.2) where 0 ≤ F ≤ 1 and g(x) is a non-negative function over x.Twelve different solutions to the Eqn.(1.2) in the form of cumulative distribution functions (CDF) were given and named as Burr Types I-XII distributions.The Burr type-XII distribution, being a member of the Burr system has gained more attention in the last decade due to its potential use in practical situations.The Burr Type XII, simply called Burr distribution, is highly useful for fitting heavy-tailed data sets from the field of reliability, economics, hydrology, actuarial science, and network science among many others [24].Burr distribution also emerges as a suitable model to describe stationary states of complex and non-equilibrium systems [38,39].The main advantage of Burr distribution from the extreme value statistics' point of view is that it has algebraic tails which are effective for modeling failures that occur with lesser frequency than with corresponding models based on exponential tails [41].The CDF and PDF of the Burr distribution are defined as follows.
Definition 1.A random variable X follows Burr distribution with parameters c, λ, α, if the CDF is of the following form: where α and c are shape parameters and λ is a scale parameter.The density function of the Burr distribution is given by x > 0, c, α, λ > 0. (1.4) The corresponding survival function is given by S(x; c, λ, α) = 1 + x λ c −α , x > 0. (1.5) The hazard rate function is given by h(x; c, λ, α) = f (x; c, λ, α) S(x; c, λ, α) It is interesting to note that the CDF in (1.3) is regularly varying at infinity, viz.they satisfy for some γ > 0, called the tail-index, and for all t > 0, This result suggests that Burr distribution well-suited for modeling extreme values as a heavytailed distribution.The bivariate and multivariate extensions of Burr distribution are available in the past literature [13,42].Consequently, numerous modifications or generalizations of the Burr distribution, using different compounding and weighting techniques, have been suggested.See, for instance, the beta Burr distribution introduced by [36], the McDonald Burr distribution introduced by [17], order-statistics based generalized Burr distribution [5], the Marshall-Olkin Burr (MO Burr) distribution developed by [22], modified Burr distribution [21] the exponentiated Burr distribution introduced by [26] and the log-Weibull Burr distribution developed by [1].Though all these modifications and generalizations resulted well in specific data analysis but all of these assumed constant exponents for the Burr distribution which causes potential failure when working with large-scale heavy-tailed data sets from various applied domains [24].In this work, the primary hypothesis is that the exponent α of the Burr distribution is not a constant and varies according to a nonlinear function g which depends on the data.This article aims to introduce an extra parameter to Burr, Pareto, and exponentiated distributions to bring more flexibility to the given families when dealing with real-world large-scale heavy-tailed data sets.We introduce a new method of generalization, namely the shape-parameter transformation (SPT) method, where the non-negative shape parameter is assumed to be expressible as a nonlinear function of the empirical data and adds an additional parameter to the base distribution.The proposed SPT method is straightforward to use; hence it can effectively be used for data analysis purposes.The proposed SPT method is first applied to a Burr distribution to yield generalized Burr (GBurr) family of distributions.The newly introduced GBurr family has some excellent statistical properties and belongs to the maximum domain of attractions of the Frechet distribution.The method presented in this paper adds an additional parameter in the distribution and can be a competitor to the popularly used Marshall-Olkin (MO) [30] and Lehmann alternatives-based [27] method of generalizing continuous distributions.The SPT method is further applied to the Pareto and the exponentiated family of distributions which shows the broader applicability of the proposed SPT method.Further, a specific nonlinear variant of the GBurr family, we call it NBurr distribution, is discussed in detail.Complementary theoretical aspects are studied, such as shapes, asymptotes, quantiles, stochastic ordering, reliability parameter, and inferential statistics.The application of the proposed NBurr distribution is shown using large-scale heavy-tailed network data sets from various disciplines.
The rest of the paper is organized as follows.We discuss the SPT method and its application to the Burr, Pareto, and exponentiated family of distributions in Section 2 along with some structural properties.In Section 3, we study a simple model from the GBurr family of distributions and discuss its stochastic and inferential characteristics.The empirical application of this new NBurr distribution to various large scale heavy-tailed network data sets is presented in Section 4. Finally, we conclude this paper in Section 5.

Shape parameter transformation (SPT) method
In this section, we introduce the SPT method to generalize Burr, Pareto, and exponentiated distributions by modeling the shape parameter of these families as a nonlinear function of the data.It also adds an additional parameter to the families as mentioned above of distributions for flexible modeling of real-life data sets.The SPT method is first applied to yield generalized Burr distributions which are regularly varying distributions at infinity and are heavy-tailed.Furthermore, this new SPT method is applied to Pareto distribution and exponentiated family of distributions to generalize these families.

The generalized Burr (GBurr) family
Definition 2. A continuous random variable X follows generalized Burr (Burr) family of distributions if and only if it has the following CDF: and F (x) = 0 if x < 0, where the real valued continuous, positive function g : (0, ∞) → R + is differentiable on (0, ∞).The shape parameter α of the Burr distribution is replaced with TANUJIT CHAKRABORTY -SUCHISMITA DAS -SWARUP CHATTOPADHYAY g(z) = g(x/λ; α, β) = g(x/λ), say, where β is an additional shape parameter and g(z) satisfies the following conditions: (1) The function g(z) is strictly positive and have finite limit at infinity, viz.
Thus, (2.1) is a standard CDF and it can also be expressed as follows: The corresponding survival function is given by ( The probability density function is (2.5) The hazard rate function is given by (2.6) The CDF (2.1) of the GBurr family of distributions is a function with regularly varying tails and belongs to the maximum domain of attraction (MDA) of the Frechet distribution with index α > 0, viz.
(b) To show the tail-equivalent property, we show where G(x) is the CDF of the Pareto Type-II distribution.
(c) It should be noted that any function g as defined in Definition 2 satisfying lim z→∞ g(z) = α > 0, is slowly varying at infinity: for all t > 0 and λ, c > 0.
It can be seen that Burr distribution belongs to this GBurr family and corresponds to the simplest choice g(z) = α.We give some examples of nonlinear g(z) satisfying the condition given in Definition 2 and present some distributions belonging to the GBurr family in Table 1.In order to select a model, one can choose the nonlinear exponent g(z) that meets the empirical characteristics of the given data sets.Obviously, this example list of GBurr models can further be expanded to more models by introducing some other forms of g(z) satisfying lim z→∞ g(z) = α > 0 or by increasing the number of parameters in the function g.
We can also obtain some popular size distributions as the particular cases of the GBurr family: • By taking the constant g function, viz., when g(z) = 1, then GBurr distributions reduces to the Fisk distribution [16].

Generalized classical Pareto distribution
In this section, we apply the SPT method to the classical Pareto distribution to yield generalized classical Pareto (GCP) distributions as defined below.Definition 3. A continuous random variable X follows GCP family of distributions if and only if it has the following CDF: and , and the function m satisfies the following conditions: (1) m is strictly positive and have finite limit at infinity, viz.
It is noted that the condition ( 3) is equivalent to It is very easy to verify that F (x) in (2.7) is a standard distribution function and any continuous random variable X satisfying the above-mentioned conditions are called as GCP distributions.An alternative expression for F (x) can also be given as follows: The survival function is given by (2.9) The PDF for the family of GCP distributions is given by (2.10) The hazard function is given by (2.11) Table 2 shows some examples of GCP distributions satisfying condition (2.8) (i.e., lim z→∞ m(z) = α > 0).It is noted that the simplest choice m(z) = α leads the classical Pareto distribution.The rest of the models are completely new.In order to select a model, one can choose the nonlinear exponent m(z) that meets the empirical characteristics of the given data sets.Obviously, this example list of GCP models can further be expanded to more models by introducing some other forms of m(z) satisfying lim z→∞ m(z) = α > 0 or by increasing the number of parameters in the function g.

Generalized exponentiated distributions
The idea of exponentiated family of distributions is based on Lehmann alternatives [27] that can add a non-negative shape parameter to any continuous probability distributions: where α > 0 (real), (2.12) where F (x) is a standard CDF.This new family F * is called 'exponentiated family' where one raises the CDF of an existing distribution to a power of an additional parameter.Lehmann motivated it in the following way: [F (x)] α is the distribution of the maximum of α independent and identically distributed variables with distribution F when α is an integer and when α is a rational number, [F (x)] α is one-parameter family of nonparametric class of alternatives.In the theory of generalized probability distributions, Lehmann alternatives has been widely used to generate so-called exponentiated Weibull family of distribution by [31].A systematic treatment of exponentiated weibull, exponentiated gamma, and exponentiated pareto distribution are available in [18].Gupta and Kundu studied exponentiated exponential distribution [20] and recently Gupta et al. [19] proposed Power Normal distributions using the same idea of Lehmann's alternatives for standard normal distribution.Nadarajah and Kotz [32] studied a list of exponentiated X family of distributions, including exponentiated Gumbel, Fréchet, gamma, etc. distributions.We use our SPT method to generalize any 'exponentiated family' of distributions.
Definition 4. Any continuous random variable X with distribution function F (x) follows generalized exponentiated distribution if and only if it has the following CDF where m : (−∞, ∞) → (0, ∞) is a real, continuous function which is differentiable on (0, ∞).The function m satisfies the following conditions: (1) m is strictly positive and have finite limit at infinity, viz.
It is important to note that when m(x) = α, then G(x) corresponds to the exponentiated family of distributions [18,32].We give some examples of m(x) to be used for the distributions with x ∈ R.
Some choices of ξ(x) are as follows: where k is any natural numbers.
Some choices of σ(x) are as follows: It is interesting to see that is a standard CDF, where Φ(x) is a standard normal CDF.For all the above examples when α = β, we get G(x) = [F (x)] α .

NBurr distribution: Definition and properties
In this section, we study a relevant model of the GBurr family of distributions with the choice of g(z) = α − β 1+z in Table 1.The CDF is given by where α, λ, c > 0, β > −1 and α > β.We call this simple generalized form of Burr distribution as NBurr distribution (Burr distribution with nonlinear exponent).The NBurr distribution includes the Burr distribution when β = 0.The CDF in (3.1) can alternatively be written in the following form: The corresponding survival function is given by, The probability density function for x > 0 is given by and the hazard rate function is given by The proposed NBurr distribution satisfies the extreme value properties as given in Theorem 2.1.Some graphics of the NBurr model derived from this newly introduced GBurr family of distributions are illustrated in Figure 1.Remark that NBurr distribution with parameters α, β, λ, c, as a distribution of the MDA of the Frechet distribution, satisfies the von Mises condition: Next, we study the reliability properties and inferential properties of the new NBurr distribution in the next subsections.

Reliability properties of NBurr distribution
In this section, we study some reliability properties of the newly introduced NBurr distribution including monotonicity of hazard rates, stochastic orderings, entropy, etc.The following theorem shows that the hazard rate function of the NBurr distribution is increasing and decreasing under certain conditions on c and β.
Theorem 3.1.The hazard rate function of the NBurr distribution satisfies the following properties: (1) If 0 ≤ c ≤ 1 and β > 0 then for all x > 0, h NBurr (x) is decreasing in x; (2) If c > 1 and β > 0, then for all x > 0, (a) h NBurr (x) is increasing in x, when c − 1 P r o o f.Differentiating (3.5) with respect to x, we have The following example shows that the NBurr distribution does not preserves the likelihood ratio ordering.It is useful to recall that a random variable X is said to be larger than another random variable Y in likelihood ratio ordering (written as X ≥ LR Y ) if f X (x)/f Y (x) is an increasing function for x > 0.
Example 3. Let X and Y be two random variables following NBurr distributions with parameters α 1 , β 1 , λ 1 , c 1 and α 2 , β 2 , λ 2 , c 2 respectively.Then, for all x > 0, the ratio of the corresponding density functions of X and Y is given by = P 1 (x), say.
In the next theorem, we show that the NBurr distribution preserves the usual stochastic ordering.It is useful to remind that a random variable X is said to be larger than another random variable Y in usual stochastic ordering (written as X ≥ ST Y ) if S X (x) ≥ S Y (x), for all x > 0.
Theorem 3.2.Let X and Y be two random variables following NBurr distribution with parameters

and
(iv) β 1 ≤ (≥)β 2 for β > 0. P r o o f.X ≥ ST (≤ ST )Y if and only if for all x > 0, S XNBurr (x) ≥ (≤)S YNBurr (x), which is equivalent to The following theorem gives the condition under which the NBurr distribution preserves the hazard rate ordering.It is well known that a random variable X is said to be larger than another random variable Y in hazard rate ordering (written as X ≥ HR Y ) if, for all x > 0, h X (x) ≤ h Y (x).
Theorem 3.3.Let X and Y be two random variables following NBurr distribution with parameters (iv) β 1 ≤ (≥)β 2 , for β > 0. P r o o f.X ≥ HR (≤ HR )Y if and only if for all x > 0, h XNBurr (x) ≤ (≥)h YNBurr (x), which is equivalent to Remark 1.It is interesting to note that the hazard rate function of the NBurr distribution can be both increasing and decreasing under certain conditions as given in Theorem 3.1.Also, NBurr distribution preserves stochastic and hazard rate orderings as shown in Theorems 3.2 and 3.3, respectively.
The Shannon entropy is an important and well-known concept in information theory as well as engineering sciences.Let X be a random variable that follows NBurr distribution with parameters α, β, λ, c.Then Shannon's entropy for the NBurr distribution is defined as When we want to study the system that survived up to an age t, then Shannon's entropy function is not useful in measuring the uncertainty about the residual lifetime of the system.Ebrahimi [3] has introduced residual entropy and defined as Then the residual entropy for NBurr distribution with parameters α, β, λ, c is given by

Parameter estimation
Let x 1 , x 2 , . . ., x n be a sample of size n from NBurr(α, β, λ, c) distribution.We give procedure for parameter estimation the including the log-likelihood functions and corresponding normal equations.The log-likelihood function for the vector of parameters Θ = (α, β, λ, c) T corresponding to NBurr distribution is given by where n is the sample size, and the maximum likelihood estimates of the unknown parameter vector (α, β, λ, c) are those that maximize the log-likelihood function in (3.7).The normal equations can be obtained by taking the partial derivatives of (3.7) w.r.t.α, β, λ, c and equating them to zero: where ω i = 1 + xi λ .The MLEs of the four parameters for the NBurr distribution with α, β, λ and c are obtained by setting the above partial derivatives to zero and solving them simultaneously.The closedform solutions are not available for the equations (3.8), (3.9), (3.10) and (3.11).So, an iterative algorithm should be applied to solve these equations numerically.For practical implementation of the model, we fit the NBurr models in the whole range of the data sets with quasi-Newton BFGS numerical algorithm with initial values to be chosen as ( α0 , β0 , λ0 , ĉ0 ) = (1, 1, 1, 1) to find the MLE estimates of the parameters.
We also present the asymptotic distributions for the NBurr distribution.The Fisher information matrix (I) can be obtained by taking the expected values of the second-order and mixed partial derivatives of (x; α, β, λ, c) w.r.t.α, β, λ and c.Since the analytical expression is hard to compute, it can be approximated by numerically investigating the I = (I ij ) matrix.The asymptotic I matrix can be given as follows: The second order partial derivatives of (x; α, β, λ, c) w.r.t.α, β, λ and c can be calculated but the calculations are very tedious.Hence, we omit the calculation part.The variance-covariance matrix is approximated by M = (M ij ) where M ij = I −1 ij .The asymptotic distribution of MLEs for α, β, λ, and c can be written as Then the approximate 100(1 − k)% confidence intervals for α, β, λ, and c are given by α ± Var( λ), and ĉ ± Z k 2 Var(ĉ); where Θ = (α, β, λ, ĉ) and Z k is the upper 100 k-th percentile of the standard normal distribution.

Goodness of fit
The measure of closeness between the hypothesized NBurr distribution and the observed realworld network can be well determined by goodness-of-fit test.We have used the Chi-square statistic test and its corresponding p value to determine the goodness of fit for the NBurr distribution.We calculate the respective p values using the bootstrap resampling computational technique as given below: • Initially the best fit NBurr distribution can be determined by estimating parameters through the MLE method given network data.Then we calculate the Chi-square statistic value as a measure of goodness-of-fit corresponding to the best-fitted NBurr model.
• Next we generate 50000 synthetic data sets from the NBurr distribution and calculate the Chi-square statistic for each of the synthetic data sets.
• Finally, we obtain the p value for the synthetic data sets as the fraction of NBurr synthetic data sets with a Chi-square value greater than the empirical one.Higher p values signify that the proposed model is 'most' suitable for the data set.
In addition, the effectiveness of the proposed NBurr distribution compared to other heavytailed distributions, is also verified by computing other well known statistical measures such as Kullback-Leibler divergence (KLDiv), root mean squared error (RMSE), and mean absolute error (MAE).

Real-world applications
We show the application of the NBurr distribution in the analysis of large-scale complex network data sets from various disciplines.The examples of such large-scale real-world complex networks include Twitter, Facebook, Orkut, Youtube, Amazon, LinkedIn, Wiki networks, etc. where the number of nodes is of the order of thousands or millions.There has been significant interest and attention devoted toward modeling aspects of such large-scale complex networks.Recent research [23,28,44] involved in the analysis of various important structural characteristics of network such as degree distribution, average nearest neighbor, clustering coefficient, community discovery, motif distribution, etc.Most of the interest has been focused on the analysis of the node degree distribution corresponding to these real-world networks [2,6,7].Empirical observations suggest that the node degree distributions of such real-world networks, for example, collaboration networks, communication networks, social networks, biological networks, etc., follow a heavy-tailed power-law distribution [6,33].Previous researchers reported that a baseline power-law, exponential, Pareto, log-normal, and Burr models are insufficient to fit the empirical data properly in its whole range unless some of the lower degree nodes are left out while fitting the model [10-12, 40, 43].In recent work, Broido et al. [8] pointed out that the recent data concentration on all these networks data shows that they no longer follow the power-law distribution.

Data
We consider large-scale real-world network data sets from different disciplines, namely social networks, collaboration networks, citation networks, web graphs, product co-purchasing networks, temporal networks, communication networks, and ground-truth networks.We study several individual data sets from each discipline to showcase the general applicability of the proposed NBurr distribution.These data sets are publicly available at http://snap.stanford.edu/data/index.html.These are standard network data sets with heavy-tail behaviors and used for modeling in the statistical analysis of networks [33][34][35].An overview of these publicly available network data sets along with statistical measures (mean (µ), standard deviation (s), etc.) are presented in Table 3.Another interesting property of these network data sets is their coefficient of variation (s/µ) exceeding unity.

Experimental results
In this section, we compare the NBurr distribution with the other seven established models, namely Power-law, Pareto, Log-normal, Power-law with cutoff, Burr, exponentiated Burr [26], MO Burr [22] distributions.To estimate the parameters (α, β, λ, c) of the NBurr distribution numerically, we have used 'optim' function along with the quasi-Newton L-BFGS-B algorithm in R statistical software by taking the initial parameters value (α, β, λ, c) = (1, 1, 1, 1).The estimated values of the parameters for all the network data sets satisfied the following conditions: α > 0, β > −1 λ > 0 and c > 0 as depicted in Table 3, which characterize the proposed NBurr distribution.Empirically it is observed that in the case of social networks, the estimated value of the parameter λ attains the higher values as compared to the estimated value of α.From Table 3 it is also clear that the proposed NBurr distribution produces higher p values through bootstrapping chi-square test which suggests that the null hypothesis i.e., "the data are drawn from NBurr distribution", cannot be ruled out at the 0.05 level of significance.This recommends in favor of the use NBurr distribution for fitting the node degree distribution of a network.Experimental results (given in Table 3) suggests that the proposed NBurr distribution is effective in modeling the entire degree distribution of real-world complex networks.Also, we used some other statistical measures, viz.KLDiv, RMSE, and MAE to compare the performance of the proposed NBurr distribution with the competitive heavy-tailed distributions as shown in Tables 4 and 5.
Given a network, the information about the differences between actual and predicted degree frequencies can be determined by calculating the root mean squared error (RMSE) and mean absolute error (MAE).The higher similarity between actual and predicted distributions is achieved by generating smaller values of RMSE and MAE.From Tables 4 and 5, it is clear that the proposed NBurr distribution produces smaller RMSE and MAE values than other competitive distributions.This indicates that the proposed NBurr distribution is competitive in almost all the networks, except a few, where Burr, MO Burr, and power-law cutoff distribution perform better than the proposed one.Power-law and Pareto distributions provide lower performance compared to the other competing distributions in terms of RMSE and MAE measures over all the real-world distributions.We also studied a relevant model, namely NBurr distribution, from the generalized Burr family and derived various statistical properties.The practical usefulness of the proposed NBurr distribution was shown using multiple heavy-tailed network data sets from different disciplines.It is interesting to note that the NBurr distribution is a new competitor for popularly used MO Burr and exponentiated Burr distributions.An immediate extension of this work is to apply these newly introduced probability models for modeling survival and lifetime data sets.Another possible extension of this work would be to look for implementing the proposed SPT method for other size distributions available in the statistics paradigm.

Theorem 2 . 1 .
GBurr family of distributions are: (a) heavy-tailed, (b) right tail-equivalent to a Pareto distribution, a (c) belongs to the MDA of the Frechet distribution.P r o o f.(a) For the GBurr family of distributions, we note that lim x→∞ exp{kx} (1 − F

1 Figure 1 .
Figure 1.Plots of PDFs of the NBurr distribution.
is decreasing in x and again A(0) = 0. Thus, we have A(x) ≤ 0. Similarly, we can prove that if c > 1, then A(x) ≥ 0 when c − 1 > x λ c and A(x) ≤ 0 when c − 1 < x λ c .This implies that for 0 ≤ c ≤ 1, h NBurr (x) is decreasing in x; for c > 1, h NBurr (x) is increasing in x, when c − 1 > x λ c and h NBurr (x) is decreasing in x, when c − 1 < x λ c .Thus, the hazard function h NBurr (x) attain its maximum at x = λ [c − 1] 1/c .