Modelling with star-shaped distributions

Abstract: We prove and describe in great detail a general method for constructing a wide range of multivariate probability density functions. We introduce probabilistic models for a large variety of clouds of multivariate data points. In the present paper, the focus is on star-shaped distributions of arbitrary dimension, where in the case of spherical distributions dependence is modeled by a non-Gaussian density generating function.


Introduction
The components of a random vector following a spherical or even elliptically contoured distribution are independent only if the density generating function is a Gaussian one. In order to generalize the class of elliptically contoured distributions, Fernandez et al. [4] introduced the star-shaped distributions. Inference problems in linear models with star-shaped mixtures of Gaussian errors were considered by Jensen [5]. In the paper by Kamiya et al. [6], star-shaped distributions are studied within the framework of orbital decomposition and global cross-sections. Later, Balkema et al. [2] examined the limit star shape of scaled sample clouds and the related distributions. Yang and Kotz [23] describe an alternative approach by considering center-similar distributions. Recently, Kamiya [7] investigated the estimation of the shape of density level sets from star-shaped distributions. In the last decade, star-shaped distributions have become popular because of their flexibility in shape. Deviating from the shape of ellipses/ellipsoids, a wide range of shapes of the density contours is possible among the star-shaped distributions.
In scatterplots of real datasets one can frequently see that the shapes of the contour lines/surfaces are not ellipses/ellipsoids. This is a reason for looking for generalizations. Here star-shaped distributions come into focus. The detailed theory of star-shaped distributions, including stochastic and geometric representations, is developed by Richter in [16]. For understanding normalizing constants of density generating functions as ball numbers (even in more general cases), we refer to [21]. Semiparametric and parametric estimation methods for density generators, generalized radius distributions and the star-shaped densities are examined in the paper [11]. The latter three papers contain references to many other papers on star-shaped distributions and ball numbers. The star-shaped distributions represent a rather general and flexible class of distributions including convex as well as non-convex shapes of contours. Thus this class is appropriate for describing the underlying distribution of a large variety of datasets or data clouds. However, flexible parametric classes involve a lot of parameters, which in turn carries the risk of overfitting.
The main goal of this paper is to introduce useful specific model classes and to examine their use in the statistical framework. In Section 2 of this paper, the continuous star-shaped distributions are introduced and their basic properties are studied. Here we use rather general coordinates for establishing models for the function defining the contour (level sets). We derive formulas for the first two moments. In establishing proper model classes, identifiability is a very important issue. If identifiability is not given, we cannot expect to get consistent estimators in the framework of statistical inference. In Section 2.3 we provide a sufficient condition for the identifiability. The definitions of model classes for the generating function are presented in Section 3. In Section 4 we introduce several models for the Minkowski functional which determines the shape of the level sets of the density in the two-dimensional case. Section 5 provides such models in the higher-dimensional case. Section 6 is dedicated to the simulation of random vectors with a specified star-shaped distribution. For establishing the simulation procedure we use a representation of the distribution by a generalized radius and spherical coordinates given in Section 2. The maximum likelihood method for the estimation of the parameters is briefly discussed in Section 7. In Section 8 the reader finds a discussion on model checks. Real data examples and their fitted distributions are presented in Section 9. The proofs of the statements can be found in Section 10.

Continuous star-shaped distributions

Introduction and general properties
We assume that K ⊂ R^d is a star body (a bounded set with the property x ∈ K ⇒ λx ∈ K for 0 ≤ λ ≤ 1) having the origin in its interior. The Minkowski functional of K, h_K(x) = inf{λ > 0 : x ∈ λK}, x ∈ R^d, is defined for every K under consideration and may in particular be any norm or antinorm. However, for mathematical reasons discussed for a particular case in [19], and more generally in [22], it is not trivial to further assume that the Minkowski functional h_K of K is positively homogeneous of degree one. This assumption might be considered not too restrictive in many applied situations. A function f : R^d → [0, ∞) is called positively homogeneous of degree k if f(λx) = λ^k f(x) holds for x ∈ R^d, λ > 0. The set K(r) = rK and its boundary S(r) = rS are called the star ball and star sphere of star radius r > 0, respectively. Notice that S = {x : h_K(x) = 1}. This star sphere S corresponds to the shape in [7]. Star balls may thus be convex or radially concave subsets of the sample space, for example p-generalized ellipsoids with p ≥ 1 or 0 < p ≤ 1, respectively.
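The homogeneity requirement can be checked numerically for a concrete star body. The following Python sketch takes K to be the unit ball of the p-generalized functional, which is a norm for p ≥ 1 and an antinorm for 0 < p < 1. The function names are illustrative and not taken from the paper.

```python
def minkowski_lp(x, p):
    """Minkowski functional h_K(x) when K is the unit ball of the
    p-generalized functional (norm for p >= 1, antinorm for 0 < p < 1)."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)


def is_positively_homogeneous(h, x, lam, k=1, tol=1e-12):
    """Check f(lam * x) == lam**k * f(x) for one point x and one lam > 0."""
    lhs = h([lam * c for c in x])
    rhs = lam ** k * h(x)
    return abs(lhs - rhs) <= tol * max(1.0, abs(rhs))


def in_star_ball(x, r, p):
    """x lies in the star ball K(r) = rK exactly when h_K(x) <= r."""
    return minkowski_lp(x, p) <= r
```

For p = 1/2 the unit ball is radially concave, yet the functional is still positively homogeneous of degree one, which is all the construction above requires.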
is called a density generating function, a star-shaped density of a random vector X = (X^(1), ..., X^(d)). The star body K defines the contour. It is thus adapted to the shape of the data cloud. The corresponding probability measure is denoted Φ_{g,K} and the normalizing constant allows the representation where O_S(S) means the star-generalized surface content of S, see [16]. If the additional assumption C(g, K) = 1 is satisfied, then g is called a density generator. A random vector U following the distribution is said to be star-uniformly distributed. The geometric measure representation of a star-shaped distribution law reads Here B^d is the σ-algebra of Borel sets of R^d, and is the star sphere intersection proportion function of the set B. The random variable R = h_K(X) is called the star radius of the observation vector X. This random vector X satisfies the stochastic representation with R and U being independent (the symbol =_d means that the random variables on both sides have the same distribution law). Let, moreover, ‖·‖ denote any norm or antinorm in R^d, and let B = {x ∈ R^d : ‖x‖ ≤ 1} and its boundary S_B be the corresponding unit ball and sphere, respectively. Because of the homogeneity property of h_K, ϕ_{g,K} allows the representation where the point x/‖x‖ belongs to the norm or antinorm unit sphere S_B. One may imagine that there is a first idea for describing a shape in the data cloud by the level sets of the functional ‖·‖, and afterwards there appears the wish to correct or modify this functional by a suitable direction-dependent or locally acting factor h_K(x/‖x‖). For a moment, let us consider the case of the Euclidean norm ‖·‖ = ‖·‖_2. In this case, Yang and Kotz [23] studied the class of distributions with density (3), where our function h_K corresponds to b^{−1} in their paper (b is the so-called bound function), and our function g corresponds to the function G.
Here G is the antiderivative of r ↦ −g(r) · r^{−d}, where G(∞) = 0 and g is defined as in Lemma 2.3 of Yang and Kotz's paper. Yang and Kotz call this class center-similar distributions and emphasize that X/‖X‖ does not follow a uniform but an arbitrary distribution.
In the next step we incorporate location and scale in the formula for the density. To ensure identifiability, we assume in the following that Then κ^{−1} = O_S(S)^{−1} is the normalizing constant of the density. A tractable formula for κ is given below. We consider a random vector X having the density φ_{g,h_K,µ,Σ} This distribution of X is referred to as a continuous star-shaped distribution. In this formula, the function h = h_K determines the contour of the density, µ ∈ R^d is the location parameter, and the diagonal matrix Σ = diag(σ_1, ..., σ_d) with σ_i > 0 for i = 1, ..., d consists of the scale parameters of the distribution; κ is a suitable constant. In view of (2), X also allows the representation For any b ∈ R^d and any diagonal matrix A ∈ R^{d×d} with positive diagonal entries, it can easily be shown that AX + b has the density ϕ_{g,h,Aµ+b,ΣA}. The density ϕ_{g,h,0,I} is called standard star-shaped.
Next we develop a representation of h_K using general coordinates r, ψ_1, ..., ψ_{d−1}, where r ∈ [0, ∞), (ψ_1, ..., ψ_{d−1}) ∈ A_d ⊂ R^{d−1}. Later, models for h will be based on this representation. Let ‖·‖ be a given norm or antinorm. Further, let a transformation from certain d-dimensional coordinates r, ψ_1, ..., ψ_{d−1} to the Cartesian ones x_j be defined by functions I_1, ..., I_d : A_d → R as for j = 1, ..., d, where r = ‖x‖. Assume that there is a bijective mapping T : .
If the upper limit of the product is smaller than the lower one, then we define the product to be 1. Then the mapping T assigns to the vector x the corresponding spherical angles α_1, ..., α_{d−1} from A_d. The following Jacobian determinant is well known: The transformation from these coordinates r, α_1, ..., α_{d−1} to the Cartesian ones x_j is given by (7) and According to Theorem 2 of [13], we can provide the formula for the Jacobian determinant: The interested reader finds even more general coordinates in papers from the references at the end of this note.
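For orientation, the usual d-dimensional spherical coordinates can be sketched in Python as follows. The convention x_j = r (sin α_1 ··· sin α_{j−1}) cos α_j for j < d and x_d = r sin α_1 ··· sin α_{d−1}, together with the Jacobian r^{d−1} ∏ (sin α_i)^{d−1−i}, is the standard textbook one. It is assumed here, not confirmed, to match the spherical-angle example of the text, and the function names are hypothetical.

```python
import math


def spherical_to_cartesian(r, alpha):
    """Map (r, alpha_1, ..., alpha_{d-1}) to Cartesian x in R^d
    (standard convention, assumed to match the text's Example 1)."""
    x = []
    sin_prod = 1.0  # empty product is defined to be 1
    for a in alpha:
        x.append(r * sin_prod * math.cos(a))
        sin_prod *= math.sin(a)
    x.append(r * sin_prod)
    return x


def spherical_jacobian(r, alpha):
    """Jacobian determinant r^(d-1) * prod_i sin(alpha_i)^(d-1-i)."""
    d = len(alpha) + 1
    value = r ** (d - 1)
    for i, a in enumerate(alpha, start=1):
        value *= math.sin(a) ** (d - 1 - i)
    return value
```

In dimension two the map reduces to ordinary polar coordinates with Jacobian r, which is the case used for the two-dimensional models.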
In many particular cases of these coordinates (including Examples 1 and 2), ψ_1, ..., ψ_{d−1} play the role of generalized spherical angles. We introduce a function H : for ψ ∈ A_d. The function h_K can then be written as for x ∈ M, and corresponding r > 0, ψ ∈ A_d. The idea behind this approach is to separate the dependence of the density on r and on the ψ-coordinates. Here h_K can be written as a product of r and a function H of ψ, and H(T(.)) is a positively homogeneous function of degree zero.
The following lemma deals with the computation of probabilities and the constant κ: Lemma 2.1. Let X be a random vector distributed as a continuous star-shaped distribution with the density function (5). Then In this lemma, the sets Ā are slices of R^d with rectangular range of the coordinates ψ_i, the radius depending on the ψ-coordinates in a special way. Now we have the following statement on the first two moments (V is the symbol for the variance):

Lemma 2.2. Let X be a random vector distributed as a continuous star-shaped distribution with the density function (5). Assume that G d+
.

Random radius and ψ-coordinates
Remember that the star-generalized radius can be computed by The vector Ψ consists of the ψ-coordinates (angles in the spherical coordinates case) of the representation of Σ^{−1}(X − µ). Lemma 2.3 gives the joint distribution of R and Ψ.

Lemma 2.3. The random variables R and Ψ are independent with densities
This lemma shows that the distribution of X can be represented by two independent components. Moreover, the generating function g is directly related to the density of R.
In the following we discuss the identifiability of star-shaped distribution classes.

Identifiability
Identifiability of model classes for distributions ensures that two different sets of parameters lead to different probability measures of the random vector X. We pose assumptions which turn out to be crucial for the verification of identifiability. Assumption A_g: G is the set of right-continuous functions g : [0, ∞) → [0, +∞) satisfying (4) and the following two conditions: Roughly speaking, Assumptions A_g and A_h exclude that scale transformations of g ∈ G lead to another element of G, and that scale transformations of h ∈ H lead to another element of H. The special form of scale condition (i) is due to the fact that g_1, g_2 have to fulfil (4). The next Theorem 2.4 provides the identifiability result.

Theorem 2.4. Assume that Assumptions A_g and A_h are satisfied. Then the model class {φ
Now the task is to check carefully the conditions of Theorem 2.4 in the case of specific model classes.

Modelling the density generating function
Certain aspects of dependence modeling using p-generalized non-Gaussian density generating functions for jointly l_{n,p}-symmetrically distributed random variables are studied in [12]. We consider here a model family {g_θ : θ ∈ Θ} of generating functions satisfying A_g. Θ ⊂ R^q is the corresponding parameter space. The model leads to a density with finite local maximum at zero if lim_{r→0+} g(r) < +∞ and either lim For many applications, it is beneficial to have this property. Let us introduce three models for g_θ.
(1) K - Condition (13) is fulfilled for s = . The generalized radius R has a generalized Gamma distribution. In the case s = − d, R has a Weibull distribution.
Condition (13) is fulfilled for s = . In the case s = − d, R has a scale-transformed one-sided Student t-distribution. These model classes fulfil Assumption A_g. Notice that a scaling parameter cannot be incorporated for the reason explained in the previous section.

Modelling function H in the two-dimensional case

Using Euclidean norm
Let d = 2 and φ(x, y) ∈ [0, 2π) be the angle in radians between the positive x-axis and the line from the point (x, y) to the origin: φ(x, y) = arctan(y/x) for x > 0, y ≥ 0, φ(x, y) = arctan(y/x) + π for x < 0, φ(x, y) = arctan(y/x) + 2π for x > 0 and y < 0, φ(0, y) = (π/2) · (2 − sgn(y)) for y ≠ 0. In the following we propose model classes {H_η : η ∈ Θ} for the function H introduced in (9). Here Θ ⊂ R^q is the parameter space of H. In view of (10), we have for Here H_η(φ(.)) is positively homogeneous of degree zero. This leads to the density We state the important principle for Section 4:
Construction principle: Assume that for empirical or theoretical reasons it seems that, globally viewed, the sample cloud reflects the shape of the sphere S_B or some shape being close to it. Choose then a direction-dependent or locally acting function φ ↦ H_η(φ) for modelling deviations from this shape, with H_η(φ(x)) being homogeneous of degree zero, such that the complete shape is best approximated, in some sense, by the level sets of the function (14).
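The piecewise definition of the angle φ(x, y) at the beginning of this section can be written down directly. The Python sketch below does so and also gives the equivalent one-line form based on `math.atan2`; the function names are illustrative only.

```python
import math


def angle(x, y):
    """Angle phi(x, y) in [0, 2*pi) of the point (x, y), case by case."""
    if x == 0.0 and y == 0.0:
        raise ValueError("the angle is undefined at the origin")
    if x > 0.0 and y >= 0.0:
        return math.atan(y / x)
    if x < 0.0:
        return math.atan(y / x) + math.pi
    if x > 0.0 and y < 0.0:
        return math.atan(y / x) + 2.0 * math.pi
    # remaining case: x == 0 and y != 0, giving pi/2 or 3*pi/2
    return 0.5 * math.pi * (2.0 - math.copysign(1.0, y))


def angle_atan2(x, y):
    """Same angle via atan2, shifted from (-pi, pi] to [0, 2*pi)."""
    return math.atan2(y, x) % (2.0 * math.pi)
```

Both versions agree away from the origin; the `atan2` form is usually the more convenient one in practice.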
The following model classes are introduced according to this principle. Now we introduce two model classes for H_η satisfying the following Assumption H : Here the constant κ can be evaluated by using the formula A rather simple class of model functions H_η is given by Model 1: are the parameters of this function.
Changing the parameter β induces a rotation of the contour curves. Changing a, b, c leads to another shape of the contour lines. In the case c = , we have Model class 1 includes elliptical contours for c = , a = A B − − , b = , where A, B are the lengths of the semi-axes of the ellipse. In this case the resulting density ϕ is an elliptically contoured one. The resulting function h of model class 1 is given by To illustrate the shape of the star-shaped density, we depict contour curves (level sets) of the density, which are the curves {(r, φ) : r = C H(φ)^{−1}, φ ∈ [0, 2π)} in polar coordinates with certain values of C > 0. Figures 1 and 2 show the variety of shapes of contours within model class 1. In the figures the curves are coloured in blue, orange and green, in this order. In Figure 2 (left) the contour of the density is depicted in a special case to give an impression of the resulting contours.
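The contour curves r = C / H(φ) are easy to trace numerically. The Python sketch below uses an elliptical H with semi-axes A and B as a stand-in, since it is not the parametrization of model class 1, and additionally evaluates the normalizing constant under the assumption that, in the two-dimensional Euclidean case, κ equals the integral of H(φ)^(−2) over [0, 2π). This assumption is inferred from the density H(ψ)^(−d) of the ψ-coordinates with d = 2, and all names are hypothetical.

```python
import math


def H_ellipse(phi, A, B):
    """H for an ellipse with semi-axes A, B: h(x) = r * H(phi) = sqrt(x^2/A^2 + y^2/B^2)."""
    return math.sqrt((math.cos(phi) / A) ** 2 + (math.sin(phi) / B) ** 2)


def contour_points(H, C, n=360):
    """Points of the level set {h = C}, i.e. r = C / H(phi) in polar coordinates."""
    points = []
    for k in range(n):
        phi = 2.0 * math.pi * k / n
        r = C / H(phi)
        points.append((r * math.cos(phi), r * math.sin(phi)))
    return points


def kappa_2d(H, n=20000):
    """Midpoint rule for the assumed formula kappa = integral of H(phi)^(-2) over [0, 2*pi)."""
    step = 2.0 * math.pi / n
    return step * sum(H((k + 0.5) * step) ** -2 for k in range(n))
```

For the ellipse, this value of κ is 2πAB, twice the area of the ellipse, which together with a Gaussian-type generator reproduces the usual normalizing constant of a bivariate normal density with standard deviations A and B.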

Using a general norm
For further dealing with representation (1) we shall make use of suitable coordinates, especially for describing B and S_B. It turns out that suitable coordinates are easiest to use in dimension two, see [17]. Let, according to [14] and [18], the ‖·‖-norm or antinorm or star-generalized polar coordinate transformation The inverse of this transformation is defined by Here φ(., .) is defined as in Section 4.1. Because of the relation Pol^{−1}_{S_B}(x/‖x‖) = (1, φ(x)) and the definition H_η(φ) = h_K(Pol_{S_B}(1, φ)), the following general representation formula for star-shaped densities is proved.
For an arbitrary norm or antinorm ‖·‖, the star-shaped density allows the representation where the function x ↦ H(φ(x)) is homogeneous of degree zero. We recall that the argument of the function g is assumed to be positively homogeneous of degree one. Let us consider the following model class for H.

Model 3: Function H is given by
where η = p > 0 and ‖x‖_p denotes the l_{2,p}-norm or antinorm of x if p ≥ 1 or 0 < p < 1, respectively. Using the function H_η in (18), it may be checked immediately that the function is positively homogeneous of degree zero. If the shape of the sample cloud deviates from the contour or the level sets of the functional ‖·‖ in a similar manner as the contour or level sets of the norm ‖·‖_p deviate from those of the functional ‖·‖_2, then the (standard) probability density function could be suitable to (globally and locally) model the data. This is what one might call data cloud oriented modelling. In the present case, the shape-defining star body K has Minkowski functional In the particular case that ‖·‖ = ‖·‖_2, this means In other words, depending on the direction determined by the angle φ, the function H_η in (18) describes the deviation of the l_{2,p}-unit sphere (that is, the l_{2,p}-unit circle) from the Euclidean unit sphere. In general, it may not be trivial to describe the star-uniform distribution on S explicitly in this way. Nevertheless, modelling data clouds by a primary (or global) approximation with the norm ‖·‖ and a secondary (or directionally correcting or local) approximation with the function H_η may be successful. Clearly, if one just starts from an ansatz like (21) then, vice versa, one can derive the coordinate representation of H_η in (18) for the purposes of statistical analysis, see Sections 7 and 9.
The following Figure 4 shows the level sets of the function x ↦ H_η(φ(x)) and the Minkowski functional h_K for different values of p and norms ‖·‖, respectively. If we are given H_η of model class 1 with β = 0, a ≠ b, then it follows immediately that H_η(φ(x)) allows the representation (see (15)) with ‖x‖_(u,v) denoting the norm ‖x‖_(u,v) = ((x_1/u)^2 + (x_2/v)^2)^{1/2}. The function H_η(φ(x)) is homogeneous of degree zero. For several values of A and B, and different norms ‖·‖, Figure 5 shows the level sets of the function h_K with where the p-generalized sine function sin_p defined in Example 2 of Section 2.1 is just sin_{S_B} in (16) for the case ‖·‖ = ‖·‖_p. Then Hη(φ(x)) = + a x x p is positively homogeneous of degree zero and ϕ g,h, ,I (x) = κ − g( x ( + a x x p ).
The following Figure 6 illustrates the shape of density level sets for model class 4. As explained above, an additional parameter β causes a rotation of the contour of h_K if φ is replaced by φ − β in the formulas for H_η. We do not discuss this aspect here in more detail. The next section is devoted to higher-dimensional models.

Modelling function H in higher-dimensional cases

Using Euclidean norm
In this section, we use the d-dimensional spherical angles α_1, ..., α_{d−2} ∈ (0, π), α_{d−1} ∈ [0, 2π) according to Example 1 in Section 2.1, combined as the vector α ∈ A_d = (0, π)^{d−2} × [0, 2π). It is reasonable to pose a symmetry assumption h_K(−x) = h_K(x) for all x ∈ R^d, which is equivalent to for α ∈ A_d, where ±π = π for α_{d−1} ≤ π, ±π = −π otherwise. Then H_η is homogeneous of degree zero. We consider model functions H_η : A_d → (0, ∞) with parameters η ∈ Θ ⊂ R^q, and pose the following assumption on H = H_η: Let Q_η be an orthogonal matrix describing the rotation of the density in terms of η. Then the formula for the contour-defining function is as follows: H_η(T(Q_η x)).
By means of Q_η, we rotate the contour determined by the prototype model H_η. It has to be ensured that a rotation of one element H_η does not lead to another element H_η' (η' ≠ η) of the model class. To simplify the presentation, we do not consider an additional rotation in the following. Incorporating this rotation is left to the reader. Next we introduce the model In particular, the one-dimensional marginal densities can be evaluated by With this lemma, easy-to-handle formulas for the marginal densities of Ψ of every dimension are provided. S gives integrals of powers of the sine function. As a byproduct of the proof of the previous lemma, we obtain a formula for κ.

Using other norms
Let ‖x‖ be any norm or antinorm of R^d. In this section the general model for the function h is given by where Q_η is an orthogonal matrix describing a rotation. As above, we omit this rotation in the following to simplify the presentation.
From a technical point of view, the situation in higher-dimensional cases still differs essentially from that in dimension two because the properties of the simple transformation (17) are not dominant in the multivariate case. More advanced coordinate systems and techniques might be useful to construct functions H_η that are positively homogeneous of degree zero.
However, using the common multivariate polar or spherical coordinate transformation and its well-known inverse, we can introduce another model class.
Model 7: For η = p > 0, In case that B = {x ∈ R^d : ‖x‖ ≤ 1}, Model class 3 may directly be generalized to the multivariate case, meaning that equations (19) up to (22) are valid also for x ∈ R^d with the norms correspondingly defined there. Now we intend to generalize the function H from (29) using ellipsoidal coordinates on the basis of E_(a,b)-generalized trigonometric functions, defined in [15] for positive values of a, b as where N_(a,b)(φ) = (((cos φ)/a)^2 + ((sin φ)/b)^2)^{1/2}. Let the ellipsoidal coordinate transformation where a = (a_1, ..., a_d)^T consists of positive real numbers, be defined as in [15]. The map T_{E(a,b)} is almost one-to-one, and its inverse is given a.e. by r = Here, arccos_(a_j,a_{j+1}) denotes the function inverse to cos_(a_j,a_{j+1}), and Q_1 up to Q_4 denote the anti-clockwise enumerated quadrants of R^2. Let B = {x ∈ R d : x a, ≤ }; for arbitrary p > 0, the following model class can be defined: Model 8: For η = (p, a_1, ..., a_d)^T, p, a_j > 0, Hη(α) = (| cos (a ,a ) (α )| p + | sin (a ,a ) (α ) cos (a ,a ) (α )| p + ...
For the stochastic representation of a vector X following this distribution, see Section 1. A principal component representation of ϕ_{g,h,0,I} has been dealt with in [20].

Simulation
Simulations of the distribution can be based on Lemma 2.3. Let X̃ = Σ^{−1}(X − µ) and Ψ = T(Σ^{−1}(X − µ)) be defined as in Section 2.2. Remember that R = h(X̃). By (9), we have Let model functions g, H be given. Then we can apply the following algorithm to simulate X: 1) Generate R with density f from Lemma 2.3.
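A two-dimensional version of such a simulation might look as follows in Python. Only the first step (drawing R) is stated in the text above; the remaining steps are assumptions made for this sketch: the angle Ψ is drawn independently with density proportional to H(ψ)^(−2) by rejection sampling, the Euclidean radius is recovered as r = R / H(Ψ), and location and scale are applied componentwise. The generator g(r) = exp(−r²/2) is chosen purely for illustration because it makes R Rayleigh distributed; all function names are hypothetical.

```python
import math
import random


def H_ellipse(phi, A, B):
    """Illustrative H: elliptical contours with semi-axes A and B."""
    return math.sqrt((math.cos(phi) / A) ** 2 + (math.sin(phi) / B) ** 2)


def sample_radius(rng):
    """R with density r * exp(-r^2 / 2) (assumed generator), by inversion."""
    return math.sqrt(-2.0 * math.log(1.0 - rng.random()))


def sample_angle(H, h_min, rng):
    """Psi with density proportional to H(psi)^(-2); h_min is a lower bound of H."""
    while True:
        psi = rng.uniform(0.0, 2.0 * math.pi)
        if rng.random() <= (h_min / H(psi)) ** 2:
            return psi


def simulate(n, H, h_min, mu=(0.0, 0.0), sigma=(1.0, 1.0), seed=0):
    """Draw n points X = mu + Sigma * r * (cos Psi, sin Psi) with r = R / H(Psi)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        big_r = sample_radius(rng)
        psi = sample_angle(H, h_min, rng)
        r = big_r / H(psi)
        sample.append((mu[0] + sigma[0] * r * math.cos(psi),
                       mu[1] + sigma[1] * r * math.sin(psi)))
    return sample
```

With the elliptical H and this generator, the simulated points should follow a bivariate normal distribution with standard deviations A and B, which gives a simple sanity check of the sketch.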

Estimation
Throughout this section, let X_1, ..., X_n with X_i = (X_i1, ..., X_id)^T be a sample of independent random vectors having the density φ_{g,h,µ,Σ} according to (5) and (10). Suppose that the function g belongs to the model class {g_θ : θ ∈ Θ} with a compact parameter space Θ ⊂ R^q as described above. Moreover, it is required that the function H depends on a parameter η such that H belongs to the model class {H_η : η ∈ Θ} with a compact parameter space Θ ⊂ R^q as described above. Let the parameter spaces of the location parameter and of the scale parameters be compact subsets of R^d and (0, ∞)^d, respectively.

Model checks

Checking the model for the distribution of the star-generalized radius
Let μ̂, σ̂, η̂, θ̂ be the maximum likelihood estimators for the parameters as explained in the previous section, and let X_1, ..., X_n be the sample as in the previous section. Let Y_i = Σ̂^{−1}(X_i − μ̂). Here we consider the pseudo-sample Y_1, ..., Y_n, and introduce for i = 1, ..., n. The corresponding empirical distribution function of R_i is given by The order statistics of R_i are denoted by R_(1), ..., R_(n) (R_(j−1) ≤ R_(j)). Based on a model function g_θ (θ ∈ Θ) for the generating function, the model distribution function for the generalized radius can be evaluated by (formula in Lemma 2.3) In the following we consider the Anderson-Darling statistic, which measures the discrepancy between the model distribution and the empirical distribution coming from the sample. This statistic is calculated by Let F^exp_R(x|λ) = 1 − e^{−λx} be the distribution function of the exponential distribution. The exponential distribution is regarded as a reference distribution here. For comparisons we can calculate the approximation coefficient: ρ̂ (ρ̂ ≤ 1). A detailed study of this coefficient can be found in [10]. Here we compare the actual model distribution of R with the exponential distribution as the simplest choice. If ρ̂ is large enough, ideally close to 1, the distribution of R can be considered as well approximated. The application of goodness-of-fit tests like the Kolmogorov test is straightforward and is omitted here. In the framework of elliptical distributions, such goodness-of-fit tests are considered in Batsidis and Zografos [3].
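As an illustration of this check, the Anderson-Darling statistic can be computed from the ordered pseudo-radii and any model distribution function. The Python sketch below uses the standard textbook form A² = −n − (1/n) Σ_j (2j − 1) [ln F(R_(j)) + ln(1 − F(R_(n+1−j)))]; whether the paper uses exactly this normalization is an assumption, and the function names are hypothetical. The approximation coefficient ρ̂ is not reproduced here.

```python
import math


def anderson_darling(sample, cdf):
    """Standard Anderson-Darling statistic of a sample against a model CDF.

    Assumes 0 < cdf(x) < 1 for every sample value.
    """
    ordered = sorted(sample)
    n = len(ordered)
    total = 0.0
    for j in range(1, n + 1):
        f_low = cdf(ordered[j - 1])    # F at the j-th order statistic
        f_high = cdf(ordered[n - j])   # F at the (n+1-j)-th order statistic
        total += (2 * j - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n


def exponential_cdf(x, lam):
    """Reference distribution function 1 - exp(-lam * x)."""
    return 1.0 - math.exp(-lam * x)
```

A small value indicates that the model distribution function is close to the empirical one; evaluating the same statistic for the exponential reference distribution gives the baseline against which the fitted model is compared.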

Checking the distribution of Ψ
for i = 1, ..., n. Ψ̂_i is the vector of spherical angles of the normalized sample item. The goodness-of-approximation of the one-dimensional distributions of Ψ_1, ..., Ψ_{d−1} (marginal distributions of Ψ) can be checked in the same way as described in the previous section, with the uniform distribution as the reference distribution.
Next we want to discuss briefly a measure for the goodness-of-approximation of the copula of Ψ. Here we pursue the approach using the Cramér-von Mises divergence. Concerning this approach and theoretical properties of this divergence, we refer to [9]. Let F_Ψ(.|η) and F_j(.|η) be the distribution functions of Ψ and Ψ_j depending on the model parameter η of the function H. The corresponding density of Ψ is provided in Lemma 2.3. We introduce the model copula of Ψ : which measures the discrepancy between the empirical copula and the model copula C. Let C_η and C be the copulas of the model and the independent copula (which serves as reference copula), respectively. From [9] we can take the coefficient of goodness-of-approximation: where η̂ is the estimator for the parameter η, and ρ̂ ≤ 1. The larger the coefficient ρ̂, the better is the approximation of the distribution.

Real data examples

Example 1
In this section we consider the dataset 5 of Andrews and Herzberg [1]. The yields of grain and straw are the two variables. Assuming the Pearson VII model for g and the model 1 for H, we achieved an approximation coefficient of 0.9846. The following Figures 9 and 10 show the data and the estimated density. The contour curves in Figure 10 are far away from being ellipses. Thus modelling with star-shaped distributions makes sense.

Example 2
This example should show that the methods described above work even for economic data. Here we consider weekly data of the Morgan Stanley Capital International (MSCI) World index and of the LPX 50 index for the period April 2003 to December 2016. The LPX 50 Index is a global equity index covering the 50 largest listed private equity companies which fulfil certain liquidity constraints. The index is well diversified across regions, investment and financing styles, and vintage years. We computed the index values as the ratio of subsequent values minus one. Let F_Ψ be the distribution function of Ψ. For the function H, we used class 1. Fitting the distribution of the random vector by the maximum likelihood method, the following results were obtained: , â = . , b̂ = . , β̂ = . . The model "Pearson VII" turns out to be the best one. The density and the data are depicted in Figures 11 and 12. The subsequent figures show the distribution functions of R and Ψ.

Proofs
(7) with transformation determinant (8) and by q = rH(ψ), we obtain This identity yields the first formula of the lemma. Moreover, ∫_{R^d} ϕ_{g,h,0,I}(x) dx = 1 implies the second assertion of the lemma.
Proof of Lemma 2.2: Define X̃ = Σ^{−1}(X − µ) = (X̃^(1), ..., X̃^(d))^T. Then we have Thus Analogously, Combining the above formulas, we obtain the lemma.
Proof of Lemma 2.3: From (31), we see that (R, Ψ) has the density ϕ_(R,Ψ)(q, ψ) = H(ψ)^{−d} Δ(ψ) g(q) q^{d−1}.