Abstract
Acyclic phasetype (PH) distributions have been a popular tool in survival analysis, thanks to their natural interpretation in terms of aging toward its inevitable absorption. In this article, we consider an extension to the bivariate setting for the modeling of joint lifetimes. In contrast to previous models in the literature that were based on a separate estimation of the marginal behavior and the dependence structure through a copula, we propose a new timeinhomogeneous version of a multivariate PH (mIPH) class that leads to a model for joint lifetimes without that separation. We study properties of mIPH class members and provide an adapted estimation procedure that allows for rightcensoring and covariate information. We show that initial distribution vectors in our construction can be tailored to reflect the dependence of the random variables and use multinomial regression to determine the influence of covariates on starting probabilities. Moreover, we highlight the flexibility and parsimony, in terms of needed phases, introduced by the time inhomogeneity. Numerical illustrations are given for the data set of joint lifetimes of Frees et al., where 10 phases turn out to be sufficient for a reasonable fitting performance. As a byproduct, the proposed approach enables a natural causal interpretation of the association in the aging mechanism of joint lifetimes that goes beyond a statistical fit.
1 Introduction
When studying insurance products on multiple lives, it is natural to assume that individuals who are exposed to very similar life conditions may have somewhat correlated lifetimes. This is especially true for married couples, since, once married, the spouses typically share to a large extent a similar lifestyle. Indeed, the simplistic assumption of independence of lifetimes of partners has been shown to be inappropriate in various papers. For example, Frees et al. [15] used a bivariate Frank copula model to assess the effect of dependency between husband and wife on insurance annuities, illustrating their approach on a by now classical data set of a large insurer. The same data were used in Carriere [10], where multiple bivariate copula models were studied. The LinearMixing Frailty copula was found to be the bestsuited model to describe the data. Shemyakin and Youn [22] introduced a general conditional Bayesian copula for the joint last survivor analysis. They allow entry ages of spouses to have a selection effect on his/her mortality, as well as on the other spouse’s mortality. For that same data set, Luciano et al. [19] captured the dependence between survival times of spouses by an Archimedean copula, whose marginals were estimated according to a stochastic intensity approach. In a similar spirit, Dufresne et al. [13] allow for the Archimedean copula parameter to depend on the age difference of partners at issue of the policy, to describe the dependence of the remaining lifetime of a couple. In the the study by Gobbi et al. [16], extended MarshallOlkin models were employed for that same data set, where the continuous copula approach is extended by allowing for fatal events that affect both marginal lives.
In this article, we propose an alternative to copulabased methods for the modeling of joint remaining lifetimes in a couple based on multivariate phasetype distributions. Phasetype (PH) distributions are interesting candidates since they broaden favorable properties of exponential random variables to scenarios where the latter alone would not be appropriate. In particular, the denseness of PH distributions among all distributions on the positive halfline in the sense of weak convergence, which extends to the multivariate setup, is a major advantage when one wants to approximate a distribution. For more details on PH distributions, we refer readers to Bladt and Nielsen [8]. As opposed to copulabased methods, a PH distribution can give rise to a natural interpretation when used to approximate lifetime distributions. One can view the path of a Markov jump process as the life of an individual, which goes through several different states (for instance, biological markers) before reaching the inevitable absorption (death) state. Along this interpretation, acyclic PH distributions have been the first choice for modeling the aging process of a human life, since they only allow forward transitions or direct exits to the absorption state. This characteristic makes them an appropriate tool for describing lifetimes ended by natural aging or accidents. In the study by Lin and Liu [18], a PH distribution with the Coxian structure was used to explain the physical aging process of marginal lifetimes. This approach was extended in the study by Asmussen et al. [5] to generalized Coxian distributions for the purpose of pricing equitylinked products. The first contribution to lift the PH approach to bivariate lifetime models was Ji et al. [17], where a Markovian multistate model and a semiMarkov model are used to describe the dependence between the lifetimes of husbands and wives. Spreeuw and Owadally [23] also use a Markovian multistate model, with more attention given on how to tie the bereavement effect to forces of mortality. Moutanabbir and Abdelrahman [20] then used a bivariate Sarmanov distribution with PH marginals to model joint lifetimes. Both articles focused on the pricing of multiplelife insurance contracts.
Recently, Albrecher et al. [2] introduced timeinhomogeneous PH (IPH) distributions for the purpose of lifetime modeling, which leads to a considerable reduction of necessary phases for a satisfactory fit of given data, since the introduced inhomogeneity can more efficiently accommodate nonexponential shapes than an augmentation of the phase dimension. In particular, [2] applied regression on the intensity functions of the IPH distributions to associate lifetimes of different cohorts and populations.
In this article, we propose a different route for using available information in the data set, namely, to incorporate multinomial logistic regressions in the estimation procedure of multivariate PH distributions. In particular, the regression is applied to the initial distribution vectors of each IPH component, which adapts an approach presented in Bladt and Yslas [9] to the multivariate case. The resulting dependence structure allows for explicit formulas alongside an intuitive “aging” interpretation and, beyond the theoretical contribution, for a satisfactory fit to the bivariate spouses’ lifetime data.
The remainder of this article is structured as follows. Section 2 introduces the class of multivariate PH distributions that we will use to describe joint lifetimes of couples; we also provide some additional properties. In Section 3, an estimation method for this multivariate PH distribution is introduced, which allows for right censoring and covariate information. Section 4 then applies and illustrates the procedure on the classical spouses’ lifetime data set from [15] and interprets the results. Section 5 concludes this article.
2 Multivariate phasetype distributions
We first recall the mPH class, which was introduced in the study by Bladt [7].
2.1 mPH distributions
Let
we may write
for
The crucial property of this class of PH distributions is now its dependence structure. Concretely, the assumption is that all jump processes start in the same state at time
We will use
are then all univariate PH distributed. We say that the random vector
The joint cumulative distribution function of
Furthermore, the survival function is
and the probability density function is given by
For more details, compare with [7].
2.2 mIPH distributions
The particular focus in this article will now be on an inhomogeneous extension of the mPH distribution (briefly mentioned in [7, Section 6.1]). When considering timeinhomogeneous Markov pure jump processes on the common statespace
with subintensity matrix
The random variables
Here, we focus on the particularly tractable case
With
the cumulative distribution function, survival function, and density of
and
respectively. Note that one can view each IPH random variable as a transformation of a PH random variable (and correspondingly the absorption time of a timetransformed formerly timehomogeneous Markov jump process), with
The construction of
When we condition a mIPH distribution on one or more marginals, we obtain another mIPH distribution with a new initial distribution vector and smaller dimension: Let
That is,
with initial distribution vector
The same reasoning can be applied to obtain
with
Remark 2.1
One might be tempted to argue that Assumption (2.1) necessarily leads to positive dependence of the resulting random variables (in our case lifetimes). However, sharing the initial state is not sufficient to obtain positive dependence, as the different intensity matrices may introduce countereffects. For instance, after starting in the same state, we could have a very small expected holding time and direct absorption for one marginal, while the second has to pass through the entire state space before absorption happens, leading to a very large survival time. This behavior could be very well reversed when starting in another (but common) state. Consequently, certain combinations of individual intensity matrices may give rise to negative dependence as well.
3 Parameter estimation for rightcensored data and covariate information
In the following, we first introduce the components needed to estimate the parameters of mIPH distributions, when rightcensored data is present. Second, we present how to estimate initial distribution vectors considering covariate information. Finally, we propose an adapted expectation maximization (EM) algorithm, which we name ERMI algorithm.
3.1 EM algorithm for rightcensored data
Taking inspiration from Asmussen et al. [6] and Olsson [21], we now derive conditional expectations needed in the EM algorithm for mPH distributions, where absorption times are allowed to be right censored. Since the eventually targeted mIPH distributions are transformed mPH distributions, after transformation of the data, the EStep and MStep of the algorithm are the same as for the timehomogeneous case.
Let
The sample
which is seen to conveniently fall into the exponential family of distributions and thus has explicit maximum likelihood estimators. The multiplications on the righthand side of (3.1) can be interpreted as follows: The first product gives the probability of starting in each transient state. The second consists of the likelihood of transitions between transient states, associated with respective sojourn times, such that the absorbing state is not considered in the multiplication. Finally, the last product gives the likelihood of transitions from transient states to the absorbing state, along with time spent in respective states before absorption.
With these assumptions, the derivation of
For the
where we see a mix of marginal densities and survival functions appearing in both the numerator and denominator.
Note that we introduce the notation
This expectation can also be expressed, using the
and we shall use this style in the following. The other needed conditional expectations are obtained in a similar way, reading
and finally,
3.2 Initial distribution vectors
Adapting an idea developed in [9], we apply the regression component to the initial distribution, i.e., we estimate a “personalized” initial distribution vector as a function of covariates, which increases the flexibility of the model. To that end, we use multinomial logistic regressions, where we consider the initial probabilities as response variables that depend on covariate information found in
with
In every iteration of the expectationmaximization (EM) algorithm to be described later, we use the conditional expectation of the number of times that the underlying process starts in a specific state as weights for the regression coefficients in
for
and set
in every iteration. The initial distribution hence depends on covariate information. Recall that all marginal processes
3.3 ERMI algorithm
Consider now a multivariate sample of rightcensored absorption times
where
In contrast to copulabased methods, this approach does not separate the estimation of marginals and multivariate parameters. This may be considered preferable as the implied multivariate distribution has a natural and causal interpretation and is intimately connected to the marginal behavior of the risks, whereas choosing a concrete copula family on given (and possibly already fitted) marginal risks is often a somewhat more arbitrary choice for the modeling of multivariate phenomena.
Algorithm 1. Adapted expectation maximization (ERMI) algorithm for mIPH distributions 

Input: Observed absorption times

















Output: Fitted representations

4 Modeling joint excess lifetimes of couples
In this section, we present an application of Algorithm 1 to the wellknown data set of joint lives used in [15]. In addition to survival times, this data set provides information on individuals’ ages at issue of an insurance policy. This leads to lefttruncated data in this particular case, and one might indeed consider the different entry ages with a lefttruncated likelihood in the estimation process, which is, however, quite inefficient. Instead, we propose here to use entry age as a covariate information and use multinomial logistic regression to deal with the different ages at issue. Entry ages were also considered as relevant factors for the dependence modeling in [22]. Initial distribution vectors obtained via regression will then incorporate the fact that an old couple is expected to survive less long than a young one and that the bereavement effect may be different for different age dynamics in a couple. In PH terms, when using an acyclical distribution, the older the couple, the larger the starting probabilities should be for states closer to the absorbing state, so that fewer states will be visited until absorption. Passing through fewer states translates into less time spent in the statespace before exiting, which results in smaller remaining lifetimes.
4.1 Description of the data set
The data set at hand provides information about 14947 insurance products on joint lives, which were observed from December 29, 1988, until December 31, 1993. We consider January 1, 1994, as the right censoring limit. For the purpose of this article, we only consider birthdays, sexes (and potential death dates) of policyholders, given that we are only interested in mortality, i.e., we do not make use of the monetary details of each contract.
After removing samesex couples and multiple entries (due to several contracts of the same couple), we compute the remaining lifetime that any person lived from the start of the observation period until the rightcensoring date 01.01.1994, given that they are at least 40 years old at the issue of the policy. Note that the terms “remaining” and “excess” are used interchangeably in the following. Doing so leads to 8,834 different joint excess survival times, with 155 cases where both individuals died, 1,057 where only one individual died, and 7,622 where neither died. Consequently, less than 2% of joint remaining survival times are fully observed. Concerning the 1057 cases where only one person in the couple died, in 820 of them this was the man, and in the complementary 237 cases the woman died. Hereafter, we refer to the start of the observation period as the “issue of the policy,” although the actual issue date may be later than 29.12.1988.
To prepare the data for the multinomial regression, we construct the covariate matrix
4.2 Fitting the mIPH distribution: marginal behavior
We assume that the remaining lifetimes in a couple, after issue of an insurance policy, follow mIPH distributions. We use information previously discussed as covariates to link starting probabilities to ages of individuals in couples, to reflect different aging dynamics in our model.
Let
According to the dependence structure defined in (2.1), after both underlying processes
Algorithm 1 is now applied for 1,000 iterations. In principle, this estimation procedure provides
for men and
for women. The optimal inhomogeneity parameters are
p  Intercept 




2 




(7.157)  (11.238)  (12.400)  (19.311)  
3 




(6.677)  (10.476)  (11.639)  (18.073)  
4 




(9.894)  (15.348)  (15.820)  (24.374)  
5 




(6.873)  (10.638)  (11.443)  (17.559)  
6 




(8.004)  (12.150)  (12.824)  (19.317)  
7 




(6.617)  (10.263)  (11.010)  (16.817)  
8 


22.990  8.794 
(9.419)  (14.387)  (14.885)  (22.584)  
9 

12.732 

18.559 
(7.224)  (11.046)  (11.889)  (18.050)  
10 




(6.504)  (10.054)  (10.892)  (16.670) 
For a two sided statistical test symbols
Initial distribution vector estimates for these four couples are as follows:
At first glance, it might seem odd that survival times of spouses with large age difference start in the same state of the distribution, but personalized starting probabilities and sexspecific transition intensities account for this. For example, consider Couple 4. If both excess survival times start in State 1, we see that the man’s underlying jump process is much more likely to reach the absorbing state directly from State 1, while the woman’s will at least advance to State 7 before absorption becomes possible. Thus, marginal intensities mixed with agedependent initial distribution vectors compensate the initialization in a shared state. In Figure 2, we depict the resulting marginal densities of the fitted remaining lifetime distributions for each of the four couples. As expected, the densities for women (solid black lines) allocate more mass to larger values than the male counterpart, and for older individuals, there is more probability mass for shorter survival times. Comparing Couples 1 and 4, we see that the density of the 63yearold man has a major mode at
The multimodality we observe in all marginal densities may be due to having different cohorts in the data. We decided not to manipulate the data set further, to avoid restricting our analysis to specific couples. Although spouses with small age difference can be thought of as belonging to the same cohort, we would also need to restrain the ages at issue to instances where enough data points are available for a meaningful estimation procedure (in particular uncensored data points). Doing so would lead to analyzing only couples where both spouses are aged around 65 years.
To assess the precision of our estimated marginals
where
4.3 Joint behavior
Let us now consider the resulting bivariate densities. In Figure 5, we find the bivariate densities for the four specified couples. The joint density of Couple 1 shows that large differences in their survival times are quite unlikely (the majority of the joint mass being located close the identity line). Also, the most likely survival times are close to 23 years. There is also a considerable probability mass above the identity line, where the woman survives longer than the man. For instance, we have
For Couple 2, the situation is different (top right panel of Figure 5): with the man being already 68 years, and the woman being 63 years old, the remaining lifetimes are shorter, with the major mode of the joint distribution being located near
The joint density for Couple 3 resembles the one of Couple 1, with survival times beyond 40 years being more unlikely. Despite the man being the same age as the man in Couple 1, there is a larger probability for the couple to have survival times close to
Finally, the joint density of Couple 4 is close to the one of Couple 2. The spike around
One may be tempted to conclude from these four distributions that for the same age, having a younger partner leads to longer survival times, which would signal that the bereavement effect is weaker when spouses have