BY 4.0 license Open Access Published by De Gruyter Open Access September 15, 2022

The Impact of Income Inequality on Mortality: A Replication Study of Leigh and Jencks (Journal of Health Economics, 2007)

  • Weilun Wu
From the journal Economics


This study replicates Leigh and Jencks’s (2007) analysis of the relationship between income inequality and mortality (Leigh, A., & Jencks, C. (2007). Inequality and mortality: Long-run evidence from a panel of countries. Journal of Health Economics, 26(1), 1–24; hereafter L&J). L&J find no evidence that income inequality is related to country-level mortality. Their analysis is based on two-way fixed effects specifications applied to a panel of 12 countries over periods ending in 2003. After reconstructing L&J’s data from original sources, I am able to closely reproduce their findings. When I use superior inequality data and extend L&J’s data by an additional 11 countries and 15 years, I confirm their results. When I use multiple imputation to restore the sample to its full size, I again find no significant relationship between income inequality and mortality. I therefore conclude that my replication confirms L&J’s results: the new analysis provides consistent evidence that income inequality is not significantly related to mortality.

JEL Classification: I12; N30

1 Introduction

The relationship between income inequality and health has been widely studied. One claim that has attracted particular attention is that greater income equality is associated with better health outcomes in developed economies (Wilkinson, 1998, 2002; Wilkinson & Pickett, 2006). This is commonly referred to as “Wilkinson’s income inequality hypothesis” or just “the income inequality hypothesis.” Despite numerous analyses and systematic reviews, the income inequality hypothesis remains contested (Fiscella & Franks, 2000; Gravelle, Wildman, & Sutton, 2002; Kim, 2019; Kondo et al., 2009; Lynch, Smith, Harper, & Hillemeier, 2004; Macinko, Shi, Starfield, & Wulu, 2003; Macinko, Shi, & Starfield, 2004; Patel et al., 2018; Rodgers, 1979; Subramanian & Kawachi, 2004). One of the most prominent studies disputing this hypothesis is Leigh and Jencks (2007), hereafter L&J, which has been cited 70 times in Scopus and 167 times in Google Scholar.

Using long historical data from 12 OECD countries, L&J estimate the relationship between income inequality, measured by the share of income going to the top 10% of income earners, and (i) life expectancy (LE) and (ii) the log of infant mortality. Their key finding is that the estimated effects of income inequality largely disappear when one controls for country- and year-fixed effects. Not only are the estimated coefficients statistically insignificant, but they find nominal increases in LE and decreases in infant mortality rates from greater income inequality.

Although Leigh and Jencks (2007) employ 744 country-year observations in their main analysis, their sample contains only 12 countries, which raises a concern about external validity. According to L&J, the reason their study includes only 12 countries is the availability of accurate inequality data. I overcome this limitation by replacing L&J’s inequality data with data from the World Inequality Database (WID). WID provides superior estimates of income distributions for 178 countries and areas. With WID and the updated databases, I am able to reanalyze L&J’s specifications with 1,500 observations, derived from 23 countries through 2018.

A second concern is missing data, a common issue when using cross-country data over long periods. A severe example applies to the results reported in Columns (7) and (8) of their Table 4, which include educational attainment as a control variable. While a complete dataset would allow for 528 country-year observations, their raw data contain only 108 observations of the education variable. This arises because education data are available only at 5-year intervals.

To address the problem of the missing data, L&J use linear interpolation to fill in missing values. Unfortunately, interpolation can cause bias in both estimated coefficients and their standard errors (Allison, 2001; Enders, 2010; Little, 1992; Musil, Warner, Yobas, & Jones, 2002). This raises concerns about both the coefficient estimates and confidence intervals reported in their article.

I have decided to replicate L&J because of its importance in the literature and the concerns mentioned earlier. In the first instance, I am interested in investigating whether L&J’s findings hold up when I update and expand their dataset. Second, I want to determine whether using proper procedures to address missing values in L&J’s dataset might affect their results. The result from this replication exercise is that I confirm L&J’s conclusions, providing robust evidence for the view that income inequality is not related to mortality.

2 Replication of the Data

The data and code that Leigh and Jencks (2007) used for their estimation are posted at Harvard’s Dataverse. Their longitudinal dataset consists of annual observations from 1903 to 2003 for 12 countries.[1] The original, underlying data sources are all publicly accessible. Unfortunately, the only version of the inequality data that L&J provide is postinterpolation. As a result, to replicate their work, I need to go back and reconstruct their dataset from the original sources.

L&J focus on two outcome variables, LE and infant mortality. Both of these are primarily sourced from the Human Mortality Database (HMD).[2] HMD has been updated since L&J collected their mortality data in 2002. As the previous HMD data are no longer available, my replication uses the latest release.

There are several issues involved in replicating the key independent variable, “Top Share 10,” which is the share of total income held by the richest 10% of earners. L&J cite Leigh (2007) as their source for Top Share 10. However, there are some differences in these series. One difference is that L&J’s Top Share 10 includes capital gains in its construction of income inequality, while Leigh (2007) does not include capital gains. An explanation for this and other discrepancies is provided in the online supplementary materials that accompany L&J (Leigh & Jencks, 2013).[3] My reconstruction uses Leigh (2007) because preinterpolation values for Top Share 10 are not available for L&J.

With respect to the control variables, GDP is sourced from Maddison Database 2010 (Maddison, 2010). Health expenditure data are taken from OECD Health Data 2007 (OECD, 2007). Educational attainment is acquired from Barro and Lee (2013). Despite the fact that my reconstructed dataset uses updated values, I am able to closely match L&J’s postinterpolation results.

3 Replication of Leigh and Jencks’ (2007) Results

Leigh and Jencks’ (2007) main analysis uses two-way fixed effects specifications. A two-way fixed effects model is often adopted to control simultaneously for unobserved, time-invariant country-specific characteristics and for shocks that are common to all countries in a given year.[4] Their key findings are reported in their Table 4, with the corresponding two-way fixed effects model given by equation (1).

$$m_{ij} = \alpha + \beta\,(\text{Top Share 10})_{ij} + \gamma Z_{ij} + \delta_i + \rho_j + \varepsilon_{ij}, \tag{1}$$

where $m_{ij}$ is an indicator of mortality, measured by either LE or the log of infant mortality. The subscript $i$ stands for country, and the subscript $j$ for year. Top Share 10 is the income share of the richest 10% of the population. This variable serves as the measure of income inequality. An implication of the income inequality hypothesis is that greater inequality is positively related to mortality. $Z_{ij}$ is a vector of covariates including GDP, educational attainment, and health expenditures. $\varepsilon_{ij}$ is an error term. Note that equation (1) includes both country and time fixed effects, indicated by $\delta_i$ and $\rho_j$, respectively.
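As an illustration of what the fixed effects in equation (1) do, the following is a minimal sketch, assuming a balanced panel and a single regressor. It is not L&J’s code, and the function name `twfe_slope` and the toy data are hypothetical. Subtracting country means and year means (and adding back the grand mean) removes the country and year effects, so the slope can be recovered from the transformed data.

```python
def twfe_slope(panel):
    """Slope on x from y = a + b*x + country effect + year effect + error.

    `panel` maps (country, year) -> (x, y) and must be balanced
    (every country observed in every year). This shortcut is not valid
    for unbalanced panels or for models with several regressors.
    """
    countries = sorted({c for c, _ in panel})
    years = sorted({t for _, t in panel})
    assert len(panel) == len(countries) * len(years), "panel must be balanced"

    def demean(k):
        # Subtract country and year means, then add back the grand mean.
        grand = sum(v[k] for v in panel.values()) / len(panel)
        c_mean = {c: sum(panel[c, t][k] for t in years) / len(years)
                  for c in countries}
        t_mean = {t: sum(panel[c, t][k] for c in countries) / len(countries)
                  for t in years}
        return {(c, t): panel[c, t][k] - c_mean[c] - t_mean[t] + grand
                for c in countries for t in years}

    x, y = demean(0), demean(1)
    return sum(x[i] * y[i] for i in x) / sum(x[i] ** 2 for i in x)


# Hypothetical toy data: true slope 0.5, arbitrary country and year effects.
country_fx = {"A": 70.0, "B": 75.0, "C": 72.0}
year_fx = {1960: 0.0, 1961: 0.4, 1962: 0.9}
top_share = {("A", 1960): 30.0, ("A", 1961): 31.0, ("A", 1962): 33.0,
             ("B", 1960): 25.0, ("B", 1961): 27.0, ("B", 1962): 26.0,
             ("C", 1960): 35.0, ("C", 1961): 34.0, ("C", 1962): 36.0}
toy = {(c, t): (x, 0.5 * x + country_fx[c] + year_fx[t])
       for (c, t), x in top_share.items()}
beta_hat = twfe_slope(toy)
print(round(beta_hat, 6))
```

With real, unbalanced data and several covariates, the usual route is a regression with country and year dummy variables rather than this shortcut.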

I use equation (1) to re-estimate the relationship between inequality and health in the order of regressions presented in L&J’s Table 4. I first attempt to reproduce Leigh and Jencks’ (2007) results using my reconstruction of their dataset. In replicating their results, I rely on their programming code, which provided details about the linear interpolation approach they used. After replicating their results, I then reexamine the relationship between inequality and health in 37 countries with an extended time span.

The results of my analysis are presented in Table 1. All regression estimates are presented in pairs, with the left hand of the pair reporting the results for LE and the right hand of the pair reporting results for the log of infant mortality (IM). As L&J drew attention to the 95% confidence intervals implied by their estimated standard errors, Table 1 reports these below the estimated coefficients.

Table 1

Comparison of two-way fixed-effect estimates in L&J’s Table 4 with reproduction results

| | (1) LE | (2) IM | (3) LE | (4) IM | (5) LE | (6) IM | (7) LE | (8) IM |
|---|---|---|---|---|---|---|---|---|
| Period | 1895–2003 | 1895–2003 | 1895–2003 | 1895–2003 | 1960–2003 | 1960–2003 | 1960–2003 | 1960–2003 |
| Original results | | | | | | | | |
| Income share of richest 10% (Top Share 10) | 0.016 | 0.006 | 0.096 | −0.003 | 0.054 | −0.012 | 0.033 | −0.010 |
| | [−0.131, 0.163] | [−0.009, 0.021] | [−0.112, 0.304] | [−0.017, 0.010] | [−0.091, 0.200] | [−0.035, 0.012] | [−0.098, 0.164] | [−0.034, 0.013] |
| Real GDP per capita ($ 1,000s) | 0.356* | −0.036 | 1.199* | −0.138** | 0.199 | −0.083** | 0.170 | −0.075* |
| | [−0.010, 0.723] | [−0.086, 0.014] | [−0.028, 2.426] | [−0.237, −0.039] | [−0.084, 0.483] | [−0.157, −0.009] | [−0.102, 0.442] | [−0.157, 0.007] |
| Real GDP per capita squared ($ 1,000s) | | | −0.027 | 0.003** | −0.006 | 0.002** | −0.005 | 0.002** |
| | | | [−0.059, 0.006] | [0.001, 0.006] | [−0.016, 0.004] | [0.001, 0.004] | [−0.014, 0.004] | [0.001, 0.004] |
| Average years of education of adults aged 15+ | | | | | | | −0.332 | −0.050 |
| | | | | | | | [−0.992, 0.328] | [−0.114, 0.015] |
| Log real public health spending per capita | | | | | | | 0.105 | −0.049 |
| | | | | | | | [−0.432, 0.643] | [−0.150, 0.053] |
| Log real private health spending per capita | | | | | | | 0.295 | −0.050 |
| | | | | | | | [−0.358, 0.949] | [−0.240, 0.140] |
| Observations | 744 | 739 | 744 | 739 | 430 | 430 | 430 | 430 |
| Reproduction results using reconstructed L&J’s data | | | | | | | | |
| Income share of richest 10% (Top Share 10) | 0.037 | 0.007 | 0.110 | −0.002 | 0.051 | −0.007 | 0.026 | −0.007 |
| | [−0.105, 0.178] | [−0.007, 0.021] | [−0.087, 0.307] | [−0.016, 0.012] | [−0.087, 0.189] | [−0.025, 0.012] | [−0.092, 0.145] | [−0.026, 0.012] |
| Real GDP per capita ($ 1,000s) | 0.356** | −0.031 | 1.148** | −0.131** | 0.208 | −0.075** | 0.215 | −0.065* |
| | [0.092, 0.619] | [−0.078, 0.016] | [0.014, 2.283] | [−0.225, −0.037] | [−0.071, 0.487] | [−0.147, −0.004] | [−0.083, 0.512] | [−0.139, 0.010] |
| Real GDP per capita squared ($ 1,000s) | | | −0.024 | 0.003** | −0.006 | 0.002*** | −0.006 | 0.002** |
| | | | [−0.056, 0.007] | [0.001, 0.006] | [−0.015, 0.002] | [0.001, 0.004] | [−0.014, 0.002] | [0.001, 0.003] |
| Average years of education of adults aged 15+ | | | | | | | −0.163 | −0.025 |
| | | | | | | | [−0.576, 0.251] | [−0.098, 0.048] |
| Log real public health spending per capita | | | | | | | 0.009 | −0.042 |
| | | | | | | | [−0.454, 0.472] | [−0.154, 0.069] |
| Log real private health spending per capita | | | | | | | 0.274 | −0.046 |
| | | | | | | | [−0.397, 0.944] | [−0.225, 0.133] |
| Observations | 723 | 723 | 723 | 723 | 429 | 429 | 429 | 429 |

Note: All missing values in regressions are linearly interpolated following L&J’s approach. “Original results” labels the results published in Leigh and Jencks (2007); “Reproduction results using reconstructed L&J’s data” labels the replication results using my reconstruction of L&J’s data. Numbers in brackets are 95% confidence intervals based on robust standard errors clustered at the country level. Significance levels are labeled with stars: * for 10%, ** for 5%, and *** for 1%. “LE” stands for “Life expectancy at birth”; “IM” is the “Log of the infant mortality rate (per 1,000 live births).”

In Table 1, the upper panel reproduces the results that L&J report in their Table 4. The lower panel is derived from my reconstruction of their data, using L&J’s linear interpolation strategy to impute missing values. Concerned that epidemiologic advances in the developed world between 1901 and 1961 could obscure the effect of inequality on mortality, Leigh and Jencks (2007) drop all data earlier than 1960 in the columns after (3) and (4). Shortening the time span reduces the sample by about 300 country-year observations.

The fact that the results using reconstructed data are similar to the “original” indicates that my reproduction of L&J’s results was largely successful. For example, in L&J’s preferred specifications of Columns (7) and (8), the reproduced estimate of the effect of Top Share 10 on LE is 0.026 with a 95% confidence interval of [−0.092, 0.145]. This compares with L&J’s original estimate of 0.033 and interval of [−0.098, 0.164]. These results imply that a 10-percentage point increase in Top Share 10 is associated with an increase in LE of about 0.3 years. Note that the sign is counter to what the income inequality hypothesis predicts. Further, the estimate is statistically insignificant at the 5% level, as indicated by the fact that both confidence intervals include 0. On the basis of these and similar results, L&J conclude that income inequality is not associated with greater mortality.

4 Reanalysis with the Extended Data

I next augment L&J’s sample by including additional countries. A major concern in all studies involving income inequality is the reliability of the inequality data. According to Leigh (2007), the only reason that Leigh and Jencks (2007) sampled 12 countries is the availability of high-quality income data. However, in recent years, academic studies of income inequality have increasingly turned to the WID as the authoritative source of world income inequality (Alstadsæter, Johannesen, & Zucman, 2019; Atkinson, Casarico, & Voitchovsky, 2018; Blanchet, Chancel, & Gethin, 2019).

WID covers 178 countries and areas, providing a variety of indicators describing income, wealth, and inequality levels for the top and bottom groups over long periods. It has earned wide acceptance because it (i) integrates micro data sources with national accounts, (ii) provides consistency in economic indicator measurements, and (iii) strives to create comparability across countries and years (Alvaredo, Atkinson, Chancel, Piketty, Alvaredo, Atkinson, Piketty, & Saez, 2013; Alvaredo, Chancel, Piketty, Saez, & Zucman, 2017; Piketty and Atkinson, 2010). WID data are free to use and easily accessible.[5]

L&J’s Top Share 10 data are derived from official tax tabulations. As a result, their samples are limited by tax thresholds. For example, individuals who do not achieve a certain income threshold or only have small taxable incomes would be underrepresented or omitted in L&J’s inequality dataset. This issue is overcome in WID by reconciling tax data with micro surveys. Survey data supplement information on both taxable and tax-exempt incomes from both tax filers and nonfilers. As a result, WID’s income share data have good properties in terms of representativeness and comprehensiveness.

Accordingly, I proceed by replacing Leigh and Jencks’ (2007) income inequality data with WID data. The “top 10% share” indicator in WID is calculated as the pretax national income share held by the richest 10% of the population. Although it employs the same general methodological approach as Leigh and Jencks’ (2007) Top Share 10, WID is different in its income concepts, population age cut-offs, and income units.[6] Other than Top Share 10, all other variables remain the same.

Updating the other variables to a larger set of countries and years was relatively straightforward given all were available online. For example, the measures of LE and infant mortality were updated through 2018 for 41 countries and areas, which are all available on the HMD website. Similarly, updated data for GDP, educational attainments, and health expenditures were all available via, respectively, the Maddison project website, Barro-Lee’s website, and the OECD website.[7] To compile an extended dataset, I merge all the aforementioned datasets, obtaining data on 37 countries from 1895 to 2018.[8]

The updated estimates are presented in Table 2. All the missing values in the dataset are linearly interpolated following L&J’s procedures. The expanded dataset allows for 2,051 (LE) and 2,043 (IM) observations, nearly three times as many as in L&J’s data. Although the sample size is substantially increased, the estimates of interest remain statistically insignificant.[9] Some coefficients continue to have “wrong” signs.

Table 2

Comparison of reproductions of two-way fixed effects estimates in L&J’s Table 4 using updated data

| | (1) LE | (2) IM | (3) LE | (4) IM | (5) LE | (6) IM | (7) LE | (8) IM |
|---|---|---|---|---|---|---|---|---|
| Period | 1895–2018 | 1895–2018 | 1895–2018 | 1895–2018 | 1960–2018 | 1960–2018 | 1960–2018 | 1960–2018 |
| Reproduction results using updated data from 37 countries | | | | | | | | |
| Income share of richest 10% (Top Share 10) | −0.012 | 0.001 | −0.011 | 0.000 | 0.006 | −0.003 | 0.003 | 0.001 |
| | [−0.154, 0.130] | [−0.014, 0.015] | [−0.136, 0.115] | [−0.010, 0.010] | [−0.068, 0.079] | [−0.012, 0.006] | [−0.071, 0.076] | [−0.009, 0.011] |
| Real GDP per capita ($ 1,000s) | 0.292** | −0.017 | 0.784*** | −0.115*** | 0.217** | −0.077*** | 0.218* | −0.059** |
| | [0.067, 0.517] | [−0.051, 0.016] | [0.232, 1.336] | [−0.176, −0.053] | [0.003, 0.430] | [−0.133, −0.021] | [−0.012, 0.447] | [−0.112, −0.007] |
| Real GDP per capita squared ($ 1,000s) | | | −0.012** | 0.002*** | −0.004 | 0.002** | −0.004 | 0.001* |
| | | | [−0.023, −0.002] | [0.001, 0.004] | [−0.009, 0.001] | [0.000, 0.003] | [−0.009, 0.002] | [−0.000, 0.003] |
| Average years of education of adults aged 15+ | | | | | | | −0.091 | −0.036 |
| | | | | | | | [−0.476, 0.295] | [−0.080, 0.009] |
| Log real public health spending per capita | | | | | | | −0.051 | −0.062** |
| | | | | | | | [−0.395, 0.292] | [−0.123, −0.001] |
| Log real private health spending per capita | | | | | | | 0.045 | −0.066** |
| | | | | | | | [−0.276, 0.367] | [−0.126, −0.006] |
| Observations | 2,051 | 2,043 | 2,051 | 2,043 | 1,398 | 1,398 | 1,398 | 1,398 |
| Reproduction results using updated data from 23 countries | | | | | | | | |
| Income share of richest 10% (Top Share 10) | 0.058 | −0.001 | 0.064 | −0.002 | 0.047 | −0.007 | 0.037 | −0.004 |
| | [−0.030, 0.146] | [−0.015, 0.014] | [−0.024, 0.151] | [−0.014, 0.009] | [−0.031, 0.125] | [−0.022, 0.008] | [−0.037, 0.112] | [−0.018, 0.010] |
| Real GDP per capita ($ 1,000s) | 0.020 | −0.001 | 0.345 | −0.093** | 0.087 | −0.049 | 0.050 | −0.028 |
| | [−0.115, 0.155] | [−0.036, 0.035] | [−0.162, 0.853] | [−0.172, −0.014] | [−0.180, 0.354] | [−0.124, 0.026] | [−0.240, 0.340] | [−0.090, 0.034] |
| Real GDP per capita squared ($ 1,000s) | | | −0.007 | 0.002* | −0.002 | 0.001 | −0.002 | 0.001 |
| | | | [−0.019, 0.005] | [−0.000, 0.004] | [−0.009, 0.005] | [−0.001, 0.003] | [−0.009, 0.006] | [−0.001, 0.002] |
| Average years of education of adults aged 15+ | | | | | | | −0.083 | −0.036 |
| | | | | | | | [−0.496, 0.330] | [−0.087, 0.016] |
| Log real public health spending per capita | | | | | | | 0.231 | −0.060 |
| | | | | | | | [−0.283, 0.744] | [−0.173, 0.053] |
| Log real private health spending per capita | | | | | | | 0.155 | −0.090** |
| | | | | | | | [−0.209, 0.518] | [−0.159, −0.021] |
| Observations | 1,500 | 1,492 | 1,500 | 1,492 | 988 | 988 | 988 | 988 |

Note: All missing values in regressions are linearly interpolated following L&J’s approach. The upper panel reports the reproduction results using the updated dataset of 37 countries; the lower panel reports the replication results using the updated dataset restricted to good-quality data from 23 countries. Numbers in brackets are 95% confidence intervals based on robust standard errors clustered at the country level. Significance levels are labeled with stars: * for 10%, ** for 5%, and *** for 1%. “LE” stands for “Life expectancy at birth”; “IM” is the “Log of the infant mortality rate (per 1,000 live births).”

When combining data across many countries and years, it is important to be aware of measurement errors and data comparability problems. For example, one-third of my sample was materially affected by the dissolutions of the Soviet Union and Yugoslavia.[10] It is certainly plausible that the collapse of these communist unions and the subsequent changes in regime simultaneously affected the distribution of income and health outcomes in these countries. Russia and most East European countries saw discontinuous jumps in their income distributions with the collapse of the USSR in 1991. These have been variously attributed to real transformations of the income distribution under postcommunism and/or to the revision of artificial statistics (Blanchet et al., 2019; Novokmet, Piketty, & Zucman, 2018). Similar jumps in LE are also observed in some former communist countries. If L&J’s specifications fail to control appropriately for these changes, the corresponding regression results could be distorted.

To investigate these issues of data quality, I did further research on my focal variables in the respective databases. For example, in WID’s evaluation of data quality, countries without tax data are considered of poorer quality. This is because WID uses tax data as an important correction of the measurement errors in surveys due to underreported incomes. Consequently, income share estimates in countries that lack tax registration data are likely to be unreliable (Alvaredo et al., 2016).

Identifying data quality problems is more straightforward with the HMD. On the official website of HMD, data quality information is documented and categorized by country and area. In all, mortality data for 11 countries have been flagged as having problems with underestimation of births and disruptions in population statistics.[11] Accordingly, I removed those countries from my samples.[12] The final datasets consisted of data from 23 countries.

The estimates based on these subsamples of high-quality data are reported in the lower panel of Table 2. Even after omitting the poorer quality data, the remaining sample contains more than double the observations in L&J’s original data. Using L&J’s linear interpolation and two-way fixed-effects specifications, the corresponding results are generally consistent with L&J’s results. Most noticeably, even after replacing the inequality data and extending the samples to 23 countries and an additional 15 years, the new coefficients of Top Share 10 still all fit firmly within L&J’s 95% confidence intervals, which are estimated from 12 countries with a shorter time span. For example, the updated coefficient of inequality in Column (3) is 0.064 with a 95% confidence interval of [−0.024, 0.151]. This corresponds to 0.096 and [−0.112, 0.304] in L&J’s Table 4. In Column (6), the new estimates of the coefficient and 95% confidence interval are −0.007 and [−0.022, 0.008], compared to L&J’s −0.012 and [−0.035, 0.012]. When educational attainment and health expenditures are held constant in Column (7), the updated estimates are 0.037 and [−0.037, 0.112] compared to L&J’s 0.033 and [−0.098, 0.164]. In Column (8), the comparison is −0.004 and [−0.018, 0.010] versus L&J’s −0.010 and [−0.034, 0.013].
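The containment claim can be checked column by column. The short sketch below copies L&J’s original intervals for Top Share 10 from the upper panel of Table 1 and the 23-country point estimates from the lower panel of Table 2; the variable names are illustrative only.

```python
# L&J's original 95% intervals for Top Share 10 (Table 1, upper panel),
# listed for Columns (1) to (8).
lj_intervals = [(-0.131, 0.163), (-0.009, 0.021), (-0.112, 0.304), (-0.017, 0.010),
                (-0.091, 0.200), (-0.035, 0.012), (-0.098, 0.164), (-0.034, 0.013)]
# Point estimates from the 23-country sample (Table 2, lower panel).
new_estimates = [0.058, -0.001, 0.064, -0.002, 0.047, -0.007, 0.037, -0.004]

# True where the new estimate lies inside L&J's original interval.
inside = [lo <= b <= hi for b, (lo, hi) in zip(new_estimates, lj_intervals)]
for col, (b, ok) in enumerate(zip(new_estimates, inside), start=1):
    print(f"Column ({col}): {b:+.3f} inside L&J interval: {ok}")
```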

All signs on the coefficients of interest are inconsistent with the income inequality hypothesis. Furthermore, the estimated coefficients all imply small effects. For example, the inequality effect estimated in Column (3) implies that a one percentage point increase in Top Share 10 is expected to increase people’s longevity by 0.064 years, or approximately 23 days. To give a sense of relative size, one can compare this to the effect of quitting smoking. As reported by the U.S. Department of Health and Human Services, quitting smoking is expected to extend people’s lives by at least 10 years (General, 2014).

L&J also took a “best/worst” approach using the bounds of the confidence intervals as measures of the largest “possible” effect. When I adopt their maximum effect approach, I find that a one percentage point increase in inequality is associated with a reduction in LE of 0.037 years (Column (7)), and an increase in infant mortality of 1.4% (Column (2)). While these “worst” cases are not negligible, using values from the other end of the confidence intervals reverses these effects. Most importantly, all confidence intervals include 0, indicating that the relationship between economic inequality and mortality is statistically insignificant at the 5% level.
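The arithmetic behind these effect sizes is simple enough to spell out. The sketch below uses the Column (3) point estimate and the Column (7) and Column (2) intervals from the lower panel of Table 2. Treating a change of 0.01 in log infant mortality as roughly a 1% change is the usual approximation, and the 365.25-day year is an assumption of this sketch.

```python
DAYS_PER_YEAR = 365.25  # assumed conversion factor

# Column (3): years of LE per percentage point of Top Share 10.
point_le = 0.064
days = point_le * DAYS_PER_YEAR  # roughly 23 days

# "Worst case" for health: lowest LE bound, highest log-IM bound.
le_interval_col7 = (-0.037, 0.112)  # years of life expectancy
im_interval_col2 = (-0.015, 0.014)  # log points of infant mortality
worst_le = min(le_interval_col7)
worst_im_pct = 100 * max(im_interval_col2)  # log points x 100, approx. percent

# The opposite ends of the same intervals give the "best case".
best_le = max(le_interval_col7)
best_im_pct = 100 * min(im_interval_col2)

print(f"{days:.1f} days; worst case {worst_le} years and +{worst_im_pct:.1f}% IM")
print(f"best case +{best_le} years and {best_im_pct:.1f}% IM")
```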

5 The Issue of Missing Data

Missing data in the study by Leigh and Jencks (2007) is another concern. It is difficult to know how severe a problem this is because L&J did not provide a copy of their data before they applied interpolation. Accordingly, I calculate missingness from the dataset that I reconstructed using their original sources. The results are presented in Table 3. The rates of missingness vary widely across variables.

Table 3

Missing Data in Reconstructed Leigh and Jencks’ (2007) Dataset

| Variable | Total observations | Number missing | Missing proportion (%) |
|---|---|---|---|
| Income share of richest 10% (Top Share 10) | 528 | 121 | 22.92 |
| Average life expectancy at birth | 528 | 53 | 10.04 |
| Infant mortality rate | 528 | 29 | 5.49 |
| Real GDP per capita | 528 | 0 | 0 |
| Average years of education of adults aged 15+ | 528 | 420 | 79.55 |
| Real public health spending per capita | 528 | 108 | 20.45 |
| Real private health spending per capita | 528 | 108 | 20.45 |

Note: The sample period is 1960–2003.

The main independent variable of interest, Top Share 10, has about 23% missing values. Average years of education of adults aged 15+, which is used as a control variable in the analysis, has 80% missing values. The latter is due to the fact that the underlying data are reported at 5-year intervals.
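The missing proportions follow directly from the counts in Table 3. As a quick check, the sketch below recomputes them; the dictionary keys are shortened labels chosen for this example.

```python
TOTAL = 528  # 12 countries x 44 years (1960-2003)

# Number of missing country-year observations per variable (Table 3).
missing_counts = {
    "Top Share 10": 121,
    "Life expectancy": 53,
    "Infant mortality": 29,
    "GDP per capita": 0,
    "Years of education": 420,
    "Public health spending": 108,
    "Private health spending": 108,
}

# Share missing, in percent, rounded to two decimals.
missing_pct = {name: round(100 * n / TOTAL, 2) for name, n in missing_counts.items()}
# Observed education values: one per country every five years.
observed_education = TOTAL - missing_counts["Years of education"]

for name, pct in missing_pct.items():
    print(f"{name}: {pct}% missing")
```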

The union of multiple variables with missing values increases the missingness rate even further, as all variables must have data to be included in the regression. For example, in the “Complete Case” specifications of Table 4, the regressions for LE and the log of IM have 64 and 69 observations, respectively. This represents a large loss of information compared to the full set of 528 observations. This explains why L&J turned to linear interpolation to increase the size of their dataset.

Table 4

Reproduction of two-way fixed-effect estimates using multiple imputation

| | Complete Case, Col. (7) LE | Complete Case, Col. (8) IM | Linear Interpolation, Col. (7) LE | Linear Interpolation, Col. (8) IM | Multiple Imputation, Col. (7) LE | Multiple Imputation, Col. (8) IM | Multiple Imputation, Col. (7) LE | Multiple Imputation, Col. (8) IM |
|---|---|---|---|---|---|---|---|---|
| Sample | L&J’s reconstructed (1960–2003) | L&J’s reconstructed (1960–2003) | L&J’s reconstructed (1960–2003) | L&J’s reconstructed (1960–2003) | L&J’s reconstructed (1960–2003) | L&J’s reconstructed (1960–2003) | Extended, 23 countries (1960–2018) | Extended, 23 countries (1960–2018) |
| Income share of richest 10% (Top Share 10) | 0.006 | −0.001 | 0.026 | −0.007 | 0.018 | 0.002 | 0.009 | −0.004 |
| | [−0.109, 0.121] | [−0.022, 0.019] | [−0.092, 0.145] | [−0.026, 0.012] | [−0.058, 0.095] | [−0.011, 0.015] | [−0.062, 0.080] | [−0.019, 0.012] |
| Real GDP per capita ($ 1,000s) | 0.248 | −0.056 | 0.215 | −0.065* | −0.023 | −0.039 | −0.094 | 0.031** |
| | [−0.337, 0.833] | [−0.153, 0.041] | [−0.083, 0.512] | [−0.139, 0.010] | [−0.239, 0.194] | [−0.102, 0.024] | [−0.224, 0.035] | [0.002, 0.059] |
| Real GDP per capita squared ($ 1,000s) | −0.010 | 0.002** | −0.006 | 0.002** | −0.004* | 0.002*** | 0.000 | 0.000 |
| | [−0.022, 0.003] | [0.000, 0.003] | [−0.014, 0.002] | [0.001, 0.003] | [−0.001, 0.001] | [0.001, 0.003] | [−0.002, 0.002] | [−0.000, 0.001] |
| Average years of education of adults aged 15+ | −0.176 | −0.033 | −0.163 | −0.025 | −0.146 | 0.005 | 0.019 | −0.076*** |
| | [−0.624, 0.271] | [−0.119, 0.052] | [−0.576, 0.251] | [−0.098, 0.048] | [−0.483, 0.191] | [−0.061, 0.071] | [−0.204, 0.243] | [−0.125, −0.027] |
| Log real public health spending per capita | 1.247* | −0.196** | 0.009 | −0.042 | 1.658*** | −0.420*** | 1.885*** | −0.402** |
| | [−0.046, 2.541] | [−0.357, −0.036] | [−0.454, 0.472] | [−0.154, 0.069] | [1.003, 2.312] | [−0.562, −0.277] | [0.750, 3.019] | [−0.723, −0.081] |
| Log real private health spending per capita | 0.139 | 0.044 | 0.274 | −0.046 | 0.126 | −0.025 | 0.518 | −0.143** |
| | [−0.466, 0.744] | [−0.109, 0.197] | [−0.397, 0.944] | [−0.225, 0.133] | [−0.458, 0.709] | [−0.174, 0.124] | [−0.219, 1.255] | [−0.263, −0.023] |
| Observations | 64 | 69 | 429 | 429 | 528 | 528 | 1,357 | 1,357 |

Note: “Complete Case” labels the replication results without missing data treatment; “Linear Interpolation” labels the replication results with linear interpolation; “Multiple Imputation” labels the replication results with multiple imputation. Numbers in brackets are 95% confidence intervals based on robust standard errors clustered at the country level. Significance levels are labeled with stars: * for 10%, ** for 5%, and *** for 1%. “LE” stands for “Life expectancy at birth”; “IM” is the “Log of the infant mortality rate (per 1,000 live births).”

Linear interpolation assumes that variables are a function of time and uses the closest data points to “predict” the missing values. This approach has several shortcomings. It ignores the association between missing values and other variables. It only relies on time to “predict” missing values and ignores the predictive power of other variables. Further, interpolation artificially reduces variation in the data. Interpolated values all lie on a straight time line, ignoring the randomness that occurs in naturally occurring data.

In fact, the linear interpolation method that L&J employed did not fill in all of the missing values in their dataset. For example, it did not interpolate Top Share 10 data when there was a gap in the data of more than 4 years. The postinterpolation datasets that they used to estimate the specifications in Columns (7) and (8) of their Table 4 used only 430 of the 528 observations. Thus, they did not exploit all the information that was available in their dataset.
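To make the mechanics concrete, the following is a minimal sketch of linear interpolation with a cap on the gap length. It is not L&J’s code: the function name `interpolate`, the `max_gap` parameter, and the toy series are assumptions for illustration, with the cap of four taken from the description of their Top Share 10 rule.

```python
def interpolate(values, max_gap=4):
    """Fill interior runs of None by straight-line interpolation.

    Runs longer than `max_gap`, and runs at either end of the series,
    are left as None.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left - 1  # number of missing points between neighbors
        if 0 < gap <= max_gap:
            step = (out[right] - out[left]) / (right - left)
            for i in range(left + 1, right):
                out[i] = out[left] + step * (i - left)
    return out


# Education reported every 5 years: four missing years between observations.
education = [8.0, None, None, None, None, 9.0]
filled = interpolate(education)

# A run of five missing years exceeds the cap and stays missing.
too_long = interpolate([30.0, None, None, None, None, None, 36.0])
print(filled)
print(too_long)
```

Every filled value lies exactly on the line joining its two neighbors, which is the loss of natural variation described above.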

To address these shortcomings, this study uses multiple imputation (MI). MI is superior to linear interpolation for several reasons. First, MI uses correlations with all other variables to impute missing values. Second, MI uses information from any observations that have nonmissing data. Third, MI produces estimates with attractive properties. If the data are “missing at random” (that is, conditional on the observed data, the probability that a value is missing does not depend on the value itself), then MI will produce estimates that are (i) consistent, (ii) asymptotically efficient, and (iii) asymptotically normal (Enders, 2010; Little and Rubin, 2019; Paul, 2001; Pedersen & Petersen, 2017; Rubin, 1987, 1996).

MI entails two phases, an imputation phase and an analysis phase. In the imputation phase, MI iteratively produces multiple versions of a completed dataset without any missing values. Each version uses stochastically generated values to replace missing values. The analysis phase runs regressions on each of the individual datasets, resulting in multiple sets of estimates. The estimates are then combined into a single, final set of estimates via Rubin’s rules (Rubin, 1987).
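The pooling step can be sketched in a few lines. The example below applies Rubin’s rules to a single coefficient: the pooled estimate is the average across imputed datasets, and the pooled variance adds the average within-imputation variance to an inflated between-imputation variance. The function name `pool_rubin` and the five sets of numbers are hypothetical and are not results from this study.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed datasets."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # average sampling variance
    between = variance(estimates)                # variance across imputations
    total = within + (1 + 1 / m) * between       # total variance
    return q_bar, sqrt(total)


# Hypothetical coefficients and standard errors from m = 5 imputed datasets.
est = [0.020, 0.015, 0.018, 0.022, 0.015]
se = [0.040, 0.038, 0.041, 0.039, 0.042]
pooled_est, pooled_se = pool_rubin(est, se)
print(round(pooled_est, 4), round(pooled_se, 4))
```

The between-imputation term is what lets the pooled standard error reflect uncertainty about the imputed values, which a single interpolated dataset cannot do.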

The imputation process uses nonmissing values of all the other variables plus the country and year dummy variables.[13] Based on exploratory analysis, I set the “burnin” and “burnbetween” parameters high enough to produce datasets that demonstrate “stationarity” and “independence” (Enders, 2010) (see Appendix B). To determine the appropriate number of imputations, I used von Hippel’s (2020) user-written Stata program, “how_many_imputations.” One of the disadvantages of MI is that it creates datasets stochastically, so repeated runs of MI on the same data with the same code produce different results. To address this issue, von Hippel’s approach calculates the number of imputed datasets sufficient to cause the standard errors of the respective parameter estimates to vary by less than 5%.
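As I understand it, von Hippel’s (2020) procedure rests on a quadratic rule linking the number of imputations to the fraction of missing information (FMI) and the tolerated coefficient of variation (CV) of the standard error. The sketch below illustrates only that rule; it is not the Stata program, the function name `imputations_needed` is hypothetical, and the FMI values are made up. In practice the FMI is itself estimated from a pilot set of imputations.

```python
from math import ceil


def imputations_needed(fmi, cv=0.05):
    """Quadratic rule: m = 1 + 0.5 * (fmi / cv) ** 2, rounded up.

    fmi: fraction of missing information (between 0 and 1).
    cv:  tolerated coefficient of variation of the standard error.
    """
    return ceil(1 + 0.5 * (fmi / cv) ** 2)


# Hypothetical FMI values; a higher FMI calls for many more imputations.
m_moderate = imputations_needed(0.25)
m_heavy = imputations_needed(0.45)
print(m_moderate, m_heavy)
```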

As L&J’s Columns (7) and (8) suffer most from missing observations, I proceed by re-estimating this set of regressions using the dataset I reconstructed and applying MI to fill in values for missing data. Another reason I focus on Columns (7) and (8) is that these specifications have more covariates. This is helpful in providing extra information for imputation, as MI treats the dataset as a whole.

The results of my analysis are presented in Table 4. The first pair of results is designated as “Complete Case.” It only uses country-year observations in my reconstructed dataset for which there are no missing values. As a point of comparison, the next pair of results, identified as “Linear Interpolation,” is identical to the corresponding reproduction results presented in Table 1. It consists of the “Complete Case” observations plus additional observations for which missing values were replaced by linearly interpolated values. The “Multiple Imputation” results are based on the full sample observations where all missing values were replaced with simulated values following the MI procedure described earlier. By using my reconstructed dataset, the full sample has 528 observations. In addition, I also investigate the MI estimates using the extended, high-quality dataset. The goal is to obtain consistent and efficient estimates of the inequality effect using the more reliable data.

As noted earlier, in the absence of any effort to fill in missing values, the number of usable observations shrinks from 528 to 64 (LE) and 69 (IM) (see the “Complete Case” regressions). This represents a substantial loss of information. Discarding these observations is clearly suboptimal, especially since many of them are lost only because of missing values in control variables, particularly Average years of education of adults 15+.

When I apply Leigh and Jencks’ (2007) linear interpolation approach, I am able to increase the size of the dataset to 429 observations. Note that this is still less than the full size of the dataset, which is 528 observations. The reason L&J’s method of linear interpolation did not impute all missing values is that there were instances where they determined that linear interpolation would lead to misleading values. One such instance occurred whenever there were more than four missing values in a “row.” As a consequence, L&J do not exploit all the available information in the dataset. An additional problem with this approach is that linear interpolation treats the imputed values as “true” values and does not account for the sampling randomness in the real data. As a result, the standard errors estimated from the linearly interpolated data are expected to be biased.
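The gap restriction described above can be sketched in Python as follows. This is a minimal illustration rather than L&J’s actual procedure: the function name is hypothetical, the limit of four consecutive missing values is taken from the description in the text, and any other exclusions L&J may have applied are not reproduced.

```python
def interpolate_short_gaps(values, max_gap=4):
    """Linearly interpolate interior runs of None no longer than max_gap.

    Runs at the start or end of the series, and runs longer than max_gap,
    are left missing because there is no reliable pair of anchors.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first observed value after the gap
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # leave this run missing
        left, right = filled[start - 1], filled[end]
        step = (right - left) / (gap + 1)
        for k in range(gap):
            filled[start + k] = left + step * (k + 1)
    return filled


# Short gap is filled; a run of five missing values is left untouched.
print(interpolate_short_gaps([1.0, None, None, 4.0]))
print(interpolate_short_gaps([0.0, None, None, None, None, None, 6.0]))
```

Each filled value is a deterministic function of its two neighbors, which is why the procedure adds no imputation uncertainty to the subsequent standard errors.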

The last two pairs of estimates report results from my MI analysis. By employing MI, I am able to use all 528 observations in the reconstructed L&J dataset. While there are differences, they are not dramatic. The size of the estimated effect of Top Share 10 decreased from 0.026 to 0.018 for LE and from −0.007 to 0.002 for infant mortality.

The 95% confidence intervals have also become narrower under MI estimation. For example, the width of the 95% confidence interval for Top Share 10 in the Linear Interpolation/LE regression is 0.092 + 0.145 = 0.237. The width of the corresponding 95% confidence interval in the Multiple Imputation/LE regression is 0.058 + 0.095 = 0.153. Turning to the other variables, the MI confidence intervals are generally narrower than the “Linear Interpolation” confidence intervals. While MI has allowed me to take full advantage of the information in the dataset and increase the number of observations, the final results are still consistent with L&J’s conclusion.
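For readers unfamiliar with how MI standard errors are formed, the sketch below shows Rubin’s (1987) combining rules, the standard way to pool estimates across imputed datasets. The paper does not spell out the pooling formula, so this is offered only as background; the function name and the input numbers are hypothetical and are not taken from Table 4.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates -- the coefficient estimated on each imputed dataset
    variances -- the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = sum(estimates) / m                      # pooled point estimate
    within = sum(variances) / m                     # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between          # total variance
    return q_bar, math.sqrt(total)


# Made-up inputs for three imputed datasets.
estimate, std_err = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, std_err)
```

The between-imputation term is what distinguishes MI from single imputation methods such as linear interpolation: it adds the uncertainty about the missing values to the ordinary sampling variance.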

The final pair of MI estimates uses the updated dataset, which removes the problematic countries. This sample has 1,357 observations, consisting of data from 23 countries over the period from 1960 to 2018.

The MI estimates with the more reliable data indicate that a 10 percentage point increase in Top Share 10 is estimated to increase LE by 0.09 years and lower infant mortality by approximately 4%. These respective signs are still “unexpected,” and none of the coefficients are statistically significant.
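The arithmetic behind these magnitudes can be checked with a few lines of Python. The sketch assumes, consistent with the variable definitions in Appendix A, that Top Share 10 is measured in percentage points and that infant mortality enters in logs; the coefficients 0.009 and −0.004 are the Table 4 estimates quoted in the text, and the function name is hypothetical.

```python
import math


def effect_of_change(coef, delta_points=10.0):
    """Predicted change in the outcome for a given change in Top Share 10."""
    return coef * delta_points


le_effect = effect_of_change(0.009)        # change in life expectancy, years
im_log_effect = effect_of_change(-0.004)   # change in log infant mortality
# Convert a change in logs to an exact percentage change.
im_pct_effect = (math.exp(im_log_effect) - 1.0) * 100.0

print(round(le_effect, 3), round(im_log_effect, 3), round(im_pct_effect, 2))
```

A change of −0.04 in the log of infant mortality corresponds to an exact decrease of about 3.9%, which the text rounds to 4%.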

Once again, it should be noted that the Top Share 10 estimates of 0.009 and −0.004 in the updated regressions of Table 4 fit comfortably within the confidence intervals estimated by L&J using their 1960–2003 data. As reported in Table 1, the 95% confidence intervals for Top Share 10 in the Original/LE regression and Original/IM regressions are [−0.098, 0.164] and [−0.034, 0.013], respectively. Seen from that perspective, using MI and updating the data to 2018 has resulted in estimates consistent with what L&J report in their paper.

6 Conclusion

This study replicates Leigh and Jencks’ (2007) analysis of the relationship between income inequality and mortality. By using L&J’s two-way fixed-effects specification, I am able to closely reproduce their original findings after reconstructing their data from original sources. When I update L&J’s data and extend their sample to more recent years and almost double the number of countries, I continue to find a statistically insignificant relationship between income inequality and mortality. When I use MI to address L&J’s missing data problems, the MI estimates are again consistent with L&J’s. As a result, I conclude that my replication exercise confirms L&J’s results, providing consistent and more robust support for the view that there is no statistically significant relationship between income inequality and mortality.


I acknowledge helpful comments from the reviewers and feedback from participants at the New Zealand Association of Economists 2022 conference. Special thanks go to Andrea Monclova and W. Robert Reed, the supervisors of my thesis, for their input on my research. I especially thank W. Robert Reed for providing generous editorial support on this paper.

  1. Funding information: None.

  2. Conflict of interest: None.

  3. Data availability statement: Supplementary material related to this article is open to be downloaded at

  4. Article note: As part of the open assessment, reviews and the original submission are available as supplementary files on our website.

Appendix A Summary Statistics of Reconstructed L&J’s Data

Variable                                            Mean     Std. dev.   Min      Max      N
Income share of richest 10%                         33.672   5.412       22.300   53.310   647
Average life expectancy at birth (years)            66.605   10.007      30.360   80.550   1,026
Log infant mortality rate (per 1,000 live births)   3.293    1.062       1.147    5.198    1,084
Real GDP per capita ($ 1,000s)                      8.825    5.910       1.548    29.074   1,283
Average years of education of adults aged 15+       8.747    2.083       3.830    12.640   132
Log real public health spending per capita          6.278    1.144       2.197    7.835    420
Log real private health spending per capita         5.278    1.231       1.946    8.057    420

Note: The data period ranges from 1895 to 2003. All variables are summarized with missing values.

Appendix B Option Settings and Diagnostics in Multiple Imputation

In multiple imputation, estimates are sensitive to the options used in the imputation model algorithm. An imputation model is mostly determined by three options. In Stata, these are (i) “burnin,” (ii) “burnbetween,” and (iii) “add.” The first two options determine which set of imputed data should be retained, while the last option determines the total number of retained datasets before analysis. This appendix explains how I set the values for these three options.
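To make the roles of the three options concrete, the Python sketch below lists which iterations of a single MCMC chain would be kept as imputed datasets. It reflects a common reading of how burn-in and between-imputation spacing work in a data-augmentation sampler (such as the one used by Stata’s mi impute mvn) and is an assumption about the mechanics, not a description taken from the paper; the function name is hypothetical.

```python
def retained_iterations(burnin, burnbetween, add):
    """Iteration numbers at which imputed datasets are saved.

    burnin      -- iterations discarded before the first imputation is kept
    burnbetween -- iterations run between consecutive kept imputations
    add         -- total number of imputed datasets to keep
    """
    if burnin < 0 or burnbetween < 1 or add < 1:
        raise ValueError("burnin >= 0, burnbetween >= 1 and add >= 1 required")
    return [burnin + k * burnbetween for k in range(add)]


# Settings reported later in this appendix for the life expectancy regression.
kept = retained_iterations(burnin=1000, burnbetween=100, add=196)
print(kept[:3], kept[-1])
```

Under this reading, a longer burn-in delays the first retained dataset, a longer spacing reduces dependence between retained datasets, and the number of retained datasets determines the total length of the chain.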

Like maximum likelihood, multiple imputation employs a concept of “convergence.” I follow the approach of focusing on the “worst linear function” (WLF) to diagnose the convergence of my imputation models (Enders, 2010; StataCorp, 2017). “Convergence” in imputation models refers to the properties of “stationarity” and “independence,” which are mediated by the options burnin and burnbetween, respectively.

My study includes two pairs of MI regressions in Table 4, according to whether the dependent variable is life expectancy (LE) or log of infant mortality (IM). Although there are two different specifications of regression models, one for LE and one for IM, the imputation model is the same for both for a given dataset because each imputation algorithm uses the other dependent variable as an auxiliary variable in imputing missing values.

In an imputation model, the burnin and burnbetween values are set first and then diagnostics are used to assess them using worst linear function (WLF) plots (Enders, 2010; StataCorp, 2017). After specifying burnin and burnbetween, I set the total number of imputations via the add option following Von Hippel’s (2020) approach.

To produce the Multiple Imputation results with the reconstructed L&J data in Table 4, I set burnin equal to 1,000 and kept burnbetween at its default value of 100. Figure A1 shows a “time series” graph of the mean of WLF for each of the 1,000 iterations, where iterations serve as a measure of “time.” For burnin = 1,000, no trend is apparent, indicating that the multiple imputation algorithm is “stationary.”

Figure A1: Time-series plot of WLF (Table 4/L&J’s reconstructed data).

Figure A2 plots a correlogram of mean WLF values over iterations. The imputation algorithm builds in a certain degree of dependence from one imputation to another. One wants sufficient “spacing” between imputations so that the datasets are independent. According to Figure A2, anything beyond 10–20 iterations should be sufficient to ensure independence. Thus, I keep the default value for burnbetween of 100.

Figure A2: Autocorrelation plot of WLF (Table 4/L&J’s reconstructed data).
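The correlogram check described above can be approximated with the following Python sketch, which computes sample autocorrelations of a chain and reports the first lag at which they fall below a chosen threshold. The function names, the threshold, and the toy series are hypothetical; the actual diagnostics in this appendix were produced from the WLF estimates saved by the imputation routine.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def first_lag_below(series, threshold=0.1, max_lag=100):
    """Smallest lag whose absolute autocorrelation is below the threshold."""
    for lag in range(1, min(max_lag, len(series) - 1) + 1):
        if abs(autocorrelation(series, lag)) < threshold:
            return lag
    return None  # no lag within max_lag meets the threshold


# Toy series standing in for saved WLF values.
toy = [1, 1, -1, -1, 1, 1, -1, -1]
print(autocorrelation(toy, 1), first_lag_below(toy, threshold=0.2))
```

A spacing between retained imputations that comfortably exceeds the lag at which autocorrelation dies out is the rationale for keeping burnbetween at 100 when 10 to 20 iterations already appear sufficient.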

Using the verified imputation model, I first ran a pilot imputation with five datasets. I then applied von Hippel’s user-written Stata command “how_many_imputations” to determine the total number of imputations. This command selects a total number of imputations such that the standard errors of the respective estimates will vary by less than 5%. For the first pair of MI estimates in Table 4, the associated total numbers of imputations were 196 for the life expectancy regression and 167 for the infant mortality regression.

To diagnose the other imputation model in Table 4, I continue to set the burnin and burnbetween options at 1,000 and 100, respectively. The diagnostic trend figures for WLF and the diagnostic correlograms are presented in Figures A3 and A4. All the plots are consistent with “stationarity” and “independence,” confirming the selection of my burnin and burnbetween option values. The corresponding numbers of total imputations are 180 and 186 for the regressions with extended data.

Figure A3: Time-series plot of WLF (Table 4/extended data).

Figure A4: Autocorrelation plot of WLF (Table 4/extended data).


Allison, P. D. (2001). Missing data. Sage Publications.

Alstadsæter, A., Johannesen, N., & Zucman, G. (2019). Tax evasion and inequality. American Economic Review, 109(6), 2073–2103. doi: 10.3386/w23772

Alvaredo, F., Atkinson, A. B., Piketty, T., & Saez, E. (2013). The top 1 percent in international and historical perspective. Journal of Economic Perspectives, 27(3), 3–20. doi: 10.3386/w19075

Alvaredo, F., Atkinson, A., Chancel, L., Piketty, T., Saez, E., & Zucman, G. (2016). Distributional National Accounts (DINA) guidelines: Concepts and methods used in WID.world.

Alvaredo, F., Chancel, L., Piketty, T., Saez, E., & Zucman, G. (2017). Global inequality dynamics: New findings from WID.world. American Economic Review, 107(5), 404–409. doi: 10.3386/w23119

Atkinson, A. B., Casarico, A., & Voitchovsky, S. (2018). Top incomes and the gender divide. The Journal of Economic Inequality, 16(2), 225–256. doi: 10.1007/s10888-018-9384-z

Barro, R. J., & Lee, J. W. (2013). A new dataset of educational attainment in the world, 1950–2010. Journal of Development Economics, 104, 184–198. doi: 10.1016/j.jdeveco.2012.10.001

Blanchet, T., Chancel, L., & Gethin, A. (2019). How unequal is Europe? Evidence from distributional national accounts, 1980–2017. WID.world working paper, 6.

de Chaisemartin, C., & D’Haultfœuille, X. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. American Economic Review, 110(9), 2964–2996. doi: 10.3386/w25904

Enders, C. K. (2010). Applied missing data analysis. Guilford Press.

Fiscella, K., & Franks, P. (2000). Individual income, income inequality, health, and mortality: What are the relationships? Health Services Research, 35(1 Pt 2), 307.

General, S. (2014). The health consequences of smoking—50 years of progress: A report of the surgeon general. Paper Presented at the US Department of Health and Human Services.

Gravelle, H., Wildman, J., & Sutton, M. (2002). Income, income inequality and health: What can we learn from aggregate data? Social Science & Medicine, 54(4), 577–589. doi: 10.1016/S0277-9536(01)00053-3

Hill, T. D., Davis, A. P., Roos, J. M., & French, M. T. (2020). Limitations of fixed-effects models for panel data. Sociological Perspectives, 63(3), 357–369. doi: 10.1177/0731121419863785

Kim, K. T. (2019). Income inequality, welfare regimes and aggregate health: Review of reviews. International Journal of Social Welfare, 28(1), 31–43. doi: 10.1111/ijsw.12322

Kondo, N., Sembajwe, G., Kawachi, I., Van Dam, R. M., Subramanian, S. V., & Yamagata, Z. (2009). Income inequality, mortality, and self-rated health: Meta-analysis of multilevel studies. BMJ, 339(7731), 1178–1181. doi: 10.1136/bmj.b4471

Leigh, A., & Jencks, C. (2007). Inequality and mortality: Long-run evidence from a panel of countries. Journal of Health Economics, 26(1), 1–24. doi: 10.1016/j.jhealeco.2006.07.003

Leigh, A., & Jencks, C. (2013). Healthtopinc_readme_web.pdf. Inequality and Mortality: Long-run evidence from a panel of countries. doi: 10.7910/DVN/QB76A1/VJVTSS, Harvard Dataverse, V1.

Leigh, A. (2007). How closely do top income shares track other measures of inequality? The Economic Journal, 117(524), F619–F633. doi: 10.1111/j.1468-0297.2007.02099.x

Little, R. J. (1992). Regression with missing X’s: A review. Journal of the American Statistical Association, 87(420), 1227–1237. doi: 10.2307/2290664

Little, R. J., & Rubin, D. B. (2019). Statistical analysis with missing data (Vol. 793). Hoboken: John Wiley & Sons.

Lynch, J., Smith, G. D., Harper, S., & Hillemeier, M. (2004). Is income inequality a determinant of population health? Part 2. US national and regional trends in income inequality and age- and cause-specific mortality. The Milbank Quarterly, 82(2), 355–400. doi: 10.1111/j.0887-378X.2004.00312.x

Macinko, J. A., Shi, L., & Starfield, B. (2004). Wage inequality, the health system, and infant mortality in wealthy industrialized countries, 1970–1996. Social Science & Medicine, 58(2), 279–292. doi: 10.1016/S0277-9536(03)00200-4

Macinko, J. A., Shi, L., Starfield, B., & Wulu, Jr, J. T. (2003). Income inequality and health: A critical review of the literature. Medical Care Research and Review, 60(4), 407–452. doi: 10.1177/1077558703257169

Maddison, A. (2010). Statistics on world population, GDP and per capita GDP, 1-2008 AD. Historical Statistics, 3, 1–36.

Musil, C. M., Warner, C. B., Yobas, P. K., & Jones, S. L. (2002). A comparison of imputation techniques for handling missing data. Western Journal of Nursing Research, 24(7), 815–829. doi: 10.1177/019394502762477004

Novokmet, F., Piketty, T., & Zucman, G. (2018). From Soviets to oligarchs: Inequality and property in Russia 1905–2016. The Journal of Economic Inequality, 16(2), 189–223. doi: 10.3386/w23712

OECD. (2007). OECD health data 2007: Statistics and indicators for 30 countries. Paris: Organisation for Economic Co-operation and Development, Institute for Research and Information in Health Economics (IRDES). doi: 10.1787/9789264064638-en

Patel, V., Burns, J. K., Dhingra, M., Tarver, L., Kohrt, B. A., & Lund, C. (2018). Income inequality and depression: A systematic review and meta-analysis of the association and a scoping review of mechanisms. World Psychiatry, 17(1), 76–89. doi: 10.1002/wps.20492

Pedersen, L., & Petersen, I. (2017). Missing data and multiple imputation in clinical epidemiological research. Clinical Epidemiology, 9, 157. doi: 10.2147/CLEP.S129785

Piketty, T., & Atkinson, A. B. (2010). Top incomes: A global perspective. Oxford: Oxford University Press.

Rodgers, G. B. (1979). Income and inequality as determinants of mortality: An international cross-section analysis. Population Studies, 33(2), 343–351. doi: 10.1080/00324728.1979.10410449

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley. doi: 10.1002/9780470316696

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473–489. doi: 10.1080/01621459.1996.10476908

StataCorp, L. (2017). Stata survival analysis reference manual. Texas, College Station.

Subramanian, S. V., & Kawachi, I. (2004). Income inequality and health: What have we learned so far? Epidemiologic Reviews, 26(1), 78–91. doi: 10.1093/epirev/mxh003

Von Hippel, P. T. (2020). How many imputations do you need? A two-stage calculation using a quadratic rule. Sociological Methods & Research, 49(3), 699–718. doi: 10.1177/0049124117747303

Wilkinson, R. G. (1998). Mortality and distribution of income. Low relative income affects mortality. BMJ, 316(7144), 1611–1612. doi: 10.1136/bmj.316.7144.1611a

Wilkinson, R. G. (2002). Unhealthy societies: The afflictions of inequality. London; New York: Routledge. doi: 10.4324/9780203421680

Wilkinson, R. G., & Pickett, K. E. (2006). Income inequality and population health: A review and explanation of the evidence. Social Science & Medicine, 62(7), 1768–1784. doi: 10.1016/j.socscimed.2005.08.036

Received: 2022-02-06
Revised: 2022-08-12
Accepted: 2022-08-13
Published Online: 2022-09-15

© 2022 Weilun Wu, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
