Human Capital Investments and Family Size in Italy: IV Estimates Using Twin Births as an Instrument

Abstract: Human capital investments at an early age appear crucial for individual outcomes. Family size might affect these investments by influencing the parental time and economic resources invested in children's education. This feature is related to the children quantity-quality trade-off proposed by Becker, which has been investigated only for a few countries because of data limitations. We investigate this issue for Italy, even in the absence of Census data relating family of origin to children's educational outcomes, using many waves of the Survey on Household Income and Wealth of the Bank of Italy and focusing on the educational attainments of 19–23 year olds. We use twin births as an instrumental variable to identify exogenous variations in family size. In contrast with the results from other developed countries, we find a significant negative effect of family size on children's education, probably related to the low quality of education in some regions of the country and to the poor public assistance for families with children. We show that these findings are robust to a number of checks. The effects appear stronger for women, Southern regions, low-income families and when spacing between births is limited.


Introduction
Investments in human capital at an early age are key for the development of children and for subsequent adult outcomes (Cunha and Heckman 2007; Heckman 2006), but their determinants are not yet completely explored. Parental decisions on family size could significantly affect these investments: one additional child in the family could dilute the parental time and economic resources devoted to each child, reducing their educational achievement.
This issue is related to the trade-off between quantity and quality of children originally proposed by Becker and coauthors (Becker 1960; Becker and Lewis 1973; Becker and Tomes 1976): parents derive utility from both children's quantity and quality (more spending on each child), and the income elasticity for children's quality is assumed to be higher than the income elasticity for children's quantity. Quantity and quality are related through the household's budget constraint: if children's quality rises, increasing quantity (more children) becomes more expensive. On the other hand, if quantity increases, quality (which is assumed to be the same for each child) becomes costlier and will be reduced.
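As a compact illustration of this mechanism, the standard Becker and Lewis (1973) formulation can be sketched as follows. The symbols n (number of children), q (quality per child), y (other consumption), \pi and \pi_y (prices), I (income) and p_n, p_q (shadow prices) are notation introduced here for illustration and are not taken from the text above.

```latex
\max_{n,\,q,\,y}\; U(n, q, y)
\quad \text{s.t.} \quad \pi\, n\, q + \pi_y\, y = I,
\qquad
p_n = \pi q, \qquad p_q = \pi n .
```

Because quantity and quality enter the budget constraint multiplicatively, the shadow price of an additional child, p_n, rises with quality, and the shadow price of quality, p_q, rises with the number of children.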
From an empirical point of view, researchers have to tackle two big challenges to examine this question. The first issue is related to data availability: it is hard to link data on family of origin and siblings with data on children's educational attainments, labor market outcomes and so on.
Matching data on educational attainments of individuals (when they have left the parental home) with detailed information on the family of origin is rarely possible with the existing datasets. The second issue is an endogeneity problem: family size is likely related to many other observable and unobservable parental characteristics that can also affect children's educational achievements, producing estimation biases.
The empirical evidence on the quantity-quality trade-off is quite ambiguous, with some works finding a null effect for some countries (Black, Devereux and Salvanes (2005) for Norway; Angrist, Lavy and Schlosser (2010) for Israel) while other studies have shown negative effects of family size on children's outcomes (Rosenzweig and Zhang (2009) and Li, Zhang and Zhu (2008) for China; Åslund and Grönqvist (2010) for Sweden; Grawe (2008) for the UK, among others).
The relationship between family size and children's education has not been investigated for Italy. Italy is a country with a surprisingly low share of individuals with tertiary education: in 2018 only 19 % of 25-64 year olds had a tertiary education, while the OECD average is 37 % (OECD 2019). At the same time, Italy recorded high fertility rates until the mid-1970s. However, tertiary education has been increasing in recent decades: currently 28 % of 25-34 year olds have attained a tertiary education. Meanwhile, maybe not by coincidence, the Italian fertility rate has fallen to historically low levels since the 1990s (in 2018 to 1.3 children per woman, one of the lowest among the OECD countries, whose average is 1.6).
We investigate the relationship between family size and children's educational achievements for Italy using many waves of the Survey on Household Income and Wealth (SHIW) of the Bank of Italy. We tackle the problem of linking characteristics of the family of origin to children's outcomes by focusing on the educational achievement of 19-23 year olds who still live in their family of origin, using the academic generalist secondary school track (Lyceum) as a predictor for University enrollment. In this way, we are able to investigate the existence of a trade-off between children's quantity and quality for Italy, even in the absence of Census data relating family of origin to children's education.
To avoid endogeneity problems, we adopt an Instrumental Variable estimation strategy using multiple births as an instrument to identify exogenous variations in family size, and we estimate using a Two-Stage Least Squares estimator. Since the occurrence of a twin birth is not completely independent of parental characteristics, in our models we also control for a large number of these variables. Moreover, we document that in our sample medically assisted procreation (which produces a higher probability of twin births and probably makes this correlated with family characteristics) is unlikely to be a relevant issue, since the individuals considered were born before 1997, when the diffusion of assisted procreation was limited.
In contrast with the results from other developed countries, we find a strong negative effect of family size on children's educational outcomes. An additional child reduces the probability of attending a Lyceum by 18-20 percentage points and presumably the probability of attaining a College Degree by 10 points.
We argue that this impact is related to the poor quality of education in some regions of the country and to the limited public assistance for families with children. In fact, when we analyze whether family size effects are heterogeneous according to individual and family characteristics, we find that the negative impact of a larger family is stronger in Southern regions, for low-income families and when spacing (the difference in years among the births of children) is reduced. We also find a strong birth order effect. These findings suggest that both time and financial constraints are mechanisms at work in reducing educational outcomes for children coming from large families.
We also run a number of robustness checks, changing the age range we consider in our sample, and we find very similar effects. We also use an alternative dataset, the Italian Health Conditions Survey, which directly reports university attendance of individuals, and we confirm our main findings. Furthermore, we carry out a Monte Carlo simulation to investigate whether a small sample selection bias due to individuals leaving home earlier might drive our results, and we show that in fact our coefficient of interest tends to be biased towards zero, suggesting that we are estimating a lower bound.
The paper is organized as follows. Section 2 discusses the related literature. In Section 3 we describe our data and the criteria we use to build our sample. In Section 4 we investigate the relationship between family size and educational achievement using an OLS estimator. In Section 5 we present our Instrumental Variables estimates and in Section 6 we investigate whether the effects are heterogeneous according to individual and family characteristics. Section 7 is devoted to a number of robustness checks and deals with the problem of sample selection through a Monte Carlo simulation. Section 8 offers some concluding remarks.

Related Literature
Early empirical studies tried to test the quantity-quality trade-off between family size and child quality, generally supporting a negative impact of family size (see Schultz 2005, for a review). However, most of these studies do not deal with the endogeneity problem of family size: family size and children's investments are jointly chosen by parents and, hence, they are both affected by unobservable parental preferences and household characteristics.
To address this endogeneity problem, economists have used different natural experiments to exploit exogenous variations in family size: the most commonly adopted strategy consists in using Instrumental Variables, exploiting twin births as an instrument. 1
1 A frequently used alternative instrument, instead of twin births, consists of instrumenting the number of children with the gender mix of children born in the family. Generally, parents prefer to have offspring of both genders (Angrist, Lavy, and Schlosser 2010; Becker, Cinnirella, and Woessmann 2010; Fitzsimons and Malde 2014), and so those having the first two children of the same sex are more likely to have an additional child. Lee (2008) exploits the preference for sons in South Korea as an instrument (at the first birth) for the number of children and finds that larger families tend to reduce the investments in children's human capital.
A parallel literature has evaluated the effect of fertility on parental outcomes, such as female labor supply.See, for example, Bronars and Grogger (1994) and Angrist and Evans (1998).
In this literature, Rosenzweig and Wolpin (1980), in a pioneering study that uses multiple births as an instrument in an IV identification strategy with a small sample of about 1600 children from India, find that family size, as affected by the birth of twins, has a negative impact on the educational attainments of children aged 5-14.
Two very influential studies employing Census data from Norway and Israel and exploiting multiple births as an instrument for family size are Black, Devereux and Salvanes (2005) and Angrist, Lavy and Schlosser (2010). Using data from the entire population of Norway, Black, Devereux and Salvanes (2005) use twins as an exogenous source of variation to estimate family size effects on children's education and adult earnings. They show that the effect of family size shrinks to almost zero after controlling for birth order (though the standard errors are very large), and that a monotonic decline emerges in educational attainment according to birth order. Similar effects are found for later outcomes such as earnings and employment.
A similar study by Angrist, Lavy and Schlosser (2010), exploiting 20 % Census data for Israel, uses both twin births and siblings' gender composition as instrumental variables and shows no evidence for a quantity-quality trade-off of children in Israel, even when the authors focus on high-fertility populations.
A number of other studies have examined family size in developed and developing countries. In general, these studies tend to show significant trade-offs between family size and children's quality in developing countries, but almost no effects in developed countries or a negative effect limited to particular sub-populations.
As regards developing countries, Ponczek and Souza (2012) use data from the 1991 Brazilian Census and, using the birth of twins as an instrument, show a negative impact of family size, especially for females, and provide some evidence that both credit and time constraints drive the results. Using data from the Chinese Population Census and adopting a similar approach, Li, Zhang and Zhu (2008) estimate a negative effect of family size on children's education, especially for rural China with a poor educational system. 2
A few studies have investigated the quantity-quality trade-off in developed countries such as the US, Sweden and the UK. Caceres-Delpiano (2006) analyses the impact of the number of children on investment in some inputs in children's human capital and other outcomes for the US and finds evidence of a trade-off only for some outcomes. He shows that families facing an exogenous change in family size due to multiple births re-allocate resources consistently with Becker's model, in that an additional sibling reduces the likelihood that older siblings attend a private school and increases the likelihood that their parents divorce. On the other hand, he finds little evidence that an exogenous change in family size affects educational achievement such as grade retention or the highest grade completed. Åslund and Grönqvist (2010) find no effect of family size on long-term educational attainments in Sweden, but they show a negative impact of family size on school grades among children of more vulnerable families, such as those with large sibships and low-educated parents. When investigating heterogeneous effects according to family background, Grawe (2008) shows evidence of a trade-off between family size and several child outcomes for Britain. In particular, he splits the sample on the basis of father's earnings and finds strong negative family size effects on children's education even for the richest families (in the top quintile of the income distribution), suggesting that the trade-off is not a matter of financial resources but rather a problem of time constraints.
Therefore, the quantity-quality trade-off seems to be absent in developed countries, such as Norway or Sweden, where there are both a well-functioning public education system and generous support for childbearing and childcare. In these contexts, families receive public support and they can protect children's quality. In contrast, studies from developing countries have often found evidence in support of the quantity-quality trade-off.
Recently, in two influential contributions, Bhalotra and Clarke (2019, 2020) show that twin births are not completely random and, in particular, they document that the probability of having twins is positively associated with the mother's health. Basically, the explanation relies on the fact that twin pregnancies are more demanding to carry to term from a physical and health point of view. Therefore, mothers with better health conditions are more likely to complete a twin pregnancy than mothers with worse health conditions. Bhalotra and Clarke (2019) show, using data from several countries, the positive correlation between mother's health and the probability of having twins.
The key implication of this finding, discussed in detail in Bhalotra and Clarke (2020), is that, if maternal health conditions are not controlled for, IV estimates of the impact of family size on investments in human capital tend to be biased towards a zero effect, since better maternal health conditions typically have a positive effect on investments in children's human capital. This result helps to explain why important studies such as Black, Devereux, and Salvanes (2005) and Angrist, Lavy, and Schlosser (2010) find zero effects of family size on school investments. Therefore, in contrast to OLS estimates, which tend to be biased towards larger (negative) effects because of "negative selection of women into fertility", IV estimates tend to be biased towards zero.
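The direction of this bias can be seen from the usual probability limit of a just-identified IV estimator. The sketch below is a textbook simplification, not a derivation taken from Bhalotra and Clarke: Z is the twin indicator, N is family size, Y is the educational outcome, H is unobserved maternal health with effect \gamma on the outcome, and all symbols are notation assumed here.

```latex
Y = \alpha + \beta N + \gamma H + \varepsilon, \qquad \operatorname{Cov}(Z, \varepsilon) = 0
\;\;\Longrightarrow\;\;
\operatorname{plim}\hat{\beta}_{IV}
  = \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, N)}
  = \beta + \gamma\,\frac{\operatorname{Cov}(Z, H)}{\operatorname{Cov}(Z, N)} .
```

With \gamma > 0, Cov(Z, H) > 0 and Cov(Z, N) > 0, the second term is positive, so a negative \beta is pulled towards zero when H is omitted.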
No evidence exists for a developed country such as Italy, in which public support for families with children is rather poor, the availability of childcare services is limited and the quality of public education is poor, especially in Southern regions.

The Data
This section describes the data and the criteria we have followed to build the sample. The data source for our empirical analysis is the Survey on Household Income and Wealth (SHIW) that is conducted every two years by the Bank of Italy on a representative sample of about 8000 Italian households. 3 The SHIW collects detailed information on the demographic and social characteristics of all the individuals in a household (such as age, gender, marital status, education, region of residence), on their working activity (earnings, employment status, type of occupation, experience, and so on) and on real and financial wealth. 4 In our main analysis, we pool together 11 waves of the SHIW conducted from 1995 to 2016. 5
To select the sample, we adopt a number of criteria. We use only children from 19 to 23 years old, since they have almost certainly completed secondary education (and therefore we are able to observe the type of secondary school they have attended) and it is very likely that they are still at home with their parents: in this way we have information on the family of origin and on our proxy for educational achievement. 6 Since these subjects typically have not yet completed their educational path, as a proxy for educational achievement we use the type of High School attended by individuals.
3 SHIW data are freely available at www.bancaditalia.it. We refer to Baffigi, Cannari and D'Alessio (2016) for detailed information about the dataset. See also the Appendix of Guiso, Sapienza, and Zingales (2004).
4 The SHIW uses a representative sample of the Italian resident population. The sample is drawn in two stages, first sampling municipalities (stratifying by region and population size) and then selecting households randomly from the population register. Comparing individuals in the SHIW datasets (waves 1995-2016), before selecting the subsample we use, with the statistics for the general population (mainly Census Data 2011), we find that the average statistics are very similar: age (44.06 in SHIW and 43.21 in Census); years of education (9.61 vs. 10.02); females (51.4 % vs. 51.5 %); Southern inhabitants (37.7 % vs. 34.8 %); married (60.3 % vs. 56.7 %); number of children for families with children (1.67 vs. 1.59); twins are 1.94 % and twin births are 0.97 % in SHIW, while they are respectively 2.33 % and 1.16 % in the population (ISTAT, Parti Plurimi).
The Italian secondary school system can be described as tripartite, with an academic generalist track ("Lyceum"), a technically oriented education (Technical schools) and a more labor market oriented track (Vocational or Professional schools). Track selection is a relevant factor for an individual's future career, since the type of secondary school strongly affects university attendance. Lyceum is considered the most prestigious secondary educational track and provides an in-depth, general knowledge aimed at preparing students for university. In contrast, Technical and Vocational schools offer an education oriented toward more practical subjects, enabling students to start searching for a job as soon as they have completed their studies.
Preliminarily, to document to what extent Lyceum is related to University enrollment, we use data from the ISTAT 2015 Survey of High School Graduates (sampling individuals who graduated in 2011). Using about 26,000 observations and estimating a Linear Probability Model, we find that having attended a Lyceum increases the probability of going to College on average by 52 percentage points (p.p. hereafter), from 40 % to 92 % (t-stat = 106.7); see column (1) of Table A1 in the Appendix. In column (2), controlling for Female, Immigrant, and dummies for Region of Residence, we find very similar results. With respect to Vocational schools (chosen by about one third of students), attending a Lyceum increases the probability of going to College by more than 70 p.p. 7
As further evidence, using the whole SHIW sample of individuals aged 26 or above (about 167,000 obs.), we also estimate with OLS the probability of graduating from University in relation to the choice of a Lyceum as High School. In column (1) of Table A2 in the Appendix, we find that having attended a Lyceum increases the probability of graduating by 40 p.p. (in the first basic specification from 7.4 % to 47.8 %; t-stat = 87.1). We obtain very similar results (+37-38 p.p.) when we control for gender, year of birth (or, alternatively, year of birth dummies), and geographical areas in columns (2)-(4) of Table A2. 8
We then turn our attention to our sample of 19-23 year olds living with their parents and we try to verify how representative this sample is of the whole population aged 19-23. To investigate this issue, we verify in the complete SHIW dataset at which age people are likely to leave their parental home and form a new family. We build a dummy Own Family equal to one if individual i lives alone, is the head of the family or is the partner of the head; Own Family is instead equal to zero if individual i is a son/daughter in a family. We regress Own Family on a dummy for each age level. Results are reported in Table A3 in the Appendix: we show that the probability of leaving home is about 4-5 points higher for individuals aged 19-23 with respect to those 16-18 years old (the reference category). Therefore, we conclude that only a very small fraction of individuals is leaving home in the age range that we consider, while almost 95-96 % remain in the family of origin.
From Table A3 we also notice that Lyceum attracts a negative coefficient: students who have attended a Lyceum are about 9 p.p. less likely to leave home. Therefore, in our main analysis, in which we use Lyceum as a dependent variable, a sample selection is at work. In Section 7 we show that this sample selection imparts a bias towards zero to our coefficient of interest.
Since the SHIW dataset does not include an explicit identifier for twins, we define twins as children who were born in the same calendar year in the same family. 9 Our main dependent variable, Lyceum, is equal to one if individual i has attended a Lyceum (Classical, Scientific or Linguistic Lyceum) and 0 otherwise (that is, if i has attended a Vocational or Technical School or if i has attained a grade lower than High School).
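The twin definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout and the keys `family_id` and `birth_year` are hypothetical, and the rule would also flag two singleton siblings who happen to be born in the same calendar year.

```python
from collections import Counter


def flag_twins(children):
    """Mark children who share a birth year with a sibling in the same family.

    `children` is a list of dicts with the hypothetical keys 'family_id'
    and 'birth_year'; the result is a list of booleans aligned with the input.
    """
    counts = Counter((c["family_id"], c["birth_year"]) for c in children)
    return [counts[(c["family_id"], c["birth_year"])] > 1 for c in children]


def families_with_twins(children):
    """Return the set of family ids containing at least one same-year pair."""
    flags = flag_twins(children)
    return {c["family_id"] for c, is_twin in zip(children, flags) if is_twin}


# Made-up records, for illustration only.
sample = [
    {"family_id": 1, "birth_year": 1990},
    {"family_id": 1, "birth_year": 1990},
    {"family_id": 1, "birth_year": 1993},
    {"family_id": 2, "birth_year": 1990},
    {"family_id": 2, "birth_year": 1992},
]
print(flag_twins(sample))
print(families_with_twins(sample))
```

The first function corresponds to an individual-level twin indicator, the second to a family-level indicator of the kind used as an instrument.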
We include in our sample only the households reporting that no other children have left home and live outside the family (since we do not know the age and educational attainments of the latter). By focusing on young adults of 19-23 years, families without children are excluded. Furthermore, in our analysis we use only married couples. 10
Applying our selection criteria of individuals aged 19-23, in our complete sample we have 10,139 observations. Descriptive statistics are shown in Table 1. About 29.1 % of individuals have attended a Lyceum. 11 Families are composed on average of 2.26 children. About 1.6 % of our sample are twins and 2.5 % are individuals in twins' families. 12 Females are 46.5 % of the sample and the average age is 20.9. Mothers and fathers have acquired nearly 10 years of education 13 and their respective ages (at the birth of their children) are 27.7 and 31.4. 14 The average birth order is 1.49. The individuals in our sample were born from 1973 to 1997. We also use as controls the geographical areas: residents in the North-West or North-East constitute 38 %, 19 % live in the Center and 45 % live in the South and on the Islands. 15 Urban is a dummy variable taking the value of 1 for large cities with 50 thousand inhabitants or more (and 0 otherwise). 16 Almost 44 % live in large cities.

Family Size and Education: OLS Estimates
We first aim to show that the educational achievement of individuals is negatively related to family size (number of children), employing a simple OLS estimator. We use the SHIW dataset but we focus on observations on the head of the family and his/her partner, exploiting some questions asking them their respective number of brothers and sisters (still alive), their own educational attainments and the education of their parents. We use seven surveys (from 1995 to 2008) in which these questions were asked (about 44,000 obs.). We build the variable Family Size as the number of children in the family of origin (Family Size = #Brothers + #Sisters + 1) and we consider individuals between 26 and 55 years old (to reduce measurement errors deriving from siblings' premature deaths).
The two great advantages of using the entire sample of individuals aged 26-55 are that, first, in estimating the OLS model we are not constrained by the sample of 19-23 year olds (since only for this small sample are we able to recover the information on twins in the family) and, second, we can estimate the relationship between family size and investments in education using a very large sample. Furthermore, since individuals older than 26 have typically completed their educational path, we can use the comprehensive variable "Years of Education" that captures all the investments in human capital rather than looking only at the intermediate educational level of Lyceum.
In this analysis we estimate the following equation:

$$\text{Years of Education}_i = \beta_0 + \beta_1\,\text{Family Size}_i + \beta_2 X_i + \beta_3 P_i + \varepsilon_i \qquad (1)$$

where we use as a dependent variable the years of education attained; Family Size_i represents the number of children in the family of origin, X_i are individual control variables (gender, age, area of residence) while P_i are parental characteristics (father's and mother's years of education, father's and mother's age).
We estimate with OLS and the results are reported in Table 2. In the first three columns we use Family Size in linear form, while in columns (4)-(6) we include a dummy variable for each number of children.
In the first column, controlling only for gender, five geographical area dummies and the year of birth of cohorts (to take into account that educational levels tend to change over time), we find that an additional child in the family reduces years of education by 0.62 and the effect is highly statistically significant (t-stat = −49.0).
In the following columns we control for father's and mother's years of education, father's and mother's age and city size dummies. We find a sizeable impact of family size of −0.48 (t-stat = −36.9), slightly reduced in magnitude: this corresponds to −0.12 SD of the dependent variable for each additional child in the family. In column (3) we use a dummy variable for each five-year cohort (nine dummies) instead of year of birth in linear form and we obtain an almost identical result.
In columns (4)-(6), in which we use dummies for the number of children instead of Family Size in continuous form, we show that while in families with two children the years of education are not lower than in families with one child, more than two children strongly reduce educational attainments: in column (6) we find that in families with three children the years of education are reduced by 0.59 with respect to families with one child, four children reduce years of education by 1.23, five children by 1.64 and six children by 2.24. 17
Therefore, our OLS estimator shows a strong negative correlation between family size and investment in education.
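To make the dummy-variable specification concrete: with a full set of family-size dummies and no other controls, the OLS coefficients equal the differences in mean years of education relative to the omitted category (one-child families). The sketch below illustrates only this special case on made-up numbers; the function name and data are hypothetical, and the estimates in Table 2 additionally include controls.

```python
from collections import defaultdict
from statistics import mean


def dummy_coefficients(rows, reference=1):
    """Differences in mean years of education relative to `reference` size.

    `rows` holds (family_size, years_of_education) pairs. Without further
    controls, these differences coincide with the OLS coefficients on a
    full set of family-size dummies.
    """
    groups = defaultdict(list)
    for size, years in rows:
        groups[size].append(years)
    base = mean(groups[reference])
    return {
        size: mean(values) - base
        for size, values in sorted(groups.items())
        if size != reference
    }


# Made-up (family_size, years_of_education) pairs, for illustration only.
rows = [(1, 12), (1, 14), (2, 13), (2, 13), (3, 12), (3, 11), (4, 10), (4, 11)]
print(dummy_coefficients(rows))
```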
We find qualitatively similar results if we use the data on children aged 19-23 still in their family of origin and estimate a simple Linear Probability Model for the probability of attending a Lyceum (since in this age range education is not yet complete) in relation to the number of children in the family (results not reported to avoid cluttering the paper).

Using Twin Births as an Instrumental Variable
In the previous Section we have estimated an OLS model but, as explained above, OLS is likely to be biased, since Family Size might be correlated with a number of observable or unobservable factors that can directly affect the educational achievement of children.
In this Section, to overcome this problem, and following the analyses of Black, Devereux and Salvanes (2005) and Angrist, Lavy and Schlosser (2010), we adopt an Instrumental Variable approach in which Family Size is instrumented with the birth of twins. The idea is that the birth of twins represents, under certain conditions, an exogenous shock to the size of the family, uncorrelated with other determinants of educational achievement.
However, a number of studies using twin births as an instrument (see, for example, Bhalotra and Clarke 2019; Bronars and Grogger 1994; Jacobsen, Pearce, and Rosenbloom 1999) have shown that the instrument is not completely exogenous and tends to be correlated with some parental characteristics. In the next Section we check these aspects.

Threats to the Validity of the Twin Births Instrument
Using our data, we preliminarily check whether the characteristics of individuals coming from families with twins are comparable to those of individuals from families without twins.
In Table 3 we show the averages of some predetermined variables for families without and with twins. In column (3) we report their difference. Whereas several characteristics are balanced, we find significant differences in a few variables. The age of individuals in a "Twins family" is slightly lower (−0.17 years). The mother's age turns out to be 0.55 years higher in twin families (not statistically significant) but mother's education is lower (−0.846, significant at the 5-percent level). Twin families are more prevalent in the South (strongly significant). On the other hand, there are no significant differences for gender, mother's and father's age, father's education, immigrant, urban. However, if we model the probability of being in a "twin" family in relation to the predetermined characteristics, once we control for South the other variables turn out to be not jointly significant.
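A balance check of this kind amounts to a difference in means with a test statistic for each predetermined variable. The sketch below shows one row of such a table using Welch's unequal-variance t-statistic; this choice of test, the function name and the data are assumptions made for illustration, since the text does not specify how the differences in Table 3 are tested.

```python
from math import sqrt
from statistics import mean, variance


def balance_row(with_twins, without_twins):
    """Difference in means (twin minus non-twin families) and Welch t-statistic."""
    diff = mean(with_twins) - mean(without_twins)
    se = sqrt(
        variance(with_twins) / len(with_twins)
        + variance(without_twins) / len(without_twins)
    )
    return diff, diff / se


# Made-up values of mother's years of education, for illustration only.
mothers_edu_twins = [8, 9, 10, 9]
mothers_edu_no_twins = [10, 11, 9, 12, 10, 11]
diff, t_stat = balance_row(mothers_edu_twins, mothers_edu_no_twins)
print(f"difference = {diff:.2f}, t = {t_stat:.2f}")
```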
The balance checks confirm that twin birth probabilities are not completely exogenous with respect to parental characteristics and, in particular, increase with maternal age at birth, decrease with mother's education, and vary by geographical and ethnic groups. To avoid any spurious correlation between our instrument and the error term, in all our regressions we need to control for these predetermined characteristics. Our identifying assumption, along the lines of similar studies that have used twins as an instrument for family size, is that, controlling for these characteristics, twin births are as good as randomly assigned.
Another potential problem with the use of twin births as an instrument is related to the diffusion of medical techniques of assisted procreation or In Vitro Fertilization (IVF). These techniques tend to increase the probability of twin births and could relate twin births to parental characteristics. However, assisted procreation is a recent phenomenon: in our sample individuals were all born before the year 1997, when IVF was still little used in Italy. A Law governing medically assisted procreation was introduced in Italy only in 2004 (Law no. 40/2004) and this Law has been rather restrictive in allowing the use of techniques of assisted procreation.
[Figure: horizontal axis "Year of Birth", 1900 to 2000]
The low relevance of assisted procreation for twin births before 1997 is confirmed by the fact that the annual rate of multiple births in our sample is almost constant. The percentage of twin births in Italy from 1868 to 1998 is reported in Figure 1 (statistics on multiple births from the ISTAT Historical Time Series, "Parti semplici e plurimi - Anni 1868-1998"). On average, from 1870 to 1998, there have been 1.16 twin births (out of 100 births) and 2.33 twins (out of 100 children born). In 1998 the percentage of twin births was 1.1. From Figure 1 it does not seem that twin births have become more frequent than the historical trend. 18
To further attenuate the concerns about the twin instrument related to assisted procreation, it is worth noting that IVF is typically used for the first child, while in our main analysis we use twin births as an instrument for higher parity.

Instrumental Variables Estimates of the Impact of Family Size on Educational Achievement
The equation we estimate in our IV analysis is slightly different from equation (1):

$$\text{Lyceum}_i = \beta_0 + \beta_1\,\text{Family Size}_i + \beta_2 X_i + \beta_3 P_i + \varepsilon_i$$

where the educational attainment is proxied with the dummy Lyceum (which in turn, as shown in Section 3, is highly correlated with College Degree attainment) and the number of children in the family, Family Size_i, is instrumented with Twins through our First Stage equation:

$$\text{Family Size}_i = \pi_0 + \pi_1\,\text{Twins}_i + \pi_2 X_i + \pi_3 P_i + u_i$$

First, we estimate on the whole sample of families (with at least one child), simply using the number of children in the family as the endogenous variable and the dummy Twins in Family as an instrument.
As regards the First Stage (Table 4, Panel B), we find that twin births increase the average number of children by 0.70-0.90, depending on the specification. The F-statistic for the null hypothesis that the coefficient on the instrument in the First Stage regression is zero takes values in the range 52-65, confirming that weak instruments are not a concern for our IV analysis.
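The First Stage check can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration (the 2% twin rate, the 80% share of twin births that add a child, the single control `parent_edu`, and the use of classical rather than clustered standard errors); it is not the paper's data or exact procedure. With a single instrument, the First Stage F-statistic equals the squared t-statistic on the instrument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Simulated (hypothetical) data: about 2% of families experience a twin birth.
twins = (rng.random(n) < 0.02).astype(float)
parent_edu = rng.normal(10.0, 3.0, n)
planned = 1 + rng.poisson(1.0, n)          # children the family would have anyway
extra = twins * (rng.random(n) < 0.8)      # assumed: twins add a child in 80% of cases
family_size = planned + extra

# First Stage: family size on a constant, the instrument and one control.
X = np.column_stack([np.ones(n), twins, parent_edu])
coef, *_ = np.linalg.lstsq(X, family_size, rcond=None)
resid = family_size - X @ coef

# Classical OLS covariance matrix: sigma^2 (X'X)^{-1}.
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

first_stage = coef[1]
t_stat = first_stage / np.sqrt(cov[1, 1])
f_stat = t_stat ** 2                       # one restriction, so F = t^2

print(f"first-stage coefficient: {first_stage:.2f}, F-statistic: {f_stat:.1f}")
```

An F-statistic well above 10 in such a regression is the usual informal signal that the instrument is not weak.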
In the Second Stage (Table 4, Panel A), our estimates show that an additional child in the family reduces the probability of attending the Lyceum by about 8-9 percentage points (t-stats are typically around 2.5). In the first specification we control only for Female, Age, Mother's and Father's Education and five geographical area dummies; in the second specification we include mother's and father's age and survey time dummies; in the third specification we also control for city size dummies; in the fourth specification we include controls for Birth Order and Household Wealth;19 in column (5) we control for Immigrant.20 The effect of family size on educational achievement is remarkably stable across specifications.21
19 Household Wealth is determined as Real Assets + Financial Assets − Financial Liabilities. We include it at constant prices. Household Wealth probably suffers from a problem of reverse causality: accumulation of wealth could be a function of the decision to invest in education, rather than the other way around, and therefore these estimates should be taken with caution.
20 In the SHIW waves Immigrant is not reported before 2006. In this case, we set Immigrant equal to 1 if an individual is born abroad and 0 otherwise.
It is worth noting that Instrumental Variables estimates, as shown by Imbens and Angrist (1994) among others, do not produce an Average Treatment Effect for the whole population, but rather a Local Average Treatment Effect (LATE), that is, an average effect for the compliers, who in general are the subjects that are treated when the instrument is one and are not treated when the instrument is zero.
In the specific setting we are using, compliers are the families who have an additional child when twin births occur but would not have had an additional child if twin births had not occurred at a given level of parity. Therefore, the effects we estimate with our IV strategy do not take into account the "Always Takers", that is, the families who desire an additional child regardless of whether twin births occur or not. Nor does our estimator consider the "Never Takers", that is, the families that do not desire more children regardless of whether twin births occur or not.
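The LATE interpretation can be illustrated with a small simulation. It is only a sketch under assumed values: the share of "Always Takers" (30%), the two effect sizes, and the deliberately exaggerated 50% twin rate (chosen only to reduce simulation noise) are all hypothetical. The Wald estimator, the reduced form divided by the First Stage, recovers the effect for compliers and not the population average.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical population of families with two births.
wants_third = rng.random(n) < 0.3     # "Always Takers": have a third child anyway
twins = rng.random(n) < 0.5           # exaggerated twin rate, for precision only
third_child = (wants_third | twins).astype(float)

# Assumed heterogeneous effects of the extra child on the first born's outcome.
effect = np.where(wants_third, -0.05, -0.20)
y = 0.5 + effect * third_child + rng.normal(0.0, 0.1, n)

# Wald (IV) estimator: reduced form divided by First Stage.
reduced_form = y[twins].mean() - y[~twins].mean()
first_stage = third_child[twins].mean() - third_child[~twins].mean()
late = reduced_form / first_stage

print(f"first stage: {first_stage:.3f}, IV estimate: {late:.3f}")
```

In this setup the IV estimate is close to the compliers' effect (−0.20), while the effect assumed for the "Always Takers" (−0.05) plays no role, since their family size does not respond to the instrument.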

Using Restricted Samples with Older Siblings à la Black, Devereux and Salvanes (2005)
The estimates in Table 4 could still suffer from some residual bias, since our instrument could be correlated with the error term in the main equation. In particular, twins could differ in some respects from singleton children, the probability of having twin births increases with the number of births, and the decision to have additional children after twins could be related to other unobservable factors. In order to avoid any possible correlation between the twin births instrument and the error term, following Black, Devereux and Salvanes (2005), we carefully define two specific samples and modify the definition of the instrument.
The first sample we use is defined for families with two or more births ("At Least Two Births") that at the second birth might or might not have had twins. We define the instrument Twins (Second Birth) equal to 1 if the second birth is a multiple birth and 0 if the second birth is a single child. In this way we exclude families with only one child and families with twins at the first birth.
Comparing families with two or more births, in which the exogenous variation in the number of children is given by whether or not a multiple birth occurred at the second birth, ensures that the variations in family size that we consider are exclusively determined by the arrival of twins or singletons.
Furthermore, in this sample we consider only first born children, that is, children born before the second birth. We exclude later born children because the decision to have additional children might be correlated with other factors. More importantly, we exclude twins, who tend to have different characteristics from other children: twins are often born prematurely, have lower birthweight, tend to suffer from some health problems, and so on (see, among others, Behrman and Rosenzweig 2004). These characteristics could directly affect their educational outcomes and undermine the exogeneity of the instrument.
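The sample selection rules above can be sketched in code. The record layout (`Child`, `delivery`, `is_twin`, `lyceum`) and the function name `restricted_sample` are hypothetical and introduced only for illustration; the rule that drops families with twins at an earlier delivery is stated in the text for the first birth and is assumed here to generalize to higher parities.

```python
from dataclasses import dataclass


@dataclass
class Child:
    family_id: int
    delivery: int    # order of the delivery; twins share the same value
    is_twin: bool
    lyceum: int      # 1 if the child attended a Lyceum


def restricted_sample(children, parity=2):
    """Rows (child, instrument, family_size) for children born before delivery `parity`."""
    families = {}
    for child in children:
        families.setdefault(child.family_id, []).append(child)

    rows = []
    for kids in families.values():
        # Keep only families with at least `parity` deliveries.
        if max(k.delivery for k in kids) < parity:
            continue
        # Assumption: drop families with twins at an earlier delivery.
        if any(k.is_twin and k.delivery < parity for k in kids):
            continue
        # Instrument: 1 if the `parity`-th delivery is a multiple birth.
        instrument = int(any(k.is_twin and k.delivery == parity for k in kids))
        for k in kids:
            if k.delivery < parity:  # earlier-born singletons only
                rows.append((k, instrument, len(kids)))
    return rows


example = [
    Child(1, 1, False, 1), Child(1, 2, True, 0), Child(1, 2, True, 0),  # twins at second birth
    Child(2, 1, False, 0), Child(2, 2, False, 1),                       # two singletons
    Child(3, 1, False, 1),                                              # only child: excluded
    Child(4, 1, True, 0), Child(4, 1, True, 0), Child(4, 2, False, 0),  # twins at first birth: excluded
]
sample = restricted_sample(example, parity=2)
```

Calling the same function with `parity=3` would give the analogue for families with at least three births, keeping first and second born singletons.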
Using these criteria our sample reduces to 4419 observations. We report in Table 5 the estimates of the First and Second Stage of various specifications. Considering the First Stage in Panel B, we find that the arrival of twins at the second birth increases the number of children in the family by about 0.60. This means that for most families the effective number of children has been increased by twins, while other families (approximately 40 percent) planned to have, and did have, more children even in the absence of twins. The F-statistics in the First Stage regressions take values in the range 145-187, well above the threshold of 10 conventionally used to rule out a "weak instrument" problem.
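One way to read the 40 percent figure is the following back-of-the-envelope calculation. It rests on simplifying assumptions that are not in the paper: families want at most three children, and families with twins at the second birth stop at three. Let $N$ be the number of children, $Z$ the instrument Twins (Second Birth), and $p$ the (assumed) share of families that would have a third child even without twins.

```latex
\[
E[N \mid Z = 1] - E[N \mid Z = 0] = 3 - (2 + p) = 1 - p \approx 0.60
\quad\Longrightarrow\quad p \approx 0.40 .
\]
```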
Considering the Second Stage in Panel A of Table 5, in column (1) we control for gender, age, parents' years of education and geographical area dummies. We show that an additional child in the family reduces the probability of attending a Lyceum by about 18 p.p. (t-stat = −2.34). In the second and third specifications we include as additional controls mother's and father's age (at the children's birth), time dummies and city size dummies. We find again that the probability of attending a Lyceum decreases by about 19 p.p. in larger families. In the fourth specification we include Household Wealth and in the fifth we also control for Immigrant. The effect of interest does not change much (−18 p.p.) when we include these additional control variables.
As regards the impact of control variables, we find that females are about 13 p.p. more likely to attend a Lyceum; this probability strongly increases, by about 3 p.p., for each additional year of father's and mother's education and slightly increases (+0.7 p.p.) when mothers are older; richer families tend to invest more in education; immigrants have lower educational attainment, by about 10 p.p.
The second sample we build along the lines of the analysis of Black, Devereux and Salvanes (2005) is analogous to the first one but deals with families with at least three births that at the third birth may, or may not, have multiple births. Twins (Third Birth) is set to one if a multiple birth occurs at the third birth (and 0 otherwise). For this sample, for the reasons explained above, the regressions are run only for the sample of first and second born children who are not twins, giving a final sample of 2362 observations.
We report the estimates in Table 6. In the First Stage (Panel B) we show that the birth of twins increases the number of children by about 1, much more than in the previous Table, implying that families rarely wish to further increase the number of children after the birth of twins at the third delivery. We estimate the same specifications as in Table 5 and find very similar results: the probability of attending a Lyceum for first and second born children decreases by about 20 p.p. when the number of children in the family increases by one. As regards the other variables, the effects are very similar to the ones we have found in Table 5.22
On the whole, in all the samples we use we find clear negative effects of family size on children's educational attainment. The estimated impact of an additional child on the probability of attending a Lyceum is in the range 18-20 p.p., although when we use the whole sample with twins the estimated impact is about half of that. Considering that the effect of Lyceum on College Degree is estimated at around 40 % (see Table A2 in the Appendix), one can infer that an additional child reduces the probability of attaining a College Degree by about 7-8 p.p. One should also consider that, as shown by Bhalotra and Clarke (2020), the IV estimates are likely biased towards zero, and hence the impact of family size on education is underestimated.
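The inferred effect on College Degree follows from a simple product, sketched here under two assumptions: the 40 % effect of Lyceum on College Degree is read as a 0.40 change in probability, and family size affects College Degree only through Lyceum attendance.

```latex
\[
\Delta P(\text{College Degree}) \approx \Delta P(\text{Lyceum}) \times 0.40,
\qquad
0.18 \times 0.40 = 0.072, \quad 0.20 \times 0.40 = 0.080 ,
\]
```

that is, a reduction of roughly 7-8 p.p.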
Our results for Italy are in contrast with the results of Black, Devereux and Salvanes (2005) and Angrist, Lavy and Schlosser (2010), who find a null effect in developed countries such as Norway and Israel, but are in line with the results found in developing countries (China, Brazil), or for some specific subsamples of the population in Sweden, US and UK.
In general, the trade-off between the number of children and investments in human capital can arise from the dilution of economic resources or from insufficient parental time devoted to each child in larger families. The factors that might contribute to the trade-off in Italy are likely related to the functioning of the educational system and the poor public support for childbearing and childcare.
Although the education system in Italy is mainly state-funded until upper secondary school, its quality is not high, especially in Southern regions and rural areas, as shown in a number of international studies of students' achievement (PISA: Programme for International Student Assessment; TIMSS: Trends in International Mathematics and Science Study). For example, in PISA 2018 the performance of Italy in Reading (476) was below the OECD average (486), but Southern and Island regions performed far below the average (453 and 439), obtaining results comparable to countries such as Serbia (439), United Arab Emirates (432) and Romania (428). In Mathematics, Southern and Island regions obtained 458 and 445, in line with Turkey (454), Ukraine (453) and Cyprus (451). Similar results were found in other waves of the PISA study and in other assessments such as TIMSS or Invalsi (on these aspects see, among others, Bratti, Checchi, and Filippin 2007).
The poor quality of the educational system in some Italian regions requires additional investments (in terms of financial resources and time) from parents to foster students' preparation and allow them to access higher educational levels. In larger families the need for these investments probably constitutes an insurmountable obstacle.23
The second factor affecting the trade-off between quantity and quality of children is probably the insufficient public support to families for childbearing and childcare, implying that the related costs are mainly borne by parents. Public expenditure on social protection devoted to family and children in Italy has been rather small, only 0.9 % of GDP on average between 1980 and 2000, while the OECD average in the same period was 1.56 %, Norway 2.6 %, United Kingdom 2.1 %, Israel 2.4 % (OECD Social Expenditure Database 2019).
In addition, childcare availability for children under age 3 is on average very low in Italy and almost non-existent in some areas. According to ISTAT (2012), reporting data from 2003-04 to 2010-11, the percentage of children aged 0-3 enrolled in public child care in Italy was on average 9-10 %. However, around 16 % were enrolled in Central and Northern regions while only 3 % in the South of Italy. The supply of private child care services is similarly low (only 16 % of children in Italy were enrolled in public or private services in 2010) and also implies a problem of high costs for families. On the other hand, the availability of slots for children aged 3-6 is quite high on average (90 % of children aged 3-6 are enrolled): 67 % are supplied by public structures while 33 % are provided by private structures. In the latter case, childcare services tend to be quite expensive for families.24
In general, the poor support for families with children and the limited availability of child care services put pressure on parents, both in terms of economic resources and in terms of time devoted to children, and this tends to lower investments in children's human capital.25 When in Section 6 we analyze whether the impact of family size is heterogeneous according to family characteristics, we will shed some further light on these mechanisms.26
23 Furthermore, while University is in large part publicly financed, room and board expenses can be particularly high at good Universities located in Central and Northern regions.

The Impact of Birth Order
An important determinant of investment in education, partially overlapping with family size, is birth order. To evaluate the effects of birth order we have conducted a regression analysis using our main sample. We refer to Black, Devereux and Salvanes (2005) for a similar analysis.
To avoid confounding birth order with family size effects, we estimate the impact of birth order holding constant the number of children in a family or, alternatively, including family fixed effects. Therefore, we run three separate regressions, according to the number of children in each family, and we use controls as in our preferred specification (col. 3 of Table 4).
Estimates are reported in Table 7. In the first regression we consider only families with two children (5547 obs.). We find that the coefficient on birth order is negative and strongly statistically significant: the second child has a 5.7 percentage points lower probability of attending a Lyceum (t-stat = −4.4) with respect to the first born. In the second regression we find that in families with three children the probability decreases by 2.6 p.p. at each higher birth order. Finally, in families with four children, the probability of attending a Lyceum decreases by 4.5 p.p. for later born children.
Starting from column (4), we also run three separate regressions using dummy variables for each birth order. In these regressions the reference category is the first born child. Column (4) replicates the result of column (1). In column (5) we find that the second and the third children have a 3.9-4.6 p.p. lower probability of attending a Lyceum with respect to the first born, while in column (6) we find that second, third and fourth children are about 9-13 p.p. less likely to attend a Lyceum. Alternatively, in columns (7) and (8) of Table 7 we estimate the model including family fixed effects, using the whole sample of families (with 2-4 children), and we find similar negative effects of birth order on investments in schooling.
The impact of birth order is negative and strongly significant. In particular, our estimates point to a preference for the first born child with respect to the following children. In general, the evidence on birth order complements our findings about the impact of family size: later born children appear to bear a greater share of the cost of a larger family.
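The family fixed effects approach can be illustrated with a minimal within-family (demeaning) sketch on simulated data. All values are hypothetical: the assumed birth order effect of −0.05, the family effect that worsens with family size, and the continuous outcome used in place of the Lyceum dummy. Pooled OLS mixes birth order with family size, while the within-family estimator compares siblings of the same family.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fam = 5_000

# Hypothetical families with 2-4 children; birth order runs from 1 to family size.
sizes = rng.integers(2, 5, n_fam)
fam = np.repeat(np.arange(n_fam), sizes)
order = np.concatenate([np.arange(1, s + 1) for s in sizes]).astype(float)

# Assumed: the family-level effect is lower in larger families (confounding).
fam_effect = rng.normal(0.0, 0.2, n_fam) - 0.05 * sizes
y = 0.5 + fam_effect[fam] - 0.05 * (order - 1) + rng.normal(0.0, 0.1, fam.size)


def demean_by_group(x, groups, n_groups):
    """Subtract the group mean from each observation (within transformation)."""
    sums = np.bincount(groups, weights=x, minlength=n_groups)
    counts = np.bincount(groups, minlength=n_groups)
    return x - (sums / counts)[groups]


# Pooled OLS slope (no fixed effects).
beta_pooled = np.cov(order, y)[0, 1] / np.var(order, ddof=1)

# Within-family slope (family fixed effects).
y_w = demean_by_group(y, fam, n_fam)
order_w = demean_by_group(order, fam, n_fam)
beta_fe = (order_w @ y_w) / (order_w @ order_w)

print(f"pooled: {beta_pooled:.3f}, family fixed effects: {beta_fe:.3f}")
```

In this simulated setup the fixed effects estimate is close to the assumed −0.05, while the pooled estimate is more negative because higher birth orders occur only in larger families.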

Heterogeneous Effects of Family Size
In this Section we investigate if the effects of family size on educational achievement are different according to a number of children's or families' characteristics.
It should be noted that the number of families with twins is in general not high, and when we split the sample the number of "treated" units is further reduced in each subsample. For this reason, instead of the restricted sample used in Section 5.3 we use the whole sample (as in Table 4), but we have checked that the results are qualitatively similar using the restricted sample as in Table 5. However, due to the low number of treated families, the following analysis should be considered only as suggestive evidence. We estimate specification (3) of Table 4 on a number of subsamples and report results in Tables 8 and 9.
First of all, we consider the different impact on males and females in columns (1) and (2) of Table 8, respectively. We find that the effect of family size is stronger for females (−14 p.p.) and weaker for males (−5.2 p.p., p-value = 0.15). Cultural factors might play a role in this case: parents seem to protect their investments in sons from the negative effects of family size, but to care less about those in daughters, suggesting another channel of gender inequality.27 The effect is zero in large cities.28 While these differences might be related to cultural factors (with less emphasis placed in non-urban areas on education as a social and economic engine of development), they could also be related to the lower quality of education offered in peripheral schools.
Then, we divide the sample according to family background. In column (1) of Table 9 we estimate our model for the sample of low educated parents (less than 8 years of education on average) and we find a weaker effect of family size (−5.1 p.p.). On the other hand, estimating in column (2) on the sample of parents with higher levels of education, we show that the effect is larger (−15.3 p.p.). However, in relative terms, considering that for individuals coming from low educated families the average probability of attending a Lyceum is 12 % while it is 48 % for parents with higher education, the impact appears more relevant for low educated families (−42 %) than for highly educated ones (−31 %). Similarly, when we split the sample according to household income, distinguishing between low income families (below the median income) and high income families (above the median) in columns (3) and (4) of Table 9,29 we find very similar coefficients in terms of magnitude (−8.3 vs. −8), but in relative terms, considering that for low income families the average probability of attending a Lyceum is 20 % while it is 38 % for high income families, the impact appears much more relevant for low income families (−41 %) than for richer families (−21 %). The uncovered heterogeneity according to parental education and income confirms that a channel through which family size affects children's educational attainment is the dilution of economic resources.
In columns (5) and (6) of Table 9 we also consider the spacing of the first born children with respect to the following children. Our estimates show that when spacing is below the median (5 years) the effect is strong (−17.1 p.p.), while the effect is much smaller (−4.2 p.p.), although still negative, when spacing is greater than 5 years. This finding, in line with the analysis of Grawe (2008) for the UK, suggests that the difficulties of investing in children's human capital in a larger family derive not only from economic resources but also from parental time constraints.
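The relative effects quoted above are the estimated coefficients divided by the baseline probability of attending a Lyceum in each group. A minimal sketch of that arithmetic, using only the figures reported in the text (the group labels and variable names are introduced here for illustration):

```python
# (coefficient in p.p., baseline probability of attending a Lyceum in %)
groups = {
    "low educated parents": (-5.1, 12),
    "highly educated parents": (-15.3, 48),
    "low income families": (-8.3, 20),
    "high income families": (-8.0, 38),
}

# Relative effect: coefficient as a percentage of the group's baseline probability.
relative = {name: 100 * coef / base for name, (coef, base) in groups.items()}

for name, value in relative.items():
    print(f"{name}: {value:.1f} %")
```

The resulting ratios are close to the percentages reported in the text, up to rounding.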

Concluding Remarks
Early investments in human capital have been shown to be crucial for children's outcomes. These could be negatively affected by family size, in that a higher number of siblings tends to dilute parental time and economic resources invested in each child. The related quantity-quality trade-off originally proposed by Becker has been investigated only for a few countries, mainly because of the difficulty in linking data on family of origin to children's educational achievement and because of the endogeneity of the family size variable.
In this paper we have tackled these problems using many waves of the Survey on Household Income and Wealth of the Bank of Italy, focusing on the educational achievement of children aged 19-23 who still live in their family of origin. Since individuals at this age typically have not yet completed their education, we have used the academic generalist secondary school track (Lyceum) as a measure of educational achievement, showing preliminarily with several data sources that Lyceum is a very strong predictor of education and College Degree attainment.
Since family size cannot be considered exogenous but is related to observable and unobservable family characteristics, to overcome endogeneity problems we have used twin births as an instrumental variable affecting the number of children in a family and employed a Two-Stage Least Squares estimator.
Following the literature, we have also defined the sample and the instrument so as to avoid any correlation between the instrument and the error term in the main equation: we have used only samples with the possibility of multiple births, alternatively, at the second or third birth, excluding twins from the analysis and considering only children born before the considered parity (only first born and only first and second born children, respectively).
In contrast with the results from other developed countries, we find a strong negative effect of family size on children's educational attainment. An additional child in the family reduces, for children born earlier, the probability of attending a Lyceum by about 16-20 percentage points. Given the estimated relationship between Lyceum and College, the probability of attaining a College Degree is presumably reduced by about 7-8 percentage points when an additional child is born in a family. These findings are robust to a number of checks.
In a complementary analysis we have also found strong birth order effects.
We have also shown that the effects are stronger in Southern regions, for low income families and when spacing between births is limited. Our evidence showing that the negative impact of family size on children's education is more pronounced for low income families suggests that insufficient economic resources could be the mechanism driving the results. On the other hand, the larger impact found when spacing between births is limited suggests that the dilution of parental time devoted to each child in larger families could also play a role. We have argued, and provided suggestive evidence, that these results are likely related to the fact that Italy is a country with a poor system of assistance for families with children, in which childcare services are heavily undersupplied and the quality of public education is rather low, especially in Southern regions. All these features put pressure on economic resources and parental time when family size becomes larger and contribute to the trade-off between number of children and educational attainment.

Figure 1: The trend of twin births in Italy (1870-1998) from ISTAT historical time series.

Figure 2: Monte Carlo simulation: coefficients from the whole sample in blue; coefficients from the selected sample in red.
Notes: 11 waves of the Survey of Household Income and Wealth (SHIW) dataset.

Table 2: OLS estimates. The impact of the number of children on years of education.

Table 3: Balance checks between twins' and no-twins' families.
Columns: (1) Mean no twins; (2) Mean twins; (3) Diff. (2)-(1); (4) t-test.
Notes: Columns (1) and (2) report average values for individuals in, respectively, no-twins' and twins' families. Column (3) reports the differences between averages and column (4) reports the t-test for the significance of this difference.

Table 4: Two-stage least squares estimates. Educational achievements and family size. The dependent variable is Lyceum. Standard errors, corrected for heteroskedasticity and allowed for clustering at household level, are reported in parentheses. The symbols ***, **, * indicate that coefficients are statistically significant, respectively, at the 1, 5, and 10 percent level.

Table 5: Two-stage least squares estimates. Number of children and educational achievement. Sample: first born children in families with at least two births.
Notes: Sample of first born children in families with two or more children; twins are excluded. The dependent variable is Lyceum. Twins (Second Birth) is set to 1 only if multiple births occurred at the second birth (and 0 otherwise). Standard errors, corrected for heteroskedasticity and allowed for clustering at household level, are reported in parentheses. The symbols ***, **, * indicate that coefficients are statistically significant, respectively, at the 1, 5, and 10 percent level.

Table 6: Two-stage least squares estimates. Number of children and educational achievement. Sample: first two children in families with at least three births.
Notes: Sample of first and second born children in families with three or more children; twins are excluded. The dependent variable is Lyceum. Twins (Third Birth) is set to 1 only if multiple births occurred at the third birth (and 0 otherwise). Controls: as in Table 5. Standard errors, corrected for heteroskedasticity and allowed for clustering at family level, are reported in parentheses. The symbols ***, **, * indicate that coefficients are statistically significant, respectively, at the 1, 5, and 10 percent level.

Table 7: OLS estimates. Investment in human capital and birth order.

Table A2: The probability of attaining a college degree in relationship to Lyceum. Linear probability model. SHIW Dataset. OLS estimates. Sample: all individuals aged 26 or more. The dependent variable is College Degree. Controls in columns (3) and (4): 10 dummies for cohorts; five dummies for geographical areas; five dummies for city size. Standard errors, corrected for heteroskedasticity, are reported in parentheses. The symbols ***, **, * indicate that coefficients are statistically significant, respectively, at the 1, 5, and 10 percent level.

Table A3: The probability to leave parental home with respect to age and Lyceum. Linear probability model. The dependent variable is Own Family. Standard errors, corrected for heteroskedasticity, are reported in parentheses. The symbols ***, **, * indicate that coefficients are statistically significant, respectively, at the 1, 5, and 10 percent level.
Notes: The dependent variable is College. Sample: 19-25 years old. Standard errors, corrected for heteroskedasticity and allowed for clustering at family level, are reported in parentheses. The symbols ***, **, * indicate that coefficients are statistically significant, respectively, at the 1, 5, and 10 percent level.
Notes: SHIW Dataset. Sample: all individuals aged 16 or more. The dependent variable is Own Family.