PISA Performance of Natives and Immigrants: Selection versus Efficiency

Abstract
In most countries, immigrant and native students perform differently in the Programme for International Student Assessment (PISA) for two main reasons: different immigration regimes and differences in their home-country educational systems. While there is a sophisticated literature on the reasons for these performance gaps, it is barely considered in research on educational efficiency. Our approach distinguishes between selection effects caused by immigration policies and the efficiency of educational systems in integrating immigrant students, given their socio-economic background. Accordingly, we split our sample, which consists of 153,374 students in 20 countries, calculate several efficiency frontiers, and decompose and interpret the resulting efficiency values. We find large differences in educational system efficiency when controlling for negative selection effects caused by immigration regimes.


Introduction
The differences between natives and immigrants in the Programme for International Student Assessment (PISA), published by the Organisation for Economic Co-operation and Development (OECD), have gained considerable attention in the literature.¹ Apart from social, cultural, religious and historical reasons, different immigration policies and different levels of success in integrating immigrants are the two most important aspects (Kunz, 2016; Isphording et al., 2016). Countries attract different groups of immigrants with different socio-economic backgrounds, owing both to their attractiveness as destinations and to their immigration policies (Entorf & Minoiu, 2005; Hochschild & Cropper, 2010). In most countries, socio-economic endowment is one of the most important factors for the educational success of students (Parr & Bonitz, 2015; Rogiers et al., 2020). This is illustrated by the left panel of Figure 1, which shows a strong positive within-country correlation in 2015 between students' average reading, mathematics and science scores in PISA and their average ESCS values, the latter being an index of their socio-economic background.² The index of economic, social and cultural status (ESCS) comprises several subcategories in the areas of parental education, highest parental occupation and home possessions. It is considered an appropriate measure of students' socio-economic background (Hwang et al., 2018).

¹ PISA is a worldwide study that assesses 15-year-old students' performance in mathematics, science, and reading. In addition, data on the pupils' individual backgrounds and on their schools are collected.

*Corresponding Author: Andreas Behr: University of Duisburg-Essen, andreas.behr@uni-due.de, ORCID iD: 0000-0002-6818-1878; Gerald Fugger: University of Duisburg-Essen, ORCID iD: 0000-0001-8546-7600. We thank two anonymous reviewers whose comments have helped to improve and clarify this manuscript.
The right panel of Figure 1 shows the strong correlation across countries between gaps in socio-economic endowment (ESCS gaps) and educational performance gaps between natives and immigrants (Rogiers et al., 2020).³ Our descriptive analysis reveals substantial educational (PISA) and socio-economic (ESCS) gaps between immigrants and natives, and shows that performance comparisons to a large extent implicitly reflect the students' different social and economic backgrounds. Without accounting for the students' backgrounds, studies run the risk of making implicit statements about immigration policy. We take this problem into account explicitly by analysing the performance of the educational system given the different social backgrounds of immigrant and native students.

² Figure S1 in the appendix shows the positive correlation between average PISA scores and average ESCS scores at the country level. Its right panel shows that the average PISA scores are negatively correlated with the mean absolute deviations from the median ESCS scores.

³ Following the PISA definition of immigration, immigrant students are either foreign-born (first generation) or born in the country of assessment to foreign-born parents (second generation) (OECD, 2017).
An educational system can be integrating despite a large educational gap if it at least partially compensates for the gaps in socio-economic background. We use Data Envelopment Analysis (DEA) to examine the efficiency of educational systems. DEA models provide efficiency scores that relate a student's performance to that of the best-performing students with comparable ESCS endowments. Our analysis is conducted at the student level, the most disaggregated level available in PISA. Students are evaluated according to their ability to maximise PISA scores given their socio-economic endowment. To account for the differences in socio-economic endowment between immigrants and natives, we split the PISA 2015 data into subsamples of natives and immigrants. Calculating efficiency scores relative to various efficiency frontiers provides further insights into the relationship between selection effects in immigration and the ability of educational institutions to integrate immigrant students. Educational system performance is then obtained from the average efficiency scores of the students and decomposed further.
Our first efficiency analysis uses the average of the mathematics, science, and reading PISA scores as output and the ESCS values as input. These three PISA scores are highly positively correlated, and aggregating them into one output enables a straightforward interpretation and decomposition of the efficiency frontiers. In a further analysis we use the ESCS as input and include the three PISA scores (mathematics, science, and reading) as separate outputs. DEA models allow the inclusion of several outputs, with all inputs and outputs weighted and included simultaneously in the efficiency assessment. The results of this second efficiency assessment confirm our main finding for the average scores: in countries with restrictive immigration regimes, immigrants not only perform relatively well but also use their endowments rather efficiently. Some countries (e.g. Spain and France) perform considerably better in terms of efficiency, which takes their ESCS endowments into account, than their PISA ranking suggests.
After this introduction, section two provides a literature overview of the performance gaps between natives and immigrants. Section three outlines our methodology. Section four explains how the PISA scores and the ESCS are constructed, discusses the differences between immigration regimes, and provides some initial results. The results of the efficiency analyses and their decomposition are discussed in section five, which precedes the conclusion.

Literature Overview
Differences in country attractiveness for immigrants and in immigration policy regimes attract different groups of immigrants, resulting in heterogeneous immigrant populations across countries and a wide range of challenges for educational systems and societies in general (Entorf & Minoiu, 2005; Hochschild & Cropper, 2010). While some countries attract immigrants whose socio-economic endowments are equal to or even higher than those of the natives (Arabian oil-based economies, English-speaking countries and Singapore), others, such as Central European countries, mainly attract immigrants with a poorer socio-economic endowment than the natives (Jerrim, 2015). In Austria, Denmark, and Germany, for example, the differences between native and immigrant students are especially striking (Rindermann & Thompson, 2016).
Besides differences in education, human capital and wealth (all part of PISA's ESCS index), native and immigrant populations may also differ in cultural, religious, historical, and reputational aspects (Parr & Bonitz, 2015; Kunz, 2016). Immigrants may also face challenges regarding formal rights and legal status, and may lack accumulated experience and social connections, which can result in educational information asymmetries that influence the educational performance of their children (Rindermann & Thompson, 2016; Camehl et al., 2018). Schneeweis (2011) decomposes the educational gap between immigrants and natives using the data of five international student assessment studies. Her results show that institutional characteristics of the education systems can increase differences between immigrants and natives. The results of Borgna & Contini (2014) indicate that educational institutions and socio-economic backgrounds are the main causes of the gaps between immigrants and natives. Furthermore, PISA 2006 and 2009 data reveal that school attendance significantly reduces educational gaps. Dronkers et al. (2014) find that the countries' educational systems and the students' individual characteristics cause the differences between immigrants and natives. Harris et al. (2019) show that access to certain areas of the curriculum depends at least in part on the socio-economic endowment of the students in the schools. Woessmann (2016) finds that educational institutions and family background have the highest explanatory power in determining educational achievement. Interestingly, the impact of school resources is much smaller than that of the students' socio-economic endowment and of institutional characteristics, a finding also reported by Falck et al. (2018).
Further empirical studies based on PISA data reveal that the different socio-economic backgrounds of immigrants and natives have the greatest overall explanatory power for differences in educational attainment. Especially in European countries, nearly three-quarters of the performance gaps between natives and immigrants are accounted for by differences in economic, social, and cultural status (Ammermueller, 2007; Levels et al., 2008; Arikan et al., 2017). Other factors, such as linguistic barriers (previously considered the most important barrier for immigrants), explain the performance gaps only partially (Isphording et al., 2016; Rindermann & Thompson, 2016).
Another important aspect in explaining performance gaps is the selection process among immigrants. Individual background factors vary between immigrant groups, and the composition of these groups itself varies between countries (Schnepf, 2007; Arikan et al., 2017). In countries where immigrants are highly educated, such as Australia, immigrants perform on average better in national and international comparisons than their native counterparts (Dustmann et al., 2012; Jerrim, 2015). The opposite holds for Central European countries, in which a considerable share of immigrants have, on average, a lower economic, social, and cultural status than the population of their target countries and perform worse in PISA (Dustmann et al., 2012; Rindermann & Thompson, 2016; Arikan et al., 2017).
Accordingly, heterogeneous immigrant populations pose specific challenges for educational systems that should be considered in efficiency analysis. Given the importance of socio-economic background, the ESCS is an input (among others) in most educational efficiency analyses; however, international efficiency studies fall short in how they account for the differences between immigrants and natives within and between countries.
Efficiency scores are based on the ratio of the weighted sum of a student's outputs to the weighted sum of her inputs, relative to the best students. As socio-economic status is an environmental or nondiscretionary input, it is not amenable to direct control by the educational system and therefore cannot be regarded as a traditional input in efficiency analysis. But since it has been found to have a significant impact on performance in PISA, socio-economic status is included in most efficiency analyses (Agasisti & Zoido, 2018). For example, Sutherland et al. (2009) argue that student achievements depend on the students' social environment (family and peer groups), which therefore must be included in student efficiency analysis. Similarly, Cordero-Ferrera et al. (2017) argue that students' socio-economic background is crucial for evaluating students according to their ability to make the most of their inputs. Aparicio et al. (2017a) refer to students as "raw material" that is transformed in schools, and whose impact is best reflected by the students' socio-economic status.
In the cross-country analyses of Sutherland et al. (2009), Aparicio et al. (2017a), and Agasisti & Zoido (2018), students are not distinguished according to their country of origin. Moreover, these studies do not account for selection effects caused by immigration policies, which can lead to distinct immigrant groups with different socio-economic backgrounds. Aparicio et al. (2017b) proxy the socio-economic backgrounds of students by their parents' education, which is only one aspect of the broader ESCS. As the determinants of performance gaps are manifold, a more comprehensive index is preferable. De Witte & Lopez-Torres (2017) provide a broad overview of recent educational efficiency studies.
A considerable number of studies have been published both in the efficiency literature and in the literature on the determinants of performance gaps between immigrants and natives. However, no international educational efficiency study so far accounts for the different challenges arising from different immigration policy regimes.

Methodology
In this section we explain our notation and methodological approach in detail using a small artificial data set. As our decomposition approach considers several countries and two subsets of students (with and without immigration background), we introduce index sets denoted by calligraphic characters to make it easier to refer to specific groups of students.

Sets of students
The set of all countries is denoted $\mathcal{K}$, and individual countries are referred to with index k ($k = 1, \dots, K$). In each country k we have two sets of students. The set of students in country k having an immigration background is denoted $\mathcal{I}_k$. Immigrant students in country k are referred to using the index i ($i = 1, \dots, I_k$). Native students (home) in country k form the index set $\mathcal{H}_k$ and are indexed with h ($h = 1, \dots, H_k$). All students in country k, that is, students with and without immigration background, are referred to with $\mathcal{E}_k = \mathcal{I}_k \cup \mathcal{H}_k$.
Calligraphic characters without an index refer to the set combining the subsets from all K countries, i.e. $\mathcal{E} = \mathcal{E}_1 \cup \dots \cup \mathcal{E}_k \cup \dots \cup \mathcal{E}_K$ is the set of all students from all K countries, and $\mathcal{I} = \mathcal{I}_1 \cup \dots \cup \mathcal{I}_k \cup \dots \cup \mathcal{I}_K$ is the set of all students with immigration background from all K countries. We also have $\mathcal{E} = \mathcal{I} \cup \mathcal{H}$ with $\mathcal{H} = \mathcal{H}_1 \cup \dots \cup \mathcal{H}_k \cup \dots \cup \mathcal{H}_K$.
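The index-set notation maps directly onto ordinary set operations. The following minimal Python sketch builds $\mathcal{I}_k$, $\mathcal{H}_k$ and $\mathcal{E}_k$ from a handful of hypothetical student records (the IDs and the country labels "k" and "k2" are invented for illustration) and checks that the pooled sets satisfy $\mathcal{E} = \mathcal{I} \cup \mathcal{H}$ without overlap.

```python
from collections import defaultdict

# Hypothetical student records: (student_id, country, has_immigration_background)
students = [
    (1, "k", False), (2, "k", True), (3, "k", False),
    (4, "k2", True), (5, "k2", False), (6, "k2", True),
]

I_k = defaultdict(set)  # immigrant students per country
H_k = defaultdict(set)  # native students per country
for sid, country, immigrant in students:
    (I_k if immigrant else H_k)[country].add(sid)

countries = sorted(set(I_k) | set(H_k))        # the countries in K
E_k = {k: I_k[k] | H_k[k] for k in countries}  # E_k = I_k ∪ H_k per country

# Pooled sets over all countries: I, H and E
I_all = set().union(*I_k.values())
H_all = set().union(*H_k.values())
E_all = set().union(*E_k.values())

# Every student is either an immigrant or a native, never both
assert E_all == I_all | H_all and not (I_all & H_all)
print(countries, sorted(E_k["k"]), sorted(I_all))
```

This prints `['k', 'k2'] [1, 2, 3] [2, 4, 6]`. Using set unions rather than nested collections mirrors the fact that the subgroups partition the students.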

Students and different frontiers of potential scores
In our illustrative example we consider only two countries, k and k′. First, we consider country k and its two subsets $\mathcal{I}_k$ (immigrant students) and $\mathcal{H}_k$ (native students).
For each student we observe the input x (ESCS score) and the output y (PISA score). In Figure 2, native students ($\mathcal{H}_k$) are represented by closed circles and students with immigration background ($\mathcal{I}_k$) by open circles. Some students with rather similar inputs reach quite different outputs. The observations of the 'best students', subsequently called efficient students, are connected by line segments, and the resulting frontier is used as a yardstick to benchmark the remaining students. How we identify the best students is explained in more detail below (see model 2). As we have three subgroups, natives ($\mathcal{H}_k$), immigrants ($\mathcal{I}_k$) and all students combined ($\mathcal{E}_k$), we obtain three different frontiers. We denote these frontiers generally by F, with a superscript indicating the subset of students on which the frontier is based; accordingly, Figure 2 shows the three frontiers $F^{\mathcal{I}_k}$, $F^{\mathcal{H}_k}$ and $F^{\mathcal{E}_k}$.
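For a single input and a single output, the height of such a frontier at a given input level can be computed with a small linear program: maximise a weighted average of the reference students' outputs, with non-negative weights summing to one, subject to the weighted average of their inputs not exceeding the given input level. The sketch below uses `scipy.optimize.linprog`; the data are hypothetical, and the function name `frontier_value` is ours, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog


def frontier_value(x0, x_ref, y_ref):
    """Height of the VRS frontier spanned by the reference students at input x0.

    Maximises sum_j lambda_j * y_j subject to sum_j lambda_j * x_j <= x0,
    sum_j lambda_j = 1 and lambda_j >= 0. Returns nan if x0 lies below the
    smallest reference input (no feasible benchmark exists).
    """
    x_ref = np.asarray(x_ref, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    n = x_ref.size
    res = linprog(
        -y_ref,                            # linprog minimises, so negate y
        A_ub=x_ref[None, :], b_ub=[x0],    # benchmark uses no more input than x0
        A_eq=np.ones((1, n)), b_eq=[1.0],  # convexity constraint (VRS)
        bounds=[(0, None)] * n,
        method="highs",
    )
    return -res.fun if res.status == 0 else float("nan")


# Hypothetical (ESCS, PISA) values for one country, in arbitrary units
x_H, y_H = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 3.5, 3.6]  # natives H_k
x_I, y_I = [1.5, 2.5, 3.5], [2.6, 3.3, 3.4]            # immigrants I_k
x_E, y_E = x_H + x_I, y_H + y_I                         # all students E_k

for name, xs, ys in [("F^H_k", x_H, y_H), ("F^I_k", x_I, y_I), ("F^E_k", x_E, y_E)]:
    print(name, round(frontier_value(2.5, xs, ys), 3))
```

At an input level of 2.5 this prints 3.25, 3.3 and 3.3: the pooled frontier $F^{\mathcal{E}_k}$ is never below either group frontier, because it is spanned by a superset of students.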

Benchmarking individual students
The performance of a specific student h (for illustration, the one marked with a square) can now be assessed using three different benchmarks. For readability, the right panel of Figure 2 shows an enlarged part of the left panel.
A benchmark student, denoted $\tilde{h}_1$, is a synthetic student on frontier $F^{\mathcal{H}_k}$. This benchmark student is a convex combination of two efficient native students located on the frontier $F^{\mathcal{H}_k}$ (dotted line). Comparing the score of student h with the score of $\tilde{h}_1$ on the frontier $F^{\mathcal{H}_k}$ yields a relative efficiency score of $D^{\mathcal{H}_k}(h) = 2.740/3.230 \approx 0.848$. We use D for the efficiency score, and the superscript indicates the set of students on which the frontier is based, here $F^{\mathcal{H}_k}$. Hence, student h obtained only about 85% of the score that is regarded as attainable given his input. Equivalently, he could increase his output by about 17.9% (3.230/2.740 ≈ 1.179) if he were as efficient as his benchmark fellow students.
If we compare our native student h with an efficient synthetic student with immigration background, $\tilde{h}_2$, located on the frontier $F^{\mathcal{I}_k}$ (solid line) obtained from the immigrant students $\mathcal{I}_k$, we obtain student h's score as $D^{\mathcal{I}_k}(h) = 2.740/3.370 \approx 0.813$; hence, in this comparison he falls about 19% short of his benchmark.
Finally, we can benchmark student h against the synthetic student $\tilde{h}_3$ located on the frontier $F^{\mathcal{E}_k}$ (dashed line), which is based on all students in country k. As this hypothetical benchmark student $\tilde{h}_3$ performs even better than $\tilde{h}_1$ and $\tilde{h}_2$, student h underperforms according to this yardstick with $D^{\mathcal{E}_k}(h) = 2.740/3.510 \approx 0.781$, i.e. by about 22%. Note that in this last comparison the benchmark student $\tilde{h}_3$ is a hypothetical student obtained as a convex combination of an efficient native and an efficient immigrant student.
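The efficiency ratios in this example can be checked with a few lines of Python. The sketch uses only the scores read off Figure 2 and distinguishes two ways of expressing underperformance: the shortfall 1 - D relative to the benchmark score, and the required increase 1/D - 1 relative to the student's own score.

```python
y_h = 2.740  # score of student h read off Figure 2

# Scores of the synthetic benchmark students on the three national frontiers
benchmarks = {"H_k": 3.230, "I_k": 3.370, "E_k": 3.510}

scores = {frontier: y_h / y_b for frontier, y_b in benchmarks.items()}
for frontier, D in scores.items():
    shortfall = 1.0 - D                          # gap relative to the benchmark score
    increase = benchmarks[frontier] / y_h - 1.0  # increase relative to own score
    print(f"D^{frontier}(h) = {D:.3f}, shortfall {shortfall:.1%}, "
          f"required increase {increase:.1%}")
```

This prints efficiency scores of 0.848, 0.813 and 0.781, with shortfalls of 15.2%, 18.7% and 21.9% and required increases of 17.9%, 23.0% and 28.1%. The required increase is always larger than the shortfall because it is measured relative to the lower, observed score.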

Benchmarking sets of students
To measure the performance of a complete set of students, we use the arithmetic mean of the individual scores. For example, to obtain the average performance of the immigrant students $\mathcal{I}_k$ using the frontier $F^{\mathcal{I}_k}$ obtained from this set of students, we calculate

$$M^{\mathcal{I}_k}(\mathcal{I}_k) = \frac{1}{I_k} \sum_{i \in \mathcal{I}_k} D^{\mathcal{I}_k}(i),$$

where $I_k$ is the number of students benchmarked, here the students with immigrant background in country k. We use M for the arithmetic mean; the superscript $\mathcal{I}_k$ indicates that we use the frontier $F^{\mathcal{I}_k}$, and the argument in parentheses indicates which group of students is benchmarked. In the illustrative example in Figure 2 we obtain $M^{\mathcal{I}_k}(\mathcal{I}_k) = 0.827$ for immigrants and $M^{\mathcal{H}_k}(\mathcal{H}_k) = 0.839$ for natives. To compare the performance of immigrants and natives, one may use the frontier $F^{\mathcal{E}_k}$ obtained from all students $\mathcal{E}_k$ in country k. In this example we obtain average efficiencies of $M^{\mathcal{E}_k}(\mathcal{I}_k) = 0.788$ for immigrants and $M^{\mathcal{E}_k}(\mathcal{H}_k) = 0.796$ for natives.
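The group averages M can be sketched by benchmarking every student of a group against a chosen frontier and taking the arithmetic mean of the resulting D scores. The data below are hypothetical and do not reproduce the values of Figure 2; the helper names `frontier_value` and `mean_efficiency` are ours.

```python
import numpy as np
from scipy.optimize import linprog


def frontier_value(x0, x_ref, y_ref):
    """Height of the VRS frontier spanned by (x_ref, y_ref) at input level x0."""
    n = len(x_ref)
    res = linprog(-np.asarray(y_ref, dtype=float),        # maximise sum(lambda * y)
                  A_ub=[list(x_ref)], b_ub=[x0],          # sum(lambda * x) <= x0
                  A_eq=np.ones((1, n)), b_eq=[1.0],       # weights sum to one (VRS)
                  bounds=[(0, None)] * n, method="highs")
    return -res.fun if res.status == 0 else float("nan")


def mean_efficiency(group, frontier_group):
    """M^F(group): arithmetic mean of D = y / frontier(x) over the students in group."""
    x_ref, y_ref = zip(*frontier_group)
    return float(np.mean([y / frontier_value(x, x_ref, y_ref) for x, y in group]))


# Hypothetical (ESCS, PISA) pairs for one country k
H_k = [(1.0, 2.0), (2.0, 3.0), (2.5, 2.9), (3.0, 3.5), (4.0, 3.6)]
I_k = [(1.5, 2.6), (2.5, 3.3), (3.0, 3.0), (3.5, 3.4)]
E_k = H_k + I_k

for label, group, ref in [("M^I_k(I_k)", I_k, I_k), ("M^H_k(H_k)", H_k, H_k),
                          ("M^E_k(I_k)", I_k, E_k), ("M^E_k(H_k)", H_k, E_k)]:
    print(label, round(mean_efficiency(group, ref), 3))
```

Because $\mathcal{E}_k$ contains both subgroups, its frontier is never lower than either group frontier, so averages computed against $F^{\mathcal{E}_k}$ cannot exceed the corresponding group-specific averages.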

Considering a second country
We now consider a second country k′. We use filled diamonds for native students and open diamonds for immigrant students. The left panel of Figure 3 shows the situation for country k′, again with the three national frontiers indicated by dotted, dashed and solid lines. The right panel combines the students of both countries and yields an international frontier $F^{\mathcal{E}}$ based on all students of all (here: two) countries.
This allows us to benchmark the immigrant students ($\mathcal{I}_k$) and the native students ($\mathcal{H}_k$) of country k against the international frontier. For example, our student h of country k is now benchmarked against the score of a synthetic student $\tilde{h}_4$ located on the international frontier $F^{\mathcal{E}}$. In this comparison his efficiency score $D^{\mathcal{E}}(h) = 2.740/4.010 \approx 0.683$ is the lowest of all comparisons and indicates a potential increase in his score of about 46%. Using the international frontier $F^{\mathcal{E}}$ to benchmark all native students in country k results in an average score $M^{\mathcal{E}}(\mathcal{H}_k) = 0.686$. The immigrants of country k obtain an average score $M^{\mathcal{E}}(\mathcal{I}_k) = 0.698$.
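Moving from the national frontier $F^{\mathcal{E}_k}$ to the international frontier $F^{\mathcal{E}}$ only adds students to the reference set, so average efficiencies can only stay equal or fall. The sketch below illustrates this with two hypothetical countries k and k′ (labelled `k2` in the code); the data and helper names are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def frontier_value(x0, x_ref, y_ref):
    """Height of the VRS frontier spanned by (x_ref, y_ref) at input level x0."""
    n = len(x_ref)
    res = linprog(-np.asarray(y_ref, dtype=float), A_ub=[list(x_ref)], b_ub=[x0],
                  A_eq=np.ones((1, n)), b_eq=[1.0],
                  bounds=[(0, None)] * n, method="highs")
    return -res.fun if res.status == 0 else float("nan")


def mean_efficiency(group, frontier_group):
    """Arithmetic mean of D = y / frontier(x) over the students in group."""
    x_ref, y_ref = zip(*frontier_group)
    return float(np.mean([y / frontier_value(x, x_ref, y_ref) for x, y in group]))


# Hypothetical (ESCS, PISA) pairs for two countries k and k'
H_k = [(1.0, 2.0), (2.0, 3.0), (2.5, 2.9), (3.0, 3.5), (4.0, 3.6)]
I_k = [(1.5, 2.6), (2.5, 3.3), (3.0, 3.0), (3.5, 3.4)]
H_k2 = [(1.2, 2.4), (2.2, 3.4), (3.2, 3.9)]
I_k2 = [(1.8, 2.9), (2.8, 3.6)]
E_k = H_k + I_k
E = E_k + H_k2 + I_k2  # international pool: all students of both countries

for label, group in [("natives of k", H_k), ("immigrants of k", I_k)]:
    national = mean_efficiency(group, E_k)     # M^{E_k}(group)
    international = mean_efficiency(group, E)  # M^{E}(group)
    print(f"{label}: national {national:.3f}, international {international:.3f}")
```

In this toy example the students of k′ dominate parts of country k's frontier, so both groups of country k lose efficiency when benchmarked internationally.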

The DEA model
We use the output-oriented BCC model introduced by Banker et al. (1984). The output orientation implies that students maximise their output given their inputs. For student o the model is defined as

$$
\begin{aligned}
\min_{u,\,v,\,u_0}\quad & \eta = \sum_{i=1}^{m} v_i x_{io} - u_0\\
\text{s.t.}\quad & \sum_{r=1}^{s} u_r y_{ro} = 1,\\
& \sum_{i=1}^{m} v_i x_{ij} - \sum_{r=1}^{s} u_r y_{rj} - u_0 \ge 0, \qquad j = 1, \dots, n,\\
& u_r \ge 0 \ (r = 1, \dots, s), \quad v_i \ge 0 \ (i = 1, \dots, m), \quad u_0 \text{ free in sign}.
\end{aligned}
$$

Output r of student o is given by $y_{ro}$ and is weighted by $u_r$ ($r = 1, \dots, s$), where s is the number of outputs. Her input i is given by $x_{io}$ and is weighted by $v_i$ ($i = 1, \dots, m$), where m is the number of inputs, and n is the number of students under analysis, i.e. the students spanning the frontier against which student o is evaluated. The weights are restricted to be non-negative; they are not chosen a priori but derived from the data when solving the linear program, and they most likely vary between students. Given the restrictions, the composition of weights most favourable to student o, i.e. the one that makes her as efficient as possible, is chosen. The linear program is set up and solved for each student under analysis individually (Behr, 2015; Cooper et al., 2007). $\eta^*$ denotes the solution to the minimisation problem. For convenience, we define $D^* = 1/\eta^*$. If $\eta^* = D^* = 1$, student o is efficient. The ranges of $\eta^*$ and $D^*$ depend on whether student o belongs to the group of students she is compared to. If she does, $\eta^* \ge 1$ and $D^* \le 1$. If student o does not belong to the group of students she is compared to, $\eta^*$ may be less than one (the student is super-efficient). In this case student o lies above the efficiency frontier of the students she is compared to, and $D^*$ is greater than one (Chen, 2005).
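The scalar $u_0$ is free in sign and implements the assumption of variable returns to scale (VRS). VRS allow output to change non-proportionally when the inputs change. Unlike in the constant-returns-to-scale CCR model, the observed input and output tuples of students cannot be scaled up or down to construct benchmarks in the BCC model; benchmarks are restricted to convex combinations of observed students, so the frontier can exhibit increasing returns to scale in some regions and decreasing returns to scale in others.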
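The multiplier-form linear program (minimise $\sum_i v_i x_{io} - u_0$ subject to $\sum_r u_r y_{ro} = 1$ and $\sum_i v_i x_{ij} - \sum_r u_r y_{rj} - u_0 \ge 0$ for every reference student j, with non-negative weights and $u_0$ free) can be sketched with `scipy.optimize.linprog`. This is a minimal illustration under our own naming, not the authors' code; the `vrs` flag is an addition that fixes $u_0 = 0$ to show how the free scalar separates the BCC model from the constant-returns CCR model.

```python
import numpy as np
from scipy.optimize import linprog


def output_oriented_multiplier(x_o, y_o, X_ref, Y_ref, vrs=True):
    """Multiplier-form efficiency of student o against the reference students.

    x_o: inputs of o (length m); y_o: outputs of o (length s).
    X_ref: (n, m) inputs and Y_ref: (n, s) outputs of the reference students.
    vrs=True leaves u_0 free in sign (BCC, variable returns to scale);
    vrs=False fixes u_0 = 0 (CCR, constant returns to scale).
    Returns (eta, D) with D = 1 / eta, or (nan, nan) without a finite optimum.
    """
    x_o = np.atleast_1d(np.asarray(x_o, dtype=float))
    y_o = np.atleast_1d(np.asarray(y_o, dtype=float))
    X_ref = np.asarray(X_ref, dtype=float)
    Y_ref = np.asarray(Y_ref, dtype=float)
    s, m, n = y_o.size, x_o.size, X_ref.shape[0]

    # Decision vector z = [u_1..u_s, v_1..v_m, u_0]; objective eta = v.x_o - u_0
    c = np.concatenate([np.zeros(s), x_o, [-1.0]])
    # Normalisation u.y_o = 1
    A_eq = np.concatenate([y_o, np.zeros(m), [0.0]])[None, :]
    # v.x_j - u.y_j - u_0 >= 0 for each reference student j, rewritten as <= 0
    A_ub = np.hstack([Y_ref, -X_ref, np.ones((n, 1))])
    u0_bounds = (None, None) if vrs else (0.0, 0.0)
    bounds = [(0, None)] * (s + m) + [u0_bounds]

    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    if res.status != 0:
        return float("nan"), float("nan")
    return res.fun, 1.0 / res.fun


# Hypothetical reference group with one input (ESCS) and one output (PISA)
X = [[1.0], [2.0], [4.0]]
Y = [[1.0], [3.0], [4.0]]
print(output_oriented_multiplier([2.0], [2.0], X, Y))             # inefficient student
print(output_oriented_multiplier([3.0], [5.0], X, Y))             # outside the group
print(output_oriented_multiplier([4.0], [4.0], X, Y, vrs=False))  # CRS comparison
```

For these data the first call gives η* = 1.5 (D* ≈ 0.667); the second gives η* = 0.7, a super-efficient student with D* ≈ 1.43; and the student at (4, 4), efficient under VRS, obtains D* ≈ 0.667 once $u_0$ is fixed at zero. If no benchmark uses as little input as the evaluated student, the program has no finite optimum and the sketch returns nan.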

The PISA Study, Migration Regimes and Descriptive Results
We use students' socio-economic status as the input and the average PISA score of the students in reading, mathematics and science, as output in the first efficiency analysis. If necessary, the data are transformed to obtain positive values as DEA can only handle positive inputs and outputs. Outliers are excluded.
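The exact positivity transformation applied to the ESCS is not reproduced here. As an assumption for illustration only, the sketch below shifts hypothetical ESCS values by a constant c = 3 and shows that, in the output-oriented VRS setting, such a shift leaves the efficiency scores unchanged: because the benchmark weights sum to one, the constraint $\sum_j \lambda_j (x_j + c) \le x_o + c$ is equivalent to $\sum_j \lambda_j x_j \le x_o$.

```python
import numpy as np
from scipy.optimize import linprog


def vrs_efficiency(x, y):
    """Output-oriented VRS scores D = y_o / frontier(x_o) for one input and one output."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    scores = []
    for x_o, y_o in zip(x, y):
        res = linprog(-y, A_ub=x[None, :], b_ub=[x_o],    # sum(lambda * x) <= x_o
                      A_eq=np.ones((1, n)), b_eq=[1.0],  # sum(lambda) = 1
                      bounds=[(0, None)] * n, method="highs")
        scores.append(y_o / -res.fun)
    return np.array(scores)


escs_raw = np.array([-1.5, -0.5, 0.2, 1.1, 2.0])      # hypothetical ESCS (x'), partly negative
pisa = np.array([420.0, 480.0, 470.0, 540.0, 560.0])  # hypothetical average PISA scores
c = 3.0                                               # hypothetical shift making all inputs positive

D_shifted = vrs_efficiency(escs_raw + c, pisa)
D_raw = vrs_efficiency(escs_raw, pisa)                # computed only for comparison
print(np.round(D_shifted, 3))
```

Only the third student is inefficient here (D ≈ 0.928), and the scores are identical with and without the shift.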

The PISA study and the ESCS
PISA is a worldwide stratified two-stage sample study conducted by the OECD to measure 15-year-old students' performance in mathematics, science, and reading. It was conceived to offer insights into sources of performance variation within and between countries. It was first conducted in 2000 and has since been repeated every three years. The 2015 PISA assessment focused on science; its results were published in December 2016 (OECD, 2016). Student performance is reported as the corresponding mathematics, science, and reading scores.⁴ A minimum of 150 schools must be selected in each country to ensure quality standards. If a participating country has fewer than 150 schools, all schools are selected. Within each participating school, a predetermined number of 15-year-old students, usually 42, is randomly chosen with equal probability. In schools with fewer students, all students are selected. If the response rate is too low, the sample of schools is increased beyond 150 to ensure a minimum student sample size. A response rate of 85% is required for initially selected schools. If the initial school response rate falls between 65% and 85%, an acceptable school response rate can still be achieved by using replacement schools. Schools are stratified into groups according to selected variables (region, private or public school, funding, ...). A minimum student response rate of 50% within each school is required for a school to be regarded as participating (OECD, 2016).
Since their first publication, the results of the PISA study have influenced the design of the education systems of the participating countries. For example, Ho (2016) shows how insights from PISA were used in Hong Kong, Damiani (2016) does so for Italy, and Ababneh et al. (2016) for Jordan. Tobin et al. (2016) provide a worldwide overview of how large-scale educational assessments influence education policy, and most studies find significant effects of secondary education on the economic development of countries (Aduand and Denkyirah, 2017; Karatheodoros, 2017).
The index of economic, social and cultural status (ESCS) comprises three main categories: parental education, highest parental occupation, and home possessions. The latter combines five indices: family wealth, household possessions, cultural possessions, home educational resources, and information and communication technology resources. These indices are derived from the availability of 16 household items at home, including three country-specific household items. The ESCS's three main components are standardized to a mean of zero and a standard deviation of one over the full sample. Finally, a principal component analysis (PCA) of the three standardized components is conducted, and the ESCS is defined as the first principal component score (OECD, 2017).⁵ For first-generation immigrants, parental education and, in part, the highest parental occupation may result from the educational institutions of their country of origin rather than from integration outcomes or the educational system of their target country, whose educational efficiency we are interested in. However, both the home possession measures and the success of second-generation immigrants depend on the integration and education quality in their target country (Reparaz & Sotés-Elizalde, 2019). The ESCS covers a wide range of economic, social and cultural topics, enabling an approximation of possible determinants of educational performance gaps between immigrants and natives. Furthermore, owing to the use of PCA, the ESCS is well suited for capturing and comparing students' overall socio-economic status (Hwang et al., 2018).
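The standardize-then-PCA construction described above can be sketched in a few lines of NumPy. This is a simplified illustration on simulated data: the three components are hypothetical stand-ins for parental education, parental occupation and home possessions, and the OECD's scaling of the individual components and of the final index is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
status = rng.normal(size=n)  # hypothetical latent socio-economic status

# Hypothetical raw components (parental education, parental occupation, home possessions)
raw = np.column_stack([
    12.0 + 3.0 * status + rng.normal(scale=2.0, size=n),
    50.0 + 15.0 * status + rng.normal(scale=10.0, size=n),
    0.8 * status + rng.normal(scale=0.6, size=n),
])

# 1) Standardise each component to mean 0 and standard deviation 1
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# 2) PCA of the standardised components via the eigenvectors of their correlation matrix
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))  # ascending eigenvalues
loadings = eigvecs[:, -1]                      # first principal component
loadings = loadings * np.sign(loadings.sum())  # orient so that higher = better status

# 3) ESCS-like index = first principal component score
escs = Z @ loadings
share = eigvals[-1] / eigvals.sum()
print(np.round(loadings, 3), round(float(share), 2))
```

Because the components are positively correlated, all loadings of the first component share one sign, and the orientation step makes higher index values correspond to a better socio-economic status.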

Migration regimes
When examining the efficiency of educational systems in terms of immigrant performance, the respective immigration regimes of the countries must be taken into account. Bjerre et al. (2015) and Bonjour & Chauvin (2018) provide an overview of the many definitions and distinctions in the literature.
In addition to restrictive official immigration policies (the strict ones being mainly based on points systems), another important aspect is how many people enter the country through channels not governed by such selection criteria. For example, a comparison between Germany and Australia shows that the proportion of immigrants entering Australia for family and humanitarian reasons is far lower, and the proportion entering for economic reasons higher (Beine et al., 2016). Based on their selective immigration policies and low proportions of non-economic immigration, Australia, Canada, New Zealand, and the United Kingdom can be regarded as having rather restrictive immigration regimes. The United States of America also has a restrictive immigration policy, but unlike the other countries in this group, it does not succeed in attracting immigrants who perform on average at least as well as their native peers, as shown below (see also Camarota & Zeigler, 2016). The European Union introduced a points-based system in 2009, but it is far less strict than in the other countries with a selective immigration policy, and the share of immigrants admitted for family and humanitarian reasons is relatively high. Therefore, we do not regard the members of the European Union as restrictive (Bertoli et al., 2016).
We use the average occupational status of parents, which is available in PISA (higher values indicate a better status), to substantiate our country classification. The occupational status of parents is an important determinant of the educational attainment of immigrants, as the educational mobility of immigrants is generally lower than that of natives (Schneebaum et al., 2016; Reparaz & Sotés-Elizalde, 2019). Descriptive results show that in most countries, the occupational status of natives' parents is higher than that of immigrants' parents. Only in countries with a selective immigration regime are the gaps close to zero or even negative. Singapore attracts immigrants whose parents have the highest level of education.⁶ These results can be provided upon request.

The data and descriptive results
Our sample comprises 153,374 students in 20 industrialized countries in PISA 2015.⁷ We combine first- and second-generation immigrants, since otherwise several countries (e.g. Finland and the Netherlands) would have too few data points in at least one group, and since both groups show similar performance differences relative to the natives, which are determined to a similar extent by the ESCS (Rangvid, 2007).

⁶ In Singapore, the recruitment of skilled workers is systematically promoted and part of the official government strategy, as the following quote from Prime Minister Goh Chok Tong's speech at the National Day Rally 2001 shows: "[...] some Singaporeans may again question the need for more global talent. I urge you to understand that this is a matter of life and death for us in the long term. [...] If we do not top up our talent pool from the outside, in ten years time, many of the high-valued jobs we do now will immigrate to China and elsewhere, for lack of sufficient talent here." (Tong, 2001) Singapore is the most successful of all countries in attracting highly qualified and top-performing immigrants. In our analysis, immigrants in Singapore are on average the most efficient.

⁷ Japan, Korea, and Poland are excluded because they have too few immigrants.
As a frontier-based nonparametric technique, DEA is sensitive to outliers. We identify outliers by their influence, measured by Cook's distance. We define outliers as observations whose Cook's distance is at least eight times the average distance, computed separately for each country and each regression, which is a reasonable threshold according to Cook (1979).⁸ Table S1 shows the number (between 44 and 207) and the share (ranging from 0.759% to 1.186%) of excluded outliers per country.
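The outlier rule can be sketched as follows. The regression specification is not spelled out at this point, so as an assumption we use a simple OLS regression of a student's average PISA score on the ESCS within one country, with simulated data and one planted influential observation; `cooks_distance` is our own helper, written directly from the standard formula $D_i = \frac{e_i^2}{p\,s^2}\,\frac{h_{ii}}{(1-h_{ii})^2}$.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each observation of an OLS regression of y on X (X includes a constant)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)                                   # residual variance
    h = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)     # leverages h_ii
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2


rng = np.random.default_rng(1)
escs = rng.normal(size=200)                                 # hypothetical ESCS values
pisa = 480.0 + 40.0 * escs + rng.normal(scale=30.0, size=200)  # hypothetical PISA scores
escs[0], pisa[0] = 3.0, 150.0                               # plant one influential observation

X = np.column_stack([np.ones_like(escs), escs])
d = cooks_distance(X, pisa)
outliers = np.flatnonzero(d >= 8 * d.mean())                # rule: at least 8x the average distance
print(outliers)
```

Defining the threshold relative to the average distance, rather than as a fixed cut-off, adapts it to the sample size and noise level of each country-specific regression.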
PISA reading, mathematics, and science scores are constructed to have an international mean of 500 and a standard deviation of 100. The standardization provides student results that are directly comparable between countries. Table 1 summarizes within-country correlations between the scores. All scores are highly positively correlated, and the correlations vary between 0.743 for the mathematics and reading results in Italy, and 0.908 for the reading and science results in Singapore. Table S5 in the appendix depicts the correlation coefficients for each country.
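The score distributions in Figure 4 are kernel density estimates with a Gaussian kernel and a bandwidth of 70% of Silverman's (1986) rule of thumb. Silverman gives more than one rule of thumb; the sketch below assumes the widely used $0.9 \cdot \min(\hat\sigma, \mathrm{IQR}/1.34) \cdot n^{-1/5}$ version and uses simulated scores, so it illustrates the smoothing choice rather than reproducing the figure.

```python
import numpy as np


def silverman_bandwidth(x):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(x.std(ddof=1), (q75 - q25) / 1.34) * x.size ** (-0.2)


def gaussian_kernel_density(x, grid, bw):
    """Gaussian kernel density estimate of the sample x evaluated on grid."""
    x = np.asarray(x, dtype=float)
    u = (grid[:, None] - x[None, :]) / bw
    return np.exp(-0.5 * u**2).sum(axis=1) / (x.size * bw * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(2)
scores = rng.normal(500.0, 90.0, size=2000)  # hypothetical average PISA scores
bw = 0.7 * silverman_bandwidth(scores)       # 70% of the rule-of-thumb bandwidth
grid = np.linspace(100.0, 900.0, 801)        # evaluation points, step of 1 score point
density = gaussian_kernel_density(scores, grid, bw)
print(round(bw, 2), round(float(density.sum() * (grid[1] - grid[0])), 3))
```

Shrinking the bandwidth below the rule of thumb undersmooths slightly, which reveals more detail (for example, secondary modes) at the cost of a noisier curve; the estimated density still integrates to one.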
We use the students' average PISA score as output y, because a single output permits clear visual and contextual illustrations. After discussing the results for the average PISA score as output, we also present the results with the three PISA scores in mathematics, science, and reading as separate outputs. Figure 4 presents the distributions of the average PISA scores separately for immigrant and native students in each country, estimated with a Gaussian kernel whose bandwidth is 70% of Silverman's "rule of thumb" bandwidth, to reveal more detail (Silverman, 1986). In countries with selective immigration policies, as well as in Israel and Portugal, immigrants and natives perform similarly well. In Singapore, immigrants even perform better than natives. In the other countries, especially in most European countries, natives perform better. The cross-country variation in the differences between natives and immigrants further indicates that the prevailing immigration regime influences the selection of immigrants. However, Figure 4 focuses only on our output and does not distinguish between selection effects and the efficiency of educational systems. Figures S2 to S4 in the appendix provide the distributions of the three individual PISA scores. They are rather similar to the distributions of the average PISA scores, and the same distinctions between countries with and without restrictive regimes can be made.
where v represents the students' PISA and ESCS values and n their respective numbers.
The index of economic, social and cultural status (ESCS) of each student is used as input (x′). x′ is internationally comparable and has a mean of zero and a standard deviation of one. Radial DEA models can only handle strictly positive variables. Therefore, x′ is transformed into a strictly positive variable x, which is the input used in our efficiency analysis and is not transformed further. Table 2 provides descriptive results and correlation coefficients between the average PISA scores and the ESCS values for each country at the student level, separately for students with and without an immigration background. In most countries, natives perform better and have a better average socio-economic background. In Australia, Canada, and New Zealand (all countries with selective immigration systems), immigrants achieve higher average PISA scores and have higher ESCS endowments. On average, immigrants in Singapore have ESCS values above the PISA average, whereas the values of natives are lower (Becker, 2012; Facchini & Lodigiani, 2014). In comparison, both Canadian population groups have above-average ESCS values and the smallest gap. This points to the selectivity of the Canadian immigration system: the average immigrant in Canada has a socio-economic background similar to that of the average native. The United States has the largest ESCS gap between the two groups. Although the United States has a selective immigration system, it attracts immigrants with relatively poorer socio-economic backgrounds. However, the differences in performance are smaller in the United States than, for example, in Germany and Norway. Immigrants in Spain have, on average, the lowest ESCS values, and Portugal is the only country in which natives achieve higher PISA scores despite worse socio-economic backgrounds, although the gap is not significantly different from zero.
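The kernel density estimates behind Figure 4 can be approximated with a short NumPy sketch. It assumes that Silverman's rule of thumb means the variant h = 0.9 · min(s, IQR/1.34) · n^(−1/5) (the form used by R's default `bw.nrd0`); the text states only that 70% of the rule-of-thumb bandwidth is used. Function names and data are illustrative.

```python
import numpy as np


def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5) (assumed variant)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)


def gaussian_kde(x, grid, adjust=0.7):
    """Gaussian kernel density on `grid`, bandwidth = adjust * rule-of-thumb bandwidth."""
    x = np.asarray(x, dtype=float)
    h = adjust * silverman_bandwidth(x)
    z = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))


# Synthetic scores for one group in one country (not PISA data).
rng = np.random.default_rng(0)
scores = rng.normal(500.0, 100.0, size=1000)
grid = np.linspace(0.0, 1000.0, 2001)
density = gaussian_kde(scores, grid)
```

Shrinking the bandwidth below the rule-of-thumb value trades smoothness for detail, which is why the narrower kernel can reveal features such as multiple modes in the group-specific distributions.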
The country-specific differences in immigrant composition shown in Table 2 must be taken into account in an international efficiency analysis of educational systems. Tables S2 to S4 in the appendix provide descriptive results for the individual PISA scores. All scores are greater than zero, and students with missing values are excluded from our analyses. We use regressions, estimated separately for each country, to gauge the relationship between students' average educational performance and their socio-economic endowments. The regressions include a dummy for immigrant background and its interaction with ESCS. The results indicate that performance gaps between immigrants and natives are strongly determined by their respective ESCS endowments. The positive effect of ESCS is largest in France and smallest in Spain and Italy. The results indicate significantly better performance of immigrants in Australia, Canada, and Singapore and a positive but insignificant relationship in Israel and the United States of America. In all other countries, immigrants perform significantly worse than natives. All results are available upon request.
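A country-level regression with an immigrant dummy and an interaction term can be sketched with NumPy least squares. This is an illustration under stated assumptions: the interaction is taken to be immigrant × ESCS, and PISA's sampling weights, plausible values, and standard errors are ignored, so the sketch reproduces point estimates only, not the significance tests reported above. All names are hypothetical.

```python
import numpy as np


def fit_gap_regression(score, escs, immigrant):
    """OLS of score on [1, ESCS, immigrant dummy, ESCS x immigrant]; returns named coefficients."""
    escs = np.asarray(escs, dtype=float)
    imm = np.asarray(immigrant, dtype=float)
    X = np.column_stack([np.ones_like(escs), escs, imm, escs * imm])
    coef, *_ = np.linalg.lstsq(X, np.asarray(score, dtype=float), rcond=None)
    return dict(zip(["intercept", "escs", "immigrant", "escs_x_immigrant"], coef))


# Synthetic check: scores generated exactly from known coefficients (not PISA data).
escs = np.linspace(-2.0, 2.0, 40)
immigrant = np.arange(40) % 2
score = 480.0 + 40.0 * escs - 20.0 * immigrant + 5.0 * escs * immigrant
coefs = fit_gap_regression(score, escs, immigrant)
```

In this specification, the immigrant coefficient is the gap at ESCS = 0, and the interaction shows how that gap changes with the socio-economic index.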

Efficiency Results and Efficiency Decomposition
All results are obtained using R (version 3.6) and, unless otherwise stated, the average PISA score is used as output. The efficiency scores indicate how well students perform relative to each other, given their socio-economic backgrounds. First, the results are decomposed relative to national and then to international frontiers; next, the performance of natives and immigrants is compared; finally, the impact of selection processes and the efficiency of educational systems are evaluated. Table 3 provides country-specific arithmetic mean efficiency scores for all students and for the student groups relative to both national frontiers, as well as comparisons between the groups. The column numbers are given above the formal terms to simplify interpretation. The initial descriptive results showed that natives have higher average PISA scores (see Table 2 and Figure 4), but they disregard the students' socio-economic backgrounds, which are taken into account in the efficiency analysis. Columns (1) and (2) of Table 3 contain the results of natives and immigrants relative to their respective group frontiers for each country. Across all countries, both groups are on average almost equally efficient (0.699 in column (1) versus 0.708 in column (2)) when compared with the benchmark students of their own group. Columns (3) and (4) of Table 3 show the average efficiency scores relative to the national frontier based on both subsets of students. Among natives, the scores hardly change when immigrants are also included in the calculation of the efficient frontier. In contrast, the efficiency of immigrants decreases when natives are included, as the comparison of columns (2) and (4) reveals.
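The exact DEA specification is defined earlier in the paper, not in this section. The sketch below assumes an output-oriented radial model under variable returns to scale, solved as one linear program per student with SciPy's `linprog`, and reports θ = 1/φ so that 1 marks the frontier. Passing a different reference set mirrors the decomposition: a group is evaluated either against its own frontier (columns (1) and (2)) or against the pooled national frontier (columns (3) and (4)). The function name and data are illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def _as_matrix(a):
    """2-D float array with one row per student (1-D input becomes a column)."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def dea_output_vrs(x_eval, y_eval, x_ref=None, y_ref=None):
    """Output-oriented radial DEA under variable returns to scale (assumed model).

    For each evaluated student o:  max phi  s.t.
        sum_j lam_j * y_rj >= phi * y_ro   (every output r)
        sum_j lam_j * x_ij <= x_io         (every input i)
        sum_j lam_j = 1, lam_j >= 0        (j runs over reference students)
    Returns theta = 1 / phi; NaN if the LP is infeasible, which can happen when
    the student is not part of the reference set.
    """
    X_e, Y_e = _as_matrix(x_eval), _as_matrix(y_eval)
    X_r = X_e if x_ref is None else _as_matrix(x_ref)
    Y_r = Y_e if y_ref is None else _as_matrix(y_ref)
    n_ref = X_r.shape[0]
    c = np.zeros(n_ref + 1)
    c[0] = -1.0  # decision vector [phi, lam_1..lam_n]; minimise -phi
    A_eq = np.hstack([[0.0], np.ones(n_ref)])[None, :]  # convexity (VRS) constraint
    bounds = [(0, None)] * (n_ref + 1)
    theta = np.full(X_e.shape[0], np.nan)
    for o in range(X_e.shape[0]):
        A_out = np.hstack([Y_e[o][:, None], -Y_r.T])  # phi*y_ro - sum lam*y_rj <= 0
        A_in = np.hstack([np.zeros((X_r.shape[1], 1)), X_r.T])  # sum lam*x_ij <= x_io
        res = linprog(
            c,
            A_ub=np.vstack([A_out, A_in]),
            b_ub=np.concatenate([np.zeros(Y_r.shape[1]), X_e[o]]),
            A_eq=A_eq,
            b_eq=[1.0],
            bounds=bounds,
        )
        if res.status == 0 and res.x[0] > 0:
            theta[o] = 1.0 / res.x[0]
    return theta


# Toy data: one input (positive ESCS) and one output (score).
x = np.array([1.0, 2.0, 3.0, 2.0])
y = np.array([2.0, 4.0, 5.0, 2.0])
theta = dea_output_vrs(x, y)  # last student reaches half of the frontier output
```

With hypothetical arrays `x` (transformed ESCS), `y` (average PISA score), and a boolean mask `imm` for one country, `dea_output_vrs(x[imm], y[imm])` would give immigrants' scores relative to their own frontier, and `dea_output_vrs(x[imm], y[imm], x, y)` their scores relative to the pooled national frontier.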

National frontiers
Natives outperform immigrants on average by $(M_{E_k}(H_k) - M_{E_k}(I_k)) \cdot 100 = 5.741\%$ in Denmark, by 7.188% in Finland, and by 5.009% in Italy. Natives also perform better in most other countries, but the gaps are smaller, ranging from 0.100% in New Zealand to 4.774% in Sweden. In all these countries, immigrants achieve lower efficiency scores, taking their socio-economic endowment into account. The educational systems do not succeed in fostering both groups equally, which leads to inequalities in educational performance beyond the differences due to their endowments.
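The gap measure above is a difference of group means, scaled by 100. A minimal sketch, assuming that the efficiency scores of one country (relative to its national frontier) are stored together with a boolean immigrant indicator; positive values mean natives are more efficient. Names and numbers are illustrative.

```python
import numpy as np


def native_immigrant_gap(theta, immigrant):
    """Return (M(H), M(I), (M(H) - M(I)) * 100) for one country's efficiency scores."""
    theta = np.asarray(theta, dtype=float)
    immigrant = np.asarray(immigrant, dtype=bool)
    m_nat = theta[~immigrant].mean()  # mean efficiency of natives
    m_imm = theta[immigrant].mean()  # mean efficiency of immigrants
    return m_nat, m_imm, (m_nat - m_imm) * 100.0


m_h, m_i, gap = native_immigrant_gap([0.80, 0.70, 0.60, 0.65], [False, False, True, True])
```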
In Australia, Canada, Israel, Singapore, and the United States, immigrants are on average more efficient than their native peer group, given their ESCS endowments. The immigrant advantage is largest in the United States, at 2.066%. In Israel, both groups perform similarly, with immigrants slightly ahead (0.613%).
Column (5) of Table 3 provides the mean efficiency scores for all students relative to the national frontier of each country. Israel has the lowest (0.646) and Spain the highest (0.725) mean. Since the efficiency frontiers are country- and group-specific, these scores measure within-country inequality rather than providing a means of comparing efficiency between countries. Table 3 does not show which students form the efficiency frontiers or how efficient the national educational systems are. Figure S5 in the appendix displays the frontiers for each student group within the countries and the international frontier, calculated for all students. In several countries, the best-performing students are immigrants at low ESCS values and natives at higher ESCS values (e.g. in Austria, France, Germany, and the United States). In the remaining countries, only natives constitute the efficiency frontier, as is the case in Finland, Portugal, Singapore, Spain, and the United Kingdom. It is striking that students in Portugal, Singapore, and Spain have input-output combinations that are on average far closer to the international efficiency frontier than those of students in the other countries. Therefore, these countries are among the top performers in our analysis.

International comparisons
Including all students, Figure S5 shows that the international efficiency frontier at low ESCS values consists of three Spanish native students (one of whom has the lowest ESCS value in the sample), followed by one Portuguese and one Singaporean native student (the latter with the highest average PISA score).⁹ Table 4 provides further within- and between-country comparisons. $M_E(H_k)$ is the average efficiency score of the native students of country k, $M_E(I_k)$ is the average efficiency of its immigrant students, and $M_E(E_k)$ is the mean efficiency of all students from country k, each relative to the international frontier of all students.
Columns 1 and 2 show how well each group performs within each country and allow within-country comparisons relative to the international frontier consisting of all students. Compared to their native peer group, immigrants perform best in Australia (on average 1.977% better), followed by the United States (1.558%) and Singapore (1.415%). The countries where natives perform best compared to immigrants are Finland (on average 6.646% better), Sweden (5.476%), and Denmark (5.277%).
The results so far have been group-specific. Column 3, in contrast, provides a comparison of the efficiency of the national educational systems. The values result from an international frontier and do not differentiate between natives and immigrants within countries. The mean inefficiencies show how much the average PISA scores of a country could be increased if its educational system enabled students to perform similarly to the most efficient students internationally with comparable ESCS endowments.
Figure 5: Arithmetic mean efficiency differences between native and immigrant students in each country relative to the international frontier
Table 4: Decomposition, national students and international frontier, average PISA scores as output
Figure 5 shows the differences between the arithmetic mean efficiencies of students with and without immigration background relative to the countries' frontiers, providing an overview of the within-country differences. By including ESCS as input, our analysis takes the socio-economic endowment of the students into account. Selection effects that result in high or low ESCS scores should therefore not influence the efficiency scores, given the ESCS input levels.
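How an efficiency score translates into a potential score increase can be made explicit. Assuming an output-oriented radial model in which a student's efficiency score is θ_i = 1/φ_i, with φ_i ≥ 1 the factor by which the average PISA score y_i would have to be expanded to reach the frontier target ŷ_i (these symbols are introduced here for illustration), the inefficiency can be read in two equivalent ways:

$$
\hat{y}_i = \phi_i\, y_i, \qquad \theta_i = \frac{1}{\phi_i} = \frac{y_i}{\hat{y}_i},
$$

$$
1 - \theta_i = \frac{\hat{y}_i - y_i}{\hat{y}_i}, \qquad \frac{1}{\theta_i} - 1 = \frac{\hat{y}_i - y_i}{y_i}.
$$

For instance, a student with y_i = 490 and θ_i = 0.7 has a frontier target of 700; the shortfall is 30% of the target, or about 42.9% of the observed score.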

Differences between immigrants and natives
The efficiency gaps between the groups (native minus immigrant mean efficiency) are smallest in Canada (−0.008), Israel (−0.006), New Zealand (0.001), and Portugal (0.010). In the other countries, the differences are greater than 1% in absolute value. In all European countries, and especially in Sweden (0.048), Denmark (0.057), and Finland (0.072), immigrant students perform on average considerably worse than their native counterparts, given their ESCS backgrounds. Finland is often regarded as a country with a superior educational system and successful integration, but according to the efficiency scores, the Finnish educational system is highly inefficient in closing the gap between natives and immigrants. Recent literature confirms these performance deficits of immigrants in Finland, taking into account background factors such as gender, grades, socio-economic background, home language, and age of arrival in Finland (Kirjavainen, 2015; Yeasmin & Uusiautti, 2018). However, these results have not yet attracted much attention in the literature. Arikan et al. (2017), for example, claim that reducing the ESCS gap would close the performance gap between natives and immigrants in Finland, but our results indicate that the efficient use of the ESCS endowment matters more than the low ESCS levels themselves. We argue that native-immigrant comparisons based solely on PISA results mainly reflect selection effects due to different immigration policies rather than the performance of the educational systems. Given the social structure of immigrants (and natives), we evaluate the educational systems according to their ability to transform social endowments into good PISA results.

Selection effects and educational efficiency
In the upper row of Figure 6, the countries are arranged in descending order of their immigrants' average PISA scores. This order is based solely on the absolute performance of immigrants in PISA. Here, the top-ranked countries are characterised by a strict immigration policy that selects immigrants who achieve the highest PISA levels. In the lower row, the countries are ordered according to their immigrants' average efficiency relative to the international frontier ($M_E(I_k)$). Thus, the countries are ranked according to their students' performance, given their ESCS endowments. The impact of selection procedures is therefore largely controlled for, and the ranking reveals how successfully educational systems use the ESCS endowment.
The arrows indicate the rank changes. In both rankings, students perform best in Singapore and worst in Denmark. The ranks of all other countries change when the ESCS endowment is taken into account. Without regard to the ESCS endowment (upper ranking), countries with strongly selective immigration systems rank second to fifth. When the socio-economic backgrounds of their students are taken into account (lower ranking), their ranks fall to fourth, fifth, sixth, and ninth. This indicates that simple PISA score comparisons reflect immigration policy more than the efficiency of educational systems.
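The rank changes in Figure 6 amount to comparing two descending rankings of country means. A minimal sketch with hypothetical country labels and values (not the paper's data); a positive change means the country moves up once ESCS is taken into account.

```python
import numpy as np


def descending_ranks(values):
    """Rank 1 = largest value; ties are broken by input order."""
    order = np.argsort(-np.asarray(values, dtype=float), kind="stable")
    ranks = np.empty(order.size, dtype=int)
    ranks[order] = np.arange(1, order.size + 1)
    return ranks


def rank_changes(countries, mean_pisa, mean_efficiency):
    """Map country -> (PISA rank, efficiency rank, places gained)."""
    r_pisa = descending_ranks(mean_pisa)
    r_eff = descending_ranks(mean_efficiency)
    return {c: (int(a), int(b), int(a - b)) for c, a, b in zip(countries, r_pisa, r_eff)}


# Hypothetical countries and values, for illustration only.
changes = rank_changes(["A", "B", "C"], [520.0, 480.0, 450.0], [0.70, 0.60, 0.75])
```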
Austria, Denmark and Sweden are the countries where immigrants perform worst according to their average DEA scores. Given the socio-economic background of their students, these countries could achieve much higher PISA scores, if they were to adapt their educational systems to those of the efficient countries.
Without ESCS as input, immigrants in Spain perform relatively poorly, but their average efficiency is very high. France (five ranks), Italy (four), and Portugal (three) also improve their rankings compared with the simple PISA score comparison. These three countries have relatively unfavourable immigrant compositions in terms of socio-economic background, but their educational systems are more efficient than those of most other countries. Our analysis shows that, despite a very large educational gap (see Table 2), the French school system performs well on average, because it at least partially compensates for the large differences in the socio-economic background of immigrants. While most countries lose up

PISA scores as separate outputs
DEA allows several outputs to be included simultaneously in the efficiency assessment. In this section, students are assessed on the basis of their ability to maximize the three PISA scores, given their ESCS endowments. Model (2) allows for specialisation, so that the efficiency of students focusing on a subset of the three abilities is adequately taken into account. Tables S6 and S7 in the appendix contain the decomposition of the efficiency results for national and international frontiers. The efficiency scores obtained with the average PISA score as output and with the three separate PISA scores as outputs are highly positively correlated. The Pearson correlation coefficients between the two DEA specifications are 0.984 for $M_H(H)$, 0.981 for $M_I(I)$, and 0.984 for $M_E(E)$. Table 5 provides the correlation coefficients between the efficiency scores based on the aggregated output and those based on the three outputs for each country.
Including the separate PISA scores as outputs allows the DEA model to weight the outputs individually and thus yields higher efficiency scores overall. The similarity of the results to those of the previous analysis shows that students who perform well on average also perform quite well in the individual PISA subjects. These results confirm that immigrants in countries with restrictive immigration regimes perform relatively better than in other countries, and that immigrants in Spain, Portugal, and Singapore perform best given their socio-economic endowments.
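Why the three-output model yields scores at least as high as the average-score model can be checked numerically. Any expansion factor φ that is feasible for all three scores is also feasible for their average, so the optimal φ with separate outputs is smaller than or equal to the one with the average, and θ = 1/φ is at least as large. The sketch assumes an output-oriented, variable-returns-to-scale radial DEA as a stand-in for Model (2), whose exact form is not shown in this section, and uses made-up data.

```python
import numpy as np
from scipy.optimize import linprog


def dea_output_vrs(x, Y):
    """Output-oriented VRS radial DEA of each student against all students; returns 1/phi."""
    X = np.asarray(x, dtype=float).reshape(len(x), -1)  # inputs, one row per student
    Y = np.asarray(Y, dtype=float).reshape(len(x), -1)  # outputs, one row per student
    n = X.shape[0]
    c = np.r_[-1.0, np.zeros(n)]  # variables [phi, lam_1..lam_n]; minimise -phi
    A_eq = np.r_[0.0, np.ones(n)][None, :]  # sum of lambdas equals 1
    theta = np.empty(n)
    for o in range(n):
        A_out = np.hstack([Y[o][:, None], -Y.T])  # phi*y_ro - sum lam*y_rj <= 0
        A_in = np.hstack([np.zeros((X.shape[1], 1)), X.T])  # sum lam*x_ij <= x_io
        res = linprog(c, A_ub=np.vstack([A_out, A_in]),
                      b_ub=np.r_[np.zeros(Y.shape[1]), X[o]],
                      A_eq=A_eq, b_eq=[1.0], bounds=[(0, None)] * (n + 1))
        theta[o] = 1.0 / res.x[0]  # always feasible: the student can reference itself
    return theta


# Made-up data: one input, three subject scores per student.
x = np.array([1.0, 2.0, 3.0, 2.0, 2.5])
Y = np.array([[2, 3, 1], [4, 3, 5], [5, 6, 4], [2, 5, 1], [3, 3, 3]], dtype=float)
theta_avg = dea_output_vrs(x, Y.mean(axis=1))  # average score as single output
theta_sep = dea_output_vrs(x, Y)  # three separate outputs
```

In this toy example, the fourth student is weak on average but best in the second subject at its input level. The student is therefore efficient with separate outputs but reaches only two thirds of the frontier with the averaged output, which illustrates the specialisation argument above.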

Conclusion
Our analysis focuses on the abilities of the national educational systems to integrate immigrants, given their socio-economic backgrounds. Country-specific means of efficiency scores based on national frontiers reveal that in Denmark, Finland, and Sweden, native students perform substantially better than immigrants. In Australia, Canada, Israel, Singapore, and the United States, immigrants are more efficient than their native peer group.
Relative to the international frontier consisting of all students and compared with their native peer groups, immigrants in Australia, Singapore, and the United States perform best. The opposite is true in Finland, Sweden, and Denmark.
Even if the differences in the socio-economic endowment of the students are taken into account, differences between natives and immigrants persist. According to both PISA scores and efficiency scores, immigrants in most countries with more selective immigration regimes perform, on average, similarly to or even better than natives. The persistent differences are somewhat surprising, as the broad ESCS index should capture the most relevant socio-economic factors.
We find that, given the respective socio-economic backgrounds of their immigrants, the Spanish educational system is the most successful at raising immigrants' performance and Israel's system the least successful. Australia, Canada, the United Kingdom, and New Zealand are countries with selective immigration policies, which attract immigrants who perform better than or almost as well as natives. If, however, socio-economic backgrounds are taken into account, immigrants in these countries perform on average worse than those in Spain and Portugal. The latter have low PISA scores but highly efficient educational systems.
The result that countries with relatively selective immigration policies not only perform well in absolute PISA scores but are also quite efficient given their ESCS input levels is striking. This result implies that the selection process affects not only ESCS levels but also immigrants' capacity to use their endowments efficiently.