Growing Potentials for Migration Research using the German Socio-Economic Panel Study

This article highlights the potentials for migration research using the German Socio-Economic Panel Study (SOEP), a longitudinal panel dataset of private households in Germany running since 1984. We provide a concise overview of its basic features, describe the survey contents and research potentials, and demonstrate opportunities to link external data sources to the SOEP, thereby presenting its diverse and impactful applications in migration research.

years 2015 and 2016. This increase in migratory movements and deepening of European integration are only reflections of broader processes of economic and political globalization, which open up increasing opportunities to find better living conditions by moving abroad. Yet not only globalization but also humanitarian crises and ecological catastrophes are driving forces in many global migration movements.
Migration poses challenges to national economies in terms of labor supply, social security, and social cohesion (e.g., Freeman 1986) and requires targeted policy measures to foster integration. As migration often extends across multiple generations, evidence-based research on migration calls for longitudinal data that span the life course. Only with such data can researchers identify the causal mechanisms that drive migratory movements and underlie their individual and societal effects. Alongside administrative data, data from household panel studies and retrospective cohort studies provide a crucial basis for these kinds of longitudinal micro-analytical approaches to migration research. The main goal of household panel studies is to provide an accurate representation of national populations. Panel studies also have the advantage of low recall error thanks to their prospective longitudinal design (Peters 1988). Because of this, they can track transnational movements across the life course. To address such longitudinal research questions, household studies in migrant-receiving countries must systematically incorporate migrants into their samples.
This article focuses on the German Socio-Economic Panel (SOEP), a longitudinal survey of randomly sampled households in Germany, and provides an overview of its far-reaching potentials to observe migrant households in Germany (Goebel et al. 2019). The SOEP has been underway since 1984, allowing researchers to study processes of transformation and change in Germany for almost four decades. Its mission is to provide information on diverse aspects of German society including income, wealth, labor market participation, life satisfaction, well-being, and other aspects of life (Brell et al. 2020). To live up to this mission, the SOEP has been responding to exogenous changes in its underlying target population, including immigration, since its inception. According to its founding mission statement, a cornerstone of the SOEP is the systematic inclusion and oversampling of the migrant population (Krupp 2008). Up to the 35th wave in 2018, the SOEP has surveyed 96,461 individuals, 31,982 of whom have a background of migration. Most of these individuals are part of the SOEP's five specific migrant sub-samples, who are surveyed with targeted questionnaires allowing for systematic observation of migrants' experiences in the Federal Republic of Germany. Besides individuals who migrated to Germany themselves (first-generation immigrants), the SOEP also focuses on direct descendants of migrants (second-generation immigrants) and thus captures various aspects of German immigration history. The SOEP is thus a unique data source for the study of immigration and integration trajectories in Germany, and provides researchers with reliable data on how immigration has affected the German economy and society over time. Each new wave of the SOEP increases these research potentials, especially in the area of migration.
The remainder of this article is structured as follows: Section 2 provides an overview of basic features of the SOEP, including a description of the sample composition, the sampling and weighting approaches of the SOEP migrant samples, and the dataset structure. Section 3 describes the survey content and illustrates research potentials. Section 4 outlines opportunities for linking external data sources to the SOEP, and Section 5 concludes.

Basic Features of the SOEP
The general target population of the SOEP consists of private households in Germany. Since 1984, 96,461 individuals living in 42,263 households have been interviewed at least once about their lives (see Figure 1 for the number of interviews in each survey year). The SOEP consists of several distinct sub-samples that have been added sequentially over time. Starting with the baseline Samples A (general population sample) and B (migrant sample) in 1984, the SOEP research group has subsequently reacted to panel attrition (refresher Samples E, F, H, J, K, N), to changes in the target population resulting from events such as German reunification (East-Sample C), and to rising immigration to Germany (Samples D, M1-M5). Other samples focus, for instance, on people with high income or high net wealth (Samples G, P), households situated in big cities (Sample O), low-income families and families with children (Samples L1-L3), or LGB families (Sample Q), some of which represent relatively small populations that require disproportional sampling to make them visible for in-depth analysis.
In its effort to capture immigration to Germany, the SOEP study includes a large share of immigrants (both cross-sectionally and over time), allowing for detailed analysis of sub-groups and for analysis in comparison to the non-migrant population. In the initial 1984 sample, five southern European countries, the so-called "guest worker" countries, were oversampled (households with Italian, Spanish, Greek, [Ex-]Yugoslavian, or Turkish household heads). Hence, as Figure 1 shows, immigrants and their descendants have always been a substantial part of the SOEP. Over the course of the 36 survey years, a total of 25,869 people with a direct (first-generation) and 6,113 individuals with an indirect (second-generation) migration background have been part of a SOEP household. Figure 2 shows the countries of origin of all SOEP respondents.
In line with Germany's immigration history, the SOEP data mostly cover immigration flows from southern Europe (e.g., guest workers) and, since 1995, from new eastern EU member states, the former Soviet Union (late expatriates, ethnic Germans), and the Middle East (refugees).

Sampling, Weighting, and Fieldwork
Most of the SOEP samples are two-stage random samples. In the first stage, the SOEP group merges, for instance, addresses into regional clusters (primary sampling units, PSUs). The sampling of PSUs is commonly stratified by region (e.g., federal states) and in some cases by urban/rural areas in order to ensure that German regions are comprehensively covered. Usually, in the second stage, individuals or addresses (secondary sampling units, SSUs) are sampled randomly from the selected PSUs. Because the SOEP is a household study, not only the sampled respondent (anchor respondent) but all adult members of the household are interviewed, and basic information on the educational participation of children is collected.
In order to ensure sufficient sample size for small sub-populations in the migrant samples (e.g., female refugees, elderly refugees), the SOEP in most cases applies disproportionate sampling on the SSU level. Such disproportionate sampling probabilities are corrected by means of design weighting. However, the weighting adjustment not only accounts for the sampling design, but also allows for integrating the SOEP's various sub-samples. We generally recommend using the SOEP as a whole and not just specific sub-samples, as in some cases the target populations are not mutually exclusive (for an overview, see Kroh et al. 2014). In Germany, sampling migrants is generally challenging because there is no comprehensive database containing all first- and second-generation immigrants in Germany. While the German central registry of foreigners contains all non-German nationals who reside in Germany for a period of three months or more, former migrants who have acquired German citizenship cannot be identified in this register. The SOEP therefore relies on a variety of sampling frames and techniques. Table 1 summarizes key sampling characteristics for each migrant sample.
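The logic of design weighting can be illustrated with a minimal Python sketch. It assumes the simplest possible case, in which each stratum has a known frame size and a fixed sample size, so that the design weight is the inverse of the inclusion probability. The strata, counts, and function names are hypothetical and chosen for illustration only; they are not SOEP figures, and the actual SOEP weights involve further adjustments that are not shown here.

```python
# Hypothetical frame and sample sizes per stratum (not SOEP figures).
frame_counts = {"men": 6000, "women": 4000}
sample_counts = {"men": 300, "women": 400}  # women are oversampled


def design_weights(frame, sample):
    """Inverse inclusion probability per stratum: frame size / sample size."""
    return {stratum: frame[stratum] / sample[stratum] for stratum in sample}


def weighted_mean(records, weights):
    """Design-weighted mean over (stratum, value) records."""
    total = sum(weights[stratum] * value for stratum, value in records)
    norm = sum(weights[stratum] for stratum, _ in records)
    return total / norm


weights = design_weights(frame_counts, sample_counts)

# Indicator variable: 1.0 for men, 0.0 for women. Its mean estimates the
# share of men in the population.
records = [("men", 1.0)] * 300 + [("women", 0.0)] * 400

unweighted_share = sum(value for _, value in records) / len(records)
weighted_share = weighted_mean(records, weights)
```

In this toy example the unweighted share of men in the sample is distorted by the oversampling of women, whereas the weighted share reproduces the share in the hypothetical frame (6,000 of 10,000).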

Dataset Structure
The structure of the data allows researchers to analyze SOEP data in two dimensions. With each survey year representing a data wave, the first option is cross-sectional analysis of any given survey wave. Second, the longitudinal nature of the data can be exploited either in the "wide" or "long" data format. Furthermore, since 2012, the SOEP group has provided user-friendly datasets in the long format and harmonized variables over time. For instance, the SOEP group generates a variable displaying the migration background of a respondent, and expresses income reported before the introduction of euro cash in 2002 in euros. Longitudinal analysis is straightforward with the SOEPlong format. The SOEP contains a great variety of datasets on different observational levels, including original datasets, datasets containing generated variables (e.g., ISCED), and spell data. For empirical analyses, these may be combined using merging procedures on the basis of relevant dataset identifiers (e.g., pid (time-constant personal identifier), hid (time-constant household identifier)). Moreover, due to the household concept of the study, family context information can be added to analyses on the individual level by way of partner, child, and parent identifiers. Table 2 describes the six different types of datasets in the SOEP and lists some relevant examples. All datasets are available free of charge to universities and research institutes for both research and teaching purposes in various data formats (see Goebel et al. 2019).
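The difference between the wide and long formats, and the merging of datasets via identifiers, can be sketched in a few lines of Python. The example uses only the standard library and invented toy records. Only the identifier names pid and hid are taken from the text; all other column names, values, and helper functions are hypothetical and do not correspond to actual SOEP variables.

```python
# Hypothetical person-level data in "wide" format: one row per person,
# one income column per survey year.
wide = [
    {"pid": 101, "hid": 1, "income_2016": 1800, "income_2017": 1900},
    {"pid": 102, "hid": 1, "income_2016": 0, "income_2017": 450},
    {"pid": 201, "hid": 2, "income_2016": 2500, "income_2017": 2600},
]

# Hypothetical household-level data, keyed by the household identifier hid.
households = {1: {"region": "Berlin"}, 2: {"region": "Bavaria"}}


def wide_to_long(rows, stub="income_"):
    """Turn one row per person into one row per person and survey year."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(stub):
                long_rows.append({
                    "pid": row["pid"],
                    "hid": row["hid"],
                    "year": int(key[len(stub):]),
                    "income": value,
                })
    return long_rows


def merge_household(long_rows, household_data):
    """Attach household-level information to each person-year row via hid."""
    return [{**row, **household_data[row["hid"]]} for row in long_rows]


long_data = merge_household(wide_to_long(wide), households)
```

Each person then contributes one row per survey year, and household context is available on every row, which is the structure that longitudinal models typically expect.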

Research Potentials
Migration and integration are complex, multi-faceted processes that affect the lives of those who experience them in numerous ways. The questionnaire content of the SOEP tries to cover this complexity, allowing researchers to closely study migrants' living circumstances over time.
Even before the more recent special samples of migrants and refugees were added to the study population in 2013 and 2016, respectively, the SOEP's questionnaires offered deep insights into immigrants' country of origin, process of arrival in Germany, and patterns of integration. Table A1 provides an overview of key dimensions of the SOEP's survey content for its migrant population. As Table A1 shows, since the study's inception in 1984, the SOEP has collected data on crucial aspects of migrants' backgrounds, including country of birth and citizenship status, as well as aspects of their lives in Germany. Migrants have been asked, for instance, about their German language proficiency and sense of national or ethnic identity (Dustmann 1994; Dustmann and Van Soest 2001), allowing for analysis of the processes of cognitive and social assimilation (Constant and Zimmermann 2008; Diehl and Schnell 2006). Furthermore, since the addition of Sample D in 1994, the questionnaire items dealing with the individual's experience of immigrating to Germany have been systematically expanded to include additional information on matters such as residence status and social links to Germany prior to immigration. Based on these contents, studies have examined the immigrant-native wage gap, the transferability of human capital, immigrants' saving behavior or remittances to relatives in their home countries, the effects of naturalization and citizenship, social networks, and the influence of macroeconomic conditions in home countries on immigrants' well-being.
With the addition of Samples M1-2, the SOEP, in cooperation with the Institute for Employment Research (IAB), has begun to collect even more detailed information on the unique living circumstances of migrants and their descendants, and since 2013, the migrant-specific questionnaire content has been significantly expanded (see Table A1).
To this end, the SOEP has used concepts that are well established in the literature and has applied Esser's (1980, 2001, 2006) concept of social integration as a basis for developing additional survey content. According to Esser (1980), "social integration" is a conglomerate of four dimensions: cultural adaption, positioning (individuals' rights, residence titles, etc.), interaction, and identification. These four aspects of integration are reflected in the SOEP's questionnaires (Esser 2008), which contain multiple indicators for each dimension of Esser's theoretical framework (see Table A1). However, the extensive questionnaire content also makes it possible to study other theoretical constructs related to migrants' integration.
Since 2016, refugee-specific samples (M3-5) have also been part of the SOEP. Similar to the migrant samples discussed above, the questionnaires distributed to this sub-population aim at understanding their unique situations and challenges and therefore cover the periods before, during, and after migration. Examples are the refugee's route to Germany, the asylum process, shared accommodations and attendance of integration classes, mental health, subjective well-being, labor market access, educational attainment, personality, and value orientations (see Table A1; Guichard 2020; Hahn et al. 2019; Jacobsen 2019; Jacobsen and Fuchs 2020; Kosyakova and Brücker 2020; Löbel 2020; Siegert in press; Walther et al. 2020). To allow for comparative analyses between the host society and newcomers, migrants also receive part of the same standard SOEP survey instruments as non-migrants. This makes it possible to study aspects that are not specific to migration, such as well-being, within these populations. For some aspects of integration, such as employment and German language proficiency, retrospectively provided pre-migration data are also available. This allows for analysis of changes in migrants' lives since their arrival in Germany (Krieger 2020b).
Besides providing novel and unique data on migrants in Germany, the SOEP also provides an opportunity to study different cross-cultural survey methodological tools. In its work with the Institute for Employment Research (IAB) and the Research Centre of the Federal Office for Migration and Refugees (BAMF-FZ), the SOEP has used novel sampling frames such as the Integrated Employment Biographies and the Central Register of Foreigners and novel sampling techniques such as onomastic procedures. Further, all questionnaires are translated into the most important languages of the target population. These vary depending on the samples' composition and include, among others, Turkish, High Arabic, and English (see Table 1). All of these features make the SOEP a unique tool to learn about the practical application of cross-cultural survey methods and to observe changes and improvements in their application over time (Eisnecker and Kroh 2017; Jacobsen and Fuchs 2020; Liebau et al. 2018).

Linking SOEP Data with Additional Data Sources
There are numerous options to augment the SOEP data with data from other sources that provide contextual or regional information. For instance, the SOEP offers unique potentials for spatial analysis. By using information on respondents' region of residence, researchers can link SOEP data with spatial indicators at different levels, including states, spatial planning units, counties, municipalities, and postal codes. Since 2000, exact geo-locations have been available within a specialized secured setting at the SOEP Research Data Center (Goebel 2020). Recent examples of such spatial analyses include Dill et al. (2015), Krieger (2020a), Lersch (2013), Sager (2012), and Schaffner and Treude (2018).
There are a number of projects that seek to combine the advantages of administrative data with the benefits of survey information by linking the SOEP data with data from administrative sources: One is the Linked Employer-Employee Study (SOEP-LEE, see Weinhardt et al. 2017) and another is SOEP-RV, which combines SOEP data with individual-level data from the German pension insurance. Another important project, IEB-SOEP, adds administrative information from the Integrated Employment Biographies to SOEP migration samples M1/M2 (Brücker et al. 2014). This enables researchers to add administrative information on long biographical periods to the SOEP data and thus analyze the labor market integration of immigrants in Germany. The refugee Samples M3-M5 can also be linked to the IEBs (Keita and Trübswetter 2020).
Furthermore, it is possible to use the SOEP to conduct cross-country comparative research. For instance, the Cross-National Equivalent File (CNEF) is an international panel dataset providing harmonized information on education, employment, income, health, and life satisfaction. The information stems from longitudinal household panel studies in a number of countries including Australia, Germany, Great Britain, Canada, South Korea, Russia, Switzerland, and the United States.

SOEP Related Study: Mentoring of Refugees (MORE)
Further, the SOEP regularly implements related studies. For instance, the SOEP-related study Mentoring of Refugees (MORE) explores whether and how friendships and contact with locals affect refugees' integration in Germany (Jursch et al. 2020; Legewie et al. 2019). As part of this project, in 2017, respondents in Samples M3-M5 were asked whether they would be interested in participating in the non-profit initiative Start with a Friend (SwaF). SwaF is a social start-up founded in 2014 in Berlin. Its mission is to connect recently arrived refugees with locals in Germany with the ultimate goal of initiating long-lasting friendships between the two parties. Participants are advised to meet with their partner for two to three hours on a weekly basis. Participants can freely decide how they wish to spend this time together; some meet to do sports, others to visit museums or exhibitions.
In 2017, SwaF was active in 14 German cities. Respondents in Samples M3-M5 who resided in one of these cities (N = 733) were asked whether they would be interested in program participation. Those who expressed interest (N = 446) were subsequently randomly assigned to either the group of participants (treatment group, N = 234, 52%) or the group of non-participants (control group, N = 212, 48%). Accordingly, MORE is designed as a randomized controlled trial. Following randomization, interviewers registered refugees who were assigned to the treatment group on SwaF's website to transmit their contact information to program organizers. This administrative step was successful for 215 refugees. In kick-off meetings with 127 of the refugees assigned to treatment, SwaF asked about their hopes and expectations for the mentoring relationship in order to be able to find a suitable local mentor. Based on these insights, 85 refugees were assigned a local mentor. The remaining 130 of the 215 registered refugees could not be matched to a local for various reasons, including fading interest or insufficient German language skills on the part of the refugees.
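The random assignment step can be sketched as follows. This is a minimal illustration, assuming assignment by shuffling a list of respondent identifiers and splitting it at a fixed treatment group size; the actual randomization procedure used in MORE is not described here, and the identifiers, seed, and function name are hypothetical. Only the group sizes are taken from the text.

```python
import random


def assign_groups(ids, n_treatment, seed=0):
    """Randomly split ids into a treatment and a control group.

    A fixed seed makes the (hypothetical) assignment reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return set(shuffled[:n_treatment]), set(shuffled[n_treatment:])


# 446 hypothetical respondent ids standing in for those who expressed interest.
interested = range(1, 447)
treatment, control = assign_groups(interested, n_treatment=234)

# Group shares in percent, rounded to whole numbers as in the text.
treatment_share = round(100 * len(treatment) / len(interested))
control_share = round(100 * len(control) / len(interested))
```

Because assignment depends only on the shuffle, the two groups should differ only by chance in their pre-treatment characteristics, which is what permits a causal interpretation of later group differences.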
As part of the 36th SOEP wave, datasets related to MORE will be released in February 2021. First, these data include information on which refugees were assigned to the treatment versus control group. This information can be matched to the SOEP in order to compare refugee outcomes before and after treatment and assess the impact of SwaF (information included in $p, see Table 2). Second, during the matching process, SwaF coordinators kept track of key steps in the matching process, including dates of registration, first meeting, and matching, as well as refugees' and locals' hopes and expectations concerning the program (dataset: more_docu). Finally, three waves of online surveys were conducted with local mentors (at the initiation of the match, after six weeks, and after four months). These data allow for insights into the mentoring relationships and how refugee-local pairs spent their time together (dataset: more_local).
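One simple way to compare outcomes before and after treatment across the two groups is a difference-in-differences of group means. The sketch below is one possible approach under stated assumptions, not the estimation strategy of the MORE project: the records are invented, and the outcome scale, variable layout, and function name are hypothetical. Because groups are defined by random assignment rather than by actual matching with a mentor, an estimate of this kind would correspond to an intention-to-treat effect.

```python
from statistics import mean

# Hypothetical records: (assigned_to_treatment, outcome_before, outcome_after).
records = [
    (True, 2.0, 3.5), (True, 3.0, 4.0), (True, 2.5, 3.5),
    (False, 2.0, 2.5), (False, 3.0, 3.5), (False, 2.5, 3.0),
]


def diff_in_diff(rows):
    """Mean change in the treatment group minus mean change in the control group."""
    treated_changes = [after - before for treated, before, after in rows if treated]
    control_changes = [after - before for treated, before, after in rows if not treated]
    return mean(treated_changes) - mean(control_changes)


effect = diff_in_diff(records)
```

Subtracting the control group's change removes trends that affect all refugees alike, such as general improvement over time since arrival.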

Concluding Remarks and Outlook
The SOEP sets national and international standards in the field of migration research from a life course perspective. From the start of this household panel study in 1984, it has included survey content on immigrants and their unique biographies and has responded to changes in the German population by sequentially adding several distinct (migration) sub-samples.
As the SOEP migrant samples have grown over time, the content of questionnaires on migrant-specific life circumstances has also increased. This currently makes it possible for researchers to analyze diverse aspects of immigrants' lives, such as their routes to Germany, their mental health, employment, and language skills. Migrants' and refugees' outcomes can further be compared to those of natives or be analyzed in relation to spatial information. As the SOEP employs a household concept and thus interviews all household members, it is also possible to analyze multiple generations of immigrant families. Finally, the SOEP offers great potential for cross-cultural survey methodological research.
In 2020, the SOEP responded to the spread of COVID-19 by interviewing households in Germany during the lockdown. As part of the research project SOEP-CoV, the migration samples M1-2 were also interviewed, allowing unique insights into immigrant life during the pandemic. Overall, the SOEP is a unique data source that strives to gather knowledge about migrants in innovative ways and that offers unique opportunities for multidisciplinary migration research.