AKM Effects for German Labour Market Data from 1985 to 2021

This article describes the processing and accessibility of the person and establishment fixed wage effects in German administrative data. These effects have been estimated following the approach of Abowd, J., Kramarz, F., and Margolis, D. (1999. High wage workers and high wage firms. Econometrica 67: 251–333) and Card, D., Heining, J., and Kline, P. (2013. Workplace heterogeneity and the rise of West German wage inequality. Q. J. Econ. 128: 967–1015). They can be linked to most of the administrative datasets provided by the Research Data Center (FDZ) of the German Federal Employment Agency at the Institute for Employment Research (IAB). They are available for different time intervals from 1985 until 2021. These effects have been used in numerous articles that deal with the contributions of workers and establishments to earnings inequality.


Introduction
In an influential article, Card et al. (2013) (henceforth CHK) study the role of establishment-specific wage premiums for wage inequality in Germany. Their approach builds upon a framework initially proposed by Abowd et al. (1999) (henceforth AKM). It uses matched administrative employer-employee data to estimate a statistical model of wage determination, where wages are expressed as the sum of worker effects, establishment effects, time-varying covariates, and idiosyncratic error terms. Since then, numerous articles have used this technique to answer related research questions.
The Research Data Center (IAB-FDZ) of the German Federal Employment Agency at the Institute for Employment Research (IAB) has applied the CHK/AKM approach to the universe of German employment data and estimated the person and establishment fixed effects (henceforth AKM effects). The IAB-FDZ has offered these effects to external data users since 2015 and has recently updated the procedure and the corresponding data files. They are available for the most common administrative data products, such as the Sample of Integrated Employment Biographies (SIAB; see Frodermann et al. 2021) or the Linked Employer-Employee Data from the IAB (LIAB; see Ruf et al. 2021). Currently, the AKM effects are available for different time windows from 1985 until 2021. Users with an unexpired data usage agreement can access the effects in addition to their primary data product.
This article describes how these effects are estimated and processed, how they can be linked to the existing available data, and for which research topics the data might be relevant. It is organized as follows. Section 2 explains the data sources. Section 3 explains the initial data preparation that has been applied in order to estimate the AKM effects. Section 4 explains the details of the estimation procedure. Section 5 provides descriptive results about the estimated effects. Section 6 explains the data structure and how to access them. Section 7 summarizes and gives an overview of potential research questions.

Data Source
The data source used to estimate the AKM effects is the IAB Employment History File (BeH), which is derived from the Integrated Employment Biographies (IEB) processed by the IAB.
The BeH covers all employees who are subject to social security at least one day in a given year. Since 1999, the data also cover employees in marginal part-time employment and unpaid family workers. Workers' employment spells are supplemented by establishment information from the Establishment History Panel ("Betriebs-Historik-Panel", see Ganzer et al. 2022), which aggregates establishment-level variables (with June 30 of each year as the record day). The IEB unites data from different data sources, each of which may contain information from different administrative procedures. The main data source is the integrated notification procedure for health, pension, and unemployment insurance.
The BeH covers the majority of the German workforce, only excluding civil servants and the self-employed.
The main inputs for the CHK/AKM approach are the BeH data on employment and total earnings. Since the identification of the effects relies on worker mobility, being able to track workers is crucial. The BeH offers an ideal laboratory because each worker and each establishment has a distinct identification number.

Data Preparation
The data preparation mostly follows CHK, with a few differences, which we explain in detail below. From the universe of BeH employment spells, we restrict the sample to full-time workers aged 20 to 60. For each of these workers, we identify the main job in a given year as the job with the highest total earnings (including any bonus payments).
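The sample restriction and the main-job rule can be sketched in a few lines of Python. This is a minimal illustration, not the IAB-FDZ's processing code: the `Spell` record, its field names, and the choice to apply the full-time and age filters spell by spell before summing earnings per job are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Spell:
    # Hypothetical spell record; field names are illustrative, not BeH variable names.
    person_id: int
    estab_id: int
    year: int
    age: int
    full_time: bool
    earnings: float  # spell earnings in that year, including bonus payments


def main_jobs(spells):
    """Return {(person_id, year): (estab_id, total_earnings)} for each main job."""
    totals = defaultdict(float)
    for s in spells:
        # Sample restriction: full-time workers aged 20 to 60.
        if s.full_time and 20 <= s.age <= 60:
            totals[(s.person_id, s.year, s.estab_id)] += s.earnings
    best = {}
    for (pid, year, estab), total in totals.items():
        key = (pid, year)
        # Main job = job with the highest total earnings in that year.
        if key not in best or total > best[key][1]:
            best[key] = (estab, total)
    return best


spells = [
    Spell(1, 10, 2000, 30, True, 20000.0),
    Spell(1, 10, 2000, 30, True, 5000.0),   # second spell at the same establishment
    Spell(1, 11, 2000, 30, True, 22000.0),
    Spell(2, 10, 2000, 65, True, 40000.0),  # excluded: older than 60
]
print(main_jobs(spells))  # {(1, 2000): (10, 25000.0)}
```

Summing earnings per establishment before taking the maximum matters: in the toy data, worker 1 earns more in a single spell at establishment 11, but establishment 10 is the main job because its two spells add up to more.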
In the BeH data, earnings are right-censored at the contribution assessment ceiling ("Beitragsbemessungsgrenze").
We define a wage observation as censored whenever the reported wage exceeds 98% of the censoring threshold. Following CHK, we fit a series of Tobit regressions to impute the right tail of the wage distribution. We estimate the Tobit regressions separately by year, sex, education, and age group. As opposed to CHK, we restrict our sample neither to West Germany nor to men.
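The censoring rule and the Tobit-based imputation can be sketched as below. The imputation step uses the draw common in this literature (including CHK): for a censored observation with fitted Tobit index $X\hat\beta$ and error standard deviation $\hat\sigma$, a log wage is drawn from the normal distribution with those parameters, truncated below at the censoring point. The sketch does not estimate the Tobit models; it assumes their parameters for the observation's year, sex, education, and age-group cell are given. The function names, the hypothetical ceiling value, and the use of the log of 98% of the ceiling as the censoring point are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def is_censored(daily_wage, ceiling, share=0.98):
    """Flag a wage as censored if it exceeds 98% of the contribution ceiling."""
    return daily_wage > share * ceiling


def impute_log_wage(xb, sigma, log_c, rng):
    """Draw a log wage from N(xb, sigma**2) truncated below at log_c.

    xb and sigma stand in for the fitted Tobit index and error s.d. of the
    observation's year-sex-education-age cell; here they are assumed inputs.
    """
    k = STD_NORMAL.cdf((log_c - xb) / sigma)  # probability mass below log_c
    u = rng.random()
    p = k + u * (1.0 - k)                     # uniform draw on [k, 1)
    p = min(p, 1.0 - 1e-12)                   # guard against rounding to exactly 1
    return xb + sigma * STD_NORMAL.inv_cdf(p)


rng = random.Random(42)
ceiling = 200.0                   # hypothetical daily ceiling
log_c = math.log(0.98 * ceiling)  # assumed censoring point in logs
print(is_censored(197.0, ceiling), is_censored(195.0, ceiling))  # True False
draws = [impute_log_wage(math.log(150.0), 0.4, log_c, rng) for _ in range(1000)]
print(min(draws) >= log_c)
```

Drawing from the truncated distribution, rather than plugging in the conditional mean, preserves dispersion in the imputed right tail, which matters when the goal is to study wage inequality.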
The estimation is performed on the largest connected set, i.e., the largest group of establishments that are connected by worker mobility within a given sample interval. Figure 1 illustrates the concept of the largest connected set with four connected establishments (A, B, C, D) and one unconnected establishment (E). We estimate the AKM model for five non-overlapping time intervals: 1985–1992, 1993–1999, 2000–2006, 2007–2013, and 2014–2021. Table 1 shows the number of workers and establishments in the largest connected set.
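Finding the largest connected set is a graph problem: establishments are nodes, and a worker observed at two establishments within the same interval links them. A union-find (disjoint-set) structure solves it in near-linear time. The sketch below measures the size of a component by its number of establishments, which is an assumption (size could equally be measured in workers or person-years), and the toy mobility pattern only mimics the situation in Figure 1; the specific worker links are made up.

```python
from collections import Counter


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def largest_connected_set(observations):
    """observations: (worker_id, estab_id) pairs from one estimation interval."""
    uf = UnionFind()
    first_estab = {}
    for worker, estab in observations:
        uf.find(estab)  # register every establishment, movers or not
        if worker in first_estab:
            uf.union(first_estab[worker], estab)  # a mover links two establishments
        else:
            first_estab[worker] = estab
    estabs = list(uf.parent)
    sizes = Counter(uf.find(e) for e in estabs)
    root, _ = sizes.most_common(1)[0]
    return {e for e in estabs if uf.find(e) == root}


# Toy mobility pattern: workers 1-3 connect A-D, workers 4 and 5 never leave E.
obs = [(1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "C"), (3, "D"), (4, "E"), (5, "E")]
print(sorted(largest_connected_set(obs)))  # ['A', 'B', 'C', 'D']
```

Establishment E employs workers but no movers, so its effect cannot be compared with those of A to D and it drops out of the estimation sample.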

Estimation
Person and establishment fixed effects are estimated using the following regression equation:

$$y_{it} = \alpha_i + \psi_{J(i,t)} + x'_{it}\beta + r_{it}$$

We estimate this equation separately for each of the aforementioned time intervals. The data comprise $N^*$ person-year observations on $N$ workers and $J$ establishments. $J(i,t)$ gives the identity of the unique establishment that employs worker $i$ in year $t$. As in CHK, the log daily real wage $y_{it}$ of individual $i$ in year $t$ is assumed to be the sum of a worker component $\alpha_i$, an establishment component $\psi_{J(i,t)}$, an index of time-varying observable characteristics $x'_{it}\beta$, and an error component $r_{it}$. The covariate vector $x_{it}$ includes an unrestricted set of year dummies as well as quadratic and cubic terms in age fully interacted with educational attainment.
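To make the estimation step concrete, the sketch below solves a two-way fixed-effects model by alternating projections: given the establishment effects, each worker effect is the mean of $y_{it} - \psi_{J(i,t)}$ over that worker's observations, and vice versa, until the establishment effects stop changing. This is one standard way to handle very many fixed effects; it is not necessarily the solver used by the IAB-FDZ. For brevity the sketch drops the covariate index $x'_{it}\beta$ and uses synthetic, noise-free data. Because the effects are identified only up to a common constant within a connected set, the sketch normalizes the establishment effects to mean zero over observations, which is an illustrative choice.

```python
from collections import defaultdict


def estimate_akm(obs, max_iter=1000, tol=1e-12):
    """Two-way fixed effects y = alpha_worker + psi_estab + r by alternating projections.

    obs: list of (worker_id, estab_id, y) from one connected set, y = log daily wage.
    Returns (alpha, psi); psi is normalized to mean zero over observations.
    """
    by_worker, by_estab = defaultdict(list), defaultdict(list)
    for w, j, y in obs:
        by_worker[w].append((j, y))
        by_estab[j].append((w, y))
    psi = {j: 0.0 for j in by_estab}
    alpha = {}
    for _ in range(max_iter):
        # Worker effect: mean of y - psi over the worker's observations.
        alpha = {w: sum(y - psi[j] for j, y in rows) / len(rows)
                 for w, rows in by_worker.items()}
        # Establishment effect: mean of y - alpha over the establishment's observations.
        new_psi = {j: sum(y - alpha[w] for w, y in rows) / len(rows)
                   for j, rows in by_estab.items()}
        change = max(abs(new_psi[j] - psi[j]) for j in psi)
        psi = new_psi
        if change < tol:
            break
    # Only differences are identified: move the mean of psi into alpha.
    shift = sum(psi[j] for _, j, _ in obs) / len(obs)
    return ({w: a + shift for w, a in alpha.items()},
            {j: p - shift for j, p in psi.items()})


true_alpha = {1: 3.0, 2: 3.5, 3: 2.8}
true_psi = {"A": 0.0, "B": 0.2, "C": -0.1}
pairs = [(1, "A"), (1, "A"), (1, "B"), (2, "B"), (2, "C"), (3, "C"), (3, "A")]
obs = [(w, j, true_alpha[w] + true_psi[j]) for w, j in pairs]

alpha_hat, psi_hat = estimate_akm(obs)
print(round(psi_hat["B"] - psi_hat["A"], 6))  # 0.2
```

Only differences between establishment effects (and between worker effects) within the connected set are meaningful; in the toy data the estimated gap between establishments B and A recovers the true value of 0.2, whatever normalization is chosen.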
The literature interprets the establishment effect $\psi_j$ as a wage premium (or discount) that establishment $j$ pays to all of its employees, while the person effect $\alpha_i$ captures time-invariant factors attributable to the worker. Table 2 shows the mean, standard deviation, and distribution of the estimated AKM effects. In line with the usual findings (CHK), both the person effects and the establishment effects become more variable over time. In the most recent years, however, this trend has stopped (see Lochner et al. 2020).

Data Structure and Access

Data Structure
The AKM effects are estimated for five different non-overlapping time intervals. Data users can readily link the effects to their data using the unique person and establishment identifiers provided. The person and establishment effects are provided in separate time-specific variables, i.e., the data are in wide format.
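Because the effects are stored in wide format, with one variable per interval, linking them to spell data amounts to mapping each spell's year to its estimation interval and looking up the effect for that identifier and interval. The sketch below uses plain dictionaries; the interval labels, the dictionary layout, and all identifiers and values are hypothetical and do not reflect the actual variable names in the IAB-FDZ files.

```python
INTERVALS = [(1985, 1992), (1993, 1999), (2000, 2006), (2007, 2013), (2014, 2021)]


def interval_label(year):
    """Map a calendar year to a hypothetical interval label such as '2000_2006'."""
    for start, end in INTERVALS:
        if start <= year <= end:
            return f"{start}_{end}"
    return None


def wide_to_long(wide):
    """{id: {interval: effect}} -> {(id, interval): effect}, skipping missing cells."""
    return {(i, iv): fe for i, row in wide.items() for iv, fe in row.items() if fe is not None}


# Hypothetical wide-format files: one row per identifier, one column per interval.
person_fe = {101: {"2000_2006": 0.12, "2007_2013": 0.15}}
estab_fe = {7: {"2000_2006": -0.05, "2007_2013": -0.02}}

spells = [
    {"person_id": 101, "estab_id": 7, "year": 2003},
    {"person_id": 101, "estab_id": 7, "year": 2010},
    {"person_id": 202, "estab_id": 8, "year": 1990},  # no effects in the toy files
]

person_long, estab_long = wide_to_long(person_fe), wide_to_long(estab_fe)
linked = []
for s in spells:
    label = interval_label(s["year"])
    linked.append(dict(s,
                       person_fe=person_long.get((s["person_id"], label)),
                       estab_fe=estab_long.get((s["estab_id"], label))))
print(linked[0]["person_fe"], linked[0]["estab_fe"])  # 0.12 -0.05
```

Reshaping to long format first keeps the join a single key lookup on (identifier, interval); spells whose worker or establishment is not in the relevant connected set simply receive a missing value.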

Data Access
The estimated person and establishment effects are offered by the Research Data Center of the German Federal Employment Agency at the Institute for Employment Research (IAB-FDZ) in addition to the SIAB and LIAB data. The effects can be accessed during guest visits to one of the IAB-FDZ locations; subsequently, remote data processing is possible. To gain access, an application for data use must first be submitted to the IAB-FDZ. After approval, a contract needs to be signed between the researcher's institution and the IAB. An unexpired agreement is necessary for using the estimated effects. Details on how to apply for the data and on the data processing options can be found on the website of the IAB-FDZ (https://fdz.iab.de/en/data-access/). Access to the data can only be granted once the required contracts have been signed.

Summary and Outlook
The AKM approach allows researchers to separately identify unobserved, time-invariant worker and establishment heterogeneity in wages. Using the universe of German employment data on full-time workers aged 20–60, we have applied the AKM wage model to estimate person and establishment fixed wage effects. These effects have been estimated for different time intervals from 1985 to 2021 and can be linked to the workhorse data sets provided by the IAB-FDZ.
Since the IAB-FDZ started to offer them in 2020, many articles have used the AKM effects. The following paragraph summarizes a selection of them (the list is incomplete and in no particular order): Butschek (2022) investigates whether the minimum wage introduction raised hiring standards, using worker fixed effects as a proxy for worker productivity. Zimmermann (2022) studies the relationship between worker productivity and the share of female managers. Müller and Neuschaeffer (2021) study the relationship between worker participation, worker sorting, and firm performance and find that establishments with works councils employ higher-quality workers. Niebuhr and Peters (2020) study workforce composition and individual wages, using the estimated worker and establishment fixed effects as a proxy for unobserved factors that might influence the entry wages of new hires. Illing et al. (2021) analyze the gender wage gap after job loss and show, among other things, that women tend to work at establishments with lower AKM establishment effects. Lochner and Merkl (2022) explore the role of gender-specific application behavior as well as the selection behavior of firms for the gender earnings gap in Germany. They find that women apply less often to high-wage establishments, but if they do, they are about as likely to be selected by these firms.
As these examples show, the provided AKM effects offer a versatile way to account for unobserved worker and establishment heterogeneity across many different research questions.