Fostering Excellent Research by the Austrian Micro Data Center (AMDC)

Access to high-quality microdata is a precondition for the empirical investigation of many interrelationships in the economic and social sciences. Therefore, well-functioning research data infrastructure is a cornerstone of a successful science location. While other countries in Europe, such as Denmark and the Netherlands, have had microdata centres at their respective National Statistical Offices for quite some time, microdata access for research purposes in Austria was very limited for a long time. Established in 2022, the Austrian Micro Data Center (AMDC) at Statistics Austria enables researchers of accredited research institutions to work with pseudonymized microdata on individuals and firms. The available microdata includes not just microdata of Statistics Austria but also registry data of the Austrian federal government. The main novelty is that microdata can be linked deterministically to each other via unique pseudonymized identifiers among data sets of Statistics Austria, administrative registers, and also to microdata brought in by the researchers themselves. The AMDC is operated by Statistics Austria and its services are open to research institutions worldwide.


Introduction
Internationally, National Statistical Offices have played a major role in establishing access to administrative microdata for research purposes. 1 Examples include Statistics Denmark, 2 Statistics Finland, 3 and the Centraal Bureau voor de Statistiek (CBS) in the Netherlands. 4 In Austria, microdata access for research purposes was very limited for a long time. However, in 2022, a change in legislation enabled access to microdata for research entities comparable to that in the European forerunner countries. Before this change, Austria's National Statistical Office, "Statistics Austria", 5 was not authorized to provide researchers with access to data at the level of individuals or firms. Consequently, it was often not possible to test important research hypotheses empirically or to draw causal inferences using Austrian data. Thus, Austria was at a disadvantage as a science location and it was difficult for research entities to provide evidence-based scientific advice to Austrian policy makers in many fields. In contrast, more than 1500 scientific projects and publications appeared from 2006 to 2022 using Dutch microdata provided by the Centraal Bureau voor de Statistiek (CBS). 6 In 2022, an amendment to the Federal Statistics Act (German: Bundesstatistikgesetz, BStatG, see Section 3) established the legal basis for remote access to indirectly identifiable microdata for scientific purposes. With the amendment, Statistics Austria established a new research data infrastructure, the Austrian Micro Data Center (AMDC), which opened on July 1, 2022.
1 For an early overview on microdata access from an OECD perspective see Ahmad et al. (2009).
2 See: https://www.dst.dk/en/TilSalg/Forskningsservice (accessed May 26, 2023) and Borchsenius (2006).
3 See: https://www.stat.fi/tup/mikroaineistot/index_en.html (accessed May 26, 2023).
4 See: https://www.cbs.nl/en-gb/onze-diensten/customised-services-microdata/microdata-conducting-your-own-research (accessed May 26, 2023).
5 Statistics Austria is the Federal Statistical Office of Austria and is an independent and non-profit-making federal institution under public law. It produces federal statistics, which also include analyses and projections, on the economic, demographic, social, ecological, and cultural conditions in Austria. To enable it to fulfil its purpose, many different governmental institutions, enterprises, and citizens provide data to Statistics Austria via surveys or specific registers. These statistics, which are decreed by EU legal acts, federal laws, or by regulations, are published and made available to the public. By law, Statistics Austria must observe different principles, which include objectivity and impartiality in the compilation of statistics, application of statistical methods and procedures according to internationally accepted scientific principles and standards, disclosure thereof, and keeping personal data confidential. Statistics Austria provides access to collected data to the scientific and business communities and the broader public, while also providing additional information, expert services, and special statistical evaluations. (See: https://statistik.at/en/about-us/responsibilities-and-principles/responsibilities-and-principles-of-statistics-austria (accessed May 26, 2023)).
The aim of the Austrian Micro Data Center (AMDC) is to provide central, data protection-compliant remote access to statistical registers, as well as to other microdata of federal governmental entities, for empirical research. In this sense, a one-stop shop for scientific purposes has been created and, by doing so, an essential research data infrastructure established. 7 According to the Federal Statistics Act (BStatG §§ 31, 32), the basic infrastructure of the AMDC is financed by the Austrian Federal Ministry of Education, Science and Research (BMBWF), whereas the variable costs must be borne by the research entities. This highlights the importance of competitive research calls for registry research.
The establishment of the AMDC closely follows existing best-practice models, in particular the microdata access provided by the Centraal Bureau voor de Statistiek (CBS) and Statistics Denmark. This paper describes the application process and project work with the AMDC (Section 2), provides an overview of the microdata accessible in the AMDC (Section 3), and gives an outlook on possible further developments (Section 4).

Accreditation of a Research Entity
To meet the legal requirements and to gain remote access to microdata for scientific purposes, the applying research entity must meet several requirements. These requirements are defined in the Federal Statistics Act (BStatG § 31 (7)) and include:
- Conduct research at university level and make the results available to the public free of charge.
- Be an organisation with legal personality, with a primary focus on research.
- Be independent and autonomous in scientific activity and in formulating scientific conclusions.
- Fulfil the technical and infrastructural requirements with regard to guaranteeing data security.
Evidence that these requirements are met must be submitted when a research organization officially registers with the AMDC. After the documents have been reviewed, the research organization receives an official confirmation of accreditation. On top of these institutional requirements, the members of these organizations must commit themselves to the strict data protection measures of the AMDC, which include refraining from any reidentification of individuals or firms and analysing microdata only for research purposes.
The Federal Statistics Act (BStatG § 31 (8)) lists a number of scientific institutions that meet the first three requirements (BStatG § 31 (7)). Within the first 12 months, more than 40 national and international research organizations successfully applied for accreditation with the AMDC, including Vrije Universiteit Amsterdam, the University of Gothenburg, and almost all Austrian universities, as well as national and international research institutes such as the Austrian Institute of Economic Research (WIFO), Complexity Science Hub Vienna (CSH), Geneva Graduate Institute, ifo Institute - Leibniz Institute for Economic Research Munich, Institute for Advanced Studies Vienna (IHS), and the Halle Institute for Economic Research (IWH). 8 Accreditation is typically valid for five years. An entity has to re-apply for accreditation only when it expires after five years or if there are significant changes, for instance in the legal structure or the main activities of the organization.

Application for a Research Project
The application for a research project is open to any employee of a research organization accredited by the AMDC. 9 As with the accreditation process, the project proposal to gain access to the AMDC is submitted via the AMDC online application. 10 For legal reasons, the application for a research project must include a research proposal and a justification for access to the requested data. In more detail, an AMDC project application has to include: a title and a brief description of the aim, research question and/or main hypotheses; analytical methods and expected results; a detailed justification of the selection of data sets and variables in reference to the aims, questions and hypotheses; 11 information on the researchers who want to access and work with the data (e.g. proof of employment with the accredited scientific institution); and the timeframe of the research project.
The AMDC reviews the proposal, checking both data protection concerns and the feasibility of the project. In this step, the AMDC not only checks all legal requirements but also ensures that the selected data sets and variables are compatible with respect to sample overlap (over statistical units and time) and external identifiers.
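The compatibility check described above can be illustrated with a minimal Python sketch. It is not the AMDC's actual procedure, which the paper does not specify. The class `DatasetCoverage`, the function `check_compatibility`, and the data set names are hypothetical, and the sketch only tests whether two data sets share an identifier type and at least one reference year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCoverage:
    """Hypothetical summary of a data set: its linking identifier and reference years."""
    name: str
    identifier: str      # e.g. "vbPK-AS" for persons
    years: frozenset     # reference years covered by the data set


def check_compatibility(a: DatasetCoverage, b: DatasetCoverage) -> list:
    """Return a list of problems; an empty list means no problem was detected."""
    problems = []
    if a.identifier != b.identifier:
        problems.append(f"{a.name} and {b.name} do not share an identifier")
    if not (a.years & b.years):
        problems.append(f"{a.name} and {b.name} have no overlapping years")
    return problems


# Invented example data sets (assumptions, not entries of the AMDC catalogue).
register_a = DatasetCoverage("register_a", "vbPK-AS", frozenset(range(2005, 2021)))
survey_b = DatasetCoverage("survey_b", "vbPK-AS", frozenset({2019, 2020, 2021}))
firm_c = DatasetCoverage("firm_c", "enterprise_id", frozenset({1999}))

print(check_compatibility(register_a, survey_b))
print(check_compatibility(register_a, firm_c))
```

A real review would also have to consider overlap at the level of statistical units, which requires access to the identifiers themselves and can therefore only happen inside the secure environment.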
After reviewing the whole application, the AMDC provides feedback to the researchers. 12 At this stage, revisions to the proposal are possible. Once the research proposal and the researchers fulfil all formal requirements and the AMDC has approved the final research proposal, the AMDC provides a formal offer for data access, which includes detailed costs to be borne by the research entity. 13 If the research entity accepts the offer, the contract is concluded. Before this formal offer, it is also possible to get an estimate of the costs of the research project, for instance for use in grant applications or other funding opportunities. After a successful application for remote access to microdata, access can be provided for a maximum duration of 5 years.
11 This step is required by the data minimisation principle, which the AMDC has to implement for all data requests. Based on the General Data Protection Regulation (Regulation (EU) 2016/679), microdata provided through the AMDC must be essential for the research project. For this purpose, the project proposal must justify the data requests and show that the scope of the data requested is necessary for adequately testing the hypotheses (see Section 2 Application Process and Project Work). In practice, this requires the research entity to detail in the project proposal the type of data that it intends to access. This requirement is also laid down in § 31 (3) of the Federal Statistics Act.
12 Legally, Statistics Austria is only allowed to review requests that pertain to microdata of Statistics Austria and/or microdata provided by the user (see Section 3 Accessible Microdata). Should the research entity request access to administrative microdata of federal governmental bodies (e.g. ministries), those bodies must review the request themselves. In this case, the request is still submitted via the online application tool and then forwarded to the data provider.
13 According to the Federal Statistics Act (BStatG § 32 (7)) the costs of the technical infrastructure of the AMDC are financed by the Federal Ministry of Education, Science and Research of Austria (German: Bundesministerium für Bildung, Wissenschaft und Forschung, BMBWF) with a budget of 505,000 Euro p. a. (2022). All other variable costs have to be financed by the research entity according to the Federal Statistics Act (BStatG § 32 (1)). These include costs for the application, consulting hours by Statistics Austria, creation of the research data body, the number of remote access software licenses, the number of statistical software licenses, etc. For a detailed description of the services offered by the AMDC (incl. rates per hour and tariffs) see: https://www.statistik.at/fileadmin/pages/1805/Katalog_der_Serviceleistungen.pdf (accessed May 26, 2023).
A notable asset of the AMDC is that a research entity can request microdata not only from a single data set but from any combination of data sets available in the AMDC. The key advantage is that different microdata sets can be connected to one another via deterministic linking. For instance, when working with person data in the AMDC, the deterministic linking relies on a specific encrypted identifier (German: verschlüsseltes bereichsspezifisches Personenkennzeichen Amtliche Statistik, vbPK-AS) 14 that is provided by the Austrian Identifier Registry Authority (German: Stammzahlenregisterbehörde). 15 The pseudonymization with the vbPK-AS is the precondition for both the protected data use and the deterministic linking of the data on persons within the secure environment of the AMDC. Hence, only data that are pseudonymized with the vbPK-AS can be processed. When working with enterprises, Statistics Austria itself is responsible for creating an encrypted enterprise identifier (German: verschlüsselte Unternehmenskennzahl). If researchers provide data on companies with a usable identifier (e.g. the enterprise number, German: Firmenbuchnummer) to Statistics Austria, this identifier is replaced with the encrypted enterprise identifier. Thereafter, the data can be linked to other microdata sets that also use this identifier within the secure environment of the AMDC.
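The principle of deterministic linking via a pseudonymized identifier can be sketched in a few lines of Python. This is an illustration only. How the vbPK-AS or the encrypted enterprise identifier is actually derived is not described in this paper, so the keyed hash (HMAC-SHA256) below is merely a stand-in, and the functions `pseudonymize` and `link_exact`, the key, and all records are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(direct_id: str, secret_key: bytes) -> str:
    """Stand-in for an encrypted identifier: a keyed hash of the direct identifier.

    Assumption for illustration only; this is NOT how the vbPK-AS is derived.
    """
    return hmac.new(secret_key, direct_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link_exact(left: list, right: list, key: str) -> list:
    """Deterministic linking: inner join of two record lists on exact equality of `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    linked = []
    for row in left:
        for match in index.get(row[key], []):
            linked.append({**row, **match})
    return linked


# Hypothetical key held only by the body that performs the pseudonymization.
KEY = b"hypothetical-secret-key"

# Invented records: direct identifiers are replaced before any linking takes place.
register = [
    {"pid": pseudonymize("person-001", KEY), "birth_year": 1980},
    {"pid": pseudonymize("person-002", KEY), "birth_year": 1975},
]
researcher_data = [
    {"pid": pseudonymize("person-002", KEY), "survey_score": 7},
    {"pid": pseudonymize("person-003", KEY), "survey_score": 4},
]

linked = link_exact(register, researcher_data, "pid")
print(len(linked), linked[0]["birth_year"], linked[0]["survey_score"])
```

Because the same direct identifier always maps to the same pseudonym, records match exactly or not at all. No probabilistic matching on names or addresses is involved, which is what distinguishes deterministic from probabilistic record linkage.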

Research Project Work
The AMDC provides data access via a secure Remote Research Environment (RRE). In no instance is microdata ever sent to researchers; rather it is only available within the RRE, which is located on servers of Statistics Austria.
Hence, after researchers sign the contract, which sets out all the conditions for the project (including costs, start and duration of the research and commitment to data protection), the AMDC prepares an RRE for the research project and compiles the requested microdata. After all preparations are completed, users go through an onboarding process, which is a technical introduction to using the RRE. The AMDC connects the researcher to the RRE via a Virtual Desktop Infrastructure in a "terminal server" solution, similar to those employed by the microdata centres of the Netherlands, Finland and Denmark (Reuter and Museux 2010). Logging in to and working in the RRE is possible 24/7 during the whole timeframe of the research project except for scheduled or urgent system maintenance work.
The physical entry point to the AMDC RRE, according to the Federal Statistics Act (BStatG § 31 (7 4)), requires researchers to be located in a separate and lockable room at the accredited research entity. The main objective is that there must be no risk of unauthorized viewing of data or observation of research activities. In practice, this means that the access point may not be located in an open or public space at the research entity and that home office use of the AMDC is not possible under the current legislation.
Researchers are unable to add software or data to the RRE, or remove them from it, by themselves. The AMDC provides statistical and data analytics software, which includes, as of 2023: SPSS, Stata, R (RStudio Desktop) and Python (Spyder). Upon request by the research entity, special statistical software can be installed for a fee covering the corresponding costs. Additionally, LibreOffice, Jupyter Notebook and a text editor are available. External data or analysis code provided by researchers is imported by the AMDC after a security and data protection check. Finally, when fulfilling requests to export intermediate results of research projects (e.g. for writing a research paper), the AMDC checks all tables and graphs against the strict data protection guidelines ("output control"). Simply put, only outputs in which no individuals or firms are (indirectly) identifiable are permitted to leave the RRE. 16 Once they pass output control, the outputs are provided to the research entity via a secure data exchange service.
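To make the idea of output control more concrete, the following Python sketch applies a minimum cell count rule to a frequency table. The paper does not state the rules the AMDC applies. The threshold of 5 and the functions `frequency_table`, `passes_output_control` and `suppress_small_cells` are assumptions chosen for illustration, following a threshold rule commonly used in statistical disclosure control.

```python
from collections import Counter

# Assumed threshold for illustration; the AMDC's actual guidelines are not given here.
MIN_CELL_COUNT = 5


def frequency_table(records: list, by: str) -> Counter:
    """Count records per category of the variable `by`."""
    return Counter(row[by] for row in records)


def passes_output_control(table: dict, min_count: int = MIN_CELL_COUNT) -> bool:
    """A table passes only if every cell is based on at least `min_count` units."""
    return all(n >= min_count for n in table.values())


def suppress_small_cells(table: dict, min_count: int = MIN_CELL_COUNT) -> dict:
    """Replace counts below `min_count` with None so that they are not released."""
    return {cell: (n if n >= min_count else None) for cell, n in table.items()}


# Invented example: 12 units in region A, 3 units in region B.
records = [{"region": "A"}] * 12 + [{"region": "B"}] * 3
table = frequency_table(records, "region")

print(passes_output_control(table))
print(suppress_small_cells(table))
```

In this invented example the cell for region B rests on only three units, so the table as a whole would not pass, and a released version would have to suppress or aggregate that cell.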
After the research phase, the data and all scripts and logs of the research project will be archived and stored for five years for the purpose of possible revisions (e.g. during journal review processes). For the purpose of replicability of results, this time period can be extended. The researchers have to cover the costs for the extended storage of all files. During the storage period researchers can apply for access to their archived files.

Accessible Microdata
In general, the AMDC provides access to a wide range of microdata derived from three different sources (for the structure of the AMDC see Figure 1):
1. Microdata of Statistics Austria
2. Microdata of the Federal State
3. Microdata of the Research Entity
First, the AMDC grants access to microdata of Statistics Austria. This microdata includes not only the microdata generated by Statistics Austria through surveys but also data from many administrative registers that are used by Statistics Austria for the production of official statistics, such as the Central Population Register (German: Zentrales Melderegister, ZMR) as well as the complete corporate, income and value added tax data (for an overview of the microdata of Statistics Austria provided in the AMDC, see Table 1 in the Appendix). 17 In addition, research entities may request access to additional administrative data from the federal state based on the Research Organization Act (German: Forschungsorganisationsgesetz, FOG, § 38b). The rules in this second track of data access are different, as the precondition for access to these data is a FOG regulation issued by the responsible ministry together with the Federal Ministry of Education, Science and Research (German: Bundesministerium für Bildung, Wissenschaft und Forschung, BMBWF). In this case, in line with its one-stop-shop approach, the AMDC forwards the data request to the responsible federal ministry, acting as a liaison with the research entities and communicating the decisions rendered by these ministries.
The final data track is microdata provided by the research entities themselves, which can be linked to the other data. A precondition for using this data source is the removal of all direct identifiers and pseudonymization with a specific encrypted identifier (e.g. vbPK-AS; see above) or any available firm identifier compatible with AMDC data. The AMDC uses individual and firm level pseudonyms to make data sets linkable.
The AMDC microdata catalogue holds time series starting from the early 2000s. Researchers' requests for longer observation periods cannot be met in the foreseeable future as the legal foundation for safely linking data by the encrypted identifier
17 For meta information on all currently available microdata sets see https://www.statistik.at/amdcdata/ (accessed May 26, 2023).
COVID-19 vaccinations and data from COVID-19 infections will be available in the AMDC, creating research opportunities to analyse public health measures during the pandemic and the long-term effects of a COVID-19 infection. With respect to business statistics, with data including the companies register, corporate tax statistics, trade statistics and foreign affiliates statistics, the AMDC provides remarkable opportunities to conduct excellent business-related research, for instance when it comes to the analysis of the development of productivity or factors of success. The coordinated employment statistics allow for linking firm level data to individual level data. In this regard, they enable researchers to analyse data on the firm and employer levels, while simultaneously changing perspective and including individual, family and household levels (see Table 1).
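The change of perspective between the firm level and the individual level can be illustrated with a small Python sketch. All records, field names (`pid`, `fid`, `wage`, `sector`) and the functions `firm_level_view` and `person_level_view` are invented for this purpose and do not reflect the actual structure of the coordinated employment statistics.

```python
from collections import defaultdict

# Invented employment records: one row per person, firm and year (pseudonymized ids).
employment = [
    {"pid": "p1", "fid": "f1", "year": 2020, "wage": 30000},
    {"pid": "p2", "fid": "f1", "year": 2020, "wage": 50000},
    {"pid": "p3", "fid": "f2", "year": 2020, "wage": 40000},
]
# Invented firm characteristics, keyed by the pseudonymized firm identifier.
firms = {"f1": {"sector": "manufacturing"}, "f2": {"sector": "services"}}


def firm_level_view(employment: list, firms: dict, year: int) -> dict:
    """Aggregate person-level rows to one row per firm (headcount and mean wage)."""
    wages = defaultdict(list)
    for row in employment:
        if row["year"] == year:
            wages[row["fid"]].append(row["wage"])
    return {
        fid: {**firms.get(fid, {}), "employees": len(w), "mean_wage": sum(w) / len(w)}
        for fid, w in wages.items()
    }


def person_level_view(employment: list, firms: dict, year: int) -> list:
    """Attach the employer's characteristics to each person-level row."""
    return [
        {**row, **firms.get(row["fid"], {})}
        for row in employment
        if row["year"] == year
    ]


by_firm = firm_level_view(employment, firms, 2020)
by_person = person_level_view(employment, firms, 2020)
print(by_firm["f1"])
print(by_person[0])
```

The same linked records thus support both an employer-side analysis (one row per firm) and an employee-side analysis (one row per person with employer attributes attached).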

Status and Outlook as of 2023
The potential for further development of the AMDC is very promising, for instance by linking existing and new data sets or expanding the data available to new data sources and new topical areas. First, within the AMDC it is possible to link individual level data (e.g. employees) to firms (employers) (Abowd and Kramarz 1999; Goetz et al. 2015; Weinhardt et al. 2017). Here, the AMDC can rely on unique identifiers, which can consistently be used for past, current and future data. Second, in terms of the topical coverage of the AMDC, it is expected that the microdata sets will expand to include a number of health and socio-economic characteristics. The sources include data from the public administration, data from research entities and even (tailor-made) survey data to fill current gaps. 18 In this context, one highlight is the Austrian Socio-Economic Panel (ASEP), which will start its full operations in 2024. The aim of ASEP is to establish a longitudinal household panel comprising an annual household survey that is complemented with register-based data, thus allowing researchers to fully benefit from the linkage of numerous existing and new data sources from public administration and data held by Statistics Austria. 19 The introduction of a domain-specific unique personal identifier (e.g. the vbPK-AS for Statistics Austria) in the Austrian public administration, legally implemented in 2004, 20 opened doors to link data within and across the Austrian public administration and, in turn, for academic research. Since its implementation, the identifier has been gaining importance in administrative bodies. With the implementation of new public digital registers (e.g. the Austrian Vital Statistics Registry in 2015), the number of linkable data sets is still growing.
As of 2023, data available in the AMDC are based primarily on data from within the statistical production process (source 1, see Section 3). It is expected that the second source, additional administrative data from the federal state, will grow over time, as separate legal acts by the responsible ministries are the precondition for the use of these data. In order to maximize the potential of the AMDC for excellent research and evidence-based scientific policy advice, access to these data must be released by the responsible ministries through FOG regulations. Since the ministries have committed themselves to data access for science, especially in the context of the current crises, the authors assume that the ministries will issue such regulations in a timely manner. Ultimately, this will also further strengthen Austria as a location for science.

Thomas, T., Heß, M., and Wagner, G.G. (2017). Reluctant to reform? A note on risk loving of politicians and bureaucrats. Rev. Econ. 68: 167-179.
Weinhardt, M., Meyermann, A., Liebig, S., and Schupp, J. (2017). The linked employer-employee study of the socio-economic panel (SOEP-LEE): content, design and research potential. Jahrb. Natl. Stat. 237: 457-467.