
Introduction: Digital History

Ulrich Fritsche and Mark Spoerer

Abstract

New digital methods are currently expanding the historian’s toolbox in fundamental ways. This thematic issue is a collection of papers discussing case studies in the fields of digitization, optical character recognition, distant reading, text mining, network analysis, and historical geographical information systems. The papers discuss opportunities and limitations of applying digital methods in historical studies and point out fields of future application.

JEL Classification: C 49; C 88; N 01

In Google’s Ngram Viewer, the keyword “digital history” first appears in 1993, then again in 2000 and 2003, and from 2006 onward its frequency has risen steeply, a rise that continues to this day. [1] As early as the 1990s, institutions emerged in the United States, such as the Virginia Center for Digital History, that took it upon themselves to make historical documents accessible to a broad interested public via the emerging and rapidly expanding World Wide Web.

While such an understanding of the subject goes in the direction of public history, digital history can also be read as a methodological programme, as formulated by Hannu Salmi, for example:

“Digital history is an approach to examining and representing the past; it uses new communication technologies and media applications and experiments with computational methods for the analysis, production and dissemination of historical knowledge.” [2]

He thus distinguishes between the representation or visualization of history on the one hand and new historiographical methods on the other. In a similar vein, Torsten Hiltmann distinguishes medial from conceptual digitality. [3] We are interested in the latter: experiments with computational methods for the analysis, production, and dissemination of historical knowledge are the focus of this thematic issue of the Yearbook.

The workshop “Digital Methods in History and Economics”, held online on 14 and 15 October 2022, was an integral part of the DFG priority programme 1859 “Experience & Expectation: Historical Foundations of Economic Behaviour”. The workshop aimed to bring together scholars from different disciplinary backgrounds to discuss a range of applications of digital methods, with a special emphasis on “text as data”. The purpose of this interdisciplinary workshop was to foster exchange between the different fields as well as between the researchers involved in the priority programme and external scholars. Several essays presented and discussed during the workshop have found their way into this issue.

The young field of digital history is not yet sharply defined. The basis for most applications is the existence of digitized documents. While in the early days these were often mere digital images, today it is almost always necessary to convert them into strings of alphanumeric characters by optical character recognition (OCR). A particularly sophisticated study, in which numerical data are extracted semi-automatically from digitized sources for econometric analysis, is presented in the paper by Jeremy Atack, Robert A. Margo, and Paul W. Rhode.

The paper documents very thoroughly the elaborate digitization of one of the most important data sources on technological revolutions and industrialization, the “Hand and Machine Labor” study (HML study), commissioned by the United States Department of Labor in 1899. The authors describe in detail the path from OCR-based digitization to a structured dataset that allows an econometric-quantitative analysis of the effects of mechanization. To control for possible endogeneity, they use instrumental variable estimators and construct instruments based on text-analytic methods. The paper offers insights into the research process while also providing interesting results regarding the quantification of productivity gains based on digitized sources.
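
To give a schematic impression of the econometric step, the following minimal sketch in R estimates an instrumental variable regression with the AER package. The variable names and the simulated data are hypothetical and do not reproduce the authors’ specification.

    library(AER)   # provides ivreg() for two-stage least squares

    # Simulated stand-in for the digitized HML records (illustrative only)
    set.seed(1)
    n <- 200
    text_instrument   <- rnorm(n)                          # instrument built from text features
    mechanization     <- 0.8 * text_instrument + rnorm(n)  # possibly endogenous regressor
    productivity_gain <- 0.5 * mechanization + rnorm(n)
    hml <- data.frame(productivity_gain, mechanization, text_instrument)

    # Mechanization instrumented by the text-derived variable
    iv_fit <- ivreg(productivity_gain ~ mechanization | text_instrument, data = hml)
    summary(iv_fit, diagnostics = TRUE)   # weak-instrument and Wu-Hausman diagnostics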

While Atack, Margo, and Rhode controlled the entire process from digitization to analysis themselves, researchers often have to rely on already existing digitized text corpora. Especially with text corpora that were created relatively early, using procedures that are outdated from today’s perspective, one must expect considerable error rates. [4] Jørgen Burchardt has investigated this more closely for a number of newspapers.

Burchardt’s paper explores which methods are suitable and established for digitizing journal archives, what difficulties arise, and what challenges and pitfalls researchers face. Beyond a discussion of advantages and disadvantages drawn from the technical literature, the essay presents the results of his own – admittedly sample-based but methodologically well-founded – investigation, which meticulously counts and evaluates the errors of machine processing. This makes it much easier to estimate the quality of the material. The test revealed several weaknesses in the search process, including an average error rate of 18 percent for single words in body text and far higher error rates for advertisements. Although these errors can be reduced by re-digitization with improved software and new search algorithms, searches will nevertheless return error-prone results. Database owners therefore need to provide thorough metadata so that researchers can assess bias and search efficiency and exercise source criticism.
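
To illustrate how such error rates can be quantified, the following sketch in R compares a handful of OCR tokens with a hand-checked transcription. The strings are invented, and real evaluations such as Burchardt’s align the two texts before counting errors.

    # Five invented token pairs; real evaluations align OCR output and transcription first
    ground_truth <- c("the", "factory", "employed", "forty", "workers")
    ocr_output   <- c("the", "faetory", "employed", "fortv", "workers")

    # Word-level error rate: share of tokens that do not match the transcription
    word_error_rate <- mean(ocr_output != ground_truth)

    # Character-level error rate via Levenshtein distances (base R adist)
    char_errors     <- sum(diag(adist(ocr_output, ground_truth)))
    char_error_rate <- char_errors / sum(nchar(ground_truth))

    c(word = word_error_rate, character = char_error_rate)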

Digitized and machine-readable documents can – after some preprocessing steps – be examined with text mining methods. Nowadays, these go far beyond pure word counts. Topic models, for example, can examine huge text corpora and identify topics based on a generative language model in which each word has a well-defined probability of belonging to a topic, and each topic a well-defined probability of occurring in a document. A well-known mixed-membership model for language is, for example, the Latent Dirichlet Allocation (LDA) model. [5] This simple method is unsupervised (agnostic to exogenous information), which has both advantages and disadvantages. In any case, the user must assign themes – and thus meaning – to the topics produced by the software. [6]
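
A minimal sketch in R with the topicmodels package may illustrate the workflow; the toy corpus and the number of topics are purely illustrative, while real applications work with thousands of documents and carefully chosen parameters.

    library(tm)           # corpus handling
    library(topicmodels)  # LDA implementation

    docs <- c("prices wages inflation market",
              "railway steam engine machine labour",
              "tariff trade customs market prices")
    corpus <- VCorpus(VectorSource(docs))
    dtm <- DocumentTermMatrix(corpus)

    lda_fit <- LDA(dtm, k = 2, control = list(seed = 42))
    terms(lda_fit, 4)   # most probable words per topic
    topics(lda_fit)     # most probable topic per document

The output of terms() makes the caveat above concrete: the software returns nothing but ranked word lists, and it is the researcher who has to decide which theme, if any, a topic represents.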

Text mining methods allow the analysis of huge text corpora that an author or even a team of authors would not be able to read in a lifetime (“distant reading”). While distant reading is necessary to identify the broad lines, old-fashioned “close reading” often remains necessary to interpret the findings. [7] In his contribution to this volume, Anselm Küsters analyses the flagship publication of the “Freiburg School” of economists, who advocated a basically capitalist economic system framed by regulations designed to prevent the abuse of market power, using both simple keyword searches and sophisticated topic modelling methods.

In his work, he goes far beyond a descriptive analysis. First, he uses statistical methods to work out distinctive characteristics of the texts over time (decades) and across authors and – drawing on a very good knowledge of the debates about ordoliberalism – verifies them with methods of classical content analysis. Second, he goes beyond the largely atheoretical LDA models by resorting to the relatively new class of structural topic models (STM). [8] This allows him to describe changes in topic prevalence as a function of exogenous variables, and to describe word usage within a topic as a function of the author. In doing so, statistical inference can be used to make well-substantiated statements about significant differences, opening up a new dimension of mixed-methods analysis.
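
The stm package in R is the reference implementation of structural topic models. The following sketch shows how prevalence and content covariates enter the model; the data frame texts with the columns text, decade, and author is hypothetical, so the snippet is schematic rather than a reproduction of Küsters’s setup.

    library(stm)

    # 'texts' is a hypothetical data frame with columns text, decade and author
    processed <- textProcessor(texts$text, metadata = texts)
    out <- prepDocuments(processed$documents, processed$vocab, processed$meta)

    stm_fit <- stm(documents = out$documents, vocab = out$vocab, K = 20,
                   prevalence = ~ s(decade),  # topic prevalence varies smoothly over time
                   content = ~ author,        # word use within topics varies by author
                   data = out$meta)

    # Statistical inference on how topic proportions change across decades
    effects <- estimateEffect(1:20 ~ s(decade), stm_fit, metadata = out$meta)
    summary(effects)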

Another fascinating feature of text mining methods is that they allow categorized information to be extracted from only vaguely structured text. For that purpose, Lynn Rother, Fabio Mariani, and Max Koss use machine-learning techniques with a special focus on semantic entities and related events.

The route taken here differs from a corpus-linguistic approach, which is primarily concerned with analysing word distributions on the basis of corpora (collections of semantically rich documents). Here the focus is on the semantic identification of events, roles, and actions based on machine learning or artificial intelligence methods. The resulting structured (assigned, classified) events form the basis for a subsequent quantitative analysis. As an example, the authors use provenance data to investigate gender roles and inheritance in the transfer of artworks.
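
How semantic information can be pulled out of a provenance string may be illustrated with a rough sketch using spacyr, an R wrapper around the spaCy library (which has to be installed separately). The sentence and the model choice are purely illustrative; the authors’ pipeline goes beyond plain named-entity recognition of this kind.

    library(spacyr)                 # R wrapper around the Python spaCy library
    spacy_initialize(model = "en_core_web_sm")

    provenance <- "Sold by the artist to Paul Durand-Ruel, Paris, 1891; by descent to his daughter."
    entities <- spacy_extract_entity(provenance)
    entities[, c("text", "ent_type")]   # e.g. PERSON, GPE (place), DATE

    spacy_finalize()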

A similar approach is taken by Alexander Engel. As part of a larger research project, the complete issues of the “Avisblatt”, a weekly advertisement paper for Basel, from 1729 to 1844 have been digitized and transformed into a database of classified ads in machine-readable characters using the transcription software Transkribus. The research question of the contribution presented in this issue is what influence income changes had on the consumption behaviour of three generic household types (upper middle class, lower middle class and lower class). Engel approximates the income changes using price series for the main consumer goods, weighted by a class-specific consumer basket.

For the analysis of these behavioural changes and of other questions in the larger project, Engel and his colleagues use a dictionary-based approach to classify ads, which they refer to as dynamic tagging. In a first step, they specify vocabulary that is distinctive for pre-specified household consumption or behaviour categories (e.g. offer or request ads concerning consumables, clothing, furniture, securities; loans, jobs, housing). In a second step, they programme R scripts that operate as lexical classifiers. Each “tag filter” defined in this way contains two dictionaries, one with sequences of characters for ads they want to include and the other for excluding “unwanted by-catch”. Third, these R scripts are used to analyse the text corpus of 850,000 advertisements. In this paper, quarterly changes in advertisement activity are assessed, e.g. whether the ads offering clothing increased or decreased in the second quarter of 1773. Finally, with the help of regression analyses, Engel shows in a differentiated way how income changes affected the supply and demand behaviour of the three household types (as expressed in changes in the ads’ frequency). A major methodological advantage of this approach is that the R scripts are transparent, reproducible, consistent, and falsifiable. Moreover, they are extremely flexible and can thus easily be adapted to changing research criteria or questions.
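
A minimal sketch in base R may illustrate the principle of such a tag filter with one inclusion and one exclusion dictionary. The ads and the character sequences are invented and far shorter than the project’s actual dictionaries.

    # Three invented ads; the real corpus holds about 850,000
    ads <- data.frame(
      text    = c("Ein guter wollener Rock zu verkaufen",
                  "Gesucht: eine Magd für die Rockenstube",
                  "Zu verkaufen: ein seidenes Kleid"),
      quarter = c("1773 Q2", "1773 Q2", "1773 Q3"),
      stringsAsFactors = FALSE
    )

    include <- c("rock", "kleid", "hut")   # character sequences that signal clothing ads
    exclude <- c("rockenstube")            # unwanted "by-catch" to drop again

    tag_clothing <- function(x) {
      hit  <- grepl(paste(include, collapse = "|"), x, ignore.case = TRUE)
      drop <- grepl(paste(exclude, collapse = "|"), x, ignore.case = TRUE)
      hit & !drop
    }

    ads$clothing <- tag_clothing(ads$text)
    aggregate(clothing ~ quarter, data = ads, FUN = sum)   # tagged ads per quarter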

The ever-increasing availability of structured historical data offers entirely new possibilities for combining them. Relationships between actors (people, firms, etc.), places, and objects can thus be systematically related to one another in order to explore the density or orientation of a network. [9] An important measure for understanding how central an element (a node) of a network is – a person, a group, or a firm – is betweenness centrality. Put simply: the more often a node lies on the shortest paths connecting other nodes of the network, the higher its betweenness centrality.
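
A toy example in R with the igraph package illustrates the measure; the network is invented.

    library(igraph)

    # Invented network of four towns
    edges <- data.frame(
      from = c("Lübeck",  "Hamburg", "Hamburg",  "Bremen"),
      to   = c("Hamburg", "Bremen",  "Lüneburg", "Lüneburg")
    )
    g <- graph_from_data_frame(edges, directed = FALSE)

    # Nodes lying on many shortest paths between other nodes score highest
    sort(betweenness(g), decreasing = TRUE)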

Network relationships can also be visualized and analysed using geographical information systems (GIS) software. The paper by Bart Holterman and Angela Huang employs both network analysis and GIS tools (QGIS).

In the premodern era, trade suffered from comparatively high transaction costs, in particular transport costs. To assess travel times and costs in long-distance trade more accurately, Bart Holterman, Angela Huang, and an international team set up the Viabundus dataset and web map, which has been online since December 2021. [10] This database currently covers parts of Northern Europe (mostly around the Baltic Sea) and focuses on overland trade routes (actual, not aerial, distances) and node-related data that guided traffic – such as the town status of settlements, tolls, staples, and fairs – for the period from 1350 to 1650. The article makes use of this dataset in an exploratory analysis. The idea is that visualizing these features on maps helps in understanding the structure, functioning, and development of the premodern trade system. In their contribution to this volume, Holterman and Huang also employ GIS software to look at the betweenness centrality of fairs, staple markets, and toll stations. While the analysis confirms that fairs and staple markets featured a high centrality, toll stations, somewhat surprisingly, did not, which in turn leads to new research questions.
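
The kind of routing question such a dataset supports can be sketched with igraph in R. The network, the distances, and the assumed daily travel speed below are invented and are not taken from Viabundus.

    library(igraph)

    # Invented route segments with overland distances in km
    routes <- data.frame(
      from = c("Lübeck",  "Hamburg", "Hamburg",  "Bremen"),
      to   = c("Hamburg", "Bremen",  "Lüneburg", "Lüneburg"),
      km   = c(65, 120, 50, 95)
    )
    g <- graph_from_data_frame(routes, directed = FALSE)

    # Shortest road distance between two towns, weighted by km
    distances(g, v = "Lübeck", to = "Bremen", weights = E(g)$km)

    # Rough travel time, assuming about 30 km per day by cart (an assumption)
    distances(g, v = "Lübeck", to = "Bremen", weights = E(g)$km) / 30

    # Distance-weighted betweenness: which nodes do many shortest routes pass through?
    sort(betweenness(g, weights = E(g)$km), decreasing = TRUE)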

The authors argue that methods of network analysis and GIS mapping do not just visualise data. In both cases, the software groups large amounts of data and produces data structures which offer new insights and avenues for research. In other words, these methods are powerful heuristic devices that help scholars see the forest for the trees.

We are convinced that digital methods will establish themselves in historical scholarship in the coming years. They will not replace the traditional canon of methods, but rather complement it. Maybe in two decades, only historians of historiography will still be talking about “digital history”, precisely because by then histor(iograph)y will (also) be digital.

About the authors

(Prof. Dr.) Ulrich Fritsche

Ulrich Fritsche studied economics at Freie Universität Berlin (1996 Dipl.-Vw., 2003 Dr. rer. pol.). He was a freelancer at the HWWA Institute in Hamburg in 1997 and a staff member at DIW Berlin (1998-2008), with a focus on transition and developing countries, business cycle analysis and forecasting, as well as empirical economic research and quantitative methods. In 2003, he worked at UNCTAD in Geneva. In 2005, he became an assistant professor and in 2009 a full professor at the University of Hamburg. He has held visiting positions at the IMF and at KOF ETH Zurich. For several years, he has been more intensively involved with quantitative computer-assisted text analysis in his research.

(Prof. Dr.) Mark Spoerer

Mark Spoerer studied history and economics in Bonn (1987 M.A., 1991 Dipl.-Vw.), with academic visits to Barcelona (1998) and Paris (2006, 2008-11). Since 2011, he has held the newly created Chair of Economic and Social History at the Institute of History of the University of Regensburg. He has been chairman of the Gesellschaft für Sozial- und Wirtschaftsgeschichte (GSWG) since 2017 and managing editor of the Vierteljahrschrift für Sozial- und Wirtschaftsgeschichte (VSWG) since 2018. His research interests include the economic, business, and social history of Germany and Europe since the late 18th century.

Published Online: 2023-04-15
Published in Print: 2023-05-25

© 2023 Ulrich Fritsche/Mark Spoerer, published by De Gruyter

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
