The Archaeological Distribution of the Cuneiform Corpus

The present study offers a first comprehensive, quantifiable overview of the geographical extent and scale of the cuneiform corpus. Although cuneiform is one of the oldest and longest-lived scripts in history, the sheer size of its corpus, among the largest discrete bodies of written source material from the pre-modern world, is seldom properly appreciated. We review and evaluate past quantitative assessments of the corpus and current levels of catalogue digitisation and integration, pointing to gaps in general catalogues and to the principal issues relating to the quantification and interrogation of textual sources at the corpus level. Combining a newly developed open access spatial index of c. 600 locations across Europe, Asia, and Africa where cuneiform texts have been found with a quantitative survey of reported finds from the scholarly literature, we then discuss the formation of the cuneiform corpus as an archaeological artefact. Drawing on an extremely broad diachronic and diatopic outlook on a uniquely large body of written source material, this study offers a novel perspective on written corpora as archaeological artefacts.


Introduction
"...and either dismissed it as naive and amateurish or, by way of an answer, merely mumbled something vague about a few hundred thousand texts?" (Streck 2010: 36)
Known inscriptions from the cuneiform world, retrieved through the scientific excavation and clandestine digging of archaeological sites across the Middle East over the last two centuries or so, are said to number anywhere from a couple of hundred thousand to several million artefacts (for various estimates, some of which are discussed below, see e.g. Pallis 1956: 185-187; Kramer 1962: 299-301; Van De Mieroop 1999: 10-13; Peust 2000; Streck 2010; Reade 2017: 163; Michel 2020: 25). Trivial as such figures may seem, serious scrutiny of them is curiously absent not only from the vast majority of introductions to the study of the cuneiform world, but also from related disciplines and comparative research. Authoritative studies on the emergence and spread of […] comprehensive quantitative survey of the geographical distribution of cuneiform inscriptions in archaeological terms. In our discussion, we review some key points arising from our analyses, centring in particular on the characteristics of spatial distribution. The thrust of our argument concerns corpus representativity and authority, in an attempt to offer a first measure of the intensity of the light that our sources shine on the past.

Metrics of Corpus Scale
We should begin by defining the units of measure and some issues of nomenclature. In the following, we use 'inscription' to refer to the inscribed artefact as a delineable, physical object. By 'text', we refer to the composition as an entity, not its physical medium. The traditional use of the latter term in cuneiform studies is largely interchangeable with the use of 'inscription' in a range of related epigraphical subdisciplines, where 'text' refers more specifically to the philological element, i.e., the actual writing (for a formal discussion of this ontology, see, e.g., Avanzini et al. 2019: 5-7). A thorough review of this semantic conundrum as it applies to the study of cuneiform is beyond the remit of this study (see Michel 2021 for an insightful review). In using 'inscription', we simply wish to recognise the need to distinguish clearly between the text and its medium.
As cautioned above, estimates of corpus size are, or at least should be, subject to considerable qualification, and should be considered with reference to several parameters. A single metric is poorly equipped to convey the essential empirical differences between the Natural History of Pliny the Elder and an inscribed Egyptian scarab: both would produce the same value (1) if counting inscriptions, but wildly different values (upwards of 1,000,000 against 1) if counting words. The former metric emphasises materiality and place, the latter lexical wealth and information content. Measures of corpus size are, in other words, dictated by the divergent empirical and epistemological positions separating subfields more inclined towards epigraphy (the study of inscriptions) and philology (the study of texts), respectively. Corpus word count may be a more meaningful yardstick when assessing the prevalence of Greek or Latin corpora, as both hold several extraordinarily large textual compositions that are duplicated across a large number of manuscripts. As repetitions of the same textual entity would inflate the statistically relevant number of unique attestations of discrete works or of a single word, word counts are taken over unique compositions only, which makes them particularly relevant for assessing the relative prevalence of linguistic entities in a corpus (Peust 2000: 252). This approach therefore also emphasises an etic attitude towards philological and linguistic analyses, in which language takes precedence over materiality.
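To make the contrast between the two metrics concrete, the following minimal Python sketch computes both an inscription count and a word count over a small, entirely hypothetical set of records. Following the premise discussed in Peust (2000), the word count is taken over unique compositions only, so that duplicate manuscripts of one work do not inflate it. The record fields (`artefact_id`, `composition_id`, `words`) are illustrative assumptions, not a schema used by any catalogue discussed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A hypothetical inscribed artefact (illustrative fields only)."""
    artefact_id: str               # the physical object, e.g. a museum number
    composition_id: Optional[str]  # the text it carries; None if not identified
    words: int                     # words legible on this artefact


def inscription_count(records: list[Record]) -> int:
    """Count distinct physical artefacts (the epigraphical metric)."""
    return len({r.artefact_id for r in records})


def word_count_unique_compositions(records: list[Record]) -> int:
    """Sum words over unique compositions (the philological metric).

    Duplicate manuscripts of one composition are counted once, using the
    longest attested copy; artefacts without an identified composition
    are each treated as a composition of their own.
    """
    best: dict[str, int] = {}
    for r in records:
        key = r.composition_id or f"artefact:{r.artefact_id}"
        best[key] = max(best.get(key, 0), r.words)
    return sum(best.values())


# Hypothetical toy corpus: three copies of one long work, two short unique inscriptions.
corpus = [
    Record("A1", "work-1", 1200),
    Record("A2", "work-1", 1150),
    Record("A3", "work-1", 900),
    Record("B1", None, 3),
    Record("B2", None, 1),
]

print(inscription_count(corpus))               # 5
print(word_count_unique_compositions(corpus))  # 1204
```

In the inscription count, each copy of the long work counts once and weighs exactly as much as a short inscription. In the word count, the copies collapse into a single composition that dominates the total. Which behaviour is preferable depends on the disciplinary question being asked.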
Corpus inscription count, by contrast, is a very common metric when dealing with corpora such as cuneiform, Runic, or Old Arabic. Here, the individual inscription is almost always unique and often carries a very tangible, typically archaeological, provenience. This allows us to place the creation, use, and deposition of the inscription in space and time, often with a high level of accuracy. Introducing a word count metric here is possible in theory, if mostly barred in practice by a lack of properly refined data. Word counts are also hampered by issues of definition at the micro-level: how to isolate discrete words, how to approach fragments carrying only a grapheme or two, and so on. Inscription counts, however, relate the prevalence of writing within a given material culture horizon more meaningfully, which makes them more immediately relevant to the indexing and evaluation of archaeological data.
Different metrics, in short, cater to diverging disciplinary outlooks. Word count may provide a fuller appreciation of the evidential sample available to students of languages, as well as a more accurate measure of the full extent of textual information nested within a corpus. Inscription count, when accounting for physical and cultural processes of preservation and decay, is likely to give a better impression of the spatial, chronological, and ultimately social permeability of writing from a material perspective. It is also a more reliable vector for assessing the prevalence of writing within the material culture horizons of a given society. This can be inferred from several studies of the production and consumption of texts in more recent historical periods, where the number of discrete works, rather than their content, is the adopted unit of measure (Buringh/van Zanden 2009; Xu 2013). Indeed, book production as a measure of learning, education, and economic development worldwide has been central to the policymaking of various agencies of the United Nations ever since the end of the Second World War (Giton 2016: 51-53).

Measures of the Cuneiform Corpus
The basic definition of 'cuneiform', as the writing system has been known to European scholars since the seventeenth century CE (Pallis 1956: 18-27), is a type of writing produced by pressing a stylus into a surface of damp clay. This action produces wedge-shaped elements (cuneus is Latin for 'wedge') that are combined to form signs of varying degrees of complexity. Even though their appearance remains firmly tied to the physical properties of clay, cuneiform signs were also used on wax writing-boards, incised on stone and metal surfaces, incorporated as a graphic element in frescoes and glazes, and even, though rarely, written in ink (for examples of the latter, see Finkel/Taylor 2015: 87). As a particular writing system, cuneiform is attested from around 3,200 BCE until c. 80 CE. It served as the carrier of a wide range of Semitic, Indo-European, and isolate languages, including Sumerian, Eblaite, Akkadian, Hurrian, Urartian, Hittite, Luwian, Palaic, Hattic, Elamite, and Old Persian (augmented from Edzard 1976-1980: 545). To these are added, primarily by scholarly convention, the geographically and historically proximate Proto-Elamite and Linear Elamite scripts from western Iran, even though these constitute clearly separate writing systems. Finally, one should add the transitional case of Ugaritic, or alphabetic cuneiform, from the Eastern Mediterranean coast, where wedge-formed signs were deployed to write alphabetic values.
Quantitative surveys of this corpus are few and far between, even though its scale, richness, and diversity are widely acknowledged by specialists (Van De Mieroop 1999: 11). This lack may, to some extent, be ascribed to the rapid growth of the corpus through archaeological excavation during the first half of the twentieth century CE. Besides copies and casts, the initial decipherment and early study of cuneiform in the nineteenth century drew on a limited number of physical inscriptions acquired by European antiquarians and Western travellers to the Middle East. Even at the turn of the century, an inventory of these sources could still be presented with at least some reasonable claim to being exhaustive (Fossey 1904: 65-79). The onset of several large archaeological excavation projects around this time, accompanied by a surge in museums' and collectors' acquisitions of cuneiform texts from the antiquities market, radically altered this state of affairs (Kramer 1962: 300-301). Just fifty years later, reviews of the research history of cuneiform studies could provide only approximate, if breathtaking, estimates of the number of inscriptions unearthed. Pallis' retrospective Antiquity of Iraq summarises a survey of inscriptions from major sites that is by no means comprehensive, yet it gives a total of close to 200,000 inscriptions (1956: 185-187; see also Schmökel 1955: 122 for a similar number). A cursory overview by Samuel N. Kramer, certainly one of the pre-eminent cuneiform scholars of the twentieth century, provided largely similar figures. It was followed by a tentative estimate of the entire corpus at perhaps half a million inscriptions (Kramer 1962: 299-301; for an earlier appearance of the same figure, see Neugebauer 1952). However cavalier its inception may seem, this estimate remains widely cited to this day (see, for example, Roaf 1990: 14; Bottéro 2001: 22; Fink 2020: 137).
In 2000, as a voluminous aside to a book review, the Egyptologist Carsten Peust assembled and evaluated basic numerical estimates of corpus size for a number of scripts from the ancient world in order to assess the relative strength of each corpus in terms of lexicographical authority. The basic metrical premise for Peust's discussion was the number of words in all unique compositions available from currently published texts (2000: 252-253). As such, his survey only indirectly touches on estimates of corpus size derived from the number of discrete inscriptions. For cuneiform, there was still little to quote on that front beyond Kramer's estimate of c. 500,000 records. The results are nonetheless illuminating and important, in that they embody a first attempt at quantitatively comparing a number of written corpora from the ancient world through the formal definition of basic variables and units of measurement.
Drawing on this study, a seminal paper published by Michael P. Streck in 2010 offered the first detailed and formalised survey of the number of cuneiform inscriptions known from excavations and museum collections around the world. Providing an overall figure for the entire corpus of around 550,000 individual inscribed objects, fragments, and seals, this survey also included an estimated total number of words contained within the corpus, modelled on counts from select subsets of texts (Streck 2010). When juxtaposed with Peust's calculations, the estimated total of some 15 million words exceeds all other major corpora of the ancient world except Greek.
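Streck's word total is described above only as modelled on counts from select subsets of texts. One simple way to build such a model is a stratified extrapolation. The mean number of words per inscription in a sample from each stratum is multiplied by the number of inscriptions in that stratum, and the products are summed. This is a hedged illustration, not a reconstruction of Streck's actual procedure. The strata and figures in the sketch are hypothetical placeholders. As a sanity check on the published totals, 15,000,000 words over c. 550,000 inscriptions implies an average of roughly 27 words per inscription.

```python
from statistics import mean


def extrapolate_word_total(strata: dict[str, tuple[int, list[int]]]) -> float:
    """Estimate total words as the sum over strata of N_s * mean(sampled words).

    `strata` maps a (hypothetical) stratum name to a pair:
    (number of inscriptions in the stratum, word counts of a sample from it).
    """
    total = 0.0
    for name, (n_inscriptions, sample) in strata.items():
        if not sample:
            raise ValueError(f"stratum {name!r} has no sampled inscriptions")
        total += n_inscriptions * mean(sample)
    return total


# Hypothetical strata and samples, for illustration only.
strata = {
    "stratum-A": (1_000, [10, 20, 30]),  # mean 20 words
    "stratum-B": (200, [40, 60]),        # mean 50 words
    "stratum-C": (50, [100, 300]),       # mean 200 words
}
print(extrapolate_word_total(strata))  # 40000.0

# Average implied by the published totals quoted above (Streck 2010).
print(round(15_000_000 / 550_000, 1))  # 27.3
```

Stratifying matters because long literary or scholarly compositions and short administrative records differ greatly in length. A single corpus-wide mean drawn from an unrepresentative sample could therefore badly over- or underestimate the total.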
Looking into the near future, open access digital resources are certain to become the principal authorities for qualified assessments of corpus size. In the case of cuneiform, the catalogue of the Cuneiform Digital Library Initiative (CDLI),6 the canonical index of cuneiform inscriptions worldwide, currently holds metadata records on some 350,000 unique inscriptions. Repositories devoted to specific spatial or temporal transects of the corpus may contain comparatively fewer records, but typically display higher degrees of consistency in terms of accuracy and detail, although a formal evaluation of data quality across digital repositories in cuneiform studies has yet to be undertaken. The Database of Neo-Sumerian Texts (BDTNS),7 devoted to the administrative records of the twenty-first-century BCE Third Dynasty of Ur, now includes in excess of 100,000 records. ARCHIBAB,8 focusing on second-millennium BCE Syria and Iraq, has recently passed 35,000 records. These repositories will gradually be consolidated into a comprehensive index of all cuneiform inscriptions and augmented with more elaborate resources for spatial, chronological, and artefactual metadata. That process will eventually give students of cuneiform an immense and unique empirical base for large-scale data analysis. As will be demonstrated below, however, digital resources still display considerable gaps and sample bias, and not only because troves of tablets remain locked away in museum basements and storerooms around the world.

Exploring Distributions of Writing
Considering the current state of digitisation of text catalogues and the curation of associated metadata collections, large-scale analyses of textual corpora from an archaeological perspective remain surprisingly rare. Where researchers have attempted to turn digitised inscription catalogue data into spatially sensitive studies of the ancient past, the results have proven both promising and insightful. One illuminating approach used the spatial distribution of finds of dated display inscriptions to detect polity contraction in the Maya Lowlands in the eighth to tenth centuries CE (Ebert et al. 2012; also Kennett et al. 2012: 791), demonstrating the ability of the textual record to relay social currents in a spatial dimension. Research on epigraphic production within the Roman Empire has fostered several studies of spatial and chronological variation in the distribution of writing. These work either at the level of individual provinces or by comparing data from different parts of the empire (e.g., MacMullen 1982; Meyer 1990; Nawotka 2021). More recently, researchers have used a curated version of the catalogues of the Epigraphische Datenbank Heidelberg9 and the Epigraphik-Datenbank Clauss/Slaby,10 totalling more than 600,000 inscriptions, to evaluate corpus characteristics of Latin epigraphic writing from Western Europe over time and space. The resulting distributions demonstrate clear trends in the production and consumption of writing over many centuries. These include regionally specific spikes in epigraphic activity and a shifting prominence of public and private inscriptions from province to province (Heřmánková et al. 2021: 168-174). Furthermore, the combined analysis of two temporally and spatially overlapping data collections brings out sampling biases that would not otherwise be clearly recognised (2021: 69-166). Such studies should serve as exemplary applications of the vast digital resources now available for analysis in a spatial environment.
Novel as such approaches may seem from a philological standpoint, they are commonplace in closely related disciplines. Studying the spatial patterning of material culture has a long and distinguished track record as a means of understanding and analysing archaeological data in a geographical and relational frame (Hodder/Orton 1976; Conolly/Lake 2006; Gillings et al. 2020). Approaches developed in archaeological research in the Middle East over the last couple of decades are characterised by the ability to integrate and analyse vast amounts of digital data in a spatial dimension. They amply demonstrate the insights to be gained from considering spatial patterns in the distribution of all types of material culture. The concurrent expansion of analytical breadth and depth, especially within a comparative frame of analysis, has spawned a wide range of studies on broader societal dynamics, reaching across eras and continents (Kintigh et al. 2014; Smith et al. 2012; Wright/Richards 2018).
Cuneiform inscriptions, when considered as manifestations of material culture, hold particular qualities that make the relevance of broader, corpus-level analyses sensitive to diachronic and diatopic patterning abundantly clear. The cuneiform script has not been in active use since c. 80 CE and remained largely undeciphered until the mid-nineteenth century (Larsen 1996), so the vast majority of cuneiform inscriptions belong to an archaeological reality. They derive from stratified archaeological contexts on a par with potsherds, bones, and soil samples. The same level of accuracy that can be assigned to such types of material can also, in theory if not always in practice, be assigned to a cuneiform tablet. This is not said in ignorance of the drawbacks presented by a long history of looting and smuggling of cuneiform artefacts, nor of the manifold problems of provenance assessment and archaeological contextualisation that have resulted. But it should be generally accepted that virtually all cuneiform inscriptions known to us derive from archaeological strata or standing monuments, from which they have been removed relatively recently. As with epigraphical corpora more generally, the preservation and discovery of cuneiform inscriptions is thus subject in the first instance not to the whims of the archivist or librarian, but to the workings of the spade (or, rather, the pick). Even when dealing with the largely undocumented and entirely clandestine retrieval of inscriptions, this basic premise underlies the many, and often successful, efforts by researchers to trace inscriptions back to their archaeological origin (e.g., Tsouparopoulou 2017). Where a comprehensively documented archaeological context of a cuneiform find is available, the range of potential insights into the production and consumption of writing in the ancient world is vastly expanded. Owing to the script's preferred physical medium, the nature of the inscriptions deposited and preserved is also special. Consider, for example, the excellent study by Sauvage (1995) on the dynamics of tablet production, deposition, destruction, and reuse at eighteenth-century BCE Ḫarādum. It demonstrates how cuneiform archives may habitually include inscriptions that have survived a great many tribulations. Pedersén (1998; consider also van Soldt 1991) pioneered the study of patterns in the deposition and discovery of cuneiform inscriptions in larger assemblages. These patterns reflect a close relationship with the past use context of writing in archives, libraries, and similar settings. In all, the cuneiform corpus displays an unusual combination of archaeological grounding, preservation, abundance, and diversity. It offers windows towards a deeper understanding of broader patterns of ancient writing that can be fruitfully exploited in a spatial dimension.
Despite the inherent ties of inscriptions to the archaeological record, more ambitious analyses of their spatial distribution have not previously been attempted in cuneiform studies. This is owing to inconsistencies in geographical coverage and to poor standardisation of data collections with regard to the working georeferencing of inscribed finds (Rattenborg et al. 2021a). There is no questioning the great service done to the field by numerous digitisation initiatives over the last few decades. Yet an evaluation of the spatial coverage of current online catalogues reveals noticeable gaps in the resources available for large-scale analysis of corpus composition and distribution. This is a consequence of current priorities in curating, editing, analysing, and disseminating cuneiform inscriptions in the digital sphere, which are biased towards larger textual assemblages from major archaeological sites. There are tangible reasons for such priorities, to be sure. From a philological point of view, the greater importance of an assemblage of tens of thousands of inscriptions over an assemblage of one or two hardly needs arguing. Even so, without a comprehensive understanding of the spatial distribution of cuneiform inscriptions, we remain poorly equipped to address the overall prevalence of writing in the cuneiform world at a general level. We are equally poorly placed to extrapolate from there to more fundamental aspects of social history, such as literacy, recording, and knowledge production. By providing a comprehensive overview and provisional evaluation of the geographical distribution of cuneiform finds, this study takes a first step towards spatially sensitive approaches to the cuneiform corpus.

Methodology
Our approach is founded on the integration and evaluation of two related data sets, both assembled as part of Geomapping Landscapes of Writing (GLoW), a research project hosted by the Department of Linguistics and Philology of Uppsala University and funded by Riksbankens Jubileumsfond (grant number MXM19-1160:1). The first is a geospatial index of locations with reported finds of cuneiform inscriptions at site level, counting almost 600 records across the wider Middle East and adjoining regions. The second is the data register of a related survey of the secondary literature, which provides an estimate of all inscriptions found at each archaeological locale, for an aggregate total of c. 430,000 inscribed artefacts. Combined, these data sets provide a basic, quantifiable, and global overview of archaeological finds of cuneiform inscriptions from all periods of the history of the script. Below, we review the formal conventions guiding the assembly of each resource, as well as some characteristics of the acquired data relevant to our analysis.

Geospatial Index
The geolocation of sites with known finds of cuneiform inscriptions is retrieved from the most recent version (v.1.6, 1 July 2023) of the Cuneiform Inscriptions Geographical Site (CIGS) index (Rattenborg et al. 2021b). This resource provides point vector centroids and associated attribute data for close to 600 discrete locations across Europe, Africa, and Asia where cuneiform has been found (Fig. 1). At the time of writing, the CIGS index constitutes the most comprehensive geographical overview of the corpus available. It more than doubles the number of geolocated records retrievable from the provenience index of the CDLI and from the Pleiades data subsets deployed by various Open Richly Annotated Cuneiform Corpus (ORACC)11 projects, for example Ancient Records of Middle Eastern Polities (ARMEP).12 Past and current versions of the data set are deposited with the Zenodo13 research data repository maintained by Europe OpenAIRE and are available for reuse under a CC-BY 4.0 licence.14 A more in-depth description of the data set, as well as suggestions for its application in research and data visualisation, has been provided elsewhere (Rattenborg et al. 2021a), so here we review in more detail only the specific variables employed in the following analyses. Individual archaeological sites mentioned below are followed by their three-letter CIGS identifier in parentheses. Records included in CIGS refer to geographical locations with finds of cuneiform inscriptions. A location qualifies when any inscribed artefact from it carrying a catalogue identifier, for example an excavation or museum number, is available in printed sources or digital catalogues, regardless of the extent or detail of associated metadata. Aspiring to a high level of empirical transparency, the index does not include locations with reported finds of cuneiform but without discretely documented artefacts. Thus, purported bricks with inscriptions 'in the cuneiform character' reported from Balḫ and Farah in Afghanistan in the mid-nineteenth century (Ferrier 1856: 207 and 393-394) are not included, as the account provides no identification of any one unique inscribed artefact. Included, by contrast, are discrete, if only provisionally documented, inscriptions, e.g., an illustration and preliminary translation of an inscribed brick of Sîn-aḫḫē-erība (Sennacherib) from Qalʿat ʿAwayna (AWA) in northern Iraq (Layard 1853: 225-226; Furlani 1934: 125).
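The inclusion criterion just described, that a location enters the index only if at least one discrete artefact with a catalogue identifier is attested from it, can be expressed as a simple filter. The sketch below is hypothetical. The class `ReportedFindspot`, its fields, and the identifier `EX-0001` are illustrative assumptions and do not reproduce the actual attribute schema of the CIGS index.

```python
from dataclasses import dataclass, field


@dataclass
class ReportedFindspot:
    """A reported location of cuneiform finds (hypothetical fields)."""
    name: str
    # Excavation or museum numbers of documented inscribed artefacts.
    catalogue_ids: list[str] = field(default_factory=list)


def qualifies_for_index(spot: ReportedFindspot) -> bool:
    """Include a location only if at least one discrete, identifiable
    artefact is attested, however sparse its other metadata may be."""
    return any(cid.strip() for cid in spot.catalogue_ids)


reports = [
    # A report of inscribed bricks that identifies no individual artefact.
    ReportedFindspot("undocumented brick report"),
    # A single documented brick with a (hypothetical) catalogue number.
    ReportedFindspot("documented brick", ["EX-0001"]),
]
included = [r.name for r in reports if qualifies_for_index(r)]
print(included)  # ['documented brick']
```

The test deliberately ignores how much is known about the artefact. A single identifiable object is enough, while a narrative report with no identifiable object is not.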
The index does not include proveniences defined solely on the basis of a conceptual historical place that is geographically undefined. Many provenience indices in online text catalogues within the discipline make little distinction between a historical place and a geographical location. The two are, however, verified by very different criteria and should be considered entirely different entities.15 As historical places are relational and geographical locations are physical entities, they are also not fully compatible in analytical terms. In the present study, for example, we augment the publicly available version of the index with polygon vector data for proveniences that can be traced on satellite imagery, to establish a basic variable for the surface extent of places where cuneiform has been found. Generating this type of spatial information for a historical place without a known physical location is not possible. Some cases are nonetheless included where the association of a historical toponym with an archaeological feature may still be debated, but where the general whereabouts of the provenience are known with a relatively high degree of certainty. One example is ancient Garšana (GRS), identified as modern Tall Baridīya (Molina/Steinkeller 2017).
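A surface-extent variable of this kind, and a representative point derived from the same outline, can be computed from a traced polygon with the shoelace formula. This minimal sketch assumes that vertices have already been projected into a planar, metre-based coordinate system, for example a suitable UTM zone. Applied directly to longitude/latitude degrees, it would yield meaningless areas. It illustrates the geometry only and is not the GLoW project's actual GIS workflow.

```python
def polygon_area_and_centroid(
    vertices: list[tuple[float, float]],
) -> tuple[float, tuple[float, float]]:
    """Return (area, centroid) of a simple polygon via the shoelace formula.

    Vertices are (x, y) pairs in a projected, metre-based CRS, listed in
    order around the boundary; the first vertex need not be repeated.
    Area is returned in square metres and is always non-negative.
    """
    n = len(vertices)
    if n < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # signed contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    signed_area = twice_area / 2
    # Dividing by the signed area makes the centroid independent of winding order.
    centroid = (cx / (6 * signed_area), cy / (6 * signed_area))
    return abs(signed_area), centroid


# Hypothetical mound outline: a 400 m x 250 m rectangle in projected metres.
outline = [(0.0, 0.0), (400.0, 0.0), (400.0, 250.0), (0.0, 250.0)]
area_m2, (x, y) = polygon_area_and_centroid(outline)
print(area_m2 / 10_000, (x, y))  # 10.0 (200.0, 125.0)
```

For strongly concave outlines, the area centroid can fall outside the polygon. A point guaranteed to lie on the surface would then be a safer representative point.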
Finally, the index does not reflect knowledge of the primary or secondary historical context of an inscribed artefact, only its archaeological origin. As such, the Law Stele of Ḫammurabi is recorded not at its likely place of origin, Sippar (SAP) in southern Iraq, but only at its place of archaeological discovery. That place is the city of Šušan/Susa (SUS) in southwestern Iran, to which a triumphant Elamite army carried the stele off in the twelfth century BCE (Roth 1997: 73). At a more mundane level, the index does not seek to qualify finds of inscribed bricks or stone fragments that may theoretically have been brought to their archaeological find-spot as building material in a more recent era. Examples include finds of baked bricks at small sites in the Iraqi alluvium (cf. Adams/Nissen 1972: 217) and the inclusion of a royal stele from the Sîn Temple in Ḫarrān (HAR) in the building of a town house at nearby Eski Harran (EHA) (cf. Pognon 1908: 1-14; Rice 1957: 469).

Assemblage Estimates
We join the CIGS geospatial index with a tabular data set providing overall assemblage estimates, i.e., the total number of cuneiform inscriptions derived from individual archaeological locations, referred to here as CIGS-AE (Smidt et al. 2023). This data set was compiled by Gustav Ryberg Smidt in 2020-2021, with subsequent updates by GLoW project staff based on additions to the working version of the CIGS index. The version employed here complements the records included in CIGS version 1.6 (1 July 2023). It provides estimates for a total of c. 430,000 inscriptions and fragments of inscriptions reported from close to 600 archaeological locations, together with bibliographical references for these figures. This data set, including a complete bibliographical index of the sources cited, is deposited with Zenodo and available for reuse under a CC-BY 4.0 licence.
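Linking the two resources amounts to a join on the shared three-letter CIGS identifier. The sketch below uses only the Python standard library and reads two CSV extracts. The column names (`cigs_id`, `name`, `lon`, `lat`, `estimate`) are assumptions for illustration and should be checked against the actual headers of the files deposited on Zenodo. The join keeps sites that lack an estimate, with `None`, so that gaps remain visible rather than being silently dropped.

```python
import csv
import io


def join_index_and_estimates(index_csv: str, estimates_csv: str) -> list[dict]:
    """Left-join site records with assemblage estimates on `cigs_id`."""
    estimates = {
        row["cigs_id"]: int(row["estimate"])
        for row in csv.DictReader(io.StringIO(estimates_csv))
    }
    joined = []
    for row in csv.DictReader(io.StringIO(index_csv)):
        joined.append({
            "cigs_id": row["cigs_id"],
            "name": row["name"],
            "lon": float(row["lon"]),
            "lat": float(row["lat"]),
            # None marks a site with no estimate in the assemblage table.
            "estimate": estimates.get(row["cigs_id"]),
        })
    return joined


# Hypothetical extracts; identifiers, coordinates, and counts are made up.
index_csv = """cigs_id,name,lon,lat
AAA,Site A,44.0,32.0
BBB,Site B,45.0,33.0
CCC,Site C,46.0,34.0
"""
estimates_csv = """cigs_id,estimate
AAA,1200
BBB,35
"""

sites = join_index_and_estimates(index_csv, estimates_csv)
total = sum(s["estimate"] for s in sites if s["estimate"] is not None)
missing = [s["cigs_id"] for s in sites if s["estimate"] is None]
print(total, missing)  # 1235 ['CCC']
```

With real files, the strings would be replaced by file handles opened with `open(path, newline="")` and passed to `csv.DictReader` directly.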
In building this data set, we have taken as our basic entity of study the material, inscribed artefact as it exists archaeologically. This means that, to maintain a uniform data set, we have ignored scholarly estimates of an original document and attempts at assigning individual fragments to one. Very few reliable estimates of the original number of documents from any one archaeological location are available in the general literature, and the vast majority of reports offer no information on such matters. Readers are encouraged to consider discussions of the ratio of complete documents to inscribed fragments in various regions and periods (Archi 1986: 78-79; Zimansky 2004: 316). Extrapolating from such figures, however, is much too speculative an exercise to be pursued here (as also noted in related studies, e.g., Pedersén 1998: 6). The difficulty is evident in the discrepancy between estimated totals found in the literature (and included in the present survey) and the corresponding numbers retrieved from digital catalogues for some major assemblages. For Ebla (EBA), for example, c. 11,000 records are reported (Archi 1997: 184), as opposed to the more than 3,000 edited tablets currently included in the Ebla Digital Archives.16 The former figure includes a substantial number of fragments that will eventually dissolve into a smaller number of discrete inscriptions (Scarpa 2021: 3).
As we are concerned with the material, inscribed artefact that exists in the archaeological record, assemblage estimates consider only the number of artefacts excavated (scientifically or clandestinely) at a given site, not the number of artefacts that can be expected to be retrieved from it (cf. Peust 2000: 252). It is, theoretically at least, possible to calculate roughly the number of inscribed bricks from Čoġā Zanbīl (COZ) (Ghirshman 1966: 13) or the number of inscribed ashlars included in the construction of Sîn-aḫḫē-erība's aqueduct at Ǧarwāna/Jerwan (JRW) (Jacobsen/Lloyd 1935: 19-27). Applying such conventions across the entire data set, however, is not feasible. Even when referring only to the number of inscriptions found at any one site, numbers are bound to fluctuate considerably. Where multiple and significantly diverging estimates are available from the secondary literature, we have, to the extent possible, sought to evaluate individual estimates in light of the pertinent research history and the empirical basis of the figures suggested. For a great many assemblages, the figure given is not contested. For others, such as Umma (JOK), aggregate numbers of inscriptions listed in encyclopaedic site biographies (Waetzoldt 2014-2016) may deviate from the sum of records contained in relevant digital catalogues (Molina 2008: 52 and more recent counts from BDTNS). To maintain formal consistency in data collection, we have generally given preference to figures provided in peer-reviewed print publications where these did not significantly disagree with numbers available from dynamic digital resources. In the few cases where no updated printed survey of inscription finds is available, as with Šušan/Susa (SUS), we have relied on aggregate figures from digital catalogues. In doing so, we accept that these are dynamic resources unlikely to present an exhaustive overview of the inscriptions currently known from a single site. Even with all of these caveats duly noted, individual data points in the resulting index are certain to draw comment from experts more familiar with their particular characteristics than the present resource can aspire to be.
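The rule of preference described above can be summarised as a small decision function. This is a hedged sketch. The study states no numeric threshold for "significantly disagree", so the tolerance `max_rel_diff` is a hypothetical parameter. Cases of significant disagreement are returned for manual evaluation in light of the research history rather than resolved automatically.

```python
from typing import Optional


def choose_estimate(print_figure: Optional[int],
                    digital_figure: Optional[int],
                    max_rel_diff: float = 0.25) -> tuple[Optional[int], str]:
    """Pick an assemblage estimate and report how it was chosen.

    max_rel_diff is a hypothetical tolerance for 'not significantly
    disagreeing'; it is not a value given in the study.
    """
    if print_figure is None and digital_figure is None:
        return None, "no estimate available"
    if print_figure is None:
        # No (updated) printed survey: fall back on the dynamic catalogue.
        return digital_figure, "digital catalogue"
    if digital_figure is None:
        return print_figure, "print publication"
    if print_figure == digital_figure:
        return print_figure, "print publication"
    rel_diff = abs(print_figure - digital_figure) / max(print_figure, digital_figure)
    if rel_diff <= max_rel_diff:
        # Figures broadly agree: prefer the peer-reviewed print publication.
        return print_figure, "print publication"
    # Significant disagreement: flag for evaluation against the research history.
    return None, "manual review"


print(choose_estimate(1000, 1100))  # (1000, 'print publication')
print(choose_estimate(None, 800))   # (800, 'digital catalogue')
print(choose_estimate(1000, 3000))  # (None, 'manual review')
```

Returning the reason alongside the figure keeps the provenance of each estimate explicit, which suits a data set meant to be revisited by specialists.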

Analyses
Linking these two data sets allows us to review the relative scale of cuneiform finds at site level and the geographical distribution of cuneiform finds with reference to location, site size, and assemblage size. At a more general level, the assembled resources also enable a consideration of certain characteristics of the spatial density of find-spots and of the number of inscriptions found. This approach provides a basic statistical overview, in a quantitative as well as a spatial sense, against which we may evaluate current attitudes towards the number and prevalence of cuneiform inscriptions as they appear in the archaeological record. The provisional nature of this exercise should be emphasised, however. Our quantitative data are based on estimates of the overall number of cuneiform inscriptions retrieved from individual archaeological sites. They therefore encompass significant numbers of artefacts that have not been subjected to proper metadata indexing, much less edited and published. Accordingly, the data sets presented here hold no chronological dimension, nor do they offer any information on distributions of material, script, language, or genre. With these limitations in mind, we may turn to aspects of the quality, accuracy, and coverage of the data employed, before discussing the statistical impressions arising from their analysis.
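One basic way to express the relative scale of finds at site level is to rank sites by assemblage estimate and ask what share of the total the largest few account for. The sketch below does this for a handful of hypothetical site totals. It is an illustrative descriptive statistic, not one of the specific measures reported in this study, and it inherits every caveat about the estimates noted above.

```python
def top_k_share(estimates: dict[str, int], k: int) -> float:
    """Share of the summed estimates held by the k largest assemblages."""
    if k < 1:
        raise ValueError("k must be at least 1")
    values = sorted(estimates.values(), reverse=True)
    total = sum(values)
    if total == 0:
        raise ValueError("estimates sum to zero")
    return sum(values[:k]) / total


# Hypothetical site totals, for illustration only.
estimates = {"S1": 60_000, "S2": 25_000, "S3": 10_000, "S4": 4_000, "S5": 1_000}
print(top_k_share(estimates, 1))  # 0.6
print(top_k_share(estimates, 2))  # 0.85
```

A steep rise of this share for small `k` would indicate that a few very large assemblages dominate the corpus. This is precisely the kind of skew that bears on questions of corpus representativity.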

Spatial Accuracy and Certainty
As noted elsewhere (Rattenborg et al. 2021a), records included in the CIGS index are assigned a level of locational accuracy reflecting the degree of certainty with which their position can be established. Individual accuracy levels are listed and described in the table below (Table 1). The highest, level 3 (certain), indicates an archaeological feature, typically a settlement mound, that can be accurately traced from satellite imagery, and for which the corresponding point vector contained in the index has been derived from an associated polygon vector drawn around the feature. The next, level 2, is a representative point without a defined surface extent: typically a submerged site, for example Tall Bazmusian (BZM), now inundated by Lake Dūkan, or the known location of a rock inscription like Ganǧnāma (GJN) in the Iranian Zagros, whose extent cannot, however, be drawn. The third, level 1, is tentative, indicating a horizontal margin of error of up to around 1,000 m; examples include the approximate find-spot of a tablet in a particular field, as at Hasankeyf (HSK) in south-eastern Turkey, or the approximate location of a poorly documented rock inscription. The fourth, level 0, indicates that the record in question relates to a discrete archaeological site whose geographical location is not known with any meaningful degree of precision. Such records typically appear in museum registers and ultimately derive from antiquities dealers, e.g. Zaʿala (ZAA), a mound located somewhere on the southern outskirts of modern Baġdād (Reade 1987). It should be emphasised that point vector data for certain and representative locations are largely commensurable in terms of the level of locational certainty implied. They differ mainly in the way in which they have been defined, the former automatically, the latter manually. As such, close to seventy-five per cent of the records contained in the data set can be considered accurately located. Turning to the distribution of cuneiform inscriptions across these categories as derived from the CIGS-AE survey, certain (level 3) locations account for 99.2 per cent of all inscriptions included in the data set, with around 0.4 per cent of inscriptions deriving from representative and tentative locations respectively. Locational accuracy is affected by the relative prominence of certain types of finds, as becomes evident when comparing distributions for the four modern countries with which most records in the data set are associated (Fig. 2). The higher numbers of individual open-air inscriptions (more on which below), which are typically poorly recorded as far as accurate information on their archaeological origin is concerned, certainly affect the distributions for Turkey and Iran. A comparatively higher number of finds from much more easily delineated settlement mounds is present in the distributions for Syria and Iraq.
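The contrast between the share of records and the share of inscriptions per accuracy level comes from tallying the same records twice, once unweighted and once weighted by assemblage size. The sketch below assumes records reduced to (accuracy level, assemblage estimate) pairs; the labels for levels 3 to 1 follow the text, while "unlocated" for level 0 is our own shorthand, and the example records are invented.

```python
from collections import Counter

# Labels for levels 3-1 follow the text; "unlocated" for level 0 is our shorthand.
ACCURACY_LABELS = {3: "certain", 2: "representative", 1: "tentative", 0: "unlocated"}


def shares_by_accuracy(records):
    """Per-cent shares of records and of inscriptions for each accuracy level.

    records: iterable of (accuracy_level, inscription_estimate) pairs.
    Returns {label: (share_of_records, share_of_inscriptions)}.
    """
    records = list(records)
    n_records = len(records)
    n_inscriptions = sum(count for _, count in records)
    record_counts = Counter(level for level, _ in records)
    inscription_counts = Counter()
    for level, count in records:
        inscription_counts[level] += count
    return {
        ACCURACY_LABELS[level]: (
            100 * record_counts[level] / n_records,
            100 * inscription_counts[level] / n_inscriptions,
        )
        for level in sorted(ACCURACY_LABELS, reverse=True)
    }


# Invented example: four sites, one per accuracy level.
example = shares_by_accuracy([(3, 900), (2, 50), (1, 40), (0, 10)])
```

A few large, well-located assemblages are enough to make one accuracy level dominate the inscription count while holding only a modest share of the records.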

Levels of Coverage
The CIGS geospatial index and the associated assemblage estimates contained in the CIGS-AE file are the result of a thorough survey of specialist literature and online data collections conducted over the course of three years. To review and contextualise the global coverage of these indices relative to the data collections of existing digital repositories in the field, the two maps presented below (Fig. 3) compare the geographical distribution of records in the CDLI catalogue (for which an archaeological provenience can be either securely or tentatively established) with the distribution of estimates contained in our survey. Inscription counts from the former resource are drawn from a catalogue dump dated 8 August 2020, reflecting the state of the CDLI catalogue prior to the start of the still ongoing data sharing with the GLoW project. As will be readily apparent, the CDLI catalogue exhibits considerable gaps in coverage and comprehensiveness when compared to the CIGS and CIGS-AE indices. The divergence applies primarily to smaller finds from peripheral areas, as the latter include a large number of archaeological locations with finds of only one or a couple of cuneiform inscriptions. As previously noted, there are reasonable explanations for such a bias, but it nevertheless remains an important factor to consider when addressing research questions contingent upon geographical distribution. In addition to the exhaustive geographical coverage of the CIGS index, it is worth noting the relative agreement between the overall estimate of corpus size available from the CIGS-AE index and those of related surveys, namely the corpus overviews assembled by Pallis (1956: 185-187), Kramer (1962: 299-301), and Streck (2010). As noted above, the aggregate total of approximately 200,000 inscriptions suggested by Pallis and the estimated c. 500,000 inscriptions proposed by Kramer find further confirmation in the much more detailed overview provided by Streck, which arrives at a total of 530,000 inscriptions and fragments. It is worth emphasising that the CIGS-AE index has been assembled from estimated totals in the scientific literature rather than from catalogued artefacts, meaning that while this data set ignores unprovenanced inscriptions held in public and private collections around the globe, the figures provided will include, in theory at least, uncatalogued artefacts from archaeological sites.
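The comparison with the CDLI catalogue amounts to matching per-site counts from two sources and flagging the sites that one of them lacks. A minimal sketch, assuming both resources have already been reduced to per-site counts under harmonised site identifiers (in practice a non-trivial step); the `coverage_report` helper and the example figures are hypothetical.

```python
def coverage_report(survey, catalogue):
    """Compare survey estimates with catalogue record counts per site.

    survey: {site: estimated inscriptions}; catalogue: {site: catalogued records}.
    Returns (sites missing from the catalogue, {site: catalogue/survey ratio}).
    """
    missing = sorted(site for site in survey if catalogue.get(site, 0) == 0)
    ratios = {
        site: catalogue[site] / estimate
        for site, estimate in survey.items()
        if catalogue.get(site, 0) > 0 and estimate > 0
    }
    return missing, ratios


# Invented example: one large assemblage, two single finds absent from the catalogue.
missing, ratios = coverage_report({"A": 1000, "B": 2, "C": 1}, {"A": 800})
```

In such a comparison, the small peripheral finds are exactly the ones that surface in the `missing` list, which is how the bias discussed above becomes visible on a map.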

The Geographical Extent of the Cuneiform Corpus
Let us consider the basic spatial statistics of the assembled data set. Inscriptions in cuneiform or related scripts included in the data set have been found across Europe, Asia, and Africa, within the territories of 24 modern nations (Fig. 4). While the vast majority of the locations included here are situated within Iraq, Turkey, Iran, and Syria, peripheral finds extend over a much wider geographical zone. Cuneiform occurs across an area reaching from the central Mediterranean to eastern Afghanistan, and from the steppes of Romania and southern Russia to the Libyan and Arabian Deserts. A distance of roughly 5,000 km separates the westernmost find, an inscribed bronze vessel discovered in a seventh-century BCE tomb at Falerii (FLO) in central Italy (Cristofani/Fronzaroli 1971), from the easternmost find, an Achaemenid silver piece bearing part of an Elamite inscription, found in a coin hoard in a suburb of Kābūl (KBL) in eastern Afghanistan (Curiel/Schlumberger 1953: 41 and III 12; also Hulin 1954). Close to 3,000 km lie between the northernmost find, an inscribed alabastron of Ŗtaxšaça (Artaxerxes) I found at Novyy Kumak (NKM) on the Ural River (Trejster 2012), and the southernmost, a small portable stone object with an Achaemenid inscription from Idfū in Upper Egypt (Michaélidis 1943: 96-97).
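As a rough plausibility check on the two spans quoted above, the sketch below computes a great-circle (haversine) distance for the west-east pair and a simple north-south extent (latitude difference along a meridian) for the north-south pair. The coordinates are approximate values assumed for illustration, not taken from the CIGS index, and the choice of measure for each span is our assumption about how such figures are derived. A point-to-point great-circle distance between the northernmost and southernmost finds is always at least as large as the north-south extent, and with these coordinates it comes out noticeably larger.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def meridional_extent_km(lat_north, lat_south):
    """North-south extent along a meridian, ignoring longitude."""
    return EARTH_RADIUS_KM * math.radians(lat_north - lat_south)


# Rough coordinates assumed for illustration only (not from the CIGS index).
FALERII = (42.29, 12.41)      # Civita Castellana, Italy
KABUL = (34.53, 69.17)
NOVYY_KUMAK = (51.2, 58.3)    # assumed: near Orsk on the Ural River
IDFU = (24.98, 32.87)

west_east = haversine_km(*FALERII, *KABUL)
north_south = meridional_extent_km(NOVYY_KUMAK[0], IDFU[0])
north_south_point_to_point = haversine_km(*NOVYY_KUMAK, *IDFU)
```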
This geographical distribution certainly underscores the pre-eminence of Iraq in the study of cuneiform cultures. More than two thirds of all inscriptions, or upwards of 300,000 of the records included here, were found in Iraq, dwarfing the many tens of thousands reported from archaeological sites in neighbouring Syria, Turkey, and Iran. Together, these four countries account for some eighty per cent of all known locations with cuneiform finds, and more than ninety-nine per cent of all inscriptions (Fig. 5). In quantitative terms, finds from other countries are many times smaller, and often attributable to haphazard discoveries or a limited range of artefact types. The c. 400 inscriptions reported for Egypt, in fifth place, for example, derive almost exclusively from the famous Late Bronze Age correspondence of al-ʿAmārna (AKH). The c. 200 inscriptions from Armenia, in sixth place, are predominantly display inscriptions, for example on stelae, rock faces, and metal implements. That said, a comparatively high number of finds of small numbers of inscriptions across archaeological sites in Israel, Palestine, and Lebanon seems to replicate assemblages found in Syria and Iraq, though on a much more modest scale. Looking beyond the geographical outline and primary concentrations of cuneiform inscriptions, liminal cases turn up even further afield. Three fragmentary display inscriptions found in the foundations of some sixteenth-century houses in central London in 1890 (Fig. 6) probably reached Great Britain as ballast of a merchant vessel or as the occasional souvenirs of a curious traveller (Evetts 1891), and well illustrate the possible reach of the corpus prior to the beginning of large-scale trade in cultural artefacts in the late nineteenth century. A number of debated strays can also be found in the literature. An inscribed Old Babylonian cylinder seal, used as an amulet in more recent times and now held in the collections of Nagpur Central Museum (Suboor 1914), has been suggested, without further evidence, to derive from somewhere on the central Indian subcontinent (Lal 1953: 101). Incisions on a silver piece from Mōhenǧō Dārō/Mohenjo Daro, originally suggested to be cuneiform characters and occasionally presented as such in the literature (Kosambi 1941: 395-398; see, e.g., Dhavalikar 1975; Goyal 1999: 130), have not been confirmed by specialists (Marshall 1931: 519). The reported mid-twentieth-century discovery of an inscribed hexagonal cylinder seal from a Roman legionary camp on the Austrian Danube (Swoboda 1964: 275) has since been revised, as the purported inscription is unconvincing (see Dembski 2005: pl. 130). To these may be added a host of peculiar finds from Western Europe and North America (see discussion with further references in Finkel 1983), all generally interpreted as misplaced finds reflecting the Western antiquarian interests of the late eighteenth and early nineteenth centuries. While such a stance seems entirely logical for most cases, provenance histories are far from always easy to disentangle (for a colourful example, see Rattenborg 2023).
Inscriptions found in a clearly secondary context, but (theoretically, at least) in relative proximity to their origins, have been retained in the current data set and are worth noting, the more so because they demonstrate an aspect of the history of cuneiform writing that has received surprisingly little attention in the literature, namely the keeping of cuneiform inscriptions as relics by local communities in more recent history (see Verderame 2020: 228 and note 63 for similar points). A most illustrative example is provided by the Babylonian display inscriptions incorporated into the early construction of the great mosque at Ḫarrān (HAR) in southern Turkey. Here, three large stelae of Nabû-nāʾid/Nabonidus (r. 556-539 BCE), ostensibly taken from the much-famed temple of the Babylonian moon god Sîn, were intentionally embedded face-down in the pavement at each of the three main gateways leading into the mosque courtyard (Rice 1957: 468-469; Gadd 1958: 35), illustrating an intriguing dialogue between past and present systems of belief. The use of cuneiform inscriptions as relics in a religious setting is also seen on Baḥrain, in the so-called 'Durand Stone', acquired by British explorers in the nineteenth century, which had been embedded in the inner sanctuary of a madrassa in the town of Bilād al-Qadīm (BLQ) in the northern part of the island (Durand/Rawlinson 1880: 193-194). A similar example comes from the fifteenth-century tomb of Šāh Nimatullāh in the town of Māhān (MHN) in central Iran, where a ceremonial stone weight produced during the reign of Dārayavauš/Darius the Great (r. 522-486 BCE) was first reported in the late 1850s (de Gobineau 1864: 323; also Weissbach 1910: 481). Many more possible examples of such engagement with the material remains of the past are on display in the highlands of eastern Turkey and Armenia, where fragments of a substantial number of Urartean display inscriptions have been found incorporated into the buildings of mosques and churches (examples are too many to mention here, but consider, e.g., Schulz 1840: 299 no. 38; examples in Salvini 2008: 55-64). Although the precise context of their inclusion in such structures cannot always be reconstructed, the often prominent placement of inscribed pieces seems too regular to be mere coincidence. Looking beyond inscriptions with a fixed archaeological provenience, inscriptions transformed into portable amulets also occur and accentuate comparable nodes of memory and veneration. A particularly illustrative example is the unprovenanced twenty-second-century BCE stone foundation plaque of Gudea carrying a much later Umayyad incantation in Kūfī script (George 2011: 19-20 and pl. XI). The use of engraved and/or inscribed cylinder seals (including the examples from India and Austria discussed above) may equally well be seen as a conscious engagement with relics of the past. The engagement of later local tradition with cuneiform inscriptions, regardless of whether these could be read or not, closely resembles the popular veneration of inscribed items by village communities in medieval Western Europe (Moreland 2001). While outside the scope of the present study, these brief points should serve as a clear reminder of the cultural meaning of these inscriptions for more recent inhabitants of the Middle East as well. The veneration historically bestowed upon such items adds a further facet to discussions of the diaspora of inscribed artefacts now found in museums around the world.

Assemblage Size Distribution
Whereas the above concerns the discovery of cuneiform inscriptions without reference to their number, the distribution of assemblage sizes, namely the overall number of inscriptions found at any one archaeological site, adds further nuance to the spatial characteristics of the corpus. Examining such distributions requires qualification, however. As previously noted, the data set presented here includes no chronological dimension, and as such reflects an archaeological reality, not a historical record. The number of inscriptions retrieved from any one archaeological site may represent the aggregate of just one or a multitude of specific historical events, and may be the product of archaeological excavation or illicit looting conducted over anything from an afternoon to several generations. As a provisional overview, assemblage distributions do, however, bring out certain tangible patterns (consider related visualisations appearing in print publications, e.g., Postgate 1994: Fig. 2:12; Van De Mieroop 1999: 12; Sauvage 2020: 2). To simplify the data, we employ a six-tiered binning of the assemblage estimates included in the data set, as summarised in Table 2. When plotted (Fig. 7), it will be seen that the majority of assemblages, in excess of sixty-five per cent, comprise a single inscription or a handful of inscriptions. Typical examples include open-air inscriptions and crafted artefacts, but our impression is that quite a significant number of these assemblages also consist of clay artefacts, namely tablets or sealings, relating to everyday acts of storing and transmitting communication in writing. The remaining thirty-five per cent of the data set exhibit a rather steady decline in the number of individual assemblages as the number of inscriptions attributed to each assemblage increases. This would suggest that the spatial prevalence of cuneiform writing is perhaps less appreciated than it should be, a point that can be further explored with reference to the size of archaeological proveniences (more on which below). Such a bias is certainly hinted at in our review of the coverage and consistency of the CDLI catalogue discussed above. Only sixteen archaeological sites have reported finds of more than 5,000 inscriptions each, including such well-known locales as Ĝirsu (GIR), Ḫattuša (HAT), Ninuwa/Nineveh (NNV), Nippur (NIP), and Umma (JOK), in the range of 30,000-40,000 artefacts each, and Kaneš (KNS), Mari (MAR), Uruk (URU), Pārsa/Persepolis (PRS), Tall ad-Duraihim/Drēhem (DRE), Ur (URI), Bābili/Babylon (BAB), and Ebla (EBA), in the range of 10,000-25,000 each. Tall Abū Ḥabba (SAP), southwest of modern Baġdād/Baghdad, is the only site within the current survey to exceed 50,000 reported inscriptions. Even if the cumulative number of inscriptions from such immense assemblages accounts for the majority of finds of cuneiform artefacts, it obscures a much more diverse distribution of writing in geographical terms. We can test the degree to which the relative prominence of different assemblage sizes, as well as their distribution, is entirely arbitrary by grouping subsets of records according to modern country (Fig. 8). Preliminary as it may be, this enables us to sample the data using a variable that is at least partially sensitive to varying intensities of archaeological field research and cataloguing within different national territories (consider Hodder/Orton 1976: 20-29 for related applications). The resulting graph is interesting as far as the overall trend in assemblage size distribution across the four modern countries is concerned. The similarity between Turkey and Iran on the one hand, and between Syria and Iraq on the other, would appear an expected outcome when considering the general prominence of singular display inscriptions, typically in stone, in the former areas against the more prevalent finds of cuneiform tablets, which tend to show up in larger numbers, in the latter. More generally, all subsets suggest a relatively consistent decline in the number of assemblages as the number of inscriptions per assemblage goes up.
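Table 2 is not reproduced here, so the tier boundaries in the sketch below (1, 2-10, 11-100, 101-1,000, 1,001-10,000, above 10,000) are hypothetical. The sketch shows how assemblage estimates can be assigned to six tiers and tallied per modern country, as in the grouping behind Fig. 8. The country labels and counts in the usage line are invented.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

# Hypothetical tier boundaries (inclusive upper bounds); the last tier is open-ended.
TIER_UPPER_BOUNDS = [1, 10, 100, 1_000, 10_000]
TIER_LABELS = ["1", "2-10", "11-100", "101-1,000", "1,001-10,000", ">10,000"]


def tier(count: int) -> str:
    """Label of the tier into which an assemblage estimate falls."""
    # bisect_left finds the first upper bound >= count, i.e. the matching tier.
    return TIER_LABELS[bisect_left(TIER_UPPER_BOUNDS, count)]


def tiers_by_country(assemblages):
    """assemblages: iterable of (country, estimate). Returns {country: Counter(tier -> n)}."""
    table = defaultdict(Counter)
    for country, count in assemblages:
        table[country][tier(count)] += 1
    return table


# Invented records for illustration.
table = tiers_by_country([
    ("Country A", 1), ("Country A", 50), ("Country A", 35_000),
    ("Country B", 1), ("Country B", 3),
])
```

Because the tiers grow by orders of magnitude, the resulting counts can be plotted per country on a common axis even though individual assemblages differ in size by four or five orders of magnitude.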

Surface Extent of Locations
Another angle provided by the presented data sets concerns the relationship between writing and settlement size. The close association of writing and urbanism is a firmly established trope of many studies on cuneiform culture throughout the history of the script (see, for example, Van De Mieroop 1997: 215-226; Liverani 2013: 73-80). While such a dialectic may be relevant at a broader theoretical level (consider the enduring power of Childe 1950), it may also obscure a more divergent empirical reality. With regard to the present data sets, we can evaluate the relationship using polygon vector data giving the outline, or surface extent, of archaeological features as drawn from high-resolution satellite imagery (Rattenborg et al. 2021a). These are available for a little more than half of the locations included in the CIGS v.1.6 index, namely all of those assigned the 'certain' level of locational accuracy (Table 3). A number of caveats relating to the overall authority and quality of these data should be noted in advance. The resulting vector data provide only a rough approximation of site size and are likely to deviate considerably from results obtained using more fine-grained methods of remote sensing and ground-based site survey (see, for example, Wilkinson et al. 2006; Casana 2020). To the extent possible, optical delineation has been guided by available site reports, but may even so overlook elements not visible to the human eye that would, if included, alter the established surface extent of the site. Furthermore, the defined surface extent relates to the site as an archaeological feature, not to the extent of any one historical settlement, which is in many cases known to have changed considerably from one historical period to another. Without a notion of the general distribution of sizes among sites that have seen excavation, our ability to assess the regularity of the resulting distribution is severely limited. Juxtaposing the current data set with settlement hierarchies as defined through archaeological survey and research runs up against changing currents of social organisation over time and space, meaning that a more finely attuned evaluation of settlement hierarchies and the production and consumption of writing would require much more extensive and fine-grained data than are currently available. As in the preceding section, we are reviewing an archaeological, rather than a historical, reality. As archaeological features go, the resulting subset covers a wide variety of site types (Fig. 9). While some only partially exposed features measure 1,000-2,000 square metres, the smallest fully delineable sites included in our data set extend over less than 0.5 hectares. Examples of the latter include the predictable remains of isolated monumental structures, such as the monumental gate discovered at Tul-i Āǧuri (JOR) just west of Pārsa/Persepolis (Basello 2017), the aqueduct at Ǧarwāna/Jerwan (JRW) in northern Iraq (Jacobsen/Lloyd 1935: 19-27), and similar features at Nigūb (NGB) (Davey 1985) and Ḫinis/Bāwiān (BVI). But there are also minuscule mounds with finds of inscriptions indicative of everyday practices of accounting or communication, for example Mezraa Teleilat (MZT) on the Turkish Euphrates (MacGinnis 2018: 224), Tall al-Šiyūḫ Fawqānī (BUR) further downstream in Syria (Fales et al. 2005), and Tall Dāmiyā in the central Jordan Valley (Petit/Kafafi 2016: 24-25), all of which extend over one or two hectares at most. Quite substantial bodies of inscriptions, ranging from a hundred to several thousand, have been retrieved from tightly packed settlements and modestly sized mounds. The former include, for example, Tall Ḥarmal (SDP) in south-eastern Baġdād (van Koppen 2006-2008: 488) and Ḫirbat al-Dinīya (HRD) on the Middle Euphrates (Joannès 2006), both at around two hectares. The latter comprise, for example, Tall Abū ʾAntīq (ANT), east of al-Naǧaf in southern Iraq, seemingly at a maximum of five hectares (Fahad 2019), as well as Tall Imlīḥiya (MLH) on the lower Diyāla, extending over perhaps 6 ha (Boehmer/Dämmer 1985: 3-5). The largest sites included are massive urban agglomerations extending over 500-1,000 hectares, including well-known metropoleis such as Ninuwa/Nineveh (NNV), Kār-Tukultī-Ninurta (KTN), and Uruk (URU), with assemblages spanning a multitude of discrete finds, archival contexts, and periods. The renown of the latter cases notwithstanding, the overall picture provided by our data, as far as site size goes, suggests a noteworthy prevalence of cuneiform inscriptions at more modest archaeological locales as well. In all, around fifty per cent of the c. 300 delineable sites included in this sample constitute features less than ten hectares in extent, thus in the range of hamlets up to modestly sized towns. Issues of sample reliability, especially as these apply to the accurate definition of surface extent from commercial satellite imagery, should be duly considered here, as should the expected discrepancy between the extent of an archaeological site and that of any given historical settlement that it may represent. On the other hand, the traditional interest of archaeologists and looters alike in prominent and larger archaeological features is an equally relevant factor to include (Pedersén 1998: 240; Matthews 2003: 158). Based on the present subset, finds of inscriptions are certainly not confined to major urban sites. Quite the contrary: cuneiform is found at all types of settlements. The vast majority of the inscriptions available to us nevertheless remain tied to major archaeological locales, as seen when plotting the number of inscriptions according to surface extent. Archaeological features with a surface extent in excess of 150 hectares account for twenty-five per cent of all cuneiform inscriptions. Eighty per cent of all inscriptions are found at sites exceeding fifty hectares in surface extent (reflected also in the distribution of discrete archives and libraries, cf. Pedersén 1998: 238-241). In sheer numbers, a relatively modest 80,000 inscriptions derive from smaller sites. All of these examples underscore a more general (and relatively obvious) statistical truth, namely that there is no straightforward relationship between the size of an archaeological site and the number of inscriptions retrieved from it. This lack of correlation carries another important implication, however: if a small site can yield many hundreds or thousands of tablets, and a larger site only a few, it follows that the inclusion of even very modest finds of cuneiform inscriptions is critical to a proper evaluation of the overall spatial prevalence of the corpus. The geographical distribution of inscriptions is an indication of prevalence, but the number of inscriptions is not necessarily a proxy for peripheral or core areas of ancient writing culture. As a widely acknowledged example of this reasoning, the discovery of the immense archives of twenty-fourth-century BCE Tall Mardīkh (EBA), ancient Ebla, in the mid-1970s is still held up as a potent reminder of how singular archaeological discoveries may force a complete redrawing of established notions of central and peripheral areas of writing cultures (Akkermans/Schwartz 2003: 235).
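The absence of a straightforward relationship between site size and yield can be quantified with a rank correlation, and the cumulative shares quoted above (twenty-five per cent at sites above 150 hectares, eighty per cent above fifty) are simple threshold sums. The sketch below implements Spearman's rank correlation (the Pearson correlation of ranks, with tied values sharing their mean rank) and a threshold share in plain Python. It is not a procedure stated in this study, and the site list in the usage lines is invented.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def share_above(sites, min_hectares):
    """Per cent of inscriptions at sites larger than min_hectares.

    sites: list of (surface_extent_ha, inscription_estimate) pairs.
    """
    total = sum(n for _, n in sites)
    return 100 * sum(n for ha, n in sites if ha > min_hectares) / total


# Invented sites: (hectares, inscriptions).
sites = [(2, 1000), (60, 3000), (200, 1000)]
rho = spearman([ha for ha, _ in sites], [n for _, n in sites])
```

A coefficient close to zero on the real data would express the lack of correlation described above, while the threshold shares would still show most inscriptions concentrated at large sites.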

Spatial Density and Prevalence
These notions of centrality can be readily appreciated through an examination of the spatial density of the cuneiform corpus. Here, we employ kernel density estimation as a means to visualise general concentrations of cuneiform inscriptions across the Middle East. This allows us to qualify the number of inscriptions found in any one place against the number and concentration of places where inscriptions have been discovered. Kernel density estimation is a common means of representing the first-order intensity of point distributions in archaeology, and can be rewardingly applied to the present data set (Bevan 2020).
A kernel density estimate of inscription density based on the present data sets employs spatial proximity as the basic variable and the estimated number of inscriptions from a given location as a weight parameter. This implies that the statistical impact of a location corresponds both to the number of other locations in its proximity and to the scale of the assemblage of inscriptions found at that particular location. The result reproduces visions of the overall distribution of cuneiform inscriptions familiar to specialist and layperson alike. The map below highlights the vast assemblages of cuneiform inscriptions retrieved from established centres of the cuneiform world, including the capital region around Boğazkale in central Turkey, the Assyrian capital cities of northern Iraq, the Babylonian heartland between Baġdād and Baṣra, and the Achaemenid imperial seats of Pārsa/Persepolis and Pāṯragadā/Pasargadae. The same geographical emphasis is also reproduced when mapping the holdings of most digital catalogues of cuneiform inscriptions currently available. When disregarding the number of cuneiform inscriptions found at any given location, an approach warranted by our previous discussion of the lack of correlation between assemblage size and site surface extent, a rather different picture emerges. Here, we deploy the same set of point locations using the same parameters for spatial calculation, but disregard the weight of the assemblage, meaning that all points exert the same weight relative to each other. Spatial density is thus defined by the proximity of locations alone. The resulting distribution still highlights the notional core areas of the cuneiform world, but at a more general level, the emphasis shifts to the intensity of archaeological field research. Outside the heartlands of Babylonia and Assyria, namely the alluvial plain between Baġdād and Baṣra and the city and hinterland of Mawṣil/Mosul respectively, noticeable concentrations are found in areas that have, for a variety of reasons, seen intensive archaeological research within the last half century or so. These include, e.g., the salvage excavation programmes preceding the construction of the Mosul Dam in northern Iraq (State Board of Antiquities and Heritage of Iraq 1986; Roaf 1997; Simi/Sconzo 2020) and the Tabqa Dam in central Syria (McClellan 1997), as well as the many excavations conducted in the headwaters of the Ḫābūr in north-eastern Syria in the decades preceding the outbreak of the Syrian Civil War in 2011 (Akkermans/Schwartz 2003: 10-12). Even more interesting is the equally dense cluster found in Israel and the Palestinian Territories, areas that have historically seen markedly higher levels of archaeological research and recording than neighbouring countries (Mazar 1997; Greenberg/Keinan 2007; Lash et al. 2020). It is hardly surprising that a higher frequency of archaeological field research is likely to produce more finds, but the broader implications seem less widely acknowledged. Statistically speaking, we should not be all that surprised if larger assemblages of cuneiform inscriptions were still to be discovered in areas that continue to be considered largely peripheral in general overviews. The palatial archives from Qaṭna/Tall al-Mušrifa (QTN) on the Syrian Orontes are a case in point (Richter/Lange 2012), as are the scatters of inscriptions from Qadeš/Tall al-Nabī Mandū (QDS) in the northern Biqāʿ Valley (Millard 2010) and Ḥaṣor (HAZ) in northern Israel (Horowitz et al. 2018: 63-88).
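The contrast between the weighted and unweighted density surfaces can be illustrated with a minimal Gaussian kernel density estimate. The sketch assumes planar coordinates in kilometres (real latitude and longitude would first need projecting), an isotropic Gaussian kernel, and a fixed bandwidth. The text does not specify the kernel, bandwidth, or software used, and the toy points below are invented. Passing assemblage estimates as weights follows the logic of the first map, omitting them that of the second.

```python
import numpy as np


def gaussian_kde_2d(points, grid, bandwidth, weights=None):
    """Evaluate a weighted 2-D Gaussian kernel density at grid locations.

    points: (n, 2) planar coordinates, e.g. km; grid: (m, 2) evaluation points;
    weights: optional (n,) assemblage estimates. Weights are normalised to sum
    to one, so the density integrates to one over the plane.
    """
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    w = np.ones(len(points)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    diff = grid[:, None, :] - points[None, :, :]          # (m, n, 2)
    sq_dist = (diff ** 2).sum(axis=-1)                     # (m, n)
    kernel = np.exp(-sq_dist / (2 * bandwidth ** 2)) / (2 * np.pi * bandwidth ** 2)
    return kernel @ w                                      # (m,)


# Invented example: a cluster of three small finds and one large, isolated assemblage.
pts = [(0, 0), (5, 0), (0, 5), (100, 0)]
grid = [(0, 0), (100, 0)]
unweighted = gaussian_kde_2d(pts, grid, bandwidth=20.0)
weighted = gaussian_kde_2d(pts, grid, bandwidth=20.0, weights=[1, 1, 1, 10_000])
```

With equal weights, the cluster of small finds dominates; with assemblage weights, the single large site does. This mirrors the shift in emphasis between the two maps described above.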

Further Perspectives
The present study is based on an initial survey of the secondary literature, intended as a starting point for a more thorough and comprehensive cataloguing of cuneiform inscription metadata. The merit of the data set presented, in consequence, lies in perspective and extent rather than detail and erudition. Provisional as it may be, the preceding sections have demonstrated the potential of such resources for developing novel perspectives on a corpus of writing long considered beyond measure (e.g., Gelb 1967: 3). Introducing elements of inscription metadata, including chronological placement, material composition, artefact type, language of inscription, and inscription genre, into such analyses is certain to produce far denser overviews of the compositional trajectories of the corpus over time and space (for a concise exploratory example, see Nett et al. in press).
Such approaches may reach well beyond descriptive statistics, for example by exposing tangible preferences in the production and consumption of writing in early human history. On present evidence, the number of individual display inscriptions found within the Urartean cultural sphere seems so outsized compared to the corresponding type of finds from the Assyrian heartland as to suggest that diverging habits of epigraphic production, rather than diverging levels of archaeological fieldwork, are at play. Are we compelled to ask whether the emplacement of writing within the landscape was considered a more prestigious undertaking in the former than in the latter? Or one might consider why, next to myriads of recovered impressions of inscribed seals and stamped bricks, the impression of cuneiform inscriptions on pottery vessels is such an exceedingly rare occurrence. Identifying and addressing such broader questions is contingent upon large, comprehensive, and increasingly standardised collections of data being made available for analysis, an effort to which this study hopes to contribute.
Previous surveys of the cuneiform corpus have approached questions of corpus scale and diversity primarily from a philological angle, assembling statistical overviews with reference to known major finds and collections of cuneiform inscriptions, but without explicit or exhaustive attention to geographical distribution. Coupled with a notable bias towards larger assemblages of inscriptions observable in current digital catalogues, such perspectives have overemphasised larger finds of written material at the expense of minor discoveries, which, as we hope to have demonstrated, carry considerable weight in statistically oriented assessments of the wider regional prevalence of the script. As such, accepted notions of centrality (where cuneiform is found and where not) may be more rewardingly considered with reference to multiple parameters. The broader value of metadata collections suitable for large-scale computational analysis lies in their ability to alter and refine the basic interpretative framework of a given set of research questions.
At a more philosophical level, the present survey touches on a far more contentious issue, namely the question of sample authority. Many anecdotal discussions of the serendipity of discovery, the chance encounter with a specific inscription, and the particular and unique nature of written sources as an object of study would consider a statistically significant historical record from the cuneiform world a fleeting mirage at best. The absence of larger assemblages of cuneiform inscriptions in the highlands of Turkey, Armenia, and Iran, for example, has been attributed to the haphazard nature of archaeological fieldwork (Wilhelm 2008: 105-106). Reviews of the particular depositional histories of archives and libraries have been leveraged to argue that the corpus handed down to us is entirely accidental (Millard 2005: 306-314). And yet, the sheer scale of the corpus discussed here should give us pause. Assessments of the number of Latin epigraphic remains have, very tentatively, suggested survival rates in the range of one to five per cent of inscriptions produced for some regions of the Roman Empire (Heřmánková et al. 2021: 14 with further references). The Greek inscriptions preserved from Classical Athens seem to represent a considerable part of actual epigraphic production, even if researchers have refrained from giving exact figures (Hedrick 1999: 390-395). Although these corpora cover a relatively narrow selection of materials, artefact types, and textual genres, their scale offers some basic illustration of the magnitude with which writing may have been produced and consumed in antiquity.
To the knowledge of the present authors, no systematic attempts have been made at quantitatively assessing the production or consumption of writing in the cuneiform world over any given period of time or geographical area. Easy as the quantitative juxtaposition of various corpora of writing advanced earlier may be, comparing numbers across writing cultures is hardly without problems once the character and composition of the individual corpora are taken into account. The roughly equal numerical size of the Latin and cuneiform corpora, for example, masks the almost inverted nature of their composition with respect to material, artefact type, and genre. Close to ninety-five per cent of the cuneiform corpus is made up of unbaked clay artefacts, whereas at least three quarters of preserved Latin inscriptions are rendered on stone. The vast majority of preserved Latin inscriptions were intended for public display, against some five to ten per cent of preserved cuneiform inscriptions. Cuneiform also diverges in the nature of its geographical distribution. Whereas a number of epigraphic corpora are distributed over a relatively high number of individual locations, with correspondingly fewer inscribed artefacts found at any one locale, cuneiform inscriptions tend to be found in comparatively larger assemblages distributed over a much smaller number of sites. These are but a few of the variables reflecting essentially different bodies of historical source material, and they underscore the need for broader, data-driven perspectives as a critical prerequisite for fully grasping the processes of formation and deposition that have handed the cuneiform corpus down to us.

Conclusions
Reaching from Rome to the Himalayas, cuneiform inscriptions are encountered across a vast transect of the Eurasian and African landmass. Counting an estimated 540,000 inscriptions and fragments when unprovenanced artefacts are included (Streck 2010), this corpus is one of the largest discrete bodies of writing from early human history available to us. Exploiting the close archaeological grounding of artefacts inscribed in cuneiform, the present study has provided a first, provisional, quantitative overview of the geographical distribution of this immense corpus. In so doing, we have pointed to noticeable gaps in current digital research catalogues and highlighted an archaeological body of inscribed artefacts with a considerably higher degree of spatial prevalence than is typically suggested in the scholarly literature. Using a variety of basic variables, including finds distribution, assemblage size, surface extent of the associated archaeological feature, and derived analyses of spatial density and prevalence, the preceding pages have suggested some provisional patterning in the geographical distribution of the corpus. Further exploration of broader, long-term trends in the production and consumption of writing, through the analysis of metadata catalogues within a spatial frame of reference, is likely to open up entirely new facets of the cuneiform world to substantive research. In providing an initial framework for such examinations, including open access data for further augmentation and reuse, the present study hopes to motivate future research in this area.

Figure 1: Point distribution of c. 600 locations of cuneiform finds included in the Cuneiform Inscriptions Geographical Site (CIGS) index version 1.6 (1 July 2023). Map prepared by Rune Rattenborg.

Figure 2: Locational accuracy distribution of archaeological sites with cuneiform finds by select modern countries (n = 477). Data derived from CIGS version 1.6 (1 July 2023).

Figure 3a-b: Comparison of data coverage in the CIGS-AE index and the Cuneiform Digital Library Initiative (CDLI) catalogue. The first map (a) shows the distribution of 597 locations and the spatial density of 429,398 inscriptions. The second (b) shows the distribution of 219 find locations and the density distribution of 260,861 geolocatable inscriptions retrieved from the CDLI catalogue (as of August 2020). Maps prepared by Carolin Johansson.

Figure 6: BM 90849. A diorite door socket fragment with an inscription of Gudea of Lagaš, found during construction works in Knightrider Street, central London, in 1890. © The Trustees of the British Museum. Shared under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) licence.

Figure 9: Number of sites with cuneiform finds by site size (n = 301). Based on data from CIGS v. 1.6 (1 July 2023). Surface extent derived from polygon vector data prepared by Carolin Johansson.

Figure 10: Distribution of size of sites with cuneiform finds by select modern countries (n = 253). Based on data from CIGS v. 1.6 (1 July 2023). Surface extent derived from polygon vector data prepared by Carolin Johansson.

Figure 12: Kernel density estimation of locations with finds of cuneiform inscriptions. Data derived from CIGS v. 1.6 (1 July 2023), including 597 locations. Map prepared by Carolin Johansson.

Table 1: Locational accuracy of archaeological sites included in CIGS version 1.6 (1 July 2023).

Table 3: Distribution of site sizes based on data from CIGS v. 1.6 (1 July 2023). Surface extent derived from polygon vector data prepared by Carolin Johansson.