On the Emerging Supremacy of Structured Digital Data in Archaeology: A Preliminary Assessment of Information, Knowledge and Wisdom Left Behind

Abstract: While the epistemological affordances and varied impacts of different media on archaeological knowledge production have been scrutinized by many practitioners in recent decades, sources of digital structured data (e.g., spreadsheets, traditional relational databases, content management systems) have seen far less critical enquiry. Structured digital data are often venerated for their capacities to facilitate interoperability, equitable data exchange, democratic forms of engagement with, and widespread reuse of archaeological records, yet their constraints on our knowledge formation processes are arguably profound and deserving of detailed interrogation. In this article, we discuss what we call the emerging supremacy of structured digital data in archaeology and seek to question the consequences of their ubiquity. We ground our argument in a case study of a range of texts produced by practitioners working on the Çatalhöyük Research Project. We attempt to map short excerpts from these texts to structured data via the CIDOC Conceptual Reference Model. This exercise allows us to make preliminary observations about the representational affordances and resistances of texts (which can be considered as a type of semi- or unstructured data) and structured data. Ultimately, we argue that the push to create more and more structured and structurable data needs to be tempered by a more inclusive digital practice in archaeology that protects difference, incommensurability, and interpretative nuance.


Introduction
There can be little doubt that the digital transition, especially of the last three decades within archaeology, has had far-reaching impacts upon the routine practices of representing archaeological information and, in turn, producing archaeological knowledge. One impact of this transition has been the generation of structured data in increasingly large quantities, which has arguably resulted in an ever-increasing dependence of archaeological knowledge production processes upon such data. By structured data, we mean any data set that is "born" organized and relatively easy to search through by machines, or organized at a later stage using metadata and other means to improve its searchability. As we discuss below, this emerging ubiquity (indeed, supremacy) of structured digital data grows out of a range of factors: new digital tools and know-how becoming available to archaeologists; high-quality (semi-)structured data sets being increasingly taken as primary reliable resources in archaeological knowledge production; and the push toward open data. Simultaneously, the proliferation of more-than-representational and nonanthropocentric discourses over the last two decades within archaeology (e.g., Olsen, Shanks, Webmoor, & Witmore, 2012; Witmore, 2007) and beyond it (e.g., Bennett, 2010; Law, 2004; Mol, 2002; Thrift, 2007) has made clear that scientific knowledge is produced via a skilled, messy process that involves complex interactions between people and their material and representational resources. The message in these studies has been that the epistemological agency of representations in the way we construct knowledge should not be downplayed: representations matter.
Although the epistemological affordances and varied impacts of different media on archaeological knowledge production have been scrutinized by many practitioners in recent decades (e.g., Clack & Brittain, 2007; Hacıgüzeller, 2017; Lucas, 2019; Moser, 2001; Perry, 2015; Wickstead, 2013), sources of digital structured data (e.g., spreadsheets, traditional relational databases, content management systems, online data infrastructures) have seen far less critical enquiry. As Labrador (2012, p. 237) writes, "Despite their importance and ubiquity, archaeological database systems are rarely the subject of theoretical analysis." In fact, much like the Cartesian map, high-quality archaeological databases that act as containers of "structured" data are generally considered to enable objective and neutral representation. This phenomenon is evident across fields of practice, with the geographer Schuurman (2008, p. 1538), for example, suggesting that "the vast majority of data users remain uncritical of data and unaware of their more nuanced narratives" and ascribing some of the blame here specifically to the "minimalist configuration" of databases. Per Schuurman (2008, p. 1529), "once data are placed in tables, their social lineage is forgotten." Yet such data and data infrastructures are designed by people, reflect those people's ideologies and norms, and in turn help to shape others' values and overall conceptualizations of the world (Burns & Wark, 2020, p. 602). As Labrador (2012, pp. 238-239) describes it, "entering information into a database is an interpretative act" even if the database itself is rarely perceived as a representation or as a highly contextualized entity.
Accordingly, following a series of other practitioners working in and beyond the cultural heritage sector (e.g., Boast & Biehl, 2011; Labrador, 2012; Labrador & Chilton, 2010; Srinivasan, 2018), here we contend that although structured digital data may be lauded for their capacities to facilitate interoperability, equitable data exchange, democratic forms of engagement with, and widespread reuse of archaeological records, their constraints on our knowledge formation processes are arguably profound and deserving of detailed interrogation. High-quality sources of digital structured data are representations like any other (e.g., books, maps, video games, TV shows, scientific reports), and as such they demand that we ask questions about what, by the very nature of their structure, they enable and resist and, specifically, what structured data do to archaeological knowledge formation processes.
Below, we look into the consequences of the emerging supremacy of structured digital data sets in archaeology. That is, we critically reflect upon the ever-increasing ubiquity of structured data in varied archaeological information ecosystems including the Web. We begin by offering an introduction to structured digital data in Section 2. We follow Labrador (2012, p. 238) in defining sources of integrated or aggregated structured data as "any systematized assemblage of data points." By our reckoning, databases and other structured data sources are typically used to record observations and "distilled facts" that are often characterized as primary data. To borrow from Srinivasan (2018, p. 122), they "take specific actions, events and practices and abstract these into indexable, comparative data"; however in so doing, they also "filter out that which fails to 'fit' with existing classification protocols…the voices, values, and protocols of communities on the ground." Although Srinivasan (2018) goes on to make a compelling case for the negative impacts of these homogenizing tendencies of databases upon diverse human populations, we focus here in the first instance on the impacts on archaeological specialists themselves, resulting from the failures of structured digital data to account for rich forms of archaeological representation, such as interpretative texts. Such texts are arguably one of the more important legacies of postprocessual archaeologies, and hence their seeming incompatibility with structured digital data has implications for the discipline on multiple levels.
In the subsequent section, we attempt to examine the representational affordances and constraints of archaeological structured data. We do so through an experiment with mapping short excerpts from a selection of texts produced by practitioners at the archaeological site of Çatalhöyük, in Turkey, to the CIDOC Conceptual Reference Model of the International Committee for Documentation of the International Council of Museums (henceforth, referred to as CIDOC CRM or CRM). CIDOC CRM is an official ontological standard that can be used for semantic data integration in cultural heritage. More specifically, CRM is a data model designed to guide practitioners in classifying, ordering, and structuring local data sets, including the logic and meaning of the relationships between data classes, in a standard way so that they can become part of a coherent global information resource (see Bekiari et al., 2021, p. i); hence, it is understood as an "ontology." Our choice of CRM here is due to its wide acceptance in cultural heritage documentation practices and its wide scope and relatively high semantic expressiveness as a metadata standard for cultural heritage.
The results we present shine a light upon how structured data may be inherently unable to express a range of phenomena, including nontechnical discourse, spatio-temporal ambiguity, ontological ambiguity, nonsimilarity, and archaeological interpretations that are not strictly positivist (i.e., not based strictly on empirical evidence). In the concluding section, and in line with the wider interdisciplinary literature, we suggest that to respond to this predicament we need to devise more inclusive approaches to the development and standardization of digital data sets in archaeology, including investment in genuinely cocreated ontologies and in the technologies that support such ontology development, in order to heighten their flexibility and representativeness. We must also commit to richer studies of our data and their supporting infrastructures via, for example, the pursuit of database ethnographies (e.g., Burns & Wark, 2020; Schuurman, 2008). Given the sheer impossibility of creating structured data sets that are as expressive as natural language, we also join others in advocating for deeper reflection upon the implications of the recent "scientific turn" as well as the turn toward open data. Both trends generate a considerable push to produce ever more structured or, at least, "structurable" data in archaeology, reflecting what Srinivasan (2018, p. 42) calls an increasing propensity to "see the world in terms of opportunities for databasing rather than seeing databases as tools to support our subjective experiences."

2 Structured Digital Data (and Texts as Unstructured Data) in Archaeology
As discussed briefly above, any data set that is "born" organized and relatively easy to search through by machines, or organized at a later stage to improve its searchability, can be understood as structured to a certain degree. Metadata have a big role to play in structuring data sets, as do database schemas describing how data should be organized in a database.
Naturally, a range of different data sets fit the broad definition of structured data sets. Some of these will be highly structured in the sense of being organized strictly and rigidly into rows and columns either in a single flat data table (e.g., a spreadsheet) or in a collection of tables in a relational database (i.e., databases in which data are represented in tables and relationships are established among them via unique identifiers assigned to each row of the table). In the latter case, a database schema will serve to organize the data sets further with a clear description of, for instance, the relationship between different tables and the kinds of information that can be stored in the database.
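To make this definition concrete, the following is a minimal sketch in Python, using the standard sqlite3 module, of two related tables. The table names (units, finds), the column names, and the values are hypothetical illustrations and do not reproduce any actual project database. The sketch shows rows linked through unique identifiers and a schema restricting what can be stored.

```python
import sqlite3

# In-memory database; every table name, column name, and value below is a
# hypothetical illustration, not an actual excavation database schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce relationships
conn.executescript("""
CREATE TABLE units (
    unit_id     INTEGER PRIMARY KEY,   -- unique identifier for each row
    description TEXT NOT NULL
);
CREATE TABLE finds (
    find_id  INTEGER PRIMARY KEY,
    unit_id  INTEGER NOT NULL REFERENCES units(unit_id),
    material TEXT NOT NULL
);
""")

conn.execute("INSERT INTO units VALUES (1, 'midden layer')")
conn.executemany(
    "INSERT INTO finds VALUES (?, ?, ?)",
    [(10, 1, "obsidian"), (11, 1, "bone")],
)

# The relationship between tables is established via the shared identifier.
rows = conn.execute("""
    SELECT f.find_id, f.material, u.description
    FROM finds AS f
    JOIN units AS u ON u.unit_id = f.unit_id
    ORDER BY f.find_id
""").fetchall()


def orphan_find_rejected():
    """Return True if the schema refuses a find that points to no known unit."""
    try:
        conn.execute("INSERT INTO finds VALUES (12, 99, 'clay')")
    except sqlite3.IntegrityError:
        return True
    return False


print(rows)
print(orphan_find_rejected())
```

In this sketch, the record that refers to a nonexistent unit is refused, which is a small instance of a schema determining in advance what counts as a storable observation.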
Various types of what are commonly known as nonrelational databases (e.g., graph databases, JavaScript Object Notation [JSON] documents) can also be considered structured to various degrees. Similar to relational databases, they can be enriched and further structured with the help of standardized metadata terms (e.g., Dublin Core metadata terms, Getty's Art and Architecture Thesaurus), data schemas (e.g., Resource Description Framework Schema), or formal ontologies (e.g., CIDOC CRM). Texts in natural language (i.e., human language as opposed to, for instance, computer language), however, are different. Together with other media in image, audio, and video formats meant primarily for human interpretation, they are typically understood as unstructured in this context despite their internal structure. The reason for this distinction is that computers cannot make easy sense of these large bodies of information; they need to be marked up with metadata (also see our discussion below on natural language processing [NLP]). When texts are marked up (with, for instance, eXtensible Markup Language [XML] inserting [meta]data directly into the text), they will be considered semistructured in the sense of having a structure that is "less rigid and strict than in conventional databases" (Calvanese, De Giacomo, & Lenzerini, 1999, p. 254). Our article originates from the observation that there is an ever-increasing reliance on structured data in archaeology today in the processes of producing archaeological information, knowledge, and insights. This observation is testified to not only by the importance and scale of massive, cross-country data integration initiatives such as ARIADNE (Meghini et al., 2017) but by continuously emerging calls for the need for more or better structures for our data (e.g., Holdaway, Emmitt, Phillipps, & Masoud-Ansari, 2019). 
We contend that there are three major overlapping reasons for what we call this emerging supremacy of structured digital data in archaeology. First, the digital transition in the discipline, especially of the last three decades, in terms of digital tools and know-how has made collecting data in a structured way during fieldwork easier while also improving the structure of existing data sets at the postexcavation stage. Here relative availability and affordability of, as well as familiarity and proficiency with, devices such as rugged laptops and tablets (e.g., Toughbooks) and digital services such as Open Refine (see openrefine.org) have proven pivotal facilitators of the growth in structured data production.
Second, the analytics-ready nature of structured data sits well with an increasingly prominent trend within archaeology, referred to variously as the "scientific turn" (Sørensen, 2017), "new empiricism," or the "Third Science Revolution" (Kristiansen, 2014). In this context of a new wave of positivism, high-quality, (semi-)structured data sets are increasingly taken as primary reliable information resources in archaeology (Huggett, 2020a). The hope is often to render these analysis-friendly structured data sets interoperable so that they can eventually be aggregated to create "Big Data" sets and, what we may also then call, Big Tables, Big Graphs, and Big Maps. Archaeologists hope that these ambitious representations of data and information may provide them with answers to longstanding questions that can seemingly not be answered by a focus on the local context and associated "small data." We often witness data being used as a synonym for evidence in this context and reality being reduced to data (Silva & Tomaz, 2021), with data size emerging as a proxy for accuracy, and with data-scientific workflows that are enabled through digital technologies being portrayed as sources of archaeological objectivity and truthful knowledge (see Huggett, 2020b).
Third, the well-pronounced and compelling call for open data in archaeology within the last decade, often justified by ethical, political, and knowledge-based rationales (e.g., Costa, Beck, Bevan, & Ogden, 2014; Lake, 2012; Marwick et al., 2017), has only exacerbated this trend of "structured datafication" in the discipline. Critical here is that what renders open data sets findable and reusable on the Web and across other information ecosystems is their well-structured nature. Regarding findability, it is mainly thanks to being structured that data sets can be efficiently coupled with rich, automatically interoperable metadata. As to reusability, Heath and Bizer (2011, p. 2) stress that "a key factor in the reusability of data is the extent to which it is well structured. The more regular and well-defined the structure of the data the more easily people can create tools to reliably process it for reuse." In fact, the increasing importance in archaeology and cultural heritage of linked data, which can be understood as "a set of best practices for publishing and connecting structured data on the Web" (Bizer, Heath, & Berners-Lee, 2011, p. 206), and of semantic web technologies also has to be understood partially as a consequence of this trend toward data structuring. A major objective of these technologies is not only to aggregate or link data, using common or aligned Uniform Resource Identifiers (URIs) to identify nameable entities, but also to structure unstructured data sets using semantically expressive, standard data models (see Allemang, Hendler, & Gandon, 2020).
Against this forceful push toward both the creation of more tabulated data and the heightened structuring of texts, images, videos, and so on, crucial questions arise: Can all information, knowledge, and wisdom in archaeology be turned into structured data? If so, what does this structuring cost us in terms of knowledge production and our larger understandings of the past? If not, what is left behind if some forms of knowledge are left out of the equation?
Importantly, one of the key legacies of the postprocessualist movement in archaeology has been its emphasis upon "interpretative archaeology," with the archaeologist acting as "an interpreter…[,] a translator, an interlocutor, or a go-between" (Hodder et al., 1995, p. 5). This positioning of "the archaeologist" as an agent for understanding (or as an "interpreter" of) the past within the postprocessual school rests upon the notion that "the past" (the "interpreted") can be "read" like a text through a combination of reflexive, dialectic, and hermeneutic investigation. By drawing upon a range of theoretical perspectives rooted in hermeneutics, structuralist/poststructuralist theory, and structuration, the postprocessualist critique (especially Ian Hodder's particular brand of it) has therefore always emphasized the "theory-ladenness" of data: "One cannot sit back and observe the data; they must be brought into action by asking questions – why should anyone want to erect a building like that, what was the purpose of the shape of this ditch, why is this wall made of turf and that of stone?" (Hodder, 1986, p. 196).
Although theoretical discourse has moved on from the trends of these debates in recent years, the relationship between data (of any sort, but especially structured digital data) and the various modes of publication that are rooted in those data still has relevance to the discipline. In fact, one might argue that with increasing emphasis being placed upon the production of large and interoperable data sets (e.g., Cooper & Green, 2016) and new and innovative media for publication (e.g., Morgan, 2013), it is as important now as it ever has been to consider carefully the epistemological implications of the intrinsic relationship between "interpretation" (and the text that conveys or carries the interpretation) and data.
From a theoretical perspective, Lucas (2019) has recently comprehensively reviewed the epistemological implications of the various paradigmatic shifts in our relationship between "data" and "texts" reflected within our writing practices as archaeologists. He explores the relationship between different "writing styles," or "modes of writing," and the very character of the knowledge we create as a discipline. Lucas emphasizes four modes of writing: narrative, description, argument, and exposition. He sees the relationship between these various styles in the production of "literary text" in archaeology as a dynamic thing: "Texts act to embed and yet also uproot knowledge" (Lucas, 2019, p. 159); arguing that "knowledge" itself (as embodied in text) "is always local, context specific, and yet, at the same time, has the potential to move and work in different contexts." There is an implication in Lucas' discourse that, in practice, we (archaeologists) naturally tend to blend our "interpretation" and "observation" through the subtle interplay of our modes of writing. This can be linked to a lineage of discussion and critique of the object/subject dichotomy in archaeology by the likes of Yarrow (2003). Yarrow contends that there is no clear distinction between object and subject in the act of excavation, dispelling the myth of "interpretation at the trowel's edge," and suggesting rather that the point of interpretation is a fluid dynamic between "the archaeologist, […] the trowel, or the feature" none of which "could be seen as the sole origin of the interpretation as it emerged" (2003, pp. 70-71; but see also the extended discourse on object/subject in Cobb, Harris, Jones, & Richardson, 2012). Arguably in practice, Lucas' subtle and fluid "interplay" between our modes of writing almost intuitively reflects Yarrow's ambiguity around the "point of interpretation" on site.
It is also interesting to note that, within the context of British archaeological publication traditions, various conventional types of publication (from site reports, to archives, to synthetic articles or monographs, following Frere, 1975, modified by Cunliffe, 1983) afford differing degrees of Lucas' four modes of writing. For example, one expects an interim excavation report to contain a greater proportion of description and exposition than higher-order interpretative publications such as monographs, which might be centered upon more argument and synthesis.
Recognizing the potential distinctiveness of different forms of common archaeological text, the remainder of this article focuses on mapping written extracts from a series of publication outputs from the Çatalhöyük Research Project to the CIDOC CRM in an effort to understand what can and cannot be accommodated within the conceptual model and with what consequences such inclusion or exclusion comes. The Çatalhöyük Research Project tends to publish following the typical British commercial publication practice (i.e., conforming to the Frere (1975) model referenced earlier; note that it is beyond the scope of this article to fully unpack the tensions associated with both Frere's model and conventional publication hierarchies). We discuss the project's various publication types in detail below. We anticipated prior to the mapping exercise that we would see these texts map differently depending on their content, with higher-order "interpretive" textual data less likely to be accommodated within the structured confines of CRM and more "descriptive" primary data proving more amenable to its rules.

Mapping Çatalhöyük Texts to CRM
Our research here is grounded in an analysis of five types of output generally produced by the Çatalhöyük Research Project (Figure 1). This focus on the site of Çatalhöyük is the result of two primary drivers. Firstly, all three authors have worked at the site over a collective total of 25 years and, as such, have a deep understanding of the way the Çatalhöyük Research Project operated, its data collection protocols, infrastructure, and its outputs. Secondly, the Çatalhöyük Research Project has, since its conception under the direction of Ian Hodder, developed a very clear strategy for experimenting with reflexive methods of data collection and interpretation, building these into its corpus of formal publications (Hodder, 2000). In doing so, the project positioned itself as a flagship for an applied reflexive postprocessual method in archaeology.
In some ways, though, despite this innovative outlook and history of experimentation, the core publication model of the Çatalhöyük Research Project has been deeply conventional (although see, e.g., Tringham & Stevanović, 2012 for an exception). It is reflective of wider Euro-American trends within the commercial and research arms of the discipline (e.g., see Watson, 2019), and is centered upon a fairly conventional, hierarchical publication pipeline. The foundation of this publication model is the project's own hybrid of a raw single context recording system supplemented by various reflective (or reflexive) mechanisms (e.g., diaries and sketches), all digitally crystallized in an integrated geographic information system (GIS) and relational database (see Berggren et al., 2015). This digital infrastructure is public facing (via a research portal on the project's website: http://www.catalhoyuk.com/), with plans to develop it into a permanent sustainable open-access, interactive, linked-data research tool (Lukas, Engel, & Mazzucato, 2018). Drawing upon this, the bulk of publication from the project centers upon the "grey literature" or archive reports (all in the public domain), the extensive (and expensive) official monograph series (which tends to vary between more standard excavation volumes (e.g., Hodder, 2007, 2014a) and more thematic collaborative volumes (e.g., Hodder, 2005, 2014b)), and a raft of (debatably) more widely accessible syntheses and technical papers that tend to be published in journals, often reflecting either independent or collaborative research interests of team members or research students affiliated with the project.
Figure 1: Table showing various text-based data (i.e., types of output) by the Çatalhöyük Research Project ordered according to the conventional understanding that they are increasingly interpretive. The data exemplars annotated for this article are represented in the right column in dark gray.
The project has always encouraged this peripheral publication by the team as part of its ethos of multivocality and its inclusive ideal of scholarship. Overarching all of this is the role of heritage interpreters at the site, who also produce a range of materials for consumption by wider audiences, including site guidebooks and signage for Turkish-speaking and English-speaking visitors to the site (e.g., Çatalhöyük Research Project, n.d.) and online interactive educational resources for English-speaking youth and adults (e.g., McKinney, Perry, Katifori, & Kourtis, 2020; Roussou et al., 2019). Although this is by no means unique to the Çatalhöyük Research Project, the project is somewhat unusual in that this process of specialized heritage interpretation takes place in the field in close collaboration with the full range of team members (Perry, 2018).
For the purposes of this study (in terms of digital output at least), we have classified the Çatalhöyük publication model into five common "output types," as visualized in Figure 1. As noted above, here we follow British convention in defining these types, but we note that such conventions are deserving of significant debate, which is beyond the scope of our paper. Firstly, we recognize the site's "primary data," the digitized but essentially raw documentation of observational and metric data (i.e., field recording sheets digitized into the database), followed by the closely related "secondary data," which, although still generated as part of the "field record," function in a more interpretative mode (such as diary entries and interpretative descriptions within the database). Following on, we recognize the site's grey literature (archive reports), then its more "formal academic publications" (the research project's monographs, which are also supplemented by a wide range of smaller journal-based publication outputs). Finally, drawing together all of these elements, are the more public-facing interpretative outputs for wider nonspecialist audiences, which we have called "interpretive literature" (i.e., heritage interpretation panels, site guides, and storytelling devices such as interactive web content or mobile applications).
To be precise, our focus in the case study here is upon "mapping" short excerpts from texts from each mode of publication to the CRM Core and five of its extensions, namely CRMinf, CRMsci, CRMarchaeo, CRMba, and CRMgeo. CRM Core can be considered the principal component of CRM, comprising a set of metadata elements describing "the most fundamental relationships that connect things, concepts, people, time and place" (Doerr & Kritsotaki, 2005; see Bekiari et al., 2021). Extensions of CRM are compatible with and complement CRM Core, focusing on specialized research questions and documentation needs developed in partnership with respective research communities (CIDOC CRM, n.d.).
Mapping here refers to digitally annotating words or phrases in these texts as they correspond to CRM classes and properties (defined further below). As described earlier, different knowledge organization systems exist to structure data sets, including standardized metadata terms (e.g., Dublin Core metadata terms, Getty's Art and Architecture Thesaurus), data schemas (e.g., Resource Description Framework Schema), or formal ontologies (e.g., CIDOC CRM). Formal ontologies in this context have the role of enabling data aggregation by facilitating the mapping of different data sets to common information elements, describing both these elements and the relationships between them. In practice, the use of formal ontologies such as CRM allows for the creation of a standardized representation of a data model (i.e., how information is organized in a data set) and the integration of different data sets with one another. Within the context of cultural heritage and archaeology, the ontology with the widest acceptance and semantic expressiveness is CRM; hence our choice in this case study.
CRM was initiated as an effort of the International Council of Museums (ICOM). It was initially meant as a database prototype aiming to structure and integrate heterogeneous data sets made available by the entire ICOM museum community. As Bruseker, Carboni, and Guillem (2017, p. 108) extensively review, "the resulting maximalist work was an impressive feat of research work, but resulted in a highly complex relational database model with over 400 tables that was difficult in practice to put into effect." It was this difficulty that led to further research on the CIDOC relational data model to create a formal ontology that would allow nonambiguous and transparent documentation of cultural heritage data sets with a particular focus on museum documentation. The ultimate aim here was to assure the long-term understanding and analysis of these data sets (Bruseker et al., 2017, p. 109; Doerr, n.d.). As Bruseker et al. (2017) summarize, the creation of this formal ontology was achieved successfully in 2006 when CRM was officially recognized by the International Organization for Standardization (ISO), confirming its status as an international standard for information organization. Per Bruseker et al. (2017, p. 108), this made CRM "the only ontology in the [cultural heritage] domain to have this official recognition, which can be read both as a result of and also as a cause of its acceptance in the community." Practitioners continue to build extensions to CRM to account for obvious gaps (e.g., Canning, 2019), and the CIDOC community is aware of matters of bias in its data structures that might constrain the utility of the model, and hence has actively begun candid conversations about how to minimize and mitigate such bias (CIDOC CRM, 2021a).
Our results will hopefully create further momentum in the call for (and the associated building and exploration of) these alternatives by the CIDOC community and within the archaeology and cultural heritage fields more generally (see Dallas, 2016; Huggett, 2020a; Jackson, Richissin, McCabe, & Lee, 2020 with references). We also aim to add to earlier methodological efforts to map archaeological texts to structured data with different research questions in mind (e.g., Doerr, Kritsotaki, & Boutsika, 2011; Martín-Rodilla, 2015).
As mentioned above, in this study we digitally annotate words or phrases in a small corpus of selected Çatalhöyük texts as they correspond to CRM classes and properties. Annotation in this context refers to the practice of enriching an information medium (e.g., text, image, or audio) by adding extra information to it, and it can take place in both analog or digital formats. In its analog form, it typically involves activities such as highlighting sections of printed texts or adding notes or marginalia (see Barker & Terras, 2016, p. 6). Digitally, annotation can be as simple as mimicking this analog practice by, for instance, highlighting or commenting on a digital text document using text editing software (e.g., Microsoft Word). However, more innovative and contemporary methods of digital annotation include, for example, "semantic annotation," which involves attaching metadata to (parts of) texts or images in order to describe their content (Andrews, Zaihrayeu, & Pane, 2012, p. 5). Importantly, implementation of linked data often features as a key consideration in the semantic annotation process. This is because URIs, used to uniquely identify nameable entities (e.g., things, people, time, events, places) and other resources across a computer network (including the World Wide Web), can be consistently used as part of metadata to link the annotated entities to larger information ecosystems through common URIs (see, e.g., Peripleo; https://peripleo.pelagios.org).
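As a rough illustration of semantic annotation with shared identifiers, the sketch below (in Python) attaches a CRM class label and a URI to a span of text in two short documents and then groups the documents by URI. The sentences, the document names, and the example.org URI are hypothetical, and the record layout is an assumption made for illustration rather than the format used by any particular annotation tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    doc_id: str     # which text the annotation belongs to
    start: int      # character offset where the annotated span begins
    end: int        # character offset just past the annotated span
    crm_class: str  # label of the CRM class assigned to the span
    uri: str        # identifier shared across data sets


def annotate(doc_id, text, phrase, crm_class, uri):
    """Annotate the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Annotation(doc_id, start, start + len(phrase), crm_class, uri)


# Hypothetical sentences standing in for two different kinds of text.
docs = {
    "diary": "We dug at Çatalhöyük today.",
    "report": "Excavations at Çatalhöyük continued.",
}

# Hypothetical URI; a real project would use an identifier from an
# established gazetteer or authority file.
PLACE_URI = "https://example.org/place/catalhoyuk"

annotations = [
    annotate(doc_id, text, "Çatalhöyük", "E53 Place", PLACE_URI)
    for doc_id, text in docs.items()
]

# Because both annotations carry the same URI, the two texts become linked.
linked = defaultdict(list)
for a in annotations:
    linked[a.uri].append(a.doc_id)

print(dict(linked))
```

The design choice worth noting is that the link between the two texts is not stored anywhere explicitly; it emerges only because both annotations point to the same identifier.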
In our case study, for digital annotation, we used Recogito, a free and open-source web-based annotation tool developed by the Pelagios digital humanities project (see https://recogito.pelagios.org). The decision to employ Recogito rests primarily on its user friendliness for humanities research involving image and text annotation (Simon et al., 2017). As mentioned earlier, our aim here is to observe the process of turning texts written in natural language into structured data. The tensions we observed in the annotation process, we suggest, closely resemble the tensions involved in any other attempt (e.g., during archaeological fieldwork) to structure archaeological wisdom, knowledge, and information into data that fit under predetermined column/field names. In other words, by experimenting with how archaeological texts of different genres may or may not be mapped to CRM, we aim to observe which types of archaeological wisdom, knowledge, and information may be in danger of being left behind (and which are in danger of being overrepresented) in our knowledge organization systems that are primarily based on structured data.
Annotation may happen at various levels of detail depending on the linguistic component chosen to carry it out: document, paragraph, sentence, phrase, word, or character. Our annotations comprise a mixture of word- and phrase-level annotations. We annotated the Çatalhöyük texts comprehensively in the sense that we aimed for complete coverage of all words and phrases. That is, every word or phrase was either annotated with a CRM class or property or marked as not annotatable. We are fully aware that the aim of CRM "is not to capture, in a structured form, everything that can be said about an item" (Bekiari et al., 2021, p. 61) and that it is "intended to focus on the high-level entities and relationships needed to describe data structures" (Bekiari et al., 2021, p. 60). However, as structured data increasingly come to prevail over texts and other forms of unstructured data sources, it certainly appears timely to identify what exactly CRM (or other formal ontologies) cannot capture. During the annotation exercise, it was clear at times that the same word or phrase could be annotated in more than one way with CRM. However, given that our research aims to identify what is annotatable or not annotatable (even when the highest-level concepts in CRM are used; e.g., E2 Temporal Entity, E77 Persistent Item), we did not aim to exhaust the possibilities of alternative annotations in our case study.
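The comprehensive coverage rule described here lends itself to a simple tally. The following is a minimal sketch in Python and not the procedure used in the study, where annotation was carried out in Recogito. It assumes that each annotated word or phrase is stored as a pair of text and CRM label, with `None` standing for "not annotatable." The five pairs are only the labels mentioned in this article in connection with the Figure 2 sentence, so the resulting share is purely illustrative and is not one of the reported results; the names `FIGURE_2_SUBSET` and `not_annotatable_share` are hypothetical.

```python
from typing import Optional

# (text, CRM label) pairs; None marks a word or phrase as not annotatable.
# Illustrative subset only: labels mentioned for the Figure 2 sentence.
Annotation = tuple[str, Optional[str]]

FIGURE_2_SUBSET: list[Annotation] = [
    ("this pattern of burials", "S15 Observable Entity"),
    ("the construction of a building", "E7 Activity"),
    ("Building 1", "E53 Place"),
    ("elsewhere on the site", None),
    ("although", None),
]


def not_annotatable_share(annotations: list[Annotation]) -> float:
    """Return the percentage of annotations that carry no CRM label."""
    if not annotations:
        raise ValueError("no annotations to summarize")
    missing = sum(1 for _, label in annotations if label is None)
    return 100.0 * missing / len(annotations)


print(f"{not_annotatable_share(FIGURE_2_SUBSET):.0f}% not annotatable")
```

Counting annotations rather than individual words is itself a choice: a phrase-level annotation and a word-level annotation each count once here, which may or may not match how a given study tallies its shares.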
To have a closer look at how we carried out annotations in practice, consider the example in Figure 2 presenting a sentence in Hodder (2012, p. 186). For this particular annotation, CRM Core and CRMsci classes and properties were employed. CRM classes are categories of things with common characteristics that allow them to be identified as belonging to the same CRM class (Bekiari et al., 2021, p. iii). CRM classes can have subclasses or superclasses. A subclass can be defined as "a specialization of another class (its superclass)" and, as such, a superclass "is a class that is a generalization of one or more other classes (its subclasses)" (Bekiari et al., 2021, p. iii). CRM properties are relationships between CRM classes (see below). In this particular example, S15 Observable Entity, E7 Activity, S4 Observation, and E53 Place are all CRM classes. P137 exemplifies (is exemplified by) is a CRM property that associates instances of two CRM classes: E1 CRM Entity and E55 Type (not possible in this case, see below). Letters and numbers at the beginning of each class and property (e.g., E53) uniquely identify them, indicating also whether a class or property belongs to CRM Core or a particular CRM extension (in this case, E for classes and P for properties in CRM Core and S for classes in CRMsci).
In Figure 2, words and phrases that do not correspond to any CRM class or property and thus cannot be annotated are indicated in red (see also Section 4). Although "the site" can be understood as E53 Place, "elsewhere on the site" arguably does not fit well into the definition of E53 Place or any other CRM class: the phrase relationally defines an area which is characterized by "not being" the particular building in question in the text, Building 44 (cf. Hodder, 2012, p. 186). This "not" logic defining "elsewhere on the site" creates a tension with the definition of E53 Place in CRM, which refers to a more specific and content-wise homogenous place of interest (see Bekiari et al., 2021, p. 87). The other not-annotatable entity, "although," introduces a subordinate clause weakening the strength of the statement made in the main clause regarding the observation of the same pattern of burials elsewhere at Çatalhöyük. This kind of argumentative relationship is not possible to express through CRM, even via CRMinf. Note also that, in this example, CRM property P137 exemplifies (is exemplified by) cannot function properly even though we annotated it as such: it associates Building 1 (an instance of E53 Place which is a subclass of E1 CRM Entity) with a nonannotatable entity, "elsewhere on the site," which does not qualify as E55 Type or any other CRM class as discussed earlier.
Figure 2: An example of the way in which annotations have been applied in our research (annotated sentence from Hodder, 2012, p. 186). Text in red shows words and phrases that could not be annotated with CRM.
Note that the example in Figure 2 also involves a typical issue we faced in performing annotations related to restrictions caused by CRM properties' domain and range.⁶ Specifically, in this example, it would appear at first that "associated with" could be annotated as the AP11 has Physical Relation (is Physical Relation of) property of CRMarchaeo. However, the "domain" and "range" of AP11 are both A8 Stratigraphic Unit (CRMarchaeo), which means the property can only be used to indicate a stratigraphical relation between two instances of A8. Neither "this pattern of burials" nor "the construction of a building" in the sentence in Figure 2 fits the definition of A8. They fit the definitions of S15 Observable Entity of CRMsci (or its subclass E77 Persistent Item of CRM Core) and E7 Activity of CRM Core, respectively. It is therefore not possible to annotate "associated with" as AP11. We encountered several other instances of this particular issue in the annotation process. These instances (including the one featuring P137 described above) act as a clear reminder that natural language and the texts that convey it are inherently complex archaeological representations that will readily challenge attempts to structure them.
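The domain and range restriction can be made concrete with a small check. The sketch below is a toy model written for illustration, not an implementation of CRM or of any CRM software: it encodes only the classes and the two properties named in the discussion of Figure 2, and the `BROADER` table is a deliberate simplification in which every class without a stated parent is placed directly under E1 CRM Entity. The names `BROADER`, `PROPERTIES`, `is_a`, and `can_link` are hypothetical.

```python
from typing import Optional

# Toy subset of CRM classes. Each entry points to a broader class, which is
# not necessarily the direct superclass in the official specifications.
BROADER: dict[str, Optional[str]] = {
    "E1 CRM Entity": None,
    "E53 Place": "E1 CRM Entity",
    "E55 Type": "E1 CRM Entity",
    "E7 Activity": "E1 CRM Entity",
    "S15 Observable Entity": "E1 CRM Entity",
    "E77 Persistent Item": "S15 Observable Entity",
    "A8 Stratigraphic Unit": "E1 CRM Entity",
}

# Property -> (domain class, range class), as stated in the text.
PROPERTIES: dict[str, tuple[str, str]] = {
    "AP11 has Physical Relation": ("A8 Stratigraphic Unit", "A8 Stratigraphic Unit"),
    "P137 exemplifies": ("E1 CRM Entity", "E55 Type"),
}


def is_a(cls: Optional[str], target: str) -> bool:
    """True if cls equals target or has target among its broader classes."""
    while cls is not None:
        if cls == target:
            return True
        cls = BROADER[cls]
    return False


def can_link(prop: str, subject_cls: Optional[str], object_cls: Optional[str]) -> bool:
    """Check a proposed annotation against the property's domain and range.

    A class of None stands for a word or phrase that could not be annotated.
    """
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_cls, domain) and is_a(object_cls, range_)


# "this pattern of burials" (S15) associated with "the construction of a building" (E7)
print(can_link("AP11 has Physical Relation", "S15 Observable Entity", "E7 Activity"))
# Building 1 (E53) exemplifies "elsewhere on the site" (not annotatable)
print(can_link("P137 exemplifies", "E53 Place", None))
```

Both calls print `False`: the first because neither end is a stratigraphic unit, the second because the range end has no CRM class at all.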
The annotation process has involved skilled and time-consuming labor, sometimes progressing with an average speed of 10 words per hour when time spent double- or, at times, triple-checking the annotation quality is also included. This resulted in a relatively small amount of high-quality (but probably still not error-free) text annotations utilized in our analysis (ca. 350-450 words from each type of text in Figure 1). Therefore, the results we present in Section 4 should be considered strictly preliminary. Also, importantly, despite the limited number of words we annotated, the potential issues that could be discussed in this article are many. This predicament arguably serves to testify to the fruitful nature of the methodology and hopefully encourages future research on the topic. Given the space limitations, we summarize only the major issues we identify in Figure 3 and in the discussion below. Interested readers can consult the annotations at the following link as TEI XML files to explore additional matters or reuse the annotation data set in further studies: 10.5281/zenodo.5795295.⁷,⁸
⁶ In CRM, "the domain is the class for which a property is formally defined" and "the range is the class that comprises all potential values of a property" (Bekiari et al., 2021, pp. v-vi). This means that instances of a CRM property link "only" from the instances of the property's domain class to the instances of its range class.

Summary of Results
A range of issues emerged regarding the limitations of CRM while annotating short excerpts from different types of text from Çatalhöyük. These issues mainly involve the impossibility of expressing instances of what can be termed nontechnical discourse, spatio-temporal ambiguity, ontological ambiguity, "high-level" archaeological interpretations (i.e., archaeological interpretations based only partially on empirical evidence), and nonsimilarity. Figure 3 lists the most informative examples for each of these issues. The annotation exercise broadly illustrated that some of the texts we annotated were easier to express as CRM classes and properties than others (Figure 4). Specifically, of all the excerpts we annotated, interpretative literature was the least expressible by CRM: about 46% of the annotations from the text of the mobile application (a storytelling-based on-site experience for English-speaking visitors to Çatalhöyük focused on Building 52) were not annotatable with CRM classes and properties. The "secondary data" category, represented by a diary entry by Roddy Regan about Building 56 from 31 July 2005, was the second most difficult to annotate, with 44% being not annotatable. "Grey" literature, represented here as an excerpt about Buildings 56 and 65 from the archive report published in 2006 (Regan, 2006), was 37% not annotatable. The formal academic publication category, involving an excerpt on the Building 65-56-44-10 sequence from a book chapter written by Hodder (2012), was the most expressible via CRM: only 33% of the excerpt was not possible to annotate via the classes and properties of the ontology. It is crucial to note, however, that because we have only annotated a very small number of words from each text category, only one person annotated the texts (P. Hacıgüzeller), and only one archaeological site is the focus of our case study, these numeric comparisons should be considered preliminary observations only, and as food for thought for future research.
⁷ The Recogito link to the annotation data sets, which may not be permanent but is visually easier to consult for nontechnical users, can be found at https://recogito.pelagios.org/ghentcdh#6c93b35b-9b85-4755-85cc-2404794ad96f.
⁸ Nonannotatable words and phrases have been marked as "place" and annotatable ones as "person" in Recogito to benefit from the data visualization functions of the application. CRM classes or properties are indicated as tags.
One of the major tensions between Çatalhöyük's texts and the CRM in our case study presented itself in annotating anything that fell outside of what we want to call here "scientific discourse" or, following Fisher (1987), "technical discourse," the discourse of ration (not fiction), logos (not mythos) and thought (not imagination). It is historically understood as a type of discourse meant for experts and involving true knowledge (Fisher, 1987, Chs. 1, 2). In the annotation process, we identified nontechnical discourse in particular where author positionality was explicit. This was the case in those contexts of "dialogic communication" that involved personal and possessive pronouns, and possessive adjectives, mainly in the first person singular and plural (e.g., I, we, our). Author positionality in these cases was very clear through references to acts of, for instance, recalling and remembering as well as emotions, expectations, and what appeared to be clearly personal opinions. The following sentence from the diary entry text forms an example: "It is at this point I believe the burial sequence was started, and while I suspect the earlier burials cut through 11652 we could not prove it […]." Although I2 Belief and I6 Belief Value (CRMinf) have some useful expressivity here, we would argue that the nuances involved in expressing the archaeological interpretative process at the very moment of excavation, perhaps the most crucial moment for archaeological knowledge production, are not possible to capture through CRM. This observation becomes particularly clear in another sentence from the diary entry: "As usual the frustration of not knowing has subsided as the archaeology again explains itself as we take things apart piece by piece," where it is only possible to express "as," "we," and "take things apart" in the entire sentence (via P114 is equal in time to, E21 Person, and E7 Activity, respectively, all from CRM Core).
Another critical tension we came across in the annotation process was CRM's inability to express spatio-temporal relationality. This type of relationality involves an understanding of spatio-temporal entities strictly in their contexts where their identity "depends upon everything else going on around [them]" (Harvey, 2006, p. 124). As such, the concept of relational space-time is rather different from the conceptualization of space-time as absolute (one of the foundational pillars of modern Western science), which understands space as fixed, preexisting, immutable, and "amenable to standardized measurement and open to calculation" (Harvey, 2006, p. 121).
Interestingly, CRM documentation of the class E53 Place includes explicit acknowledgement and accurate observation regarding the relevance of relative space: "relative references are often more relevant in the context of cultural documentation and tend to be more precise" (Bekiari et al., 2021, p. 32). An example provided in the documentation then is "the place referred to in the phrase: 'Fish collected at three miles north of the confluence of the Arve and the Rhone'" (Bekiari et al., 2021, p. 32). Although "relative space" described by CRM can be considered a form of spatial relationality, it also has to be noted that this example describes a fairly precise spatial relationship between two places and actually manages to communicate the absolute location of the fish collection place in question without any ambiguity. Importantly, CRM also affords expressions of less absolute relationality between spatial and temporal entities through several properties such as P121 overlaps with and P114 is equal in time to (CRM Core). However, the range of relations these properties cover is limited. That is, as encountered in annotated excerpts, expressions of spatio-temporal relationality between things ("close," "below," and "upper") and spatio-temporal descriptions and references that are ambiguous in terms of absolute coordinates and measurements ("the rest of the building," "over time," "was") cannot be expressed by the ontology. Moreover, in the context of annotations of interpretative literature (specifically the mobile app-based experience for nonspecialist visiting audiences to Çatalhöyük, where storytelling sits at the core of the text), it was not possible to express relational qualities of space-time based on personal experiences such as "my beautiful home" or "the good times."
It can therefore be argued that CRM in its current form is most compatible with a conceptualization of space as an absolute phenomenon, being able to express references and descriptions regarding absolute space more comprehensively.
It was especially in the case of annotating the mobile application text that the difficulty in annotating some of the archaeological interpretations with CRM became clear. In order to resonate with wider public audiences, the storyline was written as a semifictional narrative linking the archaeological data to hypothetical former residents of the site of Çatalhöyük. What the archaeological interpretations that cannot be annotated with CRM have in common is that they are not based entirely on empirical evidence. Accordingly, the veracity of these textual statements about the past cannot be determined. Consider, for instance, expressions about the affective atmosphere in Building 52 by one of the fictional occupants described in the application. More specifically, consider the emotion-laden sensual human experiences described in these excerpts, such as remembering "the scent of baked foods coming from the oven." The main difficulty here is annotating entities such as "the scent" and "baked foods," which do not qualify as instances of the high-level S15 Observable Entity class (CRMsci), as they cannot be observed (at present). This is unlike "the oven," which does qualify as an instance of S15 Observable Entity and, more precisely, of B2 Morphological Building Section (CRMba), as it is still observable on site. Another example is the reference in the storyline to the big bull skull with horns as "scary," which is an affective quality that cannot be observed (at present). This rift between observable and unobservable entities caused by CRM in our case study is of course a familiar one. It is essentially a rift between positivist and nonpositivist epistemologies, the first being strictly based on observable/empirical knowledge.
The tension has a long, well-known history in archaeological theory and practice, embodied by discussions around a range of false and unhelpful dichotomies (e.g., observational knowledge and archaeological interpretations, objectivity and subjectivity, and processual and postprocessual archaeologies).
There were also a range of entities represented in the Çatalhöyük texts that resisted annotation because "what they were" (the primary question CRM and other metadata ontologies are created to answer) was decidedly uncertain. Questions such as "what is living?," "what is becoming?," "what is home?," or "what is presence?" are, in fact, of philosophical interest and thus notoriously escape the grip of readymade answers. Notwithstanding the frequency with which we encountered this particular issue during the annotation exercise, which was not high, the tension between the inexhaustible nature of the "what is" questions and CRM's inclination to provide readymade answers for them was unmistakable. The resistance of these few words (living, becoming, presence) to annotation acted as a clear reminder that in today's archaeological practice, metadata ontologies are designed and function to facilitate representations of archaeological information, knowledge, and wisdom for the purposes of speaking for reality rather than proliferating questions about reality (Joronen & Häkli, 2017). Metadata ontologies serve to pin down meanings and identities once and for all without managing to take into account how those meanings and identities may get performed locally.
We also observed in the annotation process that there were words or phrases that had shifting meanings, resulting in them being annotated with different CRM classes and properties in different contexts. CRM was able to express these entities in their alternative meanings and identities in our case study (unlike the examples of "living," "becoming," and "presence" discussed earlier). Examples here are the entities that regularly match the definition of E53 Place (CRM Core), such as architectural spaces (e.g., Space 118) or rooms and buildings (e.g., Building 56), in sentences such as "This seasons (sic) work concentrated in Buildings 56 and 65" (Regan, 2006, p. 89). However, spaces, rooms, and buildings matched the definition of B1 Built Work (CRMba) and its subclasses in other contexts, as in the case of "excavation of remnants of Building 10 and Building 44" (Regan, 2006, p. 89). These observations acted as a reminder during our case study that ontological ambiguity is not a characteristic reserved for a few philosophical concepts. As nonrepresentational theory, among other philosophies and theoretical approaches, would stress (Thrift, 2007), it is almost always there.
A final major issue that emerged in our case study with the expressive capabilities of CRM is nonsimilarity. In the annotation exercise we observed that CRM functioned through a logic of similarity in the sense that expressing uniqueness, exceptions, differences, and alternatives was difficult, if at all possible. Take for instance the example of "Although there were some differences [,] both buildings displayed [a] similar layout with an arrangement of platforms and benches laid out along the eastern wall…" while discussing Buildings 56 and 65 (Regan, 2006, p. 89). The similarity in this sentence can be expressed through the CRM Core property P130 shows features of (features are also found on) with the additional possibility of providing further detail about the kind of similarity through P130.1 kind of similarity (Bekiari et al., 2021, p. 121). However, a comparable way to express the difference between the two buildings is not possible even at the very general level at which it is presented in this sentence (i.e., stating that there were some differences). Another example is "not every house had bull's horns in them … this was just one more way I knew our house was special" in the mobile application storyline. Here "special" cannot be expressed by CRM. These preliminary observations point to the tendency of structured data and CRM to homogenize data, information, knowledge, and insights through references to similarity, and absence of references to differences/otherness.

Structured Data are "Representations"
As discussed in Section 2, the growing empiricism of the so-called "Scientific Turn" (after Huggett, 2020a; Kristiansen, 2014; Sørensen, 2017) increasingly means that high-quality structured data, especially in the case of aggregated large data sets, are often uncritically taken at face value as primary reliable resources in archaeological research. To be sure, there has been some discussion of the limitations of such large data sets, highlighting both the complexities and the diversity of form and structure of digital data in archaeology (see, e.g., Boyd & Crawford, 2012; Dam, Austin, & Kenny, 2010; Kintigh, 2006; Snow et al., 2006; Van Valkenburgh & Dufton, 2020). This diversity and complexity have deep implications if they are not critically taken into account when interpretations are empirically drawn from large aggregated data sets. Ultimately we have argued, on the basis of the above-presented research, that rather than being associated with "facts" and "truth" (cf. Sørensen, 2017), archaeological structured data are better considered as yet another type of representation.
Like any representation, then, structured data have affordances that permit and inhibit certain sets of archaeological knowledge creation practices. At the point of acquisition, data are often encoded using data structures that reflect both the underlying world view (the ontologies) of the acquisitor and the preexisting scientific (archaeological) traditions within which they are operating (their epistemologies). In the end, digital data structures often reinforce, or perhaps ossify, data acquisition practices, which are themselves already rooted in long project histories and archaeological traditions (and dogmas?) that themselves demand rigid approaches to data management (both in digital and analogue modes). In this sense, databases (and their underlying analogue recording systems) privilege similarity. Queries are defined by classificatory taxonomies, that is, by actively seeking similarities to classify. It is therefore hard to frame differences; even pulling out "different" objects must be done by defining a group of similar things that are not what you are looking for. As such, the ubiquitous "other" category has emerged (created precisely because the data cannot be classified as "typical"), acting as a database field in which to "bury" those data that do not conform to our predefined schema and taxonomies.
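The asymmetry between querying for similarity and querying for difference can be seen in even the smallest relational example. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `finds` table and its categories are hypothetical and invented for illustration, and they do not reproduce the Çatalhöyük database or any real recording system.

```python
import sqlite3

# Hypothetical, in-memory table of finds with a single classificatory field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finds (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO finds (category) VALUES (?)",
    [("pottery",), ("obsidian",), ("pottery",), ("other",), ("other",)],
)

# Similarity is a direct query: name the class and retrieve its members.
similar = conn.execute(
    "SELECT id FROM finds WHERE category = ? ORDER BY id", ("pottery",)
).fetchall()

# Difference has no query of its own: the "different" finds are reached only
# by listing the typical classes and negating them.
typical = ("pottery", "obsidian")
placeholders = ", ".join("?" for _ in typical)
different = conn.execute(
    f"SELECT id, category FROM finds "
    f"WHERE category NOT IN ({placeholders}) ORDER BY id",
    typical,
).fetchall()

print(similar)
print(different)
conn.close()
```

The second query returns the two rows filed under "other," but nothing in the result says how those finds differ from the typical classes or from one another; that information was never given a field.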
Çatalhöyük is no exception to these problems and indeed in some way epitomizes them. Here, the digital infrastructure of the project is effectively a representation of (being initially conceived as an exact simulacrum of) the project's underlying (analog) single context recording system (Berggren et al., 2015;Lukas et al., 2018). This type of recording system is particularly well suited to relational data models by virtue of the way in which it atomizes data pertaining to the metric and descriptive aspects of the stratigraphic sequence at the primary level: data which are easily translated into fields and entities. As a representation then, in this sense it is comparable to the way in which GIS favors the representation of distinctly Cartesian world views. Digital data management systems therefore privilege building upon or working with certain types of analog methodologies that produce data, which are "sympathetic" to an implicit preexisting analog data structure. Ultimately our perceptions and understanding of the site, the project, and the broader context of the archaeology we are trying to abstract (the world around us!) are fundamentally embedded in the archives we produce and in the narrative text that we construct from them. The implication here is that data and information that do not conform to a consensus of the "norm" are hard to situate within any type of structured data.

Structured Data are Not a "Neutral Resource"
This relationship between modes of practice in archaeology and the data/knowledge we produce as archaeologists should come as no surprise, but it is useful to spell it out. If we accept that structured data (large or small in size) are an imperfect, theory-laden mode of representation, like the narratives which they underpin, then it is useful to critique them within the context of the emergence of postrepresentational approaches to science (e.g., science and technology studies: Latour, 1987; Latour & Woolgar, 1986; or nonrepresentational theory: Thrift, 2007). Recent theoretical trends within archaeology related to the "material turn" and calls for a "posthuman" or "symmetrical archaeology" (see, e.g., Haraway, 2006; Olsen, 2010; Witmore, 2007) seek to decenter our anthropocentric understanding of the past, flatten out ontologies, and shift our focus to the relationships between "people" and "things." Crucially, all these approaches recognize that our positionality and relationships with the archaeological resource as we interact with it, record it, and interpret it form part of a more complex whole and directly affect our understandings of the past. If we consider archaeological databases that hold structured data not as containers of knowledge but rather as artifacts of previous archaeological knowledge making episodes (Verran & Christie, 2014), then in this sense, digital technologies should be seen as a "thing" or "agent" (or "actant" after Latour) in their own right. Structured data sets, therefore, embody their own set of socio-technical relationships between the "archaeologist"/"archaeological material culture"/"people in the past," ultimately privileging our particular understanding/interpretation of those people and their material culture. These data sets cannot be neutral or objective; ultimately any attempt to frame them as such is a myth.
The case study presented here will resemble applications that involve enriching texts with metadata, including their marking up with standard ontologies, or uncontrolled vocabularies and semantic relations. Particularly noteworthy in this context are the significant developments in the context of NLP and, more specifically, named entity recognition (NER) within the last decade or so, which aim to automatically index textual documents and retrieve entities, documents, and information. Within the discipline of archaeology, these machine learning approaches have demonstrated the main ways in which metadata standards can be used to great effect to make archaeological textual information more findable within large digital archaeological infrastructures and more generally on other web services (e.g., Brandsen & Lippok, 2021;Brandsen, Verberne, Wansleeben, & Lambers, 2020;Felicetti, 2017;Richards, Tudhope, & Vlachidis, 2015;Vlachidis, Binding, Tudhope, & May, 2010;Vlachidis et al., 2017). However, despite their promises, NER algorithms remain subject to the same limitations as the preset metadata categories they rely on to retrieve required information from archaeological texts (see Huggett, 2021, p. 422). After all, they employ these metadata sets for the annotation of entities within training data sets by experts. Once NER models are successfully trained, they can accurately identify entities originally annotated in the training data sets while they manage to bypass certain issues such as synonyms and polysemy (see Brandsen & Lippok, 2021). This means that NER technologies manage to automate and, hence, speed up text mark-up that we manually carried out in this study. However, the limitations as to what can be marked up or annotated are not about the speed of that application but rather about the initial annotation of entities in training data sets. 
As such, the key to more inclusivity here cannot be NER, or more advanced technologies, but rather the possibilities provided by the metadata (e.g., vocabularies, semantic relations, standard ontologies) used in the text mark-up process.

Structured Data May Not Be a "Democratizing Trend"
Furthermore, there is a danger that uncritical overreliance on structured data for larger disciplinary synthesis may in fact be reinforcing the structural fault lines inherent in our underlying disciplinary ontologies and epistemologies; in other words, these data may be inadvertently promoting or reinforcing forms of social injustice, colonialism, and neoliberalism. Given that intrinsic to the intent of CRM is a desire to ensure accessible and dynamic uses of cultural heritage data, the fact that the ontology itself manages to strip away much of the context that provides the means by which wider audiences might derive relevance and meaning from the data should be a cause for concern. The predicament evidenced through our annotation experiments is even more pressing because CRM's strength is arguably precisely that its event-centric nature aims to foreground context and hence offers more possibilities for representing complex relationships between entities (after Canning, 2019).
The expressive failures witnessed in our mapping exercise are, in fact, widely recognized in relation to structured data and their infrastructures (including ontologies), with many noting that they are mostly devoid of affective, sensuous, agential, or embodied experience (Krmpotich & Somerville, 2016;Labrador & Chilton, 2010;Reid & Sieber, 2020). Most importantly, a growing interdisciplinary community of practitioners appreciates that such failures are directly linked to systemic bias, social inequity, and racial injustice (e.g., Sanderson & Clemens, 2020;Turner, 2020). The possibilities to proliferate digital data-based representations to make space for ethical and political interventions are obliterated by the structuring of the data with ontologies that unapologetically overlook such experiences. Efforts to rectify these biases range from archival redescription (Sutherland & Purcell, 2021), to ethical revisions to metadata standards (Farnel, 2018), to felt-experience extensions to extant conceptual models (Canning, 2019), to the development of alternative "fluid ontologies" (Srinivasan, 2018). This imperative for fundamental change to our data infrastructures is acknowledged by many, from data curators (e.g., Pringle, 2020;Yale University Special Collections, 2021) to those overseeing whole conceptual models (e.g., CIDOC CRM, 2021b).
Archaeology is perhaps well placed to address many of these issues because of the rapid pace of innovation in our digital data capture technologies. This includes the continuing rapid evolution of mature technologies, such as remote sensing and survey technologies, GIS and database management systems (DBMS), and newer and emergent technologies such as 3D technologies and virtual/augmented realities, all of which are increasingly affordable, usable, and integratable as hardware technologies catch up with software capabilities. Yet we would argue that the inclusion of these technologies in our methodologies is bound by data workflows that still fail to capture important descriptive information, emotion, personal values, and nuanced understanding from a variety of viewpoints, thereby contributing further to the predicament outlined above. Even as community-driven or participatory practices grow in popularity, the fundamental redesign of our workflows, methods, and data to embed communities and community values at their core is still lacking (Dolcetti, Boardman, Opitz, & Perry, 2021).

Conclusion
In this article, through a process of manual annotation and mapping of short excerpts from a variety of archaeological texts to the CIDOC Conceptual Reference Model, we have sought to examine and compare the representational affordances and resistances of texts (which we see as just one type of unstructured data) and structured data. The case study is considerably limited in scope, especially in terms of its focus on a single site (Çatalhöyük), its small number of annotated words, and its single annotator (P. Hacıgüzeller). As such, the results we present should be considered preliminary and as a call for more exhaustive research on the topic.
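The logic of such a mapping exercise, and of attending to what falls outside the model, can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions and is not the annotation procedure used in the case study: the excerpt, the phrase boundaries, the annotator labels, and the function name `split_mapped_and_leftover` are all hypothetical, and the four CIDOC CRM class names are a small hand-picked subset that should be checked against the version of the specification in use.

```python
# Illustrative sketch only. The phrases and labels below are invented for this
# example and are NOT drawn from the Çatalhöyük texts or the authors' annotations.

# A small, hand-picked subset of CIDOC CRM classes (assumed identifiers; verify
# against the CRM specification version you are working with).
CRM_CLASSES = {
    "actor": "E39 Actor",
    "activity": "E7 Activity",
    "physical_feature": "E26 Physical Feature",
    "place": "E53 Place",
}

# Hypothetical annotated excerpt: (phrase, annotator label).
# A label of None means the annotator found no suitable class for the phrase.
annotations = [
    ("the excavators", "actor"),
    ("removed", "activity"),
    ("the platform", "physical_feature"),
    ("in the building", "place"),
    ("which felt strangely intimate", None),
    ("perhaps", None),
]


def split_mapped_and_leftover(annotated_phrases, classes):
    """Separate phrases that map to a class from those that resist mapping."""
    mapped, leftover = [], []
    for phrase, label in annotated_phrases:
        if label in classes:
            mapped.append((phrase, classes[label]))
        else:
            # The "left over" material: what the formal model cannot hold.
            leftover.append(phrase)
    return mapped, leftover


mapped, leftover = split_mapped_and_leftover(annotations, CRM_CLASSES)

# Share of words in the excerpt that found no home in the model.
total_words = sum(len(phrase.split()) for phrase, _ in annotations)
leftover_words = sum(len(phrase.split()) for phrase in leftover)
leftover_share = leftover_words / total_words

print("Mapped:", mapped)
print("Left over:", leftover)
print(f"Share of words left over: {leftover_share:.0%}")
```

In this toy excerpt, the phrases that resist mapping are precisely those carrying felt experience and hedging, which is the kind of residue the exercise is designed to surface. A real study would of course require a documented annotation protocol and more than one annotator.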
Our intent here is not to discredit structured data or otherwise suggest that they lack value, utility, or potential. Nor do we seek to discredit CIDOC CRM or metadata standards in general. Rather, we apply this methodology of mapping archaeological knowledge to CIDOC CRM in an effort to surface gaps and partialities in the data captured by the model, and thereby to catalyze discussion of both the implications of such omissions and the means to navigate or rectify them in the future. In this way, we follow Huggett's (2020a, p. 13) suggestion of conducting "inverse modeling," in which "tacit elements in terms of what is 'left over' from or does not fit within the formal model" are exposed. From there, the groundwork is laid to truly begin to understand the complexity of the archaeological knowledge-making process and its distinctiveness from the practices of other disciplines, for instance, laboratory sciences (Huggett, 2020a, citing the work of Leighton, 2015). As Goodwin and Urbaneja's (n.d.) overview of the application of CRM to their Worlding Public Cultures Project hints, much of the critique of structured data ultimately collapses into deliberations about terms and labels and repeated exposés of bias in the tools used to make and manage these data. Meanwhile, the database and its structures persist.
Fortunately, as noted above, a number of initiatives have begun to emerge to grapple with this predicament, focused on directly intervening with (rebuilding or extending) our conceptual models and data standards, our communities of practice, our technologies, and their design. Accordingly, and as a crucial next step, we can actively participate in and follow along with such work. The CIDOC CRM community, for instance, recently initiated "Issue 530: Bias in data structure" wherein first attempts have been made to define a bias awareness statement for CIDOC CRM (CIDOC CRM, 2021a,b), as well as to bring together the existing discourse about bias, consider how non-Euro-American epistemologies are accommodated (or not) in the CRM, and constitute a working group (Goodwin & Urbaneja, n.d.). The remit of the latter is to drive forward genuine change via a series of steps, including discussing key questions of bias (e.g., "Which forms of bias in data structures can interfere with cultural points of view, and what empirical or theoretical means [do] we have to detect them?"). The group also seeks to publish a bias statement for the CRM specification document, create criteria for examining classes and properties, and put forth new issues to the community in order to ameliorate the CRM (see CIDOC CRM, 2021b).
More broadly, and in line with the literature cited above, we can commit to deploying codesign and design justice methodologies (e.g., Costanza-Chock, 2020; see Dolcetti et al., 2021 for a discussion of potential applications to heritage and archaeology) in all future efforts to elaborate (or define new) data structures in archaeology, as well as in conceiving of any new representational output. Here we would be working to realize what Reid and Sieber (2020, p. 229) call a "socio-semantic web," drawing in the learnings and innovations around fluid ontologies reported by Srinivasan (2018), in order to nurture "a semantics dependent on the human subject and on ontologies updated through cooperation and debate, in contrast to an ontology defined by fixed standards." Simultaneously, we can heed Reid and Sieber's (2020, p. 227) proposition to attend to the tools we might enroll in supporting such work (namely, ontology development software), building better technological options that might thus "…allow for more elaborate ways of deploying relationships…." Alongside these actions, we must continue to recognize and tackle the consequences of the fact that structured data are "representations," and as such they are inherently laden with ethical and socio-political meaning. To truly understand the implications here, we should take seriously Huggett's (2012), now decade-old, call to conduct ethnographies of our "e-ontologies." From a science and technology studies-inspired perspective, he argues for the ethnographic study of practitioners who are building the ontologies we use in archaeology in order to understand how these ontologies come into being and how the "decisions, policies and strategies" that informed the process then manifest in different ways through application. To date, such ethnographic e-ontology research is very rare in archaeology; Huggett, for example, cites just one previous example: Khazraee and Khoo (2011). However, outside of the discipline it has been pursued by sociologists, geographers and others, demonstrating its utility not only in creating more "nuanced metadata" (Schuurman, 2008), therein improving the standards themselves, but also in understanding how we "internalise," interpret, and manage data structures (Burns & Wark, 2020), with profound impacts on the robustness of the resulting knowledge produced from those structures. It is arguably precisely through pursuing such detailed ethnographic analysis that we can inform and facilitate the various movements to engender discussion, innovation, and change outlined above.
Finally, knowing that all representations have certain affordances and constraints, and that we can never create a perfect representation of the past, we should be working to proliferate different types of representation rather than limiting ourselves to a select few such as structured data/databases, articles, books, and so on. Mickel (2020, p. 556) alludes to how those seeking to push back against the limits of standardization in archaeology have sought out representational forms (e.g., experimental online publications) that are more "expansive" and hence potentially more likely to "capture the unusual and unexpected." Lucas (2019, p. 94), synthesizing the work of Katherine Hayles, notes that forms such as database and narrative need not be ranked or set in opposition but rather can be understood as complementary. More generally, one might argue that a wider variety of representations should thus allow more diverse and, indeed, representative audiences to engage with archaeology. Proliferating archaeological representations, arguably, also allows us to proliferate the questions we ask in and of archaeology (see Thrift, 2007).
Ultimately, though, we need to cultivate (more) representations that encourage and reflect emotion and embodiment, equivocation and difference, and partiality and closeness, as well as nurture what Mickel (2020) calls "proximity." Moreover, we must acknowledge that most such representations, if they are ever to be used by others, will inevitably pass through some system that aims to structure them to become findable. It is imperative, then, that we prime these structures to manage intimacy, ambiguity, incommensurability, and interpretative nuance, using the already-tested methodologies highlighted in this study (e.g., ethnographic study to elaborate metadata and to expose spaces in our infrastructures that can accommodate more nuanced extensions; creation of new such extensions with a focus on incorporating felt emotions and difference; design of new ontologies "led by" representative and diverse nonspecialist communities and not by existing heritage practitioners; and redesign of the technologies that support such ontology development, again led by local communities and prioritizing local values). It is only in doing so that we can work to ensure these fundamental forms of archaeological knowledge and reasoning are not systematically left behind.
Acknowledgments: The authors would like to explicitly thank the editorial team for their continued support and patience throughout the writing process. We extend our great thanks to the many colleagues with whom we have collaborated on the Çatalhöyük Research Project over the past 15 years. We also wish to acknowledge our two anonymous reviewers, whose very meaningful feedback we have aimed to fully incorporate in the final article.
Funding information: The authors state that no funding was involved in the preparation of this article.
Author contributions: All authors prepared data sets for use in the research. PH annotated the data and authored Section 4: Summary of Results. PH also authored Section 2: Structured Digital Data (and Texts as Unstructured Data) in Archaeology and Section 3: Mapping Çatalhöyük Texts to CRM with major contributions from JST and minor contributions from SP. JST authored Section 5: Discussion with major contributions from SP and PH. JST and SP authored the Introduction with minor contributions from PH. SP authored the Conclusion. JST prepared the majority of the figures and tables with additions from PH. SP took the lead in the revision process on the basis of the comments of two anonymous reviewers. All authors contributed equally to the final editing. The authors applied the SDC approach for the sequence of authors.