Spatial Analysis of New Testament Textual Emendations Utilizing Confusion Distances

Abstract
Before the interpretation of any text can start, its original wording must be critically established. Conventionally, this is done using qualitative criteria. This article, however, explores the application of spatial analysis to New Testament textual criticism by demonstrating how the Levenshtein edit distance can be adapted to calculate confusion distances for variant readings in New Testament manuscripts, i.e. the possibility that a letter (or combination of letters) is confused with another letter (or combination of letters). The outcomes are then translated to Euclidean space using classical Multi-Dimensional Scaling, which enables visualisation and spatial analysis (in this case not related to geographical space). The article focuses on the data preparation and the algorithm needed to make the data suitable for spatial analysis, thus providing the New Testament textual critic with new analytical tools.


Introduction
The original documents of almost all ancient writings have been lost, and the writings of the New Testament are no exception. Therefore, before interpreting a New Testament text, a researcher must first establish its original wording by critically evaluating the differences between the existing manuscripts. The discipline of textual criticism provides criteria for the systematic evaluation of such differences. Besides passages with attested differences, there are passages where the manuscripts agree, but where the content puzzles the researcher. In these cases, some researchers assume that the text is corrupt and emend it by conjecture. Both the establishment of the original text from differing manuscripts and conjectural emendation are traditionally based on qualitative criteria, which is not to say that the discipline does not use quantitative methods.1 In this paper, we propose a method to estimate the probability of palaeographic confusion in order to explain the origin of conjectural emendations. To this end, we introduce the confusion distance, a quantitative metric which indicates the relative orthographic proximity of alternative readings. This metric is based on the Levenshtein edit distance but is extended in two directions. First, our algorithm accounts for the probability that a particular letter or combination of (adjacent) letters is confused with another; these combinations can be provided by the user as a confusion table. The table used in our experiments (see Table 5 in the Appendix) was derived by the authors using data from Metzger2 and Rutgers,3 and provides a first approximation of the ease with which certain letters or combinations of letters could be confused.
The probability scores are based on the experience of a textual critic in dealing with manuscripts.4 Second, our algorithm evaluates three additional operations (contraction, explosion, and complex substitution) besides the three operations of the original Levenshtein algorithm (substitution, insertion, and deletion). The resulting distances between words are subsequently spatialized, i.e. translated to a two-dimensional, non-geographical space using Multi-Dimensional Scaling. To demonstrate the potential of the confusion distance, we apply spatial analysis to evaluate how probable it is that variant readings are original. To our knowledge, this is the first time that spatial analysis and a quantitative metric have been used to compare the orthographic features of textual variants in New Testament manuscripts.
This article is structured in seven sections. Since spatial analysis is relatively new to the field of New Testament textual criticism and, conversely, textual criticism may be unexplored territory for the spatial scientist, sections 1 and 2 contain some background information and references to important literature. In section 1 we elaborate on the transmission of manuscripts and introduce the reader to the disciplines of textual and conjectural criticism. Section 2 provides criteria for comparing words, evaluates the appropriateness of existing metrics for establishing edit distances, and describes our adaptations of the Levenshtein algorithm to better simulate transcriptional confusion. In sections 3 and 4, we use two case studies to experiment with the application of spatial analysis to the results of our algorithm. We conclude with a discussion of our findings and recommendations for further research in sections 5 and 6.

Scribal errors in the transmission of manuscripts
Before the invention of printing (around 1450 CE in the Western world), documents were multiplied by copyists. In a digital age like ours, the painstaking effort that copying once required is easily overlooked. Metzger and Ehrman illustrate the physical toll of this prolonged labour with a traditional formula appearing at the close of many manuscripts: "Writing bows one's back, thrusts the ribs into one's stomach, and fosters a general debility of the body."5 The manuscript tradition of the New Testament both resembles and differs from the textual traditions of other ancient works. As with other ancient texts, the autographa (the original manuscripts written by the authors) of the New Testament are not available.6 The perishable writing materials had a significant impact on the survival of manuscripts. While moisture was devastating for papyrus, drought was disastrous for wooden writing materials. Only a few places offered the right conditions for the preservation of ancient texts.7 On the other hand, more than 5,000 ancient manuscripts of the Greek New Testament are extant, an unusually large body of textual evidence for an ancient work.8 The first substantive portions of the New Testament text date from the third and fourth centuries CE.9 Although the texts were transmitted from generation to generation with great care, differences between the manuscripts inevitably exist. Over the ages, the writing style, script, and material used for manuscripts evolved.10 The earliest New Testament texts have survived in papyrus codices, but parchment and eventually paper gradually became the common media for copying the texts.
The script also changed from majuscule (which resembles our capital letters) to minuscule (comparable to modern lowercase italic letters). Majuscule manuscripts were usually written in scriptio continua: spaces between words and punctuation are scarce, and words are often split across lines without hyphens. Minuscule script, in contrast, contained spaces between words. An impression of the different scripts can be gained from Figure 1.11 Nowadays, Greek New Testament manuscripts are classified into four categories: papyri, majuscules, minuscules, and lectionaries.12 The classification system is based on three criteria: writing material, type of script, and content. The timeline in Figure 2 summarises the history of textual transmission.
Figure 2. Different types of manuscripts and dates of occurrence.13

Textual criticism
Mistakes in the transmission of texts were likely to occur while reading (or hearing), remembering, and writing out the contents of the manuscript being copied, and were easily made through poor eyesight, letter confusion, sloppy handwriting, misinterpretation of abbreviations, attrition, lack of attention, or simple stupidity. As a result, variant readings arose that contain differences in punctuation and spelling, but also alterations of words or the omission of complete verses or paragraphs.14 In addition to these unintentional errors, copyists sometimes altered the text deliberately, perhaps motivated by their own understanding or by dogmatic convictions.15 Given these variant readings and the lack of autographs (originals), the aim of textual criticism was traditionally understood as the reconstruction of the original text from the available manuscripts.16 However, this definition has been increasingly criticized for the ambiguity of its terminology.17 For our discussion, we adopt the goal of the Editio Critica Maior (ECM): textual criticism aims to establish the "initial text" or Ausgangstext of a document. This Ausgangstext (hereafter, Aus) must be distinguished from the "original text" or Urtext.18 Very early in the process of copying the Urtext, original readings might have been lost without leaving a trace in the surviving manuscripts.19 On the other hand, Aus must also be distinguished from the "established text" in our critical editions, for the simple reason that some readings cannot be attributed to Aus with sufficient certainty. In such cases, the only reasonable course for the editor is to postpone the decision and to inform the reader about the difficulties in establishing Aus.
For the following discussion of conjectural emendations, it is important to note that scribal changes are presumed both between the Urtext and the Ausgangstext and between attested readings and the Ausgangstext.20 To establish Aus, generally agreed principles are applied that distinguish between intrinsic probabilities (what would an author have written?) and transcriptional probabilities (how would a scribe have transcribed it?) in the transmission of the text. This is accomplished by asking whether any of the readings may be the result of "scribal slips, errors, or alterations in the copying process [… or …] scribal tendencies to smooth over or resolve difficulties rather than create them, to harmonize passages, and to add rather than omit material … the variant most likely to be original is the one that best accounts for, in terms of both external and internal considerations, the origin of the others."21
13 Loader and Wischmeyer, "Twentieth Century Interpretation"; Parker, An Introduction to the New Testament Manuscripts and Their Texts.
14 Holmes, "Reconstructing the Text of the New Testament."
15 Cf. Metzger and Ehrman, The Text of the New Testament, 259-271.
16 Holmes, "Reconstructing the Text of the New Testament."
17 Cf. Wasserman and Gurry, A New Approach to Text Criticism, 11. For an overview of the debate, see Holmes, "From 'Original Text' to 'Initial Text.'"
18 Aland, "New Testament Textual Research, Its Methods and Its Goals," 16-17.
19 Cf. "Between the autograph and the initial text considerable changes may have taken place for which there may not be a single trace in the surviving textual tradition. Even if this should not be the case, differences between the original and the initial text must be taken into account." Aland, "New Testament Textual Research, Its Methods and Its Goals," 17.
20 So far, the ECM has adopted conjectures at 2 Pet 3:10 (cj11713) and Acts 13:23 (cj10092).
21 Holmes, "Reconstructing the Text of the New Testament," 180.
Traditionally, the discipline has been concerned with existing variant readings, which are known from manuscripts, glosses, and lectionaries; however, the discipline has broadened its scope to gain insight into the transmission history of texts and, hence, into the convictions and guiding principles of the transmitting communities.

Conjectural criticism
Deciding between competing attested readings is not always enough. Scholars sometimes face difficulties in the text, such as logical contradictions and inconsistencies, and "cannot assert that the original form of the text has for certain survived at every point somewhere or other among our witnesses."22 According to Metzger and Ehrman, therefore, the "only remaining resource is to conjecture what the original reading must have been."23 These so-called conjectural emendations (speculative alterations of the text for which no manuscript evidence exists) have also become objects of scrutiny for the textual critic.24

John the Baptist's food as an example
The practice of conjectural emendation can be illustrated by Matt 3:4 and its parallel, Mark 1:6. In these passages, John the Baptist is introduced into the narrative. John wears a garment of camel's hair and is girded with a leather belt. According to the textual evidence, John ate locusts and wild honey (ἀκρίδες καὶ μέλι). Although there is no reason to doubt this reading, which is uniformly attested by the manuscripts, the passage shows nicely how conjectures originate and is therefore suitable to illustrate the study of conjectures as historical phenomena. In such a study, the researcher is not so much concerned with emending the text with the most suitable conjecture, but rather with reconstructing the reasoning which led to the conjectures proposed for a particular locus.
Any conjecture starts with an observation about the text, in which a critic is guided by some preunderstanding that leads to the detection of an oddity. In our example, the substance of John's food has puzzled some critics: how could someone possibly eat insects? Others presumed that John must have been a vegan and therefore objected to the reading "locusts." After detecting the textual problem, the critic needs to suggest an alternative that (1) fits the grammatical function of the disputed reading, (2) makes sense in the internal logic of the text, and (3) solves the assumed difficulties. In John the Baptist's case, critics have suggested emendations including cake (ἐγκρίδες),25 coconuts (καρίδες), sea crabs/shrimps (γαρίδες), wild pears (ἀχράδες), crops (ἀκρεμώνες), or root and fruit (ῥίζας καὶ καρπόν).26 Here we observe that speculation cannot be boundless: (a) the proposed alternative must have the same grammatical function in the text and should therefore be a noun. (b) However, not every Greek noun is suitable, since the internal logic of the text demands something that can be eaten. (c) Likewise, not everything that can be eaten is suitable, since it must fit the context of the time. Having John eat a Big Mac would be anachronistic (and ridiculous). (d) Furthermore, not all food available in John's time fits the geographical context of the narrative. It is, for instance, hard to conceive how John, living in the desert, would have been able to catch shrimps. To summarize, the credibility of a conjecture is constrained by grammar, semantics, and its historical, cultural, and geographical suitability.
Finally, the critic must also explain how the attested reading or readings could have originated from the proposed conjecture. Usually, a very early corruption during the transcription process is assumed, which could have been caused by palaeographic or phonetic confusion of letters.
In the example of John the Baptist's food, it is not hard to understand how ακριδεσ27 (locusts) could easily be confused with καριδεσ (coconuts): such a confusion only requires the transposition of the letters α and κ. In the case of γαριδεσ (sea crabs), two confusions might have occurred: first the substitution of the letters γ and κ, and second the transposition of the letters α and κ. This second example is a bit more complex, but the combination of a phonetic and a palaeographic confusion is still conceivable. The other alternatives are less easily explained by palaeographic confusion.
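The operation counts in the paragraph above can be checked mechanically. The minimal sketch below uses the optimal string alignment (OSA) distance, a standard variant of the Levenshtein distance that also allows the transposition of two adjacent letters, with every operation costing 1. It is a stand-in chosen for illustration, not the confusion distance developed later in this article: it counts operations but ignores how easily letters are confused. The function name `osa_distance` is our own, and words are written in lower case with σ for every sigma, following the transcription used in the text.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance with unit costs.

    Allowed operations: insertion, deletion, substitution, and transposition
    of two adjacent letters. A transposed pair may not be edited again.
    """
    m, n = len(a), len(b)
    # d[i][j] = distance between the first i letters of a and the first j of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (free if the letters match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[m][n]


print(osa_distance("ακριδεσ", "καριδεσ"))  # 1: transpose α and κ
print(osa_distance("ακριδεσ", "γαριδεσ"))  # 2: two operations are needed
```

The count of 2 for γαριδεσ agrees with the text, but OSA reaches it through two substitutions (α with γ, κ with α), because it does not allow a transposed pair to be substituted afterwards; the route therefore differs from the one described above.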

Amsterdam Database of New Testament Conjectural Emendation
An important tool for the critical study of conjectures is the Amsterdam Database of New Testament Conjectural Emendation (ADNTCE).28 This database contains approximately 6,500 conjectures on the New Testament text, collected from theological literature such as commentaries. It also includes data on the discussion of particular emendations. Unfortunately, the data are so far presented only in tabular form (see Figure 3), which restricts analysis to individual conjectures and makes it difficult to analyse the filiation of conjectures.

Summary
An enormous number of New Testament manuscripts are available, but because of the differences between them, the lack of the originals, and additional speculation, textual criticism aims (1) to reconstruct the initial text and (2) to study the history of textual transmission in order to gain insight into the convictions of the transmitting communities. Today, neither aim is limited to existing manuscript evidence (variant readings); both also encompass speculation (conjectural emendations). This material is used in the following analyses.
In the previous paragraphs, we discussed how textual critics deal with transcriptional and internal difficulties in reconstructing the original text, and what insights are gained from the history of textual transmission. One of these insights is that not every suggestion is equally probable. Some alternatives are closely related (i.e., in close proximity), while others are more distant (i.e., unlikely). As we have seen, textual criticism tries to establish how one reading could have originated from another using qualitative evaluation criteria. Palaeographic confusion is a feature of textual transmission that often explains the origin of different readings.
27 In the remainder of this article we use Greek majuscule script. In the earliest period of textual transmission this was the commonly used type of script; it therefore best simulates the palaeographic appearance of the earliest texts and provides insight into the probability of confusion of typical letter combinations.
28 Krans and Lietaert Peerbolte, "The Amsterdam Database of New Testament Conjectural Emendation."

String matching and edit distances
Algorithms for string matching which have been developed within the field of computer science might be helpful for approaching textual variation from a different angle.29 These algorithms calculate edit distances to quantify the relationship(s) of strings. In this section, we first establish criteria for assessing the applicability of algorithms. Next, we explore existing algorithms and evaluate their applicability to textual criticism. Finally, we propose our own algorithm, which basically is an extension of an existing algorithm.

Evaluation criteria
An algorithm should simulate the process of textual corruption in the case of transcriptional confusion and should be based on the palaeographic appearance of characters. The algorithm must therefore at least account for (1) the comparison of strings of different length, since a conjecture is not always as long as the reading found in the manuscripts; (2) a minimal set of operations to change one string into another, i.e., insertion, deletion, substitution, and transposition of characters; (3) the dissimilarity of words rather than their resemblance, i.e., we are interested in the ways in which strings differ; (4) reciprocity, i.e., the distance calculated from the operations that change string a into string b should equal the distance calculated for changing string b into string a; and (5) the probability of confusion of characters. The underlying assumption is that the more similar two characters are, the more likely they are to be confused.
In a handwritten English text, it is easy to confuse a lowercase L (l) with a capital i (I) or even with the number 1. Likewise, in a text written in majuscule script, an Α is more likely to be confused with a Δ than with an Ε. Specific combinations of characters are also liable to be confused. For example, when Γ and Ι appear as adjacent characters (ΓΙ) within a word, it is not difficult to see how they could be confused with Π.
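One way to store such letter and letter-combination confusions is a lookup table keyed by unordered pairs, which also enforces criterion (4), reciprocity, by construction. The minimal Python sketch below assumes that a lower number means an easier confusion; the cost values and the names `CONFUSION_COSTS` and `confusion_cost` are hypothetical illustrations, not the values or code behind the confusion table in the Appendix.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical costs (lower = more easily confused); NOT the values of Table 5.
CONFUSION_COSTS: Dict[FrozenSet[str], float] = {
    frozenset(("Α", "Δ")): 0.3,   # easier confusion, cf. the example in the text
    frozenset(("Α", "Ε")): 0.8,   # harder confusion
    frozenset(("ΓΙ", "Π")): 0.4,  # two adjacent letters read as one
}


def confusion_cost(x: str, y: str) -> Optional[float]:
    """Cost of reading x as y, or None if the pair is not in the table.

    Because keys are unordered pairs, confusion_cost(x, y) == confusion_cost(y, x).
    """
    if x == y:
        return 0.0
    return CONFUSION_COSTS.get(frozenset((x, y)))


print(confusion_cost("Δ", "Α"))   # 0.3
print(confusion_cost("Π", "ΓΙ"))  # 0.4
print(confusion_cost("Α", "Ω"))   # None: pair not listed
```

Returning `None` for unlisted pairs leaves it to the distance algorithm to fall back on a default cost.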

Edit operations and existing string matching algorithms
Multiple functions have been developed outside the domain of New Testament studies to measure the (dis)similarity between strings, and these all conform to a basic form: the distance δ(x,y) between two strings x and y is the minimal cost of a sequence of operations that transforms x into y (and ∞ if no such sequence exists). The cost of a sequence of operations is the sum of the costs of the individual operations. The operations are a finite set of rules of the form δ(z,w) = t, where z and w are different strings and t is a non-negative real number. Once an operation has converted a substring z into w, no further operations can be applied to w.30 The most commonly implemented operations in string matching are insertion, deletion, substitution, and transposition (see Table 1), although the number of operations actually implemented differs between functions.
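Written out, the general definition above reads as follows; the worked example applies it to the readings of Matt 3:4 discussed earlier, with unit costs as an illustrative assumption and ε denoting the empty string.

$$\delta(x,y) \;=\; \min\Big\{\, \textstyle\sum_{k=1}^{m} t_k \;:\; x = s_0 \to s_1 \to \cdots \to s_m = y \,\Big\}, \qquad \delta(x,y) = \infty \text{ if no such sequence exists,}$$

where each step $s_{k-1} \to s_k$ rewrites a substring $z_k$ of $s_{k-1}$ as $w_k$ by a rule $\delta(z_k,w_k) = t_k$. With the Levenshtein rules $\delta(c,\varepsilon) = \delta(\varepsilon,c) = \delta(c,c') = 1$ for single characters $c \neq c'$ (deletion, insertion, substitution), turning ακριδεσ into καριδεσ requires two substitutions:

$$\delta(\text{ακριδεσ},\text{καριδεσ}) = \delta(\text{α},\text{κ}) + \delta(\text{κ},\text{α}) = 1 + 1 = 2,$$

whereas adding a transposition rule $\delta(\text{ακ},\text{κα}) = 1$ lowers the distance to 1.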
29 The term string is used in computing to denote a piece of text consisting of letters, numbers, and/or symbols. String matching is the process of establishing the (dis)similarity of strings. An edit distance is a metric (i.e., a unit of measurement) that expresses the (dis)similarity of strings by quantifying the number of operations needed to change string a into string b.
30 Navarro, "A Guided Tour to Approximate String Matching," 37.
According to Navarro,31 four metrics are most prominent in string matching. Although they are commonly used, we discard the Hamming distance,32 the longest common subsequence (LCS),33 and episode matching,34 because these metrics do not support the required operations: Hamming only allows substitution, LCS only allows insertions and deletions, and episode matching only allows insertions. Furthermore, they do not meet our criteria of complexity, dissimilarity, and reciprocity. The Levenshtein distance,35 however, has potential for estimating the probability of palaeographic confusion to explain the origin of conjectural emendations (and likewise, though secondarily, of textual variants). It measures the minimal number of insertions, deletions, and substitutions of one character for another that will transform one string into the other. The distance is also reciprocal and might "be useful in spelling correction, where for example because of the conventional keyboard arrangement it may be far more likely that a character 'A' be mistyped as an 'S' than as a 'Y.'"36 We use the Wagner-Fischer algorithm to compute it, since implementations are available in many programming languages, including Python.37
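For readers unfamiliar with the Wagner-Fischer algorithm, the following is a minimal Python sketch of the standard unit-cost version; it is not the authors' code, and the function name `levenshtein` is our own. It fills the dynamic-programming matrix row by row, keeping only two rows in memory. The example also shows why plain Levenshtein is too coarse for our purposes: lacking a transposition operation, it scores ακριδεσ and καριδεσ at 2, the same as ακριδεσ and γαριδεσ, which the text explains by two separate confusions.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic programme.

    prev[j] holds the distance between the first i-1 letters of a and the
    first j letters of b; curr is the row being filled for the first i letters.
    """
    prev = list(range(len(b) + 1))  # from the empty string: j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # to the empty string: i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("ακριδεσ", "καριδεσ"))    # 2
print(levenshtein("ακριδεσ", "ακρεμωνεσ"))  # 4: strings of different length
print(levenshtein("καριδεσ", "ακριδεσ"))    # 2: the distance is reciprocal
```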

Expansion of the algorithm
To meet our requirements more closely, we have tailored the Levenshtein algorithm (1) by providing a confusion table (see Appendix), which contains character pairs together with an integer indicating the probability of palaeographic confusion; and (2) by adding three more complex operations (contraction, explosion, and complex substitution) to better simulate the origin of scribal errors. We can summarize our adaptation of the Levenshtein algorithm as a function: the confusion distance between two strings a and b (of length |a| and |b|, respectively) is given by
$$\mathrm{confdist}_{a,b}(|a|,|b|),$$
where, as in the usual formulation of the Levenshtein recursion, $\mathrm{confdist}_{a,b}(i,j)$ denotes the confusion distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. To avoid bias, we added two constants: 3 for contractions and explosions and 5 for complex substitutions. These values guarantee that a combination not present in the confusion table will always result in a value higher than those resulting from other, simpler operations. Furthermore, the use of two different constants, 3 and 5, reflects the relative complexity of the operations.
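As an illustration only, the Python sketch below shows one way a Levenshtein-style recursion over prefixes of a and b could combine the six operations named above; it is not the authors' confdist software. Its assumptions are all hypothetical: insertion, deletion, and unlisted single-letter substitutions cost 1; listed pairs take their cost from the table; unlisted contractions and explosions cost 3 and unlisted complex substitutions 5, as stated in the text. The sketch uses fractional costs below 1 for listed pairs so that they undercut an ordinary substitution; the authors' table uses integers on a scale we do not reproduce, and the entries in `TABLE` are invented for the example.

```python
from typing import Dict, FrozenSet

ConfusionTable = Dict[FrozenSet[str], float]

INS_DEL = 1.0           # insertion or deletion of one letter
SUBST_DEFAULT = 1.0     # substitution of an unlisted letter pair
CONTRACT_DEFAULT = 3.0  # unlisted contraction (2 -> 1) or explosion (1 -> 2)
COMPLEX_DEFAULT = 5.0   # unlisted complex substitution (2 -> 2)

# Hypothetical table entries (lower = easier to confuse); not from Table 5.
TABLE: ConfusionTable = {
    frozenset(("Α", "Δ")): 0.3,    # single-letter substitution
    frozenset(("ΓΙ", "Π")): 0.4,   # contraction / explosion
    frozenset(("ΑΚ", "ΚΑ")): 0.5,  # complex substitution (here: metathesis)
}


def _cost(x: str, y: str, table: ConfusionTable, default: float) -> float:
    """Cost of reading x as y; identical strings are free, unlisted pairs get the default."""
    if x == y:
        return 0.0
    return table.get(frozenset((x, y)), default)


def confdist(a: str, b: str, table: ConfusionTable) -> float:
    """Sketch of a confusion distance between strings a and b.

    d[i][j] is the distance between the first i letters of a and the first j
    letters of b. Table keys are unordered, and contraction and explosion
    share one default, so confdist(a, b) == confdist(b, a).
    """
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * INS_DEL
    for j in range(1, n + 1):
        d[0][j] = j * INS_DEL
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            options = [
                d[i - 1][j] + INS_DEL,  # deletion
                d[i][j - 1] + INS_DEL,  # insertion
                d[i - 1][j - 1] + _cost(a[i - 1], b[j - 1], table, SUBST_DEFAULT),
            ]
            if i >= 2:  # contraction: two letters of a read as one letter of b
                options.append(d[i - 2][j - 1]
                               + _cost(a[i - 2:i], b[j - 1], table, CONTRACT_DEFAULT))
            if j >= 2:  # explosion: one letter of a read as two letters of b
                options.append(d[i - 1][j - 2]
                               + _cost(a[i - 1], b[j - 2:j], table, CONTRACT_DEFAULT))
            if i >= 2 and j >= 2:  # complex substitution: two letters for two
                options.append(d[i - 2][j - 2]
                               + _cost(a[i - 2:i], b[j - 2:j], table, COMPLEX_DEFAULT))
            d[i][j] = min(options)
    return d[m][n]


print(confdist("ΓΙ", "Π", TABLE))             # 0.4 (listed contraction)
print(confdist("ΑΚΡΙΔΕΣ", "ΚΑΡΙΔΕΣ", TABLE))  # 0.5 (listed complex substitution)
print(confdist("ΑΚΡΙΔΕΣ", "ΚΑΡΙΔΕΣ", {}))     # 2.0 (empty table: plain Levenshtein)
```

With these defaults an unlisted contraction (3) always costs more than a deletion plus a substitution (2), and an unlisted complex substitution (5) more than two substitutions (2). This is the guarantee described above, so with an empty table the sketch reduces to the ordinary Levenshtein distance.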

Methodology
Until now, researchers have evaluated textual differences and conjectural emendations using well-established qualitative norms, but the central thesis of this paper is that the probability of palaeographic confusion can also be evaluated quantitatively using spatial analysis methods.
Expressions such as "he is a close relative of mine" or "their views were miles apart" illustrate that spatial metaphors are omnipresent in everyday language, where they are used to explain abstract concepts and their relatedness.38 To take advantage of this spatial language, several researchers have developed methods for information visualization and analysis. These methods are grouped under the umbrella term "spatialization," which Yuan defines as the process of transforming "non-geographic data to spatial forms for visual analysis."39 As such, spatialization should be distinguished from the various geocoding techniques that aim to extract geographical references from unstructured text.40 Transforming raw data into a visual form depends on the data's degree of structure and on its size. Data can be structured, semi-structured, or unstructured, and this characteristic determines how much pre-visualisation manipulation is necessary. Furthermore, the size of the raw data determines whether a specific technique is applicable. Self-Organizing Maps (SOM), for instance, are well suited for large text corpora, while Multi-Dimensional Scaling (MDS) best fits small data sets.41 Due to the limited size of the conjectural data, we apply MDS for spatialization.
MDS has previously been applied to visualise unknown geographical data in geographical space, for example by Tobler and Wineburg to estimate the geospatial locations of merchant colonies in Bronze Age Anatolia.42 The technique has also been used by Louwerse et al.43 and Louwerse and Zwaan44 to visualize locations from large text corpora such as newspaper archives. These two studies obtained the locations from the texts using Latent Semantic Analysis. Davies applied MDS to explore the geographic component of large-scale semantic networks contained in text and cognitive geographies.45 Additionally, MDS has been used to visualize non-geographic data in non-geographical space, for instance by Goodchild and Janelle to spatialize the interrelatedness of special interest groups within the American Association of Geographers;46 by Skupin to spatialize articles from the New York Times based solely on their information content;47 and by Old to enable spatial analysis and visualization of co-citation data.48 Although all these studies spatialize the individual entities of interest using MDS, our approach deviates from them in several ways. With regard to the pre-visualisation techniques used to define the mutual distances between the entities, Louwerse et al.,49 Louwerse and Zwaan,50 and Davies51 used Latent Semantic Analysis (LSA); Tobler and Wineburg52 defined them interactively; and Old53 re-used data from previous research without explicitly stating how the distances were obtained. In contrast to these studies, our study proposes the palaeographic confusion distance to establish these distances.
Furthermore, Tobler and Wineburg,54 Louwerse et al.,55 Louwerse and Zwaan,56 and Davies57 aim to establish the geographical location of unknown geographical places, while we approximate the relative locations of conjectures in palaeographic confusion space. We illustrate this space with two case studies: one examines the food of John the Baptist, the other looks at alternatives for the toponym Judea. As such, our study is more closely related to studies that apply MDS to abstract spaces.58 In the remainder of this article, we develop a methodology to measure palaeographic confusion between textual variants and experiment with spatial analysis, thus integrating concepts from textual criticism, computer science, and spatial science.
Starting with a set of conjectural emendations for a particular text, the first step in our approach is to prepare this set for processing by our algorithm. To this end, an array containing all individual variants/conjectures is translated to a table. In addition, we developed an algorithm, implemented in Python, that calculates the confusion distance for each pair of words in the array.59 The result is a distance matrix.
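The step from an array of readings to a distance matrix can be sketched as follows. The hypothetical helper `distance_matrix` accepts any symmetric distance function; because the distance is reciprocal, only the upper triangle is computed and then mirrored. To keep the sketch self-contained, a plain Levenshtein function stands in for the confusion distance, so the numbers it produces are edit counts, not the values of Table 3.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance; a stand-in for the confusion distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def distance_matrix(words: Sequence[str],
                    dist: Callable[[str, str], float]) -> List[List[float]]:
    """Square, symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(words)
    mat = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):  # each unordered pair once
        mat[i][j] = mat[j][i] = dist(words[i], words[j])
    return mat


readings = ["ακριδεσ", "καριδεσ", "γαριδεσ"]
for word, row in zip(readings, distance_matrix(readings, levenshtein)):
    print(word, row)
```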
Next, we translate the data in the distance matrix to Euclidean space using an existing Python implementation of classical MDS. MDS is a visualization technique for analysing the (dis)similarity of data. It attempts to model such data as distances among points in a geometric space. This is useful when one "wants a graphical display of the structure of the data, one that is much easier to understand than an array of numbers." Since MDS seeks the optimal representation of multi-dimensional relationships in a lower-dimensional space with a minimum of distortion, the results only approximate the original distances.
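Classical (Torgerson) MDS has a closed-form solution: square the distances, double-centre them, and take the leading eigenvectors. The sketch below implements that textbook procedure with NumPy; it is not necessarily the implementation the authors used, and the name `classical_mds` is our own. When the distances are not exactly Euclidean, as is to be expected for confusion distances, negative eigenvalues are clipped to zero and the coordinates only approximate the input, as noted above.

```python
import numpy as np


def classical_mds(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: return an n x k array of coordinates."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring          # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(b)               # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]                # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop negative parts
    return eigvecs[:, top] * scale


# Distances of a 3-4-5 right triangle are exactly Euclidean, so MDS recovers
# them (up to rotation, reflection, and translation).
d = np.array([[0.0, 3.0, 4.0],
              [3.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
xy = classical_mds(d)
recovered = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
print(np.allclose(recovered, d))  # True
```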
Our MDS analysis results in a file containing x,y coordinates for each entry in the array. Finally, we analysed the data with proximity tools and visualization techniques. This approach is summarized in Figure 4.
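The hand-off from MDS to the proximity analysis can be sketched with the Python standard library alone: write one row of coordinates per reading, then ask which reading lies closest to a given one. The column layout and the helper names `write_coordinates` and `nearest` are hypothetical, and the coordinates in the usage example are toy values, not results from this study.

```python
import csv
import io
import math
from typing import Dict, TextIO, Tuple

Point = Tuple[float, float]


def write_coordinates(coords: Dict[str, Point], stream: TextIO) -> None:
    """Write a header and one 'reading,x,y' row per reading."""
    writer = csv.writer(stream)
    writer.writerow(["reading", "x", "y"])
    for label, (x, y) in coords.items():
        writer.writerow([label, x, y])


def nearest(coords: Dict[str, Point], target: str) -> str:
    """Label of the reading closest to target in the MDS plane (Euclidean)."""
    tx, ty = coords[target]
    others = (label for label in coords if label != target)
    return min(others, key=lambda label: math.hypot(coords[label][0] - tx,
                                                     coords[label][1] - ty))


toy = {"p": (0.0, 0.0), "q": (1.0, 0.5), "r": (4.0, 3.0)}  # toy coordinates
buffer = io.StringIO()
write_coordinates(toy, buffer)
print(buffer.getvalue())
print(nearest(toy, "p"))  # q
```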

Results
We test our approach with two case studies. The first case study uses the example on the food of John the Baptist, while the second scrutinizes the conjectures on the toponym Judea in Acts 2:9.

Case study 1: the food of John the Baptist
In section 1.2.1 we used the conjectures proposed for the food of John the Baptist as an example. We now apply our approach to this case to demonstrate how the data are prepared for the calculation of a distance matrix, how this matrix is translated to Euclidean space, and how spatial analyses can then be applied. As already mentioned, several conjectures have been suggested as substitutes for the locusts and wild honey (ακριδεσ και μελι) in the diet of John the Baptist: coconuts and wild honey (καριδεσ και μελι), cake and wild honey (εγκριδεσ και μελι), shrimps and wild honey (γαριδεσ και μελι), wild pears and wild honey (αχραδεσ και μελι), crops and wild honey (ακρεμωνεσ και μελι), and root and fruit (ριζασ και καρπον). Feeding this array of conjectures into our algorithm results in a distance matrix, shown in Table 3.
59 The software confdist is implemented as a command line application in the Python programming language and can be run on all three major operating systems. As input it takes a
We can, for instance, perceive which conjecture is closest to ακριδεσ (locusts), i.e. καριδεσ (coconuts); but the figure also suggests a lineage of conjectures. For instance, is it necessary to presume a direct connection between a conjecture and ακριδεσ? On the basis of this figure, we could argue that there might have been a sequence of scribal errors, with accompanying error propagation. As an experiment, we could assume that γαριδεσ (shrimps) was the original, which was first corrupted into καριδεσ (coconuts), which was in turn corrupted into ακριδεσ (locusts). The MDS visualization supports this kind of reasoning, although it remains speculative. This experimental analysis could be taken one step further. From the x,y plot in Figure 5 we gain a general understanding of the clustering and grouping of the conjectures.
However, we can also visualize the specific confusion distances for a particular conjecture, which correspond to a single column in the distance matrix. In this way, we can compare the proximity structure of individual conjectures. We therefore applied the Natural Neighbor tool in ArcGIS 10.5, which interpolates a raster surface from the confusion distances to a particular conjecture, and repeated this for each column (see Figure 6).
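ArcGIS's Natural Neighbor tool is not reproduced here. As a simpler stand-in that shows the idea of turning one column of the distance matrix into a raster surface over the MDS plane, the sketch below uses inverse distance weighting (IDW), which is a different interpolation method; its surfaces would not match Figure 6. The helper names `idw` and `raster`, the power parameter of 2, and the toy points are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def idw(points: Sequence[Tuple[float, float]], values: Sequence[float],
        x: float, y: float, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate of the value at (x, y)."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        r = math.hypot(x - px, y - py)
        if r == 0.0:
            return v  # exactly on a sample point: return its value
        w = r ** -power
        num += w * v
        den += w
    return num / den


def raster(points: Sequence[Tuple[float, float]], values: Sequence[float],
           xs: Sequence[float], ys: Sequence[float]) -> List[List[float]]:
    """Grid of IDW estimates; one row per y value, one column per x value."""
    return [[idw(points, values, x, y) for x in xs] for y in ys]


# Toy example: two readings in the MDS plane, with values taken from one column
# of a distance matrix (the reading itself at 0, the other reading at 4).
pts = [(0.0, 0.0), (2.0, 0.0)]
col = [0.0, 4.0]
print(idw(pts, col, 1.0, 0.0))  # 2.0: equidistant, so the mean
grid = raster(pts, col, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0])
```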
From the results in Figure 6 we can observe the following:
- A palaeographic confusion of ριζασ και καρπον (cj12987, root and fruit) with any of the other conjectures is unlikely. This can be concluded both from the results of the proximity analysis, which clearly differ from those for the other conjectures, and from its distances to all other conjectures. A similar conclusion could be drawn for ακρεμωνεσ (cj13821, crops), but note that most other conjectures are less distant from it than from ριζασ και καρπον. In other words, if we had to choose between ακρεμωνεσ and ριζασ και καρπον, we would deem the former more likely to be the result of palaeographic confusion.
- The results of the proximity analyses for ακριδεσ (NA28, locusts), αχραδεσ (cj11183, wild pears), and καριδεσ (cj11182, coconuts) look most alike in their graphical visualization. From this we can conclude that, in these three cases, the mutual confusion distances to the other conjectures correspond closely. Likewise, the results for γαριδεσ (cj*, shrimps, sea crabs) and εγκριδεσ (cj10147, cake) resemble each other.
In the end, we cannot discard a conjecture based solely on this analysis, since these results need to be interpreted with caution (the results of MDS remain an approximation), and other considerations and arguments such as semantics, grammar, phonetics or even geography might strengthen or weaken the case for a particular conjecture. For instance, although a palaeographic confusion with γαριδεσ might be probable, the suggestion does not fit the geographical setting of the narrative. Nevertheless, this analysis helps to discern grouping and clustering in the data and stimulates reasoning about lineages between the conjectures. It thus provides another perspective on the domain of conjectural criticism.

Case study 2: Judea in the table of nations in Acts 2:9-11
A second example of an intrinsic difficulty in interpreting a New Testament text, which has led to extensive discussion and numerous conjectures, can be found in the list of nations in Acts 2:9-11:60 "Parthians and Medes and Elamites and residents of Mesopotamia, Judea and Cappadocia, Pontus and Asia, Phrygia and Pamphylia, Egypt and the parts of Libya belonging to Cyrene, and visitors from Rome, both Jews and proselytes, Cretans and Arabians-we hear them telling in our own tongues the mighty works of God" (Acts 2:9-11, ESV).
Mapping these locations results in Figure 7. Several scholars have observed three difficulties in this text which led them to question whether Judea belongs in the original list. We will only briefly summarize these issues to provide a basic understanding of the context:61 (1) the reference to Judea, and hence to Jews, in verse 9 seems awkward since the list refers to Jews anyway;62 (2) the reference to Judea does not fit well in the geographical arrangement63 between Mesopotamia in the east and Cappadocia in the north;64 and (3) the Greek word ιουδαιαν (Judea) should be regarded as an adjective rather than a noun and therefore does not fit its grammatical function in the sentence.65 To solve these difficulties, several critics have proposed replacing Judea with an alternative location. To date, at least eighteen66 alternative geographic locations have been suggested: Cilicia, Armenia, Ida (a mountain range on Crete), Iounaia, Ionia, Yaudi,67 Iberia, Bithynia, Adiabene, Aramea, Idumea, Lydia, Gorduaia, Lycia, Galatia, Gallia, India, and Syria.68 These locations are mapped in Figure 8. Since in "ancient and modern times no one conjecture has proved generally acceptable,"69 we will use this case to test our methodology. First, we calculated the palaeographic confusion distances and created a distance matrix for the array of conjectures.70 These results are reflected in Table 4. Next, using classical Multi-Dimensional Scaling, we created Figure 9 from the distance matrix. This representation gives an approximation of the palaeographic distances among the conjectures and the reading found in NA28. Finally, instead of applying the same visualization techniques we used for representing the palaeographic confusion distances for John the Baptist's food (see Figure 5), we took advantage of the geographical character of these conjectures to experiment with multi-criteria evaluation (MCE).
In this experiment, we used the geographical locations and added the palaeographic confusion distance to ιουδαιαν (Judea) as an attribute. Next, we used the Natural Neighbor tool in ArcGIS 10.5 to create a palaeographic confusion raster, i.e. an interpolated continuous surface based on the weighted confusion distances of each toponym to Judea. Finally, we created a visualisation (see Figure 10) in which we displayed the geographical data on top of the palaeographic confusion raster and also added the original geographical arrangement found in Acts 2:9-11 (see Figure 7). This representation can be used to evaluate the probability of the conjectures simultaneously against the criteria of (1) palaeographic confusion and (2) geographical arrangement. Although our method does not provide conclusive results, γορδυαιαν (Gorduaia) and ιουναιαν (Iounaia) provisionally provide the best fit to both the geographical and the palaeographic criteria. Settling the issue (and it is doubtful whether this can be done at all) would require weighing more criteria. For our purposes, however, we have demonstrated the suitability of spatial analysis and multi-criteria evaluation as an approach to evaluating the probability of conjectures in more detail.
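The evaluation against Figure 10 was made visually. A weighted linear combination, one common MCE scheme in GIS, would make the same two criteria explicit. The sketch below assumes, hypothetically, that geographical fit can be measured as the planar distance (in projected coordinates) from a candidate location to the straight segment joining its neighbours in the list, that both criteria are min-max rescaled so that lower distances score higher, and that the weights are chosen by the analyst. The inputs are invented placeholders for three unnamed candidates, not the values of Table 4 or real coordinates.

```python
import numpy as np


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b (projected coordinates)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


def benefit_scores(costs):
    """Min-max rescaling so that the lowest cost scores 1 and the highest 0."""
    c = np.asarray(costs, dtype=float)
    span = c.max() - c.min()
    return np.ones_like(c) if span == 0 else (c.max() - c) / span


def weighted_overlay(criteria, weights):
    """Weighted linear combination of standardized criteria; weights are normalized."""
    w = np.asarray(weights, dtype=float)
    return np.column_stack(criteria) @ (w / w.sum())


if __name__ == "__main__":
    # Hypothetical inputs for three unnamed candidates A, B and C.
    confusion = [2.0, 1.0, 3.0]                     # confusion distance to the NA28 reading
    before, after = (0.0, 0.0), (-4.0, 3.0)         # neighbours of the slot in the list
    sites = [(-2.0, 1.5), (5.0, 5.0), (-2.0, 2.0)]  # candidate locations
    detour = [point_segment_distance(s, before, after) for s in sites]
    scores = weighted_overlay([benefit_scores(confusion), benefit_scores(detour)], [0.5, 0.5])
    for name, score in zip("ABC", scores):
        print(name, round(float(score), 3))
```

Changing the weights makes the trade-off between palaeographic and geographical plausibility explicit, which the visual overlay leaves to the reader's judgement.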

Discussion
As we can see from the results of both case studies, the method proposed in this article provides a new approach to weighing the probability of palaeographic confusion for conjectural emendations. Furthermore, when spatial analyses are applied to these results, patterns and correlations can be made visible that otherwise remain hidden in the data. We have observed this specifically in the results of the first case study on the food of John the Baptist.
It should be noted, however, that although MDS can spatialize relationships between non-spatial phenomena for subsequent visualization and analysis, the resulting configuration is neither an exact nor a unique representation of the data. This is mainly because MDS approximates the "distances" between phenomena in a higher-dimensional space by distances in a lower-dimensional one; moreover, the configuration is only determined up to rotation and reflection, so its axes carry no fixed meaning.
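The approximation can be made concrete with a minimal NumPy sketch of classical (Torgerson) MDS; it is not the authors' implementation. It double-centres the squared distances and keeps the eigenvectors of the k largest eigenvalues. Negative eigenvalues signal distances that cannot be embedded in Euclidean space at all, and the eigenvalue mass outside the k kept dimensions is lost in the map; `retained` is a hypothetical summary of that share, one of several measures in use. The sign of each eigenvector is arbitrary, so another implementation may return a mirrored configuration with identical inter-point distances.

```python
import numpy as np


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS of an n x n distance matrix into k dimensions.

    Returns the coordinates, all eigenvalues (descending) and the share of the
    positive eigenvalue mass retained by the k kept dimensions.
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    centring = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * centring @ (d ** 2) @ centring       # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)            # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    kept = np.clip(eigvals[:k], 0.0, None)          # negative values cannot be embedded
    coords = eigvecs[:, :k] * np.sqrt(kept)
    retained = kept.sum() / np.clip(eigvals, 0.0, None).sum()
    return coords, eigvals, retained


if __name__ == "__main__":
    # Toy matrix violating the triangle inequality (5 > 1 + 1).
    coords, eigvals, retained = classical_mds([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
    print(np.round(eigvals, 3), round(float(retained), 3))
```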

Confusion distances algorithm
Our expansion of the Levenshtein distance with three operations, and its implementation in Python, in which specific distances can be defined for specific letter combinations, has proven to be a valuable tool for gaining insight into the relations between different conjectures. Furthermore, the algorithm can be applied in other domains. In this article we have developed an application for Greek texts, but palaeographic confusion distances can be determined in the same way for other ancient or modern scripts, for example Latin or Hebrew. Moreover, the algorithm is generic in another way: it could be used equally well to calculate the probability of typing errors or phonetic confusion. The only requirement for such an application is that an expert in the relevant discipline designs an appropriate confusion table.
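How a user-supplied table with letter combinations can enter the dynamic programming of the Levenshtein distance is sketched below. This is a hedged illustration, not the confdist code: the table format, the function `confusion_distance` and its costs are hypothetical, and the entry pairing λλ with μ merely illustrates a two-letter combination read as a single letter. Because the script-specific knowledge lives entirely in the table, applying the same function to Latin, Hebrew, keyboard layouts or phonetic confusions would only require a different table.

```python
# Hypothetical confusion table: (letters, letters) -> cost. The costs are
# illustrative only; entries may pair letter combinations of different length.
CONFUSION_TABLE = {
    ("λλ", "μ"): 0.3,
    ("ε", "ο"): 0.4,
    ("ε", "σ"): 0.4,
}


def confusion_distance(a: str, b: str, table: dict, indel: float = 1.0, sub: float = 1.0) -> float:
    """Edit distance in which table entries may substitute letter combinations."""
    # Make the table symmetric, keeping the cheaper cost if both directions occur.
    costs: dict[tuple[str, str], float] = {}
    for (x, y), c in table.items():
        for key in ((x, y), (y, x)):
            costs[key] = min(c, costs.get(key, c))
    longest = max((len(x) for x, _ in costs), default=1)
    inf = float("inf")
    n, m = len(a), len(b)
    dist = [[inf] * (m + 1) for _ in range(n + 1)]
    dist[0][0] = 0.0
    # Every move goes to a larger index pair, so a cell is final when visited.
    for i in range(n + 1):
        for j in range(m + 1):
            here = dist[i][j]
            if i < n:  # delete a[i]
                dist[i + 1][j] = min(dist[i + 1][j], here + indel)
            if j < m:  # insert b[j]
                dist[i][j + 1] = min(dist[i][j + 1], here + indel)
            if i < n and j < m:  # single-letter match or default substitution
                step = 0.0 if a[i] == b[j] else sub
                dist[i + 1][j + 1] = min(dist[i + 1][j + 1], here + step)
            # Table-driven substitution of a[i:i+p] by b[j:j+q].
            for p in range(1, longest + 1):
                for q in range(1, longest + 1):
                    if i + p <= n and j + q <= m:
                        c = costs.get((a[i:i + p], b[j:j + q]))
                        if c is not None:
                            dist[i + p][j + q] = min(dist[i + p][j + q], here + c)
    return dist[n][m]


if __name__ == "__main__":
    print(confusion_distance("αλλα", "αμα", CONFUSION_TABLE))  # table-driven
    print(confusion_distance("αλλα", "αμα", {}))               # plain Levenshtein
```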
Our implementation, however, also has limitations in the way it simulates palaeographic confusion. Palaeographic errors that could occur while copying texts are not fully covered by the six operations we implemented, and the algorithm could be refined by taking haplography,71 dittography,72 compendia,73 and abbreviations (e.g. nomina sacra74) into account as well.75 Besides this fine-tuning of the algorithm, the confusion distance table (see Table 5) could be improved by calculating frequency statistics on the occurrence of character combinations in textual variants.
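The suggested frequency-based refinement could start from aligned pairs of variant readings. The sketch below, under several assumptions, aligns each pair with a unit-cost Levenshtein alignment, counts which single letters were substituted for each other, and maps relative frequency to a cost so that the most frequent confusion becomes cheapest. The helper names, the linear mapping and the `floor` parameter are hypothetical, only single-letter substitutions are counted, only one optimal alignment per pair is used, and the toy pairs in the example are not real variant data.

```python
from collections import Counter


def substituted_pairs(a: str, b: str) -> list[frozenset]:
    """Letter pairs substituted in one optimal unit-cost alignment of a and b."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    # Walk back from the end, preferring diagonal moves, collecting substitutions.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        mismatch = a[i - 1] != b[j - 1]
        if d[i][j] == d[i - 1][j - 1] + mismatch:
            if mismatch:
                pairs.append(frozenset((a[i - 1], b[j - 1])))
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return pairs


def costs_from_variants(variant_pairs, floor=0.1):
    """Map observed substitution frequencies to costs in [floor, 1).

    The most frequent pair costs `floor`; rarer pairs approach 1.0.
    """
    counts = Counter(p for a, b in variant_pairs for p in substituted_pairs(a, b))
    if not counts:
        return {}
    top = max(counts.values())
    return {pair: 1.0 - (1.0 - floor) * k / top for pair, k in counts.items()}


if __name__ == "__main__":
    # Toy pairs only; a real table would need collated manuscript variants.
    toy_pairs = [("εν", "ον"), ("τε", "το"), ("αν", "ιν")]
    for pair, cost in costs_from_variants(toy_pairs).items():
        print(sorted(pair), round(cost, 2))
```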

Spatial analysis
Despite its exploratory nature, the application of spatial analysis and visualisation techniques offers valuable insights into the (im)probability of textual variants based on palaeographic confusion. Based on our analyses, we can trace palaeographic relationships between conjectures and textual variants. In our experiments, spatial visualisation and analysis have proven helpful for literally "looking" at the mutual proximity of the various proposals.
However, we have only scratched the surface of spatial analysis for this application, since our activities were restricted to visualizing proximity relationships between textual variants based on palaeographic confusion distances. As we have argued above, several criteria to distinguish unlikely from likely readings should be taken into account. In future work, we will use the potential of GIS for more sophisticated multi-criteria evaluation (e.g. semantics, grammar, palaeography, phonetics, and even geography) to identify more suitable textual variants. GIS has proven useful for this kind of analysis in other fields, such as land use suitability assessment. Applying this type of analysis, however, requires standardizing and quantifying qualitative data. While not impossible, translating the data to appropriate scales of measurement needs careful consideration.
71 Haplography is the omission of a letter or word due to a similar letter or word in the immediate context.
72 Dittography is the duplication of a letter or word.
73 Compendia or ligatures are monograms created from a combination of two (or more) alphabetic characters.
74 Nomina sacra are a group of words written in special abbreviated forms in Christian sources, e.g. ΘΣ̅ = θεός, ΧΣ̅ = χριστός, and ΚΣ̅ = κύριος.
75 This list is far from comprehensive and also neglects other factors that influenced the copying process. For an introduction to scribal habits, see Royse, "Scribal Tendencies in the Transmission of the Text of the New Testament."

Conclusions
The aim of this article was to calculate confusion distances to enable spatial analysis of New Testament textual emendations. Although our research was limited to palaeographic confusion and only visualised proximity relationships of conjectural emendations, we have demonstrated the applicability of distance metrics to conjectural criticism and the potential of subsequent spatial analysis and visualisation. Our method therefore provides an additional toolset for analysing conjectural emendations and, potentially, extant textual variants. It also reveals insights which would otherwise remain hidden in the data. As such, it can provide additional arguments but does not replace classical text-critical reasoning. In the end, it is up to the scholar to weigh the evidence and to decide how much credence to give the method.
An obvious extension of this work is to expand the algorithm to support other types of scribal errors. Additionally, we propose refining the palaeographic confusion table on the basis of frequency statistics of textual variants, and providing additional confusion tables (e.g. based on phonetics). Furthermore, insights about the semantic proximity and grammatical relatedness of textual variants and conjectures could also be translated into quantifiable measures.
These kinds of refinements and expansions will enable textual critics to engage more fully with research on multi-criteria evaluation using GIS. Not only is a fuller assessment of MCE needed, but also more thorough consideration of how to translate qualitative criteria to quantitative measurement scales. This requires close collaboration between the disciplines of spatial analysis and textual criticism.