EAGLE Continued: IDEA. The International Digital Epigraphy Association

Few disciplines can boast of having digitized almost the entirety of the documents they are interested in, and of having so many scholars active in digitization projects, as Greek and Latin epigraphy can (Orlandi et al., 2014; Orlandi et al., 2017). This paper presents some of the methodological issues faced by the Europeana network for Ancient Greek and Latin Epigraphy, before and after the end of the project, when its activities were moved to the International Digital Epigraphy Association. It gives some examples to demonstrate that the above-mentioned achievement is far from enough to support real use cases. In particular, problems of mapping are presented together with an evaluation of the current quality of the data, and some pointers to the continuing work of the IDEA association on the EAGLE portal and associated resources.


The EAGLE Aggregator
Within the EAGLE project (Europeana Network for Ancient Greek and Latin Epigraphy) a model was established, based on the principles of the TEI/EpiDoc standard (Amato et al., 2013; Manghi et al., 2015), that was able to guarantee easy mapping to the CIDOC-CRM1 and to EDM (Europeana Data Model) for harvesting purposes.2 As a result, this work has made possible not only the development of the EAGLE portal, with its search functionalities across data from several different sources, but has also allowed the [...] Both the portal and the services created by EAGLE are still working after the official end of the European project in March 2016. However, they need to be maintained, constantly updated and possibly improved, as the EAGLE experience has also highlighted the limits and the problems of digital resources.

IDEA
After the end of the EAGLE project, IDEA, the International Digital Epigraphy Association, was founded, with the aim of maintaining the EAGLE resources and continuing on the path of cooperation and integration of resources. It also aimed to cross the boundaries of single projects and move towards the creation of an Epigraphy.info resource (Feraudi-Gruénais & Grieshaber, 2016), based on the model already used by papyrologists.11 IDEA has, as its primary aim, to continue the networking efforts of the EAGLE project and to maintain its outputs, with a very practical approach: keeping the EAGLE portal infrastructure running, together with its functionalities; supporting members who want to contribute; advising new projects on what is and is not available; and keeping an eye on developments in the field and sharing this knowledge, to increase the possibility of more effective and organized work on digital epigraphy.
IDEA currently supports its members and prospective members in a range of activities, from data curation and consultancy on how to set up new digital epigraphic projects, to the upload of new data from existing content providers who continue to update their resources locally.
Activity for the current year included, for example:
- aggregation of data from existing partners still actively updating their resources (occasional and not systematic or planned, due to lack of resources),
- continued collaboration with network members and with active projects (e.g. Pondera project,12 IGCyr and GVCyr,13 Iscrizioni Latine Arcaiche14),
- server migration and maintenance,
- updates to the EAGLE vocabularies.15

Methodological Issues Faced During EAGLE
Mapping and harmonizing the data from the databases, as far as possible, to a minimum set of standardized information was the primary aim of one of the working groups in EAGLE (Liuzzo, 2017). EpiDoc was the obvious choice, but because of the scope and obligations of the project there was no conversion of the existing databases to an XML workflow. Rather, an additional workflow was created to export this format for the purposes of aggregation and to allow a common portal to search across databases.
This process meant that each participating database or project had to contribute an export of its data, produced with its resources or those common to the project, validating to the EAGLE schema. This was a stricter version of the EpiDoc schema from which it was generated, with a minimal set of information required by the common definition.
The efforts here went into making this limited information, packed into a strict schema originally intended for database and aggregation purposes, as rich as possible. This would allow further reuse, demonstrating its usefulness as a large corpus of disambiguated information.
We focused on the alignment and harmonization of the vocabularies used for the descriptions of inscriptions and on the up-conversion of the string text into XML (Liuzzo, Fasolini, & Rocco, 2014).
The first task began with the acquisition of all lists used by the partners. We attributed IDs to each concept and then aligned the terms used, marking the language in which they appeared. We immediately faced decisions, such as that of the "main language". The vocabularies still claim to be in English, although it was expressly declared that the main label for each concept would not be tied to any one of the many languages represented in the network, but would rather give priority to those terms that also had an associated definition. Thus, in the tabular view of the full vocabulary one can see terms in Latin, English and German as the main label (Figure 17.1).
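The label policy described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory structure (the `Concept` class, its field names and the sample terms are invented for illustration and are not the EAGLE vocabulary format): each concept carries labels per language, and the main label is the first one whose language also has a definition.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Concept:
    concept_id: str                      # hypothetical identifier format
    labels: Dict[str, str]               # language code -> term used by a partner
    definitions: Dict[str, str] = field(default_factory=dict)  # language code -> definition


def main_label(concept: Concept) -> Tuple[str, str]:
    """Return (language, term), preferring a term that also has a definition."""
    for lang, term in concept.labels.items():
        if concept.definitions.get(lang):
            return lang, term
    # No definition available: fall back to the first label recorded.
    return next(iter(concept.labels.items()))


# Invented sample data, not taken from the EAGLE vocabularies.
altar = Concept(
    concept_id="concept-1",
    labels={"en": "altar", "de": "Altar", "la": "ara"},
    definitions={"la": "placeholder definition"},
)
stele = Concept(concept_id="concept-2", labels={"de": "Stele", "en": "stele"})

print(main_label(altar))
print(main_label(stele))
```

With a rule of this kind the main labels of a single vocabulary end up in different languages, which is the effect visible in the tabular view.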
The user would have noticed almost nothing on the portal if we had decided to show the content of the element <sic> instead of <corr>. He/she still does not see the error for the first problem, v̅ (iri), which remains untouched, as in the source, by the XSLT rendering the text. This is clarified in the portal to guide the users, but it remains a problem to be resolved by improving the algorithms for the up-conversion, or fixing by hand where needed.
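The rendering choice between <sic> and <corr> can be sketched as follows. This is an illustration in Python rather than the XSLT used by the portal, and the sample text is invented; it only assumes the standard TEI namespace and the usual <choice><sic/><corr/></choice> pattern.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def render(elem, prefer="corr"):
    """Flatten an element to text, dropping <sic> or <corr> depending on `prefer`.

    The unwanted element is skipped wherever it occurs, not only inside <choice>.
    """
    skip = TEI + ("sic" if prefer == "corr" else "corr")
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(render(child, prefer))
        # Text following the child belongs to the parent and is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


# Invented sample, not an actual EAGLE record.
sample = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">Dis '
    "<choice><sic>Manbus</sic><corr>Manibus</corr></choice> sacrum</ab>"
)
ab = ET.fromstring(sample)

print(render(ab))                 # corrected reading
print(render(ab, prefer="sic"))   # reading as on the stone
```

Either reading is available from the same encoding, so the choice is a display decision and does not alter the underlying data.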
However, too many hands would be needed for more than 500,000 inscriptions. Therefore, the first solution and especially the second need to be implemented collaboratively (Feraudi-Gruénais & Grieshaber, 2016).
It was not only data from various types of databases that had to be exported and mapped. EpiDoc data needed to be converted to the EAGLE EpiDoc. Needless to say, it took orders of magnitude less time and effort to do this, and in these cases no text up-conversion was needed and the correctness of the mark-up was guaranteed by the content provider.
Still, there are other problems related to the time available for the transformation, which lead to inconsistency in the data display. One can test this, which is fortunately "only" a visualization problem for correct underlying data, by searching for one of the Roman Inscriptions of Britain, which often have three parallel editions in the EAGLE data.
Given the volume of data and issues that we encountered during the project, several minor issues still remain to be resolved. This will happen in the very near future, once a particular member of the association, with knowledge and access to the data, is available to solve them. If the application and the data were collectively maintained, this would not need to wait so long.
Specificities in encoding are fine, and EpiDoc does a great job of allowing enough rigour and enough flexibility, thus serving perfectly its aim and its diverse users. Still, we must be aware that it is not enough on its own and does not work magic just because it is EpiDoc. We need to do more and better EpiDoc, and to keep training people and making it a mark of research quality for new students and scholars.

Methodological Issues Faced After EAGLE
We tried never to say that the mappings and conversions were perfect.24 They are not, and still they serve a great function and lead the way for much more. Let us list some methodologically critical points in the harmonization process once more:
- some partners did not provide EpiDoc at all and preferred to deliver the data in other formats (their data has never been up-converted),
- some partners sent their data too late and the up-conversion could not be made precise enough,
- some partners do not strictly follow data entry guidelines, so the up-conversion process, which relies on consistency, fails more often than it should,25
- the export and transformation workflow will always need checking and updating, thus it is not sustainable as a workflow,26
- some datasets have TM IDs and some do not,
- some datasets add references to Trismegistos GEO IDs to some elements, others do not,
- the editing of the vocabularies follows a GitHub-based workflow which is efficient but not particularly user-friendly.27
Let us take as an example a task that sounds easy.28 Let us try to extract EAGLE data about the provinces relevant for a specific project like LatinNow.29 Besides temporary issues of the portal, where sometimes the actual XML cannot be downloaded and the function to export results only saves the one in the current view, the results obtained with any general search would have been partial. This is the case with any database, as the user is constrained by the provided functionality. But in this case I could easily, thanks to the IDEA association, access all the EAGLE data to provide a better answer and deliver the required data.30 The data from the content providers uses different definitions of provinces, depending, for example, on the time scope of the original database or on internal definitions.
We must first isolate the data belonging to the selected provinces. In the EAGLE EpiDoc model this information is stored in a TEI element <placeName type="provinceItalicRegion"> which, in the expectations of users, should be indexed by the aggregator to provide a filter "by province" and the functionality to search with this criterion even if different denominations have been used, i.e. avoiding bare string matching. The assumption is that the field is aligned to a Trismegistos GEO ID and that this is used as a key to group different denominations (Evangelisti, Liuzzo, & Verreth, 2014; Verreth, 2017). For example, from the documentation I would expect an inscription from Lusitania to have the following tag:
<placeName ref="http://www.trismegistos.org/place/5531" type="provinceItalicRegion">Lusitania</placeName>
However, working directly with the current raw data in EAGLE, the values of this element are quite different. Out of 412,757 document entities in the dataset with this <placeName> tag (i.e. almost 100k do not have any), only 77,303 have a @ref pointing to the Trismegistos GEO ID.31 There is also some expected 'dirtiness', such as data with <placeName type="provincItalicRegion"> instead of the correct value of the attribute @type. For this reason, no filter by province is offered: the results would be more imprecise than searching for Lusitania in the place of provenance. Indeed, querying the data directly, there are 277 different values for this element and, for example, "Narbonensis" appears in Gallia Narbonensis, Narbonensis, Narbonensis?, Narbonensis II, Narbonensis I. This is an acceptable workaround for the website, as it is intuitive without forcing high expectations. The user knows that the portal is an aggregator of heterogeneous data and will most often use this parameter, together with others, to run not one search but several. Since the volume of aligned entities is not significant compared to the corpus, one must take into account the diversity of values and group by hand the values that probably belong to the same province of interest.
30 The observations made here are based on data downloaded 2017/10/31. Many thanks go especially to Claudio Atzori, Andrea Mannocci and Franco Zoppi at CNR-ISTI in Pisa who have answered my requests faster than one could ever expect.
31 Many more have this for the precise find-spot instead, which allows us to offer in the website the ancient find-spot filter with a decent degree of reliability. Of these 77,303 almost all are records from the Epigraphic Database Heidelberg.
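The kind of direct query on the raw data described above can be sketched as follows. This is a minimal illustration on invented records, not the script used for this paper: it assumes TEI-namespaced XML, counts the distinct values of the province element, counts how many carry a @ref to a Trismegistos place, tolerates the misspelled @type mentioned above, and lists candidate variants of a province name for manual review. The helper names (`wrap`, `province_values`, `candidates`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI = "{http://www.tei-c.org/ns/1.0}"
# The correct @type value plus the misspelling observed in the data.
PROVINCE_TYPES = {"provinceItalicRegion", "provincItalicRegion"}


def province_values(xml_docs):
    """Count distinct province strings and how many elements have a TM GEO @ref."""
    values = Counter()
    with_ref = 0
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for pn in root.iter(TEI + "placeName"):
            if pn.get("type") not in PROVINCE_TYPES:
                continue
            # Normalize whitespace so that line breaks do not create new values.
            values[" ".join("".join(pn.itertext()).split())] += 1
            if "trismegistos.org/place/" in (pn.get("ref") or ""):
                with_ref += 1
    return values, with_ref


def candidates(values, needle):
    """Values containing `needle`; the final grouping still has to be checked by hand."""
    return sorted(v for v in values if needle.lower() in v.lower())


def wrap(inner):
    # Minimal invented wrapper, standing in for a full EpiDoc file.
    return f'<TEI xmlns="http://www.tei-c.org/ns/1.0">{inner}</TEI>'


docs = [
    wrap('<placeName ref="http://www.trismegistos.org/place/5531" '
         'type="provinceItalicRegion">Lusitania</placeName>'),
    wrap('<placeName type="provinceItalicRegion">Gallia Narbonensis</placeName>'),
    wrap('<placeName type="provincItalicRegion">Narbonensis?</placeName>'),
    wrap('<placeName>Somewhere</placeName>'),
]

values, with_ref = province_values(docs)
print(len(values), with_ref)
print(candidates(values, "narbonensis"))
```

Substring matching of this kind only proposes candidates: whether "Narbonensis II" belongs with "Gallia Narbonensis" for a given research question remains a scholarly decision.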
Once we have all document entities referring to one of the values for the province (thus reasonably all the entities referring to the desired province), these need to be grouped by TM ID to obtain unique texts and their multiple editions. This is possible only to the extent that such information is present in the data and is updated and correct, which is not easy to ensure. In the dataset used for this paper, 391,227 document entities out of a total of 502,961 have at least one TM ID. The EAGLE aggregator can attribute many more on the basis of the updated Trismegistos IDs, even without injecting this information into the source data. Some content providers were in fact unable, during the life of the project, to enter IDs that only became available later.
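The grouping step can be sketched in a few lines. The record structure (dictionaries with `local_id` and `tm_id` keys) and all identifiers below are invented for illustration; the point is only that entities sharing a TM ID collapse into one text with several editions, while entities without one must be set aside.

```python
from collections import defaultdict


def group_by_tm(records):
    """Group local record IDs by TM ID; return (groups, records without a TM ID)."""
    groups = defaultdict(list)
    unassigned = []
    for rec in records:
        tm_id = rec.get("tm_id")
        if tm_id:
            groups[tm_id].append(rec["local_id"])
        else:
            unassigned.append(rec["local_id"])
    return dict(groups), unassigned


# Invented placeholder identifiers, not real EAGLE or Trismegistos IDs.
records = [
    {"local_id": "sourceA:1", "tm_id": "TM-0001"},
    {"local_id": "sourceB:7", "tm_id": "TM-0001"},
    {"local_id": "sourceC:3", "tm_id": "TM-0002"},
    {"local_id": "sourceA:9", "tm_id": None},
]

groups, unassigned = group_by_tm(records)
print(groups)
print(unassigned)
```

The size of the unassigned list is a direct measure of how much of the corpus cannot yet be deduplicated in this way.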
Trismegistos has accomplished the incredible task of disambiguating all existing digitized texts during the lifetime of the EAGLE project. However, the process of updating this information in the databases had to follow a procedure where the valuable correspondence tables were sent over from one partner to another. Within the scope of its action, IDEA has developed a small tool, based at the University of Hamburg, which serves this data via a data API. This tool can respond dynamically to the request for parallel texts connected to a Trismegistos ID, either starting the query from a local id, or from a Trismegistos ID and returning several common formats for developers to easily reuse the information in their applications.32

General Issues in Digital Epigraphy
There are currently several general issues in the field of epigraphic databases. I will list some and omit more general issues, e.g. the use of closed or private and inaccessible databases to provide results in publications and presentations, which removes the verifiability of the results presented. This is a poor practice that we have observed in plenary presentations at international conferences. The following are just five selected points:
1. researchers who are not IT specialists, such as historians and philologists, are forced to traipse across an assortment of databases when seeking information about inscriptions, EAGLE being one of them in some cases, especially given the lack of Greek texts;
2. the wealth of information inscribed within texts and the connections between text, support and context have been discussed extensively but are still largely underexploited, both in print, where little can be done about it, and in digital resources, where these connections, when explicit, could be easily and fruitfully used;33
3. resources in attested languages of which the researcher is not aware become blind spots, which contradicts the multilingual nature of societies of the past;
4. crucially, these resources fail to be adequately referenced and used across publications in the epigraphic realm;
5. only authors and editorial teams can directly contribute; all others have to resort to more or less complicated workarounds.
Some other, more specific issues could be listed for projects like EAGLE, where the aim of aggregating data for Europeana has forced some definitions at different levels (Liuzzo, 2015). However, issues should not stop us, but rather help us to progress. The interaction of different resources, a virtue of any discipline that no project should propose to obliterate, is a huge challenge, and problems in the processes such as those encountered are not surprising.
The first issue could be easily overcome through an aggregator such as EAGLE, if aggregation did not imply regular updates. These are not always possible, especially if a contributing project is discontinued or does not have the human resources to implement it.
The second issue has been only partially faced by EAGLE. The EAGLE vocabularies and the partial EpiDoc encoding of the text go in this direction, allowing the visualization of related results based on one of the aligned features, but the automated mark-up needs to be edited and can only be considered a facilitator for the beginning of a real digital edition, rather than the final product of a process. Existing projects requesting EAGLE data for other purposes are the ideal user of this data and have been numerous. The EPNet project34 is using the very promising federated databases approach (Calvanese et al., 2016) and the CRMtex group is also making a very positive effort for the creation of a CIDOC-CRM model for epigraphy (Felicetti & Murano, 2016;Ruiz, Vassallo, & Liuzzo, 2014).
The third issue is more interesting because it is an issue of the discipline, not only of digital resources, and one that new digital resources can really help to address, thus serving not just the immediate needs of current research but opening up an entirely new set of questions and possibilities for it. Beside the examples of the CIIP (Cotton et al., 2010) [...] has never existed, not to speak of any comparative effort, and could in fact not exist until these days, when it can be leveraged by proper tools.35 Digital tools based on properly curated and linked data can help the researcher on these points.
The fourth problem again lies outside the strict realm of digital epigraphy and affects all digital resources. Why should a contribution to an online resource which everyone uses and reads not be counted in the evaluation of the scientific activity of a researcher in the same way as a paper, when it is also properly peer reviewed? There is a gap here in the more general system, but digital resources have also not done their part to make this possible and easy, although it could have been easier than thought. Now there is really no more excuse for researchers not to cite digital resources properly, as there is no excuse for digital resources not being easily and readily citable. Nevertheless, it is very rare to find the precise citation of digital resources in papers, as it is difficult to find the proper method to cite a digital resource. Making the citation of epigraphic digital resources possible will be one of the central aims of the already mentioned editor & navigator Epigraphy.info (Feraudi-Gruénais & Grieshaber, 2016).36 The point of the evaluation of such research products needs to be discussed in the proper venues and certainly requires far-sighted advocates.
The last issue highlighted here depends on the previous one and requires the biggest leap of faith: opening one's own editorial work to the contributions of others. If we start thinking of digital editions as critical editions, edit them as such and offer them to the public as such, then we need a further step to make them editable by others.
The work carried out to facilitate digital publication has also received, in the last year, a major boost with the release of EFES (EpiDoc Front-End Services),37 EVT 2 beta 1 (Edition Visualization Technology)38 and TEI-Publisher for eXist-db.39 The last of these, which I have personally tested, allows direct publication of TEI files in a way that has never been so easy (Turska, Cummings, & Rahtz, 2016; Wicentowski & Meier, 2015). It is usable out-of-the-box for TEI Simple, but it is also very easy to use with the EpiDoc ODD.

Conclusions
Whilst it has never been possible to directly enrich a specific dataset with data from other datasets, no comparative approach has ever been served by a digital resource for inscriptions either. These would greatly enhance the range of possible research questions that could be addressed. Research has always remained within linguistic, chronological and spatial boundaries that EAGLE, for the first time, attempted to overcome, hosting inscriptions in all languages. In addition, it is to be noted that epigraphic research lacks entirely, not just digitally, a viable way to view the current status of digitization. Instead, some online resources are happy with giving the false impression that "everything" is already there, thus building a chain of misunderstandings, leading to the misuse of online resources. Few disciplines can be as proud of having so many texts online as classical epigraphy. For even fewer, it would make more sense to have a common overview of who is doing what and where and to ensure that the increasingly limited resources are not wasted in the repetition of tasks, whilst other research areas remain forever untouched.