Digital Classical Philology
Ancient Greek and Latin in the Digital Revolution
Ed. by Berti, Monica
- eBook (PDF)
- Publication Date: August 2019
Optical Character Recognition for Classical Philology
This paper explains the technology behind recent improvements in optical character recognition and how it can be tuned to produce highly accurate texts of scholarly value, especially when dealing with difficult scripts like ancient Greek. Drawing upon several practical experiments using the Ciaconna OCR system (itself based on OCRopus), it shows: the impact of Unicode normalization forms on recognition accuracy; the importance of removing ambiguously encoded characters from training material; the advantage of using separate classifiers for different scripts; the helpful effects of image augmentation; and the effects of binarization levels. It also describes how Ciaconna embeds information about spell-check and dehyphenation within its output.
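To illustrate why the Unicode form matters for polytonic Greek, the following minimal sketch uses only Python's standard `unicodedata` module. It is not taken from Ciaconna or OCRopus. The sample characters and the helper name `codepoints` are assumptions chosen for illustration, and reading the oxia/tonos pair as one instance of "ambiguously encoded characters" is likewise an assumption rather than something the abstract states.

```python
import unicodedata


def codepoints(text):
    """Return the code points of a string as U+XXXX labels (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Greek small letter alpha with psili and oxia, as one precomposed character.
precomposed = "\u1F04"

# NFC keeps the single precomposed character; NFD splits it into a base
# letter followed by two combining marks. A classifier trained on one form
# sees a different set of output symbols than one trained on the other.
nfc = unicodedata.normalize("NFC", precomposed)
nfd = unicodedata.normalize("NFD", precomposed)

print(codepoints(nfc))
print(codepoints(nfd))

# Visually near-identical characters can carry different code points:
# alpha with oxia (U+1F71) versus alpha with tonos (U+03AC).
# NFC maps the former onto the latter, so un-normalized training text
# could contain two labels for what looks like the same glyph.
alpha_oxia = "\u1F71"
alpha_tonos = "\u03AC"
print(alpha_oxia == alpha_tonos)
print(unicodedata.normalize("NFC", alpha_oxia) == alpha_tonos)
```

Normalizing ground-truth transcriptions to a single form before training is one way to keep the set of target symbols consistent; which form gives better accuracy is the empirical question the chapter addresses.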
Bruce Robertson (2019). Optical Character Recognition for Classical Philology. In Monica Berti (Editor), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution (pp. 117–136). Berlin, Boston: De Gruyter. https://doi.org/10.1515/9783110599572-008
Book DOI: https://doi.org/10.1515/9783110599572
Online ISBN: 9783110599572
© 2019 Walter de Gruyter GmbH, Berlin/Munich/Boston. BY-NC-ND 4.0 This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.