Digital Classical Philology

Ancient Greek and Latin in the Digital Revolution

Ed. by Berti, Monica

Series: Age of Access? Grundfragen der Informationsgesellschaft 10

Open Access
eBook (PDF)
Publication Date: August 2019
Copyright year: 2019
ISBN: 978-3-11-059957-2

Optical Character Recognition for Classical Philology

Robertson, Bruce

Abstract

This paper explains the technology behind recent improvements in optical character recognition (OCR) and how it can be tuned to produce highly accurate texts of scholarly value, especially for difficult scripts such as ancient Greek. Drawing on several practical experiments with the Ciaconna OCR system (itself based on OCRopus), it shows the impact of Unicode normalization forms on recognition accuracy; the importance of removing ambiguously encoded characters from training material; the advantage of using separate classifiers for different scripts; the helpful effects of image augmentation; and the effects of binarization levels. It also describes how Ciaconna embeds spell-check and dehyphenation information in its output.
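To illustrate why Unicode normalization forms and ambiguous encodings matter for training material, here is a minimal sketch using only Python's standard `unicodedata` module. It shows that one polytonic Greek word has a different number of code points in NFC and NFD, and that some visually identical characters (alpha with oxia vs. tonos, ano teleia vs. middle dot, Greek question mark vs. semicolon) collapse to a single code point under canonical normalization. The helper names `normalize_ground_truth` and `codepoints` are hypothetical and are not taken from Ciaconna or OCRopus; normalizing is shown as one possible way to remove such ambiguity, not necessarily the procedure the chapter uses.

```python
import unicodedata


def normalize_ground_truth(text: str, form: str = "NFC") -> str:
    """Hypothetical helper: put OCR ground-truth text into one Unicode
    normalization form, so each glyph always maps to the same code-point
    sequence. Not part of Ciaconna or OCRopus."""
    return unicodedata.normalize(form, text)


def codepoints(text: str) -> list[str]:
    """Return the code points of a string as U+XXXX labels."""
    return [f"U+{ord(ch):04X}" for ch in text]


# "ἄνθρωπος" spelled with escapes so the source-file encoding cannot vary.
word = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03bf\u03c2"

nfc = normalize_ground_truth(word, "NFC")
nfd = normalize_ground_truth(word, "NFD")
# NFC keeps the precomposed letter U+1F04; NFD splits it into alpha plus
# two combining marks, so a classifier trained on NFD emits more symbols.
print(len(nfc), len(nfd))  # 8 10

# Visually identical characters with two encodings. Each left-hand
# character has a canonical singleton decomposition to the right-hand one,
# so NFC folds the pair onto a single code point.
pairs = {
    "alpha with oxia vs tonos": ("\u1f71", "\u03ac"),
    "ano teleia vs middle dot": ("\u0387", "\u00b7"),
    "Greek question mark vs semicolon": ("\u037e", ";"),
}
for name, (a, b) in pairs.items():
    same = normalize_ground_truth(a) == normalize_ground_truth(b)
    print(f"{name}: {codepoints(a)} vs {codepoints(b)} -> equal after NFC: {same}")
```

Choosing a single form for all ground truth keeps the classifier's output alphabet consistent; which form yields better accuracy is the empirical question the chapter investigates.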

Citation Information

Bruce Robertson (2019). Optical Character Recognition for Classical Philology. In Monica Berti (Editor), Digital Classical Philology: Ancient Greek and Latin in the Digital Revolution (pp. 117–136). Berlin, Boston: De Gruyter. https://doi.org/10.1515/9783110599572-008

Book DOI: https://doi.org/10.1515/9783110599572

Online ISBN: 9783110599572

© 2019 Walter de Gruyter GmbH, Berlin/Munich/Boston. BY-NC-ND 4.0 This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
