Open Access (BY-NC-ND 3.0 license). Published by De Gruyter Open Access, October 14, 2016.

Part of Speech Tagging for Ancient Greek

Giuseppe G. A. Celano, Gregory Crane and Saeed Majidi
From the journal Open Linguistics

Abstract

In this article we report the results for five POS taggers, i.e., the Mate tagger, the Hunpos tagger, RFTagger, the OpenNLP tagger, and the NLTK Unigram tagger, tested on the data of the Ancient Greek Dependency Treebank. The aim is to find the most efficient POS tagger for pre-annotating new treebank data. A corrected 1-run 10-fold cross-validation t-test shows that the Mate tagger outperforms all the other taggers, with an accuracy score of 88%.
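The abstract names the significance test but does not spell out the statistic. As a minimal sketch, assuming the usual formulation of the corrected resampled t-test (the paired t statistic with its variance term inflated by the test/train size ratio), the comparison of two taggers over the same folds could look like the following. The function name and the per-fold accuracies are hypothetical illustrations, not the authors' code or data.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(scores_a, scores_b, test_train_ratio):
    """Return (t statistic, degrees of freedom) for paired per-fold scores.

    Assumed formula: t = mean(d) / sqrt((1/k + n_test/n_train) * var(d)),
    where d are the per-fold score differences, k is the number of folds,
    and var(d) is the sample variance. With test_train_ratio = 0 this
    reduces to the ordinary paired t statistic.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both taggers must be scored on the same folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    if k < 2:
        raise ValueError("at least two folds are required")
    var = variance(diffs)  # sample variance (divides by k - 1)
    if var == 0:
        raise ValueError("zero variance: the statistic is undefined")
    t = mean(diffs) / math.sqrt((1.0 / k + test_train_ratio) * var)
    return t, k - 1


if __name__ == "__main__":
    # Hypothetical per-fold accuracies for two taggers (made-up numbers).
    tagger_a = [0.88, 0.87, 0.89, 0.88, 0.86, 0.88, 0.89, 0.87, 0.88, 0.90]
    tagger_b = [0.85, 0.86, 0.86, 0.84, 0.85, 0.87, 0.86, 0.85, 0.84, 0.86]
    # In 10-fold cross-validation each test fold is 1/9 the size of its training set.
    t, df = corrected_resampled_t(tagger_a, tagger_b, test_train_ratio=1 / 9)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting statistic would be compared against a t distribution with k - 1 degrees of freedom. When several taggers are compared pairwise, the p-values would additionally need a multiple-comparison adjustment such as the Holm procedure listed in the references.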

References

David Bamman. Ancient Greek and Latin dependency treebanks. In Antal van den Bosch, Caroline Sporleder, and Kalliopi Zervanou, editors, Language Technology for Cultural Heritage, Theory and Applications of Natural Language Processing, pages 79–98. Springer, 2011. DOI: 10.1007/978-3-642-20227-8_5

Bernd Bohnet and Joakim Nivre. A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, pages 1455–1465. Association for Computational Linguistics, 2012.

Remco R. Bouckaert and Eibe Frank. Evaluating the replicability of significance tests for comparing learning algorithms. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 3–12, 2004. DOI: 10.1007/978-3-540-24775-3_3

Thorsten Brants. TNT: A statistical part-of-speech tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing, pages 224–231. Association for Computational Linguistics, 2000.

Gregory Crane. Generating and parsing Classical Greek. Literary and Linguistic Computing, 6(4):243–245, 1991. DOI: 10.1093/llc/6.4.243

Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, (6):65–70, 1979.

Zhiheng Huang, Wei Xu, and Kai Yu. Bidirectional LSTM-CRF models for sequence tagging. pages 48–52, 2015. URL http://arxiv.org/abs/1508.01991.

Michael Piotrowski. Natural language processing for historical languages. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2012. DOI: 10.2200/S00436ED1V01Y201207HLT017

Guzmán Santafé, Iñaki Inza, and José Antonio Lozano. Dealing with the evaluation of supervised classification algorithms. Artificial Intelligence Review, 44(4):467–508, 2015. DOI: 10.1007/s10462-015-9433-y

Helmut Schmid and Florian Laws. Estimation of conditional probabilities with decision trees and an application to fine-grained POS tagging. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING 2008, pages 777–784. Association for Computational Linguistics, 2008. DOI: 10.3115/1599081.1599179

Received: 2016-4-8
Accepted: 2016-7-20
Published Online: 2016-10-14

© 2016 G. G. A. Celano et al.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
