Open Access (BY-NC-ND 3.0 license). Published by De Gruyter Open Access, November 2, 2016.

Approaching Questions of Text Reuse in Ancient Greek Using Computational Syntactic Stylometry

  • Vanessa B. Gorman and Robert J. Gorman
From the journal Open Linguistics


We are investigating methods by which data from dependency syntax treebanks of ancient Greek can be applied to questions of authorship in ancient Greek historiography. From the Ancient Greek Dependency Treebank we constructed syntax words (sWords) by tracing the shortest path from each leaf node to the root of each sentence tree. This paper presents the results of a preliminary test of the usefulness of the sWord as a stylometric discriminator. The sWord data were subjected to clustering analysis, and the resulting groupings were in accord with traditional classifications. The use of sWords also allows a more fine-grained heuristic exploration of difficult questions of text reuse. A comparison of the relative frequencies of sWords in the directly transmitted book 1 of Polybius and the excerpted books 9–10 indicates that the measurements of the two texts are generally very close, but when frequencies do vary, the differences are surprisingly large. These differences reveal that a certain syntactic simplification is a salient characteristic of Polybius’ excerptor, who leaves conspicuous syntactic indicators of his modifications.
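The sWord construction described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy sentence, the `(id, head, relation)` tuple layout, the relation labels, the choice to build an sWord by joining the labels met on the way from leaf to root, and the function names `swords` and `relative_frequencies`. In a tree the path from a leaf to the root is unique, so following head pointers yields the shortest path.

```python
from collections import Counter

# Hypothetical toy sentence, not drawn from the treebank:
# (token id, head id, dependency relation). Head 0 marks the artificial root.
SENTENCE = [
    (1, 2, "SBJ"),
    (2, 0, "PRED"),
    (3, 4, "ATR"),
    (4, 2, "OBJ"),
]


def swords(tokens):
    """Return one sWord per leaf: the relation labels from leaf up to root."""
    head = {tid: h for tid, h, _ in tokens}
    label = {tid: rel for tid, _, rel in tokens}
    # A leaf is a token that never appears as the head of another token.
    used_as_head = set(head.values())
    leaves = [tid for tid in head if tid not in used_as_head]

    result = []
    for leaf in leaves:
        path = []
        node = leaf
        while node != 0:
            if len(path) > len(tokens):
                # Guard against malformed input containing a cycle.
                raise ValueError("dependency structure is not a tree")
            path.append(label[node])
            node = head[node]
        # Assumed encoding: labels joined with hyphens, leaf first.
        result.append("-".join(path))
    return result


def relative_frequencies(sword_list):
    """Map each distinct sWord to its share of all sWords in the list."""
    counts = Counter(sword_list)
    total = sum(counts.values())
    return {sw: n / total for sw, n in counts.items()}


if __name__ == "__main__":
    words = swords(SENTENCE)
    print(words)
    print(relative_frequencies(words))
```

Relative-frequency tables of this kind, computed per text, are the sort of input on which a clustering analysis or a comparison between two texts could operate; the actual features and procedure used in the study may differ.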



Received: 2016-2-29
Accepted: 2016-10-14
Published Online: 2016-11-2

© 2016 Vanessa B. Gorman, Robert J. Gorman

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
