Published by De Gruyter Mouton, August 17, 2019

Dependency parsing of Polish

Alina Wróblewska and Piotr Rybak

Abstract

The predicate-argument structure transparently encoded in dependency-based syntactic representations supports tasks such as machine translation, question answering, and information extraction. The quality of dependency parsing is therefore a crucial issue in natural language processing. In this paper we discuss the fundamental ideas of dependency theory and provide an overview of selected dependency-based resources for Polish. We also present several state-of-the-art dependency parsing systems whose models can be estimated on correctly annotated data. In the experimental part, we provide an in-depth evaluation of these systems on Polish data. Our results show that graph-based parsers, even those without any neural component, are better suited for Polish than transition-based parsing systems.


Alina Wróblewska, Instytut Podstaw Informatyki Polskiej Akademii Nauk, ul. Jana Kazimierza 5, 01-248 Warszawa, Poland

8 Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments. The research presented in this paper was funded by SONATA 8 grant no. 2014/15/D/HSS/03486 from the National Science Centre Poland and by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure. The computing was performed at the Poznań Supercomputing and Networking Center.

Published Online: 2019-08-17
Published in Print: 2019-06-26

© 2019 Faculty of English, Adam Mickiewicz University, Poznań, Poland