Linguistic typology in natural language processing

Emily M. Bender 1
  • 1 Department of Linguistics, University of Washington, Guggenheim Hall, 4th Floor, Box 352425, Seattle, WA 98195, U.S.A.

Abstract

This paper explores the ways in which the field of natural language processing (NLP) can and does benefit from work in linguistic typology. I describe the recent increase in interest in multilingual natural language processing and give a high-level overview of the field. I then turn to a discussion of how linguistic knowledge in general is incorporated in NLP technology before describing how typological results in particular are used. I consider both rule-based and machine learning approaches to NLP and review literature on predicting typological features as well as that which leverages such features.


Linguistic Typology publishes research on linguistic diversity and unity. It welcomes articles that report empirical findings about crosslinguistic variation, advance our understanding of the patterns of diversity, or refine typological methodology.