Abstract
In their recent paper, Lau, Clark, and Lappin (henceforth LCL) explore the idea that the probability of the occurrence of word strings can form the basis of an adequate theory of grammar (Lau, Jey H., Alexander Clark & Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41(5):1201–1241). To make their case, they present the results of correlating the output of several probabilistic models trained solely on naturally occurring sentences with the gradient acceptability judgments that humans report for ungrammatical sentences derived from roundtrip machine translation errors. In this paper, we first explore the logic of LCL’s argument, both in terms of the choice of evaluation metric (gradient acceptability) and in terms of the choice of test data set (machine translation errors on random sentences from a corpus). We then present our own series of studies intended to allow for a better comparison between LCL’s models and existing grammatical theories. We evaluate two of LCL’s probabilistic models (a trigram model and a recurrent neural network) against three data sets (taken from journal articles, a textbook, and Chomsky’s famous colorless-green-ideas sentence), using three evaluation metrics (LCL’s gradience metric, a categorical version of the metric, and the experimental-logic metric used in the syntax literature). Our results suggest there are very real, measurable cost-benefit tradeoffs inherent in LCL’s models across the three evaluation metrics. The gain in explanation of gradience (between 13% and 31% of gradience) is offset by losses in the other two metrics: a 43%-49% loss in coverage based on a categorical metric of explaining acceptability, and a loss of 12%-35% in explaining experimentally-defined phenomena.
This suggests that anyone wishing to pursue LCL’s models as competitors with existing syntactic theories must either be satisfied with this tradeoff, or modify the models to capture the phenomena that are not currently captured.
Acknowledgements
We would like to thank two anonymous reviewers for immensely helpful comments on an earlier draft. We would also like to thank audiences at GLOW 39, NELS 46, the University of Massachusetts, the University of Southern California, the City University of New York, and the Massachusetts Institute of Technology for stimulating discussions at various stages of this project. Finally, we’d like to thank Jey Han Lau, Alexander Clark, and Shalom Lappin for opening this conversation, and for making the code for their models publicly accessible. All errors remain our own. This material is based upon work supported by the National Science Foundation under Grant No. BCS-1347115 to JS.
References
Adger, David. 2003. Core syntax. Oxford: Oxford University Press.
Bock, Kathryn & Carol A. Miller. 1991. Broken agreement. Cognitive Psychology 23:45–93. doi:10.1016/0010-0285(91)90003-7.
Bresnan, Joan. 2007. Is syntactic knowledge probabilistic? Experiments with the English dative alternation. In Sam Featherston & Wolfgang Sternefeld (eds.), Roots: Linguistics in search of its evidential base. Studies in Generative Grammar, 77–96. Berlin and New York: Mouton de Gruyter.
Chomsky, Noam. 1955/1975. The logical structure of linguistic theory. New York: Springer.
Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on Information Theory 2(3):113–124. doi:10.1109/TIT.1956.1056813.
Chomsky, Noam. 1986. Knowledge of language: Its nature, origins, and use. New York: Praeger.
Collins, Chris & Edward Stabler. 2016. A formalization of minimalist syntax. Syntax 19:43–78. doi:10.1111/synt.12117.
Elo, Arpad. 1978. The rating of chessplayers, past and present. New York: Arco Press.
Featherston, Sam. 2005. The decathlon model of empirical syntax. In M. Reis & S. Kepser (eds.), Linguistic evidence: Empirical, theoretical, and computational perspectives, 187–208. Berlin: Mouton de Gruyter. doi:10.1515/9783110197549.187.
Fodor, Jerry A. & Zenon Pylyshyn. 1988. Connectionism and cognitive architecture: A critical analysis. Cognition 28:3–71. doi:10.1016/0010-0277(88)90031-5.
Hunter, Tim & Chris Dyer. 2013. Distributions on Minimalist grammar derivations. Proceedings of the 13th Meeting on the Mathematics of Language.
Keller, Frank. 2000. Gradience in grammar: Experimental and computational aspects of degrees of grammaticality. Edinburgh: University of Edinburgh dissertation.
Lau, Jey H., Alexander Clark & Shalom Lappin. 2014. Measuring gradience in speakers’ grammaticality judgements. Proceedings of the 36th Annual Conference of the Cognitive Science Society, Quebec City, July.
Lau, Jey H., Alexander Clark & Shalom Lappin. 2015. Unsupervised prediction of acceptability judgements. Proceedings of the 53rd Annual Conference of the Association of Computational Linguistics, Beijing, July. doi:10.3115/v1/P15-1156.
Lau, Jey H., Alexander Clark & Shalom Lappin. 2017. Grammaticality, acceptability, and probability: A probabilistic view of linguistic knowledge. Cognitive Science 41(5):1201–1241. doi:10.1111/cogs.12414.
Mikolov, Tomas. 2012. Statistical Language Models Based on Neural Networks. Brno: Brno University of Technology dissertation.
Chomsky, Noam & George A. Miller. 1963. Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush & E. Galanter (eds.), Handbook of mathematical psychology, vol. 2, 269–321. Amsterdam: Wiley.
Pauls, Adam & Dan Klein. 2012. Large-scale syntactic language modeling with treelets. In Proceedings of the 50th annual meeting of the association for computational linguistics: Long papers-volume 1, 959–968. Stroudsburg PA, USA: Association for Computational Linguistics.
Pereira, Fernando. 2000. Formal grammar and information theory: together again? Philosophical Transactions of the Royal Society 358(1769):1239–1253. doi:10.1098/rsta.2000.0583.
Prince, Alan & Paul Smolensky. 1991. Connectionism and harmony theory in linguistics. Report CU-CS-600-92. Computer Science Department, University of Colorado at Boulder.
Prince, Alan & Paul Smolensky. 1993. Optimality theory: Constraint interaction in generative grammar. RuCCS Technical Report 2, Rutgers University. Piscataway, NJ: Rutgers University Center for Cognitive Science. doi:10.1002/9780470756171.ch1.
Smolensky, Paul. 1988. The constituent structure of mental states: A reply to Fodor and Pylyshyn. The Southern Journal of Philosophy 26:137–161. doi:10.1007/978-94-011-3524-5_13.
Smolensky, Paul & Geraldine Legendre. 2006. The harmonic mind. Cambridge, MA: MIT Press.
Sorace, Antonella & Frank Keller. 2005. Gradience in linguistic data. Lingua 115:1497–1524. doi:10.1016/j.lingua.2004.07.002.
Sprouse, Jon & Diogo Almeida. 2012. Assessing the reliability of textbook data in syntax: Adger’s core syntax. Journal of Linguistics 48:609–652. doi:10.1017/S0022226712000011.
Sprouse, Jon, Carson T. Schütze & Diogo Almeida. 2013. A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001-2010. Lingua 134:219–248. doi:10.1016/j.lingua.2013.07.002.
Townsend, David J. & Thomas G. Bever. 2001. Sentence comprehension: The integration of habits and rules. Cambridge, MA: MIT Press. doi:10.7551/mitpress/6184.001.0001.
Xiang, Ming, Brian Dillon & Colin Phillips. 2009. Illusory licensing effects across dependency types: ERP evidence. Brain and Language 108:40–55. doi:10.1016/j.bandl.2008.10.002.
© 2018 Walter de Gruyter GmbH, Berlin/Boston