Corpus Linguistics and Linguistic Theory

Founded by Gries, Stefan Th. / Stefanowitsch, Anatol

Ed. by Wulff, Stefanie


Using token-based semantic vector spaces for corpus-linguistic analyses: From practical applications to tests of theoretical claims

Martin Hilpert / David Correia Saavedra
Published Online: 2017-09-26 | DOI: https://doi.org/10.1515/cllt-2017-0009


This paper presents token-based semantic vector spaces as a tool that can be applied in corpus-linguistic analyses such as word sense comparisons, comparisons of synonymous lexical items, and matching of concordance lines with a given text. We demonstrate how token-based semantic vector spaces are created, and we illustrate the kinds of results that can be obtained with this approach. Our main argument is that token-based semantic vector spaces are useful not only for practical corpus-linguistic applications but also for the investigation of theory-driven questions. We illustrate this point with a discussion of the asymmetric priming hypothesis (Jäger and Rosenbach 2008). The asymmetric priming hypothesis, which states that grammaticalizing constructions will be primed by their lexical sources but not vice versa, makes a number of empirically testable predictions. We operationalize and test these predictions, concluding that token-based semantic vector spaces yield conclusions that are relevant for linguistic theory-building.

Keywords: semantic vector spaces; token-based; word sense disambiguation; asymmetric priming
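
As a rough illustration of what "token-based" means here, the following minimal sketch builds one vector per occurrence of a word rather than one per word type. It assumes a second-order construction in the spirit of Schütze (1998), in which a token's vector is the sum of the type-level co-occurrence vectors of its context words; the procedure used in the paper may differ in weighting, window size, and dimensionality reduction. The toy corpus, the window size, and the function names `type_vectors`, `token_vector`, and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
import math


def type_vectors(sentences, window=2):
    """First-order co-occurrence counts for every word type in the corpus."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors


def token_vector(sent, position, vectors, window=2):
    """Second-order vector for one token: the sum of the type vectors
    of the words found in its context window."""
    total = Counter()
    lo, hi = max(0, position - window), min(len(sent), position + window + 1)
    for j in range(lo, hi):
        if j != position:
            total.update(vectors[sent[j]])
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: four occurrences of the ambiguous word "bank".
sentences = [
    "she sat on the river bank".split(),
    "the bank approved the loan".split(),
    "fish swim near the river bank".split(),
    "the bank raised the loan rate".split(),
]

types = type_vectors(sentences)
tokens = [(sent, sent.index("bank")) for sent in sentences]
vecs = [token_vector(sent, pos, types) for sent, pos in tokens]

# Pairwise token-to-token similarities, e.g. as input for clustering or MDS.
sims = [[cosine(a, b) for b in vecs] for a in vecs]
```

The resulting similarity matrix is the kind of object that could then be visualized or clustered to compare word senses; with a corpus this small the values are not meaningful in themselves.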


  • Bybee, Joan L., Revere Perkins & William Pagliuca. 1994. The evolution of grammar: Tense, aspect and modality in the languages of the world. Chicago: University of Chicago Press.

  • Davies, Mark. 2004. BYU-BNC. Based on the British National Corpus from Oxford University Press. Available online at http://corpus.byu.edu/bnc/.

  • Firth, John R. 1957. Papers in Linguistics 1934–1951. London: Oxford University Press.

  • Glynn, Dylan & Justyna Robinson. 2014. Corpus methods in cognitive semantics. Studies in synonymy and polysemy. Amsterdam: John Benjamins.

  • Goldberg, Adele E. 2006. Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.

  • Heylen, Kris, Dirk Speelman & Dirk Geeraerts. 2012. Looking at word meaning. An interactive visualization of semantic vector spaces for Dutch synsets. In Proceedings of the EACL-2012 joint workshop of LINGVIS & UNCLH: Visualization of Language Patterns and Uncovering Language History from Multilingual Resources, 16–24.

  • Heylen, Kris, Thomas Wielfaert, Dirk Speelman & Dirk Geeraerts. 2015. Monitoring polysemy. Word space models as a tool for large-scale lexical semantic analysis. Lingua 157. 153–172.

  • Hilpert, Martin. 2008. Germanic future constructions. A usage-based approach to language change. Amsterdam: John Benjamins.

  • Hilpert, Martin & David Correia Saavedra. 2016. The unidirectionality of semantic changes in grammaticalization: An experimental approach to the asymmetric priming hypothesis. English Language and Linguistics. https://doi.org/10.1017/S1360674316000496 (accessed 12 September 2017).

  • Hopper, Paul J. & Elizabeth C. Traugott. 2003. Grammaticalization, 2nd edn. Cambridge: Cambridge University Press.

  • Izenman, Alan J. 2008. Modern multivariate statistical techniques. Regression, classification, and manifold learning. New York: Springer.

  • Jäger, Gerhard & Anette Rosenbach. 2008. Priming and unidirectional language change. Theoretical Linguistics 34(2). 85–113.

  • Jenset, Gard B. 2013. Mapping meaning with distributional methods. A diachronic corpus-based study of existential there. Journal of Historical Linguistics 3(2). 272–306.

  • Kiela, Douwe & Stephen Clark. 2014. A systematic study of semantic vector space model parameters. Proceedings of EACL 2014, Second Workshop on Continuous Vector Space Models and their Compositionality (CVSC), Gothenburg, Sweden, 21–30.

  • Lebani, Gianluca & Alessandro Lenci. 2016. “Beware the Jabberwock, dear reader!” Testing the distributional reality of construction semantics. Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex-V), 8–18.

  • Leech, Geoffrey. 1992. 100 million words of English: The British National Corpus. Language Research 28(1). 1–13.

  • Levshina, Natalia. 2015. How to do Linguistics with R: Data exploration and statistical analysis. Amsterdam: John Benjamins.

  • Norde, Muriel. 2009. Degrammaticalization. Oxford: Oxford University Press.

  • Perek, Florent. 2016. Using distributional semantics to study syntactic productivity in diachrony: A case study. Linguistics 54(1). 149–188.

  • Ruette, Tom, Dirk Speelman & Dirk Geeraerts. 2013. Lexical variation in aggregate perspective. In Augusto Soares Da Silva (ed.), Pluricentricity: Linguistic variation and sociocognitive dimensions, 95–116. Berlin: De Gruyter.

  • Sagi, Eyal, Stefan Kaufmann & Brady Clark. 2011. Tracing semantic change with latent semantic analysis. In Justyna Robinson & Kathryn Allan (eds.), Current methods in historical semantics, 161–183. Berlin: De Gruyter.

  • Schütze, Hinrich. 1998. Automatic word sense discrimination. Computational Linguistics 24(1). 97–124.

  • Traugott, Elizabeth Closs & Graeme Trousdale (eds.). 2010. Gradience, gradualness and grammaticalization. Amsterdam: John Benjamins.

  • Traugott, Elizabeth Closs & Graeme Trousdale. 2013. Constructionalization and constructional changes. Oxford: Oxford University Press.

  • Turney, Peter D. & Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37. 141–188.

  • Wheeler, Eric S. 2005. Multidimensional scaling for linguistics. In Reinhard Koehler, Gabriel Altmann & Raimond G. Piotrowski (eds.), Quantitative linguistics. An international handbook, 548–553. Berlin: De Gruyter.

About the article

This work was supported by Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Grant/Award Number: ‘100015_149176/1’).

Citation Information: Corpus Linguistics and Linguistic Theory, ISSN (Online) 1613-7035, ISSN (Print) 1613-7027, DOI: https://doi.org/10.1515/cllt-2017-0009.

© 2017 Walter de Gruyter GmbH, Berlin/Boston.
