BY-NC-ND 4.0 license Open Access Published by De Gruyter Mouton 2022

Scoring with Token-based Models

Stefano De Pascale and Weiwei Zhang

Abstract

This paper provides a replication of the sociolectometric analyses in Geeraerts, Grondelaers, and Speelman (1999) with the help of distributional semantic modelling. We selected 14 concepts from the lexical field of football in Dutch and in Chinese. Instead of manually disambiguating the corpus occurrences, we explored a semi-automatic procedure based on token-based vector space models and cluster analysis. The experiments show that our workflow efficiently detects regional lexical variation in large-scale corpora. More specifically, the results revealed that removing semantic clusters whose most central members are tokens referring to senses other than the intended concept’s sense does have an impact on the sociolectometric distances. Furthermore, discarding entire clusters has consequences for the total concept frequency.
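The cluster-filtering step described above can be sketched as follows. This is a hypothetical toy illustration, not the authors' implementation: the token vectors, sense annotations, and precomputed clusters are all invented for the example, and the real workflow would use high-dimensional distributional token vectors and a proper clustering algorithm. The sketch shows the core idea: inspect the most central member (medoid) of each cluster, discard clusters whose medoid realises a sense other than the intended one, and note how this lowers the total concept frequency.

```python
# Toy sketch (assumed setup, not the paper's actual code): filter token
# clusters by the sense of their medoid, then recount concept frequency.
from math import dist  # Euclidean distance, Python 3.8+

# Invented 2D "token vectors" for occurrences of an ambiguous word.
tokens = {
    "t1": (0.1, 0.2), "t2": (0.2, 0.1), "t3": (0.15, 0.15),  # intended sense
    "t4": (5.0, 5.1), "t5": (5.2, 4.9), "t6": (5.1, 5.0),    # other sense
}
# Hypothetical manual annotation; only medoids need to be checked.
sense_of = {"t3": "football", "t6": "other"}

def medoid(members):
    """Return the token minimising the summed distance to its cluster mates."""
    return min(members,
               key=lambda m: sum(dist(tokens[m], tokens[o]) for o in members))

# Assume the clustering step has already partitioned the tokens.
clusters = [["t1", "t2", "t3"], ["t4", "t5", "t6"]]

# Keep only clusters whose medoid is annotated with the intended sense.
kept = [c for c in clusters if sense_of.get(medoid(c)) == "football"]

# Discarding a whole cluster reduces the concept's total token frequency.
concept_frequency = sum(len(c) for c in kept)  # 3 instead of 6 here
```

Checking only each cluster's medoid, rather than every token, is what makes the procedure semi-automatic: one annotation decision per cluster stands in for the manual disambiguation of all its members.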

© 2021 Walter de Gruyter GmbH, Berlin/Munich/Boston