Generating semantic maps through multidimensional scaling: linguistic applications and theory

Abstract
This paper reports on the state of the art in the application of multidimensional scaling (MDS) techniques to create semantic maps in linguistic research. MDS is a statistical technique that represents objects (lexical items, linguistic contexts, languages, etc.) as points in a space such that high similarity between objects corresponds to small distances between the corresponding points. We focus on the use of MDS in combination with parallel corpus data in research on cross-linguistic variation. We first introduce the mathematical foundations of MDS and then give an exhaustive overview of past research that employs MDS techniques in combination with parallel corpus data. We propose a set of terminology to succinctly describe the key parameters of a particular MDS application. We then show that this computational methodology is theory-neutral, i.e. it can be employed to answer research questions in a variety of linguistic theoretical frameworks. Finally, we show how this leads to two lines of future development for MDS research in linguistics.


Other MDS algorithms
As mentioned in the main paper, classic scaling is only one of various MDS algorithms available. Here we briefly discuss two other algorithms that have been used in the linguistic MDS literature.
A method developed by de Leeuw and Heiser (1977) and in subsequent work is based on an iterative technique for minimizing a function, known as majorization. The corresponding MDS algorithm, which minimizes stress by majorization, is known as SMACOF (Scaling by MAjorizing a COmplicated Function) (Borg and Groenen 2005, §8; de Leeuw and Mair 2009). In practice, this implementation of MDS is more commonly used than classic scaling, as it is more flexible and readily available, for example in R (see Borg, Groenen, and Mair 2018, sections 9 and 10, for a discussion of the benefits and potential issues of SMACOF).
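As a concrete illustration, the following minimal sketch runs stress-based MDS using the SMACOF implementation in scikit-learn (sklearn.manifold.MDS uses SMACOF internally). The four-object dissimilarity matrix is invented purely for illustration and does not come from any of the studies discussed here.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical dissimilarities between four objects (e.g. lexical items):
# symmetric, with a zero diagonal, as MDS input requires.
D = np.array([
    [0.0, 0.3, 0.7, 0.9],
    [0.3, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
])

# dissimilarity="precomputed" tells sklearn to treat D itself as the input
# dissimilarity matrix rather than computing Euclidean distances from raw data.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)   # one 2-D point per object
print(coords.shape)             # (4, 2)
print(mds.stress_ >= 0.0)       # True: final raw stress after majorization
```

The resulting coordinates can then be plotted as a two-dimensional map in which objects 1–2 and 3–4 form two clusters, mirroring the structure of the toy dissimilarities.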
The algorithm referred to as Optimal Classification (OC) by Croft and Timm (2013) belongs to a family of models known variously as unfolding models, ideal point models, or preference models (see Borg and Groenen 2005, §14; Ding 2013, p. 247; Groenen and Borg 2014, p. 100; Borg, Groenen, and Mair 2018, §8 for more details).
Unfolding is often classified as a technique closely related to MDS, but distinct from it with respect to the kind of input data. Unfolding is intended to map preference data, for example the voting behavior of a number of individuals (Poole 2005). Both the individuals and their choice preferences are represented in the output visualization. The linguistic equivalent of this, as explicated by Croft and Poole (2008, §3), is to interpret the possibility of expressing a given function by a given form as a (binary) preference for that function by the form, treated as an individual. Unfolding was mostly used in the earlier linguistic MDS literature, which was interested in the automatic generation of maps that capture the same insights as classical semantic maps, as discussed in section 3.1 of the main paper.
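To make the idea of jointly embedding individuals and choices concrete, here is a toy sketch of metric unfolding by plain gradient descent: forms and functions are placed in the same space, with a target distance of 0 for pairs where the form can express the function and 1 otherwise. The binary matrix is invented for illustration, and this is emphatically not Croft and Poole's OC algorithm (which is nonmetric); real applications would use a dedicated implementation such as the unfolding routines in the R package smacof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows: forms, columns: functions; 1 = "this form can express this function".
P = np.array([
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])
target = 1.0 - P                  # preferred pairs should end up near, others far

X = rng.normal(scale=0.1, size=(P.shape[0], 2))   # form coordinates
Y = rng.normal(scale=0.1, size=(P.shape[1], 2))   # function coordinates

def stress(X, Y):
    # Raw stress: squared deviation of form-function distances from targets.
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    return float(np.sum((d - target) ** 2))

s0 = stress(X, Y)
lr, eps = 0.05, 1e-9
for _ in range(500):                        # plain gradient descent on raw stress
    diff = X[:, None, :] - Y[None, :, :]    # shape (forms, functions, 2)
    d = np.linalg.norm(diff, axis=2) + eps
    g = 2.0 * (d - target) / d              # chain-rule factor per pair
    X = X - lr * np.einsum("ij,ijk->ik", g, diff)
    Y = Y + lr * np.einsum("ij,ijk->jk", g, diff)

print(round(stress(X, Y), 3))   # much lower than the stress at the random start
```

In the resulting configuration each form sits near the functions it expresses, so forms and functions can be read off one joint map, which is the defining feature of unfolding output.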

Further reading
For reasons of space, the mathematical introduction in the main paper had to be very concise. For readers who would like a more thorough introduction, or who are interested in further details, here are some suggestions for further reading:
- A simple online interactive demonstration version of Figure 2 in the main paper can be found at https://demonstrations.wolfram.com/EigenvectorsByHand/
- There are numerous linear algebra textbooks that introduce both matrices and their operations, and eigenvalues and eigenvectors. An example is: Gilbert Strang (2016). Introduction to Linear Algebra, 5th edition. Wellesley-Cambridge Press. ISBN 978-0-9802327-7-6. See chapter 6 on eigenvalues and eigenvectors (see also http://math.mit.edu/~gs/linearalgebra/).
- For readers who are primarily interested in statistical applications of eigenvector analysis, and in methods such as PCA, various online resources and suitable textbooks are available. See Jolliffe (2005) for a short encyclopedia entry.
- Overview works on MDS (mathematically advanced) that are cited here and in the main paper are Ding (2013, 2018), Borg and Groenen (2005), Groenen and Borg (2014), and Borg, Groenen, and Mair (2018).
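Readers who prefer to verify such computations numerically rather than by hand can check the defining relation Av = λv directly; the 2×2 symmetric matrix below is chosen arbitrarily for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# numpy.linalg.eig returns the eigenvalues and a matrix whose columns are
# the corresponding eigenvectors. For this matrix the eigenvalues are 3 and 1.
vals, vecs = np.linalg.eig(A)

# Verify the defining relation A v = lambda v for the first eigenvector.
v, lam = vecs[:, 0], vals[0]
print(np.allclose(A @ v, lam * v))   # True
```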

Dimension interpretation: (Multiple) Correspondence Analysis
In section 3.1 of the main paper, we discuss ways to interpret the dimensions of an MDS map. (Multiple) Correspondence Analysis (CA) is a method related to MDS that facilitates dimension interpretation through the addition of supplementary points on the map. Whereas MDS takes a dissimilarity matrix as its input, CA uses all two-way cross-tabulations between categorical variables, and is therefore better suited for a dataset that has been extensively annotated. In cognitive and typological linguistics, CA has been used, among others, to create semantic maps of cutting and breaking events in Majid et al. (2007), of the relation between to bother, to annoy, and to hassle in Glynn (2010), of causatives in Levshina et al. (2013) and Levshina (2016), and for the adjectival profiling of shame in Krawczak (2018). Importantly, CA can be supplemented with passive variables (i.e., points that are added only after the construction of the map) to see how the resulting dimensions are associated with individual constructions. Such points then assist in the interpretation of the crucial semantic dimensions in the dataset. For example, Levshina (2016) finds that lexical causatives (e.g. to kill) across languages end up in similar parts of the map. These passive variables are associated with the functions of direct and intentional causation. Analytic causatives (e.g. to cause to die), on the other hand, are more widely distributed and thus more semantically diverse, although for most languages these forms relate to indirect causation and autonomous causees (see her Figures 1 and 2).
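The mechanics of projecting a passive point can be sketched with a bare-bones correspondence analysis via the singular value decomposition. The contingency table below (hypothetical counts of constructions across contexts) and the supplementary row are invented for illustration; real studies typically use dedicated packages such as FactoMineR (R) or prince (Python), which implement the same transition formulas.

```python
import numpy as np

# Hypothetical counts: three active constructions (rows) x three contexts (columns).
N = np.array([
    [20.0,  5.0,  2.0],
    [ 4.0, 18.0,  6.0],
    [ 1.0,  7.0, 22.0],
])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)   # row and column masses

# Standardized residuals, then their singular value decomposition.
S = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ np.diag(c ** -0.5)
U, s, Vt = np.linalg.svd(S, full_matrices=False)

F = np.diag(r ** -0.5) @ U * s        # row principal coordinates (active points)
G = np.diag(c ** -0.5) @ Vt.T * s     # column principal coordinates

# Supplementary (passive) row: its profile is projected onto the existing map
# via the transition formula, without influencing the map's construction.
sup = np.array([3.0, 4.0, 15.0])      # hypothetical counts for a passive variable
h = sup / sup.sum()
f_sup = (h - c) @ np.diag(c ** -0.5) @ Vt.T

print(F[:, :2])    # active rows on the first two dimensions
print(f_sup[:2])   # passive row placed on the same map
```

Because the supplementary profile resembles the third active row (both are concentrated in the third context), its projected point lands near that row, which is exactly how such passive points help label the dimensions of the map.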