Introduction to special issue on areal typology of lexico-semantics

Introduction
In recent years, the lexicon has become increasingly popular as a subject for cross-linguistic study. Although there is some debate around the exact meaning, 'lexical typology' (as it has come to be known) is, at its broadest, the systematic study of cross-linguistic variation in words and vocabularies (cf. Koptjevskaja-Tamm 2008; Koptjevskaja-Tamm et al. 2016). This special issue will treat lexico-semantic phenomena showing parallels across languages and address how these similarities may be described and accounted for: by universal tendencies, genetic relations among the languages, their contacts and/or their common extralinguistic surroundings.
Morphosyntactic and phonological features are regularly used by linguists to establish the existence of linguistic areas and construct areally based typologies. By contrast, lexico-semantic phenomena have, with a few exceptions (e.g., Brown 2011; Enfield 2003; Matisoff 2004; Smith-Stark 1994; Sobolev 2001), received remarkably little attention from areal linguistics and areal typology, and little is known about the geographical variation they display. Matisoff (2004), Vanhove (2008), Zalizniak et al. (2012) and Urban (2012) give numerous examples of cross-linguistically recurrent patterns of polysemy; whilst some are found the world over, others are clearly areally restricted and witnesses of language contact.
The study of lexical phenomena is of course well-established in research on language contact. Loanwords have been studied from a systematic cross-linguistic perspective with particular respect to the varying borrowability of words belonging to different parts of speech and/or coming from different semantic domains (cf. Haspelmath and Tadmor 2009; Wohlgemuth 2009). Areal lexico-semantics (Ameka and Wilkins 1996; Koptjevskaja-Tamm and Liljegren 2017), however, is concerned not with the way words move from language to language, but with the diffusion of semantic features across language boundaries in a geographical area. For instance, in adding to the evidence for an Ethio-Eritrean linguistic area, Hayward (1991, 2000; also Treis 2010) points out many shared lexicalization patterns in the three Ethiopian languages Amharic (Semitic), Oromo (Cushitic) and Gamo (Omotic). These fall into four categories: (i) shared semantic specializations, e.g., 'die without ritual slaughter (of cattle)'; (ii) shared polysemy, e.g., 'draw water' = 'copy'; (iii) shared derivational pathways, e.g., 'need' = causative of 'want'; (iv) shared ideophones and idioms, e.g., 'I caught a cold' expressed via 'a cold caught me'.
Lexico-semantics is a potentially vast field, spanning the convergence of the meanings of individual lexemes, through the structuring of entire semantic domains, to the organization of complete lexicons. As outlined in Koptjevskaja-Tamm and Liljegren (2017), at least the following groups of lexico-semantic phenomena may serve as indicators of areality: (i) lexico-semantic parallels, i.e., shared colexification patterns and/or shared lexico-constructional patterns/calques, as found across many West African languages in the colexification of 'fruit' and 'child' or in 'fruit' being expressed as 'child of tree', both cases involving a semantic association between 'child' and 'fruit'; (ii) shared formulaic expressions, such as the farewell expressions au revoir (French), auf Wiedersehen (German), på återseende (Swedish), do svidanija (Russian) and näkemiin (Finnish), which follow the same model across a number of European languages; (iii) area-specific lexicalizations and a shared or similar-looking internal organization of certain semantic domains, such as a highly specialized vocabulary describing dairy practices and dairy products across the languages of the Greater Hindu Kush (Koptjevskaja-Tamm and Liljegren 2017: 218).
The fact that lexico-semantic patterns are often closely tied to cultural patterns suggests that they can be useful as a measure of innovation and diffusion in cultural practice. For example, in a recent colexification study, Schapper (2019) identifies a recurrent lexical semantic association between transitive verbs of smelling and kissing across Southeast Asia and argues that it reflects a cultural practice widespread across the area in which conventionalized gestures expressing greeting and/or affection (i.e., kissing) involve an olfactory gesture (i.e., smelling, sniffing). Conventionalized formulaic expressions used for particular pragmatic reasons, such as greetings, curses, proverbs, etc., seem particularly prone to accruing culturally significant information. Ameka (2006a, 2006b) quotes several interactional routines including proverbs shared among the West African languages, primarily those in the Volta Basin, e.g., good-night wishes ('sleep/lie well') and thanking or leave-taking expressions. The formula 'When I die, don't cry (i.e., don't mourn for me)' as an expression of extreme gratitude builds on the West African cultural requirement to publicly express sorrow during a funeral by displays of crying and wailing, but "when someone does something very good for you, by absolving them from doing the things that one is expected to do when other people die, one is saying that the favour that has been received is like the ultimate thing or even more than it" (Ameka 2006a: 253). The leave-taking expressions ('I am asking for a way/road') also reflect the common West African cultural model that a visitor cannot just leave without first asking the host for permission to do so (Ameka and Breedveld 2004: 171-172).
The possibility of using lexico-semantic patterns to track deep time connections between groups and establish new areas has also been put forward. Urban (2009) shows that the lexical association of words for 'sun' and 'moon' is a Pacific Rim feature, furthering the observations of Bickel and Nichols (2006) in the AutoTyp project. While attempts using morphosyntax and phonology to unite the Australian and Melanesian linguistic areas into a single "Sahul" macro-area have generally found little success (Nichols 1997; Reesink et al. 2009), preliminary work indicates that lexico-semantic patterns such as fire/firewood colexification may be more promising for establishing Sahul as a linguistic area (Schapper et al. 2016). Similarly, an area uniting Mainland and Island Southeast Asia is supported by colexification of 'smell' and 'kiss' (Schapper 2019) and by lexical expressions where 'sun' is expressed as 'eye of day' (Urban 2010).
In short, lexico-semantics studied from an areal and diversity perspective has significant potential to offer new insights to linguistic typology as well as historical and contact linguistics, but it is still awaiting systematic research.

Workshop
The papers in this special issue of Linguistic Typology represent a selection of those that grew out of a workshop on areal lexical typology held at the 12th meeting of the Association for Linguistic Typology (ALT 12) in Canberra in 2017. This workshop solicited contributions related to geographically and genetically diverse languages with the aim of exploring topics furthering lexical typology such as:
1. How do we recognize contact-induced similarities in lexico-semantic structures, rather than genetic, universal or accidental ones? How do we distinguish between significant and trivial similarities?
2. What are the best ways for discovering lexico-semantic similarities (e.g., by "noticing" them, by systematic comparison of lexical domains across languages, or by automated extraction from digital databases)? What are the advantages and disadvantages of these different data collection processes?
3. At what level can semantic features be said to be shared across languages: for instance, at the level of individual words, particular features or bundles of features of words? Or can we observe lexico-semantic similarities across languages in the organization of entire semantic fields?
4. What is the specific contact situation behind the lexico-semantic phenomena under discussion? Do the lexico-semantic phenomena have interrelations with extralinguistic factors, such as environmental or cultural features?
5. What do comparative approaches relying on large databases offer and what are the advantages of micro-scale studies? How reliable are the data coming from large databases?
The goal of the workshop was to address these topics in order to contribute, both empirically and theoretically, to the development of lexical typology from an areal perspective. We were particularly interested in contributions with the scope of an area or a larger number of languages, where the major concern would be separating contact-induced convergence from inheritance and/or more universal tendencies. In the final papers developed out of the workshop, not all the above issues were addressed equally. A major emphasis of many papers was on methodological issues to do with comparability in the lexicon and on different, particularly computational, techniques for detecting patterns in lexical data sets. These concerns are continued in the papers published here. Prominent in most contributions to this volume is the concept of "colexification", an approach to cross-linguistic atoms of senses associated with lexemes developed by François (2008): "A given language is said to COLEXIFY two functionally distinct senses if, and only if, it can associate them with the same lexical form". François distinguishes between "strict" colexification (e.g., TOE and FINGER in Latvian pirksts), defined on the basis of identity of forms in synchrony, and "loose" colexification, which covers relatedness of forms encoding two concepts from a diachronic point of view as well as cases of partial identity of forms, for instance, in derivation or compounding (e.g., German Haupt means 'head' as well as 'main', but it has the latter function only in compounds such as Hauptbahnhof 'main station'). Of the contributions to this volume, Georgakopoulos et al., Gast and Koptjevskaja-Tamm, and Souag limit themselves to strict colexification. The contributions of Schapper, Segerer and Vanhove, Urban, and Liljegren cast a wider net and also include concepts expressed by derivationally related forms, i.e., "loose colexification" and beyond.
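To make the notion of strict colexification concrete, the following minimal Python sketch groups a wordlist by language and form and reports every pair of concepts that share an identical form. It is an illustration only, not the procedure of any contribution or database: the function name `strict_colexifications`, the tuple layout and the six-row toy wordlist are assumptions introduced here, with the Latvian example taken from the text above.

```python
from collections import defaultdict
from itertools import combinations

# Toy wordlist of (language, concept, form) rows.
# Hypothetical illustration data, not drawn from CLICS or RefLex.
WORDLIST = [
    ("Latvian", "FINGER", "pirksts"),
    ("Latvian", "TOE", "pirksts"),
    ("Spanish", "FINGER", "dedo"),
    ("Spanish", "TOE", "dedo"),
    ("English", "FINGER", "finger"),
    ("English", "TOE", "toe"),
]


def strict_colexifications(wordlist):
    """Map each concept pair to the set of languages that express
    both concepts with one identical form (strict colexification)."""
    # Collect the concepts attached to each (language, form) combination.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    # Every pair of concepts sharing a form counts as colexified.
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)
    return dict(languages_by_pair)


colex = strict_colexifications(WORDLIST)
print({pair: sorted(langs) for pair, langs in colex.items()})
# prints {('FINGER', 'TOE'): ['Latvian', 'Spanish']}
```

Because the sketch relies on identity of forms alone, it cannot tell polysemy from accidental homophony, and it would not capture loose colexification such as German Haupt, which requires information about derivation, compounding or etymology.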

This volume
Map 1: Geographical areas treated in this special issue.

There are still relatively few colexification studies and this volume significantly increases the number, but some contributions are cautious about the approach. For example, Georgakopoulos et al. point out that the colexification approach is limited by the fact that colexification networks are restricted to pairwise associations. Schapper reviews some of the criticisms that have been levelled at the colexification approach and suggests that potential pitfalls of the approach have been exacerbated by some recent simplistic applications of the methodology. She contends that automated approaches to colexification, by only looking at lexemes in isolation, miss comparable semantic associations that are expressed by phrasal and clausal constructions. Souag commends the colexification approach for enabling researchers to establish etic grids for cross-linguistic semantic comparison, but emphasizes that the etic grid itself should consist of sufficiently well-defined contexts. In his study he therefore defines colexifications not just by means of glosses but with short phrases narrowing down the intended context of usage of those glosses. This highlights that colexification studies have tended not to be about senses stricto sensu but about contextual readings of lexical items.
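One possible reading of the restriction to pairwise associations can be sketched in a few lines of Python. The example below is an interpretation offered for illustration, not a reconstruction of the argument in Georgakopoulos et al.; the two languages, the sense labels and the helper `pairwise_edges` are hypothetical. It shows that once lexemes are reduced to pairs of colexified senses, a single lexeme covering three senses becomes indistinguishable from three lexemes covering two senses each.

```python
from itertools import combinations


def pairwise_edges(lexemes):
    """Reduce lexemes (each a set of senses) to the set of
    colexified sense pairs, as in a pairwise colexification network."""
    edges = set()
    for senses in lexemes:
        for pair in combinations(sorted(senses), 2):
            edges.add(frozenset(pair))
    return edges


# Hypothetical language A: one lexeme covers all three senses.
lang_a = [{"SEE", "KNOW", "UNDERSTAND"}]

# Hypothetical language B: three lexemes, each covering two senses.
lang_b = [
    {"SEE", "KNOW"},
    {"KNOW", "UNDERSTAND"},
    {"SEE", "UNDERSTAND"},
]

# Both lexicons yield the same three edges, so the network
# no longer records how many senses a single form covers.
same_network = pairwise_edges(lang_a) == pairwise_edges(lang_b)
print(same_network)  # prints True
```

Representations that keep the full set of senses per form avoid this loss of information, at the cost of a more complex data structure.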
Mithun's contribution takes a different approach, dealing with areal lexico-semantic patterns in the organization of a core part of the verbal lexicons in languages indigenous to the North American West. These patterns concern sets of common verbs such as those denoting basic position, movement, and handling, dying and killing, and more: these verbs distinguish such features as the number, animacy, shape, and/or consistency of their most immediately involved participant as part of their meanings. However, as Mithun clarifies, the verbs do not specify features of participants directly, but rather denote what are viewed as distinct kinds of events and states. Mithun's contribution is unique in the special issue for highlighting cross-linguistic similarities in the organization of a semantic field.
Several contributions to the special issue make use of large digital databases of lexemes to investigate the areality of lexical typological features, most notably CLICS (Database of Cross-Linguistic Colexifications), versions 2 (List et al. 2018) and 3 (Rzymski et al. 2020), and RefLex (Segerer and Flavier 2011-2019), an online cross-linguistic lexical database of African languages. The contribution by Segerer and Vanhove demonstrates the usefulness of such databases in bringing to light areal patterns of colexifications and shared lexico-constructional patterns that were previously unknown or underestimated in their range and preponderance.
Gast and Koptjevskaja-Tamm, however, warn that the data used in some of the databases may not be fit for this purpose. Schapper's contribution also cautions the reader about automatic approaches to lexical typology that make use of wordlist-based databases, pointing out that their focus on simplex lexemes can lead to shared patterns across lexicons being missed.
The use of large databases goes hand in hand with the application of quantitative methodologies. Two papers, Gast and Koptjevskaja-Tamm, and Georgakopoulos et al., test different quantitative methods in order to recognize areal patterns in the lexicon and arrive at various generalizations pertaining to them. Georgakopoulos et al. make use of weighted semantic maps, formal concept lattices, correlation analysis, and dimensionality reduction to identify colexification patterns in the perception-cognition domains. They evaluate the extent to which these are specific to particular areas, thereby separating a number of general cross-linguistic regularities in the structuring of lexicons from area-specific properties. Gast and Koptjevskaja-Tamm (a sequel to Gast and Koptjevskaja-Tamm 2018) set up two objectives. One is to propose a method for measuring degrees of persistence and diffusibility, and to determine such degrees in colexification patterns, in comparison to the phonological make-up of nuclear vocabulary. A second objective is to determine to what extent colexification patterns vary in their degrees of persistence and diffusibility. With these aims in mind, they use the phonological dissimilarity measures devised by Jäger (2018) and the lexical-semantic data from the CLICS3 database to zoom in on Europe, one of the regions with the highest density of data points in CLICS3. They test several hypotheses on degrees of persistence and diffusibility of colexification patterns, in comparison to nuclear vocabulary matter, and relative to sections of the lexicon.
In their contribution Georgakopoulos et al. bemoan the small samples (10-50 languages) that have been used in most previous lexical typological studies. They argue that because features of the lexicon are not easily or straightforwardly identifiable, large datasets are needed in order to formulate cross-linguistic generalizations. But as several contributions point out, considerations other than the mere size of a sample are also important. Observing differences in meaning calquing between standard varieties and dialects, Urban contends that lexical typological work exploring areal patterns needs to make use of dense samples. Souag advocates comparing language islands within an area to their relatives outside, arguing that this should make it possible to investigate the processes by which languages become part of areas.
Whilst the contributions mostly make use of sizeable samples of languages, they are often limited in the number of concepts or semantic domains they look at. Segerer and Vanhove limit themselves to the colour domain, Liljegren to kinship, and Georgakopoulos et al. to perception verbs. Schapper's contribution looks only at meaning extensions of 'bone' lexemes, while Urban looks at the relationships between lexemes for 'lungs', 'liver' and 'heart'. Gast and Koptjevskaja-Tamm need not limit themselves in this way thanks to the large amount of digital lexical data available for European languages. It will be key for future studies to go beyond a single feature or semantic domain when making claims of areality based on lexical patterns.
The volume also adds to an increasing body of research which argues that lexico-semantics is unduly neglected in contact linguistics (Lefebvre 2008; Renner 2018). Several contributions make observations about the borrowability of lexical patterns in contact situations. Urban observes similar borrowing patterns in the Balkans, the Caucasus and the Andes, and contends that patterns of lexico-semantic organization that recur cross-linguistically are particularly prone to spread through language contact. Based on his case study of Korandje, Souag makes the argument that borrowing of lexemes may be slower than semantic calquing in basic vocabulary. Gast and Koptjevskaja-Tamm likewise show that colexifications are less persistent than the phonological matter of nuclear vocabulary. In addition, they demonstrate that degrees of persistence of a colexification pattern vary across sections of the lexicon: colexifications involving nuclear vocabulary (the 40 most stable concepts of the Swadesh list) are most genealogically stable, and colexifications involving non-nuclear vocabulary are least genealogically stable.
Finally, for Souag, Liljegren, and Urban the investigations of the lexico-semantic patterns reported in the volume are intimately linked to the greater endeavour of understanding historical dynamics and unravelling the relations between linguistic genealogy, linguistic contact, and the broader historical and cultural setting in the areas under investigation. To mention one example, Liljegren argues that the different areal lexicalization patterns for the kin terms in the Hindu Kush betray intra-regional differences in marriage patterns and local alliance building, which, in turn, reflect the very gradual (and still incomplete) process of Islamization in the area.
While we are still far from having a comprehensive picture of cross-linguistic diversity in lexico-semantics, the studies in this volume can be said to represent a material advance in charting lexico-semantic patterns and their spread over geographical areas. Perhaps more significant are the clear directions for methodological improvements in lexico-semantic typological research offered by these studies, without which a proper appreciation of the vast and textured nature of the world's lexica will remain elusive.
Research funding: AS's research is supported by the European Research Council "OUTOFPAPUA" project (grant agreement no. 848532), the Netherlands Organisation for Scientific Research VENI project "The evolution of the lexicon. Explorations in lexical stability, semantic shift and borrowing in a Papuan language family", the Volkswagen Stiftung DoBeS project "Aru languages documentation", the INALCO-funded projet scientifique blanc "Merging meanings in Melanesia", and the Australian Research Council project (ARC, DP180100893) "Waves of words". MKT's research is supported by grant 2018-01184 from the Swedish Research Council.