BY 4.0 license Open Access Published online by De Gruyter Mouton October 13, 2021

Cross-linguistic constraints and lineage-specific developments in the semantics of cutting and breaking in Japonic and Germanic

John L. A. Huisman, Roeland van Hout and Asifa Majid
From the journal Linguistic Typology

Abstract

Semantic variation in the cutting and breaking domain has been shown to be constrained across languages in a previous typological study, but it was unclear whether Japanese was an outlier in this domain. Here we revisit cutting and breaking in the Japonic language area by collecting new naming data for 40 videoclips depicting cutting and breaking events in Standard Japanese, the highly divergent Tohoku dialects, as well as four related Ryukyuan languages (Amami, Okinawa, Miyako and Yaeyama). We find that the Japonic languages recapitulate the same semantic dimensions attested in the previous typological study, confirming that semantic variation in the domain of cutting and breaking is indeed cross-linguistically constrained. We then compare our new Japonic data to previously collected Germanic data and find that, in general, related languages resemble each other more than unrelated languages, and that the Japonic languages resemble each other more than the Germanic languages do. Nevertheless, English resembles all of the Japonic languages more than it resembles Swedish. Together, these findings show that the rate and extent of semantic change can differ between language families, indicating the existence of lineage-specific developments on top of universal cross-linguistic constraints.

1 Introduction

Every part of language is subject to variation and change, and meaning is no exception. There are numerous cross-linguistic studies exploring various semantic domains that demonstrate languages differ substantially in the number, boundaries, and foci of meaning categories they distinguish. While the domain of colour (Berlin and Kay 1969; Kay et al. 2009) is perhaps the most well-studied, there are many other examples, such as spatial relations (Levinson and Wilkins 2006), body parts (van Staden and Majid 2006), temperature (Koptjevskaja-Tamm 2015), and cutting and breaking events (Majid et al. 2008)—for reviews see Evans (2010) and Koptjevskaja-Tamm et al. (2016).

The studies referred to above have been successful at elucidating the structure of various semantic domains, but one issue that plagues all such studies is how to understand cases that appear to form an exception to broader cross-linguistic patterns. In a comparative study of separation events, speakers of 28 typologically different languages were shown a set of videoclips depicting various cutting, breaking and tearing actions, and were asked to freely describe them in their native language (Majid et al. 2008). The overall categorisation of separation events was found to be largely constrained cross-linguistically,[1] with a small number of semantic dimensions capturing the structure of the domain: one dimension that represents high versus low predictability of the point of separation, a second dimension distinguishing ‘tearing’ events, and another dimension that distinguishes ‘snapping’ from ‘smashing’ events[2] (Majid et al. 2008). However, Japanese did not fit well within this common cross-linguistic semantic structure: while the overall mean correlation between languages was r = 0.53, Japanese showed the lowest correlation at r = 0.04. This seemed to be the case because many verbs were unique to a single event in the cutting and breaking stimulus set (Majid et al. 2008: 245).

However, the data in the original Majid et al. (2008) cross-linguistic study was sparse—i.e., only one speaker contributed data for Japanese—and the authors suggested further work was required to determine whether Japanese does indeed “categorise strikingly differently from the other languages” or whether this was a sample artefact (Majid et al. 2008: 245). Therefore, the first aim of the current study is to re-examine the semantic structure of the cutting and breaking domain in Japanese with more data. Since the original study, Fujii et al. (2013) proposed several additional distinctions that Japanese makes within the cutting and breaking domain, such as ‘loss of functionality’, but it is unclear how these distinctions interact with the three main dimensions identified by Majid et al. (2008). We ask whether Japanese is truly unique in its semantic organisation of separation events or whether it respects the dimensional structure found in other languages.

Another question that remains open in cross-linguistic studies of semantic categories is how to sample languages adequately to make generalisations. Many studies take a sample that is as diverse as possible, arguing that lineage-specific similarities can lead to biases in estimating cross-linguistic patterns (Dryer 1989; Perkins 1989; Rijkhoff and Bakker 1998). However, comparison of the cutting and breaking vocabulary in four Germanic languages shows that even closely related languages can differ substantially in how they categorise these events (Majid et al. 2007). Similar differences between related languages have also been demonstrated for other domains, including locomotion (Malt et al. 2014; Slobin et al. 2014), containers (Majid et al. 2015; Malt et al. 1999) and spatial relations (Gentner and Bowerman 2009; Majid et al. 2015). Since typological studies tend to favour diverse language samples, the scope of semantic variation across related languages is not well understood.

Therefore, the second aim of the current study is to establish the amount of semantic variation across the Japonic language family by comparing newly collected Standard Japanese[3] data with data from the highly divergent Tohoku dialects and four related Ryukyuan languages; the Methods section (Section 2) introduces these languages in more detail. We ask whether all languages within the Japonic language family categorise separation events in a similar way. This is a particularly interesting question given the previously reported uniqueness of Japanese: if there are more languages that do not fit the common cross-linguistic structure, a good place to look for them would be among the relatives of Japanese. We therefore examine the semantic similarity of cutting and breaking events across varieties of the Japonic language family, and ask whether the potential Japanese uniqueness in this domain is a feature of the entire family.

Since comparative studies rarely examine languages from the same family, it is unclear how the semantic variation in, for example, the Germanic languages (Majid et al. 2007) fits within the broader cross-linguistic context. Therefore, the third aim of the current study is to compare the amount of semantic variation across the two language families by making a direct comparison between the newly collected Japonic data and the previously reported Germanic data (Majid et al. 2007). Given that the two language families are approximately of the same age (see below), we ask whether the amount of variation within the two language families is comparable.

In the next section we discuss the methodology for collecting the data, starting with an introduction of the Japonic languages, followed by descriptions of the speakers, materials and procedure used in the study. The results are then presented in three parts, first focusing on the Japonic data to answer (1) whether Japanese is unique in its semantic organisation of the cutting and breaking domain, and (2) whether languages or language varieties related to Japanese organise this domain in the same way. The third part of the results section compares the variability in the Japonic and Germanic data to answer (3) how semantically similar the languages within Japonic and Germanic are in the cutting and breaking domain.

2 Methods

2.1 Languages

The Japonic language family is spoken across the Japanese archipelago and consists of two main branches: Japanese spoken across the main islands, and Ryukyuan spoken across the smaller islands in the south (see Figure 1). While the exact number of “languages” is under debate—UNESCO lists seven (Moseley 2010), whereas Ethnologue mentions eleven (Eberhard et al. 2015)—there is more consensus about which dialect areas are considered to be unintelligible for Standard Japanese speakers. Within the Japanese branch, the Hachijo dialects are considered the most divergent (Hattori 1976; Pellard 2011). In addition, the varieties in the northern periphery (Tohoku) and the southern periphery (Kyushu) are highly divergent subgroups (Shibatani 1990; see also Huisman et al. 2019). For example, the Tsugaru dialect from the Tohoku region is unintelligible for speakers of Standard Japanese (Takubo 2018).

Figure 1: Map of Japonic language areas included in this study (left pane, Japanese; right pane, Ryukyuan) with fieldwork locations marked.


For the Ryukyuan branch, there is a general division into at least five “languages” that roughly correspond to the geographical island clusters: Amami and Okinawa in the north, and Miyako, Yaeyama and Yonaguni in the south (Pellard 2015; Shibatani 1990). Mutual intelligibility is generally considered impossible between these five subgroups (Pellard 2011), although intelligibility between varieties within a subgroup can also be limited, e.g., within Yaeyama (Aso 2015). All Ryukyuan languages are listed as either definitely or severely endangered (Moseley 2010). Fluent speakers are generally at least in their 60s or 70s, depending on the specific variety, and intergenerational transmission of the languages has been disrupted (Anderson 2015; Heinrich 2009). The Japanese and Ryukyuan branches are estimated to have diverged from each other over 2000 years ago (Lee and Hasegawa 2011), which is comparable to (in fact, slightly further back than) what is generally accepted for the Germanic languages (e.g., König and van der Auwera 1994: 2), providing ample time for linguistic change to occur.

2.2 Speakers

Data was collected from 64 speakers in six areas (two Japanese and four Ryukyuan, see Table 1) during four fieldtrips conducted between 2017 and 2019. For all areas, data was collected from multiple localities, i.e. in multiple dialects—see also Figure 1. With the exception of Tokyo Japanese—which serves as the de facto national standard—there is no standardised variety of Tohoku Japanese or any of the Ryukyuan languages (Heinrich et al. 2015). As such, we will refer to Tokyo Japanese as “Standard Japanese”, and use the term “language area” for the other five for the remainder of this paper, e.g., the Amami language area. Given the endangered status of Ryukyuan, the data was collected from elderly native speakers, some of whom had little experience in performing abstract, reflective language tasks. As a result, some interview sessions were conducted with multiple speakers simultaneously. However, to minimise any effects arising from this, all analyses were conducted on sessions rather than speakers—see also the paragraph on Coding (Section 2.4).

Table 1: Speaker and session information per Japonic language area.

Language area             Speakers          Sessions
Japanese
  Tokyo [jpn]             12 (11 female)    10
  Tohoku [jpn]            10 (4 female)     8
Ryukyuan
  Amami [ryn, ams, yox]   12 (4 female)     10
  Okinawa [ryu]           9 (5 female)      5
  Miyako [mvi]            15 (8 female)     12
  Yaeyama [rys]           6 (3 female)      6

Note: ISO-639-3 codes in square brackets; for complete information per session see the data repository (Section 2.5).

2.3 Materials and procedure

The Japanese and Ryukyuan data were collected using a set of 40 videoclips depicting different cutting, breaking and tearing events. A standardised set of non-linguistic stimuli provides a frame of reference against which similarities and differences across languages and their varieties can be compared (Majid 2011). The set of videoclips consisted of the 28 test items from the Kids’ Cut and Break set (Bowerman and Majid 2003), supplemented with four videoclips from the original Cut and Break Clips (Bohnemeyer et al. 2001), four re-recorded videoclips based on this original set, as well as four new videoclips. The set was designed to better represent the distinctions made in Japanese, based on the findings presented in Majid et al. (2008), Fujii et al. (2013), and several Japanese descriptive studies (e.g., Kaetsu 1979; Kunihiro 1970). Spontaneous events were excluded due to naming difficulties in both the Majid et al. (2008) study and in a pilot test of the current stimulus set.

The Kids’ Cut and Break stimulus set captures the main distinctions in the original Cut and Break stimulus set, but the clips are recorded to be clearer and more engaging for viewers. In addition, the original Cut and Break stimuli only included two videoclips depicting tearing actions. Given that ‘tearing’ was found to be a main dimension in the cross-linguistic study (Majid et al. 2008), and that Japanese has more than two commonly used verbs to describe tearing events (e.g., saku, yabuku, chigiru), the Kids’ Cut and Break stimuli were considered more appropriate for this study. Another advantage is that the Kids’ Cut and Break stimulus set includes a wider range of instruments, which makes it possible to further investigate the existence and semantic range of instrument-specific verbs (e.g., ffasï ‘cut with scissors’ in Nishihara and Miyako; Nakama 2000).

Additional videos were included to tap specific contrasts deemed to be of relevance. To provide a contrast to the accidental breaking of a glass by bumping it off a table (videoclip 35), we recorded a new videoclip in which the glass was deliberately thrown (videoclip 7). Next, videoclip 16 shows the cutting of grass that, together with videoclip 22 in which a person’s hair is cut, may elicit the more specific verb karu, which involves objects that can grow back. Third, videoclip 21, cutting into a tree trunk, was added to provide a similar event to videoclip 26, cutting into a watermelon, which in the cross-linguistic study elicited the unique response kireme=o ireru [incision=ACC insert] (Majid et al. 2008). Finally, we recorded a new tearing event (videoclip 36) in which the result was a large number of fragments (cf. Hojo 1993), and where the resulting objects lose their functionality (cf. Fujii et al. 2013).

Table 2 provides a description of the 40 videoclips, as well as their sources. The stimuli from the original Cut and Break Clips and the Kids’ Cut and Break are available on the L&C Field Manuals and Stimulus Materials website.[4]

Table 2: Descriptions and sources of the videoclips used in the naming task, as well as the two orders in which they were presented. Asterisks indicate scenes that were included in the Germanic data.

Order A  Order B  Action                                 Source
1        40       Cut paper with scissors                Kids’ C&B
2        39       Break a twig with hands*               Kids’ C&B
3        38       Break a mirror with a hammer*          New, rerecord
4        37       Cut bread with a knife                 Kids’ C&B
5        36       Tear paper using a knife               Kids’ C&B
6        35       Tear cloth with hands*                 Kids’ C&B
7        34       Break a glass by throwing it           New
8        33       Cut a watermelon with a machete*       Original C&B
9        32       Cut fingernails with a clipper         Kids’ C&B
10       31       Break a chocolate bar with hands       Kids’ C&B
11       30       Cut a piece of pie with a shard        Kids’ C&B
12       29       Hack off branch with an axe*           New, rerecord
13       28       Cut an egg with a slicer               Kids’ C&B
14       27       Break a pot with a hammer*             Kids’ C&B
15       26       Cut off a branch with a knife*         Kids’ C&B
16       25       Cut grass with a sickle                New
17       24       Cut a branch with a saw*               Original C&B
18       23       Cut a nail with pliers                 Kids’ C&B
19       22       Cut cardboard with a knife             Kids’ C&B
20       21       Tear bread with hands                  Kids’ C&B
21       20       Cut into a tree with a knife           New
22       19       Cut hair with scissors*                Kids’ C&B
23       18       Tear a bag with hands*                 Kids’ C&B
24       17       Cut a banana with a knife*             Kids’ C&B
25       16       Break off a branch with hands          New, rerecord
26       15       Cut into a watermelon with a knife*    Original C&B
27       14       Tear a bread roll with hands           Kids’ C&B
28       13       Tear a banana peel with pliers         Kids’ C&B
29       12       Tear out a page with hands             Kids’ C&B
30       11       Cut cloth with scissors*               Kids’ C&B
31       10       Cut wood with an axe*                  New, rerecord
32       9        Cut breads using scissors              Kids’ C&B
33       8        Cut scallops with a knife              Kids’ C&B
34       7        Cut off a branch with an axe           Kids’ C&B
35       6        Break a glass by accident              Kids’ C&B
36       5        Tear up paper using hands              New
37       4        Break a twig partially*                Original C&B
38       3        Cut a rope with a chisel*              Kids’ C&B
39       2        Cut a banana with scissors             Kids’ C&B
40       1        Cut a rope with a knife*               Kids’ C&B

The videoclips were presented in two pseudo-random orders (A and B), one being the reverse of the other. Data collection was conducted in the speakers’ native languages. Speakers saw the videoclips one by one on a tablet or laptop and were asked to describe the event depicted in the videoclip in their own language variety by responding to the query “What happened in the video?”. When participants did not directly address the target event, the follow-up question “What did the person in the video do?” was asked to try and elicit a response related to the target event. Responses of any length were accepted, and speakers were free to give multiple descriptions of each event. All sessions were audio (and sometimes video) recorded for transcription at a later stage. The data was collected under the Ethics Assessment Committee of the Centre for Language Studies at Radboud University.

2.4 Coding

For each videoclip, we extracted the main descriptor(s) that encoded the target event depicted. Across the Japonic language family, this is typically done through a verbal construction (Example 1a). We coded verbs in their citation form (non-negative non-past; Example 1b). For V-V compound verbs,[5] we coded the first and the second verb individually (Example 1c). For light verb constructions with the verb suru ‘to do’, we coded the element combining with suru, as that part of the expression carries the semantic content. Verbal nouns were coded as such (Example 1d), adverbial nouns were coded without their particle (Example 1e), and adverbialised adjectives were coded in their citation form (non-negative non-past; Example 1f). For syntagms, we coded all elements without any particles (Example 1g).

(1) Standard Japanese
    a. kagami=o    waru.                    [coded: waru]
       mirror=ACC  break:NPST
       ‘(They) break the mirror.’
    b. kagami=o    watta.                   [coded: waru]
       mirror=ACC  break:PST
       ‘(They) broke the mirror.’
    c. kagami=o    tataki-waru.             [coded: tataku, waru]
       mirror=ACC  strike:INF-break:NPST
       ‘(They) break the mirror (by hitting it).’
    d. tamago=o  suraisu  suru.             [coded: suraisu]
       egg=ACC   slice    do:NPST
       ‘(They) slice an egg.’
    e. tamago=o  barabara=ni  suru.         [coded: barabara]
       egg=ACC   pieces-ADV   do:NPST
       ‘(They) cut an egg into pieces.’
    f. tamago=o  komaka-ku  suru.           [coded: komakai]
       egg=ACC   fine-ADV   do:NPST
       ‘(They) finely cut an egg.’
    g. kirikomi=o    ireru.                 [coded: kirikomi, ireru]
       incision=ACC  insert:NPST
       ‘(They) cut into.’

As speakers were allowed to produce multiple responses, we coded all responses. In sessions where multiple speakers were present, we coded all unique responses produced for a videoclip—so, if two speakers produced different responses, we coded both.

Finally, we coded all responses for cognacy. For example, warɯ (Standard Japanese), warɯ̈ (Tohoku, Hachinohe variety), warjuɴ (Amami, Naze variety), waiɴ (Okinawa, Motobu variety), baɭ (Miyako, Tarama variety), and bari (Yaeyama, Shiraho variety) all share a common etymological origin.
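
To make the coding steps concrete, the following Python sketch shows one way the session-level coding and cognate-level counting could be organised. It is illustrative only and is not the authors' script (the repository contains R syntax): the cognate labels, the function names, and the toy sessions are hypothetical, and counting each unique response once per session and videoclip is an assumption based on the description above. The ‘break’ forms are those listed in the text; kirɯ and ksɨ are the most frequent responses for Standard Japanese and Miyako in Table 3.

```python
from collections import Counter

# Hypothetical lookup from coded surface forms to an invented cognate label.
COGNATE_SETS = {
    "warɯ": "WARU", "warɯ̈": "WARU", "warjuɴ": "WARU",
    "waiɴ": "WARU", "baɭ": "WARU", "bari": "WARU",
    "kirɯ": "KIRU", "ksɨ": "KIRU",
}


def code_session(responses):
    """Map one session's responses to unique cognate labels per videoclip.

    responses: dict of videoclip number -> list of coded forms produced by
    all speakers in the session (compounds are assumed to be split already).
    Forms missing from the lookup are kept as their own label.
    """
    return {
        clip: {COGNATE_SETS.get(form, form) for form in forms}
        for clip, forms in responses.items()
    }


def cognate_frequencies(sessions):
    """Count how often each cognate label occurs across sessions and clips."""
    counts = Counter()
    for session in sessions:
        for labels in code_session(session).values():
            counts.update(labels)
    return counts


# Toy data: two sessions, videoclips 3 and 4.
sessions = [
    {3: ["warɯ", "warɯ"], 4: ["kirɯ"]},   # duplicate response counted once
    {3: ["baɭ"], 4: ["ksɨ", "baɭ"]},      # two different responses for clip 4
]
freq = cognate_frequencies(sessions)
print(freq["WARU"], freq["KIRU"])  # 3 2
```

Storing the labels per videoclip as a set means that two speakers in the same session who give the same response contribute only one count, which mirrors the session-based analysis described in Section 2.2.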

2.5 Data availability

The coded data as described above is available through an OSF repository.[6] In addition, the repository contains the R syntax used for the analyses described in the sections below, as well as further information about the languages, speakers and materials described in this study.

3 Results

3.1 Overview of the naming data

First, to get a general overview of the variation in the verbs used to describe cutting and breaking events across the Japonic language family, we created frequency tables of unique responses for each language area—see Table 3. These frequencies were calculated at the cognate level. As Table 3 makes clear, the Japonic languages share many cognates in this domain—in fact, the three most frequent cognates were the same across languages. Next, we briefly describe the main types of responses, namely single verb responses and multiple verb responses.

Table 3: List of responses per linguistic area that were produced more than once in the naming task (response followed by its frequency).

Japanese: kirɯ 173, warɯ 59, orɯ 36, tʃiɡirɯ 27, sakɯ 21, otosɯ 14, jabɯkɯ 13, wakerɯ 12, jabɯrɯ 8, karɯ 7, kirikomi-irerɯ 7, kizɯ-tsɯkerɯ 7, hikkakɯ 4, haɡasɯ 3, suraisu-sɯrɯ 3, torɯ 3, hanbun-sɯrɯ 2, kizamɯ 2, tsɯkerɯ 2

Tohoku: kïrɯ̈ 155, warɯ̈ 32, orɯ̈ 26, saɡɯ̈ 25, tsïŋïrɯ̈ 13, karɯ̈ 10, waɡerɯ̈ 10, kasɯ̈ 9, jabɯ̈gɯ̈ 8, kaɡɯ̈ 8, kowasɯ̈ 7, sasɯ̈ 7, kïdzɯ̈-dzɯ̈ɡerɯ̈ 6, kezɯ̈rɯ̈ 3, kïtaɡɯ̈rɯ̈ 3, odosɯ̈ 3, haŋasɯ̈ 2, haraɯ̈ 2, kïnaɡɯ̈rɯ̈ 2, kïrïkomï-ïrerɯ̈ 2, meɡɯ̈rɯ̈ 2, torɯ̈ 2

Amami: kirjuɴ 208, warjuɴ 59, wurjuɴ 31, jaburjuɴ 24, wəhərjuɴ 15, sakjuɴ 14, tʃiɡirjuɴ 6, utusuɴ 5, muʃirjuɴ 4, [hadʒa] 2, [muttʃui] 2, [ssudʒa] 2, haɡasuɴ 2, maɡurjuɴ 2

Okinawa: tʃiːɴ 89, waiɴ 32, wuiɴ 14, jain 6, pikuɴ 6, hittʃiːɴ 4, satʃuɴ 4, wakiːɴ 4, iriːɴ 3, nudʒuɴ 2

Miyako: ksɨ 209, baɭ 80, buɭ 39, saksɨ 37, suɭ 25, bakiɭ 15, jabuɭ 13, tuɭ 7, musɨ 6, sɨːtsɨ 6, ffasɨ 5, kizam 4, ŋɡɨ 4, kaɭ 3, ndaɨ 3, utusɨ 3, aki 2, naɡiɭ 2

Yaeyama: ʃʃi 114, bari 41, buri 19, jari 19, fusahi 15, muʃi 12, saki 7, utuʃi 5, kaki 3, kari 3, turi 3, kitʃiri 2

3.1.1 Single verb responses

The most common type of response was the use of a single content verb to describe the cutting and breaking event. Examples (2) through (7) illustrate cases from each language area.

(2) Standard Japanese
    pan=o      kitta
    bread=ACC  cut:PST
    ‘(They) cut the bread.’

(3) Aomori variety, Tohoku
    kaŋami=ba   kanadzïdzï=de  watta
    mirror=ACC  hammer=INST    break:PST
    ‘(They) broke the mirror with a hammer.’

(4) Akaogi variety, Amami
    fukuro=ba  =ɕɕi       saɕɕa
    bag=ACC    hand=INST  tear:PST
    ‘(They) tore the bag with their hands.’

(5) Nanjo variety, Okinawa
    ki:=nu    jura=       ti:=sa:ni   wu:taɴ
    tree=GEN  branch=ACC  hands=INST  break:PST
    ‘(They) broke the branch with their hands.’

(6) Nagahama variety, Miyako
    kabᶻɨ̥=zu=du    pasaɴ=ɕi:      ffaɕi-uɭ
    paper=ACC=FOC  scissors=INST  snip-PROG
    ‘(They) are cutting the paper with scissors.’

(7) Taketomi variety, Yaeyama
    bu:nu=sa:ni  tamunu=du     bari
    axe=INST     firewood=FOC  break:NPST
    ‘(They) chop firewood with an axe.’

In addition to single content verbs, speakers produced two other types of single-verb responses: (a) the light verb suru ‘to do’ in combination with adverbs (including mimetics) or nouns (generally Sino-Japanese or borrowings), and (b) idiomatic expressions. This latter group was generally only produced by speakers of mainland varieties, although several Amami speakers also did so. Examples (8) through (11) illustrate light verb constructions. As described in Section 2.4, we coded the element combining with suru. Finally, Examples (12) through (14) illustrate idiomatic expressions, for which we coded all content elements (see again Section 2.4).

(8) Standard Japanese
    a. mijika-ku  suru
       short-ADV  do:NPST
       ‘(They) cut shorter.’
    b. tamago=o  suraisu  suru
       egg=ACC   slice    do:NPST
       ‘(They) slice an egg.’

(9) Hachinohe variety, Tohoku
    tamaŋo=o  barabara=ni  ɕita
    egg=ACC   MIM=ADV      do:PST
    ‘(They) finely cut the egg.’

(10) Koniya variety, Amami
    wagiri=ni  ɕi:
    slice=NI   do:NPST
    ‘(They) slice.’

(11) Bora variety, Miyako
    gumamdaᶻɨ
    fine:ADV do:NPST
    ‘(They) mince.’

(12) Standard Japanese
    suika=ni        kirikomi=o  ireru
    watermelon=DAT  cut=ACC     insert
    ‘(They) cut into the watermelon.’

(13) Hachinohe variety, Tohoku
    kiⁿdzɯ̈=    tsɯ̈ɡeda
    wound=ACC  apply:PST
    ‘(They) scratched.’

(14) Tatsugo variety, Amami
    kizu=ba    irisho:ta
    wound=ACC  insert:PST
    ‘(They) made a cut.’

3.1.2 Responses with multiple verbs

For the six language areas combined, 8.9% of the 2,141 collected responses contained multiple verbs. The highest percentage was found in Amami (23.4%), but closer inspection revealed that almost 90% of responses with multiple verbs came from two sessions with speakers on Yoron Island. For the remaining eight Amami sessions, multiple verbs accounted for only 6.4% of 326 responses, comparable to the other areas (see Table 4).

Table 4: Number of responses that contained multiple verbs across six Japonic linguistic areas.

All areas combined     8.8% of 2093 responses
  Standard Japanese    8.4% of 419 responses
  Tohoku               1.2% of 346 responses
  Amami                22.9% of 411 responses
  Okinawa              8.0% of 174 responses
  Miyako               5.3% of 488 responses
  Yaeyama              4.3% of 255 responses
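
As a worked check on Table 4, the per-area rows can be combined into the "all areas" row with a response-weighted average. The short Python sketch below uses only the figures printed in the table; because the percentages are rounded to one decimal, the reconstructed number of multi-verb responses is approximate, and the variable names are illustrative.

```python
# (area, % of responses with multiple verbs, number of responses), from Table 4
TABLE_4 = [
    ("Standard Japanese", 8.4, 419),
    ("Tohoku", 1.2, 346),
    ("Amami", 22.9, 411),
    ("Okinawa", 8.0, 174),
    ("Miyako", 5.3, 488),
    ("Yaeyama", 4.3, 255),
]

# Total number of responses across the six areas.
total_responses = sum(n for _, _, n in TABLE_4)

# Approximate count of multi-verb responses, rebuilt from rounded percentages.
approx_multi_verb = sum(pct / 100 * n for _, pct, n in TABLE_4)

# Response-weighted overall percentage.
overall_pct = 100 * approx_multi_verb / total_responses

print(total_responses)        # 2093
print(round(overall_pct, 1))  # 8.8
```

Weighting by the number of responses matters here: a simple mean of the six percentages would overstate the overall rate, because the area with by far the highest percentage (Amami) contributes only about a fifth of the responses.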

The most common type of response with multiple verbs was the verb-verb (V-V) compound. The first verb in such compounds semantically modifies the second, often specifying a means of action or manner of motion/change (Kageyama 2016)—see examples (15) through (20). As described in Section 2.4, the two verbs in V-V compounds were coded individually.

(15) Standard Japanese
    eda=o       kiri-otoɕi-ta
    branch=ACC  cut-remove-PST
    ‘(They) cut off the branch.’

(16) Hachinohe variety, Tohoku
    paɴ=ba     kiri-wage-da
    bread=ACC  cut-divide-PST
    ‘(They) cut the bread into pieces.’

(17) Sani variety, Amami
    paɴ=       muɕi-kiri
    bread=ACC  pluck-cut:NPST
    ‘(They) tear off a piece of the bread.’

(18) Shuri variety, Okinawa
    tɕinu=     hitɕi-jai-taɴ
    cloth=ACC  pull-tear-PST
    ‘(They) pulled the cloth apart.’

(19) Shimoji variety, Miyako
    sɨ:ka=du        tataki-ki̥ɕu
    watermelon=FOC  strike-cut:NPST
    ‘(They) hacked the watermelon.’

(20) Shiraho variety, Yaeyama
    banana=     fusai-ɕɕi
    banana=ACC  snip-cut:NPST
    ‘(They) cut the banana with scissors.’

Whereas V-V compounds were generally used to describe different aspects of a single event, speakers also described consecutive events depicted in the videoclips using multiple verbs, with the first verb appearing in a gerundive form. Two stimuli in particular elicited this type of response: videoclip 7, in which a drinking glass was broken by throwing it, which was described using the serial verb construction ‘throw-break’ (e.g., JP nagete-watta; MI tivvi-bariui; YA naŋga-bari); and videoclip 35, in which a drinking glass broke after it was accidentally pushed off a table, which was described using the serial verb construction ‘fall-break’ (e.g., JP otoɕite-wareta; AM utu:tɕi-ware:ta; OK ututɕi-wataɴ). As with V-V compounds, we coded both elements of this type of response.

3.2 Semantic dimensions of cutting and breaking in Standard Japanese

Following Majid et al. (2008), we coded, per interview session, for each pair of videoclips whether the speaker(s) described those two videoclips with the same verb (coded as 1) or different verbs (coded as 0). This created a videoclip-by-videoclip similarity matrix. Majid et al. created binary matrices in their study, but since we had data from multiple speakers/sessions, we were able to take speaker variation into consideration using weighted matrices. To do this, individual matrices were summed to create a Standard Japanese contingency table with frequency counts representing how often speakers used the same verbs for videoclips. We then performed correspondence analysis in R (R Core Team 2018; CA function in the FactoMineR package, Lê et al. 2008), using this frequency table for Standard Japanese as input. Correspondence analysis calculates distances between rows and columns of a contingency table based on chi-squared distances. As such, these distances correspond to the strength of association between rows and columns, which can then be visualised in a (series of) plot(s) (see, e.g., Baayen 2008). The more similar columns or rows are to each other, the closer together they will be in the plot(s). In this case, we focus on the similarity of videoclips, i.e., the similarity of separation events.
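
The procedure described above can be sketched in Python with NumPy. This is a minimal illustration and not the authors' analysis, which used the CA function of FactoMineR in R. The function names and toy data are hypothetical; treating two videoclips as "same" when a session's response sets share at least one verb is an assumption about how multiple responses were handled; and the correspondence analysis is a plain singular value decomposition of standardised residuals rather than the FactoMineR implementation.

```python
import numpy as np


def session_matrix(responses, n_clips):
    """Binary clip-by-clip matrix for one session.

    responses: dict of clip index (0-based) -> set of verbs used in the session.
    Cell (i, j) is 1 if clips i and j share at least one verb (assumption).
    """
    m = np.zeros((n_clips, n_clips), dtype=int)
    for i in range(n_clips):
        for j in range(n_clips):
            if responses[i] & responses[j]:
                m[i, j] = 1
    return m


def correspondence_analysis(table):
    """Correspondence analysis via SVD of the standardised residuals.

    Returns row principal coordinates and the share of inertia per dimension.
    Every row and column of the table must have a non-zero total.
    """
    table = np.asarray(table, dtype=float)
    P = table / table.sum()              # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                    # chi-squared based inertia per dimension
    explained = inertia / inertia.sum()
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    return row_coords, explained


# Toy data: 4 clips, 2 sessions. Clips 0 and 1 are always named alike.
sessions = [
    {0: {"kiru"}, 1: {"kiru"}, 2: {"oru"}, 3: {"waru"}},
    {0: {"kiru"}, 1: {"kiru"}, 2: {"oru"}, 3: {"oru", "waru"}},
]
table = sum(session_matrix(s, 4) for s in sessions)   # weighted (summed) matrix
coords, explained = correspondence_analysis(table)
print(table)
print(explained.round(2))  # approximately [0.9, 0.1, 0, 0] for this toy table
```

In this toy example the two "cutting" clips have identical row profiles and therefore identical coordinates, while the first dimension separates them from the "snapping" and "smashing" clips, which is the kind of pattern read off the plots in the analysis.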

The correspondence analysis showed that the first four dimensions accounted for approximately 80% of the variance in the data. We compared how the videoclips were plotted along these dimensions to the findings discussed in Majid et al. (2008) and found corresponding patterns—albeit with a different order for the specific dimensions, and with more distinctions for ‘tearing’ actions (see Figure 2). The differences between the two studies can most likely be accounted for by stimulus sampling (the original study only included two tearing videoclips), although they could also reflect the differential salience of ‘tearing’ within Standard Japanese relative to other languages.

Figure 2: Dimensions 2 and 3 of the correspondence analysis for Japanese.


The first dimension, accounting for 28.9% of the variance, distinguished a single videoclip (carving/making a cut into a tree trunk) from all others. Its modal response, the syntagm kizu=o tsukeru [wound=ACC attach] ‘to scratch’ was not used to describe any other videoclip. The videoclip distinguished by this dimension was not included in the original Cut and Break videoclips (Bohnemeyer et al. 2001). Notably, Majid et al. (2008) also reported a single videoclip (poke a hole in cloth with a twig) that was distinguished by a fourth dimension of their analysis.

The second dimension accounted for 18.4% of the variance. On one end of the dimension, we see a cluster of events generally described with the generic ‘cutting’ verb kiru, e.g., videoclip 1 in which a piece of paper is cut using scissors, videoclip 4 in which a loaf of bread is sliced with a knife, and videoclip 17 in which a branch is sawn in two (Standard Japanese does not distinguish between different instruments). On the other end of the dimension, we find three ‘snapping’ events which were almost exclusively described with the verb oru ‘snap’, e.g., videoclip 25 in which a branch is snapped off from a tree. Also towards this end of the dimension were the videoclips depicting ‘smashing’ events, such as videoclip 3 in which a mirror is smashed with a hammer, and videoclip 14 in which a flower pot is smashed with a hammer. In the middle of the dimension, we find ‘tearing’ events, such as videoclip 5 in which a piece of paper is torn along a knife, and videoclip 29 in which a page is torn from a notebook. The way the events are organised along this dimension based on the Japanese data shows striking similarities to the first dimension in Majid et al. (2008), who interpreted it as the ‘predictability of the point of separation’, where one end of the dimension represents events where the point of separation is predictable from the place an instrument intersects an object, while the other end represents events where this is not the case as a result of a more ballistic force or the potential of multiple fractures.

The third dimension, accounting for 17.4% of the variance, contrasted ‘tearing’ events with the three ‘snapping’ events. The distinctness of ‘tearing’ events corresponds to the second dimension found by Majid et al. (2008). In Standard Japanese, videoclips depicting tearing events elicited three modal responses: saku, chigiru, and yabuku. There does not appear to be a transparent organisation of the tearing videoclips along this dimension, but we will come back to this shortly.

The fourth dimension, accounting for 15.1% of the variance, contrasted ‘smashing’ events with other events with low predictability of the point of separation. A distinction between ‘smashing’ and ‘snapping’ corresponds to the third dimension in Majid et al. (2008). Smashing events were generally described with the verb waru in Standard Japanese.

In the original Majid et al. (2008) study, Japanese was found to have a relatively large number of verbs used uniquely for single videoclips. However, the data from this study with a larger sample of speakers suggests that high specificity in this domain is not a general characteristic of Japanese. Instead, there is a small class of terms that partition the semantic space of cutting and breaking events along dimensions similar to what has been previously reported. First, events that, following Majid et al. (2008), have high predictability of the point of separation (e.g., cutting with scissors or a knife) are contrasted with events such as snapping a branch, smashing a pot with a hammer, or tearing a piece of cloth—where there is arguably less predictability. Secondly, snapping events appear to be categorically distinct from smashing in Japanese. Finally, tearing events appear distinct from cutting, smashing, and snapping events. The dimensions uncovered by the correspondence analysis resemble those put forward by Majid et al. (2008), and as such, our findings are consistent with the claim that the Japanese semantic system follows the cross-linguistic constraints on the categorisation of cutting and breaking events. We provide additional evidence for this conclusion below.

3.3 The cutting and breaking domain across the Japonic languages

Next, we set out to compare Standard Japanese to its relatives, using three different measures to investigate the categories of cutting and breaking across the Japonic languages. For a general overview of the structure of the domain, we used correspondence analysis. Then, we used Mantel correlations to examine how similar the language areas were overall in their categorisation of cutting and breaking. Finally, we also examined naming consensus between speakers through a measure based on Simpson’s Diversity Index.

To explore the semantic space of cutting and breaking events in the Japonic language family, we created videoclip-by-videoclip matrices for each session across the languages, using the procedure described above, and constructed aggregate frequency tables. We then used all six tables (two Japanese; four Ryukyuan) as input for a second correspondence analysis, to uncover how well the language family as a whole reflects the semantic dimensions uncovered in Standard Japanese.

The correspondence analysis showed that the first four dimensions of the solution accounted for approximately 91% of the variance in the data (see Figure 3). Again, three of the dimensions resemble those described by Majid et al. (2008), which were also revealed for Standard Japanese in Section 3.2.

Figure 3: Dimensions 1 and 2 of the correspondence analysis for the Japonic language family.

Along the first dimension, accounting for 30.6% of the variance, events such as cutting a piece of paper with scissors (videoclip 1) appeared on one side, while on the other side we find events such as snapping off a branch from a tree (videoclip 25). In between, there are events such as tearing a bread roll (videoclip 27) and smashing a flower pot with a hammer (videoclip 14). As in Section 3.2, the organisation of the events along this dimension strongly resembles the dimension Majid et al. (2008) argued to represent the predictability of the point of separation.

The second dimension, accounting for 23.8% of the variance, contrasts ‘smashing’ and ‘snapping’ events, further highlighting the distinctness of the ‘snapping’ category across the Japonic language family—the three snapping events were also highly distinct from all others on the first dimension.

The third dimension, accounting for 21.3% of the variance, distinguished a single event (videoclip 21: carving/making a cut into a tree trunk) from all others. Across the language family, this videoclip was described using many different verbs, including the generic cutting verb. This may be a further indication that its distinctness in Standard Japanese, where it was described using a single unique verb, is language-specific.

The fourth dimension, accounting for 15.7% of the variance, separated the tearing events from all others; the tearing events themselves were spread along a continuum. One interpretation is that this continuum represents the ‘cleanness’ of the tear as a result of the thickness and density of the object (cf. Fujii et al. 2013). On one side, there are events involving thicker and less dense objects, such as bread and banana peels; on the other side, there are thinner objects such as cloth and plastic bags. Alternatively, the continuum could be interpreted as distinguishing destructive tearing from functional tearing (cf. Fujii et al. 2013). On one side we find the tearing of a plastic bag or a piece of cloth (which renders them useless), whereas on the other side we see the tearing of bread, which can then be distributed in bite-size pieces. We come back to the distinctions between different tearing actions in the discussion.

To summarise, although there were some differences in the order of the dimensions extracted from the correspondence analysis, the overall semantic organisation of the cutting and breaking domain of the Japonic language family is similar to that found by Majid et al. (2008) in a diverse cross-linguistic sample. This contrasts with the earlier suggestion in Majid et al. (2008) that Japanese might be an outlier. With a larger sample of speakers and language varieties, this study shows that the Japonic languages adhere to the same semantic principles as other languages of the world.

The correspondence analysis by itself could mean that all languages are linguistically categorising events in the same way, as we suggest; alternatively, it could indicate that Standard Japanese is dominating the solution and masking differences in the other languages. To distinguish these possibilities and further investigate how similar individual languages were to each other, we calculated pairwise Mantel correlations (in R, using the mantel function in the ecodist package; Goslee and Urban 2007) between the six videoclip-by-videoclip matrices, using 10,000 permutations and 1,000 bootstrap iterations for the 95% confidence intervals. If languages are indeed similar in how they partition the cutting and breaking domain, then they should correlate positively with one another. If, however, the correspondence analysis simply reflects one dominant categorisation pattern, correlations between languages should be close to zero or negative.
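The logic of a Mantel test can be illustrated with a minimal Python sketch. This is not the ecodist implementation used in the study: it computes a Pearson correlation over the upper triangles of two videoclip-by-videoclip matrices and obtains a one-sided p-value by permuting the rows and columns of one matrix together. The function names, the toy matrices, and the default of 999 permutations are all assumptions made for illustration, and the bootstrap confidence intervals are omitted.

```python
import random
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Flatten the cells above the diagonal of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mantel(a, b, permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel test between square matrices.

    Rows and columns of b are permuted together, so the dependence
    between cells that share a videoclip is preserved.
    """
    rng = random.Random(seed)
    n = len(a)
    flat_a = upper_triangle(a)
    observed = pearson(flat_a, upper_triangle(b))
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)


# Toy co-naming counts for four videoclips in two hypothetical languages.
lang_a = [[0, 5, 1, 0], [5, 0, 1, 0], [1, 1, 0, 4], [0, 0, 4, 0]]
lang_b = [[0, 4, 0, 1], [4, 0, 1, 0], [0, 1, 0, 5], [1, 0, 5, 0]]
r, p = mantel(lang_a, lang_b)
```

With only four videoclips there are just 24 distinct orderings, so the p-value from this toy example is coarse; the 40-clip matrices in the study allow a far finer permutation distribution.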

The Mantel correlation tests showed that, overall, there was substantial similarity in the grouping of cutting and breaking events across the Japonic language family. The average Mantel correlation between the six language areas was M_r = 0.83 (SD_r = 0.07), with correlations between language pairs ranging between r(780) = 0.69 and r(780) = 0.93 (all p’s < 0.001; see Table 5). These values indicate that the semantic system of cutting and breaking across the Japonic languages is highly similar. There is no clear division between the Japanese and Ryukyuan varieties, as might have been expected from the overall dissimilarity of the languages (see, e.g., Huisman et al. 2019). With an average Mantel correlation of M_r = 0.75, Okinawa Ryukyuan was the least similar to the other language areas. This could also be the result of data sparsity (fewer sessions and more missing values) for Okinawa Ryukyuan. Importantly, however, even this language shows a significant, high positive correlation with the other languages.

Table 5: Mantel correlations between the six Japonic language areas. All languages show high positive correlations, indicating the languages are very similar in how they partition the cutting and breaking domain.

            Tohoku   Amami    Okinawa  Miyako   Yaeyama
Japanese    0.881    0.873    0.742    0.908    0.825
Tohoku               0.862    0.692    0.867    0.838
Amami                         0.809    0.926    0.862
Okinawa                                0.803    0.720
Miyako                                          0.915

Separately, we calculated Mantel correlations between individual sessions, and used these to calculate mean correlations between each pair of languages (e.g., mean Mantel correlation between all Japanese sessions and all Tohoku sessions) to approximate the extent to which individual variation mirrors overall variation between languages (shown in Table 6). The Mantel correlation between the two language-by-language matrices (i.e. between Tables 5 and 6) was r(15) = 0.85, p = 0.027, indicating a high correlation between individual session variation and language-level variation. As Table 6 shows, however, correlations between individual sessions were lower than the overall correlations based on the summed language data. On average, correlations between sessions of different languages, M_r = 0.46 (SD_r = 0.14), were only slightly lower than correlations between sessions of the same language, M_r = 0.47 (SD_r = 0.14), further highlighting the overall high semantic similarity between the Japonic languages.

Table 6: Mean Mantel correlations between individual sessions across the six Japonic language areas. Correlations are lower than those based on summing overall data from one language, but overall patterns of variation remain highly similar.

            Tohoku   Amami    Okinawa  Miyako   Yaeyama
Japanese    0.459    0.477    0.422    0.502    0.464
Tohoku               0.445    0.366    0.484    0.416
Amami                         0.390    0.467    0.457
Okinawa                                0.384    0.361
Miyako                                          0.468

As a final comparison between languages, we examined whether there were differences in the codability of cutting and breaking events. Majid et al. (2007) found that among Germanic languages, English speakers showed lower naming consensus than Swedish speakers, for example. This difference was found to be the result of the structure of the cutting and breaking lexicon. The English cutting and breaking lexicon was found to be more hierarchical, with two superordinate verbs (cut and break) which subsumed several subordinate verbs (e.g., slice, chop; snap, smash), so the same videoclip can be described with different verbs (at varying levels of specificity). In contrast, Swedish lacked this type of hierarchy, resulting in more constrained verb choice. To uncover whether such differences exist in the Japonic languages as well, we followed Majid et al. (2007) and calculated naming consensus across interview sessions using Simpson’s Diversity Index (SDI; Simpson 1949, see also Majid et al. 2018). We calculated ∑ n(n − 1) / (N(N − 1)) per videoclip, in which lowercase n stands for the frequency of each unique verb (the sum runs over all unique verbs), and uppercase N stands for the total number of responses for that videoclip. This produces a number between 0 (no consensus, where every speaker uses a different verb to describe a videoclip) and 1 (complete consensus, where every speaker uses the same verb to describe a videoclip).
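The index can be computed directly from the list of verbs elicited for one videoclip. The following Python sketch is illustrative only: the function name and the response lists are hypothetical, and it assumes one verb per response, so it does not address how multi-verb responses were counted in the study.

```python
from collections import Counter


def naming_consensus(responses):
    """Simpson's Diversity Index for the verbs given to one videoclip.

    Computes sum(n * (n - 1)) / (N * (N - 1)), where n is the frequency
    of each unique verb and N is the total number of responses.
    """
    total = len(responses)
    if total < 2:
        raise ValueError("need at least two responses")
    counts = Counter(responses)
    return sum(n * (n - 1) for n in counts.values()) / (total * (total - 1))


# Hypothetical responses from five speakers to a single videoclip.
full_agreement = naming_consensus(["saku"] * 5)
no_agreement = naming_consensus(["saku", "chigiru", "yabuku", "waru", "oru"])
partial = naming_consensus(["saku", "saku", "saku", "yabuku", "yabuku"])
```

In the partial case, three speakers agree on one verb and two on another, giving (3 × 2 + 2 × 1) / (5 × 4) = 0.4.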

Figure 4 shows the distribution of naming consistency scores for all the videoclips in each language. Average naming consistency across speakers ranged between M_SDI = 0.42 (SD_SDI = 0.20) for Amami and M_SDI = 0.63 (SD_SDI = 0.36) for Okinawa. A one-way analysis of variance comparing naming consistency for the 40 cutting and breaking videoclips across the six Japonic language areas revealed a significant difference, F(5, 234) = 2.78, p = 0.018, and a post-hoc Tukey range test showed that only naming consistency in Amami and Okinawa differed significantly at p < 0.05. The heavy use of multi-verb responses in Amami (see Table 4) is most likely the cause of this difference.[7] Overall, however, the lack of difference between all other language pairs lends further weight to the conclusion that the Japonic languages code and categorise this domain in comparable ways.

Figure 4: Plot of naming consistency scores (calculated using Simpson’s Diversity Index) for 40 videoclips for each Japonic language area. Mean values per language are represented by the black circle, error bars represent two times the standard error, and grey dots represent individual scores for each videoclip.

In sum, the correspondence analysis revealed that the dimensions in Standard Japanese are reflected in the entire language family. Moreover, the results align with what has previously been reported for a diverse cross-linguistic sample, with some minor differences in the order in which dimensions were extracted. The high Mantel correlations also indicate a large degree of homogeneity across the language family. Naming consensus was similar across the six language areas. In short, all three measures point to the same conclusion: the Japonic languages partition the cutting and breaking domain in similar ways.

3.4 Comparing the cutting and breaking domain in Japonic and Germanic

Finally, we aimed to put the results of the Japonic language family into cross-linguistic perspective. For this, the newly collected Japonic data described above was compared to existing data from four Germanic languages (English, Dutch, German and Swedish); see Table 7 for a speaker overview and Majid et al. (2007) for a full description. We recoded Germanic complex predicates to include all elements, as for Japonic (see Section 2.4).

Table 7: Speaker information for the Germanic languages.

West Germanic
 English [eng]    5 speakers in 5 sessions
 Dutch [nld]      7 speakers in 7 sessions
 German [deu]     5 speakers in 5 sessions
North Germanic
 Swedish [swe]    5 speakers in 5 sessions

  1. ISO-639-3 codes in square brackets.

The Germanic language family was originally spoken in north-western Europe but is now spoken across the globe. The language family consists of three branches: North Germanic, mainly spoken across the Nordic countries; West Germanic, mainly spoken across the north-western part of the European mainland and the British Isles; and the now extinct East Germanic.

The Japanese and Ryukyuan languages are estimated to have split from a common ancestor over 2,000 years ago (Lee and Hasegawa 2011), while the Germanic languages are thought to have split from a common ancestor a few centuries later than that (König and van der Auwera 1994), meaning the two language families are of comparable age. Since diverging and developing along their own paths, how much similarity is there still between related languages within a language group? Given the cross-linguistic constraints on the categorisation of cutting and breaking events, we can expect meaning similarity to remain high, but is the extent of divergence the same across language families?

While the original Germanic data was collected with a different set of videoclips (the original Cut and Break Clips; Bohnemeyer et al. 2001), there is an overlapping subset of 17 videoclips (see Table 2) covering all major dimensions discussed in Majid et al. (2008) and including scenes from all major clusters discussed in Majid et al. (2007). To ensure the comparison based on this reduced set reflects the results for the complete set of videoclips, we first compared the language-by-language similarities for these two sets of videoclips. If the similarity matrices based on the overlapping subset correlate highly with those based on the full set of videoclips, then we can confidently go on to compare the Germanic and Japonic languages.

For the Japonic languages, we had already created videoclip-by-videoclip matrices and calculated Mantel correlations between them (see Section 3.3). We followed the same procedure for the subset of 17 videoclips for which we have comparable naming data for the Germanic and Japonic languages. We coded, per session, whether two videoclips were described with the same verb, and then summed all individual session matrices from each language. During this process, we found that the Okinawan data had gaps (5 out of the 17 videoclips were missing responses), so the Okinawan data was excluded from further comparison with Germanic. We followed the same procedure for the Germanic languages, creating one language-by-language similarity matrix based on the videoclip subset, as well as a similarity matrix based on the full set of 43 original core cutting and breaking videoclips (see Majid et al. 2008). In the end, we had two language-by-language similarity matrices per language family, i.e., two sets of pairwise similarities: one based on the 17 shared videoclips and one based on the respective full sets of videoclips (40 for Japonic, 43 for Germanic).
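The per-session coding and summing step can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code: the clip labels and sessions are invented, the diagonal is set to zero, and a videoclip without a response is assumed never to match another clip. How the study treated missing and multi-verb responses is not specified here.

```python
def co_naming_matrix(session, clips):
    """1 where a session described two videoclips with the same verb, else 0.

    Assumption: videoclips without a response never count as matching,
    and the diagonal is left at 0.
    """
    matrix = []
    for a in clips:
        row = []
        for b in clips:
            verb_a, verb_b = session.get(a), session.get(b)
            same = a != b and verb_a is not None and verb_a == verb_b
            row.append(1 if same else 0)
        matrix.append(row)
    return matrix


def sum_matrices(matrices):
    """Cell-wise sum of equally sized matrices (one per session)."""
    size = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) for j in range(size)]
            for i in range(size)]


# Three hypothetical sessions from one language, each mapping clip -> verb.
clips = ["clip_a", "clip_b", "clip_c"]
sessions = [
    {"clip_a": "saku", "clip_b": "saku", "clip_c": "waru"},
    {"clip_a": "saku", "clip_b": "yabuku", "clip_c": "waru"},
    {"clip_a": "yabuku", "clip_b": "yabuku"},  # clip_c has no response
]
language_matrix = sum_matrices([co_naming_matrix(s, clips) for s in sessions])
```

Here two of the three sessions use a single verb for clip_a and clip_b, so the summed matrix holds a 2 in those cells and 0 elsewhere.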

To test whether the pairwise similarities for the subset of shared videoclips reflect the similarities for the full set of videoclips, we calculated Mantel correlations for each language family (in R, using the ecodist package; Goslee and Urban 2007), with 10,000 permutations and 1,000 bootstrap iterations for the 95% confidence intervals. For the Japonic languages, the Mantel correlation between the full-set and subset similarity matrices was r(15) = 0.86, p = 0.018; for the Germanic languages, the correlation was r(6) = 0.91, p = 0.040. The high positive correlations indicate that the smaller shared set of videoclips can be considered an adequate sample of the overall cutting and breaking domain, and thus suitable for comparing the two language families.

3.4.1 Semantic similarity through cross-linguistic constraints

We assessed the possibility of cross-linguistic constraints on semantic similarity by comparing across the two language families. To do so, we used a similar approach to before and calculated the Mantel correlation between all language pairs for both language families for the 17 shared videoclips (in R, using the ecodist package; Goslee and Urban 2007), with 10,000 permutations and 1,000 bootstrap iterations for the 95% confidence intervals.

To have greater confidence in our results given the smaller set of videoclips considered in these analyses, we used random resampling of videoclips (using increasingly smaller numbers of stimulus items, down to N = 10). In addition, to investigate the extent to which variation between languages reflects individual variation, we also repeated the comparisons using the individual session matrices rather than summed language matrices.
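One way to picture the resampling check is the Python sketch below, which repeatedly draws a random subset of videoclips and recomputes the correlation between two clip-by-clip matrices on that subset. It is a simplified illustration under stated assumptions: the matrices are synthetic, the function names are hypothetical, and only the correlation is recomputed, whereas the study also tested each resample for significance.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either input has no variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(var_x * var_y)


def resampled_correlations(a, b, n_keep, iterations=1000, seed=0):
    """Correlate two clip-by-clip matrices on random subsets of n_keep clips."""
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        keep = sorted(rng.sample(range(len(a)), n_keep))
        pairs = list(combinations(keep, 2))
        results.append(pearson([a[i][j] for i, j in pairs],
                               [b[i][j] for i, j in pairs]))
    return results


# Synthetic 6 x 6 matrices standing in for two languages (not study data).
size = 6
matrix_a = [[abs(i - j) for j in range(size)] for i in range(size)]
matrix_b = [[abs(i - j) ** 2 for j in range(size)] for i in range(size)]
correlations = resampled_correlations(matrix_a, matrix_b, n_keep=4,
                                      iterations=200)
share_positive = sum(r > 0 for r in correlations) / len(correlations)
```

If a result holds across most resamples, it is unlikely to depend on a handful of particular videoclips.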

Given the cross-linguistic constraints on semantic variation in the cutting and breaking domain, we expected positive correlations between Japonic and Germanic languages. Indeed, Mantel correlations between the Japonic and Germanic languages were all positive and statistically significant, ranging between r(136) = 0.27 and r(136) = 0.55, all p’s < 0.01. A one-sample t-test showed that the average correlation between Japonic and Germanic languages (M_r = 0.43, SD_r = 0.08) was significantly larger than zero, t(35) = 16.31, p < 0.001, Cohen’s d = 5.13. When randomly resampling the data for 1,000 iterations at N_stimuli = 10, all resamples were significant at p < 0.05. Correlations between individual sessions (M_r = 0.21, SD_r = 0.13) were also significantly higher than zero, t(121) = 17.53, p < 0.001, Cohen’s d = 1.59.

In addition to measuring semantic similarity, we also visualised the categorisation of the videoclips to further elucidate the patterns of variation. Figure 5 depicts the 17 videoclips with each rectangle representing a category based on the modal (i.e. most frequent) response. Intersecting rectangles with dashed lines represent cases in which there were two high frequency responses.[8] If there is no rectangle, there was no clear preference for any verb—either speakers gave varying responses or no response at all. The figure thus serves as a visualisation of the number and distribution of semantic categories in the cutting and breaking domain.

Figure 5: Semantic categories in the cutting and breaking domain across the 17 shared videoclips in Japonic and Germanic based on the modal response (i.e., most frequent). Intersecting categories (dashed lines) represent videoclips with two high frequency verbs. Missing rectangles indicate there was no dominant response.

Overall, the figure highlights the broader similarity in the semantic structure of the cutting and breaking domain across the two language families, which reflects the dimensions along which separation events are categorised (Majid et al. 2008). Tearing events are a distinct subgroup listed on the left. The remaining videoclips are roughly ordered along the dimension of predictability of the locus of separation, with high predictability towards the left and low predictability towards the right. Within the low predictability scenes, snapping events are a distinct subgroup on the right.

3.4.2 The role of lineage-specific developments

As well as illustrating similarities between languages, Figure 5 also illustrates how languages differ. First, the number of semantic categories (represented by rectangles) differs. For example, the Germanic languages distinguish some cutting events based on how an instrument is used. Second, the boundaries of categories vary. This is particularly notable around videoclips labelled “hack” and “chop”. Finally, the figure illustrates differences between language families. The number and distribution of categories vary more across the Germanic languages, whereas the Japonic languages align closely with one another. It thus seems that lineage-specific developments play a role even when universal cross-linguistic principles apply to a domain.

To investigate these lineage-specific patterns further, we calculated Mantel correlations between all Japonic and Germanic language pairs (see Figure 6). First, we compared within-Japonic similarity with within-Germanic similarity and found that Mantel correlations ranged between r(136) = 0.81 and r(136) = 0.94 for the Japonic languages, and between r(136) = 0.49 and r(136) = 0.82 for the Germanic languages (all p’s < 0.01). The Japonic languages (M_r = 0.88, SD_r = 0.05) were, on average, more similar to each other than Germanic languages were to other Germanic languages (M_r = 0.65, SD_r = 0.14), t(5.67) = 4.11, p = 0.007, Cohen’s d = 2.65. Randomly resampling the data for 1,000 iterations at N_stimuli = 10 provided around 98% significant results at p < 0.05. At the individual session level, Japonic sessions (M_r = 0.51, SD_r = 0.19) were also more similar to each other than Germanic sessions were to each other (M_r = 0.40, SD_r = 0.17), t(185.6) = 6.28, p < 0.001, Cohen’s d = 0.60.

Figure 6: Plot of Mantel correlations for each language group. Mean values per group are represented by the black circle, and error bars represent two times the standard error. Correlations between language pairs are shown in the left pane and correlations between individual sessions in the right pane; grey dots represent each Mantel correlation value.

Next, we compared within-family similarity with between-family similarity. If there are strong lineage-specific patterns, then languages should correlate more highly with languages of their own family than with those of another family. On the other hand, as there are cross-linguistic constraints on the semantic variation in this domain, it could be the case that the amount of variation found within language families does not differ from that found between unrelated languages. Semantic similarity within language families (M_r = 0.79, SD_r = 0.15) was found to be significantly higher than between language families (M_r = 0.43, SD_r = 0.08), t(22.6) = 8.95, p < 0.001, Cohen’s d = 3.18. Separate tests showed that this was the case for variation within the Japonic language family versus between language families, t(22.7) = 19.44, p < 0.001, Cohen’s d = 6.23, and for within Germanic versus between language families, t(6.16) = 3.74, p = 0.009, Cohen’s d = 2.27. Randomly resampling the data for 1,000 iterations at N_stimuli = 10 provided around 94% significant results at p < 0.05. At the individual session level, correlations between sessions from the same language family (M_r = 0.49, SD_r = 0.19) were also higher than correlations between sessions from different language families (M_r = 0.21, SD_r = 0.13), t(222.7) = 20.09, p < 0.001, Cohen’s d = 1.53.
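The fractional degrees of freedom reported for these group comparisons are consistent with an unequal-variance (Welch) t-test, although the text does not name the variant. The Python sketch below shows how such a statistic, its degrees of freedom, and a pooled-standard-deviation Cohen's d can be computed. The input values are hypothetical, and the choice of the pooled-SD formula for d is an assumption rather than the authors' documented method.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(xs, ys):
    """Return (t, df) for Welch's unequal-variance t-test."""
    vx = variance(xs) / len(xs)  # squared standard error of the first mean
    vy = variance(ys) / len(ys)  # squared standard error of the second mean
    t = (mean(xs) - mean(ys)) / sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df


def cohens_d(xs, ys):
    """Cohen's d using the pooled standard deviation (an assumed convention)."""
    nx, ny = len(xs), len(ys)
    pooled = sqrt(((nx - 1) * variance(xs) + (ny - 1) * variance(ys))
                  / (nx + ny - 2))
    return (mean(xs) - mean(ys)) / pooled


# Hypothetical pairwise correlations, not the values from the study.
within_a = [0.90, 0.85, 0.88, 0.92, 0.86]
within_b = [0.60, 0.75, 0.50, 0.80]
t, df = welch_t(within_a, within_b)
d = cohens_d(within_a, within_b)
```

When one group is much more variable than the other, as with the second list here, the Welch degrees of freedom fall well below the pooled value of n1 + n2 − 2.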

Taken together, the findings suggest that even though there are cross-linguistic constraints in semantic similarity across unrelated languages, there can still be differences between language families as a result of lineage-specific developments with some families showing more variability or a higher degree of similarity than others.

4 Discussion

We set out to investigate the semantic structure of the cutting and breaking domain in Japanese because previous research was unclear as to whether Japanese was an outlier in how it categorised cutting and breaking events (Majid et al. 2008). We found that the three most important dimensions reported in the cross-linguistic comparison (interpreted previously as ‘predictability of locus of separation’, ‘tearing’, and ‘snapping’) apply to Japanese as well. In addition to examining Standard Japanese, we also collected data in one highly divergent dialect area (Tohoku) and four Ryukyuan languages to investigate whether these languages categorise cutting and breaking events in similar ways, and found that they do. There were minor differences in the order and weighting of the dimensions, but stimulus sampling undoubtedly played a role in this. The high correlations we found between languages reinforce the picture of a shared semantic space of cutting and breaking events in the Japonic language family. Overall, our findings suggest that Japanese and the Ryukyuan languages are not outliers from a cross-linguistic perspective, and further confirm that the semantic variability in the cutting and breaking domain is constrained.

That is not to say that there is no semantic variation across the Japonic languages for this domain. The tearing dimension seems to be particularly differentiated in Japanese and Ryukyuan, which was revealed because we sampled more diverse types of tearing events. Data from individual languages suggests that they differ in the number of distinctions made. While some languages appear to distinguish tearing events based on the object (thick vs. thin), others appear to distinguish based on the type of separation (clean vs. messy), or the functionality of the object after tearing (destructive vs. non-destructive tearing). This final distinction has also been highlighted by Fujii et al. (2013), in the context of ‘breaking’ events. Our data suggests the distinction may be of importance for ‘tearing’ too. Future research could systematically explore further points of comparison across additional languages as well—see e.g., rip versus tear (Fujii et al. 2013).

More broadly, as in Majid et al. (2008), our quantitative analyses of cross-linguistic data reveal a continuous space of cutting and breaking events, even though the individual languages classify such events (more or less) discretely through their respective set of cutting and breaking verbs. Majid et al. (2008) interpreted this continuous dimension as representing “predictability of the point of the separation” and it has been suggested this may be a conceptual universal (e.g., Slobin et al. 2014). These claims require independent evidence. It is unclear whether predictability, as such, is a singular semantic feature that languages distinguish or whether it emerges from a combination of factors, such as the use of sharp versus blunt instruments, the type and direction of force applied, properties of the object, etc. (see also the discussion in Majid et al. 2008: 242). Similarly, if “predictability of the point of separation” is a veritable conceptual universal, then it should be evident in a non-linguistic task across diverse groups. These are points for future investigation, hand-in-hand with in-depth studies of individual verbs across languages.

We compared our Japonic data with data from a previous study on the Germanic languages to investigate the amount of variation within and between language families. We found that correlations between Japonic and Germanic languages were positive, confirming cross-linguistic constraints on semantic variability in the cutting and breaking domain. If Japanese was indeed an outlier, we would have expected zero or even negative correlations between the Japonic and Germanic languages, but this was not the case. We also found that similarities within language families were larger than between language families, hinting at lineage-specific developments in addition to these broader cross-linguistic constraints. Finally, the Japonic languages were more similar to each other than the Germanic languages, showing that the rate and extent of lineage-specific developments can differ between language families. Some semantic domains in some languages—in this case, cutting and breaking across the Japonic languages—appear to be very stable.

While this study reveals that the Japonic languages have very similar semantic categories, it is not known whether semantic stability is a general feature of these languages or whether this is domain-specific. Close examination of other semantic domains is required to adjudicate. For body parts, for example, Standard Japanese and other mainland dialects do not distinguish between ‘foot’ and ‘leg’, while some varieties of Ryukyuan do (Hirayama 1992; Huisman et al. 2021). Similarly, the exact extensional ranges of the terms for ‘arm’ and ‘hand’ appear to differ between Japanese and Ryukyuan (Hirayama 1992; Huisman et al. 2021), and even within Japanese (Majid and van Staden 2015: 577). Systematic comparative work on semantic differences in other domains is needed to reveal the extent of such variation across the Japonic language family and how this compares to the variation we find in other language families.

The apparent stability of this domain across the Japonic languages brings us back to the question raised in the Introduction (Section 1) of how to sample languages for cross-linguistic comparisons of semantics. Previous research showed that closely related Germanic languages differ considerably in their semantic categories of cutting and breaking (Majid et al. 2007). In contrast, the results presented in this study show that there is remarkably little variation in the Japonic language family. Interestingly, even though semantic variation in the cutting and breaking domain occurs within cross-linguistic constraints, we found a general pattern that languages within a family are more similar to each other than they are to members of a different family. Having said that, we did find that English was more similar to all the Japonic varieties than it was to Swedish, showing that even lineage-specific patterns can be irregular, adding further complication for typological sampling. Additional comparisons within and between language families can give us further insight into which patterns of semantic variation are common across the world’s languages and at what scale they occur. A study comparing lexical form as opposed to syntactic features across Austronesian showed higher stability in the lexicon (Greenhill et al. 2017), indicating that different parts of language change at different rates, but it remains unclear how patterns of semantic variation compare.

There are also language-specific features to consider. It is likely that the semantic stability of the cutting and breaking verbs in Japanese and Ryukyuan is a result of the broader issue of how “semantic choices made in one subsystem affect those in others” (Evans 2010: 508). In addition to the use of V-V compounds pertinent to this domain, Hamano (1998) points out that semantic underspecification of verbs in Japanese can be compensated for through the use of mimetics (ideophones), which have high referential specificity (Akita 2012). For example, combining mimetics with the Standard Japanese verb oru ‘to snap’ allows for further specification of the characteristic of the snapping event, e.g., pokiri to oru ‘to snap’ versus pokiQ to oru ‘to snap suddenly’, or the object being snapped, e.g., pokiQ to oru ‘to snap smaller objects’ versus bokiQ to oru ‘to snap bigger objects’ (Yamaguchi 2003). Similar observations have been made in a study of human locomotion, where the number of verbs used by speakers of Japanese was lower than Dutch and English, because mimetics were used to further differentiate specific ways of moving (Malt et al. 2014). This provides new opportunities to further investigate semantic differences in mimetics used to describe separation events.

5 Conclusions

To conclude, the overall findings suggest that neither Japanese nor any of the related Ryukyuan languages is an outlier in its verbal expression of the semantic domain of cutting and breaking, confirming that semantic variability in this domain is cross-linguistically constrained. In addition, a comparison between the Japonic and Germanic language families reveals that, despite cross-linguistic constraints, lineage-specific semantic developments cause related languages to resemble each other more than unrelated languages. Moreover, the rate and extent of such lineage-specific developments differ between language families. So, while there are cross-linguistic constraints on semantic systems, there is still much to learn about the forces leading to semantic diversity between language communities.


Corresponding author: John L. A. Huisman (ʒɔn hœy̯smɑn), Centre for Language Studies, Radboud University & International Max Planck Research School, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands, E-mail:

References

Akita, Kimi. 2012. Toward a frame-semantic definition of sound-symbolic words: A collocational analysis of Japanese mimetics. Cognitive Linguistics 23(1). 67–90. https://doi.org/10.1515/cog-2012-0003.

Anderson, Mark. 2015. Substrate-influenced Japanese and code-switching. In Patrick Heinrich, Shinsho Miyara & Michinori Shimoji (eds.), Handbook of the Ryukyuan languages: History, structure, and use, 481–510. Berlin, Boston, Munich: De Gruyter Mouton.

Aso, Reiko. 2015. Hateruma Yaeyama grammar. In Patrick Heinrich, Shinsho Miyara & Michinori Shimoji (eds.), Handbook of the Ryukyuan languages: History, structure, and use, 423–448. Berlin, Boston, Munich: De Gruyter Mouton.

Baayen, Rolf H. 2008. Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.

Berlin, Brent & Paul Kay. 1969. Basic color terms: Their universality and evolution. Berkeley: University of California Press.

Bohnemeyer, Jürgen, Melissa Bowerman & Penelope Brown. 2001. Cut and break clips. In Stephen C. Levinson & Nick J. Enfield (eds.), Manual for the field season 2001, 90–96. Nijmegen, The Netherlands: Max Planck Institute for Psycholinguistics.

Bowerman, Melissa & Asifa Majid. 2003. Kids’ cut & break. In Field research manual 2003, part I: Multimodal interaction, space, event representation, 70–71. Nijmegen, The Netherlands: Max Planck Institute for Psycholinguistics.

Dryer, Matthew S. 1989. Large linguistic areas and language sampling. Studies in Language 13(2). 257–292. https://doi.org/10.1075/sl.13.2.03dry.

Eberhard, David M., Gary F. Simons & Charles D. Fennig. 2015. Ethnologue: Languages of the world. Dallas, Texas: SIL International.

Evans, Nicholas. 2010. Semantic typology. In J. J. Song (ed.), The Oxford handbook of linguistic typology, 504–533. Oxford: Oxford University Press.

Fujii, Seiko, Paula Radetzky & Eve Sweetser. 2013. A multi-frame analysis of separation verbs. In Mike Borkent, Barbara Dancygier & Jennifer Hinnell (eds.), Language and the creative mind, 137–154. Stanford: CSLI Publications.

Gentner, Dedre & Melissa Bowerman. 2009. Why some spatial semantic categories are harder to learn than others: The typological prevalence hypothesis. In Jiansheng Guo, Elena Lieven, Nancy Budwig, Susan Ervin-Tripp, Keiko Nakamura & Şeyda Özçalışkan (eds.), Crosslinguistic approaches to the psychology of language: Research in the tradition of Dan Isaac Slobin, 465–480. New York: Psychology Press.

Goslee, Sarah C. & Dean L. Urban. 2007. The ecodist package for dissimilarity-based analysis of ecological data. Journal of Statistical Software 22(7). https://doi.org/10.18637/jss.v022.i07.

Greenhill, Simon J., Chieh-Hsi Wu, Hua Xia, Michael Dunn, Stephen C. Levinson & Russell D. Gray. 2017. Evolutionary dynamics of language systems. Proceedings of the National Academy of Sciences 114(42). E8822–E8829. https://doi.org/10.1073/pnas.1700388114.

Hamano, Shoko. 1998. The sound-symbolic system of Japanese. Stanford: CSLI Publications.

Hattori, Shiro. 1976. Ryukyu hōgen to hondo hōgen [The Ryukyu dialects and the mainland dialects]. In Okinawagaku No Reimei: Ifa Fuyu Seitan Hyakunen Kinenshi [The Dawning of Okinawan Studies: A Volume Commemorating the Hundredth Anniversary of the Birth of Ifa Fuyu]. Tokyo: Okinawa Bunka Kyōkai.

Heinrich, Patrick. 2009. The Ryukyuan languages in the 21st century global society. Human migration and the 21st century global society, 16–27. Okinawa: University of the Ryukyus.

Heinrich, Patrick, Shinsho Miyara & Michinori Shimoji. 2015. Introduction: Ryukyuan languages and Ryukyuan linguistics. In Patrick Heinrich, Shinsho Miyara & Michinori Shimoji (eds.), Handbook of the Ryukyuan languages: History, structure, and use, 1–10. Berlin, Boston, Munich: De Gruyter Mouton.

Hirayama, Teruo. 1992. Gendai Nihongo Hōgen Daijiten [Dictionary of Contemporary Japanese Dialects]. Tokyo, Japan: Meiji-Shoin.

Hojo, Hiroshi. 1993. A new nonmetric multidimensional scaling method for sorting data. Japanese Psychological Research 35(3). 129–139. https://doi.org/10.4992/psycholres1954.35.129.

Huisman, John L. A., Asifa Majid & Roeland van Hout. 2019. The geographical configuration of a language area influences linguistic diversity. PloS One 14(6). e0217363. https://doi.org/10.1371/journal.pone.0217363.

Huisman, John L. A., Roeland van Hout & Asifa Majid. 2021. Patterns of semantic variation differ across body parts: Evidence from the Japonic languages. Cognitive Linguistics. https://doi.org/10.1515/cog-2020-0079.

Kaetsu, M. 1979. Yaburu, yabuku, saku, kiru and waru. Nihongo Kenkyu 2. 57–61.

Kageyama, Taro. 2016. Verb-compounding and verb-incorporation. In Taro Kageyama & Hideki Kishimoto (eds.), Handbook of Japanese lexicon and word formation, 273–310. Boston, Berlin: De Gruyter Mouton.

Kay, Paul, Brent Berlin, Luisa Maffi, William R. Merrifield & Richard Cook. 2009. The world color survey. Stanford: CSLI Publications.

Kita, Sotaro. 2006. A grammar of space in Japanese. In Stephen C. Levinson & David P. Wilkins (eds.), Grammars of space: Explorations in cognitive diversity, 437–474. Cambridge: Cambridge University Press.

König, Ekkehard & Johan van der Auwera. 1994. The Germanic languages. London: Routledge.

Koptjevskaja-Tamm, Maria (ed.). 2015. The linguistics of temperature. Amsterdam: John Benjamins.

Koptjevskaja-Tamm, Maria, Ekaterina Rakhilina & Martine Vanhove. 2016. The semantics of lexical typology. In Nick Riemer (ed.), The Routledge handbook of semantics, 434–454. London, New York: Routledge.

Kunihiro, Tetsuya. 1970. Imi no Shosou [Aspects of Meaning]. Tokyo, Japan: Sanseido.

Lê, Sébastien, Julie Josse & François Husson. 2008. FactoMineR: An R package for multivariate analysis. Journal of Statistical Software 25(1). 1–18. https://doi.org/10.18637/jss.v025.i01.

Lee, Sean & Toshikazu Hasegawa. 2011. Bayesian phylogenetic analysis supports an agricultural origin of Japonic languages. Proceedings of the Royal Society B: Biological Sciences 278(1725). 3662–3669. https://doi.org/10.1098/rspb.2011.0518.

Levinson, Stephen C. & David P. Wilkins. 2006. Grammars of space: Explorations in cognitive diversity. Cambridge: Cambridge University Press.

Majid, Asifa. 2011. A guide to stimulus-based elicitation for semantic categories. In Nicholas Thieberger (ed.), The Oxford handbook of linguistic fieldwork, 54–71. Oxford: Oxford University Press.

Majid, Asifa, James S. Boster & Melissa Bowerman. 2008. The cross-linguistic categorization of everyday events: A study of cutting and breaking. Cognition 109(2). 235–250. https://doi.org/10.1016/j.cognition.2008.08.009.

Majid, Asifa, Marianne Gullberg, Miriam van Staden & Melissa Bowerman. 2007. How similar are semantic categories in closely related languages? A comparison of cutting and breaking in four Germanic languages. Cognitive Linguistics 18(2). 179–194. https://doi.org/10.1515/cog.2007.007.

Majid, Asifa, Fiona Jordan & Michael Dunn. 2015. Semantic systems in closely related languages. Language Sciences 49. 1–18. https://doi.org/10.1016/j.langsci.2014.11.002.

Majid, Asifa, Seán G. Roberts, Ludy Cilissen, Karen Emmorey, Brenda Nicodemus, Lucinda O’Grady, Bencie Woll, Barbara LeLan, Hilário de Sousa, Brian L. Cansler, Shakila Shayan, Connie de Vos, Gunter Senft, Nick J. Enfield, Rogayah A. Razak, Sebastian Fedden, Sylvia Tufvesson, Mark Dingemanse, Ozge Ozturk, Penelope Brown, Clair Hill, Olivier Le Guen, Vincent Hirtzel, Rik van Gijn, Mark A. Sicoli & Stephen C. Levinson. 2018. Differential coding of perception in the world’s languages. Proceedings of the National Academy of Sciences 115(45). 11369–11376. https://doi.org/10.1073/pnas.1720419115.

Majid, Asifa & Miriam van Staden. 2015. Can nomenclature for the body be explained by embodiment theories? Topics in Cognitive Science 7(4). 570–594. https://doi.org/10.1111/tops.12159.

Malt, Barbara C., Eef Ameel, Mutsumi Imai, Silvia P. Gennari, Noburo Saji & Asifa Majid. 2014. Human locomotion in languages: Constraints on moving and meaning. Journal of Memory and Language 74. 107–123. https://doi.org/10.1016/j.jml.2013.08.003.

Malt, Barbara C., Steven A. Sloman, Silvia P. Gennari, Meiyi Shi & Yuan Wang. 1999. Knowing versus naming: Similarity and the linguistic categorization of artifacts. Journal of Memory and Language 40(2). 230–262. https://doi.org/10.1006/jmla.1998.2593.

Moseley, Christopher. 2010. Atlas of the world’s languages in danger. Paris, France: UNESCO Publishing.

Nakama, Mitsunari. 2000. Miyako Nishihara Hōgen no Goi (8) [A Study of Vocabulary in Nishihara Dialect on Miyako Island (8)]. Ryūkyū no Hōgen 24. 113–130.

Pellard, Thomas. 2011. The historical position of the Ryukyuan languages. In Historical linguistics in the Asia-Pacific region and the position of Japanese, 55–64. Osaka, Japan: National Museum of Ethnology.

Pellard, Thomas. 2015. The linguistic archeology of the Ryukyu Islands. In Patrick Heinrich, Shinsho Miyara & Michinori Shimoji (eds.), Handbook of the Ryukyuan languages: History, structure, and use, 13–38. Berlin, Boston, Munich: De Gruyter Mouton.

Perkins, Revere D. 1989. Statistical techniques for determining language sample size. Studies in Language 13(2). 293–315. https://doi.org/10.1075/sl.13.2.04per.

R Core Team. 2018. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

Rijkhoff, Jan & Dik Bakker. 1998. Language sampling. Linguistic Typology 2(3). 263–314. https://doi.org/10.1515/lity.1998.2.3.263.

Shibatani, Masayoshi. 1990. The languages of Japan. Cambridge: Cambridge University Press.

Shindo, Mika. 2015. Subdomains of temperature concepts in Japanese. In Maria Koptjevskaja-Tamm (ed.), The linguistics of temperature, 639–665. Amsterdam: John Benjamins.

Simpson, Edward H. 1949. Measurement of diversity. Nature 163(4148). 688. https://doi.org/10.1038/163688a0.

Slobin, Dan I., Iraide Ibarretxe-Antuñano, Anetta Kopecka & Asifa Majid. 2014. Manners of human gait: A crosslinguistic event-naming study. Cognitive Linguistics 25(4). 701–741. https://doi.org/10.1515/cog-2014-0061.

van Staden, Miriam & Asifa Majid. 2006. Body colouring task. Language Sciences 28(2–3). 158–161. https://doi.org/10.1016/j.langsci.2005.11.004.

Takubo, Yukinori. 2018. Mutual intelligibility as a measure of linguistic distance and intergenerational transmission. Approaches to endangered languages in Japan and Northeast Asia: Description, documentation and revitalization. Presented at the Approaches to Endangered Languages in Japan and Northeast Asia: Description, Documentation and Revitalization conference. Tokyo, Japan: National Institute for Japanese Language and Linguistics.

Yamaguchi, Nakami. 2003. Kurasi-no kotoba: giongitaigo jiten [Words for living: A dictionary of mimetics]. Tokyo: Koodansha.

Received: 2019-11-13
Accepted: 2021-07-19
Published Online: 2021-10-13

© 2021 John L. A. Huisman et al., published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.