Towards a typology of middle voice systems

Guglielmo Inglese
From the journal Linguistic Typology


The middle voice is a notoriously controversial typological notion. Building on previous work (e.g. Kemmer, Suzanne. 1993. The middle voice. Amsterdam & Philadelphia: John Benjamins), in this paper I propose a new working definition of middle markers as inherently polyfunctional constructions which are partly associated with valency change in opposition to bivalent (or more) verbs and partly lexically obligatory with monovalent verbs. Based on this definition, the paper undertakes a systematic survey of 149 middle voice constructions in a sample of 129 middle-marking languages. Evidence from the sample shows that middle voice systems display a much richer variation in forms and functions than is reported in the literature. This richer empirical evidence challenges some of the mainstream views on middle marking, especially its purported connection with reflexivity and grooming-type events, and calls for an overall rethinking of the typology of the middle voice.

1 Introduction

Verbal voice is a core aspect of clausal morphosyntax, and for good reasons, as it deals with the complex interaction between lexical semantics, valency, and transitivity and their realization in verbal systems. Verbal voice has been variously defined in typology (e.g. Klaiman 1991; Kulikov 2010; Bahrt 2021). In this paper, I follow the definition of grammatical voice proposed by Zúñiga and Kittilä (2019: 4) as “a grammatical category whose values correspond to particular diatheses marked on the form of predicates”, with diathesis being “any specific mapping of semantic roles onto grammatical roles” (ibid.; on grammatical roles see Bickel 2010; Witzlack-Makarevich 2019).[1]

The existence of the middle as a distinct type of verbal voice has long been advocated. Ancient Indo-European languages such as Ancient Greek offer the textbook example of a middle voice system (henceforth, MVS) (Zúñiga and Kittilä 2019: 169–171). Middle marking in Ancient Greek occurs both with verbs that alternate between middle and non-middle (or active) marking to indicate valency change, as in (1a), and with verbs that exclusively occur as middles, as in (1b). I refer to these as oppositional and non-oppositional middles, respectively (see Section 3.1).[2]

(1) Ancient Greek (anci1242, Indo-European, Greek; Delbrück 1897: 417–425; Romagno 2010: 431–432)[3]
a. daí-ō ‘burn.act (tr.)’ versus daí-omai ‘burn.mid (intr.)’ anticausative
   kópt-ō ‘hit.act’ versus kópt-omai ‘hit oneself.mid’ reflexive
   dikáz-ō ‘judge.act’ versus dikáz-omai ‘be judged.mid’ passive
b. -mai ‘sit’, kínu-mai ‘move (intr.)’, skúz-omai ‘be angry at’, mákh-omai ‘fight’

Typological work has shown that systems akin to the ancient Indo-European middle are attested in several unrelated languages and their similarities are such that they can hardly be dismissed as coincidental (Geniušienė 1987; Klaiman 1991: Chap. 2; Kemmer 1993). Nevertheless, current typological accounts of the middle voice leave a number of issues unsettled. The middle consequently remains a highly elusive domain (Zúñiga and Kittilä 2019: 168–177), and some authors even deny its usefulness as a typological notion (Dixon and Aikhenvald 2000: 4).

The greatest obstacle in the typological study of the middle is that there is no agreement on what cross-linguistically counts as a middle marker (henceforth, MM). Scholars often operate with different definitions, thereby hampering reliable cross-linguistic comparisons. Moreover, existing comprehensive works on the middle (Geniušienė 1987; Kemmer 1993) are nowadays in part outdated in at least two respects. First, they draw on limited cross-linguistic samples. Second, the typological study of valency operations, including those commonly connected to the middle such as passive and reflexive, has witnessed significant advances. These facts call for a general reassessment of the middle voice in a typological perspective.

Taking stock of these premises, this paper lays the foundations for a new typology of the middle voice. The two main goals are (i) to propose a more rigorous and cross-linguistically valid definition of MM and (ii) to offer a systematic description of the possible patterns of formal and functional variation of MMs across languages. To do so, I analyze MMs in a sample of 129 middle-marking languages. This more solid empirical basis, coupled with the more up-to-date typological knowledge of voice operations, will allow me to improve the existing typology of MVSs in several crucial respects. However, I will also point out a number of methodological shortcomings in the current typologies of the middle which, also due to limits of space, must remain unresolved for now. The primary aim of this paper is descriptive and synchronic. Diachronic considerations are not addressed here, even though, as I will argue, these are essential for a deeper understanding of MVSs.

The paper is structured as follows. In Section 2, I offer a critical review of previous research on the middle voice, with a particular focus on Kemmer (1993). In Section 3, I propose my own cross-linguistic definition of MM (Section 3.1), and I discuss how this definition is useful to keep MMs distinct from other potentially similar phenomena (Section 3.2). Section 4 presents the results of the analysis of MMs in the language sample. I first discuss the different morphosyntactic constructions that may instantiate MVSs (Section 4.1). In Section 4.2, I investigate the patterns of polyfunctionality of middle constructions, focusing first on oppositional (Section 4.2.1) and then on non-oppositional (Section 4.2.2) middles. Section 4.3 further elaborates on the relationship between oppositional and non-oppositional middles. Section 5 presents the conclusions of this work.

2 The middle voice: current research and open problems

One of the major difficulties in talking about the middle voice is that this term has been used in reference to a wide range of phenomena, to the effect that “middles […] represent a major terminological problem area” (Zúñiga and Kittilä 2019: 151).[4]

The history of the research on the middle has already been the topic of several publications and I will not address it further here (Manney 2000; Rousseau 2014; Gallardo and Nakamura 2014; Calude 2017; Zúñiga and Kittilä 2019: 171–175 inter alia). It suffices to mention that discussions on the middle voice have long been based on the Indo-European middle voice, such as that of Ancient Greek in (1). Influential definitions of the middle voice as the construction indicating that “the ‘action’ or ‘state’ affects the subject of the verb or his interests” (Lyons 1968: 373) are essentially rooted in the tradition of Indo-European linguistics.

2.1 The middle voice in typology: Kemmer (1993)

Early typological work on reflexives (Faltz 1977; Geniušienė 1987), reciprocals (Lichtenberk 1985), and verbal voice (Klaiman 1991) has brought to light the cross-linguistic complexity of the domain of voice and valency-related phenomena. Nevertheless, “none of these […] deal with the middle in sufficient detail to establish how the various phenomena that comprise it relate to each other” (Kemmer 1993: 3).

Kemmer (1993) offers the first systematic study of the middle voice from a cross-linguistic perspective, and to date still the most exhaustive one. Based on a sample of 32 languages, Kemmer observes that MMs are recurrently associated with a specific set of situation types, i.e. “semantic/pragmatic contexts” (ibid.: 7). Middle situation types can be exemplified by Reflexive -si in Italian, as in (2):[5]

(2) Italian (ital1282, Indo-European, Italic)
rader-si ‘shave’ grooming
alzar-si ‘stand up’ change in body posture
girar-si ‘turn (intr.)’ non-translational motion
spostar-si ‘move (intr.)’ translational motion
arrabbiar-si ‘get angry’ emotion
immaginar-si ‘envisage’ cognition
scioglier-si ‘melt’ spontaneous event
combatter-si ‘fight’ reciprocal
colpir-si ‘hit oneself’ reflexive
si va ‘one goes’ impersonal
si vende ‘is sold’ passive
si taglia (facilmente) ‘is (easily) cut’ facilitative

As can be seen from (2), in Kemmer’s approach MMs are described as being partly associated with valency changing operations such as passive and reflexive, and partly with specific lexical domains, that is, with semantic classes of verbs such as grooming, motion, cognition etc.

Generalizing over the distribution of MMs, Kemmer proposes that situation types that are typically middle marked share the common semantic property of low degree of elaboration of events, which most saliently concerns verbs of grooming, change in body posture and non-translational motion (Kemmer 1993: 53–56). Kemmer borrows from cognitive linguistics the representation of events in terms of schemas (Langacker 1987; Talmy 1976): prototypical transitive events are conceived as involving a transfer of energy between two fully distinguished participants, the Initiator and the Endpoint, whereas intransitive events feature one participant only. In Kemmer’s view, middle situation types differ from both prototypical transitive and intransitive events because they feature two participants, the Initiator and the Endpoint, which are however not fully physically and conceptually distinguishable from one another (see further Næss 2007: 22–24, 27–29).

Kemmer’s groundbreaking work set the agenda for the study of MVSs in the decades to follow. However, her account is not free of shortcomings. To begin with, Kemmer’s empirical basis is rather narrow: in her sample, entire language families in which MVSs exist are underrepresented or not represented at all, e.g. Afroasiatic languages (Palmer 1995), while others are overrepresented (e.g. Atlantic-Congo and Indo-European languages).

More importantly, I see two methodological issues. In the first place, Kemmer does not provide a rigorous enough definition of MM. According to Kemmer (1993: 15), middle marking languages are those that feature MMs, which in turn are defined as “language-specific morphosyntactic marker[s] that appear in the expression of some cluster of distinct situation types that are hypothesized to be semantically related to one another and fall within the semantic category of middle voice.” A semantic approach to the definition of MMs is not problematic per se. Unfortunately, however, no explicit and operationalizable characterization of such a semantic category is offered by Kemmer or by later studies. In other words, the identification of MMs presupposes the identification of a “semantic category” of middle voice, but it is not clear how this semantics should be independently established in the first place. In practice, this means that we lack a set of explicit criteria based on which one can consistently identify MMs across languages.

The second methodological issue concerns the individuation of situation types. The basic insight is that situation types are distinguished if they can be co-expressed in some languages but receive distinct marking in others (Kemmer 1993: 4–6). German and English offer a case in point. While German sich is compatible with both sieht ‘see’ in (3a) and rasiert ‘shave’ in (3b), English himself is obligatory in (4a) but optional in (4b). This can be taken as evidence that reflexives based on typically bivalent verbs behave differently from those based on verbs that indicate typically self-directed situations, or, in other words, that direct reflexives and grooming verbs constitute two different situation types (Kemmer 1993: 53; see also Haiman 1983; König and Siemund 2000; Haspelmath 2008).

(3) German (stan1295, Indo-European, Germanic)
a. Hans sieht sich im Spiegel
b. Hans rasiert sich
(4) English (stan1293, Indo-European, Germanic)
a. Hans sees himself in the mirror
b. Hans shaves (himself)

This methodology is in principle valid, but its application to the entire middle domain is controversial at best. As Kemmer remarks “empirical evidence should be available to justify each separate domain” but unfortunately in many cases “the domains referred to are distinguished only on semantic grounds” (Kemmer 1993: 41). This clearly leads to a severe inconsistency as to which and how many situation types are individuated. I return to this issue in Sections 4.2.2 and 4.3.2.

3 The middle voice as a typological concept

In order to overcome some of the issues outlined in Section 2.1, in Section 3.1 I propose my own definition of MM, and I discuss its advantages in more detail in Section 3.2.

3.1 Defining middle markers

As a working definition, I propose that a MM be defined as a construction (that is, as a form – meaning pairing in the technical sense of Construction Grammar) with the following characteristics:

(i) it occurs with bivalent (or more) verbs to encode one or more of the following valency changing operations: passive, anticausative, reflexive, reciprocal, antipassive;

(ii) the same construction is also obligatory with some (at least monovalent) verbs that cannot occur without MM;

(iii) the semantics of (at least some of) the verbs in (i) does not match that of those in (ii) or vice versa.

Any verb that carries a MM is a middle verb and languages featuring MMs are middle-marking languages. Middle verbs conforming to (i) and (ii) are defined as oppositional and non-oppositional, respectively (these correspond to Klaiman 1991: 106 alternating and nonalternating).[6] I return to point (iii) in the definition in Section 3.2.

From this definition, it follows that MMs are inherently polyfunctional constructions. The combination of oppositional and non-oppositional verbs constitutes a MVS.[7] The term system highlights the fact that, rather than a single voice operation in the sense of Zúñiga and Kittilä (2019: 4), the middle voice should be conceived as a cluster of functions (thus explicitly Kulikov 2013: 265–266; Zúñiga and Kittilä 2019: 176).

This general definition is independent from language-specific criteria and is therefore suitable for cross-linguistic comparison (Haspelmath 2010). Compare the Ancient Greek Middle voice in (1) with the Parecís suffix -oa in (5):

(5) Parecís (pare1272, Arawakan, Central Maipuran; Brandão 2014: 247–259)
a. holoka ‘boil (tr.)’ versus holok-oa ‘boil (intr.)’
   fehanatya ‘bless’ versus fehanaty-oa ‘bless oneself’
   tyaloka ‘bite’ versus tyalok-oa ‘get bitten’
b. ez-oa ‘fall’, haik-oa ‘come back’, hawinits-oa ‘breathe’, hik-oa ‘show up’

Both the Greek Middle inflection and Parecís -oa qualify as MMs: they occur with oppositional verbs with anticausative, reflexive and passive function, as in (1a) and (5a), and they also occur with non-oppositional verbs, as in (1b) and (5b).

3.2 What is a MM (and what is not)?

The definition proposed in Section 3.1 essentially expands upon existing definitions of the middle voice (e.g. Geniušienė 1987; Klaiman 1991; Kemmer 1993; Kulikov 2013), but crucially differs from these in that it offers more explicit criteria to draw a neat line between MMs proper and constructions that potentially resemble MMs but that do not satisfy all criteria. The middle voice can thus be rescued from its often-criticized ambiguity and re-established as a meaningful typological notion.

In particular, this definition avoids the need to postulate a specific ‘middle semantics’ or ‘middle function’, which is difficult to establish on independent grounds (Section 2.1; also Holvoet 2020: xv–xvi). MMs as defined in Section 3.1 can instead best be seen as a hybrid comparative concept (Croft 2016). On the one hand, the identification of oppositional middles relies on a functional component, that is, their association with valency change, which can be operationalized by referring to already existing comparative concepts (Section 4.2.1). On the other hand, the identification of non-oppositional middles is based on a straightforward distributional criterion, i.e. lack of an unmarked counterpart.

Valency change has already been proposed as a defining feature of MVSs by e.g. Geniušienė (1987) and Kulikov (2013). To this, however, non-oppositional middles must also be added (Kaufmann 2007: 1688; Kemmer 1993: 22; Klaiman 1991: 44). Taking only valency related functions as relevant in defining MMs (e.g. Givón 2001: 116; Bahrt 2021: 74–82) neglects the crucial fact that there exists a divide between those valency reducing markers that exclusively function as intransitivizers and those that show a systematic relationship with a more or less substantial class of verbs that take the same marking with no obvious valency-related function. If one wishes to find a typological usefulness in the notion of middle voice, I propose that it is precisely this interplay between the grammatical and the lexical component characterizing MMs that makes them interesting to study in their own right, as distinct from (syncretic) intransitivizers. Future studies may further elucidate the relationship between MMs proper and syncretic intransitivizers, which is something I leave out of consideration here for reasons of space.

This is why in this paper I keep MMs distinct from polyfunctional intransitivizing markers (also fn. 9). A case in point is the Xavánte prefix si- in (6), which occurs in reflexive, reciprocal, and anticausative function. However, since there are no non-oppositional verbs that only occur with si-, I do not consider it as a MM proper.

(6) Xavánte (xava1240, Nuclear-Macro-Je, Je; Machado Estevam 2011: 260, 262, 265)
a. Ãne, ma ajbö si-sõʔreptu
   thus pft man refl-sav
   ‘The man saved himself.’
b. Ta-si-sãmri aba ni.
   3.h.abs-refl-see coll indf
   ‘They saw each other.’
c. Wahu te si-utõrĩ
   year ptc hto refl-exhaust
   ‘The year is over.’

Conversely, I likewise do not regard as MMs constructions that display a lexically restricted distribution and do not encode valency change. An example are constructions that have been described as semantic alignment or split intransitivity systems (Creissels 2007; Donohue 2008). A case in point is the use of personal pronouns in Yuki (Mithun 2008). Yuki features three sets of independent personal pronouns. Intransitive verbs show a split based on whether their S argument is marked with Set I or Set II pronouns, as in (7).

(7) Yuki (yuki1244, Yuki-Wappo, Northern Yukian; Mithun 2008: 300–301)
a. ʔąp lis k’ąn laʔaktekb ‘I talked fast.’ set I
b. ʔi: hilyuʔ ‘I’m sick.’ set II
   ʔi: p’a:nessuyitwičk ‘I fell down.’
   ʔi: k’awtek ‘I yawned.’

The distribution of Set I and Set II pronouns is sensitive to the semantics of S: the former essentially express Agents while the latter Patients (Mithun 2008: 301). Many of the verbs that take Set II pronouns semantically overlap with typically middle situations, including experiencer verbs, uncontrolled events, and bodily processes, as in (7b). However, alternation between Set I and Set II pronouns is not used to express valency change, which in Yuki is encoded by derivational suffixes (Balodis 2016: 279). This means that Set II pronouns do not count as MMs. Similar considerations hold for Intransitive Copy Pronouns in Chadic languages. These have been interpreted as a type of MM by Leger and Zoch (2011), but since they do not appear to relate to valency change, I do not count them as genuine MMs.

Finally, part (iii) of the definition in Section 3.1 captures the fact that oppositional and non-oppositional verbs are two (at least in part) autonomous components of MVSs, and is meant to exclude isolated lexicalizations of valency changing functions as potential non-oppositional middles. While (iii) may sound like a much too arbitrary criterion, in practice one can easily decide whether individual MMs conform to (iii) or not. Consider again the Parecís example in (5b), where at least hawinits-oa ‘breathe’ is not obviously synchronically related to any of the valency functions of the suffix -oa in (5a), which is consequently classified as a MM.

A construction that fails to meet (iii) is the prefix bi- in Mekeo. The prefix encodes reciprocity with non-reciprocal verbs (Jones 1998: 374–380), as in (8), and also obligatorily occurs with a few inherently reciprocal verbs, e.g. bi-baini ‘fight’. Because the semantics of verbs formed with bi- in (8) entirely matches that of bi-only verbs, i.e. both express reciprocal situations, the prefix does not qualify as a MM as per point (iii) in the definition.[8]

(8) Mekeo (meke1243, Austronesian, Oceanic; Jones 1998: 374–380)
ke-pi-isa
‘They look at one another.’
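
The way the three criteria of Section 3.1 sort the constructions discussed in this section can be sketched as a small decision procedure. The Python sketch below is illustrative only: the `Marker` record, the function `failed_criterion`, and the per-verb semantic labels are assumptions introduced here. Reading criterion (iii) as "at least one non-oppositional verb whose meaning matches none of the marker's own oppositional functions" is one possible operationalization rather than a procedure stated in the definition itself.

```python
from dataclasses import dataclass, field

# Valency-changing operations listed in criterion (i) of Section 3.1.
VALENCY_OPERATIONS = {"passive", "anticausative", "reflexive", "reciprocal", "antipassive"}


@dataclass
class Marker:
    """Hypothetical record for one candidate construction (illustration only)."""
    name: str
    # Valency-changing functions attested with bivalent (or more) verbs.
    oppositional_functions: set = field(default_factory=set)
    # Verbs that never occur without the marker, each mapped to the valency
    # function its meaning matches, or to None if it matches none of them.
    # These labels are analyst judgements, assumed here for illustration.
    non_oppositional_verbs: dict = field(default_factory=dict)


def failed_criterion(marker):
    """Return None if criteria (i)-(iii) all hold, else the first one that fails."""
    if not (marker.oppositional_functions & VALENCY_OPERATIONS):
        return "i"
    if not marker.non_oppositional_verbs:
        return "ii"
    # (iii): some non-oppositional verb is not merely a lexicalized instance
    # of one of the marker's own oppositional functions.
    autonomous = any(
        label not in marker.oppositional_functions
        for label in marker.non_oppositional_verbs.values()
    )
    return None if autonomous else "iii"


# Only functions and verbs mentioned in the text are encoded.
parecis = Marker("Parecís -oa", {"anticausative", "reflexive", "passive"}, {"breathe": None})
xavante = Marker("Xavánte si-", {"reflexive", "reciprocal", "anticausative"})
mekeo = Marker("Mekeo bi-", {"reciprocal"}, {"fight": "reciprocal"})
yuki = Marker("Yuki Set II pronouns", set(), {"be sick": None, "fall down": None, "yawn": None})

for m in (parecis, xavante, mekeo, yuki):
    failed = failed_criterion(m)
    print(m.name, "->", "MM" if failed is None else f"not a MM, fails ({failed})")
```

In this encoding, the hard part of the analysis is the annotation of the verbs rather than the procedure itself: the outcome for criterion (iii) depends entirely on the semantic labels supplied by the analyst.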

4 Towards a new typology of MVSs

For this paper, I have undertaken a survey of MMs in a variety sample of over 400 languages (Mattiola 2020). In this sample, I have found 105 languages with MMs which comply with the definition in Section 3.1. This number underestimates the actual distribution of MVSs, in the sense that there are several languages that potentially feature MVSs but for which the available data is insufficient to make a final decision.[9] Further research on individual languages will likely result in the identification of more MMs. The variety sample has been expanded with a smaller convenience sample of 24 languages featuring MMs. The complete list of languages is given in Appendix I in Supplementary material.

In these 129 languages, I have identified 149 MMs. The reason for the mismatch is that languages may feature more than one MM. For example, Ho features two MMs, the Reflexive suffix -(e)n and the Middle suffix - (Pucilowski 2013). The suffix -(e)n expresses reflexive, as in (9a), reciprocal, and autocausative situations. By contrast, - chiefly encodes passives and anticausatives, as in (9b). Both suffixes also occur with non-oppositional verbs.[10]

(9) Ho (hooo1248, Austroasiatic, Munda; Pucilowski 2013: 109, 112, 114)
a. iniʔ arsi-re=ʔ nel-en-tan-a
   3sg.anim mirror-loc=3sg see-mm-ipfv-fin
   ‘S/he is looking at her/himself in the mirror.’
b. gotom ser- -tan-a
   ghee melt-mm-ipfv-fin
   ‘The ghee is melting.’

The distribution of middle marking languages in the variety sample does not seem to show any geographical bias, as shown in Table 1 (Pearson’s Chi-squared test, p-value = 0.06; macroareas from Mattiola 2020).

Table 1: Geographical distribution of MVSs.

Macroarea                   Non-middle marking languages   Middle marking languages
Africa                      58                             20
Australia & New Guinea      48                             16
Eurasia                     26                             17
North America               39                             21
South America               52                             17
Southeast Asia & Oceania    74                             14
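
For readers who wish to recompute the test from the counts in Table 1, the following is a minimal Python sketch of Pearson's chi-squared statistic for a contingency table. The names `TABLE_1`, `pearson_chi_squared`, and `chi2_sf_5dof` are illustrative assumptions, and the tail probability uses a closed form that holds only for five degrees of freedom. Because the result depends on the exact counts as printed here and on options that are not specified above (for example, simulation of the p-value), the output is not guaranteed to reproduce the reported figure.

```python
import math

# Counts copied from Table 1: (non-middle marking, middle marking) per macroarea.
TABLE_1 = {
    "Africa": (58, 20),
    "Australia & New Guinea": (48, 16),
    "Eurasia": (26, 17),
    "North America": (39, 21),
    "South America": (52, 17),
    "Southeast Asia & Oceania": (74, 14),
}


def pearson_chi_squared(rows):
    """Return (statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            # Expected count under independence of macroarea and middle marking.
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi2_sf_5dof(x):
    """Upper-tail probability of a chi-squared variable with exactly 5 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


statistic, dof = pearson_chi_squared(list(TABLE_1.values()))
assert dof == 5  # the closed form above is valid only in this case
print(f"chi-squared = {statistic:.3f}, df = {dof}, p = {chi2_sf_5dof(statistic):.3f}")
```

The column totals computed along the way also give a quick consistency check: the middle marking column should sum to the 105 languages of the variety sample mentioned above.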

The 129 middle-marking languages come from 56 different language families plus 13 isolates. This means that my sample is more genetically varied than Kemmer’s (1993). Interestingly, MMs are often pervasive within individual families/groups (though this is by no means necessarily the case). Examples include Afro-Asiatic, Algonquian, Athabaskan, Bantu, Indo-European, Iroquoian, Oceanic, Salish, Sino-Tibetan, and Turkic languages. While this situation may occasionally result from language contact among genetically related languages (see Comrie 2006: 316 on Reflexive middles in Indo-European languages of Europe), in most cases comparative evidence points towards the common retention of a shared inherited trait (see e.g. Gandon 2018 on Turkic). This suggests that MMs are often historically old constructions (Kemmer 1993: 179). In what follows, I describe the formal and functional variation of MMs in the sample.

4.1 The morphosyntactic typology of middle marking

Different types of constructions may instantiate MMs as defined in Section 3.1. This formal variation essentially complies with existing morphosyntactic classifications of valency changing constructions (e.g. Haspelmath 1990; Zúñiga and Kittilä 2019), but has not previously been systematically documented for MMs. The classification of MMs is summarized in Table 2.

Table 2: Morphosyntactic classification of MMs.

Synthetic MMs                  Analytic MMs
Specialized affixes   122      Inflected     7
  Suffix               80      Uninflected   1
  Prefix               38      Total         8
  Circumfix             2
  Infix                 2
Cumulative affixes     15
Other strategies        4
Total                 141
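
The subtotals in Table 2 can be checked against the overall figure of 149 MMs with a few lines of Python. This is only a sketch: the nesting of the four affix positions under "Specialized affixes" is the reading of the table assumed here, and the names `SYNTHETIC`, `ANALYTIC`, and `total` are illustrative.

```python
# Counts copied from Table 2; the nesting reflects the assumed table layout.
SYNTHETIC = {
    "Specialized affixes": {"Suffix": 80, "Prefix": 38, "Circumfix": 2, "Infix": 2},
    "Cumulative affixes": 15,
    "Other strategies": 4,
}
ANALYTIC = {"Inflected": 7, "Uninflected": 1}


def total(node):
    """Sum a single count or a (possibly nested) dictionary of counts."""
    if isinstance(node, dict):
        return sum(total(value) for value in node.values())
    return node


specialized = total(SYNTHETIC["Specialized affixes"])
synthetic = total(SYNTHETIC)
analytic = total(ANALYTIC)
all_mms = synthetic + analytic

print(f"specialized = {specialized}, synthetic = {synthetic}, analytic = {analytic}")
print(f"all MMs = {all_mms}, synthetic share = {100 * synthetic / all_mms:.1f}%")
```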

The major distinction is that between analytic versus synthetic constructions (as defined by Bickel and Nichols 2013a).[11] Analytic MMs, which stand out for their narrow distribution, can be distinguished into inflected and uninflected ones. Inflected analytic MMs are typically pronouns. This type can be exemplified by Reflexive clitic pronouns in Romance languages such as Italian in (2) (Serianni 1989: 254–255). The only uninflected analytic MM is the Chitimacha (Isolate) preverb ʔapš, which mostly occurs in reflexive and reciprocal function, as in (10) (Hieber 2018: 24–25).

(10) Chitimacha (chit1248, Isolate; Hieber 2018: 25)
hus nehe ʔapš kʼet-iʔi
3sg self mm kill(sg)-nf;sg
‘He killed himself.’

Most MMs are synthetic, that is, they constitute bound verbal morphology. This ties in with the earlier observation that MMs are often historically old. Synthetic MMs can further be distinguished into specialized versus cumulative MMs.

Specialized MMs are those that express middle-related functions only. Specialized MMs are predominantly dedicated affixes.[12] Arguments can be made for a more derivational-like status of these affixes in individual languages (e.g. Bybee 1985: 81–82; Haspelmath and Müller-Bardey 2004: 1139): their occurrence is often optional, they occur closer to the verbal root than TAM/polarity/person markers, and their use appears to be less systematic than inflectional morphology (but see also Say 2005 and Spencer 2014: 90–108 for discussion). Moreover, MMs can also add non-valency related meanings and may even be involved in other word-formation processes (Section 4.3.1). This is in line with the well-known fact that among grammatical categories of verbs, valence and voice are more likely to be expressed derivationally (see Bybee 1985: 29–32).

Based on their position, specialized MMs appear as suffixes, prefixes, infixes, and circumfixes, as exemplified in (11):

(11) a. Suffix: Oromo (nucl1736, Afro-Asiatic, Cushitic; Fufa Teso 2009: 81)
gub-at-e
burn-mid
‘(The house) burns’
b. Prefix: Bwatoo (bwat1240, Austronesian, Oceanic; Bril 2005: 48)
le ve-hnyam
3pl mid-love
‘They love each other.’
c. Circumfix: Cavineña (cavi1250, Pano-Tacanan, Tacanan; Guillaume 2008: 269)
Señora ka-peta-ti-wa espejo=ju.
lady mirror=loc
‘The lady looked at herself in the mirror.’
d. Infix: Tzeltal (tzel1254, Mayan, Core Mayan; Polian 2013: 55)
sut ‘turn (tr.)’ → su-j-t’ ‘turn (intr.).mid’

Cumulative MMs also co-express other grammatical categories such as TAM and agreement. The lower frequency of this class is unsurprising, and reflects a tendency for voice, TAM, and agreement to be expressed separately (Auderset 2015; Bickel and Nichols 2013b).

An example of cumulative MM is the inflectional Middle Voice of Ancient Greek in (1). This type is pervasive in ancient Indo-European languages. Consider the Hittite Middle inflection, illustrated in Table 3 (Hoffner and Melchert 2008: 180–184). In Hittite, verbal voice is an inflectional category, in the sense that each verb must be marked for voice, either Active or Middle. Together with voice, endings cumulatively express tense, mood and person/number agreement.[13]

Table 3: Active and Middle inflection in Hittite.

Present indicative singular
       Active             Middle
       ēp-/ap- ‘take’     iya- ‘march, go’
1sg    ēp-mi              iya-ḫḫa(ri)
2sg    ēp-ši              iya-ttati
3sg    ēp-zi              iya-tta(ri)

Another example of inflectional-like cumulative MM comes from Toba. Verbs in Toba may take two series of indexing prefixes, Series I or Series II, as shown in Table 4. The two series also express voice, with Series II indexes showing the range of functions of MMs.

Table 4: Singular person prefixes in Toba (toba1269, Guaykuruan, Southern Guaykuruan; adapted from Zurlo 2016: 288).

       Series I    Series II
1sg    s-          ñ-
2sg    ʔaw-        ʔan-
3sg    i-          n-

4.2 Functions of MMs

In this section, I discuss the semantics of MMs in the sample, starting with oppositional functions and then moving on to non-oppositional verbs. This is a significant difference with respect to approaches stemming from Kemmer (1993), where MMs are primarily described in terms of the situation types that they cover, without keeping the two groups of oppositional versus non-oppositional systematically distinct. As I argue, while oppositional middles can be successfully compared across languages, the classification and comparison of non-oppositional middles still remains more elusive.

4.2.1 Oppositional middles

As per the definition in Section 3.1, with oppositional middle verbs, middle marking triggers a different reading in terms of valency change (or, occasionally, another semantic property) than that of the non-middle verb.[14]

Oppositional functions can be further distinguished into valency-related and non-valency-related functions. These classes differ in two respects. Valency-related oppositional functions all involve the encoding of valency change. They are also the best represented group, in the sense that, by definition, each MM expresses at least one such function. Non-valency-related functions are of a different nature: from a semantic perspective, they are a more diverse group, and their distribution is significantly narrower than valency-related functions.

Valency-related oppositional middles

Valency-related oppositional MMs occur with an otherwise non-middle bivalent (or more) verb to indicate one or more of the following valency changing operations: passive, anticausative, reflexive, reciprocal, and antipassive (see already Geniušienė 1987 and Kemmer 1993).[15] There is a vast typological literature on these operations, and it has been repeatedly pointed out that each actually consists of a cluster of subtypes (Dixon and Aikhenvald 2000; Kulikov 2010; Zúñiga and Kittilä 2019; Bahrt 2021; see also Holvoet 2020 on Baltic oppositional middles). In this paper, I neglect such variation and operate with a coarse-grained classification, following the definitions proposed by Zúñiga and Kittilä (2019). The (clusters of) valency changing functions that I consider here are the following.

The passive cluster includes both canonical promotional passives as well as agentless passives (Zúñiga and Kittilä 2019: 84). The anticausative cluster groups together decausatives of the type ‘melt (intr.)’ and autocausatives of the type ‘turn (intr.)’ (Geniušienė 1987: 98–109). With reflexive, I refer to the direct reflexive type establishing coreference between A and P (Zúñiga and Kittilä 2019: 154), irrespective of the lexical semantics of the verb, thus subsuming both reflexive proper and grooming events, as in (3) and (4). Finally, reciprocal and antipassive group together the different semantic types of reciprocal situations (Majid et al. 2011) and the various possible configurations of antipassive constructions (Janic and Witzlack-Makarevich 2021; Vigus 2018), respectively.

Besides these functions, which I regard by definition as characteristic of MMs, I also consider three other functions that have often been discussed in connection with middle marking (Kemmer 1993). First, under the self-benefactive label, I group together indirect reflexives and autobenefactives. Both constructions essentially establish coreference between an A and a non-P participant, which based on the verb’s valency may be either a Recipient/Addressee or a Beneficiary (Kulikov 2010: 391). Second, I keep distinct two types of constructions often associated with passives: the impersonal and the facilitative. The impersonal is a type of passive that lacks promotion of P, has generic (human) agents, and may also apply to intransitive verbs (Blevins 2003; Neshcheret and Witzlack-Makarevich 2016; Zúñiga and Kittilä 2019: 85). Facilitatives are a semantic subtype of agentless passives characterized by habitual/potential semantics, corresponding to the type ‘the book sells well’ (Zúñiga and Kittilä 2019: 100–101).

Oppositional functions can be exemplified by the Laz MM -i- in (12) (for reasons of space, I only give examples of middle forms).

(12) Laz (kart1248, Kartvelian, Georgian-Zan; Lacroix 2012: 170–185)
a. Tabi baba-muši d-i-yl-u
   of_course father-poss3sg pv-mm-kill-aor.i3sg
   ‘Of course, his father was killed.’
b. Nek’na ge-i-nk’ol-e-n
   door pv-mm-close-ths-i3sg
   ‘The door closes.’
c. b-i-xazi-am
   ‘I prepare myself.’
d. Bee-pe-k muntxa el-i-purčin-am-an
   child-pl-erg something pv-mm-whisper-ths-i3pl
   ‘The children whisper something to each other.’
e. Hentebe i-gur-am-t’es Amerik’a-s
   mm-learn-ths-impft.i3pl America-dat
   ‘They studied in America.’
f. Hemu-k oxoi i-k’od-um-s
   demp-erg house mm-build-ths-i3sg
   ‘He builds a house for himself.’
g. Hac’ineri mč’ima do ixi-s mezare-ša mend-i-l-in-e-n-i?
   contemporary rain and wind-dat tomb-all pv-mm-go-caus-ths-i3sg-int
   ‘Do people go to the tomb when it is raining and windy, as it is now?’
h. Ha porča va dol-i-kun-e-n
   demd shirt neg pv-mm-put.on-ths-i3sg
   ‘This shirt is not wearable.’

Some interesting generalizations can be drawn regarding the occurrence of valency-related functions in the sample. First, not all functions are equally frequent. Table 5 shows the type frequency of each function in the sample (counted per number of MMs that instantiate a given function).

Table 5: Oppositional valency-related functions of MMs.

Function MMs
Anticausative 111
Reflexive 103
Passive 86
Reciprocal 68
Self-benefactive 40
Antipassive 38
Facilitative 17
Impersonal 12

The most striking result is that, while reflexives are indeed a very frequent oppositional function of MMs, some of the functions that Kemmer (1993) describes as marginal are instead conspicuously expressed by MMs in the world’s languages. This is especially the case for the anticausative (thus already Haspelmath 1995: 372), but also for the passive and antipassive. The impersonal and the facilitative constructions are the least frequently attested functions of MMs. However, as these are often described as sub-types of passive constructions, data on these may simply be absent in grammars.[16]

Second, turning to the number of valency-related functions expressed by individual MMs, one frequently finds clusters of 2–4 functions, with polyfunctionality patterns involving fewer or more functions being rarer in the sample, as shown in Table 6 (for details on the possible combinations of functions see Appendix I in Supplementary material). This finding corroborates the idea that valency reducing markers are indeed usually polyfunctional, as opposed to valency increasing ones (Bahrt 2021: 161; Nichols et al. 2004: 175).

Table 6: Number of valency-related oppositional functions of MMs.

Valency-related functions MMs
1 9
2 44
3 38
4 35
5 16
6 4
7 1
8 2
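
As a quick arithmetic check on Table 6, the following minimal sketch re-enters the counts from the table and computes their total and the share of MMs with 2–4 functions. The variable names are illustrative assumptions; only the counts come from the table.

```python
# Counts copied from Table 6:
# number of valency-related functions -> number of MMs with that many functions.
table6 = {1: 9, 2: 44, 3: 38, 4: 35, 5: 16, 6: 4, 7: 1, 8: 2}

# Total number of MMs covered by the table.
total = sum(table6.values())

# MMs expressing between two and four valency-related functions.
mid_range = sum(n for k, n in table6.items() if 2 <= k <= 4)

# Proportion of the sample that falls in the 2-4 range.
share_mid = mid_range / total

print(total, mid_range, round(share_mid, 3))
```

The total matches the 149 middle voice constructions surveyed, and MMs with 2–4 functions make up roughly four fifths of them.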

Another important finding is that there exist a number of constraints on the polyfunctionality of MMs, in the sense that out of all the logically possible combinations of functions, only some are actually realized in the sample.[17] Such patterns can be described by means of a semantic map (cf. Georgakopoulos and Polis 2018), as shown in Figure 1.[18]

Figure 1: 
A semantic map of valency-related oppositional middle functions.
The graph has been computer-generated using a modified version of the algorithm in Regier et al. (2013). I thank Antonio Gelameris for assistance.

Figure 1 represents the conceptual space of valency-related oppositional functions of MMs. Each node in the network corresponds to one valency-related function and a link is established between two nodes if they are co-expressed by at least one MM in the sample. The numbers on the edges represent the frequency of co-expression of pairs of functions (van der Auwera 2013: 156–157). As can be seen, some co-expression patterns are more frequent than others (on the absolute frequency of the nodes see Table 5), with the highest frequencies found in the reflexive-reciprocal-anticausative-passive cluster.
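
The edge definition just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the marker names and function sets are invented toy data, not entries from the sample, and the sketch does not attempt to reproduce the modified Regier et al. (2013) algorithm used for Figure 1. It only counts, for each pair of functions, how many MMs co-express them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each MM is mapped to the set of valency-related
# functions it expresses. These entries are invented for illustration only.
mm_functions = {
    "MM_A": {"reflexive", "reciprocal", "anticausative"},
    "MM_B": {"anticausative", "passive"},
    "MM_C": {"reflexive", "anticausative", "passive"},
    "MM_D": {"antipassive"},
}

def coexpression_edges(markers):
    """Count, for every unordered pair of functions, how many MMs express both."""
    edges = Counter()
    for functions in markers.values():
        # Sorting makes each pair canonical, so (a, b) and (b, a) are not
        # counted as separate edges.
        for pair in combinations(sorted(functions), 2):
            edges[pair] += 1
    return edges

edges = coexpression_edges(mm_functions)
for (f1, f2), weight in sorted(edges.items()):
    print(f"{f1} -- {f2}: {weight}")
```

A pair of functions receives an edge as soon as one marker expresses both, and the edge weight is the number of such markers. A marker with a single function, like the hypothetical `MM_D`, contributes a node but no edge.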

The conceptual space in Figure 1 shows that some earlier claims on the polyfunctionality of MMs need to be revised. First, the structure of the conceptual space further supports the finding that the anticausative function plays a much more central role than described by Kemmer (1993), as this is the only node to be directly connected with every other node in the network. Second, the conceptual space also contradicts earlier claims that reflexives must be synchronically connected to passives via anticausatives (e.g. Haspelmath 2003). The passive-reflexive polyfunctionality is for example widespread in Salish languages, as in Halkomelem (halk1245), where the suffix -m has both reflexive and passive function, while the anticausative alternation is encoded by transitivizing suffixes (Gerdts and Hukari 2006). Moreover, while in Haspelmath (2003) the facilitative/potential is taken as an intermediate step between the anticausative and the passive, this is not necessarily the case, as a direct link between anticausative and passive also exists.[19] The position of the antipassive is also of interest. Studies on antipassive constructions have often discussed their synchronic and diachronic connection with reciprocals and reflexives (Holvoet 2020: 65–67; Sansò 2017). Data from my sample also suggest a direct link between anticausative and antipassive: a case in point is the Cavineña suffix -tana (Guillaume 2008: 256–267).[20]

In connection with oppositional functions, Kemmer (1993) famously proposed that languages can be typologized into those that have one single marker for reflexive (and reciprocal) and other middle situations (one-form languages) and those that have two distinct markers (two-form languages). In two-form languages, it is always the case that the morphologically more complex marker is used for reflexives built on typically bivalent verbs, while the less complex marker applies to grooming and other middle situations (see Haiman 1983; König and Siemund 2000; Haspelmath 2008 for discussion). For example, the Laz MM -i- is mostly used with grooming actions, as in (12c), while the reflexive of highly transitive verbs is formed by means of the noun ti ‘head’, as in (13) (Lacroix 2012: 176–177).

Laz (Lacroix 2012: 177)
K’oči-k ti -muši il-om-s
man-erg head-poss3s kill-ths-i3s
‘The man kills himself.’

Kemmer’s two-way typology captures an important pattern, but it only offers a partial view. The investigation should be extended to all valency-related oppositional functions, so as to obtain a more general picture of how the competition with alternative valency changing constructions affects the configuration of language-specific MVSs. Such a large-scale investigation falls outside the scope of this paper.

Non-valency-related oppositional functions

Oppositional middles may also express a wide range of meanings not immediately connected with valency change (similar effects have also been noted for valency-increasing morphology, see Aikhenvald 2011a). I limit myself to two main groups that stand out among non-valency-related functions: aspectual and low transitivity functions. The border between the two groups is admittedly blurred by some functions which share features of both (e.g. intensive): the distinction is kept here mostly for the sake of exposition.

In the first place, middle marking shows affinities with aspect (broadly understood as per Croft 2012). This is quantitatively the most conspicuous sub-class of non-valency-related functions in the sample. Still, it is not always clear whether individual MMs do in fact behave as aspectual markers proper or whether aspectual nuances are associated with valency-related operations. For example, in Iraqw the MM -t is also the marker of imperfectivity, because its occurrence alone triggers an imperfective reading (Mous and Qorro 2000: 167–169), as in (14).

Iraqw (iraq1241, Afro-Asiatic, Cushitic; Mous and Qorro 2000: 168)
faar ‘count’ → fadu t ‘be counting’

By contrast, in imperfective contexts the Kryz MM -aR- also conveys a habitual/deontic nuance, as in (15a). However, as argued by Authier (2012: 144–145), habituality is only contextually associated with passive -aR- in the imperfective, and the same reading is unavailable in perfective contexts, as in (15b).

Kryz (kryt1240, Nakh-Daghestanian, Lezgic; Authier 2012: 144)
a. ğul ambar.c-a va-ns an -e
corn barn-in pv-weigh.detr-prs
‘Corn is usually weighed in the barn.’
b. ğul ambar.c-a va-ns an -a xhi-yic
corn barn-in pv-weigh.detr-a be-aor.n
‘The corn has been weighed in the barn.’

Comparison between Iraqw -t- and Kryz -aR- suggests that each case should be judged on its own. Keeping this in mind, MMs show different associations with the aspectual domain. In some languages, MMs show functions connected with imperfectivity/atelicity. For example, the middle prefix ve- in Bwatoo also occurs with unbounded events, as in (16):

Bwatoo (Austronesian, Oceanic; Bril 2005: 53)
tobewaa ‘run’ → ve -tobwaa ‘be running’

Imperfectivity is also among the functions of the MM -n in Otomi (otom1300; Palancar 2004, 2006), while in Athabaskan languages, the MM -d also occurs in iterative contexts (Rice 2000). Similarly to Kryz, middle marking occurs in habitual contexts in Kharia (khar1287; Peterson 2011: 166–167) and Siwai (siwa1245; Onishi 2000: 125).

MMs may also be associated with stativity/resultativity. Stativizing MMs have been reported for Tibeto-Burman languages such as Drung (LaPolla and Jiangling 2005) and Daai Chin (daai1236, So-Hartmann 2009: 56). Consider example (17) from Drung, where the MM -ɕɯ̌ in passive function also adds a stative component ‘be visible’ (instead of ‘be seen’):

Drung (drun1238, Sino-Tibetan, Nungish; LaPolla and Jiangling 2005: 2)
ɕàm àŋ-lě a-ɟàŋ- ɕɯ̌
sword 3sg-dat intr-see-mid
‘The sword is visible (to him).’

Even though the stative semantics often appears to be a by-product of the passive usage of MMs (Haspelmath 1990; Zúñiga and Kittilä 2019: 98–100), this is not necessarily the case. For example, the Drung verb guɑ̄ can either mean ‘put on’ or ‘wear’ but its middle counterpart guɑ̄-ɕɯ̌ only allows a stative meaning ‘wear’ (LaPolla and Jiangling 2005: 8). An association with stativity has also been noted for the Neuter verbs in -a in Masai (masa1300; Tucker and Mpaayei 1955: 134–139), for the Hamer-Banna MM -ɗ-/-aɗ- (hame1242; Petrollino 2016: 142–148), and for the Balinese MM ma- (bali1278; Udayana 2013: 96–98).

In other languages, middle marking is instead connected with telicity, specifically with the encoding of ingressive and/or change-of-state events. In Bella Coola, stative verbs may be turned into change-of-state verbs by adding the MM -m, as in (18a), and the same holds true for the cognate Halkomelem suffix -m (Gerdts and Hukari 2006). The so-called Akkadian N-stem performs a similar function, as shown in (18b) (Kouwenberg 2010: 297–298). An affinity with telicity has also been extensively discussed for the Reflexive middle in Romance languages, as is the case of Italian si (Ježek 2003: 161–162).

a. Bella Coola (Salishan; Beck 2000: 230)
tuin ‘be visible’ → tuin- m ‘come into sight’
b. Akkadian (Afro-Asiatic, Semitic; Kouwenberg 2010: 297)
bašûm ‘be present’ → na -bšûm ‘emerge’

In a few languages, MMs extend to the expression of tense-related values. In Ho, the suffix -oʔwa, resulting from the combination of the MM -oʔ- plus the finite suffix -a, is grammaticalizing into a future tense marker, as in (19) (Pucilowski 2013: 133–135). In Kharia, the Middle Inflection is also associated with remote past/future events (Peterson 2011: 166–169).

Ho (Pucilowski 2013: 133)
joka=ñ jom-pe:ʔ-le:-n-oʔ-wa
little=1sg eat-strong-ant.fut-itr-mid-fin
‘(After I eat them,) I will become a little stronger.’

The second group of non-valency-related functions can broadly be characterized as lowering the verb’s semantic transitivity as defined by Hopper and Thompson (1980). First, MMs can express diminished agentivity/volitionality of the Agent (this use relates to involuntary agent constructions in fn. 16, but its effect on valency is less obvious; see Fauconnier 2011). In Kharia, middle marking may indicate events that take place unexpectedly or accidentally, as in (20):

Kharia (Austroasiatic, Munda; Peterson 2011: 170)
huɽmuɽay act ‘bump into someone on purpose (intr.)’ → mid ‘unexpectedly bump into someone (intr.)’

Non-volitionality is also expressed by the Caddo suffix -ʔu (cadd1256; Melnar 2004: 184–185) and by the MM -d- in ‘errative’ function in several Athabaskan languages (see Rice 2000: 189–190), as in (24) below. In other languages, involuntary semantics only coexists alongside reflexivity, as in the case of Kiowa -kɔ̂ ‘cut oneself, get cut’ (kiow1266; Watkins 1984: 140–141).

Low transitivity also concerns events portrayed as not being thoroughly/successfully completed, as is the case of Kharia in (21a). Similarly, the Bwatoo prefix ve- may indicate attempted actions, as in (21b).

a. Kharia (Peterson 2011: 170)
lebui act ‘love (strongly)’ → mid ‘love (somewhat)’
b. Bwatoo (Bril 2005: 53)
tataee ‘surpass’ → ve -tataee ‘try to surpass’

Another function, which is partly connected with aspect and partly with agentivity, is the ‘intensive’ use of MMs. MMs with intensive function depict events that involve greater effort, involvement, and/or affectedness on the part of the participants than normal (Mattiola 2019: 36). Intensive middles are found in Otomi (Palancar 2004: 66–67), as in (22). Intensive MMs have also been described in New Caledonian languages (Bril 2005: 33), Akkadian (Kouwenberg 2010: 276), Semelai (seme1247; Kruspe 2004: 121), and Yauyos Quechua (yauy1235; Shimelman 2017: 219).

Otomi (Otomanguean, Western Otomanguean; Palancar 2004: 66)
tsithe ‘drink’ → n- tsithe ‘drink to get drunk’

Other marginally attested functions include sociative, assistive, and comitative functions in e.g. New Caledonian languages (Bril 2005: 33), Worrorra (worr1237; Clendon 2014: 411–415), and Tuvinian (tuvi1240; Kuular 2007: 1202–1212), distributive function in Mussau-Emira (muss1248; Brownie and Brownie 2007: 107), and reversative function in Fwe (fwee1238; Gunnink 2018: 234–239).

Valency-related and non-valency-related oppositional functions show a neat distribution in the sample: all middle-marking languages feature valency-related functions (as follows from the definition), whereas only 54 MMs also display non-valency-related functions. In general, non-valency-related functions represent a heterogeneous group, and many occur in a handful of languages only. Nevertheless, their distribution is not entirely random, and individual non-valency-related functions tend to co-occur with valency-related ones. First, as already remarked, MMs with stative/resultative meaning typically also have passive function, attesting to the well-known connection between these two domains (Zúñiga and Kittilä 2019: 98–100). Second, MMs associated with generic/habitual aspect typically function as antipassives as well (see Sansò 2017). In addition, MMs that indicate reduced control/involuntary actions are also used in anticausative function (see Fauconnier 2011), while those MMs that express comitative/sociative/assistive meanings have reciprocity among their valency-related functions (see Lichtenberk 2000). Further research is needed to fully explore the synchronic and historical connections between valency-related and non-valency-related functions of MMs.

Turning to the polyfunctionality of individual MMs, the general trend is for MMs to have several valency-related functions (see Table 6) vis-à-vis fewer or no non-valency-related functions, both in terms of number of functions and in terms of verb types that instantiate them.[21] However, there are languages in which non-valency-related functions appear to be at least as widespread as valency-related ones, if not more. This is the case for several Athabaskan languages, which feature a middle prefix -d- (Rice 2000: 178–199). This prefix has a variety of functions (the precise inventory of which varies from language to language). Valency-related functions include passive, anticausative, reflexive, and reciprocal. In addition, -d- is associated with a wide range of non-valency-related oppositional functions, which include (the list is non-exhaustive) the iterative, the ‘errative’, and the perambulative functions, as shown in (23) to (25). The peculiarity of Athabaskan is that in individual languages non-valency-related functions of -d- taken together often outnumber valency-related ones.

Slave (slav1253, Eyak-Athabaskan, Athabaskan; Rice 2000: 189)
’a-ra-ne- d -le
‘You do it again’
Ahtena (ahte1237, Eyak-Athabaskan, Athabaskan; Rice 2000: 189)
kah- n -es-daa
‘S/he accidentally spoke it’
Koyukon (koyu1237, Eyak-Athabaskan, Athabaskan; Rice 2000: 190)
kk’o-ts’ee- d e-daaA
‘We are travelling around’

4.2.2 Non-oppositional middles

As discussed in Section 3.1, MVSs by definition include a group of (often monovalent) verbs that exclusively take middle marking, i.e. non-oppositional middles. Existing studies essentially follow Kemmer (1993) and describe non-oppositional middles in terms of the situation types that they express.

Unfortunately, the cross-linguistic study of non-oppositional middles faces some major challenges. As Haspelmath puts it, when it comes to non-oppositional verbs “their occurrence is strongly lexically determined [and] cross-linguistic comparison is not easy” (Haspelmath 2003: 224). This is also an empirical difficulty: non-oppositional verbs are usually only briefly mentioned, if at all, in grammars, and this information is usually not systematically stored in dictionaries either. This seriously hampers their correct individuation (Kemmer 1993: 22). The goal of this section is not to provide a systematic new classification of non-oppositional middles, but rather to explicitly problematize these issues in more detail.

To better illustrate these challenges, I apply the traditional classification into situation types to a subset of MMs in my language sample and then discuss its shortcomings. This smaller sample consists of 29 languages for which data on non-oppositional middles is available (see Appendix II in Supplementary material; even though this sample is smaller than Kemmer’s, it is more genetically varied). Even within this smaller sample, the available data is unbalanced, ranging from the nine non-oppositional middles documented in Warungu (waru1263; Tsunoda 2006) to the 210 of Somali (soma1255; Saeed 1995).[22]

I have classified non-oppositional middles in these languages following Kemmer’s checklist for middle situation types (1993: 267–270) with a few additions. Situation types together with examples of the typical verb meanings that belong to each class are given in Appendix II in Supplementary material. The results of the classification are given in Table 7, which gives the frequency of the situation types across languages, and Table 8, which reports how many verb types belong to each situation type in the sample.[23] This is a novelty as compared to Kemmer’s work, where data on the frequency of individual situation types is absent.

Table 7: Non-oppositional verbs: situation type frequency.

Situation type Languages
Spontaneous events 27
Translational motion 27
Emotion 23
Body processes 22
Non-translational motion 19
Natural reciprocal 19
Controlled actions 18
Change in body posture 17
Grooming 16
Indirect middle 16
Speech 16
Deponent 14
Body actions 12
Cognition 11
Emotive speech 11
Perception 10
Position 10
State [-animate] 10
State [+animate] 9
Emission 8
Chaining 7
Weather 6
Body state 5
Self-directed actions 5
Modality 2

Table 8: Non-oppositional verbs: verb frequency.

Situation type Verb types
Spontaneous events 212
Natural reciprocal 166
Translational motion 150
Emotion 124
Body processes 100
Grooming 71
Indirect middle 59
State [-animate] 52
Change in body posture 51
Controlled actions 50
Cognition 46
Body actions 38
Non-translational motion 37
Emission 32
Deponent 29
Speech 28
Body state 21
State [+animate] 20
Emotive speech 20
Perception 18
Weather 18
Position 16
Chaining 14
Self-directed actions 11
Modality 3

Table 7 shows that not all situation types are equally represented in the sample, and this data is roughly matched by the type frequency of verbs belonging to each situation type in Table 8. A striking result is that spontaneous events, which are a marginal class in Kemmer’s typology, and verbs of translational motion, rank highest both in terms of number of languages and of verbs. By contrast, verbs of grooming and non-translational motion, which are described in Kemmer (1993: 53–56) as constituting the semantic core of the middle domain, are in general less predominant. Another interesting finding is that in several languages, MMs also occur with deponents, that is, highly transitive bivalent verbs like ‘break’ (Grestenberger 2016). This is surprising considering Kemmer’s characterization of middle verbs as semantically distinct from prototypical two-place events. Note that if a language has deponents, it also has monovalent non-oppositional middles.

This being said, the situation-type approach poses at least one major methodological issue. Specifically, it is not clear how the inventory of situation types should be established nor how many and how fine-grained the classes should be. This issue pertains to both the level of language-specific description and cross-linguistic comparison.

As anticipated in Section 2.1, the general idea is that the individuation of situation types follows a distributional criterion: two situation types count as distinct if there is at least one language that expresses them differently. In practice, however, in Kemmer (1993) and especially in most of the subsequent scholarship, the grouping of verbs is largely done impressionistically based on semantic similarity. A rigorous application of the distributional methodology shows that current classifications, like the one in Table 7, are highly problematic. For example, there is no language in the sample in which non-oppositional middles include spontaneous events but not translational motion, or vice versa. Therefore, it is not clear whether establishing a distinction between spontaneous events and translational motion events is of any use for cross-linguistic comparison.

Linked to this issue is also the question of the granularity of situation types. Again, preliminary results from the sample illustrate this point nicely. A closer observation of semantically similar classes of verbs reveals the existence of a few implicational scales, as in (26):

Implicational scales in semantic classes of non-oppositional verbs
a. translational motion (‘go’) > non-translational motion (‘turn’), change in body posture (‘kneel’), position (‘stand’)
b. emotion (‘love’) > cognition (‘know’)
c. spontaneous events (‘melt’) > state [+/- animate] (‘live’, ‘be flat’), weather (‘thunder’), body process (‘vomit’), emission (‘glitter’)
d. body process (‘vomit’) > body state (‘be hungry’)
e. grooming (‘dress oneself’) > self-directed (‘hit oneself’)
f. natural reciprocal (‘fight’) > chaining (‘follow each other’)

The scales in (26) can be read as follows. If a language has a MM associated with a class of non-oppositional middles on the right, the same MM is also found with the class(es) on the left. As an example, this means that languages may only feature non-oppositional translational motion middles (e.g. Tuvinian), but there will be no language that has change-in-body-posture middles but not translational motion middles. The existence of such scales casts doubt on the need for finer-grained classifications. Given that e.g. body posture middles always co-exist alongside translational motion middles, what is the advantage of setting up two distinct classes for the purpose of cross-linguistic comparison? In addition, even if one were to successfully individuate discrete situation types, the problem remains as to how to consistently assign individual verbs to specific classes (Saeed 1995; also Inglese 2020: 16–17).
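
The reading of the scales given above amounts to a simple check over a language-by-situation-type table. The following minimal sketch illustrates that check. The language names and their situation types are invented toy data, and the function name `scale_holds` is a hypothetical helper, not part of any existing tool.

```python
# Hypothetical toy data: language -> set of situation types attested among
# its non-oppositional middles. Invented for illustration only.
languages = {
    "Lang1": {"translational motion", "emotion"},
    "Lang2": {"translational motion", "change in body posture"},
    "Lang3": {"emotion", "cognition", "translational motion"},
}

def scale_holds(sample, left, right):
    """Test the scale `left > right`: every language that has the class on
    the right must also have the class on the left.
    Returns (holds, counterexamples)."""
    counterexamples = sorted(
        name for name, types in sample.items()
        if right in types and left not in types
    )
    return (not counterexamples, counterexamples)

# Scale (26a), restricted to two classes: holds in the toy data.
print(scale_holds(languages, "translational motion", "change in body posture"))

# Scale (26b): emotion > cognition holds, but the reverse ordering does not,
# because Lang1 has emotion middles without cognition middles.
print(scale_holds(languages, "emotion", "cognition"))
print(scale_holds(languages, "cognition", "emotion"))
```

A single counterexample language is enough to falsify a scale, which is why the function returns the offending languages alongside the verdict.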

Overall, the survey of non-oppositional middles in the sample shows how even the traditional classification into situation types, if conducted over an empirically richer basis, may lead to clearer results as compared to earlier works. This calls for a more systematic large-scale testing. However, I have also pointed out the need for better categorization of non-oppositional middles, which is beyond the scope of the paper. Given the combination of the empirical and the methodological issues, the exhaustive discussion of non-oppositional verbs must be postponed to future study.

4.3 The relationship between oppositional and non-oppositional middles

In Section 4.2, I have examined the meaning of MMs by keeping oppositional and non-oppositional verbs distinct. In this section, I elaborate on the relationship between the two classes and their contribution to the overall shape of MVSs.

4.3.1 The distribution of oppositional and non-oppositional middles

A great deal of variation can be observed in the actual make-up of MVSs with respect to the ratio of oppositional and non-oppositional middles. Unfortunately, the necessary evidence on type and/or token frequency of middle verbs belonging to the two classes is not often reported in existing descriptions of MMs. Therefore, until detailed cross-linguistic corpus data is available, any classification of MVSs based on the ratio of oppositional/non-oppositional verbs must remain tentative.

Nevertheless, even the limited available evidence suggests that the ratio of oppositional and non-oppositional middles forms a continuum along which MVSs can be variously placed, as shown in Figure 2.

Figure 2: 
Ratio of oppositional versus non-oppositional verbs.

At one end of the continuum, one finds languages that feature a MM that is fully productive for oppositional functions but occurs with only a few non-oppositional verbs. Wambaya belongs to this type. Wambaya inflected verbs consist of independent forms carrying the verb’s lexical meaning and TAM auxiliaries which also host subject and object pronouns. Reflexive and reciprocal verbs are built by replacing the object pronoun of transitive verbs with a bound form -ngg- in object position, as in (27) (Nordlinger 1998: 142–143, 193–194).

Wambaya (nucl1328, Mirndi, Ngurlun; Nordlinger 1998: 142)
Janji gini- ngg -a Wagardbi
dog.I(nom) 3sg.m.a-mm-nf wash
‘The dog is washing himself.’

Verbs that take -ngg- are typically oppositional, except for three non-oppositional verbs: gurda ‘be sick’, jagina ‘lie’, barnamuluma ‘flash lightning’ (Nordlinger 1998: 185).

The opposite pole of the continuum features languages in which non-oppositional middles display a higher type frequency than oppositional ones. This is the case of Thulung (thul1246), where one finds 68 non-oppositional middles carrying the MM -si- as opposed to only 15 clearly oppositional ones (Lahaussois 2016). Wambaya and Thulung thus represent two ideal endpoints of the continuum in Figure 2. Languages that cluster towards the Wambaya pole are for example Warungu, where the ratio of oppositional/non-oppositional verbs of the MM -gali is 24/1 (Tsunoda 2006). MMs of the Thulung-type are for example Iraqw -t-, which is found with 12 oppositional verbs versus 90 non-oppositional ones (Mous and Qorro 2000), and the Old Hittite Middle Inflection with a 7/28 ratio (Inglese 2020: 220). One also finds several intermediate types with a more balanced ratio. MMs of this type are Mizo (lush1249) in-, with a 58/44 oppositional/non-oppositional ratio (Chhangte 1993), and Seereer (sere1260) -u-/-oox-, with a 32/29 ratio (Faye and Mous 2006).
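
One possible way to place these MMs on a single scale is to convert each oppositional/non-oppositional ratio into the share of non-oppositional verbs. The sketch below does this for the figures reported in this paragraph. It is an illustrative operationalization only, not necessarily the measure underlying Figure 2, and the function name is an assumption.

```python
# (oppositional, non-oppositional) figures as reported in the paragraph above.
ratios = {
    "Warungu": (24, 1),
    "Thulung": (15, 68),
    "Iraqw": (12, 90),
    "Old Hittite": (7, 28),
    "Mizo": (58, 44),
    "Seereer": (32, 29),
}

def non_oppositional_share(opp, non_opp):
    """Share of non-oppositional verbs; 0 = fully oppositional pole,
    1 = fully non-oppositional pole. Unaffected by scaling of the ratio."""
    return non_opp / (opp + non_opp)

shares = {name: non_oppositional_share(*pair) for name, pair in ratios.items()}

# Order the MMs from the oppositional pole to the non-oppositional pole.
ordering = sorted(shares, key=shares.get)
for name in ordering:
    print(f"{name}: {shares[name]:.2f}")
```

Because the share depends only on the proportion between the two figures, it gives the same value whether a source reports raw counts or a reduced ratio.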

The existence of languages in which non-oppositional middles are clearly predominant counters Klaiman’s (1991: 105) claim that in middle marking languages oppositional middles outnumber non-oppositional ones. Moreover, it also challenges the idea that non-oppositional middles must be thought of as idiosyncratic lexicalizations of oppositional ones (thus e.g. Haspelmath and Müller-Bardey 2004: 1139). As a matter of fact, MMs may also be productively used as verbalizers to form denominal/deadjectival non-oppositional verbs, as in (28) (this is typical of valency-changing morphology in general, Aikhenvald 2011b: 244–245):

a. South East Huastec (huas1242, Mayan, Huastecan Mayan; Kondik 2011: 129)
akal ‘night’ → akl- an ‘get dark’
b. Hamer-Banna (Afro-Asiatic, Omotic; Petrollino 2016: 150)
líkka ‘small’ → likk- im - ‘become small’
c. Seereer (Niger-Congo, Atlantic-Congo; Faye and Mous 2006: 99)
o tag ‘courtesan’ → dag- oox ‘behave as a courtesan’

Finally, there are languages in which middle-like marking is unproductive in all functions. Consider the Trumai prefix wa-, which occurs with a very limited number of verbs overall (Guirardello 1999: 359–365). With one verb, it functions as an oppositional autocausative, as in (29a), while with other verbs it optionally occurs with unpredictable semantic effects, as in (29b). Moreover, it also occurs with non-oppositional verbs, as in (29c).

Trumai (trum1247, Isolate; Guirardello 1999: 359–365)
a. kot̹’kan ‘bring together’ versus wa -kot̹’kan ‘come together’
b. chï ‘go’ versus wa -chï ‘go away’
c. wa -pata ‘arrive’

This narrow range of uses can be best explained historically if one considers that Trumai wa- most likely represents a case of a dying MM (this is why I have not included it in my sample). As Guirardello (1999: 356) puts it, “the distribution of the prefix wa- observed in the modern Trumai data is probably just a remnant of a middle voice marker.”

The case of Trumai is particularly instructive in that it shows how historical considerations may contribute to explaining why MVSs are drawn more towards either the oppositional or the non-oppositional pole. Specifically, the different observed synchronic patterns may in fact reflect different types or different stages in the development of MVSs. More diachronic research is needed to explore this possibility.

4.3.2 MVSs: is a unified account possible?

An adequate account of MMs within and across languages needs to take into consideration both oppositional and non-oppositional middles. This is not an easy task, because synchronically the motivations for the occurrence of MMs with the two classes are of a different nature.

Oppositional middles are essentially compositional. With these verbs, MMs carry a meaning of their own, which can be transparently combined with the verb’s meaning with different effects on the verb’s valency and/or semantics (Section 4.2.1). By contrast, non-oppositional middles “undoubtedly represent[s] a lexical phenomenon” (Say 2005: 255), because, in the absence of a non-middle counterpart, they “cannot be segmented into two meaning components, and the root and the valency-changing morphology are lexicalized together” (Haspelmath and Müller-Bardey 2004: 1139). This means one can only generalize over the lexical semantics of the non-oppositional verbs, and the MM can hardly be attributed any specific meaning per se.

To put it simply, oppositional verbs belong to the domain of grammar, while non-oppositional verbs are located towards the lexicon.[24] The main problem is that neither of the two components of MVSs can be entirely explained by resorting to the semantics of the other (see Section 3.2). It is true that there is a robust semantic match between some oppositional functions and non-oppositional verbs. For example, non-oppositional self-directed and grooming situations, spontaneous events and natural reciprocals are semantically close to oppositional reflexives, anticausatives, and reciprocals, respectively. In these cases, one could simply postulate a single semantics, e.g. reciprocal, which is lexically stored in non-oppositional verbs but is contextually triggered with oppositional verbs (thus e.g. Kemmer 1993: 102). However, this type of reasoning cannot be applied to the entire middle domain. Indeed, both groups also feature classes for which no such match can be readily identified. For example, oppositional passives are not obviously linked to any non-oppositional group, and conversely non-oppositional verbs of body processes and experiencer situations in general are not linked to any specific valency operation. The issue is how to reconcile the two components in a holistic account of MVSs.

Studies on individual languages often explain MVSs by resorting to an overarching prototypical middle semantics to which all middle verbs are more or less directly linked. The middle prototype has variously been cast in terms of higher subject involvement and/or subject affectedness (see e.g. Manney 2000 on Modern Greek; Maldonado 2000 on Spanish; Allan 2003 on Ancient Greek; Calude 2017 on Romanian) or noncanonical control (see Kaufmann 2007). Unfortunately, notions that may explain the distribution of MMs in one language may not as easily apply to others. For example, while subject involvement may be a good proxy for the middle prototype in Ancient Greek (Allan 2003), this does not hold for (Old) Hittite, where most middle verbs denote uncontrolled events (Inglese 2020).

Kemmer (1993) also essentially adopts a prototype model, whereby MMs are prototypically connected with a low degree of elaboration of events (see Section 2.1). A prototype account of MVSs is not in principle inconceivable, but I believe that the notion of elaboration of events as the semantic core of middle marking should be reconsidered.

To begin with, this notion falls short in accurately predicting the cross-linguistic occurrence of MMs. In fact, a privileged connection between MMs and reflexive/grooming and non-translational motion situations is not borne out by the data discussed in Section 4.2. On the contrary, it appears that cross-linguistically MMs are most conspicuously associated with anticausative/spontaneous events and with verbs of translational motion: these are monovalent verbs with only one participant, to which the notion of lower degree of elaboration arguably does not apply. The existence of non-valency-related oppositional functions also poses a challenge to Kemmer’s middle prototype: while some low transitivity functions may be reconciled with lower elaboration of events, the same does not hold for e.g. telicizing aspectual functions.

In addition, even if Kemmer’s prototype were descriptively accurate, the problem remains as to whether it can be taken as an adequate explanation for middle marking in general (see discussion in Næss 2007: 16–17; Inglese 2020: 192–196).

A promising alternative solution is offered by semantic maps. Indeed, the semantic map model is particularly well-suited to represent the polyfunctionality of MMs, as it avoids the need to set up a contrast between oppositional and non-oppositional middles and an overarching motivation for both, since “any type of meaning can be integrated in semantic maps” (Georgakopoulos and Polis 2018: 19). However, existing maps of the middle voice are not up to the task. Specifically, Kemmer’s (1993) semantic map of the middle, shown in Figure 3, despite still being adopted in the description of language-specific MVSs (Peterson 2011: 186 on Kharia; Dom et al. 2016 on Bantu languages), is not built according to the standard practice in the field and should therefore be employed with care.

Figure 3: The semantic map of the middle voice (adapted from Kemmer 1993: 202).

For reasons of space, an in-depth critique of Kemmer’s map cannot be pursued here. The main problem concerns how nodes are identified and linked. In current typological practice, nodes are placed in semantic maps if they constitute analytical primitives (Cysouw 2007, 2010), that is, “if [a node] cannot be subdivided into two (or more) meanings that are expressed by separate linguistic items in a given language” (Georgakopoulos and Polis 2018: 4). However, as discussed in Section 4.2.2, situation types and links among these in Kemmer’s map are mostly identified based on hypothesized semantic connections, and do not necessarily reflect patterns of cross-linguistic coding. As a result, the map does not necessarily comply with the semantic map connectivity hypothesis (cf. Croft 2003: 133).
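As an illustration of what compliance with the connectivity hypothesis amounts to, the following minimal sketch checks whether the functions covered by a single marker occupy a connected region of a map, which is how the hypothesis is commonly formulated. The map, its links, and the function sets are hypothetical toy values chosen for illustration only. They are drawn neither from Kemmer’s map nor from the sample discussed in this paper, and the helper `covers_connected_region` is likewise an assumption of this sketch.

```python
from collections import deque

# Hypothetical toy map: nodes are functions, edges are assumed links.
# Neither the links nor the function sets used below come from the paper's data.
TOY_MAP = {
    "reflexive": {"reciprocal", "anticausative"},
    "reciprocal": {"reflexive"},
    "anticausative": {"reflexive", "passive"},
    "passive": {"anticausative"},
}


def covers_connected_region(graph, functions):
    """Return True if `functions` induce a connected subgraph of `graph`."""
    functions = set(functions)
    if not functions:
        return True  # an empty set of functions is trivially connected
    unknown = functions.difference(graph)
    if unknown:
        raise KeyError(f"functions not on the map: {sorted(unknown)}")
    # Breadth-first search restricted to the nodes covered by the marker.
    start = next(iter(functions))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node] & functions:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == functions


# A marker covering adjacent nodes complies with the hypothesis on this toy map.
print(covers_connected_region(TOY_MAP, {"reflexive", "anticausative", "passive"}))
# A marker covering only two non-adjacent nodes does not.
print(covers_connected_region(TOY_MAP, {"reciprocal", "passive"}))
```

On such a representation, a map whose links are posited on semantic grounds alone offers no guarantee that attested markers will pass this check, which is the sense in which Kemmer’s map may fail to comply with the hypothesis.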

The main problem in successfully building a conceptual space of the middle domain is that, while oppositional functions can be easily identified and arranged in a network (Section, it is not clear how non-oppositional verbs should be integrated as nodes in the map (Section 4.2.2).

5 Conclusions

In this paper, I have set the stage for a new typology of the middle voice. First, I have argued that a more rigorous and explicit definition of middle marker (MM) is the essential prerequisite for cross-linguistic investigations of this domain. I have proposed to define MMs as constructions occurring with oppositional and non-oppositional verbs. Once properly defined, the middle voice can still be employed as a useful notion in typological studies.

Data from a 129-language sample shows that MVSs display a much more varied picture than previously thought. In particular, I have discussed four main parameters along which the variation of MMs can be classified: (i) morphosyntactic type; (ii) number of oppositional functions; (iii) semantics of non-oppositional verbs; (iv) ratio of oppositional/non-oppositional middles. Some correlations between these parameters can be detected (e.g. MMs instantiated by pronouns tend to have reflexivity among their oppositional functions and feature more oppositional than non-oppositional verbs), but in general these parameters do not cluster in such a way that the variation of MMs can easily be reduced to a few general and discrete types. Instead, these parameters build a complex multidimensional space in which individual MMs can be placed based on their often unique combinations of values.

Concerning the range of functions covered by MMs, I have pointed out that a better understanding can be achieved if one keeps oppositional and non-oppositional verbs distinct. Oppositional middles can successfully be typologized by resorting to existing comparative concepts of voice phenomena (Zúñiga and Kittilä 2019). I have provided empirical support for the widespread belief that middle marking expresses various valency-related functions, but I have also shown that oppositional MMs may display non-valency-related functions as well. By contrast, I have shown that the typologization of non-oppositional middles remains highly problematic due to methodological and empirical issues. Specifically, I have discussed how the traditional situation-type approach (cf. Kemmer 1993) falls short in individuating semantic classes that are meaningful for cross-linguistic comparison.

An important finding of this paper is that MMs are most conspicuously associated with anticausative/spontaneous events and with verbs of translational motion, and less so with grooming and non-translational motion situations, which in Kemmer’s (1993) view represent the semantic middle prototype. This evidence challenges Kemmer’s (1993) explanation that MMs cross-linguistically express a lower degree of elaboration of events. A similar critique was already formulated by Haspelmath (1995: 373), who observed that “low elaboration of events seems to be the best approximation if one wants a common meaning for all middle situations, but couldn’t it be that there is no real common meaning that all situation types share? The existing obvious similarities could be attributed to the fact that they arose by grammaticalization from the same marker” (also Holvoet 2020: 223–224).

Indeed, studies in source-oriented typology have repeatedly suggested that cross-linguistic regularities may also be explained by the historical processes that lead to the emergence of individual patterns rather than by the synchronic properties of the patterns in themselves (Cristofaro 2019; see Haspelmath 2019 for discussion). In Section 4.3.1, I remarked how historical considerations in some cases best explain the shape of individual MVSs. Following this line of reasoning, one wonders whether low elaboration of events is actually the best explanation of the polyfunctionality of MMs, or whether the similarities (and divergences) in the configuration of MVSs are ultimately mostly due to diachronic factors. I leave this question open for future study.

Corresponding author: Guglielmo Inglese

Funding source: FWO – Research Foundation Flanders 10.13039/501100003130

Award Identifier / Grant number: 12T5320N


I would like to thank Jean-Christophe Verstraete, Simone Mattiola, and Sebastian Dom for comments on earlier drafts of this paper. I also thank all colleagues who have kindly discussed specific points with me or have shared their data with me, as well as three anonymous reviewers whose comments have contributed greatly to improving this paper in both content and form. All remaining shortcomings are, needless to say, my own.

  1. Research funding: This paper is the result of research work carried out within the project “Towards a diachronic typology of middle voice systems” funded by the FWO – Research Foundation Flanders (grant no. 12T5320N).


