BY-NC-ND 4.0 license Open Access Published by De Gruyter Mouton October 24, 2019

Insights on the Greenberg-Sanches-Slobin generalization: Quantitative typological data on classifiers and plural markers

Marc Tang and One-Soon Her
From the journal Folia Linguistica


This paper offers quantitative typological data to investigate a revised version of the Greenberg-Sanches-Slobin generalization (GSSG), which states that (a) a language is unlikely to have both sortal classifiers and morphosyntactic plural markers, and (b) if a language does have both, then their use is in complementary distribution. Morphosyntactic plurals engage in grammatical agreement outside the noun phrase, while morphosemantic plurals that relate to collective and associative marking do not. A database of 400 phylogenetically and geographically weighted languages was created to test this generalization. The statistical test of conditional inference trees was applied to investigate the effect of areal, phylogenetic, and linguistic factors on the distribution of classifiers and morphosyntactic plural markers. The results show that the presence of classifiers is affected by areal factors as most classifier languages are concentrated in Asia. Yet, the low ratio of languages with both features simultaneously is still statistically significant. Part (a) of the GSSG can thus be seen as a statistical universal. We then look into the few languages that do have both features and tentatively conclude that part (b) also seems to hold but further investigation into some of these languages is needed.

1 Introduction

The dichotomy between classifier languages and plural-marking languages is commonly characterized as follows: in a plural-marking language such as English, count nouns are quantified by numerals directly (1a) or by the use of a measure term (1b), while mass nouns generally occur with a measure term (e.g., five bottles of water). In English, the quantification process thus involves plural marking that is reflected on the noun and the verb, cf. a boy is sitting inside and two boys are sitting outside. In contrast, in classifier languages, a classifier (word or morpheme) is generally required when a numeral is employed in the quantification of a noun, i.e., “classifiers occur within ‘pseudopartitive’ constructions, which consist of a specifier (numeral, quantifier or determiner), classifier and noun” (Kilarski 2013: 33–34).

Classifiers are thus generally referred to as ‘numeral classifiers’ due to their co-occurrence with numerals. As demonstrated with Mandarin examples in (1c)–(1d), numeral classifiers come in two varieties: sortal classifiers apply to count nouns (1c) and mensural classifiers may apply to count nouns (1d) and/or mass nouns, e.g., wu ping shui (five mens.bottle water) ‘five bottles of water’. However, plural in Mandarin is not marked morphosyntactically as it is in English, cf. yi zhi mao (one clf.animal cat) ‘one cat’ vs. san zhi mao (three clf.animal cat) ‘three cats’. Further details and differentiation between different types of plural are discussed in Section 2.


(1) Sortal and mensural classifiers in Mandarin compared with English

a. five books
   ‘five books’
b. five boxes of books
   ‘five boxes of books’

It is important to differentiate syntactically between ‘measure terms’ observed in plural-marking languages like English (1b) and ‘mensural classifiers’ found in classifier languages like Mandarin (1d). The two are sometimes confused due to their similar semantic functions. Syntactically, the English measure terms are nouns: they take plural marking and require the preposition ‘of’, e.g., three bottles of wine, just as nouns do in constructions serving other functions, e.g., three bottles of good quality. Mensural classifiers, by contrast, quantify the noun directly without an intervening preposition, e.g., san ping jiu (three mens.bottle wine) ‘three bottles of wine’. Moreover, in genuine classifier languages that do have morphosyntactic plural markers, mensural classifiers do not take such plural marking, e.g., Hungarian (Csirmaz and Dékány 2010: 13) and Armenian (Borer 2005: 94–95). [1] Mensural classifiers and sortal classifiers are therefore two subcategories of numeral classifiers, a syntactic category distinct from nouns (for a detailed discussion, cf. Kilarski 2013: 35).

Within the formalist literature, the classifier phrase is generally considered to be right-branching (e.g., Li 2014), as in (2a), whereas the traditional view, e.g., Greenberg (1990), assumes a left-branching structure, as in (2b), which is also argued for by some formalists, e.g., Her (2012b, 2017). Note that this paper focuses on sortal classifiers, not mensural classifiers.


(2a) The right-branching approach to C/M phrases

(2b) The left-branching approach to C/M phrases
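The difference between (2a) and (2b) is purely one of constituency. As a minimal sketch, assuming the Mandarin phrase wu ping shui ‘five bottles of water’ from Section 1 as the example, each analysis can be encoded as nested tuples; the helper `bracket` is hypothetical and only renders the grouping in bracket notation.

```python
from typing import Tuple, Union

# A constituent is either a single word or a tuple of sub-constituents.
Constituent = Union[str, Tuple["Constituent", ...]]

def bracket(node: Constituent) -> str:
    """Render a constituent in (unlabelled) bracket notation."""
    if isinstance(node, str):
        return node
    return "[" + " ".join(bracket(child) for child in node) + "]"

# (2a) right-branching: the numeral combines with a [C/M + N] constituent.
right_branching = ("wu", ("ping", "shui"))
# (2b) left-branching: [Num + C/M] forms a constituent that then combines with N.
left_branching = (("wu", "ping"), "shui")

print(bracket(right_branching))  # [wu [ping shui]]
print(bracket(left_branching))   # [[wu ping] shui]
```

The same three words receive two different groupings, which is exactly what the two tree diagrams contrast.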

The use of sortal classifiers for count nouns in some languages and the use of plural markers in others have led to two divergent theoretical views: the typological approach versus the universalist approach (Ghomeshi and Massam 2012: 2). Under the typological approach, which comes in different versions, classifier languages, unlike plural-marking languages, either do not make the mass-count distinction or only make this distinction semantically, but not syntactically, and therefore do not allow nouns to be quantified by numerals directly without classifiers (e.g., Allan 1977; Bale and Coon 2014; Chierchia 1998; Hansen 1983; Krifka 1995; Link 1998; Zhang 2012). Under this view, nouns in classifier languages are in some sense all mass nouns.

Under the universalist approach, however, sortal classifiers and plural markers are unified under one grammatical category (e.g., Borer 2005; Borer and Ouwayda 2010; Cowper and Hall 2012; Doetjes 2012; Her 2012a; Mathieu 2012; Nomoto 2013; T’sou 1976). This view is based on the initial observation by Greenberg (1990) and Sanches and Slobin (1973) that languages with obligatory sortal classifiers do not have compulsory plural markers, and vice versa. This observation is also known as the ‘Greenberg-Sanches-Slobin Generalization’ (GSSG). The universalist view thus suggests that, in spite of the apparent differences in their functions, the two are different manifestations of the same grammatical feature. Under this view, the mass-count distinction is recognized in both types of languages, where the use of a sortal classifier is analogous to that of a plural marker (Aikhenvald 2000; Borer and Ouwayda 2010; Cheng and Sybesma 1998; Jenks 2017; Yi 2011). Some researchers argue that the distinction is universally lexical (e.g., Her 2012a), while others claim that it can only be made syntactically (e.g., Borer 2005).

Map 1 is based on data from the World Atlas of Language Structures (WALS) and shows the general distribution of languages with sortal classifiers and those with nominal plural markers. The former category is concentrated in East and Southeast Asia, extending westwards to Iran and Turkey and eastwards through the Indonesian Archipelago into the Pacific, Micronesia, New Caledonia and western Polynesia. [2] Only sporadic cases are found in other regions of the world: West Africa, the Pacific Northwest, Mesoamerica, and the Amazon basin (Gil 2013). Languages with plural markers are located in western and northern Eurasia and in most parts of Africa. They extend out to Southeast and East Asia and Australia, but are almost non-existent in New Guinea (Haspelmath 2013). Little overlap is found between the classifier areas and the plural areas, suggesting a potential complementarity of the two features.


Map 1: Languages with sortal classifiers and nominal plural markers, adapted from Gil (2013) and Haspelmath (2013).

This map includes all types of nominal plurals covered in Haspelmath (2013), i.e., both obligatory and optional nominal plural markers, whether on human nouns only or on all nouns. This study does not apply the same definition of plural marker as Haspelmath (2013); the map is included only to give a general overview based on previous studies.

No large-scale study has yet examined, qualitatively and quantitatively, whether this assumed complementary distribution actually holds. This paper aims to fill this gap via the following steps. Qualitatively, we further specify the theoretical definitions of plural markers and classifiers. We define plural markers as a morphosyntactic feature (Doetjes 2012; Kibort and Corbett 2008). Specifically, the plural markers in complementary distribution with sortal classifiers should only involve morphosyntactic plural markers that engage in number agreement with another element, e.g., the verb, as a formal requirement. For example, in the two French clauses il est ici (3sg.masc be.3sg here) ‘he is here’ and ils sont ici (3pl.masc be.3pl here) ‘they are here’, the two different forms of the verb are due to grammatical number agreement with their respective subjects. Previous studies have commonly included both morphosyntactic and morphosemantic plural markers in their discussion. However, we follow the line of thinking that the mere presence of morphosemantic plural markers that only involve nominal plural marking (as in the WALS data shown in Map 1) is not sufficient for a language to count as plural marking for the purpose of the GSSG. Further explanation is provided in Section 2.

Quantitatively, we provide typological data obtained from a phylogenetically and geographically weighted dataset of 400 languages worldwide, and investigate the actual distribution of sortal classifiers and morphosyntactic plural markers in the languages of the world using conditional inference trees (Breiman 2001). As a disclaimer, the main goal of this study is to validate the GSSG, not necessarily the universalist view that attempts to account for it. Our data thus only allow us to reject the null hypothesis of no association between sortal classifiers and morphosyntactic plural marking. Moreover, this association is treated as a statistical universal rather than an absolute universal, so exceptions are expected.

This paper contains six sections. In Section 2, we provide a brief review of the GSSG and distinguish between morphosyntactic and morphosemantic plural markers. Section 3 describes how our data were gathered and introduces conditional inference trees. In Section 4, we visualize the results of our statistical analysis. The main findings are discussed in Section 5, and Section 6 concludes the paper.

2 Two versions of the GSSG and two kinds of plural marking

The possible complementary distribution of sortal classifiers and plural markers was first put forth in Greenberg (1990), who cites an unpublished paper by Mary Sanches, later published as Sanches and Slobin (1973). The original text is repeated below; note that in the following quote, ‘numeral classifiers’ refers specifically to sortal classifiers.

If a language includes in its basic mode of forming quantitative expressions numeral classifiers, then it will also have facultative expression of the plural. In other words, it will not have obligatory marking of the plural on nouns and as a result the classified noun itself is normally […] not marked for number. (Greenberg 1990: 177)

The original version of the GSSG does not claim explicitly that sortal classifiers and plural markers are mutually exclusive on a noun and are collectively exhaustive (Greenberg 1990; Sanches and Slobin 1973). First, complementary distribution is defined as a condition where two elements never appear in the same context (Fromkin et al. 2011). That the two elements do not co-occur in the same context does not entail that one of the two will appear in this context. A classifier language generally lacks number marking, but it is certainly not the case that languages without obligatory number marking on the noun all have classifiers (Doetjes 2012: 2566). Second, the GSSG states that if a language requires classifiers with numerals, then morphological number is not expressed, and if a language has obligatory number marking, it does not have classifiers (Scontras 2013: 560). Classifier languages thus generally do not have compulsory expression of nominal plurality, but they may have facultative expression (Greenberg 1974: 25). Therefore, a strong version and a weak version of the GSSG can be derived as follows.

The study of nominal classifier systems suggests an important hypothesis that the use of nominal classifiers and the use of the plural morpheme are in complementary distribution in natural language. More concretely, it suggests that either a) a natural language has either nominal classifiers or plural morphemes, or b) if a natural language has both kinds of morphemes, then their use is in complementary distribution. (T’sou 1976: 1216)

The strong version (a) is easily rejected, since languages with neither classifiers nor plural markers are commonly attested, e.g., Karitiana (Muller et al. 2006) and Dëne (Wilhelm 2008). The weak version (b) predicts that in languages with both systems, a classifier does not co-occur with a plural marker in the same construction. This version is argued for by syntacticians such as Borer (2005), Her (2012a), and Jenks (2017), who claim that classifiers and plural markers belong to the same syntactic category. Both versions warrant further investigation: although languages with both sortal classifiers and obligatory nominal plural markers are attested, none of them seems to constitute a clear case of obligatory number marking together with a classifier on the noun (Doetjes 2012), e.g., Kana (Bisang 2012; Ikoro 1994, 1996), Northern Kam (Gerner 2006: 243–244), Hungarian (Csirmaz and Dékány 2010: 13), Persian (Ghomeshi 2003: 55–56), Nivkh, Ejagham, Southern Dravidian languages (Aikhenvald 2000), and Mandarin, among others (Aikhenvald 2000: 100–101; Allan 1977: 294). As Zhang (2013: 153) notes, it is intriguing that even a prototypical classifier language like Mandarin [3] presents an apparent counterexample to the GSSG: the morpheme -men in Mandarin is often interpreted as a plural marker that may occasionally occur with classifiers, e.g., san wei laoshi-men (three clf.person teacher-pl) ‘the three teachers’. If even such a prototypical classifier language were an exception, the weak version of the GSSG would have to be rejected as well. Further qualitative and quantitative analyses are therefore required.

One of the main issues involves the theoretical definition of the classifiers and plural markers that should be included in the GSSG. Sortal classifiers are relatively well documented and theoretically differentiated from similar systems such as noun classifiers and noun classes, among others (Aikhenvald 2000; Dixon 1986; Grinevald 2015). Sortal classifiers are additional grammatical elements (usually bound or free morphemes) occurring in constructions of nouns and numerals, serving to divide the inventory of count nouns into semantic classes (Aikhenvald 2000; Gil 2013; Grinevald 2015). For example, in Assamese (Indo-Aryan), the classifier is a bound morpheme attached to the numeral, e.g., du-khͻn train (two-clf.flat train) ‘two trains’, while in Mandarin the classifier is a clitic that requires a c-commanding host, and in Cantonese the classifier is a free morpheme (Her et al. 2015).

With regard to plural markers, previous studies on the GSSG mostly considered nominal plural marking (Kim and Melchin 2018). Thus, the occurrence of overt plural marking on the noun would allow a language to be labeled as plural marking (Haspelmath 2013). In this study, we suggest that Kibort and Corbett’s (2008) differentiation of morphosyntactic and morphosemantic plurals should be taken into account, and that only morphosyntactic plural markers should be included in the GSSG. Plural markers as a morphosyntactic feature participate in grammatical agreement. As an example, the French grammatical plural marker -s is marked on nouns, while articles, adjectives, and verbs also agree with the noun in number, e.g., (3a) for singular and (3b) for plural.


(3) Morphosyntactic plural marking in French (Indo-European)

a. ‘The new student (female) is smart.’

b. ‘The new students (female) are smart.’

If the plural is only marked on the noun but is not involved in number agreement with another element as a formal requirement, it is regarded as a morphosemantic feature in the language. Plural markers that only exist as “number morphology on a demonstrative or number marking by means of an independent morpheme” (Doetjes 2012: 2566) thus should be excluded from the GSSG. In other words, the sheer presence or absence of a plural marker itself is not sufficient to determine the formal status of grammatical number, which is manifested via number agreement as a formal feature and thus not simply due to semantic compatibility. For example, the different forms is and are of the verb be in he is a student and they are students are due to a formal requirement of number agreement between the verb and the subject, as there is nothing inherently singular or plural in the semantics of the verb be; however, the anomaly of *he is students and *they are a student is simply due to the semantic incompatibility between the two noun phrases involved. The former scenario is number agreement as a formal feature; the latter case is not.

Number agreement as a formal feature does not necessarily involve an NP-external element as in subject-verb agreement. In Swedish, for example, adjectives show number agreement with the noun in the predicative context, e.g., en bok är populär ‘a book is popular’ and två böcker är populära ‘two books are popular’, as well as in the attributive context, e.g., en populär bok ‘a popular book’ and två populära böcker ‘two popular books’. Such number agreement exists as a formal feature, in spite of the few exceptional or irregular adjectives that do not vary in form in terms of number, e.g., böckerna är gratis ‘the books are free’ with no plural marking on gratis ‘free’. Take also sheep in English for example. If the presence of a plural marker were the sole criterion to define plural, we would count sheep in three sheep as singular, since the noun sheep is not overtly marked for number in the noun phrase. However, number agreement reveals its number as a formal feature, as in the sheep is sleeping and three sheep are sleeping. Thus, three sheep and three goats differ only in the sense that the plural marker in the former is a zero allomorph.

Previous studies on the GSSG have generally assumed that markers that are semantically plural are also categorically identical. Such a non-discriminatory approach misses the important insight that morphosyntactic plurals constitute a formal feature in grammar but morphosemantic plurals do not. Note that, even though morphosyntactic plurals in general also serve the semantic function of marking plurality, morphosemantic plurals do not participate in number agreement as a formal feature. The non-discriminatory approach also makes plural marking nearly ubiquitous: according to Dryer (2013), only about ten percent of languages have no plural marking (9.2%, or 98/1066), and a similar percentage is found in Haspelmath (2013) (9.6%, or 28/291). In this paper, the distinction between morphosyntactic and morphosemantic plurals differentiates between plural markers as functional heads and those as modifiers (cf. inflectional vs. non-inflectional), whose complementarity is syntactically conditioned (Wiltschko 2008: 666). Sortal classifiers, on the other hand, are not expected to show such concord (Aikhenvald 2000; Grinevald 2000; Senft 2000), since they function both syntactically, to mark countability, and semantically, to classify count nouns.

This approach allows us to filter out morphosemantic plurals such as collective markers and associative markers that do not denote plurality in the narrow sense (Rijkhoff 2000: 240; Vogel and Comrie 2000). As an example of collectivity, the marker -men in Mandarin highlights the homogeneous group feature of the referents and does not refer to additive plurality as common plural markers do (Lo 2015), e.g., laoshi-men (teacher-pl) ‘the (group of) teachers’. A similar situation is observed for associative plurals carrying the meaning ‘X and other people associated with X’, e.g., Japanese Tanaka-tachi ‘Tanaka and his associates’ (Daniel and Moravcsik 2013). In neither case do the plural markers exhibit grammatical agreement, as shown in (4) with the collective marker -men in Mandarin.


(4) Absence of morphosyntactic plural marking in Mandarin (Sino-Tibetan)

a. ‘This girl speaks Mandarin.’

b. ‘These girls speak Mandarin.’

Mandarin is thus not an exception to the GSSG as it only possesses morphosemantic plural markers. Similar cases are observed in other languages within our dataset, such as Remo (Austro-Asiatic), where a plural-marked noun is still morphosyntactically singular in terms of verb agreement (Anderson and Harrison 2008: 570).

To summarize, the revised GSSG states that (a) a language is unlikely to have both sortal classifiers and morphosyntactic plural markers, and (b) if a language does have both, then their use is in complementary distribution. This more restricted GSSG should be more plausible, and we test it empirically. We first test the strong version (a) and then examine the exceptions to (a) to see whether the weak version (b) holds.
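Parts (a) and (b) are claims of different kinds: (a) concerns how many languages in a sample have both features, while (b) concerns constructions inside a single language. The sketch below separates the two checks under stated assumptions: the record fields (`has_sortal_clf`, `has_msyn_plural`, `constructions`) and the toy languages "A" to "D" are hypothetical and are not the paper's coding scheme or data.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Language:
    name: str
    has_sortal_clf: bool    # sortal classifiers present
    has_msyn_plural: bool   # morphosyntactic (agreeing) plural present
    # Each attested construction is recorded as the set of markers it contains,
    # e.g. {"clf"}, {"pl"}, or {"clf", "pl"}.
    constructions: List[Set[str]] = field(default_factory=list)

def ratio_with_both(sample: List[Language]) -> float:
    """Part (a): share of languages that have both features."""
    both = sum(1 for lang in sample if lang.has_sortal_clf and lang.has_msyn_plural)
    return both / len(sample)

def complementary(lang: Language) -> bool:
    """Part (b): in a language with both features, no construction combines them."""
    if not (lang.has_sortal_clf and lang.has_msyn_plural):
        return True  # (b) only constrains languages that have both
    return all(not {"clf", "pl"} <= c for c in lang.constructions)

sample = [
    Language("A", True, False, [{"clf"}]),
    Language("B", False, True, [{"pl"}]),
    Language("C", True, True, [{"clf"}, {"pl"}]),
    Language("D", True, True, [{"clf", "pl"}]),
]
print(ratio_with_both(sample))               # 0.5
print([complementary(l) for l in sample])    # [True, True, True, False]
```

A low ratio from such a count is not by itself evidence for (a); the paper assesses it statistically while taking area and genus into account.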

3 Methodology

In this section, we first explain how the data were gathered, present the general methodology, and list the geographical, phylogenetic, and linguistic features included in the dataset; we then give an overview of the model (conditional inference trees) used in our statistical analysis.

3.1 Data

We apply a geographically and phylogenetically weighted language sample based on the methodology of the WALS. It is weighted in the sense that both linguistic and geographical diversity are taken into consideration. For instance, as the Austronesian family accounts for about 17.14% (1262/7363) of the languages in the world (Lewis et al. 2009), a similar ratio is applied in the dataset (19.00%, 76/400). Likewise, with regard to spatial distribution, since the Pacific region contains 18.74% (1380/7363) of the world's languages, a comparable proportion is represented in the database (18.50%, 74/400). The same method is applied within the subgroups of each language family and within smaller geographical regions. We are aware that a set of 400 languages is not a perfectly representative (though still acceptable) sample of the more than 7,000 languages worldwide, and that such data may also suffer from a lack of independence, since most languages cluster into large language families. However, we consider it adequate for the current aim, which is to provide preliminary quantitative data on the distribution of sortal classifiers and plural markers. Moreover, conditional inference trees are expected to detect informative patterns through permutations, without the conventional distributional assumptions about the data (further details in Section 3.2). The languages included in our study are displayed in Map 2.
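The proportional weighting reduces to simple arithmetic: a group's target quota is its share of the world's languages multiplied by the sample size. The sketch below uses only the figures quoted above (Lewis et al. 2009); the function name `proportional_quota` is hypothetical, and the paper does not state how quotas were rounded, so the code reports them unrounded.

```python
WORLD_TOTAL = 7363   # languages listed in Lewis et al. (2009)
SAMPLE_SIZE = 400    # languages in the dataset

def proportional_quota(group_count: int, world_total: int = WORLD_TOTAL,
                       sample_size: int = SAMPLE_SIZE) -> float:
    """Number of sample languages a group would receive under strict proportionality."""
    return group_count / world_total * sample_size

for label, count, in_sample in [("Austronesian", 1262, 76), ("Pacific", 1380, 74)]:
    quota = proportional_quota(count)
    print(f"{label}: world share {count / WORLD_TOTAL:.2%}, "
          f"strict quota {quota:.1f}, sample {in_sample} ({in_sample / SAMPLE_SIZE:.2%})")
```

Running it prints a strict quota of about 68.6 for Austronesian and 75.0 for the Pacific region, so the sample figures track the world proportions without matching them exactly, consistent with the text's "similar ratio".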


Map 2: The 400 languages selected for the dataset.

Our data sources for each language are existing reference grammars; we cross-check the language examples against the definitions we apply, and consult linguists working on the languages in question in case of difficulty. Databases such as the WALS are only used as optional references, as our interpretation may diverge from the annotations of other researchers. For instance, even though the definition of sortal classifiers is already established in the relevant literature (Aikhenvald 2000; Dixon 1986; Grinevald 2015), they still “go by an exasperating variety of names” (Blust 2009: 292) within language descriptions (e.g., classifiers, words of measure, quantifiers, unit words, numeratives, projectives). For example, Tsimshian (Penutian) is coded as having classifiers in Gil (2013), but the literature points out that the language instead has different sets of numbers for counting different things (Dunn 1979: 38–39). As another example, several South American languages such as Tuyuca (Tucanoan) and Miraña (Huitotoan) are annotated as classifier languages in the WALS (Gil 2013). However, evidence from reference grammars and statements in previous studies show that these languages have a concordial system that is better regarded as grammatical gender than as sortal classifiers under our definition (Derbyshire and Payne 1990: 256). As shown in (5) with Miraña, the general class marker (GCM) is present on the noun, numeral, and verb. Treating these morphemes as sortal classifiers is therefore inadequate for our study, since they have “a uniform pattern that shares many characteristics with canonical agreement (Corbett 2003a, 2003b, 2003c)” and therefore have “little in common with numeral classifiers” (Seifart 2005: 13).

As a result, languages with a status similar to that of Miraña are labelled as non-classifier languages in our database.


(5) Concordial markers in Miraña

a. ‘one man’

b. ‘he fell, the man’

(Seifart 2005: 158)
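The exclusion of Miraña-type systems can be read as a decision rule over the hosts on which a noun-categorizing marker is realized. The sketch below is a deliberately simplified, hypothetical encoding (the host labels and the function `code_categorizer` are illustrative, not the paper's coding procedure): a marker realized in numeral constructions only is coded as a sortal classifier, while one that also surfaces on the verb, in an agreement-like pattern, is coded as concord and the language as non-classifier.

```python
from typing import Set

def code_categorizer(hosts: Set[str]) -> str:
    """Code a noun-categorizing marker from the hosts on which it is realized.

    Simplified, hypothetical rule modelled on the Miraña discussion:
    - not realized in numeral constructions       -> outside the scope of this check
    - realized with numerals and also on the verb -> concord (non-classifier)
    - realized with numerals only                 -> sortal classifier
    """
    if "numeral" not in hosts:
        return "not a numeral classifier candidate"
    if "verb" in hosts:
        return "concord (non-classifier)"
    return "sortal classifier"

# Assamese -khͻn attaches to the numeral only (see Section 2).
print(code_categorizer({"numeral"}))                  # sortal classifier
# Miraña's general class marker appears on noun, numeral, and verb, as in (5).
print(code_categorizer({"noun", "numeral", "verb"}))  # concord (non-classifier)
```

Real coding decisions rely on the fuller set of canonical-agreement properties cited above; the verb criterion here is only a proxy.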

With regard to morphosyntactic plural markers, we follow previous studies and further distinguish between ‘number agreement’ and ‘cross-referencing’ (Klamer 1998: 60–61; Nichols 1986). Both number agreement and cross-referencing indicate grammatical properties of the NP(s) on an element related to the NP(s), e.g., the verb. However, number agreement implies that the verb plus its agreement marker(s) cannot occur without the agreeing NP(s), and the NP(s) cannot occur without being in agreement with the verb. In cross-referencing languages, the NP(s) are optional as apposition to the pronominal markers on the verb, i.e., the verb plus the pronominal marker can occur without the dependent NP(s). In other words, pronominal markers on verbs are “markers of cross-reference and not of agreement” (Klamer 1998: 60–61). To summarize, number agreement refers to number marking, whereas cross-referencing indicates person marking (Hicks 2016). [4] For instance, French (Indo-European) is categorized as having number agreement since the verbs mark singular and plural without distinction of person, cf. (3a) and (3b). However, languages such as Kambera (Austronesian) mark person on the verb instead of number, and are thus defined as cross-referencing languages. By way of illustration, in (6), the NPs representing the subject (‘the big man’) and the object (‘me’) are optional as the verb is already marked by pronominal clitics.


(6) Cross-referencing in Kambera

‘The big man hit me’

(Klamer 1998: 63)

Cross-referencing is not an isolated phenomenon as it is commonly attested in languages of North and South America and sub-Saharan Africa, and is also found in languages of Europe, Australia, and New Guinea (Van Valin 1987: 392–393). It should be differentiated from number agreement for the following reasons. [5] First, the syntactic behaviour of pronominal clitics differs from morphosyntactic plural markers obligatorily marked on the verb, e.g., in Atayal (Austronesian),

Preverbal weak pronouns, whether nominative or genitive, are in actual fact clitics. This is apparent from the fact that they move leftwards in a structure where the only possible lodging point for free pronouns would be to the right of the sentence core (Holmer 1993: 92).

Such a phenomenon is equally attested in Australian languages such as Yukulta, where the pronouns “occur in a clitic complex together with a transitivity marker and a tense-aspect marker and the clitic complex is suffixed to the first constituent of the sentence” (Keen 1983: 216). [6] Second, even if person marking is realized via bound pronouns instead of pronominal clitics, it does not necessarily imply number marking, e.g., in Tukang Besi (Austronesian), Chimariko (Hokan), and Khumi (Tibeto-Burman) the third person pronominal affixes show no distinction in number (Donohue 1999: 114; Jany 2009: 100; Peterson 2002: 101). Finally, even though person marking may distinguish between the singular/plural of first, second, and third person, this is not considered as number agreement since the interpretation of plural is a side effect of person-marking and is semantic rather than syntactic. However, it is also possible to have a language that marks person and number separately. For instance in Kulina (Arauan), person marking only distinguishes between singular and non-singular in the first person, whereas pronouns for the second and third person do not mark number. The number of the subject is usually indicated by the non-singular affix -mana. As demonstrated in (7), the prefix i- only indicates third person, the declarative suffix -i only marks the gender of the direct object (manioc has the masculine gender in Kulina), while the suffix -mana relates to number. [7] In such a situation, Kulina is considered to have morphosyntactic plural marking in our database of 400 languages.


(7) Cross-referencing and morphosyntactic plural marking in Kulina

‘All the women are cooking manioc’

(Dienst 2014: 102)
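The criteria above amount to a small decision procedure for coding morphosyntactic plural marking. As a minimal sketch under simplifying assumptions (the keyword flags and the function `code_plural` are hypothetical, and real coding involves the grammar-by-grammar judgement described in the text): number agreement on some target qualifies; a dedicated number affix alongside person marking, as with Kulina -mana, also qualifies; person cross-referencing alone, or plural marked only on the noun, does not.

```python
from typing import Tuple

def code_plural(*, noun_plural: bool = False, number_agreement: bool = False,
                person_cross_reference: bool = False,
                separate_number_affix: bool = False) -> Tuple[bool, str]:
    """Return (counts as morphosyntactic plural for the GSSG, reason)."""
    if number_agreement:
        return True, "number agreement"
    if separate_number_affix:
        return True, "number affix independent of person marking"
    if person_cross_reference:
        return False, "person cross-referencing only"
    if noun_plural:
        return False, "nominal (morphosemantic) plural only"
    return False, "no plural marking"

print(code_plural(number_agreement=True))                                   # French, (3)
print(code_plural(person_cross_reference=True, separate_number_affix=True))  # Kulina, (7)
print(code_plural(person_cross_reference=True))                             # Kambera, (6)
print(code_plural(noun_plural=True))                                        # Mandarin -men, (4)
```

The order of the checks encodes the priority in the text: agreement or an independent number affix settles the question before person marking or nominal plurals are considered.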

In sum, the definitions of classifiers and plural markers may vary across studies. In this subsection, we have clarified how we define the two categories in the current analysis: through a thorough analysis of different grammars and language examples, we included only sortal classifiers and morphosyntactic plural marking. As a result, each of the 400 languages in our database is annotated with the features displayed in Table 1. Basic information such as language name and location is extracted from the WALS, while linguistic information related to sortal classifiers and morphosyntactic plural markers is gathered from our own search of reference grammars and scientific articles. [8]

Table 1: Features encoded for the 400 languages in our data.

Feature | Description | Source
language_name | Name of the language | WALS
longitude, latitude | Point-coded location of the language | WALS
Genus | Genus classification of the language | WALS

We also included information related to the geographical and phylogenetic distribution of the 400 languages so that the statistical analysis can take into account the influence of the areal and genealogical distribution of sortal classifiers and morphosyntactic plural markers. By way of illustration, most statistical tests presuppose that each data point is independent of all others; however, the areal pattern of sortal classifier languages in Map 1 suggests that this assumption may not hold. Sortal classifier languages such as Mandarin form part of a single classifier area, while morphosyntactic plural-marking languages like English are found in an area of their own. We may thus observe two areal patterns, i.e., classifiers in Asia and plural marking in other parts of the world. The apparent complementary distribution of sortal classifiers and morphosyntactic plural markers could therefore result from this geographical pattern rather than from linguistic factors. In order to control for this areal influence, we include the main geographical area of each language (Africa, Americas, Asia, Europe, Pacific) in our statistical analysis. [9] Furthermore, we also annotate languages according to their genealogical classification, as the same issue of independence may arise with regard to genealogy. Each language is assigned to a genus, a notion analogous to the taxonomic level of genus in biology that refers to a group of languages whose relatedness is well established among linguists and whose level of classification is comparable across the world (Dryer 1989: 267). For instance, subfamilies of Indo-European such as Germanic, Slavic, and Celtic are examples of genera.
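The areal confound can be illustrated with a small calculation. The sketch below uses invented toy records (not the paper's data) and a hypothetical function `expected_both`: it compares how many languages would be expected to have both features if the two features were independent across the pooled sample versus independent only within each area. When classifiers cluster in one area, the pooled expectation is inflated, so a low observed count overstates the evidence for (a) unless area is controlled for. This simple stratification is only an illustration; the paper itself enters area as a predictor in its conditional inference trees.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy records: (area, has_sortal_clf, has_msyn_plural). Not the paper's data.
Record = Tuple[str, bool, bool]

def expected_both(records: List[Record], by_area: bool) -> float:
    """Expected number of languages with both features under independence,
    computed over the pooled sample or within each area and then summed."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[0] if by_area else "all"].append(rec)
    total = 0.0
    for recs in groups.values():
        n = len(recs)
        n_clf = sum(clf for _, clf, _ in recs)
        n_pl = sum(pl for _, _, pl in recs)
        total += n_clf * n_pl / n
    return total

toy = ([("Asia", True, False)] * 8 + [("Asia", False, True)] * 2
       + [("Europe", False, True)] * 10)
observed = sum(clf and pl for _, clf, pl in toy)
print(observed)                           # 0
print(expected_both(toy, by_area=False))  # 4.8
print(expected_both(toy, by_area=True))   # 1.6
```

In the toy data, most of the apparent deficit (0 observed vs. 4.8 expected) disappears once area is taken into account (1.6 expected).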

We are aware that the binary distinction employed in this paper may be controversial. The languages in our dataset are labelled as either classifier languages or non-classifier languages. However, such a binary distinction is not obvious for every language, as the obligatoriness and productivity of classifiers may vary, and so may the size of their inventories. For instance, a language with only one classifier and a language with more than 100 classifiers are annotated with the same value, and their difference in inventory size is lost in the process. The same limitation applies to our definition of morphosyntactic plural markers, as different types of number agreement (e.g., noun-phrase-internal and noun-phrase-external) could be distinguished and would influence the distribution of languages in the dataset. It would probably be more precise to adopt recent approaches to nominal classification and weight languages according to their degree of canonicity with regard to sortal classifier systems and/or morphosyntactic plural marking (Corbett 2003a; Corbett and Fedden 2016; Fedden and Corbett 2017; Grinevald 2015). Nevertheless, the binary classification is sufficient for the purposes of the current analysis, since another project applying the canonical approach is under way in parallel to provide additional information related to the research question of this study.

3.2 Conditional inference trees

Conditional inference trees are a method for regression and classification that relies on binary recursive partitioning (Breiman et al. 1984) within a conditional inference framework (Hothorn et al. 2006). This method has broad applications in data mining and machine learning (Chen and Ishwaran 2012: 324) and has recently been introduced to the field of linguistics (Levshina 2015; Tagliamonte and Baayen 2012). With this method, binary splits recursively partition the data into homogeneous or near-homogeneous terminal nodes. A split is considered optimal if it improves the homogeneity of the data from the parent node to the two daughter nodes. To further reduce the variance of the output, the related random forest approach (Breiman 2001) grows many such trees, each on a bootstrap sample of the original data, and considers only a random subset of variables at each splitting node instead of all variables.

The algorithm proceeds as follows. First, it scans through the explanatory variables and chooses the one with the strongest association with the response. Then, it divides the dataset into two subsets based on the chosen variable. These two steps are repeated for every subset until no variable can split the data with statistical significance. By way of illustration, if we wanted to predict rain based on humidity and wind, we could gather information on these three variables over a period of 34 days and generate the conditional inference tree shown in Figure 1. The colour of the buckets at the bottom of the graph shows the proportion of each value of the response variable, i.e., whether it rained (black) or not (gray). In this toy data, the model shows that the probability of rain is significantly higher (p<0.001) when the humidity level is above 60 (Node 7). Moreover, wind is relevant only for the subset of observations with a humidity level under 60: if the humidity level is under 60 and it is not windy (Node 3), the probability of no rain is significantly higher (p<0.05). The same logic applies to Nodes 5 and 6. Both humidity and wind appear on the graph because both are statistically significant; a factor without a statistically significant effect on the response variable would not be included in the final output of the conditional inference tree.

As a reminder, this is a toy sample with a limited number of observations, manually designed to show a clear-cut output of a conditional inference tree. Real data rarely show such an intuitive distribution, as the following section illustrates.
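The two-step procedure described above (pick the variable most strongly associated with the response, split on it, repeat until nothing is significant) can be sketched in Python. This is a minimal illustration under several assumptions, not the party implementation used in this study: all variables are binary (humidity is dichotomized at 60), Pearson's chi-square test stands in for the permutation-based conditional inference test, and the p-value of the best variable is Bonferroni-adjusted for the number of predictors tested. The 34 toy days and the names grow_tree and chi2_test_2x2 are hypothetical and are not the data behind Figure 1.

```python
import math


def chi2_test_2x2(xs, ys):
    """Pearson chi-square test of independence for two binary (0/1) lists.

    Returns (statistic, p_value). With 1 degree of freedom the statistic is
    the square of a standard normal variable, so p = erfc(sqrt(stat / 2)).
    """
    n = len(xs)
    counts = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    rows = [counts[0][0] + counts[0][1], counts[1][0] + counts[1][1]]
    cols = [counts[0][0] + counts[1][0], counts[0][1] + counts[1][1]]
    if 0 in rows or 0 in cols:
        return 0.0, 1.0  # a constant variable cannot split the data
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


def grow_tree(data, response, predictors, alpha=0.05):
    """Recursively partition `data` (a list of dicts holding 0/1 values)."""
    ys = [row[response] for row in data]
    # Step 1: find the predictor most strongly associated with the response.
    best_var, best_p = None, 1.0
    for var in predictors:
        _, p = chi2_test_2x2([row[var] for row in data], ys)
        if p < best_p:
            best_var, best_p = var, p
    # Stop when no predictor is significant after a Bonferroni adjustment.
    if best_var is None or best_p * len(predictors) >= alpha:
        return {"n": len(data), "p_yes": sum(ys) / len(data)}
    # Step 2: split on the chosen predictor and recurse into both subsets.
    return {
        "split": best_var,
        "p": best_p,
        0: grow_tree([r for r in data if r[best_var] == 0], response, predictors, alpha),
        1: grow_tree([r for r in data if r[best_var] == 1], response, predictors, alpha),
    }


# Hypothetical toy data for 34 days: humid = humidity above 60, windy = wind observed.
days = (
    [{"humid": 1, "windy": w, "rain": 1} for w in [0] * 7 + [1] * 7]
    + [{"humid": 0, "windy": 1, "rain": r} for r in [1] * 8 + [0] * 2]
    + [{"humid": 0, "windy": 0, "rain": 0} for _ in range(10)]
)
tree = grow_tree(days, "rain", ["humid", "windy"])
print(tree)
```

On this toy data the sketch first splits on humid and then, only among the less humid days, on windy, which mirrors the structure described for Figure 1; a predictor that never reaches significance simply does not appear in the returned dictionary.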

One of the main advantages of conditional inference trees is the use of permutation to retrieve p-values. In other words, the labels of the data points are reshuffled randomly, and the test statistic is computed for each rearrangement. The result is considered statistically significant if the proportion of permutations yielding a test statistic greater than or equal to the one observed in the original data is smaller than the significance level. This setting allows conditional inference trees to handle data with a small number of observations and a large number of (possibly correlated) predictors, which conventional statistical tests may have difficulty assessing (Tagliamonte and Baayen 2012). Moreover, recursive partitioning “does not require distributional assumptions to be met. It is also considered to be robust in the presence of outliers” (Levshina 2015: 292). For further details of the algorithm, see Breiman (2001).
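A permutation p-value of the kind described above can be sketched for a single pair of binary variables. The snippet below is an illustration under stated assumptions: the test statistic (the absolute difference in the proportion of classifier languages between languages with and without plural markers), the number of permutations, the random seed, and the helper name permutation_p_value are our own choices, not the settings of the party package. The data are the cell counts of Table 2, expanded to one entry per language.

```python
import random


def permutation_p_value(x, y, n_perm=2000, seed=1):
    """Permutation test of association between two binary (0/1) lists.

    Test statistic: absolute difference in the share of y == 1 between the
    x == 1 and x == 0 groups. The y labels are reshuffled n_perm times and the
    p-value is the share of shuffles whose statistic is at least as large as
    the observed one (with a +1 correction so it is never exactly zero).
    """
    def statistic(labels):
        in_1 = [lab for xi, lab in zip(x, labels) if xi == 1]
        in_0 = [lab for xi, lab in zip(x, labels) if xi == 0]
        return abs(sum(in_1) / len(in_1) - sum(in_0) / len(in_0))

    observed = statistic(y)
    rng = random.Random(seed)
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Table 2, one entry per language: 134 with plural markers, 266 without.
plural = [1] * 134 + [0] * 266
classifier = [1] * 4 + [0] * 130 + [1] * 117 + [0] * 149
p = permutation_p_value(plural, classifier)
print(f"permutation p-value: {p:.4f}")
```

Because of the +1 correction, the smallest attainable p-value with 2,000 shuffles is 1/2001; resolving smaller values would require more permutations.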

4 Results

In this section, we first provide an overview of the database and then present the conditional inference tree analysis. We begin by visualizing the geographical distribution of the 400 languages according to their values for sortal classifiers and morphosyntactic plural marking. As shown in Map 3, most classifier languages are located in Asia, with sporadic cases attested in the Americas and Africa. This observation agrees with the previous studies summarized in Map 1. With regard to plural markers, only languages with morphosyntactic plural marking are labelled as plural marking; languages with cross-referencing or without any type of plural marking are not. This also matches the literature, as approximately 20% of the languages of the world do not even mark person on the verb (Siewierska 2013). Finally, 1.00% (4/400) of the languages in our dataset are annotated as having both sortal classifiers and morphosyntactic plural markers. These languages are investigated further in Section 5.

Map 3: Spatial distribution of classifier and plural-marking languages in the 400 languages.

The spatial overview shows a complementary-like distribution of sortal classifier languages and morphosyntactic plural-marking languages. Yet, the coverage of plural-marking languages differs from what we observed in Map 1 when considering nominal plural marking. The detailed numbers of our data are summarized in Table 2. Languages with only classifiers (29.25%, 117/400) or only plural markers (32.50%, 130/400) occur at rates similar to languages with neither of the two (37.25%, 149/400). Further studies are thus required to investigate whether these 149 languages with neither classifiers nor morphosyntactic plural markers use other means to encode the functions of sortal classifiers and plural markers. We do not investigate this question in the current study, since the GSSG does not claim that sortal classifiers and plural markers are collectively exhaustive: it claims that sortal classifiers and morphosyntactic plural markers do not co-occur because of their shared functions. This does not imply that every language of the world should have one of the two systems, as the functions fulfilled by sortal classifiers and morphosyntactic plural markers may not be universally required across languages.

Table 2:

Distribution of classifiers and morphosyntactic plural markers in the 400 languages.

| Classifiers | No classifiers | Total
Plural markers | 4 (1.00%) | 130 (32.50%) | 134
No plural markers | 117 (29.25%) | 149 (37.25%) | 266

For the statistical analysis of this distribution, we use the party and rms packages (Harrell 2015; Hothorn et al. 2006) in R (R Core Team 2017) to generate conditional inference trees with Monte Carlo simulation, taking into account linguistic, geographical, and phylogenetic factors. We then extract the main information on statistical significance and effect size. More precisely, the conditional inference tree tells us which factors are relevant to the distribution of classifiers and plural markers. Afterwards, we measure the explanatory power of the conditional inference tree based on these factors via the C-statistic (Austin and Steyerberg 2012). Figure 2 shows the conditional inference tree that takes classifiers as the response variable and plural markers as the sole explanatory variable. The results show that the negative correlation between plural markers and classifiers is statistically highly significant (p<0.001). Node 2 corresponds to the first row of Table 2 and shows that a language with plural markers is unlikely to have classifiers, i.e., only 2.99% (4/134) of languages with plural markers have classifiers. Node 3 corresponds to the second row of Table 2 and shows that classifiers are found much more frequently in languages without plural markers (43.98%, 117/266). The colours of the buckets indicate the proportion of each value of the response variable, i.e., black represents classifier languages and gray non-classifier languages. We do not display the reverse model, with plural markers as the response variable and sortal classifiers as the explanatory variable, since its output is equivalent.
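The node proportions cited for Figure 2 follow directly from the rows of Table 2. The short sketch below recomputes them; the dictionary layout and the function name classifier_share are illustrative assumptions.

```python
# Table 2 counts keyed by (has_plural_markers, has_classifiers).
table2 = {(1, 1): 4, (1, 0): 130, (0, 1): 117, (0, 0): 149}


def classifier_share(has_plural):
    """Classifier languages among languages with (1) or without (0) plural markers."""
    with_clf = table2[(has_plural, 1)]
    row_total = with_clf + table2[(has_plural, 0)]
    return with_clf, row_total, 100 * with_clf / row_total


for has_plural, node in [(1, "Node 2: plural markers"), (0, "Node 3: no plural markers")]:
    k, n, pct = classifier_share(has_plural)
    print(f"{node}: {k}/{n} = {pct:.2f}% classifier languages")
```

It prints 4/134 = 2.99% for Node 2 and 117/266 = 43.98% for Node 3.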

Figure 1: Toy sample of conditional inference tree to predict the binary response of rain.

Figure 2: Conditional inference tree with sortal classifiers as response variable and plural markers as explanatory variable.

To measure the explanatory power of the conditional inference tree in Figure 2, we compute the C-statistic with the R package rms. The C-statistic is a standard measure of goodness-of-fit for binary outcomes in a logistic regression model. It is the estimated area under the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate of the model across classification thresholds in order to show how the model would perform under different cost requirements. It corresponds to “the Wilcoxon rank sum statistic for measuring the rank correlation between observed and predicted outcomes divided by the product of the number of subjects with the outcome or condition and the number of subjects without the outcome or condition” (Austin and Steyerberg 2012: 8; Bamber 1975; Hanley and McNeil 1982) and is also related to Somers’ Dxy rank correlation between the predicted probability of the outcome and the observed outcome (Harrell 2001). As a rule of thumb, a C-statistic below 0.5 indicates that the model performs poorly and provides no discrimination within the data. A value of 0.5 means that the model only reaches chance-level performance. A C-statistic greater than 0.7 indicates acceptable discrimination, above 0.8 excellent discrimination, and above 0.9 outstanding discrimination (Hosmer and Lemeshow 2000).
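The concordance interpretation of the C-statistic can be made concrete with a pairwise count: over all pairs consisting of one classifier language and one non-classifier language, C is the share of pairs in which the classifier language receives the higher predicted probability, with ties counted as one half; Somers' Dxy is then 2(C − 0.5). The sketch below assumes that every language is assigned the classifier proportion of its terminal node in Figure 2 (4/134 or 117/266, from Table 2) as its predicted probability. It is a plain pairwise computation, not the rms code used in this study, and the function name c_statistic is hypothetical.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Concordance (C) statistic, equivalent to the area under the ROC curve.

    Counts (positive, negative) pairs in which the positive case has the
    higher predicted score; tied pairs count as one half.
    """
    pos = [s for s, o in zip(scores, outcomes) if o == 1]
    neg = [s for s, o in zip(scores, outcomes) if o == 0]
    concordant = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            concordant += 1
        elif sp == sn:
            concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Predicted probability of a classifier = proportion in the language's node.
p_node2 = 4 / 134    # languages with plural markers
p_node3 = 117 / 266  # languages without plural markers
scores = [p_node2] * 134 + [p_node3] * 266
outcomes = [1] * 4 + [0] * 130 + [1] * 117 + [0] * 149

c = c_statistic(scores, outcomes)
dxy = 2 * (c - 0.5)  # Somers' Dxy
print(f"C = {c:.3f}, Somers' Dxy = {dxy:.3f}")
```

Under this assumption the sketch gives C ≈ 0.716 (Dxy ≈ 0.433), which rounds to the value of 0.72 reported for the tree in Figure 2.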

The C-statistic of the conditional inference tree in Figure 2 equals 0.72, which indicates acceptable performance. It does not, however, reach the threshold of excellent discrimination (0.8), even though the correlation between sortal classifiers and plural markers is statistically highly significant. Moreover, geographical and phylogenetic factors were not included in this analysis. Hence, we generate a second conditional inference tree that includes the information on continent and genus encoded in our data as additional explanatory variables. The result is shown in Figure 3. Through the variable continent, the tree indicates that the effect of the unbalanced areal distribution is statistically highly significant (p<0.001). For languages located in Asia, the complementary distribution of sortal classifiers and morphosyntactic plural markers may be due to the fact that most languages in Asia have sortal classifiers and lack morphosyntactic plural markers. However, the correlation between sortal classifiers and morphosyntactic plural markers remains statistically highly significant (p<0.001) in the other areas of the world (Africa, Americas, Europe, Pacific). Genus does not appear in the tree, since the model estimates that genera do not play a statistically significant role in the decision process. In other words, while the geographical location of a language has a statistically significant influence on the presence or absence of sortal classifiers, phylogenetic factors do not.

Figure 3: Conditional inference tree with sortal classifiers as response variable and morphosyntactic plural markers, continent, genus as explanatory variables.

The detailed distribution of languages according to their linguistic features and geographical location is shown in Table 3. The percentages indicate proportions within each continent; e.g., only three sortal classifier languages of our dataset are located in Africa, so 5.17% equals three divided by the total number of African languages in our dataset (58). Most classifier languages are found in Asia (70.09%, 82/117), which explains the strong areal effect found in Figure 3. Morphosyntactic plural markers are concentrated in Africa and Europe, while the Pacific region and the Americas primarily consist of languages with neither sortal classifiers nor morphosyntactic plural markers.

Table 3:

Distribution by continent of classifiers and plural markers in the 400 languages.

Continent | Sortal classifier | Morphosyntactic plural | Both | None | Total
Africa | 3 (5.17%) | 44 (75.86%) | 0 (0.00%) | 11 (18.97%) | 58
Americas | 14 (11.86%) | 41 (34.75%) | 0 (0.00%) | 63 (53.39%) | 118
Asia | 82 (64.57%) | 21 (16.54%) | 3 (2.36%) | 21 (16.54%) | 127
Europe | 1 (4.76%) | 19 (90.48%) | 1 (4.76%) | 0 (0.00%) | 21
Pacific | 17 (22.37%) | 5 (6.58%) | 0 (0.00%) | 54 (71.05%) | 76
Total | 117 (29.25%) | 130 (32.50%) | 4 (1.00%) | 149 (37.25%) | 400
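The within-continent percentages of Table 3, the Asian share of classifier-only languages, and the classifier rates outside Asia can all be recomputed from the raw counts. The sketch below is purely descriptive: it does not reproduce the significance tests of the tree in Figure 3, and the variable names are illustrative.

```python
# Table 3 counts: continent -> (classifier only, plural only, both, neither).
table3 = {
    "Africa": (3, 44, 0, 11),
    "Americas": (14, 41, 0, 63),
    "Asia": (82, 21, 3, 21),
    "Europe": (1, 19, 1, 0),
    "Pacific": (17, 5, 0, 54),
}


def row_percentages(counts):
    """Return the row total and each cell as a percentage of that total."""
    total = sum(counts)
    return total, [round(100 * c / total, 2) for c in counts]


for continent, counts in table3.items():
    print(continent, *row_percentages(counts))

# Share of classifier-only languages that are located in Asia.
clf_only = sum(c[0] for c in table3.values())
asia_share = 100 * table3["Asia"][0] / clf_only
print(f"Asia: {table3['Asia'][0]}/{clf_only} = {asia_share:.2f}%")

# Outside Asia: classifier rate with vs. without morphosyntactic plural markers.
rest = [c for name, c in table3.items() if name != "Asia"]
with_pl = (sum(c[2] for c in rest), sum(c[1] + c[2] for c in rest))
without_pl = (sum(c[0] for c in rest), sum(c[0] + c[3] for c in rest))
for label, (k, n) in [("with plural", with_pl), ("without plural", without_pl)]:
    print(f"non-Asia, {label}: {k}/{n} = {100 * k / n:.2f}% classifier languages")
```

Outside Asia, 1 of 110 languages with morphosyntactic plural markers (0.91%) has sortal classifiers, against 35 of 163 languages without them (21.47%); this is the descriptive contrast behind the non-Asian result reported for Figure 3.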

The C-statistic of the conditional inference tree in Figure 3 equals 0.83 and indicates excellent discrimination. In other words, the areal distribution combined with the presence or absence of morphosyntactic plural markers can accurately predict the presence or absence of sortal classifiers in a language. Genealogical factors are also included in the analysis but do not show a statistically significant effect on the data.

5 Discussion and conclusions

In this section, we scrutinize language examples extracted from our data. Since we already provided examples of canonical sortal classifiers and morphosyntactic plural markers in the previous sections, we focus here on non-canonical languages that display both structures in their system. Of the 400 languages, only four (1.00%) have both sortal classifiers and morphosyntactic plural markers. These languages demonstrate that the complementary distribution between sortal classifiers and morphosyntactic plural markers is not merely due to morphological parameters of languages: if sortal classifiers occurred only in languages without grammatical agreement, the complementary distribution between sortal classifiers and morphosyntactic plural markers could simply reflect such differences in language structure.

The majority of these languages are found in Asia, since this is where sortal classifier languages are concentrated. No cases are attested in Africa, the Americas, or the Pacific. Some observations in these areas are peculiar and worth describing for further typological investigation, but they are not exceptions (i.e., languages having both sortal classifiers and morphosyntactic plural markers as a general feature) according to our definition of morphosyntactic plural. For instance, in Africa, Toussian (Niger-Congo) originally had noun classes, but the noun-class system has “broken down and most classes are represented only by noun suffixes, not by concord” (Miehe and Winkelmann 2007: 17). We therefore consider that Toussian has only sortal classifiers and lacks morphosyntactic plural markers marked on the verb (Kießling 2013: 51–52). As for the Pacific and the Americas, they mostly consist of languages using cross-referencing (as shown in Table 3), which decreases the chances of finding languages with both sortal classifiers and morphosyntactic plural marking.

In the following paragraphs, we go through the four languages attested with both classifiers and plural markers, listed in Table 4. We do not intend to explain away these languages as non-exceptions to the GSSG (at least not all of them); our main purpose is to provide examples of sortal classifiers and plural markers in these languages. Since the complementarity between sortal classifiers and plural markers is considered a generalization, exceptions are to be expected and are useful for further studies.

Table 4:

Languages with both classifiers and plural markers in the 400 languages.


In Hungarian, count nouns are marked for plural, and a predicative adjective or verb agrees with a subject phrase overtly marked for plural. However, a noun is not marked for plural in the context of a plural numeral (Wunderlich 2001: 6–7). A plural noun phrase with a numeral and (optionally) a sortal classifier thus does not engage in plural agreement with the predicate. In (8a), the general classifier darab is optional. Because of the numeral hat (‘six’), the noun könyv (‘book’) is in the singular, and the verb is also conjugated in the singular form van. In (8b), the construction of numeral plus sortal classifier is absent, and the noun könyvek (‘book’.pl) is marked for plural and agrees with the plural verb form vannak (‘be’). Morphosyntactic plural markers therefore do not co-occur with classifiers in a Hungarian noun phrase, even though plural marking is a morphosyntactic feature of the language (for detailed examples, see Csirmaz and Dékány 2010, 2014).


(8) Complementary distribution of sortal classifiers and morphosyntactic plural marking in Hungarian
a. ‘There are six books in the closet.’
b. ‘There are old books in the closet.’
(Rounds 2009: 84, 253)

As a reminder, the classifiers discussed here only relate to sortal classifiers and do not involve mensural classifiers (e.g., san xiang shu ‘three boxes of books’ in Mandarin) or measure terms (e.g., three bottles of wine in English). Examples such as Ez-ek a csokor rózsá-k (dem-pl the bouquet rose-pl) ‘these bouquets of roses’ are thus not related to our discussion.

Armenian displays the same phenomenon as Hungarian. Optional classifiers and morphosyntactic plural markers are both found in the language (9a-b); however, they do not co-occur on a noun (9c). With regard to the verb,

the number of the verb agrees with the morphologically expressed number of the subject. This means that if the subject noun appears in the singular, the verb also has to appear in the singular; independent of whether it has a singular or plural meaning (Dum-Tragut 2009: 315).

Sortal classifiers thus cannot co-occur with morphosyntactic plural markers on the verb, as nouns preceded by sortal classifiers are morphologically marked as singular rather than plural (9a).


(9) Sortal classifiers in Armenian
a. ‘two buildings’

In Persian, even though “classifiers are in complementary distribution with plural marking” (Hamedani 2011: 147), further research is required. We list below a sample of classifiers and plural marking in Persian, along with the main reasons why we suggest additional investigation. Sortal classifiers are not used in formal written contexts but are frequently found in informal registers of the language. They are also obligatory when the enumerated item is not mentioned. For instance, the general classifier -taa (clfgeneral) is the most commonly used classifier in colloquial speech and cannot be omitted in the following context: cand ta darid? ‘how many do you have?’ – se ta (three clfgeneral) ‘three’. Further examples of classifiers in Persian are shown in (10).


(10) Sortal classifiers in Persian
a. ‘two boys’
b. ‘twenty books’
c. ‘forty sheep’
(Roberts 2003: 15)

With regard to plural marking, Persian nouns and verbs are not obligatorily marked for plural. Various factors are mentioned in the literature. First, the animacy hierarchy is relevant, [10] as inanimates are generally marked as singular. For example, both (11a) and (11b) are grammatically correct, but they differ semantically. In (11a), the shops are not agents, and thus no number agreement is observed. In (11b), the shops are personified and represent the shopkeepers, so plural is marked on the verb.


(11) Singular and plural marked on the verb in Persian
a. ‘the shops were closed’
b. ‘the shops were closed’
(Hashabeiky 2007: 78)

Second, plural marking on the verb involves suffixes and pronouns that may mark different number while referring to the same noun depending on the syntactic construction (Sedighi 2007). For instance in (12), animate subjects represented by the third person plural pronoun anha ‘they’ are indicated as plural via the pronominal clitic esan, but the verb additionally bears a singular marker.


(12) Plural animate nouns marked as singular on verbs in Persian
‘they got cold’
(Hamedani 2011: 60)

Still, co-occurrences of sortal classifiers and plural markers on the verb are found in the literature. As shown in (13), the general classifier taa is optional (13a) but it may co-occur with a verb marked as plural (13b). In such a situation, a nominal plural marker -ha may also be used. While the nominal plural marker -ha is considered a morphosemantic plural, the plural marker -and on the verb represents a potential exception to the complementary distribution of sortal classifiers and morphosyntactic plural markers.


(13) Sortal classifiers and plural markers in Persian
a. ‘twenty soldiers were on the street’
b. ‘twenty soldiers were on the street’
(Hamedani 2011: 153)

We consider that Persian requires further investigation, as the morphosyntactic plural markers on the verb may be interpreted as cross-referencing rather than plural marking (Hamedani 2011: 60; Roberts 2003), especially since number agreement is not consistently observed on other elements of the language; e.g., adjectives do not “vary for number, or show agreement with any other properties of the head” (Perry 2007: 983). Furthermore, the register-dependent use of sortal classifiers (absent from formal written registers, sometimes obligatory in informal ones) calls for additional data to verify how their distribution varies across registers.

Further investigation is also suggested for Turkish because of the optionality of plural marking on the verb. In the language, plural is not marked on the noun when there is a cardinal numeral or quantifier as modifier (Göksel and Kerslake 2011: 40; Kornfilt 1997). It is therefore not possible for a construction of a numeral plus a classifier to co-occur with a plural marker on the noun (Her and Chen 2013). [11] With regard to plural marking on the verb, “animate plural subjects may take either a plural or singular verb. Inanimate plural subjects, in contrast, are restricted to singular verbs” (Bamyacı et al. 2014: 255). As shown in (14), plural marking on the verb is optional with animate subjects (14a) and ungrammatical with inanimate subjects (14b). Further data is therefore required to scrutinize the co-occurrence of classifier constructions as subjects and their correlation with plural marking on the verb.


(14) Plural marking on the verb in Turkish
a. ‘Locksmiths opened the doors.’
b. ‘Keys opened the doors.’
(Sezer 1978: 26)

In summary, the 400 languages of our dataset show a tendency for sortal classifiers not to co-occur with morphosyntactic plural marking in the same language; where both are present, languages such as Hungarian show that the two features still do not co-occur in the same clause. However, additional analysis is needed for languages such as Persian, in which the optionality of sortal classifiers and morphosyntactic plural marking calls for more data on their interaction.

A number of languages beyond our dataset are attested in the literature to have both classifiers and plural markers. We suggest that each of these languages be scrutinized for three points: whether it has genuine sortal classifiers; if so, whether sortal classifiers and plural markers co-occur on the noun; and, if so, whether the plural markers are morphosyntactic plural markers that engage in agreement outside the noun phrase. Such a procedure is crucial, as terminology and definitions vary considerably in the field and may easily lead to confusion. For instance, Mi’gmaq (Algic) is attested to have classifiers that can apparently co-occur with plural markers on the noun (Bale and Coon 2014: 700); however, further data and analysis are required because of divergence in the literature. By way of illustration, the classifier te’s in Mi’gmaq mentioned in Bale and Coon (2014) is considered in other studies to be a universal quantifier, which “is roughly translated as ‘every’ and triggers singular agreement on both the noun it modifies and the verb” (Hamilton 2015: 119). In such a situation, the first source would yield a counter-example to the GSSG, whereas the second source would show the opposite. The same applies to Yagua (Peba-Yagua), which, according to Nichols (1992), allows co-occurrence of classifiers and plural markers on the noun. Again, a closer look reveals that Yagua has no agreement between the noun phrase and the verb phrase and thus has no morphosyntactic plural marking (Payne 1985: 41–43).
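The three checks above are ordered, and each applies only if the previous one is answered positively, so they can be encoded as a small decision procedure. The Python sketch below is illustrative only: the record type LanguageEvidence and the function screen are hypothetical names, and None marks a point on which the literature is divided or silent, which yields an "undetermined" verdict rather than a classification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageEvidence:
    """Hypothetical summary of the literature on one language.

    Each check is True, False, or None (unresolved or disputed).
    """
    name: str
    has_sortal_classifiers: Optional[bool]
    cooccur_on_noun: Optional[bool] = None
    plural_is_morphosyntactic: Optional[bool] = None


def screen(lang: LanguageEvidence) -> str:
    """Apply the three checks in order; stop at the first negative or open one."""
    checks = [
        (lang.has_sortal_classifiers, "no genuine sortal classifiers"),
        (lang.cooccur_on_noun, "no co-occurrence on the noun"),
        (lang.plural_is_morphosyntactic, "plural markers lack agreement outside the NP"),
    ]
    for answer, reason in checks:
        if answer is None:
            return "undetermined: further data needed"
        if not answer:
            return f"not a counter-example ({reason})"
    return "potential counter-example to the GSSG"


# Mi'gmaq: whether te's is a classifier is disputed in the literature.
print(screen(LanguageEvidence("Mi'gmaq", None)))
# Yagua: classifier status assumed True here for illustration; no NP-verb agreement.
print(screen(LanguageEvidence("Yagua", True, True, False)))
```

Encoding the checks in this order makes explicit that a language only counts as a potential counter-example once all three points have been settled positively.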

Dékány (2011: 234) suggests a list of languages with both classifiers and nominal plural markers: Algonquian languages, several dialects of Arabic, [12] Armenian, Bulgarian, Ejagham, Halkomelem, Mayan (Akatek, Jacaltec, Yucatec, Itzaj), Nivkh, Nootka, North Arawak, Ojibwe, Ossetian, Persian, Russian, South Dravidian languages, Tajik, Tarascan, Tariana, Tlingit, Tucano, Yuki, and Vietnamese. Some of these languages (e.g., Persian, Russian, Tlingit, Tucano, Vietnamese, Bulgarian, among others) are covered in our dataset of 400 languages and are not categorized as co-occurrences of sortal classifiers and morphosyntactic plural markers because of differences in definition. For instance, class markers in Tlingit are not considered sortal classifiers due to their concordial nature, cf. Miraña in (5). Other languages such as Yuki, Ossetian, Nootka, Halkomelem, Ojibwe, [13] Ejagham, among others, are not included in our dataset and require further investigation. As an example, in Ejagham (Niger-Congo), five lexical items are considered classifiers. An instance is given in (15), in which the plural noun class 6 (marked on the classifier, the numeral, and the genitive marker, a floating low tone) differs from the singular noun class 19 marked on the noun.


(15) Possible sortal classifiers in Ejagham [14]
‘two orange seeds’
(Watters 1981: 310)

However, even though the term ‘classifier’ is used in his description, Watters (1981) offers no specific argument for treating the five elements as sortal classifiers and in fact concedes that these elements are nominals: “Some of the classifiers are clearly nominals with a specific meaning, while others are shaped like a nominal with no specific, assignable meaning in isolation” (Watters 1981: 310). Moreover, the alleged classifiers in examples like (15) are marked for noun class, e.g., à-məgε (noun class 6, plural), and behave like nouns in the language, e.g., í-čɔkúd (noun class 19, singular). We may thus infer that “[i]t seems we have every reason […] to consider məgε a protonoun with the meaning ‘small round object’ morphologically realized as a free form” (Kihm 2005: 496), a view also taken up by other syntactic studies in the literature (Her 2017: 269–271). Such an analysis once more demonstrates the divergence of definitions in the literature and the importance of case-by-case analysis when approaching data from different languages.

The main contributions and limitations of our study are summarized as follows. First, we provided an additional theoretical definition of sortal classifiers and plural markers within the GSSG. Only morphosyntactic plural markers that involve grammatical agreement outside the noun phrase were included in the analysis, and this criterion was further refined to distinguish number agreement from cross-referencing. The revised GSSG thus states that (a) a language is unlikely to have both sortal classifiers and morphosyntactic plural markers, and (b) if a language does have both, then their use is in complementary distribution.

Second, we provided a geographically and phylogenetically weighted set of 400 languages to investigate empirically the distribution of sortal classifiers and morphosyntactic plural marking in the languages of the world. Our data were coded in a binary manner, e.g., presence/absence of sortal classifiers in a language. It would have been more appropriate to weight the investigated features according to their degree of canonicity in each language. For instance, the productivity of the classifier system could have been further defined based on its inventory size and obligatoriness. We also agree with an anonymous reviewer that “another option would be to design a multivariate typology (not related to canonicity) that would be able to capture all the relevant language-internal variation”.

Third, the statistical test of conditional inference tree was used to assess the interaction between sortal classifiers, morphosyntactic plural markers, geographical factors, and phylogenetic affiliation. Results from the statistical analysis have shown a strong areal effect as classifiers are concentrated in Asia. However, the tendency of complementary distribution between sortal classifiers and morphosyntactic plural markers remained statistically highly significant and validated the revised GSSG as a statistical universal, whereas phylogenetic factors did not play a significant role in the dataset. Several languages beyond our dataset where classifiers and plural markers are known to co-exist have also been tentatively investigated.

Finally, we note that the results in this study do not necessarily exclusively support the theory that sortal classifiers and morphosyntactic plural markers belong to the same syntactic category, as there may be other theories that can be interpreted in a way that also predicts their complementary distribution. One example is Chierchia’s (1998) theory that mass nouns are inherently plural and that classifier languages like Mandarin differ from languages like English in having only mass nouns. The former thus employs numeral classifiers for all its nouns, while the latter uses morphosyntactic plural markers exclusively for its count nouns. We contend that our universalist view is more favourable than Chierchia’s parametric view but will leave this debate and its implications to future studies.


We thank the two anonymous reviewers for their constructive comments. Special thanks also to Sune Gregersen and Olga Fischer for their valuable suggestions, which led to significant improvements of the paper. All remaining errors are our own. We also gratefully acknowledge the financial support by Taiwan’s Ministry of Science and Technology (MOST) via the following grants awarded to the second author: 104-2410-H-004-164-MY3 and 106-2410-H-004-106-MY3.




Aikhenvald, Alexandra Y. 2000. Classifiers. Oxford: Oxford University Press.

Allan, Keith. 1977. Classifiers. Language 53(2). 285–311. https://doi.org/10.1353/lan.1977.0043

Anderson, Gregory & David Harrison. 2008. Remo (Bonda). In Gregory Anderson (ed.), The Munda languages, 557–632. London: Routledge.

Austin, Peter & Ewout Steyerberg. 2012. Interpreting the concordance statistic of a logistic regression model. BMC Medical Research Methodology 12. 82. https://doi.org/10.1186/1471-2288-12-82

Bale, Alan & Jessica Coon. 2014. Classifiers are for numerals, not for nouns: Consequences for the mass/count distinction. Linguistic Inquiry 45(4). 695–707. https://doi.org/10.1162/LING_a_00170

Bale, Alan & Hrayr Khanjian. 2008. Classifiers and number marking. Semantics and Linguistic Theory (SALT) 18. 73–89. https://doi.org/10.3765/salt.v18i0.2478

Bamber, D. 1975. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology 12. 387–415. https://doi.org/10.1016/0022-2496(75)90001-2

Bamyacı, Elif, Jana Häussler & Barış Kabak. 2014. The interaction of animacy and number agreement: An experimental investigation. Lingua 148. 254–277. https://doi.org/10.1016/j.lingua.2014.06.005

Bisang, Walter. 2012. Numeral classifiers with plural marking. In Xu Dan (ed.), Plurality and classifiers across languages of China, 23–42. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110293982.23

Blust, Robert A. 2009. The Austronesian languages. Canberra: Pacific Linguistics.

Borer, Hagit. 2005. Structuring sense, vol. 1. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199263929.001.0001

Borer, Hagit & Sarah Ouwayda. 2010. Men and their apples: Dividing plural and agreement plural. Paper presented at the GLOW Asia 8, Beijing Language and Culture University, 12–14 August.

Breiman, Leo. 2001. Random forests. Machine Learning 45(1). 5–32. https://doi.org/10.1023/A:1010933404324

Breiman, Leo, Jerome Friedman, Richard Olshen & Charles J. Stone. 1984. Classification and regression trees. New York: Taylor & Francis.

Chen, Xi & Hemant Ishwaran. 2012. Random forests for genomic data analysis. Genomics 99(6). 323–329. https://doi.org/10.1016/j.ygeno.2012.04.003

Cheng, Lisa L. S. & Rint Sybesma. 1998. Yi-wan tang, yi-ge tang: Classifiers and massifiers. Tsing Hua Journal of Chinese Studies 28(3). 385–412.

Chierchia, Gennaro. 1998. Plurality of mass nouns and the notion of semantic parameter. In Susan Rothstein (ed.), Events and grammar, 53–104. Dordrecht: Kluwer. https://doi.org/10.1007/978-94-011-3969-4_4

Corbett, Greville G. 2003a. Agreement: Canonical instances and the extent of the phenomenon. In Geert E. Booij, Janet DeCesaris, Angela Ralli & Sergio Scalise (eds.), Topics in morphology, 109–128. Barcelona: Institut Universitari de Lingüistica Aplicada.

Corbett, Greville G. 2003b. Agreement: Terms and boundaries. In William E. Griffin (ed.), The role of agreement in natural language, 109–122. Austin: Texas Linguistics Society for Texas Linguistic Forum.

Corbett, Greville G. 2003c. Agreement: The range of the phenomenon and the principles of the Surrey Database of Agreement. Transactions of the Philological Society 101(2). 155–202. https://doi.org/10.1111/1467-968X.00117

Corbett, Greville G. & Sebastian Fedden. 2016. Canonical gender. Journal of Linguistics 52(3). 495–531. https://doi.org/10.1017/S0022226715000195

Cowper, Elisabeth & Daniel C. Hall. 2012. Aspects of individuation. In Diane Massam (ed.), Count and mass across languages, 27–53. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199654277.003.0003

Csirmaz, Aniko & Éva Dékány. 2010. Hungarian classifiers. Paper presented at the Conference on word classes: Nature, typology, computational representation, Roma Tre University, 24–26 March.

Csirmaz, Aniko & Éva Dékány. 2014. Hungarian is a classifier language. In Raffaele Simone & Francesca Masini (eds.), Word classes: Nature, typology and representations, 141–160. Amsterdam: Benjamins. https://doi.org/10.1075/cilt.332.08csi

Daniel, Michael & Edith Moravcsik. 2013. The associative plural. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. (accessed 1 May 2018).

Dékány, Éva. 2011. A profile of the Hungarian DP. Tromsø: University of Tromsø dissertation.

Derbyshire, Desmond C. & Doris L. Payne. 1990. Noun classification systems of Amazonian languages. In Doris L. Payne (ed.), Amazonian linguistics: Studies in Lowland South American languages, 243–271. Austin: University of Texas Press.

Dienst, Stefan. 2014. A grammar of Kulina. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110341911

Dixon, Robert M. W. 1986. Noun class and noun classification. In Colette Craig (ed.), Noun classes and categorization, 105–112. Amsterdam: Benjamins. https://doi.org/10.1075/tsl.7.09dix

Doetjes, Jenny. 2012. Count/mass distinctions across languages. In Claudia Maienborn, Klaus von Heusinger & Paul Portner (eds.), Semantics: An international handbook of natural language meaning, vol. 3. 2559–2580. Berlin: Mouton de Gruyter.

Donohue, Mark. 1999. A grammar of Tukang Besi. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110805543

Dryer, Matthew S. 1989. Large linguistic areas and language sampling. Studies in Language 13(2). 257–292. https://doi.org/10.1075/sl.13.2.03dry

Dryer, Matthew S. 2013. Coding of nominal plurality. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. (accessed 1 May 2018).

Dum-Tragut, Jasmine. 2009. Armenian. Amsterdam: Benjamins. https://doi.org/10.1075/loall.14

Dunn, John A. 1979. A reference grammar for the Coast Tsimshian language. Ottawa: National Museums of Canada. https://doi.org/10.2307/j.ctv173vp

Fedden, Sebastian & Greville G. Corbett. 2017. Gender and classifiers in concurrent systems: Refining the typology of nominal classification. Glossa 2(1). 1–47. https://doi.org/10.5334/gjgl.177

Fromkin, Victoria, Robert Rodman & Nina Hyams. 2011. An introduction to language. Boston: Wadsworth.

Gerner, Matthias. 2006. Noun classifiers in Kam and Chinese Kam-Tai languages. Journal of Chinese Linguistics 34(2). 237–305.

Ghomeshi, Jila. 2003. Plural marking, indefiniteness, and the noun phrase. Studia Linguistica 57(2). 47–74. https://doi.org/10.1111/1467-9582.00099

Ghomeshi, Jila & Diane Massam. 2012. The mass count distinction: Issues and perspectives. In Diane Massam (ed.), Count and mass across languages, 1–8. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199654277.003.0001

Gil, David. 2013. Numeral classifiers. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. (accessed 1 May 2018).

Gillon, Carrie. 2015. Innu-aimun plurality. Lingua 162. 128–148. https://doi.org/10.1016/j.lingua.2015.05.006

Göksel, Asli & Celia Kerslake. 2011. Turkish: An essential grammar. New York: Routledge.

Greenberg, Joseph H. 1974. Studies in numerical systems I: Double numeral systems. Working Papers on Language Universals 14. 75–89.

Greenberg, Joseph H. 1990. Numeral classifiers and substantival number. In Keith Denning & Suzanne Kemmer (eds.), On language: Selected writings of Joseph H. Greenberg, 166–193. Stanford: Stanford University Press.

Grinevald, Colette. 2000. A morphosyntactic typology of classifiers. In Gunter Senft (ed.), Systems of nominal classification, 50–92. Cambridge: Cambridge University Press.

Grinevald, Colette. 2015. Linguistics of classifiers. In James D. Wright (ed.), International encyclopedia of the social and behavioral sciences, 811–818. Oxford: Elsevier. https://doi.org/10.1016/B978-0-08-097086-8.53003-7

Hamedani, Ladan. 2011. The function of number in Persian. Ottawa: University of Ottawa dissertation.

Hamilton, Michael D. 2015. The syntax of Mi’gmaq. Montreal: McGill University dissertation.

Hanley, James & Barbara McNeil. 1982. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143. 29–36. https://doi.org/10.1148/radiology.143.1.7063747

Hansen, Chad. 1983. Language and logic in ancient China. Ann Arbor: University of Michigan Press.

Harrell, Frank. 2001. Regression modelling strategies. New York: Springer. https://doi.org/10.1007/978-1-4757-3462-1

Harrell, Frank. 2015. Regression modelling strategies: With applications to linear models, logistic and ordinal regression, and survival analysis. Dordrecht: Springer. https://doi.org/10.1007/978-3-319-19425-7

Hashabeiky, Forogh. 2007. The usage of singular verbs for inanimate plural subjects in Persian. Orientalia Suecana 56. 77–101.

Haspelmath, Martin. 2011. The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica 45(1). 31–80. https://doi.org/10.1515/flin.2011.002

Haspelmath, Martin. 2013. Occurrence of nominal plurality. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. (accessed 1 May 2018).

Her, One-Soon. 2012a. Distinguishing classifiers and measure words: A mathematical perspective and implications. Lingua 122(14). 1668–1691. https://doi.org/10.1016/j.lingua.2012.08.012

Her, One-Soon. 2012b. Structure of classifiers and measure words: A lexical functional account. Language and Linguistics 13. 1211–1251.

Her, One-Soon. 2017. Deriving classifier word order typology, or Greenberg’s Universal 20A and Universal 20. Linguistics 55(2). 265–303. https://doi.org/10.1515/ling-2016-0044

Her, One-Soon, Jing-Perng Chen & Hui-Chin Tsai. 2015. Justifying silent elements in syntax. International Journal of Chinese Linguistics 2(2). 193–226. https://doi.org/10.1075/ijchl.2.2.02her

Her, One-Soon & Yun-Ru Chen. 2013. Unification of numeral classifiers and plural markers: Empirical facts and implications. Pacific Asia Conference on Language, Information, and Computation (PACLIC) 27. 37–46.

Hicks, Christopher. 2016. A syntactic account for the parametric variation of the number feature. Cambridge Occasional Papers in Linguistics 9. 82–107.

Holmer, Arthur. 1993. Atayal clitics and sentence structure. Working Papers in Linguistics 40. 71–94.

Hosmer, David W. & Stanley Lemeshow. 2000. Applied logistic regression. New York: Wiley. https://doi.org/10.1002/0471722146

Hothorn, Torsten, Kurt Hornik & Achim Zeileis. 2006. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15(3). 651–674. https://doi.org/10.1198/106186006X133933

Ikoro, Suanu M. 1994. Numeral classifiers in Kana. Journal of African Languages and Linguistics 15. 7–28. https://doi.org/10.1515/jall.1994.15.1.7

Ikoro, Suanu M. 1996. The Kana language. Leiden: Research School CNWS.

Jany, Carmen. 2009. Chimariko grammar. Berkeley: University of California Press.

Jenks, Peter. 2017. Numeral classifiers compete with number marking: Evidence from Dafing. Paper presented at the Linguistic Society of America Annual Meeting 2017, Austin, 5–8 January.

Keen, Sandra. 1983. Yukulta. In Robert M. W. Dixon & Barry Blake (eds.), Handbook of Australian languages, 191–306. Amsterdam: Benjamins.

Kibort, Anna & Greville G. Corbett. 2008. Number. Grammatical Features. (accessed 1 May 2018).

Kießling, Roland. 2013. On the origin of Niger-Congo nominal classification. In Ritsuko Kikusawa & Lawrence A. Reid (eds.), Historical linguistics 2011, 43–65. Amsterdam: Benjamins. https://doi.org/10.1075/cilt.326.05kie

Kihm, Alain. 2005. Noun class, gender, and the lexicon-syntax-morphology interfaces. In Guglielmo Cinque & Richard S. Kayne (eds.), The Oxford handbook of comparative syntax, 459–512. Oxford: Oxford University Press.

Kilarski, Marcin. 2013. Nominal classification: A history of its study from the classical period to the present. Amsterdam: Benjamins. https://doi.org/10.1075/sihols.121

Kim, Kyumin & Paul B. Melchin. 2018. On the complementary distribution of plurals and classifiers in East Asian classifier languages. Language and Linguistics Compass 12(4). 1–22. https://doi.org/10.1111/lnc3.12271

Klamer, Marian. 1998. A grammar of Kambera. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110805536

Kornfilt, Jaklin. 1997. Turkish descriptive grammar. London: Routledge.

Krifka, Manfred. 1995. Common nouns: A contrastive analysis of Chinese and English. In Gregory N. Carlson & Francis J. Pelletier (eds.), The generic book, 398–411. Chicago: University of Chicago Press.

Levshina, Natalia. 2015. How to do linguistics with R. Amsterdam: Benjamins. https://doi.org/10.1075/z.195

Lewis, Paul, Gary F. Simons & Charles D. Fennig. 2009. Ethnologue. Dallas: SIL International.

Li, Audrey Y. H. 2014. Structure of noun phrases: Left or right? Taiwan Journal of Linguistics 12(2). 1–32.

Link, Godehard. 1998. Algebraic semantics in language and philosophy. Stanford: CSLI.

Lo, Yi-Chieh. 2015. Plural marker -men and numeral classifiers: Convergence and divergence. Taipei: National Chengchi University MA thesis.

Mathieu, Eric. 2012. On the mass-count distinction in Ojibwe. In Diane Massam (ed.), Count and mass across languages, 172–198. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199654277.003.0010

Miehe, Gudrun & Kerstin Winkelmann. 2007. Noun class systems in Gur languages: Southwestern Gur languages. Cologne: Rüdiger Köppe.

Muller, Ana, Luciana Storto & Thiago Coutinho-Silva. 2006. Number and the count mass distinction in Karitiana. Workshop on Structure and Constituency in Languages of the Americas (WSCLA) 11. 122–135.

Nichols, Johanna. 1986. Head-marking and dependent-marking grammar. Language 62. 56–119. https://doi.org/10.1353/lan.1986.0014

Nichols, Johanna. 1992. Linguistic diversity in space and time. Chicago: University of Chicago Press. https://doi.org/10.7208/chicago/9780226580593.001.0001

Nomoto, Hiroki. 2013. Number in classifier languages. Minneapolis: University of Minnesota dissertation.

Payne, Doris L. 1985. Aspects of the grammar of Yagua. Los Angeles: University of California dissertation.

Perry, John R. 2007. Persian morphology. In Alan S. Kaye (ed.), Morphologies of Asia and Africa, 975–1019. Pennsylvania: Eisenbrauns. https://doi.org/10.5325/j.ctv1bxh537.43

Peterson, David A. 2002. On Khumi verbal pronominal morphology. Berkeley Linguistics Society (BLS) 28S. 99–110. https://doi.org/10.3765/bls.v28i2.1037

R Core Team. 2017. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

Rijkhoff, Jan. 2000. When can a language have adjectives? An implicational universal. In Petra M. Vogel & Bernard Comrie (eds.), Approaches to the typology of word classes, 217–257. Berlin: De Gruyter Mouton. https://doi.org/10.1515/9783110806120.217

Roberts, John. 2003. Persian grammar sketch. TulQuest 1–85. (accessed 1 May 2018).

Rounds, Carol. 2009. Hungarian: An essential grammar. 2nd edn. New York: Routledge. https://doi.org/10.4324/9780203886199

Sanches, Mary & Linda Slobin. 1973. Numeral classifiers and plural marking: An implicational universal. Working Papers on Language Universals 11. 1–22.

Scontras, George. 2013. Accounting for counting: A unified semantics for measure terms and classifiers. Semantics and Linguistic Theory (SALT) 23. 549–569. https://doi.org/10.3765/salt.v23i0.2656

Sedighi, Anousha. 2007. Agreement restrictions in Persian. Amsterdam: Rozenberg Publishers and Purdue University Press.

Seifart, Frank. 2005. The structure and use of shape-based noun classes in Miraña (North West Amazon). Nijmegen: Radboud University dissertation.

Senft, Gunter. 2000. Systems of nominal classification. Cambridge: Cambridge University Press.

Sezer, Ergin. 1978. Eylemlerin çoğul öznelere uyumu [The agreement between verbs and plural subjects]. Genel Dilbilim Dergisi 1. 25–32.

Siewierska, Anna. 2004. Person. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511812729

Siewierska, Anna. 2013. Verbal person marking. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. (accessed 1 May 2018).

T’sou, Benjamin K. 1976. The structure of nominal classifier systems. Oceanic Linguistics Special Publications 13. 1215–1247.

Tagliamonte, Sali A. & Harald Baayen. 2012. Models, forests, and trees of York English: Was/were variation as a case study for statistical practice. Language Variation and Change 24. 135–178. https://doi.org/10.1017/S0954394512000129

Van Valin, Robert D. 1987. The role of government in the grammar of head-marking languages. International Journal of American Linguistics 53. 371–397. https://doi.org/10.1086/466065

Vogel, Petra M. & Bernard Comrie. 2000. Approaches to the typology of word classes. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110806120

Watters, John R. 1981. A phonology and morphology of Ejagham. Los Angeles: University of California dissertation.

Wilhelm, Andrea. 2008. Bare nouns and number in Dëne Sųłiné. Natural Language Semantics 16. 39–68. https://doi.org/10.1007/s11050-007-9024-9

Wiltschko, Martina. 2008. The syntax of non-inflectional plural marking. Natural Language & Linguistic Theory 26(3). 639–694. https://doi.org/10.1007/s11049-008-9046-0

Wunderlich, Dieter. 2001. Grammatical agreement. In Neil J. Smelser & Paul B. Baltes (eds.), International encyclopedia of social and behavioral sciences, 6330–6334. Amsterdam: Elsevier. https://doi.org/10.1016/B0-08-043076-7/02966-1

Yi, Byeong U. 2011. What is a numeral classifier? Philosophical Analysis 23. 195–258.

Zhang, Niina N. 2012. Countability and numeral classifiers in Mandarin. In Diane Massam (ed.), Count and mass across languages, 220–237. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199654277.003.0012

Zhang, Niina N. 2013. Classifier structures in Mandarin Chinese. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110304992

Zwitserlood, Inge. 2012. Classifiers. In Roland Pfau, Markus Steinbach & Bencie Woll (eds.), Sign language: An international handbook, 158–186. Berlin: Mouton de Gruyter. https://doi.org/10.1515/9783110261325.158

Received: 2018-05-01
Revised: 2018-07-05
Revised: 2018-07-12
Accepted: 2018-10-31
Published Online: 2019-10-24
Published in Print: 2019-11-26

© 2019 Tang and Her, published by De Gruyter

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
