How comparative concepts and descriptive linguistic categories are different

This paper reasserts the fundamental conceptual distinction between language-particular categories of individual languages, defined within particular systems, and comparative concepts at the cross-linguistic level, defined in substantive terms. The paper argues that comparative concepts are also widely used in other sciences, and that they are always distinct from social categories, of which linguistic categories are special instances. Some linguists (especially in the generative tradition) assume that linguistic categories are natural kinds (like biological species, or chemical elements) and thus need not be defined, but can be recognized by their symptoms, which may be different in different languages. I also note that category-like comparative concepts are sometimes very similar to categories, and that different languages may sometimes be described in a unitary commensurable mode, thus blurring (but not questioning) the distinction. Finally, I note that crosslinguistic claims must be interpreted as being about the phenomena of languages, not about the incommensurable systems of languages.


Introduction
To make lasting progress in linguistics, we need cumulative research results and replicability of each other's claims. Cumulativity and replicability are not much emphasized by linguists, and one of the reasons why these seem difficult to achieve is that often we cannot even agree on what we mean by our technical terms. Typically this is because we do not distinguish clearly enough between descriptive categories of individual languages and comparative concepts for cross-linguistic studies. We routinely use the same terms for both (e.g. ergative, or relative clause, or optative mood), but I have argued that we cannot equate the two kinds of concepts in the general case (Haspelmath 2010).
The first published critique of my 2010 proposal was van der Auwera & Sahoo (2015), but in the meantime, several further articles discussing this methodological distinction have appeared (especially the papers collected by Plank 2016 and Lehmann 2016). I will use the opportunity of this paper to address a number of different points that have come up in the discussion of the issues over the last few years.
Overall, I have few disagreements with those linguists that work in a broadly Boasian and/or Greenbergian tradition. But it is clear that some of my claims SEEM controversial, so I hope that this paper will clarify a few issues. (I do have real disagreements with linguists who simply assume a close match between categories of particular languages and innate cross-linguistic categories; see §6-7 below.) In this paper, I provide further justification for the claim in (1), but in addition, I put special emphasis on the observation that the general category presumption is wrong for linguistics (see 2).
(2) (general category fallacy) We do not learn anything about particular languages merely by observing that category A in language 1 is similar to category B in language 2, or by putting both into the same general category C (cf. §6).
For example, by saying that the Spanish-specific construction [estar V-ndo] 'be V-ing' is an instance of the general category "progressive", we do not learn anything that goes beyond what we need to know for a description of this construction anyway. Thus, general categories do not by themselves advance our knowledge, although there are of course many ways in which information about some other language or knowledge of cross-linguistic patterns can help describers to identify all the properties of a language-particular construction. 1 This is worth emphasizing, because there is a constant temptation to think that subsuming a language-particular descriptive category under a general category does add information. We experience the usefulness of the general category presumption every day: When a young woman introduces a young man as her boyfriend, I can make certain further inferences concerning their behaviour which are usually very helpful for further interaction; and when I'm told that a certain kind of infusion is real tea (made from Camellia sinensis), I have different expectations concerning its effects than if it is a herbal tea made of chamomile. It is important to understand why the general category presumption is a fallacy in comparative linguistics.
Briefly, the answer is that the cross-linguistic comparative concepts (like "progressive") are not natural kinds, or pre-established categories that exist independently of the comparison. Different languages represent historical accidents, and (unless they influenced each other via language contact or derive from a common ancestor) the categories of one language have no causal connection to the categories of another language. By contrast, the categories 'boyfriend' and Camellia sinensis do exist independently of particular circumstances, and if someone becomes a boyfriend or if a new tea plant grows, this is causally connected to the independently existing category.
I will elaborate on this point later on, but first I discuss a number of different kinds of comparative concepts (§2). Subsequent sections will address a range of additional issues that have come up in the literature on comparative concepts and descriptive categories.

Kinds of comparative concepts
Comparative concepts can be divided into two main types: CATEGORY-LIKE comparative concepts and ETIC comparative concepts. With the latter type, there is no danger of confusing them with pre-established categories.
Category-like comparative concepts are the most difficult to deal with, but also the most familiar type of comparative concept. Some examples of category-like comparative concepts are given in Table 1, listed together with chapters from WALS (2005) that make use of them. All these terms were originally used for the description of some particular language, and were extended to comparative use only later (they could therefore be called descriptive-derived terms). Some of them are phonetically based (e.g. lateral consonant) or semantically based (e.g. epistemic possibility). But most category-like comparative concepts which are familiar from typology are HYBRID comparative concepts (Croft 2016: 3), i.e. they include both semantic-functional aspects and formal aspects in their definition. For example, a future tense form is a verb form which includes a marker that indicates future time reference of the situation denoted by the verb. Crucially, the form must include a grammatical marker, i.e. a formally defined entity, 2 and this marker must occur on a particular class of roots (namely verb roots). In Haspelmath (2009: §6) and Haspelmath (2010: §5), I listed and defined a dozen category-like comparative concepts, which were all of this hybrid type. In these earlier papers, I focused on this subtype of comparative concepts, because these are the concepts that are often confused with descriptive categories.
Another type of category-like comparative concept is known by terms that are not derived from grammars of particular languages. For the typology of argument coding, the role-types S, A, P, T and R, along with the notion of alignment, have proven very useful (Haspelmath 2011a), and for the typology of subordination, Cristofaro (2003) makes extensive use of the notions of balanced subordination and deranked subordination. These concepts have been important in typology, but they are not normally used in descriptions and are therefore not easily confused with descriptive categories. Similarly, the general concepts of locus (head-marking and dependent-marking, Nichols 1992) and branching direction (Dryer 1992) have been important in typology, but need not play any role in particular languages. The notions of adpossessive construction (Haspelmath 2017) and existential construction (Creissels 2013) have also proven very useful, though many grammatical descriptions make no use of these notions. They are still category-like, but less so than the descriptive-derived terms in Table 1. What is typical of these concepts is that they are defined more narrowly than the corresponding language-particular categories. For example, an adpossessive (= adnominal possessive) construction is defined as a construction that expresses kinship relations, part-whole relations, and/or ownership relations (cf. Koptjevskaja-Tamm 2003), but in individual languages, such constructions normally express other relations as well (e.g. my chair 'the chair I am sitting on', or your school 'the school that you are attending'). 3
2 A grammatical marker can be defined as a simple bound form (i.e. a form that cannot occur in isolation) that occurs in close association with a major-class root (or in second position of the clause), and that expresses an abstract meaning which may correspond to nothing in a translation to another language.
3 Thus, I disagree with Lander & Arkadiev's (2016: 404) statement that "if comparative concepts are not felt to be relevant for the grammars of different languages, they are usually not viable". On the contrary, many comparative concepts (e.g. all the etic ones) are not usable for language description, and conversely, some of the well-known category-like concepts that are not viable as comparative concepts (see (8) in §8 below) work well in individual languages.
In addition to category-like comparative concepts, typologists also work with etic comparative concepts, which are kinds of pronunciations in phonetic typology, and meanings or functions in grammatical typology, often of a type that would not be expected to be the meaning or function of a single form. In semantic-map studies, for example (e.g. Haspelmath 2003; van der Auwera & Temürcü 2006), the nodes on the map are meanings or functions (or uses) that are employed by the typologist to express generalizations across languages, as illustrated by Figure 1. Even though semantic-map studies do not always make this fully clear, the meanings or functions (or uses) are not intended to correspond to any categories of languages. Categories of languages can be mapped onto semantic maps, but there is no claim that the categories must be polysemous and that the meanings or uses on the map are somehow significant outside of the comparison.
When the semantic-functional nodes on semantic maps are not abstract concepts as in Figure 1, but reflect concrete utterances, it is immediately clear that they are not linguistic categories, but merely components of a comparative methodology. Examples of such token-based comparative concepts are visual stimuli, as employed in much recent research on semantic typology (e.g. Majid et al. 2007 on cutting and breaking events, Evans et al. 2011 on reciprocals), as well as translation contexts, as employed by questionnaire-based studies (e.g. Dahl 1985; van der Auwera 1998) and in parallel-text typology (e.g. Wälchli & Cysouw 2012; Dahl 2014). Comparative concepts of the type considered in this paragraph are also called "etic grids" (Meira & Levinson et al. 2003: 487), using a term originating in anthropology. 4 The functions or uses of classical semantic maps of the type in Figure 1 have not been called "etic", but I would argue that their status is not any different. As Croft (2016: 3) notes, the newer token-based methods "provide a denser distribution of comparative concepts in particular regions of conceptual space", and the existing cross-linguistic studies have shown that "linguistic categorization is even more variable than we believed".
4 The terms "etic" and "emic" from American anthropology (going back to Kenneth Pike) broadly correspond to the Hjelmslevian (European structuralist) terms "substance-based" and "structure-based" (cf. Boye & Harder 2013).
What all comparative concepts share is that they are defined in substantive terms, i.e. making reference to aspects of form or meaning that are independent of the structures of particular languages. This allows them to be applied to all languages in the same way, using the same criteria for all languages. This point will become important in §7 below.
Different kinds of comparative concepts relate to language-particular phenomena in somewhat different ways. Token-based comparative concepts must be matched by tokens of language use, and category-like comparative concepts (like those in Table 1) are generally matched by categories of language systems. Category-like comparative concepts are particularly easy to confuse with descriptive categories because we talk about "a language having X" in both cases. As a language-particular statement, for instance, we say that "German has a Future tense construction, formed with the auxiliary werden", and likewise we say from a typological perspective that "German has a periphrastic future tense construction". These two ways of expression sound almost identical, but they are actually quite different: from a comparative perspective, German could have a periphrastic future tense construction that is at the same time an epistemic mood construction, but German's Future tense construction cannot be anything else at the same time; it is just a single language-particular construction, identified by language-particular criteria.

Natural kinds, social categories and observer-made concepts
Describing a new language is somewhat like discovering a new island that has not been visited by an explorer before. The language contains a large number of previously unseen elements of language structure: More concrete ones such as sounds and words, and more abstract ones such as classes of sounds, meanings, and sound-meaning combinations at multiple levels of organization. These can be compared to landscape features of the newly discovered island, and to the plant and animal species inhabiting the island. The explorer will try to bring home pictures of the island's mountains and streams, as well as behavioral descriptions and specimens of the plants and animals, and in modern times, she will also make videos that tell others about the new discoveries. Likewise, the descriptive linguist will make sound recordings of the language, and bring home a dictionary and a grammar containing many new "linguistic species".
When multiple islands are compared by comparative geographers and biogeographers, they must find a way of relating all the unique parts and life forms of the islands to each other. Now crucially, this is done differently for plants, animals and minerals than for mountains and streams.
Plant and animal species, elements, and kinds of minerals are NATURAL KINDS, i.e. they are categories which "have properties that seem to be independent of our minds" (Dahl 2016: 428). For example, the red fox (Vulpes vulpes) is a category of animals that form a group regardless of any observers. To talk about them, we need detailed descriptions and agreement on a label, but not a definition. If we know enough about red foxes, we can easily recognize them in California or China after having first described the species in Europe (or vice versa). The same is true for trees such as the sycamore (Acer pseudoplatanus), found in Spain, Belgium or Romania, and for elements and minerals such as gold or quartz. 5 (Philosophers seem to regard chemical elements as the best exemplars of natural kinds, but for present purposes, biological species can also be included.) Mountains and streams, by contrast, are not categories of nature. They are CONCEPTS CREATED BY OBSERVERS, and we must learn what they mean from other people. If they are to be applied in science, they must be defined rigorously, and delimited from similar phenomena (e.g. mountains vs. hills, streams vs. rivers). They are comparative concepts of physical geography. Such delimitations are often somewhat arbitrary, so terminological uniformity among scholars may require decisions by nomenclature bodies (a well-known example is the International Astronomical Union's 2006 decision to define the comparative concept of a planet in such a way that Pluto is no longer considered a planet).
When exploring a new island, researchers may find completely new plants and animals (endemic to the island), but they will not find completely new landscape forms to which existing terms (like "mountain" or "stream") are inapplicable. Geographers may feel unhappy with conventional terminology and may propose new ways of cutting up the continuum found in nature (just as astronomers changed their minds about planets). But such changes in observer-made concepts will not be triggered by any single discovery, the way a single new animal species requires a new name.
But what about human cultures? Suppose the explorers encounter a new human population, with different kinship patterns, poetic forms and house-building styles than they are familiar with. How will these be categorized? On the one hand, comparative culture scientists work with observer-made concepts. For example, when Botero et al. (2014: 16784) find that "beliefs in moralizing high gods are more likely in politically complex societies that recognize rights to movable property", they use the observer-made concepts "moralizing high god" and "politically complex society", which have a status very much like that of "mountain" or "planet". These are thus comparative concepts, not natural kinds.
On the other hand, human cultures and societies also have specific categories that are neither natural kinds (in the sense that they recur across continents, independently of individual cultures) nor observer-made concepts, but that are recognized by every member of the society. For example, Western societies have the categories "boyfriend" (a quasi-kinship concept), "poetry slam" (a poetic form), and "office tower" (a house-building style). These are not universal and did not exist in Western societies as recently as 150 years ago, but nowadays they are well-recognized parts of Western culture. I call such categories SOCIAL CATEGORIES. What they share with natural kinds is that they are pre-established, and there is a causal connection between their members and the category. It is not only observers of the Hong Kong skyline that put the buildings in the category 'office tower'; these buildings were created with precisely this category in mind. Similarly, when a man becomes a woman's boyfriend, he knows in advance what social behavior this category implies.
5 Another sort of natural kind is represented by diseases such as tuberculosis which can occur in different places at different times, and which can be cured in the same way, regardless of cultural conventions (cf. Haspelmath 2015 on the analogy between linguistic categories and diseases). Such diseases are generally caused by a single pathogen. (Of course, there are also disease names that comprise rather heterogeneous conditions, and these are then better seen as comparative concepts, e.g. the "common cold".)
Moving to language, many readers will readily agree that comparative concepts used in language typology are observer-made in the same sense as "mountain" or "politically complex society". But what about the descriptive categories that authors of grammars of individual languages set up for their descriptions? Aren't they more like the unique plant and animal species that explorers used to find on newly discovered islands? And what about individual words or morphemes, such as the word bahi 'book' in Odia (an Indic language of India)? Here I will argue that language-particular categories are social categories, not natural kinds or observer-made concepts (see §6). But before we get there, I will discuss the main challenges of language description and comparison (§4), and why there is no type-token relation between comparative concepts and descriptive categories (§5).

The challenges of description and comparison
Linguists often talk about "theoretical approaches" and "linguistic analysis", but I do not find these notions sufficiently clear. It seems to me that all non-applied linguistics is theoretical, and that analysis is the same as description (§4.1). Deeper questions often require comparison of languages (§4.2).

Description
Science begins with charting the territory and cataloguing the phenomena, as a prerequisite for comparing the data to answer deeper questions. A basic difference between the two is that charting should be exhaustive, while asking and answering deeper questions is an endless enterprise.
In practice, it may be difficult to describe a language fully, but this is a task that can in principle be completed. We do have very comprehensive dictionaries of quite a few languages, and the complexity of grammars is not limitless either. Thus, one goal of linguistics is to describe all languages in such a way that every regularity is captured, or in other words, to chart the territory exhaustively. This is quite different from comparison of languages, which is necessarily partial, as further discussed in §4.2.
In addition to listing the words of a language, our descriptions need to make reference to categories (with names such as syllable, construction, inflection class, noun phrase, clause) because language use is productive, and speakers can create and understand completely novel complex expressions. These categories must strike a balance between elegance and comprehensibility. The more abstract the description, the less easy it will be to understand it, because it will presuppose understanding many abstract intermediate concepts. 6 Thus, there is no such thing as the best description, 7 but description can be more or less comprehensive, and ideally, it would be exhaustive. Van der Auwera & Sahoo (2015: 2) are right when they observe that not only comparative concepts, but also descriptive categories are "made by linguists", but the difference is that linguistic categories must exist for productive language use to be possible, independently of linguists. Different speakers may use different categories, just as different linguists may prefer different categories, but categories of some kind must exist. (In contrast, comparative concepts do not exist in the absence of comparative linguists.) It is also sometimes said that descriptions should be "typologically informed" (e.g. Himmelmann 2016), but it is unclear what exactly this means, beyond the imperative to avoid idiosyncratic terminology. 8 What is clear, however, is that one cannot describe a language well by filling in a questionnaire or checklist. The grammars based on the Comrie & Smith (1977) questionnaire are often hard to understand because they do not give the authors the opportunity to introduce the basic categories that are crucial for understanding the grammatical patterns of the language. It is true that the checklist structure ensures comprehensiveness and comparability, but it does not ensure or even allow good descriptions.

Comparison
Unlike description of languages, comparison is not a goal in itself. It always serves some other goal, such as learning about human language in general, or answering questions about the historical origin and development of languages.
Comparison must be based on comparable phenomena, i.e. phenomena that are identified by the same criteria in all languages (sometimes called tertia comparationis). It is not sufficient if the phenomena happen to have the same label in different languages. This is the same in other disciplines such as geography. We can compare streets, bridges and subway lines across cities on the basis of their universally applicable formal and functional properties, and probably also main streets and side streets, as well as one-way streets and city highways. But it makes no sense to compare streets called "Willy-Brandt-Straße" across German cities (unless one's focus is on the history of street naming, of course). Thus, we can compare gender systems or causatives across languages only if we have a universally applicable definition of the comparative concepts of gender and causatives.
One of the most interesting results of comparison is implicational universals of the type pioneered by Greenberg (1963). In order to formulate testable universals which can be replicated and can serve as the basis for a cumulative research agenda, it is particularly important that the comparative concepts have clear boundaries. Canonical definitions are useful in that they allow us to see how various phenomena relate to each other conceptually (cf. Brown et al. 2013), but they do not allow us to test universal (or other quantitative) claims, because they do not have clear boundaries. 9 Unlike description, comparison cannot and need not be exhaustive. There are many things that can usefully be compared across languages, but each language also has highly idiosyncratic features that cannot be readily compared. Examples from grammar are stranded prepositions in English, strong and weak adjectives in German, liaison in French, and A-not-A questions in Chinese. Linguists tend to study more general phenomena, and they rarely wonder about idiosyncrasies of lexical items and idiomatic multi-word expressions, of which every language has many thousands. All these can (and ultimately must) be described, but they can hardly be compared across languages. This is not a problem, because there may not be anything special to learn about such historically accidental phenomena anyway, beyond their exhaustive description.
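The requirement of clear boundaries can be illustrated with a minimal sketch. It assumes that each language in a sample has already been classified, by the same criteria, as matching or not matching two comparative concepts P and Q. The record type, the function names and the toy sample below are all hypothetical and stand for no real language or dataset; the point is only that an implicational universal of the form "if P, then Q" becomes testable once every language receives a yes/no value for each concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageRecord:
    """One language, classified by two comparative concepts (hypothetical)."""
    name: str
    has_p: bool  # language matches comparative concept P
    has_q: bool  # language matches comparative concept Q


def counterexamples(records):
    """Names of languages violating 'if P then Q' (P present, Q absent)."""
    return [r.name for r in records if r.has_p and not r.has_q]


def four_way_counts(records):
    """Count languages in each of the four logically possible types."""
    counts = {(p, q): 0 for p in (True, False) for q in (True, False)}
    for r in records:
        counts[(r.has_p, r.has_q)] += 1
    return counts


# Invented toy sample, for illustration only.
SAMPLE = [
    LanguageRecord("toy_lang_1", True, True),
    LanguageRecord("toy_lang_2", False, True),
    LanguageRecord("toy_lang_3", False, False),
    LanguageRecord("toy_lang_4", True, True),
]

if __name__ == "__main__":
    print(counterexamples(SAMPLE))
    print(four_way_counts(SAMPLE))
```

A concept without clear boundaries could not supply the boolean values that this kind of check presupposes, which is why canonical definitions alone do not suffice for testing universals.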

Why there is no type-token relation between comparative concepts and descriptive categories
According to Lehmann (2016) and Moravcsik (2016), comparative concepts can simply be seen as types of which descriptive categories are tokens: "comparative concepts are taxonomically superordinate to descriptive categories" (Moravcsik 2016: 422).
In some simple cases, this may seem to be the case. Thus, Moravcsik would say that English personal pronouns and Hungarian personal pronouns are tokens of the general category "personal pronoun", and Lehmann says that the Ancient Greek dual is a hyponym of the general ("interlingual") category "dual" (2016: §2.3). And in these particular cases, no big problems would arise.
However, more generally, this is not the case, because descriptive categories are defined in a very different way from comparative concepts: "Language-specific categories are classes of words, morphemes, or larger grammatical units that are defined distributionally, that is, by their occurrence in roles in constructions of the language" (Croft 2016: 7). 10 Comparative concepts, by contrast, are defined in a way that is independent of distributions within particular systems. This is a crucial point that is often overlooked.
For example, Moravcsik (2016: 420) says that one could ask whether the categories of the Latin case system (Nominative, Accusative, etc.) hold for Warlpiri, and that it is an empirical question whether the two are commensurable or not. And van der Auwera & Sahoo (2015: 3) say that three categories A, B, C from three different languages could simply be compared by checking whether they share the features a, b, c, d, etc. But this approach cannot work, because categories are defined within particular systems, which are different across languages. It makes no sense to ask whether Warlpiri has a Latin Accusative because the Latin Accusative is defined with respect to constructions of Latin. And when van der Auwera & Sahoo compare demonstratives of a special type in English, Dutch and Odia (such, zulk, and emiti/semiti), they do not do so with respect to the defining features of these items, but with respect to other comparative concepts which actually play no role in defining these items. 11 That comparative concepts are a different kind of entity than descriptive categories is clearest in the case of etic comparative concepts, especially token-based concepts like visual stimuli and translation contexts. But category-like comparative concepts are not different in principle. The category-like comparative concept "dative" (Haspelmath 2009: §6.1) is defined in the familiar substantive way based on universally applicable semantic and formal features, 12 but the meaning of the English preposition to is defined with respect to the structural network of constructional meanings in English. Many authors attribute a general "goal" meaning to it, and claim that a sentence such as Mary gave the money to John uses the Caused-motion construction and thus has a slightly different meaning than Mary gave John the money, which uses the Ditransitive construction (e.g. Goldberg 1992).
11 In fact, there is no need to define English such, other than by its pronunciation, as van der Auwera & Sahoo note themselves (2016: §3.7).
12 A dative marker is a marker on a nominal that codes the recipient role if this is coded differently from the theme role (Haspelmath 2009: §6.1).
From a comparative perspective, one can thus say that English to matches the "dative" concept, but one cannot say that it is a token of a general (crosslinguistic) dative category, or that it "instantiates" the general category. 13 That the difference is important can best be seen in controversial cases, such as the notion of subject, which has been widely discussed (also in Dryer's seminal 1997 article). From a comparative perspective, it seems best to use the term "subject" as the conjunction of the S argument (the single argument of a verb like 'fall') and the A argument (the agent argument of a verb like 'kill', cf. Dixon 1994: 124), because in this way, we can ensure the biggest overlap with the existing literature. However, in particular languages, definitions of syntactic roles are necessarily rather different. They do not make any reference to S, A and P, but rather to constructions such as case marking, person indexing and passivization. In Latin and German, for example, one could say that a Subject is a nominal argument that is in the Nominative case and controls Verb Agreement. Subjects can have various kinds of semantic roles (going far beyond physical-action verbs like 'kill', which are the basis of the definition of A and P, as well as transitive clauses, Haspelmath 2011a), but these do not define the category. The category is defined by case and agreement.
13 Dahl (2016: 429) objects to my earlier arguments against a type-token relation, observing correctly that the mere fact that a category in a language has more properties than the comparative concept does not mean that there can be no type-token relationship (similarly Lehmann 2016: §2.3). In Haspelmath (2010), I did not sufficiently emphasize that categories are defined distributionally within a given language, while comparative concepts are defined not distributionally but by their substantive properties.
The situation in English is different, because case is impoverished and various syntactic patterns are quite salient. For example, Subject-to-Object Raising not only allows patterns such as (3), but also patterns like (4), where the existential particle there is raised.
(3) a. The dog is in the house.
b. I believe the dog to be in the house.
(4) a. There are two unicorns in the garden.
b. I believe there to be two unicorns in the garden.
This is commonly taken to be a criterion for Subjecthood in English, for good reasons. If we do not use the label Subject for the dog and there in (3)-(4), we need to find some other label, and none comes readily to mind. But this also means that agreement is no longer relevant to the definition of Subject in English, because the verb are in (4a) does not agree with there. In Icelandic, which has much richer case marking, not even case is thought to be relevant for the definition of Subject. This well-known example nicely illustrates that in different languages, different criteria are used to identify categories that are rather similar semantically (because of course the Latin, English and Icelandic "Subject" categories are semantically similar, and differ only in atypical cases). But since the categories are not defined by their meanings, their nature is different, and they are incommensurable.
In such cases of incommensurable definitions, it is nonsensical to use the term "subject" as a general term, and to ask, for example, whether the Subject is the controller of reflexivization in both Latin and Icelandic. There is no "Subject" concept that would work as a descriptive category in diverse languages.
Thus, I maintain the view that comparative concepts and descriptive categories are not the same kinds of things. But even more important is the point that we do not learn anything about language 1 by observing that its category A is similar to category B in language 2, or by putting both into the same general category C: The general category presumption does not work in cross-linguistic studies. This is discussed next.

Linguistic categories are not natural kinds but social categories
When I realize that the Spanish noun nariz 'nose' belongs to the Feminine gender, this gives me additional knowledge about this noun: I can predict that it will occur with the indefinite article form una (not un). And when you are told that the Russian verb kupit' 'buy' is in the Perfective aspect, you can predict that its Non-Past form will have future time reference (ja kuplju 'I will buy'). Thus, language-particular categories help predict the behavior of linguistic forms. In this regard, they are like natural kinds or (other) social categories. As we saw in §1 and §3, when told that something can be subsumed under a natural kind or a social category, we learn more: when told that a drink is made of Camellia sinensis, we can predict its health effects, and when told that a man is a woman's boyfriend, we can predict their behavior. Similarly, once we realize that an animal is a red fox (Vulpes vulpes), we can predict much about it, and if an investor is told that a developer wants to build an office tower, they have clear expectations. Both natural kinds (like tea, red fox, sycamore) and social categories (boyfriend, office tower, epic poem) are categories that exist in advance, independently of the categorization. Realizing that something is subsumed under a natural kind or social category is a finding that gives us additional information, and we can establish a causal link between the phenomena and the categories.
In this respect, natural kinds and social categories are crucially different from comparative concepts such as "mountain", "planet", or "moralizing high god". If a geographer calls a landscape form on a newly discovered island a "mountain", this does not add any information, and it does not establish a causal link. And the classification by a category-like concept such as "mountain" may be regarded as too crude by other observers, to be replaced by more fine-grained comparative concepts such as precise contour lines on topographic maps (just as rough classifications into alignment patterns based on S, A and P can be replaced by more fine-grained comparative concepts based on micro-roles, e.g. Hartmann et al. 2014). Similarly, comparative concepts in economics such as "developing country" and "industrialized country" are very crude and are usually replaced by more fine-grained measurements.
But are categories of particular languages natural kinds or social categories? This depends on whether one sees language systems as biological entities or as conventional systems.
In generative grammar, it is common practice to emphasize the biological foundations of language, and it is often assumed that highly specific aspects of language are part of its biology, including not only architectural properties of the system, but also substantive features ("substantive universals"). 14 In this approach, linguistic categories are thus regarded as natural kinds, which means that the same categories are used in different languages, just as different languages use the same architectural design for their rules. In other words, categories are thought to be cross-linguistic categories (or universally available categories, Newmeyer 2007). This means that there is no need to define linguistic categories, just as there is no need to define natural kinds such as red fox, or gold, or tuberculosis (Zwicky 1985: 284-286). Natural kinds can be recognized by various symptoms, which need not be necessary and jointly sufficient, unlike definitional criteria (cf. Haspelmath 2015).
I regard the generative vision as perfectly coherent, 15 but it has not been confirmed by research on grammatical patterns over the last century. We have not come up with a fixed list of categories (analogous to the periodic table of elements in chemistry, cf. Baker 2001) that we encounter again and again with exactly the same properties.
In practice, when we describe a new language and find a phenomenon that is similar to a previously encountered phenomenon from some other language, this is far from the end of our study: We still need to look at the whole range of its properties. For example, when we discover a construction that has some properties of a passive construction, we cannot simply say that it belongs to the natural kind "passive" and leave it at that. We need to investigate it in detail, until we have found all its properties in all contexts (see, for example, Noonan (1994) on two different passives in Irish, and Broadwell & Duncan (2002) on two passives in Kaqchikel). In the end, it does not matter what we call the newly found category: we should probably call it "Passive" for pedagogical reasons, but by attaching that label to the category, we have not learned anything that is not part of our primary description. Thus, I do not see any reason to hope that we will ever find a fixed list of possible categories, and it remains a remote possibility at best. 16
14 "Substantive universals ... concern the vocabulary for the description of language; formal universals involve rather the character of the rules that appear in grammars and the ways in which they can be interconnected" (Chomsky 1965: 29).
15 Dryer (2016: 314) sees it in the same way: "the position that there are crosslinguistic categories is, under such a view [i.e. of innate linguistic knowledge], at least coherent ... this is the only coherent way in which there might be cross-linguistic categories".
Languages have a strong biological basis, but they vary widely across communities, i.e. they are systems of social conventions, like social hierarchies, religions, laws, currencies, and kinship systems. All of these consist of social categories. In general, social categories are definable only within particular systems.
Thus, the religious category 'angel' can be defined only within a monotheistic religion of the Judeo-Christian-Islamic type; the kinship-like category 'boyfriend' can be defined only within a modern Western society; the currency Euro depends for its validity on the existence of European Union institutions; and so on. All social categories need to be described fully within their frame of reference, and we do not learn anything new by linking them to a comparative concept. For example, if a religious scholar encounters an angel-like being in a newly studied faith, they cannot simply assume that it has all the properties of angels in Christianity or Islam; and if a Western comparative legal scholar encounters a divorce law in a non-Western society, they cannot simply assume that it has all the properties of Western divorce laws (which are of course somewhat variable themselves).
The three kinds of scientific concepts that I have discussed here and how they relate to concepts in other disciplines are summarized in Table 1.
(Haspelmath 2007) is, e.g. why we would want to know whether Chamorro words with meanings like 'big' are "Class II words" (words with weak pronoun subjects, Topping 1973) or whether they are "adjectives" (Chung 2012). Both descriptions are possible, though the first one would seem to be more straightforward (as it makes reference to a highly salient feature, whereas the second description builds on two fairly marginal phenomena). So why would one insist that a description in terms of "adjectives" is possible and desirable (as Chung does)? The only reason, it seems, is that it would confirm the hypothesis that all languages have nouns, verbs and adjectives as innate categories, i.e. that these are natural kinds. But this hypothesis seems to be based primarily on English, and the alternative hypothesis that all languages are like Chamorro in having Class I and Class II words would also be confirmed by many (and maybe all) languages (Haspelmath 2012). 17 And if Chung's (2012) deeper study of Chamorro had indeed made a discovery of broader significance, we would expect that other properties of the relevant Chamorro words would come to light due to their identification as adjectives. But this is not the case: The properties of Chamorro adjectives are specific properties of Chamorro, not general properties of adjectives in all languages. Calling them adjectives does not teach us anything further about Chamorro (or about human language), and thinking that it does means to succumb to the general category fallacy (see (3) above).

Different criteria for different languages
Unfortunately, the general category fallacy is still widespread in linguistics. When there is a prominent grammatical term, linguists often assume that it stands for a general category that exists independently of the term and of particular languages. Since languages differ in the criteria that can be used, linguists resort to different criteria for different languages. It is often implicitly assumed that this is an acceptable strategy, and sometimes it is also stated explicitly:
(6) a. adjective
Dixon (2004: 9): "All languages have a distinguishable adjective class... [which] differs from noun and verb classes in varying ways in different languages, which can make it a more difficult class to recognize."
b. word
Spencer (2006: 129): "There may be clear criteria for wordhood in individual languages, but we have no clear-cut set of criteria that can be applied to the totality of the world's languages…"
c. monoclausal pattern
Butt (2010: 57): "Whether a given structure is monoclausal or not can only be determined on the basis of language-dependent tests. That is to say, tests for monoclausality may vary across languages, depending on the internal structure and organisation of the language in question."
d. NP vs. PP
Baker (2015: 13): "[To distinguish NPs and PPs, we should] hope that one can find some fine-grained syntactic properties which distinguish the two kinds...: a process of clefting, perhaps, or quantifier floating - the sorts of syntactic phenomena known to apply to NPs but not to PPs in some languages"
However, using different criteria (or "tests", or "properties", or "diagnostics") for different languages makes sense only if we have good reason to think that the phenomenon exists as a universal category (or natural kind) in the first place. In generative linguistics, the presupposition that part of our grammatical knowledge is innate makes it at least a coherent enterprise to look for such universal categories, but if there are no good initial reasons to think that categories like "word" or "PP" are universal (other than that they have been used in the grammatical tradition of the last few decades and centuries), it is not a promising enterprise. Croft (2009; has called this approach "methodological opportunism"; another term that I have used informally is diagnostic-fishing. It seems to me that diagnostic-fishing is one of the biggest obstacles to rigorous cross-linguistic comparison, and to the sort of replicable and cumulative science of language structures that I mentioned at the beginning of this paper.
It is for this reason that I regard the distinction between language-specific descriptive categories and rigorously defined comparative concepts as fundamental for the progress of typological linguistics.

Portable terms for category-like comparative concepts
Some category-like comparative concepts seem very similar to corresponding descriptive categories. For example, the Italian Future tense and the Swahili Future tense are similar to each other (in the sense that their language-particular descriptions would involve very similar basic notions) and one could say not only that they correspond to the comparative concept "future tense" of Dahl & Velupillai (2005), but even that "the Italian Future tense is a future tense", i.e. that there is a type-token relationship here, or an instantiation relationship. And for languages which have two such categories, like English, one could say that "both the will Future and the gonna Future instantiate the future tense". Thus, for these concepts, it is possible to see the comparative concepts as categories or classes. The comparative concept "future tense" would then be the class (or category) of all tense forms in different languages that fulfill the definition.
Terms for comparative concepts of this kind are called "portable" by Beck (2016), and there are quite a few of them, e.g. those in (7).
(7) personal pronoun, second person, demonstrative, polar question, accusative, instrumental, comitative, future tense, past tense, dual, plural, cardinal numeral, conditional clause, bilabial, velar, fricative, nasal stop
I do not agree with Beck (2016: 395) that these are language-particular terms which "are comparative concepts", 18 but clearly, these terms are widely used for category-like comparative concepts which do not differ greatly in their definition from the corresponding descriptive categories. In many or most circumstances, it does not matter much for these concepts whether they are defined substantively like comparative concepts, or distributionally like language-particular categories. It seems that those linguists who deny or ignore the importance of the distinction between comparative concepts and descriptive categories mostly have this subset of comparative concepts in mind. However, even here it is often necessary to distinguish between descriptive categories and comparative concepts when one considers the phenomena in greater detail. For example, the German polite pronoun Sie 'you' is semantically a second person pronoun, but within the grammar of German, it is a Third Person form that triggers Third Person indexing on the verb (e.g. Sie komm-en [you.POLITE come-3PL] 'you are coming'). The English polite question Would you please open the door? is a Polar Question within the grammar of English (as can be seen from its word order and intonation pattern), but functionally, as a speech act, it is not a question but a request. The Finnish Present Tense is normally used in future contexts where English requires a special future tense form (Dahl & Velupillai 2005), but it would still be strange to say that "the Finnish Present Tense instantiates the future tense". 19
How does one distinguish between portable and non-portable category labels? I do not know any simple answer to this question.
Most grammatical category terms from the Greco-Latin tradition have been used for other languages, but not all of them have given rise to general concepts that can be defined in the same way (using substantive concepts) for all languages. Some concepts that do not seem to work for all languages are listed in (8).
(8) a. aorist, supine, gerund, middle voice, ablative absolute
b. word, clitic, adposition, compound, incorporation, morphology
c. inflection, derivation
d. finite, converb
The terms in (8a) belong to the more exotic aspects of the classical languages, and only middle voice has been used in a typological context, as far as I am aware (but while Kemmer (1993) cites many similarities in different languages, she does not provide a definition of middle voice with clear boundaries). The unsolved problems with word and clitic as comparative concepts are discussed in Haspelmath (2011; 2015), and they carry over to other concepts defined in terms of 'word', such as adposition, compound, and morphology. Sharp boundaries between inflection and derivation are often assumed (e.g. when gender is defined in terms of a lexeme concept, which is itself defined in terms of the inflection concept), but they do not seem to be definable in a cross-linguistically applicable way (cf. Plank 1994). Finally, finiteness is not a useful concept cross-linguistically, because it combines both person marking and tense marking, which need not be absent or present together (cf. Cristofaro 2007). 20
19 Lehmann (2016: §2.1) says that grammatical category concepts can be multiple hyponyms of other grammatical category concepts, but it seems that this is possible only when these are on different levels (as with his example of adverbial clauses, which instantiate both "subordinate clause" and "adverbial modifier"). It hardly seems felicitous to say that the Finnish Present tense is both a present tense and a future tense, or that the Turkish Dative case is both a dative case and an allative case. For this reason, I have used the verbs "correspond to" and "match" for the relation between descriptive categories and comparative concepts rather than "be" or "instantiate".

Commensurable description of different languages
Moravcsik (2016: 421) asks whether descriptive categories are different for all languages, even closely related languages such as French and Italian. And what about dialects, or historical stages of a language? "Are relative clauses of Standard Modern English categorically different from those of the African-American Vernacular and also from those of Middle English?" (Moravcsik 2016: 421). And Dahl (2016: 430) asks a similar question: "If we accept that a category varies within one language, why can't it do so across languages?"
The answer is that it depends on how we view and describe these languages, as different systems, or as variants of a single system. Especially for closely related languages, describing them as variants of a single system makes good sense for practical purposes. This is what Gil (2016) calls the "unitary commensurable mode" of description. Adopting this mode means that the same categories are used, and variation is described in an ad hoc way. Thus, for example, we could describe German and Modern English relativizers in the same way, as Relative Pronouns, regardless of their synchronic status within the system. We would then say that Modern English that is a relative pronoun (cf. van der Auwera 1985), like the German relative pronouns, and that it just happens to be case-invariant and identical to the complementizer that. 21 One could extend the unitary commensurable mode to languages even further away, and this is of course what has traditionally been done, e.g. when linguists have said that the accusative in Swahili is expressed by word order, or the vocative in English is identical to the nominative. Such descriptions are now universally thought to be cumbersome and ethnocentric, and linguists agree that they do not do justice to the languages whose structure is not Latin-like. But such judgements are always somewhat subjective, and I do not know how to achieve greater objectiveness in language description. As I noted in §4.1, description must primarily be comprehensive, and it must include categories which strike a balance between elegance and comprehensibility. Uncontroversially, using the same categories for all languages leads to hopelessly inelegant descriptions, 22 so the issue of incommensurability arises whenever different language-specific categories are set up by researchers. 
Since the well-known European languages English, Spanish, French, German and so on are very similar in their structure, incommensurability does not raise its head very often, and many linguists blissfully ignore it.
But when it does arise, as with the question whether Serbo-Croatian adnominal demonstratives are adjectives or determiners (cf. Bošković 2009), one needs to be aware that terms like "adjective" and "determiner" are either defined language-internally (in which case Bošković's question is a terminological question), or as comparative concepts (in which case Serbo-Croatian adnominal demonstratives would normally be treated as determiners, not as adjectives, because the latter are defined semantically, with respect to properties such as age, dimension, value and color).
20 The term converb is defined in terms of the finiteness concept in Haspelmath (1995) and thus inherits its unsolved problems (see also van der Auwera 1998 on the definition of converb).
21 Another situation where two categories may be known by the same label is when they are cognate but not particularly similar anymore. For example, the Modern German Subjunctive mood has almost no functional overlap with the English Subjunctive (as in I insist that he come), but both are known by this name because they derive from the same Proto-Germanic form. The term subjunctive is not used as a comparative concept here, but as a label for a cognate set, like "the *tūn word", a possible label for the cognate set comprising both English town and German Zaun 'fence', which derive from Proto-Germanic *tūn. Cognate sets are united by common origin, not by any common features.
22 More precisely, this is uncontroversial outside of generative linguistics. In generative linguistics, not even the goal of comprehensive description (§4.1) seems to be shared, let alone the goal of readily comprehensible description.

Universal claims pertain not to language structures, but to language phenomena
Dahl (2016: 432) notes that "generalizations presuppose the possibility of making statements about individual cases". Thus, corresponding to the universal in (9a), there must be a true language-particular statement as in (9b), and similar statements for all languages that have question-word movement.
(9) a. Question-word movement is always to the left. (Haspelmath 2010: 671) b. In Swedish, question-word movement is to the left.
Dahl correctly observes that "if typological generalizations do not involve language-specific categories, these statements should also be free from such categories". This may sound paradoxical, because (9b) would seem to be a statement about Swedish grammar, and the rules of Swedish grammar are supposed to be stated in terms of language-particular descriptive categories. The paradox is resolved by noting that (9b) is a correct factual statement about the Swedish language, but is not a rule of the Swedish language. The corresponding Swedish rule says that Question Words are moved to the Prefield Position (i.e. the position preceding the Finite Verb), and this rule is of course formulated in structural terms that presuppose other descriptive categories of Swedish. 23 The relationship between the Swedish rule and the factual statement in (9b) is that the rule makes it straightforwardly clear that the factual statement is true, i.e. there is a matching or correspondence relationship (but of course not an instantiation relationship).
Very similarly, the universal in (10a) entails a statement such as (10b).
(10) a. In almost all languages, the subject normally precedes the object when both are nominals. (Greenberg 1963, Universal 1)
b. In Mandarin Chinese, the subject normally precedes the object.
LaPolla (2016: §2) objects to the claim that Chinese is an SVO language (which is a more specific claim than (10b), but otherwise very similar) because he has shown in earlier work that Chinese does not have any subject or object category, and he thinks that "labeling [Chinese as an SVO language] implies that these categories either determine word order or are determined by it" (cf. LaPolla & Poa 2006). But again, this is not so. (10b) is a correct factual statement about Mandarin Chinese (assuming that "subject" means S/A, and "object" means P), and it is not a rule of Mandarin grammar. 24 LaPolla (2016: 370) may be right that "most people who see a description of Chinese as SVO will in fact assume that the label was given to the language because those categories are significant for determining word order in the language". But if they do, they have not understood the difference between describing a language and classifying a language from a comparative perspective. These two are different enterprises: not completely unrelated, because both are based on the phenomena of the language, but also not identical.
23 A generativist might try to formulate both the universal in (9a) and the Swedish rule in terms of a crosslinguistic category (a natural kind, part of innate linguistic knowledge) such as "specifier of C position". Such a view has indeed been popular (and may still be held by many), but there are very few cross-linguistic phenomena that support it. In the great majority of cases, question words are simply fronted, without any evidence for a "C" position (cf. Dryer 2005).
The notion of "factual statement" may be a bit surprising to some readers, because it seems not to have played an important role in typology so far. But I would argue that implicitly, it has long been there. As part of their grammar-mining activities, typologists have generally considered the entire description of a language, not merely the part where the author describes a particular category. In many cases, considering the frequency of occurrence of a particular form or function is part of this. For example, Dobrushina et al. (2005) say that they regard an inflectional form with subjunctive functions as an optative if "the expression of the wish is the main function", which is presumably decided by frequency of use. Similarly, Dryer (2005a) distinguishes between dominant order and lack of dominant order on the basis of frequency of use.
Thus, what we compare across languages is not the grammars (which are incommensurable), but the languages at the level at which we encounter them, namely in the way speakers use them. This is true not only for word order, but also for crosslinguistic variation in semantic categorization. Studies based on etic comparative concepts such as translation questionnaires, visual stimuli and parallel texts lead to groupings of comparative concepts into larger clusters, and to semantic maps as seen in Figure 1 above. These etic concepts typically reflect uses to which the categories can be put, not different meanings, and they would not play a role in their semantic description. This is again similar to what is practiced in related disciplines: When anthropologists compare kinship terms, when political scientists compare political systems, and when economists compare economic activities, they must make reference to what happens on the ground, rather than to the incommensurable categories of the diverse cultures. 25 For linguistics, the relative independence of typology from description was already noted in Haspelmath (2004).
24 Confusingly, LaPolla (2016) uses the expression "the facts of the language" in the sense in which I use "rules of the language" (this strange terminology may be motivated by his rejection of "structuralism" and the competence/performance distinction).
25 These disciplines can make mistakes as well, of course. For example, comparative economists can make the mistake of equating economic activities with legally recorded activities expressed in money values, ignoring subsistence and "shadow" economies of various sorts. Such a failure may lead to a very distorted view of economic patterns.

Conclusion
I conclude that there is a fundamental distinction between language-particular categories of languages (which descriptive linguists must describe by descriptive categories of their descriptions) and comparative concepts (which comparative linguists may use to compare languages). Language-particular categories are defined system-internally, by other language-particular categories, but comparative concepts are defined substantively, by other comparative concepts. The distinction between system-internal categories and comparative concepts is found in the same way in other disciplines dealing with social and cultural systems, and has been well-known in anthropology by the labels "emic" (for system-internal categories) and "etic" (for comparative concepts). I have also compared linguistic categories with natural kinds, as familiar from biology and chemistry, and I have argued that they are not natural kinds, because they do not recur across languages with identical properties. Thus, it is not licit to use different criteria or symptoms for the identification of the same categories across languages.
The widespread confusion between language-particular categories and category-like comparative concepts seems to derive from the fact that for a significant part of the categories ("portable categories"), a characterization in substantive terms gets us fairly far (e.g. characterizing nouns in terms of 'things, persons and places'). As a result, carrying over terms from one language to another language based on substantive similarities is often possible, sometimes without any serious difficulties. But it is universally recognized that ultimately, linguistic categories must be defined in structural terms (with respect to other constructions of the language), so the distinction does not disappear.
Finally, I noted that on the present view of comparative linguistics, what we compare is not language systems (which are incommensurable), but "the phenomena of languages".