November 16, 2018

A note on lexicalizing ‘what’ and ‘who’ in Russian and in Polish

Bartosz Wiland


The contrast between the Russian čto and the Polish co ‘what’ is syntactic and reflects the way in which an identical sequence of features in the syntactic representation becomes realized as morphology. Specifically, I argue that this scenario follows from a spell-out mechanism outlined in Starke (2018), where prefix formation, as in the Russian tri-morphemic čt-o but not in the Polish bi-morphemic c-o, takes place in order to spell out a feature which cannot be spelled out in the mainline derivation. Next, I explore a possibility that the wh-prefix in kto ‘who’, the same form in Russian and Polish, merges with a syntactically different stem than the one present in the lexical items for ‘what’, a scenario more transparently visible on the example of English wh-at and wh-o.

1 Wh exponent in čto, co ‘what’ and kto ‘who’

The forms of the interrogative pronoun ‘what’ are different in Polish and in Russian: co [ţso] and čto [ʃto], respectively. In turn, the forms of the interrogative person pronoun kto [kto] ‘who’ are identical, as shown in:

  • (1) (a) Co to jest? (Pol) (2) (a) Kto to jest? (Pol)

  • what it is who it is

  • (b) Čto èto? (Rus) (b) Kto èto? (Rus)

  • what it who it

  • ‘What is it?’ ‘Who is it?’

A clear distributional relation between the wh-prefixes k- and č- in the Russian person and kind queries does not carry over to the k- and c- in the Polish kto ‘who’ and co ‘what’. The problem of the unattested *cto in Polish is not resolved by phonology, as there is no rule in Polish phonology that leads to t-truncation and there is no constraint that rules out a word-initial ct [ţst] cluster, either.[1] In fact, all word-initial consonantal clusters are in principle permissible in Polish, the position advanced in Scheer (2007), who submits that all non-existing examples of word-initial consonantal clusters in Polish (among other Slavic languages) are accidental rather than systematic gaps.

Specifically, Scheer argues that in languages where sonority increases in word-initial clusters any non-occurring #RT cluster is always a systematic gap.[2] For example, in a TR-only language like English, lbick is an impossible word due to the violation of the increasing sonority rule, while blick is a possible word as it observes the sonority rule. In contrast, in languages that do not observe the increasing sonority rule, any unattested word-initial consonantal cluster is a lexical accident. Scheer shows this on the example of #RT and #TR clusters in “anything goes” Slavic languages. For example, the #rt cluster appears in the Polish rtęć ‘mercury’ but the #rp cluster is unattested.[3]

While the Slavic #RT and #TR clusters constitute a robust illustration of a general situation where in ‘anything goes’ languages any missing word-initial consonantal cluster is an accidental gap, Scheer (2007: 349) states that the same result carries over to non-TR clusters, including #TT and #RR clusters. For the case we are considering, this means that the unattested word-initial ct [ţst] in Polish resists a phonological account.

In what follows, I will argue that the contrast between the Russian čto and Polish co is syntactic and results from the way the common underlying syntactic representation of these pronouns (i.e. a hierarchical structure) is realized as morphology (i.e. a linear structure). Specifically, the Russian č- is a wh-prefix on a case inflected bi-morphemic demonstrative stem t-o, while the Polish c- realizes both the demonstrative stem and the wh-feature. I will argue that such a pattern follows from a spell-out mechanism advanced in Starke (2018), where prefix formation, as in the case of Russian č, takes place in order to spell out a feature which otherwise cannot be spelled out in the mainline derivation.

Such an explanation, however, does not yet account for the contrast between the wh-exponents that we find in č-t-o/c-o ‘what’ on the one hand and in k-t-o ‘who’ on the other, which indicates that k- spells out a different lexical entry than č- and c- do. Essentially, č-/c- lexicalize the wh-feature in the kind query ‘what’, while k- lexicalizes the wh-feature in personal pronouns (and in a variety of other stems, too, as for instance in the Polish k-iedy ‘when’, do-k-ąd ‘where to’, or in g-dzie ‘where’ and g-dy ‘when’, where the prefixal g- [g] appears to be a voiced allomorph of k- appearing before [d]).

In a system like Starke (2018), the position of lexical items in a morphological representation (including the prefix vs. suffix distinction) follows from the spell-out procedure based on the shape of lexical entries. Under such a view, the č/c- vs. k- contrast should reflect a structural difference of lexical items for kind and person queries, a possibility I will explore in what follows.

2 Demonstrative stem

Descriptively speaking, Polish and Russian wh-words like k-to and č-to contain a demonstrative pronoun to, which itself comprises a nominal root t- and a neuter nominative case suffix -o, which shows syncretism with the accusative, as indicated in the declension paradigms of the singular demonstrative:[4]

  • (3)

Just like English, Polish and Russian make a two-way morphological distinction between adnominal forms for this and that: to and tamto in Polish, èto and to in Russian. These two forms are used to describe a three-way deictic contrast between the proximal (close to speaker), the medial (close to hearer), and the distal (far from speaker and hearer).[5]

Cross-linguistically, such a three-way contrast is realized by one, two, or three distinct lexical items. An example of a lexical item that is used in all three contexts is the French ce(tte):[6]

  • (4) French

  • ce journal

  • PROX/MED/DIST newspaperM

Languages like English, Polish, and Russian have two distinct lexical items that express the deictic contrast:

  • (5) Polish

  • to / tamto auto


  • (6) Russian

  • Èto / to plat’e

  • PROX MED/DIST dressN

The three-way deictic contrast is morphologically realized also by three distinct lexical items. We find it for instance in Basaá (Bantu, A43), where all nominal classes overtly mark the three-way distinction, in Spanish, or in Japanese, where the demonstrative markers are bound morphemes which merge with pronouns, determiners, and adverbs, as shown in:

  • (7) Basaá (Hyman (2003: 267))

  • (a) líní / lî / líí (class 5)


  • (b) tíní / dî / díí (class 13)


  • (8) Spanish

  • este / ese / aquel


Lander and Haegeman (2016) argue that the proximal–medial–distal contrast reflects a universal deictic syntactic structure in which the proximal is structurally contained within the medial, which is structurally contained within the distal, a containment relation that can be described as in:

  • (10) distal > medial > proximal

A tool that Lander and Haegeman adopt to turn this containment relation into a morphological representation is phrasal spell-out, an essential feature of Nanosyntax, which I will outline in some detail in the following section. In such an approach, deictic markers are lexical items that spell out phrasal layers of the structural representation of the sequence in (10). This can be illustrated on the example of Japanese proximal ko-, medial so-, and distal a- as in:[7]

  • (11)

Lander and Haegeman argue that the evidence for the sequence in (11) comes from morphological containment and syncretic alignment found between the markers of spatial deixis.

The argument from morphological containment is based on attested patterns of polymorphemic markers of spatial deixis, which are consistent with the hierarchy in (11). The attested patterns include the proximal marker inside the medial marker, as for instance in Palauan (Austronesian), where the proximal ngile (related to the 1st person exclusive) is contained in the structure of the medial ngile-cha (related to the 2nd person), as described in (12b).

  • (12) Palauan (Janssen (2004: 989–990) as cited in Lander and Haegeman

  • (2016: 32))

  • (a) ngile / ngilecha


  • (b) [MED [PROX ngile]-cha]

The bi-morphemic medial marker ngile-cha can be derived by remerging ProxP ngile with MedP. Such a movement allows the target node MedP to spell-out as -cha, as shown in:[8]

  • (13)

Other attested patterns include the proximal contained in the medial and the distal as in Wailevu Fijian (Austronesian) in (14) or in the marker that is syncretic for the medial/distal as in Wargamay (Australian) in (15).

  • (15) Wargamay (Dixon (1981: 44–45) as cited in Lander and Haegeman (2016: 34))

  • (a) ɲuŋga / ɲuŋgaɠi


  • (b) [DIST/MED [PROX ɲuŋga]-ɠi]

Let us note that the Med feature in Wailevu Fijian is spelled out in both the medial demonstrative ya-ri as -ri and in the distal demonstrative ya-ðei as part of -ðei, while the form *ya-ri-ðei is unattested. The non-existence of such a form reveals that the lexical entry for -ðei is defined in the lexicon as in (16c) among the list of entries containing deictic features:

  • (16) Lexical entries in Wailevu Fijian

  • (a) [ Prox ] ⇔

  • (b) [ Med ] ⇔ -ri

  • (c) [ Dist [ Med ]] ⇔ -ðei

The insertion of ðei in the syntactic node DistP over-rides the earlier spell-out of MedP as -ri, as illustrated in:

  • (17)

The distal demonstrative ya-ðei is, thus, derived by successive-cyclic movement of ProxP ya. The unattested form ya-ri-ðei would be derived by the movement of the MedP ya-ri on top of DistP if the lexical entry for -ðei included only one feature Dist rather than a complex feature structure in (16c).

A remark about the spell-out and over-riding is in order here. In Nanosyntax, each application of merge is followed by an attempt to spell out. That is, in order to lexicalize distal or medial structures like the ones shown in (11), (13), or (17), we attempt to spell out each feature, Prox, Med, and Dist, if present, immediately upon their mergers in the phrase marker. This results in a scenario where a lexical entry that matches a bigger tree always over-rides the lexical entries that match its subconstituents, the principle referred to in the literature on Nanosyntax as ‘cyclic over-ride’ (see Starke (2009: 4)).
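The over-ride logic, together with the counterfactual discussed for (16c), can be sketched procedurally. The Python fragment below is only an illustration under simplifying assumptions: suffix trees are flattened to top-down feature chains, lookup is by exact match rather than by the Superset Principle, and the names derive, WAILEVU, and COUNTERFACTUAL are hypothetical labels introduced here. The stem ya is taken from the prose description of ProxP.

```python
# Suffix entries from (16), written as top-down feature chains:
# ("Dist", "Med") stands for [ Dist [ Med ]].
WAILEVU = {("Med",): "ri", ("Dist", "Med"): "ðei"}

# Hypothetical counterfactual lexicon in which -ðei stores Dist alone.
COUNTERFACTUAL = {("Med",): "ri", ("Dist",): "ðei"}


def derive(stem, features, suffixes):
    """Merge features bottom-up above an already spelled-out stem."""
    morphs = [stem]
    remnant = ()  # chain lexicalized by the current outermost suffix
    for feature in features:
        grown = (feature,) + remnant
        if grown in suffixes:
            # Only the stem evacuates; the bigger match over-rides the
            # suffix inserted at the previous cycle.
            if remnant:
                morphs.pop()
            morphs.append(suffixes[grown])
            remnant = grown
        elif (feature,) in suffixes:
            # The whole complement evacuates, so the old suffix survives.
            morphs.append(suffixes[(feature,)])
            remnant = (feature,)
        else:
            raise ValueError(f"no lexical entry for {feature}")
    return "-".join(morphs)


print(derive("ya", ["Med"], WAILEVU))                 # ya-ri
print(derive("ya", ["Med", "Dist"], WAILEVU))         # ya-ðei
print(derive("ya", ["Med", "Dist"], COUNTERFACTUAL))  # ya-ri-ðei
```

With the entries in (16), the distal cycle replaces -ri with -ðei; only the counterfactual lexicon yields the unattested ya-ri-ðei.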

With this remark, let us return to the arguments from lexical containment for the existence of the syntactic ‘DIST>MED>PROX’ sequence given in Lander and Haegeman (2016), who also list Boumaa Fijian as an example of a language where the medial is contained in the distal and the distal forms a prefix on the contained medial, as in:

  • (18) Boumaa Fijian (Ross (2007: 278) as cited in Lander and Haegeman (2016: 33))

  • (a) yā / mayā


  • (b) [DIST ma-[MED yā]]

The set of attested patterns is completed by the proximal contained in the medial contained in the distal as in Ewondo (Niger-Congo, A72), as shown on the example of the class 2 plural proximal marker:

  • (a) mī / mīlí / mīlíí


  • (b) [DIST [MED [PROX mī]-lí]-í]

The attested patterns conform to the hierarchy in (11) in the sense that markers that are structurally smaller are morphologically contained in the markers that are structurally bigger. At the same time, Lander and Haegeman (2016) report that the patterns where the medial is contained within the proximal and the distal is contained within the medial or the proximal have so far been unattested, as predicted by the functional sequence in (11).

In turn, the argument from syncretism in favor of (11) is based on the assumption that syncretic alignment anchors structural containment since it only targets contiguous layers of structure. This assumption stems from the observation that a more complex structure and a less complex structure are not realized by an exponent A if structures that are in between them in terms of complexity are realized by an exponent B, the generalization that has become known as the *ABA since Bobaljik’s (2012) work on suppletion.[9]

The *ABA generalization follows from a major premise behind Nanosyntax, namely that spell-out is regulated by the Superset Principle defined as in:

  • (20) Superset Principle

  • A phonological exponent of a lexical item is inserted into a syntactic node if its lexical entry has a subconstituent which matches that node (Starke (2009)).

If there are multiple lexical items competing for insertion, the choice which one is inserted is controlled by the Elsewhere Principle defined as in:

  • (21) Elsewhere Principle

  • Where several items meet the conditions for insertion, the item containing fewer features unspecified in the node must be chosen.

To illustrate how the Superset Principle rules out ABA patterns from grammar, let A stand for the English demonstrative this, whose lexical entry is described as in (22a), and let B stand for that whose lexical entry is described as in (22b).[10]

  • (22) Lexical entries in English (simplified)

  • (a) [ Prox ] ⇔ this

  • (b) [ Dist [ Med [ Prox ]]] ⇔ that

These lexical entries capture the fact that the demonstrative this (form A) has only the proximal reading while that (form B) has two readings, medial and distal. On the strength of the Superset Principle, the layers of the syntactic representation in (23) are spelled out in the following way:

  • (23)

The nodes DistP and MedP are both spelled out as that: DistP matches the whole tree stored in (22b), and MedP matches one of its subconstituents. The ProxP node, which is also a subconstituent of the entry in (22b), is spelled out by this on the strength of the Elsewhere Principle in (21), since (22a) is a more specific match for ProxP than (22b).
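The interaction of (20) and (21) in this example can be made concrete with a small sketch. It rests on assumptions that go beyond the text: trees are flattened to top-down feature chains, a subconstituent is modeled as a bottom-anchored stretch of such a chain, and the function names matches and spell_out are hypothetical.

```python
# Lexical entries from (22) as top-down feature chains:
# ("Dist", "Med", "Prox") stands for [ Dist [ Med [ Prox ]]].
LEXICON = {
    "this": ("Prox",),
    "that": ("Dist", "Med", "Prox"),
}


def matches(entry, node):
    """Superset Principle: the node equals a subconstituent of the entry.

    For a chain, the subconstituents are its bottom-anchored stretches.
    """
    return len(node) <= len(entry) and entry[len(entry) - len(node):] == node


def spell_out(node, lexicon=LEXICON):
    """Return the matching exponent with the fewest superfluous features."""
    candidates = [
        (len(tree) - len(node), form)
        for form, tree in lexicon.items()
        if matches(tree, node)
    ]
    if not candidates:
        return None
    # Elsewhere Principle: prefer the entry with less unused structure.
    return min(candidates)[1]


print(spell_out(("Prox",)))                # this
print(spell_out(("Med", "Prox")))          # that
print(spell_out(("Dist", "Med", "Prox")))  # that
```

Both entries are candidates for ProxP, but this wins because it leaves no stored feature unused; for MedP and DistP only that qualifies.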

Since that spells out the proximal as its own subset and as the superset of this, there is no way to arrange the exponents of the proximal, the medial, and the distal such that the ABA pattern can arise. However, since that shows the medial=distal syncretism, the ABB pattern can also reflect the ‘PROX < DIST < MED’ containment. Such an ordering is ruled out by cross-linguistically attested PROX=MED syncretism to the exclusion of DIST as in the case of the Polish to in (5), or the Bulgarian tova as in:

  • (24) Bulgarian

  • tova / onova


At the same time, Lander and Haegeman report that out of 5 logical possibilities of syncretism that can be attested between the proximal, the medial, and the distal (including a triplet as in French in (4) or non-syncretism as in Japanese in (9)) the only pattern that is unattested is DIST=PROX to the exclusion of MED, that is the ABA pattern in the ‘PROX < MED < DIST’ hierarchy in (11).
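The count of five logical possibilities can be checked mechanically. The sketch below enumerates every way of distributing exponents over the three cells, ordered PROX, MED, DIST, and tests whether each exponent covers a contiguous stretch of the hierarchy. The function names are hypothetical, and the language labels simply repeat the examples cited in the text.

```python
from itertools import product


def canonical(labels):
    """Rename labels by first appearance, e.g. (2, 0, 2) -> 'ABA'."""
    names = {}
    for label in labels:
        if label not in names:
            names[label] = "ABC"[len(names)]
    return "".join(names[label] for label in labels)


def is_contiguous(pattern):
    """True if no exponent reappears after a different one has intervened."""
    seen = set()
    previous = None
    for exponent in pattern:
        if exponent != previous and exponent in seen:
            return False
        seen.add(exponent)
        previous = exponent
    return True


# Cells are ordered PROX, MED, DIST.
patterns = sorted({canonical(p) for p in product(range(3), repeat=3)})
examples = {
    "AAA": "French ce",
    "AAB": "Polish to / tamto",
    "ABB": "English this / that",
    "ABC": "Japanese ko- / so- / a-",
}

for pattern in patterns:
    status = examples.get(pattern, "unattested")
    print(pattern, is_contiguous(pattern), status)
```

Of the five patterns, ABA is the only one that fails the contiguity check, which is the pattern reported as unattested.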

While the hierarchy in (11) appears to be on the right track, the Polish and Russian facts indicate that it should be updated with a nominal element, which projects the case layers in the extended projection of the demonstrative pronoun.[11] For the purposes of the discussion of demonstratives and wh-pronouns co and čto ‘what’, I will represent this (pro)nominal element simply as the NP of some complexity at the base of the sequence in:

  • (25)

and I will refine its structure into a more meaningful representation at a point where it becomes relevant later in the paper.

3 Spelling out the demonstrative stem

As repeated below from (5)-(6),

  • (26) to / tamto auto (Pol)


  • (27) èto / to plat’e (Rus)

  • PROX MED/DIST dressN

the Polish proximal/medial marker and both markers in Russian do not show overt morphological containment of the type we find in Palauan, Wargamay, Fijian, or Ewondo. This indicates that these lexical items spell out spans of the tree in (32) in a similar manner to what we find in English. The Polish distal marker tamto, however, does contain the proximal/medial to. The morpheme that is prefixed onto the t- stem (of case inflected t-o, cf. (3)) is a distal locative reinforcer tam ‘there’.

Polish tu ‘here’ and tam ‘there’ are indeclinable locative adverbs, as in:

  • (28) Tu /tam jest zimno

  • here / there is cold

  • ‘It’s cold here/there.’

We can observe that tam ‘there’ and, in certain contexts, tu ‘here’ can also serve as demonstrative reinforcers, which can be optionally placed after the demonstrative pronoun, just like here in the (non-standard) English this here big house, as in:[12]

  • (29) (a) to ??tu / tam dziecko

  • PROX/MEDN.NOM here there childN.NOM

  • ‘this here child’

  • (b) tej tu / tam dziewczynie

  • PROX/MEDF.DAT here there girlF.DAT

  • ‘to this here girl’

  • (c) tego tu / tam chłopaka

  • PROX/MEDM.GEN here there boyM.GEN

  • ‘of this here boy’

In contrast to tu ‘here’, tam ‘there’ cannot serve as a free form reinforcer placed after the distal demonstrative pronoun tamto:

  • (30) (a) tamto (*tam) dziecko

  • DISTN.NOM there childN.NOM

  • intended ‘that there child’

  • (b) tamtej (*tam) dziewczynie

  • DISTF.DAT there girlF.DAT

  • intended ‘to that there girl’

  • (c) tamtego (*tam) chłopaka

  • DISTM.GEN there boyM.GEN

  • intended ‘of that there boy’

Moreover, Polish tam-to looks like the reinforcer-demonstrative pattern that we find for instance in Afrikaans, where the reinforcer forms a prefix to the demonstrative in front of the head noun, as in:

We can therefore quite safely conclude that the structure of the Polish distal demonstrative comprises the distal locative adverb tam- ‘there’, which is prefixed onto the t- stem, which spells out the medial:

  • (32)

Note that it is to but not the Polish distal tamto or the Russian proximal èto that forms the stem for the merger of the wh-prefix in č-to. Since t- spells out the medial in both Polish and Russian, it shows that it is MedP of (32) that the wh-feature merges with in Polish and Russian forms for ‘what’. (Nevertheless, what is more important to the following discussion than the fact that it is precisely the MED layer that forms a basis for the merger of the wh-feature is the observation that Polish and Russian demonstratives comprise a (pro)nominal base topped with the spatial deictic structure of some degree of complexity and case).

In order to represent how case becomes spelled out as morphology on Polish and Russian demonstratives, let us assume that the lexical entry of neuter nominative suffix that we find in t-o includes a case feature K1, in line with Caha’s (2009) case decomposition, where cases are decomposed into multiple features as in (33), where accusative corresponds to K1+K2, genitive to K1+K2+K3, etc.

  • (33)

In line with (33), the lexical entry of the neuter nominative looks as in:[13]

  • (34) Lexical entry in Russian and in Polish

  • [ K1 ] ⇔ -o

However, the merger of the case feature K1 on top of the demonstrative (medial) stem t- as in:

  • (35)

does not result in an immediate spell-out since such a tree does not match any existing lexical entry in Polish or Russian.

Since the newly merged feature K1 cannot be spelled out in a tree like above, the evacuation movement of MedP takes place in an attempt to form a subtree which will match a lexical entry with K1, as in:

  • (36)

The evacuation of MedP results in the formation of a tree which matches the lexical entry in (34) and whose exponent comes out as a suffix. Note that this kind of spell-out driven movement, where the complement node of the feature requiring to be spelled out becomes evacuated, is what we observe in (13), where the movement of ProxP ngile- facilitates the spell-out of MedP and results in -cha surfacing as a suffix.

4 Polish c- vs. Russian č-

So far we have seen two ways in which a feature can get spelled out after it is merged in a syntactic tree: without spell-out driven movement or following spell-out driven movement. Let us discuss both options in some detail.

The merger of a feature in a tree will lead to an immediate spell-out if the lexicon contains an entry which matches that tree structure. To voice it differently, a syntactic representation will stay as is if it can be spelled out in that way. We have seen a merger of a feature followed by an immediate spell-out in demonstratives in Japanese in (11) and in English in (23) where the mergers of Med and Dist on top of ProxP both create lexicalizable nodes. Likewise, we have seen it in Polish in (35) where t- spells out a monotonically growing sequence of layers of the demonstrative stem up to MedP without a prior movement of any node. In this way the option ‘stay’ is the basic spell-out option in the lexicalization procedure.

Let us here observe that all we need to do to explain why, in contrast to Russian č-t-o ‘what’, Polish c-o does not comprise the medial t- morpheme is to recognize how this basic option ‘stay’ applies in spelling out the wh-feature. Let us, thus, backtrack in the derivation of the Polish demonstrative to a point where the case inflection is not yet projected on its top, the stage shown in (37), and the uninflected stem merges with the wh-feature.

  • (37)

At this point, the spell-out of the newly merged wh-feature is attempted and the lexicon is checked for a lexical entry containing the following structure:

  • (38) Lexical entry in Polish

  • [ Wh [ Med [ Prox NP ]]] ⇔ c-

If (38) is stored in the Polish lexicon, the tree with wh-feature becomes spelled out right away, as shown in:

  • (39)

The spell-out of the WhP as c- over-rides the earlier spell-out of the medial stem t- in the same way as in the Japanese example in (11) a- overrides so-, which overrides ko-.

If the ‘stay’ option fails, that is when a tree with a newly merged feature does not match an existing lexical entry, the tree must be reshaped and the attempt to spell it out is repeated. This is exactly what we see in (36) when K1 is merged after a successful spell-out of MedP. But what happens if both these options fail to spell-out a newly merged feature?

In such a case, Starke (2018) proposes that a third option kicks in, whereby a derivation of a parallel subtree with the feature that remains to be spelled out is created and subsequently merged with the mainline derivation. Starke (2018) argues that spawning a subderivation is a costly operation, as it essentially involves backtracking in the mainline derivation and starting to derive a parallel tree. For this reason, it is a last-resort option employed only when the two earlier options do not lead to spell-out, which renders (a simplified) order of procedures that facilitate spell-out as in:[14]

  • (40) Spell-out algorithm

  • stay > move > subderive

In order to spawn such a subderivation, what needs to be provided is a feature from the mainline derivation, which serves as a base for the merger of subsequent features in the functional sequence up to the one that needs to be spelled out. In the example we are looking at, let us (for now) simply assume that the feature that is provided as the base for spawning the subderivation is the lowest deictic feature Prox. As before, the next feature in line that merges with Prox is Med, which creates the base for the merger of Wh, which gives us the following tree structure:

  • (41)

Once the mainline derivation has projected up to MedP, the subderived tree in (41) is closed in, forming a left branch to the mainline derivation, as in:

  • (42)

At this point, spell-out is attempted again and the lexicon is checked for lexical entries that match the structure in (42). If the following lexical entry

  • (43) Lexical entry in Russian

  • [ Wh [ Med Prox ]] ⇔ č-

is stored in the Russian lexicon, then the derived tree in (42) is spelled out to the effect that č- comes out as the prefix and the stem t- is preserved in the surface morphological representation, as shown in:

  • (44)

In this way the difference between the wh morphology in the Polish c-o and the Russian č-t-o reduces to the fact that the spell-out of c- over-rides the previous spell-out of the t-stem in Polish while the spell-out of č- in Russian does not.
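The contrast can be replayed with a toy implementation of the ordered options stay, move, and subderive. This is a sketch under stated assumptions, not the algorithm of Starke (2018) in full: trees are flattened to top-down feature chains, ‘move’ always evacuates the whole complement, the base feature for the subderivation is passed in by hand, and the names lookup and lexicalize are hypothetical. The chain assigned to t- follows the pre-refinement structure assumed for the demonstrative stem up to MedP.

```python
# Entries as top-down chains; (38) and (43) differ in whether NP is stored.
POLISH = {
    "t": ("Med", "Prox", "NP"),
    "c": ("Wh", "Med", "Prox", "NP"),
    "o": ("K1",),
}
RUSSIAN = {
    "t": ("Med", "Prox", "NP"),
    "č": ("Wh", "Med", "Prox"),
    "o": ("K1",),
}
FSEQ = ["NP", "Prox", "Med", "Wh", "K1"]  # merged bottom-up


def lookup(node, lexicon):
    """Superset Principle plus Elsewhere Principle over feature chains."""
    found = [
        (len(tree) - len(node), form)
        for form, tree in lexicon.items()
        if len(node) <= len(tree) and tree[len(tree) - len(node):] == node
    ]
    return min(found)[1] if found else None


def lexicalize(features, lexicon, sub_base):
    prefixes, stem, suffixes = [], "", []
    projected = ()  # functional sequence built so far, top-down
    for feature in features:
        node = (feature,) + projected
        whole = lookup(node, lexicon)
        alone = lookup((feature,), lexicon)
        if whole is not None:
            stem = whole                # stay: over-rides the old stem
        elif alone is not None:
            suffixes.append(alone)      # move: exponent surfaces as a suffix
        else:
            # subderive: rebuild the sequence from sub_base up to feature
            sub = node[: node.index(sub_base) + 1]
            prefix = lookup(sub, lexicon)
            if prefix is None:
                raise ValueError(f"cannot spell out {feature}")
            prefixes.append(prefix)     # left branch surfaces as a prefix
        projected = node
    return "-".join(prefixes + [stem] + suffixes)


print(lexicalize(FSEQ, POLISH, "Prox"))   # c-o
print(lexicalize(FSEQ, RUSSIAN, "Prox"))  # č-t-o
```

The two toy lexicons differ only in the entry for the wh-exponent, and that difference alone decides whether Wh is spelled out by ‘stay’ (Polish) or by ‘subderive’ (Russian).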

One consequence of the methodology applied in explaining the contrast between co and čto is that, as pointed out in Starke (2018), the difference between ‘pre-’ elements and suffixes can be defined representationally such that prefixes have a binary and suffixes a unary foot.[15] For instance, the č-prefix in (44) has a binary foot, while the neuter nominative suffix in (36) has a unary foot at the point of spell-out (on the proviso that spell-out driven movement does not create a trace node, the supposition that is informed by the fact that such movements do not reconstruct).

Two remarks about the lexicalization mechanism discussed so far must be made at this point.

First, a subtree such as WhP in (42) becomes a head (in the sense that it projects its own label) once it is closed in with the mainline. More perspicuously, an XP that forms a left branch is not a (non-projecting) specifier of its sister node and, instead, it provides its own label to the mainline in the same way as a simplex Wh head does in (37).[16] In this way, the attested functional sequence ‘KP > WhP > MedP > ProxP > NP’ is exactly the same in Russian and in Polish irrespective of morphological complexity of the lexical items that realize this sequence.

Second, after each successful spell-out, the application of the algorithm in (40) repeats. Once the case feature K1 is merged on top of the wh-stem, ‘stay’ is attempted. If ‘stay’ fails to result in spell-out, ‘move’ applies, which is what we observe in the derivation of case-inflected Polish c-o and Russian č-t-o, as illustrated in (45) and (46).

  • (45) Spelling out nominative K1 in Polish c-o

  • (46) Spelling out nominative K1 in Russian č-t-o

In the context of case-inflected čto, co ‘what’ and kto ‘who’, let us point out that throughout Slavic, non-nominative forms of wh-pronouns such as the Polish cz-ego ‘what-GEN’, k-omu ‘who-DAT’, etc. do not have the t- stem in their morphological structures, as shown for the Polish kto in:

  • (47) NOM k-t-o

  • ACC/GEN k-ogo

  • DAT k-omu

  • LOC/INST k-im

Given the case hierarchy in (33), the t- stem in wh-pronouns disappears in cases that are higher than nominative. For this reason, what we can call here ‘the disappearing t- stem problem’ appears to be an effect of spell-out driven operation that leads to the lexicalization of cases higher than nominative rather than a phonological process of [t] deletion. In this paper, I will not discuss this issue in any greater detail and I will instead continue to describe the structure of the t- stem in the nominative forms, a necessary step before an analysis of the ‘disappearing t- stem problem’ can be worked out.

5 Nominal base in wh-pronouns

An immediate consequence of the lexicalization system in which the formation of a prefix requires spawning a subderivation is that its feature composition is sensitive to the feature composition of the main projection line. This is due to the fact that, as illustrated on the example of the Russian prefix č- above, in order to start a subderivation, a feature from the mainline needs to be provided. I will argue below that this is what is reflected in the contrast between the Russian č- in č-t-o ‘what’ and the Russian/Polish k- in k-t-o ‘who’, where č- and k- realize different syntactic trees whose feature compositions are determined by syntactically different, although syncretic, t- stems they merge with.

The fact that kind and person wh-pronouns have structurally different stems is clearly visible in languages that show an opposite pattern to Russian or Polish, the one where an invariant wh-prefix merges with non-syncretic stems, as for instance in Germanic (e.g. the English wh-at, wh-o; the German w-as, w-er; or the Norwegian hv-a, hv-em). The third logical pattern, the one where the exponents of both the wh-prefix and the stem are syncretic, is also attested. We find it for instance in Latvian, where k-as is a syncretic form for ‘what’ and ‘who’, while other wh-pronouns such as k-ur ‘where’, k-ā ‘how’, and k-āpēc ‘why’ indicate that the k- is a wh-prefix.

There exists evidence for the decomposition of the ‘NP’ base in wh-pronouns as in (48), where Nn stand for features that cumulatively form a monotonically growing sequence of nominal categories in pronouns denoting THING (‘what’), PERSON (‘who’), and PLACE (‘where’).

  • (48)

The argument for such a sequence can be inferred from Baunaz and Lander’s (2018) work on the so-called ontological categories, a closed class of crosslinguistically attested functional nouns comprising, among others, THING, PERSON, PLACE, MANNER, AMOUNT, and TIME, which are found in certain defined morpho-syntactic environments, including interrogative pronouns.[17] On the basis of their syncretism and morphological inclusion, Baunaz and Lander arrange the list of the ontological nouns that are discussed in Cysouw’s work (2004) on the typology of wh-pronouns into a sequence which includes ‘PLACE > PERSON > THING’.

Assuming the *ABA generalization about syncretic alignment, PERSON is an intermediate category in terms of structural complexity with respect to a bigger PLACE and a smaller THING. As shown in (49), we can observe this on the example of the Latvian kas, which shows the THING=PERSON syncretism to the exclusion of PLACE and, as reported in Baunaz and Lander (2018), Awa Pit (Barbacoan), which shows the PLACE=PERSON syncretism to the exclusion of THING (assuming that mɨn= and mɨn as described in Curnow’s work (2006: 225) can be taken to be indeed syncretic).


  • (49)         PLACE   PERSON   THING   pattern

  • English:     where   who      what    ABC

  • Latvian:     kur     kas      kas     ABB

  • Awa Pit:     mɨn=    mɨn      shi     AAB

  • unattested:                           ABA

As noted in Baunaz and Lander (2018), the PLACE=THING syncretism to the exclusion of PERSON is cross-linguistically unattested.

In turn, the morphological inclusion of THING in the structure of PERSON in wh-pronouns is reported to be morphologically visible for instance in Muna (Austronesian) and Amuecha (Arawakan), as in:

The containment of PERSON in the structure of PLACE is morphologically visible in Sanumá (Yanomaman) and Pipil (Uto-Aztecan):

Such morphological forms can be derived if the syntactic representations they realize include the sequence ordered as in (48).

Together with the decomposition of spatial deixis as in (10) and the projection of the WhP layer on its top, the refined structures of the Polish co and the Russian čto look as in (54a) and (54b) (modulo the case suffix).

  • (54) (a) Polish c- in co ‘what’ (final approximation)

  • (b) Russian č- in čto ‘what’ (final approximation)

With refined representations of co and čto, let us move on to the k- prefix in the Russian/Polish kto ‘who’.

6 K- in kto

Descriptively speaking, the juxtaposition of kto ‘who’ and čto/co ‘what’ indicates that the person vs. kind contrast is marked on the wh-morpheme such that k- marks the person and č- and c- mark the kind.

There is a caveat to such a description, though, as it presupposes that both kto and čto/co have an identical stem. If they do, then such a scenario poses a problem for the mechanism of spell-out involving ‘subderive’ since the feature from the mainline that is provided as the base for the subderivation in (54b) is Prox, the feature that gets spelled out jointly with Wh as č-, not as k-. To restate the problem, if we have a syntactically identical stem in kto and in čto/co, we are unable to generate the contrast between the wh-exponents k- and č-/c- and we are unable to capture the fact that k- is prefixal while the Polish c- is not, either.

An immediate attempt to resolve these problems is to assume that the t-stems in čto and kto are syncretic but syntactically distinct, a solution I will consider in what follows. A clue for structural distinction in the stems of kind and person wh-pronouns comes from languages like English where wh-at and wh-o have non-syncretic stems. If this observation can be extended to the wh-pronouns in Polish and Russian, then the formation of the left branch realized as k- in kto is going to be spawned by providing a different base feature from the mainline than Prox in (54b).

Such a result can be obtained if, in agreement with the functional sequence given in (48), person queries include a bigger nominal stem than kind queries, as in (55), the structure which serves as the base for the merger of deictic features and Wh.[18]

  • (55)

Let us suppose that the lexical entry for t- is defined as in (56) and that it spells out the MedP node in (55) in both languages (a supposition I will return to shortly).

  • (56) Lexical entry in Russian and Polish

  • [ Med [ Prox [ N2 [ N1 ]]]] ⇔ t-

Given that, as shown in (54), the Polish c- and the Russian č- spell out the WhP nodes that do not include PERSON in their structures, the merger of Wh on top of MedP in (57) fails to become spelled out by ‘stay’ in both languages.

  • (57)

The next step of the spell-out algorithm in (40), ‘move’ (illustrated for instance in the derivation of the case suffix in (36)), does not result in a successful spell-out either. If it did, the wh-marker would come out as a suffix on t-, contrary to fact.

In such a case, the last-resort operation, ‘subderive’, is launched. Unlike in the case of the Russian čto as in (54b), there are more possible features in the mainline derivation to be provided as the base to spawn the subderivation. If the base feature that becomes provided is the PERSON-forming N2, the feature absent in kind queries but present in person queries, the resulting subtree projected up to Wh in (58) is going to have a different foot than both trees lexicalized by č- and c- in (54). (For convenience, let us indicate right away that the subtree forming the left branch in (58) gets spelled out as k-.)

  • (58) Russian and Polish k- and t- in kto ‘who’ (modulo the case suffix)
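The order in which the three spell-out options are tried for Wh in kto can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm in (40) itself: the function name `spell_out` and the list `kto_steps` are hypothetical, and the outcome of each step is hard-coded from the discussion above rather than computed.

```python
from typing import Callable, List, Optional, Tuple

# Each step is a (name, attempt) pair; attempt returns an exponent or None.
Step = Tuple[str, Callable[[], Optional[str]]]


def spell_out(steps: List[Step]) -> Tuple[str, str]:
    """Try the steps in order and return the first (step name, exponent)."""
    for name, attempt in steps:
        exponent = attempt()
        if exponent is not None:
            return name, exponent
    raise ValueError("no spell-out option succeeded")


# Outcomes for Wh merged on top of MedP in kto, hard-coded from the prose
# (an assumption of this sketch): 'stay' and 'move' fail, and the
# last-resort 'subderive' yields the prefix k-.
kto_steps: List[Step] = [
    ("stay", lambda: None),
    ("move", lambda: None),
    ("subderive", lambda: "k-"),
]

result = spell_out(kto_steps)
```

The point of the sketch is only the ordering: ‘subderive’ is reached because both earlier options return no exponent.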

In order to spell out the left branch in (58), there needs to be a different lexical entry than for č- or c-, which get inserted in the WhP nodes in the representations in (54). As already indicated above, the lexically stored tree that can get inserted in the WhP node in (58) in the spell-out mechanism based on the Superset Principle defined as in (20) is the one in (59), which includes the PERSON-forming feature N2 in its specification.

  • (59) Lexical entry in Russian and Polish

  • [ Wh [ Med [ Prox [ N2 ]]]] ⇔ k-

On a final note, let us return to the supposition that the lexical entry for the t- stem is defined as in (56). With the Superset Principle in (20), which relies on tree structures rather than on feature sets, we are not able to spell out the MedP nodes in the representations in (54a)-(54b) as subset spell-outs of (56). This is so since the MedP nodes in these derivations are not structurally contained in the lexical entry for t- in (56). We are, thus, left with č-t-o ‘what’ and k-t-o ‘who’ that have syncretic stems whose syntactic representations differ ‘in the middle’ (i.e. they are not structurally contained).
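The containment problem can be made concrete with a minimal Python sketch. It assumes that the classic Superset Principle amounts to requiring that the syntactic node be identical to some constituent of the lexically stored tree, and that the MedP of kind queries differs from (56) only in lacking N2. The encoding of trees as nested tuples and all names used below are illustrative assumptions, not part of the proposal.

```python
# A tree is a nested tuple: (label, child) or (label,) for the foot.

def constituents(tree):
    """Yield the tree itself and every subtree it contains."""
    yield tree
    for child in tree[1:]:
        yield from constituents(child)


def classic_match(entry, node):
    """True if the node is identical to a constituent of the entry."""
    return any(sub == node for sub in constituents(entry))


# Lexical entry for t- as in (56): [ Med [ Prox [ N2 [ N1 ]]]]
T_ENTRY = ("Med", ("Prox", ("N2", ("N1",))))

# MedP in person queries: identical to the entry.
MEDP_PERSON = ("Med", ("Prox", ("N2", ("N1",))))

# MedP in kind queries (assumed shape): no PERSON-forming N2.
MEDP_KIND = ("Med", ("Prox", ("N1",)))

person_ok = classic_match(T_ENTRY, MEDP_PERSON)
kind_ok = classic_match(T_ENTRY, MEDP_KIND)
```

Under these assumptions `person_ok` is true and `kind_ok` is false: removing N2 from the middle of the spine leaves a tree that is not a constituent of (56), even though all of its features are present in the entry.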

One way to resolve this problem is to replace the Superset Principle defined as in (20) with the Revised Superset Principle proposed in Vanden Wyngaerd (2018) and formulated as in:

  • (60) Revised Superset Principle (RSP)

  • A lexical entry L may spell out a syntactic node SN if and only if the features of L are a superset of the features dominated by SN.

The difference between the RSP and the traditional Superset Principle is that the former does not require the syntactic node to be a constituent of the lexically stored tree as a prerequisite for insertion. Instead, it only requires that the features of the lexical entry be a superset of the features dominated by the syntactic node. For example, the RSP allows the syntactic tree in (61b) to be realized by the exponent a of the lexical item in (61a).

  • (61)

Such a result is unobtainable under the classic Superset Principle. Adopting the RSP, we are able to lexicalize the MedP nodes in both čto/co ‘what’ in (54) and in kto ‘who’ in (57) with the lexical entry for the t- stem defined as in (56).
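The effect of the RSP in (60) on the t- stem can be illustrated with a short Python sketch that compares feature sets rather than constituents. The nested-tuple encoding, the function names, and the shape assumed for the MedP of kind queries (the entry in (56) minus N2) are assumptions made for illustration only.

```python
# A tree is a nested tuple: (label, child) or (label,) for the foot.

def features(tree):
    """Collect the set of feature labels found anywhere in the tree."""
    labels = {tree[0]}
    for child in tree[1:]:
        labels |= features(child)
    return labels


def rsp_match(entry, node):
    """RSP as in (60): the entry's features are a superset of the node's."""
    return features(entry) >= features(node)


# Lexical entry for t- as in (56): [ Med [ Prox [ N2 [ N1 ]]]]
T_ENTRY = ("Med", ("Prox", ("N2", ("N1",))))

# MedP in person queries and, by assumption, in kind queries (no N2).
MEDP_PERSON = ("Med", ("Prox", ("N2", ("N1",))))
MEDP_KIND = ("Med", ("Prox", ("N1",)))

person_ok = rsp_match(T_ENTRY, MEDP_PERSON)
kind_ok = rsp_match(T_ENTRY, MEDP_KIND)
```

Both checks succeed under these assumptions, since {Med, Prox, N1} is a subset of {Med, Prox, N2, N1} regardless of where N2 sits in the stored tree.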

Let us also note that with two separate lexical entries, one for č- in (43) and the other for k- in (59), we correctly predict the left branch in (54b) to be spelled out by č- rather than by k-. Despite the fact that the WhP node in (54b) includes a subset of features specified in the lexically stored tree for k- in (59), which potentially qualifies for the subset spell-out as k- under the RSP, the WhP node in (54b) is lexicalized as č-. This is due to the Elsewhere Principle in (21), since the č- item in (43) is a better match for the left branch in (54b) than the k- item in (59).
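The competition between č- and k- can be sketched in Python as a choice of the matching entry with the fewest superfluous features, which is how the Elsewhere Principle is assumed to operate here. The feature set for k- follows (59). The feature set {Wh, Med, Prox} used both for the č- entry and for the left branch in (54b) is a placeholder assumption, chosen only so that it is a proper subset of the k- entry, as the prose requires; the actual specification is the one given in (43).

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical feature-set lexicon; only k- is taken directly from (59).
LEXICON: Dict[str, FrozenSet[str]] = {
    "č-": frozenset({"Wh", "Med", "Prox"}),        # assumed stand-in for (43)
    "k-": frozenset({"Wh", "Med", "Prox", "N2"}),  # as in (59)
}


def best_match(lexicon: Dict[str, FrozenSet[str]],
               node: FrozenSet[str]) -> Optional[str]:
    """Return the matching exponent with the fewest superfluous features."""
    candidates = {exp: feats for exp, feats in lexicon.items()
                  if feats >= node}  # RSP-style superset check
    if not candidates:
        return None
    return min(candidates, key=lambda exp: len(candidates[exp] - node))


# Left branch in (54b), assumed feature set without N2.
WHP_54B = frozenset({"Wh", "Med", "Prox"})
# Left branch in (58), built on the PERSON-forming N2.
WHP_58 = frozenset({"Wh", "Med", "Prox", "N2"})

what_prefix = best_match(LEXICON, WHP_54B)
who_prefix = best_match(LEXICON, WHP_58)
```

With these placeholder sets, both entries match the node in (54b), but č- wins because it has no superfluous features, whereas only k- matches the node in (58).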

Despite the result obtained by applying the RSP, other options that rely on the classic definition of the Superset Principle remain to be explored in deriving the t- stem syncretism, with pointers being an immediate alternative to the RSP (see Caha and Pantcheva (2012) for an illustration of the pointer technology). For this reason, it is safe to conclude that this issue is only provisionally resolved at this point.

7 Summary

In a system of lexicalization like the one in Starke (2018), a feature that cannot be spelled out in the mainline derivation is spelled out in a prefix. I have made a case for this kind of derivational mechanism of prefix formation by arguing that it accounts for the contrast between the forms of the Russian and Polish lexical items for ‘what’: čto and co. I have then explored the possibility that, in such a system, the feature composition of a stem provides insight into the feature composition of its prefix.

Bartosz Wiland, Faculty of English, Adam Mickiewicz University in Poznań, al. Niepodległości 4, 61-874 Poznań, Poland


I am indebted to Tobias Scheer and two anonymous reviewers for their excellent comments, and to the audience at the syntax session at the 47th Poznań Linguistic Meeting in September 2017, where an earlier version of this work was presented. Needless to say, all errors are my own responsibility.

This work is part of the project funded by the National Science Center (grant No 2016/2/B/HS2/00619).


