Folia Linguistica

Acta Societatis Linguisticae Europaeae

Editor-in-Chief: Fischer, Olga / Norde, Muriel

Volume 40, Issue 1


Sibilant-stop onsets in Romance: Explaining phonotactic complexity

Elissa Pustka
Published Online: 2019-07-28 | DOI: https://doi.org/10.1515/flih-2019-0004


Focusing on sibilant-stop onsets, this paper deals with syllabic complexity in Romance languages. At its core are two empirical studies that address the complex case of French: a type-level study is based on the Petit Robert, and a token-level study uses Parisian and Southern French corpus data elaborated in the framework of the PFC program (Phonologie du Français Contemporain). The paper identifies three factors behind the emergence of phonotactic complexity: (a) vowel elision, (b) borrowing, and (c) expressivity.

Keywords: consonant clusters; syllable structure; sibilants; Romance languages; French

1 Introduction

The name by which the smurfs (see en.wikipedia.org/wiki/The_Smurfs; accessed February 8, 2019) are known in French is Schtroumpf. It was coined in 1958 by the Belgian cartoon artist Peyo (pseudonym of Pierre Culliford). He chose the name for his creatures because to his French ears it sounded alien and hilarious (cf. Eco 1983, or http://www.franquin.com/amis/peyo_amis.php, accessed February 8, 2019). Indeed, no French word ends in [mpf], and apart from the Austrian German loan Strudel, none begins with [ʃtʁ] either. This is not altogether surprising: containing three consonants, [ʃtʁ] represents an onset of rare complexity, and because of the articulatory and perceptual difficulties complex onsets create, they count as dispreferred and occur in only a few languages (see also Dressler et al.’s and Dziubalska-Kołaczyk’s contributions to this volume). Thus, French speakers are not exceptional in finding words like Schtroumpf strange. Yet, that is how they refer to smurfs, and this fact exemplifies a general problem: how and why do phonotactically complex (and dispreferred) sound sequences come to be established in natural languages at all? This is the question addressed in this paper.

Note first that words beginning with [ʃtʁ] are exceptional not only in French, but in Romance languages in general. In the case of the smurfs, only Romanian has retained the original French name Schtroumpf (although it is spelt Strumf). All other Romance languages have either changed its phonotactic structure, or created new names altogether. In European Portuguese (EP), for example, the initial adaptation Estrumpf was soon replaced by Smurf, taken from English and pronounced [smœɾf]. Although the onset [sm] does not occur in native Portuguese words either, it is less complex than [ʃtʁ]. In Brazilian Portuguese (BP), on the other hand, where initial Strunf was also replaced by Smurf, the word came to be pronounced [is.mœɾ.fi], so that it conformed to the less complex native phonotactics. 1 Italian and Spanish, finally, solved the problem by coining completely new names for the comic creatures: It. Puf.fo (Pl. Puf.fi) and Sp. Pi.tu.fo contain only CV(C) syllables and fit perfectly into the phonotactic systems of the languages. 2

Note also that the onset cluster [ʃtʁ] is rare only in the French lexicon, but not in French speech. There, it occurs frequently in casual pronunciations of phrases such as j(e) trouve [ʃtʁuv] ‘I find’. This shows that speakers of French are perfectly capable of dealing with this onset cluster. So why do they find the onset funny when it occurs in Schtroumpf, or do they? Maybe the frequency of phrases like j(e) trouve explains why the name Schtroumpf has been adopted in French but in no other Romance language? These questions once again highlight the central phonotactic problem that this paper addresses: how can rare, difficult and dispreferred sound sequences enter a language and thereby, possibly, change its phonotactic system?

The Romance language family provides rich evidence on this issue. Its branches exhibit much micro-typological variation, and reflect influences of different contact languages. Contacts involved various substrates, (mainly Germanic) superstrates, the classical cultural adstrates (mainly Latin), as well as modern neighbour languages, such as English and German.

This paper focusses on sibilant-stop onsets. Here, we can observe, for example, that the Latin onset /sk/ as in schola ‘school’ was retained in It. scuola 3 and Rom. şcoală, but shows up as /Vs.kV/ in Sp. escuela (although the <e-> can be dropped in the Highlands of Latin America) and in BP escola [is.ˈkɔ.lə]. In European Portuguese, however, escola is pronounced [ˈʃkɔ.lɐ], i.e. with the onset /ʃk/. French takes an intermediate position: the Latin onset has disappeared in native école ‘school’ (~ 1050), 4 but is faithfully reflected in learned words such as scolarité ‘school attendance’ (1383). In French-based creoles, finally, a process of vowel epenthesis is still (or possibly once again) productive, yielding Guadeloupe Creole èskandal versus Fr. scandale, for example (cf. Section 2.2).

The specific questions that arise are the following. First, why did speakers begin to repair complex onsets such as /sk/ by prosthesis (i.e. by inserting a vowel before them), so that the complex syllable was split into two simpler ones, as in (1)?


(1) [skVx]σ → [Vs]σ1 [kVx]σ2, as in Lat. scho.la → Sp. escuela

Second, why did French speakers cease to do so a millennium later? Third, why have European Portuguese and Highland Spanish both come to re-establish complex onsets, while Coastal Spanish and Brazilian Portuguese have not? And fourth, why has prosthesis never been attested in Romanian? All these questions are variants of the “actuation problem” as formulated by Weinreich et al. (1968: 102):

Why do changes in a structural feature take place in a particular language at a given time, but not in other languages with the same feature, or in the same language at other times? (Weinreich et al. 1968: 102)

In order to deepen our understanding of the historical dynamics behind increasing and decreasing phonotactic complexity in Romance, the paper takes two different approaches. Focussing particularly on the case of French, it analyses the level of the language system from a diachronic perspective on the basis of the dictionary Le nouveau Petit Robert (2008), and the level of language use from a synchronic perspective on the basis of the corpus Phonologie du Français Contemporain (PFC).

Section 2 of this paper surveys the state of the art in Romance phonotactics; Section 3 contains empirical studies of French phonotactics; and Section 4 discusses possible factors behind the observed variation and the changes, namely (a) frequency of use in grammatical and pragmatic contexts, (b) linguistic contact and (c) expressivity. The paper shows that the most important factor behind the particular developments in French was probably Germanic influence.

2 State of the art

2.1 Syllable types in Romance

French, Spanish and Italian count as syllable-timed languages. 5 In syllable-timed languages, vowels are more frequent and consonants rarer than they are in stress-timed languages (cf. Ramus et al. 1999: 272). This is also reflected in low relative frequencies of onset clusters. For Portuguese, Spanish, French, Italian and Romanian this has been established in various quantitative studies. Table 1 represents the numbers derived from Hess (1975), which is based on a written corpus containing between 8,000 and 10,000 syllables per language.

Table 1:

Proportions of onset types in Romance (derived from Hess 1975: 258‒270).

As Table 1 shows, Hess reports a relatively high proportion of CC onsets (12%) for French, and very low proportions of CCC onsets 6 for all Romance languages. 7 Other studies report similar, albeit not identical results. For example, Wioland (1985) counts 1.21% CCC onsets in a corpus of 200,000 phones of spoken French. Some of them (e.g. reprendre [ʁpʁɑ̃dʁ] ‘resume’, or strict [stʁikt]) occur in the syllable type CCCVCC (0.01%), which is absent from Hess’ written data (cf. Wioland 1985: 1, 338, 348). With regard to the types of segments that can appear in CCC onsets, Girard and Lyche (2005: 36) point out that in French their first element is always [s], at least in ‘primary’ onsets, which do not result from post-lexical processes such as schwa elision. In this respect, French behaves like Latin (cf. Sampson 2010: 42). The second element is always a voiceless stop and the third one a liquid, as in strict, spray, sclérose, or splendeur (only [stl] is not attested). Thus, the proper name Schtroumpf [ʃtʁumpf], cited in Section 1, is highly exceptional for two reasons: first, because of its highly complex CCCVCCC structure, and second, because its onset begins with /ʃ/ rather than /s/.
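The template for primary CCC onsets described above can be restated as a simple membership check. The following Python sketch is only an illustration: the function name `is_primary_ccc_onset` and the encoding of each segment as a single character are assumptions made here, not part of the cited analyses. The segment classes ([s], voiceless stops, liquids, and the unattested [stl]) are taken from the paragraph above.

```python
# Segment classes as described in the text (broad transcription, one character per segment).
VOICELESS_STOPS = {"p", "t", "k"}
LIQUIDS = {"l", "ʁ"}
UNATTESTED = {("s", "t", "l")}  # the text notes that only [stl] is not attested


def is_primary_ccc_onset(onset):
    """Return True if a three-segment onset fits the template [s] + voiceless stop + liquid."""
    if len(onset) != 3:
        return False
    first, second, third = onset
    fits_template = (
        first == "s" and second in VOICELESS_STOPS and third in LIQUIDS
    )
    return fits_template and (first, second, third) not in UNATTESTED
```

Under these assumptions, the onsets of strict, spray, sclérose and splendeur pass the check, whereas the [ʃtʁ] of Schtroumpf fails because its first element is not [s].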

Spanish allows neither CCC onsets nor sibilant-stop CC onsets (cf. Hualde 2005: 77). For Portuguese, Mateus and dʼAndrade (2000: 41) mention no primary onset cluster beginning with /s/ or /ʃ/, and explicitly identify *[sn] and *[ʃm] as “impossible” (for secondary clusters cf. Section 2.3). For Italian, Schmid (1999: 159, 161) does mention (C)CC types, which typically involve “sonority reversal”. They include sibilant-stop onsets as in scala /ˈska.la/ ‘staircase’ or sgelare /zdʒe.ˈla.re/ ‘thaw (v.)’, and sibilant-stop-liquid onsets as in scrivere /ˈskri.ve.re/ ‘write’ or sgridare /ˈzgri.da.re/ ‘scold’. However, Schmid (1999) gives no information on their relative frequency. In sum, sibilant-initial CC and CCC onsets are rare in Romance languages, and altogether impossible in Spanish.

The rarity of complex onsets in Romance is not just a historical accident but a design feature: most of the types that fail to occur are prevented from doing so by productive repair processes. Such processes reflect language specific constraints and define systematic, as opposed to accidental gaps in the onset inventories of Romance languages. Patrick Honeybone’s contribution to this issue discusses the theoretical status and the possible historical developments of such gaps in great detail. Here, we look specifically at repair strategies in Romance languages, at historical changes in their productivity, and at the way in which they can account for historical stratification of the vocabularies.

2.2 Repair strategies

In the history of Romance languages two types of repair strategies are attested: vowel prosthesis (Lat. scho.la → Sp. es.cue.la, see Section 2.2.1) and consonant elision (CLat. spon.sam → OFr. _pou.se, see Section 2.2.2). 8 Their distribution in Romance languages is represented in Table 2, which contains written reflexes of the word family of Latin schola, and points to interesting differences.

Table 2:

Romance reflexes of the word family of Latin schola (cf. Reinheimer Rîpeanu 2004: 384).

As the spellings in Table 2 suggest, Romance languages exhibit two different types of behaviour. All Spanish and Portuguese words show prosthetic initial <e>, while Italian and Romanian maintain sibilant-stop onsets. French takes an intermediate position: it has both native words with prosthesis, such as école, and learned loans with complex onsets, such as scolaire, scolarité and scolastique.

A look at further data and historical developments reveals a more complex picture, however. From the Middle Ages onwards, Spanish has shown spellings both with and without <e> (e.g. Sp. <estrés>/<stress>; see also Section 2.2.1). On the other hand, European Portuguese has – over the last centuries – lost prosthetic [e] in pronunciation, although it has retained <e-> in writing (e.g. EP escola [ˈʃkɔ.lɐ]; see Section 2.3). Italian has preserved traces of former prosthesis in some fixed constructions such as It. per iscritto ‘in written form’ (cf. Section 2.2.1). In Romanian, finally, vowel prosthesis before sibilant-stop onsets has never been documented (cf. Kiss 1971: 92). For this reason, I will concentrate on French, Spanish, Portuguese and Italian.

2.2.1 Vowel prosthesis

One way of eliminating pre-nuclear sonority peaks in sibilant-stop clusters is vowel ‘prosthesis’. In Vulgar Latin, prosthesis is very common, e.g. CLat. scho.la(m) versus VLat. is.co.la (documented in an inscription in Rome; cf. Sampson 2010: 54), which is the basis of OFr. escole (cf. Rheinfelder 1963: 183) > Mod. Fr. école, Sp. escuela and Pt. escola (cf. Table 2). 9

Historically, vowel prosthesis is attested already in the inscriptions of Pompeii (Ismurna for Greek Smyrna, today Izmir; before AD 79) and then again in the second century AD (cf. Prinz 1938: 97, 103; Sampson 2010: 56). The inserted vowel was first [i]. It was then affected by a change [ĭ] > [e] (cf. Sampson 2010: 62) and started to show up as <e> in the fifth century AD. The productivity of prosthesis can also be inferred from hypercorrections such as spectemus instead of exspectemus ‘let us await’ (cf. Sampson 2010: 57). Its pattern of regional diffusion is quite distinct: it probably originated in Rome and never reached Dacia, which explains its absence in Romanian (cf. Kiss 1971: 91). As to its actuation, Schuchardt (1867: 337–348) suggests language contact through Christian influence either from the East or from Northwest Africa.

In Old French, prosthetic <e> initially appears only after words ending in consonants. When the cluster is preceded by a vowel, prosthesis is blocked. Thus, for CLat. sponsa(m) OFr. has espouse but la spouse (l’espouse is later). Prosthetic <e> shows up also in Germanic loans such as éperon ‘spur’ < Frankish sporo (cf. Rheinfelder 1963: 182; Petit Robert). Manuscripts show variation between <sC> and <esC> spellings until the twelfth century, including hypercorrections, such as spectat for exspectat (cf. Sampson 2010: 113). From the end of the Middle Ages onwards, however, words taken over from Latin never show up with a prosthetic <e>: spectacle, splendeur, station, statue, scandale (twelfth century), spasme, sphère (thirteenth century), scribe, scrupule (fourteenth century), etc. During the Renaissance, some native forms with <e-> were even re-latinized, e.g. <scorpion> for <escorpion> and <special> for <especial> (cf. Petit Robert). In the sixteenth and seventeenth centuries, the only new words with prosthetic <e> were Italian loans, such as escale or escorte. 10

Despite this general tendency, prosthesis remained common in the speech of non-educated speakers until the nineteenth century. It subsequently disappeared with the introduction of compulsory schooling and the spread of literacy (cf. Sampson 2010: 122–123). In French-based creoles, we again find cases of prosthesis, e.g. GuaCr. èspésyal < Fr. spécial, GuaCr. èskandal < Fr. scandale (cf. Ludwig et al. 2002: 120). Production data of L2 French in Guadeloupe shows that the process is still productive, e.g. Fr. <stupide> read as [ɛstypid] instead of [stypid] (cf. Pustka 2007: 117).

In Spanish – unlike in French – epenthesis has remained productive until today (cf. Hualde 2005: 77). It occurs not only in native words like escuela and learned Latin loans like Sp. escolar and escolástico (cf. Table 2), but also in more recent loans, such as Sp. <stress>/<estrés> [ɛsˈtɾɛs] or Sp. stop [esˈtop] from English (cf. Hualde 2005: 77). There are also hypercorrections like stablishment for establishment (cf. Pratt 1980: 123). In spite of its productivity, however, prosthesis has also led to reinterpretations of lexical targets. This is suggested by derivations like Sp. [[archi][estúpido]] ‘superstupid’ and [[re][estructurar]] ‘re-structure’. In them, the /e/s must be underlying because prosthesis would have been blocked by the vowels before the /st/ clusters (cf. Sampson 2010: 11‒12).
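The repair in (1) can be sketched as a string rewrite. The Python snippet below is a minimal illustration under stated assumptions: the function name `prosthesis`, the set of stop symbols, and the use of a dot as syllable boundary are choices made here for the example and do not come from the cited sources. It treats prosthesis as exceptionless, which abstracts away from the variation and the blocking contexts discussed above.

```python
# Stop symbols assumed for this sketch (broad transcription or simple spelling).
STOPS = set("ptkbdg")


def prosthesis(word, vowel="e"):
    """Apply rule (1): [sCV...] -> [Vs].[CV...]; words without an s + stop onset are unchanged."""
    if len(word) >= 2 and word[0] == "s" and word[1] in STOPS:
        # The sibilant becomes the coda of a new first syllable headed by the inserted vowel.
        return vowel + "s." + word[1:]
    return word
```

Called on the English loan stop, the function returns es.top, matching the Spanish pronunciation cited above; with `vowel="i"` it yields forms of the Vulgar Latin is.co.la type, and a word such as pitufo, which has no sibilant-stop onset, is returned unchanged.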

The difference between the Spanish and the French developments may possibly reflect language policies, especially concerning orthography. In France, orthography and pronunciation were re-latinized under the influence of Erasmus of Rotterdam’s reform:

[…] the key factor undermining prosthesis is taken to lie in the prestigious spelling-based system of pronunciation which Erasmus […] proposed in 1528 for Latin and Greek […]. It is assumed that by extension this spelling-based approach to ‘good’ pronunciation was adopted by the educated classes when articulating learned words in French, and from there it spread to the bourgeoisie before finally reaching the mass of the population, especially in Paris and other urban environments where there was a strong cultural presence. In this new system of pronunciation, prosthetic vowels had no place since they were not orthographically represented. (Sampson 2010: 123)

Thus, <espécial> became re-latinized to <spécial> (cf. supra), etc. In Spain, on the other hand, Alfonso the Wise (1221–1284) adopted a policy of emancipating the Romance language from Latin and established an orthography close to its pronunciation, e.g. <especial> (cf. Sampson 2010: 105‒106).

Portuguese behaves historically like Spanish, and has forms such as Pt. escola, escolar, escolástico (cf. Table 2). In speech, however, the prosthetic vowel is no longer pronounced today (e.g. [ˈʃkɔ.lɐ]). It may still be added in writing, as in EP estrumpf [ʃtɾumpf], but does not have to be, as in EP smurf [smœɾf] or stress [stɾɛs] (cf. Section 1). In BP, on the other hand, the initial vowel is regularly pronounced, even when it is not written, e.g. smurf [ismœɾfi] or stress [istɾɛs(i)].

Italian, finally, also showed alternations during the Middle Ages (cf. Kiss 1971: 92; Sampson 2010: 80–96). In the modern standard language, however, the forms with initial sibilant-stop onset have come to be stabilized as in It. scuola and scolastico (cf. Table 2). Exceptional traces of prosthesis can be found only in formal registers and in some fixed constructions such as It. per iscritto. In such constructions, the prosthetic vowel always occurs after a function word that ends in a consonant. In recent borrowings, on the other hand, prosthesis never occurs. Loans such as It. sport [spɔɾt] or stress [stɾɛs] from English invariably retain their initial cluster (cf. Sampson 2010: 10).

2.2.2 Consonant elision

The second way of simplifying complex onsets is consonant elision. It has been very rare in Romance languages. This is true already for Latin, where only few cases are attested: thus, CLat. *spasmare ‘to have a spasm’ gave rise to VLat. *pasmare and later to Sp./Pt. pasmar and Fr. (archaic) pâmer ‘to faint’. Other examples are VLat. pasmus beside espasmus < Greek spasmus, 11 or OFr. pouse vs. CLat. sponsam (cf. Schuchardt 1867: 352; Rheinfelder 1937: 184; Sampson 2010: 69; but also espouse, cf. Section 2.2.1). In Modern Spanish, Portuguese, Italian, and French, the elision of initial pre-consonantal /s/ is not attested at all. 12 Only in Guadeloupean Creole are some cases documented, such as pangnòl (beside èspangnòl) ʻperson from the Dominican Republicʼ < Fr. Espagnol (cf. Ludwig et al. 2002: 120). In addition, there is some anecdotal evidence from first language acquisition: I have observed [tχumpf] (2;5) and [χumpf] (2;3) for Schtroumpf in the spontaneous speech of French/German bilingual children. Generally, however, it is fair to say that elision has not been systematically employed in Romance languages for making complex onsets simpler.

2.3 The emergence of complex onsets: Vowel elision and other sources

While complex onsets can change into phonotactically less marked sequences through vowel epenthesis, their emergence is often due to the opposite process, i.e. vowel deletion. Dressler et al.’s contribution to this issue provides ample evidence of this. Here, we consider Romance languages.

In French, complex onsets often arise, phonetically, through the post-lexical deletion of schwa in the clitic 1sg pronoun je (cf. Frei 1929: 126‒27; Krötsch 2004: 218). When je appears before verbs beginning with a voiceless stop, schwa elision and voicing assimilation produce onsets beginning with [ʃ], as in j(e) crois /ʒəkʁ̥wa/ > [ʃkʁ̥wa] ‘I believe’ and j(e) trouve /ʒətʁ̥uv/ > [ʃtʁ̥uv] ‘I find’ (cf. Section 3.2). Word-internally, sibilant-stop onsets originate from schwa elision after word-initial /s/, as in s(e)crétaire /səkʁetɛʁ/ > [skʁetɛʁ], s(e)cret /səkʁɛ/ > [skʁɛ].
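The two steps just described, schwa elision in je followed by assimilation of /ʒ/ to the voiceless stop, can be illustrated with a short Python sketch. The function name `je_plus_verb` and the simplified transcription (plain ʁ without the devoicing diacritic) are assumptions made for this example. The sketch models only the context named in the paragraph, a verb beginning with a voiceless stop, and it assumes that elision applies, although the process is variable in actual speech.

```python
VOICELESS_STOPS = {"p", "t", "k"}


def je_plus_verb(verb):
    """Combine the clitic /ʒə/ with a verb form, assuming the schwa is elided.

    Before a voiceless stop the clitic surfaces as [ʃ]; in all other
    contexts this sketch simply leaves it as [ʒ] (a simplification).
    """
    clitic = "ʒ"  # /ʒə/ with the schwa already deleted
    if verb and verb[0] in VOICELESS_STOPS:
        clitic = "ʃ"  # devoicing before a voiceless stop
    return clitic + verb
```

For the verb forms cited above, `je_plus_verb("tʁuv")` gives ʃtʁuv and `je_plus_verb("kʁwa")` gives ʃkʁwa, i.e. exactly the sibilant-stop-liquid onsets that are so rare in the lexicon.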

In contrast to English schwa reduction, French schwa elision does not admit of degrees but involves the categorical deletion of a vowel that is otherwise fully pronounced. At the same time, it is highly variable and conditioned by lexical, stylistic and regional factors. Reading, for example, favours spelling pronunciations in which schwas are fully realized (cf. Pustka 2007: 150). It is for this reason that Southern French, which reflects the classroom learning of the written language on the basis of an Occitan substrate, preserves schwa and is consequently regarded as sounding more typically Romance than Northern French in terms of syllable structure and rhythm (cf. Pustka 2007: 137). In Southern France, first syllable schwa is elided in only 10% of possible cases in spontaneous speech. In Paris, on the other hand, it is elided in 40% (cf. Pustka 2007: 154). More specifically, it is always, or very frequently, elided in grammatical words like s(e)ra ‘to be’ 3sg.fut or s(e)rait ‘to be’ 3sg.cond (100%), or seconde (83%), but much more rarely in lexemes like secrétaire (10%) or secret (8%) (cf. Hansen 1994: 38; Racine and Grosjean 2002: 326). In Quebec French, to give another example, vowel reduction is particularly frequent, and is found also in content words such as piscine [psɪn] instead of Standard French [pisin]. This creates the impression of stress-timed rhythm (cf. Pustka 2016: 202). Other sources of sibilant-stop onsets in Modern French include word formation, as in the acronym S.C.O. [sko] for Sporting Club Olympique (cf. Plénat 1993: 158), and verlan, a language game which permutes syllables, creating for example verlan [stikmi] from Fr. mystique [mistik] (cf. Azra and Cheneau 1994).

In Spanish, vowel reductions amounting to elisions are regionally and lexically conditioned. For regional variation, the rule of thumb is: “[L]as tierras altas se comen las vocales, las tierras bajas se comen las consonantes” 13 (Rosenblat 1970: 39). These vowel elisions can affect originally prosthetic /e/s, as in [s̩.pa.ra.ɣo] from (e)spárrago /ɛsparaɣo/ ‘asparagus’ (cf. De Crignis 2016: 209). However, they can also complexify or produce codas as in arbolit(o)s /aɾboˈlitos/ > [aɾboˈlits] ‘sapling’ or ahorit(a) /aoˈɾita/ > [aoˈɾit] ‘instantly’.

Table 3:

Initial sibilant-stop clusters in EP and BP (cf. Mateus and d’Andrade 2000: 43, 45)

In Portuguese, there are differences between European and American varieties. Vowel elisions – and the resulting complex syllable structures – are typical of informal European Portuguese, e.g. desprestigiar [dʃpɾʃtiˈʒjaɾ] ‘to depreciate’ (cf. Mateus and dʼAndrade 2000: 44). In Brazilian Portuguese, on the other hand, epenthesis is frequent in all positions of the word (cf. Section 2.2.1). Mateus and dʼAndrade (2000: 44–45) give the following survey of initial sibilant-stop clusters in native words of EP. These clusters result from the non-realization of originally prosthetic <e->, which is pronounced [i] in BP. 14

In Italian, vowel elision can also give rise to new clusters. This especially concerns the prefix stra-, derived from Latin extra ‘outside of’ in colloquial word formations like [[stra][ricco]] ‘super-rich’.

2.4 Open questions

As we have seen, Romance syllables have undergone not only simplification but also complexification, and sibilant-stop onsets have both disappeared and (re-)emerged. However, different pathways have been followed in individual languages. Onset simplification has occurred equally in hyper-articulated formal speech styles and in natural informal ones, while more complex onsets have been brought about by borrowing from Latin in formal styles and by phonetic reduction in informal styles. Heinz (2010, 2014) therefore suggests that Romance languages differ with regard to the status that complex syllable structures have in the phonological system. If they can result from reduction in informal styles, Heinz considers them as belonging to the core (he marks them as ‘+central’). If they only appear in learned borrowings in formal styles, on the other hand, Heinz regards them as peripheral (or ‘-central’). This produces the classification in Table 4.

Table 4:

Increase and decrease in phonotactic complexity in different speech styles (cf. Heinz 2014: 106; see also Heinz 2010: 153).

This representation seems to work for Spanish and Portuguese, but it fails to capture the situation in French, which is much more complex. First, French shows regional variation at least to the same extent as the other languages. Second, the interpretation of schwa elision as an ‘allegro’-style reduction is too simple and not quite appropriate: schwa elision is lexically conditioned as well. Third, the schema ignores that French also has complex syllables in formal style. They occur in learned borrowings and are simplified in informal style, like exprès /ɛkspʁɛ/ > [ɛspʁɛ] or [ɛksəpʁɛ]. Finally, it is necessary to add that borrowed complexity comes not only from the Latin cultural adstrate but also from English and German. In the final section of this paper, we therefore attempt to provide a more fine-grained account of the situation in French.

3 Sibilant-stop onsets in French: Two empirical studies

The two studies in this section focus on the status of complex sibilant-stop onsets in French and address the question of how they have come to acquire it. The first one is a diachronic lexicographic study on the sibilant-stop onset types listed in the Petit Robert dictionaries, the second one a synchronic corpus study on the token frequency in Southern and Parisian French.

3.1 Evidence from the Petit Robert

Le nouveau Petit Robert is a dictionary whose digital edition of 2008 comprises 300,000 entries. It allows phonetic searches and makes it easy to survey the phonotactic structures attested in French words. Although the dictionary contains no information on word frequency and does not distinguish between widely and less widely known words, 15 it provides interesting first insights, particularly because it also reports the dates at which words were first attested.

A phonetic search in the Nouveau Petit Robert returns 796 words that begin with sibilant-stop clusters (Table 5), of which 181 begin with sibilant-stop-liquid clusters (Table 6). 16 These figures must be interpreted with some caution, however. It needs to be taken into account, for example, that the Petit Robert has separate entries for spelling variants of the same word (such as <chti> and <ch’ti>), as well as for derivatives from the same base (spécial, spécialement, spécialisation, spécialisé, spécialiser, spécialiste, spécialité). Also, some phonetic transcriptions in the dictionary are doubtful. It is questionable, for example, if clusters like /sb/ and /sg/ really exist, since there is a strong preference for obstruent sequences to agree in voicing, so we would expect assimilation (cf. Léon 1996: 71). Furthermore, the possibility of schwa elision is not systematically represented in all entries. It is given in the entry of secrétaire [s(ə)kʁetɛʁ] ‘secretary’, for example, but not in that of secret [səkʁɛ] ‘secret’. So, words like secret are not found in a phonetic search for initial sibilant-stop clusters. Finally, the dates at which words with optional schwa elision were first attested do not tell one when they were first affected by the process. Although spelling variants such as <stier> for setier ‘hin’ or <c’pendant> for cependant ‘however’ are already attested in the sixteenth century (cf. Fouché 1958: 526), it is impossible to know for individual words of the secrétaire-type when they were first pronounced with initial clusters.

Table 5:

Words beginning with sibilant-stop clusters in the Petit Robert.

Table 6:

Words beginning with sibilant-stop-liquid clusters in the Petit Robert.

The first thing that Table 5 shows is that clusters with initial /ʃ/ (10 types) are much rarer than clusters beginning with /s/ (785 types). While this conforms with the intuition that /ʃ/-stop onsets are exceptional in the phonotactic system of French, it is in fact rather surprising that there are 10 words with /ʃ/-stop onsets at all. They have either resulted from schwa elision (e.g. chtouille < j(e)touille) or been borrowed from German during the nineteenth century or from French varieties (argot in Paris, Picard in the North of France) in the twentieth century. Google suggests that the few words with /ʃ/-stop onsets are also quite rare in French. The only exception is the word chti(mi), which can refer both to Frenchmen from the North of France and to the variety of French they speak. It probably originated as an imitation of the Picard phrase ‘It’s me?’, and can count as a case of expressive onomatopoeia. The /ʃ/-stop onset enhanced its expressivity not only because it was highly marked, but possibly also because it sounded German, and because German raised strong (and negative) emotions after World War II. This made words with /ʃ/-stop onsets ‘ear-catching’ in the sense that they caught the attention and raised the emotions of listeners (cf. Frei 1982 [1929]: 283; Pustka 2015: 83). The success of the name Schtroumpf, coined in 1958, may be due to these mechanisms, even though the feelings associated with the creatures it refers to are of course not negative. 17

As far as /s/-stop onsets are concerned, the earliest-attestation dates in the Petit Robert suggest that they began to enter the French language from the twelfth century onwards. They first appeared in re-latinized words such as statue, or scorpion, which were followed by new borrowings from the – mainly Latin – cultural adstrates, such as stoïque ‘stoic’, and finally by loans from modern neighbour languages (mainly English) in the twentieth century. Examples are stop or spam. All other [s]-stop onsets resulted from language internal processes, either from word-formation, such as SMIC < Salaire minimum interprofessionnel de croissance ‘minimum wage’, or from schwa elision, such as c(e)pendant ‘however’ or s(e)cours ‘help’.

As far as the 181 entries with triple onset clusters are concerned, their second element is always a voiceless stop and the third one always a liquid. [stʁ̥] onsets are most frequent (108 occurrences), [skʁ̥] onsets occur in 38 words, and /spʁ̥/, /spl̥/ and /skl̥/ appear in 32 words, if counted together. In contrast, triple clusters beginning with [ʃ] appear in three words only. The two beginning with [ʃpʁ̥] (schproum and sprechgesang from German) are also extremely rare, while strudel – the only one with [ʃtʁ̥] – appears to be more common.
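The breakdown of the 181 triple-onset entries can be checked with a few lines of arithmetic. The Python sketch below simply restates the counts given in the paragraph above; the variable names are chosen for illustration, and the three /s/-initial types that the text reports only jointly are kept as one combined figure.

```python
# Counts of triple-onset entries as reported in the paragraph above.
triple_onset_counts = {
    "stʁ": 108,
    "skʁ": 38,
    "spʁ + spl + skl": 32,  # reported only as a combined figure
    "ʃ-initial": 3,         # schproum, sprechgesang, strudel
}

# Total number of entries and the percentage share of each onset type.
total = sum(triple_onset_counts.values())
shares = {
    onset: round(100 * count / total, 1)
    for onset, count in triple_onset_counts.items()
}
```

The four figures add up to the 181 entries mentioned in the text, and [stʁ] alone accounts for well over half of them.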

A question that arises from the data in the Petit Robert is what they imply for the status of sibilant-stop onsets in the phonotactic system of French. Do speakers treat the words in which they appear as integrated borrowings? Or do they switch codes when they use them?

First, let us consider the rarest types, i.e. triple clusters beginning with [ʃ], which are attested in only three German loans. A Google search 18 for le sprechgesang 19 returns only 2,450 hits (including a Wikipedia article), and one for le schproum returns only 283. Only for le strudel does Google return a higher number, namely 27,400 hits (including a Wikipedia article). That does not say much, however. The pastry is common neither in French pâtisseries nor in restaurants or at family meals (except perhaps in some regions). Designating an internationally known Austrian culinary speciality, uses of the word strudel probably represent cases of code switching in spite of its frequency on the internet.

When it comes to the more frequent [s]-stop onsets, we see that they also occur mostly in borrowings. We find them in learned loan words from Latin (e.g. structure) as well as in more recent borrowings from English (e.g. spray). Native words with triple onsets are splendeur (1120), forms resulting from schwa elision like s(e)crétaire, and language-internal innovations like scrogneugneu (1884), a euphemistic deformation of the interjection Sacré nom de Dieu! ʻHoly name of God!ʼ, or the onomatopoeic scratch (1985) for ‘velcro’ (cf. Petit Robert). Since native words with [s]-stop onsets are so rare, it is also difficult to say whether they should count as core constituents of the French phonotactic system.

Since dictionary evidence is indecisive, we examine evidence from usage in the next section. As we will see, all types of sibilant-stop onsets, even the ones beginning with [ʃ], turn out to be more frequent than the dictionary data might lead one to expect.

3.2 Evidence from the PFC-corpus

The evidence considered in this section comes from two sub-corpora of the international corpus program Phonologie du Français Contemporain (PFC; http://www.projet-pfc.net/, accessed February 8, 2019; Durand et al. 2002). The two sub-corpora chosen for the purpose represent (a) Parisian French (a variety that can be seen as Standard French [cf. Fouché 1956: II]) and (b) Southern French from the department of Aveyron. The two varieties represent opposite extremes with regard to reduction processes and constraints on phonotactic complexity. Paris, the capital, is considered the point of origin of the French standard. It has remained the center of innovation until today and is strongly influenced by linguistic contact. In Parisian French, schwa elision occurs very frequently and results in a high frequency of complex onsets. The Southern third of France, on the other hand, represents the former contact zone with Occitan, which occupies a medial position in the Romance continuum between Catalan and Italian. Southern French is known to realize nearly all schwas and in this way to preserve the typical Romance CV-structure (cf. Pustka 2007 and Section 2.1).

Each of the two sub-corpora contains 20 minutes of spontaneous speech from each of 20 speakers, which adds up to a total of 13 hours and 20 minutes. The sound material has been orthographically transcribed and annotated with the PFC schwa codings. 20 For the purposes of this study, I focussed on a specific context of schwa elision, namely monosyllables at the beginning of prosodic phrases, such as je in j(e) trouve ‘I find’. In that context, the realization of schwa counts as optional in Standard French. In the peripheries, its elision is considered Parisian, informal, and associated with youth language (cf. Pustka 2007). Crucially, the described context type is diverse in terms of segments. That is to say, it also includes contexts in which schwas are not preceded by sibilants and/or not followed by stops (as in l(e) coup, or (verre) d(e) vin). This has made it possible to compare the rate at which schwas are elided specifically between sibilants and stops with the average rate at which they are elided between other segments, and this in turn allows one to draw conclusions about the specific status of sibilant-stop contexts. The results of this study are represented in Tables 7 to 9.

Table 7: Schwa elision rates between sibilants and stops in phrase initial monosyllabic words in Parisian and Southern French (Aveyron).

Table 8: Schwa elision rates between /s/ and /k/ for ce que and ce qui in phrase initial monosyllabic words in Parisian and Southern French (Aveyron).

Table 9: Schwa elision rates between /ʒ/ and stops for je pense, je trouve and je crois in phrase initial monosyllabic words in Parisian and Southern French (Aveyron).
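
The comparison described above, i.e. the elision rate in sibilant-stop contexts set against the rate across all contexts, can be sketched in a few lines of Python. This is only an illustrative sketch: the token tuples, the names `tokens`, `elision_rate` and `is_sibilant_stop`, and the toy values are assumptions made for the example. They reproduce neither the PFC coding format nor any actual corpus data.

```python
# Segment classes used to identify sibilant-stop contexts.
SIBILANTS = {"s", "z", "ʃ", "ʒ"}
STOPS = {"p", "t", "k", "b", "d", "g"}

# Hypothetical toy tokens, invented for illustration (NOT PFC data):
# (consonant before the schwa, consonant after the schwa, schwa elided?)
tokens = [
    ("ʒ", "p", True),   # j(e) pense
    ("ʒ", "p", True),   # j(e) pense
    ("ʒ", "t", True),   # j(e) trouve
    ("ʒ", "k", False),  # je crois
    ("s", "k", True),   # c(e) que
    ("l", "k", True),   # l(e) coup
    ("l", "k", False),  # le coup
    ("d", "v", False),  # de vin
]


def elision_rate(items):
    """Return the share of tokens with an elided schwa, or None if empty."""
    items = list(items)
    if not items:
        return None
    return sum(1 for _, _, elided in items if elided) / len(items)


def is_sibilant_stop(token):
    """True if the schwa site lies between a sibilant and a stop."""
    before, after, _ = token
    return before in SIBILANTS and after in STOPS


overall = elision_rate(tokens)
sibilant_stop = elision_rate(t for t in tokens if is_sibilant_stop(t))

print(f"all contexts:   {overall:.1%}")
print(f"sibilant-stop:  {sibilant_stop:.1%}")
```

The sibilant-stop rate is computed on a subset of the same tokens that enter the overall rate, which mirrors the comparison "between sibilants and stops" versus "on average" made in the text.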

Overall, our results have confirmed the expected difference between the Parisian and the Southern varieties (cf. e.g. Durand et al. 1987). Parisians elide 41% of all schwas in monosyllables at the beginning of prosodic phrases, while Aveyronnais speakers elide only 24% of them. 21 Generally, schwa is much more frequently elided after sibilants than after other sounds, which is also predicted by the literature on Standard French (cf. Delattre 1966) and Southern French (cf. Armstrong and Unsworth 1999). Before /p/, for instance, Parisians delete 77% of schwas in phrase initial monosyllables if they are preceded by /ʒ/, but only 10% and 4% of schwas if preceded by /l/ and /d/ respectively.

What Table 7 also shows is that schwas are much more readily elided when they occur between sibilants and stops than in all other contexts. This is true of both varieties under investigation. While Parisians elide only 41% of all schwas on average, they elide between 68% and 90% when the schwas occur between sibilants and stops. Aveyronnais speakers elide 24% of all schwas on average, but between 40% and 55% of schwas between sibilants and stops. This means that schwa elision is roughly twice as frequent between sibilants and stops as it is on average, i.e. in all possible contexts.
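
The size of this effect can be checked directly from the percentages reported in the preceding paragraph. The following sketch divides the lower and upper bounds of the sibilant-stop range by the average rate for each variety. The dictionary layout and the names `reported` and `ratios` are assumptions made for the example; only the figures themselves come from the text.

```python
# Elision rates in percent, as reported in the text above.
reported = {
    "Paris": {"average": 41, "sibilant_stop_range": (68, 90)},
    "Aveyron": {"average": 24, "sibilant_stop_range": (40, 55)},
}


def ratios(entry):
    """Return (low, high) of the sibilant-stop range relative to the average."""
    low, high = entry["sibilant_stop_range"]
    return low / entry["average"], high / entry["average"]


for variety, entry in reported.items():
    low, high = ratios(entry)
    print(f"{variety}: {low:.2f} to {high:.2f} times the average rate")
# Paris: 1.66 to 2.20 times the average rate
# Aveyron: 1.67 to 2.29 times the average rate
```

In both varieties the ratio ranges from about 1.7 to about 2.2 or 2.3, so "twice as frequent" holds as an approximation rather than as an exact factor.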

Note that these segmental contexts correspond to a very small set of word combinations. Thus, 96% of all tokens of the context type /s_k/ are instances of either the compound relative pronoun c(e) que ‘which’ (61%) or the compound relative pronoun c(e) qui ‘who’ (35%) (cf. Table 8).

Similarly, 64% of all tokens of the context type /ʒ_p/ are instances of je pense ‘I think’, 52% of all tokens of the context type /ʒ_t/ are instances of je trouve ‘I find’, and 58% of all tokens of the context type /ʒ_k/ are instances of je crois ‘I think’ (cf. Table 9). Thus, we are dealing with a small set of fixed constructions that involve the 1sg pronoun and frequently function as discourse markers (“verbes parenthétiques”; Andersen 1996).

With regard to phonological theory, the results of our study have two interesting implications. First, they point to the special phonetic and phonological properties of sibilants. They contrast so well with other consonants that they can assume something like a quasi-syllabic function without losing their consonantal character. Therefore, they can appear before stops in onsets and violate the principle that sonority should fall towards syllable margins (see note 10 above, as well as the discussion in Dziubalska-Kołaczyk’s contribution to this issue). This may also be why schwa elision occurs more often when its outputs are sibilant-stop onsets than when it would result in sequences such as [dv] (< de vin) or [lk] (< le coup). Second, the frequency of schwa elision in phrases like je trouve, je crois, or je pense shows how constructions that become highly frequent for pragmatic reasons can lead to the entrenchment of reductions that affect the phonotactic system.

In terms of sociolinguistic insights, our study confirms the hypothesis of a change in progress in Southern French, caused by linguistic contact with Parisian French (via the media or migration). While sibilant-stop clusters resulting from schwa elision are currently quite common in Parisian French, Pustka (2007) has shown that schwa elision is still rather scarce amongst Aveyronnais speakers born before 1955. The only two exceptions are the very frequent constructions je suis ‘I am’ (34% of elisions) and je sais ‘I know’ (35%). The average elision rate is only 12%. Amongst the speakers born after 1973, in contrast, elision is much more frequent in je suis (67%) and je sais (62%) and has also begun to affect the constructions je pense ‘I think’ (73%) and je crois ‘I believe’ (58%). 22

4 Conclusion

Our review of sibilant-stop clusters in the Romance languages (Section 2) and two empirical studies on sibilant-stop clusters in French (Section 3) have shown that three factors explain increasing and decreasing phonotactic complexity: language use, language contact, and expressivity.

The cases of vowel elision in French (cf. Section 3.2) as well as in European Portuguese and Highland Spanish (cf. Section 2.3) have demonstrated that reduction processes can operate in language use even if they violate constraints otherwise reflected by attested structures in the language system. Such processes are sometimes rooted in the principle of least effort (Fr. paresse articulatoire) and diffused via the lexicon, particularly in frequent constructions with grammatical or pragmatic functions (cf. Wang 1969; Labov 1981). The question why certain processes are possible in Parisian French but not in Madrid Spanish (taking the standard varieties to be localized in the respective capitals) remains unanswered: is a certain kind of language contact the precondition for these internal processes?

Furthermore, we have seen that language contact represents another way for new complex structures to enter phonotactic systems. ‘Foreign’ structures can first be pronounced in the context of code-switching and then as loans (Section 3.1), either being integrated into the structure of the recipient language or changing that language. Which of these pathways is followed depends on the prestige of the contact language and the education of the speakers. Highly educated speakers are particularly likely to adopt innovations from prestigious donor languages (cf. Levitt 1978: 49). A special case of language contact is visual contact with written forms, which gives rise to ‘eye loans’, in contrast to ‘ear loans’ (Pratt 1980: 16). In Romance languages, spelling pronunciations (also called the Buben effect; cf. Buben 1935) have had a particular impact since re-latinized spellings often reflect Latin etyma. The differences between French and Spanish are probably due to language policies with respect to orthography: re-latinization in France versus emancipation from Latin in Spain.

A third reason for the adoption of new complex onsets is their expressivity (cf. Pustka 2015: 83). The negative prestige of German in post-war France made German-sounding phonotactic patterns particularly conspicuous and therefore also expressive (cf. Frei 1929: 283). This may have motivated the adoption of loans such as Fr. ch(e)lague ‘stroke’ (< Germ. Schlag) with a /ʃ/-consonant onset (cf. Frei 1929: 281), ersatz with a consonant-/s/ coda (cf. Frei 1929: 283) and schnaps with both (cf. Gauger 2012: 77). The case of Schtroumpf, which resembles German Strumpf ʻstockingʼ (cf. Section 1), fits this pattern perfectly.


I would like to thank Kristina Dziallas, Luise Jansen, Niki Ritt & his editorial team, as well as two anonymous reviewers for a critical reading of this paper. All remaining errors are mine.


  • 1

    French has additionally borrowed smurf (1983) for a type of breakdance with the derivations smurfer and smurfeur (cf. Petit Robert). 

  • 2

    Indicatively, German did not maintain the original name, but replaced it with Schlumpf, thereby avoiding homophony with German Strumpf ‘stocking’, which does not sound funny to German ears at all. For the names of the smurfs in different languages, see the respective articles on Wikipedia and the map on https://www.reddit.com/r/etymologymaps/comments/7g56ox/the_smurfs_in_european_languages_xpost_from (accessed February 8, 2019). 

  • 3

    Although the variant iscuola is also attested (cf. Section 2.2.2). 

  • 4

    It has been suggested that the deletion of /s/ in Old French has caused the disappearance of prosthesis (cf. Sampson 2010: 123). 

  • 5

    In contrast to English, which counts as a stress-timed language with long consonant sequences. Note that the dichotomy between syllable-timed languages and stress-timed ones is controversial (see e.g. Bertrán 1999; Nespor et al. 2011), but works well enough for the present purposes. 

  • 6

    Already attested in Latin, e.g. splendor (cf. Sampson 2010: 42). 

  • 7

    Our reference point is the following quite detailed sonority scale (where “>” stands for ʻhas more sonority thanʼ): low vowels > mid vowels > high vowels > r-sounds > laterals > nasals > voiced fricatives > voiceless fricatives > voiced stops > voiceless stops (cf. Restle and Vennemann 2001: 1312). 

  • 8

    Additionally, phonological accounts of sibilant-stop onsets have been proposed in which they do not figure as complex onsets at all. Instead, the initial sibilants are interpreted as syllabic or as extrametrical, as in s.cuo.la or (s)cuo.la. An early example is Rheinfelder’s (1937) grammar of Old French. Rheinfelder proposes that the Vulgar Latin ‘s impurum’ (i.e. word initial /s/ before consonants) represented a syllable in its own right, no matter whether there was a prosthetic vowel (as in VLat. is.co.la) or not (as in VLat. s.co.la). However, Rheinfelder provides no empirical evidence for this claim (cf. 1963: 182), and for Sampson (2010: 68), this modelling is at least “questionable”. Instead of regarding the ‘s impurum’ as a syllable of its own, Sampson (2010: 46) considers it as an “extrasyllabic” element or “prependix”, i.e. a sonority peak outside the core syllable. The concept goes back to Sievers (1901), who coined the term ʻside syllableʼ (German Nebensilbe; cf. Restle and Vennemann 2001: 1330). In support of such an analysis, Cser (2012: 365) points to the fact that reduplicated forms do not double the initial /s/, e.g. Lat. stare ~ steti ‘stand’ (and not *stesti). Still, unambiguous empirical evidence for the extrametricality of initial /s/ before consonants is hard to come by; for arguments against it, see Hallʼs (2002) manifesto “Against Extrasyllabic Consonants in German and English”. 

  • 9

    In contrast, epenthesis is only attested in a single case, namely in CLat. spiritus versus VLat. sipiritus (which exists alongside the usual espiritus and might be a spelling mistake, cf. Sampson 2010: 71). 

  • 10

    In Italian, initial pre-consonantal /s/ may have been articulated with more intensity than in French, so that French speakers perceived it as a syllable on its own and added a prosthetic vowel when they emulated it (cf. Rheinfelder 1963: 182–185). 

  • 11

    In French, spasme with a sibilant-stop onset was reintroduced as a learned loan from Latin in the thirteenth century (cf. supra). 

  • 12

    In Spanish, Portuguese, and Italian, elisions affect not onsets but codas. Thus, we find Sp. [ek.ˈsak.to] / [eɣ.ˈsak.to] / [e.ˈsa.to] <exacto> (cf. Heinz 2010: 131–132), Pt. exato EP [i.ˈza.tu] (cf. Portal da Língua Portuguesa), and It. esatto [eˈzatto] (cf. PONS). In French, elisions occur optionally in internal position between consonants, where they affect sounds such as the /k/ in exprès /ɛkspʁɛ/ > [ɛspʁɛ] (cf. Gadet 1992: 41). In French-based creoles, such realizations have been lexicalized, as in GuaCr. èspré [ɛspʁɛ] from Fr. exprès. 

  • 13

    ʻThe highlands eat the vowels; the lowlands eat the consonants.’ 

  • 14

    Interestingly, the distribution of simple vs. complex onsets mirrors the one that obtains among Spanish varieties. This might be explained by semi-creolization in Brazil versus contact with indigenous languages in the rest of the continent. 

  • 15

    No dictionary represents the mental lexicon of any specific speaker. Thus, the results of lexicographical studies like the present one would profit from being complemented with data from frequency dictionaries or corpora, and with perception and production studies with speakers from different social backgrounds. 

  • 16

    The Petit Robert lists 5,143 words beginning with /s/, 191 with /z/, 903 with /ʃ/ and 933 with /ʒ/. This means that about 12% of the 7,230 words beginning with sibilants begin with clusters. 

  • 17

    Similar evidence comes from /ʃ/-liquid and /ʃ/-nasal onsets, which occur in words such as shrapnel(l) (deriving ultimately from a proper name) and pseudo-Germanisms such as chnoque/schnock ʻimbécile’ (1872, “origine inconnue, p.-ê. de la chanson alsacienne Hans im Schnokeloch” [Petit Robert: s.v.]) or schmilblick (1949, “création de Pierre Dac (…) mot dʼapparence alsacienne, choisi sans doute pour sa graphie compliquée, diffusé par un jeu télévisé” [Petit Robert: s.v.]). 

  • 18

    The search was conducted on www.google.fr with the adjustment ‘French language’ (accessed April 22, 2018). 

  • 19

    The definite article was included in order to find only French occurrences. 

  • 20

    Both sub-corpora are part of a larger project with 100 speakers in total, which also investigates the effects of migration, not only from Southern France to Paris, but also from the overseas department of Guadeloupe (cf. Pustka 2007). 

  • 21

    In the first syllables of polysyllabic words, schwas are elided in 48% of all cases by Parisians versus only 11% by Aveyronnais speakers (cf. Pustka 2007: 155, 162). 

  • 22

    These data can be modeled as lexical borrowings originating from contact with Parisian French, which first led to suppletive exemplars with and without schwa (and consequently to complex consonant clusters, e.g. [ʃkʁ] in je crois) and subsequently to the emergence of a schwa elision pattern (cf. Bybee 2001). 

