BY 4.0 license Open Access Published by De Gruyter Mouton August 23, 2021

A typology of consonant-inventory gaps

Dmitry Nikolaev
From the journal Linguistic Typology


This article provides a new precise algorithmic definition of the notion “phonological-inventory gap”. On the basis of this definition, I propose a method for identifying gaps, provide descriptive data on several types of consonant-inventory gaps in the world’s languages, and investigate the relationships between gaps and inventory size, processes of sound change, and phonological segment borrowing.

1 Introduction

When discussing processes of change in segment inventories, phonologists often invoke the notion of inventory gaps, a notion which is used to describe a situation when certain segments are expected to be present in an inventory but are not there.[1] This notion can be interpreted in two basic ways.

On one hand, there is a set of consonants that are found in most languages of the world, and it is notable when those are absent. This understanding is reflected in Maddieson (2013a), which deals with the absence of cross-linguistically frequent sounds in phonological inventories. For example, voiced pharyngeal fricatives are cross-linguistically rare, and most linguists would not consider their absence to be remarkable. However, a language missing a voiced bilabial nasal /m/ is unusual enough to be noted in descriptions of languages like Wichita (wich1260; Garvin 1950).

On the other hand, an inventory is often considered to contain a gap when it contains several segments whose specifications in terms of voice-onset time (vot), manner, place, and other features can be recombined to create another segment, and this constructed segment is lacking. Thus, it would not be particularly surprising to find a language without a voiceless labio-dental fricative /f/.[2] However, if this language also has bilabial stops /p b/ and a voiced labio-dental fricative /v/, the absence of /f/ becomes unusual: labio-dental fricatives usually co-occur with bilabial stops, and given that the language has both voiceless and voiced bilabial stops /p b/ and the voiced labio-dental fricative /v/, one would expect it to have the voiceless labio-dental fricative /f/ as well. Similarly, it cannot be said that /b/ constitutes a gap in an inventory without voiced stops, but it is a gap in a language with /p t k d ɡ/.[3]

The hypothesis that languages tend to lack gaps of the second type gave rise to Hockett’s (1958: 109) “Principle of the Neatness of Pattern” in phonemic analysis, as well as later theories such as the Principle of Feature Economy (Clements 2003) and Geometric-Constraints theory (Dunbar and Dupoux 2016). In a diachronic setting, this hypothesis was used to explain tendencies in sound change processes (Martinet 1952) and phonological borrowing (Maddieson 1985).

Despite such widespread use of the notion of inventory gaps, no comprehensive statistical data on the distribution of gaps have been published. The aims of this paper are (i) to provide a new algorithmic definition of this notion, (ii) to present typological findings, and (iii) to highlight possible connections between inventory gaps, sound change processes, and consonant borrowing.

The paper is organised as follows. Section 2 surveys related work on the subject. Section 3 provides definitions of several classes of inventory gaps and an algorithm for extracting them from datasets; it also describes the dataset used for this study. Section 4 describes within-manner gaps, i.e. gaps in stop, fricative, and affricate inventories established based on vot and place distinctions. Section 5 describes between-manner gaps, i.e. gaps in stop, fricative, and affricate inventories established based on manner and place distinctions. Section 6 briefly investigates the connection between the number of gaps in an inventory and its size. Section 7 gives an overview of the distribution of gaps of different types across macro areas. Section 8 discusses the connections between inventory gaps and sound change, while Section 9 discusses the role of “gap filling” in segment borrowing. Section 10 concludes the paper and discusses possible avenues for future research.

2 Related work

The problem of how to explain gaps and asymmetries in segmental inventories has attracted researchers for a long time. An important treatment of this subject was given by Gamkrelidze (1973, 1975), who proposed a markedness theory of phonological gaps: he observed that voiced stops become progressively more marked the further back they are pronounced (with /ɡ/ being a frequent gap), while voiceless stops display the reverse tendency (with /p/ being a frequent gap). A similar tendency was observed by Sherman (1975). Based on the analysis of voiceless and voiced stops in a sample of 571 languages, he found that /p/ was lacking 34 times, /t/ and /k/ zero times, and /b d ɡ/, 2, 21, and 40 times respectively. This directional interpretation, although given more weight by laboratory-phonology research (Ohala 1983), was later questioned by Maddieson (2013b), who noted that the absence of /p/ in particular seems to be an areal phenomenon characteristic of a group of languages concentrated around the Sahara desert.

Another line of work concentrated on why languages are expected to lack gaps and whether this is indeed the case. Thus, Lindblom and Maddieson (1988, 70) noted that “small paradigms tend to exhibit “unmarked” phonetics whereas large systems have “marked” phonetics”, and Clements (2003, 287), in his influential paper on the feature-economy principle, hypothesised that “languages tend to maximise the ratio of sounds over features”. This thesis was upheld as a statistical universal in several later publications (Coupé et al. 2009; Dunbar and Dupoux 2016; Marsico et al. 2004), but it also came under criticism because it fails to adequately predict actual inventory structures (Nikolaev and Grossman 2020). From the point of view of the analysis of gaps per se, these approaches give only an aggregate view of the prevalence of gaps in inventories, and some of the phenomena that depress economy/symmetry indices (e.g., the fact that a language has a single lateral segment) do not correspond to gaps as they are customarily thought of.

An attempt to decide which of the two theories, the markedness approach or the feature-economy approach, better explains the gaps found in the inventories in the PHOIBLE dataset (Moran and McCloy 2019) has been recently made by Wang, who defines a gap as “the absence of an [α voice] stop/fricative in a certain place of articulation when a [−α voice] counterpart exists in the inventory” (Wang 2019: 195).[4] Wang fitted several classification models that, given two versions of an inventory (one with the gap and one where the gapped consonant replaced the actually present “foil” consonant, e.g. with /k/ replaced with the missing /ɡ/), decided which was the real one. Markedness-based models performed better than feature-economy-based ones. This outcome is not surprising given that the markedness theory was explicitly formulated in order to explain frequent gaps, while the feature-economy-based theories regard gaps as deviations from the predicted inventory shapes, which should be explained based on articulatory and auditory constraints (Clements 2003: 288–289). The latter constraints, however, were not modelled explicitly by Wang, nor was there a systematic survey of identified gaps.
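For comparison with the definitions introduced in Section 3, Wang's definition, read literally, can be sketched as follows. This is only an illustration under stated assumptions: segments are represented as hypothetical `(place, manner, voiced)` triples, and the function name `wang_gaps` is invented here; it is not taken from Wang (2019).

```python
def wang_gaps(inventory):
    """Return segments missing under a literal reading of Wang's definition.

    inventory: set of (place, manner, voiced) triples (hypothetical encoding).
    A stop or fricative is reported as a gap if it is absent while the
    segment with the opposite voicing value at the same place is present.
    """
    gaps = set()
    for place, manner, voiced in inventory:
        if manner not in ("stop", "fricative"):
            continue
        counterpart = (place, manner, not voiced)
        if counterpart not in inventory:
            gaps.add(counterpart)
    return gaps


inventory = {
    ("bilabial", "stop", False),   # /p/
    ("alveolar", "stop", False),   # /t/
    ("alveolar", "stop", True),    # /d/
}
print(wang_gaps(inventory))  # the voiced bilabial stop /b/
```

Under this reading, an inventory with only voiceless stops would report every voiced stop as a gap, which is one way in which it differs from the triplet-based definition proposed in this paper.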

3 Defining and identifying gaps

This section provides new formalised definitions of two kinds of inventory gaps and proposes algorithms for their identification in datasets (Section 3.1). The results reported in the following sections were obtained using an IPA-feature-parser script and a fully parsed subset of PHOIBLE. The script and the dataset are described in Sections 3.2 and 3.3 respectively.

3.1 Within-manner and between-manner gaps

Enumeration of feature combinations made “available” by the make-up of an inventory but not realised in practice is a computationally intensive combinatorial task because additional articulations can be nearly arbitrarily combined. In order to make the process more manageable, it is necessary to restrict the combinations of interest by selecting features that can vary and their possible values. In this study, I begin with the simplest possible feature combinations that can produce typologically interesting gaps. I consider place of articulation (place), manner of articulation (manner), and voice-onset time (vot). Furthermore, I leave place of articulation unrestricted but only consider voiceless and voiced stops, fricatives, and affricates.

In order to define and identify gaps I introduce the notion of gap-generating triplets:

A gap-generating triplet (GGT) is a triplet of consonants, such that two pairs of consonants from it are featural minimal pairs with respect to two different features and the members of the third pair differ in the values of both these features.

For example, /b d z/ is a gap-generating triplet (GGT) because /b d/ differ only in their place of articulation, /d z/ differ only in their manner of articulation, and /b z/ differ in both place and manner of articulation. Similarly, /p t d/ is another GGT, because /p/ and /t/ differ only in their place of articulation, /t/ and /d/ only in terms of vot, and /p/ and /d/ in place and vot.
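The definition can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Segment` record with the three features considered here plus a set of `extras` (secondary articulations, length, etc.); the names `Segment`, `differing`, and `is_ggt` are illustrative and do not come from the script used in this study.

```python
from itertools import combinations
from typing import NamedTuple


class Segment(NamedTuple):
    place: str
    manner: str
    vot: str
    extras: frozenset = frozenset()  # e.g. {"palatalised", "long"}


CORE = ("place", "manner", "vot")


def differing(a, b):
    """Core features on which a and b differ; None if their extras disagree."""
    if a.extras != b.extras:
        return None
    return frozenset(f for f in CORE if getattr(a, f) != getattr(b, f))


def is_ggt(x, y, z):
    """True if two pairs are minimal pairs on different features and the
    third pair differs in exactly those two features."""
    diffs = [differing(a, b) for a, b in combinations((x, y, z), 2)]
    if any(d is None for d in diffs):
        return False
    singles = [d for d in diffs if len(d) == 1]
    doubles = [d for d in diffs if len(d) == 2]
    if len(singles) != 2 or len(doubles) != 1:
        return False
    return singles[0] != singles[1] and doubles[0] == singles[0] | singles[1]


b = Segment("bilabial", "stop", "voiced")
d = Segment("alveolar", "stop", "voiced")
z = Segment("alveolar", "fricative", "voiced")
print(is_ggt(b, d, z))  # True
```

A triplet such as /p t k/ is rejected because all three pairs differ in place only.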

In order to construct or identify GGTs for this study, it is necessary to fix the value of one of the three selected features. When one fixes the manner of articulation, one obtains within-manner GGTs; when one fixes the vot, one obtains between-manner GGTs. For example, /b d z/ is a between-manner GGT on voiced stops and fricatives, and /p t d/ is a within-manner GGT on stops. All other values of all other features must agree for the GGT to be well formed. Thus, /bʲ dʲ zʲ/, /bʷ dʷ zʷ/, and /bʲː dʲː zʲː/ are GGTs, but /bʲː dʲ zʲː/ and /b dʷ zʷ/ are not, because not all segments agree in length in the first, or in labialisation in the second.
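As an illustration of this classification, the sketch below labels a triplet by the feature it keeps fixed. It assumes the same hypothetical `Segment` record as above (redefined here so that the example is self-contained) and assumes the triplet has already been verified to be a GGT; the function name `ggt_type` is invented for this example.

```python
from typing import NamedTuple, Optional


class Segment(NamedTuple):
    place: str
    manner: str
    vot: str
    extras: frozenset = frozenset()  # secondary articulations, length, etc.


def ggt_type(triplet) -> Optional[str]:
    """Classify a GGT by its fixed feature; None if it is not of either kind."""
    if len({s.extras for s in triplet}) != 1:
        return None  # e.g. /b dʷ zʷ/: labialisation does not agree
    if len({s.manner for s in triplet}) == 1:
        return "within-manner"   # place and vot vary
    if len({s.vot for s in triplet}) == 1:
        return "between-manner"  # place and manner vary
    return None  # place is fixed instead; not considered in this study


p = Segment("bilabial", "stop", "voiceless")
t = Segment("alveolar", "stop", "voiceless")
d = Segment("alveolar", "stop", "voiced")
print(ggt_type((p, t, d)))  # within-manner
```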

GGTs are the main tool for identifying gaps: any such triplet implies the potential existence of the fourth consonant that will “close the square” by making all consonants in the resulting quadruple participate in two featural minimal pairs each.[5] In order to check if a GGT is indeed associated with a gap, I test all other segments in the inventory, and if there is no segment that can be associated with a GGT (“fill the GGT”), a gap has been found. The same gap can of course be associated with several GGTs, so it is necessary to take care to avoid overcounting.

It is possible to manually associate every possible triplet with the corresponding gap-filling consonant, but this is very tedious even with the low number of feature values included in the analysis. Instead, I adopted a three-stage procedure:

  1. I enumerated all GGTs for all languages in the sample not filled by segments from the same inventory.

  2. For all GGTs, I checked if they can be associated with a segment from any other language in the sample. A small minority of GGTs demanded relatively exotic segments missing from all inventories, such as /ʃˤː/. These GGTs were filled by constructing segments by hand. As these gaps are very rare, they eventually did not figure in the analysis.[6]

  3. Lists of GGTs for all languages were replaced with sets of segments filling these GGTs.

This algorithm corresponds to the following definition of a gap:

A consonant inventory has a gap if there is a segment that can be associated with at least one GGT in this inventory but is not present in it.

According to this definition, for example, the GGT /p t d/ produces a gap if the segment /b/ is not found in the same inventory.
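The definition can be sketched in code as follows. Note that this is a simplification and not the three-stage procedure described above: instead of looking up fillers in other inventories, it constructs the missing "fourth corner" directly from the features of the triplet. The `Segment` record and the function names are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    place: str
    manner: str
    vot: str
    extras: frozenset = frozenset()


def differs_only_in(a, b, feature):
    """True if b is a with a different value of `feature` and nothing else changed."""
    return a != b and a._replace(**{feature: getattr(b, feature)}) == b


def find_gaps(inventory, f1, f2):
    """Segments that would close the square for some GGT varying in f1 and f2.

    Returning a set means that a gap implied by several GGTs is counted once.
    """
    inv = set(inventory)
    gaps = set()
    for a in inv:  # a is the member of the triplet that is in both minimal pairs
        for b in inv:
            if not differs_only_in(a, b, f1):
                continue
            for c in inv:
                if not differs_only_in(a, c, f2):
                    continue
                fourth = a._replace(**{f1: getattr(b, f1), f2: getattr(c, f2)})
                if fourth not in inv:
                    gaps.add(fourth)
    return gaps


stops = [
    Segment("bilabial", "stop", "voiceless"),  # /p/
    Segment("alveolar", "stop", "voiceless"),  # /t/
    Segment("velar", "stop", "voiceless"),     # /k/
    Segment("alveolar", "stop", "voiced"),     # /d/
    Segment("velar", "stop", "voiced"),        # /ɡ/
]
print(find_gaps(stops, "place", "vot"))  # only the voiced bilabial stop /b/
```

Calling `find_gaps(inventory, "place", "vot")` corresponds to within-manner gaps and `find_gaps(inventory, "place", "manner")` to between-manner gaps, since the remaining features are left untouched when the fourth segment is built.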

I also distinguish between two types of between-manner gaps, direct and inverse gaps. Given the tendency for stops to be found at more places of articulation than fricatives in any given language, and for fricatives to be found at more places of articulation than affricates,[7] one would expect to find an asymmetry between stops lacking corresponding fricatives and fricatives lacking corresponding stops, and a similar asymmetry for fricatives and affricates. I call the gaps where the missing segment belongs to a less typologically diverse class direct gaps and the gaps where it belongs to a more diverse class inverse gaps.
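The distinction can be expressed as a comparison of ranks. In the sketch below, the numeric ranks in `DIVERSITY` are an assumption that merely encodes the ordering stated above (stops, then fricatives, then affricates); the function name is hypothetical.

```python
# Assumed ordering: a higher rank means the class is typically found
# at more places of articulation.
DIVERSITY = {"stop": 3, "fricative": 2, "affricate": 1}


def between_manner_gap_kind(missing_manner: str, present_manner: str) -> str:
    """Classify a between-manner gap.

    missing_manner: manner of the segment that is absent.
    present_manner: manner of the segment found at the same place.
    """
    if DIVERSITY[missing_manner] < DIVERSITY[present_manner]:
        return "direct"
    if DIVERSITY[missing_manner] > DIVERSITY[present_manner]:
        return "inverse"
    raise ValueError("a between-manner gap involves two different manners")


print(between_manner_gap_kind("fricative", "stop"))  # direct
print(between_manner_gap_kind("stop", "fricative"))  # inverse
```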

3.2 Feature parsing

In order to identify GGTs and check if they can be filled, it is necessary to have feature specifications of all segments in an inventory. The largest available segmental dataset, PHOIBLE (Moran and McCloy 2019), provides feature specifications for phonemes. However, the feature system used there, based on the one proposed by Hayes (2008), is not well suited for capturing GGTs. For example, it does not have place and manner features but instead provides binary features, such as coronal and delayed release. For the present purposes, it is more convenient to use the feature system underlying the tables illustrating the International Phonetic Alphabet and most often utilised in phonological descriptions of inventories, where place, manner, and vot are nominal features with values such as labiodental, stop, or voiceless. However, in principle, other feature sets could be used.

For this study, the IPA Parser script powering the feature-search feature in the EURPhon database (Nikolaev 2018) and published together with it[8] was used. It parses segments in the IPA notation into a structured format,[9] which makes it easy to extract GGTs. The script can parse all consonants in PHOIBLE except for clicks and several segments written in non-standard notation. These segments were excluded from the analysis.
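To give an idea of what such a structured format may look like, here is a toy sketch. It is not the IPA Parser script: the lookup tables, the output dictionary, and the function name `parse_segment` are all assumptions made for illustration, and only a handful of base symbols and modifiers are covered.

```python
# Toy lookup tables (assumed for illustration; far from complete).
BASES = {
    "p": ("bilabial", "stop", "voiceless"),
    "b": ("bilabial", "stop", "voiced"),
    "t": ("alveolar", "stop", "voiceless"),
    "d": ("alveolar", "stop", "voiced"),
    "z": ("alveolar", "fricative", "voiced"),
}
MODIFIERS = {"ʲ": "palatalised", "ʷ": "labialised", "ː": "long"}


def parse_segment(ipa: str):
    """Return a feature dictionary, or None if the notation is not covered."""
    if not ipa or ipa[0] not in BASES:
        return None
    extras = set()
    for ch in ipa[1:]:
        if ch not in MODIFIERS:
            return None
        extras.add(MODIFIERS[ch])
    place, manner, vot = BASES[ipa[0]]
    return {"place": place, "manner": manner, "vot": vot,
            "extras": frozenset(extras)}


print(parse_segment("bʲː"))
print(parse_segment("ǃ"))  # None: clicks are outside this toy table
```

Returning `None` for unparsable input mirrors the exclusion of segments that cannot be parsed.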

3.3 The dataset

The latest version of PHOIBLE (Moran and McCloy 2019) was used for the analysis. PHOIBLE often includes several descriptions (doculects) of the same language. Each doculect has its own unique inventory ID. For each language (identified by its glottocode[10]), the highest inventory ID was taken. This is an arbitrary choice, but it is assumed that higher IDs correspond to more recent descriptions, which, in my experience, tend to be more phonetically accurate. Only consonant inventories where all segments can be parsed with IPA Parser were selected. The resulting sample consists of 1,694 languages (out of 2,186 different languages). A breakdown in terms of macro-areas and language families is given in Tables 1 and 2 respectively.
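The selection procedure can be sketched as follows. The input is assumed to be a list of `(glottocode, inventory_id, segment)` tuples already restricted to consonants; this layout, the function name, and the `can_parse` callback are hypothetical and stand in for the actual data files and parser.

```python
def select_inventories(rows, can_parse):
    """Keep, for each glottocode, the inventory with the highest ID,
    and drop inventories containing any unparsable segment.

    rows: list of (glottocode, inventory_id, segment) tuples.
    can_parse: function returning True if a segment can be parsed.
    """
    latest = {}
    for glottocode, inv_id, _ in rows:
        latest[glottocode] = max(inv_id, latest.get(glottocode, inv_id))

    inventories = {}
    for glottocode, inv_id, segment in rows:
        if inv_id == latest[glottocode]:
            inventories.setdefault(glottocode, []).append(segment)

    return {g: segs for g, segs in inventories.items()
            if all(can_parse(s) for s in segs)}


rows = [
    ("aaaa1234", 1, "p"),
    ("aaaa1234", 5, "b"),
    ("aaaa1234", 5, "t"),
    ("bbbb1234", 2, "ǃ"),
    ("bbbb1234", 2, "k"),
]
print(select_inventories(rows, lambda s: s != "ǃ"))
# {'aaaa1234': ['b', 't']}
```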

Table 1:

Macro-area breakdown of the sample.

Macro-area Count
Africa 683
Australia 15
Eurasia 412
North America 102
Papunesia 157
South America 323
No data 2

Table 2:

Family breakdown of the sample.

Family Count Family Count Family Count
Abkhaz-Adyge 1 Hibito-Cholon 1 Pano-Tacanan 25
Afro-Asiatic 99 Hmong-Mien 1 Peba-Yagua 2
Ainu 1 Huavean 1 Pidgin 1
Algic 4 Huitotoan 4 Pomoan 1
Angan 1 Ijoid 6 Puri-Coroado 1
Araucanian 1 Indo-European 145 Quechuan 22
Arawakan 33 Iroquoian 3 Sahaptian 1
Arawan 5 Japonic 3 Saharan 4
Athabaskan-Eyak-Tlingit 6 Jarawa-Onge 1 Salishan 7
Atlantic-Congo 413 Jodi-Saliban 3 Sepik 5
Austroasiatic 34 Kadugli-Krongo 7 Sino-Tibetan 91
Austronesian 99 Kakua-Nukak 2 Siouan 2
Aymaran 2 Kartvelian 4 Sko 3
Baining 1 Kawesqar 1 Songhay 4
Barbacoan 4 Khoe-Kwadi 3 South Bird’s head family 1
Basque 1 Kiowa-Tanoan 3 South Omotic 3
Blue Nile Mao 1 Koiarian 1 Southern Daly 1
Bookkeeping 4 Kolopom 1 Surmic 5
Boran 3 Koman 4 Ta-Ne-Omotic 8
Border 2 Konda-Yahadian 1 Tai-Kadai 15
Bororoan 2 Koreanic 1 Tamaic 1
Cahuapanan 2 Kresh-Aja 2 Teberan 1
Cariban 24 Kuliak 2 Temeinic 2
Central Sudanic 23 Kxa 1 Ticuna-Yuri 1
Chapacuran 1 Maiduan 1 Timor-Alor-Pantar 1
Chibchan 8 Mande 41 Totonacan 3
Chicham 4 Matacoan 4 Tsimshian 2
Chimakuan 1 Mayan 11 Tucanoan 19
Chocoan 6 Misumalpan 1 Tungusic 3
Chonan 2 Miwok-Costanoan 2 Tupian 50
Chukotko-Kamchatkan 3 Mixe-Zoque 6 Turkic 20
Chumashan 1 Mixed language 2 Unclassifiable 1
Cochimi-Yuman 2 Mongolic 10 Uralic 31
Dajuic 1 Muskogean 3 Uru-Chipaya 2
Dogon 2 Nadahup 4 Uto-Aztecan 13
Dravidian 33 Nakh-Daghestanian 10 Wakashan 2
East Strickland 1 Nambiquaran 4 Western Daly 1
Eastern Jebel 1 Narrow Talodi 4 Wintuan 1
Eskimo-Aleut 2 Ndu 3 Yanomamic 5
Furan 1 Nilotic 25 Yareban 1
Great Andamanese 1 North Halmahera 1 Yawa-Saweru 1
Guahiboan 4 Nuclear Torricelli 6 Yeniseian 1
Guaicuruan 5 Nuclear Trans New Guinea 25 Yukaghir 2
Gumuz 1 Nuclear-Macro-Je 20 Yuki-Wappo 1
Gunwinyguan 1 Nyimang 1 Zamucoan 2
Harakmbut 1 Otomanguean 10 Zaparoan 4
Heibanic 6 Pama-Nyungan 12 Isolate 60

As might be expected, a “total” sample like the one provided by PHOIBLE is dominated by several large language families. In particular, Atlantic-Congo and Indo-European are well-represented, while Austronesian languages are underrepresented, and many phyla are represented by a single language. When discussing different types of gaps below, I address whether there are noticeable connections between gap types and phyla and/or macro-areas.

Another pitfall inevitable in a study of this kind is the reliance on aggregated and normalised data. Many important distinctions and commonalities between segments across inventories are almost surely hidden or distorted due to notation conventions employed by different scholars (e.g., aspirated segments are often recorded as plain voiceless ones if they are contrasted with voiced stops, and stops with a near zero VOT can be treated as voiced, devoiced, or voiceless, etc.). This analysis, like all large-scale work on phonological typology, rests on the somewhat optimistic assumption that the noise in the data will increase the variance of the results but will not bias them.

4 Within-manner gaps

I begin with within-manner gaps, which are operationalised as cases when there are GGTs unfilled with respect to vot and place, with manner fixed to stop, fricative, or affricate. In other words, a within-manner gap is a situation when a stop, fricative, or affricate is missing from an inventory that has the same stop/fricative/affricate but with different voicing and a pair of voiced and voiceless stops/fricatives/affricates at some other place of articulation.

4.1 Stops

Statistics for within-manner stop gaps occurring 10 or more times in the dataset are given in Table 3. Note that the first and third positions in the ranking are occupied by the velar and uvular voiced stops. This is unsurprising, as it has been hypothesised that the oral cavities associated with their articulation are too small to sustain prolonged vocal-fold vibration (cf. a discussion in Ohala 1983, 195–196). Moreover, it seems that gaps are overall much more likely to be found among the voiced part of the stop inventory.

Table 3:

Within-manner stop gaps.

Gap ɡ p ɢ d b ɟ c ʈ k
Count 163 116 81 72 57 40 39 31 20

Gap ɖ ɖʰ ŋ̊ ɲ̊
Count 19 18 17 15 15 14 12 12 11

It may be pointed out that the mildly frequent gapping of /d/ and /b/ contradicts the feature-economy/symmetry theories (Clements 2003; Dunbar and Dupoux 2016): there are no evident articulatory or auditory reasons for these segments to be absent, which depresses the feature-utilisation ratio in respective inventories.[11] The fact that /d/ is gapped more often than /b/ contradicts the markedness-cline interpretation of Gamkrelidze (1975).

The voiceless bilabial stop /p/ is also known to be a frequent gap (Maddieson 2013a; see also Ohala [1983: 195] and the discussion in Section 8). A strong areal tendency is evident: /p/ is quite often lacking in Atlantic-Congo (N=45) and Afro-Asiatic (N=28), which accounts for its being a gap in 12.4% of African languages, while this gap is comparatively very rare in Eurasia (1.2%) and South America (2.8%). Gapped /p/ is also rather frequent in Papunesia (10.8%), but the sample size there is smaller. Out of 102 North American and 335 Australian languages in the sample, none have /p/ as a gap.

By contrast, the /b/ gap is distributed much more evenly (Eurasia: 1.55%, Papunesia: 3.16%, Africa: 3.34%, North America: 2.76%, South America: 4.8%, Australia: 0%) and /g/ is often gapped in South America (18.32%, cf. Eurasia: 6.87%, Papunesia: 5.79%, Africa: 6.82%, North America: 7.59%, Australia: 0%). A more comprehensive overview of which gaps predominate in which macro-areas is given below in Section 7. It is already evident that strongly “horizontal” inventories of Australia are very unlikely to develop gaps (Fletcher and Butcher 2014).

Returning to Table 3, it is notable that /ɟ/ and /c/ are nearly tied. The main reason for gapping here seems to be the tendency of these stops to become affricates.[12]

Other notable stop gaps include murmured voiced stops, the retroflex stop, labialised voiceless stops, and voiceless nasal stops.[13]

4.2 Fricatives

Statistics for within-manner fricative gaps occurring 10 or more times in the dataset are given in Table 4.[14]

Table 4:

Within-manner fricative gaps.

Phoneme ɦ ʒ z v ɣ x f ʁ ʝ
Count 535 162 156 117 112 98 31 30 30

Phoneme ʐ h θ χ ð ʑ ʕ ʃ
Count 26 23 20 17 14 13 11 10 10

The top spot is occupied by /ɦ/, which is a rather rare segment (59 occurrences in the sample), perhaps due to its easy confusability with /h/. It is mostly found in Africa and Southeast Asia.[15] The whole top five consists of voiced fricatives, which shows that vot skewedness in this class is even stronger than with stops.

The most frequent gap in the domain of voiceless fricatives is /x/, which is only slightly less frequent than /ɣ/. The situation here seems to be the mirror-image of that of palatal stops described above. A diachronic interpretation suggests itself: while palatal stops tend to become affricates, velar stops tend to lenite to fricatives (Kümmel 2007).

4.3 Affricates

Within-manner affricate gaps are infrequent: only /dz/ (N=55) is gapped more than 10 times, /ʈʂ/ 5 times (if one includes cases where it corresponds to /ɖɽ/), and /dʒ/ and /bv/ 4 times each. The case of /ɖɽ/∼/ʈʂ/ is special, and it is otherwise evident that it is nearly always voiced affricates that are missing.

/dz/ is the second most frequent voiced affricate in the sample (N=211), and the corresponding gap is created when the inventory has only the much more frequent /dʒ/ (N=539) together with some other affricate pair. The fact that the absolute gap frequency for /dz/ is comparable to the one for /d/ (72) indicates that languages are much more likely to have a gapped coronal affricate inventory than a gapped coronal stop inventory. Affricate inventories tend to be small compared to stop inventories (the median number of affricates in an inventory in the sample is two while the median number of stops is 12), so ceteris paribus the probability of affricate gapping should also be much lower.

5 Between-manner gaps

Between-manner gaps are operationalised as situations in which an inventory lacks a voiced or voiceless stop/fricative/affricate at some place of articulation where it has a segment of another manner with the same voicing, while it has a full pair at some other place of articulation. Such gaps stem from two different sources. On the one hand, there is a tendency (see fn. 7) for fricatives to be found at fewer places of articulation than stops; there is a similar disparity between affricates and fricatives. These tendencies are likely to be reflected in direct gaps in fricatives and affricates (i.e. when fricatives corresponding to stops and affricates corresponding to fricatives are missing).

On the other hand, there is a mismatch between the number of places of articulation available for stops and fricatives: labio-dental, interdental, and pharyngeal stops are not found in inventories and some postalveolar stops are extremely rare.[16] As a consequence, some inverse between-manner gaps for stops (i.e. stops missing corresponding fricatives) are trivial. However, inverse gaps for stops that usually do have corresponding fricatives are no less intriguing than within-manner stop gaps, because in “well-behaved inventories” striving to maximise the ratio of distinctive features over segments, one would not expect to see them at all. E.g., an inventory with /s/ lacking /t∼t̪/ or an inventory with /x/ lacking /k/ is highly unexpected.

A special case, at least in terms of frequency, is presented by /ʔ/, which, unlike the corresponding fricative, does not participate in a vot/voicing opposition and therefore cannot be a within-manner gap. It is an extremely frequent inverse between-manner gap, found in 541 languages: it is triggered whenever a language has /h/ but not the glottal stop and some other consonant with a corresponding stop. However, it must be taken into account that in many cases, e.g. in some varieties of German, /ʔ/ is present in a language as a “sub-phonemic” segment ensuring that there are no zero-onset syllables.

5.1 Stops and fricatives

5.1.1 Direct gaps

Some of the fricatives can only be within-manner gaps because they do not have a corresponding stop. Such fricative gaps appearing five or more times in the sample are shown in Table 5.

Table 5:

Within-manner fricative gaps without stops at the same place of articulation.

Phoneme ɦ ʒ θ ð ʑ ʕ ʃ ɦː ʒː ɲʒ ɦʷ
Count 535 162 20 14 13 11 10 8 8 6 6

Conversely, there are fricatives that are found only as between-manner gaps (Table 6), but these are rather rare.

Table 6:

Exclusively between-manner fricative gaps.

Phoneme ŋɣ ɲʝ mv çː ʂː
Count 53 9 17 26 11 8 13 6 5 6 5 5

Statistics for common (20+ occurrences) between-manner gaps that are also found as within-manner gaps are given in Table 7. An important result is that /x/, /f/, and /ɣ/ are found much more frequently as between-manner gaps than as within-manner gaps: 1,078 vs. 98, 558 vs. 31, and 461 vs. 112 respectively. This stands in stark contrast to /v/, which is found as a within-manner or a between-manner gap with a comparable frequency (124 vs. 117).

Table 7:

Between-manner fricative gaps also found as within-manner gaps.

Phoneme x f ɣ ç h ʝ v
Count 1,078 558 461 241 134 134 124

Phoneme ʂ s ʐ χ z
Count 97 83 70 69 65 39

Turning to less frequent gaps, a still significant disparity is found for /ʐ/ (70 vs. 26) and /χ/ (69 vs. 17). This can be contrasted with /s/, a mid-frequency between-manner gap, which is found as a within-manner gap only five times. It is notable also that /z/ is more often found as a within-manner gap (156) than as a between-manner gap (65).

Overall, within-manner gaps underline the huge disparity in the frequency of gaps in voiceless vs. voiced fricatives. Between-manner gaps, on the other hand, indicate that there are disadvantaged places of articulation: languages statistically tend to acquire voiced versions of coronal fricatives before they acquire voiceless labiodental or velar ones.

5.1.2 Inverse gaps

In contravention of the feature-economy principle, which demands that there are no unused possible combinations of vot and place features, inverse gaps occur with most stop types, with only special groups of stops found only as within-manner gaps (Table 8).

Table 8:

Within-manner stop gaps not found as inverse between-manner gaps.

Phoneme ɖʰ ŋ̊ ɲ̊ ɢʷ m̥p
Count 19 18 14 12 11 8 6 5

At the same time, one can see that inverse between-manner gaps that are not also within-manner gaps in the same languages are never really frequent (Table 9). Moreover, they fall into two distinct groups: /p/ and, to a smaller extent, /ɢ/ are often found both as within-manner and between-manner gaps in the same languages, while other stops are exclusively or predominantly found either as within-manner or between-manner gaps.

Table 9:

Major inverse between-manner stop gaps also found as within-manner gaps.

Phoneme Within-manner Between-manner Both
ʈ 30 71 1
c 38 56 1
q 0 47 2
ɢ 57 33 24
t 3 33 2
ɖ 15 25 0
ɟ 40 22 0
p 18 20 98
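The three columns of Table 9 can be read as a partition of the languages in which a given stop is gapped: within-manner only, between-manner only, and both. This reading is consistent with Table 3 (for /p/, 18 + 98 = 116). A minimal sketch of how such counts could be derived from two sets of language identifiers is given below; the function name and the toy glottocode-like strings are hypothetical.

```python
def overlap_counts(within_langs: set, between_langs: set) -> dict:
    """Split languages with a given gap into three disjoint groups."""
    return {
        "within-manner": len(within_langs - between_langs),
        "between-manner": len(between_langs - within_langs),
        "both": len(within_langs & between_langs),
    }


# Toy data: language identifiers are invented for the example.
within = {"lang1", "lang2", "lang3"}
between = {"lang3", "lang4"}
print(overlap_counts(within, between))
# {'within-manner': 2, 'between-manner': 1, 'both': 1}
```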

This points to the existence of rather profound differences in how subinventories are structured in different languages. As an illustration, let’s consider the case of the voiceless retroflex stop /ʈ/. It is a rather common between-manner gap in inventories with the retroflex fricative /ʂ/ but without retroflex stops. Such inventories are characteristic of, e.g., Uralic languages and Sino-Tibetan languages of the Circum-Tibetan area (cf. the Uralic language Erzya, erzy1239, whose stop + fricative inventory consists of /b p t t̪ d̪ k ɡ v f s̪ z̪ ʂ ʐ x/, and the Sino-Tibetan language Rgyalthang Tibetan[17] with /p pʰ b t tʰ d k kʰ g ʔ s z ʂ ʐ ɕ ʑ h/, data from EURPhon [Nikolaev 2018]). At the same time, it is a mildly common within-manner gap in Africa, where languages often have a single retroflex segment, /ɖ/. This distribution of gaps shows that the “retroflex component” of consonant inventories has at least three distinct variants: (1) a group of one or two fricatives; (2) a single stop; (3) a full-blown subsystem with stops, fricatives, and potentially affricates.[18]

5.2 Fricatives and affricates

5.2.1 Direct gaps

Affricates can potentially be formed at all pre-pharyngeal places of articulation utilised by fricatives, but in terms of frequency, languages favour coronal affricates. Table 10, which includes direct between-manner affricate gaps, demonstrates how unlikely languages with /f/ and /x/ are to have /pf/ and /kx/ (or /bv/ and /gɣ/ if they have any voiced affricates). The presence of the rather frequent affricate /ts/ in the top five is explained by the fact that nearly exactly half of languages with voiceless affricates (563 out of 1,122) have only one, and more than half (398) of the latter have /t̠ʃ/. The distribution is even more skewed in the domain of voiced affricates: out of 471 languages with a single voiced affricate, 338 have /d̠ʒ/.

Table 10:

Direct between-manner affricate gaps.

Phoneme pf ts kx bv dz ɡɣ ʈʂ
Count 402 225 208 165 96 88 83 46

Phoneme t̠ʃ t̪θ ɢʁ ɢʁ d̪ð ɟʝ
Count 41 40 31 26 20 14 11

5.2.2 Inverse gaps

Inverse between-manner fricative gaps found five or more times are shown in Table 11. The presence of both voiceless and voiced retroflex fricatives in the top three indicates that the connection between retroflex fricatives and affricates is not particularly strong. Overall, however, these gaps are clearly marginal.

Table 11:

Inverse between-manner fricative gaps.

Phoneme ʃ ʂ ʐ ɕ ʒ s z ç ʑ
Count 28 15 13 12 12 8 7 6 5

6 Gaps and inventory sizes

One question that arises when looking at data on gaps is whether there is a limiting ratio of gaps to the number of segments in an inventory that languages converge to. In other words, it is natural to assume from basic probability and combinatorics that large inventories will on average have more gaps than smaller ones, but how skewed can large inventories become?

Interestingly, it seems that there is a limit to the skewedness and that it is in fact small inventories that are the most skewed in relative terms. Figure 1A shows the dependence of the number of within-manner stop and fricative gaps (the contribution of affricate gaps being marginal) on the inventory size in absolute counts, and Figure 1B shows the relationship between the inventory size and the ratio of the gap count to the inventory size. While the absolute number of gaps grows in a linear fashion with the inventory size (the maximum value of 13 is attained in Tashlhiyt Berber, tach1250), the ratio of the number of gaps to the inventory size quickly stabilises to 0.08 (IQR = 0.07). In other words, large inventories are not more skewed than small ones, and the processes of inventory growth that are detrimental to feature economy (Nikolaev and Grossman 2020) are at least partly balanced out by different “gap-filling” processes.
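The ratio discussed here can be computed as in the following sketch. It assumes that each inventory is summarised as a hypothetical `(inventory_size, gap_count)` pair and that the central value is reported as a median alongside the interquartile range; the source does not state which statistic the value 0.08 corresponds to, and the loess smoothing used for Figure 1 is not reproduced.

```python
from statistics import median, quantiles


def gap_ratio_summary(inventories):
    """Median and interquartile range of gap-count / inventory-size ratios.

    inventories: list of (inventory_size, gap_count) pairs (at least two).
    """
    ratios = [gaps / size for size, gaps in inventories if size > 0]
    q1, _, q3 = quantiles(ratios, n=4)
    return {"median": median(ratios), "iqr": q3 - q1}


# Invented toy data, not taken from the sample.
toy = [(10, 1), (20, 2), (30, 3), (40, 4)]
print(gap_ratio_summary(toy))
```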

Figure 1: 
(A) Gap counts by inventory size. (B) Ratios of gap counts to inventory sizes by inventory size. Grey areas indicate 95% confidence intervals of a loess (local polynomial regression) approximation.

7 Inventory gaps and macro-areas

In this section, I provide statistics regarding the most frequent within-manner stop and fricative gaps in different macro-areas. As will be shown below in Section 9, these data are relevant for an understanding of segment borrowing processes.

7.1 Stops

The five most common within-manner stop gaps for all macro-areas (except Australia, which does not have them) are given in Table 12. This table does not include nasal stops and murmured (“voiced aspirated”) stops; otherwise the Eurasian top 10 would be as follows: /ɢ/: 39, /ɡ/: 31, /ɟ/: 21, /bʰ/: 17, /dʰ/: 17, /ɖʰ/: 14, /ŋ̊/: 9, /c/: 7, /b/: 7, /d/: 7. There is a high degree of uniformity, with /ɡ ɢ d b c p/ occupying nearly all the cells. As noted in Section 5.1.2, /ʈ/ is a frequent gap in Africa, where many inventories have only one retroflex stop, /ɖ/. In South Asia, by contrast, arrays of retroflex stops are usually complete.

Table 12:

Most common within-manner stop gaps in different macro-areas.

Africa Eurasia North America South America Papunesia
Phoneme Count Phoneme Count Phoneme Count Phoneme Count Phoneme Count
p 85 ɢ 39 ɡ 11 ɡ 61 ɡ 61
ɡ 49 ɡ 31 ɢ 8 d 24 d 24
d 26 ɟ 21 d 5 b 16 b 16
ʈ 25 c 7 k 4 ɢ 9 ɢ 9
c 24 b 7 b 4 p 9 p 9
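A per-area ranking of this kind can be tallied from a flat list of detected gaps. The sketch below assumes that each detected gap is available as a (macro-area, segment) pair; the function name `top_gaps_by_area` and the toy records are hypothetical and do not reproduce the paper's data.

```python
from collections import Counter, defaultdict

def top_gaps_by_area(records, n=5):
    """Tally gap segments per macro-area and return the n most frequent.

    `records` is an iterable of (macro_area, gap_segment) pairs, one pair
    per gap found in one inventory.
    """
    counts = defaultdict(Counter)
    for area, segment in records:
        counts[area][segment] += 1
    return {area: c.most_common(n) for area, c in counts.items()}

# Toy records (hypothetical).
records = [("Africa", "p"), ("Africa", "p"), ("Africa", "ɡ"), ("Eurasia", "ɢ")]
for area, ranking in top_gaps_by_area(records, n=2).items():
    print(area, ranking)
```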

7.2 Fricatives

The five most common within-manner fricative gaps for each of the macro-areas (except Australia, which, again, has few fricatives and hence few within-manner fricative gaps) are given in Table 13. An even higher degree of uniformity can be seen than with stop gaps, with /ɦ ʒ z v x ɣ f/ being the most frequent gaps across macro-areas. The prevalence of this set of fricative gaps constitutes a strong statistical universal in the sense advocated by Dryer (1989).

Table 13:

Most common within-manner fricative gaps in different macro-areas.

Africa Eurasia North America South America Papunesia
Segment Count Segment Count Segment Count Segment Count Segment Count
ɦ 302 ɦ 162 ɦ 21 ɦ 32 z 17
ʒ 97 ɣ 56 z 10 z 20 ɦ 15
z 77 v 35 ʒ 9 ʒ 15 ʒ 7
x 75 ʒ 33 f 4 ɣ 13 x 7
v 72 z 32 v 2 v 5 ɣ 5

8 Where do gaps come from? Inventory gaps and sound change

The relationship between inventory gaps and processes of sound change is rather complex. On one hand, gaps may arise due to sound changes (the loss of /p/ in Proto-Celtic being a classic case [McCone 1996]). On the other hand, it has often been argued that processes of sound change can be shaped by structural asymmetries in inventories, thereby performing a “gap-filling” function (Boersma 1998; Salmons and Honeybone 2014). Testing the latter hypothesis is outside the scope of this paper, as it entails looking for gaps not in contemporary but in historical and reconstructed inventories and checking whether they have since been filled. Testing whether there are connections between present-day gaps and sound changes is more straightforward, as one only needs access to a sample of sound change processes.

Considerable progress has been made in this area through the work of Kümmel (2007) and the participants of the UniDia project.[19] I compiled a combined dataset of 13,095 sound change processes from these sources, augmented with additional data on Sino-Tibetan languages. The sound changes in this dataset are of three types: (1) deletion of single segments; (2) emergence of single segments due to epenthesis; (3) single-segment to single-segment change (such as k > tʃ / _{i,j}). The sample is heavily skewed towards African and Eurasian languages, with some data from South America, so the results should be interpreted with caution.

Using these data, one can test if a particular segment is more likely to be a source of a sound change (and thus disappear from a particular type of context) or a reflex of a sound change; again, this gives only a rough approximation of the processes that create gaps, since in most cases sound changes affect only some of the contexts where a segment is found. I computed reflex/source odds for all segments encountered more than 10 times in the data.
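The reflex/source odds described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: sound changes are represented as (source, reflex) pairs, `None` stands for the empty side of a deletion or an epenthesis, and the function name `reflex_source_odds` and the toy data are hypothetical. A segment attested only as a reflex is assigned infinite odds here, which is one possible convention.

```python
from collections import Counter

def reflex_source_odds(changes, min_count=10):
    """Compute reflex/source odds per segment.

    `changes` is an iterable of (source, reflex) pairs. `None` as the reflex
    marks a deletion; `None` as the source marks an epenthesis.
    Segments encountered `min_count` times or fewer are skipped.
    """
    as_source, as_reflex = Counter(), Counter()
    for source, reflex in changes:
        if source is not None:
            as_source[source] += 1
        if reflex is not None:
            as_reflex[reflex] += 1
    odds = {}
    for seg in set(as_source) | set(as_reflex):
        total = as_source[seg] + as_reflex[seg]
        if total <= min_count:
            continue
        # Odds below 1: the segment is lost more often than it is created.
        odds[seg] = (as_reflex[seg] / as_source[seg]
                     if as_source[seg] else float("inf"))
    return odds

# Toy data (hypothetical): ɡ lenites or is deleted more often than it arises.
changes = [("ɡ", "ɣ")] * 8 + [("ɡ", None)] * 2 + [("k", "ɡ")] * 3
print(reflex_source_odds(changes))
```

Odds computed this way are only a rough proxy for segment loss, since a conditioned change removes a segment from some contexts and not from the inventory as a whole.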

The results are very different for stops and fricatives. Consonants corresponding to major within-manner stop gaps are indeed more likely to undergo a change than to be a reflex of a change: the odds for /ɡ p d b ɟ c k/[20] are all smaller than 0.5 (there are no data on /ɢ/). As for /ʈ/, it is only encountered as a reflex of sound changes, and it seems likely that African languages having only /ɖ/ did not lose /ʈ/ due to rare sound changes but never acquired it in the first place.

Among the top 10 fricative gaps,[21] however, only /ɦ/ and /x/ have odds smaller than 1, as they often lenite (/ɦ/ to nothing, and /x/ to /h/ or nothing). Others are much more likely to be the result of sound change than to be lost, with odds ranging from 1.5 for /ɣ/ to 9.7 for /v/: these segments are the usual products of lenition.

This discrepancy seems to indicate that gaps in stops and fricatives arise in two different ways. Whereas fricative gaps are overwhelmingly the product of inventory growth (inventories acquire some fricatives and not others), stop gaps are more likely to arise through the loss of segments in sound change.

9 Inventory gaps and segment borrowing

Segment borrowing is by its nature prone to at least have the appearance of being a gap-filling process: if a language lacks a typologically frequent consonant and borrows words from other languages, chances are high that these words will contain the missing segment. Therefore, it is very hard to establish a real bias in segment borrowing towards gap-filling segments as opposed to borrowing whatever is available to be borrowed. An attempt at estimating the relative importance of gap filling for segmental borrowing was made by Maddieson (1985). He noted that approximately half of the cases of segmental borrowing he surveyed (90 out of 184) can be considered examples of gap filling, but he did not control for the typological frequency of the borrowed segments. Conversely, if the segments that are often borrowed are not frequently found as gaps, this should indicate that the gap-filling tendency in segmental borrowing is not particularly strong.

In order to check whether segment borrowing and gaps correspond, I compared the data from Tables 12 and 13 with statistics on the consonants most often borrowed in different macro-areas. These data, shown in Table 14, were extracted from the SegBo database (Grossman et al. 2020). The comparison demonstrates that gap-filling borrowing can be advanced as an explanation of at least some of the facts: /p z/ in Africa, /ʒ z/ in Eurasia, and /ɡ b d/ in the Americas are both often found as gaps and often borrowed. Other frequent borrowings, however, such as /p f h tʃ dʒ/, are very rarely found as gaps. This seems to indicate that they are usually the first segment of their kind in the borrowing language’s inventory, which therefore uses them not for gap filling but as a means to co-opt new articulatory regions (e.g., labio-dental fricatives or affricates of any kind).

Table 14:

Most commonly borrowed consonants in different macro-areas.

Eurasia Africa North America South America Papunesia
f p ɡ ɡ
ʒ h b b
z f f h
z ʃ d d f
v f r r ɡ
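The comparison between frequent gaps and frequent borrowings amounts to a membership check, sketched below. The function name `split_borrowings` is hypothetical. The gap list follows the Eurasian columns of Tables 12 and 13, while the `borrowed` list is a shortened illustrative list assumed for the example, not the full contents of Table 14.

```python
def split_borrowings(frequent_gaps, frequent_borrowings):
    """Split borrowed segments into potential gap fillers and the rest.

    The order of `frequent_borrowings` is preserved in both output lists.
    """
    gap_set = set(frequent_gaps)
    fillers = [s for s in frequent_borrowings if s in gap_set]
    others = [s for s in frequent_borrowings if s not in gap_set]
    return fillers, others

# Eurasian stop and fricative gaps as listed in Tables 12 and 13.
eurasia_gaps = ["ɢ", "ɡ", "ɟ", "c", "b", "ɦ", "ɣ", "v", "ʒ", "z"]
# Illustrative shortlist of borrowed segments (an assumption for this sketch).
borrowed = ["f", "ʒ", "z"]

fillers, others = split_borrowings(eurasia_gaps, borrowed)
print("potential gap fillers:", fillers)
print("other borrowings:", others)
```

A check of this kind shows only overlap between the two lists; it does not control for the typological frequency of the borrowed segments.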

10 Conclusion

The main aim of this paper has been to draw attention to the fact that it is possible, and quite promising, to systematically study inventory gaps in the world’s languages. My approach, in addition to providing new descriptive findings, has shown that theories of consonant-inventory structure advanced in the literature agree only partially with the data. Thus, Gamkrelidze’s markedness cline (1973; 1975), while correctly pointing to /p/, /ɡ/, and /ɢ/ as the most frequent gaps, found at opposite ends of the front–back axis, fails to predict that /d/ will be a more frequent gap than /b/. It can also be noted that /p/ is not a frequent gap in Eurasia and the Americas, which makes Gamkrelidze’s claim universally valid only for voiced stops. Clements’s feature-economy principle (2003) fails to predict a noticeable number of inverse between-manner gaps such as /p/, /t/, and /q/ (i.e. cases when there are fricatives with place of articulation and voicing corresponding to those of the missing stops), showing that even articulatorily and auditorily advantageous combinations of feature values can be underutilised. Moreover, the statistical analysis of the degree of skewedness of inventories presented in Section 6 indicates that there is a degree of asymmetry that inventories tolerate and that this degree is higher for smaller inventories. The latter finding is in line with the conclusions of Nikolaev and Grossman (2020), who noted that smaller inventories, mostly consisting of the members of what they call the Basic Consonant Inventory and the First Extension, are highly likely to violate the feature-economy principle.

The results presented above are based on a formal algorithmic definition of an inventory gap, which makes it possible to automatically extract these gaps from existing segmental databases. However, this operationalisation of the notion of a gap is not the only one possible, tied as it is to the conservative IPA feature set, and there is more theoretical work to be done in this area. More data are also needed, both on phonological inventories (especially from the Pacific region and North America) and on historical and reliably reconstructed sound change processes, in order to test more rigorously hypotheses about the role that inventory gaps play in the history of phonological systems. Finally, a whole class of segments, clicks, is not covered by this method, both because the extant IPA parsers do not handle it and because primary data are lacking. This is another important avenue for future work.

Data-availability statement

The dataset and the code can be downloaded from a GitHub repository:

Corresponding author: Dmitry Nikolaev [dmʲˈitrʲij nikɐlˈajif], Stockholm University, Stockholm, Sweden, E-mail:


Boersma, Paul. 1998. Functional phonology: Formalizing the interactions between articulatory and perceptual drives. University of Amsterdam dissertation.

Clements, G. Nick. 2003. Feature economy in sound systems. Phonology 20(3). 287–333.

Coupé, Christophe, Egidio Marsico & François Pellegrino. 2009. Structural complexity of phonological systems. In François Pellegrino, Egidio Marsico, Ioana Chitoran & Christophe Coupé (eds.), Approaches to phonological complexity, 141–170. Berlin; New York: Mouton de Gruyter.

Dryer, Matthew S. 1989. Large linguistic areas and language sampling. Studies in Language 13(2). 257–292.

Dunbar, Ewan & Emmanuel Dupoux. 2016. Geometric constraints on human speech sound inventories. Frontiers in Psychology 7.

Esling, John H. 1999. The IPA categories “pharyngeal” and “epiglottal”: Laryngoscopic observations of pharyngeal articulations and larynx height. Language and Speech 42(4). 349–372.

Fletcher, Janet & Andrew Butcher. 2014. Sound patterns of Australian languages. In Harold Koch & Rachel Nordlinger (eds.), The languages and linguistics of Australia: A comprehensive guide, 91–138. Berlin; Boston: Walter de Gruyter.

Gamkrelidze, Thomas V. 1973. Über die Wechselbeziehung zwischen Verschluß- und Reibelauten im Phonemsystem. Phonetica 27(4). 213–218.

Gamkrelidze, Thomas V. 1975. On the correlation of stops and fricatives in a phonological system. Lingua 35(3–4). 231–261.

Garvin, Paul L. 1950. Wichita I: Phonemics. International Journal of American Linguistics 16(4). 179–184.

Grossman, Eitan, Elad Eisen, Dmitry Nikolaev & Steven Moran. 2020. SegBo: A database of borrowed sounds in the world’s languages. In Proceedings of the 12th Language Resources and Evaluation Conference, 5316–5322. Marseille, France: European Language Resources Association.

Hayes, Bruce. 2008. Introductory phonology. Chichester: Wiley-Blackwell.

Hockett, Charles F. 1958. A course in modern linguistics. New York: Macmillan.

Iverson, Gregory K. & Joseph C. Salmons. 2005. Filling the gap: English tense vowel plus final /š/. Journal of English Linguistics 33(3). 207–221.

Kümmel, Martin. 2007. Konsonantenwandel: Bausteine zu einer Typologie des Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion. Wiesbaden: Reichert.

Laufer, Asher. 1996. The common [ʕ] is an approximant and not a fricative. Journal of the International Phonetic Association 26(2). 113–117.

Lindblom, Björn & Ian Maddieson. 1988. Phonetic universals in consonant systems. In Victoria Fromkin, Larry M. Hyman & Charles N. Li (eds.), Language, speech, and mind: Studies in honour of Victoria A. Fromkin, 62–78. London; New York: Routledge.

Maddieson, Ian. 1985. Borrowed sounds. UCLA Working Papers in Phonetics 61(1). 51–64.

Maddieson, Ian. 2013a. Absence of common consonants. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Maddieson, Ian. 2013b. Voicing and gaps in plosive systems. In Matthew S. Dryer & Martin Haspelmath (eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Marsico, Egidio, Ian Maddieson, Christophe Coupé & François Pellegrino. 2004. Investigating the ‘hidden’ structure of phonological systems. In Proceedings of the 30th meeting of the Berkeley Linguistic Society, 256–267. University of California Berkeley.

Martinet, André. 1952. Function, structure, and sound change. Word 8(1). 1–32.

McCone, Kim. 1996. Towards a relative chronology of ancient and medieval Celtic sound change (Maynooth Studies in Celtic Linguistics 1). Maynooth: Dept. of Old Irish St. Patrick’s College.

Moran, Steven & Daniel McCloy (eds.). 2019. Phoible 2.0. Jena: Max Planck Institute for the Science of Human History.

Nikolaev, Dmitry. 2018. The database of Eurasian phonological inventories: A research tool for distributional phonological typology. Linguistics Vanguard 4(1). 20170050.

Nikolaev, Dmitry & Eitan Grossman. 2018. Areal sound change and the distributional typology of affricate richness in Eurasia. Studies in Language 42(3). 562–599.

Nikolaev, Dmitry & Eitan Grossman. 2020. Consonant co-occurrence classes and the feature-economy principle. Phonology 37(3). 419–451.

Ohala, John J. 1983. The origin of sound patterns in vocal tract constraints. In Peter F. MacNeilage (ed.), The production of speech, 189–216. New York, NY: Springer New York.

Salmons, Joseph & Patrick Honeybone. 2014. Structuralist historical phonology. In Patrick Honeybone & Joseph Salmons (eds.), The Oxford handbook of historical phonology, 32–46. Oxford: Oxford University Press.

Sherman, Donald. 1975. Stop and fricative systems: A discussion of paradigmatic gaps and the question of language sampling. Stanford University phonology archiving project. Working Papers on Language Universals 17. 1–31.

Wang, Sheng-Fu. 2019. The organization of sound inventories: A study on obstruent gaps. Proceedings of the Society for Computation in Linguistics 2(1). 195–204.

Received: 2020-10-20
Accepted: 2021-07-20
Published Online: 2021-08-23
Published in Print: 2022-05-25

© 2021 Dmitry Nikolaev, published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.