
Cognitive Linguistics

Volume 27, Issue 1


Cognitive Grammar and gesture: Points of convergence, advances and challenges

Kasper I. Kok
  • Corresponding author
  • Department of Language, Literature and Communication, Vrije Universiteit, Amsterdam, Netherlands
/ Alan Cienki
  • Department of Language, Literature and Communication, Vrije Universiteit, Amsterdam, Netherlands
  • Multimodal Communication and Cognition Lab, Moscow State Linguistic University, Moscow, Russia
Published Online: 2015-12-23 | DOI: https://doi.org/10.1515/cog-2015-0087


Given its usage-oriented character, Cognitive Grammar (CG) can be expected to be consonant with a multimodal, rather than text-only, perspective on language. Whereas several scholars have acknowledged this potential, the question as to how speakers’ gestures can be incorporated in CG-based grammatical analysis has not been conclusively addressed. In this paper, we aim to advance the CG-gesture relationship. We first elaborate on three important points of convergence between CG and gesture research: (1) CG’s conception of grammar as a prototype category, with central and more peripheral structures, aligns with the variable degrees to which speakers’ gestures are conventionalized in human communication. (2) Conceptualization, which lies at the basis of grammatical organization according to CG, is known to be of central importance for gestural expression. In fact, all of the main dimensions of construal postulated in CG (specificity, perspective, profile-base relationship, conceptual archetypes) receive potential gestural expression. (3) CG’s intensive use of diagrammatic notation allows for the incorporation of spatial features of gestures. Subsequently, we demonstrate how CG can be applied to analyze the structure of multimodal, spoken-gestured utterances. These analyses suggest that the constructs and tools developed by CG can be employed to analyze the compositionality that exists within a single gesture (between conventional and more idiosyncratic components) as well as in the grammatical relations that may exist between gesture and speech. Finally, we raise a number of theoretical and empirical challenges.

Keywords: gesture; cognitive grammar; multimodality; iconicity

1 Introduction

According to Cognitive Grammar’s (henceforth CG; Langacker 1987, 1991, 2008a) usage-based perspective on language structure, the realm of the linguistic need not be limited to verbal expression. Several scholars have discussed the potential of speakers’ gestures (in the sense of Kendon 1980, 2004) to attain grammatical status. Langacker himself already alluded to the potential of certain gestures to be part of a linguistic system, claiming that any type of expressive behavior can in principle become entrenched as a symbolic structure (Langacker 1987). This potential received further attention in Langacker’s (2001, 2008a) writings, and was evaluated in more detail by Cienki (2012, 2014, in press). With respect to signed languages, Wilcox (2004), Wilcox and Xavier (2013), and Liddell (2003), among others, have shown that CG provides useful analytical constructs for detailing the relation between (ASL) grammar and iconicity. Others have adopted CG-terminology in their discussions of sequential speech-gesture compositions (Enfield 2004; Ladewig 2011b) and the phenomenon of gestural iteration (Bressem 2012).

Despite the many connections between CG and gesture research made by these scholars, CG has thus far not developed so as to adopt multimodality in its basic design, nor has it become clear what it entails to incorporate speakers’ gestures in actual cognitive grammatical analysis. The aim of this article is to provide a more comprehensive overview of the points of convergence between CG and gesture than is currently available, and to advance the incorporation of gesture in CG as appropriate. After reviewing current literature on this topic, we show how CG’s analytical tools can be applied to language as multimodal (spoken-gestured). We conclude with a discussion of theoretical and operational challenges.

2 Points of convergence

Whereas the study of speakers’ gestures from a linguistic perspective is still in its infancy (Müller et al. 2013), several scholars have already explored the potential of CG as a framework for understanding the grammar-gesture relationship. At least three aspects of the theory are of particular interest in this respect. First of all, CG views grammar as emergent from actual communication, without posing restrictions on the kind of behaviors that may constitute a linguistic system. Second, CG holds that grammatical meaning resides in conceptualization – a cognitive process that has often been hypothesized to receive expression in manual gestures as well. Third, CG draws on the assumption that many grammatical notions reflect cognitive representations that are spatial in character, which aligns with the inherently spatial expressive domain of manual gestures. In this section, we synthesize current literature on these and related points of convergence.

2.1 Gestures as symbolic structures

A central thesis in CG is that grammar comprises a repository of symbolic structures. These are thought of as pairings of a formal structure, the ‘phonological pole’, and a semantic structure, the ‘semantic pole’. What constitutes these poles may go well beyond what is traditionally considered the realm of phonology and semantics, however. CG’s usage-based paradigm maintains that symbolic structures are abstracted from communicative experiences, or ‘usage events’, which “consist […] of a comprehensive conceptualization, comprising an expression’s full contextual understanding, paired with an elaborate vocalization, in all its phonetic detail” (Langacker 2001: 144). Thus, virtually any aspect of language use may become entrenched as part of a grammatical structure, as long as it is apprehended as a common denominator in multiple communicative experiences. The semantic side of a usage event does not only correspond to the imagery (in the technical sense of CG, not limited to visual imagery) underlying the situation described, but also involves the discourse context and aspects of the interaction itself (e.g., patterns in turn taking). Likewise, on the expressive side, grammatical structures may include “both the full phonetic detail of an utterance, as well as any other kinds of signals, such as gestures and body language” (Langacker 2008a: 458). This broad conception of the (potential) grammar proper is expressed by Langacker’s (2001) representation of usage events as comprising multiple ‘channels’ at each pole (Figure 1).

Figure 1: Symbolic units as comprising multiple channels of conceptualization and expression; adopted from Langacker (2012: 97).

What follows from this view is that whether or not elements of expression qualify as linguistic does not depend on the modality through which they are expressed. Rather, the grammatical potential of co-verbal behaviors is to be assessed according to their degree of entrenchment as symbolic structures in an individual’s mind and the degree of conventionalization of those symbolic structures within a given community (Langacker 1987; Zima 2014). Taking this continuous view as a starting point, the picture arises that some gestural behaviors are clearly candidates to be part of the linguistic system (as a general category, or of a given language; Cienki 2012), whereas others are more on the periphery of that system.

2.2 Degrees of schematicity in gestural form and meaning

Co-verbal manual behaviors come in many varieties. Kendon (1980) first analyzed different types of gesture as forming a continuum from fully autonomous to more idiosyncratic – later coined ‘Kendon’s continuum’ by McNeill (1992). Kendon’s classification mirrors CG’s tenet that entrenchment is a matter of degree rather than a dichotomy.

The entrenched status of so-called emblems (Efron 1941 [1972]; Kendon 2004: Ch. 16) is rather clear-cut. These are gestures with a stable form that can be attributed a meaning within a given culture in the absence of verbal context (e.g., forming a ring with the thumb and index finger to say ‘OK’ in North American and some other cultures, or extending the index finger and middle finger in a V-shape, with the palm of the hand facing outward, to say ‘peace’). The use of emblems is to some extent language-specific, with different repertoires being employed by different cultural communities (Efron 1941 [1972]; Payrató 1993). Hence, this category of gestures can be assumed to consist of symbolic units that are both entrenched and conventionalized.

For most other categories of gestures, the grammatical status involved is only evident on a higher level of abstraction. Even for gestural behaviors that may seem rather systematic, such as manual pointing gestures, substantial variation in form and meaning exists. Whereas pointing gestures often involve a stretched index finger (in most Western cultures), there appear to be no strict constraints on the shape of the hand and the degree of tension in the fingers. Likewise, in terms of their meaning, pointing gestures may serve a variety of functions: they may be used to draw the interlocutor’s attention to some contextual or abstract referent, to resolve referential ambiguity, or to give or request the turn in a conversation (Bavelas et al. 1992; Kita 2003). From a CG perspective, this variability in form and meaning across different contexts “correlates with what cognitive grammarians believe is the schematicity of grammatical meaning” (Lapaire 2011: 95). Pointing gestures thus can be said to manifest grammatical structures that are, as a category, slightly more schematic (i.e., more contextually variable) in form and meaning than emblems.

The grammatical status of the category of ‘recurrent gestures’ or ‘gesture families’ (Kendon 2004; Müller 2004) can be analyzed in similar terms. These are gestural patterns that share a set of formational and semantic features on some level of abstraction. An example is the ‘cyclic gesture’ (Ladewig 2011a) used by speakers of many European languages, whereby either the hand or one or more fingers is continuously rotated outwards in a circular fashion. This gesture does not have a fully fixed form, but rather “constitute[s] what can be called a family resemblance category of phonological structure” (Cienki in press) in the sense of Wittgenstein’s (1953) characterization of categories constituted by partially overlapping features. That is, different instances of cyclic gestures can be seen as more or less prototypical exemplars of this gesture type. In terms of their meaning, cyclic gestures are dependent on the context in which they are performed, but are nonetheless bound to a limited set of discourse situations. Ladewig (2011a) characterizes this semantic commonality among cyclic gestures in terms of an Idealized Cognitive Model (ICM, cf. Lakoff 1987) that encompasses the image schema CYCLE and a number of metaphorical extensions. Because some degree of commonality in form and meaning exists between instances of these gestures, they can be qualified as symbolic structures, with phonological and semantic poles that are largely schematic (Cienki 2012).

A final category of gestures, for which the potential grammatical status is even less obvious, is that of creative gesticulation: those gestures that are creatively generated during speech production and do not instantiate a type of emblem or recurrent gesture. Note that the term ‘creative gesticulation’ as used here is narrower than Kendon’s 1988 category of gesticulation, which includes certain recurrent patterns as well. 1 Because these gestures do not appear to conform to clear standards of form, McNeill (1992) characterizes them as ‘idiosyncratic’. It is questionable, however, whether no systematicity exists in such gestures at all. On both the expressive and semantic side, creative gesticulation is undoubtedly constrained by certain normative expectations. For instance, speakers make consistent use of a small number of ‘representation techniques’ when using the hands to refer to objects and processes (e.g., molding the shape of the object, or tracing its outline in the air; Müller 1998). Each of the representation techniques could be considered as a central characteristic of a schematic class of gestures. As Enfield (2013) argues with respect to tracing gestures:

It may be that there are conventions which allow interpreters to recognize that a person is doing an illustrative tracing gesture, based presumably on formal distinctions in types of hand movement in combination with attention-directing eye gaze toward the gesture space. While the exact form of a tracing gesture cannot be pre-specified, its general manner of execution may be sufficient to signal that it is a tracing gesture.

(Enfield 2013: 701)

In terms of their semantics, creative gestures are strongly context-dependent, but certain commonalities nevertheless exist. One recurrent feature, shared by most instances of creative gesticulation (as well as some other gesture types), is the function of modifying or highlighting some aspect of the spoken discourse. Further, we can see an overlap with the category of ‘speech-linked gestures’, e.g., when forming part of a performed quotation or impersonation (“And I was like [gesture]”). In such cases, as Cienki (in press) argues, “there is the schematic form-meaning structure in place that such words call for some kind of depiction or illustration in order for the speaker’s point to be adequately expressed (as if the word had a slot that needed to be filled by a gesture).” Accordingly, it seems inadequate to fully dismiss creative gesticulation from the realm of (cognitive) grammar. Rather, it can be considered to manifest a type of very schematic grammatical structure, with only very few formal and semantic features in common.

In conclusion, as summarized in Table 1, different types of gestures can be thought of as ranging on a continuum from relatively fixed to much more flexible symbolic structures.

Table 1: Summary of the degrees of specificity/schematicity in the form and meaning of different types of gestures.

This continuum aligns well with the CG perspective that verbal structures such as words, morphemes and syntactic constructions can be placed along a continuum of schematicity. As follows from the discussion above, different types of co-speech gestures can be characterized in a similar fashion. They are not principally different from one another or from verbal structures, but rather analyzable as a set of structures that range from fixed to more variable and schematic in their phonology and semantics. A view of this kind may help to answer the question whether gestures are to be seen as subject matter for linguistic theory. To echo Langacker’s (2008b) view on this matter: some gestures are clearly part of a grammatical system, others less so.

2.3 Gestures and CG’s dimensions of construal

A second important tenet of Cognitive Grammar is that language structure reflects conceptualization. This presumption provides another line of convergence with the gesture literature. It is consistent with the hypotheses that gestures emerge from imagistic processes during ‘thinking for speaking’ (McNeill 1992; McNeill and Duncan 2000), ‘conceptualizing’ (De Ruiter 2007), or ‘visual-motor simulation’ (Hostetter and Alibali 2008). More specific parallels between CG and gesture research can be noticed in relation to what Langacker (1987) puts forward as the dimensions of construal that receive grammatical expression. These include specificity, perspective, focus-background and the postulation of cognitively basic semantic structures. This section outlines how each of these notions relates to speakers’ gestures.

2.3.1 Specificity

When communicating about the world, any referential situation needs to be construed with a certain level of specificity. The same object may, for instance, be described as a thing, or a small object, or a corkscrew, depending on the specificity of the predication (Langacker 1987: 118). The same holds for gestural expression: because our hands are not clay that can be reshaped however we please, referential gestures are inherently underspecified. Gestures that represent objects are therefore always only able to iconically show some part of the whole, and pointing gestures indicate a salient point to refer to the whole referent entity or space (Cienki 2013).

An example of a gesture on the most specific end of the spectrum is that for ‘telephone’ – whereby the pinky and thumb are extended and the other fingers folded – which has a rather restricted domain of possible referents (namely, a telephone, perhaps even just the handset of a landline phone). 2 Gestures with a grip handshape, whereby all five fingers are bent as if holding something in the fist, are much less specific: they most likely (but not exclusively) refer to small-sized, round or cylindrical objects. A type of gesture on the most schematic end is the palm-up-open-hand gesture (Kendon 2004; Müller 2004). Palm-up-open-hands may, via metonymy, refer to any type of object that might be held on the hand, either concrete (e.g., the physical referent of the co-expressed noun phrase) or abstract (e.g., a position in a debate) (see further discussion in Mittelberg and Waugh 2009).

2.3.2 Subjective and objective construal

Elements of linguistic expression can involve different degrees of recourse to the ground, i.e., the contextual circumstances of the linguistic interaction. Those that are construed without reference to the ground are described as objective (e.g., the conceived spatial relation evoked by ‘the lion is next to the rock’); those that are contingent upon the physical location of the conceptualizer (or another aspect of the ground) have a higher degree of subjectivity (e.g., the conceived spatial relation evoked by ‘the lion is in front of the rock’).

Similar construal options have been discussed with respect to the use of referential gestures (McNeill 1992; Parrill 2010). Whereas pointing gestures typically assume a reference frame relative to the physical location in which the interaction takes place (e.g., ‘go this way’ + [pointing gesture]), other gestures are more objectively construed in the sense that the current communicative situation is irrelevant to their interpretation (e.g., ‘the statue is shaped like this’ + [tracing gesture]). 3 This difference is further captured by the proposed distinction between gestures with a character perspective and those with an observer perspective (McNeill 1992). In the latter case, the speaker construes a situation as objective – e.g., when tracing the path of the car by moving the hand through space along a certain trajectory. A more subjective construal of the same scene can be achieved when a character perspective is adopted, e.g., when one mimics the action of driving in a car by impersonating hand movements of the driver holding the steering wheel.

2.3.3 Focus and the stage model

In CG, foreground-background distinctions are seen as vital to grammatical organization. Langacker (1991: 283–284) employs the stage model metaphor to explicate this. According to the stage model, meaning resides in the relation between a profile, i.e., the center of attention, and a conceptual base, i.e., the immediately relevant context and background knowledge. These notions apply to word meaning (the noun rim, for instance, is only meaningful against the background of the concept of something having such an edge, such as a WHEEL), but also to syntactic relations (e.g., a verb phrase profiles a process, but requires as its conceptual base a scene that involves an agent).

The stage model is not only relevant for the grammar of the verbal channel, but can also help elucidate the way speech and gestures relate to each other semantically. As Bressem (2012) notes, the profile of a verbal expression often serves as part of the base with respect to the gesture. That is, speech often provides the relevant background frames against which the meaning of the gesture gets elaborated. A gesture whereby a circle is drawn in the air is intrinsically underspecified, but attains a quite specific referential meaning when co-expressed with the noun rim (Figure 2).

Figure 2: The profile of the verbal expression may serve as (part of) the conceptual base with respect to the gesture.

At the same time, gestures may further specify the verbally presented content. The different representation techniques people use for making iconic reference – molding, drawing, holding or enacting (Müller 1998) – profile different aspects of the referent. While referring to a rim, a gesture whereby the hands draw its contours in the air profiles its shape and outline; when using the hands as if interacting with the rim, one profiles its physical affordances.

The trajector-landmark distinction that holds between primary and secondary participants within the focus domain can also have a gestural correlate, as demonstrated in Enfield’s (2004) description of the symmetry-dominance construction in Lao. Enfield describes a gesture sequence whereby the speaker first depicts a fish trap by means of a two-handed, symmetrical gesture, and then, while one hand is held in place, makes a gesture with the other hand that represents the fish moving into the trap. In CG terms, the moving hand is the Trajector with respect to the non-dominant hand, which takes the role of Landmark. Note that in this example, the focus-background relation is mapped not only onto an asymmetry in perceptual prominence (moving vs. non-moving), but also onto the temporal ordering of the two subsequent gestures.

2.3.4 Gesture and basic grammatical categories

In its description of grammatical classes, CG holds that language structure reflects patterns in basic human experience, or ‘conceptual archetypes’, through which we view the world. The existence of discrete objects which occupy physical locations, for instance, is an archetype that is prototypical for the grammatical class of nouns. To capture the semantic cohesion across the entire class of nouns, including less prototypical instances, Langacker postulates the more schematic conceptual structure thing, defined as “any product of grouping and reification” (Langacker 2008a: 105). The temporal relationships that may hold between things – referred to as processes – constitute a conceptual structure that corresponds to verbs (Langacker 1991: 13).

CG’s semantic characterization of grammatical categories can provide further insights into the relation between linguistic elements expressed through speech and those conveyed with the hands. In sign language research, it has been adopted to characterize the semantics of manual expression. With respect to ASL classifier predicates, Wilcox (2004: 127) has argued that “the things of Cognitive Grammar are mapped onto handshape, and process is mapped onto phonological movement.” In line with this view, it is possible to characterize those gestures that refer to objects and represent their physical and/or kinesic properties in terms of their semantic relation with basic grammatical notions (Ladewig 2011b). All gestures that perform the reification of some conceptual content (thereby construing some referent as a thing) evoke a conceptual domain that overlaps with that of nouns. Gestures whereby the hands of the speaker (in addition) depict some static property of the referent bear correspondence to the semantic domain of adjectives: they profile a non-processual relationship between the referenced entity and some other conceptual structure. In cases where the hand moves through space to represent the motion of some entity, the gesture can be said to have a verb-like character, as it designates a processual relationship (Figure 3).

Figure 3: A CG representation of the semantic poles (in their most schematic form; adopted from Langacker 1987) of three major grammatical classes and the features of gestures that relate to them. Things in CG are represented by circles, whereas squares stand for entities (a broader category, which may encompass any kind of conceptual structure).

Two possible caveats of the proposed parallel can be noted at this moment. First: since gestural expression is not linearized in the same way as speech, it is very well possible that multiple conceptual structures are simultaneously evoked by a single gesture. In fact, gestures that depict the shape of an object are likely to be interpreted as simultaneously performing reification (cf. Kok et al. in press). Second, it should be noted that movement of the hand(s) does not necessarily evoke a representation of a process. As Ladewig (2011b) notes, the movement of the hands may instead be part of the act of reference, e.g., when referring to a physical object by tracing its contours in the air. The gesture in this case can still be seen as an object-process synthesis, where the hand is some drawing utensil and the movement represents the act of drawing, but it serves a meta-referential function: not the act of drawing but the drawn object is part of the situation communicated.

2.3.5 Autonomy – dependence

CG distinguishes autonomous and dependent structures; the former can be described in their own terms, whereas the latter presuppose the support of another structure. A dependent structure, in other words, “refers schematically to an autonomous, supporting structure as an intrinsic aspect of its own characterization” (Langacker 2008a: 199). Examples of autonomy/dependence relationships exist on various levels of grammatical organization: consonants are dependent on vowels in the same syllable to be clearly perceived; verb phrases are dependent on the presence of a noun phrase (in canonical written language). In the latter case, the autonomy-dependence relation is motivated by the underlying conceptualization. Verbs are dependent on nouns because the corresponding processes cannot be conceptualized without the presence of a thing to perform them. The semantic integration of autonomous and dependent structures is described as elaboration: the semantic pole of a dependent structure is said to contain an elaboration site that can become specified (or: elaborated) by another, semantically more fine-grained structure.

For an adequate characterization of the relationship between speech and gesture, the concept of autonomy-dependence alignment is of crucial importance. Dependence between the two tiers is pervasive in multimodal language use and exists in two directions. Verbal constructions, in particular those involving deictic reference (e.g., look over there) or iconic specification (e.g., the German so einen X; ‘such a X’/‘a X like this’; Fricke 2012, 2013), require the presence of some gesture (presumably a pointing gesture in the first case and a representational gesture in the second). Conversely, many manual gestures articulated during speech are functionally contingent upon the semantic and pragmatic frames evoked verbally (Wilcox and Xavier 2013). Depending on the verbal context, a cyclic gesture may, for instance, represent a cyclic movement of some object, but may also indicate that the speaker is searching for the right words (Ladewig 2011a).

It is worth emphasizing that autonomy-dependence is not always a strictly asymmetrical relationship. In the case of multimodal demonstrative utterances like the man over there [+ pointing gesture], speech and gesture are mutually dependent: the verbal component does not only presuppose the performance of a deictic gesture, but the meaning of the gesture is at the same time elaborated by the speech (the gesture could perhaps have pointed to the woman standing next to the man). This symbiotic character of speech-gesture compositions is in accordance with Langacker’s characterization of the autonomy-dependence relationship as variably asymmetrical:

Canonically the structures in a valence relation manifest substantial asymmetry, with one of them (on balance) clearly dependent, and the other autonomous. As always, though, recognition of the prototype must not be allowed to obscure the existence of other possibilities. Nothing in the definition precludes a relation of mutual dependence between two structures, or guarantees that there will always be a significant relation of dependence in one direction or the other. (Langacker 1987: 300)

Thus, CG does not predict a simple dichotomy between gesture-compatible and gesture-incompatible structures. There may instead be a continuous range of degrees to which gestures and verbal constructions presuppose the presence of one another.

2.4 Gestures and CG-diagrams

In addition to the many theoretical points of convergence with gesture research, CG also has representational advantages over other grammar frameworks. Many cognitive linguistic models, including CG, explicitly attempt to mirror language’s relation to spatial cognition in their representational tools. The frequent employment of diagrammatic tools creates a natural point of connection with the inherently spatial expressive domain of manual gestures (cf. Tversky et al. 2009). This potential has already been proven by the natural integration of gesture research with the notion of image schemas (Cienki 2005) and Talmyan force dynamics (Hassemer 2015), but has not been elaborated in great detail in CG.

Nevertheless, as Cienki (in press) notices, “there is great potential for incorporating schematic images of relevant gesture forms as part of the phonological pole […], images which, through their form and orientation, would also inherently show the perspective of the speaker’s construal.” Liddell (2003) has demonstrated that CG’s notation conventions can be useful for analyzing the phonological and semantic structure of ASL signs. 4 The benefit of these diagrams over text-only notations is most obvious when it comes to displaying iconic mappings between phonological space and semantic space, e.g., when the position or movement of the hands is isomorphic to the conceptualized position or movement of some object. This potential is demonstrated in Section 3 with respect to data from a video corpus.

3 Towards a CG analysis of gesture-internal and cross-modal compositionality

As is evident from the previous sections, CG offers a rich analytical apparatus to investigate the grammar-gesture interface. Nonetheless, actual applications of CG to multimodal data remain scarce in the literature. In this section, we aim to bridge this gap: we first present a CG-analysis of the internal structure of a gesture, and subsequently we provide two example analyses of combined spoken-gestured utterances.

3.1 Gesture-internal compositionality: Conventional and ad hoc aspects

Like verbal expressions, gestures are complex structures that combine different types of semiotic signs. On the one hand, as we have seen in Section 2.2, gestures can be recognized as belonging to a particular class or category (e.g., cyclic gesture, thumbs-up emblem, tracing gesture). On the other hand, they typically evoke form-meaning mappings that are specific to the particular context in which they are used. As Mittelberg (2014: 1714) notes, “when producing [creative] gestures, speakers-gesturers do not select from a given form inventory of a system, […] but they create semiotic material each time anew.” Especially iconic and indexical components of gestures, indeed, rely heavily on mappings between form and conceptualization that are created on the fly and tailored to the context of the utterance as a whole. Thus, as noted by Enfield (2009, 2013), many gestures combine conventional and non-conventional (ad hoc) signs. This dual structuring is perhaps most clearly illustrated with respect to pointing gestures: the pointing handshape itself is categorically linked to an expectation of deictic reference or placement, but the location and direction of the pointing gesture, which elaborate that expectation, are ad hoc and analog in nature (cf. Liddell’s 2003 discussion of pronouns in ASL). Because this component of the gesture does not (directly) rely on entrenchment in long-term memory or convention, its symbolic status is of a different nature than that of prototypical grammatical units. 5 As Langacker (1987: 91) proposes, an iconic or indexical structure is interpretable as a result of being “put in correspondence with itself”. In other words, as a result of their inherent conceptual value, manually expressed phonological structures have the potential to be self-symbolizing (cf. Wilcox’s 2004 notion of cognitive iconicity).

At the risk of considerable oversimplification (for more detailed accounts, see Mittelberg 2014; Mittelberg and Waugh 2014; Taub 2001), the analyses below pursue the view that instances of creative gesticulation can be decomposed into a conventional component and an ad hoc component. An example analysis of the internal structure of an instance of creative gesticulation is given in Figure 5. The analysis concerns a fragment of the Speech and Gesture Alignment (SaGA) corpus (Lücking et al. 2013), which contains video recordings of German direction-giving discourse. As seen in Figure 4, speech and gesture are employed in tandem to describe a physical landmark that is relevant for the route description. While the speaker says auf den Seiten sind zwei blaue Wendeltreppen ‘on the sides there are two blue spiral staircases’, he simultaneously makes an upward spiral-shaped tracing gesture with the index fingers of both hands. The time course of the gesture relative to speech is represented below the transcript, following the conventions of Kendon (2004). 6

Figure 4:

Video stills of the fragment of the SaGA corpus analyzed in example (1).


‘on the sides there are two blue spiral staircases’

Figure 5 shows how the internal structure of the tracing gesture can be analyzed, building on the assumption that the conventional (long-term memory) and the ad hoc (self-symbolized) components of the gesture can be thought of as separable symbolic structures.

Figure 5:

A CG analysis of the internal structure of a spiral tracing gesture as being composed of a conventional part and a self-symbolized, analog part.

The diagram on the bottom-left part of the figure represents the assumption that the handshape instantiates a category of tracing gestures. This category is analyzed as akin to a schematic grammatical class that subsumes adjectives as well as adverbs: it profiles a relationship between some physical characteristic (a region of shape, contour or trajectory space; analogous to Langacker 2008a: 102) and some thing. In line with this interpretation, the diagram on the left contains two elaboration sites. The site on the right corresponds to the shape or contour that is drawn by the hand; the one on the left corresponds to the entity to which this shape or contour is to be attributed (presumably elaborated in the verbal channel). Thus, the bottom-left diagram in Figure 5 indicates that the conventional component of the gesture conveys, roughly, that there is some entity that has some spatial property – presumably a path, motion, or contour.

The ad hoc component of the gesture signifies, through self-symbolization, a shape or path that is homologous to the tracing movement that is performed. 7 This is represented on the bottom-right part of the figure, where the label ‘self-symbolized’ is added to emphasize that this dimension of the gestural structure is not a direct manifestation of an entrenched/conventionalized mental structure. As seen in the upper part of the diagram, the unification of the two symbolic structures simply entails recognition that the ad hoc component of the gesture (the trace) elaborates one of the elaboration sites invoked by the handshape: it restricts the region of shape space that is being attributed. To emphasize the direction of the elaboration while keeping the diagrams manageable, the elaboration site is marked with a hatched area and the process of elaboration with a single dashed arrow (cf. the notation with two separate lines for correspondence and elaboration used by Langacker 2008a: 198 and elsewhere). What results from the integration of the component structures is a construct that functionally expresses: ‘there is some entity that has a spatial feature resembling the trace of the hand.’
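The unification just described can also be paraphrased, very loosely, as a small data-structure sketch. The Python fragment below is only an illustration under our own assumptions: the class names `SymbolicStructure` and `ElaborationSite`, the helper functions `elaborate` and `open_sites`, and all string labels are hypothetical and form no part of CG's notation or of the analysis in Figure 5. It merely shows a schematic structure with two open sites, one of which is filled by the self-symbolized trace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ElaborationSite:
    """A schematic slot in a semantic pole that another structure can fill."""
    description: str
    filler: Optional[str] = None  # None means the site is still schematic


@dataclass
class SymbolicStructure:
    """Hypothetical stand-in for a symbolic unit (a form paired with a meaning)."""
    form: str
    status: str  # "conventional" or "self-symbolized" (labels assumed here)
    sites: Dict[str, ElaborationSite] = field(default_factory=dict)


def elaborate(schema: SymbolicStructure, site: str, content: str) -> SymbolicStructure:
    """Return a copy of `schema` in which the named elaboration site is filled."""
    if site not in schema.sites:
        raise KeyError(f"no elaboration site named {site!r}")
    new_sites = {
        name: ElaborationSite(s.description, content if name == site else s.filler)
        for name, s in schema.sites.items()
    }
    return SymbolicStructure(schema.form, schema.status, new_sites)


def open_sites(structure: SymbolicStructure) -> List[str]:
    """List the names of the sites that have not been elaborated yet."""
    return [name for name, s in structure.sites.items() if s.filler is None]


# Conventional component: the tracing handshape evokes two schematic sites.
tracing = SymbolicStructure(
    form="index-finger tracing handshape",
    status="conventional",
    sites={
        "entity": ElaborationSite("some thing"),
        "shape": ElaborationSite("some region of shape space"),
    },
)

# Ad hoc component: the trace itself restricts the attributed shape.
composite = elaborate(tracing, "shape", "upward spiral (homologous to the trace)")
```

In this sketch the `entity` site is still open after the gesture-internal unification, which mirrors the point that this site is presumably elaborated in the verbal channel.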

The semantic pole of the gesture in Figure 4 is not only determined by the tracing structure, however. An additional symbolic unit is established by the use of two hands that are conspicuously positioned on the sides of the body, outside central gesture space. The meaning of this aspect of the gesture is at least as schematic as that of the tracing component. As diagrammed in Figure 6, it can be assumed to signify a spatial relation between two unspecified things along a horizontal axis. 8

Figure 6:

An example CG analysis of the interaction between the symbolic structures evoked by the tracing-component and the location-component of the gesture.

As seen in the middle diagram in the lower part of the figure, the positioning of the hands is analyzed as a symbolic structure that evokes two things, with the conception of a horizontal axis as a salient part of the conceptual base. Like the tracing component, this element of the gesture involves self-symbolization: the physical distance between the hands maps onto the conceptualized spatial relation (although on a different scale). The two things that are schematically referred to by the hand-positioning component of the gesture correspond to those evoked by the tracing structure (Figure 5). This correspondence is perhaps so obvious that it may give the impression of redundancy, but it nevertheless demonstrates how CG can be employed to disentangle the elementary components of gestures and their semantic qualities. The simplified, integrated representation on the top part of the figure will serve as the basis for analyzing the interaction with the verbal component of the utterance in the subsequent section. Before doing so, it is important to note that this analysis, although it exceeds the level of detail usually found in the gesture literature, is still likely to be incomplete. Formal aspects such as the vertical position of the hands, their position relative to gestures that have been made previously, and the degree of tension in the fingers may also carry some semantic import. Understanding how these are best described in terms of symbolic structures requires more empirical and analytical work, and is left out of consideration here (see Section 4.1 for further discussion).

3.2 Multimodal compositionality

In addition to its application to gesture-internal structure, CG may help to elucidate how gestures relate to the grammar of speech (for a review of different ways in which speech and gestures interact, see Wagner et al. 2014). The analysis in Figure 7 is limited to the fragment zwei blaue Wendeltreppen ‘two blue spiral staircases’. Because the performance of the gesture temporally coincides with the articulation of this noun phrase in its entirety (as seen from the time stamps in Example 1), we analyze the semantics of the verbal component of the utterance as an integrated whole; the compositional path of the verbal channel is not considered relevant for the current analysis (see however Section 3.3 for a more incremental analysis, where the gesture interacts with elements of the verbal channel before the meaning of the utterance as a whole is established).

Figure 7:

CG analysis of the spoken-gestured utterance in Example 1. Vertical connections are not displayed for reasons of clarity and because they follow directly from the correspondences depicted in the lower part.

The analysis of the gestural channel is already given in the previous section: each of the hands schematically refers to a thing and attributes a self-symbolized spatial characteristic to it. In combination with the noun phrase zwei blaue Wendeltreppen ‘two blue spiral staircases’, the most obvious interpretation is that each of the elaboration sites evoked by the hands corresponds to one of the conceptualized objects (the spiral staircases) that are profiled by the verbal component of the utterance. That is, the schematic structures evoked by the hands become elaborated by the more specific structures evoked by the noun phrase zwei blaue Wendeltreppen. The integrated meaning can be represented as in the upper part of the diagram: the full multimodal utterance profiles the existence of two spiral staircases (where the quantity is expressed in speech as well as in gesture) as well as their color (only evoked through speech) and their shape and orientation (most explicitly evoked through gesture). 9 The asymmetrical autonomy-dependence relation between the speech and the gesture follows clearly from the fact that the semantic pole of the gestural unit has a salient elaboration site, whereas the verbal component can be understood in its own terms.

3.3 Multimodal deixis

The interaction between speech and gesture can take higher degrees of complexity than in the example outlined above. Here, we provide an analysis of the intersection between speech and gesture in the case of a multimodal deictic utterance. It should be noted that our goal is not to take any particular stance in deixis theory, nor do we hold that our analysis is the only one possible; the main aim instead is to illustrate CG’s potential in analyzing the interaction between verbal and gestural forms of spatial expression. In the example displayed in Figure 8 and example 2, again taken from the SaGA corpus, the speaker performs two pointing gestures while referring to a physical landmark (a spiral staircase again, coincidentally). First, while she says sieht ein bisschen aus wie die Wendeltreppe ‘looks a bit like the spiral staircase’, she points sideways, ostensibly in the direction of the actual hallway in the university building where she is located. Shortly thereafter she points again, roughly in the same direction, while she says hier in der Halle ‘here in the hallway’. A more precise representation of the timing of the pointing gestures relative to the speech is given in Figure 8.

Figure 8:

Video stills of the fragment of the SaGA corpus analyzed in example (2).


‘looks a bit like the spiral staircase here in the in the hallway’

We see that the first of the two consecutive pointing gestures is performed right before the noun phrase die Wendeltreppe ‘the spiral staircase’ is vocalized. The combination of these two components of the utterance is analyzed in Figure 9.

Figure 9:

CG-analysis of the first pointing gesture in combination with the definite noun phrase die Wendeltreppe ‘the spiral staircase’.

The semantics of the pointing gesture can be represented as in the bottom-left diagram in Figure 9. The analysis follows the assumption that the semantics of pointing gestures involves “the understanding that a body part projects a vector toward a particular direction” (Kita 2003: 5). In addition, it follows Langacker’s (2008a: 283) assumption that pointing gestures are associated with the function of singling out one of the available candidate referents. The candidate referents are depicted as circles with points in them (following Langacker 2008a: 278), to make them visually distinguishable from classes of entities such as things. The diagram furthermore indicates that the pointing gesture by itself does not reveal what type of entity is being referred to: the identity of the profiled referent remains underspecified.

About half a second after having performed the gesture, the speaker says die Wendeltreppe ‘the spiral staircase’. Of particular interest here is the use of the definite article. This suggests that the referent of the noun phrase is contextually salient. Langacker (2008a: 285) represents the most basic meaning of the definite article roughly as in the bottom-right diagram. The rounded rectangles represent the attentional frames that are postulated in CG to analyze the linear-sequential dimension of discourse organization (see Langacker 2001). As shown in the diagram, the definiteness of the article suggests that the referent of the (projected) noun phrase has been attended to in some previous attentional frame. 10 Because of the close temporal succession between the gesture and the noun phrase, we can assume that the attentional frame that is schematically referred to by the definite article aligns with the one in which the gesture was performed. Thus, the referential content of the gesture elaborates one of the elaboration sites set up by the definite article die ‘the’; a correspondence is established between the referent of the gesture and that of the (projected) noun phrase. The subsequently articulated noun Wendeltreppe ‘spiral staircase’ further specifies the type of entity that is being referred to. Here, the relatively schematic referential content of the article-gesture combination becomes elaborated by the more specific meaning of the noun, as shown in the middle and upper segments of the figure. 11 Overall, the verbal and gestural components of the utterance segment are mutually informative: the noun phrase elaborates the elaboration site that is created by the gesture (it classifies the referent that is being singled out) and conversely, the gesture contributes to grounding the utterance in the immediate physical context; it forms part of the conceptual base for the interpretation of the noun phrase.

The analyses provided in Figures 10, 11 and 12 concern the part of the utterance that follows, comprising the phrase hier in der Halle ‘here in the hallway’ and the concurrent pointing gesture (the second of the two pointing gestures). In all, this part of the multimodal utterance contains three deictic elements: the spatial adverb hier ‘here’, the prepositional phrase in der Halle ‘in the hallway’ and the pointing gesture. Figure 10 presents separate analyses of each of these elements.

Figure 10:

CG-analyses of the three deictic terms in the second part of the multimodal utterance in (2).

Figure 11:

A CG analysis of the locative adverb hier in combination with the pointing gesture.

Figure 12:

Unification of the intrinsic and relative deictic frames.

The locative adverb hier ‘here’ profiles a region of space that is proximal to the location of the speaker and hearer (Langacker 1991: 222). Its construal is strongly subjective; the designated region is conceptualized relative to the location of the speaker and hearer (denoted with S and H), which serves as a reference point. The prepositional phrase in der Halle ‘in the hallway’, in contrast, designates a region of space in a more objective manner. It profiles a landmark Halle ‘hallway’ and a relation of inclusion with an unspecified trajector. The pointing gesture, finally, has a subjectively construed semantic pole: the location of its referent is defined relative to the location and orientation of the hand of the speaker, as discussed before. In the analysis, it is furthermore assumed that the gesture profiles an entity that is associated with a point on the projected vector, namely the spiral staircase. This assumption is motivated by the fact that the spiral staircase was brought to focus in the first part of the utterance and was the referent of the first pointing gesture, as discussed above. Thus, the representation of the pointing gesture in Figure 10 and beyond includes its anchoring to the preceding noun phrase die Wendeltreppe ‘the spiral staircase’, as well as the relative frame of spatial reference that specifies one dimension of its location (note that we here use a strongly simplified notation relative to the analysis shown in Figure 9). Figures 11 and 12 represent the sequential unification of the pointing gesture with the two other spatial elements in the utterance. The analysis assumes that the gesture first forms a unit with the adverb hier ‘here’, which subsequently elaborates the trajector schematically referred to by in der Halle ‘in the hallway’. Given the immediate discourse context described above, this interpretation is deemed more plausible than alternative analyses (e.g., where the gesture directly grounds the referential content of in der Halle so as to disambiguate which hallway is being referred to exactly).

The combined meaning of hier ‘here’ + [gesture] derives from the overlap of the two spatial frames evoked. Crucially, hier profiles a region of space that includes the position of the interlocutors whereas the location designated by the pointing gesture is construed as external to the position of the speaker. The integration of these two frames is analyzed in Figure 11, as somewhat analogous to a demonstrative noun phrase. The pointing gesture serves as the profile determinant, in the sense that the utterance fragment as a whole profiles the spiral staircase, with the proximal region of space as a salient aspect of its conceptual base. The search domain for the referenced object is altogether restricted by the direction of the stretched index finger and by the indicated proximity to the speaker and hearer. Thus, the profile of the verbal element (hier) is part of the immediate scope for the interpretation of the gesture. In Figure 12, the analysis continues with the integration of the prepositional phrase in der Halle ‘in the hallway’.

The semantic representation of hier in der Halle ‘here in the hallway’ + [pointing gesture] entails the integration of the structure depicted in Figure 10(a) and the upper diagram in Figure 11. As diagrammed in Figure 12, the trajector schematically referred to by the prepositional phrase in der Halle ‘in the hallway’ is elaborated by the referential content of the hier + [pointing gesture] construct: the spiral staircase. The resulting semantic representation involves an intersection between three spatial frames. Two of these are defined relative to the location of the speakers – one through a proximity relation and one through a position on the vector line that extends from the stretched finger. The third is specified relative to the landmark Halle ‘hallway’, which is contained in the scope of hier ‘here’.

Notwithstanding that there may be alternative interpretations of the semantics of pointing gestures (see, e.g., Fricke 2007 for a comprehensive discussion), the current analysis demonstrates CG’s potential utility in detailing the interaction between gestural and verbal means of spatial reference. In particular, we have seen that CG’s diagrammatic notation conventions are fruitful for the analysis of the iconic and indexical means through which (aspects of) gestures convey meaning and the scope relations that hold between the verbal and gestural structures.

4 Further challenges

In order to further develop the application of CG to language as a multimodal activity, various challenges remain to be addressed. We here discuss three potential hurdles as they apply to the analyses presented above. Although these might not necessarily be specific to CG – they are relevant for other approaches to multimodal grammar as well (e.g., Kok in press) – a brief discussion of these issues is important at this point because they will be encountered by anyone pursuing a CG-analysis of spoken-gestured data.

4.1 What’s in a gesture?

As discussed in Section 3.2, gestures are semiotically rich units of expression that may evoke various semantic structures at once. Our analyses of examples (1) and (2) were necessarily limited to some of the most salient symbolic units. It is very likely, however, that all of the gestures analyzed have additional meaningful qualities. Because gesture forms reflect a complex interplay of representational, interactional and contextual dimensions of communication (Kok et al. in press), it is often difficult to define clear one-to-one mappings between aspects of form and function. To the extent that such mappings do exist, it furthermore remains an open question on what level of abstraction they are best described, e.g., whether they are most adequately captured in terms of individual form parameters (handshape, location, etc.), more holistic patterns (planes, lines) or theoretical constructs such as image or action schemas. The empirical challenges raised by this issue may furthermore bring along questions regarding representation and notation. Considering gestures in their full functional complexity may entail taking account of aspects of gestures that are somewhat removed from traditional linguistic analysis, such as their role in managing rhetorical relations and turn taking (e.g., Kendon 2004; Streeck 2009). Whereas CG has in recent years made considerable advances in dealing with pragmatic-discursive factors (Langacker 2012), its analytic devices were originally developed for the analysis of clause-internal structure. Since the theoretical discussion and analyses presented in this paper are largely limited to ‘traditional’ grammatical phenomena, it remains an open question how much diagnostic value CG’s toolbox has for the analysis of functions of gestures that play out on higher levels of communicative structure. As mentioned, however, there is a need for further empirical inquiry into the formal correlates of these functions before this question can be adequately evaluated.

4.2 Structure as temporal, beyond the metaphor of ‘linearity’

In modeling the structure of language as a dynamic multimodal activity, the notion of (morpho)syntax drastically increases in complexity. Symbolic structures expressed by speech and gesture are typically not organized in a purely sequential fashion or constrained by well-defined rules or templates. In order to better understand the grammatical relations that exist between elements of the gestural and verbal tiers, at least three additional factors need to be taken into account. One is the overlap in conceptual space as evoked by the two channels. In example 1, the gesture is quite obviously related to the noun Wendeltreppen ‘spiral staircases’ because the trace of the index fingers is homologous to the physical outline of the referenced objects. In addition, there is a role for conventionality in the use of mimetic modes. Some of the modes of representation distinguished by Müller (1998, 2014) appear to be preferentially used with specific types of referents. Ladewig (2011a) found that gestures replacing noun phrases relatively often represent an object (where the hands ‘become’ the object), whereas gestures replacing verb phrases favor the acting mode (where the hands act upon an object), although this trend was not without exceptions. Such regularities can guide the interpretation of the grammatical relations that exist between the verbal and gestural tier. A third factor is temporal coordination. Gestures are known to appear in rough temporal correspondence to the verbal element they relate to most (Kendon 1970; McNeill 1992; Nobe 2000). As argued by Cienki (in press), such patterns of temporal coordination ought to be considered as part of the phonological pole of the symbolic structures underlying speech-gesture compositions. 
If this argument is taken seriously, notational conventions may need to be enriched: “[a] trick that remains to be solved is how to display these analyses dynamically (with moving graphics), so as to better reflect the actual dynamic processes of expression and online thinking (or understanding) for speaking” (Cienki in press). Indeed, whereas preliminary attempts at a dynamic representation have been made in the current paper (Figures 9, 11 and 12), the analyses presented may still fall short in revealing how the temporal relation between the verbal and gestural tiers affects the grammatical structure of the utterance.

4.3 Speaker-hearer asymmetry

A third potential problem concerns CG’s take on grammar as ‘direction-neutral’, i.e., as equally applicable to language production and comprehension. This view may not always be tenable, since considerable asymmetry can exist as to whether a given manual behavior is meant or apprehended as communicative. It has been argued that some forms of gestural expression primarily serve to ‘externalize’ imagery and other cognitive processes onto one’s body (Clark 2013; Pouw et al. 2014). People may for instance use their hands to scaffold internal cognitive processes such as counting and numerical reasoning. Being primarily self-oriented, such gestures do not seem good candidates for inclusion in linguistic analysis. However, even if a hand movement is not intended as part of a communicated message by the speaker, it might be interpreted as such by the addressee. To give a somewhat contrived example: at the moment a bartender sees a customer counting on his fingers before making any eye contact with him, he might already have understood the number of drinks that the customer wants to order. Whether the customer’s finger counting is to be regarded as a linguistic action is open to interpretation. Conversely, communicatively intended gestures are not always picked up on by the addressee. People’s disposition to integrate gestures with verbally presented information has been shown to differ from person to person (Wu and Coulson 2014).

Overall, the challenge of deciding which forms of manual expression are to be considered subject matter for grammatical analysis does not appear to have a trivial solution (but see Enfield’s 2009 heuristics for ‘sign filtration’). This is a general problem for multimodal approaches to grammar, but of particular relevance to CG, which decisively rejects a sharp distinction between coded and inferred meaning. Delimiting the set of gestural behaviors that are of potential concern to cognitive grammatical analysis may require more serious consideration of the relationship between grammar and the online processes of language production and comprehension, and the asymmetries that may exist between them.

5 Conclusion

In view of the incorporation of speakers’ gesture in grammatical theory, CG has various theoretical and operational strengths. Due to its non-restrictive, usage-based nature, it does not require fundamental amendments or a supplementary ‘gesture component’. Because CG conceives of grammar from the perspective of prototype theory, as having more central and more peripheral structures, it avoids having to make a rigid, arbitrary distinction between the linguistic and non-linguistic proper. We see here that for spoken language, this aspect of the theory can be extended beyond orally produced sounds to include other behaviors. Moreover, CG’s notational conventions fit well with the inherently spatial nature of (manual) gesture. Various, more specific points of convergence are apparent between gesture research and the dimensions of imagery postulated in CG (e.g., specificity, perspective, focus-background) as well as in the potential to incorporate gesture in CG’s diagrammatic notation conventions.

We have shown that CG is not only theoretically compatible with a view of language as multimodal, but can also be fruitfully applied to the analysis of video data. A CG approach can benefit our understanding of gesture-internal structure as well as of the interaction of iconic and deictic components of gestures with elements of speech. This corroborates CG’s potential utility for a view of language as multimodal, although some challenges remain. Most importantly, further understanding is needed of the rich symbolic potential and functional complexity of gestures, of the constraints that govern the grammatical relations between different modalities, and of the asymmetries that may exist between gesture production and comprehension. Overall, however, CG has been demonstrated to provide important theoretical constructs and analytical tools for further advancing our understanding of the structure of multimodal language use.


We thank John Newman and three anonymous reviewers for helpful comments on previous versions of this article.


  • Bavelas, Janet Beavin, Nicole Chovil, Douglas A. Lawrie & Allan Wade. 1992. Interactive gestures. Discourse Processes 15(4). 469–489.

  • Bressem, Jana. 2012. Repetitions in gesture: Structures, functions, and cognitive aspects. Frankfurt (Oder): European University Viadrina thesis.

  • Cienki, Alan. 2005. Image schemas and gesture. In Beate Hampe & Joseph E. Grady (eds.), From perception to meaning: Image schemas in cognitive linguistics, 421–441. Berlin: Mouton de Gruyter.

  • Cienki, Alan. 2012. Usage events of spoken language and the symbolic units we (may) abstract from them. In Krzysztof Kosecki & Janusz Badio (eds.), Cognitive processes in language, 149–158. Frankfurt am Main: Peter Lang.

  • Cienki, Alan. 2013. Cognitive linguistics: Spoken language and gesture as expressions of conceptualization. In C. Müller, A. Cienki, S. Ladewig, D. McNeill & S. Teßendorf (eds.), Body – language – communication: An international handbook on multimodality in human interaction, 1, 182–201. Berlin: Mouton de Gruyter.

  • Cienki, Alan. 2014. Grammaticheskie teorii v kognitivnoi lingvistike i polimodal’nost’ kommunikacii [Grammatical theories in cognitive linguistics and the multimodality of communication]. In O. V. Fedorova & A. A. Kibrik (eds.), Mul’timodal’naja kommunikacija: Teoreticheskie i èmpiricheskie issledovanija [Multimodal communication: Theoretical and empirical research], 86–98. Moscow: Buki Vedi.

  • Cienki, Alan. 2015. Spoken language usage events. Language and Cognition 7(4). 499–514.

  • Clark, Andy. 2013. Gesture as thought. In Zdravko Radman (ed.), The hand, an organ of the mind: What the manual tells the mental, 255–268. Cambridge, MA: MIT Press.

  • De Ruiter, Jan Peter. 2007. Postcards from the mind: The relationship between speech, imagistic gesture, and thought. Gesture 7(1). 21–38.

  • Efron, David. 1941 [1972]. Gesture, race and culture. The Hague: Mouton de Gruyter.

  • Enfield, Nick. 2004. On linear segmentation and combinatorics in co-speech gesture: A symmetry-dominance construction in Lao fish trap descriptions. Semiotica 149. 57–124.

  • Enfield, Nick. 2009. The anatomy of meaning: Speech, gesture, and composite utterances. Cambridge: Cambridge University Press.

  • Enfield, Nick. 2013. A “Composite Utterances” approach to meaning. In C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill & J. Bressem (eds.), Body – language – communication: An international handbook on multimodality in human interaction, 2, 689–707. Berlin & Boston: De Gruyter Mouton.

  • Fricke, Ellen. 2007. Origo, Geste und Raum: Lokaldeixis im Deutschen. Berlin: Walter de Gruyter.

  • Fricke, Ellen. 2012. Grammatik multimodal: Wie Wörter und Gesten zusammenwirken. Berlin: Mouton de Gruyter.

  • Fricke, Ellen. 2013. Towards a unified grammar of gesture and speech: A multimodal approach. In C. Müller, A. Cienki, S. Ladewig, D. McNeill & S. Teßendorf (eds.), Body – language – communication: An international handbook on multimodality in human interaction, 1, 733–754. Berlin: Mouton de Gruyter.

  • Hassemer, Julius. 2015. Towards a theory of gesture form analysis: Principles of gesture conceptualisation, with empirical support from motion-capture data. Aachen: RWTH Aachen University dissertation.

  • Hostetter, Autumn B. & Martha W. Alibali. 2008. Visible embodiment: Gestures as simulated action. Psychonomic Bulletin & Review 15(3). 495–514.

  • Kendon, Adam. 1970. Movement coordination in social interaction: Some examples described. Acta Psychologica 32. 101–125.

  • Kendon, Adam. 1980. Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (ed.), The relationship of verbal and nonverbal communication, 207–227. The Hague: Mouton de Gruyter.

  • Kendon, Adam. 1988. How gestures can become like words. In F. Poyatos (ed.), Cross-cultural perspectives in nonverbal communication, 131–141. Toronto: CJ Hogrefe.

  • Kendon, Adam. 2004. Gesture: Visible action as utterance. Cambridge: Cambridge University Press.

  • Kita, Sotaro (ed.). 2003. Pointing: Where language, culture, and cognition meet. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Kok, Kasper I. in press. The grammatical potential of co-speech gesture: A functional discourse grammar perspective. Functions of Language 23.

  • Kok, Kasper I., Kirsten Bergmann, Alan Cienki & Stefan Kopp. in press. Mapping out the multifunctionality of speakers’ gestures. Gesture 15.

  • Ladewig, Silva. 2011a. Putting the cyclic gesture on a cognitive basis. CogniTextes 6, http://cognitextes.revues.org/406.

  • Ladewig, Silva. 2011b. Syntactic and semantic integration of gestures into speech: Structural, cognitive, and conceptual aspects. Frankfurt (Oder): European University Viadrina thesis.

  • Lakoff, George. 1987. Women, fire, and dangerous things: What categories reveal about the mind. Chicago: University of Chicago Press.

  • Langacker, Ronald W. 1987. Foundations of cognitive grammar, Vol. I: Theoretical prerequisites. Stanford: Stanford University Press.

  • Langacker, Ronald W. 1991. Foundations of cognitive grammar, Vol. II: Descriptive application. Stanford: Stanford University Press.

  • Langacker, Ronald W. 2001. Discourse in cognitive grammar. Cognitive Linguistics 12(2). 143–188.

  • Langacker, Ronald W. 2008a. Cognitive grammar: A basic introduction. Oxford: Oxford University Press.

  • Langacker, Ronald W. 2008b. Metaphoric gesture and cognitive linguistics. In Alan Cienki & Cornelia Müller (eds.), Metaphor and gesture, 249–251. Amsterdam: John Benjamins.

  • Langacker, Ronald W. 2012. Interactive cognition: Toward a unified account of structure, processing, and discourse. International Journal of Cognitive Linguistics 3(2). 95–125.

  • Lapaire, Jean-Rémi. 2011. Grammar, gesture and cognition: Insights from multimodal utterances and applications for gesture analysis. Visnyk of Lviv University. Series Philology 52. 87–107.

  • Liddell, Scott K. 2003. Grammar, gesture, and meaning in American Sign Language. Cambridge: Cambridge University Press.

  • Lücking, Andy, Kirsten Bergmann, Florian Hahn, Stefan Kopp & Hannes Rieser. 2013. Data-based analysis of speech and gesture: The Bielefeld Speech and Gesture Alignment Corpus (SaGA) and its applications. Journal on Multimodal User Interfaces 7(1–2). 5–18.

  • McNeill, David. 1992. Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.

  • McNeill, David & Susan D. Duncan. 2000. Growth points in thinking-for-speaking. In David McNeill (ed.), Language and gesture, 141–161. Cambridge: Cambridge University Press.

  • Mittelberg, Irene. 2014. Gestures and iconicity. In C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill & J. Bressem (eds.), Body – language – communication: An international handbook on multimodality in human interaction, 2, 1712–1732. Berlin & Boston: De Gruyter Mouton.

  • Mittelberg, Irene & Linda R. Waugh. 2009. Metonymy first, metaphor second: A cognitive-semiotic approach to multimodal figures of thought in co-speech gesture. In Charles Forceville & Eduardo Urios-Aparisi (eds.), Multimodal metaphor, 329–356. Berlin: Mouton de Gruyter.

  • Mittelberg, Irene & Linda R. Waugh. 2014. Gestures and metonymy. In C. Müller, A. Cienki, E. Fricke, S. H. Ladewig, D. McNeill & J. Bressem (eds.), Body – language – communication: An international handbook on multimodality in human interaction, 2, 1747–1766. Berlin & Boston: De Gruyter Mouton.

  • Müller, Cornelia. 1998. Iconicity and gesture. In S. Santi, I. Guaïtella, C. Cavé & G. Konopczynski (eds.), Oralité et gestualité: Communication multimodale, interaction, 321–328. Paris: L’Harmattan.

  • Müller, Cornelia. 2004. Forms and uses of the palm up open hand: A case of a gesture family? In Cornelia Müller & Ronald Posner (eds.), The semantics and pragmatics of everyday gestures, 234–256. Berlin: Weidler.

  • Müller, Cornelia. 2014. Gestural modes of representation as techniques of depiction. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill & S. Teßendorf (eds.), Body – language – communication: An international handbook on multimodality in human interaction, 2, 1687–1702. Berlin: Mouton de Gruyter.

  • Müller, Cornelia, Silva Ladewig & Jana Bressem. 2013. Gestures and speech from a linguistic perspective: A new field and its history. In C. Müller, A. Cienki, E. Fricke, S. Ladewig, D. McNeill & J. Bressem (eds.), Body – language – communication: An international handbook on multimodality in human interaction, 1, 55–81. Berlin: Mouton de Gruyter.

  • Nobe, Shuichi. 2000. Where do most spontaneous representational gestures actually occur with respect to speech? In David McNeill (ed.), Language and gesture, 186–198. Cambridge: Cambridge University Press.

  • Parrill, Fey. 2010. Viewpoint in speech–gesture integration: Linguistic structure, discourse structure, and event structure. Language and Cognitive Processes 25(5). 650–668.

  • Payrató, Lluís. 1993. A pragmatic view on autonomous gestures: A first repertoire of Catalan emblems. Journal of Pragmatics 20(3). 193–216.

  • Pouw, Wim T. J. L., Jacqueline A. de Nooijer, Tamara van Gog, Rolf A. Zwaan & Fred Paas. 2014. Toward a more embedded/extended perspective on the cognitive function of gestures. Frontiers in Psychology 5. 359.

  • Streeck, Jürgen. 2009. Gesturecraft: The manu-facture of meaning. Amsterdam: John Benjamins.

  • Taub, S. 2001. Language from the body: Iconicity and metaphor in American Sign Language. Cambridge: Cambridge University Press.

  • Tversky, Barbara, Julie Heiser, Paul Lee & Marie-Paule Daniel. 2009. Explanations in gesture, diagram, and word. In Kenny R. Coventry, Thora Tenbrink & John Bateman (eds.), Spatial language and dialogue, 119–131. Oxford: Oxford University Press.

  • Wagner, Petra, Zofia Malisz & Stefan Kopp. 2014. Gesture and speech in interaction: An overview. Speech Communication 57. 209–232.

  • Wilcox, Sherman. 2004. Cognitive iconicity: Conceptual spaces, meaning, and gesture in signed language. Cognitive Linguistics 15(2). 119–148.

  • Wilcox, Sherman & André Nogueira Xavier. 2013. A framework for unifying spoken language, signed language, and gesture. Todas as Letras-Revista de Língua e Literatura 15(1). 88–110.

  • Wittgenstein, Ludwig. 1953. Philosophical investigations. Oxford: Basil Blackwell.

  • Wu, Ying Choon & Seana Coulson. 2014. Co-speech iconic gestures and visuo-spatial working memory. Acta Psychologica 153. 39–50.

  • Zima, Elisabeth. 2014. English multimodal motion constructions: A construction grammar perspective. Papers of the Linguistic Society of Belgium 8. http://uahost.uantwerpen.be/linguist/SBKL/sbkl2013/Zim2013.pdf (accessed 11 December 2015).


  • 1

    Also note that this term does not imply that the speaker has the intention of being creative. 

  • 2

    The intended meaning of a phone handshape may of course go further than mere reference to a phone, e.g., when used as a full pragmatic move, as if to say ‘I’ll call you’. The current argument is concerned only with a simple, referential use of this gesture (e.g., as performed while telling a story that involves the action of picking up a phone).

  • 3

    Note that neither of these gestures has an absolute reference frame: both draw upon the spatial configuration of the hand of the speaker relative to the rest of the body. Where they differ, however, is in whether the real-space physical position of the body matters to the interpretation of the utterance. 

  • 4

    Liddell’s analyses, which extend CG notations with constructs from conceptual blending theory, differ from the ones presented in this paper, which draw on CG notions only.

  • 5

    This is not to deny that many conventionalized units have iconic/analog features or that analog representations may be susceptible to conventionalization. The point at stake here concerns only those aspects of gestures that are generated ‘on the fly’ and are meaningful through the construction of an ad hoc iconic mapping.

  • 6

    ‘Prep’ stands for the preparatory phase of the gesture; the ‘Stroke’ is the most energetic and salient phase of the gesture; the ‘Recovery’ is the phase where the hands are retracted to rest position.

  • 7

    The iconicity that governs the self-symbolizing tracing is not as straightforward as suggested by the diagram in Figure 4. Because iconicity rarely involves full overlap between form and meaning, some degree of arbitrariness remains as to what aspects of the referent are profiled by the gesture. With the rare exception of cases where a speaker’s hand actually represents her own hand at the moment of speaking, iconic reference involves some degree of schematization. The challenge of capturing the systematicity that exists on this ‘iconicity-internal’ level in a CG-based analysis remains outside the scope of this article. 

  • 8

    It remains an open question how this symbolic structure is most accurately characterized. The use of two hands can also refer to a single object, for instance when the hands are being held with the palms towards each other, as if holding an object. In this case, however, this interpretation is precluded by the handshapes and by the respective orientation of the hands, which do not give the appearance of holding something. 

  • 9

    The meaning of the noun Wendeltreppe ‘spiral staircase’ may already involve experiential knowledge regarding its typical shape and orientation. The gesture makes these characteristics of the referent more specific and marks them as salient. 

  • 10

    It may also be contextually salient for another reason. The current analysis, however, is limited to the ‘anaphoric’ use of the definite article.

  • 11

    In fact, the semantics of the gestural component, in the preceding discourse frame, also gets elaborated by the noun; to maintain clarity in the diagrams, these correspondences are not depicted. 

About the article

Received: 2014-12-16

Revised: 2015-08-25

Accepted: 2015-10-28

Published Online: 2015-12-23

Published in Print: 2016-02-01

Funding: We are grateful for research support from the Netherlands Scientific Organization (NWO; grant PGW-12-39) to the first author and from the Russian Science Foundation (grant #14-48-00067) to the second author.

Citation Information: Cognitive Linguistics, Volume 27, Issue 1, Pages 67–100, ISSN (Online) 1613-3641, ISSN (Print) 0936-5907, DOI: https://doi.org/10.1515/cog-2015-0087.

©2016 by De Gruyter Mouton.
