Given its usage-oriented character, Cognitive Grammar (CG) can be expected to be consonant with a multimodal, rather than text-only, perspective on language. Although several scholars have acknowledged this potential, the question of how speakers’ gestures can be incorporated into CG-based grammatical analysis has not been conclusively addressed. In this paper, we aim to advance our understanding of the relationship between CG and gesture. We first elaborate on three important points of convergence between CG and gesture research: (1) CG’s conception of grammar as a prototype category, with central and more peripheral structures, aligns with the varying degrees to which speakers’ gestures are conventionalized in human communication. (2) Conceptualization, which according to CG lies at the basis of grammatical organization, is known to be of central importance for gestural expression. In fact, all of the main dimensions of construal postulated in CG (specificity, perspective, profile-base relationship, conceptual archetypes) can potentially be expressed in gesture. (3) CG’s intensive use of diagrammatic notation allows for the incorporation of spatial features of gestures. Subsequently, we demonstrate how CG can be applied to analyze the structure of multimodal, spoken-gestured utterances. These analyses suggest that the constructs and tools developed by CG can be employed to analyze both the compositionality that exists within a single gesture (between conventional and more idiosyncratic components) and the grammatical relations that may exist between gesture and speech. Finally, we raise a number of theoretical and empirical challenges.