“Chunking” spoken language: Introducing weak cesuras


 In this introductory paper to the special issue on “Weak cesuras in talk-in-interaction”, we aim to guide the reader into current work on the “chunking” of naturally occurring talk. This work is conducted within the methodological frameworks of Conversation Analysis and Interactional Linguistics, two approaches that take the interactional aspect of humans talking with each other as the crucial starting point for analysis. In doing so, we will (1) lay out the background of this special issue (what is problematic about “chunking” talk-in-interaction, the characteristics of the methodological approach chosen by the contributors, the cesura model), (2) highlight what can be gained from such a revised understanding of “chunking” in talk-in-interaction by referring to previous work with this model as well as to the findings of the contributions to this special issue, and (3) indicate further directions such work could take, starting from the papers in this special issue. We hope to induce a fruitful exchange on the phenomena discussed, across methodological divides.


This special issue
With this special issue, we would like to present current work in Conversation Analysis (CA) and Interactional Linguistics (IL) (see also Section 1.3) on the "units" of naturally occurring talk to a broader linguistic public.¹ While there are methodological (and theoretical) differences from other, earlier approaches, we all share an interest in how language works in human interaction. One aspect of the workings of human language is that it is produced in spurts or chunks. Yet, especially when chunking models are applied, such chunks are not always easy to separate and thus to identify. Moreover, when the fuzzier boundaries of such chunks are studied in more detail, it turns out that participants can use them for interactional purposes. This is what this special issue aims to topicalize, thereby also pursuing further issues highlighted in Szczepek Reed and Raymond (2013).
In this introduction, we aim to lay out the focus of this special issue in more detail and the methodological approach chosen by the authors in dealing with its various aspects. In addition, we summarize their findings and indicate further directions such work could take. We hope that this special issue serves as a starting point for a fruitful exchange on the phenomena discussed, across methodological divides, to further our common understanding of natural human interaction.

The issue
Spoken language is usually (perceived, and analyzed, as being) produced in smaller spurts, or chunks. The following excerpt illustrates this kind of "chunking".² The relevant audio snippet can be obtained from the supplementary material to this special issue. Transcription conventions follow the GAT2 system, which depicts one intonation unit per line. For further details, please see the Appendix as well as Couper-Kuhlen and Barth-Weingarten (2011). Line numbers are based roughly on seconds into the recording at which the relevant stretch occurs.
(1) Glass tiles (UAK 10004, 2:32-2:34) (overlap simplified) ((AE everyday face-to-face conversation. Marge and Frank are talking about renovating one of their bathrooms. Frank has already said that he is in favor of a certain kind of tiles: "silverish", "light", "aluminum-looking", "glass tiles", or "something similar" down the shower walls; "small ones on the sides" (not shown here), while Marge does not like them and has suggested alternatives. Frank is coming back to his suggestion, though, after Marge's mother (in the background) has mentioned mosaic tiles.))
There is no doubt that this stretch consists of three chunks of speech, each depicted on its own line in the transcript. In linguistics, this observation is usually captured with a unit model. The chunks are conventionally referred to as "intonation units" (IUs; also variously referred to as "tone group" in Halliday 1970, 1985, "tone-unit" in Crystal 1969, and "intonation-group" in Cruttenden 1997, among others; for a more detailed survey, see Szczepek Reed 2010). They are identified on the basis of a number of external and internal criteria. The external cues include final lengthening, pausing, a change in pitch level and/or pitch direction, and anacrusis, i.e., the fast delivery of unaccented syllables at the beginning of a next unit. The internal cues mainly consist of a prominent syllable (the nucleus) embedded in a "coherent" pitch contour, which usually also embraces a number of further stressed and unstressed syllables. These are usually less prominent than the nucleus and form the pre-head, head, and tail of the IU (see, for instance, Cruttenden 1997, 27-8, 42-4). At times, further external boundary criteria are added, such as aspiration and possible changes in paralinguistic features like creaky voice quality (Crystal 1969; Halliday 1970, 1985; also Brazil et al. 1980; Brown et al. 1980 for the so-called British school; also Du Bois et al. 1992, 100; Schuetze-Coburn 1992, 1994; Du Bois 2008 for Discourse-Functional Linguistics). Similar criteria are used in the autosegmental-metrical approach (see, for instance, Beckman and Pierrehumbert 1986; Gussenhoven and Jacobs 2005; Nespor and Vogel 1986; Selkirk 1986). In essence, then, identifying IUs is mainly a matter of applying this list of criteria. In our sample excerpt, for instance, all three candidate chunks contain a focus accent (-SA, -SA, -LU-). Moreover, while the end of the third chunk is less audible due to the overlap, the first two are delimited, among other features, by an audible high level tone and lengthening (last syllables of lines 1 and 2) as well as a small pitch upstep and anacrustic syllables (beginning of line 2).
However, even early accounts of the chunking of spoken language highlighted difficulties in any exhaustive analysis of spontaneous speech: Brown et al. (1980) noted that there are "many occasions where no clear indication is given of tone group boundary markers" (41). Cruttenden (1997) even admitted that the identification of intonation-groups is "something of a circular business" (29), and he singled out sentence-final adverbials, vocatives, and reporting clauses as notoriously difficult to delimit as intonation-groups (35-7). Other authors attribute exceptions to the "unit rule" to emphasis, grammatical complexity, and speech rate (Quirk et al. 1985, 135, 1602; also Tench 1990). Yet even this basically just acknowledges the problem: not all prosodic units are easily identifiable.
This also holds for our sample excerpt. When we try to transcribe how it continues after line 3 (for the sound file, see the supplementary material), we may end up with different transcript versions, especially for the stretch of talk 7-12 s into the excerpt. We depict two of the possible versions below with the relevant lines in minimal GAT2 transcript only, for argument's sake.
If we focus on lines 7(-9) of the excerpt, we can see that transcript version (a) represents this stretch as one chunk, while version (b) represents it as three. Both versions can be considered correct, for different reasons. Version (a) is supported by the fact that prominent unit-ending cues, like those at the end of line 5, occur again only shortly before second 10: there is noticeable final lengthening and a high-rising final intonation on "kind" (Figure 1). Because of its marked pitch and duration, "kind" can be considered to bear the focus accent of this unit. However, we can also hear some, though less extensive, pitch and tempo changes within this candidate chunk: "tiles" is lengthened and ends with a small final rise, followed by a small pitch downstep to "might". Something similar can be heard with "good" and "depending" a little later in this stretch. It is exactly these observations on which transcript version (b) is based. There, these prosodic details justify the line breaks that notate this stretch of talk as three separate IUs with focus accents on "glass", "might", and "kind" (lines 7-9). Yet these chunks are less clearly marked than the unit endings in line 5 and the one preceding the pause in line 10. And, in retrospect, "glass" and "might", while sufficiently prominent from an on-line perspective, are also less prominent than "kind".
As a result, these units, and in particular their alleged boundaries, are somewhat fuzzy. A similar phenomenon can be found in lines 11-12. There we hear some parameter changes with "floor", but, in retrospect, "slipping" carries the focus accent (Figure 2). So, within the clearly identifiable chunks, further relevant parameter changes are audible; but are they sufficiently prominent to justify an IU boundary?
Scholars have suggested various solutions to the problem of identifying IUs. These include, for instance, recourse to pausing (Brown et al. 1980, 46; but see, e.g., the clearly unit-internal pause in line 12) as well as to syntactic and semantic criteria (Cruttenden 1997, 30; also Crystal 1969, 207; but see, e.g., the syntactically incomplete prosodic units in lines 1 and 4-5 of our sample excerpt). Others have suggested assuming a third category ("intonation subunits" in Du Bois et al. 1992, 68; also the intermediate phrase "ip" in the ToBI system, Beckman and Pierrehumbert 1986, for instance), but what if there are even more degrees, or if we observe only some of the parameter changes relevant for these unit types?
[Figure 2: Spectrogram (middle) with overlaid pitch (blue) and intensity (yellow) of lines 11-12, annotated with wording, pausing, and other sounds (generated in PRAAT, Boersma and Weenink 2020).]
Hence, all these suggestions are problematic in various ways, for reasons of circularity and a lack of granularity when it comes to fuzzy boundaries, i.e., instances of units that have something less than a full boundary but more than no boundary at all. This is an unsatisfactory situation, given that such cases are apparently not infrequent. And indeed, as Auer (2010) points out, the situation as such does not come as a surprise. The main reason is that the unit approach leaves unclear whether one or several of the criteria for identifying IUs need to be met. Instead of an operationalization, the unit model offers a complex heuristic. Its application to the data requires "erhebliches Geschick [considerable skill]" (8; see also Du Bois et al. 1992, 100), and even that does not always lead to clear decisions (Auer 2010, 8). Hence, the structuralist, unit-based approach to analyzing spoken language is itself an analytic problem. As a result, the notion of "unit" becomes a rather questionable one (Deppermann and Proske 2015) and a challenge not only for novices to prosodic-phonetic chunking but also for more experienced scholars, in particular when their analyses go beyond idealized versions of talk.
Solving such analytical issues may then indeed, as Schegloff (1996a) put it, require "stretching the old linguistics to meet the challenge of talk-in-interaction […] search with fresher eyes and ears, in the details of the talk with which we must, in the end, come to terms" (114). Rather than seeing talk-in-interaction as in some way defective or imperfect, we would then understand it as emerging, with all its specifics, from many indigenous practices to deal with the challenges ("contingencies") of talking in real time, with one or more co-participants, and in a local environment that is "in flux" and brought about by the very fact that talk is produced in interaction.

The methodological approach: Conversation Analysis and Interactional Linguistics
Emanuel A. Schegloff's advocacy of "fresher eyes and ears" stems from his particular interest in talk-in-interaction. Together with Harvey Sacks and Gail Jefferson, he developed Conversation Analysis (CA), an approach that is influenced by sociology but interested in how the in situ social order of everyday interaction is accomplished by the participants themselves on the basis of their shared understanding of social order (see also Heritage 1984). CA starts from the assumption that the participants' main aim is to maintain intersubjectivity and accomplish action, hence inter-action (for detailed introductions to CA, see, e.g., Hutchby and Wooffitt 1998; Sidnell 2010; also the papers in Sidnell and Stivers 2013, among many others). "Why that now?" summarizes CA's research interest. To answer this question, CA has, for one, mainly considered the sequential organization of talk, i.e., the orderliness of the organization of utterances and actions, and has described a range of generic orders of organization that solve the "contingencies" participants regularly face in everyday and institutional talk. These include turn-taking, action formation, sequence organization, overall structural organization, word selection, and the practices of repair (Schegloff 2007, xiv). Second, in studying these, CA adheres to a number of basic methodological assumptions, among them:
- Conversational interaction is "the basic and primordial environment for the use and development (both ontogenetically and phylogenetically) of natural language" (Schegloff 1996a, 54; also Heritage 1984, 234-8). For this reason, too, CA studies are intrinsically empirical and investigate "raw" data, i.e., audio and video recordings of naturally occurring interactions in their natural environment.
- In collections of instances of a phenomenon (Schegloff 1996b), "the locus of order is the single case" (Schegloff 1987, 102). Hence, CA cannot disregard deviant cases (Hopper 2008; Wootton 1989). Moreover, quantitative analysis, if needed, may only follow thorough and exhaustive qualitative analysis (Schegloff 1993a; Zimmerman 1993; but cf. Heritage 1999; also Mayring 2001; Barth-Weingarten 2003; Heritage 2004; Clayman et al. 2006).
- In its interactional exigencies, everyday talk is systematically organized. This order is displayed in the participants' systematic use of shared sets of methodic practices and procedures for talking (or interacting), on whose sharedness and use they rely and for which they hold each other mutually accountable. Because participants use these practices, the practices also become visible to the analyst. Besides microanalysis, the analyst can therefore also draw on sequential analysis and the "next turn proof procedure" to examine participants' own displayed treatment and understanding of one another's talk through features of linguistic design and the next action embedded in the sequential context.
- There is order at all points (Schegloff and Sacks 1973, 290), so we cannot disregard any observation, even if it seems irregular at first sight: "[N]o order of detail can be dismissed, a priori, as disorderly, accidental or irrelevant" (Heritage 1984, 241).
In sum, then, CA seeks to explain both the single case and its details in their sequential context and the aggregate, in order to identify robust interactional practices across specific instances. As such, it also presents itself as an ideal methodological starting point for tackling the details and challenges of prosodic chunking, all the more so as CA has, from the 1990s on, been joined by Interactional Linguistics (IL), an approach which combines CA's methodological tools and assumptions with a specific interest in linguistic phenomena and adds further linguistic methods where useful (for details, see, for instance, Couper-Kuhlen and Selting 2018). From the very start, this has also included a strong interest in prosody (see, e.g., Barth-Weingarten et al. 2010; Couper-Kuhlen and Selting 1996; Couper-Kuhlen and Ford 2004; Selting and Couper-Kuhlen 2001), inspired by the York Phonology/Phonetics for Conversation approach, which advocated impressionistic listening and a parametric approach as early as the 1980s (Kelly and Local 1989a, b; also Walker 2004, 1400).
This special issue adopts the CA/IL approach to chunking spoken language. All papers discuss issues of chunking in natural talk-in-interaction and consider the issues raised qualitatively. Where there is quantification, it is done only after detailed single-case analyses and with attention to the potentially relevant details of the target phenomenon's sequential placement. This approach is guided by the desire to understand the details of everyday talk in everyday communicative situations, without the restrictions imposed by, for instance, elicitation or laboratory equipment. The investigation includes the "order" of prosodic and phonetic details. Thus, while acoustic analyses with PRAAT (Boersma and Weenink 2020), for instance, serve illustrative purposes, prosodic-phonetic details are first studied auditorily in their indigenous environment and grounded in a close interactional and sequential analysis of the unfolding talk.
It was on this basis that Barth-Weingarten (2016) developed an alternative approach to the chunks audible in longer stretches of spoken language in order to solve the problems mentioned in Section 1.2.

The cesura model
More promising attempts at solving the problems of chunking began attributing more relevance to the boundaries between the chunks than to the chunks themselves (Peters et al. 2005; Mertens 2006; Degand and Simon 2009; also Auer 2010; Barnwell 2013). Yet these boundary approaches still arrive at chunks in the end (but see Mertens and Simon 2013). The cesura approach (Barth-Weingarten 2016), by contrast, "radicalizes" the focus on the "boundary signals": it considers the chunks on which the unit approach, and in essence also the boundary approach, focus (Figure 3a and b) to be mere epiphenomena, the perceptual result of prosodic-phonetic parameter changes. Instead, the cesura approach focuses on the prosodic-phonetic parameter changes themselves (changes in pitch, loudness, tempo, voice quality, vocal-tract configurations, etc.), but also on their extent and on where and how they cluster and thereby create "cesuras" (Figure 3c). The term "cesura", then, refers to all kinds of discontinuities in the flow of talk produced by prosodic and phonetic parameter changes.
This understanding has two consequences. For one, the term "cesura" is not restricted to what has been studied under the heading of IU boundaries but can be employed in studying the variously sized "chunks" of the unit approach, including, but not restricted to, the IU. Hence this concept also embraces the "beginnings" and "endings" of chunks larger than the IU (see, for instance, "spoken paragraphs" in Lehiste 1980, "paratones" in Yule 1980, "pitch sequences" in Brazil 1997) as well as of chunks smaller than the IU (see, for instance, "intonational phrase" [IPh] and "phonological utterance" in Beckman and Pierrehumbert 1986; Gussenhoven and Jacobs 2005; Nespor and Vogel 1986; Selkirk 1986), down to word, and even segment, boundaries.
Second, focusing on what comes between the "chunks" introduces greater granularity into the study of "boundaries" and how they are created. Parameter changes such as pitch and loudness changes may occur on their own, but more frequently they cluster. Consider, for instance, the range of changes in vocal-tract configuration involved in moving from one phonetic segment to the next (conventionally referred to as "segment boundaries"), but also the set of prosodic-phonetic cues, i.e., parameter changes, between (phonological) words, phrases, and IUs (conventionally referred to as "word", "phrase", and "IU boundaries", respectively). In general, the more parameter changes occur at one point in talk, the stronger the perception of the cesura will be. As a result, the concept of "cesura" includes all sorts of variously strong "breaks" (junctures), ranging from stronger to "regular", weaker, and weakest cesuras (see Figure 3c again).
This focus on what comes between chunks, rather than on the units as such, and the resulting greater granularity are what ultimately justify speaking of a shift of perspective with the cesura approach as compared to the earlier unit, and also the boundary, approaches. And it is this focus on "what comes between, and creates, the chunks" that also helps us explain, and capture, phenomena such as fuzzy IU boundaries: they are understood in terms of weak(er) cesuras, created by changes in fewer parameters and/or smaller extents of change at certain points in talk.
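The idea that cesura strength grows with the number and extent of co-occurring parameter changes can be made concrete with a toy scoring sketch. It is an illustration only, assuming that each parameter change at a juncture has already been judged auditorily on a hypothetical 0-1 extent scale; the cesura model itself prescribes neither such a scale nor a summing rule, and the names `Juncture`, `cesura_score`, and `rank_by_strength` are our own.

```python
from dataclasses import dataclass, field

# Parameter types named in the cesura model; the list is not meant to be exhaustive.
PARAMETERS = ("pitch", "loudness", "tempo", "voice_quality", "vocal_tract")


@dataclass
class Juncture:
    """A point between two stretches of talk (hypothetical representation)."""
    label: str
    # Maps a parameter name to the judged extent of its change (assumed 0..1 scale).
    changes: dict = field(default_factory=dict)


def cesura_score(j: Juncture) -> float:
    """Toy strength score: more changing parameters and larger changes give a higher score."""
    unknown = set(j.changes) - set(PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return sum(j.changes.values())


def rank_by_strength(junctures):
    """Order junctures from strongest to weakest cesura."""
    return sorted(junctures, key=cesura_score, reverse=True)


# Invented values for illustration only (not measurements from the excerpt).
points = [
    Juncture("A", {"pitch": 0.9, "tempo": 0.8, "loudness": 0.6}),  # many, large changes
    Juncture("B", {"pitch": 0.3, "tempo": 0.3}),                   # fewer, smaller changes
    Juncture("C", {"vocal_tract": 0.1}),                            # a single small change
]
print([j.label for j in rank_by_strength(points)])  # ['A', 'B', 'C']
```

A plain sum is only one of many conceivable weightings; which parameter changes are decisive is itself one of the open questions the cesura approach raises.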

Notating weak(er) cesuras
Depending on the research focus, the parameter changes can be notated with varying granularity. A parametric notation (Figure 4a) shows the greatest granularity, while a Gestalt notation is least granular but most efficient (Figure 4b). The latter uses "|" (a single vertical bar, character code U+007C) to notate weak(er) cesuras and can be further adapted to the granularity required by a given study by distinguishing "|", "||", etc.
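As a rough illustration of how a Gestalt notation trades detail for efficiency, the sketch below renders a word sequence with one category label per juncture. The three-way mapping (no cesura as a plain space, a weak(er) cesura as "|", an IU-level cesura as a line break, in the GAT2 manner of one IU per line) and the function name `gestalt_line` are our assumptions; a study distinguishing "|" from "||" would simply add entries to the mapping. The words are invented.

```python
from typing import Sequence

# Assumed mapping from cesura category to Gestalt symbol.
NOTATION = {
    "none": " ",     # no perceptible cesura: plain word space
    "weak": " | ",   # weak(er) cesura
    "strong": "\n",  # IU-level cesura: new transcript line (one IU per line)
}


def gestalt_line(words: Sequence[str], junctures: Sequence[str]) -> str:
    """Join words, inserting the symbol for junctures[i] between words[i] and words[i + 1]."""
    if len(junctures) != len(words) - 1:
        raise ValueError("expected exactly one juncture label between each pair of words")
    out = [words[0]]
    for label, word in zip(junctures, words[1:]):
        out.append(NOTATION[label])  # raises KeyError for labels not in the mapping
        out.append(word)
    return "".join(out)


print(gestalt_line(["we", "could", "use", "those", "tiles"],
                   ["none", "weak", "none", "strong"]))
# we could | use those
# tiles
```

Keeping the mapping separate from the rendering function reflects the point made in the text: the same underlying cesura judgments can be notated at coarser or finer granularity without re-analyzing the talk.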
Awareness of the phenomenon and the ability to notate it allow us to take note of all cesuring phenomena, whether big clusters or single parameter changes, or anything in between for that matter. Hence, the cesura approach also allows us to systematically attend to weak and fuzzy "boundaries", i.e., points which previously were often either ignored in transcription and analysis or forced into one of two categories ("IU" vs "non-IU", for instance). The advantage of attending to them is obvious: once we can not(at)e such points, we can also study them, moving closer to answering questions such as: How are weak(er) cesuras produced? Which parameter changes are decisive? What functions do weak(er) cesuras have?

Gains from "taking a closer look" into prosodic-phonetic chunking
Employing a more granular approach allows further insight into prosodic-phonetic organization, as will be briefly illustrated in the following on the basis of previous work (Section 2.1) as well as papers in this special issue (Section 2.2).

Previous work on prosodic-phonetic "cesuring"
The cesura approach has already helped complement earlier observations. Barth-Weingarten (2016, 93-177), for instance, ascertained the cesuring parameters of turn-final IUs in American English with CA/IL-type evidence. At a minimum, these cues include a prominent syllable which is sufficiently marked by pitch, loudness, and/or duration, and a noticeable final pitch movement rising or falling across at least 2 ST (semitones), often more. With falling final pitch movements, this is often accompanied by glottalization, predominantly on the final (two) syllable(s). This cluster is, moreover, regularly complemented by lengthening, mainly of the utterance-final syllable, to an extent that is often greater than the lengthening of the prominent syllable occurring in non-final position, and by a loudness/intensity decrease, often to a level lower than earlier in the Turn-Constructional Unit (TCU; see Sacks et al. 1974). Moreover, Barth-Weingarten (2016, 179-218) also showed that the ends of turn-internal chunks, as well as of chunks within such chunks, regularly exhibit clusters of features which resemble those of turn-final IUs in kind but are reduced in number and/or extent. Hence, we now also have CA/IL-type evidence that participants have prosodic-phonetic cues as to "where they are" in a longer exchange.
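The 2 ST criterion refers to semitones, a logarithmic unit of pitch interval. A minimal sketch of the standard conversion, ST = 12 · log2(f_end / f_start), and of checking a final pitch movement against a 2 ST threshold is given below. The function names and the F0 values are hypothetical; in practice the F0 values would come from pitch measurements, e.g. in PRAAT.

```python
import math


def semitones(f0_start_hz: float, f0_end_hz: float) -> float:
    """Pitch interval in semitones (ST); positive for a rise, negative for a fall."""
    if f0_start_hz <= 0 or f0_end_hz <= 0:
        raise ValueError("F0 values must be positive (Hz)")
    return 12 * math.log2(f0_end_hz / f0_start_hz)


def reaches_threshold(f0_start_hz: float, f0_end_hz: float, threshold_st: float = 2.0) -> bool:
    """True if a final movement, rising or falling, spans at least threshold_st semitones."""
    return abs(semitones(f0_start_hz, f0_end_hz)) >= threshold_st


# Invented F0 values for illustration only.
print(round(semitones(200, 240), 2))  # 3.16 (a rise)
print(reaches_threshold(200, 210))    # False (about 0.84 ST)
```

Because semitones are relative, the same 2 ST movement corresponds to a larger difference in Hz for a higher-pitched voice (roughly 24.5 Hz from 200 Hz, but roughly 12.2 Hz from 100 Hz), which is why a semitone criterion compares speakers more fairly than a fixed Hz value would.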
This already highlights that the chunking issue is not just an analytic one: participants can use weak(er) cesuras to accomplish certain functions in talk-in-interaction (for more details, see Sections 2.2 and 3). Moreover, when we notate weak(er) cesuras more systematically, we lay the foundation not only for extending our understanding of the prosodic-phonetic dimension but also for living up to the CA/IL principle of assuming order at all points and pursuing the question: Are weak(er) cesuras interactionally relevant, i.e., do they constitute, or contribute to, practices that speakers use to pursue particular interactional ends? These are exactly the questions this special issue intends to pursue further.

Papers focusing on prosodic-phonetic "cesuring" in this special issue
The paper by Dagmar Barth-Weingarten, Uwe-Alexander Küttner, and Chase Raymond, for instance, revisits what previous research has labelled "pivots": syntactic units that are connected by one and the same lexico-syntactic item at their common juncture, such that this item constitutes both the end of the first unit and the beginning of the next. The component units of pivots are typically considered to be smoothly through-produced prosodically, such that in this dimension, too, the completion of the first unit simultaneously launches the second, allowing the speaker to continue past a possible transition-relevance place (TRP; Sacks et al. 1974). However, the paper shows that pivot constructions exhibit systematically varying degrees of prosodic-phonetic integration, i.e., stronger and weaker cesuras, which can be ordered on a cline. What is more, the action dimension also exhibits a cline, which parallels the one in the prosodic-phonetic dimension: tighter prosodic-phonetic integration of the actions accomplished by the components of the pivot construction co-occurs with rather retrospectively oriented post-pivot actions, while looser prosodic-phonetic integration is associated with a more prospective orientation of the post-pivot's action.
The paper by Elizabeth Couper-Kuhlen investigates weak(er) cesuras in the particle combination OH+OKAY. For this, she focuses on a specific sequential context: informings in recent everyday American-English talk-in-interaction (telephone and face-to-face). Here, the particles OH and OKAY are regularly used together in so-called "third position" to respond to information solicited by the OH/OKAY speaker in a prior turn. The paper shows, first, that, unlike simple OH and simple OKAY, integrated OH+OKAY combinations regularly respond to the informing as not only supplying the information solicited but also accomplishing another action made relevant by the previous turn, such as dealing with a complaint or correcting a misbelief. Second, such OH+OKAY combinations can occur with varying degrees of prosodic-phonetic integration, ranging from juxtaposition of OH and OKAY with a strong cesura between them, to fusion of the components with weak and weakest cesuras, to occurrence as an indivisible whole. As long as the two parts are still distinguishable, both OH and OKAY make their individual contribution to the turn action-wise. Weak(er) cesuras here, then, seem to be a symptom of language change, namely of the emergence of the particle combination as a relatively recent phenomenon, with OH and OKAY being integrated more closely due to their frequent combined use (see also Bybee 2001, for instance). Nevertheless, differing cesuring strength also seems to be usable strategically: in OH+OKAYs with a strong cesura, OKAY still carries its full weight as a sequence-closing third. OH+OKAYs with a weak cesura, in contrast, may be deployed to construct multi-unit turns, while those with weakest cesuras serve as turn- or TCU-initial prefaces. Moreover, fused OH+OKAY can be used as a means to ambiguate epistemic positioning within a turn and thus buy the participants more room to manoeuvre. Differing cesuring strengths thus may indeed be a relevant design feature of OH+OKAY.
Simona Pekarek Doehler, too, studies weak(er) cesuras in a process of language change. Using French conversational data, the paper examines a recurrent pattern in which a question-word question is followed by a candidate answer. The candidate answer, however, follows its question with different degrees of proximity. Pekarek Doehler observes a continuum of synchronic usage, ranging from (i) the pattern being implemented as two TCUs separated by a gap, to (ii) cases which display weak(er) cesuras due to speakers' more or less rushing into the next unit, to (iii) a variant in which both components are integrated in one TCU with one prosodic contour. As in the previous papers, the different degrees of integration also have repercussions on the action dimension: while cases of type (i) are used when participants pursue a response, the TCUs of type (ii) serve to propose the candidate answer as tentative, a guess accomplished "on the fly". With type (iii), the speaker delivers just one action, proffering a highly tentative guess. Pekarek Doehler argues that the three variants are connected in that the integrated format (iii) originates, via type (ii), in the repeated interactional sequencing of the two originally successive actions of type (i). As with the particle combination in Couper-Kuhlen's paper, the weakening of the cesuras is thus the routinized product of a frequent combination of language items in use. In Pekarek Doehler's case, it emerges from an interactionally motivated two-unit sequential pattern. Her findings support an understanding of interaction as a driving force for the routinization (or: grammaticization) of patterns of language use, which becomes even more visible when we study the fine details of talk in the prosodic-phonetic dimension.
It also strikes us as interesting that in both these papers dealing with instances of language change, the patterns studied still show traces of the merger of the original components in the form of weakest cesuras between the original parts. Not(at)ing weak cesuras may thus also be considered a tool for identifying such cases of language change (for a similar phenomenon with the emergence of a final particle, see also Barth-Weingarten 2016, 221-40, for instance).
Finally, it should be noted that both Couper-Kuhlen's and Pekarek Doehler's studies also open up our view to other linguistic dimensions. For one, their observations highlight the divergence of lexical and prosodic units (the fused version of OH+OKAY in Couper-Kuhlen's paper) as well as of syntactic and prosodic units (type iii of the question-word question + candidate answer pattern in Pekarek Doehler's paper). Second, Pekarek Doehler brings the kinetic dimension into the picture, in that in her types ii and iii, body movements and gaze behavior contribute to delivering the component parts as integrated. And indeed, extending the cesura approach to other linguistic dimensions (verbal and non-verbal) would be a useful next step, as will be argued in Section 3 and shown by these and other papers in this special issue.

Beyond prosodic-phonetic cesuras

Lexico-syntax
While the cesura model as such has been developed on the basis of, and for, the prosodic-phonetic dimension, it may also suggest a way of approaching other dimensions relevant in human interaction. We may, for instance, fruitfully expand the picture to other verbal dimensions. Couper-Kuhlen's paper in this special issue on the OH+OKAY particle combination already highlights one aspect of the role of weak(er) cesuras in the lexical dimension (see Section 2; see also Szczepek Reed 2015).
Pekarek Doehler's paper, in turn, highlights cases where two syntactic units occur under one prosodic contour. Hence, syntactic and prosodic units may have a more intricate relationship than has hitherto been acknowledged; see also, for instance, the syntactically incomplete prosodic units in lines 1 and 4-5 of our sample excerpt. A systematic approach to "syntactic cesuring" based on CA/IL principles could, for instance, take Auer's (2009) "online syntax" as a starting point. Further relevant phenomena include turn-offering conjunctions (Barth-Weingarten 2016, 221-40; Koivisto et al. 2011; Mulder and Thompson 2008) and the questioning of syntactic categories in Ford et al. (2013; see also Ono and Thompson 2020).
Xiaoting Li's paper, on the other hand, contributes to the discussion of weak cesuras by investigating syntactically incomplete TCUs/turns that create fuzzy boundaries with negative assessments of the recipient and non-present parties. In particular, she explores how the speaker, at points of syntactic incompletion, uses bodily-visual conduct to engage the recipient in negatively assessing others (see Section 3.2 for further details). This shows that the kinetic dimension, too, needs to be added to complete the picture of how the (dis)alignment of the dimensions can help the participants accomplish interactional goals together.

3.2 Kinetics
Kinetics is a dimension of human interaction that is available to the participants due to their co-presence and consequent mutual perceptibility in face-to-face interaction. Here, too, the cesura approach could help pave the way for dealing with the complexities of this dimension, in that it encourages us to view the various bodily behaviors, too, in parametric terms.
Research has shown that non-verbal bodily behaviors align with talk, while adding other dimensions of meaning to it (see, for instance, Kendon 1972, 2004; McNeill 1992, 2005; Mondada 2006; Mondada and Oloff 2011; Streeck 2009; Li 2014). This alignment, however, is complex, not least because bodily behaviors are themselves complex, embracing, e.g., facial expressions, manual gestures, and body movements. One fundamental issue in such studies is the "binding" between speech and non-speech events. For example, Streeck and Jordan (2009) highlight that gestures often start before the verbal formulation of their "referent". Kaukomaa et al. (2013, 2014) explore how smiles and frowns work in turn-initial position. Sikveland and Ogden (2012) show that gestures may be held across several turns at talk, until the action that coincided with the start of the gesture is closed off. Loehr (2012) shows that gesture peaks tend to align with intonation peaks, and gesture phrases with intermediate phrases. Holler and Levinson (2019) explain the binding problem in terms of multimodal Gestalts, in which gesture and speech are bound together and enhance comprehension: multimodal speech is processed more rapidly than unimodal speech. What this shows is that, while much of this dimension is still uncharted terrain, the alignment of important events in different modalities is close enough in time for them often to be perceived as belonging together, as part of a cluster of related events, yet not so precise as to be exactly synchronous. Discontinuities of movements, where they can be identified, do not all converge, and certainly not across all modalities. This is also illustrated by our sample excerpt, which, after all, comes from face-to-face talk-in-interaction. Multimodal transcription conventions follow Mondada (2018).
(1") Glass tiles (UAK 10004, 2:38-2:40) (multimodal transcript)

As line 5 shows, gesture peaks can also occur later than their "referent" (see Frank's "little" gesture, which occurs on "tiles"). Moreover, body movements are not tied to verbal behavior, as they can also occur in place of it (see Marge's reactions to Frank's suggestion in lines 5b and 7a).
Indeed, scholars studying talk-in-interaction have instead felt the need to introduce specific terms for precisely those points in talk where linguistic dimensions do converge (see "complex completion point" in "Zäsur" in Auer 2010; see also Selting 1996, for instance). And Ford (2004) already suggested that "different facets of turn projection may be deployed to be precisely non-convergent" (31, our emphasis; see also Clayman 2013, 158), in other words, that cases of divergence may be functional and thus interactionally relevant.
The papers in this special issue also reflect this. Verónica González Temer and Richard Ogden set out to explore the (non)convergence of cesuras on the various verbal and non-verbal dimensions in Chilean Spanish conversation. Taking the example of the particle MM in face-to-face tasting sessions, in particular MMs marking incipient speakership and gustatory MMs, they show that vocal and non-vocal unit boundaries often disalign, and they argue that this serves as a semiotic resource in action ascription. As a potentially multi-meaning particle, MM is, for one, suited to show that various linguistic and non-linguistic dimensions contribute to its meaning, including sequential position, phonetic design, and kinetic behavior. Second, its embodied boundaries do not always coincide with its verbal ones, resulting in weak cesuras. The non-aligned boundaries of verbal and non-verbal phenomena are connected by larger temporal domains (stretches, rather than points, of talk) and thereby help create windows of opportunity. These provide participants with chances to do different actions at the same time, such as marking incipient speakership, indicating what the topic of the upcoming talk will be, securing recipiency, displaying a stance, and mobilizing response, with what at first sight seems to be the same token. Hence, divergent boundaries, Temer and Ogden argue, should be considered a source of informational richness.
Li, too, focusses on the divergence of these main dimensions. Based on an examination of some five hours of everyday Mandarin face-to-face conversation, she shows how such divergence can be used as a resource by the participants when speakers make negative, and thus socially inappropriate, assessments of the recipient and of a non-present party. Speakers leave the assessments syntactically incomplete, while at the same time deploying prosodic cues, such as lengthening and pauses, and bodily-visual resources, such as facial expressions (lip pursing, eyebrow furrows), head shakes, and postural shifts, to display their evaluative stance and to solicit, and thus manage, recipient participation in collaboratively producing the assessment. Li argues that this displays the speaker's orientation to the social-interactional inappropriateness of the assessments, which is an interactional, rather than a production, problem. The divergence of the dimensions thus has a clear function. Disregarding it for the sake of focusing on interactional units as such would cause us to miss the ways in which participants accomplish such subtle interactional tasks.
In the light of studies such as these and their findings, we would like to propose that a better way to see major events of conversation, such as TRPs, might indeed be as "windows of opportunity". They may emerge "fuzzily", with weak cesuras within and across modalities (see the prosodic-phonetic cesural areas in Barth-Weingarten 2013, 2016, and the multimodal phenomena studied by Temer and Ogden in this special issue). As these "windows" emerge, they make (conditionally) relevant particular kinds of next actions (in terms of action accomplishment, but also in terms of "discourse/structural" organization, such as turn-taking). At the same time, such events are also evanescent, i.e. the opportunity for action fades after some time. Thus, with TRPs, in hearing a turn at talk in real time, a co-participant monitors the talk, and, in face-to-face interaction, other bodily conduct, for cues which mark a place in time when they may come in with their own talk. This "place" is not a precise point, but rather a short stretch of time during which the relevance of incoming talk and speaker transition is high. It builds up gradually, and there are also practices for a current speaker to close, or even prevent, this window and project their own continued talk while the TRP is building up (see the phenomenon of "rush-through" in Schegloff 1982; also Local and Walker 2004; Walker 2010, for instance). For multimodal talk, see the phenomenon studied by Temer and Ogden. Conversely, once that window is in place, it is not permanent: after a while (typically a fraction of a second), unless work is done to keep it open, it "closes", and the process of speaker selection, for instance, starts again.
Pushing this approach even further, Oliver Ehmer and Daniel Mandel come to understand the verbal and non-verbal details of transitions to joint enactments in German face-to-face storytellings as kinetic kinds of "cesural areas". Such enactments involve both a verbal quote and prosodic and bodily conduct that depicts the enacted figures. Looking into the precise realization of the transitions to the actual enactment, Ehmer and Mandel observe that the gist of such enactments is often already foreshadowed, in specific ways, by prosodic and bodily cues in the talk preceding the actual quote. For one, the cues of the various dimensions (prosodic and bodily parameter changes) occur at various places across the relevant stretch of talk (see also Temer and Ogden's as well as Li's paper). Moreover, just as with prosodic-phonetic "cesural areas", the various multimodal cues (head movement, gaze, etc.) are not necessarily aligned in their onset. Ehmer and Mandel argue, first, that the spread of the phonetic and kinetic cues may lead to the perception of fuzzy boundaries, or weak cesuras, of the enactments. Second, these contribute to the building up of an "opportunity space" (Lerner 1991; see also "window of opportunity" above) for the enactment to be accomplished jointly by the co-participants, which allows the latter to take up, elaborate on, and thus further develop each other's prior contributions. The fuzzy boundaries of the foreshadowings of enactments, for instance, help to jointly reconstruct what has happened or has been said (epistemic), negotiate a stance (deontic), and create involvement (emotional alignment). Ehmer and Mandel thus expand the "cesura approach" to multimodality even more radically. At the same time, they provide very practical examples of how this phenomenon could be analyzed, and how the results of such an analysis could be depicted.

Other "chunks" in talk-in-interaction
Last but not least, how we tackle these phenomena may also have consequences for our modelling of talk-in-interaction in general. So far, we employ "action" (for instance, Levinson 2013), "practice" (for instance, Couper-Kuhlen and Selting 2018, 29), "turn", and "TCU" (for instance, Clayman 2013; Drew 2013) as basic concepts in the description of how participants organize talk-in-interaction to accomplish action. Many of these concepts, however, are based on unit concepts. A TCU, for instance, is based on syntactic units ("sentences or clauses more generally, phrases, and lexical items" (Schegloff 2007, 3)), "intonational 'packaging'" (4), i.e. IUs, and "it constitutes a recognizable action in context" (4; also Selting 1995; Thompson and Couper-Kuhlen 2005; Levinson 2013, 107; Couper-Kuhlen and Selting 2018, 29, for instance). While we need labels to refer to these features, the labels themselves (sentence, clause, phrase, item, package, action) suggest that we are dealing with clear-cut entities, which, once we look at things more closely, may often not be the case. Some papers in this special issue demonstrate that the cesura approach can also lead to insights concerning further relevant "chunks". Couper-Kuhlen's paper highlights a composite particle and Pekarek Doehler's paper focusses on a composite clause format, both with specific functions. And the paper by Barth-Weingarten, Küttner, and Raymond demonstrates how a more fine-grained prosodic-phonetic analysis may lead to an appreciation of "in-between" concepts such as that of "subsidiary actions" (see also Auer 2007), i.e. an action that, rather than being clearly separable from the surrounding ones, retrospectively elaborates the main action preceding it, and thus represents neither a different version of "the same action" nor a "fully independent" or "discrete" next action.
The idea of describing "action" in terms of a cline connects this research to other scalar reconceptualizations of longstanding interactional phenomena in CA/IL, such as conditional relevance and the mobilization of response.

Conclusion
Seen in this way, then, non-convergent boundaries and weak cesuras are not a problem but a phenomenon relevant to both scholars and participants. No doubt, such detail in the production of talk is a challenge to describe, but at the same time it is a tool for research, and it can apparently even be a resource for social action. Shifting our attention away from identifying boundaries (and, with them, units) toward developing an understanding of the orderliness of the (fine) details of talk helps us to appreciate the processes and emerging relevancies of talk-in-interaction. With this special issue, we extend an invitation to collaborate in, and contribute to, this analytic endeavor.
The editors of this special issue would like to thank the participants of the international workshop "Divergent units", held at Potsdam (Germany) in September 2018, and of the panel "Divergent units and fuzzy boundaries" at IIEMCA 2019 in Mannheim (Germany) for their willingness to embark on this expedition with us, sharing their thoughts and ideas on fuzzy boundaries and weak cesuras. This holds in particular for the contributors to this special issue, who also commented on first versions of the papers published here. Moreover, our thanks are due to colleagues who were willing to share insights from their perspectives when commenting on the papers submitted for this special issue (in alphabetical order): Michael Ashby, Pia Bergmann, Steven Clayman, Arnulf Deppermann, Davydd Gibbon, Marja-Liisa Helasvuo, Elliott Hoey, Jill House, Lorenza Mondada, Beatrice Szczepek Reed, Giovanni Rossi, Ann Wichmann, Nick Williams, and Margret Zellers.
Acknowledgements: The authors would like to thank Elizabeth Couper-Kuhlen and Uwe-A. Küttner as well as two anonymous reviewers for their helpful comments on earlier versions of this introductory paper.
Funding information: The workshop "Divergent units", held at Potsdam (Germany) in September 2018, was funded by the German Research Foundation DFG (GZ BA 3495/5-1).
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.