Sequences of normative evaluation in two telecollaboration projects: A comparative study of multimodal feedback through desktop videoconference

Marco Cappellini and Brahim Azaoui


In our study we analyse how the same interactional dynamic is produced in two different pedagogical settings exploiting a desktop videoconference system. We propose to focus our attention on a specific type of conversational side sequence, known in the Francophone literature as sequences of normative evaluation. More particularly, we analyse data from two telecollaborative projects through desktop videoconference: a French-Chinese tandem, and a French-Irish telecollaboration between trainee teachers and learners of French as a foreign language. The comparison of two different pedagogical settings allows us to understand what types of interactional dynamics are co-constructed through the desktop videoconference environment and which characteristics are specific to each pedagogical setting. Within a socio-interactionist perspective, we analysed four hours of interactions, focusing particularly on the transmodal enactment of the sequences under scrutiny and its relation to learners’ uptake. Our results show on the one hand that there are some quantitative differences in the production of sequences of normative evaluation between the two pedagogical settings, and on the other hand that, contrary to our hypothesis, the co-construction of these sequences does not differ in multimodal density across the two contexts. We discuss these results and propose some tentative explanations for them.

1 Introduction

The present article considers how the pedagogical setting mediates the realization of specific conversational dynamics in language learning interaction mediated by desktop videoconference (DVC). To the best of our knowledge, all the studies in the CALL field that have addressed language learning through desktop videoconference either focus on just one pedagogical setting (Cappellini and Rivens Mompean 2015; Develotte et al. 2010; Hampel and Stickler 2012; Wang 2006 among others) or compare desktop videoconference with other forms of CMC within the same setting (Bower and Kawaguchi 2011; Sotillo 2000 among others). In other words, none of these studies has addressed the question of how a change in the pedagogical setting may or may not affect interactional dynamics in a given CMC environment, and a fortiori within a desktop videoconference environment. Our study aims to fill this gap.

To do so, we consider a data corpus from two pedagogical settings coming from different models: a teletandem, i. e. tandem (Lewis and Walker 2003) through desktop videoconference (Telles 2009) on the one hand, and a Français en (première) ligne-based (Develotte et al. 2007) pedagogical setting on the other hand. Our hypothesis is that a comparison between the interactions within these two pedagogical settings may tell us about how the same CMC environment supports different dynamics in relation to the task and the social roles of the participants. However, to develop a comparison of interactions in different pedagogical settings entails addressing epistemological issues in order not to compare what is incomparable (Gadet and Wachs 2015). This is why we have selected a particular conversational dynamic, the collective realisation of a specific type of conversational side sequence, which has been well documented within the Francophone studies on exolingual communication [1]: sequences of normative evaluation (see below). Through this comparison, our article aims therefore to identify which interactional dynamics related to language teaching and learning are realisable in a DVC environment and which are specific to a given pedagogical setting.

In the next section, we illustrate the two pedagogical settings. In Section 3, we explain our theoretical framework. Section 4 deals with the methodology we elaborated for this study. In Section 5, we analyse our data and discuss the findings. Finally, Section 6 will draw some conclusions.

2 Contexts

In our study we analyse data from two pedagogical settings. The first one is the Teletandem Dalian-Lille project (Cappellini 2014a, 2016), a teletandem between third year undergraduate students learning French at the Dalian University of Foreign Languages in China and first year graduate students learning Chinese at the Lille3 University in France. Students had an intermediate proficiency level in their target language. More precisely, French students had a B1 or B2 proficiency level in Chinese, while Chinese students had a B2 proficiency level in French, as defined by the CEFR (Council of Europe 2001). The Chinese students were enrolled in a French language and civilisation program, while the French students were specialising in international relations. None of them were trainee teachers. This telecollaboration was based on the tandem model (O’Rourke 2007), where each student uses the target language half the time and their mother tongue half the time in order to improve their skills and help the partner in their learning. Since teletandem is based on the principle of learner autonomy, each student in this setting had an individual set of activities, shaped as a conversation-for-learning (Kasper 2004; see Cappellini 2014a for further details). As for the desktop videoconference environment, participants used Skype, either at their residence or in a cyber-café. The data we analyse come from two cycles of the Teletandem Dalian-Lille, which ran respectively between November 2010 and February 2011 (14 pairs) and between March and May 2012 (4 pairs).

The second pedagogical setting is the ISMAEL telecollaboration (Guichon et al. 2014), which consists of CMC interactions between 12 native French postgraduate trainee teachers and 18 Irish learners of French with a B2 level enrolled in economics. The training sessions tackled six topics related to business: (1) working life in France; (2) professional experience; (3) preparing a placement in Reims; (4) project management; (5) project implementation and (6) job interviews. The training sessions – each lasting about 45 min – ran from October to December 2013 exploiting a desktop videoconference environment called VISU (Guichon et al. 2012). The students were at their universities during the interactions.

3 Theoretical framework

In our study, we draw on work within what Mondada and Pekarek Doehler (2004) call a socio-interactionist perspective, based on the Francophone tradition of socio-interactionist studies on exolingual communication (Pekarek Doehler 2014). Such a perspective, in contrast to the input-interaction framework (Gass 1997), is based on the theoretical foundations and methodological tools of conversation analysis (Sacks et al. 1974; Markee 2004) on the one hand, and of sociocultural theory (Vygotsky 1978; Lantolf and Thorne 2006) on the other hand. In other words, in our perspective language learning is situated in social interactions and these social interactions may be considered as the place where co-construction of social reality can be observed. Moreover, (language) learning is conceived of in terms of internalisation of symbolic instruments, such as language itself, which takes place through interaction between learners and their social environment (Lantolf and Thorne 2006).

3.1 Sequences of normative evaluation

Within the socio-interactionist approach, many authors have identified different types of conversational side sequence (Jefferson 1972) in which the mediational work of joint attention driving the internalization of symbolic instruments at the microgenetic level (Wertsch 1985) can be observed. From among these types of side sequence (see Cappellini 2016), in this study we selected “sequences of normative evaluation” (séquences d’évaluation normative, hereafter SEN) identified and described by Py (2000). Py defines SENs as side sequences starting when a native speaker/expert interlocutor takes the unsolicited initiative to repair the utterance of a non-native speaker/learner. By doing so, (s)he “draws a line between verbal expressions that are acceptable and those that are not” [2] (Py 2000, par. 17). On a conversational level, SENs could comprise just the other-initiated other-repair turn (Schegloff et al. 1977) or include possible following turns, including learner uptake (in Lyster and Ranta’s (1997) sense of a verbal reaction to a recast), until the interlocutors come back to the main topic. From a socio-interactionist perspective, these sequences provide information on how the social reality is co-constructed at different levels. First of all, they allow an analyst to understand the native speaker/expert’s representations of what is linguistically acceptable and what is not. Moreover, by starting a SEN, at the interactional level a person takes the position of the expert about the language and at the same time positions the interlocutor as a learner/novice (Cappellini and Rivens Mompean 2013). Another characteristic is that since SENs are initiated by the expert, they indicate a project of teaching rather than learning, contrary to other types of side sequences (De Pietro et al. 1989; Gülich 1990; Krafft and Dausendschön-Gay 1994) where the learners themselves may solicit feedback from the expert.
In this sense, Py (2000) observes that SENs usually do not lead to an explicit uptake and that when there is one, it means that the teaching project has been accepted by the interlocutor, who therefore assumes their role of learner and enacts it. Finally, from a micro-sociological point of view, SENs are face-threatening (Goffman 1967), since they point out limits in the novice/learner speaker’s communicative competence and since they are often produced with the expert interrupting the novice/learner’s turn (see example 1 below).

At this point, we would like to explain why we selected this type of side sequence as the basis of our comparison. First, many studies have shown that these conversational exchanges are present in a wide range of pedagogical or non-pedagogical settings (Py 2000; Azaoui 2014a; Cappellini and Pescheux 2015), which means that this type of exchange is not specific to one context. Second, as specified above, SENs indicate a will to teach and are therefore linked to teacher role-taking in the conversation (Cappellini and Rivens Mompean 2015). This is of interest to us since a main characteristic differentiating the two pedagogical settings under consideration is the different status of the interlocutors: trainee teachers and learners on the one hand vs. native speakers and learners on the other. In this paper we study whether such a difference in status results in different conversational behaviours.

3.2 Desktop videoconference for language learning: Multimodality and affordances

The term “multimodality” is ambiguous since it could refer to multiple modalities or to multiple modes. To avoid this ambiguity, we follow Drissi (2011: 134) who refers to modality in order to characterize the properties of the artefact (hardware and software), and to mode to describe the semiotic resources. In this sense, DVC offers different modalities: video of the interlocutor and of oneself, audio, and text, among others. These modalities allow the use of different modes: for instance, in the case of the audio, volume, intonation, speech tempo, pitch; in the case of the video, facial expressions, gestures, proxemics, among others (see Rivens Mompean and Cappellini 2015 for a detailed discussion).

Therefore desktop videoconference can be characterized by the presence of different semiotic modes active at the same time (Develotte et al. 2011). In fact, desktop videoconference enables meaning construction through a variety of modes: verbal – be it oral or written – facial expressions, gestures and/or posture. These modes may be used, possibly unconsciously, by interlocutors to co-construct meaning (Azaoui 2017; Holt et al. 2015). In this sense, the different modes available in desktop videoconference environments are to be conceived of as in interaction, as a whole, which led authors to speak of “orchestration of modes” (Hauck 2010) or “transmodality” (Cappellini 2014a).

In order to describe what interlocutors do with CMC multimodality, many scholars have exploited the concept of “affordance” [3] (Lamy and Hampel 2007; Develotte et al. 2011; among others). This concept is defined as what a particular environment offers to an agent, either to accomplish or to constrain their action. An affordance is therefore a mixed entity, combining the perception of the agent – and therefore their action – and the characteristics of the environment (Gibson 1979). For our study, we will take an interest in affordances as the enactment of a mode, or a range of modes, in the co-construction of a SEN through the DVC environment.

Given our contexts and our theoretical framework, our main research questions are: how are SENs co-produced within the DVC environment? Is there any difference in the production of SENs between the two contexts? Our hypothesis is that within the same synchronous CMC environment, i. e. desktop videoconference, the participants’ status in relation to the different pedagogical settings induces different realizations of the SENs and therefore of the communicative roles in interaction. More precisely, our hypothesis is that since in the ISMAEL telecollaboration teacher trainees are learning to teach online, in the ISMAEL sub-corpus we may find a greater number of SENs and a wider variety of modes in their co-construction. On the learners’ side, our hypothesis is that we will find a higher ratio of uptakes to SENs in the ISMAEL sub-corpus. In fact, many studies on e- and teletandem have found that linguistic accuracy is usually neglected in favor of efficient communication (Bower and Kawaguchi 2011; Cappellini 2016; Darhower 2008; O’Rourke 2007 among others).

4 Methodology

4.1 Data collection and corpus for analysis

In both pedagogical settings, data were collected using dynamic screen capture and audio-recording software. Data were later transcribed and annotated using the Eudico Linguistic Annotator – ELAN (Sloetjes and Wittenburg 2008), a tool that allows transcription not only of the verbal dimension(s) of interactions, but also of other dimensions related to multimodality.

Within the broader corpora produced (see Cappellini (2014a) and Guichon et al. (2014)), we selected six sessions from six pairs. [4] We used various selection criteria. First of all, we decided to compare interactions with the same number of participants. In fact, though teletandem implies by definition a relationship between two learners, [5] for the ISMAEL project most of the training sessions took place between a trainee teacher and two learners. Consequently, we selected the groups with just one learner and a session from two other groups where one of the Irish students was absent. Another important criterion was to compare the SENs produced for the same language, which led us to take only the French parts of the teletandem interactions. Moreover, in the Teletandem Dalian-Lille sub-corpus, one of the interactions (the LS pair, see Table 1) lasted twice as long as the other ones, which we took into consideration for the quantitative analysis. Finally, we decided to discard the first sessions, when participants were still getting acquainted with the CMC environment and their interlocutors. This led us to a corpus of analysis of five interactions of approximately 30 min each and one interaction of about 1 h, represented in the table below. All names in the table are pseudonyms.

Table 1: Participants and corpus of analysis.

Identifier  Participants                                  Pedagogical setting  Session n°  Length
VL          Victor (trainee teacher), Liam (learner)      ISMAEL               4           34 min
SA          Samia (trainee teacher), Angela (learner)     ISMAEL               2           35 min
SN          Severine (trainee teacher), Naomie (learner)  ISMAEL               4           34 min
CS          Cécilia (native speaker), SaiSai (learner)    Teletandem           3           28 min
CW          Colette (native speaker), Wan (learner)       Teletandem           3           34 min
LS          Sonja (native speaker), LiNa (learner)        Teletandem           3           1 h 8 min

Table 2: Identification of SENs.

       Initial number of SENs  Agreed SENs  Discussed SENs  Final number of SENs
Total  47                      24           23              35

Table 3: Quantitative analysis.

Pairs  Pedagogical setting  Length     Number of SENs
VL     ISMAEL               34 min     8
SA     ISMAEL               35 min     8
SN     ISMAEL               34 min     10
CS     Teletandem           28 min     2
CW     Teletandem           34 min     2
LS     Teletandem           1 h 8 min  5
Total                       4 h 3 min  35

4.2 Data analysis

We adopted a mixed methods approach (Ware and Rivas 2012), combining elements from quantitative and qualitative approaches. On the quantitative side, first of all we identified the number of SENs. To do so, each of the authors did his own analysis, and the results were then compared and discussed to reach an agreement. Table 2 shows the process of selection. About 51 % of cases in the combined lists (24 of 47) were identified by both researchers, the remaining 49 % (23 of 47) having been identified by only one of them. We subsequently discarded about 25 % of the SENs identified in the first analysis (12 of 47), mainly because deeper analysis showed evidence of some sort of solicitation by the learner. We reached a final number of 35 occurrences of SENs in our corpus of analysis. [6]
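The proportions involved in this selection process can be recomputed directly from the counts in Table 2. The short sketch below does only that; the variable names are illustrative and are not part of the original study.

```python
# Counts as reported in Table 2
initial_sens = 47      # SENs in the combined lists of the two researchers
agreed_sens = 24       # SENs identified independently by both researchers
discussed_sens = 23    # SENs identified by only one researcher
final_sens = 35        # SENs retained after discussion


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


shared_pct = share(agreed_sens, initial_sens)
single_pct = share(discussed_sens, initial_sens)
discarded_pct = share(initial_sens - final_sens, initial_sens)

print(f"identified by both researchers: {shared_pct:.1f} %")
print(f"identified by one researcher:   {single_pct:.1f} %")
print(f"discarded after discussion:     {discarded_pct:.1f} %")
```

Such a simple overlap percentage is only a descriptive measure of how far the two initial lists coincided; it is not a chance-corrected agreement coefficient.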

For the qualitative analysis, we also ran two different analyses. However, in this case the aim was less to reach an agreement on two perfectly corresponding analyses than to enrich each other’s perspective on the SENs. On a more practical level, this qualitative analysis was largely based on two principles: the “next-turn” principle of conversation analysis (Hutchby 2001: 68) and the integration of semiotic resources principle of social semiotics (Baldry and Thibault 2006: 17). The first principle means that there is no intrinsic value of a turn in conversation; on the contrary its value is to be analyzed in relation to the preceding and the following turns. The second principle, resource integration, means that even if for the sake of analysis we may isolate one mode from the others at a particular point, in the end we need to consider all the modes together in the co-construction of meaning. For this reason, we suggest the term “transmodal” to analyse meaning-making across different semiotic modes. As for “multimodal”, it will refer in this paper specifically to the existence of a plurality of modes, each taken individually.

5 Analysis and discussion

5.1 Quantitative analysis of SENs

The results of the quantitative analysis are reported in Table 3. As we already mentioned, we identified 35 SENs in our corpus. In the table, pairs are identified by the initial letter of each student’s name.

The first observation is that SENs are much more frequent in the ISMAEL setting than in the teletandem one: 26 vs 9, and this even without weighting the time of the LS pair session. More precisely, in the teletandem sub-corpus we may find a SEN every 15 min of conversation, while in the ISMAEL sub-corpus there is a SEN at an average of every 3 min, five times more frequently. This indicates a first difference between the achievement of SENs in the two pedagogical settings: quite unsurprisingly, trainee teachers in the ISMAEL telecollaboration have a stronger inclination to provide linguistic feedback without it being solicited by the learner, contrary to native speakers in teletandem. This indicates that the status of (trainee) teacher has a greater influence than the status of native speaker on the quantity of feedback. In other words, our first hypothesis is confirmed by the data.
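The frequency comparison can be retraced from Table 3 by dividing the total interaction time in each setting by the number of SENs produced there. The sketch below uses the session lengths as listed in the table; since these are rounded to the minute, the resulting intervals are approximations and need not coincide exactly with the rounded figures quoted in the text. The data structure and function name are illustrative only.

```python
# (pair, pedagogical setting, length in minutes, number of SENs), from Table 3
sessions = [
    ("VL", "ISMAEL", 34, 8),
    ("SA", "ISMAEL", 35, 8),
    ("SN", "ISMAEL", 34, 10),
    ("CS", "Teletandem", 28, 2),
    ("CW", "Teletandem", 34, 2),
    ("LS", "Teletandem", 68, 5),  # 1 h 8 min
]


def totals(setting: str) -> tuple[int, int]:
    """Return (total minutes, total SENs) for one pedagogical setting."""
    minutes = sum(m for _, s, m, _ in sessions if s == setting)
    sens = sum(n for _, s, _, n in sessions if s == setting)
    return minutes, sens


def minutes_per_sen(setting: str) -> float:
    """Average number of minutes of interaction per SEN in a setting."""
    minutes, sens = totals(setting)
    return minutes / sens


for setting in ("ISMAEL", "Teletandem"):
    minutes, sens = totals(setting)
    print(f"{setting}: {sens} SENs in {minutes} min, "
          f"one every {minutes_per_sen(setting):.1f} min")
```

Summing minutes per setting before dividing weights the longer LS session by its actual duration, rather than treating each session as one equal unit.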

5.2 Qualitative analysis of SENs

In this section, we proceed in three steps. First, we analyse some occurrences of SENs in order to highlight different conversational phenomena and their co-construction through the DVC affordances. Second, we present an analysis of the relation between multimodality and uptake. Third, we describe the characteristics of each sub-corpus, pointing out differences and similarities.

5.2.1 Occurrences of SENs

The first example we analyse is a classical SEN. It comes from the SA interaction in the ISMAEL telecollaboration. The transcription convention is reproduced in Appendix 1. English translations of the examples are presented in Appendix 2.

Example 1

Turn  Interlocutor  Oral  Image
01    Angela        j’ai: aussi euh: je suis allée: au euh: chorale/ [0.505] hier/
02    Samia         à la chorale/
03    Angela        mais: je sais pas [0.364] °à la chorale oui° mais: je ne sais pas si je veux: le faire chaque semaine/


In this example, Angela is talking and during her turn she faces a lexical gap on the word chorale (choir), which she signals transmodally by frowning, by lengthening the determiner au (at the) and by using a rising intonation on the problematic word. During this time, she keeps her eyes on the screen, probably looking at the interlocutor. Angela produces a short intra-turn pause and then resumes her utterance. Samia then interrupts Angela’s turn to correct the gender agreement of the determiner before chorale, saying à la chorale. Angela then stops her turn and produces an uptake, followed by oui (yes). This is done at a slightly lower volume; when she takes up her utterance again, she returns to her previous volume. Finally, after Angela’s uptake, Samia nods and smiles.

This example is a quite prototypical occurrence of a SEN, since it corresponds to the initial definition of Py (2000). It allows us to point out different characteristics of the co-construction of SENs through DVC. The first characteristic is the multimodality inherent in oral communication. In our analysis in the previous paragraph, we tried to highlight the role of volume, speech tempo and intonation during the learner’s turns. In fact, the affordances of DVC allow these modes of the oral language to be present and therefore exploited in meaning making. In the example above, speech tempo and intonation are affordances to identify a lexical gap, while the volume is an affordance to signal at which level of the conversation the utterance is – the uptake is in a side sequence. The second characteristic is that beyond the multimodality of speech, meaning is managed in a transmodal way across different modalities and modes. In the example above, we may note that after the learner’s uptake following the corrective intervention, the trainee teacher validates the correct form of the uptake by nodding, i. e., with a kinetic mode. This kind of “evaluation” (as defined by Mehan 1979) is present only in the ISMAEL interactions (see below). The third characteristic is that even though eye contact is impossible in DVC (Develotte et al. 2010), the interlocutors may observe each other, inspecting each other’s reactions. In the example above, we conjecture that this is the case for the learner during her first turn, [7] when she keeps a watch on the trainee teacher while trying to express herself. A fourth and final characteristic of this example is the fact that the face work it contains is relatively minimal and unobtrusive (Goffman 1967), even if there is an interruption and a correction.
In fact, the only elements that could be related to such face work are the oui at the end of the learner’s uptake, which could be an acknowledgment of the interlocutor’s role and therefore her right to correct, and the subsequent smile of Samia while she nods.

The second example we analyse is taken from the LS conversation in the teletandem setting. A translation of the SEN is in Appendix 2.

Example 2

Turn  Interlocutor  Oral  Image
01    LiNa          il était un poète et aussi un policien [1.868] et aussi
02    Sonja         et aussi un/
03    LiNa          policien [1.315] po-li-cien
04    Sonja         °policien° un POliticien
05    LiNa          ah oui oui désolée politicien
06    Sonja         d’accord non mais c’est pas grave


In this example, LiNa is describing a myth related to a Chinese national celebration day. During her utterance, she produces a word that does not exist in French, policien. After her utterance, she produces a transition-relevance place (Sacks et al. 1974), leaving space for the interlocutor to speak. After a pause of almost two seconds, she takes the turn again, but at the same time Sonja overlaps with a turn where she echoes what LiNa just said with a rising intonation on the determiner un in order to ask for a repetition of the misperceived or misunderstood word. During her turn, Sonja also tilts her head to the side. LiNa then repeats the word policien, the first time with an unmarked speech tempo, the second time, when there are still no signs of understanding on Sonja’s face, detaching the syllables, producing a sort of reversed foreigner talk. Sonja repeats policien at a lower volume, as if she were processing the information, and then, at a higher volume, she says un politicien (a politician), quite oddly stressing the first syllable, which is not the problematic one. Almost instantaneously (after 0.320 seconds), LiNa acknowledges saying ah oui oui désolée (oh yes yes sorry), tilting her head to the side, smiling and closing her eyes, and within the same turn she uptakes politicien. During the uptake, Sonja starts to smile, says d’accord (ok) and while shaking her head to say “no”, she says non mais c’est pas grave (no but it doesn’t matter) and she looks away from her screen.

The first observation for this SEN is that it is a non-prototypical one (Cappellini and Pescheux 2015), since the learner’s mistake leads to a phase of non-understanding between the interlocutors. As such, this example of SEN presents some characteristics of other types of conversational side sequences. Then, we may note once again the importance of the multimodality of speech as it can be accomplished through the affordances of the DVC environment, be it through speech tempo (LiNa’s syllable-by-syllable repetition of policien) or intonation (Sonja’s request for repetition). Transmodal meaning making is also visible, through facial expressions (LiNa’s expression after the correction) and head movements (the final turn of Sonja). A peculiar characteristic of this example is the way in which transmodality was used to make meaning. In fact, during the first phase of the side sequence, during turn 03, LiNa understands that Sonja does not understand because of the absence of any ostension by Sonja, be it kinetic or verbal. In other words, since Sonja does not “use” any mode, and since she does not do anything at a moment – the transition-relevance place – where she is supposed to, this “nothing” becomes significant. This corresponds to what one of the authors called a “zero ostension” (Cappellini 2014b), that is, an absence of any ostension where there should be one. Such an absence is therefore relevant for the purpose of mutual understanding. In our opinion, in CMC, this is a very peculiar conversational dynamic enabled by the affordances of certain types of DVC, [8] which is very important in pedagogical communication. A final consideration regards the face work which is present in this example. Contrary to example 1, here face work is salient at the end of the sequence, when mutual understanding and the correction have been accomplished.
This face work is verbalized with a désolée (sorry) by the learner, which probably refers to the fact that not only was she not able to make herself understood, but she also insisted on a wrong form of the word. Moreover, the ah oui oui with the head movements and facial expressions could be paraphrased as “of course, how silly of me”, which is a way for the learner to take full responsibility for the error. On the other hand, the native speaker diminishes the importance of the episode, refusing the importance that has been given to it – non – and then making it explicit that it does not matter – c’est pas grave (literally “it’s no big deal”). This analysis concurs with Darhower’s finding (2007) about the possible conversational dynamics stemming from the face threatening nature of error correction, even if in our case this is not related to a weakening of the social bond between interlocutors. In fact, the learner accepts the error correction without questioning the fact that the interlocutor could do it.

5.2.2 Multimodality and uptake

As we highlighted in the theoretical framework, a key (though not defining) element in a SEN is the presence of the learner’s uptake. Even if such a presence cannot be considered as evidence for learning, some studies have shown how it might be of primary importance for the learning process (De Pietro et al. 1989; Krafft and Dausendschön-Gay 1994; Matthey 1996). In our view, it would be interesting to know if certain types of multimodality in the first turn of a SEN lead to learner uptake. In this section, we therefore discuss the relations between multimodality in the teacher trainee or native speaker turn and learner’s uptake. To do so, for each occurrence of SEN, we identified the different modes that could be used markedly during the first turn of SENs. By “used markedly”, we mean that there is a difference between the use of these modes outside the side sequence and during the side sequence. For instance, it is obvious that each utterance can be characterized in terms of volume or speech tempo. However, if these modes are marked during the SEN (higher or lower volume, quicker or slower speech tempo) with respect to other parts of the interaction, then we considered that they are “used” in the SEN. These modes are: volume, intonation, speech tempo, pitch, facial expressions, head movements, gesture (i. e. arm or hand movements), proximity to the screen, and written cues. For each of these modes in SEN, we looked at how many times they are present and within these occurrences, how many times they are in a SEN with an uptake.

This led us to note that out of 35 SENs, in 16 there is an uptake. Among those 16 SENs, marked use of two modes is never present in our corpus of SENs: gestures [9] and pitch. The most prevalent modes are: head movements (10 occurrences) and facial expressions and screen proximity (7 occurrences each). In three cases the marked use of a mode in the first turn always corresponds to the presence of an uptake: volume (1 occurrence), speech tempo (2 occurrences) and the written mode in the chat window (4 occurrences). However, it is important to stress that this cannot lead to the conclusion that these modes trigger the uptake. Only experimental research could confirm or disconfirm such a hypothesis. Moreover, one could still ask the question of what determines an uptake in the cases where these three modes are not present. As for the other modes, they are present in the 19 SENs without uptake too. In conclusion, there is no clear link between one or many modes in the trainee teacher’s or native speaker’s initiating turn and the presence of an uptake in SENs.

Another way to analyze the relation between uptake and multimodality is to take into account the multimodal density (Develotte et al. 2011), that is, how many modes are present in SENs with an uptake. In this case too we observe great variability, since SENs presenting an uptake could be characterized by the use of one to five modes. This leads us to the conclusion that a high multimodal density is not necessary for uptake to happen.
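The counting procedure used in this section (occurrences of each marked mode, co-occurrence with an uptake, and multimodal density) can be sketched as follows. The annotation records are hypothetical and serve only to illustrate the procedure; they are not the study's data, and the record layout is an assumption rather than the format of the authors' ELAN annotations.

```python
from collections import Counter

# Hypothetical annotation records, NOT the study's data: for each SEN, the
# modes used markedly in its first turn and whether an uptake followed.
sens = [
    {"modes": {"head movements", "facial expressions"}, "uptake": True},
    {"modes": {"written cues"}, "uptake": True},
    {"modes": {"head movements"}, "uptake": False},
    {"modes": {"speech tempo", "screen proximity", "head movements"},
     "uptake": True},
    {"modes": set(), "uptake": False},
]

present = Counter()      # number of SENs in which each mode is marked
with_uptake = Counter()  # of those, number of SENs that contain an uptake
for sen in sens:
    for mode in sen["modes"]:
        present[mode] += 1
        if sen["uptake"]:
            with_uptake[mode] += 1

# Modes whose marked use always co-occurs with an uptake in this toy sample
always_with_uptake = sorted(
    mode for mode in present if with_uptake[mode] == present[mode]
)

# Multimodal density: number of marked modes in each SEN with an uptake
densities = [len(sen["modes"]) for sen in sens if sen["uptake"]]

for mode, count in present.most_common():
    print(f"{mode}: {count} occurrence(s), {with_uptake[mode]} with uptake")
print("always with uptake:", always_with_uptake)
print("densities of SENs with uptake:", densities)
```

A tally of this kind only records co-occurrence between a marked mode and an uptake; it says nothing about whether the mode triggers the uptake.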

5.2.3 Comparison: Differences and similarities between the settings

To begin our comparative perspective, we would like first of all to take into account the rate of the uptakes in the two pedagogical settings. Table 4 presents the results of this analysis.

The table shows that given a SEN, it is more likely on average that there will be an uptake in the teletandem setting. These results counter one of our initial hypotheses, since the ratio of uptakes to SENs is not higher in the ISMAEL telecollaboration. In fact, we expected a relation between a stronger inclination to take or to enact the role of the teacher (i. e. in our study to initiate SENs) by the French students in the ISMAEL telecollaboration and a stronger inclination to take the role of the learner (i. e. to produce uptakes) by the Irish students. On the contrary, we observed a stronger inclination to take the role of the learner in the enactment of SENs in the teletandem setting. Several hypotheses could be proposed to understand these findings.

We may first relate them to the notion of educational culture (Beacco et al. 2005). Most of the Chinese interlocutors in the teletandem interactions may position themselves as learners, whose role – among others – would be to acknowledge their teachers’ expertise by taking up the correction. In other words, the Chinese speakers are enacting the role of learners, which implies taking into account the correction of their expert co-speaker and reformulating their utterance. A second hypothesis we could formulate is that the quantity of SENs and their frequency (i. e. 1 every 3 min) in the ISMAEL context may deter the learners from taking up the correction. This is linked to a sort of double bind (Bateson et al. 1962; Watzlawick et al. 1967) experienced by the trainee teachers: on the one hand, they are supposed to or want to let their learners speak as much as possible; on the other hand, they are expected to interrupt them to correct their utterances. This paradoxical situation may lead to a strategy of correcting learners without expecting them to produce an uptake. From the learner perspective, the higher ratio of uptakes in the setting where there are fewer SENs (i. e. teletandem) could lead to an interpretation in terms of face work. In fact, if the initiation of a SEN by the teacher is perceived as normal, or even expected, and if these SENs are recurrent, as they are in the ISMAEL setting, they may lose their face-threatening nature. On the other hand, if SENs are rare, they could keep their face-threatening load and result in an uptake intended to restore the learner’s face, possibly jointly with classical types of multimodal face work such as smiles and/or apologies. However, one could also argue that the presence of a large number of SENs is indeed a face-threatening act in itself, possibly demotivating the learner and therefore producing the counter-effect of inhibiting uptakes. To conclude on this topic, we can see that even if SENs are related to face work, the link may vary from one participant to another. With regard to this question, adding stimulated recall interviews to the data collection could provide better insights.

More globally, we think that this unexpected result is linked to a sort of misalignment, already noted by Py (2000). On the one hand, a SEN is the actualization of a project to teach by the (trainee) teacher. On the other hand, the uptake is evidence of a project to learn by the learner. Given this, we could argue that in the ISMAEL setting, there is a misalignment between the teaching project of the trainee teachers and the absence of uptake on the learners’ side. As for the teletandem setting, we wonder if there is a misalignment too, less visible, consisting of an expectation of correction by the learners that remains unmet because of the paucity of SENs. What we perceive as a misalignment may be connected to the issue of role attribution. Considering uptakes from this perspective, we could indeed posit that the non-native speakers either accept (when they produce an uptake) or reject (when they do not) the role of learner/novice that the expert assigns them through SENs.

Some of the other differences have already been mentioned in the previous sections. One of them is the presence of slightly more face work during the SENs in the teletandem setting, and especially in the LS pair. Another interactional difference is the presence of the trainee teachers’ evaluation (Mehan 1979) in the ISMAEL setting: they regularly validate their learner’s utterance after correcting its form, saying for instance très bien (very good) or c’est parfait (that’s perfect). Despite their recurrence, these evaluations may be ambiguous for the learner since they can refer either to the uptake or to the learner’s performance within the task framework (Azaoui 2014b). Beyond the differences between the two pedagogical settings, we also found that some interactional dynamics are present in both. The first one is related to transmodal meaning-making in SENs. Participants perceive and use the affordances related to the audio and video modalities and, to a lesser extent, written texts. Another common interactional dynamic is that in both groups SENs may be accompanied by a signal from the expert, just before or just after the correction, that mutual understanding has been reached.

Finally, so as not to treat the two groups as homogeneous, this comparison needs to be put into perspective in the light of the internal variation within each setting. For instance, Table 4 shows the absence of uptake for the Teletandem Dalian-Lille CS pair. This counters the general trend of a higher ratio of uptakes to SENs that we noted in this setting. Another example of variation within the same setting is to be found in the SN pair of the ISMAEL telecollaboration. During the SENs, the trainee teacher usually leaves no transition-relevance place for an uptake, either because she closes the side sequence with an agreement (for instance: d’accord – ok), or because she directly continues with the task.

Table 4: Uptake analysis.

| Teletandem Dalian-Lille | Uptakes/SENs ratio | ISMAEL | Uptakes/SENs ratio |
| --- | --- | --- | --- |
| LS | 4/5 | VL | 4/8 |
| CS | 0/2 | SA | 3/8 |
| CW | 2/2 | SN | 3/10 |
| Total | 6/9 | Total | 10/26 |
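
As a reading aid, the totals in Table 4 can be recomputed from the per-pair counts. The sketch below is only illustrative: the counts are copied from the table, while the names `TABLE_4`, `setting_totals` and `uptake_rate` are hypothetical helpers introduced here, not part of the study's method. The figures are descriptive and imply no claim about statistical significance.

```python
from fractions import Fraction

# (uptakes, SENs) per pair, copied from Table 4.
# The dictionary layout is an assumption made for this illustration.
TABLE_4 = {
    "Teletandem Dalian-Lille": {"LS": (4, 5), "CS": (0, 2), "CW": (2, 2)},
    "ISMAEL": {"VL": (4, 8), "SA": (3, 8), "SN": (3, 10)},
}


def setting_totals(pairs):
    """Sum uptakes and SENs over all pairs of one setting."""
    uptakes = sum(u for u, _ in pairs.values())
    sens = sum(s for _, s in pairs.values())
    return uptakes, sens


def uptake_rate(uptakes, sens):
    """Share of SENs followed by an uptake, as an exact fraction."""
    return Fraction(uptakes, sens)


for setting, pairs in TABLE_4.items():
    uptakes, sens = setting_totals(pairs)
    rate = uptake_rate(uptakes, sens)
    # Print the setting total first, then each pair, to show internal variation.
    print(f"{setting}: {uptakes}/{sens} = {float(rate):.2f}")
    for pair, (u, s) in pairs.items():
        print(f"  {pair}: {u}/{s} = {u / s:.2f}")
```

The setting totals come out at 6/9 (about 0.67) for the teletandem and 10/26 (about 0.38) for ISMAEL, and together they give the 16 uptakes out of 35 SENs mentioned earlier. Printing the per-pair lines next to the totals keeps the internal variation within each setting visible.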


6 Conclusions

Our initial question was to identify how SENs are co-produced within a DVC environment and if this co-production varies in relation to the pedagogical setting. Regarding this question, we formulated three hypotheses, namely, that the ISMAEL interactions would generate:

  1. a greater number of SENs;

  2. SENs with a greater variety of modes, i. e. a greater multimodal density; and

  3. a greater ratio of uptakes to SENs.

Our results confirmed the first hypothesis, since we observed more SENs in the ISMAEL setting. By contrast, hypotheses 2 and 3 were not confirmed. With regard to multimodal density, we found that in both environments the SENs are co-constructed transmodally, with even a slightly denser multimodality in teletandem. In particular, we showed how the affordances of the DVC environment provide support for a wide range of modes to be used to reach mutual understanding and to co-construct meaning locally, be it when the expert produces a correction, when the learner faces a difficulty in their production, or when there is a lack of understanding. The third hypothesis, postulating a greater ratio of uptakes to SENs in the ISMAEL sub-corpus, is not corroborated by the results. Unexpectedly, the findings showed that the greater uptake ratio was to be found in the teletandem environment. We suggested some possible factors that could account for this finding.

In order to deepen our view on SENs in DVC environments, our study could have benefitted from other data sources, such as analysis of the participants’ comments or stimulated recall. This would have provided another lens on the emic perspective. In particular, using video-stimulated recall (Tochon 2008), we might have worked with the interlocutors to enhance our understanding of the presence/absence of uptakes, for instance by determining whether the learners noticed particular corrections. Another source of relevant data could come from the use of eye-tracking (O’Rourke et al. 2015; Stickler et al. 2016), which would enable us to know whether the image of the expert is the target of learners’ focal attention during this and other types of side sequences, and, if so, which elements are the most relevant (facial expressions, gesture, chat …).

Beyond possible future developments in the study of SENs, we believe that the present study has brought some insights into how the modes and modalities of the DVC environment may become affordances for language learning/teaching. In particular, this article showed how the pedagogical setting mediates the co-construction of SENs. More particularly, we found that the two pedagogical settings under scrutiny should be conceived less in terms of opposition and rather as variations within a continuum. Following up on this work, research on the criteria for realizing SENs may help shed more light on this issue. Indeed, as Darhower (2007) noted, the criteria for correcting one linguistic error rather than another remain unclear, all the more so since the repetition of similar errors does not necessarily cause the realization of SENs. An experimental approach might help us to better understand this issue.


Appendix 1. Transcription convention

: Lengthening
/ Rising intonation
[0.000] Intra-turn pause/silence, calculated in seconds and milliseconds
00:03:04.665–00:03:11.834 Time of turn start – time of turn end
°text° Lower volume
CAPITAL LETTER Higher volume
po-li-cien Slower speech tempo and detachment of syllables

Appendix 2. Translations of the examples

| Turn | Interlocutor | Oral | Image |
| --- | --- | --- | --- |
| 01 | Angela | I also: hum: I also went hum: *to the choir/ [0.505] yesterday/ | |
| 02 | Samia | to the choir/ | |
| 03 | Angela | but: I do not know [0.364] °to the choir yes° but: I do not know if I want to do it every week/ | |

| Turn | Interlocutor | Oral | Image |
| --- | --- | --- | --- |
| 01 | LiNa | he was a poet and also a polician [1.868] and also | |
| 02 | Sonja | and also a/ | |
| 03 | LiNa | polician [1.315] po-li-cian | |
| 04 | Sonja | °polician° a POlitician | |
| 05 | LiNa | oh yes yes sorry politician | |
| 06 | Sonja | ok no but never mind | |


