Combatting Linguistic Stereotyping and Prejudice by Evoking Stereotypes

Abstract: The article offers an account of two projects conducted at Örebro University and Umeå University, Sweden, which are aimed at raising awareness of issues related to linguistic stereotyping using matched-guise-inspired methods: Raising Awareness through Virtual Experiencing (RAVE), funded by the Swedish Research Council, and a Cross-Cultural Perspective on Raising of Awareness through Virtual Experiencing (C-RAVE), funded by the Marcus and Amalia Wallenberg Foundation. We provide an overview of the methods used in university courses, with the aim of raising awareness of how stereotyping can affect our perception. We also give a more detailed account of the findings from two case activities conducted in Sweden and the Seychelles. Here the response patterns indicate that the perceived gender of a voice as well as the accent (native vs non-native) do affect respondents' judgements of performance. We were also able to show that discussions and reflections inspired by these response patterns led to raised self-awareness of matters related to language and stereotyping. The article then moves on to a critical query of our methods and also contextualizes our work in a broader discussion on methods and initiatives for how educational institutions can actively contribute to combatting (language) prejudice and discrimination in various ways.


Introduction
Stereotyping, i.e. attributing traits, characteristics and/or behaviours to a person on the virtue of shared and overgeneralized beliefs regarding the social groups she/he belongs to (cf. Puddifoot 2019: 71; Locksley et al. 1982: 270), seems to be a pervasive human tendency that stems from a basic cognitive need to categorize, simplify and process the complex world that surrounds us (Rakić et al. 2011: 17). Unfortunately, this tendency also leads to unmotivated in-group/out-group definitions that are a precondition for social bias, prejudice, discrimination and structural injustices.
One of the more disturbing aspects of stereotyping is that it seems to be partly implicit, i.e. unconscious and/or automatic. This in turn leads to unintentionally biased judgements and/or behaviours towards individuals who are deemed to belong to a particular social category, judgements and actions that we may not even be aware of (see for example Banaji and Greenwald 2013; Sleek 2018; FitzGerald et al. 2019). Consequently, as FitzGerald et al. (2019: 2) point out, implicit stereotyping and bias are likely to be important, but largely hidden, contributory factors leading to systematic discrimination in areas such as education, law enforcement, employment, and health care. There is thus a general societal interest in finding ways to combat such tendencies, in particular among professionals who work in these fields. A major pedagogic challenge, however, is the fact that while it is easy to point to studies and statistics that demonstrate negative consequences of stereotyping, bias and prejudice, the majority of us are rather reluctant to entertain the possibility that we ourselves may be part of these systematic structures and "guilty" of racial or gender bias, for example. This is where so-called unconscious bias training of the kind described in this article has a role to play (see also Project Implicit 2011).
As implied above, raising self-awareness of the subtle mechanisms and effects of implicit bias and stereotyping among target groups is, arguably, a critical first step in combatting such tendencies. Here we maintain that overtly demonstrating the consequences of implicit stereotyping, i.e. showing how it actually can affect our own judgements and perceptions, constitutes a powerful self-awareness-raising tool. This is also the main aim of the projects that are the subjects of this article.
One ambition of the current article is to share the design of a selection of awareness-raising activities developed under two projects: Raising Awareness through Virtual Experiencing (RAVE), funded by the Swedish Research Council, and a Cross-Cultural Perspective on Raising of Awareness through Virtual Experiencing (C-RAVE), funded by the Marcus and Amalia Wallenberg Foundation, Sweden. Both projects focus on the importance of language aspects as primary cues for stereotyping. Central to the projects has been the design of various case activities aimed at raising awareness about precise stereotyping issues relevant to specific professional/educational contexts, but also with the common aim of raising general awareness of the effects of implicit stereotyping. A second aim of this study is to give an account of the outcomes of some of our activities. We also evaluate whether case activities conducted under the projects have resulted in raised self-awareness as regards implicit bias and stereotyping, and more precisely what aspects of raised self-awareness respondents highlight in their accounts.
Raising awareness of two aspects of language and stereotyping is of particular interest to us. First, we explore so-called reversed linguistic stereotyping, i.e. how "attributions of a speaker's group membership trigger distorted evaluations of that person's speech" (Kang and Rubin 2009: 441). In other words, stereotyping seems to affect language perception (see Hay et al. 2006; Mulac et al. 2013, for example). More specifically, we want to make visible how language schemata and stereotypes (i.e. implicit and explicit preconceived ideas of how a particular social category such as women, for example, performs linguistically) may influence hearer perception and judgement of a language event (Mulac et al. 2013: 24). Collins et al. (2012: 379) describe this phenomenon as "a lens that directs and distorts cognition," and it appears that stereotyping leads us to notice behaviours that confirm our preconceived expectations and ignore behaviours that do not.
Second, we want to explore so-called linguistic stereotyping, i.e. the tendency for people to categorize and judge others on the merits of their language output (Lippi-Green 2012). For example, Hansen et al. (2017) and Rakić et al. (2011) have shown that accent is a far more salient cue for ethnic categorization than looks, and an abundance of other studies have shown that individuals with non-native or non-standard accents are judged differently than standard accent speakers (see Fuertes et al. 2012 for a comprehensive overview).
Both of these phenomena (linguistic stereotyping and reversed linguistic stereotyping), i.e. how language cues lead us to judge individuals, and how stereotypic linguistic preconceptions seem to affect our perceptions of language events, are explored in the case designs developed under our projects.

Background
In the cases trialled so far, we have primarily focused on two types of manipulations. First, voice pitch and timbre have been manipulated using Praat in order to simulate male and female sounding voices, thereby allowing us to design awareness-raising activities which explore the phenomenon of reversed linguistic stereotyping (Kang and Rubin 2009) and highlight how the perceived gender of the speaker may affect our perception of a speech event. Second, we have constructed cases where aspects of accent have been manipulated using cut-and-paste techniques (see Labov et al. 2011) in the design of awareness-raising activities aimed at illustrating linguistic stereotyping, i.e. showing how different accents can lead to differences in judgements of speakers. Although we acknowledge the vast scope of the topic of language and stereotyping, given our focus, we limit our background overview to studies of language stereotyping related to gender and accent, and how language stereotyping can affect the judgement of speakers as well as the perception of language output.

Language stereotyping and gender
Language, we would argue, is a key area where popular gender stereotypes and expectations flourish; and, arguably, the research community has inadvertently contributed to the establishment of some of these. The study of language and gender has a long history; after the publication of Robin Lakoff's seminal work Language and Woman's Place in 1975, a myriad of studies were dedicated to elucidating the differences in how men and women use language. Decades of research have since resulted in the description of gendered styles of communication which Leaper and Ayres (2007) label as affiliative (typically female) and assertive (typically male). More specifically, Coates (2004) and many others (see for example Nolasco and Arthur 1987; Cheshire and Trudgill 1998) describe women as tending to be more cooperative, facilitative, appeasing, indirect, emotional and person-oriented in conversations than men, who in turn are claimed to show tendencies towards being more competitive, combative, direct and task-oriented. Arguably, such studies have contributed to describing and defining gendered communicative styles and have, according to Holmes (2006: 6), thereby strongly contributed to the normative gender identity expectations of white middle-class men and women in Western societies.
It is not that such descriptions are entirely unfounded; more than 40 years of research into gender patterns in language usage have given some support for gendered structural tendencies in language usage according to the list of features described above. But lists of what constitutes male and female discursive styles are at best "a crude simplification of a complex reality" (Ladegaard 2011: 5). For example, metastudies have shown that differences, more often than not, are very small or nonexistent and that there is huge in-group variation among women as well as men (Leaper and Ayres 2007; Kaiser et al. 2009). Further, the "difference approach" inevitably leads to detecting differences, rather than highlighting similarities, which indeed may be far greater (Hyde 2005). It is also difficult to establish whether differential tendencies, to the extent they are found, really are a result of gender influences. Various intersecting factors, such as the influence of other identity variables, context and power, may be far more important causal influences in language performance and perception, and these may just happen to correlate and/or interact with gender (see Cameron 2007; Ladegaard 2011: 5). A final point of serious critique is that language and gender research over the past 40-50 years has had a distinctly Western gaze (cf. Holmes 2006).
More importantly, from our perspective, is the fact that differential gender tendencies that emerge from the analysis of data from large groups of respondents, such as claims that women's language expresses "affiliation and interpersonal warmth" (see Park et al. 2016: 19), are not necessarily relevant when referring to the linguistic behaviour of a particular individual in a given context. Applying such structural models as a lens of perception on individual behaviour in a given context (consciously or inadvertently) would, we argue, fall under the category of stereotyping, i.e. "attributing traits, characteristics and/or behaviours to a person on the virtue of shared and overgeneralized beliefs regarding the social groups she/he belongs to." In reality, identity is complex and accommodates multifaceted, intersecting variables such as professional identity, gender, ethnicity, class, age and sexuality, just to mention a few. It is the specific context that influences which of these aspects of one's identity become more or less salient. Further, many individuals do not conform to the expected gender norms. A performative view of gender (see Butler 1988) recognizes the agency of the language users to identify and conform to, or to challenge, hegemonic and stereotypical models, as has also been demonstrated in numerous studies since the 1990s (e.g. Hall 1995; Cameron 1997; Bucholtz 1999). In short, it is very difficult to predict or assess a particular individual's traits and behaviour at a particular time based on simplistic identity categorization; as Sunderland and Litosseliti (2002: 1-2) put it, it is more accurate to see gender and languaging as a "continuous construction of a range of masculine and feminine [and other] identities within and across individuals of the same biological sex." This is also something that has been increasingly recognized by the research community, and Swann (2002: 43) maintains that "there has been a general shift that might best be characterised as running from relative fixity to relative fluidity in terms of how 'language' and 'gender' are conceived." In short, it is all rather complex and ever changing.
In contrast, and as has been evidenced from various contexts, stereotyping is all about simplification. For many, salient categories such as "men" and "women" still seem to be largely binary, as are the language expectations associated with these constructs. Disturbingly, such expectations, in turn, seem to affect perception, judgement and reactions, as a number of studies have shown.
For example, in a business context, Ladegaard's study (2011) highlighted how both male and female leaders tended to prefer normatively feminine communicative styles, e.g. indirect, collaborative and relationship-oriented communication. However, the way these similar strategies were received and decoded by the employees differed. Female leaders using feminine styles were doubted and contested more frequently than male leaders using the same styles. Similarly, Baxter (2017) was able to show how female leaders' authority was resisted on gendered, linguistic grounds. In her study, the female leader's assertive and direct "masculine" style led to discomfort, arguably because it did not meet gendered expectations (2017: 142).
In a legal context, Hildebrand-Edgar and Ehrlich (2017) found that the general tendency for powerful and assertive language strategies to be viewed as more credible than powerless and deferential styles in the courtroom did not apply in the context of a rape trial. Instead, the gendered stereotypes associated with rape victims meant that powerful speech styles undermined the credibility of the victim.
In educational contexts, an area of particular relevance to this article, it has been shown that stereotypes influence students' and teachers' perceptions alike. Abel and Meltzer (2007), for example, demonstrated that students evaluated a text more positively when they thought that the author was male. This type of differential evaluation has been seen in a number of other studies (Goodwin and Stevens 1993; Centra and Gaubatz 2000). Further, both male and female teachers are more likely to receive better evaluations if they fit gender stereotypes than if they deviate from them (e.g. Basow 1995; Deutschmann et al. 2016). It is also well-documented that schoolteachers, regardless of gender, tend to give more attention to male than to female students (Sunderland 2000; Chen and Rao 2011), even when they think that they are being more attentive to the female students (Sunderland 2000: 160). However, as Sunderland herself points out (2000: 165), her findings are based on the "average" girl or boy, and there is huge individual variation. Hidden behind the averages is in fact a small subset of individuals, i.e. boisterous boys, that boosts the figures for the group as a whole. A further complicating point that Sunderland (2004) makes is that even if boys get more attention, girls may well get attention of higher quality, partly due to prejudiced expectations.

Language stereotyping and accent
Evaluative beliefs and attitudes surrounding accents are prevalent across societies. Several studies conducted in English-speaking contexts, for example, show that standard accents such as British Received Pronunciation (RP) and General American tend to be evaluated more favourably than non-standard accents (Coupland and Bishop 2007; Lippi-Green 2012; Lindvall-Östling et al. 2020b). According to Milroy (2007: 133), language attitudes are "dominated by powerful ideological positions that are largely based on the supposed existence of the standard form." Such "ideological positions" often carry with them inadvertent linguistic stereotyping favouring the native standard accented speaker and disfavouring the non-standard speaker (Monfared and Khatib 2018: 59). This in turn leads to practical negative or positive consequences that go beyond mere language ideology.
For example, Torstensson (2010) highlights how some immigrant groups in Sweden are at a legal disadvantage in court since interpreters most frequently speak foreign-accented Swedish, the (negative) perception and evaluation of which affect court outcomes. Similarly, in Britain, Dixon et al.'s (2002) matched-guise experiment showed that a suspect's plea of innocence was deemed less trustworthy when delivered in a Birmingham accent as opposed to a standard RP accent.
Effects of non-standard vs standard accent have also been a topic of investigation in educational contexts. Boyd (2003), for example, demonstrated that non-native-speaking teachers in Sweden were ranked low for teacher suitability by a panel of headmasters and pupils on the basis of their accents, although they were highly competent on other linguistic variables and had good track records with many years of teaching experience. Similar results, i.e. that a non-native accent undermines teachers' credibility, have been found in other cultural contexts (see Bresnahan et al. 2002; Kavas and Kavas 2008; Alberts et al. 2013).

Project context
In summary, there is plenty of evidence that linguistic stereotyping has a real impact in areas such as education, the judiciary and business. One of the great challenges in combatting stereotyping is raising self-awareness of how such phenomena influence us personally. This is a particularly urgent challenge in the domain of people-oriented professions where metalinguistic knowledge ideally should be translated into objective professional language practice. It is thus in this unconscious bias training context that we position ourselves, i.e. finding methods for raising self-awareness about issues related to implicit language stereotyping.
Several other projects deal with issues related to language and stereotyping. Many of these address in-language bias with the ambition to encourage more gender-inclusive language (see for example initiatives under the EU (the European Institute for Gender Equality), the United Nations (Gender-inclusive language) and the Gender Fair Language project funded by the Swedish Research Council). Other projects such as Project Implicit (2011), the Prejudice Habit-breaking Intervention model (Devine et al. 2012) and the Breaking the Prejudice Habit project (Kite 2014) are more general in nature and deal with various forms of bias and prejudice, including racial, sexual, ethnic and religious prejudice. Since much research seems to indicate that prejudice is largely the result of implicit biases, i.e. biases that occur unintentionally despite conscious non-prejudiced attitudes (see for example Gaertner and Dovidio 1986; Devine and Monteith 1993; Bargh 1999; Blair 2002), many of these projects fall under the category of unconscious bias training and address the issue of raising self-awareness of such processes.

Methodological background
Our methods are inspired by (but do not exactly copy) the matched-guise design (see Lambert et al. 1960; Bradac et al. 2001; Kircher 2016), and later developments of this method whereby computer simulations and manipulations have opened up new possibilities (see Campbell-Kibler 2008; Connor 2008; Lindvall-Östling et al. 2019). The matched-guise technique involves respondents evaluating the personal qualities of what they believe to be different speakers on the basis of what are in fact different recordings of the same speaker using different linguistic varieties. The method was initially developed to measure attitudes towards a specific language, dialect, or accent. The matched-guise test is still widely used today to test how judgement is affected by stereotyping in various disciplines ranging from sociolinguistics and social psychology to business, law and medicine (Lawson and Sachdev 2000; Dixon et al. 2002; Carson et al. 2004; Buchstaller 2006).
Several points of critique have been raised about the traditional matched-guise technique. First, the artificiality and experimental nature of the design has been criticized (see Lee 1971; Fasold 1984; Laur 1994). One argument is that asking participants to evaluate the juxtaposed versions of the same monologic spoken material, often a read passage, in laboratory conditions, is too far removed from real-life language situations to be of generalizable relevance. Further, the design excludes more authentic-like language production since identical versions of longer strings of authentic spontaneous speech produced by supposedly different speakers would be implausible, and the covert factor, on which the design relies, would thus be lost. This fact also means that the language stimuli in the classic matched-guise designs have been contextless, a critique raised by several authors (see Bierbach 1988; Bradac et al. 2001, for example).
One way around some of the issues raised above has been to split respondents into two randomized groups, where each group gets to hear only one of the two samples. This modification opens up the possibility for the stimuli recordings to be much more complex, including dialogue which allows contextual features such as speaker roles and purpose to be incorporated. This type of procedure, which we have also adopted in many of our method designs, however, requires larger groups of respondents for statistical reliability (Stefanowitsch 2005).
Another point of critique of the matched-guise technique has been that even when working with the same actor, it is impossible to entirely control for unwanted background variables such as speed, intonation or pitch. Producing two separate recordings where the only variable that differs is accent is extremely challenging, if not impossible (Tsalikis et al. 1991). When multiple actors are used, as traditionally has been the case when gender is being investigated, this challenge becomes even more problematic.
Recent developments in technology have afforded entirely new possibilities in the traditional matched-guise design. Pioneers in this field, Graff et al. (1986), manipulated recordings from Afro-American speakers in Philadelphia, using digital cut-and-paste techniques, to create versions containing regional vowel markers specific to White speech in the area. Similarly, Fridland et al. (2004) were able to use acoustically manipulated vowel variants in monosyllabic tokens to create synthetically manipulated guises of Southern Memphis accents (see also Campbell-Kibler 2008; Labov et al. 2011, for more examples of cut-and-paste manipulations). Further, electronic manipulation to match the speech rates and intensity levels of recordings has been used since the early 1990s (see Podbresky et al. 1990). More recently, pitch has also been successfully manipulated to create identical recordings where only pitch differs (see for example Levon's (2007, 2014) work on perception of "gayness" in speech). In our methods, we have taken inspiration from all of the aforementioned digital techniques, i.e. cut-and-paste techniques for manipulation of individual vowel sounds, digital manipulations to match speech rates (see Lindvall-Östling et al. 2020b) and pitch manipulations (see Deutschmann et al. 2016; Dennhag et al. 2019; Deutschmann and Steinvall 2020; Lindvall-Östling et al. 2020a).
A final, but very serious, point of critique of the matched-guise method is that the experimental nature of the design, where respondents are forced to take a stand on various statements on a Likert rating scale, compels respondents to look for contrast where they might not normally note it. Thereby the technique risks actually evoking stereotypes that would not exist in a normal unconditioned situation (Luhman 1990), something which represents both an ethical and a methodological dilemma. Further, since matched-guise set-ups tend to be binary in nature (e.g. regional vs standard dialect/male vs female/gay vs straight), and at best only explore one situational context, the method does not capture the potential complexity of language attitudes. As pointed out above, the stimuli in matched-guise tests rarely constitute authentic speech, and as Ryan, Giles and Hewstone (1988) note, respondents' attitudes may well manifest very differently when they are actually participating in a speech exchange, instead of just observing it. We will return to some of these issues in our discussion.

Methodological overview
Our methods are not primarily designed to provide rigorous data for sociolinguistic, social psychological or other empirical research focused on revealing stereotyping effects. Instead, our design primarily serves a pedagogic purpose aimed at providing an efficient overview of a specific group's stereotypes related to a specific language issue. The main strength of this method is that data entirely generated by the group itself can be used educationally as a starting point for self-awareness-raising activities and explorative group discussions.
The overall design in all our awareness-raising activities follows the same basic procedure (see Figure 1 below for an overview). First, we introduce the task at hand to the group in question. Since it is important to hide the real intentions of the exercise at this stage, the activity is "camouflaged" as an exercise relevant to the particular course or context that the students/respondents are partaking in. Two guiding factors thereby dictate the design of the cases: first, that they can be presented as believable exercises in the context in which they are being used, and second, that there is minimal risk that students/respondents discover the real aims of the exercise. The true purpose of the case activities is revealed as soon as responses have been collected. At this stage, the participants are also given an opportunity to decide whether their results are to be included in the research database (see below). The design has been approved by the Swedish Ethics Review Authority.¹ After the introduction, participants are given a link to a SurveyMonkey questionnaire. This questionnaire collects relevant metadata about the respondents (age, sex and national identity, for example); creates an anonymized ID, so that we can follow up information from a particular respondent in the post-survey without knowing their identity; records baseline stereotype measures, which can be followed up in the post-survey; and contains the recording and response surveys relevant to the exercise. The survey also contains a randomizer tool that splits the group into two subgroups, where one group is directed to version A of the recording and the other group to version B.
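The two technical steps in this procedure, creating an anonymized ID and randomly splitting the group between version A and version B, can be illustrated with a minimal Python sketch. It is not the SurveyMonkey mechanism used in the projects: the function names, the salt value and the idea of deriving the ID from a self-chosen code are assumptions made purely for illustration.

```python
import hashlib
import random


def anonymized_id(self_chosen_code: str, salt: str = "example-course") -> str:
    """Derive a short pseudonymous ID from a code only the respondent knows.

    Hypothetical helper: the same code always yields the same ID, so pre- and
    post-survey answers can be linked without storing a name.
    """
    digest = hashlib.sha256((salt + self_chosen_code).encode("utf-8")).hexdigest()
    return digest[:10]


def assign_versions(respondent_ids, seed=None):
    """Shuffle respondents and split them into two near-equal groups, A and B."""
    rng = random.Random(seed)
    shuffled = list(respondent_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {rid: ("A" if i < half else "B") for i, rid in enumerate(shuffled)}


# Invented example: seven respondents with made-up personal codes.
ids = [anonymized_id(f"code-{n}") for n in range(7)]
assignment = assign_versions(ids, seed=1)
```

Shuffling before splitting keeps the two subgroups as equal in size as possible, which matters in small classes where independent coin flips could leave one version with very few listeners.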
After the response phase, we analyse and summarize the data before gathering the group for the debriefing and discussion seminar. During this seminar, we introduce the project context, give general information about stereotyping effects, reveal the true design of the exercise and, finally, present the specific results for the group. After this, the group is split into smaller discussion groups to discuss the results and relevant questions in a more intimate setting. Finally, the group is reassembled and each subgroup is given an opportunity to present a summary of their discussions and reflections.
In the post-survey, we ask respondents to give qualitative input and reflections on how they have perceived the activity. Further, we attempt to measure self-estimation of stereotyping awareness-raising effects quantitatively by comparing answers to the question "To what extent do you think that you are influenced by stereotypical preconceptions (conscious or unconscious) in your expectations and judgements of others?" (0-100 scale, where 0 = not at all and 100 = very much so) provided before and after the exercise. Note that this question is also posed in the initial survey, prior to the experiment.² Finally, we ask students to reflect on ethical or other design-related issues and also ask them whether they are willing to include their responses in our research database.
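The before-and-after comparison of the 0-100 self-estimation question can be sketched as follows. This is only an illustrative sketch: the function name and the numbers are invented, and matching answers via the anonymized ID is assumed from the description above rather than taken from the projects' actual analysis scripts.

```python
from statistics import mean


def paired_change(pre: dict, post: dict) -> dict:
    """Return the change (post minus pre) for every ID present in both surveys.

    Respondents who answered only one of the two surveys are left out, since
    no individual change can be computed for them.
    """
    shared = sorted(set(pre) & set(post))
    return {rid: post[rid] - pre[rid] for rid in shared}


# Invented 0-100 self-estimates keyed by anonymized ID (not project data).
pre = {"a1": 40, "b2": 55, "c3": 70, "d4": 30}
post = {"a1": 60, "b2": 50, "c3": 85}

changes = paired_change(pre, post)
mean_change = mean(changes.values())
```

A positive mean change would indicate that respondents, on average, rate themselves as more influenced by stereotypical preconceptions after the exercise than before it.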

Findings
Since the start of the projects, we have conducted awareness-raising activities according to the overall scheme described above using eight different case designs with over 2,300 respondents, in more than 60 student groups from courses in language, teacher training, psychology, sociology and law. We also conducted C-/RAVE activities with several groups of professionally active language teachers. We primarily worked in two cultural contexts: Sweden and the Seychelles. The Seychelles was chosen as a primary context for C-RAVE activities since the nation has a rather unique situation in terms of structural biases and prejudices: it has been described as a matrifocal society where women and girls have many advantages over men and boys (for a more thorough discussion of gender structures in the Seychelles, see Geisler and Pardiwalla 2010 and Deutschmann and Steinvall 2020), and it is generally acknowledged as a multiracial Creole society with minimal racial and ethnic antagonism (with the exception of recent developments as regards attitudes towards Indians, see Lindvall-Östling et al. 2020b). Further, its limited size and very centrally controlled education system make the Seychelles a perfect "laboratory" for educational research.
The subsequent sections provide brief accounts of two case designs conducted within the projects. We limited our choice here to examples of cases that specifically approach language-related issues. Note however that descriptions and results from several other cases conducted that approach language matters and stereotyping issues relevant to other subject domains such as law and psychology are available from our website.³

Case activity 1a: gender stereotypes and conversational styles

Design
One of our prime target groups for our projects has been language teachers. Most language teacher training programs will include course modules in sociolinguistics, where gender and language normally constitute one of the topics. Designed to be used in such contexts, the aim of the following case is to raise awareness about stereotypes surrounding gender and conversational styles with special focus on conversational management. The case was presented as an exercise where students were asked to listen and respond to a dialogue between two researchers discussing an article about biased and gendered language representations in popular media, a relevant and credible topic in a course in sociolinguistics aimed at teacher trainees. The students were then asked to respond to statements regarding aspects of the conversational behaviour of one of the speakers.
² For more detailed information and descriptions of the method, see our website: https://www.stereotyping.se/.
³ Note that all cases are available for open-access use on https://www.stereotyping.se/foreducators.html.
We wrote a script with the ambition that it should be a balanced collaborative conversation where neither of the two participants was dominant or confrontational. After recording the conversation, one of the actors' voices was digitally manipulated for pitch, producing two versions of the conversation: one where Researcher A sounded like a male and one where s/he sounded like a female (see Lindvall-Östling et al. 2019, 2020a for more complete descriptions of the method). The "believability" of the manipulations, i.e. that they sounded authentic and natural, was tested with a panel of 13 colleagues prior to the case activities.

Response patterns
In the case exercise, respondents were asked to evaluate the language performance of Researcher A by responding to statements (on a 7-point Likert scale) that highlighted assertive features, such as taking up floor space, interrupting, being contradictory, arguing forcefully, as well as more affiliative features, such as signalling interest, being supportive and being sympathetic. The overall response patterns of 134 respondents from six classes of language teachers that partook in the case activities in Sweden are summarized in Figure 2.
As is evident from Figure 2, the response patterns show clear reverse linguistic stereotyping tendencies (Kang and Rubin 2009). Respondents who listened to the male version tended to give higher scores on conversational features associated with assertive conversational styles (e.g. taking space, interrupting, contradicting), while respondents who listened to the female version tended to give higher scores on conversational features associated with affiliative conversational styles (e.g. being supportive, signalling interest and not taking too much space). In accordance with Collins et al.'s (2012: 379) model, it thus seems that stereotyping "directs and distorts cognition" in such a manner that respondents focus on, and note, conversational behaviour that is stereotypically associated with masculinity in particular when listening to the male version and vice versa (cf. Holmes 2006: 6 above). The response pattern illustrated in Figure 2 was mirrored in each of the six classes who partook in the activities and thus constituted the starting point for discussions and subsequent reflections in the post-survey.

Debriefing discussions
Many of the classroom discussions concerned the surprise element in the design and the implications of the findings, both for general aspects of stereotyping in society and for more specific issues related to professional practice. A common discussion topic was the important, but difficult, role teachers play in helping to shape the behaviour of children and young adults. Here discussions highlighted stereotypes of how boys and girls behave and communicate in the classroom, and how applications of such models can actually lead to these stereotypes becoming realities if left unchecked, or even encouraged, by teachers. There was general consensus that teachers need to be made aware of these aspects, and that each pupil should be treated as an individual rather than as a representative of a group.

Quantitative self-estimation of awareness raising
In order to gain a quantitative measure of self-evaluations of raised self-awareness of stereotyping resulting from the activities, we compared responses to the question "To what extent do you think that you are influenced by stereotypical preconceptions (conscious or unconscious) in your expectations and judgements of others?" (0-100 scale, where 0 = not at all and 100 = very much so) prior to and after the experiment. We could match the responses of 107 respondents who answered this question both in the pre- and in the post-survey (i.e. before and after the activities). There was a significant average increase of 7.4 units, from 50.8 in the pre-survey to 58.2 in the post-survey (p < 0.001 in a two-tailed paired t-test). We see this result as a clear indication that the activity led to increased self-awareness of the effects of stereotyping on perceptions, at least in the short term.
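As a minimal sketch of the kind of pre/post comparison reported above, the following Python snippet computes the mean change and the paired t statistic for matched scores. The function name `paired_t` and the six score pairs are invented for illustration; they are not the project's data, and the snippet is not a reconstruction of the authors' actual analysis.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (mean difference, t statistic, degrees of freedom) for matched scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    # Work on the per-respondent change, which is what a paired test examines.
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the changes
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


# Invented 0-100 self-estimates for six hypothetical respondents.
pre = [40, 55, 60, 35, 50, 70]
post = [50, 60, 65, 45, 55, 72]
mean_diff, t_stat, df = paired_t(pre, post)
print(f"mean change = {mean_diff:.1f}, t({df}) = {t_stat:.2f}")
```

The two-tailed p-value is then read from the t distribution with n - 1 degrees of freedom; in practice a statistics package (for example `scipy.stats.ttest_rel`) would return both the statistic and the p-value directly.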

Qualitative analysis of open post-survey responses
In order to gain a more qualitative overview of how the activity had affected the participants, we included two general and open questions in the post-survey: "What will you take with you from this 'experiment' and the following discussions into your future profession?" and "Were there any aspects of the design that worked particularly well, or not very well at all?" Of the 134 respondents who took part in the activities, 118 answered the first question in the post-survey. In our analysis of the responses, we have tried to elucidate common themes and patterns (cf. Braun and Clarke 2006) in the answers by looking at details such as choice of pronouns, commonly occurring verbs and nouns, the direct and indirect objects of the sentences, and themes of additional clauses, for example. The aim has been to identify what constitutes the "typical answer" as well as commonly occurring deviating patterns, as summarized in Figure 3 and discussed below.
As illustrated in Figure 3, approximately half (58/118) of the responses included elements of explicit self-reflection (i.e. the subject or direct object was the first-person singular): "It has made me aware of how I am affected by norms" and "I will do my best in thinking twice before I act in certain situations." There were also 27 instances of more distanced reflections ("You should be aware that how you speak and what you say does not really depend on whether you are male or female, it depends on who you are as a person."). Many responses (33) also lacked explicit subjects as regards who the reflection referred to ("Be aware that interaction is way more complicated than people say it is."). Modal constructions signalling strong obligation and commitment were also common in the responses (71/118): "I need to be more neutral when listening to males and females" and "I will actively consider my personal prejudices and try to refrain from letting them affect my judgement of students," for example.
Various aspects of awareness raising were referred to in most answers (90/118). The most common constructions included the stem aware (for example, "Be more aware of stereotypes!" and "It has raised my self-awareness of how I am a product of my times when it comes to norms"), but other expressions signalling similar meaning also occurred, such as "It has made me see the stereotypes that I carry with me. I think of myself as fairly good at equal treatment of my students, but I now know that I should never stop working on my prejudices and biases" and "It might have served as an eye-opener to make sure that I don't treat people based on a specific stereotype I might have." Further, a majority of the respondents (89/118) specified which aspects they had become more aware of, such as stereotypes, preconceptions, assumptions and beliefs. Approximately half of the responses (51/118) also referred to how this awareness in turn might affect cognition and behaviour (judgements, interpretations and assumptions). Overall, 19/118 responses made specific reference to language aspects ("I'll think more about how I react to different styles of speaking").
Notable in the answers was that respondents communicated relative certainty in their claims, and hedging was uncommon (only six instances, for example "I guess, the most important thing that I have learnt from this experiment is that unfortunately we cannot change our society, its norms and stereotypical preconceptions. But we can try to be aware of them."). Finally, although the majority of responses were rather general, 32 specifically referred to gender stereotyping and 27 referred to educational contexts specifically. Interestingly, 12 responses referred to the importance of recognizing cultural differences in stereotyping, a topic which was specifically taken up in the debriefing by showing respondents' results from similar activities conducted in the Seychelles (see Case 1b below). Four respondents claimed that the activity had not taught them anything new at all, and one respondent disturbingly claimed that the activity "Probably also made me even more anti-feminist, though in terms of students hopefully objective." This rare, but problematic, type of response, which indicates that the activities in some cases may arouse prejudiced views, will be discussed later.
In summary, a typical answer contained the key elements summarized in the following template answer: "The experiment has made me aware/see/conscious of my own stereotypes and preconceptions. I need to take these into consideration in order to change/be fair/be objective in my behaviour/judgements (towards students/girls and boys)."

There were 67 responses to the second question, "Were there any aspects of the design that worked particularly well, or not very well at all?" Of these, 54/67 were relatively simple positive comments such as "Everything was great! I learned a lot!" and "I was shocked and will remember this Disney recording deception. Very cool." Ten of the positive responses also included specific reference to the surprise element as being particularly successful, as the above example illustrates. Of the remaining more critical comments (13 in all), 6 criticized the relevance of the context and participants of the conversation. The majority of these would have liked to see a school setting and younger speakers ("It was a nice design, however, a different age group might be more relevant for us as a student group (younger people maybe)"). In addition, five critical responses concerned technical aspects, often software and hardware issues: "I have a Mac-computer (if that is a factor), and the survey-part did not appear."

6.2 Case activity 1b: a cross-cultural comparative approach to gender and conversational styles

Under the framework of the C-RAVE project, we also conducted the same case exercise as described above with teacher-trainee groups in the Seychelles.

Response patterns
The overall response patterns of 97 respondents from four classes of language teachers that took part in the case activities are summarized in Figure 4. The response patterns that emerged from the awareness-raising activities in the Seychelles were somewhat different from those seen in the Swedish contexts. In contrast to the Swedish data, there were clear tendencies for respondents in the Seychelles to give higher scores on conversational features associated with assertive conversational styles (taking space, interrupting, contradicting, for example) when listening to the female version of the recording. Very few differences were observed between the responses to the male and the female versions regarding features associated with affiliative conversational styles (being supportive, signalling interest and not taking too much space).

Debriefing discussions
A central theme in the debriefing discussions in the Seychelles was how the results differed from the Swedish results and what this said about cultural differences as regards stereotypes. It became apparent from these discussions that assertive conversational strategies, such as being argumentative, interrupting and taking up conversational space, were not considered to be stereotypically masculine behaviour but rather feminine in the Seychelles. This could potentially explain the differences in the response patterns in the two contexts: both respondent groups focus on, and note, conversational features that are stereotypically associated with female behaviour when listening to the female version and vice versa. However, these gender stereotypes differ between the two contexts, which in turn explains the differences in response patterns.
There are some indications that support the above hypothesis. For example, and as mentioned above, the Seychelles have been described by many as a "matriarchal" or "matrifocal" society (see Geisler and Pardiwalla 2010 and Deutschmann and Steinvall 2020), and according to Geisler and Pardiwalla (2010: 63), "women in the Seychelles are considerably more empowered than […] in other regional countries," with women occupying 62% of the civil service as well as a large proportion of managerial positions in the private sector. Further, gender stereotypes that emerged from Deutschmann and Steinvall's study (2020) confirm that men are generally seen as disempowered, unreliable and lazy in the Seychelles. However, more extensive studies are needed before we can make any firm claims as regards cultural differences in gender stereotypes between the two contexts.
The discussions also concerned how gender stereotypes may lead to differential expectations and treatments of boys and girls in school contexts. Unequal gender structures in the Seychelles schools favouring girls have been highlighted in several studies (Ministry of Education and Youth 2002; Geisler and Pardiwalla 2010; Deutschmann and Zelime 2015), and according to Hungi and Thuku (2010), the islands have the largest gender differences in educational achievement in the region. The importance of being aware of how gender stereotypes can feed negative feedback loops was discussed, and here there was general consensus that teachers had a special role to play.

Quantitative self-estimation of awareness raising
We matched the responses of 72 respondents who answered the question "To what extent do you think that you are influenced by stereotypical preconceptions (conscious or unconscious) in your expectations and judgements of others?" both in the pre- and in the post-survey (i.e. before and after the activities). There was a significant average increase of 13.8 points, from 46.5 in the pre-survey to 60.3 in the post-survey (p = 0.03 in a two-tailed paired t-test). We see this result as an indication that the activity led to increased self-awareness among the respondents.

Qualitative analysis of open post-survey responses
Of 97 respondents, 72 answered the question "What will you take with you from this 'experiment' and the following discussions into your future profession?" in the post-survey. Qualitative analysis of these responses gave very similar results as in the Swedish trials (see Figure 3 and Case 1a above): various aspects of awareness raising were mentioned in 65 of 72 responses: "I have become aware that I should never judge someone without knowing them […] yes I used to!," "The experiment was an eye-opener!" and "I have become aware that stereotypes can cloud our judgement and thinking." Just as in the Swedish sample, many answers also made specific reference to cognitive and behavioural aspects (judgements, preconceptions and perceptions) that may be affected by stereotyping: "I have learn that we should not judge a person just by looking at him, thinking a lot of negative things without giving him a chance." There were also some notable differences between the responses in the two groups. First, specific reference to self-reflection (i.e. first person singular) was less common in the Seychelles sample. In more than half of the responses (42/72), the collective first person (we, our, and us) was used in the reflections: "It showed something that maybe we fail to see and pay attention to in our everyday lives and how such preconceptions really can affect others and the whole society" and "How preconceived notions tend to affect our judgment." Second, the Seychelles respondents referred to cross-cultural aspects more frequently (22/72) than the Swedish respondents (12/118): "I have become aware that language and culture can influence our judgements," "I have learned that the Seychelles society is very different than that of the Swedish society" and "I have learned that stereotyping is different in different countries."
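The contrast between first-person singular and collective first-person reflections described above can be illustrated with a rough first-pass coding sketch. The analysis reported in this article was carried out manually, so the pronoun lists, the label names and the function `code_perspective` below are assumptions made purely for illustration; a keyword tally of this kind could at most pre-sort responses before a human reading. The example responses are quotations from the text.

```python
import re
from collections import Counter

# Assumed pronoun inventories; a manual analysis would also weigh context.
SINGULAR = {"i", "me", "my", "mine", "myself"}
COLLECTIVE = {"we", "us", "our", "ours", "ourselves"}


def code_perspective(response):
    """Label a response by the first-person pronouns it contains."""
    # Splitting on letters only means "I'll" still yields the token "i".
    words = set(re.findall(r"[a-z]+", response.lower()))
    has_singular = bool(words & SINGULAR)
    has_collective = bool(words & COLLECTIVE)
    if has_singular and has_collective:
        return "mixed"
    if has_singular:
        return "singular"
    if has_collective:
        return "collective"
    return "none"


examples = [
    "It has made me aware of how I am affected by norms",
    "How preconceived notions tend to affect our judgment.",
    "The experiment was an eye-opener!",
    "I have become aware that stereotypes can cloud our judgement and thinking.",
]
counts = Counter(code_perspective(r) for r in examples)
print(counts)
```

The "mixed" label is a design choice of this sketch: responses that combine "I" and "our" are kept apart rather than forced into one category, since deciding how to count them is an analytic judgement.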
Finally, 32/72 responses made specific reference to gender stereotyping ("It made me aware of the different perceptions and the thinking processes based upon female and male gender stereotypes"), and 23/72 responses specifically referred to educational contexts: "This exercise will enable us as teachers to change the way we think about people, especially our pupils." and "We need to stop the stereotypical thinking especially if we are using it in our everyday job with the pupils." In summary, the typical answer contained the following elements: "The experiment has made us aware/see/conscious of our gender stereotypes and preconceptions, and how these differ from Sweden. We need to stop stereotyping boys and girls, especially in the classroom."

There were 43 responses to the second question, "Were there any aspects of the design that worked particularly well, or not very well at all?" Of these, 37 were very positive: "It was great!" and "It was really interesting! I was shocked at how prejudice we are!" The few critical responses primarily dealt with connectivity issues.

6.3 Case Activity 2: the effects of accent bias in the evaluation of oral language performance in Swedish

Design
This case is aimed at Swedish teachers and teacher trainees with the goal of showing how stereotypes surrounding a non-native accent may affect judgements of other aspects of a speaker's general language performance. To hide the real purpose of the exercise, the case was initially contextualized as an evaluation exercise, where the respondents were told that they would listen to an interview with a teenage male, and where they would then be asked to evaluate his language performance. Again, they were not aware of the fact that there were two versions of the recording at this stage. Participants were also told that there would be a follow-up seminar when their results would be discussed.
The script represents a teenage boy being interviewed on the language situation at his school. The language is fairly typical of youth language: hesitant, full of invariant tags such as eh, okay, right and yeah, incomplete sentences and some grammatical mistakes motivated by false starts (see Palacios 2014, for example). A vowel sound that signals native/non-native Swedish accents was then manipulated, using cut-and-paste methods and vowel distortions in Praat (cf. Graff et al. 1986; Fridland et al. 2004), in order to produce two versions of the recording. More specifically, the manipulation targeted the [ʉ:] sound, which is often reproduced more rounded and further back as an [u:] sound (basically a shift from u > o) by non-European immigrants with Arabic, Persian and Somali as first languages (see Thorén 2010, for example). In short, the two versions were identical apart from the pronunciation of this signal vowel sound, something which did not interfere with comprehensibility.

Response patterns
Respondents were asked to evaluate the language performance of the interviewee by responding to statements (on a 7-point Likert scale) that focused directly on aspects such as understandability, variability, language structure, adaptability to the context, vocabulary, fluency, grammar and pronunciation. These statements were inspired by the Swedish School Authorities' (Skolverket) recommendations for evaluations of the oral part of the national test.
The overall response patterns of 290 respondents (from four classes of secondary school language teacher trainees and five classes of primary school teacher trainees) who took part in the case activities in Sweden are summarized in Figure 5.
The manipulated version (i.e. the version manipulated to sound non-native) was evaluated significantly more favourably on all variables except pronunciation (see Figure 5). In other words, respondents evaluated what they perceived to be a non-native Swedish speaker as easier to understand, as having more varied language, more structured arguments, better adapted language, richer vocabulary and more correct grammar than the native Swedish speaker. Overall, the primary teacher trainees evaluated the performance of both versions more favourably than the secondary school language teacher trainees. These response patterns were repeated for all nine groups and formed the starting point for the subsequent debriefing discussions and post-survey reflections.

Debriefing discussions
The discussions that emerged from these response patterns primarily concerned issues of objectivity in the evaluation of students. Participants pointed out that they gave favourable evaluations of the manipulated version since they assumed that the person was a non-native speaker and judged his language skills to be good "for being a non-native." The unmanipulated version was obviously not judged on these criteria, which raised discussions on objectivity in evaluations and on the importance that a certain grade reflects a certain level of language skills, regardless of whether the person is a native or non-native speaker.
During the discussions, the role of summative evaluations as a pedagogic tool was also considered. Many students pointed out that evaluating the same performance differently depending on whether it was a native vs a non-native speaker made sense from a formative (but not from a summative) point of view. Many claimed that praising someone for their linguistic competence, taking his/her linguistic prerequisites into account, is something that we do, and should do, all the time in the language classroom.

Quantitative self-estimation of awareness raising
We could match the responses of 180 respondents who answered the question "To what extent do you think that you are influenced by stereotypical preconceptions (conscious or unconscious) in your expectations and judgements of others?" both in the pre- and in the post-survey (i.e. before and after the activities). There was a significant average increase of 7.7 points, from 55.8 in the pre-survey to 63.5 in the post-survey (p < 0.001 in a two-tailed paired t-test). We again see this result as an indication that the activity led to increased self-awareness among the respondents.

Qualitative analysis of open post-survey responses
A total of 138 of the 180 respondents who completed the post-survey answered the question "What will you take with you from this 'experiment' and the following discussions into your future profession?" Qualitative analysis of these responses gave very similar results as in Cases 1a and 1b: various aspects of awareness raising were mentioned in 121 of the 138 responses: "I have become aware of how we can evaluate students unconsciously on the basis of their accents only" and "The experiment has opened up my eyes to the preconceptions that may exist among teachers." Just as in Case 1, many answers also made specific reference to cognitive and behavioural aspects (judgements, preconceptions and perceptions), and a number of answers made specific reference to how stereotyping may affect evaluations and grading (24/138). The typical answer can thus be summarized as variants of the following content: "The experiment helped me to become aware of/see/pay attention to/think about how stereotyping can affect my actions in general, and my assessment/grading in particular." Finally, relatively few respondents (8/138) made specific reference to stereotyping based on ethnicity and/or native/non-native accents.
There were 91 responses to the second question, "Were there any aspects of the design that worked particularly well, or not very well at all?" Of these, 72 were very positive: "Exciting!", "All worked well!" and "The manipulated accent worked really well. I was really fooled! My own father speaks with a foreign accent and speaks just like this!" Many of the positive comments (34) also made reference to the surprise element: "You really fooled me. Good work!" There were also a few critical issues brought up: 14 respondents complained about the "performance" of the actor, which some did not perceive as authentic; 5 respondents pointed out that they had had technical difficulties; and 1 thought that the experiment might encourage prejudicial views: "Asking us to judge someone on the basis of pronunciation risks reproducing thoughts that foreign people are less smart. Some do not take in all the research results but choose details that suit their agenda, and given the rampant racism in these times this is risky." We will return to this critique in the final discussion.

Discussion
Overall, our results are encouraging, and we have been able to show that our method raises general as well as specific awareness of how stereotyping can affect our judgements and perceptions. We have now conducted several successful awareness-raising activities in various course programs. Evaluations from case activities have generally been very positive. One of the great strengths of the method has been that we can capture specific stereotyping patterns for a particular group and context, something which has also led to very focused and relevant discussions.
While our methods have been very successful in raising awareness and in initiating discussions about stereotyping related to a multitude of issues in various learning contexts, we are also able to identify several flaws. First, as pointed out by others (Lee 1971; Fasold 1984; Laur 1994), the juxtaposed binary nature of matched-guise-inspired set-ups creates a very artificial situation. Arguably, the design with predetermined response questionnaires, for example, forces respondents to make interpretations in ways that do not necessarily reflect aspects they would focus on in complex real-life situations. In this way, we may be "uncovering" language triggers and stereotype effects that may only have minor influence on judgements in real-life situations, and thereby falsely assign a disproportionate importance to these. Consequently, we may contribute to highlighting and evoking stereotype issues that do not really exist or that are of minor importance. By so doing, and as pointed out by the respondent in Case 2, there is a risk that the activities described above are party to strengthening, rather than dismantling, existing stereotypes.
As an illustration, we can problematize Case activity 2 (the effects of accent bias in the evaluation of oral language performance in Swedish). In a real-life situation, set-ups for evaluation would be quite different. First, teachers would be familiar with the person in question and would be present and actively involved in the language event. In short, the lack of context in the exercise creates a real problem (cf. Bierbach 1988; Bradac et al. 2001). Further, in a real evaluation exercise, teachers would have tools available to help in objective evaluation and would likely discuss and compare their grading with colleagues to prevent unwanted bias. Accordingly, the results from our study run the risk of creating the false impression that Swedish teachers give non-native Swedish speakers higher grades than they deserve. By extension, we may thereby inadvertently be contributing to prejudice and discrimination that undervalue the competence of individuals on the basis of their accent and nationality, by giving substance to claims that reports and grades are not a true reflection of knowledge and skills since teachers have a tendency to set "better-than-deserved grades" for non-native speakers out of kindness.
Another point of critique that we can raise about our set-ups is that, for reasons of clarity and anonymity, we merely present the average responses of the groups in our debriefing/discussion seminars. This can create the false impression that all respondents act in a similar manner, which is of course not the case. Averages are compiled from a range of data, and a difference of one point on a Likert scale between two groups, for example, still often implies a great deal of overlap; significant average differences can thus be the result of relatively few extreme differential responses. Put simply, what is presented as a difference between two groups as a whole may in fact only reflect a difference between a relatively small number of individuals in those groups (cf. Sunderland 2000). Of course, we try to highlight this issue in the debriefing seminars by explaining the complexity of the data presented, but the question is whether the lasting impression is nonetheless one of difference rather than similarity. Ironically, we may thus be part of the problem of creating stereotypes about stereotypical behaviour!

A final point of critique is that while our projects may raise self-awareness about stereotyping, an important initial stage in combating such tendencies, it is beyond our scope to fully follow this up with subsequent exercises and workshops. The university course contexts we work within seldom have enough time allocated for this type of work, as syllabi are overloaded and teaching time limited. However, in other institutions, models such as the Prejudice Habit-breaking Intervention model, which include interventions such as Stereotype replacement, Counter-stereotypic imaging, Individuation and Perspective taking, have proven successful in achieving long-term results (see Devine et al. 2012 for further descriptions of these methods). Though recommendable, such models do require a lot of time and resources. In the best of worlds, our model would work well as an introductory self-awareness-raising activity in such set-ups. However, we maintain that the case activities developed under the projects have a value of their own, something which has been confirmed in numerous evaluations from our participants.
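The earlier caveat about group averages can be made concrete with a small numerical sketch. The two sets of 7-point Likert ratings below are invented for illustration and do not come from the case activities; the helper name `share_within_range` is likewise hypothetical. The sketch shows a gap in group means that is produced by two extreme ratings while most individual ratings overlap.

```python
import statistics


def share_within_range(group, reference):
    """Proportion of ratings in `group` that fall inside the range of `reference`."""
    low, high = min(reference), max(reference)
    return sum(low <= rating <= high for rating in group) / len(group)


# Invented ratings from two hypothetical groups of ten listeners each.
version_a = [3, 4, 4, 4, 4, 4, 4, 5, 7, 7]
version_b = [3, 4, 4, 4, 4, 4, 4, 4, 4, 5]

mean_gap = statistics.mean(version_a) - statistics.mean(version_b)
median_a = statistics.median(version_a)
median_b = statistics.median(version_b)
overlap = share_within_range(version_a, version_b)

print(f"difference in means: {mean_gap:.1f}")
print(f"medians: {median_a} vs {median_b}")
print(f"share of group A inside group B's range: {overlap:.0%}")
```

Reporting a median or an overlap figure alongside the mean is one possible way of conveying, in a debriefing seminar, that a difference between group averages need not describe the typical respondent.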

Figure 2: Response patterns (7-point Likert scale) to statements of conversational behaviour of Speaker A, male and female manipulations (n = 134). Red represents responses to the female version of the recording while blue represents responses to the male version.

Figure 3: Overview summary of constituents in responses to the question "What will you take with you from this "experiment" and the following discussions into your future profession?" (n = 118).

Figure 4: Response patterns to male and female manipulations, Seychelles respondents (n = 97). The red bars represent average responses on a 7-point Likert scale to the female manipulation while blue represents responses to the male manipulation.

Figure 5: Differences between perceptions of language performance in unmanipulated (dotted lines) vs the manipulated version (solid lines) among Swedish secondary (red) and primary (blue) teacher trainees (n = 290).