An Exploratory Study on Linguistic Gender Stereotypes and their Effects on Perception

This study explores how stereotypical preconceptions about gender and conversational behaviour may affect observers’ perceptions of a speaker’s performance. Using updated matched-guise techniques, we digitally manipulated the same recording of a conversation to alter the voice quality of “Speaker A” to sound “male” or “female.” Respondents’ perceptions of the conversational behaviour of Speaker A in the two guises were then measured with particular focus on floor apportionment, interruptions and signalling interest. We also measured respondents’ explicit stereotypical gender preconceptions of these aspects. Results showed that respondents perceived the male guise as having more floor apportionment and interrupting more than the female guise. Results also indicated that the respondents had explicit stereotypes that matched these patterns, i.e. that interrupting and taking space were deemed to be stereotypically male behaviour, while signalling interest was deemed to be a female feature. The study suggests that stereotypical preconceptions about gender and conversational behaviour may skew perceptions of similar linguistic behaviour.


Introduction
Stereotyping has been envisaged as a selective filter that directs and distorts cognition and has, according to Levon (2014), a profound impact on the perceptions, and by extension on the judgements, of the people we encounter. In short, stereotyping means that we are likely to take in information that fits our model expectations of a particular social group and ignore details that do not (Talbot, 2003; Collins and Clément, 2012; Levon, 2014). The present study aims to investigate this phenomenon in relation to gender and interactional styles, with specific focus on pragmatic functions related to conversational management. The main goal is to see whether gender stereotypical preconceptions of conversational behaviour, such as taking space, interrupting or signalling interest, affect observers' perceptions of how an interlocutor acts in a conversation. Put simply, we want to see whether the same behaviour is interpreted differently depending on the perceived gender of the speaker. For this purpose, we have used a method inspired by matched-guise methodology (Lambert et al., 1960; Ko et al., 2009) to produce two variants of the same recording of a conversation between two interlocutors, where the only difference between the two versions is the perceived gender identity of one of the speakers as signalled by digitally manipulated voice quality.


The current study breaks new ground in a number of ways. Unlike previous matched-guise studies, which have been based on short monologic utterances or readings, this study focuses on matched-guise interpretations of interaction as represented by the behaviour of a person in a constructed dialogic case. Furthermore, because the gendered variants of our cases are digitally manipulated versions of the same recording, we are able to eliminate unwanted background variables such as variation in pacing, pausing, voice level, etc. Such details have been highlighted as important in influencing interpretations, but are problematic to control for when different recordings, or indeed different "actors," are used to produce the input stimulus (Tsalikis et al., 1991). The current study was conducted under the project Raising Awareness through Virtual Experiencing (RAVE), a primarily pedagogic project financed by the Swedish Research Council (VR), with the aim of developing methods for raising students' awareness of how stereotypical preconceptions can affect our interpretations and judgements of others.

Gender stereotypes and research surrounding interactional styles
A primary feature in person perception (Ellemers, 2019) and traditionally viewed as a binary category, gender is a social category that seems to invite stereotypical categorization. Following Biernat and Sesko (2018) among others, the content of a gender stereotype is here viewed as a schematic framework representing the attributes people associate with members of that gender category, including behaviours, features and traits (compare also Beukeboom and Burgers, 2019). Such attributes are also informed by ideology and can be viewed as normative, prescribing behaviour that individuals have to respond to (Talbot, 2003: 472).
The process, stereotyping, means applying such a framework of expectations, the stereotype, onto individual cases. This "ubiquitous feature of everyday life" (Macrae et al., 1994: 45) has been described as a largely automatic and reductive categorization of people which emphasizes rapidity and efficiency but leaves little room for individuality and variation (see, for example, Fiske and Neuberg, 1990; Macrae et al., 1994; Knippenberg and Dijksterhuis, 2000).
Within a research framework that is, for convenience, frequently labelled "the difference approach" (Cameron, 1996), a number of same-sex conversational studies conducted in the 1980s and 1990s demonstrated tendencies for women to be more collaborative, facilitative, conciliatory, indirect, affective and person-oriented than men, who tended to be competitive, confrontational, direct and task-oriented in conversations (see, for example, Nolasco and Arthur, 1987; Holmes, 1995; Coates, 1996; Cheshire and Trudgill, 1998). Situated in a later research tradition, researchers such as Talbot (2003) and Holmes (2006: 6) argue that these studies have strengthened stereotypes of male and female discursive styles and thereby strongly contributed to the "normative and unmarked gender identity" expectations of white middle-class men and women. As such, men being seen as competitive and women as collaborative in speech are direct manifestations of proposed models of hegemonic masculinity (Connell, 1987, 2005) and hegemonic femininity (Pyke and Johnson, 2003; Connell and Messerschmidt, 2005; Schippers, 2007). In such models, key characteristics of hegemonic masculinity are toughness, risk-taking, stoicism, competitiveness, violence, achievement, etc. (Donaldson, 1993), while hegemonic femininity creates a hierarchical and complementary relationship to hegemonic masculinity characterized by traits such as cooperativeness, meekness, submissiveness, attentiveness, etc. (Schippers, 2007). Arguably then, the idea of male and female discursive styles is an integral part of established, more general, gender stereotypes.
The real world is of course much more complex. Talbot (2003) and Holmes (2006) among others point to the need to take the full complexity of sources of diversity and variation, such as context and power distributions, into consideration. If not, explanatory variables for a particular discursive behaviour would risk being blurred by gender factors simply because they happen to correlate. Moreover, researchers taking a social constructionist point of view, thus embracing gender as performance, have questioned the unproblematized model of gendered speech as a simple dichotomy of cooperative vs competitive, as elements of both styles can be found in same-sex conversations of both genders (for example, Cameron, 1997; Eckert and McConnell-Ginet, 2013).
Indeed, many researchers are increasingly questioning the focus on gender difference in research. Kaiser et al. (2009) argue that this approach inevitably leads to the detection of differences rather than similarities; and in proposing her "gender-similarities hypothesis," Hyde (2005) was able to show that the effect size of detected gender differences in many studies is non-existent to small, a finding confirmed by other studies such as Leaper and Ayres (2007).
The complexity of context in relation to gender and communication is also addressed in a subject review by Wingate and Palorames (2017). While acknowledging substantial similarities between men's and women's communication, and the fact that effect sizes for differences tend to be small, they stress that subtle differences could still have important outcomes in a certain context. Furthermore, they argue that the perceived gender salience of a situation could be particularly significant. Increased contextual gender salience can lead speakers to adopt gender-prototypical behaviour because of the greater focus on gendered identity in such a context.
A growing body of language-focused research on gender is now focusing on reception and adaptations to context. For example, Ladegaard (2011) was able to show that while both male and female leaders tended to prefer "normatively feminine management styles" (indirect and mitigated directives, interrogatives and modal verbs, for instance), the perception of and response to these strategies from the employees differed. Female leaders were questioned and challenged more frequently than male leaders, indicating that similar styles were interpreted differently depending on whether it was a man or a woman who used them. In another study, Hancock and Rubin (2015) demonstrated that, in a counterbalanced design of eighty 3-min conversations, there were no overall gender differences in the use of "gendered" linguistic variables such as fillers, hedges, interruptions and tag questions among their 40 test subjects. Instead, it was the gender of the trained male or female communication partner that decided whether respondents' conversational styles were more "male" or "female," suggesting that speakers, regardless of gender, accommodated their styles to the gender of the conversational partner. Similarly, in a study by Mulac et al. (2013), the results indicated that the language used by male and female respondents when describing a photograph differed significantly depending on whether they were instructed to describe it to a woman or a man. Further, Hildebrand-Edgar and Ehrlich (2017) could show that context was highly relevant to the perception of gendered speech in their study of a rape trial. In that trial, a woman was perceived as being too "assertive" to be a victim of rape. Finally, Scheelf (2008) found that instructors in academia, regardless of gender, tend to use the same speech style and only very seldom draw on wider identity resources.
All of the above studies seem to suggest that (stereotypical?) social expectations of language behaviour are instrumental in shaping language behaviours and perceptions, and that an interplay of these, in turn, shapes output and perception in specific contexts. In effect, as pointed out by Crawford (1995), social identity expressed through language is consequently something that is renegotiated during every meeting between humans and involves an interplay between the speaker and the receiver as well as implicit expectations defined by context and societal norms. This complex view of sociolinguistic output as the result of negotiations suggests that historical research with focus on language production only gives an incomplete, and potentially misleading, picture of how social identity is reflected through language. For this reason, a shift towards language reception is crucial, motivating the present study with its focus on perceptions of the hearer. As Levon (2014) points out, sociolinguistic perception and production are different; in production individuals can draw on a variety of variable forms to construct a social identity, but in the perception of that construction, a great deal of that nuance could be lost or misconstrued as a result of cognitive modes of processing. In short, listeners' attitudes and preconceptions as well as general cognitive constraints can limit what social meanings are detected in a speech event.

Gender-linked language effect
Of interest in this context is a more systematic model of how stereotyping affects our language expectations, perception and behaviour in various contexts. Mulac et al. (2009) and Mulac et al. (2013) propose the so-called gender-linked language effect (see Figure 1 above) and argue that there are cognitive schemata whereby we sort, judge and produce language based on both contextual information and (un)conscious gender stereotypes. According to Mulac et al. (2013), a listener first perceives the language context, which includes the speaker's perceived gender. Then the speaker's language output is filtered through the listener's gender-linked language schemata and stereotypes, which in turn directly influence the listener's judgement of the speaker. A different context (male speaker rather than female, for example) should then trigger different stereotypes in the general process model, which in turn may result in the listener judging the language output differently.
While the rigidly sequential nature of the model could be questioned, we still find it a useful starting point since it provides a cognitive explanation for linguistic stereotyping. However, we would argue that there is a missing piece in the above model. More specifically, there is little attention paid to how gender-linked schemata may affect the perception of the speech event itself. Studies in phonetics have demonstrated that the perception of speech sounds is affected by the perceived gender of the speaker (see e.g. Johnson et al., 1999; Strand, 1999). Such "Face Gender Effects" (Strand, 1999) clearly indicate that socially constructed schemata affect the perception of speech sounds. However, to the best of our knowledge, no studies have addressed how the perception of a dialogic speech event can be affected by perceived gender.
Following conclusions from Strand (1999), we hypothesize that language-linked schemata often affect not only judgements of speakers directly, or what is occasionally called social perception in the literature, but also the perception of a speech event itself. Edwards (1999), discussing the nature of social perceptions, points out that judgements of speakers arise from stereotypical attitudes that the judges have of a group, which have been associated with a particular linguistic feature (often of a phonetic nature). What we are suggesting here, however, is that the very perception of the speech event is in turn affected by preconceived ideas based on the social information available, such as perceived gender. If this hypothesis is correct, it is not just that our perception of a speech event may trigger stereotypical judgements of speakers; the perception of the speech event itself may also have been affected by stereotype-like schemata, the result of which will then, in turn, affect our judgements of the speaker. For example, an interruption performed by a male speaker may actually be perceived as more aggressive than an interruption performed by a female speaker since the listener is influenced by stereotype-like schemata. This may then influence the hearer's judgement of that speaker: "he is aggressive because he interrupts aggressively." In order to explore this question, the perception of a speech event alone, measured in linguistic units, is thus the main focus of enquiry in this article.

Methodological background
Systematic enquiries into linguistic stereotyping and judgement began more than half a century ago with Lambert et al.'s study (1960). Using the so-called matched-guise method, whereby an actor produced the same text in two or more variants, they were able to show how a brief recording in French vs English (in Canada) triggered different responses regarding the speaker's personality, social status and character, depending on the language/accent of the speaker (Lambert et al., 1960; Bradac et al., 2001). The matched-guise test is still used today to test how judgements of speakers are affected by stereotyping in various disciplines ranging from sociolinguistics and social psychology to business research and medicine (Cargile, 1997; Cargile and Giles, 1998; Lawson and Sachdev, 2000; Dixon et al., 2002; Bilaniuk, 2003; Carson et al., 2004; Buchstaller, 2006). However, in the majority of such studies the focus has been on the hearer's judgement of the speaker, rather than on how the hearer perceives the speech event itself.
A major critique of traditional matched-guise setups has been that it is almost impossible to control for unwanted background variables, even when the same actor/actress is used. Speed, intonation or pitch can all have a significant impact on how something is perceived, and it is very difficult to control for these when making multiple recordings of the same text (Tsalikis et al., 1991). Furthermore, when exploring the gender variable in matched-guise setups, one has to use two actors, making it even more difficult to control for unwanted background variables (accent, for example; see Bilaniuk, 2003). Challenges such as these have limited gender research using matched-guise setups with voice recordings to date, and most studies are limited to very short utterances, for example evaluations based on a simple "hello" (Mcaleer et al., 2014) or a short reading of a passage (Ko et al., 2006). With that said, developments of the method using digital technology have allowed computer simulations and manipulations of a recording to open up new possibilities in perception-based research (see Campbell-Kibler, 2008; Connor, 2008; Lindvall-Östling et al., 2019). In the current study, we use "voice-morphing" techniques to digitally manipulate a dialogic recording in order to create two versions of the same recording where one participant is seemingly male in one version and seemingly female in the other. We are thus able to eliminate many unwanted background variables previously pointed out as problematic in matched-guise designs (Tsalikis et al., 1991).

Voice quality as a trigger for stereotypical judgements
Several studies have shown that the quality of voice can trigger stereotype judgements about speakers. For example, this phenomenon has been studied in relation to sexual orientation (Smyth et al., 2003; Levon, 2007; Fasoli et al., 2017) and, more specifically, in relation to gender and masculine-/feminine-sounding voices (Ko et al., 2006: 41; Ko et al., 2009; Mcaleer et al., 2014). Ko et al. (2006, 2009), for instance, demonstrated that auditory cues (male or female voices) acted as an overall between-category basis for gender stereotyping, whereby female guises were rated higher for warmth, while male guises were rated higher for competence. Similarly, Nass et al. (2006) were able to show that similar automated computer voice evaluations of student performance in a computer task were deemed as more relevant when delivered by a male voice, as opposed to a female voice. In a previous study by Dennhag et al. (2019), the perceived personality and social behaviour of speakers in a dialogue were studied using the same method as in the present study. Crucially, there the focus was on the judgement of the speaker's personality and social behaviour, whereas here the focus is on the speech event itself.

Choice of linguistic variables
Queries into gender differences regarding aspects related to conversational management have been an important focus of linguistic studies over the past decades, and thereby such aspects also hold a central position in standard sociolinguistic course literature on gender and language (Cheshire and Trudgill, 1998; Coates, 2004; Sunderland, 2006). Gender differences in interruptions, for instance, have been the subject of a vast number of studies which have shown men to interrupt more than women (Zimmerman and West, 1975; Kennedy and Camden, 1983; Smith-Lovin and Brody, 1989; Anderson and Leaper, 1998; Zhao and Gantz, 2003). Furthermore, Blair-Loy et al. (2017) have shown that the gender composition of a particular work environment seems to have an effect on the tendency to interrupt. Closely related to the above are gender studies on floor apportionment in conversations, where studies from various contexts such as parliamentary debates (Shaw, 2000), Disney films (Fought and Eisenhauer, 2015) and conversations in various public/semipublic contexts such as seminars, meetings, etc. (Holmes, 2003) have helped to develop and strengthen the stereotype that men "hog the floor" and that women tend to "leave the floor to men" (Holmes, 1995).¹ Gender differences in signalling interest and encouraging conversational partners to speak by posing follow-up questions, for instance, have also been subjects of intense study in sociolinguistics (see Holmes, 1995; Johnson, 1994; Sugawara et al., 2017, for some examples). Again, many studies show this type of activity to be typically female conversational behaviour. In summary then, the linguistic variables chosen as objects of enquiry for this study are all well-researched and described as "gendered," which leads us to hypothesize that they may be part of the respondents' gender-linked language schemata and stereotypes.

Aims and research questions
The overall aim of this study is to investigate whether stereotypical gender preconceptions regarding conversational styles affect perceptions of a speech event, i.e. whether the same speech event is interpreted differently depending on whether the listener thinks she/he is listening to a woman or a man. We have broken down this query into three specific research questions:
1. What are respondents' explicit stereotypical preconceptions of discourse behaviour in relation to the investigated variables floor apportionment, signalling interest and interrupting? Here we seek to acquire a measurement of the respondents' explicit gender preconceptions.
2. Do respondents' interpretations of the identical speech event (in relation to the investigated variables floor apportionment, signalling interest and interruptions) differ depending on whether they think they are listening to a male or female conversant? Here we hypothesize that respondents will especially notice aspects of conversational behaviour that match the groups' stereotypical views regarding male conversational styles when listening to the male guise and vice versa.
3. Is there a correlation between the respondents' explicit stereotypes (research question 1) and their perception of the guise (research question 2)? Here we hypothesize that a respondent who perceives interruptions as a typically masculine feature, for example, will rate the male-morphed guise as interrupting more than a respondent who does not share this stereotype.

Overall framework
The methodological model followed a typical matched-guise setup (see Figure 2 for a visual overview of the method). Respondents were exposed to one of two different guise versions of a recording and were then asked to rate the guise they listened to on different floor management variables in a response questionnaire.


More specific details of the method are described in Sections 3.1-3.5. The respondents were asked to focus on only one of the speakers in the dialogue since pilot runs had shown that respondents had difficulties in concentrating on more than one speaker at a time (Lindvall-Östling et al., 2019). We also administered a short survey addressing the groups' explicit stereotypical preconceptions as regards gendered conversational behaviour and floor management (see Section 3.7).

Introductions and contextualization
In this particular study, it was important not to reveal the exact nature of the experiment prior to the activity. Respondents, who were also students in a regular class on sociolinguistics, were thus told that they were going to listen to a discussion on gender and language between two researchers and that this would later be the focus of a seminar discussion. At this stage, the group was also split into two equally sized "seminar groups." Thus, they did not know the real purpose of the experiment initially, although this was revealed in a debriefing session, once the response data had been collected.

Respondents
In total, 112 students participated in the study; all but 13 were English-language teacher trainees, and the remaining 13 were English-language students at the undergraduate level. Only ten of the participating students identified themselves as other than Swedish, namely: South Korean, Brazilian/Italian, Iraqi, Syrian, Thai, Finnish, Albanian, "other than Swedish," "international" and "Scandinavian." All respondents had a near-native level of proficiency in English. Because the experiment was carried out over 2 days (response phase and discussion/post-survey phase), some data are missing: a total of 21 respondents were not present on day 2, which meant that 91 respondents were present on both days. Participation is summarized in Table 1.

Choice of linguistic variables
We decided to focus our queries in the response questionnaires on linguistic variables of interactional styles that related to conversational floor management. This was partly motivated by findings from similar experiments conducted under the project, which showed that students perceived floor management variables to be particularly salient when it came to gender differences (Lindvall-Östling et al., 2019: 221). Our choice was further motivated by our ambition to ensure that the variables chosen were easily recognizable by relatively naive respondents. These included features associated with competitive conversational management such as interrupting the other speaker and occupying and holding the floor. Features we focused on associated with collaborative conversational management styles included signalling interest in what the other speaker was saying (back channelling and posing information seeking follow-up questions, for example) and occupying less conversational space.

Pre-survey
Prior to listening to the recording, respondents were asked to fill in a short survey giving basic information about themselves (gender, age and nationality); and here they also created an anonymous identity, so that they could be traced in the various stages of the experiment (pre-survey, response survey and post-survey) while still remaining anonymous.
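The tracing of anonymous respondents across the three stages can be illustrated with a minimal sketch. It assumes that each stage is stored as a mapping from the self-generated code to that respondent's answers, and that only respondents who appear in all three stages and confirmed inclusion of their data are retained. The function name, the field names (such as "consent" and "interrupts") and the example codes are hypothetical and are not taken from the study materials.

```python
from typing import Dict, List

Stage = Dict[str, dict]  # self-generated anonymous code -> answers from one stage


def link_stages(pre: Stage, response: Stage, post: Stage) -> List[dict]:
    """Join pre-survey, response survey and post-survey on the anonymous code.

    A respondent is kept only if the code occurs in all three stages and the
    post-survey records consent (hypothetical boolean field "consent").
    """
    linked = []
    for code, pre_row in pre.items():
        if code not in response or code not in post:
            continue  # respondent missed at least one stage
        if not post[code].get("consent", False):
            continue  # no confirmation that the data may be included
        linked.append({"code": code, **pre_row, **response[code], **post[code]})
    return linked


# Illustrative, made-up records
pre_survey = {"fox17": {"gender": "f", "age": 23}, "owl02": {"gender": "m", "age": 25}}
response_survey = {"fox17": {"interrupts": 3}, "owl02": {"interrupts": 5}}
post_survey = {"fox17": {"consent": True}, "owl02": {"consent": False}}

records = link_stages(pre_survey, response_survey, post_survey)
```

Because the codes are created by the respondents themselves and no key is stored, the join relies entirely on respondents reproducing the same code at each stage.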
3.6 The recording
3.6.1 Creating the script
Here a primary ambition was to create a script that could be motivated as relevant to the ordinary content of the course in sociolinguistics so as not to arouse suspicion. We decided on contextualizing the recording as an academic Skype dialogue between two researchers discussing findings from a study on language and gender roles in Disney Princess movies (see Fought and Eisenhauer, 2015). When working with the details of the script, we aimed for a collaborative, constructive dialogue where neither of the two participants were
Table 2 summarizes an overview of some key features of the dialogue.

Recording and voice manipulation
In choosing actors for the recording, we first had to make sure that their voices would respond well to voice morphing. From prior manipulations that took place under the project, we knew that some voices sounded more believable than others after manipulation. For this reason, we pre-tested a number of different voices to see how they responded to digital morphing before choosing actors. From these pre-tests, four "best" voices were chosen and two short sound files from each potential actor (one original and one that had been voice manipulated) were sent to 13 peers, who were asked whether the recordings (a) sounded manipulated or not, (b) sounded convincing as male/female voices or not and (c) sounded like the same person in both sound files or not. Based on these responses, we chose the two voices, one male and one female, which were evaluated most positively. The dialogue was then recorded using separate channels, so that each voice could be edited separately. The initial recordings were made in a studio setting using Audition, and they were then edited in Praat. In Praat, the "change gender"² function was used, with a focus on pitch range median, formant shift ratio and some slight modification to pitch range factor. Producing a credible male or female morph was mostly a matter of adjusting the settings for each individual voice. Because some voices need their formant shift ratio to be in the 350s, and others in the 250s, a clear and generic picture of the end variables is hard to give since it is highly dependent on the original recording and voice. After the major alterations were done in Praat, the sound file was imported back into Audition where the equalizer function was used to eliminate the high and low frequencies in the recording, thus making it sound like a low-quality Skype call. Further, some background static noise was added onto the recordings, and, finally, the two versions were heavily compressed so that the overall quality was lowered further.
These quality-lowering actions were taken in order to "camouflage" some of the more salient effects of the voice morphing, which are less noticeable when high and low frequencies are eliminated. The above procedures left us with two versions of the recording. In one of the versions, the female actor's voice playing "Speaker A" was altered, or "morphed," so that she matched what was conceived as a male speaker (based on the pre-tests described above). In the other version, her voice was largely left unmanipulated, but only adjusted so that the sound quality matched that of the "manipulated" recording.

Exposure and response to the dialogue
Each respondent listened to one version of the dialogue guises once individually. Note that the illusion of the gender of the conversant was further strengthened by using silhouette images of a man or a woman as backdrop to the sound file. Importantly, there were no indications in the ensuing debriefing discussion, nor in the responses of the post-survey, that the respondents were aware of the voices being digitally manipulated or that there was any confusion as to the gender of Speaker A in the two case versions.
In an immediate post-exposure response survey, the test subjects were then asked to respond to questions that related to three linguistic aspects to do with floor management on a 7-point Likert scale (ranging from 1 = disagree completely to 7 = agree completely). The rated statements were as follows:
• Speaker A interrupts the other speaker a lot
• Speaker A signals interest in what the other person is saying
• Speaker A takes up a lot of the space in the conversation
After all respondents had listened and responded to the dialogue, we had the so-called debriefing session where we revealed what we had done, showed the students their results and gave them the opportunity to discuss the reasons for the different responses that the two guises had generated.

Post-survey
Following the debriefing session, which took place a day after the exposure, the respondents were given a post-survey, where, apart from questions relating to their impressions and reactions to the experiment, we asked them to rate a number of conversational behaviours as typically male, female or gender neutral. The aim here was to map gender stereotypes among our respondents as regards conversational management and discourse styles, in order to document the listeners' Gender-Linked Language Schemata (cf. Mulac et al.'s model, 2013). The respondents were asked to rate the following linguistic tendencies on a gender scale ranging from "very typically male behaviour" (−2), "slightly typically male behaviour" (−1), "gender neutral" (0), "slightly typically female behaviour" (1), to "very typically female behaviour" (2):
The tendency to […]
• take up more space in a conversation than others
• interrupt other people in conversation
• signal interest in what others are saying (through back channelling and asking questions, for example)
Moreover, the results of this rating were used to roughly determine whether the groups were similar in terms of held preconceptions of men and women. In this survey, we also enquired whether respondents had been convinced of the gender of the speakers in the dialogue. Finally, the respondents were also asked to confirm whether they wanted their data to be included in our research database.
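Since zero marks "gender neutral" on this scale, one way to check whether a behaviour is rated as gendered is to test the mean rating against zero. The following is a minimal sketch of that idea using only the Python standard library. The label-to-number mapping follows the scale above, while the function name and the example ratings are hypothetical illustrations and not data from the study.

```python
import math
import statistics

# Numeric coding of the response labels, as given in the scale above
SCALE = {
    "very typically male behaviour": -2,
    "slightly typically male behaviour": -1,
    "gender neutral": 0,
    "slightly typically female behaviour": 1,
    "very typically female behaviour": 2,
}


def one_sample_t(ratings, reference=0.0):
    """Return (t, degrees of freedom) for a one-sample t test against `reference`."""
    n = len(ratings)
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)      # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - reference) / standard_error, n - 1


# Made-up answers for "interrupt other people in conversation"
answers = [
    "very typically male behaviour",
    "slightly typically male behaviour",
    "slightly typically male behaviour",
    "gender neutral",
    "slightly typically male behaviour",
    "very typically male behaviour",
    "gender neutral",
    "slightly typically male behaviour",
]
coded = [SCALE[a] for a in answers]
t_value, df = one_sample_t(coded)
```

A negative t value points towards the "male" end of the scale and a positive one towards the "female" end; the p value would then be read from a t distribution with the returned degrees of freedom.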

Ethics and consent
Although implemented as a feature in a regular course setup, students' participation in this study was not obligatory, and their course grade was in no way linked to their participation. All students were informed before their participation that the data generated would be part of a research project and that they could discontinue their participation at any point. Because they were not made aware of the purpose of the study until after they had taken part in it, they were not asked to give their informed consent until the post-survey, when the full picture was clear to them. Results from students who chose not to give their informed consent have been deleted. As described in Section 3.4, anonymity was guaranteed since the students themselves generated the personal codes that made it possible to trace their results from the pre-survey through the post-survey. No code keys were kept. The project has been approved by the Swedish Ethical Review Authority.³

Results
IBM SPSS Statistics was used for all statistical analyses. A one-sample t test with a test value of zero was used to determine whether participants perceived any of the language features examined in this article as typically male or typically female. To compare perceptions of the two guises, an independent-samples t test was conducted, with guise (male or female) as the grouping variable and the perceptions of floor management (interruptions, floor apportionment, etc.) as the test variables. Both analyses took the gender of the respondent into account: the first by means of two different one-sample t tests and the second by means of a two-way analysis of variance (two-way ANOVA). The interaction effects were analysed using ANOVA and linear regression. Effect sizes are expressed as Cohen's d, a measure that indicates the size of the difference between two data sets. Effect sizes of 0.01, 0.2, 0.5 and 0.8 indicate very small, small, medium and large effects, respectively (Cohen, 1988; Sawilowsky, 2009).
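To make the effect-size measure concrete, the following is a minimal Python sketch of how Cohen's d can be computed for the two designs described above. It does not reproduce the SPSS procedure used in the study: the function names, the pooled-standard-deviation variant of d and the grouping of everything below 0.2 as "very small" are illustrative assumptions, and the example ratings are invented rather than taken from our data.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d_one_sample(scores, reference=0.0):
    """Distance of the sample mean from a reference value, in SD units."""
    return (mean(scores) - reference) / stdev(scores)


def cohens_d_independent(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def effect_label(d):
    """Map |d| onto the 0.2 / 0.5 / 0.8 benchmarks quoted in the text.

    Simplification: everything below 0.2 is grouped as "very small".
    """
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "very small"


# Invented example data (NOT the study's data).
# Stereotype ratings on the -2 (male) .. 2 (female) scale:
stereotype_ratings = [-1, 0, -2, -1, 0, -1, 1, -1]
# Likert ratings (1-7) of "Speaker A interrupts a lot" per guise:
male_guise = [5, 6, 4, 5, 6, 5]
female_guise = [4, 5, 3, 4, 5, 4]

d_stereotype = cohens_d_one_sample(stereotype_ratings, reference=0.0)
d_guise = cohens_d_independent(male_guise, female_guise)
print(f"one-sample d = {d_stereotype:.2f} ({effect_label(d_stereotype)})")
print(f"two-group d  = {d_guise:.2f} ({effect_label(d_guise)})")
```

In the one-sample case, a negative d indicates a mean rating on the "male" side of the neutral point and a positive d one on the "female" side; the label depends only on the absolute value.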
In Sections 4.1-4.3, we will address three separate research questions. In Section 4.4, we will summarize the overall results.

The respondents' stereotypical preconceptions
Overall, the results matched our expectations. The respondents as a group had stereotypical preconceptions whereby taking a lot of space in a conversation and interrupting others were seen as stereotypically male behaviour, while signalling interest was seen as stereotypically female behaviour. The results are summarized in Table 3.
Although all mean ratings deviated significantly from zero (i.e. from what was conceived as gender-neutral behaviour), the effect sizes for the respondents as a whole varied from small for floor apportionment (d = 0.21) to medium for interruptions (d = 0.58) and signalling interest (d = 0.53). In other words, while taking up space in a conversation was perceived to be typically gendered behaviour, this was a rather weak tendency. On the other hand, the preconceptions that interrupting is typically masculine behaviour and that signalling interest in a conversation is typically feminine behaviour were stronger.

The respondents' interpretations of the dialogue
Of the 112 respondents who participated in this part of the study, 60 listened to the female guise and 52 listened to the male guise. Some distinct patterns emerge from these analyses. The group as a whole seemed to perceive the male guise as interrupting more and having more floor apportionment, and the female guise as signalling more interest. However, only the differences in the perception of interruptions (p = 0.031, d = 0.42) and floor apportionment (p = 0.01, d = 0.61) were statistically significant, and the effect sizes were small to medium. The results are summarized in Table 4.

Interaction effects between respondents' stereotypes and perception of guise
Only 91 respondents participated in both the response survey and the post-survey. The interaction analysis showed no statistical interaction between the respondents' explicit stereotypes and their perception of the guise. That is, we could not statistically show that a person who, for example, thinks that interrupting a lot is a male trait also perceived the male guise as interrupting a lot (p = 0.382). We could not show any interaction effects for signalling interest (p = 0.923) or floor apportionment (p = 0.825) either.

Results summary
We found that the group as a whole held expected stereotypical views as regards male and female conversational behaviour. They judged traditionally competitive conversational styles, such as having a lot of floor apportionment and interrupting a lot, as typically male behaviour, and traditionally collaborative conversational styles, such as signalling interest, as typically female behaviour. The effect sizes for these stereotypes were small to medium. The respondents perceived the two guises significantly differently on two of the three analysed variables: they perceived the male guise as having more floor apportionment (p = 0.01) and interrupting more (p = 0.031). Here too, the effect sizes were small to medium. There was no direct correlation between the respondents' explicit stereotypical preconceptions and their perception of the guise.

Discussion
This study has explored how two groups of students rate the conversational behaviour of the same conversational participant in two different gender guises (male and female). The question addressed was how the ratings of aspects related to conversational management (floor apportionment, interruptions and signalling interest) may be affected by the perceived gender of the person as manifested through voice quality. We were able to show that the respondents, taken as a group, had clear explicit stereotypes that matched our expectations with regard to gender and conversational behaviour. The results indicate that the respondents believe that men take more space and interrupt more than women do, while women are more likely than men to signal interest and take less conversational space. We could also show that the perceptions of the behaviour of Speaker A in the two guises differed depending on whether respondents thought they were listening to a man or a woman. These differences in the perception of behaviour in the speech event matched our expectations, i.e. that many respondents listening to the male guise would notice behaviour that matched generally held stereotypical views of male behaviour to a greater extent than respondents who listened to the female guise and vice versa. Importantly, however, we could not correlate specific explicit stereotypical preconceptions with specific response patterns. This is probably because the effect sizes were small, so a much larger data set would be needed to establish such a trend.
The post-survey and the data related to the first research question illustrate how established truisms concerning male and female interactional styles are reflected in student beliefs. We would thus argue that many university students of language do stereotypically associate collaborative and competitive speech styles with women and men, respectively.
In the second part of the study, we were able to show that, on average, the language behaviour of the male guise was perceived differently from that of the female guise among our respondents. The perceptions were different in spite of the fact that the two recordings were identical in all respects except for voice quality. Relating these results to Mulac et al.'s (2013) gender-linked language effect model, we can claim that our results support the existence of gender-linked language schemata and stereotypes and that these in turn affect not only judgement (e.g. concerning socio-intellectual status and personality, the so-called social perception (Edwards, 1999; Dennhag et al., 2019)), but also the perception of the speech event itself. Overall, our results supplement those of previous studies (Ko et al., 2006; Nass et al., 2006; Ko et al., 2009; McAleer et al., 2014), showing that voice quality affects both the hearer's judgement of the speaker and the hearer's perception of the speech event itself. More importantly, however, here we specifically explore the role of the perception of actual conversational behaviour rather than judgements of speaker qualities. Describing their model, Mulac et al. (2013) point out that a hearer or reader (i.e., message recipient) perceives the communication context, which includes the situational circumstances and fixed speaker attributes such as sex, along with the speaker's language.
[…] Both the hearer's perceptions of the context and the speaker's gender-linked language behavior activate hearer schemata and stereotypes, which affect hearer judgments of the speaker. (p. 24, our emphasis)
They also acknowledge that situational input could affect the hearer's perception of context and the activation of schemata and stereotypes. However, what the present study shows, and what has also been demonstrated in phonetic studies (Johnson et al., 1999; Strand, 1999), is that gender-linked schemata and stereotypes operate on the very perception of a speech event itself. It would appear that non-linguistic situational and contextual information (e.g. the gender of the speaker), presumably available before a speech event, activates schemata and stereotypes in such a way that they affect the processing of the perception of the language event, skewing it in accordance with expectations. For this reason, we suggest that the model should also take stereotype-affected perception factors into consideration.
We would argue that it is differential perception effects that make the mechanisms of gender-linked language stereotyping particularly treacherous. We may think that we are rightfully basing judgements of individuals on their behaviour, without being aware of the fact that this "behaviour" has been filtered through our perceptions, which in turn are tainted by language schemata and gender stereotypes. In other words, evidence from this study suggests that our senses may modify language input to fit our language schemata, and by so doing also confirm the same. Further, since language events include both speakers and hearers, hearer perception, no matter how inaccurate, may in turn shape speaker behaviour since it affects how the hearer reacts, thereby potentially contributing to self-fulfilling prophecy mechanisms.
Given the exploratory nature of this study, there are, however, a number of limitations. Firstly, and in direct reference to the above model, we have to acknowledge that our tool to capture the gender-linked language schemata and stereotypes failed to capture more complex aspects of this phenomenon. The statements given in this part of the study were generic in nature and did not take aspects related to context into account. For example, respondents' gender expectations may have differed markedly depending on aspects such as whether the conversation was informal or formal, professional or private, public or intimate, etc. Moreover, identity aspects other than the gender of the speaker were not included in the model. Here stereotypes may have differed depending on whether we had asked about young vs older speakers, their professional identity, the relationship and gender of the conversational partner, etc. Further, with reference to the measurement of the respondents' stereotypes, there is always a risk that respondents' answers reflect what they believed we expected them to answer, rather than what they actually felt to be the case.
Secondly, we could not actually show that there was a link between the groups' general stereotypical preconceptions and their individual responses to the speech event. This was arguably a consequence of the relatively small number of respondents in combination with potentially small effect sizes. Another confounding factor could be that the preconceived stereotype survey was taken after a debriefing seminar where stereotyping and gender issues were discussed, something which may have affected the respondents' answers. However, the alternative, i.e. administering the preconceived stereotype survey prior to the response survey, could have primed respondents for the response part of the experiment, which was the main focus of the study. Administering this survey prior to the response part of the experiment could also have aroused suspicion as to the real nature of the experiment.
Thirdly, at the point of data collection, the speech event was over, and the respondents would already have made a judgement of the speaker. Therefore, although our ambition was that the survey questions should address the respondents' perception of the speech event itself as coloured by the gender of the speaker alone, it might have been the case that their impressions had also been coloured by judgements of other contextual aspects related to the speaker, such as role, education, profession, etc. Since these aspects may be interpreted differently depending on gender, they may have had differential effects on the judgements depending on whether respondents listened to the male or female version of the recording. To illustrate this, compare the associations that "male professor" vs "female professor" may conjure up.
Furthermore, we acknowledge that the effects of voice quality itself may have been an important background variable influencing our results. As previously demonstrated (Ko et al., 2006; Ko et al., 2009), voice quality is not only an important cue leading to between-category stereotyping but also affects within-category judgements. There is thus scope for further studies of similar design, where voice quality variation and its effect on stereotyping are explored within a category.
On a more general note, our study has broader implications. Arguably, sociolinguistic research focusing on identifying gender differences in language production inadvertently contributes to language schemata and gender stereotypes, which in turn may affect perception. In this way, sociolinguistic research aimed at exposing gender injustices may instead serve to confirm and strengthen these. Many sociolinguists (for example, see McConnell-Ginet, 1992, 2013; Holmes, 2006; Cameron, 2008) are increasingly critical of this approach and correctly point out that other potential causal factors for a particular behaviour, such as context and power, can be blurred by the gender variable. In line with such arguments, we would argue that we need a more problematized approach to the "gender question" in quantitative sociolinguistic research, and that a focus on perception needs to be included here. By doing so, we can begin to explore how gender expectations form part of the complex interplay between speaker, hearer, perception and expectations that makes up any language event.