Tell me more! Assessing interactions with social robots from speech

Abstract: As social robots are increasingly introduced into health interventions, one potential area where they might prove valuable is in supporting people's psychological health through conversation. Given the importance of self-disclosure for psychological health, this study assessed the viability of using social robots for eliciting rich disclosures that identify needs and emotional states in human interaction partners. Three within-subject experiments were conducted with participants interacting with another person, a humanoid social robot, and a disembodied conversational agent (voice assistant). We performed a number of objective evaluations of disclosures to these three agents via speech content and voice analyses and also probed participants' subjective evaluations of their disclosures to the three agents. Our findings suggest that participants overall disclose more to humans than to artificial agents, that agents' embodiment influences disclosure quantity and quality, and that people are generally aware of differences in their personal disclosures to the three agents studied here. Together, the findings set the stage for further investigation into the psychological underpinnings of self-disclosures to artificial agents and their potential role in eliciting disclosures as part of mental and physical health interventions.


Introduction
People tend to disclose thoughts and feelings to others, especially when experiencing unique and challenging life events [1]. Disclosure thus serves an evolutionary function of strengthening interpersonal relationships but also produces a wide variety of health benefits, including helping people to cope with stress and traumatic events and to elicit help and support [2][3][4]. Moreover, self-disclosure appears to play a critical role in successful health treatment outcomes [5] and has a positive impact on mental and physical health [6]. Given the importance of self-disclosure for psychological health, here we are interested in assessing the viability of using social robots for eliciting rich disclosures to identify people's needs and emotional states.
Social robots, defined here as autonomous machines that interact and communicate with humans or other agents by following social behaviours and rules [7], are gradually being introduced in psychosocial health interventions [8] as well as in mental health and well-being research [9]. Concurrently, social robot-based interventions are also being introduced into care settings and tasked with providing physical assistance (e.g. ref. [10][11][12][13][14][15]), serving as companions, providing emotional support, and contributing to the mental well-being of patients (e.g. ref. [8,9,[16][17][18][19][20][21]). Autonomous systems such as social robots can support care recipients in a variety of ways but can also support their caregivers' physical and mental health (see ref. [22]). Moreover, social robots are increasingly being equipped with technologies (e.g. sensors, cameras, and recorders) that promote high-fidelity data collection and online, ongoing analysis of a human interaction partner's behaviour. When implemented in an ethical and responsible manner, such features hold promise for robots being able to analyse and respond to user responses during an interaction in a sensitive, timely, and nuanced manner.
Health interventions depend on open channels of communication through which individuals can disclose needs and emotions, and from which a listener can identify stressors and respond accordingly [23,24]. This is particularly important for self-help autonomous systems, and for personalizing interventions and other assistive solutions, as these should be able to use the rich input provided by human users to extract salient information, identify patterns and emotional states, and respond accordingly [25]. It follows that socially assistive robots should also be attuned to the content and emotion of disclosed information. While social robots and other artificial agents do not (yet) offer the same opportunities as humans for social interactions [26], their cognitive architectures and embodied cognition can nonetheless elicit socially meaningful behaviours from humans [27][28][29]. Accordingly, people infer a great deal about what an agent does or is capable of doing based on its embodiment (i.e. what it looks like, its physical presence, how it moves, etc.). In addition, other cues of embodiment are driven by a human interaction partner's cognitive reconstruction [30], wherein their beliefs or expectations about an agent further shape perception and behaviour [31][32][33][34].
However, a number of outstanding questions remain regarding the utility and scope of using social robots in self-disclosure settings, which require careful evaluation before such agents might be deployed in actual care contexts. For instance, it remains unclear to what extent individuals convey emotions and personal information in disclosures to social robots, and how disclosures to artificial agents differ depending on the agent's embodiment or physical presence. As socially assistive robots continue to be developed with the aim of providing meaningful support to people across a variety of contexts, our goal with this study was to explore how a social robot's embodiment influences people's disclosures in measurable terms, and how these disclosures differ from disclosures made to humans and disembodied agents. Hence, our primary research question concerns the extent to which disclosures to social robots differ from disclosures to humans and disembodied agents.

Embodiment as a social cue
Media richness theory (MRT) [35] holds that a communication medium's ability to reproduce the information sent through it is driven by its capacity to convey a complex message adequately. Hence, personal communication behaviours, such as disclosure, would typically be transmitted better (or with greater fidelity) through media with the capacity to convey richer social cues, such as gestures and body language [35,36]. However, MRT was originally concerned with computer-mediated communication (CMC), and accordingly, social cues within the MRT framework are bound to human origins. In this study, we address this in the context of human-robot interaction (HRI) and explore people's disclosures as a reaction to agents' physical features, when these are the only available cues to an agent's intentions. We therefore ask whether an agent's embodiment influences people's disclosures, in terms of both objective and subjective measurements of disclosure quality.
Within the MRT framework [35], the complexity of a communication message is related to the task and the context of the interaction but not to its content. Carlson and Zmud [36] expanded on this and explained that the topic of the interaction also has a substantial impact on how one experiences the interaction and, accordingly, responds and communicates. Therefore, we also ask how disclosures differ in relation to the agents' embodiment as compared with the disclosure topic.

Current study
In our study, we are primarily interested in the extent to which disclosures to social robots differ from disclosures to humans and disembodied conversational agents. Furthermore, we investigate how disclosures differ in relation to the agent's embodiment in comparison with the disclosure topic. We wish to explore and describe differences in subjective and objective disclosures to social robots and how people's perceptions and their actual disclosures are related across three experiments. Disclosure is important in order for a person to benefit fully from an automated assistant, which should be able to recognize commands and tasks, as well as respond appropriately to a human user's needs, emotions, and psychological state.
We conducted three laboratory experiments to address research questions centered on this topic. Experiment 1 was designed to provide an initial indication and baseline result regarding subjective and objective disclosures to social robots (see ref. [70]). Experiment 2 replicated the design of Experiment 1 with a sample of native English speakers only. In Experiment 3, we replicated the experimental design again, this time with a larger sample size for greater statistical power, which enabled us to further probe the reliability and generalizability of the findings from the first two experiments.

Method
Consistent with recent proposals [71,72], we report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. In addition, following open science initiatives (e.g. ref. [73]), the de-identified data sets, stimuli, and analysis code associated with this study are freely available online (https://osf.io/f3d5b/). By making the data available, we enable and encourage others to pursue tests of alternative hypotheses, as well as more exploratory analyses.
In order to address our primary research questions, three laboratory experiments were conducted. Preliminary results of the first experiment were reported as a late-breaking report at the Human-Robot Interaction (HRI) conference 2020 (see ref. [70]).

Experiment 1
The first experiment included 26 university students between the ages of 17 and 42 years (M = 24.42, SD = 6.40), 61.5% of whom were female. Participants reported being from different national backgrounds, with 50% of participants reporting English as their native language. For most participants (88.50%), this was their first interaction with a robot. All participants were recruited using the University of Glasgow's participant pool. Participants provided written informed consent before taking part in any study procedures and were compensated for their time with either course credits or cash (£3 for 30 min of participation). All study procedures were approved by the research ethics committee of the University of Glasgow.

Experiment 2
Following the first experiment, the target population of the second experiment was limited to native English speakers. This was highlighted in the advert shared over email to potential participants and in the advert in the University of Glasgow's subject pool, and only potential participants registered as native English speakers in the subject-pool system could sign up to participate in the study. Participants from the previous experiment (Experiment 1) were excluded from participating in Experiment 2.
The participant sample for Experiment 2 consisted of 27 participants between the ages of 20 and 62 years (M = 28.60, SD = 9.61), 59.30% of whom were female. All of the participants reported English as their native language; 85.20% of the participants reported being from the United Kingdom, 11.10% reported being from other English-speaking countries, and 3.70% (one participant) reported being from Chile. For most of the participants (81.50%), this was their first interaction with a robot. The participants were recruited using the University of Glasgow's subject pool or by being directly contacted by the researchers. All of the participants provided written informed consent before taking part in any study procedures, and the participants were compensated for their time with either credits or cash (£3 for 30 minutes of participation). All study procedures were approved by the research ethics committee of the University of Glasgow.

Experiment 3
Following the first and second experiments, the target population of the third experiment was limited to native English speakers. This was highlighted in the adverts shared over email to potential participants and on the University of Glasgow's subject pool, and only native English speakers in the subject-pool system could sign up to participate in the study. Participants from Experiments 1 and 2 were excluded from participating in Experiment 3.
The study consisted of 65 participants, of whom 4 were excluded due to technical failures. The remaining 61 participants were between the ages of 18 and 43 years (M = 23.02, SD = 4.88), 67.20% of whom were female. All of the participants reported English as their native language; 63.70% of the participants reported being from the United Kingdom, 16.30% reported being from other English-speaking countries, and 19.5% reported being from non-English-speaking countries. For most of the participants (72.10%), this was their first interaction with a robot. The participants were recruited using the University of Glasgow's subject pool or by being directly contacted by the researchers. All of the participants provided written informed consent before taking part in any study procedures, and participants were compensated for their time with either credits or cash (£3 for 30 minutes of participation). All study procedures were approved by a research ethics committee of the University of Glasgow.

Design
All three laboratory experiments used a within-subject experimental design with three treatments, applied in a round-robin fashion. In a randomized order, all participants interacted with three agents: (1) a humanoid social robot, (2) a human agent, and (3) a disembodied agent (voice assistant).
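As a minimal sketch of this counterbalancing step (our own illustration; the study itself randomized treatments automatically via a "formR" application), each participant receives an independently shuffled ordering of the three agent treatments:

```python
import random

AGENTS = ["humanoid social robot", "human agent", "disembodied agent"]

def assign_order(participant_id):
    # Within-subject design: every participant meets all three agents,
    # each in an independently randomized order. Seeding by participant
    # ID makes the assignment reproducible for the record.
    rng = random.Random(participant_id)
    order = AGENTS.copy()
    rng.shuffle(order)
    return order
```

Seeding by participant ID is an illustrative choice; any source of randomness that is logged per participant would serve the same purpose.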
In Experiment 1, participants were asked one question by each agent about one of three topics that were relevant to a student's experience (Section 2.6.1). Based on our experience and observations from running Experiment 1, as well as qualitative feedback received from participants, we decided to update and improve some aspects of our experimental approach when running Experiment 2. As our participant sample was not limited to students in Experiment 2 (see Section 2.6.2) and Experiment 3 (see Section 2.6.3), participants were asked two questions by each agent about one of three more general topics. In Experiment 2, the topics surveyed the same ideas as in Experiment 1 but were not constrained to the context of student experience. Based on our observations and participants' feedback from Experiment 2, we designed Experiment 3 to collect data from a larger sample. To optimize disclosure and ensure that the data collected could extend the results of the previous experiments, we streamlined our questions, combining similar questions into a single question.
The rationale behind the slight variations across the three experiments that compose the present study was to (a) improve the experimental design (i.e. ask more questions) following each experiment; (b) adapt questions based on participants' feedback, our observations, and the participant sample being recruited; and (c) provide evidence that, even though the exact content of the questions changed across experiments, the effect of embodiment on key factors of self-disclosure endured compared with the effects of the topics of the disclosure.

Stimuli
The three agents communicated the same questions using different visual and verbal cues that corresponded appropriately to their form and capabilities. The same experimenter (G.L.) operated the Wizard-of-Oz (WoZ) control of both devices (the humanoid social robot and the disembodied agent) via dedicated software and also served as the human agent in all three experiments. The questions were pre-scripted and integrated into the WoZ systems to minimize any possibility of mistakes and errors. Each agent asked questions equally often across all three experiments, as per random assignment.

Humanoid social robot
This treatment condition used the robot NAO (Softbank Robotics), a human-like social robot that can communicate with humans via speech and can also be programmed to display appropriate gaze and body gesture cues to increase its appearance of "socialness" (see Figure 1). NAO communicated with participants in this study via the WoZ technique, controlled by the experimenter via a PC laptop. All of the pre-scripted questions and speech items were written and coded in the WoZ system, with the experimenter (G.L.) controlling NAO by pressing buttons on a PC laptop. Accordingly, the procedure followed a clear pre-programmed protocol in which the experimenter did not need to speak or type anything during the interaction but only press a button to start the interaction.
In the first and second experiments, when participants were answering NAO's questions, NAO directed its gaze towards the participant and engaged in simulated breathing to contribute to its human-like embodiment. When speaking, NAO communicated using expressive and animated body language that corresponded to the spoken content and to NAO's physical capabilities. NAO's movements were self-initiated, based on NAO's demo software.
In the third experiment, NAO was further programmed to nod its head every few seconds when "listening" to the human participant speak.This change was implemented to reduce the variance in embodiment/listener cues between the humanoid social robot and the human agent.
NAO's joints are often noisy, and since this sort of noise is not ambient, it can be captured as an acoustic sound. Therefore, when participants were talking, NAO's animated movements were limited to simulated breathing and gentle head nods to reduce the chance of noise coming from NAO's joints.

Human agent
This treatment consisted of the experimenter (G.L.) acting as an elicitor, taking an active part in the experimental manipulation. The treatment was naturally manipulated by the agent's human looks, voice, and gestures (e.g. nodding; see Figure 2). The human gestures were not planned or controlled and followed his natural embodiment and behaviour, to ensure that his body language would stay natural and correspond to the embodiment of human communication patterns. However, the experimenter did not speak while participants were answering questions, to more closely reflect the conversation scenarios with the other two agents. This treatment was identical in all three experiments, and the human agent followed the same script when communicating the questions. In order to draw causal inferences and to be able to claim that there were no anecdotal deviations in the agents' behaviour or communication that might affect the results, the human agent had to be somewhat "robotic" and follow a script, like an actor. At the same time, the same script that the human agent used was also used by the humanoid social robot and the disembodied agent, thus minimising any confounding gross communication differences between the agents.

Disembodied agent
This treatment condition featured a "Google Nest Mini" voice assistant. A voice assistant is software running in a speaker (in the context of this study, a "Google Nest Mini" device). It has a minimal physical presence and is disembodied, in that it is not designed for visual interaction (i.e. it does not demonstrate any sort of visual cues), and its verbal cues are limited to clear manifest cues ("I understand" and "Okay, I see"), rather than natural implicit cues (e.g. "ahh" and "amm"; see Figure 3). The voice assistant was also controlled by the experimenter (G.L.) via the WoZ technique. All questions and speech items were written and coded into a "ColorNote" application on an Android tablet. Using Bluetooth technology and Android's "select to speak" accessibility feature, the experimenter controlled the disembodied voice assistant by streaming questions and speech items to participants. Accordingly, the procedure followed a clear pre-programmed protocol in which the experimenter did not need to speak or type anything during the interaction but only press a button to start the interaction. The device was used as a Bluetooth speaker, and the device's Wi-Fi function and microphone were disabled to maintain participants' privacy. Participants were explicitly told that the disembodied agent's software was developed by the lab, that it had no connection to Google, and that the device's Wi-Fi function was not working.

Subjective self-disclosure
Participants were asked to report their level of subjective self-disclosure via the work-and-studies disclosure sub-scale of Jourard's Self-Disclosure Questionnaire [38]. The questionnaire was adapted and adjusted to the context of the study, addressing the statements to student experience in the first experiment and to general life experiences in the second and third experiments. The measurement included ten self-reported items for which participants reported the extent to which they disclosed information to one of the agents on a scale of 1 (not at all) to 7 (to a great extent). The scale was found to be reliable in Experiments 1, 2, and 3 when applied to all of the agents. In the second experiment, the reliability score of the scale when applied to the human agent was only moderate (Table 1).
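Scale reliability of the kind reported here is conventionally assessed with Cronbach's alpha. A minimal, dependency-free sketch of the computation (our own illustration, not the study's analysis code):

```python
from statistics import pvariance

def cronbach_alpha(items):
    # items: one list per questionnaire item, each containing the
    # scores of all participants on that item.
    k = len(items)
    item_vars = sum(pvariance(scores) for scores in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-participant sums
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))
```

Values around 0.7 or above are conventionally treated as acceptable reliability, which is the sense in which "reliable" and "moderate" are used in this section.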

Disclosure content
The recordings were automatically processed using a speech recognition package for Python [74]. The text was manually checked and corrected by the researchers to ensure it corresponded accurately to the recordings. The following measurements were extracted from the recordings' content:
- Length of the disclosure: the volume of disclosure in terms of the number of words per disclosure. The number of words per disclosure was extracted from the text using a simple length command in Python.
- Compound sentiment: using VADER for Python [75], the disclosures were measured to determine their overall sentiment in terms of positive, neutral, and negative sentiment. The compound sentiment evaluates a disclosure's sentiment from negative (−1) to positive (+1), based on the calculated sentiment score [75].
- Sentimentality: the ratio of overall demonstrated sentiment, positive and negative, in each disclosure. This was calculated based on the combined scores of VADER's [75] positive and negative sentiments.
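A toy illustration of these three content measures. The mini-lexicon, the squashing constant, and the sentimentality formula below are stand-ins of our own for illustration; the study itself used VADER's full valence lexicon and scoring:

```python
import math

# Hypothetical mini-lexicon standing in for VADER's valence dictionary.
LEXICON = {"good": 0.6, "happy": 0.8, "bad": -0.6, "stressed": -0.7}

def disclosure_measures(text):
    words = text.lower().split()
    scores = [LEXICON.get(w, 0.0) for w in words]
    raw = sum(scores)
    return {
        # Length: number of words in the disclosure
        "length": len(words),
        # Compound sentiment: raw valence squashed into (-1, +1),
        # mimicking VADER's normalization raw / sqrt(raw^2 + alpha)
        "compound": raw / math.sqrt(raw ** 2 + 15) if words else 0.0,
        # Sentimentality: share of sentiment weight (positive and
        # negative combined) relative to disclosure length
        "sentimentality": sum(abs(s) for s in scores) / len(words) if words else 0.0,
    }
```

For example, `disclosure_measures("I feel happy and good but stressed")` yields a length of 7, a mildly positive compound score, and a non-zero sentimentality, mirroring how a longer, emotionally rich disclosure scores on all three measures at once.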

Voice acoustics features
Basic prosody features are conveyed through changes in pitch, voice intensity, harmonicity, duration, speech rate, and pauses [44,76,77]. For the scope of this study, we decided to focus on the following fundamental features, which demonstrate basic differences in voice production and changes in the mean values of fundamental voice signals within a disclosure. The features were extracted and processed using Parselmouth [78], a Python library for Praat [79]. The extracted features were:
- Mean pitch: in hertz (Hz).
- Mean harmonicity: the degree of acoustic periodicity, in decibels (dB).
- Mean intensity: the loudness of the sound wave, in dB.
- Energy: air pressure in the voice, measured as the square of the amplitude multiplied by the duration of the sound.
- Duration of speech: in seconds.

In addition to the disclosure measures, the following self-report measures were collected:
- Agency and experience: research into mind perception holds that agency (the ability of the agent to plan and act) and experience (the ability of the agent to sense and feel) are the two key dimensions when valuing an agent's mind [80]. To determine whether a difference in mind perception emerged between the agents, after each interaction participants were asked to evaluate the agent in terms of experience and agency, after being introduced to these terms (adapted from ref. [80]). Both concepts were evaluated by the participants using a 0 to 100 rating bar.
- Perceived stress scale: this scale was added to the second and third experiments. Participants were asked to report their periodic stress on the ten statement items of the perceived stress scale [81], evaluated on a scale of 1 (never) to 5 (very often). The scale was found to be reliable in the second (α = 0.93, M = 2.76, SD = 0.92) and third (α = 0.88, M = 3.00, SD = 0.73) experiments.
- Extraversion: this measurement was added to the third and final experiment. Participants were asked to rate their extraversion on a scale of 1 (not at all) to 9 (very applicable) on the 8 extraversion items of the Mini-Markers Big Five personality scale [82]. The scale was found to be reliable (α = 0.86, M = 5.58, SD = 1.43).
- Demographics: participants across all three experiments were asked to complete a short questionnaire that gathered information on demographic parameters, including age, biological sex, gender identification, level of education, nationality, job, previous experience with robots, and whether English is their native language.
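The study extracted the acoustic features with Parselmouth/Praat. As a dependency-light sketch of what the most basic of these measures capture, pitch can be estimated by autocorrelation (the core idea behind Praat's pitch tracker, here in a heavily simplified form) and energy and duration computed directly from the samples; this is an illustration of the concepts, not Praat's actual algorithm:

```python
import numpy as np

def basic_voice_features(samples, sr, fmin=75.0, fmax=500.0):
    # Simplified autocorrelation-based F0 estimate: the lag of the
    # strongest self-similarity within the plausible voice range
    # (fmin..fmax Hz) gives the fundamental period.
    x = samples - samples.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return {
        "pitch_hz": sr / lag,
        # Energy: squared amplitude integrated over time
        "energy": float(np.sum(x ** 2) / sr),
        "duration_s": len(x) / sr,
    }
```

On a one-second 220 Hz pure tone this recovers a pitch within a few hertz of 220; real speech requires the voicing detection, octave-error handling, and framewise tracking that Praat provides.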

Instruments and data preparation
The audio data were recorded using a UMT800 microphone by Microtech Gefell, known for its high sensitivity and outstanding signal-to-noise ratio. We used this device in an acoustic recording laboratory to ensure high-quality audio recordings and minimize any potential effect of noise. We reduced the microphone sensitivity by 10 dB to ensure that loud noises coming from the floor would not be amplified, ensuring that we were able to capture each participant's voice over any other sources of noise. We ensured that the agents were far enough from the microphone that any other potential source of noise coming from the agents (e.g. the sound of the robot's motors) did not suppress or otherwise interfere with each participant's voice. When processing the recordings, we reduced noise using the spectral subtraction noise-reduction method [83], which reduces the spectral effects of acoustically added noise in speech. A sample of recordings was manually checked to make sure that there was no apparent noise while participants spoke and during silent breaks.
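The spectral subtraction step can be sketched as follows: estimate the average noise magnitude spectrum from a noise-only segment, subtract it from each frame of the noisy signal, floor negative magnitudes at zero, and resynthesize with overlap-add. This is a minimal illustration of the general technique; the frame length, hop, and window are illustrative choices, not the study's parameters:

```python
import numpy as np

def spectral_subtraction(signal, noise_sample, frame_len=512, hop=256):
    window = np.hanning(frame_len)
    # Mean noise magnitude per frequency bin, from a noise-only sample
    noise_frames = [noise_sample[i:i + frame_len] * window
                    for i in range(0, len(noise_sample) - frame_len, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

    out = np.zeros(len(signal))
    for i in range(0, len(signal) - frame_len, hop):
        frame = signal[i:i + frame_len] * window
        spec = np.fft.rfft(frame)
        # Subtract the noise estimate from the magnitude, keep the phase
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        cleaned = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), frame_len)
        out[i:i + frame_len] += cleaned  # overlap-add resynthesis
    return out
```

Keeping the noisy phase and only attenuating magnitudes is the standard simplification; more elaborate variants use over-subtraction factors and spectral floors to limit "musical noise" artifacts.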

Procedure
All three experiments took place in a sound-isolated recording laboratory at the Institute of Neuroscience and Psychology at the University of Glasgow (see Figure 4). The recording room was completely soundproof to ensure the highest possible sound quality for the recordings, facilitating offline analyses. The participants booked their desired time slot for participation using the University of Glasgow subject-pool website and were picked up by one of the experimenters from the building's waiting room. The experiment took approximately 30 min per participant. In the first and third experiments, a single experimenter (G.L.) operated all the experimental procedures; in the second experiment, two experimenters (G.L. and J.N.G.) operated the experimental procedure. The experimenter(s) sat at a desk outside of the recording room, where the participant could not see them. However, the recording room had a window that provided both parties the option to communicate with each other if needed. The experiment was administered using a "formR" application [84,85] that randomized the treatments automatically.
All participants across all three experiments received the same introduction and were told that the humanoid social robot and the disembodied agent were functioning autonomously, and that while we were indeed recording the interaction and planned to use the data for the analysis, the data would be fully anonymized and the experimenter(s) would not actively listen to their disclosures to the robot and the disembodied agent. Participants were further told that the experimenter(s) would actively listen during these disclosures only if no sound was registered from the recording booth during the interaction (in which case the experimenter(s) would need to check on them and the agent to see whether there had been a technical failure or the participant had stopped talking for a specific reason), or if the participant actively tried to attract the experimenter's attention through the window. The experimenters followed the interaction using the sound indication from the recording booth. Participants were explicitly told that the disembodied agent's software was developed by the lab, that it had no connection to Google, and that the device's Wi-Fi function was not working.
After each interaction, participants were asked to evaluate the agent in terms of agency and experience. In the first and second experiments, after all interactions, participants evaluated their perceived self-disclosure [38] to each of the agents. In the third experiment, after each interaction, participants evaluated their perceived self-disclosure to each of the agents via the same instrument [38]. Finally, after all interactions, participants were asked to complete a short questionnaire, reporting demographic parameters and their previous experience with robots (see Section 2.4.4). In the second and third experiments, participants then answered the perceived stress scale [81]; and in the third experiment, participants also answered the extraversion items of the Mini-Markers Big Five personality scale [82].
Upon completing the experimental procedures, participants were debriefed about the aims of the study and told that the robot and the disembodied agent were pre-programmed. Participants then received payment of £3 (equivalent to £6/hour of participation) or the equivalent number of participation credits. All interactions between participants and agents (humanoid social robot/human/disembodied agent) were audio-recorded for analysis purposes, and content and acoustic features were extracted from the audio files.

Experiment 1
All participants interacted with each of the three agents, and the order of interaction was randomly assigned across participants. They were asked one question by each agent about each of three topics: (1) academic assessment, (2) student finances, and (3) university-life balance. The questions were randomly ordered and allocated to the agents. All questions were the same across the agent treatments.

Experiment 2
As in the first experiment, all participants interacted with all agents in a randomized order. Participants were asked two questions by each agent, drawn from six areas: (1) work situation, (2) financial habits, (3) social life, (4) family matters, (5) romantic relationships, and (6) hobbies and spare time. The questions were randomly ordered and allocated to the agents. The questions were grouped into three topics: (1) work and finances (questions 1 and 2), (2) social life and leisure time (questions 3 and 6), and (3) intimate and family relationships (questions 4 and 5). All questions were the same across the agent treatments.

Experiment 3
As in the first and second experiments, participants in Experiment 3 interacted with all three agents in a randomized order. Participants were asked two questions by each agent about each of three topics: (1) work and life balance (one question about one's work situation and one question about one's spare time and hobbies), (2) relationships and social life (one question about one's closest relationships and one question about socializing habits), and (3) physical and mental health (one question about habits of sustaining physical health and one question about habits of sustaining/treating mental health). The topics were randomly allocated to the agents, and the questions within each topic were randomly ordered. All questions were the same across the agent treatments.

Differences in agency and experience
A doubly multivariate analysis of variance was conducted for each of the experiments to determine whether a difference in agency and experience emerged across the different agents (humanoid social robot vs human vs disembodied agent).
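For reference, Wilks' lambda, the omnibus statistic used in these multivariate analyses, compares the error and hypothesis sums-of-squares-and-cross-products matrices:

```latex
\Lambda = \frac{\det(\mathbf{E})}{\det(\mathbf{E} + \mathbf{H})}
```

where E is the within-groups (error) matrix and H is the hypothesis (treatment) matrix; values close to 0 indicate that the treatments account for most of the multivariate variance, which is why the small Λ values reported for each experiment signal strong overall effects.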

Experiment 1
The model was statistically significant, Wilks' Λ = 0.15, p < 0.001, suggesting that a difference emerged in the combined value of agency and experience across the three agents. The agent treatments elicited statistically significant, large differences in people's perceptions of the agents' sense of agency, F(2, 50) = 16.32, p < 0.001, ω = 0.28. Post hoc analyses using Bonferroni correction revealed that people perceived the human to have higher agency and experience than the humanoid social robot and the disembodied agent (see Figures 5 and 6). The difference in people's perceptions of agency between the humanoid social robot and the disembodied agent was not statistically significant (see Figure 5). People perceived the humanoid social robot to demonstrate higher levels of experience compared with the disembodied agent (see Figure 6).

Experiment 2
The model was statistically significant, Wilks' Λ = 0.14, p < 0.001, suggesting that a difference emerged in the combined value of agency and experience across the three agents. The agents' treatments elicited statistically significant, large differences in people's perceptions of the agents' sense of agency. Post hoc analyses using Bonferroni correction revealed that people perceived the human to have higher agency and experience than the humanoid social robot and the disembodied agent (see Figures 5 and 6). Moreover, people perceived the humanoid social robot to have higher agency than the disembodied agent (see Figure 5). Finally, people perceived the humanoid social robot to demonstrate higher levels of experience than the disembodied agent (see Figure 6).

Experiment 3
The model was statistically significant, Wilks' Λ = 0.11, p < 0.001, suggesting that a difference emerged in the combined value of agency and experience across the three agents. The agents' treatments elicited statistically significant, large differences in people's perceptions of the agents' sense of agency. Post hoc analyses using Bonferroni correction revealed that people perceived the human to have higher agency and experience than the humanoid social robot and the disembodied agent (see Figures 5 and 6). Moreover, people perceived the humanoid social robot to have higher agency than the disembodied agent (see Figure 5). Finally, people perceived the humanoid social robot to demonstrate higher levels of experience than the disembodied agent (see Figure 6).

The effect of agents on disclosure
A doubly multivariate analysis of variance was conducted for each experiment to determine whether differences in disclosure emerged across the three agents (humanoid social robot vs human vs disembodied agent), measured in terms of subjective self-disclosure and objective disclosure (length of the disclosure, compound sentiment, sentimentality, pitch, harmonicity, intensity, energy, and duration of the disclosures).
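Several of the objective measures listed above can be computed directly from a transcript and waveform. A simplified sketch follows; pitch, harmonicity, and the sentiment scores are omitted here, and the 20 µPa reference for the dB value assumes a pressure-calibrated signal, which is an assumption of this illustration rather than a detail from the study:

```python
import numpy as np

def objective_measures(transcript: str, waveform: np.ndarray, sr: int) -> dict:
    """A few of the objective disclosure measures, sketched:
    word count, duration, signal energy, and intensity in dB."""
    rms = np.sqrt(np.mean(waveform ** 2))
    return {
        "length_words": len(transcript.split()),
        "duration_s": len(waveform) / sr,
        "energy": float(np.sum(waveform ** 2)),
        # dB relative to 20 micropascal, assuming calibrated amplitude
        "intensity_db": float(20 * np.log10(rms / 2e-5)),
    }
```

Tools such as Praat compute the remaining acoustic measures (pitch, harmonicity) from the same waveform.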

Experiment 1
The model was statistically significant, Wilks' Λ = 0.06, p < 0.001, suggesting that a difference emerged in combined disclosure (in terms of subjective and objective disclosure) across the three agents.
The agents' treatments elicited statistically significant, medium to large differences in subjective self-disclosure. Univariate tests revealed statistically non-significant differences within the agents in terms of the length of the disclosure, compound sentiment, sentimentality, pitch, harmonicity, intensity, energy, and duration (see Table 2). Post hoc analyses using Bonferroni correction revealed that people perceived that they disclosed more information to the human than to the humanoid social robot and the disembodied agent. Nevertheless, there were no significant differences in the way people perceived their disclosures to the humanoid social robot compared to the disembodied agent (see Figure 7). Moreover, the pitch of people's voices was higher when talking to the humanoid social robot than when talking to the disembodied agent, but not compared to when talking to the human (see Table 3).
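Bonferroni-corrected post hoc comparisons of this kind can be sketched as pairwise paired t-tests with the significance threshold divided by the number of comparisons. A minimal version, using invented data rather than the study's:

```python
from itertools import combinations
from scipy import stats

def bonferroni_pairwise(scores: dict, alpha: float = 0.05) -> dict:
    """Pairwise paired t-tests across agents with Bonferroni correction.
    scores: dict mapping agent name -> per-participant values (paired)."""
    pairs = list(combinations(scores, 2))
    adj_alpha = alpha / len(pairs)          # corrected threshold
    results = {}
    for a, b in pairs:
        t, p = stats.ttest_rel(scores[a], scores[b])
        results[(a, b)] = (float(p), bool(p < adj_alpha))
    return results
```

With three agents there are three comparisons, so each is tested at α = 0.05/3 ≈ 0.017.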

Experiment 2
The model was statistically significant, Wilks' Λ = 0.08, p = 0.005, suggesting that a difference emerged in combined disclosure (in terms of subjective and objective disclosure) across the three agents. The order of the questions did not have a significant effect on combined disclosure, Wilks' Λ = 0.72, p = 0.506, and neither did the interaction of the agents' treatments with the order of the questions, Wilks' Λ = 0.29, p = 0.185. The agents' treatments elicited statistically significant, large differences in subjective self-disclosure and in the length, duration, pitch, and harmonicity of the disclosure. Moreover, the agents' treatments elicited statistically significant, medium to large differences in the disclosures' compound sentiment. Univariate tests revealed that the differences within the agents in terms of the sentimentality, intensity, and energy of the disclosures were not statistically significant (see Table 2).
Post hoc analyses using Bonferroni correction revealed that people perceived that they disclosed less information to the disembodied agent than to the human or the humanoid social robot. Nevertheless, there were no significant differences in the way people perceived their disclosures to the humanoid social robot compared to the human (see Figure 7). Furthermore, people's disclosures were longer, in number of words shared and in duration, when disclosing to the human than to the humanoid social robot or the disembodied agent. There were no statistically significant differences in disclosure length or duration between disclosures to the humanoid social robot and to the disembodied agent (see Figures 8 and 9).
The pitch of people's voices was higher when talking to the humanoid social robot than when talking to the human or the disembodied agent. No statistically significant differences in voice pitch emerged when talking to the human compared to the disembodied agent. People's voices were also less harmonious when talking to the human than to the humanoid social robot or the disembodied agent; however, the difference in harmonicity between people's voices when talking to the humanoid social robot and to the disembodied agent did not reach statistical significance (see Table 3).

Experiment 3
The model was statistically significant, Wilks' Λ = 0.14, p < 0.001, suggesting that a difference emerged in combined disclosure (in terms of subjective and objective disclosure) across the three agents. The order of the questions did not have a significant effect on combined disclosure, Wilks' Λ = 0.76, p = 0.056, and neither did the interaction of the agents' treatments with the order of the questions, Wilks' Λ = 0.68, p = 0.322. The agents' treatments elicited statistically significant, large differences in the disclosures' pitch and harmonicity. Moreover, the agents' treatments elicited statistically significant, medium to large differences in subjective self-disclosure and in the length of the disclosures, in addition to a small to medium difference in the duration of the disclosures. Univariate tests revealed that the differences within the agents in terms of the compound sentiment, sentimentality, intensity, and energy of the disclosures were not statistically significant (see Table 2).
Post hoc analyses using Bonferroni correction revealed that people perceived that they disclosed more information to the human than to the humanoid social robot or the disembodied agent. Nevertheless, there were no significant differences in the way people perceived their disclosures to the humanoid social robot compared to the disembodied agent (see Figure 7). Furthermore, people's disclosures were longer in number of words shared when disclosing to the human than to the humanoid social robot or the disembodied agent. No statistically significant differences in disclosure length emerged between the humanoid social robot and the disembodied agent (see Figure 8). In terms of duration, people talked longer to the human than to the disembodied agent, whereas the differences between the humanoid social robot and the human, and between the humanoid social robot and the disembodied agent, were not statistically significant (see Figure 9).
The pitch of people's voices was higher when talking to the humanoid social robot than when talking to the human or the disembodied agent. There were no statistically significant differences in people's pitch when talking to the human compared to the disembodied agent. People's voices were more harmonious when talking to the disembodied agent than to the humanoid social robot or the human, and also more harmonious when talking to the humanoid social robot than to the human. In terms of voice intensity, people talked louder to the humanoid social robot than to the disembodied agent, whereas the differences between the humanoid social robot and the human, and between the human and the disembodied agent, were not statistically significant (see Table 3).

The effect of topics of disclosure on disclosure
A doubly multivariate analysis of variance was conducted for each experiment to determine whether differences in disclosure emerged across the different topics of disclosure (see Procedure, Section 2.6), measured in terms of subjective self-disclosure and objective disclosure (length of the disclosure, compound sentiment, sentimentality, pitch, harmonicity, intensity, energy, and duration of the disclosures).

Experiment 1
The model was not statistically significant, Wilks' Λ = 0.20, p = 0.200, suggesting that no difference emerged in combined disclosure (in terms of subjective and objective disclosure) across the three topics (see Section 2.6.1).
The topics of disclosure elicited a statistically significant, medium to large difference in the disclosures' compound sentiment. Univariate tests revealed that the differences within the topics in terms of subjective self-disclosure and the length, duration, sentimentality, pitch, harmonicity, intensity, and energy of the disclosures were not statistically significant (see Table 4).
Post hoc analyses using Bonferroni correction revealed that people's disclosures were more negative when talking about student finances than when talking about academic assessment or university-life balance. Nevertheless, there was no significant difference in compound sentiment between disclosures about academic assessment and disclosures about university-life balance (Table 5).
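The compound-sentiment measure behind these comparisons maps each utterance to a score in [-1, 1]; the study used an automated analyser for this. A toy lexicon-based stand-in, with an entirely invented word list, illustrates the idea:

```python
# Invented mini-lexicon for illustration only
POSITIVE = {"great", "happy", "love", "supportive", "fun"}
NEGATIVE = {"stress", "worried", "lonely", "difficult", "bad"}

def toy_compound(text: str) -> float:
    """Crude stand-in for a compound sentiment score in [-1, 1]:
    (positive hits - negative hits) normalised by token count."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return score / max(len(tokens), 1)
```

Real analysers such as VADER additionally handle negation, intensifiers, and punctuation, but the output scale is comparable.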

Experiment 2
The model was not statistically significant, Wilks' Λ = 0.21, p = 0.172, suggesting that no difference emerged in combined disclosure (in terms of subjective and objective disclosure) across the three topics (see Section 2.6.2). The order of the questions did not have a significant effect on combined disclosure, Wilks' Λ = 0.52, p = 0.133, and neither did the interaction of the agents' treatments with the order of the questions, Wilks' Λ = 0.18, p = 0.104. The topics of disclosure elicited a statistically significant, large difference in the disclosures' sentimentality and a statistically significant, medium to large difference in the disclosures' compound sentiment. Univariate tests revealed that the differences within the topics in terms of subjective self-disclosure and the length, duration, pitch, harmonicity, intensity, and energy of the disclosures were not statistically significant (see Table 4).
Post hoc analyses using Bonferroni correction revealed that people's disclosures were more sentimental when talking about their intimate and family relationships than about their social life and leisure time or their work and financial situation. In addition, people's disclosures were more positive when talking about their social life and leisure time than about their work and financial situation. Nevertheless, there was no significant difference in compound sentiment between disclosures about the work and financial situation and disclosures about intimate and family relationships, or between disclosures about social life and leisure time and disclosures about intimate and family relationships (see Table 5).

Experiment 3
The model was not statistically significant, Wilks' Λ = 0.44, p = 0.051, suggesting that no difference emerged in combined disclosure (in terms of subjective and objective disclosure) across the three topics (see Section 2.6.3). The order of the questions did not have a significant effect on combined disclosure, Wilks' Λ = 0.76, p = 0.056, and neither did the interaction of the agents' treatments with the order of the questions, Wilks' Λ = 0.79, p = 0.711. The topics of disclosure elicited a statistically significant, medium difference in the disclosures' compound sentiment, and statistically significant, small to medium differences in the disclosures' length and duration. Univariate tests revealed that the differences within the topics in terms of subjective self-disclosure and the sentimentality, pitch, harmonicity, intensity, and energy of the disclosures were not statistically significant (see Table 4).
Post hoc analyses using Bonferroni correction revealed that people's disclosures were more positive when talking about their relationships and social life than about their work-life balance or their physical and mental health. Nevertheless, there was no significant difference in compound sentiment between disclosures about physical and mental health and disclosures about work-life balance. In addition, people's disclosures about relationships and social life were longer in length and duration than disclosures about physical and mental health (see Table 5).

Discussion
The study reported here assessed the extent to which disclosures to social robots differ from disclosures to humans and disembodied conversational agents. Across three laboratory experiments, we provide relatively consistent evidence highlighting that subjective perceptions of self-disclosure differ from objective evidence of disclosure across the three agents. Moreover, the results underscore differences in the information elicited by the agents' embodiment compared to the information elicited by the conversational topics.

Overall disclosure differs by agent, not topic
The results indicate that, overall, disclosure is influenced more by an agent's embodiment than by the disclosure topic. As can be seen in the results across the three experiments, differences emerged in the combined disclosure measures across the three agents, whereas no differences emerged in combined disclosure across the three topics. This reveals important insights into the role played by an agent's embodiment in disclosure settings. It demonstrates that an agent's embodiment has wide influence over what people disclose and how they disclose it, and supports the assumption that different agents elicit different types of information. As can be seen in the following key results, agents' embodiment shapes disclosure through the way people perceive their disclosures, the amount of information that they disclose, and the way that they communicate it. The results also demonstrate substantial differences in the disclosures' sentiment and sentimentality across the topics presented to participants. This is particularly interesting considering that, while agents' embodiment influences the way people communicate information, the amount of information they share, and how they perceive their own disclosures, it has little to no influence on the actual content that is shared. These findings expand on the functionalities of social robots as a social communication medium [86] and the attributes of embodiment that contribute to the "richness" of the medium [35]. The results suggest that even though the exact content of the questions changed across experiments, the effect of embodiment on key factors of self-disclosure endured, while topics only impacted (in most cases) the sentiment of the disclosure (whether participants shared positive or negative information). Accordingly, we argue that, while the MRT [35] was originally proposed with respect to CMC, it should be studied further in HRI settings. Therefore, it can be concluded that, in line with MRT [35], embodiment is a key factor for evaluating responsiveness in HRI, as it extends the abilities of the communication medium [87]. In contrast to channel activation theory [36], the topics of disclosure are situational, and while these impact the sentiment and sentimentality of the disclosed content, they are not as central in their influence on disclosure as embodiment, in terms of quantity, perceptions, and behaviour. Nevertheless, it is important to note that, by affecting the sentiment and sentimentality of disclosed content, the topic of the disclosure can frame an interaction in a positive or a negative way. Content and context also play a substantial role; like other elicitation procedures and techniques, the discussed content influences the essence of the information that is being shared.
It is also important to remember that interactions between people and artificial agents are influenced by multiple factors beyond an agent's embodiment and the content of conversation. A recent study demonstrated that when interacting with a crowd-operated social robot (where participants knew that real people were talking via the social robot), more participants reported privacy concerns in the experimental condition where only their voice was broadcast to the people operating the social robot, compared to the group of participants whose voice and video were seen by the people controlling the robot [88]. These authors consequently suggest that the recording method (i.e., how much information a robot receives or records from a person) can also affect people's responses when communicating with robots. As social robots gradually become integrated as telepresence devices [89] and take an active role in health interventions (e.g., ref. [90]), it will be valuable for future research to consider additional aspects of interactions with social robots, including the form of broadcasting and privacy matters.

Artificial embodiment requires more than stimulus cues
The results clearly and unsurprisingly demonstrate that human embodiment elicits the richest disclosures in terms of quantity of information shared. When participants spoke to another person, their disclosures were longer, in terms of number of words and overall duration, than disclosures to the humanoid robot and the disembodied agent. Moreover, people perceived that they shared more with the human conversational partner than with the robot or the disembodied agent. While participants disclosed the most to the agent that looked most like themselves, and with which they were most familiar (i.e., the human experimenter), we did not find that stimulus cues for embodiment influenced differences in disclosure quantity and perception between the humanoid social robot and the disembodied agent. While the results clearly demonstrate that people grasp that a humanoid social robot is different from a disembodied agent, such differences in physical embodiment did not result in differences in the amount of information disclosed to these two artificial agents, or in participants' perceptions of the quantity or quality of disclosure. Questionnaire ratings revealed that people perceived the humanoid social robot to provide a richer experience and to have higher agency than the disembodied agent, and yet such perceptions did not directly influence their disclosures.
It can be argued that stimulus cues to agents' embodiment are limited in producing differences in the quantity and perceptions of disclosure to artificial agents. Considering the novelty of interactions with artificial agents for most people (and certainly for participants in this study), most people tend to perceive them as some manner of "black box" [91] and experience some uncertainty about how to behave around them. People might require more substantial information than stimulus cues of (human-like) embodiment to treat an agent as more human-like [92], whereas human embodiment naturally resolves some uncertainties of behaviour [93]; it may be the case that varying levels of artificial embodiment do not conform to these rules. Certain behaviours or actions might provide cues or information that extend beyond the agent's physical embodiment and can support reasoning, mentalizing, and reacting accordingly [94,95,96]. Such attributes can provide a sense of intentionality and meaning to agents' behaviour, in line with how humans interact with each other [97].
This finding corresponds to previous reports showing that reactions to artificial agents, triggered by their human-like embodiment, are not solely based on an agent's physical features (e.g., a human-like body and face, and human-like gestures) [30] but are also shaped by human perceivers' prior knowledge [31,32,34]. Accordingly, we suggest that differences in the quantity and perceptions of disclosure to artificial agents are inextricably linked to participants' prior knowledge and expectations about the robot and disembodied agent used here. As such, stimulus cues that endow an artificial agent with human-like features (such as a body with two arms and two legs, and a head) are not enough to trigger people to disclose the same quantity and quality of information to an artificial agent as they do to another person.

Embodiment cues as gestures of reciprocity
These findings further highlight the role of embodiment as a cue for disclosure reciprocity [98]. As people ascribe meaning to agents' actions [94,95,99], they require systematic cues to evaluate the agents' reactions as acts of reciprocity [100,101,102,103]. To disclose information, people look for social cues in an agent's embodiment, such as behaviours or gestures, to assess the agent's behaviour and identify its origins [104,105,106]. When these cues are limited in conveying an agent's involvement in an interaction [101,102,103], and fail to achieve sufficient equilibrium or reciprocity with the human speaker [102], the speaker is more likely to downregulate their own levels of disclosure [107] and withdraw from the interaction [102].
The results suggest that the limits to the embodiment of the artificial agents used here restricted them from providing sufficient cues of knowledge or understanding while a person is disclosing information (cues that human listeners naturally produce to signal reciprocity). Accordingly, participants shared less information with the artificial agents and were aware of this fact.
It should be considered that features of embodiment are dynamic, and while HRI research often focuses on dialogue and gestures, physical and tangible cues can also be effective in promoting self-disclosure. A variety of physical cues to embodiment serve to signal reciprocation during a conversation, including touch and dynamic gaze, and previous work documents how these cues hold potential to elicit rich disclosures (e.g., [69,108,109,110]). Hence, a valuable avenue for further investigation will be to examine how disclosure reciprocity can be achieved with a variety of embodiment cues, and which of these cues are responsible for meaningful elicitation.

Subjective perceptions align relatively well with objective data
Subjective perceptions of self-disclosure align relatively well with the objective data and correspond to observed evidence of the length and duration of the disclosure. Across all three experiments, participants perceived that they shared more with the human listener than with the disembodied agent or the humanoid social robot, and analyses of speech volume and content corroborated this perception. This finding is especially interesting considering how reliably the effects on disclosure length and duration replicated across the three experiments, with similar differences in the number of words uttered and seconds spent talking to the three agents. This contradicts Levi-Belz and Kreiner's [53] findings and provides evidence supporting the notion that people's perceptions of their own disclosures are formed by observing self-behaviour (in terms of disclosure volume) and reflecting upon it rationally [111,112].
It is of note that in Experiment 2, participants retrospectively evaluated their perceptions of self-disclosure and perceived that they were sharing more with the humanoid robot than they actually were (compared to the other agents), in terms of disclosure volume. Therefore, we can assume that when reflecting on interactions retrospectively, it is possible to lose some objectivity and perceive our disclosures in line with the way that we perceive or experience the agent, rather than in a manner that corresponds to actual behaviour during an interaction. This finding provides preliminary evidence of a retroactive uncertainty reduction reaction [92] when interacting with social robots, and with artificial agents in general. To explain our own and others' behaviour after an interaction takes place, we analyse and self-explain the situation. Once required to recall our own actions retroactively, we often become more prone to cognitive biases and lose objectivity to cues that are easier to explain than self-behaviour [113]. As time passes from the stimulus and more information is processed, people experience inconsistency between their memory and their behaviour [114] and form a perception in line with their preconceptions [115]. These, as the results suggest, could be the agents' visual features or perceptions regarding the agents' experience and agency.

Differences in disclosures to artificial agents are manifested in the voice
People's disclosures differed in communication and speech patterns according to the agent they were talking to. This finding matters for a number of reasons. It provides evidence that disclosure extends beyond measurements of quantity and people's overall perceptions of disclosure quality. While differences in disclosure quantity and subjective perception were limited between the artificial agents, information gleaned from participants' voices sheds light on a more complicated mechanism. Moreover, while different topics of disclosure shaped the content participants spoke about, the agents' embodiment elicited different reactions that were manifest in participants' voices.
It is important to consider basic prosodic features when evaluating patterns of communication, speech styles, and emotional expression from the voice [44,45]. These include essential features such as rhythm, intonation, stress, and tempo of speech, which are conveyed through changes in pitch, voice intensity, harmonicity, speech rate, and pauses [44,76,77]. While previous studies in HRI and social robotics have focused on evaluating interactions via specific, processed prosodic patterns [44] in specific contexts (e.g., refs [116,117,118,119,120,121]), the current findings suggest that a standardized method for drawing explicit causal inferences from voice signal differences across different agents could be useful. Here, changes in voice signal values reflected basic differences in voice production that were driven by the three agents studied, and these can be further processed into prosodic patterns for evaluating specific speech styles. Moreover, the experimental design, the voice signal extraction instrument, and the analytical model can be easily replicated and applied in different settings and across a variety of conditions. Thus, the results of the present study provide empirical insights into changes and variations of voice features according to agents' embodiment. Whereas processed prosodic features might explain certain distinct behaviours, raw voice signals can demonstrate variance at a macro level that can be applied to a variety of measures and replicated efficiently across different settings, conditions, and populations.
These changes correspond to, and were likely triggered by, unique features of the agents' embodiment. For example, the results of the second and third experiments provide clear evidence of people's voices being higher when communicating with the humanoid robot. This could be triggered by the robot's child-like embodiment and high-pitched voice. Another interesting example from the second and third experiments is that disclosures to the disembodied agent were more harmonious. This could be triggered by associating the agent with the pragmatic functionality of following simple, well-known commands, rather than regarding it as a sentient conversation partner. Hence, embodiment does not seem to follow a linear trajectory; rather, we see evidence for clear categories and sets of features. Different features of embodiment call for different variations in voice signals, different reactions, and different behaviours.
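The raw pitch values behind these observations can be recovered from short voiced frames of the recording. A minimal autocorrelation-based F0 estimator is sketched below; Praat, commonly used for this kind of acoustic analysis, applies a considerably more refined algorithm:

```python
import numpy as np

def estimate_f0(frame: np.ndarray, sr: int,
                fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Rough fundamental-frequency estimate for one voiced frame:
    pick the autocorrelation peak within the plausible pitch range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag bounds from pitch range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag
```

Averaging such per-frame estimates over the voiced portions of an utterance gives the kind of mean-pitch value compared across agents here.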

Conclusions
Taken together, the results of this study highlight the complexity of extracting meaning from disclosures to social robots and artificial agents in general. Current behavioural (e.g., eye tracking and motion tracking), performance (e.g., reaction times and error rates), and physiological (e.g., heart rate, skin conductance, and respiratory rate) measures often used in HRI research are prone to a variety of challenges for participants as well as experimenters, such as discomfort, disruptions, and low temporal resolution [97]. Here we attempted to address how voice signals can be used as natural behavioural and performance measures in empirical research, and also as physiological measures [49,50,51,52]. Furthermore, as demonstrated in this study, lexical and content features can be extracted from audio data to provide meaningful insights regarding a disclosure's volume and essence [37,43,44]. Self-reported measurements provide access to one's subjective perceptions of one's disclosure to others and hold value for expanding on the cognitive connection between perception and speech [122,123]. Voice signals and content and lexical features, together with self-reported measurements, offer a comprehensive set of measures with which to evaluate disclosure to social robots. Finally, following Kreiner and Levi-Belz's [37] suggestions, by employing a multidimensional approach, this study stresses the complicated nature of self-disclosure, in which a single measure cannot capture its complexity and nuance.
These results hold several implications for assessing interactions with socially assistive robots, and for HRI research in general. As researchers and engineers develop social robots, agents, and software that rely upon high-quality verbal input from human users, these developers would be well served to consider the stimulus cues to agent embodiment that will optimally elicit information from human users. Since different cues to embodiment call for different patterns of communication, it is important to identify the communicative requirements for the task or agent at hand. Furthermore, the current results highlight that assessing the quality of interactions in disclosures, especially in assistive settings, is not purely a matter of the quantity of information.

Figure 1 :
Figure 1: Illustration of the experimental setup for talking to the humanoid social robot.

Figure 2 :
Figure 2: Illustration of the experimental setup for talking to the human agent.

Figure 3 :
Figure 3: Illustration of the experimental setup for talking to the voice assistant (Google Nest Mini).

Figure 4 :
Figure 4: The experimental setting at the sound-isolated recording laboratory.

Figure 6 :
Figure 6: Mean score of experience perceptions reported for each agent across the three experiments. The error bars represent the 95% CI of the mean score of experience perceptions.

Figure 5 :
Figure 5: Mean score of agency perceptions reported for each agent across the three experiments. The error bars represent the 95% CI of the mean score of agency perceptions.

Figure 7 :
Figure 7: Mean score of subjective self-disclosure towards each agent across the three experiments. The error bars represent the 95% CI of the mean score of subjective self-disclosure across participants.

Figure 8 :
Figure 8: Length differences between agent pairs, across the three experiments. The y-axis groups disclosure lengths by experiment number, and the x-axis shows the mean difference in disclosure length between the two agents indicated in each subtitle. The error bars represent the 95% CI of the mean length difference between the two agents.

Figure 9 :
Figure 9: Duration differences between agent pairs, across the three experiments. The y-axis groups disclosure durations by experiment number, and the x-axis shows the mean duration difference between the two agents indicated in each subtitle. The error bars represent the 95% CI of the mean duration difference between the two agents.

Table 3 :
Estimated marginal means and multiple pairwise comparisons between the agents

Table 4 :
Univariate results with disclosure topics as repeated measures treatment

Table 5 :
Estimated marginal means and multiple pairwise comparisons between the topics of disclosure