What is in three words? Exploring a three-word methodology for assessing impressions of a social robot encounter online and in real life

Abstract
We explore the impressions and conceptualisations produced by participants after their first encounter with the teleoperated robot Telenoid R1. Participants were invited to freely report the first three words that came to mind after seeing the robot. Here we triangulate (i) three-word data from an online survey (n=340) where respondents saw a brief video of the Telenoid, (ii) three-word data from an interaction study where participants interacted with a physically present Telenoid (n=75), and (iii) data from qualitative interviews (n=7) with participants who had engaged with the Telenoid. Data were subjected to sentiment analysis, linguistic analysis and regression analysis. Ranking the most frequently produced words in the two groups revealed an overlap in the top-10 words (6 out of 10 words). Sentiment analysis and regression revealed an association between negative predicates and the online condition. Sentiments were not convincingly associated with age or gender. Linguistic categorisation of the data revealed that adjectives expressing response-dependent features were especially frequent. We did not find any consistent statistical effect when categorising the words into cognitive and emotional predicates. The proposed three-word method offers an unguided approach to exploring initial conceptualisations of robots.


Introduction
Robots such as vacuum robots and lawn-mowing robots are increasingly being introduced into everyday life. For the most part, these robots do not elicit social or emotional interaction with humans and are mechanistic in appearance. Social robots, on the other hand, differ immensely: they are commonly anthropomorphic (even humanoid or zoomorphic in appearance) and are developed to be social interaction partners that elicit social and emotional responses. These social robots are not yet widely disseminated in Western society, and hence the majority of the general population may have neither first-hand experience with social robots nor knowledge of the state of the art within the field. Rather, media and science fiction may be the primary source of information on what to expect from social robots, both in terms of appearance and capacities as interaction partners [1].
Most participants in HRI research encounter a real social robot for the first time in an empirical setting, such as online or in the laboratory. Research indicates that participants, lacking personal experience with robots, draw on prior knowledge and preconceptions from media and science-fiction in first encounters [1]. However, there is limited research on first-encounter impressions with social robots. Studies indicate that humans form first impressions of other humans rapidly utilizing sensemaking strategies on whatever information is available and with consequences for subsequent behaviours [2][3][4]. For instance, when asked to estimate static neutral images of strangers on-screen, participants formed impressions of their personalities in as little as 39 milliseconds [5]. Image-based first impressions have diverse real-life consequences such as the number of votes a political candidate receives [6] or the risk of being selected as a suspect for a police lineup [7].
Extrapolating from the research on first impressions of humans, similar processes may be present when robots are encountered for the first time. Several studies explore the impact of (especially anthropomorphic) robots or virtual agents on human ratings of likeability, uncanniness, human likeness and similar outcomes, utilizing questionnaires of varying scope [8][9][10][11][12] and in some instances behavioral coding [13]. Taken together, the results of such studies are largely inconsistent, which may reflect the multiplicity and complexity of the variables they explore [14][15][16], which together make it difficult to reduce them to a single variable. Furthermore, to the best of our knowledge, no previous studies ask participants to simply state how they would describe the robot based on very limited information, without guiding their attention onto anthropomorphism, uncanniness, likeability, similarities to science fiction and similar concepts. Given that robots are unfamiliar to the majority of the population, such immediate 'unfiltered' impressions may contain valuable information on how robots are perceived and conceptualized. This, in turn, may yield important practical implications for "impression management for robots": what common (mis)conceptions do people express about a given robot, and could these be amended? Since the matching of functionality and expectations based on appearance seems particularly valuable in human-robot interactions, "impression management" may prove important [13]. In addition, such data may contribute to the body of literature exploring the theoretical understanding of human-robot interactions. Some authors have argued that we need a new ontological category for social robots [17,18], while others suggest that human-robot interactions challenge our common notion of sociality and need to be reconceptualized in a new framework of "asymmetric" social interactions [19,20].
Moreover, since verbal responses are used in market research for innovative products [21] in ways that can be linked to psychological research on types of novelty [22], it may be possible to investigate to what extent currently discussed reactions to robots can be attributed to the "novelty effect": "i.e., the cognitive and behavioural effects of encountering an experientially novel type of entity." [23].
Real-life encounters with a new interaction partner provide more dynamic information on which impressions can be formed, such as social presence, nonverbal body language and similar cues, than a video alone. Hence, being physically present with an embodied social robot and engaged in interaction may give rise to qualitatively different impressions. However, at least in one study utilizing the Wizard of Oz approach, it appeared that the mannerisms and presumably the voice of the operator are less important. Here, Nishio and colleagues reported that their participants, who had engaged in interaction with various types of teleoperated robots, held differing impressions of their different robotic interaction partners even though these were operated by a single person [24]. Kiesler and colleagues [25] reported no or only small differences in anthropomorphic attributions, ascription of personality, lifelikeness and similar characteristics between interacting with a robot that was physically present and interacting with a robot that was projected life-size onto a screen. However, as pointed out by the authors, this lack of convincing difference may have been due to the excellent resolution of the projection. Similarly, Yamato and colleagues [26] found that people were more likely to take the advice of a virtual agent than that of a co-present robot. This has led researchers to question the effect of presence in and of itself on the experience of robots. Rather, it may be the social interaction that induces changes in impressions. For instance, in a study exploring two aspects of impressions, "warmth" and "competence", Bergmann and colleagues [9] reported that ratings of the perceived warmth of a virtual interaction agent shaped in the form of a robot were influenced by the agent's appearance and decreased over time (i.e. from pre- to post-interaction), whilst changes in the ratings of competence were influenced by the agent's use of nonverbal behavior during the interaction. In line with this, others have reported effects of robot-initiated greeting behaviours on people's impressions of likeability, pleasantness and similar variables [27]. Others report that, following even a brief interaction with a humanoid robot, people change their impression of social presence and report reduced levels of uncertainty towards the robot [28]. The initial impressions of a non-social mechanistic robot also changed amongst elderly residents after merely seeing a demonstration of the robot. However, their impressions were mainly described in terms of practical capabilities, and the focus may have been different if it had been a social robot [29]. There is some indication that interaction may not be necessary, but rather that exposure per se can affect impressions of robots. It may be possible that affective habituation over even a brief period of time reduces strongly felt positive or negative emotion towards robots [28]. This could in part explain reported reductions in feelings of uncanniness with prolonged exposure time. Taken together, the majority of research on impressions of robots taps into the effects of various engineering factors such as motion [30,31], other non-verbal behaviour [32], and design [33][34][35], which naturally limits the scope of the variables assessed.
Qualitative interviews offer one method of assessing people's immediate impressions of social robots without guiding their attention to any specific content. However, as conducting large qualitative studies and transcribing large amounts of data is time-consuming, interviews can be ill-suited to the fast-paced research turnover in HRI research.

The present study
The present study should be construed as an exploratory proof-of-concept study. In this paper we report on a method for assessing impressions of a specific robot in an unguided, semi-qualitative way by simply asking people to report the first three words that come to mind when they are confronted with the robot for the first time. We expect that the open-ended nature of the method will encourage respondents to offer their immediate impression, whilst at the same time overcoming some of the limitations of conventional qualitative interviews (namely the wealth of data, transcription needs and time-consuming analysis). In the present study we triangulate (i) data from an online survey with brief video-based stimulus material of a social robot, (ii) data from an HRI study where participants interacted with a physically present social robot and, finally, (iii) data from qualitative interviews with a subgroup of the participants who actively engaged with the robot. Utilizing the three data sources, this paper offers a reflection on possible interrelations and differences between impressions of a robot tapped with the 3-word methodology in distinct empirical settings. The sentiments and linguistic categories expressed by the participants in their 3-word responses are investigated. Specifically, we set out to explore: i) whether participants would form impressions that were uniform enough across respondents to enable us to observe trends; ii) whether participants would spontaneously form impressions that were emotionally laden (i.e. could be construed as either positive or negative); iii) whether there was a difference between the impressions formed when only seeing a robot engaged in an interaction online vs. personally interacting with a robot; iv) whether there were uniform characteristics in the linguistic properties of the words produced.

Methods
The present study utilizes data from two different, independent studies: the online study which was conducted with the aim of validating two new questionnaires [36] and the interaction study investigating ethical decision-making with robots. The two main studies are described in turn below.

Online study: participants and procedure
Three hundred and forty participants were recruited. The sample was recruited to validate a new questionnaire for the assessment of attitudes towards social robots [36]. The majority of respondents (n=267) were recruited and compensated for their time via a survey panel system supplied by Qualtrics® (2015). The remaining participants were recruited in the student population but also at a local museum (for demographic information see Table 1). Participants at the museum could enroll into the survey by filling in an interest form on a tablet located at the end of the exhibition. Then they would automatically receive a link to the survey by email. Participants were compensated with a gift voucher (value of 50 DKK), electronically delivered to their email address. Hence, there was no direct interaction between researchers and participants in this study. Time and effort filters were applied during the data-collection automatically removing respondents who were unrealistically fast or responded systematically similarly to all items (e.g. always selecting the same option on the individual items). The total duration of filling in the survey was approximately 25 minutes.

Interaction study: participants and procedure
Seventy-five participants were recruited on the campus of a large Danish university. However, six participants were excluded due to incomplete data on the primary measure, leaving a final sample size of n=69. For demographic information see Table 1. The participants were compensated with a gift voucher worth 100-150 DKK.
The study was designed to explore gender perception and decision making on different ethical dilemmas in three different conditions (computer-based, human-human and human-robot). Only in the robot condition did the robot present the dilemmas, and hence only data from that condition are included in the present paper. Before the study commenced, written consent was obtained and baseline data were collected online on tablets. Afterwards, the participants were guided to an adjacent room where the robot was already placed. The experiment lasted approximately one hour, with participants interacting with the robot for 40-50 minutes. The script consisted of the Telenoid outlining a dilemma (for instance, how to proceed if a shop assistant has accidentally given you more goods than you have paid for) and then asking follow-up questions to induce reflection in the participant on his/her decision (e.g. "can there be situations where stealing is OK"). The interaction was fully scripted and performed utilizing a Wizard of Oz paradigm. Following the interaction with the robot, the participants were escorted to a different room and asked to answer questionnaires online.
Thereafter the participants were debriefed and a random subgroup was invited to partake in a qualitative interview.

Qualitative interviews
Ten randomly selected participants who met the Telenoid in the interaction study were interviewed immediately after their participation in the study. The interviews lasted between 20 and 30 minutes. The interviews followed a semi-structured interview guide focusing on the participant's experience of meeting the robot and reflections on the content of the study. In the interview, participants were asked to describe their immediate impressions of the robot, how they would describe the robot and whether their impression of the robot changed during the session. For this paper, we have selected 7 interviews for a case study on the link between the 3 words the participants used to describe the robot and their explanation of their impression of the robot in the interview. Three interviews were eliminated from the analysis as these participants had not filled out the 3-word suggestions.
All interviews were recorded and transcribed. For the purpose of the analysis in this paper, the interviews have been coded according to the 3-word entries used to describe the impression of the robot. By analysing the qualitative data on the basis of the quantitative findings, we aim to explore the survey data further in an attempt to reveal the underlying reasoning behind participants' immediate impressions. By combining the two methodologies, we aim to reach variables that we cannot reach with one form of data alone [37].

The robot
Respondents were invited to report their 3-word impressions (for description see below) based on the Telenoid R1 (see Figure 1). The Telenoid is a teleoperated android robot developed by Hiroshi Ishiguro from Osaka University and the Advanced Telecommunications Research Institute International. The Telenoid is: "developed to appear and behave as a minimal design of human features... A minimal human conveys the impression of human existence at first glance, but it doesn't suggest anything about personal features such as being male or female, old or young" [38]. The Telenoid was specifically selected for the proposed neutrality of gender, age and ethnicity. The teleoperation of the Telenoid entails that the limited body movements of the Telenoid mirror the operator's.
In both the online and interaction studies the vantage point of the participant was within the formal social distance, i.e. approximately 0.8-1.3 meters. In the interaction study the participant was seated at a small table with the Telenoid placed slightly offset at the end of the table.
The respondents in the online condition were shown a short movie of a scripted interaction with the Telenoid (77 seconds). The recording angle of the movie is offset so that the participant sees the Telenoid relatively head-on while the face of the female interaction partner is not fully visible. The interaction was in Danish and showed a woman expressing worry to the Telenoid because her daughter was falling behind in math. The Telenoid acknowledges the woman's concerns and offers to act as a math teacher for the daughter. The woman reluctantly declines the offer, whereupon the Telenoid expresses disappointment (stating "I would have been really good at it" and directing its head and gaze to the ground).

The 3-word impression measure
The participants from both samples were asked in the online questionnaire to write the first three words that came to mind to describe the robot. Both samples were presented with the same image of the robot (see Figure 1) and three blank slots that they could type their words into. For the interaction condition, this question was presented after the interaction with the robot. For the online condition, this question was presented after seeing a short video of the robot in interaction. For description see the section describing the robot.

Procedure of sentiment analysis
Two independent raters, both linguistics students, were recruited to conduct the sentiment coding. One rater was a male Master's student, whilst the other was a female Bachelor's student. The raters were not given any information about the content of the study or the origins of the data. The data were made available to them in Excel, with every line of three words representing an individual respondent. The raters were instructed to rate each word on a 5-point scale (1-5), on which scores of 1-2 are negative, 3 is neutral and 4-5 are positive. Hence, the sentiment score was the number assigned to each word. Furthermore, the raters were instructed to treat each line as a separate text, so that all the words by a given respondent provided the context for evaluation. However, the initial assessment of interrater reliability revealed a low kappa (k = 0.35). To address this, it was decided to recode the sentiment scores into negative (scores 1-2), neutral (score 3) and positive (scores 4-5), which considerably improved the interrater agreement: k = 0.62.
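The recoding and agreement check described above can be illustrated with a minimal Python sketch. It assumes unweighted Cohen's kappa for two raters (the text only reports "kappa"), and the two rating lists are hypothetical toy data, not the study's ratings. The study itself used the 'irr' package in R, so this is an illustration of the idea rather than the authors' code.

```python
from collections import Counter


def recode(score):
    """Collapse a 1-5 sentiment score into the three classes used in the text."""
    if score in (1, 2):
        return "negative"
    if score == 3:
        return "neutral"
    if score in (4, 5):
        return "positive"
    raise ValueError(f"score outside the 1-5 scale: {score}")


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / n**2
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy ratings (assumption): the raters often differ by one scale point.
rater_1 = [1, 2, 3, 4, 5, 2, 1, 3]
rater_2 = [2, 1, 3, 5, 4, 3, 2, 3]

kappa_raw = cohen_kappa(rater_1, rater_2)
kappa_recoded = cohen_kappa([recode(s) for s in rater_1],
                            [recode(s) for s in rater_2])
print(f"raw 5-point kappa: {kappa_raw:.2f}, recoded kappa: {kappa_recoded:.2f}")
```

In the toy data most disagreements are between adjacent scale points on the same side of neutral, which is the situation in which collapsing the scale raises agreement.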

Procedure of categorization
The linguistic response of each participant consists of three words, which were filled in from left to right in accordance with the Danish writing system. The ordering from left to right thus represents the temporal order in which the words were produced. Each word was categorized by itself, in two regards. First, using common distinctions of parts of speech, the word was categorized as an adjective, common noun, proper noun, verb, or exclamative. Second, the descriptive meaning of the word was categorized using a classificatory system that is tailored to the specific context of HRI. This procedure seemed advisable, given that taxonomies in linguistic appraisal theory [39][40][41][42][43] are not yet standardized and are insufficiently fine-grained for the application context. It should be noted, however, that the taxonomy we developed here is a refinement of the (context-relevant part of the) more coarse-grained taxonomies used in linguistic appraisal theory and can be mapped onto the tripartite taxonomy of affect, judgement and evaluation used in psychological appraisal theory.
Given that the majority of responses were adjectives, we divided adjectives into seven main categories: adjectives expressing (I) response-dependent features, (II) features of physical appearance, (III) tactile features, (IV) human capacities, (V) human emotions, (VI) functional features and (VII) features of evaluation, partly introducing subcategories as follows.
(I) The notion of "response-dependency" stems from philosophy: a response-dependent predicate (e.g., frightening, amazing, funny) denotes a feature of something (typically an object) which elicits in human observers the impression stated by the predicate [44]. In other words, a response-dependent predicate is a way of characterizing an object in terms of how it affects us. While in metaphysical investigations the term is also used to distinguish two sorts of physical features (e.g., red is a response-dependent predicate but square is not), we are using the term here exclusively for the classification of adjectives that characterize objects in terms of mental states that are more narrowly psychological conditions. We distinguished adjectives that characterize the robot in terms of response-dependent features relating to:
1. negative or positive feelings and emotions (e.g., uncanny, frightening, unsafe, cute, comical, charming; in Figure 6 coded as RDF-(+), RDF-(-));
2. cognitive states of disorientation (neither positive nor negative, such as special, wondrous), as well as negative and positive cognitive arousal (e.g., boring, bizarre, interesting, fascinating; in Figure 6 coded as RDF-(cog)).
(II) With regard to features of physical appearance we distinguished between adjectives expressing:
1. neutral physical features (e.g., big, bald, white-faced; in Figure 6 coded as PA);
2. negative physical features (e.g., unshapely, badly proportioned; in Figure 6 coded as PA-(-));
3. positive physical features (e.g., pretty, good-looking; in Figure 6 coded as PA-(+));
4. physical features that are deviations from a norm of physical appearance (e.g., deformed, amputated, albino, ungendered; in Figure 6 coded as PA-dev).
(III) We separated features of physical appearance from negative and positive tactile features (e.g., cold, moist, warm, soft), since only in the interaction condition was the robot within tactile range, even though it was not touched by participants (in Figure 6 coded as TF-(-), TF-(+)).
(IV) Some participants attributed to the robot adjectives expressing human traits or capacities (e.g., smart, lively, objective, or naive; in Figure 6 coded as HC). Some of these adjectives (e.g., stupid or smart) are, however, ambiguous: they may also be understood as evaluative (see below).
(V) From these we distinguished the class of adjectives attributing an emotion to the robot (e.g., sad; in Figure 6 coded as EM).
(VI) We decided to establish a separate class for adjectives expressing functional features (e.g., programmed, mechanical, interactive) that are descriptive rather than evaluative (in Figure 6 coded as FF).
(VII) The large class of evaluative adjectives we divided into the following subcategories, depending on the type of norm that governs the evaluation:
1. positive and negative evaluations relative to instrumental norms or norms applying to a tool (e.g., practical, useful, unfinished, badly manufactured; in Figure 6 coded as EV-F-(-), EV-F-(+));
2. evaluations relative to norms of reality or naturalness (e.g., dream, artificial, fantastic, abstract, false; in Figure 6 coded as EV-R);
3. evaluations relative to moral norms (e.g., innocent, pure, pathetic; in Figure 6 coded as EV-M);
4. positive and negative evaluations relative to aesthetic norms, including style and good taste (e.g., modern, simple, compact, disgusting, base, inappropriate; in Figure 6 coded as EV-A-(+), EV-A-(-));
5. positive and negative evaluations relative to norms of social interaction, including assessments of affordances for social interactions (e.g., friendly, reliable, kind, unfeeling, suspicious, inhibited, alien; in Figure 6 coded as EV-SA-(+), EV-SA-(-)).
The responses used a comparatively small number of nouns, which we classed into seven further categories denoting the following entities (after the illustrations we list the respective codes in Figure 6): (VIII) kinds of object (robot, doll; NO); (IX) stages of humans (child, fetus; NH); (X) deformed humans (monstrosity, freak, prematurity; NHdev); (XI) body parts (face, body, child's body, hand; NHP); (XII) aspects and kinds of communication (articulateness, joke, theatre; NSC); (XIII) unreal items (ghosts, future; NR), and (XIV) metaphorical associations (cross, quietness, balance; XO).
Finally, in the encounter with a radically novel sort of agent such as a social robot exclamatives arguably deserve their own category (XV), even though our sample contained only three instances (wow, wtf, try-again).
As pointed out above, the fine-grained categorization we have chosen here corresponds with the common tripartite distinction of affect (here categories I and XV), judgment (here categories II, III, IV, V, VI, VIII, IX, XI) and evaluation (here categories VII, X, XIII, XIV). It seemed important, however, to separate adjectives and nouns, since adjectives typically express one feature, while nouns denote items with several features that are characteristic of a kind. This difference may be associated with differences in the cognitive processing of adjectives and nouns [45].
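The correspondence between the fifteen categories and the tripartite distinction can be written down as a small lookup table. The following Python sketch is only an illustration: the names `TRIPARTITE`, `ALL_CATEGORIES` and `appraisal_class` are hypothetical, and the mapping is copied from the paragraph above. Note that category XII (aspects and kinds of communication) is not assigned to any of the three classes in that paragraph, so the sketch leaves it unassigned rather than guessing.

```python
# Mapping from the tripartite appraisal classes to the Roman-numeral categories,
# copied from the text. Category XII is not listed there and is left out on purpose.
TRIPARTITE = {
    "affect": ["I", "XV"],
    "judgment": ["II", "III", "IV", "V", "VI", "VIII", "IX", "XI"],
    "evaluation": ["VII", "X", "XIII", "XIV"],
}

ALL_CATEGORIES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII",
                  "IX", "X", "XI", "XII", "XIII", "XIV", "XV"]


def appraisal_class(category):
    """Return 'affect', 'judgment' or 'evaluation', or None if unassigned."""
    for name, members in TRIPARTITE.items():
        if category in members:
            return name
    return None


unassigned = [c for c in ALL_CATEGORIES if appraisal_class(c) is None]
print("categories without an appraisal class:", unassigned)
```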

Procedure on data handling and storage
All researchers with direct access to participants completed the NIH Web-based training course "Protecting Human Research Participants". Participants were informed beforehand of the purpose of the studies and their expected length. No sensitive information was obtained from the participants (the only identifying variables obtained were gender and age). No records were kept of the participants' names or addresses. To obtain the email addresses of the participants in the online study who were not recruited through Qualtrics, we redirected them at the end of the survey to a blank survey where only email addresses were collected. This was unlinked from the survey containing the participants' questionnaire answers. In the interaction study all participants were recruited through a campus laboratory. Again, names and places of residence were not obtained from the individual participants, and all data were recorded solely under ID numbers. The study was pre-approved by a university-based human research committee. Both the online and the interaction study were reported to the Danish Data Protection Agency before data collection commenced.
All anonymized data was uploaded to a university based secure cloud service and deleted from the Qualtrics system. As per the regulations at Aarhus University (Denmark), data will be stored there for 5 years after the last publication on results obtained in the studies.

Statistics
Data analysis of the 3-word data involved basic text processing, descriptive statistics to visualize patterns and tendencies, and regression modelling to investigate the effects of the condition (online vs. interaction).
First, the texts were manually reviewed, and empty or meaningless data points were excluded (e.g. "phv", "ved ik", Eng. translation "don't know"). After this step, data from 332 participants (out of 340) remained for the online condition, and data from all 69 participants for the interaction condition. In addition, if a participant answered with a sentence or phrase, it was shortened by removing function words and sometimes verbs (e.g. "ligner ET for meget", Eng. translation "looks too much like ET" -> "ET").
In the online condition, the total number of words (tokens) was 993, with 317 unique words (types) and a type-token ratio of 32%.
In the interaction condition, the total number of words (tokens) was 204, with 99 unique words (types) and a type-token ratio of 48.5%.
The processed text data went into the sentiment analysis described above. Further statistical calculations and modelling were done in RStudio 1.1.463 (RStudio Team, 2018) and R 3.5.2 (R Core Team, 2018). The 'irr' package [51] was used for inter-coder reliability assessment.
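The type-token ratios reported above are simply the number of unique words divided by the total number of words. The short Python sketch below recomputes them from the reported counts and shows the calculation on a hypothetical toy word list. The lowercasing and whitespace stripping are assumptions of this sketch, since the text does not say how word forms were normalised.

```python
def type_token_ratio(words):
    """Unique words (types) divided by all words (tokens)."""
    # Assumption: case and surrounding whitespace are ignored when counting types.
    tokens = [w.strip().lower() for w in words if w.strip()]
    if not tokens:
        raise ValueError("no tokens to count")
    return len(set(tokens)) / len(tokens)


# Hypothetical toy responses (not study data): 4 tokens, 3 types.
toy_ttr = type_token_ratio(["creepy", "strange", "Creepy", "cute"])

# Counts reported in the text.
online_ttr = 317 / 993       # online condition
interaction_ttr = 99 / 204   # interaction condition

print(f"online: {online_ttr:.1%}, interaction: {interaction_ttr:.1%}")
```

A higher ratio in the smaller sample is expected to some degree, because type-token ratios tend to fall as the number of tokens grows, so the two values are not directly comparable as measures of lexical variety.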
As mentioned above, the original sentiment scores by the 5-point scale were recoded into positive, neutral, and negative values. They were compared across two raters, and only the matching data points were taken into further analysis (883 data points out of 1197).
Negative   Neutral   Positive   Disagreement
528        161       194        314

The sentiment scores and categories were compared across conditions and visualized with the help of the 'dplyr' [52] and 'ggplot2' [53] packages.
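The agreement filter and the reported counts can be cross-checked with a few lines of Python. This is a sketch under stated assumptions: `keep_agreed` is a hypothetical helper showing one way to retain only the data points on which both raters agree, and the numbers are the ones reported in the text.

```python
def keep_agreed(labels_a, labels_b):
    """Keep only the labels on which both raters agree."""
    return [a for a, b in zip(labels_a, labels_b) if a == b]


# Hypothetical toy example: the raters agree on the first and third word.
toy_agreed = keep_agreed(["negative", "neutral", "positive"],
                         ["negative", "positive", "positive"])

# Counts reported in the text.
agreed_counts = {"negative": 528, "neutral": 161, "positive": 194}
disagreement = 314

n_agreed = sum(agreed_counts.values())   # data points kept for analysis
n_total = n_agreed + disagreement        # all rated words
tokens_both_conditions = 993 + 204       # online + interaction tokens

print(f"{n_agreed} of {n_total} words kept ({n_agreed / n_total:.1%})")
```

The totals are consistent with each other: the agreed and disagreed words add up to the combined token count of the two conditions.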
At the final stage, 'stats' package (R Core Team, 2018) was used for simple linear and logistic regression models, and 'lmerTest' [54] for linear mixed effect models.

Results
The following two plots demonstrate the frequency distributions of the ten most frequently produced words in the online condition (see Figure 2) and interaction condition (see Figure 3).
Regarding parts of speech, only adjectives and nouns were used in the interaction condition, whereas the online condition had 33 more POS (parts of speech), which were excluded due to insignificant numbers. Adjectives were used considerably more often than nouns in both conditions (see Figure 4).
The participants in the online condition used negative words to describe the robot more often, whereas in the interaction condition neutral words were applied instead (see Figure 5).
Positive evaluations had the same percentage in both conditions. A logistic regression model for the recoded (negative-positive) sentiment scores showed that in the online condition, the participants were more likely to use negative words to describe the robot than in the interaction condition. However, an effect of age was found in the interaction condition, as elderly people were more likely to give less negative judgements: b = -0.24 (SE = 0.12), z = -2.06, p = 0.0392; the model fit was good: χ²(1) = 5.76, p = 0.0164.
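The p-values attached to the reported test statistics can be reproduced without a statistics library, because both a chi-square test with one degree of freedom and a two-sided Wald z-test reduce to the complementary error function. The Python sketch below is a plausibility check under that assumption, not the authors' analysis, which was run in R. The function names are hypothetical.

```python
import math


def chi2_p_value_1df(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


def wald_two_sided_p(z):
    """Two-sided p-value of a standard normal (Wald) z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Statistics reported for the age effect in the interaction condition.
p_model_fit = chi2_p_value_1df(5.76)
p_age = wald_two_sided_p(-2.06)

print(f"model fit: p = {p_model_fit:.4f}, age coefficient: p = {p_age:.4f}")
```

Small differences in the last decimal are expected, because the reported z and chi-square values are themselves rounded to two decimals.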
With regards to the linguistic categories described above, the frequency distribution is presented in Figure 6.
In addition, collectively the participants used more predicates expressing cognitive assessment than predicates relating to their emotions and feelings (see Figure 7).
Regression models did not show any significant relationships between the categories used and the conditions the participants were enrolled in. However, one small effect of gender was observed, as male participants were less prone to use "emotional" categories for the second word (out of 3): b = -0.73 (SE = 0.29), z = -2.52, p = 0.0116; the logistic regression model fit was appropriate: χ²(1) = 6.54, p = 0.0105.

Findings of qualitative interviews
When analyzing the quantitative data we were puzzled by the contradictory and apparently irreconcilable words used to describe the robot at the individual case level. The qualitative data were thus analyzed with a specific focus on exploring the connection between the three words used by the individual participant and their experiences of the interaction with the robot. The 3-word entries used by the interviewed participants are listed in Table 2. In the following discussion of the results of the qualitative analysis, we probe into the meaning-making processes behind the words used in the 3-word entries. What do participants actually mean when they use these words?
In order to open up the underlying meaning-making processes we include excerpts from the interviews in the following.
The participants' individual impressions and descriptions of the robot and their experience of it went in different directions. Participants described their experience and impressions of the robot in a jumbled and hesitant way, had difficulty articulating their impressions in coherent sentences, and were unsure of how to describe the robot and what to think of it. Thus the experience of interacting with the robot can be described through a central theme of ambivalence. The participants' impressions of the robot and their experience of interacting with it were, like their 3-word entries, not clear-cut but ambivalent, often conceptually incongruent, and occasionally contradictory. Five of the interviewed participants had one of the top ten words in the interaction condition as their first word: Strange, Frightening, Creepy, and Naked. Two of the participants gave first words not present in the top-10 words, namely Lively and Cool, and these participants were clearest in their impression, yet not without ambivalence in their description of the experience. A young male participant who described the robot as cool, anonymous, and child recounted his first impression like this: "Uhm it was fun, when I first came in, I laughed a bit. Uhm, but it was fine." He went on to explain that it was the way the robot looked that made him laugh, that it looked like a baby, but with an adult head, yet also cute: "Yes, well a bit creepy, but cute in a peculiar way. It was very cool. (. . .). My impression was positive, I thought it was cool. It was a good way to do it. But I am also into technology. I think it is great that someone is beginning to look at stuff like that."
The word anonymous stemmed from the participant enjoying that he did not feel judged in the situation. This particular theme was also a recurring one, but one that we develop elsewhere [60]. A female participant describing the robot as lively, different, and fun was also generally positive in her descriptions of the robot, but revealed a more ambivalent feeling: "I thought it was quite fun to enter the room with the robot. Uhm, very different. And then I really liked the fact that it actually moves and talks to you, but at the same time it was a bit awkward because, uhm, should I talk to it or not? How should one react to it?" She guessed that the robot was teleoperated and felt that this impression was confirmed during the experiment, yet she described her interaction with the robot as strange and was uncertain whether she should look at it or at the computer. In the end, she decided to look at it and talk to it as if it were a human, as that felt most natural. When asked to describe the robot she said: ". . . it is a little strange that you see something and think 'baby' or something like that, and then you have to deal with the fact that it knows something and that it actually has intelligence right? Sort of."
She also described how her impression changed during the process: "Uhm well, the more I talked to it, the more I felt I got to know it, I guess, or got an idea about what it was thinking. Trying to put an identity on it in a way. Trying to figure out what its identity was."
This ambivalence in appraising the situation and the special efforts taken in trying to understand and categorize the robot became more apparent when more negative words were used in the 3-word entry. A female participant who described the robot as strange, cool, and unreal said: "I had not expected to see a robot, it was a bit awkward. It was fun to try, and it was not awkward all the time. It became more and more comfortable to have the robot there . . . Mostly I was unsure of what I could say to it and how much it expected me to say to it."
The participant told the interviewer that the shock of seeing a robot abated some time during the experiment. She continued to say that she was studying to become an engineer and enthusiastically exclaimed that she thought it was cool to see such a robot, yet she also said that she hardly looked at the robot or talked to it even though it talked to her. She was unsure of the functionality of the robot, but thought that it was autonomous.
A male participant described the robot with the words creepy, silly, and unnatural, and elaborated on his experience of being with the robot as one of being on his own, or with a computer, rather than being with a person, whereas the way it looked made him feel creepy. When asked why he would describe it as creepy, he explained: "Hmm . . . just that it is unnatural. I think it has something to do with its limbs and stuff like that. I sat and looked at it, at some point, and I really think it looks peculiar, and that . . . bottom of it. It looks a bit strange."
He thought that the shape of the robot made it less human-like and he described it as an undeveloped foetus. However, the creepy feeling was transformed into an impression of it being silly very quickly after seeing the robot for the first time: "Well, it looked silly, it was silly that it was there and so on." He explained that as time went by, he just concentrated on the experiment, answered the questions and stopped thinking about the robot, but it was not a conscious choice: "I just stopped noticing it, and then it sort of became a bit more of a routine." The participant was certain that the robot was autonomous, but did not believe that robots had any kind of consciousness, yet he said that he did not feel comfortable sitting with the robot. When asked if he felt more unsure or nervous than usual, he thought that that was too negative a description of his experience. He explained that he didn't feel unsure, but it was a strange feeling and he compared it to 'meeting a person with a disability where one also does not know how to begin the conversation,' thus feeling awkward.
Thinking about the functionality of the robot was also a cause for uncertainty and seemed to add to a feeling of ambivalence. A female participant describing the robot with the words creepy, absurd and child explained her impression in these words: "It was neither human nor robot. It looked more like an alien. So . . ." The participant laughed a lot while recounting her experience and explained that she thought the robot was autonomous. She went on to say: ". . . it was uhmm, a very funny experience, and very weird. . . . Well, this is a completely new situation. You have never had an experience of having to sit and talk to such a, such a human-alien-robot figure that is sitting there talking to you . . . It is definitely something one needs some time to get used to, that it is not creepy, that it doesn't do anything to you."
Finally, a female participant, who described the robot as frightening, fascinating and cute, thought it was fascinating that it was possible to make a robot like the Telenoid, but on the other hand she was not sure she would like to see it out in society, especially since it could move around. She wouldn't like it to become 'too lively'. She also expressed some concern about talking to the robot, not entirely sure it would understand her, and she was also unsure if she could be recorded in the future. Her comment below sums up the ambivalent feeling of being both surprised and excited yet concerned and unsure of the social dimensions of the situation: "I thought it was exciting, the fact that you didn't know there was a robot sitting in there. Because it gave me sort of a shock in the beginning, that there actually was someone. And then, the fact that it seemed lively, with eyes and so on, I thought it made it a bit more, . . . in a way one felt that one was being watched when having to answer. Even though someone may have been looking through the camera [placed above the robot] I felt that its eyes moved, it made me feel like there was someone who was looking."

Discussion
In the present study, we explored the sentiment and linguistic categorisation of three words produced by participants on their first encounter with the teleoperated robot, Telenoid. Specifically, we triangulated (i) data from an online condition with responses to brief video-based stimulus material of the Telenoid with (ii) data from an interaction condition where participants interacted with a physically present Telenoid and, finally, (iii) data from qualitative interviews with participants from the group that had interacted with the Telenoid. Ranking of the individual words produced in the online and interaction conditions showed some overlap among the 10 most frequently produced words in the two conditions (6 out of 10 words). The sentiment analysis revealed an association between the production of negative words and the online condition. The ascription of negative or positive words was not convincingly associated with age or gender, as only very small, inconsistent effects were found. Previous research reports age differences in preferences for robot appearance [59,60]. In contrast, in the present study neither age nor gender could predict the sentiments of word ascriptions. Hence, there appeared to be no consistent significant effect of age or gender on the conceptualizations made about the robot. Tentatively, this can be taken to suggest that, although age and gender affect preferences and ways of interacting with robots, the way people describe them appears uninfluenced by these variables.
A linguistic analysis of the data revealed that adjectives were dominant. Words reflecting ascription of response-dependent features were especially frequent. We did not find any consistent statistical effect on categorising the words into cognitive and emotional predicates. Finally, the qualitative interviews overall showed that participants' impressions and experience of being with the robot reflect their three-word entries, as the impressions are rarely clear-cut and unambiguous. This impression seems to stem from an ambivalence caused by the way the robot looks; by thinking about the functionality of the robot; by being unsure of how to interact with a robot; and by the participants' own preconceived ideas about robots. We turn now to discuss in more detail the results of the quantitative, linguistic, and qualitative analysis viewed together.
Overall, our first two research aims, (i) whether participants would form impressions that were uniform enough across respondents to enable us to observe trends and (ii) whether participants would spontaneously form impressions that were emotionally laden (i.e. could be construed as either positive or negative), were partly confirmed, although the semantic span of words was large. To begin with a look at the entire set of over 1200 words offered in response, it is instructive to observe that there are many recurrent and many similar words. However, there is also large variation in the composition of each 3-word entry. Though research on robot appearance in general and this study in particular mainly focus on delineating general patterns, it is important to bear this vast inter-individual variation in mind. Furthermore, it is quite striking that individual 3-word entries often contained contradicting and seemingly irreconcilable words (e.g., frightening, fascinating, cute). This was further elaborated in the analysis of the interviews, where it became apparent that these seemingly irreconcilable words were echoed by the participants describing a very ambivalent feeling of not knowing exactly how to understand and interact with a robot. This ambiguous stance towards social robots has been reported in previous studies [55] and several explanations have been offered. For instance, Bruckenberger and colleagues theorise that fictional robots from the media . . . "leads to 'weird', double-minded feelings towards real robots" [56]. Others underline that the ontological categories that people would habitually use to classify objects are challenged when confronted with social robots [17,57]. But conflicting reactions are also a typical response in the cognitive processing of novelty [22]. As pointed out by Smedegaard [23], more research is needed to clarify the relationship between (i) the cognitive processing of novelty, (ii) the reactions to social robots as a special class of novel entity, and (iii) the reactions to social robots as a special class of entity. Since novelty and innovation research also uses linguistic responses, the 3-word method facilitates such comparative analyses.
Collectively, the percentage distribution of positive responses seemed comparable between the online survey group and the group of participants who engaged in face-to-face interaction (as displayed in Figure 2 and Figure 3). Relating to our third research aim (exploring whether there was a difference in the impressions formed between only seeing a robot engaged in an interaction online versus personally interacting with a robot), we found some difference in the use of negative words to describe the impressions: the use of negative words to describe the robot was to a significantly greater extent associated with the online survey. It is difficult to determine whether this simply reflects a disinhibition effect in the online study, i.e. the increased likelihood that people display negative emotions and flaming behaviour online, or conversely, whether interaction in and of itself increases the likelihood of a more neutral or positive impression of the robot [58,59]. However, other studies have also reported that the physical presence of a robot affects the nature of social interaction that people will engage in and the extent to which they will rate the robot in positive terms [60]. Such results could indicate that people who have been co-present with a robot take care not to "hurt" the robot; previous studies have shown, for instance, that respondents typed fewer negative words on a computer that they had just worked on when asked to evaluate that same computer [61]. As we shall explain presently, a closer inspection of the ten most used words in both conditions corroborates this interpretation, especially in combination with the qualitative data from participants in the interaction condition.
In order to discuss in greater detail our first (whether participants would form impressions that were uniform enough across respondents to enable us to observe trends) and fourth research aim (whether there were uniform characteristics in the linguistic properties of the words produced), we need to begin with a brief comment on translation. The Danish word 'uhyggelig' is notoriously difficult to translate into English; it is the negation of the similarly untranslatable word 'hyggelig', often rendered as 'cosy'. We render it here with the standard translation uncanny, but here the semantic component of scary is less pronounced; 'uhyggelig' characterizes a situation where the agent feels unsafe, is tense, practically uncertain, and disposed to avoidance. Similarly, the term 'underlig' in Danish has no strict English equivalent; it expresses disorientation, and means strange, not in the sense of alien but of puzzling. Very close in meaning is the Danish term 'maerkelig', which could be translated as peculiar-and-puzzling. We render the Danish adjective 'sjov' as funny in the sense of fun, entertaining, excluding other meanings of funny such as peculiar or comical, where the latter does not have the sense of a response-dependent predicate but is used as evaluation. The remaining Danish words have sufficiently good English correlates, but it should be noted that the adjective creepy is not a translation but was used literally by the Danish participants, who may or may not be fully familiar with the English meaning.
The 10 most commonly ascribed words in the two samples overlap in part: uncanny, child, frightening, human, peculiar and strange were all amongst the ten most frequent words in both groups. The adjectives uncanny and frightening are both categorized as response-dependent features relating to feelings or emotions in the participant, whilst peculiar and strange are response-dependent features relating to cognitive disorientation. Given that they were mentioned in both groups, reporting these emotional or cognitive response-dependent features seems independent of personal interaction experience with the robot. The noun child and the adjective human were mentioned in both groups. From the semantics of the adjective human (or the Danish equivalent, menneskelig) alone we cannot deduce whether the use of the adjective human refers to the mannerism of the Telenoid or its visual appearance. Previous research suggests that the presence of specific facial features on a robot could explain 62% of the variation in the ascription of human likeness, where especially nose, mouth, and eyebrows were deemed important in this regard [62]. Others report that the human-like face was most preferred and rated as being more alive and sociable, whilst a robot with a silver face was deemed only moderately human-like [63]. In the abotdatabase (http://www.abotdatabase.info/collection) of ratings of human likeness, the scores of the Telenoid are high for facial features but relatively low for body-manipulators and surface look [64]. Taken together, the rating of human in the present study may be a reflection of the Telenoid's visual appearance. However, previous studies also find that displays of emotions [65], voice [66], and motion [67] influence ascription of human likeness. The qualitative data from interviews with participants in the interaction condition suggest that the voice was a decisive factor in experiencing the robot as human-like.
Taking a closer look at the differences in the two sets of the 10 most frequently used words in the online survey and the interaction condition, the noun doll and the adjective alien only appear in the online survey and the adjectives creepy and funny (in the sense of fun, entertaining) only in the interaction condition (see Figure 6). While the rankings in the two columns in Figure 6 are not directly comparable, due to differences in sample size, it may nevertheless be instructive to consider the relative position of a word within each column. In the online condition uncanny is most frequent, while terms expressing exploratory engagement, peculiar and puzzling, occur less frequently. By contrast, in the interaction condition puzzling has the top score, ahead of uncanny. As mentioned above, this may be taken as an indication that in the interaction condition the robot is still perceived as creating an ambivalent situation of practical disorientation more than a frightening situation. As the qualitative data from interviews show, the participants became more comfortable with interacting with the robot as time passed.
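The comparison of the two top-10 lists discussed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, the toy entries are invented and are not the study data, and words are assumed to be already translated and normalised to a common spelling.

```python
from collections import Counter


def top_k_words(entries, k=10):
    """Return the k most frequent words across all three-word entries."""
    counts = Counter(word.lower() for entry in entries for word in entry)
    # Ties at the cut-off are broken by first occurrence, which is arbitrary.
    return [word for word, _ in counts.most_common(k)]


def compare_top_k(entries_a, entries_b, k=10):
    """Split the two top-k lists into shared words and words unique to each."""
    top_a = top_k_words(entries_a, k)
    top_b = top_k_words(entries_b, k)
    shared = [w for w in top_a if w in top_b]
    only_a = [w for w in top_a if w not in top_b]
    only_b = [w for w in top_b if w not in top_a]
    return shared, only_a, only_b


# Invented toy entries for illustration only (NOT the study data).
online = [
    ("uncanny", "doll", "alien"),
    ("uncanny", "child", "doll"),
    ("strange", "uncanny", "alien"),
]
interaction = [
    ("uncanny", "funny", "strange"),
    ("creepy", "strange", "uncanny"),
    ("funny", "uncanny", "strange"),
]

shared, only_online, only_interaction = compare_top_k(online, interaction, k=3)
print("shared:", shared)
print("online only:", only_online)
print("interaction only:", only_interaction)
```

Because the two samples differ in size, raw counts are not comparable across conditions. A sketch like this therefore compares only membership and rank within each list, in line with the caveat stated above.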
In the online condition, adjectives that express an evaluation relative to a norm dominate: a behavioral (uncanny), aesthetic (ugly), perceptual (small, human), practical (peculiar, puzzling), or social norm (alien). This predominance of normative evaluations, in connection with the fact that two classificatory nouns (child, doll) are used, may reflect the greater cognitive distance afforded by the third-person perspective as spectator of a dialogue. In the interaction condition it seems that the linguistic reactions fit with the direct exposure to the robot (clammy expressing a tactile feature) and especially exposure to the robot in a social interaction context from the second-person point of view. On the one hand, the robot is drawn into the space of the socially connected and evaluated relative to a social norm of trustworthiness (creepy), while in the online survey the adjective alien is among the ten most used words, stating a lack of social connectability. On the other hand, only in the interaction condition do participants observe that the robot is nude, indicating that the participants have included the robot in the social horizon that allows for shame. Third, only in the interaction condition do participants find the robot funny (partly also, as the interviews reveal, silly). As the interview data reveal, the adjective funny is used in a context where the participant describes her or his way of coping with a practically disorienting situation in a social context. If one does not understand the behavioral norm that is to be followed in a situation, distancing oneself from the situation by amusement provides a form of relief.
In the online survey the noun doll was among the ten most frequently used words, while in the interaction condition a noun for a non-living thing was used only seven times. This distribution of nouns across the two conditions may be of special interest from the point of view of cognitive science research on human-robot interaction. While the link between social robotics and neuropsychological research on social cognition is being explored in both directions [68][69][70], it appears that the neurolinguistic dimension of this link has not been explored so far. As shown in [62], nouns for living things correlate with greater responses in the lateral mid-fusiform gyrus, and nouns for non-living things in the medial mid-fusiform gyrus; interestingly, these correlations seem to be 'innate', i.e., independent of visual experiences. While the two neuronal responses are assumed to be exclusive, in our study 10 out of the 400 sets of 3 words used a noun for a non-living thing (e.g., doll, robot) and a noun for a living entity (e.g., child, infant, living thing) or adjectives of a living entity. The 3-word method in this way could be used to strengthen, on neuroscientific grounds, the suggestion that social robots indeed present a new conceptual category [17].

Limitations
Several limitations of the present study should be delineated. Firstly, we did not assess the immediate impression of the robot in the interaction study (i.e. have the participants rate their impression based on a still image of the robot before being physically present with the robot). This decision was born out of a methodological judgement not to prime or prepare participants for the robot they were about to engage in conversation with, as previous studies outline the effects thereof [71,72]. Future studies could assess whether the three-word impression would differ over time or remain stable after being physically present with the robot. Secondly, this study was limited to the use of the Telenoid, which may significantly impact the impressions gauged. For instance, it is possible that a similar study with a robot whose appearance is more in line with the "Walt Disney" caricature of what a robot looks like would lead to more homogeneous impressions. Thirdly, exposure time to the stimulus picture was not controlled in the survey portion of the study. However, a study by Willis and Todorov showed that impressions of personality traits deduced from still images were formed as quickly as after 100 ms of exposure time and were not more correct if unlimited time was given [73]. Fourthly, the interrater reliability of the two coders in the sentiment analysis was poor. We took the conservative approach of either excluding or averaging their scores. Fifthly, the content of the conversation in the online condition has emotional undertones which may have influenced the impressions formed by the participants. Finally, the online condition offered a third-person perspective on the interaction seen in the video, whilst the interaction condition, by definition, solely entailed being engaged in the conversation from the second-person point of view. The effect of this on the quality of the data is difficult to determine and would require future studies.

Future directions, practical and theoretical implications
The present study has several implications. On a practical level the study can inform us on the need for impression management for robots. The overlap in types and sentiment of words between the online and interaction study indicates that similar impressions are formed, in essence regardless of whether a person has interacted with a robot or not. Here it would be important for robotic engineers to gauge whether these impressions were the ones intended, but also to pinpoint what causes that exact impression to arise. This in turn points to the need to retain mixed methods as the backbone of HRI research, as identifying the underlying processes involved calls on several disciplines and their methodologies. On the other hand, the results of the present study also point to significant differences, inasmuch as participants were more likely to use negative words and engage in normative evaluations in the online condition. Regardless of whether this finding reflects or is magnified by a disinhibition effect or an effect of being engaged in the third-person perspective, it has practical implications, as many people will likely meet robots online before they encounter them in real life. Future studies could further explore this three-word approach, for instance utilizing a classical experimental set-up directly comparing an online condition to an interaction condition where, for instance, exposure time, proximity to the robot, and form of contact (direct interaction, purely observational and online interaction) could be controlled. In such a design the effect of being an interlocutor or a passive observer of a dialogue could also be assessed.

Conclusions
The present study adds to the existing methodologies in HRI research by exploring a "middle ground" between quantitative and qualitative methods for assessing the impression of robots. The method allowed us to explore people's conceptualisation of robots from different angles and offered new insights into the ambiguous experience reported, which in turn can fuel further theorisation. The 3-word method and qualitative interviews productively supplemented each other: while the 3-word method could be used as a heuristic to identify certain interview sets, e.g., those that may stand out from more common reactions, the qualitative results provided important information on how to disambiguate entries from the 3-word data set. Utilizing the same methodology in different research settings, with other robots and samples, could contribute to an "opinion or sentiment lexicon" with impressions of various robots (akin to the abotdatabase). Notwithstanding the underlined benefits, the present study should be construed as a proof-of-concept study and the method should be repeated and refined in future studies.