Use and usability of software verification methods to detect behaviour interference when teaching an assistive home companion robot: A proof-of-concept study



Introduction
A long-term goal of robotics research is the use of assistive robots in the home. Such robots have started to appear in various guises ranging from stationary helpers [1] to cleaning robots [2] to robotic companions [3,4]. In previous research, companion robots have been designed to serve useful functions for their users, while carrying out those tasks in a socially acceptable manner [5]. The combination of an autonomous mobile robot and a "smart-home" environment, where the robot is able to extend its capabilities via access to the home sensor network, has been investigated in a number of large-scale projects, e.g. refs. [6,7], motivated by the use of robotic solutions to address the concerns of cost and care issues resulting from an ageing population [8,9].
The use of home assistance robots to help older adults stay independent in their homes faces many challenges. One of these challenges is to allow the robot to be personalised, e.g. that the robot can be taught to change its functional behaviours in response to the changing needs of the older adult. Our previous research [7] investigated these issues, proposing and evaluating a teaching system in a human-robot interaction (HRI) experiment. Results were encouraging, showing the potential of the system to be used easily by a number of stakeholders, including health professionals, formal and informal carers, relatives, friends and the older persons themselves to adapt the robot to meet changing needs.
End-user personalisation is an active area of research since it has been recognised that robots to be used "in the field" need to allow users to adapt, modify and teach a robot. Despite the best intentions to provide user-friendly interfaces, often only experienced programmers can achieve complex behaviour prototyping [10], but research has demonstrated the feasibility of creating systems that support end-user robot programming and personalisation. To give a few examples, Lourens and Barakova [11] suggested a user-friendly framework to allow users with minimal programming experience to construct robot behaviours. Building on this work, pilot tests demonstrated how such an approach could be used, e.g. in the area of robot-assisted therapy for children with autism [12]. Furthermore, trigger-action rules have been suggested and tested with end-user developers to personalise home automation environments [13] and achieve robot behaviour personalisation [14]. Recent trends of End User Development and associated challenges have been identified [15], with a focus on approaches specific to Internet of Things and rule-based systems, e.g. the trigger-action paradigm. For a recent systematic review on end-user development of intelligent and social robots with a focus on visual programming environments, see ref. [16].
A key issue that arises from end-user personalisation of robot behaviours, and which is the focus of this article, is that of behaviour verification and behaviour interference. Behaviour verification is concerned with the effect of adding behaviours via the teaching system and checking whether the new behaviours violate operational goals. Our previous research on methods of behaviour verification is described in refs. [17][18][19][20]. Behaviour interference for a home-assistive robot is a part of a verification approach but deals with the consequences of teaching new behaviours which may inadvertently affect the execution of existing behaviours or not be executed themselves due to already existing behaviours.
This article describes a behaviour interference detection mechanism embedded into a teaching system for a home companion robot, designed to be used by carers, relatives and older adults themselves (rather than robotics experts or programmers). We conduct an evaluation of the system with 20 participants (age range 26-69) in order to gain insights into their views on the functionality of the system and, importantly, to investigate the usability of such a system, which would be a crucial factor in any future deployment of such systems in people's own homes or care homes. Few studies have investigated an actual deployment of companion robots in real-world settings. Examples include studies focusing on therapeutic and educational outcomes for children with autism, e.g. a 1-month study involving daily HRIs in children's homes [21], or the year-long deployment of a therapeutic robot used by staff in a special needs nursery school [22]. Results from such "field studies" highlight the importance of usability, and a number of challenges have been identified that will influence whether or not people are willing to keep using such a system. In a pioneering study involving home companion robots, De Graaf et al. described a 6-month, long-term study which placed 70 autonomous robots in people's homes [23]. Investigating cases of non-use, i.e. refusal or abandonment of the robot, the authors conclude "the challenge for robot designers is to create robots that are enjoyable and easy to use or (socially) predictable to capture users in the short-term, and functionally-relevant and possess enhanced social behaviours to keep those users in the longer-term" [24] (p. 229).
The contributions of the article are to identify and classify when newly added behaviours might affect (or be affected by) the execution of existing behaviours and to report on an HRI study in relation to this. The HRI study evaluates the usability of the interference detection system and provides an opportunity to assess participant actions and reactions towards the system.
The remainder of this article is organised as follows. Section 2 describes the overall setting of this study and relevant background, including descriptions of the HRI scenario, the robot behaviours and the previously developed behaviour teaching system (TEACHME). Section 3 discusses our approach to formal verification and behaviour interference checking, analysing and categorising behaviour interactions. Section 4 outlines the TEACHME system's novel enhancement that allows users to add new behaviours to the robot and be notified of possible behaviour interference. Section 5 describes the user evaluation carried out, with the results being reported and discussed in Section 6. Concluding remarks and future work are presented in Section 7.

Setting and background
The research is being conducted in a typical suburban British 3-bedroom house in a residential area off campus but near the University of Hertfordshire. It has normal house furnishings but has been upgraded to a smart home. It is equipped with sensors and cameras which provide information on the state of the house and its occupants. Over 60 sensors report on electrical activity, water flow, doors and cupboard opening/closing etc. User locations are obtained via ceiling mounted cameras [25], and robot locations via ROS navigation [26]. Since this location has been used for many different HRI studies, there are no permanent residents occupying the house, but its ecological validity is far greater than laboratory experiments performed on campus. However, in order to allow for researcher-led controlled experiments, the setting is more constrained than the deployment of robots in real homes occupied by their owners/tenants. We call this location, which bridges the gap between a "real" and entirely "simulated" home environment (laboratory), the UH Robot House, which is available to University of Hertfordshire researchers but also other researchers as an environment to test and evaluate smart home and robotics technology [27]. The Robot House, as a natural and realistic setting of a home environment, has been used in many HRI studies, e.g. refs. [6,28,29,30,31].
The physical sensors relevant to the present study range from sensors monitoring activity of electrical devices in the house (e.g. "fridge door is open," "microwave is on," "TV is on" etc.), to sensors attached to furniture (e.g. detecting operation of cupboard door, drawers etc.), to sensors monitoring water flow and temperature (able to detect e.g. "toilet is flushing," "taps are running" etc.) and, finally, pressure sensors (e.g. located on sofas, beds etc. to indicate occupation).
The study reported here used a commercially available robot, the Care-O-bot3® robot manufactured by Fraunhofer IPA [32]. It is a multi-purpose, mobile manipulator that has been specifically developed as a mobile robotic assistant and companion to support people in domestic environments and is based on the concept of a robot butler (Figure 1) [33].
The robot's high-level decision-making uses a production rule approach where each behaviour comprises sets of rules (preconditions or guards) which, if satisfied, execute actions. The rules can check the house and robot sensor values both instantaneously and within a temporal horizon (e.g. "has the doorbell rung in the last 10 seconds?"). Actions are generally related to the robot but can also set other values which can be subsequently checked by the rules. Thus, actions are either robotic (e.g. "move to location X, raise tray"), or sensory/memory based (e.g. "User has been informed to take her medicine"). A more detailed description of the ontology of the house and of the robot control system is given in the studies by Saunders et al. [34,35].
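The two kinds of precondition check mentioned above (instantaneous and within a temporal horizon) can be illustrated with a minimal Python sketch. The `SensorLog` class, its method names and the `"doorbell"` label are hypothetical and chosen only for illustration; they are not taken from the Robot House control system.

```python
from dataclasses import dataclass, field


@dataclass
class SensorLog:
    """Hypothetical in-memory log of (timestamp, value) readings per sensor.

    Readings are assumed to be recorded in chronological order.
    """
    readings: dict = field(default_factory=dict)

    def record(self, sensor: str, value: bool, at: float) -> None:
        self.readings.setdefault(sensor, []).append((at, value))

    def is_true_now(self, sensor: str) -> bool:
        """Instantaneous check: the most recent reading is true."""
        history = self.readings.get(sensor, [])
        return bool(history) and history[-1][1]

    def was_true_within(self, sensor: str, seconds: float, now: float) -> bool:
        """Temporal check: some reading was true in the last `seconds` seconds."""
        return any(value and now - seconds <= at <= now
                   for at, value in self.readings.get(sensor, []))


log = SensorLog()
log.record("doorbell", True, at=100.0)   # doorbell pressed at t = 100 s
log.record("doorbell", False, at=101.0)  # released one second later

# "Has the doorbell rung in the last 10 seconds?" asked at t = 105 s.
rang_recently = log.was_true_within("doorbell", seconds=10, now=105.0)
ringing_now = log.is_true_now("doorbell")
```

In this sketch the temporal query still succeeds at t = 105 s although the instantaneous value is already false, which is the distinction the rules rely on.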
Robot behaviours defined as production rules are held as tables in a MySQL database. The rules themselves are encoded as SQL statements and are generated by the TEACHME teaching system, described in more detail in Section 4.
Memory-based values are also held as "sensors" and are used to define or infer knowledge about the house or activities within the house at a higher semantic level. For example, it may be inferred from electrical sensory activity in the kitchen that the "sensor" called "Preparing meal" is currently true. Other memory sensors are used to cope with on-going events in the house which are not reflected by the physical environmental sensors (similar to Henderson and Shilcrat [36]). For example, a sensor with the label "User has been reminded to take their medicine" might be set if the robot has given such a reminder to the user and would typically be used to ensure that the reminder was not repeated. Temporal additions to the rules allow the system to formulate rules such as "has the user been reminded to take their medicine in the last 4 hours?"
Behavioural selection is via priority. Thus, where more than one behaviour has all of its preconditions evaluate to "true," the behaviour with the highest priority will execute first. If all the priorities are equal and all the preconditions to the rules are true, then a non-deterministic choice of the behaviours will be made (in practice, the first rule from the rule set is chosen; however, as the rule set is returned by an SQL query, the order of results is not guaranteed, which makes the choice non-deterministic). The use of priorities provides a mechanism for resolving conflicts between actions when the conditions for more than one rule hold in a given instant (we call this behaviour interference).
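The priority-based selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the robot's scheduler: the `Behaviour` record, the behaviour names and the sensor labels are hypothetical, and a random choice stands in for the unordered SQL result set.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Behaviour:
    name: str
    preconditions: frozenset  # labels of sensors that must all be true
    priority: int


def select_behaviour(behaviours, true_sensors, rng=random):
    """Return one enabled behaviour of highest priority, or None.

    A behaviour is enabled when all of its preconditions are currently true.
    Ties are broken by a random choice to mimic the unordered query result.
    """
    enabled = [b for b in behaviours if b.preconditions <= true_sensors]
    if not enabled:
        return None
    top = max(b.priority for b in enabled)
    return rng.choice([b for b in enabled if b.priority == top])


# Hypothetical behaviour set with distinct priorities.
behaviours = [
    Behaviour("go_to_kitchen", frozenset({"fridge_open"}), 30),
    Behaviour("announce_at_sofa", frozenset({"fridge_open", "sofa_occupied"}), 10),
]
chosen = select_behaviour(behaviours, {"fridge_open", "sofa_occupied"})
```

With distinct priorities the outcome is deterministic (`go_to_kitchen` here); if both priorities were equal, either behaviour could be returned.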
The Care-O-bot3® robot [32] is equipped with facilities for manipulating the arm, torso, "eyes," robot LEDs and tray, and has a voice synthesiser to express given text. Typical robot actions would be, for example, "raise tray," "nod," "look forward," "move to location x," "grab object on tray," "put object x at location y," "say hello" etc.
Production rules can be created in three ways. First, by low-level coding (e.g. using C++ or Python). Second, by a medium-level teaching mechanism which allows easy creation of behaviours and setting of behavioural priorities, but relies on the user to cope with higher-level memory-based issues. We envisage that the second facility would be used by technical "experts" generating sets of behaviours for the first time. However, creating behaviours in this way is very similar to low-level programming in that a very logical and structured approach to behaviour creation is necessary. Third, a high-level teaching facility is provided which allows the user to easily create behaviours. This generates appropriate additional scaffolding code but does not allow priority setting; all newly added behaviours are automatically generated with equal priority. The cost of this simplification is a loss of generality; however, it is compensated for by ease of use.
As we concentrate on providing a mechanism for (non-technical) end users to create behaviours, behavioural interference resolution is a challenge. Other approaches to this include providing preferences, e.g. in Soar [37], planning, e.g. Hierarchical Task Networks [38], or learning rule utilities, e.g. in ACT-R [39]. However, all of these approaches require detailed knowledge of the underlying system, as well as an understanding of the concept of interference and how to use such systems to resolve it. These requirements make them unsuitable for end users such as older adults or their carers who want to use a home companion robot. One of the key aims of this study was to allow non-expert users without detailed technical knowledge to recognise behavioural interference and reflect on how they might approach resolving it.
Next, we explain formal verification and how it is being used in our approach for detecting possible behaviour interference as a consequence of users teaching the robot new behaviours.

Formal verification and behaviour interference checking
Formal methods are a family of mathematical approaches which allow for the specification, design and analysis of computer systems [40]. Formal methods can also be used to verify software and hardware systems, in a process known as formal verification. There are a wide variety of software tools available for formal verification, including model checkers and automated theorem provers. The aim of formal verification is to show the correctness (or incorrectness) of algorithms and protocols using mathematical analysis or formal proof. Formal verification has been used extensively in the design and development of safety- and mission-critical computer systems [41,42]. Formal verification is often used in the design and implementation stages of software development, prior to deployment. After this point, if the software is to be modified then the formal verification process must be repeated. While many formal verification tools are automatic in their operation (e.g. model checkers like SPIN [43], NuSMV [44] or PRISM [45]), the process of creating and validating models is often not automatic, and must be done "by hand." Formal verification has been applied to autonomous robots in various settings [46] including HRI [47,48], home service robots [49] and collaborative robot applications [50].
In our previous work, we explored the use of model checkers for the formal verification of the robot behaviours within the Robot House [17][18][19][20]. In the study by Webster et al., our approach was based on a model of the activity of a typical person within a house [17]. In the study by Dixon et al. [18] an input model for the model checker NuSMV [44] was constructed by hand, and later this process was automated by a tool that can directly read in sets of behaviours and automatically generate an input model [20]. This could potentially be used to re-generate models of robot behaviours and be used to formally verify properties of the system with newly added behaviours. However, due to the complexity of explaining counter models to users we chose to take a different approach.
In the present study, we use static checking of behaviours to identify potential interactions between them. This could be used by a technical expert setting up the behaviours initially for the robot or by a user after deployment to personalise the behaviours. In particular, here we consider the case where the user adds new behaviours for the robot to follow. When a new behaviour is created by the user it is possible for the new behaviour to interact with other existing behaviours in what we term "behaviour interference."
Example 1. The system may contain a behaviour (B24) that says "If the fridge door is open go to the kitchen" (with priority 30). A newly added behaviour (B37) might be "If the fridge door is open and someone is sitting on the sofa go to the sofa and say 'the fridge door is open'" (with priority 10). As the priority of the second behaviour (B37) is less than that of the first (B24), and whenever the preconditions of the second behaviour are satisfied the preconditions of the first behaviour are also satisfied, the newly added behaviour will never run.
Checking for and reporting such conditions allow the users to identify when new or existing behaviours will not or might not be executed even when their preconditions are satisfied.

Behaviour interference
Note that in the scenario we discuss here, with users adding behaviours to the behaviour repertoire of a robot, the new behaviours are given the same priority. However, when defining behaviour conflicts we consider a more general case where newly added behaviours could have any priority, so the same system could also be used by technically competent users who would also be permitted to input behaviour priorities.
Analysis of the robot behaviours revealed that some potential problems with a new behaviour can be quickly identified without the use of a model checker. A behaviour $b$ is defined as follows:
$$b: \text{IF } p_1 \wedge p_2 \wedge \dots \wedge p_n \text{ THEN } A$$
where $p_i$ are preconditions that must evaluate to true in order for some sequence of actions $A$ to take place. The use of the logical and connective "$\wedge$" specifies that all $n$ preconditions must be true. The set of preconditions for a behaviour $b$ is denoted $P(b)$. The behaviour has a priority $\pi(b) \in \mathbb{N}$ which determines its relative priority. Recall, if there are a number of behaviours whose preconditions are all true, the robot scheduler will execute the behaviour with the highest priority. If all the priorities are equal, then the scheduler will choose non-deterministically one of the behaviours for execution (in practice, the first rule from the rule set is chosen; however, as the rule set is returned by an SQL query, the order of results is not guaranteed, which makes the choice non-deterministic).
Given the behavioural selection algorithm of the robot scheduler, it is possible for a behaviour to always be chosen over another. For example, if the preconditions of behaviour $b_1$ are the same as the preconditions of behaviour $b_2$, and $b_1$ has a higher priority than $b_2$, then $b_1$ will always execute instead of $b_2$. In fact, this is also true if $b_1$'s preconditions are a subset of $b_2$'s preconditions, as whenever $b_2$'s preconditions are true, $b_1$'s preconditions must also be true. In this case, we say that $b_1$ overrides $b_2$, or conversely, that $b_2$ is overridden by $b_1$:
Definition 1. A behaviour $b_1$ overrides a behaviour $b_2$ if $P(b_1) \subseteq P(b_2)$ and $\pi(b_1) > \pi(b_2)$.
In Example 1, behaviour B37 (the newly added behaviour) is overridden by behaviour B24, so behaviour B37 will never be executed.
It is also possible for a behaviour $b_1$ to be scheduled instead of $b_2$ in some circumstances, but not others. For example, if the two behaviours $b_1$ and $b_2$ from the previous definition have equal priorities, then either behaviour may be chosen to execute. This is called interference:
Definition 2. A behaviour $b_1$ interferes with a behaviour $b_2$ if $P(b_1) \subseteq P(b_2)$ and $\pi(b_1) = \pi(b_2)$.
Example 2. Assume now that we have behaviours as described in Example 1 where both behaviours have priority 10. We will refer to these as B24a and B37a. Now behaviour B24a interferes with behaviour B37a. In situations where the fridge door is open and it is not the case that someone is sitting on the sofa, B24a will be executed. However, when both the fridge door is open and someone is sitting on the sofa, either behaviour might be executed. In Example 1, behaviour B37 will never be executed. Here there are situations where behaviour B37a might not be executed.
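Examples 1 and 2 can be checked mechanically. The following minimal Python sketch reads overriding and interference as a subset test on preconditions combined with a priority comparison, as described above; the dictionary layout and the sensor labels `fridge_open` and `sofa_occupied` are hypothetical names introduced only for this illustration.

```python
def overrides(b1, b2):
    """b1 overrides b2: P(b1) is a subset of P(b2) and b1 has higher priority."""
    return b1["pre"] <= b2["pre"] and b1["priority"] > b2["priority"]


def interferes(b1, b2):
    """b1 interferes with b2: P(b1) is a subset of P(b2) and priorities are equal."""
    return b1["pre"] <= b2["pre"] and b1["priority"] == b2["priority"]


# Example 1: existing behaviour B24 and newly added behaviour B37.
B24 = {"pre": {"fridge_open"}, "priority": 30}
B37 = {"pre": {"fridge_open", "sofa_occupied"}, "priority": 10}

# Example 2: the same behaviours, both with priority 10.
B24a = {**B24, "priority": 10}
B37a = {**B37, "priority": 10}

example_1 = overrides(B24, B37)     # B37 is overridden by B24
example_2 = interferes(B24a, B37a)  # B24a interferes with B37a
```

Both checks evaluate to true, while the reverse directions do not, since the preconditions of B37 are not a subset of those of B24.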
Overriding and interference demonstrate two ways in which behaviours can prevent the execution of other behaviours. It is also possible to identify the potential for overriding and interference in a wider range of cases. For example, if a behaviour $b_1$'s preconditions are a superset of the preconditions of $b_2$, i.e. $P(b_2) \subseteq P(b_1)$, and the extra preconditions, $P(b_1) \setminus P(b_2)$, may also be true at some point during the execution of the robot, then we can say that behaviour $b_1$ potentially overrides $b_2$:
Definition 3. A behaviour $b_1$ potentially overrides a behaviour $b_2$ if $P(b_2) \subseteq P(b_1)$ and $\pi(b_1) > \pi(b_2)$.
Furthermore, we can extend this idea to interference, allowing for a behaviour to potentially interfere with another behaviour of the same priority:
Definition 4. A behaviour $b_1$ potentially interferes with a behaviour $b_2$ if $P(b_2) \subseteq P(b_1)$ and $\pi(b_1) = \pi(b_2)$.
Definitions 1-4 provide a set of guidelines for identifying conflicts and potential conflicts between the robot's behaviours. Additionally, they can be used to identify when the robot's behaviour set contains behaviours that are likely to overlap and result in unpredictable or undesired activity. The guidelines are summarised in Table 1.
Example 3. Let us assume that a behaviour set consists of two behaviours:
These guidelines can be computed for the robot's behaviour database in less than a second, meaning that they can be used by the robot's TEACHME system to quickly determine in real-time whether a new behaviour suggested by the user is likely to conflict with existing behaviours. While useful and efficient, these guidelines do not provide the same level of verification made possible by exhaustive state space analysis using model checkers. However, they do allow a partial analysis of the robot's software that can be used to give timely and meaningful feedback to the robot's users.
The guidelines above are implemented in a software tool called the Behaviour Checker (BC). The BC works by parsing two databases, one containing the existing behaviours used by the robot, and the other containing the new behaviours which have been defined by the user. After parsing, the new behaviours are compared to the already existing behaviours, and behaviour conflicts are identified. Table 1 shows the different types of feedback generated by the BC. The feedback given to the user, for a new behaviour $b_n$ and an existing behaviour $b_e$, can be seen in Figure 6. Note that this is simplified for the user in two ways. First, in cases where more than one definition applies, the BC will output only the most severe conflict, with overriding being the most severe, followed by interference, potential overriding and potential interference. For instance, when $P(b_e) = P(b_n)$ and $\pi(b_e) < \pi(b_n)$, following Definitions 1 and 3, $b_n$ both overrides and potentially overrides $b_e$, and the BC will output only the former. Second, in the case where a definition is satisfied by more than one existing behaviour, only one of the existing behaviours will be shown to the user at a time to avoid overloading the user with an extensive list of behaviours.
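The reporting policy just described (most severe conflict only, one existing behaviour at a time) might be sketched as follows. This is an illustrative reading of the guidelines and not the BC's actual code: the `Behaviour` record and function names are hypothetical, conflicts are checked in both directions between the new and existing behaviours, and the "potential" checks simply assume that the extra preconditions can become true at some point.

```python
from typing import NamedTuple, Optional


class Behaviour(NamedTuple):
    name: str
    pre: frozenset   # set of precondition labels
    priority: int


# Most severe first, following the ordering described in the text.
SEVERITY = ("overrides", "interferes",
            "potentially overrides", "potentially interferes")


def relations(b1: Behaviour, b2: Behaviour) -> set:
    """All relations that b1 has towards b2 under the four guidelines."""
    found = set()
    higher = b1.priority > b2.priority
    equal = b1.priority == b2.priority
    if b1.pre <= b2.pre:      # P(b1) is a subset of P(b2)
        if higher:
            found.add("overrides")
        if equal:
            found.add("interferes")
    if b2.pre <= b1.pre:      # P(b1) is a superset of P(b2)
        if higher:
            found.add("potentially overrides")
        if equal:
            found.add("potentially interferes")
    return found


def most_severe(b1: Behaviour, b2: Behaviour) -> Optional[str]:
    """The single most severe relation of b1 towards b2, or None."""
    found = relations(b1, b2)
    return next((r for r in SEVERITY if r in found), None)


def check_new_behaviour(new: Behaviour, existing: list):
    """Return one (subject, relation, object) report, or None if no conflict."""
    reports = []
    for old in existing:
        for a, b in ((new, old), (old, new)):
            relation = most_severe(a, b)
            if relation:
                reports.append((SEVERITY.index(relation), a.name, relation, b.name))
    if not reports:
        return None
    _, subject, relation, obj = min(reports)  # lowest index = most severe
    return subject, relation, obj


B24 = Behaviour("B24", frozenset({"fridge_open"}), 30)
B37 = Behaviour("B37", frozenset({"fridge_open", "sofa_occupied"}), 10)
report = check_new_behaviour(B37, [B24])
```

For the behaviours of Example 1 the sketch reports that B24 overrides B37. When a new behaviour has the same preconditions as an existing one but a higher priority, both "overrides" and "potentially overrides" hold and only the former is reported.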

The TEACHME system and behaviour interference notification
In this section, we describe the TEACHME system that allows the users, carers or relatives to input new behaviours into the robot. Full details of this part of our system, as well as evaluations of its usability and usefulness in a user study, can be found in ref. [7]. Note, a key difference of TEACHME, compared to other popular approaches on human-robot teaching that extensively use machine learning approaches (see overview in ref. [51]), is the explicit representation of robot behaviours as rules implemented as tables in a database. A similar rule-based approach is taken by Porfirio et al. [47]; however, our approach supports utilisation of external sensors and behaviours that involve continuous time (e.g. remind the user to take medicine every 4 h). This allows us to conduct experiments that are situated within a realistic Robot House smart home environment ontology which includes knowledge about sensors, locations, objects, people, the robot and (robot) behaviours. The motivation of this approach was to provide a format that can easily be read, understood and manipulated by users who are not programmers, in order to facilitate easy and intuitive personalisation of the robot's behaviour by end-users.
We also explain in this section how any problems detected between two behaviours are presented to the user.

Teaching system - TEACHME
In order to create behaviours the user must specify what needs to happen (the actions of the robot) and when those actions should take place. An example of the user teaching interface (i.e. GUI) is shown in Figures 3-5 and displays the actions a non-technical user would use to create a simple behaviour to remind the user that the kettle is on.
The steps consist of "what" the robot should do followed by "when" the robot should do it. Steps are as follows: the user chooses to send the robot to the kitchen entrance and then presses a "learn it" button. This puts the command into the robot memory (top of Figure 3). Then the user makes the robot say, "The kettle is on, are you making tea?" This is not in the robot's current set of skills and so is entered as text input by the user (bottom of Figure 3). This is followed by a press of the "learn it" button. Now the two actions are in the robot's memory and the user can define when these actions take place.
The user is offered a number of choices based on events in the house (such as user and robot location, the settings of sensors showing the television is on, the fridge door is open, etc.) and a diary function (reminders to carry out a task, e.g. to take medicine or phone a friend at a particular day and time) shown on the top of Figure 4. The user chooses a house event occurring in the kitchen (bottom of Figure 4). Again this is followed by pressing the "learn it" button. Having completed both "what" and "when" phases, the user is shown the complete behaviour for review and can modify it if necessary (Figure 5). Once satisfied, the user presses a "make me do this from now on" button and the complete behaviour becomes part of the robot's behaviour database.

Behaviour interference detection and reporting
The behaviour interference function was embedded within the TEACHME system. This is called when the completed behaviour is signalled by the user to be ready for scheduling on the robot. A challenge for this type of notification is to make it both understandable to a naïve user and to provide mechanisms for rectifying possible interference issues.
The screen that appears when interference is detected is shown in Figure 6. It informs the user that a problem has been found in two behaviours: an existing behaviour and the behaviour the user just created. It continues to list the "when" factors that are causing the interference, which are effectively the preconditions or sets of preconditions that are equivalent between the behaviours. It also offers some choices to help the user ignore or rectify the interference.
In order to evaluate our system we conducted a proof-of-concept user study in the UH Robot House.

User evaluation
The aim of this study was to investigate the usability of the system and if there was any difference in understanding the concept of behaviour interference between already technically trained users (e.g. participants with a computer science background) and those without any systems/programming or robotics background. For future applications of assistive home companion robots, it is essential to know whether users of any systems designed

Research questions
Our research questions were as follows:
1. Is the robot teaching system still considered usable by users when it includes automatic behaviour interference checking?
2. Do users find the mechanism for informing them about behaviour interference effective in helping them understand the nature of behaviour interference?
3. Does the participant's background or gender have any effect on detecting and solving behaviour interference issues?
Note, regarding the third research question we expected that participants' familiarity with robots might have an impact on the results. Regarding gender we did not have any particular hypothesis, so this was an exploratory question.

Participants
Twenty participants were recruited, 11 female and 9 male, who took part individually in the experiment. The participants had a mean age of 48.15 and a median age of 47.
The youngest participant in the sample was 26 and the oldest participant was 69 with an interquartile age range of 35-61. The participants were either postgraduate students at the University of Hertfordshire (typically with a computer science background) or people who had previously expressed a wish to interact with robots in the Robot House. The latter group, which included some University staff, had minimal technical experience, although some had taken part in robotics experiments in the past. Of the 20 participants, 17 had interacted with robots in general before (although not necessarily this robot, and none had used the TEACHME system previously) and eight had experience in programming robots. Figure 7 shows the distribution of the participants based on their prior robot programming experience, with their age and gender information.

Proof-of-concept experiment: methodology
The study was conducted in the UH Robot House with the Care-O-bot3® robot, see Section 2 for detailed descriptions. On arriving at the Robot House, each participant was invited to review and sign the appropriate ethics forms and complete a short demographics questionnaire on gender, age and technical background.
Figure 5: Screenshots of the TEACHME teaching interface. The user reviews the complete taught behaviour.
For the actual HRI scenario presented to participants, we used narrative framing, allowing participants to feel part of a consistent "story." This technique has been used successfully in HRI, cf. refs. [52][53][54][55]. It has also been used in long-term HRI studies on home companion robots where, inspired by ref. [56], it facilitated prototyping episodic interactions in which narrative is used to frame each individual interaction [57], or to provide an overall narrative arc encompassing multiple interactions and scenarios [28]. In the present study, we used narrative framing extensively, including multiple props, personas and allocated roles for participants.

Scenario: new technician
The participants were asked to imagine that they had just been accepted for a job with a fictitious company called "Acme Care Robots" (ACR) as a robot technician. It was explained that ACR builds robots to assist older adults with the aim of helping them stay in their own home (rather than being moved to a care home), and that it was their job to create behaviours on the robot. They were told that this was their first day of training and following training they would receive their first assignment.
In order to reinforce the illusion of technician training, we used props: all persons involved in the experiment were given white laboratory coats to wear. This included the participant, the experimenter (who also acted as the "trainer") and a third person required to be present by the University for safety purposes (Figure 8).
Training commenced with the experimenter introducing the TEACHME system and explaining in general terms how it worked. The participants were then invited to use the system to create the behaviours shown in Figure 2. After each behaviour was taught the participant was invited to test the behaviour, e.g. after teaching the robot to say "the doorbell is ringing" when the doorbell rings, the participant or experimenter would ring the physical doorbell in the Robot House and check that the resulting robot actions were correct.
Figure 6: Screenshots of the TEACHME teaching interface. The system detects a possible interference between two behaviours and asks the user to take action to resolve it.
Three individual behaviours were taught in the training phase. These were chosen to be relatively simple but also to exercise the speech, movements and diary functions of the robot.
Having completed training the participant was given their first assignment. This involved setting up behaviours for a fictitious older lady, a "persona" called "Ethel" who supposedly lived in the Robot House. Details of the assignment sheet given to the participant are shown in Figures 9 and 10.
The choice of a naturalistic setting for the study (the Robot House), the narrative framing approach (as discussed above), the introduction of a user persona, and the use of props were meant to enhance the believability, plausibility and ecological validity of the scenarios as well as enhance users' engagement and immersion. The aim was to encourage participants to immerse themselves in the role of a robot training technician. Props have been a common tool in human-computer interaction research and development for decades, e.g. ref. [58]. With regard to the development of scenarios for home companion robots, narrative framing of HRI scenarios has been used successfully, e.g. refs. [28,57]. The use of personas was pioneered by Alan Cooper in human-computer interaction. According to Cooper, "Personas are not real people, but they represent them throughout the design process. They are hypothetical archetypes of actual users." [59]; see also studies on user and robot personas in HRI, e.g. refs. [60,61].

Interfering behaviours
The behaviours shown in the assignment sheets consisted of four tasks. The first three we call the "A" section, and the fourth task the "B" section.
The first task of the "A" section was designed to make the robot attend the kitchen with its tray raised whenever activity was detected. Thus, if "Ethel" were carrying out some kitchen activity the robot would be present. Activity was inferred from the kettle being on, the fridge door being open, or the microwave being on. This can be formalised using the definitions from Section 3.1 as three behaviours with the precondition sets $\{k\}$, $\{m\}$ and $\{f\}$, respectively, each with the action sequence $a_1; a_2$, where $k$ means that the kettle is on, $m$ means that the microwave is on and $f$ means that the fridge door is open. The action $a_1$ means that the robot moves to the kitchen entrance, and $a_2$ means that the robot raises its tray.
The second task was designed so that the robot would proceed to the sofa if Ethel was sitting there: a behaviour with the precondition set $\{s\}$ and the action $a_3$, where $s$ means that Ethel is sitting on the sofa and $a_3$ means that the robot moves to the sofa. It was implied that Ethel could place something on the robot's tray while in the kitchen, which the robot would then bring to her once she sat on the sofa.
The third task contained an interference issue. It required that if the kettle was on the robot should inform Ethel that she might be making a cup of tea, and it should proceed to the sofa: a behaviour with the precondition set $\{k\}$ and the action sequence $a_4; a_3$, where $a_4$ means that the robot says, "Are you making tea?" As the preconditions of [...]
Similarly, the fourth task (the single task in the "B" section) contained an interference issue. This task re[...]
Figure 9: Background information given to the participant after the training phase has completed. This is then followed by the actual assignment shown in Figure 10.

Order of interference detection
The behaviour interferences described above were presented to users during the proof-of-concept experiment. Two versions of the TEACHME system were created: one with behaviour interference checking, and one without in order to evaluate how participants responded to these two versions.
To rule out familiarity effects (where all participants experienced the checking procedure in the same order) the two versions of the software were pseudo-randomised between participants. Note, in both conditions, i.e. with and without behaviour checking, after using the interface to teach the robot a new behaviour, participants would test the behaviour they created on the physical robot multiple times. In the checker "off" condition, participants would be puzzled when the robot did not carry out the desired task. Participants could then go back to the interface and try to resolve the problem. In the checker "on" condition, they would be alerted to why this interference has happened and could subsequently attempt to resolve the problem.
The 20 participants were randomly allocated into two groups of 10 persons (10 in group X and 10 in group Y). Before the "A" section, the participants in group X had checking turned on; the Y group had checking turned off. Once the "A" section was complete, the X group would have checking turned off and the Y group would have checking turned on. This meant that, for example, one participant might receive an interference warning after the "A" section issue but not after the "B" section issue, while another participant might receive an interference warning after the "B" section issue but not after the "A" section issue. Figure 11 shows participants' distribution across the two experimental conditions based on robot programming experience. Note that the distribution of participants with robot programming experience was not equal, i.e. three participants in group X and five participants in group Y.
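The counterbalanced allocation described above can be illustrated with a minimal sketch. It assumes a simple shuffle-and-split procedure; the function name `allocate_counterbalanced`, the participant identifiers and the seed are hypothetical and are not taken from the study.

```python
import random


def allocate_counterbalanced(participant_ids, seed=None):
    """Randomly split participants into two equal groups, X and Y.

    Group X has behaviour checking (BC) in section A and no checking (NBC)
    in section B; group Y receives the two conditions in the reverse order.
    """
    ids = list(participant_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of participants is required")
    rng = random.Random(seed)  # a seed is used only to make the sketch repeatable
    rng.shuffle(ids)
    half = len(ids) // 2
    schedule = {}
    for pid in ids[:half]:
        schedule[pid] = {"group": "X", "A": "BC", "B": "NBC"}
    for pid in ids[half:]:
        schedule[pid] = {"group": "Y", "A": "NBC", "B": "BC"}
    return schedule


# Hypothetical participant numbers 1..20.
schedule = allocate_counterbalanced(range(1, 21), seed=0)
group_x = sorted(pid for pid, row in schedule.items() if row["group"] == "X")
print(len(group_x), "participants in group X")
```

A plain shuffle like this does not balance groups on background variables such as robot programming experience, which is consistent with the unequal distribution noted above; a stratified allocation would be needed to control for that.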

Measures
After both the A section and the B section, the participant was asked to complete a questionnaire (Table 2). The questionnaire was based on a modified version of Brooke's System Usability Scale (SUS), which rates the general usability of an interactive system [62]. Answers to questions are based on a 5-point Likert scale with the values 1 - "Not at all," 2 - "Not really," 3 - "Maybe," 4 - "Yes probably" and 5 - "Yes definitely." Note, half of the questions are positively phrased (odd numbered questions), half are negatively phrased (even numbered questions). We had used this scale in a previous validation of the TEACHME system [7]; however, here we extended the questionnaire with two additional Likert scale items referred to as Question 11 ("The robot teaching system helped me resolve inconsistencies in the relative's instructions") and Question 12 ("The robot teaching system helped me understand how behaviours can interfere with each other"). Note, Q11 addresses the ability of the system to solve the problems at hand (e.g. resolving inconsistencies), while Q12 probes participants' understanding of the principle that different robot behaviours may interfere with each other.
The participants were also given an opportunity to write an expanded answer to these two questions if they wished. Following the "B" section, participants could provide further written comments.

Results and discussion
In this section, we provide the results for the user study. In the following, the abbreviation "BC" is for "Behaviour Checking" and "NBC" is "No Behaviour Checking."

Usability outcome variables
There were three outcome variables: one from the responses to the SUS (based on items 1-10 shown in Table 2), and the two additional items (Questions 11 and 12) mentioned above.

SUS responses
SUS responses for each of the two repeated measures conditions are presented in Table 3. Note, "difference" reported in this and other tables refers to the differences in scores for each participant in this repeated measures study. To calculate the SUS score, 1 is subtracted from each of the values of the odd numbered questions. The values for the even numbered questions are subtracted from 5. The sum of these scores is then multiplied by 2.5, which results in a score between 0 and 100. A SUS score above 68 is considered above average, while scores less than 68 are considered below average [62].
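The scoring rule described above can be written as a short function. This is a sketch of the standard SUS calculation applied to items 1-10 only; the function name `sus_score` and the example responses are illustrative and are not data from the study.

```python
def sus_score(responses):
    """Compute a SUS score (0-100) from ten Likert responses, each in 1..5.

    responses[0] is item 1, responses[1] is item 2, and so on.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if not 1 <= value <= 5:
            raise ValueError("each response must be between 1 and 5")
        if item_number % 2 == 1:
            total += value - 1   # odd (positively phrased) items: value minus 1
        else:
            total += 5 - value   # even (negatively phrased) items: 5 minus value
    return total * 2.5           # scale the 0..40 sum to 0..100


# Hypothetical participant: "Yes probably" on odd items, "Not really" on even items.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # -> 75.0
```

Questions 11 and 12 are not part of this score; they are analysed as separate outcome variables.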
The mean scores are consistent with our previous experiment on usability of the TEACHME system [7] (that did not involve any behaviour interference detection) with high usability. This indicates that there were no significant or salient differences between the two repeated measures conditions and suggests a positive response to research question 1 ("Is the robot teaching system still considered usable by users when it includes automatic behaviour interference checking?"). Table 4 considers the effects of the presentation order, i.e. whether behaviour checking is turned on or off for Sections A and B. NBC/BC denotes that behaviour checking was turned off for Section A and then on for Section B (group Y, above the line in the table) and BC/NBC denotes that behaviour checking was turned on for Section A and then off for Section B (group X, below the line in the table). Tables 7 and 10 have a similar structure. Presentation order effects in terms of SUS responses were insignificant (Table 4).

Table 2: Usability questionnaire used in the present study. Modified Brooke's Usability Scale (5-point Likert scale), items 1-10, complemented by 2 additional items. Scale: 1 - "Not at all," 2 - "Not really," 3 - "Maybe," 4 - "Yes probably," 5 - "Yes definitely"
1. I think that I would like to use the robot teaching system like this often
2. I found using the robot teaching system too complex
3. I thought the robot teaching system was easy to use
4. I think that I would need the support of a technical person who is always nearby to be able to use this robot teaching system
5. I found the various functions in the robot teaching system were well integrated
6. I thought there was too much inconsistency in the robot teaching system
7. I would imagine that most people would very quickly learn to use the robot teaching system
8. I found the robot teaching system very cumbersome to use
9. I felt very confident using the robot teaching system
10. I needed to learn a lot of things before I could get going with the robot teaching system
11. The robot teaching system helped me resolve inconsistencies in the relative's instructions
12. The robot teaching system helped me understand how behaviours can interfere with each other
Note, the last two items are referred to as "Question 11" and "Question 12" in this article.

Question 11 of usability questionnaire
As Tables 5 and 6 suggest, there were differences between the two repeated measures conditions. These differences were significant with a moderate effect size [63] calculated in the manner suggested by Rosenthal [64] (Wilcoxon signed-rank test, p < 0.01, effect size r = 0.60), and participants considered the system with behaviour checking more favourably, partly providing a positive response to research question 2 ("Do users find the mechanism for informing them about behaviour interference effective in helping them understand the nature of behaviour interference?"). Table 7 suggests that there were no effects from presentation order in terms of responses to the two different conditions for question 11.

Question 12 of usability questionnaire
Tables 8 and 9 suggest that there were differences between the two repeated measures conditions. These differences were significant with a moderate effect size (Wilcoxon signed-rank test, p < 0.05, effect size r = 0.57). Thus, participants considered that the interference detection system helped them understand the interference issue better, providing a positive response to research question 2. Table 10 suggests that there were no effects from presentation order in terms of responses to the repeated measures variable.
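To make the reported statistics concrete, the following sketch computes a Wilcoxon signed-rank z statistic and an effect size of the form r = |z| / sqrt(N). It rests on several assumptions that are not taken from the article: it uses the normal approximation without tie or continuity corrections, it takes N to be the number of participant pairs unless told otherwise, and the ratings in the example are invented. It is therefore not expected to reproduce the values reported above.

```python
import math


def wilcoxon_effect_size(nbc_scores, bc_scores, n_for_r=None):
    """Return (z, r) for paired ratings using the Wilcoxon signed-rank test.

    z uses the normal approximation (no tie or continuity correction).
    r = |z| / sqrt(N); N defaults to the number of pairs (an assumption).
    """
    if len(nbc_scores) != len(bc_scores):
        raise ValueError("paired samples must have equal length")
    # Zero differences are discarded, as in the classical procedure.
    diffs = [b - a for a, b in zip(nbc_scores, bc_scores) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    n_obs = n_for_r if n_for_r is not None else len(nbc_scores)
    return z, abs(z) / math.sqrt(n_obs)


# Invented Likert ratings for one item under the NBC and BC conditions.
nbc = [2, 3, 3, 1, 4, 2, 3, 2]
bc = [4, 4, 3, 3, 5, 4, 4, 2]
z, r = wilcoxon_effect_size(nbc, bc)
print(f"z = {z:.2f}, r = {r:.2f}")
```

For small samples with many ties, as is typical of single Likert items, an exact or tie-corrected test from a statistics package would be preferable to this approximation.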

Demographics outcome variables

Gender
There were no relationships between the repeated measures conditions and gender for any of the three outcome variables (see the difference between means in Table 11). This means that the data do not show a significant impact of gender on the usability of the system or on how it helped participants to resolve inconsistencies and understand behaviour interference.

Prior interaction with robots and programming experience
There were no relationships between the repeated measures conditions and prior interactions with robots for any of the three outcome variables (Table 12). There were also no relationships between experience of programming robots and the repeated measures conditions for any of the three outcome variables (Table 13).
Thus, our data do not reflect a significant effect of participants' background on detecting and solving behaviour interference issues, in response to research question 3 ("Does the participant's background have any effect on detecting and solving behaviour interference issues?").

Conclusions, limitations and future work
We defined and implemented a static behaviour checking system that considers the preconditions and priorities of behaviours to identify cases where behaviours will never be executed or may not be executed. We incorporated this into the TEACHME System on the Care-O-bot3 ® robot in the Robot House that fed back problems to users by a graphical user interface. We carried out a user evaluation study to elicit their views on this system. Regarding the static behaviour checking system we elected to carry out checks on behaviour interference as it was straightforward to explain results to an end-user. An alternative approach would be to add the new behaviour, re-construct the underlying model of the system and carry out full model checking. The main issue with this approach we perceive is how to explain any output to the end-user.
While the participants in this study did not find the two conditions (with behaviour checking and without behaviour checking) different in terms of general usability, they did find that the behaviour checking approach was significantly more useful for resolving and understanding inconsistencies in the robot's behaviour. Furthermore, we found that technical background did not have a significant effect in understanding the nature of behaviour interference.
If those results can be confirmed in future larger-scale studies, then this is an encouraging direction for the development of robot personalisation systems that allow robot behaviour creation and modification to be carried out by both technical and non-technical users. Specifically, the issue of behaviour interference and the resulting conflicts could be understood by non-expert users when detected and reported effectively. However, although such mechanisms can report on interference, a separate issue is how an end user could potentially deal with and resolve the problem. In the study reported in this article, the user had only limited options (deleting or amending behaviours or simply ignoring the issue). In more complex cases, solutions may be to provide additional behaviour priority modification. Further investigation of these more complex cases and their possible solutions would be a valid next step in this area of research.
The integration of the behaviour checking system into the Robot House also represents an additional tool to complement formal verification. Formal verification is often used "offline," i.e. prior to system deployment. In addition, it is usually performed by highly skilled verification engineers. However, the use of behaviour checking based on static analysis of behaviours described in this article has shown that such tools can be used online, during the operation of a multi-purpose companion robot. Furthermore, it can be used to give timely and informative feedback directly to end-users during robot behaviour teaching.
There are several limitations to our work. First, the relatively small number of participants is a major limitation, and the sample of participants is not ideally balanced in terms of gender and programming background. Second, it would have been helpful to have each participant carrying out several sessions in a longer-term study. Third, video recording and analysis of participants' actions and reactions to those two conditions, and their interactions with the experimenter during the experiment, could have added additional detailed information on how participants experienced the two conditions. Finally, we only tested each participant in two conditions, with and without behaviour checking. A larger scale study, with a between-participant design, could study different variations of the behaviour checking approach, in order to gain more detailed feedback on the usability and usefulness of the system, and how to improve the system, rather than, as we did in this study, only considering the presence or absence of behaviour checking.
With respect to future work, there are a number of directions in which we could improve the initial static behaviour checking system. First, the behaviour checking system is currently limited to behaviours in which the triggering condition is a sequence of conjunctions, i.e. $p_1$, $p_2$ and $p_3$, etc. A more general approach would use Boolean formulae as the triggering condition, so that disjunctions, negations and nested formulae could also be included in the condition, e.g. ($p_1$ or $p_2$) and ((not ($p_3$)) and $p_4$). Previously, we assumed that the preconditions of behaviours were conjunctions of atomic statements and represented these as sets of preconditions. The definitions for overriding and interfering were presented as subset checking between sets of preconditions for two behaviours. Let $F(b_i)$ denote the Boolean formula representing the preconditions for behaviour $b_i$ (which in Section 3 was a conjunction). An alternative way to show the subset relation between the preconditions of two behaviours would be to show that a corresponding implication between their Boolean formulae is a valid formula. We could program this directly or call a theorem prover for propositional logic or a SAT solver. This would allow a greater range of flexibility in programming the robot, both by developers at the code level and by end-users using the TEACHME system.
Second, a more detailed study of the allowable preconditions could be made so that interactions between temporal constraints (such as it is between 10 and 11 and it is morning) or spatial constraints (such as near the sofa and in the living room) could be dealt with properly. For these, we would need better representations related to time and space with definitions for terms such as morning, afternoon and evening. Constraint solvers or spatial reasoners might be useful for reasoning here, but we would have to check what types of statements are allowed concerning time or space and how best to reason about them together with Boolean formulae.
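As an illustration of checking Boolean preconditions "directly", the sketch below tests whether one precondition formula implies another by enumerating all truth assignments. The tuple-based formula encoding and the function names are hypothetical, and the direction of implication that would correspond to overriding or interfering depends on the definitions in Section 3, which this sketch does not attempt to reproduce.

```python
from itertools import product

# Formulae are nested tuples (a hypothetical encoding):
#   ("var", "p1"), ("not", f), ("and", f, g, ...), ("or", f, g, ...)


def atoms(formula):
    """Collect the names of the atomic propositions in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(atoms(sub) for sub in formula[1:]))


def evaluate(formula, env):
    """Evaluate a formula under a truth assignment env: name -> bool."""
    op = formula[0]
    if op == "var":
        return env[formula[1]]
    if op == "not":
        return not evaluate(formula[1], env)
    if op == "and":
        return all(evaluate(sub, env) for sub in formula[1:])
    if op == "or":
        return any(evaluate(sub, env) for sub in formula[1:])
    raise ValueError(f"unknown operator: {op}")


def implies(premise, conclusion):
    """True if (premise -> conclusion) holds under every truth assignment."""
    names = sorted(atoms(premise) | atoms(conclusion))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(premise, env) and not evaluate(conclusion, env):
            return False  # counterexample: premise true, conclusion false
    return True


p1, p2, p3, p4 = (("var", name) for name in ("p1", "p2", "p3", "p4"))

# Conjunctive case: {p1, p2} is a subset of {p1, p2, p3}.
print(implies(("and", p1, p2, p3), ("and", p1, p2)))  # True
print(implies(("and", p1, p2), ("and", p1, p2, p3)))  # False

# The nested example from the text: (p1 or p2) and ((not p3) and p4).
nested = ("and", ("or", p1, p2), ("and", ("not", p3), p4))
print(implies(nested, ("or", p1, p2)))                # True
```

Enumeration is exponential in the number of atomic propositions, which is acceptable for a handful of sensor conditions; for larger rule sets a SAT solver or propositional theorem prover, as mentioned above, would be the more scalable choice.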
Third, when adding a new behaviour we have just presented one behaviour to the user that satisfies the guidelines. A more detailed study might show all behaviours that match the guidelines and these could perhaps be ordered in some way with the stronger conditions first (overriding, then interfering, then potential overriding and finally potential interfering).
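The ordering suggested above can be sketched as a simple sort over detected issues. The enumeration `Finding`, the function `order_findings` and the behaviour names in the example are hypothetical; only the order of the four categories is taken from the text.

```python
from enum import IntEnum


class Finding(IntEnum):
    """Categories of detected issue, strongest condition first."""
    OVERRIDING = 0
    INTERFERING = 1
    POTENTIAL_OVERRIDING = 2
    POTENTIAL_INTERFERING = 3


def order_findings(findings):
    """Sort (behaviour_name, Finding) pairs so stronger conditions come first.

    sorted() is stable, so findings of the same category keep their
    original relative order.
    """
    return sorted(findings, key=lambda item: item[1])


# Hypothetical issues detected when a new behaviour is added.
detected = [
    ("goToSofa", Finding.POTENTIAL_INTERFERING),
    ("attendKitchen", Finding.OVERRIDING),
    ("remindMedicine", Finding.INTERFERING),
    ("announceDoorbell", Finding.POTENTIAL_OVERRIDING),
]
for name, finding in order_findings(detected):
    print(f"{finding.name}: {name}")
```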
Another avenue of future work would be to integrate the use of more powerful verification tools like model checkers and theorem provers and to convert the low-level technical output of these types of systems into easy-to-understand feedback for an end-user. In particular, a parser for the UH Robot House rules that translated these into input to the NuSMV model checker was described in ref. [20]. If we add a new behaviour, we could check properties such as whether the pre-conditions for the new behaviour will never be satisfied (so it will never be run), or whether, if the pre-conditions are satisfied on some execution, the behaviour will run at some point. Similar checks for existing behaviours could also be carried out. However, how to report those results to non-technical users needs more investigation.
Furthermore, the TEACHME interface would be more robust if it supported priority settings and provided functionality for users to create novel temporal and non-temporal memory variables, as opposed to relying on predetermined memory variables created by technical users, as in the present version. The future development of the user interface we designed might benefit from insights gained in recent research on interfaces that allow novice users to comprehend and debug software and systems, e.g. refs. [65,66].
Finally, larger scale user studies specifically targeting older adults as well as adults with dementia or other health-related conditions, ideally performed in participants' own homes, could further illuminate the usefulness and usability of our developed system and its impact on applications to support healthy and independent living. More generally, the techniques and systems presented in this article could be further developed and applied as well to other application domains, including therapy and education, where robots need to be taught new behaviours by non-expert, novice users. In addition to using the system in order to provide assistance functionalities, our approach could also be extended, e.g. to teaching a robot social behaviours.