A study on an applied behavior analysis-based robot-mediated listening comprehension intervention for ASD

Abstract: Autism spectrum disorder (ASD) is a lifelong developmental condition that affects an individual's ability to communicate and relate to others. Despite such challenges, early intervention during childhood has been shown to have positive long-term benefits for individuals with ASD. In particular, early childhood development of communicative speech skills has been shown to improve future literacy and academic achievement. However, delivering such interventions is often time-consuming. Socially assistive robots (SARs) are a potential strategic technology that could help support intervention delivery for children with ASD and increase the number of individuals that healthcare professionals can positively affect. For SARs to be effectively integrated into real-world treatment for individuals with ASD, they should follow current evidence-based practices used by therapists, such as Applied Behavior Analysis (ABA). In this work, we present a study that investigates the efficacy of applying well-known ABA techniques to a robot-mediated listening comprehension intervention delivered to children with ASD at a university-based ABA clinic. The interventions were delivered in place of human therapists to teach study participants a new skill as a part of their overall treatment plan. All the children participating in the intervention improved in the skill being taught by the robot and enjoyed interacting with the robot, as evidenced by high occurrences of positive affect and engagement during the sessions. One of the three participants also reached mastery of the skill via the robot-mediated interventions.


Introduction
The Centers for Disease Control estimates that 1 in 54 children have autism spectrum disorder (ASD) in the United States [1,2]. ASD is a lifelong developmental condition that affects an individual's capability to communicate and relate to others. The field of Applied Behavior Analysis (ABA) has evidence-based treatments for addressing communication and social deficits as well as reducing challenging behaviors in these populations [3]. These methods are utilized to teach communication, social, play, group instruction, and academic behaviors as well as activities of daily living. In order to provide effective treatment outcomes, such methods should begin during early childhood (i.e., 1.5-6 years old), and it is recommended that children receive 20-40 hours a week of one-on-one instruction from a healthcare professional with training in behavior analysis [4]. Such intensive one-on-one treatment limits the number of individuals that healthcare professionals can positively affect.
Socially assistive robots (SARs) could serve as a supportive technology for these healthcare professionals as they deliver interventions to individuals with ASD. Numerous SARs have already been developed and used for interventions targeted toward individuals with ASD. Applications of these robot-mediated interventions include (1) imitation therapy [5,6], (2) improving social skills (e.g., turn taking, joint attention, eye gaze, greetings/goodbyes) [7,8], (3) encouraging self-initiated social interactions [5,9,10], (4) reducing challenging behaviors [11,12], and (5) improving emotion recognition [12,13]. In general, these robot-mediated interventions have produced positive behavioral changes in individuals with ASD. However, it remains difficult to integrate this technology into real-world treatment plans because current robot-mediated interventions do not follow the ABA standards used for the treatment of individuals with ASD [14].
ABA focuses on addressing the unique deficits and behavioral excesses of individuals with ASD. Since these deficits and/or excesses are unique to each individual, real-world ABA clinical practice follows a standard procedure: (1) assessing an individual's deficits and excesses, (2) developing an intervention plan that addresses the individual's unique needs, and (3) implementing the intervention with frequent, ongoing reassessment [15]. The first step is to perform assessments that determine appropriate goals for each individual, using tools such as the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP) [16] and the Vineland Adaptive Behavior Scales [17]. This is followed by the development of an intervention plan with clearly defined procedures for instruction, error correction, prompt fading, reinforcement, and performance data collection. Finally, the intervention is implemented as a part of the individual's overall treatment plan, and his/her performance is frequently reassessed so that the level of prompting or the intervention procedures can be modified as necessary. Previous research on robot-mediated ABA-based interventions has primarily focused on developing and implementing interventions independently of an individual's treatment program. Furthermore, it has not focused on teaching new skills to individuals with ASD.
The long-term goal of our research team is to integrate autonomous SARs into clinical practice in order to extend the capabilities of therapists for providing accessible therapies to individuals with ASD. Our current research efforts have been toward investigating the efficacy of SARs in delivering interventions as a part of a curriculum for children with ASD to meet developmental milestones.
In this work, we present a study which evaluates the efficacy of an ABA-based robot-mediated listening comprehension intervention for children 3-8 years old with ASD at an ABA clinic. The intervention was developed in close collaboration with a Board-Certified Behavior Analyst (second author) to follow standard ABA procedures and replicate the behaviors of human therapists. To the authors' knowledge, this is the first robot-mediated intervention that has a robot deliver an ABA-based intervention to teach individuals with ASD a completely new skill as a part of their overall treatment plan.

Background
ABA therapy structures instruction using an Antecedent, Behavior, and Consequence (ABC) model [3]. The antecedent, also known as a discriminative stimulus (S^D), refers to the environmental and social factors that occur before a desired target behavior. The behavior is an individual's response to the antecedent. The consequence refers to what occurs immediately after a behavior, which can include methods such as verbal praise, rewards, warnings, or error corrections. As an illustrative example, an antecedent could be a greeting (e.g., saying "hi" and waving) delivered by a SAR to Mary, a child with ASD. The behavior is then whether Mary greets the robot back in response to the presented learning opportunity. The consequence would then be the robot providing verbal praise such as "Great job!" or a reward such as a healthy snack to the child for a correct behavior (i.e., greeting the robot).
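The ABC structure maps naturally onto a small record type. The sketch below is illustrative only: the class name `ABCTrial`, its fields, and the exact-match rule for judging a correct behavior are assumptions made for this example, not part of any system described in this work.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ABCTrial:
    """One learning opportunity under the Antecedent-Behavior-Consequence model."""
    antecedent: str                  # discriminative stimulus (S^D) presented to the learner
    target_behavior: str             # the response this trial aims to evoke
    behavior: Optional[str] = None   # what the learner actually did; None = no response

    def consequence(self) -> str:
        """Choose the consequence that immediately follows the observed behavior."""
        if self.behavior == self.target_behavior:
            return "reinforce: verbal praise"
        return "error correction"
```

For Mary's greeting, `ABCTrial("robot says 'hi' and waves", "greets the robot back", "greets the robot back").consequence()` selects reinforcement, while leaving `behavior` as `None` (no response) selects error correction.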
There are several types of ABA instruction typically used by therapists in clinical settings, such as Discrete Trial Training (DTT), Pivotal Response Training (PRT), and the Verbal Behavior Approach [18][19][20]. These types of ABA instruction range from more structured (i.e., DTT) to more naturalistic (i.e., PRT) forms of teaching. Herein, we focus on DTT because it is a structured form of instruction that can be delivered independently by robots. DTT is an application of the principles of ABA in a structured teaching environment to teach behaviors (e.g., social, language, imitation) [18]. Namely, DTT involves providing clear instruction on a targeted behavior or set of behaviors. This includes breaking down a larger set of skills (e.g., cleaning one's room or navigating a social interaction) and systematically teaching the individual behaviors (e.g., sorting toys or waving hello) that combine to form the larger skill, using positive reinforcement for correct responses and error corrections for incorrect responses. Prompting and careful selection of materials are also part of the process.
To date, only a handful of studies have evaluated the use of SARs in DTT-based therapies [21][22][23][24][25]. In ref. [21], the character-like SARs iRobi and CARO were used in an intervention to teach eye contact and facial expression recognition to children with ASD. The intervention consisted of observational learning: the robot first modeled the correct behavior with a human therapist and then requested that the child perform the behavior. Correct responses were reinforced by providing access to social stimuli (e.g., music). Encouragement and breaks were provided for incorrect responses. The overall efficacy of the robot-mediated intervention was evaluated over eight sessions with eight children with ASD who had difficulty distinguishing between emotions and typically made minimal eye contact. Overall, the participants increased their eye contact and their ability to recognize the robot's emotions. However, the individual learning outcomes of the children were not reported.
In ref. [22], the humanoid robot Nao was used to teach adolescents with ASD appropriate greeting behaviors. The intervention used a DTT format in which the robot first greeted the individual with ASD and gave him/her an opportunity to respond correctly. Correct responses were reinforced with social praise, and incorrect responses or non-responses were followed by a least-to-most prompting protocol. Namely, a model prompt (i.e., demonstrating a greeting) was first delivered by a therapist, and if the individual still failed to emit the greeting behavior, a verbal instruction prompt (i.e., explicitly requesting that the individual provide a greeting) was delivered by the robot. The efficacy of the robot-mediated intervention was evaluated with three adolescent participants with ASD who already had the ability to initiate and respond to social greetings but generally did not demonstrate these behaviors. Each participant took part in at least 17 intervention sessions, but only one of the three participants was able to respond to greetings without prompts.
In ref. [23], the humanoid robot Nao delivered interventions, inspired by DTT and Floortime, to teach children with ASD imitation behaviors. The intervention consisted of the robot requesting the child to imitate its arm movements while maintaining his/her engagement during the activity. Correct responses were reinforced with social praise. Incorrect responses or non-responses were followed by a verbal prompt that either indicated the mistake the child made or verbally instructed the child on the arm gesture he/she should execute. A study with two children with ASD demonstrated that the robot could facilitate imitation therapy with the participants, but the efficacy of the intervention was not evaluated.
In ref. [24], the humanoid Zeno robot delivered a DTT-based emotion recognition intervention to children and adolescents with ASD. The robot played three different emotion recognition games: guessing the emotion of the robot, selecting a picture of a human expressing a specific emotion, and identifying a human's emotion from a picture. Correct responses were reinforced with social praise. A least-to-most prompting protocol was used for incorrect responses and consisted of increasing the number of verbal hints provided to the individual. A study with two children with ASD over three intervention sessions did not show significant improvements in the children's ability to recognize emotions in comparison to their baseline performance.
In ref. [25], the Nao robot facilitated gesture imitation sessions with children with ASD. The sessions were in a DTT format where the robot first presented a gesture to the child and requested him/her to copy it. The child was then reinforced with praise if he/she imitated the robot's gesture correctly. If he/she did not imitate the gesture correctly, the robot would demonstrate the correct gesture and request the child to try again. A study with eight children with ASD was conducted to evaluate whether there were differences in children's engagement and performance during robot-mediated sessions in comparison to human-mediated sessions. Each child participated in a single robot-mediated intervention session and a single human-mediated intervention session. Overall, the children with ASD were more engaged and performed better in the robot-mediated sessions.
In general, current research on robot-mediated DTT-based therapies for individuals with ASD has focused on developing interventions that increase an individual's performance on an existing social skill (i.e., imitation, emotion recognition, greeting, or eye contact) within his/her repertoire using least-to-most prompting protocols [21][22][23][24][25]. To date, there has been no research on teaching a skill that is not already within an individual's existing repertoire as a part of his/her treatment plan. Teaching a completely new skill to an individual with ASD requires applying a set protocol of DTT techniques to be effective and efficient [18,29]. The efficacy of such robot-mediated DTT-based interventions has also been inconclusive, with both positive improvements in participant skill execution [21,22] and no change in participants' abilities [24]. With the exception of [22], most studies have also not investigated individual improvements in participant skill execution after the interventions, have had a limited number of intervention sessions, or both.
The research presented in this study extends the existing literature on robots delivering DTT-based interventions to individuals with ASD by (1) developing a language acquisition intervention; (2) investigating the efficacy of a robot-mediated intervention that teaches a new skill not already in an individual's existing repertoire; and (3) integrating the intervention, in place of a human therapist, as a part of an individual's overall treatment plan at an ABA clinic.

Robot-mediated listening comprehension intervention
We developed a DTT-based robot-mediated listening comprehension intervention to evaluate whether a robot can use evidence-based practices in ABA to teach children to answer wh-questions. We chose a listening comprehension intervention because early childhood education on communicative speech skills has been shown to be one of the most important factors for a positive long-term quality of life for individuals with ASD [26,27]. Specifically, the ability to answer wh-questions is one of the most important communicative skills in early childhood education [28]. In the robot-mediated intervention, a robot mediator reads passages from electronic books (e-books) and asks the child wh-questions, with the aim of fading prompts until the child can answer the wh-questions independently and correctly.

Humanoid Nao robot
In this study, we use the humanoid Nao robot (Figure 1), which stands 58 cm tall and has two degrees of freedom (DOF) in the head, five DOF in each arm, one DOF in each hand, and six DOF in each leg, enabling it to produce physical movements. Nao can speak through its built-in loudspeakers. Furthermore, it has seven touch sensors, four bidirectional microphones, and two 2D cameras, allowing the robot to sense its environment. The robot can also be controlled remotely via a Wi-Fi connection.

E-book
Our team designed an e-book for the robot mediator to use during our interventions, specifically for teaching children to answer wh-questions. The e-book is accessible via a web browser and has a page-turning animation like a physical book. The pages can be turned either by pressing the left or right arrow key on a keyboard or by clicking on the e-book with a mouse. During the robot-mediated intervention, the robot turns the page by sending a command that mimics pressing the left or right arrow key. This was done to mimic the human-delivered intervention as closely as possible. Each page of the book shows a picture of an animal with a sentence consisting of a main subject performing an action in a location (e.g., "Katie the cat was sitting on the couch."). The illustration is a digital image of the main subject on a neutral background without any animations. An example page in the book is shown in Figure 2.
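A minimal sketch of the page-turning behavior, assuming the e-book can be modeled as a list of pages navigated by arrow-key events. The `EBook` class, its `press` method, and the `"ArrowLeft"`/`"ArrowRight"` key names are hypothetical; the text specifies only that the robot's command mimics arrow-key presses.

```python
from typing import List

class EBook:
    """Hypothetical e-book model: a list of pages and the index of the open page."""

    def __init__(self, pages: List[str]):
        if not pages:
            raise ValueError("an e-book needs at least one page")
        self.pages = list(pages)
        self.index = 0  # the book opens on its first page

    def press(self, key: str) -> str:
        """Apply an arrow-key event, as the robot's command does, and return the shown page."""
        if key == "ArrowRight":
            self.index = min(self.index + 1, len(self.pages) - 1)  # no turning past the end
        elif key == "ArrowLeft":
            self.index = max(self.index - 1, 0)  # no turning before the start
        else:
            raise ValueError(f"unsupported key: {key!r}")
        return self.pages[self.index]
```

With a three-page book whose first page is "Katie the cat was sitting on the couch." (the other pages being placeholders here), pressing `"ArrowRight"` twice reaches the last page, and further presses leave it there.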

Intervention design
In this study, the listening comprehension intervention (Figure 3) was designed by a Board-Certified Behavior Analyst to closely replicate the existing ABA therapies and DTT teaching procedures implemented by behavior technicians in the clinic. In this intervention, the antecedent was the discriminative stimulus (S^D) of the robot reading a passage from the e-book and asking a wh-question. The behavior was the child's response to the wh-question. The consequence was either (1) positive reinforcement via social praise and dancing from the robot for a correct response or (2) error correction for an incorrect response. Each intervention session began with the robot mediator introducing itself and telling the child that they would be reading together during the session. The robot then read through the entire book (three pages), asking the child wh-questions about each page. Namely, after the robot read a page and displayed the illustration to the child, the child was asked a wh-question. The wh-questions included only who, what, and where questions, structured as follows: (1) "Who was (action)?", (2) "What is (noun) doing?", and (3) "Where did (noun perform an action)?" See Table 1 for the questions and correct responses used during the intervention. If the child answered the wh-question correctly, the robot provided reinforcement in the form of both verbal praise (e.g., "Great job!") and a fun dance. When the child responded incorrectly, the robot verbally provided the answer to the question for the child to model and repeat. After three attempts at evoking a correct response from the child, the robot moved on to a new question.
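The trial logic above can be sketched as a loop over at most three attempts. This is a hypothetical simulation, assuming that each attempt is one opportunity to respond and that an error correction follows every incorrect response; `run_trial`, `get_response`, and the case-insensitive string match are illustrative choices, since in the study the researcher operating the robot judged the child's responses.

```python
from typing import Callable, List

MAX_ATTEMPTS = 3  # the robot moves on after three attempts

def run_trial(question: str, answer: str,
              get_response: Callable[[str], str]) -> List[str]:
    """Simulate one trial and return the robot's consequence actions in order."""
    actions: List[str] = []
    for _ in range(MAX_ATTEMPTS):
        response = get_response(question)  # the child's answer ("" for no answer)
        if response.strip().lower() == answer.strip().lower():
            actions.append("reinforce: verbal praise + dance")
            return actions
        # Error correction: the robot states the answer for the child to model and repeat
        actions.append(f"error correction: {answer}")
    actions.append("move on to next question")
    return actions
```

A child who answers incorrectly once and then correctly produces one error correction followed by reinforcement; a child who never answers correctly produces three error corrections and a move to the next question.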
During the intervention, prompts were also used when a child did not respond to a question within a certain period of time. These prompts were gradually faded until the child could answer the wh-questions independently. In ABA therapy, prompts are commonly used as supplementary stimuli to increase the probability of correct responses to an antecedent stimulus (e.g., a wh-question) and are structured to gradually transfer responding from the prompts to the naturally occurring antecedent stimulus [29]. Example forms of prompt structuring include most-to-least prompts, graduated guidance, least-to-most prompts, and time delay prompts. In this intervention, we used a most-to-least prompting protocol via time delay, based on the child's prior performance on the wh-questions. There are three prompt levels before a child achieves skill independence/mastery: (1) an immediate prompt (i.e., a 0-second delay immediately following the S^D), (2) a 2-second delay before the prompt, and (3) no prompt. A child always begins at prompt level 1, and the opportunity to respond independently gradually increases with each prompt level. The prompt level increases by one if the child correctly answers eight of the nine wh-questions (about 89%) for two consecutive intervention sessions. A child has achieved mastery of the entire skill only when he/she can correctly answer eight of the nine wh-questions for two consecutive intervention sessions with no prompts (i.e., at level 3). The time delays for the prompt levels and the way in which new prompt levels are introduced are well established in standard ABA clinical practice and are outlined in the Autism Curriculum Encyclopedia (ACE) from the New England Center for Children (NECC) [30]. These time delays have been shown to give children with ASD sufficient opportunity to respond correctly. Furthermore, new levels are not introduced gradually (e.g., mixing some level 2 prompts into level 3 trials within a session) in order to reduce the probability of prompt dependency by fading prompts as quickly as possible [29]. Prompt dependency refers to the phenomenon in which an individual responds correctly only when a prompt is presented [31]. In human-delivered interventions, this form of time delay prompt fading has been shown to minimize the number of error responses and to teach new skills to individuals with ASD effectively and efficiently [29].

Table 1 (excerpt): where-questions and correct responses
Question | Correct response
Where did Katie the cat sit? | On the couch
Where did Doug the dog chase his tail? | By the pond
Where did Sam the snake slither? | On the ground
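The level-advancement and mastery rules can be expressed as a small state machine. The sketch below assumes that "eight of the nine" is checked as a session-level proportion (at least 8/9 of the questions asked, consistent with the 89% figure used for these sessions) and that the count of consecutive sessions restarts at each new level; `PromptFading`, `meets_criterion`, and `PROMPT_DELAY_S` are hypothetical names. Clinician-directed changes, such as reintroducing an earlier level, are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Delay (seconds) between the S^D and the prompt at each level; None = no prompt
PROMPT_DELAY_S = {1: 0.0, 2: 2.0, 3: None}

def meets_criterion(correct: int, asked: int) -> bool:
    """At least eight of every nine questions correct (integer test avoids rounding)."""
    return correct * 9 >= asked * 8

@dataclass
class PromptFading:
    level: int = 1                                    # every child starts at level 1
    streak: List[bool] = field(default_factory=list)  # criterion results at this level
    mastered: bool = False

    def record_session(self, correct: int, asked: int) -> None:
        self.streak.append(meets_criterion(correct, asked))
        if self.streak[-2:] == [True, True]:          # two consecutive criterion sessions
            if self.level == 3:
                self.mastered = True                  # criterion met with no prompts
            else:
                self.level += 1                       # fade to the next prompt level
                self.streak = []

    def prompt_delay(self) -> Optional[float]:
        return PROMPT_DELAY_S[self.level]
```

Under this reading, 24 of 27 correct (exactly 8/9) meets the criterion, so two such sessions at level 1 move a child to the 2-second delay.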

Study design
We conducted a study to investigate the efficacy of the robot-mediated intervention for teaching the wh-questions to three children with ASD at a university-based ABA clinic. Furthermore, we also evaluated the children's engagement, communication, and affect during the sessions.

Participants
The inclusion criteria for these participants were (1) 3-8 years old, (2) DSM-5 diagnosis for ASD, (3) currently receiving ABA therapy, (4) have not mastered wh-questions, and (5) can follow 1-step gross motor instructions. Prior to initiating the intervention, written informed consent was obtained from the participants' parental guardians and the director of the clinic evaluated whether the child already possessed the skill by using the VB-MAPP [16]. VB-MAPP is a standard assessment tool used in ABA to specifically evaluate whether individuals with an ASD diagnosis possess a language or social skill. We also conducted a baseline session where the questions were presented to the participant by the robot without prompts or reinforcement. Participants were included in the study only if the director of the clinic indicated that the participant did not possess the skill according to the VB-MAPP assessment and he/she could not answer any of the questions during the baseline session.
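The screening logic amounts to a conjunction of the five inclusion criteria plus the pre-study checks. In this hypothetical sketch the `Candidate` fields and `is_eligible` are illustrative names; informed consent is treated as a further precondition, and clinical judgments such as the VB-MAPP assessment are reduced to booleans recorded by a person.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical record of one child's screening information."""
    age_years: float
    dsm5_asd_diagnosis: bool
    receiving_aba: bool
    mastered_wh_questions: bool
    follows_one_step_gross_motor: bool
    consent_obtained: bool
    vbmapp_shows_skill: bool  # clinic director's VB-MAPP judgment
    baseline_correct: int     # correct answers in the unprompted robot baseline

def is_eligible(c: Candidate) -> bool:
    """All inclusion criteria and the pre-study checks must hold."""
    return (
        3 <= c.age_years <= 8
        and c.dsm5_asd_diagnosis
        and c.receiving_aba
        and not c.mastered_wh_questions
        and c.follows_one_step_gross_motor
        and c.consent_obtained
        and not c.vbmapp_shows_skill
        and c.baseline_correct == 0
    )
```

A single correct answer during the baseline session is enough to exclude a child under this rule.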
A total of three children from the university-based ABA clinic met the inclusion criteria and participated in our study. Participant 1 was a 5-year-old female, participant 2 was a 4-year-old male, and participant 3 was a 5-year-old male. Each participant was also evaluated prior to the study using the Vineland Adaptive Behavior Scales, Third Edition (Vineland-3) [17]. Vineland-3 is a standardized instrument used to measure adaptive behavior and to support the diagnosis of intellectual and developmental disabilities such as ASD; it is the standard instrument used at the ABA clinic where we conducted our study. The scores for each participant are presented in Table 2. The Adaptive Behavior Composite (ABC) score is a composite score that evaluates an individual's ability in three adaptive behavior domains: communication, socialization, and daily living skills. Communication refers to an individual's ability to listen, understand, and express himself/herself through speech as well as to read and write. Socialization refers to an individual's ability to function in social situations. Finally, daily living skills refer to an individual's ability to perform everyday tasks.

Intervention setting
The intervention took place at a university-based ABA clinic in a private therapy room approximately 8 ft × 10 ft in size with a carpeted floor (Figure 4). The room has a table and four chairs. The robot mediator sits or stands across from the child to conduct the intervention. A computer monitor on the table presents the e-book to the child, and the robot mediator refers to the e-book throughout the intervention. A behavior technician was always present in the room to keep the child from physically interacting with the robot, to help keep the child seated in a chair in front of the robot, and to collect data. A video camera was also placed in the room to record the interaction for post-study analysis. During the robot-mediated interventions, a researcher controlled the actions of the robot and also collected performance data so that interobserver reliability could be assessed. To control the robot's actions successfully, the researcher was always positioned to see and hear the child.

Robot control interface
The researcher controlling the robot used a customized Wizard-of-Oz GUI to set up the robot before each intervention session and to control its actions during the session. Before each session, the researcher chose the prompt level to be used that day based on the child's performance in his/her previous sessions. During an intervention session, the researcher used five push buttons to control the robot's actions: (1) greeting the child, (2) reading a page in the book and asking a wh-question, (3) reinforcing a correct response, (4) correcting an incorrect response, and (5) moving on to the next wh-question. The researcher used the greeting action (button 1) to have the robot greet the child at the beginning of the session and then had the robot go through 27 wh-questions. Herein, each wh-question is referred to as a trial. In a trial, the researcher had the robot read a page in the book and ask the child a wh-question (button 2). A time-delayed prompt was then provided automatically by the robot based on the prompt level selected before the session. This ensured that the robot delivered the prompts at the correct time and did not rely on the researcher's reaction time. If the child responded correctly, the researcher had the robot reinforce the child with both social praise and a dance (button 3); if the child responded correctly before the prompt, triggering the reinforcement prevented the pending prompt from occurring. If the child responded incorrectly, the researcher had the robot correct the child and request that the child model and repeat the answer (button 4).
The researcher moved on to the next wh-question (button 5) once the child had responded correctly or the robot could not evoke a correct response after three attempts. Button 5 randomly selects the next wh-question without replacement and updates the phrases the robot uses for buttons 2-4 to match the context of the new wh-question.
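The automatic, cancellable prompt can be sketched with a timer that starts when the question is asked and is canceled by the reinforcement button. This is a minimal sketch using Python's standard `threading.Timer`, assuming the delays stated for the three prompt levels; `DelayedPrompt` and its methods are hypothetical names, not the interface of the Wizard-of-Oz GUI used in the study.

```python
import threading
from typing import Callable, Optional

class DelayedPrompt:
    """Schedules the prompt after the level's delay; reinforcement cancels it."""

    def __init__(self, delay_s: Optional[float], deliver_prompt: Callable[[], None]):
        self.delay_s = delay_s  # 0.0 (level 1), 2.0 (level 2), None (level 3)
        self.deliver_prompt = deliver_prompt
        self._timer: Optional[threading.Timer] = None

    def start(self) -> None:
        """Call right after the robot asks the wh-question (button 2)."""
        if self.delay_s is None:  # level 3: no prompt is scheduled
            return
        self._timer = threading.Timer(self.delay_s, self.deliver_prompt)
        self._timer.start()

    def cancel(self) -> None:
        """Call when reinforcement is triggered (button 3); a pending prompt never fires."""
        if self._timer is not None:
            self._timer.cancel()

    def wait(self) -> None:
        """Block until the timer thread has finished (fired or been canceled)."""
        if self._timer is not None:
            self._timer.join()
```

At level 1 the 0-second delay means the prompt fires essentially at once, so cancellation only matters at level 2, where a correct answer within 2 seconds suppresses the prompt.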

Procedure
The intervention was conducted over 3 months, with each child participating in intervention sessions 3-5 times a week according to his/her availability and at most once per day. Each session lasted 30-45 minutes depending on the performance and preferences of the participant, and participants could request to end the session at any time. A typical session included a maximum of 27 wh-questions (who, what, and where), with short breaks after every nine questions. Herein, each set of nine wh-questions is referred to as a subsession. A total of nine unique wh-questions were used by the robot in this study; the exact questions are presented in Table 1. Each wh-question in the bank was presented three times during a session, for a total of 27 questions.
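One way to realize 27 trials per session, with each of the nine questions presented three times, is sketched below. The text states only that button 5 selects questions at random without replacement; this sketch assumes that each nine-question subsession is an independent shuffled pass over the full question bank. `build_session` is a hypothetical helper, and the question labels are placeholders.

```python
import random
from typing import List, Optional, Sequence

def build_session(questions: Sequence[str], passes: int = 3,
                  seed: Optional[int] = None) -> List[List[str]]:
    """Return `passes` subsessions, each presenting every question once in random order."""
    rng = random.Random(seed)  # seedable for reproducible schedules
    session = []
    for _ in range(passes):
        subsession = list(questions)
        rng.shuffle(subsession)  # drawing without replacement = a random permutation
        session.append(subsession)
    return session
```

With a nine-question bank, `build_session(bank)` yields three subsessions of nine questions each (27 trials), so every question appears exactly once per subsession and three times per session.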

Data collection
Data were collected on the participants' performance in answering the wh-questions during the robot-mediated intervention. A plus ("+") was recorded for a correct response without a prompt, a plus p ("+p") for a correct response after a prompt, a minus ("−") for an incorrect response without a prompt, and a minus p ("−p") for an incorrect response after a prompt. When a participant did not provide a response, it was counted as an incorrect response after a prompt ("−p") at levels 1 and 2, and as an incorrect response without a prompt ("−") at level 3 and during mastery. We also conducted a post-study analysis of the video-recorded sessions to investigate the children's engagement, communication, and affect during the robot-mediated intervention sessions. The coders used the following definitions. Engagement was defined as the percentage of intervals in which the participant looked at any part of the robot or computer screen for at least 2 consecutive seconds. Each session was discretized into 10-second intervals, and a participant was classified as looking at the robot or computer screen during an interval if his/her gaze was directed toward the robot or computer screen for at least 2 seconds at least once during that interval. If the participant looked at the robot and/or computer screen more than once within an interval, the interval was still counted only once. The percentage of intervals engaged was then calculated as the number of intervals in which the participant looked at the robot and/or computer screen divided by the total number of intervals in the session.
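Both the trial scoring rule and the engagement measure can be stated precisely in code. In this sketch, `score_response` and `engagement_percent` are hypothetical helpers, an ASCII "-" stands in for the minus sign, and gaze is assumed to be available as (start, end) looking bouts in seconds. An interval is counted as engaged when any single bout overlaps it for at least 2 seconds, which is one reading of "at least 2 consecutive seconds during the interval".

```python
import math
from typing import List, Optional, Tuple

def score_response(correct: Optional[bool], prompted: bool, level: int) -> str:
    """Return '+', '+p', '-' or '-p' for one trial; correct=None means no response."""
    if correct is None:
        # No response: '-p' at prompt levels 1 and 2, '-' at level 3 and mastery
        return "-p" if level in (1, 2) else "-"
    return ("+" if correct else "-") + ("p" if prompted else "")

def engagement_percent(bouts: List[Tuple[float, float]], session_s: float,
                       interval_s: float = 10.0, min_look_s: float = 2.0) -> float:
    """Percentage of fixed-length intervals containing a look of at least min_look_s."""
    n_intervals = math.ceil(session_s / interval_s)
    engaged = 0
    for k in range(n_intervals):
        lo = k * interval_s
        hi = min(lo + interval_s, session_s)
        # Each interval is counted at most once, however many looks it contains
        if any(min(end, hi) - max(start, lo) >= min_look_s for start, end in bouts):
            engaged += 1
    return 100.0 * engaged / n_intervals
```

For a 30-second excerpt with looks at 0-3 s, 12-13 s, and 25-29 s, the first and third intervals count as engaged and the second does not (its only look lasts 1 second), giving about 66.7%.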
Communication was defined as the frequency during a session that a child either independently initiated verbal interaction with the robot (e.g., "Hi Robot!") or engaged in verbal interaction with the robot as a part of the intervention.
Affect was defined as the percentage of intervals during a session in which the participant demonstrated positive, negative, or neutral affect; a separate percentage was calculated for each of the three affective states. Each session was discretized into 10-second intervals, and the participant's affective state during each interval was categorized as positive, negative, and/or neutral. A participant could be categorized into more than one affective state within an interval (e.g., positive and negative). Hence, each affect percentage represents the percentage of the session during which a participant was in that affective state, and the three percentages need not sum to 100%. Positive affect was identified by smiles, positive comments/movements toward the robot, laughing, and touching or attempting to touch the robot. Negative affect was identified by crying, vocal protests (e.g., "No!"), attempts to leave, whining, frowning, or whimpering. Neutral affect was coded when the participant's behavior displayed neither positive nor negative affect.
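Because an interval may carry more than one label, each affective state's percentage is computed independently of the others. A minimal sketch, assuming each 10-second interval has already been coded as a set of labels; `affect_percentages` is a hypothetical helper.

```python
from typing import Dict, List, Set

STATES = ("positive", "negative", "neutral")

def affect_percentages(intervals: List[Set[str]]) -> Dict[str, float]:
    """Percentage of intervals in which each affective state was coded."""
    n = len(intervals)
    return {s: 100.0 * sum(s in labels for labels in intervals) / n for s in STATES}
```

For four intervals coded {positive}, {positive, negative}, {neutral}, {neutral}, this gives 50% positive, 25% negative, and 50% neutral; the total exceeds 100% because of the doubly coded interval.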
The performance, engagement, communication, and affect data were coded by two independent behavior technicians to assess interobserver agreement (IOA). IOA was scored using the trial-by-trial method, which divides the number of trials on which the coders agreed by the total number of trials. The IOA scores are presented in Table 3.

Results
Figure 5 and Table 4 summarize the results of this study. Figure 5 presents the participants' percentage of correct responses with and without prompts over the course of the intervention sessions. As previously mentioned, all participants first took part in a baseline session in which they did not respond correctly to any of the wh-questions after being read the text. Table 4 presents the Pearson correlations between participants' performance and their engagement, communication, and affect during the sessions. For significant correlations, Figure 6 also presents the participants' behavioral data (i.e., engagement, communication, or affect) alongside their performance data for each subsession.
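For reference, the trial-by-trial IOA and the Pearson correlation coefficient described above can be computed as follows. This is a sketch with hypothetical helper names; it computes only the coefficient r, not the p-values reported in Table 4, and any example codes are made up rather than taken from the study data. (On Python 3.10 and later, `statistics.correlation` computes the same r.)

```python
import math
from typing import Sequence

def trial_by_trial_ioa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Percentage of trials on which the two coders recorded the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must score the same, non-empty set of trials")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

If two coders record "+", "+p", "-" and "+", "+", "-" for three trials, they agree on two of three trials, for an IOA of about 66.7%.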

Participant 1
Performance
In the first intervention sessions, participant 1 did not respond to the robot. The director of the clinic suggested having the therapist she is typically paired with provide prompts along with the robot to familiarize her with the intervention and the robot, since both were new experiences for her. Similar practices are followed within the clinic when a child's treatment is transferred to an unfamiliar therapist, so that the child can become comfortable with the new individual and learn the expectations of the interactions. The human prompts were removed once participant 1 correctly answered 89% of the wh-questions over two sessions with the assistance of the human prompts at level 1. The child moved on to level 2 only after the human prompts were removed and she answered 89% of the wh-questions over two sessions at level 1 with the robot alone. All prompting after level 1 came only from the robot. Participant 1 required 5 sessions before the human prompts were removed and 16 sessions in total to complete prompt level 1. She took two sessions to complete prompt level 2. When level 3 was introduced, participant 1 responded correctly on only 3% of the opportunities presented. As a result, level 2 was reintroduced on the recommendation of the director of the clinic (i.e., the BCBA). Participant 1 again required two sessions to complete prompt level 2, so level 3 was reintroduced. After four sessions at level 3, the intervention was discontinued due to lack of progress. In DTT, it is common practice to evaluate ongoing interventions in order to determine learner progress and potential programming changes effectively [32].
For this participant, the protocol was customized to improve responding because participant 1 was initially not responding to the robot's immediate prompts. After session 3, the behavior technicians who were familiar with participant 1 suggested that a human-like voice might improve her responses to the robot's prompts. Hence, the robot was modified to deliver the prompts with a prerecorded, familiar human voice instead. Immediately after this modification, the child began responding correctly to the prompts. We hypothesize that participant 1 preferred a human voice to a synthetic voice because she was more familiar with human voices and perceived them as reinforcing. Similar preferences for human-like voices over synthetic voices have been observed in other user populations [33][34][35].
As shown in Figure 5, after these modifications participant 1 responded correctly to prompts but was still unable to complete level 3 (i.e., answer the wh-questions without prompts) or to 'beat the prompt.' That is, she never provided a correct response before the prompt, even when increasingly longer time delays were used before prompting. We hypothesize that the prerequisite skill for this intervention (i.e., answering rotating questions about an item) had not been met before the intervention started. At the beginning of the study, participant 1 had not fully mastered answering rotating questions (i.e., 'What is it,' 'What color is it,' and 'What shape is it') about an item (e.g., a tennis ball). Assessing prerequisite skills and identifying the appropriate skills to teach is a common challenge in treatment for ASD and is not unique to robot-mediated interventions [36].
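To make "beating the prompt" concrete, the sketch below classifies a single trial under time-delay prompting: an answer given before the delay elapses counts as unprompted, and an answer given after it counts as prompted. The one-second-per-session delay schedule, its five-second cap, and the function names are hypothetical; this section does not report the delays used in the study.

```python
from typing import Optional


def classify_trial(delay_s: float, latency_s: Optional[float], correct: bool) -> str:
    """Classify one trial under time-delay prompting.

    delay_s:   seconds the robot waits after asking before it delivers the prompt
    latency_s: seconds from the question to the child's answer (None = no answer)
    correct:   whether that answer was correct
    """
    if latency_s is not None and latency_s < delay_s:
        # The child answered before the prompt was due, i.e., "beat the prompt".
        return "unprompted correct" if correct else "unprompted incorrect"
    if latency_s is not None and correct:
        # The prompt had already been given, so a correct answer counts as prompted.
        return "prompted correct"
    return "prompted incorrect or no response"


def delay_for_session(session_index: int, step_s: float = 1.0, max_s: float = 5.0) -> float:
    """Hypothetical progressive schedule: 0 s (immediate prompt), then +1 s per session, capped."""
    return min(session_index * step_s, max_s)


# A child who always answers 0.5 s after the prompt never beats it, however long the delay.
for session in range(4):
    d = delay_for_session(session)
    print(d, classify_trial(d, latency_s=d + 0.5, correct=True))
```

The loop mimics the pattern reported for participant 1: if the child always waits for the prompt, every trial is scored as prompted, no matter how long the delay becomes.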

Engagement, communication, and affect
On average, participant 1 was engaged for 90.5% of session time. Her engagement was negatively correlated with her ability to answer questions without prompts (r = −0.259, p < 0.05) and positively correlated with her ability to answer questions with prompts (r = 0.295, p < 0.05). The negative correlation between engagement and unprompted correct answers suggests that she was waiting for the robot to provide a prompt. Because the task was difficult for her, the time-delay prompting protocol may have led to prompt dependency [29].
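The correlations reported in this section are Pearson coefficients. The sketch below shows how such a value could be reproduced from per-sub-session data. It computes r with the standard formula, plus the t statistic for testing ρ = 0 with n − 2 degrees of freedom, using only the Python standard library. The study's software is not stated; in practice a library routine such as `scipy.stats.pearsonr`, which also returns the p-value, would normally be used. The example values are made up.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of the same length, with at least 3 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up per-sub-session values: percent of time engaged, percent correct without prompts
engagement = [95, 90, 100, 85, 92, 88, 97, 91]
unprompted = [0, 11, 0, 22, 0, 11, 0, 11]
r = pearson_r(engagement, unprompted)
print(f"r = {r:.3f}, t = {t_statistic(r, len(engagement)):.3f} (df = {len(engagement) - 2})")
```

The p-value then comes from a t distribution with n − 2 degrees of freedom. This test assumes independent observations, which repeated sub-sessions from one child only approximate.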
She also attempted to communicate verbally with the robot independently, on average 9 times per session. Her communication with the robot was negatively correlated with her ability to answer questions without prompts (r = −0.512, p < 0.05). This is expected: when she had to respond without prompts, her error rate increased, and the intervention's error-correction procedure then required more verbal exchanges with the robot.
During the sessions, participant 1 displayed positive affect during 22.5% of session time, negative affect during 3.7%, and neutral affect during 74.5%. However, her affect was not significantly correlated with her performance. According to the clinic's expert staff, participant 1 has a history of being compliant during typical ABA instruction, irrespective of her affect.
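The affect figures above are proportions of coded session time. The sketch below computes per-session percentages and their average across sessions. It assumes that each session was divided into fixed intervals and that each interval was coded as positive, negative, or neutral. This coding scheme is an assumption, since this section does not describe it. The category labels and example session are hypothetical.

```python
from collections import Counter

AFFECT_CODES = ("positive", "negative", "neutral")


def affect_percentages(interval_codes):
    """Percent of coded intervals in each affect category for one session."""
    if not interval_codes:
        raise ValueError("session has no coded intervals")
    unknown = set(interval_codes) - set(AFFECT_CODES)
    if unknown:
        raise ValueError(f"unexpected codes: {sorted(unknown)}")
    counts = Counter(interval_codes)
    return {code: 100.0 * counts[code] / len(interval_codes) for code in AFFECT_CODES}


def mean_across_sessions(sessions):
    """Average each category's per-session percentage over all sessions."""
    per_session = [affect_percentages(s) for s in sessions]
    return {code: sum(p[code] for p in per_session) / len(per_session) for code in AFFECT_CODES}


# Hypothetical session of ten coded intervals
session = ["neutral"] * 6 + ["positive"] * 3 + ["negative"]
print(affect_percentages(session))  # {'positive': 30.0, 'negative': 10.0, 'neutral': 60.0}
```

Averaging per-session percentages weights every session equally. Pooling all intervals instead would give longer sessions more weight, and the two can differ when session lengths vary.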

Participant 2
Performance
Similar to participant 1, participant 2 initially required human prompts at level 1. He required five sessions before the human prompts were removed and nine sessions in total to complete prompt level 1. He then took three sessions to complete prompt level 2 and two sessions to complete level 3. In total, participant 2 took 14 sessions to achieve mastery of the wh-question answering skill related to the text designed for this study. In three additional sessions, participant 2 also showed that he retained mastery of the skill.

Engagement, communication, and affect
On average, participant 2 was engaged for 89.2% of session time. His engagement was positively correlated with his ability to answer questions with prompts (r = 0.309, p < 0.05). He also attempted to communicate verbally with the robot independently, on average 18 times per session. His communication with the robot was negatively correlated with his ability to answer questions both without prompts (r = −0.463, p < 0.05) and with prompts (r = −0.691, p < 0.05). For participant 2, the high rate of communication resulted from frequently echoing parts of the robot intervention and reading the book out loud (e.g., 'Katie the cat was sitting on the couch'). This type of communication distracted him from the primary task (i.e., answering the robot's questions) and thus interfered with his performance. This interpretation is further supported by the finding that he performed better with prompts when he was engaged with the primary task.
During the sessions, participant 2 displayed positive affect during 28.7% of session time, negative affect during 0.6%, and neutral affect during 70.3%. However, his affect was not significantly correlated with his performance. Because the intervention was appropriately selected for him, participant 2 progressed successfully through it, which likely contributed to his predominantly positive and neutral affect and his low negative affect during the sessions.

Participant 3
Performance
Participant 3 required four sessions to complete prompt level 1 and seven sessions to complete prompt level 2. When level 3 was introduced, participant 3 responded correctly on only 6% of the opportunities presented, so level 3 was discontinued and level 2 was reintroduced. Participant 3 required two sessions to complete prompt level 2 again, but the intervention was then discontinued because the child asked to stop participating in the study. As with participant 1, we hypothesized that the participant did not possess the prerequisite skills before the intervention started, which made the task too difficult and frustrating for him. This is consistent with his higher negative affect during the intervention compared with the other participants.

Engagement, communication, and affect
On average, participant 3 was engaged for 76.5% of session time. He also attempted to communicate verbally with the robot independently, on average 10 times per session. However, neither his engagement nor his communication was significantly correlated with his performance. In contrast to participant 1, participant 3's engagement and communication arose from interest in the robot and were less related to compliance with directions. Hence, the actions associated with his engagement and communication were often unrelated to the task (e.g., pretending to feed the robot, or statements and actions of affection).
During the sessions, participant 3 displayed positive affect during 46.3% of session time, negative affect during 9.4%, and neutral affect during 44.6%. His positive affect was positively correlated with his ability to answer questions correctly with prompts (r = 0.490, p < 0.05), and so was his negative affect (r = 0.867, p < 0.05). The staff suggested that participant 3 likely responded correctly to prompts while displaying negative affect because he knew that answering the questions would lead to a break from the session. This is consistent with the child's frequent requests to end the sessions.

Summary of interventions
To summarize, only one of the three participants reached mastery in the robot-mediated listening comprehension intervention. All participants responded correctly to prompts and enjoyed interacting with the robot, as evidenced by high occurrences of positive affect and engagement. For participants 1 and 3, prompts were not completely faded because of the difficulty of the skill. The experts hypothesize that these participants lacked the prerequisite skills to master this type of intervention: before the intervention, they could answer wh-questions but not rotating questions on a particular topic. Typical programming for children with a diagnosis of ASD requires many modifications to reach mastery [37]. To maintain research rigor, however, such modifications could not be made to the robot-mediated intervention to help all the children reach mastery, whereas in typical interventions, delivery modifications would have been implemented. For future robot-mediated interventions, the research team will work with the expert to select target skills precisely. Specifically, the targeted skills should be in line with the participants' current repertoire.
It is important to ensure that skills taught by a robot to a child with ASD transfer to human interactions and are retained long-term. To validate the skill taught by the robot, a human therapist presented the nine wh-questions from the listening comprehension intervention, without prompts, to participant 2 one year after he interacted with the robot. We validated only participant 2 because he was the only child to reach mastery of the skill. Participant 2 answered seven of the nine questions correctly on the first presentation. The therapist re-presented the two incorrectly answered questions after asking two to three other questions, and participant 2 answered them correctly the second time. Importantly, during both the first and second presentations, the therapist provided no prompts, reinforcement, corrections, or any other feedback. These results indicate that, for participant 2, the robot-taught skill transferred to interactions with a human. Furthermore, because the questions were presented a year after he interacted with the robot, the results suggest that robot-taught skills can be retained long-term by a child with ASD.
Engagement, communication, and affect results were unique to each participant. This is expected because the way ASD-related behaviors manifest is idiosyncratic to each child [38]. This is further emphasized by the contrast with the results reported in [22], where the children's engagement, communication, and affect were not significantly correlated with their performance during social greeting interventions. Although these parameters presented differently in each participant, they could potentially serve as additional feedback for (1) the robot, to autonomously personalize its behaviors during an intervention, or (2) engineers and therapists, to personalize the design of the intervention and improve the child's performance. Hence, in the future, it could be beneficial to develop approaches that identify the human-robot interaction parameters that correlate with a child's performance and that intentionally influence these parameters.
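One possible way to identify the interaction parameters that correlate with a child's performance is to screen each parameter with a permutation test on its correlation with performance. This is offered as an assumption, not as the authors' method. A permutation test is used here because it makes no normality assumption and is easy to implement with the standard library. With many parameters, the significance threshold would need a multiple-comparison correction. The parameter names and data in the usage example are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation (assumes equal lengths and non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point noise does not hide ties with the observed value
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def screen_parameters(performance, parameters, alpha=0.05):
    """Return {name: (r, p)} for parameters whose correlation with performance has p < alpha."""
    selected = {}
    for name, values in parameters.items():
        p = permutation_p(values, performance)
        if p < alpha:
            selected[name] = (pearson_r(values, performance), p)
    return selected


# Hypothetical per-sub-session data for one child
performance = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
parameters = {
    "engagement": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
    "positive_affect": [1, 0, 0, 0, 0, 0, 0, 0, 0, 1],
}
print(screen_parameters(performance, parameters))
```

In this toy example, only the parameter that tracks performance is selected. A robot or a design team could then concentrate on influencing the parameters flagged for that particular child.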

Summary of robot intervention behaviors
The robot-mediated listening comprehension intervention presented in this work focused on having a SAR replicate the core behaviors of human therapists during DTT-based therapy for children with ASD. These behaviors included delivering the discriminative stimulus, error corrections, prompts, and reinforcement. During the intervention, the discriminative stimulus was used to evoke a response from the child. Error corrections, prompts, and reinforcement were then used to increase the probability that the child would respond correctly to the discriminative stimulus in the future.
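The sequence of core behaviors described above can be sketched as a single trial. The robot presents the discriminative stimulus (SD) and checks the response. If the response is incorrect, it runs an error correction by modeling the answer and re-presenting the SD. It reinforces a correct response. This is a hypothetical simplification. The exact error-correction procedure, the number of re-presentations (`max_corrections`), and the reinforcement wording are assumptions, and the child's responses are supplied by a callback.

```python
from typing import Callable, List, Optional


def dtt_trial(question: str, answer: str,
              get_response: Callable[[str], Optional[str]],
              max_corrections: int = 1) -> List[str]:
    """Run one simplified, hypothetical DTT trial and return a log of the robot's actions."""
    log = [f"SD: {question}"]                      # discriminative stimulus
    response = get_response(question)
    corrections = 0
    while response != answer and corrections < max_corrections:
        log.append(f"error correction: model '{answer}'")  # prompt with the correct answer
        log.append(f"SD: {question}")              # re-present the stimulus
        response = get_response(question)
        corrections += 1
    if response == answer:
        log.append("reinforcement: social praise")
    else:
        log.append("no reinforcement; move to the next trial")
    return log


# Example: the child answers incorrectly, then correctly after the correction
replies = iter(["the dog", "Katie"])
for step in dtt_trial("Who was sitting on the couch?", "Katie", lambda q: next(replies)):
    print(step)
```

Passing the response source in as a callback keeps the trial logic separate from speech recognition or behavior-technician input, which a real system would supply.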
Although all the children improved in their ability to answer the wh-questions, only one child reached mastery during this study. Discussions with the clinicians suggested that the main areas for improvement in our robot-mediated interventions include moving beyond the core therapist DTT behaviors and developing a larger range of reinforcements, high-probability requests [3], and pairing procedures [39] for the robot. Specifically, a current limitation of the existing intervention is that the robot uses only five social praise phrases, which are paired with dances. Although the participants initially enjoyed these reinforcers, we noticed that they became monotonous over longer periods of time. Future interventions could address this by increasing the variety of reinforcers delivered by the robot, for example by including reinforcers typically used by human therapists, such as edibles, preferred items (e.g., stickers), or other forms of social reinforcers (e.g., jokes, praise, and songs). A robot should also be capable of either adapting the type of reinforcement it delivers according to a child's performance or asking the child directly for a preferred reinforcer. Furthermore, the goal is to have the children respond to social reinforcers (e.g., "nice job") and to limit other types of reinforcement, because social reinforcement is what naturally occurs in human-human interactions.
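As one simple way to reduce the monotony noted above, a robot could avoid repeating any of its most recently delivered reinforcers. The sketch below assumes a hypothetical reinforcer list and a window of the last two reinforcers. It addresses variety only. It does not adapt reinforcement to performance or ask the child for a preference.

```python
import random


def pick_reinforcer(options, recent, rng, avoid_last=2):
    """Pick a reinforcer that is not among the `avoid_last` most recently delivered ones."""
    blocked = set(recent[-avoid_last:]) if avoid_last > 0 else set()
    # Fall back to the full list if every option is blocked (very short option lists)
    candidates = [o for o in options if o not in blocked] or list(options)
    return rng.choice(candidates)


# Hypothetical reinforcer set; social praise stays in the rotation
options = ["praise: nice job", "dance", "joke", "song", "praise: great listening"]
rng = random.Random(1)
history = []
for _ in range(10):
    history.append(pick_reinforcer(options, history, rng))
print(history)
```

Because social praise is meant to remain the primary reinforcer, a fuller version might weight praise options more heavily rather than sampling uniformly.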
As previously mentioned, at the beginning of the interventions, participants 1 and 2 did not respond to the robot. The clinicians hypothesized two potential explanations: (1) the participants were unfamiliar with how to interact with or respond to the robot during an intervention; and (2) the robot was new to the participants, so they had not yet developed a relationship with it. To address these issues, the director of the clinic suggested having each participant's usual therapist provide prompts along with the robot, to familiarize the participants with both the intervention and the robot, which were new experiences for them. Although we were not able to evaluate this process systematically, we observed that the initial human prompts helped the children understand how to interact with the robot. The human prompts were then faded so that the robot could deliver the entire intervention independently.
To improve the robot-mediated interventions, the robot should have behaviors that let it manage its own introduction to the children independently. The clinicians suggested that the robot could deliver high-probability requests to familiarize the children with how to interact with and respond to it. In ABA, high-probability requests are requests that are already easy for an individual, delivered before requests that may initially be difficult, such as those targeting a new skill the individual is learning [3].
Hence, a robot teaching a new skill to a child should have a set of requests that the child has already mastered, which it can use during initial interactions to build the child's experience of interacting with the robot. High-probability requests can also be used to increase children's motivation during sessions they are struggling with, by interspersing requests a child has mastered with the skills being learned in the intervention.
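The interspersal idea above can be sketched as a function that builds a trial sequence with a few mastered (high-p) requests before each new (low-p) target. The number of high-p requests per target (`n_high_before`) and the example requests are hypothetical. In practice a clinician would choose them from the child's mastered repertoire.

```python
from itertools import cycle


def intersperse_high_p(low_p_targets, high_p_requests, n_high_before=2):
    """Build a trial list with n_high_before mastered (high-p) requests before each new (low-p) target."""
    if n_high_before > 0 and not high_p_requests:
        raise ValueError("need at least one high-probability request")
    high = cycle(high_p_requests)  # reuse mastered requests in rotation
    sequence = []
    for target in low_p_targets:
        sequence.extend(("high-p", next(high)) for _ in range(n_high_before))
        sequence.append(("low-p", target))
    return sequence


# Hypothetical example: mastered requests interspersed with new wh-questions
trials = intersperse_high_p(
    ["Who was sitting on the couch?", "Where was Katie sitting?"],
    ["Touch your nose.", "What is it?"],
)
for kind, request in trials:
    print(kind, request)
```

Rotating through the mastered requests, rather than repeating one, also adds some of the variety the clinicians recommended for reinforcers.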
The clinicians also recommended that the robot use a pairing procedure to build a positive relationship with the children, because having the robot deliver an intervention immediately may be too large a step. Pairing procedures involve building rapport with a child by offering reinforcers (e.g., preferred items, edibles, and social reinforcers) without making requests or demands [39]. This process helps the child associate the robot with fun and learn that it will lead to more fun in the future. Human ABA therapists also commonly use pairing procedures in their first interactions with children to build a positive relationship with them [39].

Conclusions
Overall, all participants improved in their ability to answer the wh-questions with prompts from the robot over the sessions, but only one participant has achieved mastery to date. Such differences in success are expected because children with ASD often require customized intervention plans due to differences in their capabilities and preferences.
With the insights gained from this study, we intend to modify our intervention to address the challenges that arose during the study. Specifically, when selecting interventions to be delivered by the robot, it is imperative to identify targets at the participants' skill level. One tool for identifying these targets is the VB-MAPP supporting skills protocol [16]. In the present study, the robot used only the core ABA techniques for implementing DTT. In future studies, we plan to explore the motivational ABA techniques used during human-facilitated DTT sessions. These motivational techniques could include high-probability requests (e.g., interspersing mastered tasks with acquisition tasks), pairing procedures, and varied robot reinforcers. We will also explore a broader set of socially valid targets for the robot, so that it can deliver efficient and effective ABA-based interventions. Finally, we intend to investigate changes in children's behavior during long-term interactions with the SAR, as well as changes in performance and behavior across multiple different interventions.