In his article “Digital communications and social media use in surgery – how to maximize communication in the digital age”, M.S. Karpeh has impressively demonstrated how digital technology is changing the way surgeons communicate with colleagues and patients, to the benefit of both caregivers and patients. However, digitalization is not confined to human communication. The next step is direct dialogue between humans and machines. So far, interaction with devices has been mostly mechanical: pushing a button or typing commands. This will most probably soon be replaced by addressing the machine directly by voice.
For humans, speech is one of the most intuitive and natural ways to communicate. Speech control considerably simplifies the use of an application, as neither hands nor eyes are needed. Scientists and engineers therefore aim to develop methods and systems that enable natural spoken language not only for interpersonal communication but also for interaction with machines. Today, we can communicate with various computer applications via speech, and research focuses on user adaptiveness to enable more natural and human-like dialogues (e.g. [3, 4]).
So far, this topic has received little research attention in medical applications, although the benefits seem obvious. In surgery, the operating room (OR) is the most value-creating factor. As it is also the most cost-intensive and complex one, any research effort in digital assistance for the OR is justified. With the emergence of new technologies, the surgical working environment is becoming increasingly complex and comprises many medical devices that have to be monitored and controlled. At the same time, the goal is to reduce the workload of the surgical team so that they can fully focus on the actual surgical procedure. New strategies are therefore needed to keep the working environment manageable. In this context, the OR of the future (ORF) is a frequently used keyword. It describes the application of new technologies such as computer-enhanced systems to create an intelligent OR that facilitates work and reduces the staff needed to assist during a surgical intervention. This reduces personnel costs and promises to lower the rate of avoidable incidents caused by human error.
During a procedure, the surgeon needs their hands to operate on the patient and their eyes to stay aware of what they are doing. Furthermore, neither the surgeon nor the rest of the surgical team should be distracted by complex interfaces. The human-computer interface therefore has to be simple and intuitive, which makes graphical and gesture-based systems poorly suited for this purpose. Being hands- and eyes-free, speech turns out to be the modality of choice. Moreover, speech is the modality the surgeon already uses to communicate with the staff, so using speech to control the technical devices does not pose an additional mental burden. The surgeon can focus on the surgery and control the technical environment at the same time without having to think about how to interact with the system. Enabling the operating surgeon to control the devices in the OR directly reduces the staff needed to assist during a surgical intervention.
Voice interaction systems
Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana, Google Now and other virtual assistants have become a firm part of everyday life. Bager reported significant progress in computing science and voice interaction systems. Current digital assistants not only answer user questions via speech but also provide information autonomously and in a user-oriented way. The systems listen and reply in the user’s native language using a friendly voice, and they automatically combine the user’s available data with contextual knowledge from linked databases. However, these virtual assistants so far follow only a question-answering paradigm and do not focus on intelligent interaction over a longer period of time. When the user asks a question, the system combines the available data and tries to respond to the inquiry. After the system’s response, the user can make further requests, but these cannot build on the preceding query. Hence, there is no coherent dialogue over several dialogue turns, only isolated user-system exchanges. In contrast, our system for intelligent voice interaction aims to track the complete interaction during the ongoing surgery, recognize the user’s context and react accordingly through intelligent subdialogues and proactive interventions. An example of a spoken dialogue system that allows adaptive dialogues over several dialogue turns and represents the state of the art in research is the OwlSpeak Dialogue Manager, which was developed for intelligent environments by Heinroth et al. and further extended by Ultes and Minker.
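To make the difference between the two paradigms concrete, the following Python sketch contrasts a stateless question-answering handler with one that keeps a dialogue state across turns. The keyword matching, the `LIGHTS` mapping and the `pending_slot` field are hypothetical simplifications chosen for illustration; this is not the OwlSpeak architecture.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from short user answers to device names.
LIGHTS = {"room": "room light", "surgical": "surgical light"}


def stateless_answer(utterance: str) -> str:
    """Question-answering paradigm: every request is handled in isolation."""
    if "dim" in utterance:
        return "Which light: room or surgical?"
    return "Sorry, I did not understand."


@dataclass
class DialogueState:
    """Minimal dialogue state that persists across turns."""
    pending_slot: Optional[str] = None  # information the system is still waiting for
    history: list = field(default_factory=list)  # all user utterances so far


def stateful_answer(state: DialogueState, utterance: str) -> str:
    """Multi-turn paradigm: the reply may depend on earlier turns."""
    state.history.append(utterance)
    # A short answer is interpreted in the context of the open question.
    if state.pending_slot == "light" and utterance in LIGHTS:
        state.pending_slot = None
        return f"Dimming the {LIGHTS[utterance]}."
    if "dim" in utterance:
        state.pending_slot = "light"
        return "Which light: room or surgical?"
    return "Sorry, I did not understand."
```

Answering "room" after "dim the light" succeeds only in the stateful version, because the stateless handler has no memory of the question it asked in the previous turn.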
Voice interaction in the OR
Our goal is to develop an intelligent system that takes care of the surgical environment. To increase productivity and reduce the workload of the operating staff, it should act in an actively cooperative manner and support the surgeon autonomously during the procedure. In our view, the system should accompany the surgical team throughout the whole procedure and provide assistance where necessary.
We have developed the concept of intelligent digital assistance for clinical ORs (IDACO). Its main functionalities include:
providing data about surgery type, session team, general patient data, pre-diseases, medical treatment and laboratory data;
saving preferred device settings for each surgeon, reading and changing the pre-settings, as well as transmitting the parameters to the devices (e.g. OR table, room light, insufflator, suction and irrigation unit);
automatically controlling surgical devices (e.g. starting the insufflator, increasing the gas insufflation, turning the light on and off and tilting the table);
tracking the usage of several materials (e.g. trocars, different types of clips and suturing material) and warning if the usage differs from the schedule; and
providing an emergency mode for unforeseen incidents during a procedure, which additionally offers a “silent mode” to prevent further distractions by the system.
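Two of these functionalities, per-surgeon device presets and material tracking against the schedule, can be sketched in Python as follows. The data model, device names, parameter names and planned material counts are hypothetical; the article does not specify how IDACO stores presets or schedules, and the material check is assumed to run at a checkpoint such as the end of a surgical phase.

```python
from collections import Counter


class SurgeonPresets:
    """Hypothetical store of preferred device settings per surgeon."""

    def __init__(self) -> None:
        # surgeon -> device -> parameter -> value
        self._presets: dict = {}

    def save(self, surgeon: str, device: str, settings: dict) -> None:
        self._presets.setdefault(surgeon, {})[device] = dict(settings)

    def read(self, surgeon: str, device: str) -> dict:
        # Return a copy so callers cannot modify the stored preset by accident.
        return dict(self._presets.get(surgeon, {}).get(device, {}))

    def change(self, surgeon: str, device: str, key: str, value: float) -> None:
        self._presets.setdefault(surgeon, {}).setdefault(device, {})[key] = value


def material_warnings(planned: dict, used: Counter) -> list:
    """Return one warning per material whose usage differs from the schedule."""
    warnings = []
    for item in sorted(set(planned) | set(used)):
        expected, actual = planned.get(item, 0), used[item]
        if actual != expected:
            warnings.append(f"{item}: planned {expected}, used {actual}")
    return warnings
```

Transmitting a stored preset to a device would then be a matter of passing the dictionary returned by `read` to whatever device interface the OR provides.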
Technical requirements for voice interaction in the OR
Enabling an intelligent operating assistance system to follow a surgery and control surgical devices automatically bears several challenges.
First of all, modules for automatic speech recognition (ASR) and text-to-speech synthesis (TTS) in medical applications need to be developed, supporting the full range of medical and surgical vocabulary. Common ASR and TTS modules are not trained for such specific applications.
Moreover, to keep track of the procedure and control surgical devices automatically, the system needs to know when to perform which action on which device and when to stay in the background. It therefore has to be aware of the whole context of the surgery, i.e. the current point of the procedure as well as all past and future actions. At every point of the procedure, the system needs all relevant information to infer whether the surgery is going as scheduled. This means that developers have to find a reliable method for tracking the course of the surgery that allows unscheduled events during the procedure to be detected. It also has to be clear how the system is supposed to react in critical situations. For this purpose, standardized surgeries need to be modeled in detail, allowing the system to compare the actual course of the procedure to the schedule. Using this medical domain knowledge, sufficiently precise models of the complex surgical structure need to be created that the voice interaction system can apply. However, the multitude of existing surgical procedures makes it impossible to implement each one individually. Consequently, the goal is to find a generic method for modeling surgery control.
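As an illustration of comparing the observed course against a modeled schedule, the following Python sketch treats a standardized procedure as an ordered list of steps. The step names and the simple rule (an observed step is either the next expected one, a later one that skips steps, or not in the plan at all) are hypothetical simplifications; realistic surgical workflow models would also need optional, repeated and parallel steps.

```python
from dataclasses import dataclass


@dataclass
class ScheduleTracker:
    """Tracks observed steps against an ordered, hypothetical surgery schedule."""
    plan: list
    position: int = 0  # index of the next expected step

    def observe(self, step: str) -> str:
        # Case 1: the step is exactly the one the schedule expects next.
        if self.position < len(self.plan) and step == self.plan[self.position]:
            self.position += 1
            return "as scheduled"
        # Case 2: the step appears later in the plan, so some steps were skipped.
        remaining = self.plan[self.position:]
        if step in remaining:
            idx = self.position + remaining.index(step)
            skipped = self.plan[self.position:idx]
            self.position = idx + 1
            return "deviation: skipped " + ", ".join(skipped)
        # Case 3: the step is not (or no longer) part of the plan.
        return "deviation: unscheduled step " + step
```

A deviation reported by `observe` is the kind of event that could trigger a proactive subdialogue, e.g. asking the surgeon whether the skipped step was intentional.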
Finally, an interface needs to be designed and implemented that allows communication between the voice interaction system and both the surgical devices and the clinical information system. More precisely, the interface has to provide access to necessary data such as surgery type and session team as well as patient data, including pre-diseases, medical history and laboratory data. Moreover, it has to enable the actual control of surgical devices such as the OR table, OR lights and peripheral devices (e.g. insufflator and electrosurgical instrument). Here, regulatory concerns are likely to pose a much bigger barrier than the technical implementation itself.
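One way to structure the device side of such an interface is a uniform abstraction that the dialogue manager calls, with one adapter per device. The Python sketch below assumes this design; the class names, command strings and simulated devices are hypothetical and are not tied to any vendor protocol or interoperability standard.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Device(ABC):
    """Hypothetical uniform interface that the dialogue system calls."""
    name: str

    @abstractmethod
    def execute(self, command: str, value: Optional[float] = None) -> str:
        """Carry out a command and return a confirmation text for speech output."""


class SimulatedTable(Device):
    """Stand-in for an OR table; a real adapter would talk to the vendor interface."""
    name = "table"

    def __init__(self) -> None:
        self.tilt_deg = 0.0

    def execute(self, command: str, value: Optional[float] = None) -> str:
        if command == "tilt" and value is not None:
            self.tilt_deg = value
            return f"Table tilted to {value:g} degrees."
        raise ValueError(f"unsupported command: {command}")


class SimulatedLight(Device):
    """Stand-in for the room light."""
    name = "light"

    def __init__(self) -> None:
        self.on = False

    def execute(self, command: str, value: Optional[float] = None) -> str:
        if command in ("on", "off"):
            self.on = command == "on"
            return f"Light switched {command}."
        raise ValueError(f"unsupported command: {command}")


class DeviceGateway:
    """Routes commands recognized by the dialogue manager to registered devices."""

    def __init__(self, devices: list) -> None:
        self._devices = {d.name: d for d in devices}

    def dispatch(self, device: str, command: str, value: Optional[float] = None) -> str:
        if device not in self._devices:
            return f"No device named {device} is connected."
        return self._devices[device].execute(command, value)
```

Keeping the dialogue logic behind a single gateway also concentrates the code that a regulatory review would have to examine in one place.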
We present the concept of a voice interaction system for the OR that enables speech-based communication with an intelligent OR. To the best of our knowledge, the presented scheme is the first intelligent spoken language operation assistant, combining several functionalities that assist the surgeon in many different situations before and during a procedure.
Using a speech interface, the surgeon can concentrate on the surgery and control the technical environment at the same time, without having to think about how to interact with the system. Our system aims to assist during the preparation for the surgery and to accompany the operating staff through the entire procedure. The system listens to each of the surgeon’s instructions and compares the observed course of the procedure to the surgery schedule. In case of a deviation from the schedule, it reacts proactively and thus acts as a human-like assistant in an intelligent surgical environment.
Dybkjaer L, Bernsen NO, Minker W. Evaluation and usability of multimodal spoken language dialogue systems. Speech Commun 2003;43:33–54.
Litman DJ, Forbes-Riley K. Evaluating a spoken dialogue system that detects and adapts to user affective states. In: 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Philadelphia, USA; 2014:181–5.
Miehle J, Yoshino K, Pragst L, Ultes S, Nakamura S, Minker W. Cultural communication idiosyncrasies in human-computer interaction. In: 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Los Angeles, USA; 2016:74–9.
Bager J. Smartphone denkt voraus: digitale assistenten. Mag Comput Tech 2015;16:122–45.
Heinroth T, Denich D, Schmitt A. OwlSpeak – adaptive spoken dialogue within intelligent environments. In: 8th IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops). Mannheim, Germany; 2010:666–71.
The article (https://doi.org/10.1515/iss-2017-0034) offers reviewer assessments as supplementary material.
About the article
Published Online: 2017-08-25
Research funding: Authors state no funding involved. Conflict of interest: Authors state no conflict of interest. Informed consent: Informed consent is not applicable. Ethical approval: The conducted research is not related to either human or animals use.
Juliana Miehle: Conceptualization; Investigation; Methodology; Project administration; Software; Writing – original draft. Daniel Ostler: Conceptualization; Methodology; Writing – original draft. Nadine Gerstenlauer: Investigation; Methodology; Software; Writing – original draft. Wolfgang Minker: Conceptualization; Supervision; Writing – review and editing.
Citation Information: Innovative Surgical Sciences, Volume 2, Issue 3, Pages 159–161, ISSN (Online) 2364-7485, DOI: https://doi.org/10.1515/iss-2017-0034.
©2017 Miehle J. et al., published by De Gruyter, Berlin/Boston. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.