Contribution to a thematic issue



Introduction
Over the last years, we have incorporated an increasing number of smart devices into our everyday lives. Many users employ a smartphone while on the go, a desktop computer in their offices, and, increasingly, an augmented (AR) or virtual reality (VR) head-mounted display (HMD) for immersing themselves into virtual worlds. Clearly, each device class possesses unique properties that make it suitable for different contexts and activities. Research has not only studied the respective devices individually, but also in combination. The use of homogeneous device combinations or modalities for interaction (e. g., tablets or smartphones) is often referred to as cross-device interaction [3]. Similarly, the simultaneous combination of heterogeneous devices or modalities for interaction (e. g., an AR HMD with a linked tablet) is referred to as hybrid user interfaces [5]. Importantly, hybrid user interfaces are characterized as "complementary […] technologies […] that take advantage of the strong points of each" [5].
Despite potential opportunities, most of our workflows are constrained to a single device: Instead of benefiting from an entire device ecology for a given task in a specific context, we often hesitate to incorporate more devices into our workflow and perform entire tasks on a single device, regardless of its suitability for each subtask [16]. The reasons for this constraint are numerous, as to achieve an effective combination of devices for a particular task many factors must be considered, such as the affordances for interaction provided by each device, the continuity of data, or user representations. Therefore, much research has been dedicated to these multi-device ecologies. For example, the field of cross-device interaction [3] examines different constellations of splitting a task across different devices. Even for the nascent field of mixed reality, the area of hybrid user interfaces [5] argues for the use of HMDs in combination with more traditional devices (e. g., smartphones). So what makes these multi-device ecologies worthwhile? Simply adding more devices can be counterproductive, as it might not fit to users' workflow or current activity [16]. Indeed, in a successful multi-device ecology each device or interface component possesses complementary characteristics, filling a niche that was not suitably covered before.
Consider, for example, an immersive 3D data visualization in augmented reality [9] (see Figure 1): Using an AR HMD, users can immerse themselves in the virtual world and explore a 3D visualization through egocentric navigation. Although mid-air gestures and voice commands can be useful for some tasks, they can be cumbersome for precise interaction with the data itself, due to limited accuracy, physical strain, and a lack of directness and precision. By adding a tablet for 2D interaction, we can complement the existing device with familiar touch input, thus allowing for more precise data manipulation conveniently constrained onto a physical 2D plane.
In this work, we focus on the aspect of complementarity in novel user interfaces and introduce the concept of complementary interfaces. In the remainder of this article, we elaborate on the concept of complementary interfaces, provide a set of challenges, and illustrate the opportunities of complementary interfaces with examples from within our own research.

Complementary interfaces
Traditional desktop interfaces rely on complementary input devices (e. g., mouse and keyboard) to perform tasks such as pointing and text input. In contrast, many post-WIMP [23] and ubiquitous computing interfaces [24] such as smartphones and tablets are self-contained, trading complementary peripherals for the convenience of built-in touch interaction and a combined input and output space. However, as task complexity increases, single devices may no longer be sufficient to adequately support users in their workflows. For example, research has shown that alternative input modalities can benefit our interaction (e. g., by improving spatial memory [25] or decreasing cognitive load [28]).
Recent research streams in human-computer interaction, such as cross-device interaction [3], multimodal interaction [22], and hybrid user interfaces [5] can be seen as manifestations of Mark Weiser's vision of the computer for the 21st century: "specialized elements of hardware and software, connected [...], will be so ubiquitous that no one will notice their presence." [24]. The technological and methodological advances within the last decades allow researchers to design and evaluate new interaction paradigms beyond the boundaries of a single device and modality, leading to a variety of combinations of interfaces that can be used seamlessly in concert. However, handling multiple devices can increase cognitive load [17] and incur high transaction costs [7], and users are often not aware of the benefits of including additional devices into their workflow [16].
Based on our own experiences in designing and evaluating multi-device and multi-modal environments, we believe that attributing unique roles, properties, and purposes to each device and modality can lead to a worthwhile combination of interfaces that can overcome the mentioned issues.
We call these meaningful combinations of devices and modalities complementary interfaces: By distributing interaction across devices and modalities, we establish a symbiosis of interfaces, where each component purposefully increases the quality of interaction and further supports users in their current activity. Hence, complementary interfaces is an umbrella term that includes combinations of homogeneous (e. g., cross-device interaction) and heterogeneous (e. g., hybrid user interfaces) device classes, but also input (e. g., interaction techniques) and output modalities (e. g., visual or auditory). Importantly, complementary interfaces always feature some degree of heterogeneity in the involved components, which complement each other to support the overall system functionality for the task at hand. These degrees of heterogeneity may lie in the input or output modality, location (e. g., screen space or input space), or dimensionality of data visualization (e. g., 2D, 3D).
Our notion of complementary interfaces has the potential to serve two purposes: (1) As a design framework, supporting designers in building and composing meaningful complementary interfaces; (2) as an evaluation framework, allowing researchers to study effects of meaningful combinations of complementary interfaces.
While our formal definition of complementary interfaces is still in a formative stage, we are currently exploring aspects of complementarity to better identify and quantify their characteristics.

Challenges for complementary interfaces
Based on our own experiences in developing and evaluating complementary interfaces, we identified six initial challenges (C1-C6).

C1 -Loss of context and linking content
How can we maintain the user's context, spatial memory, and world awareness when switching between devices?
Here, a seamless transition between devices can be helpful (cf. [9]), but can be especially hard to establish with heterogeneous devices and differing representations (e. g., 2D and 3D visualizations [8]). We aim to explore techniques for a lossless transfer of context across heterogeneous devices, for example by allowing users to create annotations or place visual markers in a visualization to highlight particular data points, which then persist across different devices. Can these techniques be used to establish a mental connection between semantically identical content (e. g., a visualization) and its visually different representations (e. g., a 2D visualization on the desktop and a 3D visualization in an immersive environment [8])? For example, the field of visual analytics uses techniques such as linking and brushing or multiple coordinated views that provide different views on the same data - we therefore want to investigate whether these techniques can be transferred to a more general use case. This may also facilitate communication in heterogeneous collaboration scenarios [15] by providing shared points of reference [14], regardless of the current visual representation or device. One important aspect in this context may be the continuity of task-relevant data: While each device in the ecology has a distinct complementary purpose, we can redundantly provide task-relevant data on each device to help users keep and re-establish context when switching between devices.
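The annotation-based context transfer described above can be illustrated with a minimal sketch: an annotation is anchored to a stable data-point identifier rather than to a particular view, so any device can resolve it into its own representation. All names here are hypothetical and not taken from any of the cited systems.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical sketch: an annotation references a data point by a
# stable identifier, so it can persist across 2D and 3D views.
@dataclass
class Annotation:
    point_id: str   # stable identifier of the annotated data point
    label: str      # user-created note
    color: str      # shared visual encoding across devices

def serialize(annotations):
    """Serialize annotations for transfer to another device."""
    return json.dumps([asdict(a) for a in annotations])

def deserialize(payload):
    """Restore annotations on the receiving device (2D or 3D view)."""
    return [Annotation(**d) for d in json.loads(payload)]
```

Each device would then resolve `point_id` into its own visual representation, for example a screen position on the desktop or a world-anchored marker in an immersive environment.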

C2 -Cost of switching
Switching our visual attention between different devices can incur a significant overhead [18]. This effect may be especially pronounced for mixed reality devices, as the act of switching between, for example, a desktop screen and a VR HMD [8] is still cumbersome, despite improved device ergonomics. Here, we need to explore techniques that aid or eliminate these transitions. For example, instead of putting on a VR HMD to inspect data in 3D and then taking it off again to make specific data selections at a desktop PC, a mixed reality HMD might better support this switch by allowing a transition from VR to reality through video see-through technology, without taking off the goggles. However, the high expense, instrumentation effort, and lack of comfort when wearing state-of-the-art HMDs hinder widespread adoption and prolonged use. We therefore aim to investigate the trade-off between less immersive yet more convenient (e. g., handheld AR) and more immersive but less convenient (e. g., VR HMD) XR devices, which could facilitate transitions between environments.

C3 -Attention awareness and adaptation
How can a proactive and contextual approach based on a combination of implicit interaction [19] and explicit input simplify user interaction and enable natural interaction? Devices in an environment should tune their attention to the user and adapt to the user's needs and proficiency [11]. What a user is visually focusing on (e. g., gaze direction, location, orientation) and what skills or knowledge a user has (e. g., detected through cognitive load and arousal measures) should be used to adapt the content and interaction mechanisms and, with this, complement explicit user input. Examples are displays that automatically select the language the user is familiar with [12], a reading interface that adapts the presentation speed to the user's cognitive load [13], or even mutual adaptation scenarios [1]. However, capturing this data reliably, easily, and cheaply still poses a significant technical challenge. Our aim is to further investigate both low-cost hardware solutions and interface adaptations to reliably and effectively complement explicit actions based on the user's (implicit) attention.
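The language-selecting display mentioned above can be reduced to a simple decision rule: pick the content language for which the observed reading behavior is most fluent. The following is a hedged sketch under the assumption that gaze data has already been condensed into a per-language reading speed; function names and the metric are illustrative, not taken from the cited systems [11, 12].

```python
# Hypothetical sketch: select the content language with the highest
# observed reading speed, estimated from gaze fixations on sample
# text. The metric (words per second) is an illustrative assumption.
def select_language(reading_speeds):
    """reading_speeds: dict mapping language code -> words per second."""
    if not reading_speeds:
        return None  # no implicit input yet; fall back to a default
    return max(reading_speeds, key=reading_speeds.get)
```

In a full system, this implicit choice would only pre-select the content; explicit input (e. g., a manual language switch) should remain available to override it.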

C4 -Consistent user experience
How can we provide a consistent user experience across heterogeneous devices while exploiting the strengths of each device? For example, while the desktop profits from the familiarity and precision of a WIMP interface, a VR environment is more suited for 3D user interfaces [8]. However, this can lead to inconsistent interaction, which may result in an increased mental demand for the user. In contrast, reconstructing the interface for each device (e. g., emulating a desktop interface in VR by using 2D panels and pointing with VR controllers) may increase overall consistency, but can also lead to an inferior user experience. Similarly, exploring a VR scenario through a handheld touchscreen device will necessarily involve different navigation and manipulation techniques compared to using an immersive HMD setup [15]. We aim to explore the impact of interface consistency and ways to gradually adapt it, for example by recreating a 2D desktop interface in VR initially and then gradually morphing this to a 3D interface, or enabling the user to trigger this transformation themselves.

C5 -Continuity of user representation
How can we consistently and continuously represent users across heterogeneous devices and different realities? For example, a desktop may present the user as a mouse cursor, while a VR environment may show an avatar as user representation [20]. Providing a continuous user representation may be essential particularly in multi-user scenarios, to help collaborators understand where other users are located, where their focus lies, and what interface (i. e., device) they are interacting through, as this will impact their abilities and behavior [15]. We aim to further investigate how we can support a continuous user representation when transitioning across different devices (e. g., across different tablets [15]) and realities (e. g., from reality to VR [8]) as well as its impact on user performance.

C6 -Overcoming legacy bias and finding suitable modalities
How can we motivate users to integrate multiple interactive components into their workflows? Although there may be clear advantages to engaging with multiple devices or modalities (e. g., [25,27,28]), users will often still prefer to work as they are accustomed to. To overcome this legacy bias [16], complementary interfaces must integrate well with users' current workflows, devices, and modalities, as these shape the way we interact. Here, we aim to investigate how we can best improve upon existing workflows by providing auxiliary complementary interfaces and carefully guiding users to benefit from each involved component [27]. As novel technologies (e. g., mixed reality HMDs) become more commonplace, this too will help to reduce legacy bias, as users may be more willing to employ familiar devices.

STREAM: Synchronous use of heterogeneous devices
STREAM [9] combines an immersive AR HMD with a spatially-aware tablet to interact with a 3D visualization (see Figure 1). Here, the two heterogeneous device classes excel at complementary aspects: The AR HMD excels at viewing and interacting with the visualizations in 3D space, as it provides users with stereoscopic vision and allows for egocentric movement, further reinforcing depth perception. The tablet, on the other hand, provides familiar touch input with haptic feedback, allowing for direct interaction [2] with the 2D scatter plots. Through spatial awareness (i. e., the tablet is tracked with two HTC Vive Trackers), STREAM also enables spatial input: For example, users can rotate individual scatter plots in 3D space by physically rotating the tablet. Users can use this device combination simultaneously, as the AR HMD does not block the user's hand or view; even when the tablet is out of the user's view, the familiar touch interaction as well as spatial awareness remain available through an eyes-free interaction concept. Due to the low cost of switching between devices (i. e., users only need to shift their visual attention), the devices are co-dependent - meaning that STREAM cannot be controlled by one device alone. STREAM addresses the loss of context when switching between AR and tablet visualization (C1) by providing a seamless transition interaction, thus reducing mental demand by merging both the tablet and AR visualization. However, due to the co-dependency between both devices, STREAM relies on a low cost of switching between devices (C2).

Figure 1: STREAM combines spatially-aware tablets with augmented reality head-mounted displays for visual data analysis. Users can interact with 3D visualizations through a multimodal interaction concept, allowing for fluid interaction with the visualizations [9].
This is further supported by an eyes-free interaction concept, allowing for interaction with the tablet without requiring the user's visual attention: Each corner of the tablet contains a large button that is mapped to a single action. This is indicated to the user through an AR heads-up display, facilitating the execution of actions by touching the corresponding corner while relying on proprioception. However, we observed a legacy bias (C6) during periods of eyes-free interaction: Users occasionally looked down at the tablet during touch interaction, indicating that they are still used to focusing on one device at a time.
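The corner mapping described above can be sketched as a simple quadrant lookup: a touch point on the tablet is classified into one of four quadrants, each bound to one action. The action names and the assumption that y grows downward (screen coordinates) are illustrative, not STREAM's actual command set.

```python
# Hypothetical sketch of the eyes-free corner mapping. Screen
# coordinates are assumed, with the origin at the top-left corner
# and y growing downward; action names are illustrative only.
CORNER_ACTIONS = {
    (0, 0): "select",    # top-left
    (1, 0): "filter",    # top-right
    (0, 1): "undo",      # bottom-left
    (1, 1): "confirm",   # bottom-right
}

def corner_action(x, y, width, height):
    """Map a touch point (x, y) on a width x height screen to an action."""
    quadrant = (int(x >= width / 2), int(y >= height / 2))
    return CORNER_ACTIONS[quadrant]
```

Because each target covers a full quadrant, users can hit it via proprioception alone, which is what makes the mapping usable without looking at the tablet.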

ReLive: Asynchronous use of heterogeneous devices
ReLive [8] bridges the gap between visual analytics approaches on the desktop and immersive analytics approaches in mixed reality by providing a mixed-immersion visual analytics framework for exploring and analyzing mixed reality user studies (see Figure 2). ReLive combines two heterogeneous device classes - used asynchronously [10] - for complementary analysis workflows. On the one hand, the desktop interface allows for an ex-situ analysis, as desktop devices excel at precise controls, provide a high-resolution display, and use familiar 2D visualizations suited for viewing aggregated data. In addition, users benefit from a cross-compatible environment as well as malleable components, allowing them to use the keyboard for programming their own components (cf. computational notebooks). On the other hand, a VR HMD complements the desktop view for in-situ analysis. Here, the VR view allows users to immerse themselves in the study and look at the data within its original environmental context. The immersion, egocentric navigation, and stereoscopic vision make this environment ideal for viewing and exploring 3D data. However, since users cannot use both devices at the same time, ReLive has no dependency between the two devices. As a result, users can complete the entire task on either device, reducing how often users need to switch between devices. Due to its device combination, ReLive exhibits a significant cost of switching between devices (C2), which is mitigated by making both components independent, yet synchronized. However, despite this synchronization, users still experienced a loss of context in terms of spatial memory when switching between environments (C1). In addition, while visualizations are implicitly synchronized between desktop and VR, we aim to further investigate more explicit linking of content (C1), for example by investigating cross-reality linking and brushing techniques.
Lastly, ReLive makes a design tradeoff that employs 2D menu interaction in VR instead of more embodied interaction to guarantee a consistent user experience across devices (C4).
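The "independent yet synchronized" pattern can be illustrated with a minimal state-synchronization sketch: each client keeps a local copy of the shared analysis state and merges remote updates with a last-write-wins policy per key. This is an illustration of the general idea only, not ReLive's actual implementation, and all names are hypothetical.

```python
# Hypothetical sketch: each component (desktop or VR client) keeps
# its own copy of shared state; updates carry a timestamp and are
# merged with a last-write-wins policy per key.
class SyncedState:
    def __init__(self):
        self.state = {}   # key -> value
        self.stamps = {}  # key -> timestamp of last accepted update

    def local_update(self, key, value, timestamp):
        """Apply a local change and return the message to broadcast."""
        self.state[key] = value
        self.stamps[key] = timestamp
        return (key, value, timestamp)

    def remote_update(self, key, value, timestamp):
        """Apply a peer's change only if it is newer than ours."""
        if timestamp > self.stamps.get(key, -1):
            self.state[key] = value
            self.stamps[key] = timestamp
```

Because each client holds a full copy of the state, either component remains fully usable on its own; synchronization only ensures that, after a switch, the other device reflects the latest selections and filters.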

When tablets meet tabletops: Collaboration with homogeneous devices
In this work [26], we studied collaborative sensemaking activities using personal tablets and a shared tabletop.
Here, we combined two homogeneous devices for complementary collaborative sensemaking activities (see Figure 3). The tablets act as private space, where each user can search, read, and annotate documents independent of their partner's activity, facilitating loosely coupled activities. As collaborative sensemaking activities can be described as mixed-focus collaboration, where individuals constantly transition between individual and shared activities (i. e., coupling styles), we purposefully added a shared device for collaborative activities: Here, users can share their gained information, spatially arrange it, and use it as a starting point to discuss solution approaches with each other. To this end, we closely investigated the effect of the size of a shared tabletop on users' interaction, their communication, and awareness during cross-device mixed-focus collaboration. However, collaboration is only encouraged, not enforced: Potentially, each user can solve the task on their own, with minimal or no usage of the shared space at all. This is further supported by the lack of dependency between the incorporated devices - it is only necessary to read documents on the individual tablet, while the shared tabletop can be regarded as optional.
We addressed multiple challenges with our multi-device sensemaking tool: To support collaborative sensemaking activities, the two components of the system (personal tablets and shared space) needed to be seamlessly connected. Here, we carefully designed the interfaces to reduce the cost of switching (C2) between them. While it was possible to transfer content bidirectionally between the personal and shared space, it was also possible to directly send a document to the partner's tablet. We used color codings to indicate each participant's activities, which facilitated linking content (C1). Further, the spatial arrangement on the tabletop was visually highlighted by drawing convex hulls around the clustered items; clusters could be formed by encircling items or by lifting them into or out of a cluster. Color-coded bookmarks further supported the continuity of user representation (C5).
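The convex-hull highlighting mentioned above is a standard computation over the 2D positions of the clustered items. As a sketch, Andrew's monotone chain algorithm computes the hull vertices, which a tabletop renderer could then draw as an outline around the cluster; this is a generic illustration, not the tool's actual code.

```python
# Sketch: compute the convex hull of clustered item positions using
# Andrew's monotone chain algorithm. The resulting vertices, in
# counter-clockwise order, can be drawn as the cluster outline.
def convex_hull(points):
    """Return the hull vertices of a list of (x, y) points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (OA x OB)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:  # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # concatenate, dropping the duplicated endpoints
    return lower[:-1] + upper[:-1]
```

In practice, the hull would be recomputed whenever an item is lifted into or out of a cluster, so the outline always reflects the current spatial arrangement.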

Cross-device collaboration in VR: Collaboration with heterogeneous devices
This work explores heterogeneous cross-device collaboration between a handheld VR device (i. e., a window into a virtual world) and a fully-immersed VR HMD [15]. The HMD user is embodied by a human-sized avatar (see Figure 4 (A)), reflecting their ability to move via natural locomotion and manipulate objects with their virtual hands. In contrast, the handheld user is represented as a floating, box-shaped head (see Figure 4 (B)) to allow collaborators to discern the user's direction of gaze. This enables interaction with different levels of immersion, hardware availability, and mobility, supporting scenarios where not all users have access to a full VR setup. In this work, the complementarity stems from the different roles given to the HMD and handheld user, which are based on their specific device characteristics: The HMD user is responsible for 3D object manipulation, as this task benefits particularly from 3D spatial input. In contrast, the handheld user can act as a consultant, as this user can easily access real-world artifacts (e. g., blueprints) and easily switch between egocentric navigation and assuming the HMD user's point of view. This scenario allows us to reflect on two challenges. First, it highlights the continuity of user representation (C5) for collaboration. Due to asymmetric devices (i. e., handheld and HMD), different modes of non-verbal communication must be considered. Both users are displayed as an avatar regardless of the device, allowing for a consistent representation between users. Second, favoring each device's strengths involves a tradeoff regarding the consistency of user experience (C4), which may lead to additional confounding factors and may complicate communication (e. g., sharing interaction hints).

Proficiency-aware interfaces: Combining implicit and explicit interaction techniques
In our approach, we explored how a system can become aware of the user's language proficiency. The display can provide content in different languages. Using gaze tracking, the user's viewing and reading patterns are captured and analyzed. Based on this implicit input, an appropriate content representation is chosen [12]. This system is an example of a proficiency-aware user interface [11] based on gaze. The approach can be extended beyond languages (e. g., observing gaze patterns during manual tasks or while playing music) and to other physiological signals such as EEG. Here, we consider the complementarity of content and its presentation, and with this, the complementarity of implicit and explicit interactions. Instead of presenting different options to the user and requiring a manual switch, we created a system that performs this switch implicitly. This adaptation forms the basis for a natural interaction experience: The system offers an interaction that is tailored to the user and feels appropriate, without requiring explicit selection.
This combination of implicit and explicit interactions highlights the importance of attention awareness and adaptation (C3). This includes not only providing different contents that can be adapted and visualized based on implicit user input, but also gathering user information (e. g., gaze movements) in an unobtrusive way.

Multimodal interfaces: Input and output
In the applications above, we showed different examples of interfaces in which information was presented mainly visually. However, in some cases we can enhance the representation of data by incorporating other sensory channels, such as audition or touch. A multimodal approach to data representation is especially advantageous when we need to specify several dimensions associated with the data (e. g., a quantitative measure and the uncertainty associated with it [6]). Yet, by only presenting complex pieces of data visually, we run the risk of overloading the representation, which may negatively impact the user's ability to derive a meaningful interpretation of the information. Leveraging multiple sensory modalities for different data dimensions instead allows us to isolate and focus on specific aspects of the data while also maintaining an awareness of overall informational coherence. Since different aspects of data are presented via separate perceptual channels, multimodal interfaces are intrinsically complementary. Users can attend simultaneously to various fields of data without needing to switch focus between devices or visualization windows (C2). Depending on the given application, devices dedicated to each sensory modality can be combined to meet specific representational requirements. For a multimodal representation to effectively convey the desired information, designers must take into account the underlying characteristics of the sensory modalities targeted by the interface (C6). Mapping spatio-temporal properties to a specific sensory channel can provide a more intuitive, straightforward approach to conveying information. For example, given the high spatial resolution of the visual system compared to the other channels, a visual representation is the most appropriate for presenting spatially organized data.
In contrast, the auditory system is the least suitable candidate for mapping such spatial information, since its resolution is limited in this domain. Yet, data sonification can take advantage of the auditory system's much higher temporal resolution [21], for example by representing properties that quickly change over time using modulations in pitch or volume. Moreover, the assignment of a certain dimension to a feature of a multisensory representation is often necessarily arbitrary. While some arbitrary mappings are so widely used that they have become conventional in visual representations (e. g., high and low numerical values are typically represented with warm and cold colors, respectively), novel correspondences may require users to learn the mappings before being able to use the application [4]. As a result, designers must ensure that users can effectively interpret the presented information with the given mappings.
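A pitch-based sonification mapping of the kind described above can be sketched as a linear transfer function from a data range onto a frequency range. The chosen range of 220-880 Hz (two octaves around A3) is an assumption for illustration; any perceptually motivated range or non-linear scale could be substituted.

```python
# Illustrative sketch of a sonification mapping: a data value in
# [lo, hi] is mapped linearly onto a pitch range in Hz. The 220-880 Hz
# range is an assumption, not a mapping from the cited work.
def value_to_pitch(value, lo, hi, f_min=220.0, f_max=880.0):
    """Linearly map a data value onto a frequency in Hz."""
    t = (value - lo) / (hi - lo)  # normalize to [0, 1]
    t = min(1.0, max(0.0, t))     # clamp out-of-range values
    return f_min + t * (f_max - f_min)
```

As noted above, such a correspondence (higher value means higher pitch) is arbitrary; users may need to learn it before the sonification becomes an effective complementary channel.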

Outlook
Complementarity may play an essential role in the design of novel user interfaces, resulting in interactions involving several different technologies. In this paper, we discuss how such meaningful combinations of devices and modalities - forming a symbiosis of interfaces - contribute towards an increased quality of interaction. We introduce the term complementary interfaces to describe these meaningful combinations, highlighting the complementary roles of each component by taking "advantage of the strong points of each" [5]. Our notion of complementary interfaces can either be used as a design framework (e. g., supporting the identification of meaningful combinations) or as an evaluation framework (e. g., explaining and quantifying effects of meaningful combinations). In future work, we plan to further elaborate and formalize our notion of complementary interfaces, define a design space, and further address our presented challenges. Ultimately, we aim to quantify the meaningfulness of the symbiosis of interfaces by investigating and establishing metrics to, for example, quantify redundancy and complementarity of input and output modalities in multi-device ecologies.
Author contributions: Johannes Zagermann and Sebastian Hubenschmid contributed equally to this research.