
J Multimodal User Interfaces (2010) 3: 197–213
DOI 10.1007/s12193-010-0044-2

ORIGINAL PAPER

Interactive design of multimodal user interfaces

Reducing technical and visual complexity

Werner A. König · Roman Rädle · Harald Reiterer

Received: 24 July 2009 / Accepted: 22 January 2010 / Published online: 19 February 2010
© OpenInterface Association 2010

Abstract In contrast to the pioneers of multimodal interaction, e.g. Richard Bolt in the late seventies, today's researchers can benefit from various existing hardware devices and software toolkits. Although these development tools are available, using them is still a great challenge, particularly in terms of their usability and their appropriateness to the actual design and research process. We present a three-part approach to supporting interaction designers and researchers in designing, developing, and evaluating novel interaction modalities including multimodal interfaces. First, we present a software architecture that enables the unification of a great variety of very heterogeneous device drivers and special-purpose toolkits in a common interaction library named "Squidy". Second, we introduce a visual design environment that minimizes the threshold for its usage (ease-of-use) but scales well with increasing complexity (ceiling) by combining the concepts of semantic zooming with visual dataflow programming. Third, we not only support the interactive design and rapid prototyping of multimodal interfaces but also provide advanced development and debugging techniques to improve technical and conceptual solutions. In addition, we offer a test platform for controlled comparative evaluation studies as well as standard logging and analysis techniques for informing the subsequent design iteration. Squidy therefore supports the entire development lifecycle of multimodal interaction design, in both industry and research.

W.A. König (✉) · R. Rädle · H. Reiterer
Human-Computer Interaction Group, University of Konstanz, Box D-73, Constance, Germany
e-mail: [email protected]
URL: http://hci.uni-konstanz.de

R. Rädle
e-mail: [email protected]

H. Reiterer
e-mail: [email protected]


Keywords Multimodal user interfaces · Post-WIMP user interface · Natural interaction · Design environment · Zoomable user interface · Semantic zooming · Multimodal interaction · Squidy

1 Introduction

With recent advances in computer vision, signal processing, and sensor technology, today's researchers and interaction designers have great opportunities to go far beyond the traditional user interface concepts and input devices. More natural and expressive interaction techniques, such as tangible user interfaces, interactive surfaces, digital augmented pens, speech input, and gestural interaction, are available and technologically ready to be incorporated into the multimodal interface of the future (see some examples in Fig. 1). However, the actual utilization of these techniques for the design and development of multimodal interfaces entails various critical challenges that interaction designers and researchers have to face.

In contrast to the design of traditional graphical user interfaces, the development of multimodal interfaces involves both software and hardware components [12]. However, conventional development environments (e.g. MS Visual Studio/.Net, Adobe Flash, Eclipse IDE) fall short of supporting uncommon interaction modalities and appropriate data processing (e.g. computer vision), not to mention the handling of multipoint and multi-user applications (e.g. for multi-touch interaction). As a consequence, a broad variety of very heterogeneous and specialized toolkits and frameworks have evolved over the last few years, such as the Apple iPhone SDK (http://developer.apple.com/iphone/), the Microsoft Surface SDK (http://www.microsoft.com/surface/), GlovePIE (http://carl.kenner.googlepages.com/glovepie/), Processing (http://processing.org/), and the NUI Group Touchlib (http://nuigroup.com/touchlib/).


Fig. 1 Diverse input devices for single-modality or multimodal interfaces: (a) Physical game controllers offer absolute pointing, motion sensing and gesture recognition to the end-user. (b) Digital pens build upon users' pre-existing knowledge and thus offer a very natural mode of interaction, e.g. for digital sketching and prototyping. (c) Multi-touch surfaces augmented with physical tokens reduce the gap between real-world and digital-world interaction. (d) Finger gestures provide a very expressive and direct mode of interaction. (e) Well-known devices such as an omnipresent laser pointer provide flexible input from any distance

Table 1 Interaction designers have to cope with very different environments for the same interaction modality, touch input

Hardware platform     Microsoft Surface   Custom-built table   Apple iPhone   HTC Hero
Form factor           Table               Table                Mobile         Mobile
Operating system      Microsoft Windows   Linux/Windows        Mac OS X       Android OS
Programming language  C#                  C++                  Objective-C    Java
Software framework    Surface SDK         Touchlib             iPhone SDK     Android SDK

They provide support for specific interaction modalities, but are mostly restricted to a dedicated hardware environment and entail further requirements and dependencies. When using touch as input, for instance, the interaction designer has to cope with different hardware platforms, operating systems, programming languages, and software frameworks (see Table 1). When developing single-modality interfaces, this diversity can be bypassed, at least in the short run, by focusing on just one specific device. But the combination of multiple devices, e.g. for multimodal interaction, involves further platforms, devices, and frameworks, resulting in an unmanageable technical and mental complexity.

There are development environments that support at least some of the more uncommon input devices and modalities (e.g. physical turntables, mixing desks, multi-touch surfaces and simple vision tracking). Two examples are Max/MSP (http://cycling74.com/products/maxmspjitter/) and vvvv (http://vvvv.org/). Both are graphical development environments for music and video synthesis and are widely used by artists to implement interactive installations. Their popularity in the design and art community arises in particular from their graphical user interface concepts. Both are based on the concept of visual dataflow programming and utilize a cable-patching metaphor to lower the implementation threshold [24] for interactive prototyping. Users arrange desired components spatially and route the dataflow between the components by visually connecting pins instead of textual programming.


However, the visual representation of each primitive variable, parameter, connection, and low-level instruction (e.g. matrix multiplication) leads to complex and scattered user interfaces, even for small projects. vvvv offers the possibility of encapsulating consecutive instructions in so-called "subpatches". This approach helps to reduce the size of the visual dataflow graph, but the hierarchical organization introduces additional complexity. In contrast to the visual encapsulation in vvvv, the "external" mechanism of Max/MSP supports the visual and technical encapsulation of certain functionality in an external object as a "black box". This mechanism offers high flexibility and abstraction but requires low-level programming in C. This results in a higher threshold and lower interactivity of the design and development process, since changes have to be textually written and compiled in an external development environment before the external object can be used in Max/MSP.

Basically, Max/MSP and vvvv show interesting user interface concepts, but they are focused on real-time audio composing and 3D rendering and were not designed to support the development of multimodal interfaces in general. For that, interaction designers require not only a set of ready-to-use interaction techniques and input devices but also the possibility to physically develop and integrate new interaction modalities and hardware devices. Hardware toolkits such as Phidgets [11], Smart-Its [10] or iStuff [1] offer a set of compatible microcontrollers, sensor devices and software frameworks enabling rapid prototyping of physical input and output devices. However, the technical complexity of the software frameworks requires advanced programming and signal processing knowledge, in particular when multiple devices are used in parallel. iStuff mobile [2] combines the hardware toolkit iStuff with a visual programming environment based on Apple's Quartz Composer (http://developer.apple.com/graphicsimaging/quartzcomposer/). This was originally designed to support the visual development of interactive multimedia and 3D rendering. It shares the cable-patching metaphor with the already discussed development environments vvvv and Max/MSP. This combination of hardware toolkit and visual development environment facilitates fast iterations and rapid prototyping on multiple levels. However, it is restricted to the domain of mobile phone interaction and limited in its functionality and the type of input (e.g. no support for computer vision).

All of the aforementioned development environments and toolkits support diverse devices and modalities, but they are not especially designed to support the design of multimodal interfaces. Here, multiple inputs have to be synchronized (e.g. hand gesture and speech), processed and composed into a higher-level command (e.g. moving an object). There are few frameworks that address these requirements.


ICARE [5] is a conceptual component model and a software toolkit for the rapid development of multimodal interfaces. It provides two types of software components: the elementary components, consisting of Device and Interaction Language components used to develop a specific modality, and the Composition components that combine the diverse modalities. It was used for different use cases (e.g. design of a multimodal flight cockpit), but it became apparent that only a limited set of the defined components were really generic [27] and the toolkit was not easily extensible [22].

Based on the experiences gained with ICARE, the open source framework "OpenInterface" was developed by the OpenInterface Project (http://www.oi-project.org/), which is dedicated to multimodal interaction. The OpenInterface framework is composed of the OpenInterface Kernel, a component-based runtime platform, and the OpenInterface Interaction Development Environment (OIDE), a graphical development environment for the design of multimodal interfaces [27]. In order to integrate an existing input device as a component into the OpenInterface Kernel, the component interface has to be specified in a dedicated XML-based CIDL description language (Component Interface Description Language). This specification can be semi-automatically generated from the source code of the component by the OpenInterface platform. It also generates C++ code to encapsulate the external binary into a well-defined programming interface [3]. Due to this explicit description of the interface, the encapsulated component can be used in the graphical development environment OIDE. This uses a cable-patching metaphor similar to Max/MSP, vvvv, and Quartz Composer in order to define the dataflow by combining the selected components visually. Lawson et al. [22] identified diverse shortcomings of the OpenInterface OIDE and the introduced application design process. A major issue is the limited focus and inflexible design of the components. The developers rather focus on the design of their individual component than on the application as a whole. This leads to an inflexible design of the components and the application in general that hinders the reuse, extension and exchange of components as well as of the entire application. This inflexibility also restricts interaction designers in exploring diverse alternatives, which then impedes rapid prototyping and limits epistemic production [18] of concrete prototypes. In order to address the identified issues, Lawson et al. introduced an all-in-one prototyping workbench for multimodal application development [22]. It is a combination of the OpenInterface Kernel with an Eclipse plugin as graphical editor, named SKEMMI. The editor is also based on the cable-patching metaphor, but provides three levels of detail with respect to the displayed information. The low-detail "workflow" level reduces information and facilitates the initial sketching of the desired interaction techniques.



In the "dataflow" level, where all details for the routing of the dataflow are represented, the user selects, arranges and logically links the components without the need to route and connect every single pin. In the third level, the "component" level, only a specific component with its input and output pins is visualized, and the user is able to tailor the component's interface (e.g. changing the port attributes and parameters). SKEMMI also provides an alternative source code editor view that allows for changing the component or its interface programmatically. The three-layer approach helps to control the visual and functional complexity of the components, but there is no higher-level abstraction concept (e.g. hierarchical pipelines or semantic zooming). If the designed multimodal interface incorporates multiple devices and various signal processing components, the SKEMMI user interface gets increasingly crowded. The geometric zoom of the user interface is not a solution for the complexity issue, since it just changes the size of the displayed information but not the information representation itself.

To sum up, there are only very few frameworks that support the design of multimodal interfaces. However, they either provide a limited range of interaction modalities or are hardly extensible regarding the platform, the components or the visual user interface. The OIDE and SKEMMI graphical editors seem very promising, but the complexity issue is critical in real-world projects. Moreover, all of the discussed development environments focus mainly on rapid prototyping and the early steps of iterative design. None of them provides tool support for the empirical evaluation of the designed interfaces (e.g. ISO 9241-9 tapping tasks and suitable data logging). All of the graphical development environments utilize the cable-patching metaphor in a similar way in order to connect input and output pins. However, the dataflow programming could be more powerful without losing its simplicity. Furthermore, they still require a deep understanding of the underlying technology on behalf of the designers, since they have to understand and route each primitive variable/data item even when using "black-box" modules.

In the following, we present our Squidy Interaction Library, which contributes on different levels:

– The software architecture: Squidy enables the unification of heterogeneous devices and components in a common library. The architecture is designed to provide great flexibility, simple extension, high independence and fast parallel processing.

– The visual development environment: Squidy enables the interactive design and configuration of multimodal interfaces for interaction designers and researchers. The user interface concept is designed to provide a low threshold (ease of learning) and a high ceiling (high functionality) and scales well with increasing complexity.

– Tool support for the entire development lifecycle: Besides the visual design and configuration for rapid prototyping, Squidy also provides advanced development and evaluation techniques for iterative design.

After giving a short conceptual overview in the next section, we will discuss the software architecture in Sect. 2.1 and afterwards describe the user interface concept in detail in Sect. 2.2. In Sect. 3 we will show the appropriateness of our solution to the actual design and research process in the context of a variety of real-world projects.

2 Squidy interaction library

We introduce the Squidy Interaction Library, which unifies a great variety of device toolkits and frameworks in a common library and provides an integrated user interface for visual dataflow management as well as device and data-filter configuration. Squidy thereby hides the complexity of the technical implementation from the user by providing a simple visual language and a collection of ready-to-use devices, filters and interaction techniques. This facilitates rapid prototyping and fast iterations for design and development. However, if more functionality and profound customizations are required, the visual user interface reveals more detailed information and advanced operations on demand by using the concept of semantic zooming. Thus, users are able to adjust the complexity of the visual user interface to their current needs and knowledge (ease of learning).

The basic concept (see Sect. 2.2 for a more detailed discussion) that enables the visual definition of the dataflow between the input and output is based on a pipe-and-filter concept (see Fig. 2). By using this concept, Squidy provides a very simple, yet powerful visual language for designing the interaction logic. Users can select an input device of choice as source, e.g. a laser pointer, which is represented by an input node in the visual user interface. They connect it successively with filter nodes for data processing, such as compensation for hand tremor or gesture recognition, and route the refined data to an output node as sink. Basically, the user defines the logic of an interaction technique by choosing the desired nodes from a collection (knowledge base) and connecting them in an appropriate order, assisted by a heuristic-based node suggestion. The filter nodes are independent components that can transmit, change, or delete data objects, and also generate additional ones (e.g. if a gesture is recognized). The source and sink are specific drivers that handle the input/output operations and map the individual data format of the devices to the generalized data types defined in Squidy (see Fig. 4). The pipe-and-filter concept also provides very technical advantages, since the encapsulation of functionality in independent "black boxes" ensures information hiding, modifiability and high reuse by abstraction.


Fig. 2 View of a simple pipeline in Squidy. The pipeline receives position, button and inertial data from a laser pointer, applies a Kalman filter, a filter for change recognition and a filter for selection improvement, and finally emulates a standard mouse for interacting with conventional applications. At the same time the data is sent via TUIO to listening applications. The pipeline-specific functions and breadcrumb navigation are positioned on top. The zoomable knowledge base, with a selection of recommended input devices, filters, and output devices, is located at the bottom

Fig. 3 Input node in Squidy representing an interactive laser pointer. In order to reduce visual complexity, the node-specific functions (active/inactive, delete, duplicate, publish to knowledge base) and the unconnected in and out ports are only shown if the pointer is within the node

The possibility for multiple input and output connections offers a high degree of flexibility and the potential for massive parallel execution of concurrent nodes. In our implementation, each node generates its own thread and processes its data independently as soon as it arrives. This effectively reduces the processing delay that could have a negative effect on the interaction performance.
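The per-node threading described above can be sketched in plain Java. The following is an illustration only; the class and method names (ProcessingNode, enqueue, forward) are assumptions made for this sketch and are not the actual Squidy implementation. Each node buffers incoming data in its own queue and processes it on its own thread, so a slow or blocked node never stalls its neighbours.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal sketch of a self-contained node with its own queue and thread
// (illustrative only, not the actual Squidy implementation).
public abstract class ProcessingNode<T> implements Runnable {

    // Upstream nodes or bridges drop incoming data here.
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this);
    private volatile boolean running;

    public void start() {
        running = true;
        worker.start();
    }

    public void stop() {
        running = false;
        worker.interrupt();
    }

    // Called by upstream nodes to hand over new data without blocking them.
    public void enqueue(T data) {
        queue.offer(data);
    }

    @Override
    public void run() {
        while (running) {
            try {
                T input = queue.take();      // blocks only this node's thread
                T output = process(input);   // node-specific filtering
                if (output != null) {
                    forward(output);         // route refined data downstream
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Node-specific processing, e.g. smoothing or gesture recognition.
    protected abstract T process(T input);

    // Routing to the connected downstream nodes (omitted in this sketch).
    protected abstract void forward(T output);
}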

The sink can be any output technique such as a vibrating motor for tactile stimulation or LEDs for visual feedback. Squidy also provides a mouse emulator as an output node to offer the possibility of controlling standard WIMP applications with unconventional input devices.

Fig. 4 Data type hierarchy in Squidy based on primitive virtual devices [30]. Any data processed in Squidy consists of single or combined instances of these basic data types

Multipoint applications (e.g. for multi-touch surfaces or multi-user environments) and remote connections between multiple Squidy instances are supported by an input/output node that transmits the interaction data either as TUIO messages [17] or as basic OSC messages over the network.


TUIO is a widely used protocol for multipoint interaction based on the more general OpenSound Control protocol (OSC), which is a successor to the MIDI standard. By providing these standard interfaces for both input and output connections, Squidy supports the majority of multi-touch applications that have recently become very popular in both research and industry. Beyond these basic network interfaces, Squidy also supports and integrates more complex frameworks such as the Apple iPhone SDK, the Android SDK, the NUIGroup Touchlib, and the Microsoft Surface SDK. Users therefore benefit from the particular functionalities and specific hardware of all these techniques. Inside Squidy, however, they are also able to define, control, evaluate, and reuse interaction techniques independently of the hardware or the specific framework. This flexibility results from the architecture utilized and the generalized data types, which will be explained in more detail in the following section.

2.1 Software architecture

There are several frameworks and toolkits that provide ready-to-use components for input devices and signal processing. Instead of connecting the components to pipelines programmatically, most of these frameworks and toolkits offer a basic language for controlling the dataflow visually (for example Max/MSP, vvvv, OIDE or SKEMMI). Such a visual programming language reduces the technical threshold and complexity and aids users with little or no programming experience. The integration of new modalities, however, requires a fine grasp of the underlying technology and thus remains a highly demanding task. Moreover, extending a framework with new components is only offered by a few of today's common frameworks, such as ICARE [4] or the open source framework OpenInterface (www.oi-project.org), and integrating new components into these frameworks requires either additional programming effort or a dedicated definition of the interface in a specific mark-up language. Basically, this means that a developer has to switch between different applications and programming languages while developing a new interaction technique, increasing the mental workload.

2.1.1 Generic data types

In order to unify very heterogeneous devices, toolkits and frameworks, we generalized the various kinds of input and output data into a hierarchy of well-defined generic data types (see Fig. 4) based on the primitive virtual devices introduced by Wallace [30] and adapted to the work of Buxton [6] and Card et al. [7]. Each generic data type consists of a type-specific aggregation of atomic data types such as numbers, strings or Boolean values, bundled by their semantic dependency. Simply adding a single connection between two nodes in the visual user interface routes the dataflow based on these generic data types.

This is quite a different approach when compared to some of the aforementioned frameworks such as ICARE [5] and vvvv. These frameworks use atomic data types defined in the particular programming language and assign them visually by connecting result values with function arguments in their specific user interfaces. In order to use the functionality of a module in these frameworks, the user has to route each of these low-level data types. Each x-, y-, and z-value of a three-dimensional data type has to be routed separately, for example. This is a procedure that needs additional effort and can be error-prone, in particular when designing complex interaction techniques. Furthermore, this approach requires detailed knowledge about the functionality of each node and its arguments. Routing low-level data types therefore puts a high cognitive load on the user and leads to visually scattered user interfaces, particularly as the number of connected nodes increases.

Squidy, on the other hand, does not require the designer to visually define every value and programming step manually. The interaction data is grouped into semantically bundled generic data types as mentioned before. Squidy therefore offers the abstraction and simplicity of higher-level dataflow management and reduces the complexity for the interaction designer without limiting the required functionality.
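As an illustration of such a semantically bundled type, a two-dimensional position might be modelled roughly as follows. This is a sketch only; the field names, the marker interface and the constructor are assumptions for this example and do not reproduce the actual Squidy class definitions.

// Marker interface standing in for Squidy's IData (assumption for this sketch).
interface IData { }

// A 2D position bundles its semantically dependent values (source, x, y,
// timestamp) in one object instead of exposing them as separately routed
// atomic values.
public class Position2D implements IData {

    private final String source;   // e.g. a device identifier such as an IMEI
    private final double x;        // normalized horizontal coordinate
    private final double y;        // normalized vertical coordinate
    private final long timestamp;  // creation time, useful for synchronization

    public Position2D(String source, double x, double y) {
        this.source = source;
        this.x = x;
        this.y = y;
        this.timestamp = System.currentTimeMillis();
    }

    public String getSource()  { return source; }
    public double getX()       { return x; }
    public double getY()       { return y; }
    public long getTimestamp() { return timestamp; }
}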

2.1.2 Squidy bridge

In order to achieve high extensibility and to simplify the integration of new devices and applications, we provide the Squidy Bridges as common interfaces that support widely used network protocols and also offer a specific native API if high-performance data transmission is needed. For the purpose of unifying data produced by different hardware devices or applications (especially relevant for incorporating multiple interaction modalities), the Squidy Bridges map the diverse data originating from heterogeneous sources into the generic data types. Thus, the internal data processing is harmonized and completely separated from the diversity of the external world. These bridges are able to handle data transformations in both directions, e.g. from the Apple iPhone into the Squidy Core and from the Squidy Core to the application running on the panoramic display, and vice versa in order to close the feedback loop, e.g. activation of the vibrator on the iPhone as tactile feedback on the application's status (see Fig. 5). The interaction library already comes with an OSC Bridge and a Native Interface Bridge that can be used out of the box. The OSC Bridge offers the possibility of directly connecting the various available devices and toolkits using this communication protocol. Since OSC is based on standard network protocols such as UDP or TCP, it is highly flexible and widely applicable, in particular for mobile or ubiquitous computing.


Fig. 5 This figure shows the usage scenario of an interactive and multimodal environment to control an application running on a 360° panorama screen by using touch gestures and speech. The user interacts with his fingers by touching the display of an Apple iPhone (1). All recognized touches are sent from an iPhone client application (OSC reference implementation running on the iPhone) to the OSC Bridge of Squidy (2). The Squidy Core processes the incoming data appropriately and sends it via the "special purpose bridge" (3) to the 360° application (4) to control a cursor object, which visually highlights the user's current finger position. If the user has selected an interactive element with such a touch gesture, the application (5) sends tactile feedback to its connected bridge (6). The tactile feedback coming from the application is forwarded through the OSC Bridge (7) to the iPhone (8), where the vibration motor is activated to inform the user that he is hovering above an interactive element. After the user has noticed the tactile feedback and thus the interactive element (9), he uses a spoken command to invoke an action on the selected object. The spoken command is recognized by the operating system's speech recognition and then sent to the "native interface bridge" (10). The spoken command is processed by the Squidy Core (11) and transformed into an action, which is sent to the application to trigger object activation/manipulation (12). This multimodal scenario can be implemented with Squidy using pluggable Squidy Bridges to receive data from different devices and a simple arrangement of nodes to process the incoming data

An OSC message consists of several arguments such as the class name of the generic data type, a unique identifier and data-type-specific parameters. For instance, a message for a two-dimensional position sent from an Apple iPhone would contain the Position2D data type as the first argument, the IMEI number as the second argument, and the x- and y-values as the third and fourth arguments (Listing 1).

The flexibility gained from this network approach (e.g. hardware and software independence, high scalability by distributed computing; see Fig. 15) entails a certain delay that can have a negative effect on user input performance [23]. Thus, for those purposes when performance is more important than flexibility, the Native Interface Bridge provides a straightforward Java and C/C++ API to map data from individual devices to the generic data types in Squidy programmatically. In contrast to the OSC Bridge, this technique increases throughput and reduces the delay to a minimum.

For devices that support neither the OSC protocol nor the Native Interface Bridge by default, Squidy provides client reference implementations, e.g. the Squidy Client for iPhone OS (http://itunes.apple.com/app/squidy-client/id329335928) and for Android OS (http://sourceforge.net/projects/squidy-lib/files/Components/Squidy-Client-for-Android-OS), that can be deployed on these devices, minimizing the effort and threshold of device integration.


Listing 1 OSC message sent from an Apple iPhone contains four arguments (finger touch)

/**
 * 1. generic data type
 * 2. IMEI as identifier
 * 3. x-position
 * 4. y-position
 */
String: de.ukn.hci.squidy.core.data.Position2D
String: 49 015420 323751 8
double: 0.25
double: 0.17
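For illustration, a client could send such a message with any OSC library. The sketch below uses the JavaOSC library (com.illposed.osc); the library choice, the bridge address and port, and the OSC address pattern "/squidy/bridge/osc" are assumptions made for this example. Only the argument order follows Listing 1.

import com.illposed.osc.OSCMessage;
import com.illposed.osc.OSCPortOut;

import java.net.InetAddress;

// Hypothetical client sending one Position2D sample to the Squidy OSC Bridge.
public class OscBridgeClient {

    public static void main(String[] args) throws Exception {
        // Address and port of the machine running the Squidy OSC Bridge (assumed values).
        OSCPortOut bridge = new OSCPortOut(InetAddress.getByName("192.168.1.10"), 3333);

        // The OSC address pattern is an assumption; the arguments follow Listing 1.
        OSCMessage message = new OSCMessage("/squidy/bridge/osc");
        message.addArgument("de.ukn.hci.squidy.core.data.Position2D"); // 1. generic data type
        message.addArgument("49 015420 323751 8");                     // 2. IMEI as identifier
        message.addArgument(0.25d);                                    // 3. x-position
        message.addArgument(0.17d);                                    // 4. y-position

        bridge.send(message);
        bridge.close();
    }
}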

However, if the hardware is not able to communicate via existing bridges natively, or if deployment of proprietary software is not desired or is not possible due to hardware restrictions, then users can add further bridges to allow communication, for instance through special-purpose protocol bridges such as the Virtual-Reality Peripheral Network [29].

The support of multiple bridges as interfaces in combination with the device-independent generic data types enables a separation of the data sources and the signal processing in the Squidy Core. This offers a simple but flexible integration of new interaction techniques and modalities without touching existing core functionality. As with ICARE [4] or OpenInterface (www.oi-project.org), interaction techniques designed with the user interface are completely decoupled from the individual hardware or the connected applications. Replacing devices (e.g. switching from the Apple iPhone to the Microsoft Surface) therefore does not affect the applied interaction techniques (e.g. "selection by dwelling") or the concrete application also connected to a Squidy Bridge. The independent-bridge approach in combination with the generalization of data types enables the integration of very heterogeneous devices and toolkits in Squidy. Interaction techniques that have been defined once can be reused multiple times. Squidy thus reduces complexity by abstraction, offers high flexibility and enables rapid prototyping.

2.1.3 Squidy core

All data resulting from (multimodal) user interaction is bridged from devices to the Squidy Core. The core processes this data automatically and in parallel, without any programming effort or further customizations.


Listing 2 Methods to insert new or changed data objects into the dataflow

/**
 * Publishes 1...n data objects to enhance the
 * dataflow semantics.
 */
public void publish(IData... data);

/**
 * Publishes a data container that consists of
 * an array of data objects and a timestamp on
 * which the data container has been released.
 */
public void publish(IDataContainer dataContainer);

Users can define a filter chain (processing chain) using the visual dataflow programming provided by the visual user interface of the Squidy Interaction Library. In order to process the interaction data, the Squidy Core provides a flexible API for manipulating (CRUD: create, read, update, delete) the dataflow. To insert new or changed data objects into the dataflow, the publish method (Listing 2) of the API can be called at the desired place in the pipeline. For instance, a gesture recognizer that has detected a pre-defined gesture will publish a new gesture object into the dataflow. These methods accept 1...n instances of data objects or a data container that consists of an array of data objects as well as a release timestamp. The interface IData ensures the compatibility of the published data objects with the generic data types defined in Squidy and specifies common methods and enumerations.
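As a sketch of this mechanism, the hypothetical filter below publishes an additional gesture object whenever its heuristic matches a circle in the incoming positions, while forwarding the original data unchanged. The stub types, the recognition heuristic and the class names are assumptions for illustration; only the publish signature follows Listing 2.

import java.util.ArrayList;
import java.util.List;

// Minimal stubs standing in for Squidy's generic data types (assumptions).
interface IData { }
class Position2D implements IData {
    final double x, y; final long timestamp;
    Position2D(double x, double y, long timestamp) { this.x = x; this.y = y; this.timestamp = timestamp; }
}
class Gesture implements IData {
    final String name; final long timestamp;
    Gesture(String name, long timestamp) { this.name = name; this.timestamp = timestamp; }
}

// Hypothetical filter node that enhances the dataflow semantics by adding
// a higher-level gesture object while the original positions stay in the stream.
public class CircleGestureFilter {

    private final List<Position2D> samples = new ArrayList<>();

    // Invoked for every Position2D in the dataflow (the "process" stub of Listing 4).
    public IData process(Position2D position) {
        samples.add(position);
        if (looksLikeCircle(samples)) {
            publish(new Gesture("circle", position.timestamp)); // add a new data object
            samples.clear();
        }
        return position; // forward the original position unchanged
    }

    // Stand-in for the publish method of Listing 2 (provided by the framework).
    protected void publish(IData... data) {
        // In Squidy this would hand the objects to the core; omitted in this sketch.
    }

    private boolean looksLikeCircle(List<Position2D> s) {
        return s.size() >= 60; // placeholder heuristic, not a real recognizer
    }
}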

Furthermore, the Squidy Interaction Library comes with diverse off-the-shelf filters for signal processing, data fusion, filtering and synchronization that provide the essential functionality for developing multimodal interfaces. Compared to OIDE [27] or SKEMMI [22], Squidy incorporates the facility to add new filters (including properties, algorithms, logic and descriptions) without the need to switch to a different development environment. For this purpose, the source code is embedded and can be manipulated by users directly. Changes made to the source code are compiled and integrated on the fly, and the new or changed functionality is thus instantly available to users. Each implementation of a filter owns a data queue and a processing thread without any effort on the developer's part. Incoming data is enqueued until the processing thread dequeues it to perform the custom data processing automatically [5]. Thus, the interaction library runs in a multi-threaded environment that allows concurrent data processing by each filter without blocking the complete processing chain (e.g. a filter that is currently waiting for a system resource does not block other filters during that time).


Listing 3 The "preProcess" stub grants access to all data of a data container

/**
 * Diverse collection of data accessible by
 * this method stub before individual
 * processing.
 */
public IDataContainer preProcess(IDataContainer dataContainer);

Listing 4 Processing single data objects of a specified type at a time

/**
 * Processes data of a particular generic data
 * type (DATA_TYPE is a placeholder for
 * those generic data types)
 */
public IData process(DATA_TYPE data);

This system of self-contained filter components prevents side effects on the signal processing and thus helps users to design consistent and reliable interaction techniques. Users can intercept a filter's internal processing by implementing simple pre-defined method stubs, similar to the concept of "Method Call Interception". The following method stubs reflect different points of entry that differ in the quantity and type of dequeued data provided. The processing thread determines in a certain sequence whether a stub is implemented and then invokes this stub using reflection.

In the "preProcess" stub (Listing 3), the collections of data types grouped within a data container are passed to the method's implementation. This is an easy way to access all data at a glance or to iterate through the data collection manually, e.g. to search for interaction patterns consisting of a diverse set of data types concerning multimodal interaction. Whenever it is sufficient to process one particular data instance at a time, the "process" method stub is appropriate. The code fragment in Listing 4 is a generic representation of such a process method stub.

In the case of the "process" stub (Listing 4), the Squidy Core iterates through the collection automatically. It therefore does not have to be done programmatically as in the "preProcess" stub. Here, DATA_TYPE is the placeholder for a generic data type (Sect. 2.1.1), offering a simple data-type filter for the dataflow. The Squidy Core only passes instances of that generic type to that method implementation.

Before the data collection is published to the next filter of the processing chain or bridged back to any device or application, the data collection can be accessed through the "postProcess" stub (Listing 5). An example of using this post-processing is the functionality to remove duplicate data from the dataflow to reduce data-processing overhead.

Listing 5 All data objects of a data container are accessible through the "postProcess" stub after individual data processing

/**
 * Diverse collection of data accessible by
 * this method stub after individual
 * processing.
 */
public IDataContainer postProcess(IDataContainer dataContainer);
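A possible implementation of such a "postProcess" stub is sketched below for the duplicate-removal example mentioned above. The container interface shown here is an assumption for this sketch, not the actual Squidy API; deduplication relies on the data objects providing meaningful equals() and hashCode() implementations.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Minimal stubs standing in for Squidy's data interfaces (assumptions).
interface IData { }
interface IDataContainer {
    List<IData> getData();
    void setData(List<IData> data);
}

// Hypothetical filter that removes duplicate data objects from the container
// before it is forwarded to the next node or bridged back to a device.
public class DeduplicationFilter {

    public IDataContainer postProcess(IDataContainer dataContainer) {
        // LinkedHashSet drops exact duplicates while preserving the original order.
        List<IData> unique = new ArrayList<>(new LinkedHashSet<>(dataContainer.getData()));
        dataContainer.setData(unique);
        return dataContainer;
    }
}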

The Squidy Core uses the Java Reflection mechanism to determine if a filter has implemented such a data interception and passes the inquired data to the implementation automatically. Therefore, no additional effort is required for interface declaration, generation and compilation such as is needed for the CIDL used by the OpenInterface framework (www.oi-project.org). This flexibility of the Squidy Core to quickly integrate or modify filter techniques provides the capability often needed to rapidly and iteratively prototype interactive and multimodal interfaces.

Heterogeneous devices and toolkits can be easily tied to the Squidy Interaction Library using the existing Squidy Bridges (OSC Bridge, Native Interface Bridge) or custom bridge implementations (e.g. to integrate devices or toolkits communicating via special protocols). The Squidy Core provides a multi-threaded environment to perform concurrent data processing and thus increases data throughput, minimizes lag and enhances the user's experience with multimodal interaction. A suitable API supports developers in quickly implementing new filters or changing existing filters without the need for recompilation or repackaging. The three-tier architecture covers usage by both interaction designers and developers, assists them with appropriate tools and thus reduces mental activity to a minimum.

Currently we run applications based on Microsoft .Net, Windows Presentation Foundation and Surface SDK, Adobe Flash and Flex, OpenGL for C++ or JOGL, as well as standard Java technology. The Squidy Bridges combined with the Squidy Client reference implementations provide various external and integrated drivers and toolkits. Currently, Squidy supports the NUIGroup Touchlib, the Apple iPhone SDK, the Android SDK and the Microsoft Surface SDK for multi-touch interaction, the ART DTrack and the NaturalPoint OptiTrack for finger gestures [9] and body tracking, libGaze for mobile eye tracking [14], the iPaper framework for pen- and paper-based interaction [28], the Microsoft Touchless SDK for mid-air object tracking, the Phidgets API for physical prototyping, and self-developed components for laser pointer interaction [19], GPU-accelerated low-latency multi-touch tracking (SquidyVision), the Nintendo Wii Remote, and tangible user interface (TUI) interaction.


Fig. 6 The Squidy Knowledge Base is a searchable interface for accessing all implemented input device and filter nodes

2.2 User interface concept

The Squidy user interface is based on the concept of zoomable user interfaces. It is aimed at providing different levels of detail and integrating different levels of abstraction so that frequent switching of applications can be avoided. In the following subsections we provide more details about the different user interface concepts.

2.2.1 Knowledge base

Squidy provides a wide range of ready-to-use device and filter nodes stored in an online knowledge base that is accessible within the Squidy user interface. An assortment is directly offered at the bottom of the pipeline view (see Fig. 2). The selection and arrangement of the nodes are based on statistics of previous usage and thus give a hint of suitable partners for the currently focused device or filter. This dynamic suggestion may lead to higher efficiency and also helps novice users to limit the otherwise overwhelming number of available nodes to a relevant subset. The user can directly drag a desired node from the selection (bottom) to the design space for the pipeline (centre). If the desired node is not part of the suggested subset, the user has the possibility of accessing all nodes of the knowledge base by zooming into the corresponding view at the bottom. Therein, dynamic queries support the exploration (see Fig. 6). These are based both on automatically generated metadata about each node and on user-generated tags.

2.2.2 Semantic zooming

In accordance with the assumption that navigation in information spaces is best supported by tapping into our natural spatial and geographic ways of thinking [25], we use a zoomable user interface concept to navigate inside the Squidy visual user interface. When zooming into a node, additional information and corresponding functionalities appear, depending on the screen space available (semantic zooming). Thus, the user is able to gradually define the level of detail (complexity) according to the current need for information and functionality.

2.2.3 Interactive configuration & evaluation

In contrast to the related work, the user does not have to leave the visual interface and switch to additional applications and programming environments in order to get additional information, to change properties, or to generate, change or just access the source code of device drivers and filters. In Squidy, zooming into a node reveals all parameters and enables the user to interactively adjust the values at run-time (see Fig. 7). The changes take place immediately, without any need for a restart, providing a direct relationship between user interaction and application feedback and thereby maintaining causality, as Card et al. put it [8]. This is especially beneficial for empirically testing a number of different parameters (e.g. adjusting the noise levels of a Kalman filter) because of the possibility of directly comparing these settings without introducing any (e.g. temporal) side effects. This process of interactive configuration and evaluation is much needed during the design of multimodal interaction, especially when using uncommon interaction techniques and user interfaces. Squidy therefore facilitates fast development iterations.

2.2.4 Details on demand

Going beyond the access and manipulation of parameters, Squidy provides illustrated information about the functionality, usage and context of the node, and this information is directly embedded in the node.


Fig. 7 View of a zoomed Kalman filter node with its table of parameters. Parameter changes are applied immediately. Spatial scrolling with an overview window (right) and temporal scrolling of the last changes (bottom) are provided visually. Via automatic zooming, the user can access further information about the node (Fig. 8) and the filter source code (Fig. 9)

Fig. 8 Information view of the Kalman filter node providing illustrated descriptions of its functionality

By zooming into the information view, marked by a white "i" on a blue background (see Fig. 7), the information is shown without losing the context of the node. This information view (see Fig. 8) may contain code documentation (e.g. automatically generated by javadoc), user-generated content (e.g. from online resources such as wikipedia.org or the Squidy-Wiki) or specifically assembled documentation such as a product specification consisting of textual descriptions, images or videos. The interaction designer using Squidy does not need to open a web browser and search for online documentation in order to get the relevant information. Due to the semantic zooming concept, the user specifies her information need implicitly by navigating in the zoomable user interface and spatially filtering the information space.

2.2.5 Embedded code and on-the-fly compilation

The user even has the ability to access the source code (see Fig. 9) of a node by semantic zooming. Thus, code changes can be made directly inside the design environment. Assistants such as syntax highlighting or code completion support the user even further. If the user zooms out, the code is compiled and integrated on the fly, again without needing to restart the system. Users may also generate new input and output devices or filters by adding an empty node and augmenting it with applicable code.


Fig. 9 Source code of the corresponding device or filter node is directly accessible by semantic zooming. Zooming out leads to runtime compilation of the source code and live integration into the current pipeline

In order to minimize the threshold for the first steps and to reduce the writing effort, the empty node already contains all relevant method definitions for data handling and processing. Therefore, only the desired algorithm has to be filled into the suitable method body of the node. By zooming out, the new node is compiled and is then immediately ready for use. In order to share the new node with the community, the user can publish it into the knowledge base (see the Publish button in Figs. 3 and 2). The design rationale is not to replace classical development environments such as Microsoft Visual Studio or Eclipse, but rather to integrate some of their functionality directly into Squidy. Thereby, we provide a unified environment that seamlessly integrates the most relevant tools and functionalities for the visual design and interactive development of multimodal interfaces.
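The skeleton of such an empty node might look roughly like the following. The class name and the stub interfaces are assumptions for illustration, while the three method stubs correspond to the signatures of Listings 3-5; only the method bodies need to be filled in by the designer.

import java.util.List;

// Minimal stubs standing in for Squidy's data interfaces (assumptions).
interface IData { }
interface IDataContainer { List<IData> getData(); }

// Hypothetical template of an "empty" node: the method bodies are the only
// places the designer has to fill in; everything else is already provided.
public class MyEmptyNode {

    // Called once per data container before individual processing (cf. Listing 3).
    public IDataContainer preProcess(IDataContainer dataContainer) {
        return dataContainer;
    }

    // Called for every instance of the declared generic data type (cf. Listing 4);
    // IData stands in here for the DATA_TYPE placeholder.
    public IData process(IData data) {
        // Insert the desired algorithm here.
        return data;
    }

    // Called once per data container after individual processing (cf. Listing 5).
    public IDataContainer postProcess(IDataContainer dataContainer) {
        return dataContainer;
    }
}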

2.2.6 Dataflow visualization—visual debugging

The visual design of an interaction technique requires a profound understanding of the data flow and the semantics of the designed pipeline. For instance, to detect and analyze interaction patterns such as gestures or multimodal input, researchers or interaction designers should be able to quickly get an overview of the interaction data flow during a particular time span. In compliance with the pipe-and-filter metaphor, we integrate a dataflow visualization at the centre of each pipe (see Fig. 2). This simple yet powerful view (see Fig. 10) visualizes the data flow through its corresponding pipe with respect to its temporal and spatial attributes. At a glance, users are able to inspect a massive amount of data, as well as data occurring in parallel, according to its spatial location and chronological order.

Direct manipulation of the time span allows the user to adjust the range to their current needs. The visual representation of data depends on the type that the data belongs to (e.g. the representation of a position in 2D differs from the representation of a recognized gesture; see Fig. 10). Thus, users benefit from the insight into the interaction data flow by getting a better understanding of the effect of different parameter settings.

Every node in Squidy operates strictly within its own thread and therefore implies multi-threading and concurrent data processing without any additional effort. This allows a higher bandwidth and enhances the data throughput. Nevertheless, users may produce errors while implementing nodes or use incorrect parameter settings. This can cause side effects (e.g. an array index out of bounds) that in consequence may lead to an inaccurate interaction technique or a local error (other nodes run in separate threads and are therefore unaffected). Thus, the design environment supplies each project, pipeline, node and pipe (in the following we call these shapes) with a visual colour-coded outer-glow effect (see Fig. 2) that represents its current status. Three distinct colours (green, red, grey) are uniquely mapped to classes of conditions. A green glowing shape indicates a properly operating node implementation running underneath. Additionally, pipes possess a green illumination when interaction data is flowing or has recently been flowing through them. A red glow indicates that an error has occurred during execution of the node implementation (e.g. an unhandled exception such as a NullPointerException). Then, all connected outgoing pipes of a defective pipeline or node are given the same error colour-coding status to enhance error detection and allow faster error correction. Shapes that are not activated (not running) and pipes that currently do not have interaction data flowing through them receive a grey illumination.


Fig. 10 Dataflow visualization showing the values of all forwarded data objects of a pipe within a defined time span

Thereby, without any need for interaction, the user can perceive the status of the data flow between the nodes of the pipeline.

Occasionally, researchers or interaction designers require the capability to preclude parts of the interaction data from being propagated to nodes (e.g. disabling the buttons pressed on a laser pointer and instead using gestures to control an application). Thus, a pipe provides two opportunities to narrow the set of interaction data flowing through it. The first possibility for reducing the set of interaction data is before the data is visualized by the dataflow visualization. This allows the user to visually debug designated types of data. The second possibility is immediately after the data comes out of the dataflow visualization. The user can visually debug the data but nevertheless prevent it from being forwarded to nodes connected downstream. Users are able to zoom into the data-type hierarchy view (see Fig. 4) and select (which means the data is forwarded) or deselect a data type by clicking on it. In Fig. 4 all data types are selected and therefore have a coloured background. A deselected data type would just have a coloured border.

3 Squidy use cases

Over the last two years, we iteratively developed, used and enhanced Squidy during the course of applying it in several diverse projects. The starting point was the need for an infrastructure that facilitates the design and the evaluation of novel input devices and interaction techniques in multimodal and ubiquitous environments.

The first input device that we implemented with Squidy was an interactive laser pointer. This enabled flexible interaction with a large, high-resolution display such as the Powerwall located at the University of Konstanz (221 inches, 8.9 megapixels) from any point and distance [19]. A major issue of this interaction technique was the pointing imprecision introduced by the natural hand tremor of the user and the limited human hand-eye coordination [21]. Squidy improved the design and research process by providing the opportunity to interactively implement, change and empirically test diverse smoothing techniques without introducing side effects. In an iterative approach, the dataflow was visualized, the filter logic was adapted, the filter parameters were optimized, and the resulting interaction technique was finally evaluated based on a Fitts' law tapping test (ISO 9241-9), which is also provided in Squidy as a ready-to-use component (see Fig. 12). Thus, Squidy supported the entire development lifecycle, resulting in very efficient and effective project progress.
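For reference, such a tapping test characterizes pointing performance via Fitts' law. In the Shannon formulation commonly used with ISO 9241-9 (standard background, not taken from this paper), the effective index of difficulty and the throughput are

ID_e = \log_2\!\left(\frac{D}{W_e} + 1\right), \qquad TP = \frac{ID_e}{MT}

where D is the target distance, W_e the effective target width derived from the observed endpoint spread, and MT the mean movement time.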

In a follow-up project we specifically made use of the separation of the three layers in Squidy, since we could easily apply the laser pointer interaction to an artistic installation. This scenario utilized the laser pointer for interaction but came with a very different display and visualization technique. Surrounded by 360° satellite images of the earth, visitors to the "Globorama" installation explored the entire globe with the laser pointer and submerged at selected points in geo-referenced panoramic photographs or webcam images of the respective location (see Fig. 11).


Fig. 11 Globorama installation on a 360° panoramic screen. On the left: top view of the panorama screen, 8 m in diameter. On the right: visitors exploring the Globorama installation with an interactive laser pointer

Fig. 12 Laser pointer interaction in front of a large high-resolution display. Squidy facilitated the integration and evaluation of precision-enhancing and smoothing techniques, allowing the precise selection of targets as small as 22 mm in diameter from a 3 m distance [21]

The visitors were able to physically move inside the 360° panoramic screen (8 m in diameter, 8192 × 928 px) while dynamically controlling zooming and panning and selecting interesting areas [20]. We also augmented the laser pointer with a vibrator for tactile feedback and multiple RGB LEDs for visual feedback on the device. The tactile feedback was given whenever the visitor moved the cursor over an active element such as a geo-referenced photograph, and the colour LEDs visualized the current status of the system. The "Globorama" installation was exhibited at the ZKM Center for Art and Media in Karlsruhe (2007) and at the ThyssenKrupp Ideenpark 2008 in Stuttgart.

Squidy was also used to design functional prototypes for personal information management with interactive television sets [15]. For this application domain, the Nintendo Wii, in its role as a standard input device for home entertainment, was integrated into Squidy. Although the device, the application, and the display were completely different to the previous projects, the smoothing filters implemented for the laser pointer could be applied to the Wii and proved to be very beneficial, since both the Nintendo Wii and the laser pointer share an important similarity in being absolute pointing devices. Furthermore, the wiigee gesture recognition toolkit [26] was integrated into Squidy to enable three-dimensional gestures with the Nintendo Wii. Although the toolkit was originally designed for the Nintendo Wii, the laser pointer can be used interchangeably, since Squidy unifies the individual data types of the devices with the generic data types commonly defined in Squidy.
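The generic data types themselves are not listed here; the following sketch only illustrates the underlying idea, with type and field names of our own choosing rather than the actual Squidy classes. Each device driver translates its native samples into a shared, device-independent position type, so filters and toolkits written against that type work for the laser pointer and the Wii alike.

```java
/**
 * Illustrative sketch of a device-independent position type (names are ours,
 * not the actual Squidy data types). Drivers for the laser pointer and the
 * Wii would both publish instances of this type into the pipeline.
 */
public final class GenericPosition2D {
    public final double x;              // normalized to [0, 1]
    public final double y;              // normalized to [0, 1]
    public final long timestampMillis;  // time of the sample
    public final String source;         // e.g. "laserpointer" or "wiimote"

    public GenericPosition2D(double x, double y, long timestampMillis, String source) {
        this.x = x;
        this.y = y;
        this.timestampMillis = timestampMillis;
        this.source = source;
    }
}

/** Any filter written against the generic type is reusable across devices. */
interface PositionFilter {
    GenericPosition2D process(GenericPosition2D sample);
}
```

Under such a scheme, a smoothing filter like the one sketched earlier could sit behind either driver without modification, which is precisely the reuse effect described above.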

In the context of Surface Computing, we conceptually and technically combined multiple touch-sensitive displays aiming to provide a more ubiquitous user experience based on the naturalness and directness of touch interaction [16]. In this scenario, we integrated mobile handhelds (Apple iPhone) as personal devices as well as shared multi-touch tables (Microsoft Surface) and large high-resolution walls (eyevis Cubes) for collaborative design and visual information organization. In order to facilitate multimodal input and context-aware applications, we integrated speech recognition, mobile eye tracking [14] and freehand gestures [9]. To further close the gap between the digital and the physical world, we enhanced this environment with digital pens for interactive sketching and the possibility of interacting with physical tokens on the diverse multi-touch displays (see Fig. 14). All of these techniques were integrated in, and driven by, Squidy and installed in our interaction lab known as the Media Room (see Fig. 13). This physical infrastructure in combination with Squidy as the common software infrastructure gives an ideal design and development environment for researchers and interaction designers developing the user interfaces of the future.
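How these devices are wired together is not spelled out in code in the paper. As a loose sketch of the dataflow idea only, the tiny Source/InputEvent API below is hypothetical and merely mimics the concept of feeding several modality sources into one application sink; it is not Squidy itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Illustrative sketch (hypothetical API, not Squidy): a minimal dataflow
 * setup that merges events from several modalities into one application sink.
 */
public class MediaRoomPipelineSketch {

    /** A generic event carrying its modality and a textual payload description. */
    public record InputEvent(String modality, String payload) { }

    /** A source pushes events to all registered consumers. */
    public static class Source {
        private final List<Consumer<InputEvent>> sinks = new ArrayList<>();
        public void connect(Consumer<InputEvent> sink) { sinks.add(sink); }
        public void publish(InputEvent event) { sinks.forEach(s -> s.accept(event)); }
    }

    public static void main(String[] args) {
        Source touchTable = new Source();   // e.g. touch events from a multi-touch table
        Source speech = new Source();       // e.g. recognized voice commands
        Source eyeTracker = new Source();   // e.g. gaze positions from mobile eye tracking

        // One application sink receives all modalities in a unified form.
        Consumer<InputEvent> application =
                e -> System.out.println(e.modality() + ": " + e.payload());

        touchTable.connect(application);
        speech.connect(application);
        eyeTracker.connect(application);

        // Simulated input for demonstration purposes only.
        touchTable.publish(new InputEvent("touch", "finger down at (0.42, 0.17)"));
        speech.publish(new InputEvent("speech", "command 'zoom in'"));
        eyeTracker.publish(new InputEvent("gaze", "fixation at (0.55, 0.60)"));
    }
}
```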



4 Conclusion and future work

“Creating interactive systems is not simply the activity of translating a pre-existing specification into code; there is significant value in the epistemic experience of exploring alternatives” (Hartmann et al. [13]).

This statement is especially true for the design of multimodal interfaces, since there is no well-established body of knowledge and no ready-to-use solution for multimodal interfaces the designer can take advantage of.

Fig. 13 The Media Room is a lab environment which provides a variety of input and output devices with different modalities and form factors. Squidy serves as the basic technology to integrate these different devices as well as to configure and evaluate them

Interaction designers need to physically explore and prototype new interaction modalities and therefore require development environments that especially support the interactivity and the dynamics of this creative development process. We presented the Squidy Interaction Library, which supports the interactive design of multimodal user interfaces with a three-part contribution. First, it provides a software architecture that offers the flexibility needed for rapid prototyping and the possibility to integrate a vast variety of heterogeneous input devices and signal processing filters.

Fig. 14 Multi-touch surfaces augmented with physical tokens used in the context of the blended museum

Fig. 15 This cloud shows how Squidy contributes to the development lifecycle of multimodal interaction techniques. Each phase in the lifecycle, whether it is the design and prototyping, the implementation and testing, or the usability evaluation phase, is surrounded by a variety of Squidy features that support the interaction designer or developer during this activity



Second, the Squidy visual user interface introduces a new user interface concept that combines visual dataflow programming with semantic zooming in order to reduce the visual and technical complexity. This visual approach also enables a high degree of interactivity, which is further supported by the fluid integration of code views, filter mechanisms, and visualization tools. Third, the Squidy Interaction Library not only focuses on rapid prototyping but also provides advanced development techniques and tool support for the empirical evaluation of the developed interfaces. Figure 15 shows a high-level feature cloud of the Squidy Interaction Library with respect to the different development phases. The appropriateness of Squidy to the actual design and research process was demonstrated in practice by the presented use cases. Additionally, we will conduct qualitative usability tests in order to validate and inform the design of the Squidy user interface concept. Up to now, the Squidy Interaction Library has not provided multi-user support. This, and the integration of version control, will be future work.

The Squidy Interaction Library is free software and is published at http://hci.uni-konstanz.de/squidy/ under the LGPL licence.

References

1. Ballagas R, Ringel M, Stone M, Borchers J (2003) iStuff: a physical user interface toolkit for ubiquitous computing environments. In: CHI ’03: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 537–544

2. Ballagas R, Memon F, Reiners R, Borchers J (2007) iStuff mobile: rapidly prototyping new mobile phone interfaces for ubiquitous computing. In: CHI ’07: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 1107–1116

3. Benoit A, Bonnaud L, Caplier A, Jourde F, Nigay L, Serrano M, Damousis I, Tzovaras D, Lawson J-YL (2007) Multimodal signal processing and interaction for a driving simulator: component-based architecture. J Multimodal User Interfaces 1(1):49–58

4. Bouchet J, Nigay L (2004) Icare: a component-based approach for the design and development of multimodal interfaces. In: CHI ’04: CHI ’04 extended abstracts on human factors in computing systems. ACM, New York, pp 1325–1328

5. Bouchet J, Nigay L, Ganille T (2004) Icare software components for rapidly developing multimodal interfaces. In: ICMI ’04: Proceedings of the 6th international conference on multimodal interfaces. ACM, New York, pp 251–258

6. Buxton W (1983) Lexical and pragmatic considerations of input structures. SIGGRAPH Comput Graph 17(1):31–37

7. Card SK, Mackinlay JD, Robertson GG (1991) A morphological analysis of the design space of input devices. ACM Trans Inf Syst 9(2):99–122

8. Card SK, Newell A, Moran TP (1983) The psychology of human-computer interaction. Erlbaum Associates, Hillsdale

9. Foehrenbach S, König WA, Gerken J, Reiterer H (2008) Natural interaction with hand gestures and tactile feedback for large, high-res displays. In: MITH 08: Workshop on multimodal interaction through haptic feedback, held in conjunction with AVI 08: international working conference on advanced visual interfaces

10. Gellersen H, Kortuem G, Schmidt A, Beigl M (2004) Physical prototyping with smart-its. IEEE Pervasive Comput 3(3):74–82

11. Greenberg S, Fitchett C (2001) Phidgets: easy development of physical interfaces through physical widgets. In: UIST ’01: Proceedings of the 14th annual ACM symposium on user interface software and technology. ACM, New York, pp 209–218

12. Harper R, Rodden T, Rogers Y, Sellen A (2008) Being human: human-computer interaction in the year 2020. Microsoft Research, Cambridge

13. Hartmann B, Abdulla L, Mittal M, Klemmer SR (2007) Authoring sensor-based interactions by demonstration with direct manipulation and pattern recognition. In: CHI ’07: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, New York, pp 145–154

14. Herholz S, Chuang LL, Tanner TG, Bülthoff HH, Fleming R (2008) Libgaze: real-time gaze-tracking of freely moving observers for wall-sized displays. In: VMV ’08: Proceedings of the 13th international workshop on vision, modeling, and visualization

15. Jetter H-C, Engl A, Schubert S, Reiterer H (2008) Zooming not zapping: demonstrating the ZOIL user interface paradigm for iTV applications. In: Adjunct proceedings of the European interactive TV conference, Salzburg, Austria, July 3–4, 2008. Demonstration session

16. Jetter H-C, König WA, Reiterer H (2009) Understanding and designing surface computing with ZOIL and Squidy. In: CHI 2009 workshop—multitouch and surface computing

17. Kaltenbrunner M, Bovermann T, Bencina R, Costanza E (2005) TUIO—a protocol for table based tangible user interfaces. In: Proceedings of the 6th international workshop on gesture in human-computer interaction and simulation

18. Kirsh D, Maglio P (1994) On distinguishing epistemic from pragmatic action. Cognitive Sci 18(4):513–549

19. König WA, Bieg H-J, Schmidt T, Reiterer H (2007) Position-independent interaction for large high-resolution displays. In: IHCI ’07: Proceedings of the IADIS international conference on interfaces and human computer interaction 2007. IADIS Press, pp 117–125

20. König WA, Böttger J, Völzow N, Reiterer H (2008) Laserpointer-interaction between art and science. In: IUI ’08: Proceedings of the 13th international conference on intelligent user interfaces. ACM, New York, pp 423–424

21. König WA, Gerken J, Dierdorf S, Reiterer H (2009) Adaptive pointing: design and evaluation of a precision enhancing technique for absolute pointing devices. In: Interact 2009: Proceedings of the twelfth IFIP conference on human-computer interaction. Springer, Berlin, pp 658–671

22. Lawson J-YL, Al-Akkad A-A, Vanderdonckt J, Macq B (2009) An open source workbench for prototyping multimodal interactions based on off-the-shelf heterogeneous components. In: EICS ’09: Proceedings of the first ACM SIGCHI symposium on engineering interactive computing. ACM, New York

23. MacKenzie IS, Ware C (1993) Lag as a determinant of human performance in interactive systems. In: CHI ’93: Proceedings of the INTERACT ’93 and CHI ’93 conference on human factors in computing systems. ACM, New York, pp 488–493

24. Myers B, Hudson SE, Pausch R (2000) Past, present, and future of user interface software tools. ACM Trans Comput-Hum Interact 7(1):3–28

25. Perlin K, Fox D (1993) Pad: an alternative approach to the computer interface. In: SIGGRAPH ’93: Proceedings of the 20th annual conference on computer graphics and interactive techniques. ACM, New York, pp 57–64

26. Schlömer T, Poppinga B, Henze N, Boll S (2008) Gesture recognition with a Wii controller. In: TEI ’08: Proceedings of the 2nd international conference on tangible and embedded interaction. ACM, New York, pp 11–14



27. Serrano M, Nigay L, Lawson J-YL, Ramsay A, Murray-Smith R, Denef S (2008) The OpenInterface framework: a tool for multimodal interaction. In: CHI ’08: Extended abstracts on human factors in computing systems. ACM, New York, pp 3501–3506

28. Signer B, Norrie MC (2007) Paperpoint: a paper-based presentation and interactive paper prototyping tool. In: TEI ’07: Proceedings of the 1st international conference on tangible and embedded interaction. ACM, New York, pp 57–64

29. Taylor RM II, Hudson TC, Seeger A, Weber H, Juliano J, Helser AT (2001) VRPN: a device-independent, network-transparent VR peripheral system. In: VRST ’01: Proceedings of the ACM symposium on virtual reality software and technology. ACM, New York, pp 55–61

30. Wallace VL (1976) The semantics of graphic input devices. SIGGRAPH Comput Graph 10(1):61–65