
Audio Engineering Society

Convention Paper 5768
Presented at the 114th Convention

2003 March 22–25 Amsterdam, The Netherlands

This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Techniques and applications of wearable augmented reality audio

Aki Härmä (1), Julia Jakka (1), Miikka Tikander (1), Matti Karjalainen (1), Tapio Lokki (2), Heli Nironen (2), Sampo Vesa (2)

(1) Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015, HUT, Finland
(2) Telecommunications Software and Multimedia Laboratory, Helsinki University of Technology, P.O. Box 5400, 02015, HUT, Finland

Correspondence should be addressed to Aki Härmä ([email protected])

ABSTRACT
The concept of augmented reality audio characterizes techniques where the real sound environment is extended with virtual auditory environments and communications scenarios. This article introduces a framework for Wearable Augmented Reality Audio (WARA) based on a specific headset configuration and a real-time audio software system. We review the relevant literature and aim at identifying the most promising application scenarios for WARA. Listening test results with a prototype system are presented.


1 INTRODUCTION
The era of wearable audio appliances started with the introduction of the portable cassette player two decades ago. The development of digital technology led to portable CD players and finally to fully digital mp3 players. Digital cellular phones have been around for ten years. While speech communications is the main application, many manufacturers have recently integrated a digital audio player into a phone to enable high-quality audio playback. However, the basic application scenario for wideband audio is still the same as in the early Walkmans. Another trend, related to the aging population in developed countries, is that the number of users of hearing aid devices is constantly increasing. With digital technology the quality of hearing aid devices has improved significantly while prices are dropping, preparing the way for an even higher number of users. Yet another related application is personal active hearing protectors.

We may have multiple wearable audio appliances but only one pair of ears. At some point it may make sense to integrate all those functions into the same physical device. Mechanical and electrical integration is already feasible. However, in application scenarios there are many interesting new possibilities and problems to explore. Also, the progress in speech and audio technology, computing, and communications predicts the introduction of completely new types of intelligent and interactive audio and speech applications. For example, auditory displays which can provide a user with different types of information in the form of spatialized sound events have been introduced.

We consider a device which a user could be wearing at all times. It would resemble portable audio players in some respects and also provide speech and audio communications services, e.g., over a wireless network. At the same time, it would also make it possible for a user to hear and interact with the real acoustic environment in a natural way. Thus it would facilitate ordinary speech communications with other people, safe navigation in traffic, and the operation of machines where acoustic feedback is important. In addition, there would be a large number of new functions providing information and communications channels that are not available in a natural acoustic environment or in current appliances.

The possibility to hear the natural acoustic environment around a user differentiates the concept of augmented reality audio from the traditional concept of a virtual audio environment, where a user is typically immersed into a completely synthetic acoustic environment. The proposed system for WARA requires specific transducer systems and auralization techniques. In the prototype system introduced in this article, the transducer configuration is based on a headset where miniature microphones are integrated into the earphone elements in both ears. When the microphone sounds are routed directly to the earphones, a user perceives a representation of the real acoustic environment. Since the experience may differ from the open-ear case, we call this representation a pseudo-acoustic environment. It has been demonstrated that users can adapt to a modified binaural representation [1]. Virtual and synthetic sound events, such as the talk of a remote user, music, or audio markers, are superimposed onto the pseudo-acoustic sound environment in a device which may be called an augmentation mixer. At one extreme, virtual sounds can be combined with the pseudo-acoustic signals in the augmentation mixer in such a way that a user may not be able to determine which sound sources are local and which ones are artificially rendered by means of digital signal processing. Preliminary listening tests show that this is relatively easy to achieve using personalized in-situ HRIRs in the synthesis of virtual sounds.

Real-time implementation of a WARA system requires low latency for audio and seamless integration of audio streams, signal processing algorithms, and network communications. We will introduce a modular and flexible software architecture developed for testing of different application scenarios.

One class of applications is based on localized audio messages. For example, we may consider an auditory Post-it application where a user can leave and receive audio messages related to certain places or objects in the environment. We review the technology for detecting the location and orientation of a user in those applications and report recent experimental results. In speech communication applications the use of binaural microphones makes it possible to explore new ways of communication between people.

We start with an introductory section where we put the WARA concept into a wider context and define the central parts of the proposed framework. Then we review the literature and give an organized representation of previous works and ideas in related application fields and hardware solutions available at the time of writing this paper. In Sections 4 and 5 we introduce a software platform developed for testing of different techniques and applications.

2 REAL, VIRTUAL, AND AUGMENTED AUDIO ENVIRONMENTS
The basic difference between real and virtual sound environments is that virtual sounds originate from another environment or are artificially created. Augmented reality audio (or augmented audio reality) combines these aspects in such a way that real and virtual sound scenes are mixed and virtual sounds are perceived as an extension to the natural ones. At one extreme, an augmented reality audio system should pass a test which is closely related to the classical Turing test for artificial intelligence [2]. That is, if a listener is unable to determine whether a sound source is a part of the real or a virtual audio environment, the system implements a subjectively perfect augmentation of the listener's acoustic environment. At the other extreme, virtual auditory scenes could be rendered in high quality such that they are easily separable from real ones by characteristics that are not possible in normal acoustic environments.

In the current paper the focus is on developing techniques for future wearable applications. Hence, it is clear that the transducers used for producing virtual sounds must be wearable. In practice, headphones or earphones are the most viable alternatives. Headphones have been used successfully in many virtual reality applications reviewed in Section 3. Figs. 1 a and b illustrate a user in a real acoustic environment and with a headphone-based virtual acoustic system where the sound environment is created using a computer.

2.1 Headphone problems in virtual spatial audio
Headphone auralization often leads to a perceived effect of having the virtual source localized inside the listener's head. This is usually called intracranial, or inside-the-head locatedness (IHL) [3]. Spatial localization of sound sources which are perceived to be inside the listener's head is termed lateralization. It has been demonstrated that a listener can make a clear distinction in headphone listening between localized (that is, outside-of-head) and lateralized sound sources and that these two types can coexist in the listener's experience [4].

Fig. 1: A listener in a) real and b) virtual environment.

The effect of a lateralized sound in headphone listening can be produced using amplitude and delay differences between the two headphone channels corresponding to each source. In order to make a sound source externalized, more sophisticated binaural techniques are needed [5]. In particular, we may list the following aspects:

1. Spectrum differences in the two ear signals due to head-related transfer functions, HRTFs. In a laboratory environment, personalized HRTFs can be used to produce a realistic illusion of an externalized source [6]. However, there is a great variability in HRTFs among subjects.

2. Acoustic cues such as the amount of reverberation in the virtual sound. It has been demonstrated that the use of artificial reverberation can help in forming an externalized sound image in headphone listening [7].

3. Dynamic cues related to head turning and other movements of a listener or a source. The virtual source should be stable in relation to the environment. In headphone listening this requires that the auralization processor is controlled by information on the position and orientation of the listener's head; see the review of head-tracking techniques in Section 6.

4. Multimodality aspects such as the connection of a sound event to a visible real-world object.

5. The user's expectations of the performance of the system also affect the externalization and localization of virtual sound sources, see, e.g., [8].
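The simple amplitude and delay panning mentioned before this list only lateralizes a source; a minimal sketch of it follows (the ITD and ILD values are illustrative, not taken from the paper):

```python
import numpy as np

def lateralize(mono, fs, itd_s=0.0005, ild_db=6.0):
    """Pan a mono signal between the two headphone channels using a crude
    interaural time difference (ITD) and interaural level difference (ILD).
    Positive itd_s/ild_db push the image towards the left ear here."""
    delay = int(round(itd_s * fs))             # ITD as whole samples
    gain = 10.0 ** (-ild_db / 20.0)            # attenuate the far (right) ear
    left = np.concatenate([mono, np.zeros(delay)])
    right = np.concatenate([np.zeros(delay), mono]) * gain
    return np.stack([left, right], axis=0)     # (2, N) headphone buffer

# Example: a 1 kHz tone lateralized towards the left ear.
fs = 44100
t = np.arange(int(0.5 * fs)) / fs
stereo = lateralize(np.sin(2 * np.pi * 1000 * t), fs)
```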

2.2 The proposed transducer system
In this study, we focus on a rather specific type of microphone-earphone system. There are many other alternatives, but this one was chosen as a starting point because it seems to have many beneficial properties and it directly facilitates testing of rather unconventional applications and services. It is a microphone-earphone system where microphones are mounted directly on earplug-type earphone elements and are therefore placed very close to the ears. Ideally, the whole system could fit in the user's ears and there would be no additional wires. The wearability of this device would be the same as for a hearing aid device. Earplug-type earphones have a low power consumption, which results in extended operating times and low weight of the device. For testing we constructed two different types of headsets. The attenuation of direct sound in model I, measured in an anechoic chamber with an audiometer, is plotted in Fig. 2. In model I, the earphones are of earplug type and provide 10-30 dB attenuation of direct sound. In model II, the headphones are traditional open earphones placed at the entrance of the ear canal and providing only 1-5 dB attenuation. The right ear element of model II is shown in Fig. 3.

The use of earplug-type headphones makes it possible to control the signals entering the listener's ears more accurately than, e.g., with open headphones. For example, one may mute or amplify sounds selectively if the direct acoustic propagation of sound into the ears is suppressed by the blocking of the ear canal by the device. The attenuation of external sounds is one of the parameters which need to be studied in comparing different transducer systems.

Fig. 2: Attenuation in headset model I (earplug). [Plot: attenuation (0 to -40 dB) as a function of frequency, titled "Attenuation of WARA headset model I ear plugs".]

Microphones are located in the ears or very close to the ears on both sides. When the microphone signals are routed directly to the earphones, the system exposes a user to a 'binaural' representation of the real acoustic environment around the user. However, in practice it is almost impossible to position and tune the transducers so that the signals entering the listener's ears are identical to those in the open-ear case. Hence, the produced spatial impression corresponding to the real acoustic environment is also altered. To make a distinction between the physically real acoustic environment and its electronic representation in the user's ears we call the latter a pseudo-acoustic environment. This is illustrated in Fig. 4 a, where a user is wearing a microphone-earphone system.

2.3 Pseudo-acoustic environment
The pseudo-acoustic environment is a modified representation of the real acoustic environment around a user. In many applications it is most convenient to try to make the pseudo-acoustic environment as identical to the real environment as possible. In principle, this could be achieved by means of digital filtering of the microphone signals. Equalization filters could be estimated in a measurement where a probe microphone is inserted into the ear canal. However, this is difficult and would probably lead to highly individualized filters specific to some particular piece of equipment. Therefore, in the current phase, we only try to control the signal level and do some coarse equalization of the signals to make the pseudo environment sound as natural as possible. Accordingly, some difference is expected to remain in the user's spatial impression. However, it has been demonstrated in many experiments that listeners can adapt to atypical [1] or supernormal [9] binaural inputs. Therefore, we may expect that a user could adapt to the use of the proposed transducer system, too.

Fig. 3: Model II headset element with an open-type earphone (constructed of a Sony MDR-ED268LP earphone and an electret microphone element). The position of the microphone element is indicated by the arrow.
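As a rough sketch of the level control and coarse equalization mentioned above, the following applies an overall gain and a gentle first-order treble shelf to one microphone channel; the parameter values are placeholders, not settings from the prototype:

```python
import numpy as np
from scipy.signal import lfilter

def coarse_eq(x, fs, gain_db=0.0, treble_db=0.0, fc=3000.0):
    """Very coarse level and tone correction for one microphone channel:
    an overall gain plus a first-order high-frequency shelf built from a
    one-pole low-pass (output = low + k * (x - low))."""
    g = 10.0 ** (gain_db / 20.0)
    a = np.exp(-2.0 * np.pi * fc / fs)           # one-pole low-pass coefficient
    low = lfilter([1.0 - a], [1.0, -a], x)       # low-frequency part of x
    k = 10.0 ** (treble_db / 20.0)               # linear gain for the high part
    return g * (low + k * (x - low))

# Both ear signals would be processed identically so that the interaural
# cues of the pseudo-acoustic environment are preserved.
```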

The proposed model could also be used to produce a modified representation of reality which could be advantageous, more convenient, or just an entertaining feature for a user. For example, in some cases the system could provide hearing protection, hearing aid functions, noise reduction, spatial filtering, or emphasize important sound signals such as alarm and warning sounds.

2.4 Virtual sound environment
A virtual sound environment is created using traditional techniques of auralization [10, 11, 12, 13]. This mainly involves binaural filtering using HRTF filters. However, this may not be sufficient for good externalization of sound sources, especially in a reverberant pseudo-acoustic environment. Therefore it is probably necessary to bring some early reflections and reverberation also to the virtual sound environment to make it match the local environment better. It is currently an open research issue how this should actually be done, and therefore many different approaches have to be tested.
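A minimal auralization sketch along these lines, assuming an HRIR pair for the desired direction and, optionally, a common reverberant impulse response are available as inputs (none of these are data from the paper):

```python
import numpy as np
from scipy.signal import fftconvolve

def auralize(source, hrir_left, hrir_right, reverb_ir=None, wet_db=-12.0):
    """Render a mono virtual source binaurally: convolve it with a left/right
    HRIR pair for the desired direction and, optionally, add a common
    reverberant tail so that the source blends better with a reverberant
    pseudo-acoustic environment."""
    sigs = [fftconvolve(source, hrir_left), fftconvolve(source, hrir_right)]
    if reverb_ir is not None:
        sigs.append(10.0 ** (wet_db / 20.0) * fftconvolve(source, reverb_ir))
    n = max(len(s) for s in sigs)
    left, right, *rest = [np.pad(s, (0, n - len(s))) for s in sigs]
    if rest:                         # add the same reverberant tail to both ears
        left = left + rest[0]
        right = right + rest[0]
    return np.stack([left, right], axis=0)
```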

In some applications the rendered virtual sound environment should be independent of the position and rotation of the user's head. The sound source should be localized and often somehow connected to the real environment around the user. In those applications some system for finding the position and orientation of the user is needed. Several alternative techniques for head-tracking are reviewed in Section 6.

2.5 Augmented reality audio environment
An augmented audio environment is produced by superimposing the virtual sound environment onto the pseudo-acoustic environment. First, the local environment should be delivered to a user in such a way that its loudness and binaural properties are acceptable to the user. Secondly, the virtual sound environment should be mixed with the local environment carefully to produce a coherent perception for the user. A goal is to find the best mixing rules for the local and virtual environments, leading to a meaningful fusion of these two main components of augmented reality audio. In Fig. 4b the mixing of the pseudo-acoustic and virtual audio environments is performed in a device which is called the augmented reality audio (ARA) mixer.
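The best mixing rules are left open above; one trivially simple rule is a weighted sum of the two binaural streams, sketched here with illustrative gains:

```python
import numpy as np

def ara_mix(pseudo, virtual, pseudo_gain_db=0.0, virtual_gain_db=-6.0):
    """One conceivable 'mixing rule' for the ARA mixer: a plain weighted sum
    of the binaural pseudo-acoustic input (from the headset microphones)
    and the binaural virtual stream, both shaped (2, N)."""
    n = min(pseudo.shape[1], virtual.shape[1])
    gp = 10.0 ** (pseudo_gain_db / 20.0)
    gv = 10.0 ** (virtual_gain_db / 20.0)
    out = gp * pseudo[:, :n] + gv * virtual[:, :n]
    return np.clip(out, -1.0, 1.0)   # avoid clipping the earphone output
```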

Fig. 4: A listener in a) pseudo-acoustic and b) augmented environment (the ARA mixer is shown in both panels).


Fig. 5: A listener in an augmented environment and another user experiencing telepresence based on a real-time binaural recording.

The proposed scheme makes possible a number of different application ideas. One of them is illustrated in Fig. 5. The person on the left-hand side is using the WARA system introduced earlier, and the person on the right, at a remote location, can experience telepresence based on direct transmission of the first user's pseudo-acoustic environment. For the user on the right, the ARA mixer would combine the local pseudo-acoustic environment with the remote pseudo-acoustic environment to produce a specific type of augmented audio reality experience.

2.6 Listening test
A preliminary listening test was carried out in order to evaluate the level of performance of the WARA system in virtually imitating a localized pseudo-acoustic sound. The test was carried out in a standardized listening room and the authors of this paper were used as test subjects. The setup consisted of a head-supporting chair for the test subject wearing the WARA headset, and a loudspeaker not visible to the subject. A similar test setup has been used earlier in studies on the externalization of virtual sources, e.g., in [6]. The listening test system is illustrated in Fig. 6.

In the beginning of the test session, an impulse response from the loudspeaker to the WARA headset microphones in the test subject's ears was measured. The binaural impulse response contains both the subject's individual head-related impulse response in relation to the location of the loudspeaker and the room response. Two types of test signals were then played for the test subject to compare.
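The paper does not state how the impulse responses were measured; assuming the excitation signal is known, a generic regularized frequency-domain deconvolution would recover one response per ear:

```python
import numpy as np

def measure_ir(excitation, recorded, eps=1e-8):
    """Estimate an impulse response from a known excitation played over the
    loudspeaker and the signal recorded at one headset microphone, using
    regularized frequency-domain deconvolution H = Y X* / (|X|^2 + eps)."""
    n = len(excitation) + len(recorded) - 1
    X = np.fft.rfft(excitation, n)
    Y = np.fft.rfft(recorded, n)
    H = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n)

# One response per ear: repeat for the left and right headset microphones
# to obtain the binaural (head-related room) impulse response pair.
```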

Fig. 6: The listening test configuration. Head-related Room Impulse Responses (HRIR) were measured for each listener. The HRIRs were then used to synthesize test signals for headset playback. In the listening test, test signals were switched between loudspeaker and earphone playback.

The pseudo-acoustic signals were acoustically dry signals played from the loudspeaker. The virtual test signals were the same signals filtered with the measured binaural response in order to auralize them. The test was a two-alternative forced choice test, where the subject was asked which one of the signals was played from the loudspeaker.

As both types of test signals are perceived through the earphones, hypothetically no difference between the signals should be audible. On the other hand, a perceivable difference may occur due to the direct sound leaking into the ear past the headset system in the case of loudspeaker playback, and also due to the low-frequency partials being conducted to the ear by bone. In particular, the model II headset attenuated direct sound leaking into the ear by only a few decibels. In the listening test, the level of the pseudo-acoustic environment in the listener's ears was 3-5 dB higher than it would have been without the headset. The listening test results show that pseudo-acoustic sound can indeed be imitated almost indistinguishably. The test subjects were able to recognize or guess the source of the sound correctly in only 68% of the cases (pure guessing gives 50%). The sources of the samples with a broader frequency band were easier to distinguish, whereas distinguishing the sources of simple instrument sounds or speech was almost impossible. It seems that the pseudo-acoustic sound masks the leaking direct sound very efficiently.

The test was carried out using the WARA model II headset. Similar results were obtained with headset model I with a limited number of listeners. A more extensive listening test on the mixing of pseudo-acoustic and virtual sound environments is currently being prepared.

3 TAXONOMY AND OVERVIEW OF PREVIOUS APPLICATIONS
Different types of applications can be categorized in many ways. Some cases are clearly examples of communications while others can be seen as information services. Another meaningful division is to consider whether a particular application represents human-to-human or human-to-machine communications. Yet another important division is the way virtual audio events are connected to the real environment: in some cases the created virtual events are localized and in other cases freely-floating. In the following, ideas and applications have been categorized into three different classes.

3.1 Speech communications
It can be expected that speech communications will remain one of the most important applications for wearable augmented reality audio systems. However, it may take many different forms. As a communications application, Figure 1b would show the case where a remote talker is presented to a user using techniques of binaural auralization. Fig. 4 b shows the case of augmented reality audio where the virtual source is mixed with the pseudo-acoustic environment around the user. Conceptually this can be interpreted as a situation where the remote talker has been rendered into the user's local environment. Figure 5 illustrates a quite different scenario where the user on the right-hand side is virtually transported to the location of the first talker. This is called telepresence.

The idea of telepresence was approached already in 1933, when binaural hearing was examined by letting test persons with two receivers listen to sound examples through microphones in a dummy head's ears, the dummy head being in another room [14].

Cohen et al. introduced a system where a robot puppet was controlled from a distance by a pilot wearing effectors corresponding to the puppet's sensors, so that the puppet could venture into hazardous environments like fires or toxic waste dumps [15]. The relationship between acoustic telepresence and virtual reality presentations was examined, considering virtual reality as the perception of the sound field by the dummy head, yielded by HRTF filtering. Letting the user dynamically select the transfer functions, and thus place the sound sources himself, was also experimented with. No head-tracking was used.

Visiphone [16] is an "augmented home" system, a telephone augmented with graphics, that can be used for casual hands-free and limitedly mobile conversation over a distance, with a visual reminder, a feedback system, and a focal point for directing speech. In some sense the idea could be described as an audio version of internet relay chat.

Another speech communications application is teleconferencing. A spatialized stereo audio conferencing system was designed by J. Virolainen [17]. The advantage of being able to virtually localize the other attendees around the user helps the user to separate the different speakers, making the conference more natural and easier to lead among multiple attendees. The naturalness and accuracy of the perception of directions are significantly increased when head-tracking is added to the system. It is also possible to extend the application to wearable augmented reality teleconferencing.

3.2 Localized acoustic events connected to real world objects
The idea here is to immerse into one's normal audio scene additional audio information about the real environment, e.g., alerting sounds, instructions, even advertisements. The point is to connect them to real world objects by localizing them auditorily. Superimposing synthesized sound on top of the real audio scene, instead of isolating one from the real world by building a virtual audio reality, was originally proposed by M. Krueger [18].

The LISTEN project [19] reports on augmenting the physical environment through a dynamic soundscape, which users experience over motion-tracked wireless headphones. The application experiment promotes a type of information system for intuitive navigation of a visually dominated exhibition space, in this experiment a museum. The augmenting part of the application consists of speech, music, and sound effects creating a dynamic soundscape, including exhibit-related information as well as "the right atmosphere", immersed individually in the visitor's auditory scene based on his/her spatial behaviour and also on information entered beforehand. The experiment includes a pedagogical as well as an artistic approach.

The idea of an automated tour guide was already proposed by Bederson [20] to replace taped tour guides, which were found obtrusive to normal socializing. Bederson's prototype application included a random access digital audio source, a microprocessor to control the audio source, an infrared receiver to tell the processor the location of the viewer, and an infrared transmitter in the ceiling above each object to tell the system to play the pre-recorded description of the piece, and also to stop playing when the subject moved away.

A spatial auditory display was evaluated by air crews on simulated night mission flights [21]. The ordinary visual display was compared to one augmented with a spatial auditory display. The results indicate that an improved auditory system would improve navigation.

A navigation aid for the visually impaired is considered in [22], where the person's visual environment is registered by a camera and transformed into a sound scene providing information about obstacles or possible danger. The equipment used is a camera, a portable computer, and a pair of earphones.

A method to teach visually impaired persons orientation and mobility skills has been described by Inman [23], where simulated environments are used to train sound identification, localization, and the tracking of sound sources. The simulation was implemented on a platform consisting of a Pentium-based computer, headphones, a head-tracker, and a joystick. As the student moved his or her head, the virtual environment sound source or sources changed accordingly. The different training tasks included identifying and reacting to the simulated sound sources in a particular way, completing an action such as crossing a street in the virtual environment, and completing an action by getting audio feedback from the student's own actions, such as walking along a line. The environment was also displayed on a screen so that the trainer could follow the student's actions.

The KnowWhere System is a method to present geographic information to blind people [24] by using a touchable virtual map surface recorded by a camera and transforming the features touched on the map into signals describing the objects.

Guided by Voices [25] is a game application that uses an audio augmented reality system to provide location-triggered, context-related playback sounds that inform the player. The sounds depend on the current location and also on the history of locations of an individual player. The system is based on wearable computers and an RF-based location system with transmitters placed in the environment.

Another game based on an auditory display system is Sleuth [26], a game where the player is set in the middle of a crime scene and is supposed to use the information given by the auditory clues to determine the course of the crime and find the guilty one.

3.3 Freely-floating acoustic events
Freely-floating acoustic events, in contrast to localized events, are events that are not connected to objects in the subject's physical environment. Typically this means that the only anchor point relative to which an event is localized is the user's head. Potential applications are information services such as news, calendar events, or announcements, or many different forms of entertainment such as music listening.

A prime example of a freely-floating spatial audio event is the DateBook application introduced in [27]. Here the calendar application of a PDA device was remapped to a spatial audio display of calendar events. Calendar entries at different times were rendered around the user so that noon appeared in front of the user, 3 pm at the right, and 6 pm behind the user.

The Nomadic Radio project [28, 29] presents the idea of a wearable audio platform that provides access to all the messaging and information services that are nowadays normal on the desktop, as well as an awareness of people and background events situated elsewhere, in a location of individual interest. The audio messaging and information services, such as email, voice mail, news broadcasts and weather reports, are downloaded to the device throughout the day and delivered to the user as synthesized speech. The messages are localized around the user's head based on their time of arrival. Different categories of messages can also be assigned different levels of urgency or interest, and localized at different distances from the head. The awareness of a certain remote location is based on the Cocktail Party Effect [30], meaning that one is able to monitor several audio streams simultaneously, selectively focusing on one and leaving the rest in the background. This is even easier if the streams are segregated by localizing them away from each other. The device chosen as the Nomadic Radio is a wearable shoulder-mounted audio-only device. The messages are broadcast as a spatial audio stream and synthesized speech. Navigation and control of the system is done using a set of speech commands.

MacIntyre et al. [31] considered a combination of auditory and visual displays to provide both foreground interaction and background information, and on the other hand both public and personal information. The focus is on developing new user interface paradigms and software, and creating testable prototype applications. The current audio part of the system is Audio Aura [32]. Audio Aura is a distributed system that employs a wearable active badge to transmit individualized information to wireless headphones. The goal is to provide the user with auditory cues about other people's physical actions in the workplace, such as information on how long ago a certain person left a room or where he is at the moment. The system detects the traffic and updates a database. Certain changes in the database cause auditory cues to be sent.

One aspect of localizing acoustic events is sonification. It is mostly understood as an additional, supporting dimension of visual interfaces to help navigation or the performing of tasks. Sonification as a topic has been studied extensively. The concept in relation to a prototype computer graphics extension has been studied by V. Salvador et al. [33], and an experiment with a mobile phone interface was carried out by S. Helle et al. [34]. In the computer graphics system study it was found that there are even more ways to implement sonification in the system than was predicted. In the mobile phone experiment the sonification was found both irritating and also helpful on some occasions.

3.4 Other topics
In relation to wearable augmented audio equipment there are some interesting new aspects concerning hearing aid design. To reduce the noise amplified along with the desired signal (usually speech), some new techniques have been introduced. The Perceptual Time-Frequency Subtraction algorithm presented by Min Li et al. [35] was developed to simulate the masking phenomena and thereby enhance the speech signal, and it is meant to be applicable both in digital hearing aids and in portable communication systems.

The individualization of spatialized audio displays has been researched by P. Runkle et al. [36], who developed a system of active sensory tuning (AST) in which subjective preferences can be set.

4 THE WARA SOFTWARE SYSTEM
A generic block diagram of the WARA system sketched in Section 2 is shown in Fig. 7. The signals entering the user's ears are composed of pseudo-acoustic input signals captured with microphones located close to the ears, and virtual sounds which may be speech signals from remote talkers or some other signals, such as recorded or synthesized announcements, advertisements, instructions, warnings, or music. Mixing of the two components is performed in the Augmented Reality Audio (ARA) mixer. The network shown above the head in Fig. 7 is used to produce the output signal of the system. Microphone signals are recorded and some preprocessing may be applied to produce a monophonic signal which is, in a communications application, transmitted to another user. Alternatively, the binaurally recorded signal (the remote pseudo-acoustic signal for the other user, see Fig. 5) could be transmitted in its original form. Both options have been implemented in the current software system.

The WARA software system is written in C++ and is (almost) completely built on the Mustajuuri framework. Mustajuuri is a plugin-based real-time audio signal processing software package which is designed for low-latency performance [37]. The different configurations required by the applications are constructed by adding plugins into a mixer console window (Figure 8) of the Mustajuuri program. The Mustajuuri plugin system allows different configurations to be built in a very quick and flexible way.

5 WARA APPLICATIONS
Different application schemes have been tried using the plugins of the WARA software system as building blocks. In this section these applications are overviewed and discussed.

Fig. 7: A generic system diagram applicable to the majority of augmented reality audio applications studied in this report. [Blocks shown: input signal, ARA mixers, HRTFs, auralization, acoustics model, location and orientation, acoustics parameters, preprocessing for transmission, output signal.]

5.1 Binaural telephone
The basic application of the WARA system is a binaural telephone. It is a two-way voice-over-IP communication scheme using the binaural signals that the WARA transducers provide. The binaural telephone enables a kind of telepresence rather than a normal telephone call, since the whole 3-D soundscape is transmitted to the receiving end. The binaural signals are transmitted via the UDP protocol in their original form to the remote end of the communication. In the current implementation no compression is applied to the binaural signals, but for more efficient transfer some compression could be utilized.
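As an illustration of such uncompressed binaural transmission, the sketch below sends and receives interleaved frames over UDP; the frame size, 16-bit sample format, and addresses are assumptions for the sketch, not details of the actual Mustajuuri plugins:

```python
import socket
import numpy as np

FRAME = 256                       # samples per channel per packet (illustrative)
REMOTE = ("192.0.2.1", 5004)      # hypothetical remote end

def send_binaural(sock, left, right):
    """Interleave one uncompressed 16-bit binaural frame (left/right arrays of
    FRAME float samples in [-1, 1]) and send it as a single UDP datagram."""
    frame = np.empty(2 * FRAME, dtype=np.int16)
    frame[0::2] = np.clip(left * 32767, -32768, 32767).astype(np.int16)
    frame[1::2] = np.clip(right * 32767, -32768, 32767).astype(np.int16)
    sock.sendto(frame.tobytes(), REMOTE)

def receive_binaural(sock):
    """Receive one datagram and de-interleave it back to float channels."""
    data, _ = sock.recvfrom(4 * FRAME)           # 2 channels x 2 bytes/sample
    frame = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32767.0
    return frame[0::2], frame[1::2]

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
```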

In informal usability testing it was found that the implemented binaural telephone really gives a convincing feeling of telepresence. The soundscape of the remote end is naturally transmitted to the other user. It is easy to get confused about which people actually are in which room, if there are other people besides the ones wearing the headsets at both ends. However, when the system was used in a meeting (with only one user wearing the WARA headset), an obvious shortcoming was discovered: the user at the remote end gets frustrated when he or she cannot talk to the other persons, those not wearing a headset, at the other end of the communication. Another problem in such a multiuser situation is that the volume level of the local speaker tends to be much louder than the level of the other speakers in the room.

5.2 Speech communications with headtracked auralization

When applying the WARA system as a binaural telephone, the remote talker is localized inside the user's head. This does not sound very natural, and it would be better if the remote talker could be positioned at a certain place in the user's auditory environment. To position a remote talker, a speech communication application with headtracked auralization has been implemented. This application is similar to the binaural telephone, but it also contains plugins for head-tracked or non-head-tracked auralization of the incoming audio signal. The binaural signal is transmitted via the UDP protocol in its original form and then converted to a mono signal and finally auralized with HRTFs. An example of such an application, implemented with Mustajuuri [37], is illustrated in Fig. 8.

Fig. 8: Mustajuuri mixer console with the two-way speech communications application.

The remote talker can be positioned outside the user's head by means of room acoustic modeling and HRTF filtering. Room acoustic modeling would consist of the synthesis of a number of early reflections and late reverberation.

In the current version, the WARA auralization has been implemented by applying the well-known concept of direct sound, early reflections, and statistical late reverberation (see, e.g., [10]). For the auralized sound source (a remote talker) the direct sound and 14 early reflections, six first-order and eight lateral-plane second-order reflections, are calculated from a simple shoebox room model with user-adjustable wall, floor, and ceiling locations. Each of these reflections is panned to the correct direction using FIR filters, the responses of which have been matched to the minimum-phase HRTF functions. The applied filter orders for each of the three virtual sound source groups (direct sound, 1st and 2nd order reflections) can be adjusted individually, and the computational burden can thus be reduced [38]. Informal listening tests suggest that the early reflections can be rendered with fewer filter taps than the direct sound without degrading the spatial impression.
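As a sketch of how such a shoebox image-source model yields the delay, 1/r gain, and arrival direction of the direct sound and the six first-order reflections (the constant reflection gain here stands in for the material absorption filters discussed below, and the lateral second-order images would be obtained by mirroring the first-order images once more):

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def first_order_images(src, lst, room, fs, refl_gain=0.8):
    """Delays (samples), 1/r gains and arrival directions of the direct sound
    and the six first-order image sources of a shoebox room.
    src, lst: (3,) positions in metres; room: (Lx, Ly, Lz) dimensions."""
    images = [np.array(src, float)]                  # index 0: direct sound
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = np.array(src, float)
            img[axis] = 2.0 * wall - img[axis]       # mirror across the wall
            images.append(img)
    out = []
    for k, img in enumerate(images):
        vec = img - np.asarray(lst, float)
        r = np.linalg.norm(vec)
        gain = (1.0 / max(r, 0.1)) * (1.0 if k == 0 else refl_gain)
        out.append((int(round(fs * r / C)), gain, vec / max(r, 1e-9)))
    return out   # each entry would feed one HRTF-matched FIR panning filter

# e.g. first_order_images((2, 3, 1.5), (4, 2, 1.5), (6, 5, 2.5), 44100)
```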

In addition to the HRTF filtering, each early reflection is filtered with a material absorption filter. Such filtering smooths out the comb-filter-like coloration caused by the successive summing of an identical sound source; the fact that 1/r-law attenuation is implemented and the echoes come from different directions means that the summed sounds are not exactly identical, but they are close enough for some unnatural coloration to take place. In the current version of the WARA auralization the material filters are first-order IIR filters with coefficients calculated from given absorption coefficients at octave bands [38]. The most important thing about the material filtering is simply to obtain somewhat realistic low-pass filtering for the reflections. The early reflections and their parameters are controlled with a plugin illustrated in Fig. 9.

The early reflections, although they are computed from a simplified geometry, help in the externalization of the virtual sound sources, but the auralized sound is still somewhat unnatural. For even better results, some diffuse late reverberation is added to match the length of the reverberation with the reverberation of the pseudo-acoustic environment. The applied reverberation algorithm [39] is a variant of the feedback delay network (FDN) algorithm. The parameters for the artificial reverberation (reverberation time etc.) are chosen manually by analysing the reverberation of the pseudo-acoustic environment. A more sophisticated approach would be to estimate the parameters automatically from the binaural microphone signals (see Section 6.4).
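A toy feedback delay network illustrating the idea (the delay lengths, the Hadamard feedback matrix, and t60 are illustrative; the actual algorithm of [39] differs in its details):

```python
import numpy as np

def fdn_reverb(x, fs, t60=0.5, delays=(1031, 1327, 1523, 1871)):
    """Tiny 4-line feedback delay network (FDN) late reverb.  The feedback
    matrix is a scaled (orthogonal) Hadamard matrix; per-line gains are set
    so that each loop decays by 60 dB in t60 seconds."""
    A = 0.5 * np.array([[1, 1, 1, 1],
                        [1, -1, 1, -1],
                        [1, 1, -1, -1],
                        [1, -1, -1, 1]], dtype=float)
    g = np.array([10.0 ** (-3.0 * d / (fs * t60)) for d in delays])
    bufs = [np.zeros(d) for d in delays]             # circular delay lines
    idx = [0] * len(delays)
    y = np.zeros(len(x))
    for n in range(len(x)):
        outs = np.array([bufs[i][idx[i]] for i in range(4)])  # line outputs
        y[n] = outs.sum()
        feedback = A @ (g * outs)
        for i in range(4):
            bufs[i][idx[i]] = x[n] + feedback[i]     # write new sample
            idx[i] = (idx[i] + 1) % len(bufs[i])
    return y
```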

Fig. 9: Control window of the HRTF auralize plugin.

In cases where virtual sources are localized in the environment, both the room modeling and the binaural synthesis need additional information about the orientation and location of the user. The orientation and location of a user can be estimated using a head-tracking device. A limitation of many current technologies (see Section 6) is the need for an external device placed in the environment. These can be used only at close range and are therefore basically limited to applications where an augmented sound source is localized to some real world object which could also host a transmitter of the head-tracking system.

5.3 Auditory Post-it application
The WARA framework also enables different location-based applications. One such application is the auditory Post-it, in which short messages can be attached to certain objects or left in certain places.

With the current implementation of the auditory Post-it, it is possible to leave and receive five messages. However, with small adjustments to the code it would be possible to compile versions with any number of messages. A message is left by recording it to one of the files found in the common soundfile directory. The coordinates of the message(s) are given in a text file. Messages are received by selecting the command Get new messages, see Fig. 10. After this, by selecting Play all messages once, Play all messages repeatedly, or Play selected message, it is possible to listen to the received messages. If the last one, Play selected message, is chosen, the user also has to select a message to play. All these commands are controlled using a GUI (Fig. 10). It is also possible to change the coordinates of the reference point (the coordinates of the messages given in the text file are referenced to this point) and the size of the message area (radius). The localization of the user is done by a head-tracker which sends the coordinates of the user to the application. When the user, wearing a detector, is inside a message area, the message is played if the selected command allows that.

Fig. 10: Auditory Post-it application.
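The playback trigger reduces to a distance test against the message radius; a hypothetical sketch of that logic (the message table, file names, and play() routine are placeholders, not the actual plugin code):

```python
import numpy as np

# Hypothetical message table: positions in metres relative to the reference
# point of the head-tracker, and the sound file holding each message.
MESSAGES = [
    {"pos": np.array([1.0, 0.2, 0.0]), "file": "message1.wav", "played": False},
    {"pos": np.array([-0.5, 0.8, 0.0]), "file": "message2.wav", "played": False},
]
RADIUS = 0.5   # size of the message area (m)

def check_messages(user_pos, play):
    """Trigger playback of any auditory Post-it whose area the user has
    entered; user_pos comes from the head-tracker, play() is whatever
    routine renders the file at the message position."""
    for msg in MESSAGES:
        inside = np.linalg.norm(np.asarray(user_pos) - msg["pos"]) < RADIUS
        if inside and not msg["played"]:
            play(msg["file"], msg["pos"])
            msg["played"] = True
        elif not inside:
            msg["played"] = False   # re-arm once the user leaves the area
```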

The current version of the auditory Post-it application works well when the messages are located not more than 1-1.5 m from the head-tracker. However, with a better tracker it should be possible to detect messages over a larger area. The next improvement to the auditory Post-it application will be the implementation of the possibility to leave a message while the application is running. In the current version messages have to be defined beforehand.

5.4 Calendar application

The last implemented application is a 3-D audio calendar, based on the idea presented by Walker et al. [40]. The user can record calendar entries and place them into directions corresponding to the time of day: 9 A.M. at -90 degrees azimuth (left of the user), noon at 0 degrees, three o'clock at +90 degrees (right of the user), and so on. The calendar application was implemented in two versions: head-tracked and non-head-tracked. The head-tracked version spins the calendar entries around a reference point (which is specified relative to the transmitter of the head-tracking system) and the non-head-tracked version obviously has the user's head as the only reference point.
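The time-to-direction mapping described above is 30 degrees per hour of the day; a small sketch, with a head-tracked variant that assumes a yaw angle is available from the tracker:

```python
def entry_azimuth(hour):
    """Map the time of day to an azimuth in degrees, following the layout
    described above: 9 a.m. at -90 (left), noon at 0 (front), 3 p.m. at
    +90 (right), 6 p.m. behind the user; i.e. 30 degrees per hour."""
    return ((hour - 12.0) * 30.0 + 180.0) % 360.0 - 180.0   # wrap to [-180, 180)

def world_azimuth(hour, head_yaw_deg):
    """Head-tracked version: keep entries fixed relative to the reference
    point by subtracting the current head yaw before auralization."""
    return ((entry_azimuth(hour) - head_yaw_deg + 180.0) % 360.0) - 180.0

# entry_azimuth(9) -> -90.0, entry_azimuth(12) -> 0.0, entry_azimuth(15) -> 90.0
```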

The calendar application seems to work quite well, even though it would certainly help to have an easier interface to it, some kind of external controller device with which the user could browse through the calendar entries.

6 OVERVIEW OF RELATED TECHNOLOGY
6.1 Headsets

In order to send an acoustic environment from one place to another and then make it audible, there is a natural need for microphones to record the environment or the communication, and for transducers to make the sound audible. As the system is intended to be worn all the time, the transducers have to be small and light. Nowadays some people wear headphones throughout the day to listen to music. Adding a microphone to headphones could then be a logical way to make a two-way transducer system that records and reproduces the sound.

For augmented reality purposes it would be ideal if the surrounding and the augmented acoustic environment could be totally controlled with the headset. To achieve this, the headset should block the surrounding sounds as effectively as possible and simultaneously play the acoustic environment back to the user. With headsets there are practically two ways to attenuate surrounding sound: in-ear passive earplugs and active noise cancelling headphones. Traditional passive hearing protectors would be effective but rather impractical for this purpose. There are various application-dependent headset systems available on the market. Most models rely on passive background noise rejection caused by a tight fit of the headset inside the ear canal, thus working like a traditional earplug. To get more attenuation at low frequencies some models have added electronics to cancel out background noise: there is a small microphone in the earplug and the signal it detects is fed back to the earplug in inverted phase. In some more advanced models there are two microphones and a separate DSP unit for processing the signals. The microphone signal is not available as an output in these headset systems. On the other hand, there are systems where the microphone signal is available either for communication or recording use. Table 1 lists some of the headsets available on the market that have some relevance to the system considered in this article.

6.2 Head-trackers and location devices
Many of the applications discussed in this paper are based on knowledge of the location or orientation of the user. In the following presentation we roughly follow the taxonomy introduced in a recent review article [41].

In WARA applications, tracking of the user's location and orientation takes place on at least two levels. First, in some applications we may be interested in the global position of a user, where the system is aware of the physical coordinates, such as when using the Global Positioning System (GPS) or the emerging European Galileo positioning system. Global information is typically needed in applications in which augmented acoustic events are localized to known real world objects (such as the central train station, the parking lot of a movie theater, or the department copying machine). An overview of different techniques operating at the global, metropolitan, building, or room level is given in [41]. Global position information is needed to launch an application related to some object. In this article we put more emphasis on the problem of head-tracking. Estimates of the short-range relative location and orientation of the user's head are needed for successful augmentation of the acoustic environment.

The three major techniques for location-sensing are triangulation, proximity, and scene analysis. Triangulation can be done via lateration, which uses multiple distance measurements to known points, or via angulation, which measures bearing angles relative to points with known separation. In the proximity technique the distance to a known set of points is measured, and in scene analysis a view from a particular vantage point is examined.

GPS The Global Positioning System (GPS) is the most widely publicized location-sensing system at the moment. GPS is a satellite-based navigation system made up of a network of 24 satellites. The system is governed by the U.S. Department of Defense. GPS satellites circle the earth twice a day in a very precise orbit and transmit signal information to earth. GPS receivers take this information and use triangulation to calculate the user's exact location. Essentially, the GPS receiver compares the time a signal was transmitted by a satellite with the time it was received. The time difference tells the GPS receiver how far away the satellite is. With distance measurements from a few satellites, the receiver can determine the user's position. The accuracy of the system is around 1-5 meters. The system does not work reliably indoors. [42]
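To illustrate the lateration principle used by such systems (not any particular receiver's algorithm), distances to known points can be turned into a position estimate with a linear least-squares solve:

```python
import numpy as np

def laterate(anchors, distances):
    """Least-squares position estimate from distances to known points (the
    'lateration' technique): subtract the first range equation from the
    others to obtain a linear system and solve it."""
    p = np.asarray(anchors, dtype=float)      # shape (m, dim), m >= dim + 1
    d = np.asarray(distances, dtype=float)
    A = 2.0 * (p[1:] - p[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Example: four known points and (noiseless) ranges to the point (1, 2, 3).
pts = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]
true = np.array([1.0, 2.0, 3.0])
rng = [np.linalg.norm(true - np.array(q)) for q in pts]
print(laterate(pts, rng))        # approximately [1. 2. 3.]
```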

GALILEO The Transport Council of the EU started the GALILEO project in 1999. The system is a satellite-based global positioning system and is inter-operable with other systems such as GPS and UMTS. GALILEO will cover all European states. The services offered are divided into three distinct classes: General-Purpose, Commercial, and Public-Utility Services. The General-Purpose Services will be available to everybody without any need for authorisation. The accuracy will be 5-30 meters depending on the technology. The Commercial Services provide an accuracy of 1-10 meters and are available as added-value services on payment of a fee. The Public-Utility Services will provide highly secure and accurate services for safety-of-life and other critical applications. The accuracy will be 1-6 meters. By the end of 2007, GALILEO should be fully operational. [43]

Active Badges Active Badge is an indoor locating system developed at AT&T Cambridge. The system consists of a central server, fixed infrared sensors around the building, and badges worn by the users. The badges emit a globally unique identifier every 10 seconds or on demand. A central server collects this data and determines the absolute location of the badge by using a diffuse infrared technique. The system can only tell the room where the badge is. On the other hand, the badge may have a globally unique ID (GUID) for recognition. Sunlight and fluorescent light interfere with infrared. [41]

Active Bats Also developed by the AT&T researchers, the Active Bat location system uses an ultrasound time-of-flight lateration technique to provide the physical location of an Active Bat tag. The tags emit ultrasonic pulses, synchronized by radio frequency, to a grid of ceiling-mounted sensors. The distance measurement data is forwarded to a central unit that calculates the position. The accuracy is 9 cm for 95 percent of the measurements. The tags can have GUIDs for recognition. [41]

Cricket Instead of central computing, the Cricket Location Support System lets the user calculate the location. The infrastructure consists of beacon transmitters and the user objects act as receivers. Radio frequencies are used for synchronization and to delineate the time region during which the sounds should be received by the receiver. A randomized algorithm allows multiple uncoordinated beacons to coexist in the same space. Cricket implements both the lateration and proximity techniques. The user can be located within an area of about 1.2 x 1.2 meters within a room. Because all the computation is done at the user side, power issues in wearable devices may become a burden. [41]


RADAR RADAR, developed by the Microsoft Research group, is an RF-based location system using the IEEE 802.11 WaveLAN wireless networking technology. At the base station, the system measures the signal strength and signal-to-noise ratio of signals sent by wireless devices. This data is then used to compute the 2D position within the building. There are two implementations of RADAR: one uses scene analysis and the other uses lateration. With scene analysis the accuracy is around 5 meters and with lateration 4.3 meters for 50 percent of the measurements. The system requires at least three base stations per floor. There are also some companies offering 3-D indoor position tracking. [41]

SpotON SpotON is an example of an ad hoc approach where no fixed infrastructure is needed. The system implements ad hoc lateration with low-cost tags. SpotON uses radio signal attenuation to estimate inter-tag distance. Repeating the measurement with a few tags provides more accuracy. The achieved accuracy depends on the number of tags. Ad hoc systems could be made to communicate with other, fixed, location systems to yield more accurate and usable positioning. [41]

6.3 Head-trackers
Head-tracking technologies can be roughly divided into acoustic, electromagnetic, inertial, mechanical, and optical tracking. Any of these techniques can be used to make a 6-degrees-of-freedom tracker (x, y, and z for position and yaw, pitch, and roll for orientation).

Acoustic tracking uses stationary microphones and one or more movable 'user' elements. The user's movable elements emit high-frequency sound that is detected by a collection of stationary microphones. By measuring the time it takes for the sound to travel from transmitter to receivers, the distances can be calculated, and orientation is obtained by using multiple sensors. Because it takes some time for sound to travel from the transmitter to the sensors, time-of-flight trackers suffer from a slow update rate. The speed of sound also depends on environmental variables (temperature, pressure, etc.), which may occasionally cause problems. Acoustic tracking can also be realized with so-called phase-coherence tracking, in which the stationary sensors detect the phase difference between the signal sent by the user's emitter and a stationary emitter. As long as the distances moved are shorter than the wavelengths used, the distance can be determined, and with multiple sensors the orientation can be calculated as well. Because phase-coherence tracking updates the position incrementally, error may accumulate in the location over time.
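The phase-coherence idea can be sketched as follows; the carrier frequency and phase values are illustrative, and the mapping is only valid while the movement stays within one wavelength.

```python
# Sketch of phase-coherence ranging: phase difference -> incremental distance.
import math

SPEED_OF_SOUND = 343.0  # m/s

def distance_from_phase(phase_diff_rad, frequency_hz):
    """Distance corresponding to a phase difference; ambiguous beyond one wavelength."""
    wavelength = SPEED_OF_SOUND / frequency_hz
    return (phase_diff_rad / (2.0 * math.pi)) * wavelength

# 40 kHz ultrasound -> wavelength of about 8.6 mm, so only small relative
# movements can be tracked per update and errors accumulate over time.
print(distance_from_phase(math.pi / 2.0, 40_000.0))  # about 2.1 mm
```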

In electromagnetic tracking, a stationary element emits a pulsed magnetic field. A movable sensor attached to the user senses the field and reports the position and orientation relative to the source. Electromagnetic tracking can be accurate to within 2.5 mm in position and 0.1 degree in rotation, although accuracy deteriorates with distance from the source. Electromagnetic tracking is also susceptible to interference from metallic objects in the environment.

Inertial tracking devices represent a different, mechanical approach relying on the principle of conservation of angular momentum. These trackers use a couple of miniature gyroscopes to measure orientation changes. If full 6-DOF tracking is required, an inertial tracker must be supplemented by some position-tracking device. A gyroscope consists of a rapidly spinning wheel suspended in a housing; mechanical laws cause the wheel to resist any change in orientation, and this resistance can be measured and converted into yaw, pitch, and roll values. Inertial trackers sometimes suffer from drifting away from the true orientation, an error that accumulates over time, so some kind of occasional verification against another reference is needed to correct it.
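The integration of gyroscope rates into orientation, and the resulting drift, can be sketched as follows; the sample rate and the bias value are made up for illustration.

```python
# Sketch of inertial orientation tracking: integrate angular rates over time.
def integrate_orientation(rates, dt, initial=(0.0, 0.0, 0.0)):
    """Integrate (yaw, pitch, roll) rates in deg/s into angles in degrees."""
    yaw, pitch, roll = initial
    for wy, wp, wr in rates:
        yaw += wy * dt
        pitch += wp * dt
        roll += wr * dt
    return yaw, pitch, roll

dt = 0.01  # 100 Hz update rate
# The head is actually stationary, but the yaw gyro has a 0.5 deg/s bias:
rates = [(0.5, 0.0, 0.0)] * 1000  # 10 seconds of samples
print(integrate_orientation(rates, dt))  # yaw has drifted by about 5 degrees
```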

Mechanical tracker devices measure position and orientation through a direct mechanical connection between a reference point and a target, for example an arm that connects a control box to a headband. Sensors at the joints measure the change in position and orientation with respect to the reference point. Mechanical trackers are in general very fast and accurate.

Optical trackers come in two variants. In the first, one or several cameras are mounted on top of the headset, and a set of infrared LEDs is placed above the head at fixed locations in the environment. In the alternative setup, the cameras are mounted on the ceiling or on a fixed frame, and a few LEDs are placed at fixed, known positions on the headset. In both approaches, the projections of the LEDs on the cameras' image planes contain enough information to uniquely identify the position and orientation of the head, and various photogrammetric methods exist for computing this transformation. Optical trackers in general have high update rates and sufficiently short lags. However, they suffer from the line-of-sight problem: any obstacle between sensor and source seriously degrades the tracker's performance. Ambient light and infrared radiation also adversely affect optical tracker performance.
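As an illustration of the photogrammetric step, the sketch below solves the head pose from four LED projections using OpenCV's solvePnP. The LED layout, camera matrix, and detected image coordinates are illustrative assumptions, not values from any particular tracker, and OpenCV is assumed to be available.

```python
# Sketch: pose of a headset from known LED positions and their camera projections.
import numpy as np
import cv2

# Four LEDs mounted in a plane on the headset, in head coordinates (metres).
led_points_3d = np.array([
    [-0.08,  0.00, 0.0],
    [ 0.08,  0.00, 0.0],
    [ 0.00,  0.06, 0.0],
    [ 0.00, -0.06, 0.0],
], dtype=np.float32)

# Pixel coordinates where those LEDs were detected in the camera image.
led_points_2d = np.array([
    [256.0, 240.0],
    [384.0, 240.0],
    [320.0, 288.0],
    [320.0, 192.0],
], dtype=np.float32)

# Simple pinhole camera model (focal length and principal point in pixels).
camera_matrix = np.array([[800.0,   0.0, 320.0],
                          [  0.0, 800.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(4)  # assume an undistorted image

ok, rvec, tvec = cv2.solvePnP(led_points_3d, led_points_2d,
                              camera_matrix, dist_coeffs)
print("rotation (Rodrigues vector):", rvec.ravel())
print("translation (m):", tvec.ravel())  # roughly (0, 0, 1) for these values
```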

Table 2 lists some head-trackers available on the market.

6.4 Estimation of room acoustical parameters
Acoustic parameters, such as the amount of reverberation and the geometric properties of the real environment around the user, may be needed for successful augmentation of the pseudo-acoustic environment. There are basically two mechanisms for acquiring this information. In some applications the information can be derived off-line and made available to the system as data associated with that particular location; this relies on a global localization system and an extensive database, and is basically the approach taken in the LISTEN project [19]. A more generic alternative is on-line estimation of acoustic parameters from the microphone input signals. The latter approach may be more suitable for wearable devices.
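As a sketch of the on-line approach, the example below estimates one such parameter, the reverberation time, from an impulse response that is assumed to have already been identified from the microphone signals; the synthetic exponential decay stands in for a real measurement.

```python
# Sketch: reverberation time from an impulse response via Schroeder integration.
import numpy as np

def rt60_from_impulse_response(h, fs):
    """Backward-integrate the energy, fit a line to the -5..-25 dB portion of
    the decay, and extrapolate to -60 dB (a T20-style estimate)."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]            # Schroeder decay curve
    decay_db = 10.0 * np.log10(energy / energy[0])
    t = np.arange(len(h)) / fs
    mask = (decay_db <= -5.0) & (decay_db >= -25.0)
    slope, _ = np.polyfit(t[mask], decay_db[mask], 1)  # dB per second
    return -60.0 / slope

# Synthetic test: noise with an exponential envelope, RT60 of about 0.5 s.
fs = 16000
t = np.arange(int(0.8 * fs)) / fs
h = np.random.randn(len(t)) * np.exp(-6.91 * t / 0.5)
print(round(rt60_from_impulse_response(h, fs), 2), "s")  # roughly 0.5
```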

7 CONCLUSIONS
In this article we presented an overview of the concept of wearable augmented reality audio (WARA). We defined the idea of WARA and compared it to previous work on virtual and augmented reality audio. Different types of applications were discussed and compared in terms of their signal processing and hardware requirements. We also presented an overview of related transducer technology and of the localization and head-tracking devices available at the time of writing.

In the case of open ears, a listener perceives the acoustic environment in its natural form. When wearing a headset with binaural microphones and earphones, the user is exposed to a modified representation of the acoustic environment around the user, which is called here the pseudo-acoustic environment. A traditional example of a pseudo-acoustic environment is that of a person wearing a binaural hearing aid. A virtual sound environment is a synthesized binaural representation or a binaural recording. In augmentation of the user's acoustic environment, virtual sound objects are rendered on the natural, or on a pseudo-acoustic, representation of the sound field around the user. In this article we focused on the latter case, where virtual sound objects are combined with a pseudo-acoustic representation of the acoustic environment around the user. This is obtained using a specific headset in which miniature microphone elements are integrated into small earphone elements; the pseudo-acoustic representation is produced by routing the microphone signals to the earphones.

We argue that it is technically and perceptually easier to produce a convincing fusion of virtual sounds and the acoustic environment around a user if the virtual sounds are embedded into a pseudo-acoustic representation rather than into the natural acoustic representation heard with open ears. In the preliminary listening test results presented in this article, we found that in the idealized case of using in-situ HRIRs to produce virtual sound objects, even experienced listeners typically cannot discriminate virtual sounds from test sounds coming from loudspeakers placed behind a curtain in a listening room. It even turned out that the difference in results is small between earplug-type earphones, which block the ear canal, and conventional in-the-ear earphones, which attenuate direct sounds by only a few decibels. Users also reported relatively fast adaptation to the pseudo-acoustic representation. The results are encouraging and suggest that the proposed transducer system may provide a viable framework for the development of wearable augmented reality audio applications.

Finally, we introduced a system developed by the authors for testing different applications and techniques of WARA. It is based on a specific headset configuration and a real-time software system running on a Linux machine.

8 ACKNOWLEDGEMENTS
A. Härmä's work was partially supported by the Graduate School GETA and the Academy of Finland.

9 REFERENCES

[1] R. Held, “Shifts in binaural localization after prolonged exposures to atypical combinations of stimuli,” J. Am. Psychol., vol. 68, 1955.

[2] A. M. Turing, “Computing machinery and intelligence,” Quarterly Rev. Psychol. Phil., vol. 109, October 1950.

[3] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge, MA, USA: The MIT Press, 1999.

[4] G. Plenge, “On the differences between localization and lateralization,” J. Acoust. Soc. Am., vol. 56, pp. 944–951, September 1972.


[5] B. M. Sayers and E. C. Cherry, “Mechanism of binaural fusion in the hearing of speech,” J. Acoust. Soc. Am., vol. 29, pp. 973–987, September 1957.

[6] W. M. Hartmann and A. Wittenberg, “On the externalization of sound images,” J. Acoust. Soc. Am., vol. 99, pp. 3678–3688, June 1996.

[7] N. Sakamoto, T. Gotoh, and Y. Kimbura, “On out-of-head localization in headphone listening,” J. Audio Eng. Soc., vol. 24, pp. 710–715, 1976.

[8] T. Lokki and H. Järveläinen, “Subjective evaluation of auralization of physics-based room acoustics modeling,” in Proc. Int. Conf. Auditory Display, (Espoo, Finland), pp. 26–31, July 2001.

[9] B. G. Shinn-Cunningham, N. I. Durlach, and R. M. Held, “Adapting to supernormal auditory localization cues. II. Constraints on adaptation of mean response,” J. Acoust. Soc. Am., vol. 103, pp. 3667–3676, June 1998.

[10] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia. New York, USA: Academic Press, 1994.

[11] J. Huopaniemi, Virtual Acoustics and 3-D Sound in Multimedia Signal Processing. PhD thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland, 1999. Report no. 53.

[12] L. Savioja, Modeling Techniques for Virtual Acoustics. PhD thesis, Helsinki University of Technology, Telecommunications Software and Multimedia Laboratory, Espoo, Finland, 1999. Report TML-A3.

[13] T. Lokki, Physically-based Auralization - Design, Implementation, and Evaluation. PhD thesis, Helsinki University of Technology, Telecommunications Software and Multimedia Laboratory, report TML-A5, 2002. Available at http://lib.hut.fi/Diss/2002/isbn9512261588/.

[14] H. Fletcher, “An acoustic illusion telephonically achieved,” Bell Laboratories Record, vol. 11, pp. 286–289, June 1933.

[15] M. Cohen, S. Aoki, and N. Koizumi, “Augmented audio reality: Telepresence/VR hybrid acoustic environments,” in Proc. IEEE International Workshop on Robot and Human Communication, pp. 361–364, IEEE, July 1993.

[16] J. Donath, K. Karahalios, and F. Viégas, “Visiphone,” in International Conference on Auditory Display, ICAD, April 2000.

[17] J. Virolainen, “Design and implementation of a stereo audio conferencing system,” Master's thesis, Helsinki University of Technology, 2001.

[18] M. W. Krueger, Artificial Reality II. Addison-Wesley, 1991.

[19] G. Eckel, “Immersive audio-augmented environments: The LISTEN project,” in Proc. of the Fifth International Conference on Information Visualisation, pp. 571–573, IEEE, 2001.

[20] B. Bederson, “Audio augmented reality: A prototype automated tour guide,” in Human Factors in Computing Systems, pp. 210–211, ACM, 1995.

[21] R. D. Shilling, T. Letowski, and R. Storms, “Spatial auditory displays for use within attack rotary-wing aircraft,” in International Conference on Auditory Display, (Atlanta, Georgia, USA), ICAD, April 2000.

[22] N. Belkhamza, A. Chekima, R. Nagarajan, F. Wong, and S. Yaacob, “A stereo auditory display for visually impaired,” in IEEE TENCON 2000, pp. II-377–382, IEEE, 2000.

[23] D. P. Inman, K. Loge, and A. Cram, “Teaching orientation and mobility skills to blind children using computer generated 3-D sound environments,” in International Conference on Auditory Display, ICAD, April 2000.

[24] M. W. Krueger and D. Gilden, “KnowWhere(TM): An audio/spatial interface for blind people,” in International Conference on Auditory Display, ICAD, November 1997.

[25] K. Lyons, M. Gandy, and T. Starner, “Guided by voices: An audio augmented reality system,” in International Conference on Auditory Display, (Atlanta, Georgia, USA), ICAD, April 2000.


[26] T. M. Drewes, E. D. Mynatt, and M. Gandy, “Sleuth: An audio experience,” in International Conference on Auditory Display, ICAD, April 2000.

[27] A. Walker, S. A. Webster, D. McGookin, and A. Ng, “A diary in the sky: A spatial audio display for a mobile calendar,” in Proc. BCS IHM-HCI 2001, (Lille, France), pp. 531–540, Springer, 2001.

[28] N. Sawhney and C. Schmandt, “Design of spatialized audio in nomadic environments,” in International Conference on Auditory Display, ICAD, November 1997.

[29] N. Sawhney and C. Schmandt, “Nomadic radio: Speech and audio interaction for contextual messaging in nomadic environments,” ACM Transactions on Computer-Human Interaction, vol. 7, pp. 353–383, September 2000.

[30] E. C. Cherry, “Some experiments on the recognition of speech, with one and two ears,” J. Acoust. Soc. Am., vol. 25, pp. 975–979, September 1953.

[31] B. MacIntyre and E. D. Mynatt, “Augmenting intelligent environments: Augmented reality as an interface to intelligent environments,” in AAAI 1998 Spring Symposium Series, Intelligent Environments Symposium, (Stanford University, CA), AAAI, March 1998.

[32] E. D. Mynatt, M. Back, R. Want, and R. Frederick, “Audio aura: Light-weight audio augmented reality,” in The Tenth ACM Symposium on User Interface Software and Technology, (Banff, Alberta, Canada), ACM, October 1997.

[33] V. C. L. Salvador, R. Minghim, and M. L. Pacheco, “Sonification to support visualization tasks,” in International Symposium on Computer Graphics, Image Processing, and Vision, pp. 150–157, SIBGRAPI, 1998.

[34] S. Helle, G. Leplâtre, J. Marila, and P. Laine, “Menu sonification in a mobile phone - a prototype study,” in Proceedings of the 2001 International Conference on Auditory Display, (Espoo, Finland), pp. 255–260, ICAD, July 2001.

[35] M. Li, H. G. McAllister, N. D. Black, and T. A. D. Pérez, “Perceptual time-frequency subtraction algorithm for noise reduction in hearing aids,” IEEE Transactions on Biomedical Engineering, pp. 979–988, September 2001.

[36] P. Runkle, A. Yendiki, and G. H. Wakefield, “Active sensory tuning for immersive spatialized audio,” in International Conference on Auditory Display, ICAD, April 2000.

[37] T. Ilmonen, “Mustajuuri - An Application and Toolkit for Interactive Audio Processing,” in Proceedings of the 7th International Conference on Auditory Display, 2001.

[38] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen, “Creating interactive virtual acoustic environments,” Journal of the Audio Engineering Society, vol. 47, pp. 675–705, September 1999.

[39] R. Väänänen, V. Välimäki, J. Huopaniemi, and M. Karjalainen, “Efficient and parametric reverberator for room acoustics modeling,” in Proc. Int. Computer Music Conf. (ICMC'97), (Thessaloniki, Greece), pp. 200–203, September 1997.

[40] A. Walker, S. Brewster, D. McGookin, and A. Ng, “Diary in the Sky: A spatial audio display for a mobile calendar,” in Proceedings of BCS IHM-HCI 2001, pp. 531–540, 2001.

[41] J. Hightower and G. Borriello, “Location systems for ubiquitous computing,” in Trends, Technologies & Applications in Mobile Computing (N. Dimitrova, ed.), special report 5, pp. 57–66, IEEE Computer Society, 2001.

[42] “GPS primer.” http://www.aero.org/publications/GPSPRIMER/index.html.

[43] “Global satellite navigation services for Europe,” in GALILEO Definition Summary, ESA, European Space Agency, 2000.

[44] “Invisio and Bluespoon communication earplugs.” http://www.nextlink.to.

[45] “KOSS in-ear headsets.” http://www.koss.com.

[46] “Etymotic headsets.” http://www.etymotic.com.

[47] “OKM microphones.” http://www.okmmicrophones.com.

[48] “SINTEF.” http://www.sintef.no/units/informatics/RoSandSoS/.


[49] “Polhemus 3D tracking devices.” http://www.polhemus.com.

[50] “Motion trackers for computer graphic applica-tions.” http://www.ascension-tech.com.

[51] “InterSense motion trackers.” http://www.isense.com.


NEXTLINK.TO [44]
  Invisio: An in-ear communication earplug with an internal microphone. The microphone detects jaw-bone vibration and is therefore fairly insensitive to surrounding noise and other sounds. The plug is wire-connected to a radio unit.
  Bluespoon: A wireless, Bluetooth version of the Invisio. Available only for the right ear.

Sony
  MDR-NC10: An in-ear noise-cancelling headphone. The earplug fits tightly in the ear canal and attenuates surrounding sound fairly well even without switching the noise cancelling on. The plugs are a bit bigger than conventional earplugs. The noise cancelling is done by analog means and works fairly well at low frequencies.
  MDR-NC11: A newer model of the NC-10. The earplug is much smaller and lighter; otherwise similar to the NC-10.
  MDR-EX70LP: A small in-ear earplug. It fits tightly inside the ear canal and attenuates the surroundings fairly well, especially at higher frequencies. Similar to KOSS's ThePlug.

KOSS [45]
  ThePlug: The plug consists of a foam-like cushion that fills the ear canal and thereby blocks the surrounding sounds. Similar to Sony's MDR-EX70LP.
  Complug: A communication earplug with a microphone. The earplug is an in-ear type, and background noise rejection is increased with electric noise-cancelling circuitry.

Etymotic Research [46]
  ER-4: A high-end in-ear headphone with very high passive background noise attenuation (20-25 dB). There are slightly different models (binaural, power, and stereo) for different applications.
  ER-6: Similar to the ER-4 but with a little less isolation from the background noise.

OKM [47]
  In-ear microphones: In-ear electret microphones for binaural recordings. There is no headphone in this system.

PARAT [48]
  Communication earplug: A communication earplug developed at the research institute SINTEF (Trondheim, Norway). The earplug consists of a miniature loudspeaker and two microphones: one microphone is outside the plug and the other is located inside it. The earplug also has a built-in microchip for signal processing and a built-in radio unit for wireless communication. In normal situations the earplug passes the surrounding sounds through, but when high noise levels occur it blocks them. The user's voice can be recorded via the outside microphone or, in a noisy environment, via the inner microphone.

Table 1: Transducer review


Brand / model | Method | Range | DOF | Other

Logitech
  Head-Tracker | acoustic | 1.6 m | 6 | maximum angle: 100-degree cone

Polhemus [49]
  FASTRAK | magnetic | 1.2-2 m | 6 | max. 16 receivers
  ISOTRAK | magnetic | 1.6-2.3 m | 6 | max. two receivers

Ascension [50]
  3D Bird | inertial | cable length | 3 | orientation only, computing in a PC
  Flock of Birds | magnetic | 3 m | 6 | up to four sensors
  Laser Bird | optical | 2 m | 6 | high update rate (240 Hz), max. angle ±55 deg horizontal and ±54 deg vertical up to 1.2 m
  pciBIRD | magnetic | 76 cm | 6 | small sensors (8 and 22 mm), computing in a PC
  miniBIRD | magnetic | 76 cm | 6 | miniature sensors (10 x 5 x 5 mm)
  Nest of Birds | magnetic | 91 cm | 6 | up to four sensors, USB
  Motion Star | magnetic | up to 3 m | 6 | also a wireless version; primarily for motion tracking of up to five characters with 18 sensors

InterSense [51]
  900-series | inertial | 20-72 m² | 6 | also a wireless version available; up to four sensors simultaneously
  600-series | inertial/ultrasonic | 16-50 m² | 6 | can be used wirelessly if only orientation is needed
  300-series | inertial | cable length | 3 | orientation only, up to four sensors
  InterTrax | inertial | cable length | 3 | orientation only, USB
  InertiaCube | inertial | cable length | 3 | orientation only, computing in a PC

Table 2: Head-trackers
