Binaural Reproduction System for Loudspeakers


- based on optimal source distribution principle -

Section of Acoustics, Department of Electronic Systems

Aalborg University 2008, 3rd Master Semester

Project Group 08gr960

Department of Electronic Systems

Section of Acoustics

Fredriks Bajers Vej 7

http://es.aau.dk/sections/acoustics/

Title:

Binaural Reproduction System for Loudspeakers - based on optimal source distribution principle

Theme:

Acoustics and sound perception

Project period:

E9, fall semester 2008

Project group:

08gr960

Participants:

Morten Christophersen
Mark Aarup Mikaelsen
Jens Martin Oddershede
Thomas Deleuran Rasmussen
Steffen Stengaard Villadsen

Supervisors:

Yesenia Lacouture Parodi
Per Rubak

Copies: 8

Number of pages: 167

Enclosures: DVD

Date of completion: December 18th, 2008

Abstract:

This report describes and documents the design and implementation of a binaural reproduction system using loudspeakers for multimedia purposes. Based on the theory of optimal source distribution and the least square method, two implementation models were suggested on the basis of a two-way loudspeaker setup. Through brute force simulations, the models were analyzed with respect to their performance in channel separation, sweet spot size and equalization. The model providing the largest sweet spot was selected and used for both objective and subjective testing.

The implemented system was capable of achieving a channel separation of −21/−7.8 dB, denoting the best/worst performance within the sweet spot area. The system showed good equalization for the optimal listening position with a performance error of 0.7 dB. The subjective test compared localization performance with headphones. The headphones provided better binaural reproduction performance; however, the system's performance was almost as good throughout the sweet spot.

The content of this report is freely available, but publication (with reference source) may only be pursued in agreement with the respective authors.

Jens Martin Oddershede<[email protected]>

Morten Christophersen<[email protected]>

Mark Aarup Mikaelsen<[email protected]>

Steffen Stengaard Villadsen<[email protected]>

Thomas Deleuran Rasmussen<[email protected]>

Preface

This report is written by project group 08gr960 at the Section of Acoustics, Department of Electronic Systems at Aalborg University during the 3rd semester of the master programme in the period spanning from September 2nd, 2008 to December 18th, 2008. The project concerns the design of a binaural reproduction system for loudspeakers based on the optimal source distribution principle. According to the curriculum the purpose of the 3rd semester Acoustics programme is to [Aalborg Universitet, 2008]:

• provide the student with a broad background knowledge of acoustics, hearing and audio engineering.

• give the student knowledge of scientific methods in acoustics.

The report is aimed at people with knowledge equivalent to the teaching on the 3rd semester master programme in acoustics. The project was proposed by Yesenia Lacouture.

The reader should pay attention to the following on perusal of this report:

• The report is divided into two major parts:

- The main report which is divided into numbered chapters.

- The appendices which are arranged alphabetically.

• Figures, tables and equations are numbered consecutively according to the chapter number. Hence, the first figure in chapter one is named figure 1.1, the second figure is named figure 1.2 and so on.

• The Harvard method is used for citation. The bibliography can be found after the main report.

• The enclosed DVD contains data sheets, test signals, internet sources, simulation data and MATLAB scripts used in this project.

Aalborg University, December 18th 2008.


Report Overview

The main report contains the following chapters:

Chapter 1: Introduction

This chapter introduces sound reproduction techniques and their relevance in multimedia applications. The principle of crosstalk cancelation and binaural reproduction is described and the problem formulation for the project is stated.

Chapter 2: Human Sound Perception

This chapter describes human sound perception and the ability to localize sounds. An analysis of the localization cues ILD and ITD is given. The localization performance for stationary and moving sound sources is briefly described afterwards. Next, the definition and properties of the HRTF are presented along with a description of HRTF measurement.

Chapter 3: Introduction to Crosstalk Cancelation

Here the aspects of XTC are presented through a basic example. The performance measures CHSP and PE are defined. Sweet spot definitions are introduced to set a focus area for the XTC system. Afterward the effect of loudspeaker positioning is analyzed.

Chapter 4: Requirement Specification

The requirement specification lists and describes the different requirements for the system in terms of loudspeaker placement, sound reproduction method and XTC performance. The chapter ends with a summary of the listed specifications.

Chapter 5: Crosstalk Cancelation Analysis

This chapter gives an analysis of XTC. First, the OSD principle is explained. Afterwards, a channel model for the XTC principle is described. Based on the OSD principle and the XTC channel model, filter gain requirements for an actual implementation are presented. Finally, solution methods for the XTC filter are investigated.


Chapter 6: Implementation

This chapter describes the implementation of the XTC system. Band limitation, binaural synthesis and crossover network for the implementation are described. Three implementation models are hereafter proposed.

Chapter 7: Simulation

This chapter describes the simulation of the implementation models by evaluating their performance. By brute force simulation the optimum parameters for the XTC system are determined, and an analysis of the obtained results is given.

Chapter 8: Objective Test

This chapter describes the objective system test performed to determine CHSP, PE and sweet spot size.

Chapter 9: Subjective Test

This chapter describes the subjective listening test performed to determine how human subjects perceive the effect of the XTC system.

Chapter 10: Conclusions

The final part of the main report concludes on the designed XTC system, and perspectives for the project are discussed.

Appendices

The appendices include measurement reports and additional details as a supplement to the main report. Measurement reports for CTF, CHSP and HPTF are included here. Additional details for least square solutions, computations of singular values, and the listening test are included. Questionnaire templates used in connection with the listening test are also included.


Table of Contents

1 Introduction
1.1 Project Application
1.2 Initial Problem Formulation

2 Human Sound Perception
2.1 Localization of Spatially Distributed Sounds
2.2 Binaural Reproduction

3 Introduction to Crosstalk Cancelation
3.1 Basic Example of XTC
3.2 Performance Measures
3.3 Sweet Spot Definitions
3.4 Loudspeaker Span Analysis

4 Requirement Specification
4.1 Application Setup
4.2 Sound Reproduction
4.3 XTC Performance
4.4 Specification Summary

5 Crosstalk Cancelation Analysis
5.1 Geometric Interpretation of the OSD Principle
5.2 Channel Model
5.3 XTC Source Strength
5.4 Matrix Inversion Methods
5.5 Analysis of LS Method

6 Implementation
6.1 Frequency Range and Loudspeaker Span by OSD
6.2 Bandlimited Binaural Reproduction
6.3 Crossover Network Design
6.4 Model Propositions
6.5 Plant Measurements
6.6 Implementation Algorithms
6.7 Final Implementation Specifications


7 Simulation
7.1 Extent of Simulation
7.2 Optimal Parameters
7.3 Simulation Results

8 Objective Test
8.1 Performance Error
8.2 Sweet Spot Size
8.3 CHSP

9 Subjective Test
9.1 Test Overview
9.2 Test Design
9.3 Results

10 Conclusions
10.1 Perspective

Bibliography

Appendices

A Computation of the Singular Values of C
B Additional Details of Least Square Solutions
C Measurement Report: Channel Transfer Function
D Measurement Report: Channel Separation
E Measurement Report: Listening Test
F Measurement Report: Headphone Transfer Function
G Questionnaires

Enclosure: DVD


Chapter 1: Introduction

Two-channel stereo is a common method of sound reproduction used throughout the world. Two loudspeakers are often placed symmetrically with respect to the listener in order to reproduce the stereo sound. Through stereo reproduction it is possible to auralize a virtual environment between and behind the span of the loudspeakers. The stereo impression is illustrated in figure 1.1a, where the loudspeakers are the actual sound sources. The listener will, however, perceive the sound as coming from the positions of the different illustrated instruments, as if seated in the audience of a concert hall. The orchestra playing will form a space in front of the listener that can be as wide as the span of the loudspeakers. This makes the stereo reproduction technique suitable for music playback. [Gardner, 1997]

(a) Stereo scenario, where the loudspeakers create a sound image space spatially distributed between the span of the loudspeakers.

(b) Binaural scenario, where the loudspeakers create a sound image spatially located around the listener.

Figure 1.1: Black box is the physical room while the red box illustrates the virtual sound space.

In applications such as computer games and movies it is often desired to provide the listener with a more authentic impression of a given scenario, which implies surrounding the listener in a virtual sound space. This can be achieved through playback of binaural signals. The binaural impression is illustrated in figure 1.1b, where the sound of the three illustrated cars comes from two loudspeakers; however, the listener perceives the sound as originating from the positions of the cars. Binaural signals can be produced from either binaural recordings or binaural synthesis. In a binaural recording, the influence of head and ear shape as well as ear spacing is captured in the recording. Binaural synthesis means processing a mono sound to become binaural by synthesizing the influence of the head, the ears and the spacing between the ears, which can be useful for games. The sound impression provided by this reproduction technique will be referred to as 3D sound.

With binaural audio it is desired to replicate the sound pressure at each ear as a natural sound would produce it. This can be achieved using either headphones or loudspeakers. Headphones have the advantage that the left and right channels are naturally separated in the earcups and no effects of room reverberation or external sources affect the listener. As a downside, headphones might not always be convenient or comfortable to wear for long periods of time. Furthermore, the listener may suffer from in-head localization or misperception of front and back localization. [Gardner, 1997, p. 2]

Binaural reproduction is, however, also possible through loudspeakers and is called transaural reproduction. This deals with the inconvenience of wearing headphones. However, each loudspeaker will radiate sound to both ears and thus create undesired crosstalk as seen in figure 1.2.

Figure 1.2: Listener positioned in a conventional stereo setup driven by a two-channel sound source. The input signal xL is played through the left loudspeaker lL, where the sound is also perceived at the right ear via hLR. Hence, the sound pressure perceived at the right ear, yR, includes sound from the left channel. Both hLR and hRL cause undesired crosstalk.

To provide the listener with 3D impressions using binaural signals, crosstalk cancelation (XTC) must be applied. A drawback of transaural reproduction is the restriction to a single listener, which makes it more application dependent.
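The crosstalk situation of figure 1.2 can be illustrated with a minimal single-frequency sketch. The function name `ear_signals` and all numeric gains below are hypothetical values chosen for illustration only; they are not measurements from this project.

```python
# Sketch of the two-loudspeaker, two-ear signal model of figure 1.2 at one
# frequency. All numeric values are made-up illustrations (assumptions).

def ear_signals(x_l, x_r, h_ll, h_lr, h_rl, h_rr):
    """Return (y_l, y_r) for the loudspeaker inputs x_l and x_r.

    h_ab is the complex transfer function from loudspeaker a to ear b,
    so h_lr and h_rl are the crosstalk paths.
    """
    y_l = h_ll * x_l + h_rl * x_r
    y_r = h_lr * x_l + h_rr * x_r
    return y_l, y_r


# Hypothetical paths: direct paths with unit gain, crosstalk paths
# attenuated to half amplitude and shifted 90 degrees in phase.
h_direct = 1.0 + 0.0j
h_cross = 0.5 * complex(0.0, -1.0)

# Only the left channel is driven, yet the right ear still receives sound.
y_l, y_r = ear_signals(1.0, 0.0, h_direct, h_cross, h_cross, h_direct)
print(abs(y_l), abs(y_r))
```

With only the left channel driven, the non-zero magnitude at the right ear is the crosstalk that an XTC system aims to remove.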


1.1 Project Application

The project application concerns the design and implementation of an XTC system applied to a multimedia loudspeaker setup.

Figure 1.3: The multimedia setup seen from the user's point of view, with the monitor placed between the left and right loudspeakers. The span between the loudspeakers and the topology must be determined.

Figure 1.3 shows the setup from the user's point of view, with the loudspeakers symmetrically placed around the monitor. The multimedia setup is especially suitable for an XTC system since:

• Many computer games are able to provide 3D sound signals for headphones (i.e. binaurally synthesized).

• The user is expected to stay seated and always face the monitor. Thereby the multimedia setup can be considered a fixed setup, where the position of the head can be considered fixed.

The last point is important since an XTC system only achieves good performance in a small region called the sweet spot. Hence the nature of a multimedia application is appropriate for XTC.

1.2 Initial Problem Formulation

The goal of this project is to:

Design and implement a binaural reproduction system using loudspeakers, which provides good localization performance and has a crosstalk cancelation robust to minor head movement.

In order to do so, the following questions arise:

How do humans localize sound?


How is 3D sound reproduced through loudspeakers?

How is the performance of such a system validated?

To answer these questions the following topics are analyzed:

• Binaural understanding of the human sound perception.

• Head-related transfer functions (HRTF) in use for binaural synthesis.

• Role of loudspeaker positioning in a multimedia setup using XTC.

• Different XTC implementation methods.

• System validation through a subjective listening test.


Chapter 2: Human Sound Perception

This chapter describes how spatially distributed sounds are perceived by humans and hence localized. The cues for localization are discussed together with the localization performance of humans. Afterwards, a description of the head-related transfer functions (HRTF) that are used for binaural synthesis is given.

2.1 Localization of Spatially Distributed Sounds

The ability to localize sound sources is of importance to humans. This ability makes it possible to localize the sound source and then direct visual focus towards the perceived direction. Localization of sources is determined by angle and distance. To ease the description of direction and distance to a sound source, figure 2.1 shows a coordinate system defining the position of sounds relative to the head.

(a) For φ = 0°, the horizontal plane is described by the angle θ (azimuth).

(b) For θ = 0°, the median plane is described by the angle φ (elevation).

Figure 2.1: Illustrations of angles in the (a) horizontal plane and (b) median plane. The distance from the center of the head to the source is denoted by r.

The location of a source placed at a given point in space can be projected onto the axis of the ears, which describes a lateral displacement in one dimension only. Hence, the lateral displacement does not uniquely define a given direction, but is proportional to the distance to the source. This displacement as a function of the input at the ears is referred to as lateralization. A special case of perceived distance is when the distance is smaller than the radius of the head, which is referred to as inside-head location. This may especially be the case when using headphones. Thus regarding lateralization, no perception of distance to the source is acquired. The perception of distance will be dealt with later. [Blauert, 1997, p. 139]

Some terms useful in the studies of sound localization are:

• Monaural refers to when a sound reaches one ear only.

• Binaural refers to when a sound reaches both ears.

2.1.1 Interaural Cues

The attributes of a monaural signal may yield cues for localization, whereas the comparison of a binaural signal is by far superior regarding cues for localization. The cues based on comparison are referred to as interaural cues. Figure 2.2a illustrates a sinusoid reaching both ears.

(a) A sinusoid played through the loudspeaker reaching the head, shown over time at the ipsilateral and the contralateral ear.

(b) The approximated path difference between the ears shown with bold lines.

Figure 2.2: The principle of ITD and ILD is shown in (a). Due to the propagation time and head shadowing, the sinusoid will reach the ears at different times and with different levels. From equation (2.3) the ITD can be calculated using the approximated path difference illustrated in (b).

As seen in figure 2.2a, the sinusoid reaches the ears at different times due to the propagation time of a sound wave, which is referred to as the interaural time difference (ITD). The ear that the signal reaches first is called the ipsilateral ear and the other ear is called the contralateral ear. Furthermore, the level of the signal at each ear is different, which is referred to as the interaural level difference (ILD).

Calculating the exact ITD can be complicated. As shown in figure 2.2b, if the head is approximated as a sphere with radius b, and the incident angle of a sound wave is expressed by the azimuth θ in degrees, the path difference between the ears can be calculated as

$$ d = b\sin\theta + b\,\theta\,\frac{\pi}{180} \quad [\mathrm{m}] \qquad (2.1) $$

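This calculation of path difference is only valid for φ = 0°. Including elevation as a parameter, the generic calculation of path difference, which reduces to equation (2.1) for φ = 0°, is

$$ d = b\left(\sin\theta\cos\phi + \arcsin\left(\sin\theta\cos\phi\right)\frac{\pi}{180}\right) \quad [\mathrm{m}] \qquad (2.2) $$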

[Gardner, 1997, p. 38]. From equation (2.1) or equation (2.2) and knowing the speed of sound in air, c = 343 m/s, the ITD can be found as

$$ \Delta t_{\mathrm{ITD}} = \frac{d}{c} \quad [\mathrm{s}] \qquad (2.3) $$

The ITDs may vary from 0 µs when a source is directly at 0° azimuth and up to around 690 µs when a source is placed at ±90° azimuth. In practice the ITD will, however, vary with the frequency for a given angle. The ITDs at frequencies below 1.5 kHz may be up to 1.5 times larger than for frequencies above due to the spherical diffraction of the propagating wave [Gardner, 1997, p. 18].
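As a worked illustration of equations (2.1) and (2.3), the following sketch evaluates the spherical-head ITD in the horizontal plane. The default head radius of 0.0875 m is an assumption made here for illustration, since the text does not state the radius used; a radius of about 0.092 m reproduces the quoted value of roughly 690 µs at ±90°.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, the value used in the text


def path_difference(azimuth_deg, head_radius):
    """Equation (2.1): path difference in metres, valid for elevation 0."""
    theta = math.radians(azimuth_deg)  # same as azimuth_deg * pi / 180
    return head_radius * (math.sin(theta) + theta)


def itd(azimuth_deg, head_radius=0.0875):
    """Equation (2.3): ITD in seconds; the default radius is an assumption."""
    return path_difference(azimuth_deg, head_radius) / SPEED_OF_SOUND


for az in (0, 30, 60, 90):
    print(az, "deg:", round(itd(az) * 1e6, 1), "us")
```

The frequency dependence of the ITD mentioned above is not captured by this purely geometric model.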

For a simple sinusoid, the ITD corresponds to the interaural phase difference (IPD) between the two ears. As long as the wavelength of the sinusoid is longer than the path difference between the ears, the IPD will give unambiguous cues for localization. The path difference is calculated from equation (2.2).

As frequency increases, the ITD spans several cycles of the sine wave, which results in ambiguous cues, since sounds from other angles correspond to the same IPD. The ITD proves to give the most effective localization cues when it can be regarded as an IPD. This is valid for frequencies up to 1.5 kHz, where a full wave cycle is approximately 690 µs. Notice that this limitation is angle-dependent, and the 1.5 kHz limit is for an angle of θ = ±90°. Furthermore, this frequency limitation is valid only for simple signals like a sinusoid. For more complex sounds it has been concluded that the envelope of the signal is evaluated instead, and the ITD is based on the onsets and ends of these envelopes. Thus, ITD cues can be developed for signals with no frequency content below 1.5 kHz, since the time difference between envelopes is evaluated instead. [Moore, 2003, p. 236]
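The relation between the ITD and the frequency limit for unambiguous IPD cues can be checked with a short sketch. The function names are hypothetical; the calculation only assumes that the cue becomes ambiguous once one full period of the sinusoid is no longer than the ITD.

```python
def ipd_ambiguity_limit(itd_seconds):
    """Frequency in Hz at which one full period equals the given ITD."""
    if itd_seconds <= 0:
        raise ValueError("ITD must be positive")
    return 1.0 / itd_seconds


def ipd_degrees(frequency_hz, itd_seconds):
    """Interaural phase difference of a sinusoid, wrapped to [0, 360)."""
    return (360.0 * frequency_hz * itd_seconds) % 360.0


# Maximum ITD quoted in the text for a source at +/-90 degrees azimuth.
limit = ipd_ambiguity_limit(690e-6)
print(round(limit), "Hz")

# Below the limit the phase is unique; above it the phase wraps around.
print(round(ipd_degrees(500.0, 690e-6), 1))
print(round(ipd_degrees(2000.0, 690e-6), 1))
```

For smaller azimuths the ITD is shorter, so the limit from `ipd_ambiguity_limit` moves upwards, which is the angle dependency noted in the text.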

The use of ITD alone as a localization cue is not enough, since sources at multiple azimuths and elevations may give rise to similar ITDs. This problem is referred to as the cone of confusion, as seen in figure 2.3.



Figure 2.3: The cone of confusion for a given ITD. The surface of the cone corresponds to source points that yield the same ITD.

For a given ITD, the surface of the cone in figure 2.3 illustrates possible source points that all yield the same ITD. Other cues, however, help resolve the ambiguous cues from ITDs in the cone of confusion. These will be described later. [Moore, 2003, p. 249]

The other localization cue illustrated in figure 2.2a is the ILD, which occurs due to the different sound paths having different attenuations. This is caused by increased path length, head shadowing and other effects of head, pinnae and body. At frequencies below 500 Hz, the ILD becomes negligible due to the large wavelengths, which will “bend” around the head. This is referred to as diffraction, where the size of the head is small compared to the wavelength and thus causes little “shadowing” of the propagating sound wave. An exception appears, however, when the sound source is close to the listener, where the ILDs may be of significance again. For higher frequencies little diffraction occurs and the ILD may be as high as 20 dB. [Moore, 2003, p. 235]

In conclusion, the high frequencies of a sound may yield ambiguous ITDs, whereas low frequencies may yield negligible ILDs. Thus, the use of both localization cues expands the frequency range for possible localization. The frequency relation of these localization cues is referred to as the “duplex theory”, which is only defined for sound sources in the horizontal plane. The knowledge of envelope analysis that yields usable ITDs for sounds with high frequency content contradicts the definition of this relation. Furthermore, when the listener moves the head around and performs a dynamic localization, many of the described localization issues are overcome. [Moore, 2003, p. 236]

Localization Performance

The localization performance can be separated into the ability to localize a stationary source and the ability to detect the displacement of a source compared to a reference.

It has been found that a stationary direction can be located with an uncertainty of 10° azimuth and 12° elevation for real sources. For virtual sources the uncertainty is increased to 12° azimuth and 24° elevation [Pedersen and Jørgensen, 2005].



As previously mentioned, the cone of confusion gives difficulties in judging whether a sound is coming from the front or back, or from above or below the horizontal plane. In these situations a head movement about the vertical and/or the horizontal axis can help determine the location of the source. [Moore, 2003, p. 235]

The second aspect is to consider how well a listener can detect a small shift in position of a sound source. This aspect is used to measure the resolution of the localization ability, which is the smallest detectable change in angular position of a source relative to the listener. This is referred to as the minimum audible angle (MAA). The MAA likewise changes with the azimuth and elevation of the source. With a source placed around 0° azimuth, the MAA has been found to be 1° for frequencies below 1 kHz, whereas for an angular position of 60° to 70° azimuth, the MAA was found to be as large as 20°. [Moore, 2003, p. 238]

In the frequency range between 1.5 kHz and 1.8 kHz, the performance worsens, yielding larger MAAs. This is consistent with the duplex theory due to the cues of ITDs becoming more ambiguous with increased frequency together with the negligibility of ILDs at lower frequencies. The low MAA is more or less consistent in the high frequency range as long as source angles are kept small. At higher angles the MAA increases considerably. This is expected to be due to the angle dependent frequency limit of the ITD cue.

Regarding the thresholds of the ITD and ILD cues individually, changes in the ITD become undetectable in the high frequency range (if the envelope analysis of complex sounds is disregarded). In the low frequency range, an ITD roughly yields an MAA of 3°. In contrast, the ILD is detectable in the whole audible frequency range, although efficient ILD cues only occur at higher frequencies. [Moore, 2003, p. 238]

2.1.2 Monaural Cues

Until now, the head has been modeled as a sphere with two holes, and thus the effects of the external ears, i.e. the pinnae, the head and body have not been taken into account. Furthermore, only the horizontal plane has been investigated. Depending on the incident angle of the sound, the pinnae will affect the spectrum of the incoming sound to some degree. It has been suggested that the pinnae help provide spectral cues for vertical localization as well as cues for distinguishing between sources from front and back. The result of a sound interacting with the pinnae is relatively sharp peaks and dips in the frequency spectrum from 6 kHz and above. Adding the effects of both head and body, spectral changes are seen from 500 Hz up to 16 kHz. As an example, a narrowband noise identical at both ears and centered around 8 kHz causes the listener to perceive the sound as coming from overhead, which suggests that a spectral peak for localizing sources overhead is at 8 kHz. [Moore, 2003, p. 250]

When a sound is heard it might be distorted by reflections from the room and nearby surfaces. In the frequency domain these reflections cause spectral distortions. It is, however, believed that the high frequency content of these distortions is relatively smooth and will not affect the sharp spectral cues provided by the pinnae. Furthermore, the ears provide their own set of spectral cues, which gives the frequency-dependent ILD, also referred to as interaural difference spectra.

2.1.3 Distance Perception

The cues described relate only to determining the location of sound by azimuth and elevation. A 3D experience is not based only on perceiving the direction of a sound, but also the distance to it. To differentiate between the actual sound source and the perceived sound source, the term “auditory event” is introduced. An important cue to distance perception is the level of the sound related to distance. [Blauert, 1997, p. 117]

With a point source in free-field placed at distances from around 3 m to 15 m, the distance of the auditory event is based only on the SPL at the ears of the listener. From the 1/r-law, the SPL of the sound source falls 6 dB for every doubling of the distance, r. It has been shown, however, that a reduction of 20 dB SPL at the ears of the listener corresponds to doubling the distance of the auditory event, which does not agree with the 1/r-law. From these results it can be derived that increasing the distance of the sound source does not correspond to the same increase of the auditory event when distance cues rely only on the level of the sound. [Blauert, 1997, p. 118]
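The mismatch between the 1/r-law and the perceived distance can be made concrete with a small calculation. The function names are hypothetical; the sketch only assumes an ideal point source in free field.

```python
import math


def spl_change_db(r_from, r_to):
    """Level change in dB when moving from r_from to r_to (1/r-law)."""
    return -20.0 * math.log10(r_to / r_from)


def distance_ratio_for_drop(drop_db):
    """Physical distance ratio that produces a level drop of drop_db."""
    return 10.0 ** (drop_db / 20.0)


# Doubling the physical distance lowers the level by about 6 dB.
print(round(spl_change_db(3.0, 6.0), 2))

# A 20 dB drop corresponds physically to a tenfold distance, whereas the
# auditory event is reported to move only to about twice the distance.
print(distance_ratio_for_drop(20.0))
```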

When the sound source is placed at distances of around 15 m and above, the acoustical path from source to listener will have a larger effect on the transmitted sound. Due to humidity of air and wind, the higher frequencies of sound will be attenuated more than lower frequencies, and the SPL becomes frequency dependent with respect to the distance to the sound source. [Blauert, 1997, p. 118]

At small distances, i.e. sources closer than 3 m, the incoming sound waves interact more strongly with the head, body and pinnae. Thus, these cannot be neglected anymore, since the waves from the sound source cannot be considered plane. Hence, the SPL at the ears varies with frequency according to the distance to the sound source. Since humans have different sizes and shapes of head, body and pinnae, the frequency dependency will be unique for each individual. Furthermore, the ILD increases with the decrease in distance of the sound source in this range, which is independent of the SPL of the sound itself. [Blauert, 1997, p. 118][Moore, 2003, p. 265]

It should be noted that it is often easier to evaluate the distance of an auditory event for familiar sounds or when multiple sounds are present. Normally, sounds are not perceived in free-field conditions, but in rooms with reflecting walls that affect the sound perceived. Hence, another important cue for perceiving distance is the ratio between direct and reflected sound. As a final remark, the best judgements of auditory events are made with the presence of multiple cues, but judgement errors of distance are still around 20 %, especially for unfamiliar sounds. The distance of the auditory event tends to be overestimated for sources near the listener and underestimated for sources far from the listener. [Moore, 2003, p. 265]


2.2 Binaural Reproduction


As learned, humans localize sound through the use of interaural and monaural cues. For binaural synthesis the influence of the head and torso must be represented by a transfer function, called the head-related transfer function (HRTF). This section seeks to define the HRTF and list some of its properties together with the issues that are introduced. This includes the investigation of their relation to dynamic localization, HRTFs measured close to the listener and utilizing HRTFs measured from another listener or dummy head. Furthermore, the principle of measuring HRTFs is described.

2.2.1 Head-related transfer functions

The HRTF is defined as the sound pressure perceived in each ear created by a given source in space called a reference point. The HRTF is mathematically expressed as

$$ \mathrm{HRTF}(\theta,\phi) = \frac{P_{\mathrm{ear}}}{P_{\mathrm{ref}}}(\theta,\phi) \quad [-] \qquad (2.4) $$

where $P_{\mathrm{ear}}$ is the sound pressure present in the respective ear, i.e. either left or right ear, and $P_{\mathrm{ref}}$ is the pressure at the center of the head position with the listener absent [Hammershøi, 1995, p. 2]. The angle dependency with azimuth θ and elevation φ will be described later.
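The ratio in equation (2.4) can be sketched per frequency bin with a plain discrete Fourier transform. The signals below are hypothetical toy data, not measurements: the reference pressure is a unit impulse and the ear pressure is the same impulse delayed and scaled. The helper names `dft` and `hrtf_from_pressures` are illustrative only.

```python
import cmath


def dft(signal):
    """Direct discrete Fourier transform (adequate for short signals)."""
    n = len(signal)
    return [
        sum(signal[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def hrtf_from_pressures(p_ear, p_ref):
    """Equation (2.4) per frequency bin: ear spectrum over reference spectrum.

    Assumes the reference spectrum has no zero bins.
    """
    return [e / r for e, r in zip(dft(p_ear), dft(p_ref))]


# Hypothetical toy data: unit impulse at the reference position, and the
# same impulse delayed by two samples and scaled by 0.5 at the ear.
p_ref = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
p_ear = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]

hrtf = hrtf_from_pressures(p_ear, p_ref)
magnitudes = [abs(h) for h in hrtf]
print([round(m, 3) for m in magnitudes])
```

In this toy case the magnitude is flat at 0.5 and the delay appears only in the phase; a measured HRTF would instead show the direction-dependent peaks and dips discussed in this chapter.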

The HRTF can be seen as the Fourier transform of the head-related impulse response (HRIR), but for convenience the expressions are kept in the frequency domain. The HRTF is referred to as a pair of transfer functions, one from the source to each ear. This is also logically implied, since both the left and right ear are used in the reproduction of binaural signals.

The following figures illustrate the relations between the HRIR and the corresponding HRTF for the left and right ear. The measurement data is taken from a database made at the CIPIC Interface Laboratory using a Knowles Electronic Mannequin for Acoustic Research (KEMAR) with large pinnae [Algazi et al., 2001].

The first example is based on source points positioned with azimuths, θ = ±65 andelevation, φ = 45. The sources are positioned at a distance of 1 m from the listener. Themeasurements are performed in an anechoic room. The respective HRIRs and HRTFsof the source points are illustrated in figure 2.4. The source points are chosen to besymmetrically placed around θ = 0 in order to compare the signals at the left and rightear with regard to the source position.

As seen in figure 2.4a, the HRIRs of both left and right ear are depicted. The ITD is observed between the left and right ear as the time delay between their onsets. The effect of the ILD is distinct, since the energy in the left HRIR is larger than that in the right ear. A length of around 1 ms is generally sufficient for the HRIRs, which was also the case for the other test subjects in the CIPIC database [Algazi et al., 2001]. Hence, this length is assumed to be sufficient for the general HRIR.

[Figure 2.4, four panels. (a) HRIRs of left and right ear for θ = −65°, magnitude [−] versus time [ms]. The length of an HRIR generally corresponds to around 1 ms. (b) The corresponding HRTFs of left and right ear for θ = −65°, magnitude [dB] versus frequency [Hz]. (c) HRIRs of left and right ear for θ = 65°. (d) HRTFs of left and right ear for θ = 65°.]

Figure 2.4: HRIRs and HRTFs from reference point at θ = ±65° and φ = 45°. Notice the ITD and ILD apparent between left and right ear in the HRIRs. The HRIR of the right ear is offset by 0.5 for illustrative purposes only. Furthermore, it is observed that the high frequency content in the HRTF is strongly dependent on the position of the reference point.
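The way ITD and ILD are read from a pair of HRIRs can be illustrated with a short sketch. It is not the analysis used in this report: the onset threshold of 10 % of the peak, the sampling rate and the two synthetic HRIRs are all assumptions made for illustration.

```python
import math


def onset_index(hrir, threshold=0.1):
    """Index of the first sample reaching `threshold` times the peak magnitude."""
    peak = max(abs(x) for x in hrir)
    if peak == 0:
        raise ValueError("all-zero impulse response has no onset")
    return next(i for i, x in enumerate(hrir) if abs(x) >= threshold * peak)


def itd_seconds(hrir_left, hrir_right, fs):
    """Onset delay of the right ear relative to the left ear [s]."""
    return (onset_index(hrir_right) - onset_index(hrir_left)) / fs


def ild_db(hrir_left, hrir_right):
    """Broadband level of the left ear relative to the right ear [dB]."""
    e_left = sum(x * x for x in hrir_left)
    e_right = sum(x * x for x in hrir_right)
    return 10 * math.log10(e_left / e_right)


FS = 44100  # assumed sampling rate [Hz]

# Hypothetical HRIRs for a source on the left side: the right ear receives
# the sound 28 samples later and at half the amplitude.
left = [0.0] * 200
right = [0.0] * 200
left[10] = 1.0
right[38] = 0.5

itd = itd_seconds(left, right, FS)
ild = ild_db(left, right)
print("ITD [ms]:", round(itd * 1e3, 3))
print("ILD [dB]:", round(ild, 2))
```

With measured HRIRs the onset is less sharp than in this synthetic case, so the threshold would need to be chosen with care, or a cross-correlation based estimate used instead.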

Figure 2.4b shows the corresponding HRTFs of the left and right ear. What is noticeable is the spectral pattern in the high frequency area, consisting of peaks and dips produced by the head and pinnae. For any given direction, the HRTF will provide a unique frequency response corresponding to that direction.

Comparing figure 2.4a with figure 2.4c, roughly the same ITD and ILD are observed, merely with the signal at the right ear being dominant. As for figure 2.4d, a slightly different spectral pattern is observed when compared against figure 2.4b. As an example, the large dip observed just above 10 kHz for the left ear in figure 2.4b does not exist for the right ear in figure 2.4d. Differences are likewise seen between the right ear in figure 2.4b and the left ear in figure 2.4d. It would be expected that the spectral patterns of those compared should be alike, since the source points are placed symmetrically around θ = 0° and a KEMAR is used.

Expanding the comparison of HRTFs between the left and right ear, figure 2.5 and figure 2.6 give an overview of how the spectral pattern of an HRTF relates to azimuth and elevation of a reference point.

Figure 2.5: HRTFs in the horizontal plane (φ = 0°) of left and right ear when varying azimuth of reference point. An FFT of length 200 is used with a rectangular window, yielding a frequency resolution of 220.5 Hz.

What can be observed in figure 2.5 is the increase in high frequency content in either left or right HRTF, when the reference point moves to the corresponding side, i.e. moving towards θ = ±90°. As assumed before, an ideally symmetric head and body would result in the left HRTFs corresponding to the right HRTFs mirrored around θ = 0°.



Again as observed in figure 2.5, the HRTFs at azimuths mirrored around θ = 0° are not identical. This suggests that the KEMAR is not truly symmetrical or that the deviations occurred from measurement uncertainties.

Figure 2.6: HRTFs in the median plane (θ = 0°) of left and right ear when varying elevation of reference point. An FFT of length 200 is used with a rectangular window, yielding a frequency resolution of 220.5 Hz.

From figure 2.6 a frequency dip is seen around 8 kHz throughout the different elevations. The dip is, however, less prominent for an elevation of 90°. This observation is valid for both ears. It has previously been verified that a peak at 8 kHz in the HRTF corresponds to the listener perceiving the sound as coming from directly above [Moore, 2003, p. 251]. Common to figure 2.5 and figure 2.6 is that the low-frequency content varies little with changing angle, which agrees with theory. At lower frequencies, an indication of symmetry around 90° in the spectral pattern is observed. For the high frequencies above 8 kHz, no symmetry around 90° is seen. This could indicate that the high frequency region is used for distinguishing between front and back.



Dynamic Localization

As indicated in the start of this chapter, the HRTF includes the localization cues used in 3D sound perception. The perception of the sound source as coming from a given reference point is only valid as long as the head is kept still. If the listener turns his head, the HRTFs change and the perceived reference point moves. An attempt to counteract this can be done through head-tracking, where the HRTFs are updated to the new head position. With successful head-tracking the listener may benefit from the use of dynamic localization cues, which will reinforce his 3D experience [Gardner, 1997, p. 30]. Wearing a head-tracker may, however, result in the same inconvenience as wearing headphones for binaural reproduction, as previously mentioned. It may likewise be expensive to supply a head-tracker with the application. Hence, the use of head-tracking will not be investigated further in this report.

Near-field HRTFs

As discussed in section 2.1.3, the frequency spectrum of a transmitted sound close to the listener becomes distorted when measured at the ears. Hence, the SPL of the sound becomes frequency-dependent due to the complex interaction with head, body and pinnae. HRTFs obtained at such close distances are in most articles referred to as near-field HRTFs, which are defined at distances smaller than approximately 1 m [Brungart, 1999], [Brungart and Rabinowitz, 1999], [Guang-Zheng et al., 2008]. At increased distances the change of frequency content becomes smaller.

Previously in this chapter, it was concluded that the change in interaural cues is superior to the change in monaural cues. It was seen that the ILD was largely affected by distance, whereas the ITD was not. Furthermore, the spectral cues related to perception of elevation seemed independent of distance too. A noticeable change in the binaural cues is caused by the effect of the acoustic parallax, which is illustrated in figure 2.7.

Comparing figure 2.7a and figure 2.7b, a closer sound source results in a decreased angle from source to ipsilateral ear, since the source is always directed towards the center of the head. Hence, this causes a lateral shift of the high-frequency features in the HRTF. [Brungart and Rabinowitz, 1999]

When trying to change distance perception, the remapping of the high-frequency features in the near field should be accounted for.
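The angles given for the acoustic parallax in figure 2.7 follow from plane geometry and can be checked with a few lines. This is a sketch under the assumptions that the ears lie on the lateral axis, 9 cm to either side of the head center (an 18 cm wide head), and that angles are measured from the frontal direction. The function name is illustrative only.

```python
import math


def ear_angle_deg(distance, azimuth_deg, head_width=0.18):
    """Azimuth of the source as seen from the right ear [degrees].

    `distance` [m] and `azimuth_deg` are given relative to the head center.
    """
    az = math.radians(azimuth_deg)
    src_x = distance * math.sin(az)  # lateral coordinate, positive to the right
    src_y = distance * math.cos(az)  # frontal coordinate
    ear_x = head_width / 2           # right ear position on the lateral axis
    return math.degrees(math.atan2(src_x - ear_x, src_y))


angle_far = ear_angle_deg(1.0, 45.0)
angle_near = ear_angle_deg(0.2, 45.0)
print(round(angle_far, 1), round(angle_near, 1))
```

The two results agree with the 41.1° and 20° stated for distances of 1 m and 0.2 m, which shows how quickly the ear-relative angle departs from the head-relative angle in the near field.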

[Figure 2.7, two panels. (a) A sound source is placed at a distance of 1 m and 45° azimuth relative to the center of the head, and 41.1° azimuth relative to the right ear. (b) A sound source is placed at a distance of 0.2 m and 45° azimuth relative to the center of the head, and 20° azimuth relative to the right ear.]

Figure 2.7: The acoustic parallax shown at two different distances of a sound source directed towards a head being 18 cm wide. The distances are not in proportion, but serve only illustratively.

Individual HRTFs

Since individuals have different sizes and shapes of their heads and pinnae, the HRTFs will likewise differ between individuals. As learned from section 2.1, the effects of the pinnae occur for frequencies above 6 kHz, and since each individual has unique pinnae features, this implies that each individual will receive other localization cues from HRTFs of others compared to their own.

A study performed in [Hammershøi, 1995, p. 15] showed that the inter-individual differences below 7 kHz had a standard deviation of less than 2 dB. From 7 kHz to 20 kHz the standard deviation increased up to 11.5 dB. In previous measurements it was concluded that localization in the horizontal plane seemed more robust compared to the vertical plane. However, the front from back and up from down localizations when simulating a free-field condition with headphones had a worse performance compared to using the HRTF of the individual [Moore, 2003, p. 251].

Measuring HRTFs

HRTF measurements usually take place in an anechoic chamber to ensure that no room reflections influence the HRIRs. As seen from equation (2.4), the frequency dependent pressure amplitudes, $P_\mathrm{ear}$ and $P_\mathrm{ref}$, must be measured. Through use of either a dummy head or real test subject, $P_\mathrm{ear}$ is found for both ears. The reference pressure amplitude, $P_\mathrm{ref}$, is found by placing a microphone at the center of the head position (figure 2.8).

[Figure 2.8, two panels. (a) Measurement principle for obtaining the acoustical pressure in each ear, $P_\mathrm{ear,L}$ and $P_\mathrm{ear,R}$. (b) Measurement principle for obtaining a reference pressure $P_\mathrm{ref}$. The microphone is positioned at the center of the head.]

Figure 2.8: Measurement principle for obtaining $P_\mathrm{ear}$ and $P_\mathrm{ref}$, whereafter the HRTF can be calculated from equation (2.4).

Since $P_\mathrm{ear}$ should correspond to the acoustical pressure at each eardrum, the recording microphone theoretically has to be placed at the eardrum. This, however, is impractical and makes such measurements inconvenient to perform. Instead, it is possible to record the pressure at the blocked entrance of the ear canal, and it has previously been concluded that the differences between individuals are smaller when recording at the blocked entrance. Hence, the recorded HRTFs may prove more applicable for use. [Hammershøi, 1995, p. 35]

An issue that relates to the binaural reproduction is the reproduction chain itself. That is, the transfer functions of the microphones recording the binaural signal, the transfer functions of the amplifiers used and the transfer functions of the loudspeakers used to reproduce the binaural signal. In short, the influence of the chosen transducers and amplifiers must be accounted for. These issues will be dealt with later in the report.

Summary

This chapter described several aspects regarding human sound perception, and here the most relevant of these are summarized.

The ILD and ITD cues of the perceived sound can be used to determine the direction of a source. The ITD cues were found to work best in the frequency range below 1.5 kHz and ILD cues were found to work best above 1.5 kHz. The sound perception allows stationary localization of virtual sources with an uncertainty of 14° azimuth and 24° elevation.



For binaural synthesis, HRTFs can be used to add localization cues to a monaural sound. The HRTF models include influences of the head and torso and are individual for each person. In the frequency range below 7 kHz, these inter-individual differences are found to be less than 2 dB.

These aspects of sound perception will be used further on in the design of the XTC system.


Chapter 3 Introduction to Crosstalk Cancelation

The purpose of this chapter is to introduce the aspects of XTC and identify relevant properties before a more comprehensive analysis is carried out. First the XTC principle is explained and definitions of sweet spot are introduced. Afterwards it is investigated how the XTC performance is affected by the positioning of the loudspeakers. This will be based on a literature study of previously conducted work.

3.1 Basic Example of XTC

In this section it is explained how two loudspeakers are able to provide XTC. This is done through a basic theoretical example with point sources, from [Kirkeby, 1998]. The scenario is shown in figure 3.1. The ears of the user are represented by the two yellow triangles in the middle, and loudspeakers are represented by the widely separated cyan squares in the top. The goal in this example is to deliver a single positive wave from the left loudspeaker to the left ear, and make sure it is canceled at the right ear. The principle is explained through the following steps, and the corresponding time signals are shown in figure 3.2.

[Figure 3.1, two panels showing the sound field over X-position [m] (−0.5 to 0.5) and Y-position [m] (0 to 1). (a) Desired wave (white) and first cancelation wave (black). (b) First waves reaching destination.]

Figure 3.1: Example of how two loudspeakers can provide XTC. The figures show the sound field during XTC as an animation, where (a) is the first state and a later state is shown on (b). The principle is explained in the enumeration, and the signals transmitted can be seen in figure 3.2. [Kirkeby, 1998, p. 391]

1. The left loudspeaker transmits the desired positive wave, represented by the white ring in figure 3.1a.

2. The right loudspeaker transmits a negative wave with the purpose of canceling the desired wave as it coincides with the right ear. This is seen as the black ring in figure 3.1a. The propagation distance from right loudspeaker to right ear will be shorter than from left loudspeaker to right ear. Therefore the cancelation wave must be delayed and attenuated compared to the desired wave.

3. The first cancelation wave will inevitably also be transmitted to the left ear. For this reason, the left loudspeaker will transmit a second cancelation wave. This will be a positive wave, and it will have smaller amplitude than the first cancelation wave. The attenuation of amplitude is however only visible on the time signals (figure 3.2).

4. The desired wave arrives as seen in figure 3.1b. It can also be seen how the desired wave is about to coincide with the first cancelation wave at the right ear.

5. After some duration the cancelation waves will have decreased so much in amplitude that they become negligible.

A detail that can be noticed in figure 3.2 is that the two signals transmitted from the loudspeakers are similar, but out of phase.
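The steps above can be sketched as a discrete-time simulation. It is a simplified illustration and not the model of [Kirkeby, 1998]: each loudspeaker is assumed to reach its own ear with unit gain and no delay, and the opposite ear with a hypothetical gain g = 0.9 and an extra delay of d = 4 samples. The function names are illustrative only.

```python
def xtc_source_signals(g, d, n_terms):
    """Loudspeaker signals that deliver a unit pulse to the left ear only."""
    length = 2 * n_terms * d + 1
    left = [0.0] * length
    right = [0.0] * length
    for n in range(n_terms):
        # Desired wave (n = 0) and the later positive cancelation waves.
        left[2 * n * d] += g ** (2 * n)
        # Negative cancelation waves, delayed and attenuated by the cross path.
        right[(2 * n + 1) * d] -= g ** (2 * n + 1)
    return left, right


def delayed(signal, d, gain):
    """Apply a delay of d samples and a gain, keeping the original length."""
    return [0.0] * d + [gain * x for x in signal[:len(signal) - d]]


def ear_signals(left, right, g, d):
    """Sum of the direct path and the delayed, attenuated cross path at each ear."""
    left_ear = [a + b for a, b in zip(left, delayed(right, d, g))]
    right_ear = [a + b for a, b in zip(right, delayed(left, d, g))]
    return left_ear, right_ear


G, D, N_TERMS = 0.9, 4, 20  # assumed cross-path gain, delay and series length
left, right = xtc_source_signals(G, D, N_TERMS)
left_ear, right_ear = ear_signals(left, right, G, D)

print("left ear, first sample:", left_ear[0])
print("largest residual at right ear:", max(abs(x) for x in right_ear))
print("largest residual at left ear:", max(abs(x) for x in left_ear[1:]))
```

The two loudspeaker signals are alternating, decaying pulse trains of opposite sign, in line with the remark that they are similar but out of phase. The only residual at the left ear comes from truncating the series after a finite number of cancelation waves.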

[Figure 3.2, three panels versus time [s] (0 to 5 ms). Signals to loudspeakers, amplitude [V], showing the right and left loudspeaker signals. Signals at left ear, amplitude [Pa]. Signals at right ear, amplitude [Pa].]

Figure 3.2: Time signals of the waves propagating in figure 3.1. It is seen how the waves will interfere. They will cancel each other at the ears, and only the desired wave will remain at the left ear. [Kirkeby, 1998, p. 390]

3.2 Performance Measures

Two measures are used to evaluate the performance for the XTC system in a specific location. These measures have been used previously in [Bai and Lee, 2006a, p. 4].

• Channel Separation (CHSP): Expresses the crosstalk amplitude (contralateral signal path) with respect to the amplitude of the desired signal (ipsilateral path).

• Performance Error (PE): Expresses how much the ipsilateral signal path deviates from a flat frequency response.

Both measures will be obtained by sampling discrete frequencies within the specified reproduction range. The spacing between the discrete frequencies is logarithmic. No symmetry is assumed during calculation, meaning that right and left side must be calculated individually. A measure of the complete transfer function from system input x to receiver point y, when XTC has been applied, is stored in a matrix denoted R. The values of these performance measures are calculated as

\[ \mathrm{CHSP}_x[k] = \frac{R_{x\mathrm{Ipsi}}[k]}{R_{x\mathrm{Contra}}[k]}, \quad [-] \tag{3.1} \]

where the index k indicates discrete frequency, and the index x indicates either left or right channel. Note that the CHSP will yield an attenuation, hence a more negative value corresponds to a better CHSP performance. The vectors $\mathrm{CHSP}_x$ are reduced to a single mean value, as expressed by

\[ \mathrm{CHSP}_{x\mathrm{Avg}} = \frac{1}{M_2 - M_1 + 1} \sum_{k=M_1}^{M_2} 20 \log_{10}\left(\left|\mathrm{CHSP}_x[k]\right|\right), \quad [-] \tag{3.2} \]

where k denotes specific frequencies and $M_1$ and $M_2$ denote respectively the start and stop index for the evaluation. The performance error is calculated as

\[ \mathrm{PE}_x[k] = \frac{z^{-m}}{R_{x\mathrm{Ipsi}}[k]}, \quad [-] \tag{3.3} \]

where $z^{-m}$ represents a discrete time delay. This is included to ensure that a delay will not be penalized as performance error. The performance error is also averaged to a single mean value expressed as

\[ \mathrm{PE}_{x\mathrm{Avg}} = \frac{1}{M_2 - M_1 + 1} \sum_{k=M_1}^{M_2} 20 \log_{10}\left(\left|\mathrm{PE}_x[k]\right|\right). \quad [-] \tag{3.4} \]
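The averaging in equations (3.2) and (3.4) and the delay compensation in equation (3.3) can be sketched as follows. The frequency range of 100 Hz to 6 kHz, the 50 logarithmically spaced points, the sampling rate and the delay of 64 samples are assumptions for illustration, and the ipsilateral response is a hypothetical pure delay with gain 0.5.

```python
import cmath
import math


def log_spaced(f_start, f_stop, count):
    """`count` logarithmically spaced frequencies from f_start to f_stop [Hz]."""
    ratio = (f_stop / f_start) ** (1 / (count - 1))
    return [f_start * ratio ** i for i in range(count)]


def mean_db(values, m1, m2):
    """Mean of 20*log10|values[k]| for k = m1..m2 inclusive, as in (3.2) and (3.4)."""
    terms = [20 * math.log10(abs(values[k])) for k in range(m1, m2 + 1)]
    return sum(terms) / (m2 - m1 + 1)


def performance_error(r_ipsi, freqs, fs, m):
    """PE[k] = z^-m / R_ipsi[k] with z = exp(j*2*pi*f/fs), as in (3.3)."""
    return [cmath.exp(-2j * math.pi * f * m / fs) / r for f, r in zip(freqs, r_ipsi)]


FS = 44100  # assumed sampling rate [Hz]
M = 64      # assumed system delay [samples]
freqs = log_spaced(100.0, 6000.0, 50)

# Hypothetical ipsilateral response: flat gain of 0.5 with a delay of M samples.
r_ipsi = [0.5 * cmath.exp(-2j * math.pi * f * M / FS) for f in freqs]

pe = performance_error(r_ipsi, freqs, FS, M)
pe_avg = mean_db(pe, 0, len(pe) - 1)
print("average performance error [dB]:", round(pe_avg, 2))
```

Because the delay term cancels the phase of the hypothetical response, only the level deviation remains, here about 6 dB for a gain of 0.5. The same `mean_db` helper applies to a vector of CHSP values.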

3.3 Sweet Spot Definitions

An XTC system will only be able to work around a specific location known as the sweet spot. To achieve good spatial perception, the location of the ears is the important factor. This will imply that each ear has to be located at a specific position, but for convenience the sweet spot definitions will be based on where the center of the user's head is located.

• Optimal sweet spot: The point for which the XTC system is optimized to work at. The best $\mathrm{CHSP}_{x\mathrm{Avg}}$ will be achieved in this point. The user is expected to have the head located here.

• Absolute sweet spot: Defines the region around the optimal sweet spot where the XTC system can provide a desired $\mathrm{CHSP}_{x\mathrm{Avg}}$ performance.

• Relative sweet spot: Defines the region around the optimal sweet spot where the $\mathrm{CHSP}_{x\mathrm{Avg}}$ performance is not reduced more than a given amount with respect to the XTC performance in the optimal sweet spot. The relative sweet spot only provides sufficient performance if there is sufficient performance in the optimal sweet spot.

In figure 3.3 the three different types of sweet spot can be compared.
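The difference between the absolute and the relative sweet spot can be made concrete with a small sketch. The limits follow the example of figure 3.3 (an absolute limit of −12 dB and an allowed reduction of 12 dB from an optimum of −30 dB), while the CHSP values along a line of head positions are hypothetical.

```python
def sweet_spot_masks(chsp_db, optimal_index, absolute_limit_db, relative_drop_db):
    """Return two lists of booleans marking the absolute and relative sweet spot.

    `chsp_db` holds the average CHSP [dB] at each evaluated head position,
    where a more negative value means better channel separation.
    """
    relative_limit_db = chsp_db[optimal_index] + relative_drop_db
    absolute = [v <= absolute_limit_db for v in chsp_db]
    relative = [v <= relative_limit_db for v in chsp_db]
    return absolute, relative


# Hypothetical average CHSP values at increasing lateral offsets from the optimum.
chsp_db = [-30.0, -25.0, -19.0, -15.0, -11.0]

absolute, relative = sweet_spot_masks(chsp_db, optimal_index=0,
                                      absolute_limit_db=-12.0,
                                      relative_drop_db=12.0)
print("absolute sweet spot:", absolute)
print("relative sweet spot:", relative)
```

With these numbers the relative limit becomes −18 dB, so the position with −15 dB lies inside the absolute sweet spot but outside the relative one.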


Figure 3.3: Illustration of how the different types of sweet spot are related. For this figure both absolute and relative sweet spot are defined as −12 dB. Since the optimal sweet spot in this case provides −30 dB of $\mathrm{CHSP}_{x\mathrm{Avg}}$, the relative sweet spot extends out to the area which provides at least −18 dB.

3.4 Loudspeaker Span Analysis


This project will focus on a specific implementation with a fixed loudspeaker setup. These loudspeakers will span a certain angle in front of the user. The loudspeaker span Θ is defined as two times the azimuth angle θ shown in figure 2.1a, on page 5. As an example, a loudspeaker span of Θ = 60° means that the loudspeakers are located at θ = ±30°.

From articles describing research in the field of XTC, it was found that the loudspeaker span is very important for the performance of the XTC system. The following performance parameters were found to be affected:

• Maximum XTC performance at the optimal sweet spot.

• The size of sweet spot (both absolute and relative), thereby robustness to head movement.

• Frequency range where the XTC is working.

The most relevant knowledge obtained regarding the effect of the loudspeaker span from the literature is described in the following.

3.4.1 The Stereo Dipole

The article [Kirkeby, 1998] suggests a loudspeaker span of 10° as a solution providing a large sweet spot. The specific loudspeaker span of 10° is commonly referred to as a "Stereo Dipole".

[Figure 3.4, two panels showing the sound field over X-position [m] (−0.5 to 0.5) and Y-position [m] (0 to 1). (a) Wave propagation for 60° span. (b) Wave propagation for 10° span.]

Figure 3.4: Comparing (a) and (b) it is seen that the sound field is much more uniform for the span of 10°, which should imply that the XTC system has a larger sweet spot. [Kirkeby, 1998, p. 391]

It was illustrated that a more uniform sound field was created (i.e. a larger sweet spot) by having the loudspeakers placed closer, as seen by comparing figure 3.4a and 3.4b. Furthermore, it was also found that the smaller the span, the more power will be required for XTC, especially for lower frequencies. This will limit the usable frequency range.

The simulations in [Kirkeby, 1998] are based on an idealized theoretical model. The article accounts for neither the effect of the HRTF nor the loudspeakers, and there is no investigation of subjective localization performance.

3.4.2 Comparison of Loudspeaker Spans

In the article [Bai and Lee, 2006a] the practical influence of the loudspeaker span was investigated more comprehensively. The angle spans 10°, 60° and 120° were selected for evaluation, with the constraint of filters having the same mean square gain within a desired frequency range.

Simulations showed that a span of 10° gave worse XTC performance at lower frequencies compared to larger spans. However, the larger spans showed irregularities in performance at the higher frequencies by dips in the frequency response.

By calculating the mean XTC performance for a frequency range from 100 Hz to 6 kHz, larger spans were found to give the best performance in the optimal position. A subjective listening test also showed better localization of a virtual sound source when the larger spans were used.

The smaller span was found to provide a larger area where the XTC performance was similar, i.e. a larger relative sweet spot. On the other hand the larger spans were able to maintain a sufficient level of XTC performance within a larger area, i.e. a larger absolute sweet spot.

Subjective listening tests at lateral positions of 5 cm and 10 cm showed best localization of a virtual sound source when the larger spans were used. Hereby, the absolute sweet spot seems to be a more usable measure of robustness to head movement. This might be obvious, since it means that higher performance yields better localization cues, and thus gives better subjective localization. Comparing the spans of 60° and 120° gave similar localization performance. However, the localization of a centered virtual source was worse for the 120° span.

The reason for better localization for larger spans could originate from the fact that XTC of the 10° span had a low frequency limit of approximately 1 kHz, which is due to the gain constraint. Since ITD cues are best below 1.5 kHz, only ILD cues can be reproduced correctly with this span. The head shadowing is also expected to contribute to better performance of the larger spans. Furthermore, a larger span implies a larger distance between the loudspeakers, meaning that it will be able to reproduce a wider 3D sound stage with insufficient XTC.

3.4.3 Relationship between Frequency Range and Filter Gain

From the above, the optimal loudspeaker angle seems to be around 60°, although it will be a compromise between high and low frequency reproduction. The paper [Takeuchi and Nelson, 2002] investigated this compromise and found the connection between frequency, loudspeaker span and filter gain, which can be seen in figure 3.5. From this figure it can be seen that a loudspeaker span of 10° requires a large filter gain to reproduce the low frequency region, as stated in [Kirkeby, 1998]. However, the 10° span will require low filter gain to reproduce the high frequency range above 2 kHz. If it should also cover the frequency range below 2 kHz, more filter gain will be required. A larger span of e.g. 40° would be more suitable for this frequency range, however such a span will show irregularities at higher frequencies. This is consistent with [Bai and Lee, 2006a], which also pointed out that irregularities occur in the high frequency region when larger spans are used.

These observations lead to the principle of optimal source distribution (OSD), which is to only use a certain angle for reproducing the frequency range where low filter gain is required. This means a loudspeaker with an angle dependent frequency response is required. A more realistic solution is to divide the frequency range between multiple loudspeaker units, by means of a crossover. The higher frequencies should originate from a small span, i.e. centered in front of the listener, and the lower frequencies should originate from the sides. This is illustrated in figure 3.6.
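The trade-off between span, frequency and filter gain can be illustrated with a simplified free-field sketch. It is not the model behind figure 3.5: two point sources and two ear positions without a head are assumed, the path length difference is approximated by Δr·sin(Θ/2) with an assumed ear spacing Δr = 0.18 m, the cross path is given an assumed relative gain g = 0.9, and the speed of sound is taken as 343 m/s. Under these assumptions the transfer matrix is [[1, a], [a, 1]] with a = g·exp(−jkΔl), and the largest gain of its inverse is max(1/|1 + a|, 1/|1 − a|).

```python
import cmath
import math

C_SOUND = 343.0  # assumed speed of sound [m/s]


def path_difference(span_deg, ear_spacing=0.18):
    """Approximate path length difference between the two ears [m]."""
    return ear_spacing * math.sin(math.radians(span_deg / 2))


def inverse_filter_gain_db(freq, span_deg, ear_spacing=0.18, g=0.9):
    """Largest gain [dB] of the inverse of the assumed 2x2 free-field matrix."""
    k = 2 * math.pi * freq / C_SOUND
    a = g * cmath.exp(-1j * k * path_difference(span_deg, ear_spacing))
    worst = max(1 / abs(1 + a), 1 / abs(1 - a))
    return 20 * math.log10(worst)


def lowest_gain_frequency(span_deg, ear_spacing=0.18):
    """First frequency [Hz] where k times the path difference equals pi/2."""
    return C_SOUND / (4 * path_difference(span_deg, ear_spacing))


for span in (10.0, 60.0):
    print("span", span, "deg:",
          "gain at 200 Hz =", round(inverse_filter_gain_db(200.0, span), 1), "dB,",
          "lowest gain near", round(lowest_gain_frequency(span)), "Hz")
```

In this sketch the narrow span needs considerably more gain at 200 Hz than the wide span, while its frequency of lowest gain lies several kilohertz higher. This is the qualitative behaviour that motivates letting the span decrease with increasing frequency.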

Successful implementations of this were found in [Takeuchi and Nelson, 2000]. Here a three-way and two-way solution only required respectively 7 dB and 18 dB correction, to give the performance of a one-way solution, which required 40 dB correction [Takeuchi and Nelson, 2002, p. 2794]. In [?] a listening test comparing the three-way OSD arrangement to a 10° loudspeaker span was found. Here, the OSD arrangement yielded the better localization performance, especially for reproduction of a wide sound stage. Furthermore, subjects reported that the overall sound quality was improved for the OSD arrangement, which could be due to a wider frequency range.

[Figure 3.5: required filter gain [dB] (0 to 50 dB) as a function of loudspeaker span [°] (20 to 180) and frequency [Hz] (63 Hz to 20 kHz).]

Figure 3.5: Dependency between filter gain, loudspeaker span and frequency. Notice that an XTC system will require a high filter gain to work over a wide frequency range no matter which loudspeaker span is selected. [Takeuchi and Nelson, 2002, p. 2793]

[Figure 3.6: loudspeaker arrangement around the listener, with high frequencies reproduced in the center and low frequencies reproduced at the sides.]

Figure 3.6: The principle of OSD for an XTC system is to use larger loudspeaker spans to reproduce the lower frequencies and smaller spans to reproduce the higher frequencies. Inspired by [Takeuchi and Nelson, 2002, p. 2791].



3.4.4 Summary

The findings of the previous work are summarized in the following:

• Small loudspeaker span:

– Gives good performance in the high frequency region.

– Requires more filter gain to provide XTC in the low frequency region.

– Provides a large relative sweet spot.

• Large loudspeaker span:

– Gives good performance in the low frequency region.

– Shows irregularities in the high frequency region.

– Provides a large absolute sweet spot.

– Utilizes the natural head shadowing.

– Reproduces a wider sound stage.

• A measure of sweet spot should be absolute, hereby describing the region where it is able to keep a certain level of XTC performance. This performance measure corresponds better to the subjective localization performance.

• Less filter gain is needed if the frequency range is split up, and multiple loudspeakers are used according to the OSD principle.

In order to carry on with the design of the XTC system, the knowledge obtained in this chapter will be used to specify the project requirements in the following chapter.


Chapter 4 Requirement Specification

At this point, a requirement specification for the XTC system is developed. It is, however, not possible to determine all specifications until further study has been undertaken. The following sections specify the requirements needed for implementation and realization of the XTC system, and the chapter is concluded with a table that summarizes the specifications.

4.1 Application Setup

The loudspeaker configuration will be set on basis of the OSD principle mentioned in section 3.4.3. It is decided to design an XTC system on basis of a two-way loudspeaker configuration, and to use identical loudspeaker units for the high and low frequency regions instead of units specifically designed for a certain frequency region. The loudspeakers covering the low frequency range are denoted bass speakers, while those covering the high frequency range are denoted tweeters.

The user of the multimedia setup is always assumed to be positioned symmetrically according to the loudspeakers and facing the center of the computer monitor. The geometric relationship between user and the multimedia setup is set from what the group has found reasonable. Figure 4.1 depicts the parameters that limit the physical setup.

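[Figure 4.1, two panels. (a) Side view, showing the monitor, the loudspeaker, the distance r, the distance $d_1$ and the elevation φ. (b) User view, showing the monitor, the bass speakers and tweeters, and the distances $d_1$, $d_2$ and $d_3$.]

Figure 4.1: The different lengths and angles for the application setup.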

The distance r defines the distance from the user to the center between the speakers, which is set to be the same as the distance from user to center of the monitor. This distance is set to r = 0.75 m. The distance $d_1$ is set to 0.35 m. This ensures that the loudspeaker setup does not conflict with the monitor. The elevation φ indicates the monitor elevation related to the center of the head. This angle is calculated by

\[ \phi = \sin^{-1}\left(\frac{-d_1}{r}\right) = \sin^{-1}\left(\frac{-0.35}{0.75}\right) = -27.82^\circ . \tag{4.1} \]

The distance $d_2$ is limited to a maximum of 0.5 m. This limit serves as a constraint for the width of the loudspeaker setup in order to maintain realism for a commercial product. It thereby determines the maximum span Θ of the loudspeakers by

\[ \Theta_\mathrm{max} = 2 \cdot \tan^{-1}\left(\frac{d_2}{r}\right) = 2 \cdot \tan^{-1}\left(\frac{0.5}{0.75}\right) \approx 67.4^\circ . \tag{4.2} \]
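The two angles of equations (4.1) and (4.2) can be reproduced directly from the stated distances. The variable names are chosen for this sketch only.

```python
import math

R = 0.75   # distance from user to the center between the speakers [m]
D1 = 0.35  # vertical offset of the loudspeakers [m]
D2 = 0.50  # maximum lateral offset of the outer loudspeakers [m]

# Elevation of the loudspeaker setup relative to the center of the head, (4.1).
phi_deg = math.degrees(math.asin(-D1 / R))

# Maximum loudspeaker span, (4.2).
theta_max_deg = 2 * math.degrees(math.atan(D2 / R))

print("elevation [deg]:", round(phi_deg, 2))
print("maximum span [deg]:", round(theta_max_deg, 1))
```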

The distance $d_3$ cannot be smaller than the radius of the frame of the tweeters, since the two tweeters would otherwise overlap. This could be a limitation, since the OSD principle will position the tweeters close to the center.

It is considered that the loudspeakers are most likely to be mounted on a straight line as seen in figure 4.2, if the application should be used for a commercial product. This setup, however, introduces a delay between the tweeters and the bass speakers due to the different distances to the user. Hence, this delay should be taken into account when designing the XTC system.

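[Figure 4.2: top view showing the distances $r_\mathrm{tweeter}$, $r_\mathrm{bass}$ and r from the user to the loudspeakers.]

Figure 4.2: Top view of the loudspeaker setup. From the user point, the distance to the bass speakers will be longer than to the tweeters.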
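The size of the delay between tweeters and bass speakers can be estimated from the geometry in figure 4.2. The following sketch makes several assumptions that are not fixed by the text: the loudspeakers lie on a straight line at the perpendicular distance r from the user, the elevation is ignored, the bass speakers sit at the maximum lateral offset of 0.5 m, the tweeters sit at a hypothetical lateral offset of 0.05 m, the speed of sound is 343 m/s and the sampling rate is 44.1 kHz.

```python
import math

C_SOUND = 343.0  # assumed speed of sound [m/s]


def tweeter_bass_delay(r, d_bass, d_tweeter, fs):
    """Return (r_bass, r_tweeter, delay in seconds, delay in samples)."""
    r_bass = math.hypot(r, d_bass)        # user to bass speaker [m]
    r_tweeter = math.hypot(r, d_tweeter)  # user to tweeter [m]
    delay_s = (r_bass - r_tweeter) / C_SOUND
    return r_bass, r_tweeter, delay_s, delay_s * fs


r_bass, r_tweeter, delay_s, delay_samples = tweeter_bass_delay(
    r=0.75, d_bass=0.5, d_tweeter=0.05, fs=44100)

print("r_bass [m]:", round(r_bass, 3))
print("r_tweeter [m]:", round(r_tweeter, 3))
print("delay [ms]:", round(delay_s * 1e3, 3))
print("delay [samples]:", round(delay_samples, 1))
```

Under these assumptions the difference amounts to a few tenths of a millisecond, which corresponds to a noticeable number of samples and therefore has to be compensated in the filter design.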

4.2 Sound Reproduction

To divide the frequency range between the tweeter and the bass, a crossover network must be implemented. The crossover frequency must be determined from the OSD principle such that the needed gain for the loudspeakers is as low as possible. It is chosen to use FIR filters.
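One possible way of realizing such a crossover with FIR filters is sketched below. It is not the design used in this report: a Hamming-windowed sinc lowpass is assumed, the highpass is formed as its complement, and the crossover frequency of 1 kHz, the 101 taps and the sampling rate of 44.1 kHz are hypothetical values.

```python
import math


def lowpass_fir(num_taps, cutoff_hz, fs):
    """Hamming-windowed sinc lowpass with unit DC gain (odd num_taps >= 3)."""
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be odd and at least 3")
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs  # normalized cutoff [cycles per sample]
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]


def complementary_highpass(lowpass):
    """Highpass formed as a delayed unit impulse minus the lowpass."""
    mid = (len(lowpass) - 1) // 2
    highpass = [-h for h in lowpass]
    highpass[mid] += 1.0
    return highpass


lp = lowpass_fir(num_taps=101, cutoff_hz=1000.0, fs=44100)  # bass branch
hp = complementary_highpass(lp)                             # tweeter branch

print("DC gain of bass branch:", round(sum(lp), 6))
print("DC gain of tweeter branch:", round(abs(sum(hp)), 6))
```

The two branches of this sketch sum to a pure delay of 50 samples, so the crossover itself does not colour the combined signal, provided that both branches are time-aligned at the listener.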

As the physical dimensions of the loudspeaker setup have been set, this implies other limitations such as the choice of loudspeakers. For this project, the chosen loudspeakers are tube speakers built at the Section of Acoustics, Aalborg University. They are based on a Vifa M10MD-39-8 loudspeaker unit [Danish Sound Technology, 2002]. The frequency response of the loudspeaker is depicted in figure 4.3.

[Figure 4.3: maximum SPL [dB re. 20 µPa] (60 to 120 dB) versus frequency [Hz] (100 Hz to 10 kHz).]

Figure 4.3: Frequency response of a typical tube loudspeaker [Oddershede et al., 2008, p. 123]. The maximum SPL is calculated from a frequency response measured at ±1 V and scaled according to the maximum rating.

As seen, the loudspeaker rolls off around 200 Hz towards the lower frequencies. It is decided that the XTC system must be able to provide at least 94 dB SPL at the ears of the listener, which is assumed more than sufficient for use in a multimedia setup. From this requirement, the low frequency limit will be calculated later on.
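To put the 94 dB SPL requirement into perspective, the level can be converted to a sound pressure using the reference pressure of 20 µPa given on the axis of figure 4.3. The function name is chosen for this sketch only.

```python
P_REF = 20e-6  # reference sound pressure [Pa]


def spl_to_pressure(spl_db):
    """RMS sound pressure [Pa] corresponding to a level in dB re. 20 µPa."""
    return P_REF * 10 ** (spl_db / 20)


p_rms = spl_to_pressure(94.0)
print("RMS pressure at 94 dB SPL [Pa]:", round(p_rms, 3))
```

The requirement thus corresponds to an RMS sound pressure of approximately 1 Pa at the ears of the listener.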

It is chosen to compensate for the loudspeaker frequency response through the channel transfer function measurements performed in appendix C page 127. Thus, the uncertainties regarding the loudspeakers are taken into account. Furthermore, the reproduction of the authentic sound source is improved. Regarding equalization, a requirement for an average performance error below 3 dB is chosen.

4.2.1 Application Frequency Range

Determining the actual frequency range of the XTC system is not possible until further study of XTC analysis has been performed. This analysis will be based on the frequency response shown in figure 4.3 and the data from the datasheet, which can be found on the enclosed DVD.

Regarding the upper frequency limit, a specific frequency is not directly obtainable. However, HRTFs start to become individual for frequencies above 6 kHz. For this project, the system is based on HRTFs measured using a head and torso simulator. Furthermore, the sweet spot size will decrease with increasing frequencies in the XTC system [Bai and Lee, 2006b]. Based on these presented issues, the upper frequency limit is set to 6 kHz.

Although the XTC system is limited to a certain frequency range, it is still possible to play the remainder of the audible frequency spectrum without XTC. This is under the assumption that signals in the XTC frequency range are unaffected by signals outside this range. It has previously been suggested to add a subwoofer to play the lowest frequencies, which would not affect the XTC performance [Takeuchi and Nelson, 2002]. However, other issues might emerge from adding the remaining audible spectrum.

As learned in section 2.1, the monaural cues for the listener become more individual with higher frequencies. This is primarily due to the different shapes and sizes of the pinnae between individuals. Hence, the question arises: will a listener benefit more from hearing non-individual localization cues than from hearing no high-frequency cues at all? In order to keep the focus of this project, it is decided not to go deeper into this subject, since this might require a comprehensive analysis backed up by listening tests. It might be expected, though, that the lack of any high-frequency cues results in more front-back localization errors. Furthermore, the ILD cues caused by head shadowing are still present at higher frequencies, and without XTC this might deteriorate the spatial perception. In conclusion, the entire frequency range of the application is limited to that of the XTC system.

4.3 XTC Performance

The requirements for XTC performance are divided into channel separation (CHSP) and size of sweet spot. Among the three different sweet spot definitions described in section 3.3, only the demands for the absolute sweet spot are specified. This implies a minimum achievement of CHSP throughout the specified region of the sweet spot, which specifies the XTC performance sufficient for localization of the sound source. The performance requirement is set to -12 dB, since this value was previously used by [Bai and Lee, 2006a, p. 5].

The requirement for sweet spot size is set to a lateral movement of ±0.05 m and a frontal movement of 0.1 m forward. The user is not expected to move his head backwards. This gives a sweet spot area of 0.01 m² in the horizontal plane.


4.4 Specification Summary

Table 4.1 summarizes the specifications for the XTC system obtained at this point.

Application Setup:
  Distance to monitor                r = 0.75 m
  Loudspeaker elevation from head    27.82°
  Maximum bass span                  67.38°

Sound Reproduction:
  Lower frequency limit              TBD
  Upper frequency limit              6 kHz
  PEavg                              < 3 dB
  SPLmax                             > 94 dB
  Loudspeaker topology               Two-way system
  Loudspeakers                       Tube loudspeaker
  Crossover network                  FIR filters

XTC Performance:
  CHSP                               < -12 dB
  Absolute sweet spot size           ±0.05 m lateral, 0.1 m frontal

Table 4.1: Specifications for implementation of XTC system.


Chapter 5 Crosstalk Cancelation Analysis

This chapter gives an analysis of XTC, first through a geometric interpretation leading to the OSD principle. Next, a channel model for a basic XTC loudspeaker system is given. Following is an analysis of the source strength needed for an XTC filter implementation. Last, a comparison of matrix inversion methods is given, leading to an analysis of the least square method.

5.1 Geometric Interpretation of the OSD Principle

This section gives a geometric interpretation of the XTC scenario, which leads to the OSD principle. It is shown that the additional distance from the ipsilateral to the contralateral ear is essential for the source strength needed for XTC. An expression that describes the distance in free field is found and related to the wavelength. The difference in travel distance depends on the span of the loudspeakers, and is thus essential for the frequency range in which XTC is provided. The section is based on the OSD principle presented in [Takeuchi and Nelson, 2002]. First, the terminology is defined, and afterwards an example illustrates the geometric interpretation.

5.1.1 Terms and Definitions

The terms and definitions will be explained on the basis of figure 5.1, and they will be used throughout the rest of the report.

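From figure 5.1a the difference between the path lengths $l_{RL}$ and $l_{RR}$ is defined as $\Delta l = l_{RL} - l_{RR}$. As the sound must travel longer, the pressure is attenuated. This is expressed as

$$
p_{Y_R} = \frac{1}{l_{RR}} \cdot p_{L_R}, \qquad
p_{Y_L} = \frac{1}{l_{RL}} \cdot p_{L_R}
\quad\Rightarrow\quad
p_{Y_L} = \frac{l_{RR}}{l_{RL}} \cdot p_{Y_R}, \quad [\mathrm{Pa}] \tag{5.1}
$$

where $p_{L_R}$ is the pressure produced by the right loudspeaker. The loss factor by which the pressure is attenuated at $Y_L$ compared to $Y_R$ is expressed as

$$
g = \frac{l_{RR}}{l_{RL}}. \quad [-] \tag{5.2}
$$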


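[Figure 5.1: two geometric sketches.
(a) Terminology and geometry of a two-source system, showing the loudspeakers L_L and L_R, the ears Y_L and Y_R, the loudspeaker angle θ, the distances r, l_RR, l_RL and l_RC, and the ear distance ∆b.
(b) Simplification for calculating ∆l for the free field model, showing θ, l_RC, ∆b, ∆l, Y_L and Y_R.]

Figure 5.1: Terminology and simplification of the geometrics.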

Since the position of the loudspeaker is referred to by the angle θ, a simplification of ∆l is made in figure 5.1b. This simplification expresses

$$
\Delta l \approx \Delta b \sin(\theta). \quad [\mathrm{m}] \tag{5.3}
$$

The simplification does not consider the shape of the head, and hereby it is assumed that the sound can travel in a straight path from the loudspeaker to the ear. Instead of directly including the shape of the head, a linear free field model is made. This model approximates the shadowing of the head by increasing the distance ∆b between the ears as the loudspeaker angle θ is increased. The distance between the ears is found to be twice as large at an azimuth of 90° as at an azimuth of 0°. Thus, the linear approximation

$$
\Delta b \approx \Delta b_0 \left(1 + \frac{\theta}{90^\circ}\right), \quad [\mathrm{m}] \tag{5.4}
$$

is made, where $\Delta b_0$ is the geometric distance between the ears and θ is given in degrees. This approximation is used throughout the project. [Takeuchi and Nelson, 2002]

As mentioned, the relationship between the length of ∆l and the wavelength is essential. Thus, ∆l is expressed in terms of a number of wavelengths as

$$
\Delta l = a \lambda = \frac{a \cdot c}{f}, \quad [\mathrm{m}] \tag{5.5}
$$

where c is the speed of sound in air, f is the frequency and a is the number of wavelengths. Next, the relationship between ∆l and the wavelength is shown to have a direct relation to the source strength needed for XTC. The need for source strength will be identified as a relationship between frequency (wavelength) and loudspeaker span.
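The geometric quantities in equations (5.3) to (5.5) can be evaluated numerically. The following is a minimal sketch only: the speed of sound (343 m/s) and the geometric ear distance (0.13 m) are assumed example values that are not taken from this section, and the function names are hypothetical.

```python
import math

C_SOUND = 343.0   # assumed speed of sound in air [m/s]
DELTA_B0 = 0.13   # assumed geometric ear distance [m], not given in this section


def ear_distance(theta_deg, delta_b0=DELTA_B0):
    """Effective ear distance of eq. (5.4); theta_deg is the loudspeaker angle in degrees."""
    return delta_b0 * (1.0 + theta_deg / 90.0)


def path_difference(theta_deg, delta_b0=DELTA_B0):
    """Path length difference of eq. (5.3), using the effective ear distance."""
    return ear_distance(theta_deg, delta_b0) * math.sin(math.radians(theta_deg))


def wavelengths(theta_deg, freq_hz, delta_b0=DELTA_B0, c=C_SOUND):
    """Number of wavelengths a = delta_l * f / c, obtained by rearranging eq. (5.5)."""
    return path_difference(theta_deg, delta_b0) * freq_hz / c
```

For a loudspeaker at θ = 90°, the effective ear distance is doubled, so under these assumptions the path difference equals one wavelength at f = c / (2 · ∆b_0).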



5.1.2 Interpretation of OSD Principle

As learned in section 3.1, the XTC principle is based on emitting cancelation waves to reduce the crosstalk received at the ears. The only factor that stops the loop from running endlessly is the loss factor g. A scenario where the free field distance ∆l = λ is depicted in figure 5.2.

[Figure 5.2: four panels, each showing the right and left channel against time [s] (0 to 4 ms).
(a) Loudspeaker signal for receiving one cycle (transducer strength [V]).
(b) Loudspeaker signal for receiving ten cycles (transducer strength [V]). Steady state is not obtained.
(c) Received signal (received strength [V]).
(d) Received signal (received strength [V]).]

Figure 5.2: The amplitude of the loudspeaker signals increases with the number of cycles transmitted. The objective is to transmit a signal that is received at the right ear only. In the figures, the wavelength and the free field distance between the ears are equal, corresponding to n = 4 (defined later). The received signals are delayed due to the propagation time.

In figure 5.2a it is seen how the loudspeakers first transmit the desired signal, and afterwards transmit cancelation signals. Each new cancelation wave is a factor g weaker than its predecessor. The signals in figures 5.2c and 5.2d show that only the desired signal is received. However, when the desired signal is longer than one cycle and ∆l = λ, the loudspeakers transmit 180° out of phase. Hereby, destructive interference becomes a problem that results in a need for increased signal strength, as seen in figure 5.2b. The problem regarding destructive interference is most significant whenever the loudspeakers operate in phase or 180° out of phase.



If the loudspeakers operate 90° or 270° out of phase, it can be shown that constructive interference is obtained. Hereby, each individual loudspeaker emits a signal that is weaker than the signal received at the desired location. The case where the loudspeakers operate 90° out of phase is depicted in figure 5.3.

[Figure 5.3: four panels, each showing the right and left channel against time [s] (0 to 8 ms).
(a) Loudspeaker signal for receiving one cycle (transducer strength [V]).
(b) Loudspeaker signal for receiving ten cycles (transducer strength [V]).
(c) Received signal (received strength [V]).
(d) Received signal (received strength [V]).]

Figure 5.3: In the figures n = 1, and the amplitude of the loudspeaker signals decreases with the number of cycles transmitted. At steady state the loudspeaker signal is lower than the received signal. The received signals are delayed due to the propagation time.

In figure 5.3b it is seen that the individual loudspeaker strength is less than the strength received in figure 5.3d. On the basis of this illustration, equation (5.5) is redefined such that when a corresponds to a number of quarter wavelengths, the variable n = 4a takes an integer value. Hence,

$$
\Delta l = \Delta b \cdot \sin\left(\frac{\Theta}{2}\right) = \frac{a \cdot c}{f} = \frac{n \cdot c}{4 \cdot f} = \frac{n \cdot \pi}{2 \cdot k}, \quad [\mathrm{m}] \tag{5.6}
$$

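where k is the wave number and n is the relationship between span and frequency that determines how well the system is conditioned. Expressing n yields

$$
n = \frac{4 \Delta l \cdot f}{c} = \frac{4 \Delta b_0 \left(1 + \frac{\theta}{90^\circ}\right) \cdot \sin\left(\frac{\Theta}{2}\right) \cdot f}{c}. \quad [-] \tag{5.7}
$$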



If n is even, ∆l corresponds to a multiple of λ/2, meaning that the system is as ill-conditioned as possible. This was illustrated in figure 5.2b, where the amplitude of the signal is continuously increasing as long as the desired signal is transmitted. On the other hand, when n is odd, the system is as well-conditioned as possible, which is illustrated in figure 5.3b. Here the signal amplitude transmitted by each individual loudspeaker is lower than the amplitude of the received signal.
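The role of n can be illustrated with a short numerical sketch of equation (5.7). It assumes that the loudspeaker angle is half the span (θ = Θ/2), a speed of sound of 343 m/s and a geometric ear distance of 0.13 m; these values and all function names are illustrative assumptions, not values from the report.

```python
import math

C_SOUND = 343.0   # assumed speed of sound in air [m/s]
DELTA_B0 = 0.13   # assumed geometric ear distance [m]


def path_difference(span_deg, delta_b0=DELTA_B0):
    """Delta l from eqs. (5.4) and (5.6), assuming theta = span / 2."""
    theta_deg = span_deg / 2.0
    delta_b = delta_b0 * (1.0 + theta_deg / 90.0)
    return delta_b * math.sin(math.radians(theta_deg))


def conditioning_index(span_deg, freq_hz, delta_b0=DELTA_B0, c=C_SOUND):
    """The dimensionless number n of eq. (5.7)."""
    return 4.0 * path_difference(span_deg, delta_b0) * freq_hz / c


def classify(n, tol=0.05):
    """Label n as close to odd (well-conditioned) or even (ill-conditioned)."""
    nearest = round(n)
    if abs(n - nearest) <= tol:
        return "well-conditioned" if nearest % 2 == 1 else "ill-conditioned"
    return "intermediate"
```

With a span of 60°, for example, the frequency at which n = 1 is c / (4 · ∆l), and doubling that frequency gives n = 2, the ill-conditioned case.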

Using equation (5.6) and setting n to an odd integer, a relationship between frequency and loudspeaker span is established, as seen in figure 5.4. This relationship is optimal with respect to the required source strength.

[Figure 5.4: plot of frequency [Hz] (100 Hz to 10 kHz, logarithmic axis) versus loudspeaker span Θ [°] (0 to 150°), with curves for n = 1, n = 3 and n = 5.]

Figure 5.4: Frequency and loudspeaker spans that are optimal with respect to source strength. All lines in the figure depict positions where n is odd.

Instead of using a single pair of loudspeakers for the system, a system using multiple loudspeakers can be designed. This would require a loudspeaker at a fixed position to cover a subregion of the frequency range. As n depends on the frequency according to equation (5.6), the frequency range can be addressed by assigning an interval for n. A case where n is restricted to the interval 0.3 < n < 1.7 is depicted in figure 5.5.

As the interval of n increases, the required source strength also increases. Having a large interval for n gives the advantage that fewer loudspeakers are needed, but at the cost of higher source strength.
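The frequency range that a loudspeaker at a given span covers for a chosen interval of n follows directly from equation (5.6). The sketch below uses the interval 0.3 < n < 1.7 from the text; the speed of sound, the ear distance, the relation θ = Θ/2 and the function name are assumptions made for illustration.

```python
import math

C_SOUND = 343.0   # assumed speed of sound in air [m/s]
DELTA_B0 = 0.13   # assumed geometric ear distance [m]


def frequency_band(span_deg, n_low=0.3, n_high=1.7, delta_b0=DELTA_B0, c=C_SOUND):
    """Return (f_low, f_high) in Hz for which n_low < n < n_high at the given span."""
    theta_deg = span_deg / 2.0
    delta_l = delta_b0 * (1.0 + theta_deg / 90.0) * math.sin(math.radians(theta_deg))
    f_at_n1 = c / (4.0 * delta_l)   # frequency where n = 1, from eq. (5.6)
    return n_low * f_at_n1, n_high * f_at_n1
```

The ratio between the upper and lower band edge is fixed by the interval (1.7 / 0.3 in this case), while the span only shifts the band: a narrow span moves it towards high frequencies and a wide span towards low frequencies.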

Summarizing, a solution based on the OSD principle is a trade-off between the number of loudspeakers, the required source strength and the frequency range covered by XTC.



[Figure 5.5: plot of frequency [Hz] (100 Hz to 10 kHz, logarithmic axis) versus loudspeaker span (0 to 150°), showing the region 0.3 < n < 1.7 and the frequency range of each loudspeaker.]

Figure 5.5: Defining n to an interval makes it possible to assign a frequency region to each of the loudspeakers. By positioning multiple loudspeakers, it is possible to cover most of the audible frequency range. In the figure, 0.3 < n < 1.7. The red lines mark the operational frequency range of a loudspeaker and the angle where it is positioned. This figure shows a three-way system.

5.2 Channel Model

This section gives an analysis of XTC based on a simple multimedia setup using two symmetrically positioned loudspeakers.

Several of the equations in the following sections deal with vectors and matrices in both the frequency and the time domain. Vectors and matrices in the time domain are represented using bold notation. Frequency domain vectors and matrices are represented in bold-italic notation. Vectors are lowercase, while matrices are uppercase letters. For simplicity, frequency variables are omitted from equations and figures.

The basic concept diagram in figure 1.2, page 2, can be expressed as the channel model seen in figure 5.6, where $x_R$ is the right channel input, $l_R$ is the transfer function of the right loudspeaker, and $h_{RL}$ is the transfer function for the acoustical path from the right loudspeaker to the left ear. Thus, as previously described, $h_{RL}$ and $h_{LR}$ are the crosstalk signal paths.

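[Figure 5.6: block diagram. The inputs x_R and x_L pass through the electro-acoustical path (L), with the loudspeaker transfer functions l_R and l_L, and then through the acoustical path (H), with h_RR, h_RL, h_LR and h_LL, which are summed into the ear signals y_R and y_L.]

Figure 5.6: Channel model for a scenario using two loudspeakers.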


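In matrix notation, L and H can be expressed as

$$
\boldsymbol{L} = \begin{bmatrix} l_R & 0 \\ 0 & l_L \end{bmatrix}, \qquad
\boldsymbol{H} = \begin{bmatrix} h_{RR} & h_{LR} \\ h_{RL} & h_{LL} \end{bmatrix}. \tag{5.8}
$$

The signal y at the ears and the input signal x to the system are vectors given by

$$
\boldsymbol{y} = \begin{bmatrix} y_R \\ y_L \end{bmatrix}, \qquad
\boldsymbol{x} = \begin{bmatrix} x_R \\ x_L \end{bmatrix}. \tag{5.9}
$$

As the two matrices L and H represent transfer functions, the system transfer function becomes

$$
\boldsymbol{y} = \boldsymbol{H} \cdot \boldsymbol{L} \cdot \boldsymbol{x}
\;\Rightarrow\;
\begin{bmatrix} y_R \\ y_L \end{bmatrix} =
\begin{bmatrix} h_{RR} & h_{LR} \\ h_{RL} & h_{LL} \end{bmatrix}
\begin{bmatrix} l_R & 0 \\ 0 & l_L \end{bmatrix}
\begin{bmatrix} x_R \\ x_L \end{bmatrix}. \tag{5.10}
$$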

There are two ways of compensating for the undesired influences of L and H: either by compensating for their combined response or for their individual responses. Defining the plant matrix as W = H · L and then compensating for W results in simplifications, such as fewer filters and only one set of equalization constraints. It is decided that in this project the individual influences of L and H are of lesser importance, and the plant matrix W will therefore be the target of equalization, here given by

$$
\boldsymbol{W} = \boldsymbol{H} \cdot \boldsymbol{L} =
\begin{bmatrix} h_{RR} & h_{LR} \\ h_{RL} & h_{LL} \end{bmatrix}
\begin{bmatrix} l_R & 0 \\ 0 & l_L \end{bmatrix}
= \begin{bmatrix} h_{RR} \cdot l_R & h_{LR} \cdot l_L \\ h_{RL} \cdot l_R & h_{LL} \cdot l_L \end{bmatrix}
= \begin{bmatrix} w_{RR} & w_{LR} \\ w_{RL} & w_{LL} \end{bmatrix}. \tag{5.11}
$$

The input-output relation can thus be written as

$$
\boldsymbol{y} = \boldsymbol{W} \cdot \boldsymbol{x}. \tag{5.12}
$$

From equation (5.12), it is seen that to receive the desired signal x at the ears, it is required that y = x. Hence, the influence of W must be removed.

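Therefore, an XTC filter C is introduced to the channel model, resulting in the system transfer function

$$
\boldsymbol{y} = \boldsymbol{W} \cdot \boldsymbol{C} \cdot \boldsymbol{x}
\;\Rightarrow\;
\begin{bmatrix} y_R \\ y_L \end{bmatrix} =
\begin{bmatrix} w_{RR} & w_{LR} \\ w_{RL} & w_{LL} \end{bmatrix}
\begin{bmatrix} c_{RR} & c_{LR} \\ c_{RL} & c_{LL} \end{bmatrix}
\begin{bmatrix} x_R \\ x_L \end{bmatrix}. \tag{5.13}
$$

This implies that in order for C to cancel out the influence of W,

$$
\boldsymbol{W} \cdot \boldsymbol{C} = \boldsymbol{I}
\;\Rightarrow\;
\boldsymbol{C} = \boldsymbol{W}^{-1}. \tag{5.14}
$$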



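[Figure 5.7: block diagram. The inputs x_R and x_L are filtered by c_RR, c_RL, c_LR and c_LL and summed into the loudspeaker signals v_R and v_L, which pass through w_RR, w_RL, w_LR and w_LL and are summed into the ear signals y_R and y_L.]

Figure 5.7: Channel model describing the XTC system.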

Figure 5.7 shows the channel model with the XTC filter C implemented. The system is divided into its two functional parts: the first part consists of the blocks that can be controlled, and the second part of the blocks that cannot. Thus, the signal that is sent to the loudspeakers can be described as

$$
\boldsymbol{v} = \boldsymbol{C} \cdot \boldsymbol{x}, \tag{5.15}
$$

such that

$$
\boldsymbol{y} = \boldsymbol{W} \cdot \boldsymbol{v}. \tag{5.16}
$$

Thus, when the system is implemented, v describes the system output. The realization of C requires matrix inversion; this topic is analyzed in the following sections.
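The channel model in equations (5.11) to (5.16) can be checked numerically at a single frequency. The sketch below uses NumPy; the complex values chosen for H and L are arbitrary placeholders and not measured data.

```python
import numpy as np

# Hypothetical complex transfer-function values at one frequency bin.
H = np.array([[1.0 + 0.0j, 0.4 - 0.3j],    # [[h_RR, h_LR],
              [0.4 - 0.3j, 1.0 + 0.0j]])   #  [h_RL, h_LL]]
L = np.diag([0.9 + 0.1j, 0.8 - 0.2j])      # diag(l_R, l_L)

W = H @ L                # plant matrix, eq. (5.11)
C = np.linalg.inv(W)     # ideal XTC filter, eq. (5.14)

x = np.array([1.0 + 0.0j, 0.0 + 0.0j])  # signal intended for the right ear only
v = C @ x                # loudspeaker signals, eq. (5.15)
y = W @ v                # signals at the ears, eq. (5.16)
```

With an exact inverse, y equals x, so the left ear receives nothing although both loudspeakers are active (both elements of v are non-zero).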

5.3 XTC Source Strength

In this section the needed source strength for an XTC implementation is derived on the basis of a point source model. The source strength corresponds to the filter gain of the XTC system. The source strength can be used to set the lower limit of the frequency range for the XTC system. First, a generic model is evaluated on the basis of the geometric definitions in section 5.1.1 under the assumption that the source is a point source.

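To derive the source strength needed for an XTC implementation, the channel model described in section 5.2 is considered. Equation (5.12) describes the signal from the system input to the ears as

$$
\boldsymbol{y} = \boldsymbol{W} \cdot \boldsymbol{x}
\;\Rightarrow\;
\begin{bmatrix} y_R \\ y_L \end{bmatrix} =
\begin{bmatrix} w_{RR} & w_{LR} \\ w_{RL} & w_{LL} \end{bmatrix}
\begin{bmatrix} x_R \\ x_L \end{bmatrix}. \tag{5.17}
$$

Assuming ideal loudspeakers in a free field environment, the pressure at a given point at distance l from the loudspeaker can be described as

$$
p = \frac{\rho}{4 \pi l}, \quad [\mathrm{Pa}] \tag{5.18}
$$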


where ρ is the density of air [Møller, 1987, p. 28]. For a sine wave, the phase must be included to describe the signal. The phase is described from the signal frequency and the distance to the source using complex notation, such that a signal at a distance l in space can be described as

$$
p = \frac{\rho}{4 \pi l} e^{-jkl}. \quad [\mathrm{Pa}] \tag{5.19}
$$

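The expression for the sound pressure can be put into the channel model described in equation (5.17). If the setup is symmetric, such that $l_{RR} = l_{LL}$ and $l_{RL} = l_{LR}$, the plant matrix in equation (5.17) can be rewritten as

$$
\begin{aligned}
\boldsymbol{W} &= \frac{\rho}{4\pi}
\begin{bmatrix} e^{-jkl_{RR}}/l_{RR} & e^{-jkl_{LR}}/l_{LR} \\ e^{-jkl_{RL}}/l_{RL} & e^{-jkl_{LL}}/l_{LL} \end{bmatrix} \\
&= \frac{\rho}{4\pi}
\begin{bmatrix} e^{-jkl_{RR}}/l_{RR} & e^{-jkl_{RL}}/l_{RL} \\ e^{-jkl_{RL}}/l_{RL} & e^{-jkl_{RR}}/l_{RR} \end{bmatrix} \\
&= \frac{\rho e^{-jkl_{RR}}}{4\pi l_{RR}}
\begin{bmatrix} 1 & g e^{-jk\Delta l} \\ g e^{-jk\Delta l} & 1 \end{bmatrix}.
\end{aligned} \tag{5.20}
$$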

In the last reduction, $g e^{-jk\Delta l}$ corresponds to the difference in the signal that is received between the ipsilateral ear and the contralateral ear. The signal is reduced by a factor of g and delayed by $e^{-jk\Delta l}$. Equation (5.20) consists of a constant and a matrix. The constant can be seen as the gain of the source signal and the matrix as the crosstalk influence. In order to invert equation (5.20), only the matrix needs to be inverted. As such, the constant is considered a part of the input signal x, and the plant matrix W is reduced to

$$
\boldsymbol{W} = \begin{bmatrix} 1 & g e^{-jk\Delta l} \\ g e^{-jk\Delta l} & 1 \end{bmatrix}. \tag{5.21}
$$

5.3.1 Matrix Inversion

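To perform XTC, C must be computed from equation (5.14). Thus, $\boldsymbol{W}^{-1}$ is calculated to

$$
\begin{aligned}
\boldsymbol{W}^{-1} &= \frac{1}{1 \cdot 1 - g e^{-jk\Delta l} \cdot g e^{-jk\Delta l}}
\begin{bmatrix} 1 & -g e^{-jk\Delta l} \\ -g e^{-jk\Delta l} & 1 \end{bmatrix} \\
&= \frac{1}{1 - g^2 e^{-2jk\Delta l}}
\begin{bmatrix} 1 & -g e^{-jk\Delta l} \\ -g e^{-jk\Delta l} & 1 \end{bmatrix}
\end{aligned} \tag{5.22}
$$

$$
\phantom{\boldsymbol{W}^{-1}} = \frac{1}{1 - g^2 e^{-jn\pi}}
\begin{bmatrix} 1 & -g e^{-jn\pi/2} \\ -g e^{-jn\pi/2} & 1 \end{bmatrix}, \tag{5.23}
$$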



where the last simplification is made using equation (5.6) from section 5.1.2. Equation (5.22) gives a hint about problems that might occur during the matrix inversion. If ∆l is close to zero, then g becomes close to 1. This makes the gain $1/(1 - g^2 e^{-2jk\Delta l}) \to \infty$, which is undesired. The situation corresponds to the loudspeakers being positioned close to each other, making the transfer functions from one loudspeaker to each of the ears nearly identical. To overcome this problem, the loudspeakers should be separated from each other by a large span. In this project the tweeters are suggested to be positioned with a small span. However, since the wave number k is large for high frequencies, the problem does not get out of hand even though ∆l is small.

5.3.2 Filter Gain

The extra gain resulting from the inversion is evaluated next as the l2-norm of the C matrix. The l2-norm gives the magnitude of a matrix, which in this case equals the filter gain. The l2-norm can be calculated as the largest of the singular values of a matrix [MathWorks, 2008]. The singular values of C are calculated in appendix A to

$$
\sigma_{C1}, \sigma_{C2} =
\frac{1}{\sqrt{\left(1 + g e^{-jn\pi/2}\right)\left(1 + g e^{jn\pi/2}\right)}}
\;\wedge\;
\frac{1}{\sqrt{\left(1 - g e^{-jn\pi/2}\right)\left(1 - g e^{jn\pi/2}\right)}}. \tag{5.24}
$$

Figure 5.8 shows the l2-norm as a function of n. From the figure, the same conclusion as in section 5.1.2 can be drawn. The filter gain needed is minimum when n is odd and maximum when n is even.
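The closed-form singular values in equation (5.24) can be compared with a numerical singular value decomposition of C for the reduced point-source plant of equation (5.21). This is a sketch under assumptions: the loss factor g = 0.9 used in the usage note is an arbitrary example value, the gain is expressed as 20·log10 of the norm, and the function names are hypothetical.

```python
import numpy as np


def xtc_filter(n, g):
    """C = W^{-1} for the plant of eq. (5.21), with k * delta_l = n * pi / 2."""
    x = g * np.exp(-1j * n * np.pi / 2)
    W = np.array([[1.0, x], [x, 1.0]], dtype=complex)
    return np.linalg.inv(W)


def singular_values_closed_form(n, g):
    """The two singular values of C according to eq. (5.24)."""
    x = g * np.exp(-1j * n * np.pi / 2)
    return 1.0 / abs(1.0 + x), 1.0 / abs(1.0 - x)


def filter_gain_db(n, g):
    """Filter gain as the l2-norm (largest singular value) of C, in dB."""
    return 20.0 * np.log10(np.linalg.norm(xtc_filter(n, g), 2))
```

For g = 0.9 the gain is below 0 dB at n = 1 and about 20 dB at n = 2, which agrees with the minima at odd n and maxima at even n described above.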

[Figure 5.8: plot of filter gain [dB] (-10 to 25 dB) versus n [−] (up to n = 10), showing σ_C1, σ_C2 and the norm of C.]

Figure 5.8: The needed source strength as a function of n. It is easily seen that where n is odd, the requirements for source strength are at minima. The source strength corresponds to the norm of C.

5.3.3 Condition number

When the norm of the inverted matrix $\boldsymbol{W}^{-1}$ becomes large, this suggests that W is ill-conditioned. When this is the case, the result of the inversion is vulnerable to small round-off errors caused by the numerical precision used. How ill-conditioned a matrix is can be expressed by its condition number, which is defined as

$$
\kappa = \frac{\sigma_{max}}{\sigma_{min}}, \quad [-] \tag{5.25}
$$

where σmax is the largest singular value and σmin is the smallest. The higher the condition number, the more ill-conditioned the matrix is, and the greater the influence of small rounding errors.
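The condition number of equation (5.25) can be computed for the point-source plant of equation (5.21) as sketched below. Expressing it in dB as 20·log10(κ) is an assumption about how the report plots it, and g = 0.9 in the usage note is an arbitrary example value.

```python
import numpy as np


def condition_number_db(n, g):
    """Condition number (eq. 5.25) of the plant in eq. (5.21), in dB."""
    x = g * np.exp(-1j * n * np.pi / 2)
    W = np.array([[1.0, x], [x, 1.0]], dtype=complex)
    sigma = np.linalg.svd(W, compute_uv=False)   # singular values of W
    return 20.0 * np.log10(sigma.max() / sigma.min())
```

For n = 1 both singular values are equal, so the condition number is 0 dB; for n = 2 and g = 0.9 the ratio is 1.9 / 0.1 = 19, roughly 25.6 dB. As g approaches 1, the value at even n grows without bound.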

[Figure 5.9: plot of condition number [dB] (0 to 35 dB) versus n [−] (up to n = 10).]

Figure 5.9: Condition number as a function of n for the singular values of the point source model calculated in section 5.3.2. When n is even, the matrix is the most ill-conditioned.

5.4 Matrix Inversion Methods

This section analyzes how to calculate a solution for the XTC filter C presented in the channel model. Two solution methods are investigated.

First, the principles of the methods are presented, and one of them is selected based on previous research. Afterwards the chosen method is explained in detail, which makes it possible to set parameters and use the method to calculate the XTC filter C.

5.4.1 Generic Crosstalk Cancelation Method

The generic crosstalk canceler (GCC) takes its starting point in an exact solution to the inverse matrix problem. This solution is a matrix whose elements are expressed on the basis of the measured transfer functions in W. However, this matrix contains singularities and can only be approximated. These approximations are based on minimum-phase filters, and the excess phase of the transmission path is only described by a linear delay [Lacouture, 2008, p. 3].



5.4.2 Least Square Methods

The least square (LS) methods are based on optimizing the filter by means of a cost function. The cost function expresses the error compared to the exact $\boldsymbol{W}^{-1}$ solution, so the minimum of the cost function results in an optimal filter solution. Such a filter solution might have large peaks at specific frequencies. For this reason an effort parameter is often included in the cost function in order to penalize high gain. Hereby, the optimal solution minimizes both the error and the gain requirements.

The LS solutions can be found in both the frequency domain (LSfreq) and the time domain (LStime), but are based on the same idea. The solution to an LStime approach should yield better performance, but obtaining the solution requires more computations [Lacouture, 2008, p. 4].

5.4.3 Performance of Methods

To choose the most suitable solution method for this project, it is investigated how each of them performs. This is based on previous work conducted in [Lacouture, 2008]. In that work, the performance of the GCC, LSfreq and LStime was simulated for several different loudspeaker configurations and filter lengths. The simulated filter lengths varied from 128 to 1024 taps using a sample rate of 48 kHz, i.e. filter lengths of 2.7 ms to 21.3 ms. The loudspeaker configurations varied in the number of loudspeakers, their span and their elevation. For this project, only the results from using two loudspeakers are considered.

There was no constraint on the amount of filter gain required, but regularization was used, and it was in that respect ensured that the filter solutions became numerically stable. Furthermore, the simulations assumed matched HRTFs and optimal sweet spot position, so the results are assumed to be best-case performances. In table 5.1 the most relevant results are summarized. These are average values interpreted from the article. It is seen that the LS methods provide the best performance. The article suggests that the GCC performs worse due to the minimum-phase approximation, which can only be accepted for loudspeaker placements that provide well-conditioned matrix problems. The loudspeakers of this project are placed according to the OSD principle investigated in section 5.1, and for this reason the difference between the two methods might be smaller.

Channel separation (CHSPxAvg):
  GCC: -20 dB to -30 dB, very little increase with longer filters.
  LS:  -30 dB to -70 dB, highly increasing for the long filters.

Performance error (PExAvg):
  GCC: 3 dB to 0.7 dB, smaller error with longer filters.
  LS:  1.5 dB to 0.5 dB, smaller error with longer filters.

Loudspeaker placement:
  GCC: Large influence on performance. Well-conditioned matrix required.
  LS:  Little influence, always gives a good solution.

Table 5.1: Summary of the results obtained in [Lacouture, 2008] with the two different solution methods. The results of LSfreq and LStime are gathered, since they performed similarly.

The LStime should be able to outperform the LSfreq solution, but the improvements are negligible if the filters are of sufficient length. For the two-channel case, even the shortest filter length of 128 taps (2.7 ms) seemed to be sufficient for the differences to be negligible. Configurations using four channels seemed to require longer filters, since LStime showed a larger improvement. The LStime seems especially suitable if the loudspeaker configuration causes the filter length to be an issue.

5.4.4 Choice of Method

It has been chosen to use the LS method. This is due to the fact that it provides better performance, and it is expected to be more suitable for inclusion of extra requirements to the solution. Extra requirements originate from the need to optimize the performance in multiple head positions, so that the requirement for sweet spot size can be met.

5.5 Analysis of LS Method

This section describes the LS method in such detail that it is possible to specify parameters and find a filter solution.

First, a block diagram is used to derive the solution, then the differences between the solution in the frequency and the time domain are explained. Afterwards the performance that can be expected for different filter lengths is presented, and the section ends with a discussion of how to design frequency dependent regularization. Details regarding modeling delay and guidelines for filter length can be found in appendix B, which ends with an actual flow to carry out for a practical synthesis of the filters.

5.5.1 Block Diagram for LS Optimization

It is chosen to explain least square from its z-domain representation (i.e. LSfreq), since it is easier to explain for discretized frequencies than for time signals. The overall procedure is to find the frequency response of the XTC filter that is optimal in the least square sense. Afterwards the filters are transformed into the time domain. This is explained using the block diagram in figure 5.10. Using the block diagram, the cost function is expressed, and by finding the minimum cost, the optimal solution is derived.

The vector d(z) represents the signals desired to be received at R locations, i.e. at the two ears. However, if performance is desired in more than a single head location, it can be required to specify more locations. The mapping of the T input signals x(z) to the desired signals d(z) is specified in the target transfer matrix A(z). As an example, it is desirable to transfer the left channel to the left ear and not to the right ear. In the case where only two positions (both ears) are considered, this makes A(z) an identity matrix.

[Figure 5.10: block diagram. The T input signals x(z) are fed to the XTC filter C(z) (S×T), producing the S loudspeaker signals v(z) (effort), which pass through the plant W(z) (R×S) to give the reproduced signals y(z). In parallel, x(z) passes through the modeling delay z^-m and the target transfer A(z) (R×T) to give the desired signals d(z). The difference between d(z) and y(z) forms the R error signals e(z).]

Figure 5.10: Diagram of the least square solution with error signals.

A solution involving an inversion of a room response is bound to be either unstable or non-causal [Kirkeby et al., 1998, p. 2]. Since the solution is expressed as an FIR filter, the solution found is stable, and hence it is non-causal. Therefore a modeling delay is necessary to make the solution causal. For convenience the modeling delay is omitted in the further derivations, since it only shifts the samples, and it makes no difference as long as it is applied as the last step in the solution. For the intermediate steps the modeling delay can be regarded as part of the A(z) matrix.

The signals that are reproduced at the R positions depend on the XTC filter and on how the plant transfers the input signals. The plant matrix contains the transfer function from each of the S loudspeakers to each of the R locations. The error signal vector, e(z), expresses the differences between the desired and the reproduced signals by

$$
\boldsymbol{e}(z) = \boldsymbol{d}(z) - \boldsymbol{y}(z), \tag{5.26}
$$

and the cost function is the power of this vector, which is given by

$$
J(z) = \boldsymbol{e}(z)^H \boldsymbol{e}(z) = \left(\boldsymbol{d}(z) - \boldsymbol{y}(z)\right)^H \left(\boldsymbol{d}(z) - \boldsymbol{y}(z)\right). \tag{5.27}
$$

By inserting the terms of figure 5.10 into equation (5.27), it becomes

$$
J(z) = \left(\boldsymbol{A}(z)\boldsymbol{x}(z) - \boldsymbol{W}(z)\boldsymbol{v}(z)\right)^H \left(\boldsymbol{A}(z)\boldsymbol{x}(z) - \boldsymbol{W}(z)\boldsymbol{v}(z)\right). \tag{5.28}
$$

Regularization

It will often be desirable to regularize the filter obtained, since exact filter inversion might overload the loudspeakers. Regularization smooths the effect of dips in the frequency response, which is desirable since the location of the dips tends to vary with each response. For this reason an effort penalty is added to the block diagram. This is seen in figure 5.11, where the loudspeaker signals are fed to the cost function. The loudspeaker signals are fed through a shape factor B(z) to make the effort penalty frequency dependent.

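The scaling factor β can be adjusted in the range 0 ≤ β < ∞ to weight the solution, such that it is optimal with respect to performance error and filter gain. The value of β should always be positive to ensure that the effort is decreased. A more detailed analysis for determining the regularization factors is given in appendix B. After applying regularization, the cost function becomes

$$
\begin{aligned}
J(z) &= \boldsymbol{e}(z)^H \boldsymbol{e}(z) + \beta \, \boldsymbol{v}(z)^H \boldsymbol{B}(z)^H \boldsymbol{B}(z) \boldsymbol{v}(z) \\
&= \left[\boldsymbol{A}(z)\boldsymbol{x}(z) - \boldsymbol{W}(z)\boldsymbol{v}(z)\right]^H \left[\boldsymbol{A}(z)\boldsymbol{x}(z) - \boldsymbol{W}(z)\boldsymbol{v}(z)\right]
+ \beta \left[\boldsymbol{v}(z)^H \boldsymbol{B}(z)^H \boldsymbol{B}(z) \boldsymbol{v}(z)\right].
\end{aligned} \tag{5.29}
$$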

[Figure 5.11: block diagram. As figure 5.10, with the XTC filter C(z) (S×T), the modeling delay z^-m, the target transfer A(z) (R×T) and the plant W(z) (R×S). In addition, the loudspeaker signals v(z) are fed through the regularization shape B(z) and the regularization scaling, and both the error signals e(z) and the shaped loudspeaker signals are squared and summed into the cost output.]

Figure 5.11: Expansion of the basic block diagram given in figure 5.10. The entire cost function is now included. Notice that the filter gain for the loudspeakers, expressed through v(z), is also included as a cost, hence a large gain is penalized.

Optimal Solution

From the cost function the minimum cost is found by differentiation. The filter parameters that yield a differentiated cost equal to zero express the optimal solution. In equation (5.29) the filter coefficients are expressed through v(z), hence the cost function is differentiated with respect to v(z), i.e. $\partial J(z) / \partial \boldsymbol{v}(z)$. The optimal values of v(z) are found to be

$$
\boldsymbol{v}(z) = \left[\boldsymbol{W}(z)^H \boldsymbol{W}(z) + \beta \boldsymbol{B}(z)^H \boldsymbol{B}(z)\right]^{-1} \boldsymbol{W}(z)^H \boldsymbol{A}(z) \boldsymbol{x}(z). \tag{5.30}
$$
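The intermediate steps between equations (5.29) and (5.30) can be sketched as follows. The derivation assumes the common convention of differentiating a real-valued cost with respect to the conjugate of the complex variable; the report does not state which convention it uses. The argument (z) is omitted and $\boldsymbol{d} = \boldsymbol{A}\boldsymbol{x}$.

```latex
\begin{aligned}
J &= (\boldsymbol{d} - \boldsymbol{W}\boldsymbol{v})^H (\boldsymbol{d} - \boldsymbol{W}\boldsymbol{v})
     + \beta\, \boldsymbol{v}^H \boldsymbol{B}^H \boldsymbol{B} \boldsymbol{v} \\
  &= \boldsymbol{d}^H \boldsymbol{d}
     - \boldsymbol{d}^H \boldsymbol{W} \boldsymbol{v}
     - \boldsymbol{v}^H \boldsymbol{W}^H \boldsymbol{d}
     + \boldsymbol{v}^H \left( \boldsymbol{W}^H \boldsymbol{W} + \beta \boldsymbol{B}^H \boldsymbol{B} \right) \boldsymbol{v}, \\[4pt]
\frac{\partial J}{\partial \boldsymbol{v}^{*}}
  &= -\boldsymbol{W}^H \boldsymbol{d}
     + \left( \boldsymbol{W}^H \boldsymbol{W} + \beta \boldsymbol{B}^H \boldsymbol{B} \right) \boldsymbol{v} = \boldsymbol{0} \\[4pt]
\Rightarrow\quad
\boldsymbol{v} &= \left[ \boldsymbol{W}^H \boldsymbol{W} + \beta \boldsymbol{B}^H \boldsymbol{B} \right]^{-1}
     \boldsymbol{W}^H \boldsymbol{A} \boldsymbol{x}.
\end{aligned}
```

The matrix in brackets is Hermitian and positive semidefinite, so the stationary point is a minimum; for β > 0 and a B of full column rank it is positive definite, which guarantees that the inverse exists.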



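Since v(z) = C(z)x(z), the filter solution is given by

$$
\boldsymbol{C}(z) = \left[\boldsymbol{W}(z)^H \boldsymbol{W}(z) + \beta \boldsymbol{B}(z)^H \boldsymbol{B}(z)\right]^{-1} \boldsymbol{W}(z)^H \boldsymbol{A}(z). \tag{5.31}
$$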

Considering the case of a single sweet spot and only two input channels, all matrices have dimension 2 × 2. The target matrix A(z) is furthermore reduced to an identity matrix, hence A(z) can be excluded from equation (5.31). From a C(z) of this identity form, any desired mapping can subsequently be made with a target matrix A(z). It should be noted that the LSfreq optimizes points on the frequency axis with linear spacing.
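Equation (5.31) can be evaluated independently for each frequency bin. The following NumPy sketch assumes that the plant is available as an array of shape (K, R, S) holding one R×S matrix per bin; the function name, the array layout and the default choices B = I and A = I are illustrative assumptions. The modeling delay and the transformation of the result to time domain FIR filters are not included.

```python
import numpy as np


def ls_freq_filters(W, beta, B=None, A=None):
    """Regularized LS solution of eq. (5.31), evaluated per frequency bin.

    W: complex array (K, R, S), plant matrix for each of K frequency bins.
    B: optional array (K, S, S), regularization shape (identity if omitted).
    A: optional array (K, R, T), target matrix (identity if omitted, T = R).
    Returns C as a complex array of shape (K, S, T).
    """
    K, R, S = W.shape
    if A is None:
        A = np.broadcast_to(np.eye(R, dtype=complex), (K, R, R))
    if B is None:
        B = np.broadcast_to(np.eye(S, dtype=complex), (K, S, S))
    WH = np.conj(np.swapaxes(W, -1, -2))   # Hermitian transpose per bin
    BH = np.conj(np.swapaxes(B, -1, -2))
    lhs = WH @ W + beta * (BH @ B)
    rhs = WH @ A
    return np.linalg.solve(lhs, rhs)       # solves lhs @ C = rhs for every bin
```

With β = 0 and a square, invertible plant, the result reduces to the exact inverse of equation (5.14). A positive β lowers the filter gain at the cost of a non-zero reproduction error.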

5.5.2 Time Domain LS

The LS solution in the time domain (LStime) can be presented using block diagrams similar to figures 5.10 and 5.11. The difference is that the underlying equations express a convolution in the time domain instead of a multiplication in the frequency domain. This can basically be formulated as

$$
v[n] * w[n] \;\xrightarrow{\mathrm{FT}}\; v[k] \cdot w[k]. \tag{5.32}
$$

The LSfreq can express the output at a specific frequency as a simple multiplication, whereas LStime carries out all the multiplications of the FIR filter. This results in an increased amount of computations needed for deriving a solution. In practice FIR filtering is done through convolution matrices, which in matrix notation is expressed as

$$
v[n] * w[n] = \mathbf{W}_{rs} \cdot \mathbf{v} =
\begin{bmatrix}
w_{rs}[0] & & & \\
w_{rs}[1] & w_{rs}[0] & & \\
\vdots & w_{rs}[1] & \ddots & \\
w_{rs}[N_w - 1] & \vdots & \ddots & w_{rs}[0] \\
 & w_{rs}[N_w - 1] & & w_{rs}[1] \\
 & & \ddots & \vdots \\
 & & & w_{rs}[N_w - 1]
\end{bmatrix}
\begin{bmatrix} v[0] \\ v[1] \\ v[2] \\ \vdots \\ v[n] \end{bmatrix} \tag{5.33}
$$

It can be seen that $W_{rs}$ contains samples from a measured impulse response vector $w_{rs}$, which has a length of $N_w$ samples. Furthermore, it is noticed that the structure is Toeplitz.
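The structure of equation (5.33) can be sketched in a few lines of Python. The function names and the sample values below are hypothetical; the sketch only shows that multiplying by the convolution matrix gives the same result as a direct convolution.

```python
def convolution_matrix(w, n_v):
    """Return the (len(w) + n_v - 1) x n_v Toeplitz matrix W with W v == w * v."""
    n_w = len(w)
    rows = n_w + n_v - 1
    # Entry (i, j) holds w[i - j]; entries outside the impulse response are zero.
    return [[w[i - j] if 0 <= i - j < n_w else 0.0 for j in range(n_v)]
            for i in range(rows)]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def convolve(w, v):
    """Direct linear convolution, used here only as a reference."""
    out = [0.0] * (len(w) + len(v) - 1)
    for i, wi in enumerate(w):
        for j, vj in enumerate(v):
            out[i + j] += wi * vj
    return out

w_rs = [1.0, 0.5, 0.25]      # hypothetical impulse response, N_w = 3
v = [1.0, -1.0, 2.0, 0.5]    # hypothetical filter samples
W_rs = convolution_matrix(w_rs, len(v))
y_matrix = matvec(W_rs, v)
y_direct = convolve(w_rs, v)
```

The matrix has $N_w + N_v - 1$ rows, which is why the memory requirement of LStime grows quickly with the filter length.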



Time Domain Solution

The system of equations is basically the same as for LSfreq, but convolution matrices are used instead of frequency response vectors. This is expressed as

$$ W = \begin{bmatrix} W_{11} & \cdots & W_{1S} \\ \vdots & W_{rs} & \vdots \\ W_{R1} & \cdots & W_{RS} \end{bmatrix} , \tag{5.34} $$

where r indicates a specific location and s a specific loudspeaker. Similarly, the XTC filter matrix becomes

$$ C = \begin{bmatrix} c_{11} & \cdots & c_{1T} \\ \vdots & c_{st} & \vdots \\ c_{S1} & \cdots & c_{ST} \end{bmatrix} , \tag{5.35} $$

where t indicates a specific target, and each $c_{st}$ is an impulse response from an input signal to a loudspeaker. To ease the overview, the solution is typically segmented by finding a solution for each of the input signals separately. This means that one column of C is found at a time, which can be arranged as a stacked vector of impulse responses by

$$ c_t = \begin{bmatrix} c_{1t}[0] \\ c_{1t}[1] \\ \vdots \\ c_{1t}[N_c-1] \\ c_{2t}[0] \\ \vdots \\ c_{St}[N_c-1] \end{bmatrix} . \tag{5.36} $$

If a frequency-constant regularization is used, this allows a solution similar to equation (5.31), expressed as

$$ c_t = \left[ W^H W + \beta I \right]^{-1} W^H d_t , \tag{5.37} $$

where $d_t$ is a stacked vector of desired mappings, which are defined from the matrix A and a modeling delay [Kirkeby et al., 1996].
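A minimal single-channel Python sketch of equation (5.37) follows. It assumes one loudspeaker and one receiver point, a real-valued impulse response (so that $W^H = W^T$) and a delayed unit impulse as the desired response; the impulse response, filter length, delay and β are hypothetical values chosen only for illustration, and the small Gaussian elimination routine stands in for whatever solver an actual implementation would use.

```python
def convolution_matrix(w, n_c):
    """Toeplitz convolution matrix of w for filters of length n_c."""
    return [[w[i - j] if 0 <= i - j < len(w) else 0.0 for j in range(n_c)]
            for i in range(len(w) + n_c - 1)]

def convolve(w, v):
    out = [0.0] * (len(w) + len(v) - 1)
    for i, wi in enumerate(w):
        for j, vj in enumerate(v):
            out[i + j] += wi * vj
    return out

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def ls_time(w, n_c, delay, beta):
    """c = [W^T W + beta I]^-1 W^T d, with d a unit impulse at index `delay`."""
    W = convolution_matrix(w, n_c)
    rows = len(W)
    d = [1.0 if i == delay else 0.0 for i in range(rows)]
    gram = [[sum(W[i][p] * W[i][q] for i in range(rows)) + (beta if p == q else 0.0)
             for q in range(n_c)] for p in range(n_c)]
    rhs = [sum(W[i][p] * d[i] for i in range(rows)) for p in range(n_c)]
    return solve(gram, rhs)

w = [1.0, 0.5]                                # hypothetical plant impulse response
c = ls_time(w, n_c=24, delay=4, beta=1e-9)    # hypothetical design parameters
y = convolve(w, c)                            # approximately a unit impulse at index 4
```

For the multichannel case, W becomes the block matrix of equation (5.34) and the right-hand side is built from the stacked vector $d_t$.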

Frequency Dependent Regularization

If a frequency dependent regularization is desired, the impulse response b must first be determined. From b, a convolution matrix $B$ should be generated the same way $W_{rs}$ was made. This is put into a stacked matrix $\mathbf{B}$ as

$$ \mathbf{B} = \begin{bmatrix} B & & \\ & \ddots & \\ & & B \end{bmatrix} . \tag{5.38} $$



Thus, the LStime solution becomes

$$ c_t = \left[ W^H W + \beta \mathbf{B}^H \mathbf{B} \right]^{-1} W^H d_t \tag{5.39} $$

[Lacouture, 2008], [Kirkeby and Nelson, 1999].

5.5.3 Performance According to Filter Length

The needed filter length depends on the performance required. In table 5.2 the performance of an XTC system is shown for different filter lengths. Each cell gives the result as LSfreq / LStime.

| Filter length [taps] | CHSPxAvg [dB], 2 Ch. | CHSPxAvg [dB], 4 Ch. | PExAvg [dB], 2 Ch. | PExAvg [dB], 4 Ch. |
|---|---|---|---|---|
| 128 | -30/-32 | -33/-41 | 1.2/1 | 2/0.5 |
| 256 | -39/-40 | -45/-54 | 0.8/0.8 | 0.6/0.3 |
| 512 | -51/-52 | -58/-57 | 0.4/0.4 | 0.4/0.3 |
| 1024 | -80/-79 | - | 0.3/0.3 | - |

Table 5.2: The performance of LS solutions for XTC systems using different filter lengths. The performance of both LSfreq and LStime is presented for both two and four loudspeaker channel configurations. The results are from [Lacouture, 2008].

It must, however, be noted that these results originate from a study where it was assumed that the loudspeakers had a flat frequency response, i.e. the XTC filter only removes the effect of the HRTFs. For this reason the XTC system of this project is expected to yield a lower performance.

5.5.4 Frequency Dependency of the Regularization

It is expected that better filter solutions can be obtained if the regularization is applied using a frequency dependent shape factor instead of a regularization which is constant for all frequencies (B(z) = I). Some parts of the frequency spectrum are expected to be problematic to invert and require a high regularization factor. However, the regions that are not problematic should not be penalized by such regularization, since it would result in negligible equalization in these regions. This suggests making a shape factor which is only high in the problematic regions. The following parts of the frequency spectrum are expected to be problematic:

• The frequency range outside the system's reproduction range:
Below the reproduction range the system gives a low output and requires a large gain. This range is not used, hence the shape factor should be increased in this range. Above the reproduction range the transfer function is very dependent on the listening subject and position. A solution yielding an overall improvement is considered better, hence the shape factor should be increased within this range.

• The frequency range outside the passband of the individual loudspeaker:
The frequency range of the XTC system was divided according to the OSD principle, since some parts of the range were problematic for a loudspeaker at specific angles. This implies that the shape factor should be individual for the outer and inner loudspeaker. The shape factor of the individual loudspeakers should be set higher outside the range where it is intended to play.

Two articles using frequency dependent regularization have been found [Lacouture, 2008], [Kirkeby and Nelson, 1999]. From these articles it seems that the typical approach is to design and specify the shape as a simple highpass filter, with a cutoff frequency where the inversion is known to be problematic. This implies that almost no regularization was used within the remaining frequency range. However, neither of the articles directly states guidelines for how to shape the regularization.


Chapter 6 Implementation

An analysis of XTC and loudspeaker setup has been presented in the previous chapters. This leads to the actual implementation presented in this chapter. The platform used for implementation will be MATLAB, and the XTC system will be implemented without meeting any real-time demands.

Before describing the actual implementation of the XTC system, some issues must be dealt with. First, the low frequency limit of the system is found. Based on the obtained frequency range, a bandpass filter is designed together with a description of the pre-processing of a monaural signal as input to a binaural signal. Afterwards, the design of the crossover network is described. Next, possible implementation models are proposed and described, and the considerations regarding the implementation algorithm are discussed. Finally, this leads to the final implementation specifications, which will be used further on for simulation and verification.

6.1 Frequency Range and Loudspeaker Span by OSD

In this section the low-frequency limit of the XTC system will be determined. The limit is found as the lowest frequency where the chosen loudspeaker can deliver at least 94 dB SPL re. 20 µPa at the ears of the listener. The bass loudspeakers are positioned at the largest possible span according to the requirement specification. When the low-frequency limit is specified, the frequency range for the bass and the tweeter can be determined. The calculations are made in the following order:

• Compute the low-frequency limit for the XTC system from the span of the bass loudspeakers.

• Compute the crossover frequency between the bass and tweeter.

• Compute the span of the tweeters.

6.1.1 Low-Frequency Limit

First, the low-frequency limit for the XTC system is computed. Figure 6.1 shows the loudspeaker gain reserve, here defined as the additional sound pressure the loudspeaker is able to deliver at the ear, relative to the required 94 dB SPL re. 20 µPa.

[Plot: gain reserve [dB re. 1 Pa] versus frequency [Hz].]

Figure 6.1: Loudspeaker gain reserve calculated from the response in figure 4.3 with the required 94 dB subtracted. Only at frequencies where the values are positive is the loudspeaker able to deliver a sufficient amount of sound pressure.

At the lowest frequencies the loudspeaker is unable to produce a sufficient sound pressure. Hence, the loudspeaker cannot fulfill the requirement at these frequencies. As seen in section 5.3, the XTC system puts in an additional gain requirement. In figure 6.2 this additional requirement is depicted for the low-frequency region.

[Plot: filter gain [dB] versus frequency [Hz].]

Figure 6.2: Additional filter gain needed for lower frequencies, when the loudspeakers are positioned at maximum span (67.4°).

[Plot: loudspeaker gain reserve [dB] versus frequency [Hz].]

Figure 6.3: The low-frequency limit can be determined as the zero-crossing at 158 Hz.

Subtracting the needed gain in figure 6.2 from the gain reserve in figure 6.1, the low-frequency limit can be found as the zero-crossing in figure 6.3.
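The subtraction and zero-crossing search described above can be sketched as follows. The frequency points and dB values are made-up numbers for illustration only (they are not read from figures 6.1 and 6.2), and the interpolation on a logarithmic frequency axis is an assumption of this sketch.

```python
import math

def zero_crossing_frequency(freqs_hz, margin_db):
    """First upward zero crossing of margin_db, interpolated on a log-frequency axis."""
    for k in range(len(freqs_hz) - 1):
        m0, m1 = margin_db[k], margin_db[k + 1]
        if m0 < 0.0 <= m1:
            t = -m0 / (m1 - m0)
            log_f0 = math.log10(freqs_hz[k])
            log_f1 = math.log10(freqs_hz[k + 1])
            return 10.0 ** (log_f0 + t * (log_f1 - log_f0))
    return None  # no crossing inside the given range

# Hypothetical sample values, NOT taken from the measurements
freqs = [100.0, 125.0, 160.0, 200.0]
gain_reserve_db = [-2.0, 4.0, 10.0, 13.0]   # role of figure 6.1
filter_gain_db = [14.0, 11.5, 9.0, 7.0]     # role of figure 6.2
margin_db = [r - g for r, g in zip(gain_reserve_db, filter_gain_db)]
f_low = zero_crossing_frequency(freqs, margin_db)
```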

The zero crossing is at 158 Hz, which is rounded to a low-frequency limit of 160 Hz. This yields an XTC frequency range from 160 Hz to 6 kHz. This is the theoretical limit, which is considered somewhat questionable, since the loudspeakers are 4" units. However, the demand for 94 dB SPL at the ear of the listener is considered so high that it is not a problem if the actual reproduction capabilities are lower. Hence, the frequency limit found is used throughout the project. The bass loudspeakers are positioned at a span of Θ = 67.4°.

6.1.2 Crossover Frequency

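From equation (5.7) page 38 the corresponding $n_{b,\mathrm{Low}}$ value can be found as

$$ n_{b,\mathrm{Low}} = 4 \cdot \Delta b_0 \left( 1 + \frac{\Theta/2}{90} \right) \cdot \sin\left( \frac{\Theta}{2} \right) \cdot \frac{f}{c} \Rightarrow \tag{6.1} $$

$$ n_{b,\mathrm{Low}} = 4 \cdot 0.14 \left( 1 + \frac{33.7}{90} \right) \sin(33.7^\circ) \cdot \frac{160}{343} \Rightarrow $$

$$ n_{b,\mathrm{Low}} = 199 \cdot 10^{-3} , \tag{6.2} $$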

where $\Delta b_0$ is the average head width measured just below the temple. As seen in figure 5.8 page 44, the filter gain needed is mirrored around n = 1. Hence, the bass loudspeakers are assigned the interval n = 1 ± $v_b$, where $v_b$ is the size of the interval on each side of n = 1. From equation (6.2) the value of $v_b$ is found as

$$ v_b = 1 - n_{b,\mathrm{Low}} \Rightarrow v_b = 801 \cdot 10^{-3} . \tag{6.3} $$

The upper frequency limit of the bass loudspeakers is calculated from $n_{b,\mathrm{High}} = 1 + v_b = 1.801$ using equation (6.2) to be 1.446 kHz, which is rounded to 1.45 kHz. The rest of the frequency range up to 6 kHz must be covered by the tweeters.
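The arithmetic of equations (6.1) to (6.3) can be reproduced with a short Python sketch. The function and constant names are hypothetical; the numerical inputs (0.14 m head width, 343 m/s, half span 33.7°, 160 Hz) are those used in equation (6.2).

```python
import math

SPEED_OF_SOUND = 343.0   # c [m/s], as in equation (6.2)
HEAD_WIDTH = 0.14        # delta b0 [m], as in equation (6.2)

def osd_n(freq_hz, half_span_deg):
    """Equation (6.1): n for a frequency and a half loudspeaker span (Theta / 2) in degrees."""
    return (4.0 * HEAD_WIDTH * (1.0 + half_span_deg / 90.0)
            * math.sin(math.radians(half_span_deg)) * freq_hz / SPEED_OF_SOUND)

def freq_for_n(n, half_span_deg):
    """Invert equation (6.1) for the frequency; n is proportional to frequency."""
    return n / osd_n(1.0, half_span_deg)

n_b_low = osd_n(160.0, 33.7)               # about 0.199
v_b = 1.0 - n_b_low                        # about 0.801
f_b_high = freq_for_n(1.0 + v_b, 33.7)     # about 1.45 kHz
```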

6.1.3 Tweeter Span

The angle at which the tweeters must be placed can be found from the linear center frequency of the band they must cover. The center frequency is 3.725 kHz and must correspond to n = 1. From equation (5.6) the angle can be calculated to be 8.6°, corresponding to a span of Θ = 17.2°. The $n_{t,\mathrm{Low}}$ value can be found from the lower frequency limit for the tweeters and through equation (6.1) as

$$ n_{t,\mathrm{Low}} = \frac{4 \cdot f \cdot \Delta l}{c} \Rightarrow $$

$$ n_{t,\mathrm{Low}} = \frac{4 \cdot 1450 \cdot 0.14 \left( 1 + \frac{8.6}{90} \right) \sin(8.6^\circ)}{343} \Rightarrow $$

$$ n_{t,\mathrm{Low}} = 388 \cdot 10^{-3} . \tag{6.4} $$

This corresponds to $v_t = 1 - 388 \cdot 10^{-3} = 612 \cdot 10^{-3}$. The filter gain needed can be computed using equation (5.24) to be 4.8 dB for the tweeters and 10.1 dB for the bass loudspeakers. The filter gain as a function of n for bass loudspeakers and tweeters is depicted in figure 6.4.
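The tweeter values can be checked in the same way. The sketch below assumes that the angle is obtained by solving equation (6.1) for n = 1 at the center frequency; the bisection search and all function names are choices made for this illustration, not the method stated in the report (which refers to equation (5.6)).

```python
import math

SPEED_OF_SOUND = 343.0   # c [m/s]
HEAD_WIDTH = 0.14        # delta b0 [m]

def osd_n(freq_hz, half_span_deg):
    """Equation (6.1) with the half span (Theta / 2) given in degrees."""
    return (4.0 * HEAD_WIDTH * (1.0 + half_span_deg / 90.0)
            * math.sin(math.radians(half_span_deg)) * freq_hz / SPEED_OF_SOUND)

def half_span_for_n1(freq_hz, lo=0.0, hi=90.0, iterations=60):
    """Bisection for the half span at which freq_hz maps to n = 1.

    osd_n increases monotonically with the angle on [0, 90] degrees.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if osd_n(freq_hz, mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

f_center = 0.5 * (1450.0 + 6000.0)      # linear center frequency, 3725 Hz
half_span = half_span_for_n1(f_center)  # close to the 8.6 degrees stated in the text
n_t_low = osd_n(1450.0, 8.6)            # about 0.388
v_t = 1.0 - n_t_low                     # about 0.612
```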

[Plot (a): Filter gain [dB] as a function of n [−], for bass and tweeter.]

[Plot (b): Frequency range [Hz] as a function of the loudspeaker span, showing the bass and tweeter ranges and the regions of $n_b$ and $n_t$.]

Figure 6.4: Interval of n and frequency for both bass and tweeter. The bass covers a larger interval of n (frequency range) than the tweeter.

Table 6.1 summarizes the loudspeaker spans and frequency limits of the application, which will be needed for implementation of the XTC system.

| Application specifications | |
|---|---|
| Bass span | 67.4° |
| Tweeter span | 17.2° |
| Lower frequency limit | 160 Hz |
| Crossover between bass and tweeter | 1.45 kHz |
| Upper frequency limit | 6 kHz |
| Tweeter filter gain for XTC | 4.8 dB |
| Bass filter gain for XTC | 10.1 dB |

Table 6.1: Specifications for loudspeaker setup and frequency range.

6.2 Bandlimited Binaural Reproduction

It is desired to present localization cues at the ears of the listener through binaural signals played by loudspeakers in an XTC system. It is decided to bandlimit the signals as discussed in chapter 4. Figure 6.5 shows a block diagram with the flow from a monaural input signal to the binaural signals.

Thus, a monaural signal must be filtered through the bandpass filter and afterwards filtered with an HRTF corresponding to the desired perceptual direction of the signal.

[Block diagram: the input u passes through the bandpass filter m and then through $h_{virt,R}$ and $h_{virt,L}$, giving the binaural signals $x_R$ and $x_L$.]

Figure 6.5: Implementation of bandpass filter m and binaural filter $H_{virt}$, which is common for all XTC models. The binaural synthesis serves as pre-processing for the subjective listening test, whereas the bandlimitation is required for the XTC system.

The process is expressed as

$$ \begin{bmatrix} x_R \\ x_L \end{bmatrix} = \begin{bmatrix} h_{virt,R} \\ h_{virt,L} \end{bmatrix} m \cdot u . \tag{6.5} $$

The bandpass filter, M, is implemented with cutoff frequencies $f_{lowcut}$ = 160 Hz and $f_{hicut}$ = 6 kHz according to the frequency range of the XTC system. An FIR filter of length 1000 is used for this purpose, created from the window method using a Hanning window. The HRTFs used for filtering are 256 samples long and made at a samplerate of 48 kHz. Thus, it is from here chosen to implement the whole system at the same samplerate. After the initial processing of the monaural signal, the resulting binaural signal must be filtered through the XTC system to obtain signals for the loudspeakers.
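A minimal sketch of a window-method bandpass design with the stated parameters (160 Hz to 6 kHz, length 1000, 48 kHz) is given below in plain Python. It assumes a Hann window with zero end points and builds the bandpass as the difference of two ideal lowpass responses; the project itself used MATLAB, so the exact design routine and its details may differ from this sketch. All function names are hypothetical.

```python
import cmath
import math

def hann(n_taps):
    """Symmetric Hann window with zero end points."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n_taps - 1)) for k in range(n_taps)]

def sinc_lowpass(cutoff_hz, fs_hz, n_taps):
    """Truncated ideal lowpass impulse response centred on (n_taps - 1) / 2."""
    centre = (n_taps - 1) / 2.0
    fc = cutoff_hz / fs_hz                      # cutoff in cycles per sample
    taps = []
    for k in range(n_taps):
        t = k - centre
        taps.append(2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t))
    return taps

def bandpass_fir(f_low_hz, f_high_hz, fs_hz, n_taps):
    """Windowed difference of two ideal lowpass filters."""
    window = hann(n_taps)
    upper = sinc_lowpass(f_high_hz, fs_hz, n_taps)
    lower = sinc_lowpass(f_low_hz, fs_hz, n_taps)
    return [(u - l) * w for u, l, w in zip(upper, lower, window)]

def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the frequency response at a single frequency."""
    omega = 2.0 * math.pi * freq_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * omega * k) for k, h in enumerate(taps)))

FS = 48000.0
m = bandpass_fir(160.0, 6000.0, FS, n_taps=1000)
passband_gain = gain_at(m, 1000.0, FS)     # close to 1 inside the passband
stopband_gain = gain_at(m, 12000.0, FS)    # close to 0 above the passband
```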

6.3 Crossover Network Design

As decided, a two-way loudspeaker setup implements the OSD principle for XTC, using one pair of closely spaced tweeters and one pair of widely spaced bass loudspeakers. Therefore, a crossover network is required to divide the frequency range between these two pairs of loudspeakers.

The crossover frequency was set to 1.45 kHz. Thus, the crossover network must consist of a lowpass and a highpass filter. These filters are designed from the requirement of a stopband attenuation of 40 dB and a transition band one octave wide. It was furthermore desired to obtain a flat frequency response when summing the outputs of the two filters, so it was decided that the filters should have the same roll-off and an amplitude attenuation of 6 dB at the crossover frequency.

The filters were realized using FIR filters of length 100 and created using a Hanning window. The lowpass and highpass filter frequency responses are shown in figure 6.6a. A parallel-coupling of the two filters results in a flat frequency response with two minor deviations, as seen in figure 6.6b.
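One way to obtain a lowpass and highpass pair with 6 dB attenuation at the crossover frequency and a flat sum is sketched below. It is not the design used in the project: it assumes an odd filter length of 101 taps (instead of 100) so that the highpass can be formed as a delayed unit impulse minus the lowpass, which makes the sum exactly flat. It does not attempt to verify the 40 dB stopband or the one-octave transition band. All names are hypothetical.

```python
import cmath
import math

def hann_lowpass(cutoff_hz, fs_hz, n_taps):
    """Hann-windowed sinc lowpass, normalized to unit gain at DC (n_taps must be odd)."""
    centre = (n_taps - 1) // 2
    fc = cutoff_hz / fs_hz
    taps = []
    for k in range(n_taps):
        t = k - centre
        ideal = 2.0 * fc if t == 0 else math.sin(2.0 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n_taps - 1))
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [h / dc_gain for h in taps]

def complementary_highpass(lowpass):
    """Delayed unit impulse minus the lowpass, so the two filters sum to a pure delay."""
    centre = (len(lowpass) - 1) // 2
    return [(1.0 if k == centre else 0.0) - h for k, h in enumerate(lowpass)]

def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the frequency response at a single frequency."""
    omega = 2.0 * math.pi * freq_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * omega * k) for k, h in enumerate(taps)))

FS = 48000.0
F_CROSS = 1450.0
n_lf = hann_lowpass(F_CROSS, FS, n_taps=101)
n_hf = complementary_highpass(n_lf)
lf_gain_at_cross = gain_at(n_lf, F_CROSS, FS)   # about 0.5, i.e. about -6 dB
hf_gain_at_cross = gain_at(n_hf, F_CROSS, FS)   # about 0.5, i.e. about -6 dB
```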

[Plot (a): Frequency responses of the low- and highpass filters, magnitude [dB] versus frequency [Hz].]

[Plot (b): Frequency response of the sum of the two filters, magnitude [dB] versus frequency [Hz].]

Figure 6.6: Filter characteristics for the crossover network.

6.4 Model Propositions

Three proposed models for implementation follow, two of which require the designed crossover network, and one that fully relies on the LS method to design the XTC filters and any necessary crossover network.

6.4.1 Implementation Model A

To implement the OSD principle using four loudspeakers, the crossover network was designed to split the frequency range in a low frequency part and a high frequency part. An implementation model is proposed using two XTC filters, one for handling each part of the system's frequency range. This implies that the input signal x is first filtered by the crossover network N, creating low- and highpass filtered components of x, which afterwards are filtered by the respective XTC filters, resulting in the loudspeaker inputs v. A block diagram for this model is shown in figure 6.7.

As earlier mentioned, the total influence is denoted by the plant W, which comprises H and L. To calculate the XTC filters, all four loudspeakers in the plant W must be measured individually. Separating the plant into a tweeter and a bass plant yields the matrices

$$ W_{LF} = \begin{bmatrix} w_{RR,LF} & w_{LR,LF} \\ w_{RL,LF} & w_{LL,LF} \end{bmatrix} \tag{6.6} $$

$$ W_{HF} = \begin{bmatrix} w_{RR,HF} & w_{LR,HF} \\ w_{RL,HF} & w_{LL,HF} \end{bmatrix} . \tag{6.7} $$

[Block diagram: the input x (2 channels) passes through the crossover network N, the LF and HF XTC filters C, the loudspeakers L and the acoustical path H to the output y. The loudspeaker inputs are $v_{R,LF}$, $v_{R,HF}$, $v_{L,HF}$ and $v_{L,LF}$; L and H together form the plant W.]

Figure 6.7: Implementation model A using two XTC filters.

As previously described, there will be two XTC filters denoted $C_{LF}$ and $C_{HF}$ respectively, and thus it is possible to write the system transfer function

$$ y = (W_{HF} \cdot C_{HF} \cdot N_{HF} + W_{LF} \cdot C_{LF} \cdot N_{LF}) \cdot x . \tag{6.8} $$

Figure 6.7 as well as equation (6.8) show that realization of this model requires design and implementation of two XTC filters. This implies that during filtering, the system must run two LS algorithms for every input sample.

6.4.2 Implementation Model B

The alternative to implementing an XTC filter for each frequency range is to implement a single XTC filter for the entire system frequency range. This implies that the input signal x, contrary to model A, is first filtered by the XTC filter and then split by the crossover network into four loudspeaker inputs v. A block diagram for this model is shown in figure 6.8.

For this model, the plant matrix W includes N. The blocks are placed consecutively in the system, and it follows that W = H · L · N. Thus, the crossover network is included in the measurement of W. Since the crossover filters are included in these measurements, the loudspeakers will only reproduce the intended frequency range. Therefore, measurements are simplified to measuring the plant one side at a time, resulting in two measurements compared to the four required for model A. The resulting responses are given by

$$ W = \begin{bmatrix} w_{RR} & w_{LR} \\ w_{RL} & w_{LL} \end{bmatrix} . \tag{6.9} $$

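[Block diagram: the input x (2 channels) passes through the XTC filter C, the crossover network N, the loudspeakers L and the acoustical path H to the output y. The loudspeaker inputs are $v_{R,LF}$, $v_{R,HF}$, $v_{L,HF}$ and $v_{L,LF}$; N, L and H together form the plant W.]

Figure 6.8: Implementation model B using a single XTC filter.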

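From this, the system transfer function is given by

$$ y = W \cdot C \cdot x \Rightarrow \tag{6.10} $$

$$ \begin{bmatrix} y_R \\ y_L \end{bmatrix} = \begin{bmatrix} w_{RR} & w_{LR} \\ w_{RL} & w_{LL} \end{bmatrix} \begin{bmatrix} c_{RR} & c_{LR} \\ c_{RL} & c_{LL} \end{bmatrix} \begin{bmatrix} x_R \\ x_L \end{bmatrix} , \tag{6.11} $$

where

$$ w_{RR} = h_{RR,LF} \cdot l_{R,LF} \cdot n_{LF} + h_{RR,HF} \cdot l_{R,HF} \cdot n_{HF} . \tag{6.12} $$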

Expanding figure 6.8 based on equation (6.11), the extended block diagram is given in figure 6.9. The crossover network is shown as separate blocks, as are H and L, for illustration of the whole model.

In conclusion, by comparison of equations (6.8) and (6.10), it is seen that the transfer function for model B is simpler than that of model A. Furthermore, model B contains only one XTC filter and thus results in a more computationally efficient implementation.

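[Block diagram: the inputs $x_R$ and $x_L$ pass through the XTC filters $c_{RR}$, $c_{RL}$, $c_{LR}$ and $c_{LL}$, the crossover filters $n_{LF}$ and $n_{HF}$, the loudspeakers $l_{R,LF}$, $l_{R,HF}$, $l_{L,HF}$ and $l_{L,LF}$ and the acoustical paths h to the outputs $y_R$ and $y_L$. The crossover filters, loudspeakers and acoustical paths form the plant W.]

Figure 6.9: Extended implementation for model B.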

6.4.3 Implementation Model C

In section 5.4 it was chosen to use the LS algorithm for the computation of C, where a cost function seeks to optimize the XTC filter by minimizing the amount of crosstalk based on a set of equalization constraints.

This implementation model is based on letting the LS algorithm decide the frequency range distribution between the loudspeakers through the cost function. Thus, the designed crossover network is not a part of this implementation model. The model is shown in figure 6.10.

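[Block diagram: the input x (2 channels) passes through the XTC filter C, the loudspeakers L and the acoustical path H to the output y. The loudspeaker inputs are $v_{R,LF}$, $v_{R,HF}$, $v_{L,HF}$ and $v_{L,LF}$; L and H together form the plant W.]

Figure 6.10: Implementation model C, where the frequency distribution is decided by the LS solution.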


For this model, the plant is measured the same way as for model A. Since there is only one XTC filter, there will only be one W matrix containing all measured responses, given by

$$ W = \begin{bmatrix} w_{RR,HF} & w_{RR,LF} & w_{LR,LF} & w_{LR,HF} \\ w_{RL,HF} & w_{RL,LF} & w_{LL,LF} & w_{LL,HF} \end{bmatrix} . \tag{6.13} $$

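This leads to a system transfer function

$$ y = W \cdot C \cdot x \Rightarrow $$

$$ \begin{bmatrix} y_R \\ y_L \end{bmatrix} = \begin{bmatrix} w_{RR,HF} & w_{RR,LF} & w_{LR,LF} & w_{LR,HF} \\ w_{RL,HF} & w_{RL,LF} & w_{LL,LF} & w_{LL,HF} \end{bmatrix} \begin{bmatrix} c_{RR,LF} & c_{LR,LF} \\ c_{RR,HF} & c_{LR,HF} \\ c_{RL,HF} & c_{LL,HF} \\ c_{RL,LF} & c_{LL,LF} \end{bmatrix} \begin{bmatrix} x_R \\ x_L \end{bmatrix} . \tag{6.14} $$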

This expression can, as with model B, be used to expand figure 6.10 into a full implementation block diagram as depicted in figure 6.11.

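[Block diagram: the inputs $x_R$ and $x_L$ pass through the eight XTC filters $c_{RR,LF}$, $c_{RL,LF}$, $c_{RR,HF}$, $c_{RL,HF}$, $c_{LR,HF}$, $c_{LL,HF}$, $c_{LR,LF}$ and $c_{LL,LF}$ to the loudspeaker inputs $v_{R,LF}$, $v_{R,HF}$, $v_{L,HF}$ and $v_{L,LF}$, and through the eight plant responses w to the outputs $y_R$ and $y_L$.]

Figure 6.11: Extended implementation for model C.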

Summary

Three models were proposed, two implementing the designed crossover network based on the OSD principle, and one relying on the LS algorithm to divide the frequency range between the loudspeakers. Model A requires two separate XTC filters and is therefore considered more complex than model B, where only one XTC filter is required. Thus, it is decided to implement and simulate models B and C to compare their performance.


6.5 Plant Measurements

Before an implementation can be made, the plant of the specified loudspeaker setup must be measured. The setup is made in an anechoic chamber to avoid reflections from the walls. However, during measurements the setup for the subjective test is also located in the room. This means that the chair, monitor, curtains etc. are placed as they would be during the subjective test to be performed later on.

This section describes the measuring positions and presents selected results from the measurements. For more details on these measurements, see the measurement report in appendix C on page 127.

6.5.1 Measuring Positions

A total of 16 measuring positions have been selected. The first six are positioned on the edges of the desired sweet spot size, where one of these defines the optimal sweet spot. This position is denoted position 1. The positions are used throughout the report as reference of the position of a listener or measurement. Each point represents the center of the head.

[Diagram: layout of the 16 numbered measurement positions, with distance markings of 10 cm and 5 cm.]

Figure 6.12: Measurement positions, where each point represents the center of the head.

6.5.2 Plant Responses

In the following, the frequency responses of the channel transfer functions are shown. These can be put together to form the plant W. A reference measurement was made such that it was possible to obtain the HRTFs of the head and torso simulator. A plot of the HRTFs for position 1 can be found in the measurement report. In figure 6.13 the plant frequency responses are shown for the right ear at position 1. As seen, the plant starts having large fluctuations above 6 kHz. Likewise, the loudspeaker roll-off toward the low frequencies is clearly seen. The averages of the plant analyzed in one-third octave bands, including respective standard deviations, are seen in figure 6.14. As expected, the standard deviation increases with frequency.

[Plot: SPL [dB re. 20 µPa] versus frequency [Hz] for loudspeakers 1 to 6.]

Figure 6.13: Frequency response of the right ear at position 1. The loudspeakers are denoted 1 to 4 from right to left. Loudspeaker 5 is the two right side loudspeakers measured together and loudspeaker 6 is the combined measurement of the left side. Loudspeakers 5 and 6 utilize the designed crossover network.

[Plot: SPL [dB re. 20 µPa] versus one-third octave center frequency [Hz], from 63 Hz to 12.5 kHz.]

Figure 6.14: Average and standard deviation of the plant for loudspeaker 1, positions 1-6, analyzed in one-third octave bands. The whiskers are one standard deviation to each side of the average.


6.6 Implementation Algorithms

As the structure of the system has been found, the implementation of the desired solution method must be found. This section will clarify the elements needed for implementation and likewise specify possible parameter values.

Two different solution methods were presented in section 5.4, where the LS method was chosen for implementation. Through an analysis of the LS method in section 5.5, complemented by appendix B, the benefits and disadvantages of the LSfreq and LStime algorithms were discussed. It is, however, desired to implement only one of the algorithms due to time constraints.

It was learned that LStime generally performs better with low filter lengths than LSfreq does. However, this implementation is not required to meet any realtime demands. Longer filters greatly increase the performance of both algorithms, and their performance becomes more similar too. Since the best performance is desired within reasonable limits of the filter length, the advantage of LStime is of no importance. The LSfreq algorithm is preferred for implementation in MATLAB to avoid the large convolution matrices of the LStime algorithm. Besides the greatly increased computational power needed, the limits of the available memory might be met. Thus, the LSfreq algorithm is chosen for implementation in the XTC system.

6.6.1 Defining Target Matrix and Modeling Delay

As learned, the target matrix defines the mapping of the signal, i.e. the desire of only hearing the right channel in the right ear and vice versa. For a given number of receiver positions N, the size of the target matrix is 2N × 2, where each position consists of a 2 × 2 identity matrix. Each receiver position corresponds to two receiver points. Thus, the target matrix becomes

$$ A_N = \begin{bmatrix} I_{2 \times 2,1} \\ I_{2 \times 2,2} \\ \vdots \\ I_{2 \times 2,N} \end{bmatrix} . \tag{6.15} $$
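The stacking in equation (6.15) amounts to repeating a 2 × 2 identity matrix once per receiver position. A small Python sketch, with a hypothetical function name:

```python
def target_matrix(n_positions):
    """Stack one 2 x 2 identity matrix per receiver position, giving a 2N x 2 matrix."""
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return [row[:] for _ in range(n_positions) for row in identity]

A_6 = target_matrix(6)   # e.g. six receiver positions give a 12 x 2 target matrix
```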

As mentioned in appendix B, the modeling delay has been shown to yield the best performance when it is approximately half the filter length. Thus, an FFT shift of half the filter length will be performed when transforming the XTC filter back to time domain.
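The shift by half the filter length can be sketched as a circular rotation of the impulse response obtained from the inverse transform. The function name and sample values are hypothetical; the sketch assumes the non-causal part of the filter appears wrapped around at the end of the sequence, as it does after an inverse DFT.

```python
def apply_modeling_delay(impulse_response):
    """Circularly shift by half the filter length, moving the wrapped-around tail to the front."""
    half = len(impulse_response) // 2
    return impulse_response[-half:] + impulse_response[:-half]

# Hypothetical 8-tap result of an inverse transform: the causal part starts at index 0,
# the non-causal part is wrapped around to the end.
c_wrapped = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -0.25]
c_delayed = apply_modeling_delay(c_wrapped)
```

After the shift, the main tap sits at index 4 (half the filter length) and the former wrapped-around sample directly precedes it.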

6.6.2 Employment of Regularization Scaling and Shape Factor

At this point, it is still not possible to specify the optimal regularization scaling, β. The value of this parameter will be determined from the simulations in chapter 7.


To obtain a frequency dependent regularization, the shape factor, B(z), was introduced in section 5.5, which would penalize, i.e. increase regularization, for frequencies outside the area of interest. Thus, it is chosen to let the shape factor be inspired by the frequency limitations of the XTC system. Figure 6.15 shows two suggested shape factors.

[Plot (a): Shape type 1, magnitude [−] versus frequency [Hz], common shape: constant regularization through the XTC frequency range.]

[Plot (b): Shape type 2, magnitude [−] versus frequency [Hz], low frequency shape and high frequency shape: different regularizations according to bass and tweeter signals.]

Figure 6.15: The suggested shape types to be implemented in the shape matrix B.

The shape factor in figure 6.15a has a constant regularization within the frequency limits of the XTC system. Outside this frequency range, the regularization is ten times higher in this example. The ratio between these two regularizations will be defined as the penalty factor. Both implementation models B and C can use the shape factor suggested in figure 6.15a. Additionally, another shape factor, illustrated in figure 6.15b, is suggested for use in implementation model C. Here, the crossover frequency between the bass and tweeter is imposed on the solution. This is done by increasing the regularization to half the penalty factor for the frequencies in the range of the tweeter when the regularization is meant for the bass, and vice versa. It is chosen not to increase the regularization of this range with the full penalty factor.
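The two shape types can be sketched as simple functions of frequency. The sketch assumes a base regularization magnitude of 1 inside the XTC range and abrupt steps at 160 Hz, 1.45 kHz and 6 kHz; the actual shapes in figure 6.15 may use different transitions, and how type 2 should behave for a penalty factor of one is not specified in the text, so that case is not treated specially. Function names are hypothetical.

```python
F_LOW, F_CROSS, F_HIGH = 160.0, 1450.0, 6000.0   # XTC limits and crossover [Hz]

def shape_type1(freq_hz, penalty):
    """Type 1: unit regularization inside the XTC range, `penalty` outside."""
    return 1.0 if F_LOW <= freq_hz <= F_HIGH else penalty

def shape_type2(freq_hz, penalty, band):
    """Type 2 for band 'bass' or 'tweeter': half the penalty in the other driver's range."""
    if not F_LOW <= freq_hz <= F_HIGH:
        return penalty
    in_own_band = freq_hz <= F_CROSS if band == "bass" else freq_hz >= F_CROSS
    return 1.0 if in_own_band else 0.5 * penalty

# Example with a penalty factor of 10, as in the example of figure 6.15
bass_shape = [shape_type2(f, 10.0, "bass") for f in (100.0, 500.0, 3000.0, 8000.0)]
```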

6.6.3 Shortening Filter Length of Plant

As mentioned in appendix B, the requirement for XTC filter length could be decreased by cropping the common initial delay in the plant impulse responses and cropping the end of those, where the impulse response becomes insignificant. The resulting length is expressed by $J_r$ in equation (B.3) page 120. Figure 6.16 shows the four impulse responses contained in the plant matrix for position two, which is defined in figure 6.12 on page 65.

The length of the gray area in figure 6.16 corresponds to the value of $J_r$. To find the common initial delay, the windowed energy of each filter is evaluated against a threshold.

[Plot: amplitude [−] versus time [ms] for the impulse responses $W_{RR}$, $W_{LR}$, $W_{RL}$ and $W_{LL}$.]

Figure 6.16: Impulse responses of the plant matrix associated with implementation model B. The gray area illustrates the part of the impulse responses used for calculating XTC filters.

The window size was set to a length of 32 samples (0.67 ms) and the threshold value was set to -25 dB. From inspection of the impulse responses, it was found that they become insignificant after 512 samples (10.67 ms). Thus, a plant filter length of 512 samples will be used for further processing.
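A possible reading of the cropping procedure is sketched below. The text does not state what the -25 dB threshold is relative to, nor how the window is positioned; this sketch assumes the threshold is relative to the maximum windowed energy of each response and takes the start of the first window that reaches it. The function names and the synthetic test responses are hypothetical.

```python
def onset_sample(h, window=32, threshold_db=-25.0):
    """Start index of the first window whose energy reaches the threshold.

    Assumption: the threshold is relative to the maximum windowed energy of h.
    """
    energies = [sum(x * x for x in h[k:k + window]) for k in range(len(h) - window + 1)]
    limit = max(energies) * 10.0 ** (threshold_db / 10.0)
    for k, energy in enumerate(energies):
        if energy >= limit:
            return k
    return 0

def crop_plant(responses, length=512, window=32, threshold_db=-25.0):
    """Remove the common initial delay and keep `length` samples of every response."""
    start = min(onset_sample(h, window, threshold_db) for h in responses)
    return [h[start:start + length] for h in responses]

# Synthetic responses with onsets at samples 100 and 120 (illustration only)
h_a = [0.0] * 100 + [1.0, 0.5, 0.25] + [0.0] * 600
h_b = [0.0] * 120 + [0.8, -0.4] + [0.0] * 581
cropped = crop_plant([h_a, h_b])
```

Because the same start index is used for all responses, the relative delays between the channels are preserved.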

6.7 Final Implementation Specifications

Since chapter 4, a comprehensive analysis of the XTC system has been performed, leading to more specific design parameters. Furthermore, implementation of the bandlimited binaural reproduction and design of crossover filters have been described together with the proposals of implementation models and use of the chosen implementation algorithm. Table 6.2 clarifies the needed specifications for the final implementation of the XTC system. As noticed, table 6.2 does not include the parameter values of the XTC filter length, the regularization scaling β and the penalty factor defined in the shape factor. These values will be found from the simulation of the XTC system, where appropriate evaluation ranges are suggested beforehand.

| Parameter | Value |
|---|---|
| Bandpass filter: | |
| Samplerate | 48 kHz |
| Filter type, length | FIR, 1000 |
| Design method | Hanning window |
| Low cutoff | 160 Hz |
| High cutoff | 6 kHz |
| Crossover network: | |
| Samplerate | 48 kHz |
| Filter type, length | FIR, 100 |
| Design method | Hanning window |
| Crossover frequency | 1.45 kHz |
| XTC implementation: | |
| Samplerate | 48 kHz |
| Low frequency limit | 160 Hz |
| High frequency limit | 6 kHz |
| Implementation algorithm | LSfreq |
| Implementation models | B, C |
| Plant filter length | 512 |
| Shape types | Type 1 for model B; type 1 & 2 for model C |

Table 6.2: Specifications for final implementation of XTC system.


Chapter 7 Simulation

A simulation of the implemented system must be carried out in order to determine which implementation model to use, the sweet spot size and the optimal parameters. The simulation is carried out by the "brute force" method, which means simulating every single possible combination for a given set of parameters. This is decided since the parameters are assumed to have a non-linear dependency on each other. An analysis for determining their optimal values beforehand would therefore be too comprehensive a study. Another possible method for obtaining optimal parameters is to define it as a constrained optimization problem and employ methods from linear algebra. This, however, was regarded as too cumbersome a process given the extent of the simulation to be performed.
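The brute-force enumeration can be sketched with `itertools.product`, using the parameter ranges that are specified in section 7.1. The sketch assumes that every combination of model, shape type, receiver configuration, filter length, β and penalty factor is simulated; the receiver configurations are only referred to by their labels (a) to (d) from figure 7.1, and the variable names are hypothetical.

```python
import itertools

filter_lengths = [2 ** n for n in range(9, 14)]       # 512 ... 8192 taps
betas = [10.0 ** n for n in range(-6, 1)]             # 1e-6 ... 1
penalty_factors = [10 ** n for n in range(0, 4)]      # 1 ... 1000
model_shapes = [("B", 1), ("C", 1), ("C", 2)]         # (model, shape type)
receiver_configs = ["a", "b", "c", "d"]               # labels from figure 7.1

grid = list(itertools.product(model_shapes, receiver_configs,
                              filter_lengths, betas, penalty_factors))

# Each entry is one simulation run, e.g.:
(model, shape_type), config, n_taps, beta, penalty = grid[0]
```

Under these assumptions the grid contains 3 · 4 · 5 · 7 · 4 = 1680 combinations, which illustrates why the step sizes are kept coarse.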

This chapter describes what parameters to simulate and the step size of the individual parameters. Furthermore, the evaluation methods and constraints are presented, which will lead to the desired optimization. Finally, the simulation results are presented together with an analysis of the parameter dependencies experienced.

7.1 Extent of Simulation

This section determines the extent of the simulation. Besides choosing what implementation models and receiver positions to simulate, the range of the simulated parameters will be given.

7.1.1 Implementation Model and Receiver Positions

Implementation Models

As both implementation models B and C were chosen for final implementation, these will likewise be included in the simulation. This is due to the interest in which of these will yield the best performance.

Receiver Positions

A total of 16 receiver positions were recorded in the measurement report, appendix C, of which the first six receiver positions enclose the desired sweet spot size. Thus, a combination of those six might optimize the performance within the sweet spot. Figure 7.1 suggests four different configurations of receiver positions to be included in the simulation.

[Diagram: four panels (a) to (d) showing the receiver positions included in each configuration.]

Figure 7.1: The four different receiver position configurations creating the plant W: (a) position 1 alone, (b) all six positions of the required sweet spot, (c) the lateral positions alone and (d) positions 1 and 4.

7.1.2 Range of Parameters Simulated

Filter Length

Regarding the length of the XTC filter, it was previously mentioned that increasing the length greatly increases the performance with regard to channel separation (CHSP). Furthermore, the application of this project is not limited by any realtime demands. However, a reasonable maximum filter length is set to 8192 samples at a samplerate of 48 kHz. Due to choosing LSfreq as implementation algorithm, the adverse effect of wrap-around is unavoidable for short filter lengths. Thus, the minimum length is set to 512 samples, since this will be the length of the plant impulse responses. The lengths simulated between these limits are chosen as powers of two. Hence, the range is set to $2^n$ with n = 9, 10, ..., 13.

Regularization Scaling β

The range of β is not easily obtained. However, preliminary simulations showed that a range between $10^{-6}$ and $10^{0}$ was appropriate. As this range indicates, the steps of the regularization scaling are set exponentially. Thus, the range of β is set to $10^n$ with $n = -6, -5, \ldots, 0$.

Shape Types and Penalty Factor

As suggested in section 6.6, two different shape types were to be implemented. Type 1 would be used for both implementation models B and C, and type 2 would only be used with implementation model C. Both types are simulated to conclude whether type 2 yields better performance than type 1 for implementation model C.

The penalty factor was defined as the ratio between the bands with low regularization and those with high regularization. The lower limit of the penalty factor is set to one, which results in a shape matrix equal to identity throughout the whole simulated frequency range. The upper limit is set to 1000. Like the regularization scaling, the penalty factor is simulated with exponentially increasing steps. Thus, the range is set to $10^n$ with $n = 0, 1, 2, 3$.

7.2 Optimal Parameters

This section describes how to evaluate the XTC system with the simulated parameters and against which constraints they are evaluated. Furthermore, the desired optimizations are presented in order to obtain the optimal parameters for the XTC system. Finally, the total simulation specifications are summarized.

7.2.1 Evaluation Methods

The following evaluation methods are defined and used for setting the simulation constraints.

Power of XTC filters

As described in section 5.3, the XTC filters require extra gain in order to perform proper XTC. Furthermore, the equalization of the loudspeakers is included in the XTC filters, which likewise requires extra gain. The power of the XTC filters expresses the gain needed. The filter power is calculated for each loudspeaker signal as

$$\Pi_{c,S}[k] = \left|c_{S,\mathrm{Ipsi}}[k]\right|^2 + \left|c_{S,\mathrm{Contra}}[k]\right|^2 \quad [-] \qquad (7.1)$$

where S denotes the respective loudspeaker. The power is only evaluated throughout the frequency range of the XTC system, since the input signals are band-limited to this range.
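As a minimal sketch of equation (7.1), the per-bin filter power for one loudspeaker signal could be computed as shown below. The function names and the list-of-complex input format are assumptions made for illustration. The conversion to dB through 10 log10 is likewise an assumption, chosen so that the result can be compared with a limit stated in dB.

```python
import math

def filter_power_db(c_ipsi, c_contra):
    """Filter power of equation (7.1) per frequency bin, converted to dB.

    c_ipsi and c_contra are hypothetical sequences holding the complex
    frequency responses of the two XTC filters feeding one loudspeaker.
    """
    power = [abs(a) ** 2 + abs(b) ** 2 for a, b in zip(c_ipsi, c_contra)]
    # Assumption: dB of a power quantity, i.e. 10*log10. Bins with zero
    # power are not handled in this sketch.
    return [10.0 * math.log10(p) for p in power]

def peak_power_db(c_ipsi, c_contra):
    """Largest per-bin power, i.e. the value a maximum-power check would use."""
    return max(filter_power_db(c_ipsi, c_contra))
```

In line with the text above, the inputs would be restricted to the bins inside the XTC frequency range before the functions are called.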

Performance Error

In chapter 3 the performance error (PE) was defined through equation (3.3) and equation (3.4), which express how much the ipsilateral frequency response deviates from a flat spectrum. As previously discussed, the regularization is heavier towards the frequency range limits of the XTC system. Thus, the PE likewise increases there, since the effort spent on flattening the frequency spectrum is reduced. However, this increased PE is accepted, as the need for higher regularization is prioritized. Furthermore, the system will not reproduce frequency content outside the frequency range, where the PE is worst.


The PE is evaluated as three average PEs, one per octave band. The frequency ranges of the bands are specified as:

1. band: 0.5 kHz to 1 kHz

2. band: 1 kHz to 2 kHz

3. band: 2 kHz to 4 kHz

In this way, the evaluation of the PE is not affected by the worse performance in the outer regions, while it remains possible to evaluate the PE in subregions. The average PE for a given band is calculated as

$$PE_{x,n,\mathrm{Avg}} = \frac{1}{M_{n,\mathrm{end}} - M_{n,\mathrm{start}} + 1} \sum_{k=M_{n,\mathrm{start}}}^{M_{n,\mathrm{end}}} 20 \log_{10}\left(\left|PE_x[k]\right|\right) \quad [\mathrm{dB}] \qquad (7.2)$$

where n denotes the band evaluated, and x denotes either left or right ear.
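The band averaging of equation (7.2) can be sketched as follows. The function names and the parallel lists of responses and bin frequencies are hypothetical. Counting a bin that sits exactly on a band edge in both neighbouring bands is an assumption of this sketch, as the report does not state how edge bins are assigned.

```python
import math

# Octave bands of the PE evaluation in Hz, taken from the list above.
PE_BANDS_HZ = [(500.0, 1000.0), (1000.0, 2000.0), (2000.0, 4000.0)]

def band_average_db(response, freqs_hz, f_lo, f_hi):
    """Mean of 20*log10(|response[k]|) over the bins with f_lo <= f <= f_hi."""
    levels = [20.0 * math.log10(abs(r))
              for r, f in zip(response, freqs_hz)
              if f_lo <= f <= f_hi]
    # Dividing by the number of bins matches the 1/(M_end - M_start + 1) factor.
    return sum(levels) / len(levels)

def pe_band_averages(pe, freqs_hz):
    """Average PE in dB for each of the three octave bands."""
    return [band_average_db(pe, freqs_hz, lo, hi) for lo, hi in PE_BANDS_HZ]
```

Since the average CHSP of equation (7.3) has the same form, `band_average_db` could also be called with the limits 160 Hz and 6000 Hz.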

CHSP

Likewise, in chapter 3 the CHSP was defined through equation (3.1) and equation (3.2), which express the ratio between the crosstalk and the desired signal in dB. The CHSP is expected to be worst in the outer regions of the XTC system. However, since the requirement for CHSP is specified for the full range of the system, i.e. 160 Hz to 6 kHz, the evaluation is calculated as the average over the same range. The average CHSP is expressed as

$$CHSP_{x,\mathrm{Avg}} = \frac{1}{M_{6000} - M_{160} + 1} \sum_{k=M_{160}}^{M_{6000}} 20 \log_{10}\left(\left|CHSP_x[k]\right|\right) \quad [\mathrm{dB}] \qquad (7.3)$$

where x denotes either left or right ear.

7.2.2 Constraints

The following constraints are defined in order to obtain an optimal parameter set that fulfills them.

Constraint 1: Maximum Filter Power

To set a constraint for the maximum allowed power of the XTC filter, a closer look at the application setup is required. In section 6.1 the lower frequency limit of the XTC system was found as the frequency where the loudspeaker at maximum loading is still able to supply 94 dB SPL at the ears. The maximum load of the loudspeakers used was found to be 80 W. Through equation (7.4), this loading can be expressed in voltage as

$$U_{\max} = 20 \log_{10}\left(\sqrt{P_{\max} \cdot R}\right) \quad [\mathrm{dBV}] \qquad (7.4)$$

where $P_{\max}$ denotes the maximum power allowed to load the unit, and $R$ denotes the resistance of the loudspeaker. For the loudspeaker used, $P_{\max}$ = 80 W and $R$ = 8 Ω, which yields a total allowed voltage of 28.1 dBV. For an input voltage of 1 V, the maximum filter power is thus 28.1 dB.
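Inserting the stated values into equation (7.4) reproduces this limit. The intermediate value is rounded:

$$U_{\max} = 20 \log_{10}\left(\sqrt{80\,\mathrm{W} \cdot 8\,\Omega}\right) = 20 \log_{10}\left(\sqrt{640}\right) \approx 20 \log_{10}(25.3) \approx 28.1\ \mathrm{dBV}$$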

For each filter associated with the respective loudspeaker signal, the frequency with the highest power within the XTC frequency range is used. The largest of these filter powers is evaluated against the maximum allowed, which was found to be 28.1 dB. Thus, the first constraint is defined.

Constraint 2: Performance Error Tolerance

The average PE is calculated for both ears in each of the three bands using equation (7.2). The maximum of the averages is compared against a tolerance, which is chosen to be ±3 dB. If one of the averages exceeds the tolerance, the current parameter set is discarded. The PE is only evaluated for the receiver at position 1, since the deviations from the flat spectrum are expected to change considerably for other receiver positions. Thus, the second constraint is defined.

7.2.3 Performance Optimizations

No constraint is set for the CHSP, even though the requirement for the absolute sweet spot is a CHSP of less than -12 dB. Such a constraint would make no sense, since the aim is to obtain the best possible CHSP. Thus, if a given parameter set fulfills the two constraints during simulation, the CHSP of this parameter set is compared to the best CHSP among the previously evaluated parameter sets. This comparison is carried out through two types of optimization.

• Largest sweet spot: The first optimization seeks the best performance for the entire sweet spot range, i.e. the first six receiver positions. For each parameter set, the worst CHSP among these receiver positions is found first. This worst-case CHSP is then compared against the currently saved CHSP as previously described, i.e. the parameter set whose worst CHSP within the sweet spot is the best of all parameter sets is chosen.

• Optimal sweet spot: The second optimization seeks the all-time best CHSP for position 1, i.e. the best CHSP of all simulated CHSPs in position 1.
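The two selection rules can be sketched as below. The data layout (a list of dictionaries, each with a `"chsp"` mapping from position number to average CHSP in dB) and all function names are hypothetical. Lower (more negative) CHSP values are taken to be better, as in the requirement of less than -12 dB.

```python
def worst_chsp_db(avg_chsp_by_position):
    """Worst (least negative) average CHSP among the given positions."""
    return max(avg_chsp_by_position.values())

def pick_largest_sweet_spot(candidates):
    """Parameter set whose worst CHSP over the sweet spot positions is lowest."""
    return min(candidates, key=lambda c: worst_chsp_db(c["chsp"]))

def pick_optimal_sweet_spot(candidates):
    """Parameter set with the lowest CHSP at position 1."""
    return min(candidates, key=lambda c: c["chsp"][1])

# Values invented purely for illustration; they are not simulation results.
example = [
    {"name": "set A", "chsp": {1: -35.0, 5: -8.0}},
    {"name": "set B", "chsp": {1: -29.0, 5: -10.0}},
]
```

With these invented values, the largest sweet spot rule selects set B, because its worst position is better, while the optimal sweet spot rule selects set A. Only parameter sets that already fulfill the two constraints would be passed in.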


7.2.4 Summary of Simulation Specifications

Table 7.1 summarizes the total specifications of the simulation. Note that this includes models, configurations and shape types as well.

| Parameters simulated | Values |
|---|---|
| Implementation models | B, C |
| Receiver position configurations | a, b, c, d |
| Filter lengths | $2^n$ with $n = 9, 10, \ldots, 13$ |
| Regularization scaling β | $10^n$ with $n = -6, -5, \ldots, 0$ |
| Shape types | Type 1 for model B; type 1 & 2 for model C |
| Penalty factors | $10^n$ with $n = 0, 1, 2, 3$ |

| Simulation constraints | Values |
|---|---|
| Maximum XTC filter power | 28.1 dB |
| Performance error tolerance | ±3 dB |

Table 7.1: Specifications for simulation of the XTC system.

Among all sets of parameters that obey the constraints, the set with the best performance for largest sweet spot and the set with the best performance for optimal sweet spot are saved as the simulation results.
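A minimal sketch of how the brute-force grid of table 7.1 and the two constraints could be enumerated is given below. All names are hypothetical, and the design and evaluation of the XTC filters for each parameter set is deliberately left out.

```python
import itertools

MAX_FILTER_POWER_DB = 28.1   # constraint 1
PE_TOLERANCE_DB = 3.0        # constraint 2, applied as +/- 3 dB

def parameter_grid():
    """Yield every parameter combination listed in table 7.1."""
    model_shapes = [("B", 1), ("C", 1), ("C", 2)]      # shape type 2 only for model C
    configurations = ["a", "b", "c", "d"]
    filter_lengths = [2 ** n for n in range(9, 14)]    # 512 ... 8192
    betas = [10.0 ** n for n in range(-6, 1)]          # 1e-6 ... 1
    penalties = [10 ** n for n in range(0, 4)]         # 1 ... 1000
    for (model, shape), cfg, length, beta, penalty in itertools.product(
            model_shapes, configurations, filter_lengths, betas, penalties):
        yield {"model": model, "shape_type": shape, "configuration": cfg,
               "filter_length": length, "beta": beta, "penalty": penalty}

def obeys_constraints(peak_power_db, band_pe_db):
    """True if the peak filter power and every band-average PE are within limits."""
    return (peak_power_db <= MAX_FILTER_POWER_DB
            and all(abs(pe) <= PE_TOLERANCE_DB for pe in band_pe_db))
```

Enumerated this way, the grid holds 3 · 4 · 5 · 7 · 4 = 1680 combinations. This count is a property of the sketch rather than a figure stated in the report, and it includes combinations with a penalty factor of 1, for which the shape type has no effect.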

7.3 Simulation Results

This section presents the results of the simulation together with a discussion of the performance obtained. First, the results for largest sweet spot are presented and discussed. Next, the results for optimal sweet spot are likewise presented and discussed. Finally, some concluding remarks on the simulation are given.

7.3.1 Optimal Parameters for Largest Sweet Spot

The optimal parameters found for largest sweet spot are listed in table 7.2.

As noticed, implementation model C with only one receiver position, situated at position 1, was found to give the optimal performance with regard to CHSP. Furthermore, the best combination of β and the penalty factor was found using shape type 1, which can be seen in figure 6.15a. What is remarkable is the optimal filter length, which was found to be 1024. Hence, the longest filter length did not result in the best CHSP performance as previously expected. This might be due to compromising the performance in position 1 in favor of a more general filter solution for all receiver positions, i.e. the decreased resolution of a shorter filter corresponds to a more averaged solution.

Optimal parameters for largest sweet spot:

| Parameter | Value |
|---|---|
| Implementation model | C |
| Receiver position configuration | a |
| Filter length | 1024 |
| Regularization scaling β | $10^{-6}$ |
| Shape type | Type 1 |
| Penalty factor | $10^{3}$ |

Table 7.2: The optimal parameters simulated for largest sweet spot CHSP performance.

Figure 7.2 shows the CHSPs of all receiver positions in the sweet spot.

[Figure 7.2 plot omitted: CHSP [dB] versus frequency [Hz] for Pos 1 to Pos 6.]

Figure 7.2: The individual CHSPs of the left ear are shown for all six positions in the sweet spot. The CHSPs are smoothed by a moving average with a span of a 16th of the filter length.
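The smoothing mentioned in the caption can be sketched as a centered moving average. The handling of the curve ends (a window that shrinks near the edges) and the function name are assumptions, since the report does not describe these details.

```python
def moving_average(values, span):
    """Centered moving average; the window shrinks near the ends of the curve."""
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Span of a 16th of the filter length, e.g. 1024 // 16 = 64 bins.
span_for_1024 = 1024 // 16
```

Note that in this sketch an even span such as 64 gives an effective window of 65 bins away from the edges, because the window is kept symmetric around the current bin.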

As noticed, the best performance is, not surprisingly, obtained at receiver position 1, while the worst CHSP performance is found at receiver position 5. Comparing these two, the largest difference is seen in the high frequency area of the system. Above 2 kHz the fluctuations among the CHSPs increase, whereas at 6 kHz the curves tend to meet again. This might be expected, since the frequency limit of the XTC system is reached there and only the natural XTC remains. The same tendency is observed at the low frequency limit, where the CHSPs seem to be paired according to their lateral position.

For comparison with the CHSPs at the left ear, figure 7.3 shows the corresponding CHSPs at the right ear. The same tendencies are observed again.


[Figure 7.3 plot omitted: CHSP [dB] versus frequency [Hz] for Pos 1 to Pos 6.]

Figure 7.3: The individual CHSPs of the right ear are shown for all six positions in the sweet spot. The CHSPs are smoothed by a moving average with a span of a 16th of the filter length.

However, comparing the CHSP in a given position at the left ear with the CHSP in the mirrored position at the right ear, it is seen that there is poor symmetry in the XTC. Denoted by right/left, the best and worst average CHSPs were found to be -29.1/-29.6 dB for position 1 and -12.1/-8.9 dB for position 5 in the sweet spot.

To get an impression of the XTC performance with regard to the absolute sweet spot, the CHSP is found for all 16 measurement positions, which were obtained in appendix C. These are plotted in figure 7.4 for both left and right ear. Both figure 7.4a and figure 7.4b show the 16 measurement positions denoted by black cross-marks, and the sweet spot size defined in chapter 4 is illustrated by the black dashed line. Furthermore, the gray contour line illustrates the -12 dB threshold, below which the requirement for the absolute sweet spot is fulfilled. The CHSPs in the figure are interpolated by the cubic method from all 16 measurement positions. Thus, the CHSP is not valid outside the measurement area.

As observed in figure 7.4a, the requirement of -12 dB is not fulfilled for positions 5 and 6, which correspond to a lateral displacement towards the right loudspeaker; hence this is expected to cause the worse performance. Towards the left side, a bulge in the gray contour line is observed between positions 10 and 11 (measurement positions are shown in figure 6.12 page 65). This may be caused by moving further away from the contralateral loudspeaker. As seen from the simulated sweet spot size, the CHSP requirement is approximately fulfilled at −10/+20 cm in the frontal direction and only ±5 cm in the lateral direction. This observation is to be expected, since lateral movement results in an asymmetric XTC for a symmetrically designed system, i.e. the phase cancellation decreases severely with lateral movement compared to frontal movement. Thus, a sweet spot is typically longer than it is wide.

[Figure 7.4 plots omitted: CHSP [dB] (color scale from −40 to 0 dB) as a function of lateral displacement [cm] and frontal displacement [cm]. Panels: (a) the sweet spot for the left ear, (b) the sweet spot for the right ear.]

Figure 7.4: The CHSP is shown as a function of head displacement. (a) shows the CHSP for the left ear, and (b) shows the CHSP for the right ear. The black marks show the individual measurement positions.

Comparing figure 7.4a with figure 7.4b, some symmetrical tendencies are observed. It should be noted that the requirement for the sweet spot stated in chapter 4 includes both left and right ear. Thus, the intersection of the areas enclosed by the two contour lines for left and right ear defines the fulfilled sweet spot area.
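The intersection rule can be illustrated per measurement position: a position only counts as part of the fulfilled sweet spot if the CHSP is at or below the threshold for both ears. The function name and the dictionary layout are hypothetical, while the example values are the average CHSPs quoted above for positions 1 and 5.

```python
def fulfilled_positions(chsp_left_db, chsp_right_db, threshold_db=-12.0):
    """Positions whose average CHSP meets the threshold for both ears."""
    common = set(chsp_left_db) & set(chsp_right_db)
    return sorted(pos for pos in common
                  if chsp_left_db[pos] <= threshold_db
                  and chsp_right_db[pos] <= threshold_db)

# Average CHSPs in dB for the largest sweet spot parameters (right/left as quoted).
left = {1: -29.6, 5: -8.9}
right = {1: -29.1, 5: -12.1}
```

For these values only position 1 is returned: position 5 meets the threshold at the right ear (-12.1 dB) but not at the left ear (-8.9 dB).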

The spectral distributions of the evaluated constraints are shown in figure 7.5.

As seen from figure 7.5a, the maximum filter power is found for the bass units around 200 Hz. Below this, the power decreases again. Since the XTC filters include the equalization of the loudspeakers, this is also apparent from the power spectrum. Thus, the increasing power with decreasing frequency is comparable to equalizing the slight roll-off of the loudspeaker. The decrease in power below 200 Hz is expected to be due to the regularization increased by the penalty factor. The large dip from 3 kHz to 5 kHz is expected to be due to equalizing the resonance of the ear canal.

The PE shown in figure 7.5b fluctuates around 0 dB and increases towards the low frequency limit of the system. This increasing PE is expected to be caused by the increasing regularization as described for figure 7.5a. The dashed black threshold lines at ±3 dB show the tolerance limit defined as the constraint for PE. The average PE throughout the frequency range was found to be 0.5 dB for both ears.


[Figure 7.5 plots omitted: (a) power [dB] versus frequency [Hz] for RightLF, RightHF, LeftHF and LeftLF, showing the power spectrum of the transfer function from input signal to each loudspeaker signal; (b) performance error [dB] versus frequency [Hz] for right ear, left ear and the PE tolerance, showing the PE for the full frequency range of the XTC system.]

Figure 7.5: The spectral distribution of the two constraints shown by (a) XTC filter power and (b) performance error.

7.3.2 Optimal Parameters for Optimal Sweet Spot

The optimal parameters found for optimal sweet spot are listed in table 7.3.

Parameters for optimal sweet spot:

| Parameter | Value |
|---|---|
| Implementation model | C |
| Receiver position configuration | a |
| Filter length | 8192 |
| Regularization scaling β | $10^{-4}$ |
| Shape type | Type 1 |
| Penalty factor | $10^{0}$ |

Table 7.3: The optimal parameters simulated for optimal sweet spot CHSP performance.

Again, implementation model C with only one receiver position, situated at position 1, was found to give the optimal performance with regard to CHSP. The optimal filter length was found to be 8192, which is consistent with the expectation for best performance. Furthermore, the best combination of β and the penalty factor was again found when using shape type 1. It should be noticed that the shape matrix is identity throughout the entire frequency spectrum, since the penalty factor equals 1; hence the use of the shape matrix is dispensable. Figure 7.6 shows the CHSP at position 1 for both left and right ear.

[Figure 7.6 plot omitted: CHSP [dB] versus frequency [Hz] for right ear and left ear.]

Figure 7.6: The CHSPs of left and right ear for receiver position 1.

As noticed, the best CHSP performance is achieved in the area around 3.5 kHz with values of approximately -55 dB. Like the position 1 curves in figure 7.2 and figure 7.3, the CHSP performance of figure 7.6 decreases towards the low frequency limit. Denoted by right/left, the best and worst average CHSPs were found to be -36.8/-37.6 dB for position 1 and -11.7/-9.1 dB for position 5 in the sweet spot.

Even though this optimization is for the optimal sweet spot only, it is still desired to compare the performance throughout the sweet spot with figure 7.4. The CHSPs from all 16 measurement positions are shown in figure 7.7.


[Figure 7.7 plots omitted: CHSP [dB] (color scale from −40 to 0 dB) as a function of head displacement [cm]. Panels: (a) the sweet spot for the left ear, (b) the sweet spot for the right ear.]

Figure 7.7: The CHSP is shown as a function of head displacement. (a) shows the CHSP for the left ear, and (b) shows the CHSP for the right ear. The black marks show the individual measurement positions.

As observed in figure 7.7a, the requirement of -12 dB is not fulfilled for positions 5 and 6, and in figure 7.7b positions 2, 3 and 5 do not fulfill the requirement. In general, the respective sweet spots of figure 7.7 and figure 7.4 seem much alike. However, a CHSP of approximately -36 dB is obtained at position 1 when optimizing for optimal sweet spot, whereas only -29 dB is obtained when optimizing for largest sweet spot. Provided that the user of the application is expected to keep his head in the same position, the optimal sweet spot parameters might be the best choice. Since the purpose was to obtain the best performance in the entire sweet spot, and the optimization for optimal sweet spot yielded a slightly better overall performance, this parameter set will be used.

The spectral distributions of the evaluated constraints are shown in figure 7.8.

[Figure 7.8 plots omitted: (a) power [dB] versus frequency [Hz] for RightLF, RightHF, LeftHF and LeftLF; (b) performance error [dB] versus frequency [Hz] for right ear, left ear and the PE tolerance, showing the PE for the full frequency range of the XTC system.]

Figure 7.8: The spectral distribution of the two constraints shown by (a) XTC filter power and (b) performance error.

As seen from figure 7.8a, the maximum filter power is found for the bass units at the low frequency limit of 160 Hz. In contrast, a roll-off occurred below 200 Hz in figure 7.5a. The difference is expected to be due to the longer filter length used and the lower regularization below the low frequency limit. Otherwise, the spectrum of figure 7.8a follows the same trends as in figure 7.5a.

In figure 7.8b it is seen that the PE is within the tolerance limit of ±3 dB, even at the low frequency limit. This is likewise expected to be due to the larger filter power caused by lower regularization. The average PE throughout the frequency range was found to be 0.2 dB for both ears.

7.3.3 Final Remarks on Simulation

From the simulations performed, it is concluded that the requirement for a CHSP better than -12 dB cannot be fulfilled throughout the entire sweet spot. This concerns both the simulation for largest sweet spot and the simulation for optimal sweet spot. Based on the requirement for obtaining the best performance throughout the entire sweet spot, the parameter set associated with largest sweet spot will be used. These parameters are listed in table 7.2.

However, from the experience of the simulation results, several observations arise:

1. Implementation model C performs better than model B for both optimizations.

2. A plant matrix consisting of the receiver position at position 1 alone yields the best performance for both optimizations.

3. Shape type 1 outperforms shape type 2 for both optimizations.

In order to look into the first observation, figure 7.9 shows the CHSP of implementation model B against model C. The optimal parameters were found for each of them with optimization for the optimal sweet spot (table 7.3). The only change in the optimal parameters for model B is that β changes from $10^{-4}$ to $10^{-3}$.

[Figure 7.9 plot omitted: CHSP [dB] versus frequency [Hz] for model B and model C.]

Figure 7.9: The CHSPs of implementation models B and C for the left ear only at position 1. The CHSPs are smoothed in order to show the more general tendencies of the curves.

As seen in figure 7.9, model B deviates increasingly from model C and peaks around 1150 Hz. Thus, model B is outperformed partly due to the peak in the mid-frequency area and partly due to the CHSP offset seen throughout the upper frequency region. As previously mentioned, model B uses a crossover network with a crossover frequency of 1450 Hz. The peak of the CHSP curve is, however, at 1150 Hz. Thus, it is questionable whether the crossover network causes this performance decrease. The filters were designed with linear phase and thus constant group delay, which should not deteriorate the XTC. On the other hand, the only difference between B and C apart from the crossover network is the slight change in regularization. Changing β, however, does not remove the peak in the mid-frequency region.

The second observation concerned why the use of the receiver position at position 1 alone yields the best performance. A possible reason might be that the attempt to equalize for more than one receiver position is unfeasible. The different dips present in the spectra of the plant may be regularized such that less error is introduced by the attempt to perform XTC at certain frequencies. In conclusion, a single receiver position, or a single good representative of a small sweet spot, is believed to yield the best performance.

As learned, the simulation for both optimizations resulted in the use of shape type 1, even though shape type 2 was available for implementation model C. However, the optimal sweet spot parameters had a penalty factor equal to 1; hence the shape matrix is identity, and the choice of shape type makes no difference. When optimizing for largest sweet spot, the penalty factor is significant, but the increased regularization from shape type 2 is found to decrease the performance of the lateral positions significantly. This suggests that optimizing for more receiver positions should not be approached by increased regularization.

Regarding both optimizations, a CHSP of less than -20 dB is obtained at position 1. As learned in section 2.1, the ILD may be as high as 20 dB at certain angles for higher frequencies, but becomes negligible for lower frequencies. This suggests a frequency dependent requirement for the CHSP. As seen in figure 7.2, figure 7.3 and figure 7.6, the CHSP performance decreases towards the lower frequency limit. However, with a frequency dependent requirement for CHSP, this decrease in performance seems less crucial. On the other hand, it is still desired to reproduce ITD cues for virtually positioned sources. Thus, an improved CHSP is still expected to enhance the quality of the localization cues.

To verify the results of the simulation, an objective and a subjective test are to be carried out using the XTC filter made from the parameter set for optimal sweet spot.

Chapter 8 Objective Test

This chapter describes the objective test of the implemented XTC system. The test determines how the system performs with respect to channel separation (CHSP) and performance error (PE). The results of this test are compared with the simulation results from section 7.3 and the XTC performance requirements from the requirements specification in section 4.3. The objective test measurement report can be found in appendix D. This measurement report, however, does not include the results presented in this chapter. Throughout the analysis of the results, the positions mentioned are those described in the measurement report. These are the same as used in the report until now.

8.1 Performance Error

It was chosen to include loudspeaker equalization as part of the XTC system. Since this test was performed using a white noise stimulus, the equalization can be verified through evaluation of the PE. The measured responses are depicted in figure 8.1, where the non-equalized responses are also depicted.

[Figure 8.1 plots omitted: magnitude SPL [dB re. 20 µPa] versus frequency [Hz] for the equalized and the non-equalized loudspeaker. Panels: (a) position 1 right side, (b) position 2 right side.]

Figure 8.1: Equalized and non-equalized loudspeaker responses for the right channel. The responses of the left side are similar.

In figure 8.1 the black lines represent the frequency response at the ears of the listener. It can be seen that the responses are flat, especially for position 1. The PE defined in section 3.2 also shows good performance: 0.7 dB for both right and left side at position 1, and 1.3 dB and 1.4 dB respectively for position 2. This is almost as good as the simulated results, which showed an average PE of 0.5 dB, and it fulfills the requirement of a PE tolerance of less than 3 dB. Based on this it is concluded that the loudspeaker setup is able to reproduce a signal for the listener without altering its frequency spectrum.

8.2 Sweet Spot Size

In the requirements it is specified that the sweet spot size is a lateral movement of ±0.05 m and a frontal movement of 0.1 m. Within this area the CHSP performance must be better than -12 dB. Figure 8.2 shows the sweet spots with and without XTC. The required sweet spot area is marked by the black dashed line.

With the XTC system, the absolute sweet spot is marked by the gray contour line. To fulfill the demands, the contour line must enclose the dashed black line showing the required sweet spot. As shown, the requirement is not fulfilled, but it is close. Note that the contour lines in figure 8.2a and 8.2b mark the sweet spot for the left and right ear respectively, meaning that the sweet spot for the listener is the common area covered by both contour lines. The sweet spot is larger in the region behind the intended sweet spot, but the optimal position is position 1 as intended. If the sweet spots of figure 8.2a and 8.2b are compared with the simulation results shown in figure 7.4, it is noticed that the sweet spot is slightly smaller for the objective test than for the simulated results. Without XTC the requirement of a CHSP of -12 dB is, as expected, not met.

[Figure 8.2 plots omitted: CHSP [dB] (color scale from −40 to 0 dB) as a function of head displacement [cm]. Panels: (a) sweet spot for XTC for left ear, (b) sweet spot for XTC for right ear, (c) sweet spot for natural XTC for left ear, (d) sweet spot for natural XTC for right ear.]

Figure 8.2: Sweet spot performance. The gray contour line shows the area of the absolute sweet spot. The black dashed line shows the required sweet spot.


8.3 CHSP

As mentioned in section 2.1, the ILD cues may be as high as 20 dB. For position 1 the CHSP fulfills this with a margin of approximately 1 dB on average. Figure 8.3 shows the CHSP over the frequency range of the XTC system in positions 1 and 2 for the right ear. The CHSP for position 1 is not good around 1.2 kHz. This deviates from the simulations, which performed rather well in this region. The reason for this is unknown. A table showing the CHSP for all measurement positions can be found in appendix D page 143. It shows that the average CHSP performance for right and left ear is almost alike.

[Figure 8.3 plots omitted: CHSP [dB] versus frequency [Hz] for the XTC CHSP and the "natural" CHSP. Panels: (a) position 1, (b) position 2.]

Figure 8.3: CHSP performance with and without XTC for the right ear.

Chapter 9 Subjective Test

An objective validation of the XTC system was conducted to determine how well the XTC system performed; however, the results from that test do not necessarily reflect how listeners will perceive the effect of the system. This is determined through a subjective listening test using human test subjects. This chapter describes the considerations and design of the test, and finally the results from the test are presented and analyzed. A measurement report for this listening test is found in appendix E, where further details regarding the listening test are found.

9.1 Test Overview

The test subjects are presented with stimuli from a number of virtual sources created using binaural synthesis, and the test subjects must determine from where they perceive the stimuli. The test subjects answer by means of a software GUI.

The number and severity of localization errors are used as a performance measure. To have a reference for the performance of the XTC system, the test subjects also carry out a localization test using headphones for comparison. Figure 9.1 shows an overview of the subjective listening test.

[Figure 9.1 diagram omitted.]

Figure 9.1: An overview of how the subjective listening test will be performed.

The test is divided into three sessions, where the XTC system is used in two sessions and headphones in one session.

• Session 1: evaluating the XTC system performance at the optimal sweet spot.

• Session 2: evaluating the performance using headphones.

• Session 3: evaluating the XTC system performance when the listener is displaced 5 cm from the sweet spot.

Each session is divided into three groups evaluating virtual directions in different planes. The groups are described later in this chapter, while the test procedure can be found in appendix E page 150.

9.2 Test Design

Several aspects of the reproduction are tested by measuring the localization performance of the subjects. To understand the design, the objectives of the test are stated.

9.2.1 Test Objectives

Is the XTC system able to reproduce sufficient localization cues for subjects?

For the XTC system to be usable, it must be able to reproduce localization cues sufficiently. This question is answered through the following subquestions.

• Is the localization performance as good as headphone reproduction?It should be ensured that it is the XTC system’s reproduction performance that isbeing tested, and not an individual subject’s localization performance. To excludethis subjective parameter, the XTC system is compared to headphone reproduction,which gives a reference for the individual subjects localization performance, sinceheadphones provide optimum CHSP.

• Is the localization performance equally good for all directions?It is considered that different directions set different requirements for the reproduc-tion. For this reason several directions around the subject will be tested. Further-more, it is decided to study the localization in the different planes individually, toidentify if the reproduction is limited to a single plane.

• Does the entire sweet spot provide sufficient localization performance?If the requirement of sweet spot size is fulfilled, the subjects localization perfor-mance should not be degraded by head movement within the sweet spot. To ensurethis the test subjects will not only be tested at the optimal sweet spot position, butalso at a lateral displacement of 5 cm.

• Is the localization performance dependent on the subjects?The XTC system was designed to operate within a frequency range with littlevariance between individual HRTFs. Thus, there should also be little difference inthe subjects localization performance, i.e. the difference between loudspeaker andheadphone localization performance should be similar for all subjects.


• How good is the performance compared to other XTC systems?
Several articles study the topic of XTC systems and also use subjective listening tests for validation. It is considered relevant to compare the performance of the designed XTC system with previous XTC systems. To make the results comparable, the test design should be similar, and many design choices are made to comply with this.

The design will be characterized by the objectives, and the test results will focus on answering the stated questions.

9.2.2 Virtual Sources

As mentioned in the objectives, it has been decided to study the subject’s localization performance in different planes. For this reason the chosen virtual directions are divided into three groups as seen in figures 9.2 and 9.3. These directions can be found in table E.1. Each group will only allow answer possibilities corresponding to the plane(s) of the respective group.

[Figure 9.2, panels: (a) Group 1: Horizontal plane test. (b) Group 2: Frontal plane test.]

Figure 9.2: The virtual directions that will be tested in the horizontal (a) and frontal plane (b) are marked in these figures. Directions with a rating ”5” indicate that this direction will be played five times, whereas the rest will only be played once. The colored pairs indicate that the directions always occur in subsequent order.

• Group 1: horizontal plane
The test in the horizontal plane includes the 12 directions with interspacings of 30° as shown in figure 9.2a. This specific interspacing is chosen, since it was used for a previous subjective listening test of an XTC system [Bai and Lee, 2006a].

• Group 2: frontal plane
Six different directions will be tested in the frontal plane. This will ensure that the XTC system is able to reproduce different elevations.


[Figure 9.3, panel: (a) Group 3: 3D test (two planes).]

Figure 9.3: Virtual directions for the 3D test, where a direction needs to be specified in two planes. The virtual directions that will be tested are marked pairwise with color and enumerated with ”a” to ”f”. The directions marked with the number 5 indicate that this direction will be played five times, whereas the rest will only be played once.

• Group 3: 3D (two planes)
In a 3D environment both elevation and azimuth angle must be answered, and this test is therefore expected to be more challenging. Six randomly chosen directions are included in this test, and they can be seen in figure 9.3a.

The directions will be presented in a different order for each session to ensure that the subjects answer based on how they perceive the sound instead of what direction they expect to hear.

9.2.3 Cone of Confusion Errors

It is considered interesting to study the reproduction of ILD cues from the severity of front/back confusion. It is expected that such errors could be provoked if a pair of directions positioned on the cone of confusion were presented in subsequent order. A confusion pair is included, and is marked in figure 9.2a. For the frontal plane directions, a confusion pair is made to provoke up/down confusion errors (figure 9.2b). Also in the test of both planes, a front/back pair is introduced, marked by ”c” and ”d” in figure 9.3.

9.2.4 Repetitions of Virtual Directions

To validate the XTC system’s performance, it should be ensured that it is the localization performance of the subjects that is measured, and not just random guessing. For this reason it is decided that a few virtual directions should be repeated in order to measure the consistency of a subject’s answers, and thus allow hypothesis testing with fewer subjects.


For each of the three groups of virtual directions, three directions are chosen to be presented four additional times. It is chosen that a pair provoking cone of confusion errors together with a common direction for all three groups should be repeated.
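The repetition scheme fixes how many answers are collected, which matters for the sample sizes used in the later statistics. The following is a minimal sketch of that arithmetic; the variable names are illustrative and not taken from the project. It uses the 12, 6 and 6 unique directions of the three groups, three repeated directions per group with four additional presentations each, and the 11 subjects of the listening test.

```python
# Unique directions per group (horizontal plane, frontal plane, 3D test).
unique_directions = {"horizontal": 12, "frontal": 6, "3D": 6}
REPEATED_PER_GROUP = 3    # directions per group that are repeated
EXTRA_PRESENTATIONS = 4   # additional presentations of each repeated direction
N_SUBJECTS = 11           # subjects taking part in the listening test


def presentations_per_group(n_unique):
    """Stimuli played in one group: every direction once plus the repetitions."""
    return n_unique + REPEATED_PER_GROUP * EXTRA_PRESENTATIONS


# Answers given by one subject in one session, and by all subjects in one session.
per_subject = sum(presentations_per_group(n) for n in unique_directions.values())
per_session = per_subject * N_SUBJECTS

# Sample sizes when each (subject, direction) pair is averaged to one value.
unique_pairs = sum(unique_directions.values()) * N_SUBJECTS
repeated_pairs = REPEATED_PER_GROUP * len(unique_directions) * N_SUBJECTS

print(per_subject, per_session, unique_pairs, repeated_pairs)  # 60 660 264 99
```

These counts agree with the 660 occurrences per session stated for figure 9.7 and with the sample sizes n = 264 and n = 99 used in the paired comparison of section 9.3.2.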

9.2.5 Stimuli

A pink noise signal is chosen as stimulus for two reasons. Its frequency content covers the entire frequency range of the XTC system. Also, many previous localization tests of XTC systems use pink noise sequences as stimuli, thereby making it easier to compare results for the designed XTC system with previous results [Bai and Lee, 2006a], [Takeuchi et al., 2001, p. 967]. The signal consists of five repetitions of 300 ms pink noise followed by a 300 ms silence. For more details on the noise signal see appendix E.
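As an illustration of the stimulus structure, a burst train of this kind could be generated as sketched below. This is not the signal used in the test (see appendix E for that): the sampling frequency, the Voss-style approximation of pink noise, the function names, and the reading that every burst is followed by its own silence are all assumptions made for the example.

```python
import random


def voss_pink_noise(n_samples, n_rows=12, seed=0):
    """Approximate pink noise with the basic Voss algorithm.

    Row k holds a white noise value that is redrawn every 2**k samples;
    the sum of the rows has a spectrum that falls off roughly as 1/f.
    """
    rng = random.Random(seed)
    rows = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    samples = []
    for i in range(n_samples):
        for k in range(n_rows):
            if i % (1 << k) == 0:
                rows[k] = rng.uniform(-1.0, 1.0)
        samples.append(sum(rows) / n_rows)  # scaled to stay within [-1, 1]
    return samples


def burst_train(fs=48000, burst_s=0.3, gap_s=0.3, repetitions=5, seed=0):
    """Five 300 ms noise bursts, each followed by 300 ms of silence (assumed layout)."""
    n_burst = round(fs * burst_s)
    n_gap = round(fs * gap_s)
    signal = []
    for r in range(repetitions):
        signal.extend(voss_pink_noise(n_burst, seed=seed + r))
        signal.extend([0.0] * n_gap)
    return signal


stimulus = burst_train(fs=8000)  # low rate only to keep the example fast
print(len(stimulus) / 8000, "seconds")
```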

9.2.6 Test Subjects

A total of 11 subjects volunteered to participate in the listening test, all of whom were known by the group. It was assumed that any knowledge about the system would bias the subjects, hence none of the test subjects were given any prior knowledge of the project. Before the test, each subject was handed the questionnaire found in appendix G to estimate whether any of them suffered from a hearing loss. All 11 subjects were found suitable for the test from a subjective evaluation of their questionnaire answers. The answered questionnaires are located on the enclosed DVD, numbered Test Subject 1 to 11.

9.3 Results

The results from the subjective listening test will in this section be used to answer the questions raised in section 9.2.1. Appendix E presents the directions perceived by the subjects compared with the correct directions. Answering the questions from sheer comparison will however be difficult, and therefore a rating system is introduced to make the results easier to interpret and analyze. The ratings of the subjects’ localization performance are presented and used for statistical analysis.

9.3.1 Rating System for the Results

A rating system is introduced to categorize and penalize different types of errors in answers. The rating system of [Bai and Lee, 2006a] is adopted, since it seems reasonable and allows for comparison with the three XTC systems tested in that paper. This rating system is shown in figure 9.4 and can be used directly for the one-dimensional errors in the horizontal and frontal plane. A modified version of this is created for the directions where both azimuth and elevation are requested. This can be seen in figure 9.5. It is attempted to give a rating categorizing the answers in a manner similar to the one-dimensional rating system. In this system, front/back localization ratings are now given for the entire cone of confusion. All of the subjects’ answers have been rated according to this system, and the rated results will be used for performance analysis.

Figure 9.4: Rating of answers for tests of virtual directions in a single plane. The target direction of the stimuli is indicated by the red marker. The subject’s answer is rated according to how close it is to the target. The numbers indicate the points given for the specific direction. Notice that the points 2 and 3 are given in case of front/back error, when the answer lies on (or close to) the cone of confusion.

Figure 9.5: Rating of answers for tests of a virtual direction in two planes. The rating system is similar to the one-dimensional version in figure 9.4. The elevation is indicated by having multiple circles, where the circle of smallest radius is the lowest elevation. The colored lines indicate where the cone of confusion ratings are given.

9.3.2 Comparison of Overall Performance

To answer the overall question of whether the XTC system performance is comparable to headphones (HP), an overall rating of both has been calculated. This is the average rating of all subject answers for all directions, as shown in figure 9.6. Notice the XTC system has been given the abbreviation XtcOpt for position 1 (optimal sweet spot) and XtcDis for position 2, displaced 5 cm. These abbreviations will be used for most of the figures and tables in this section.

[Figure 9.6: average rating (vertical axis) for XtcOpt (loudspeakers centered), XtcDis (loudspeakers displaced) and HP (headphones).]

Figure 9.6: Overall rating of the answers from all subjects and all directions. The whiskers mark a distance of one standard deviation from the mean value, and are thereby close to a 68% confidence interval.

As seen in figure 9.6, the subjects obtain better ratings using headphones. However, the difference is considered small. For this reason, it will be investigated if there is statistical basis for concluding that the performance using loudspeakers is worse. Starting from the assumption that the loudspeaker reproduction is at least as good as the headphones leads to the one-sided hypothesis test

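$$H_0 : \mu_{HP} \le \mu_{XtcOpt} \tag{9.1}$$

$$\text{Reject } H_0 \text{ if } TS > t_{\alpha,n-1} \tag{9.2}$$

$$TS = \frac{\overline{X}_{HP} - \overline{X}_{XtcOpt}}{\sqrt{\dfrac{S_{HP}^2}{n} + \dfrac{S_{XtcOpt}^2}{n}}}, \tag{9.3}$$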


where $\overline{X}$ and $S$ are sample estimators of mean and standard deviation respectively. TS is the test statistic, which is compared to the distribution function, and n is the number of samples. The equations were derived from [Ross, 2004, p. 311 and 319]. However, for a large value of n, the t-distribution approximates the normal distribution, so $t_{\alpha,n-1} \approx z_{\alpha}$. For a 95% confidence level it is found that the test statistic should be compared to $z_{0.05} = 1.645$ [Ross, 2004, p. 614]. The test statistic has been computed to

$$TS = \frac{4.11 - 3.92}{\sqrt{\dfrac{1.05 + 1.34}{660}}} = 3.23 \tag{9.4}$$

which means that the H0 hypothesis must be rejected, and the overall headphone reproduction provides better performance. The difference is so large that the hypothesis of the loudspeaker performance being similar to the headphone performance must be rejected likewise.

$$H_0 : \mu_{HP} = \mu_{XtcOpt} \tag{9.5}$$

$$\text{Reject } H_0 \text{ if } |TS| > z_{0.05/2},$$

where $z_{0.05/2} = 1.96$, since it is a two-sided hypothesis test.
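The two tests above can be retraced numerically. The sketch below is illustrative only: the function name is hypothetical, and it assumes that 1.05 and 1.34 in equation (9.4) are the two sample variances and that n = 660. Because the inputs are the rounded values printed in the report, the result comes out at roughly 3.2 rather than exactly 3.23; the conclusion of both tests is unaffected.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_ts(mean_a, mean_b, var_a, var_b, n):
    """Test statistic of equation (9.3) for two samples of equal size n."""
    return (mean_a - mean_b) / sqrt(var_a / n + var_b / n)


# Rounded values from equation (9.4): HP versus XtcOpt, n = 660 answers each.
ts = two_sample_ts(4.11, 3.92, 1.05, 1.34, 660)

# Critical values of the standard normal distribution at the 5% level.
z_one_sided = NormalDist().inv_cdf(1 - 0.05)      # about 1.645
z_two_sided = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96

reject_one_sided = ts > z_one_sided        # H0: mu_HP <= mu_XtcOpt
reject_two_sided = abs(ts) > z_two_sided   # H0: mu_HP = mu_XtcOpt

print(round(ts, 2), reject_one_sided, reject_two_sided)
```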

Paired Comparison

Although it was found that the overall headphone performance was better, it does not exclude the possibility that the actual influence of changing from headphones to XTC system is negligible. Therefore, an average paired comparison test will be made, where HP and XtcOpt are compared for each individual direction and subject. The test is made on the basis of an average of all test pairs.

$$W_n = X_{n,HP} - X_{n,XtcOpt} \tag{9.6}$$

$$H_0 : \overline{W}_n = 0 \tag{9.7}$$

$$TS = \frac{\sqrt{n}\,\overline{W}}{S_w} \tag{9.8}$$

$$\text{Reject } H_0 \text{ if } |TS| > z_{0.05/2},$$

where $X_{n,HP}$ is the rating for a specific direction and subject using headphones. $\overline{W}$ and $S_w$ are sample estimators of mean and standard deviation for the pairwise subtraction. Since there are 12+6+6 unique directions and 11 subjects, n = 264. The test statistic has been computed to

$$TS = \frac{\sqrt{264} \cdot 0.124}{0.94} = 2.15, \tag{9.9}$$

which means that H0 also must be rejected for paired comparison of all directions. It must be noted that the comparison includes directions presented only once for the subject.


Hence, the average ratings become more certain by only considering the directions which were repeated. However, this also reduces n, and equation (9.8) can now be written as

$$H_0 : \overline{W}_n = 0 \tag{9.10}$$

$$TS = \frac{\sqrt{99} \cdot 0.2404}{0.7245} = 3.3. \tag{9.11}$$

This also leads to rejection of H0, and it seems that the XTC system compromises the localization performance.
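The paired test statistic of equation (9.8) can be checked against the reported numbers with a few lines of code. This is a sketch with hypothetical function names; the second helper shows how the same statistic would be obtained from raw rating pairs, using made-up ratings purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_ts(w_mean, w_std, n):
    """Equation (9.8): sqrt(n) times the mean difference over its standard deviation."""
    return sqrt(n) * w_mean / w_std


def paired_ts_from_ratings(hp_ratings, xtc_ratings):
    """Same statistic computed from paired ratings (one pair per subject and direction)."""
    w = [hp - xtc for hp, xtc in zip(hp_ratings, xtc_ratings)]
    return paired_ts(mean(w), stdev(w), len(w))


ts_all = paired_ts(0.124, 0.94, 264)          # all unique directions, equation (9.9)
ts_repeated = paired_ts(0.2404, 0.7245, 99)   # repeated directions, equation (9.11)
print(round(ts_all, 2), round(ts_repeated, 2))

# Hypothetical ratings of four stimuli, only to show the raw-data path.
example = paired_ts_from_ratings([5, 4, 5, 3], [4, 4, 3, 3])
```

With the rounded inputs the first value lands slightly below the reported 2.15, while the second reproduces 3.3; both exceed the critical value of 1.96.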

9.3.3 Comparison of Directions

The second question in the objectives is whether the XTC system provides sufficient performance in all directions. To answer this, the average rating for each direction has been calculated. The directions requiring answers in two planes receive a lower average rating, but apart from this the performance seems similar. A plot of this is found in figure E.6, page 154. The decrease in performance for both planes seems to originate from an increase of cone of confusion errors. This can be seen in figure 9.7, which shows the distribution of the different ratings.

[Figure 9.7: stacked histogram. Horizontal axis: rating n [XtcOpt, XtcDis, HP], ratings 1 to 5. Vertical axis: occurrences of rating [−]. Legend: horizontal plane, frontal plane, 3D.]

Figure 9.7: Histogram showing how many of each rating type are assigned to the loudspeakers and the headphones. The histogram is stacked so it is possible to see the influence of the different planes. Note that a rating ”3” represents cone of confusion errors and the rating ”2” an answer nearby. Each of the three sessions consists of 660 occurrences in total.

The direct plot presentation of all directions is, however, difficult to interpret, and for this reason, it is decided to make statistical testing of the directions repeated five times. It was decided that the criterion for delivering sufficient performance should be that it does not drop significantly for specific directions. This seems reasonable, since it was noticed from figure E.6 that some directions obtained a better rating using the XTC system than the headphones.


Therefore, it was decided to repeat the paired hypothesis test of the headphones versus the loudspeakers, and use this as criterion for sufficient performance. Hereby equations (9.6) to (9.8) will be used again, but only for a specific direction at a time. The results of these hypothesis tests are found in table 9.1. For this hypothesis test there are fewer samples than in the previous one, and therefore TS should be compared to a t-distribution as expressed by

$$\text{Reject } H_0 \text{ if } |TS| > t_{\alpha/2,\,n-1}, \tag{9.12}$$

where n = 11, since there is an average rating for each subject. Hereby TS will be compared to $t_{0.025,10} = 2.228$.

Target direction        W       S       TS      H0
θ = −90°                0.09    0.72    0.42    √
θ = 30°                −0.13    0.63   −0.67    √
θ = 150°                0.36    0.65    1.86    √
ψ = −90°                0.47    0.8     1.96    (√)
ψ = 30°                −0.02    0.26   −0.23    √
ψ = 150°                0.56    0.9     2.08    (√)
θ = −60°, φ = −30°      0.18    0.72    0.84    √
θ = −120°, φ = 30°      0.16    0.51    1.06    √
θ = 90°, φ = 0°         0.47    1.01    1.56    √

Table 9.1: Hypothesis test of whether the XTC system performance is comparable to headphone reproduction. The √ in the H0 column indicates for which of the directions the hypothesis can be accepted using a normal distribution as model. The (√) indicates that the hypothesis is only accepted using a t-distribution with n − 1 degrees of freedom.

From table 9.1 it is seen that for most directions the TS is positive, i.e. the headphones yield the better performance. The difference in performance is, however, too small to reject the claim that the reproduction through headphones compared to that through the XTC system is similar. Hereby, the XTC system yields a sufficient performance in all directions, although it might not be as good as the headphone reproduction. It should, however, be noted that values of TS in table 9.1 are large in certain directions and the hypothesis is only accepted since there are too few samples to utilize the normal distribution as a model.
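The decision rule behind the H0 column of table 9.1 can be sketched as follows. The code takes the tabulated W, S and TS values, recomputes TS with n = 11 as a consistency check, and classifies each direction. The names are illustrative, and the t critical value is hard-coded from standard tables (2.228 for 10 degrees of freedom), since the Python standard library offers no t-distribution.

```python
from math import sqrt
from statistics import NormalDist

N_SUBJECTS = 11
Z_CRIT = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided 5% level, about 1.96
T_CRIT = 2.228  # tabulated two-sided 5% value for 10 degrees of freedom (hard-coded)

# (target direction, mean difference W, standard deviation S, reported TS)
ROWS = [
    ("theta=-90", 0.09, 0.72, 0.42),
    ("theta=30", -0.13, 0.63, -0.67),
    ("theta=150", 0.36, 0.65, 1.86),
    ("psi=-90", 0.47, 0.80, 1.96),
    ("psi=30", -0.02, 0.26, -0.23),
    ("psi=150", 0.56, 0.90, 2.08),
    ("theta=-60, phi=-30", 0.18, 0.72, 0.84),
    ("theta=-120, phi=30", 0.16, 0.51, 1.06),
    ("theta=90, phi=0", 0.47, 1.01, 1.56),
]


def mark(ts):
    """'ok': accepted under the normal model, '(ok)': only under the t model, 'X': rejected."""
    if abs(ts) <= Z_CRIT:
        return "ok"
    if abs(ts) <= T_CRIT:
        return "(ok)"
    return "X"


for label, w, s, ts_reported in ROWS:
    ts_recomputed = sqrt(N_SUBJECTS) * w / s  # equation (9.8) with n = 11
    print(f"{label:20s} {ts_reported:6.2f} {ts_recomputed:6.2f} {mark(ts_reported)}")
```

The recomputed values differ from the tabulated ones only in the second decimal, which is consistent with W and S having been rounded before printing.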

Although the XTC system provides sufficient performance in all directions, it does not reveal whether the decrease in performance can be considered similar. To ensure this, an ANOVA will be used for the same data, testing the hypothesis

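$$H_0 : \overline{W}_{\text{Direction 1}} = \overline{W}_{\text{Direction 2}} = \cdots = \overline{W}_{\text{Direction 9}} \tag{9.13}$$

$$\text{Reject } H_0 \text{ if } p \le 0.05,$$

where $\overline{W}_{\text{Direction } n}$ is the mean difference in rating obtained from five repetitions for the specific direction n. For the ANOVA test, the data is arranged as

$$\begin{bmatrix}
\overline{W}_{\text{Subject 1, Direction 1}} & \cdots & \overline{W}_{\text{Subject 1, Direction 9}} \\
\vdots & \ddots & \vdots \\
\overline{W}_{\text{Subject 11, Direction 1}} & \cdots & \overline{W}_{\text{Subject 11, Direction 9}}
\end{bmatrix}, \tag{9.14}$$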

and it will be computed whether the difference between the columns (i.e. directions) could be caused by variance of the sampling or if the difference is too large to assume this.

A box whisker plot from the analysis is found in figure E.8, on page 155. The p-value was calculated to be 0.29, i.e. above the 0.05 required at a 95% confidence level, thus the hypothesis is accepted.
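To make the column comparison concrete, the sketch below computes the F statistic of a one-way ANOVA, where each group corresponds to one column of the data arrangement. It is a generic textbook formulation with hypothetical data, not the tool or the ratings used in the project. Turning F into a p-value requires the F-distribution, which is not in the Python standard library and is therefore left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical rating differences for three directions (columns), three subjects each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w)  # 3.0 2 6
```

For the arrangement with 9 directions and 11 subjects, the same function would give 8 and 90 degrees of freedom; a large F (small p-value) would indicate that the decrease in performance differs between directions.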

9.3.4 Comparison of Sweet Spot Positions

The last session in the listening test used the XTC system with the test subject located in a displaced position. This was performed to ensure that not only the optimal sweet spot provided sufficient performance. This will be tested by hypothesis testing of whether the performance in a non-optimal position can be considered similar to the optimal. The actual hypotheses and the results of these are seen in table 9.2.

Before studying table 9.2 it should be recalled that figure 9.6 in the comparison of overall performance has already revealed that the displaced position actually gave a higher rating than the XTC system at the optimal sweet spot. Therefore, it was decided to include a comparison between XTC system performance in a non-optimal position and headphone reproduction. It can be seen that the performance in the non-optimal sweet spot is still not comparable to the headphone reproduction. Only the hypothesis including directions that subjects have rated once allows for accepting the H0 hypothesis. However, in all cases the TS values are lower than those found in the comparison of headphones and XTC system at the optimal sweet spot.

Criteria                                            TS      Value of dist. func.   H0
µXtcOpt = µXtcDis                                  −0.99    z0.05/2 = 1.96         √
Xn,XtcOpt = Xn,XtcDis (n = all directions)         −1.69    z0.05/2 = 1.96         √
Xn,XtcOpt = Xn,XtcDis (n = repeated directions)    −0.51    z0.05/2 = 1.96         √
µHP ≤ µXtcDis                                       2.31    z0.05 = 1.65           X
µHP = µXtcDis                                       2.31    z0.05/2 = 1.96         X
Xn,HP = Xn,XtcDis (n = all directions)              0.48    z0.05/2 = 1.96         √
Xn,HP = Xn,XtcDis (n = repeated directions)         3.03    z0.05/2 = 1.96         X

Table 9.2: The XTC system performance in the displaced position is compared to headphone performance and to performance in the optimal sweet spot. Hypothesis tests of whether the performance can be considered similar are made. The √ in the H0 column indicates that the hypothesis can be accepted, and the X that it is rejected.

All three hypothesis tests stating that the optimal and non-optimal sweet spot yield comparable ratings are however accepted. Hereby, it is concluded that if the user considers the performance at one position of the designed sweet spot as sufficient, then the user will consider the performance within the entire sweet spot as sufficient.

It was unexpected that the non-optimal sweet spot yielded the better localization performance. A reason for this might be that it was the last listening session, so the test subjects can be considered more trained in determining a direction.

9.3.5 Comparison of Subjects Performance

The XTC system is intended to be usable for any user. It will be investigated if this requirement is ensured, by analyzing if the test subjects’ performance can be considered similar. To get an idea of this, the average ratings of each subject have been calculated for each of the three sessions. A plot of this is found in figure E.10 on page 156. To actually determine whether the subjects’ ratings can be considered similar, an ANOVA is made using the repeated directions. First it will be tested whether the average performance for each subject can be considered similar, and afterwards it will be analyzed whether the reduction in performance can be considered equal for all subjects. The first test is

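$$H_0 : \overline{X}_{\text{XtcOpt, Subject 1}} = \overline{X}_{\text{XtcOpt, Subject 2}} = \cdots = \overline{X}_{\text{XtcOpt, Subject 11}}, \tag{9.15}$$

where $\overline{X}_{\text{XtcOpt, Subject } m}$ is Subject m’s mean rating for the directions which were repeated, when using the XTC system. The ANOVA will be arranged as

$$\begin{bmatrix}
\overline{X}_{\text{XtcOpt, Direction 1, Subject 1}} & \cdots & \overline{X}_{\text{XtcOpt, Direction 1, Subject 11}} \\
\vdots & \ddots & \vdots \\
\overline{X}_{\text{XtcOpt, Direction 9, Subject 1}} & \cdots & \overline{X}_{\text{XtcOpt, Direction 9, Subject 11}}
\end{bmatrix}. \tag{9.16}$$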

The p-value was found to be 0.12, so the H0 hypothesis is accepted. The mean values of the test subjects can be seen in table 9.3, and a box whisker plot of the results can be seen in figure E.9, page 156.

Test subject    1      2      3      4      5      6      7      8      9      10     11
Avg. rating     3.13   4.22   3.8    3.58   3.4    3.76   4.29   3.96   4.38   4.4    3.96
∆-rating        1.04   0.04   0.16   0.22   0.42   0.22   0.09   0.42  −0.07  −0.2    0.29

Table 9.3: Performance rating using the XTC system for the individual test subjects, presented together with the difference compared to their performance using headphones. The ∆-rating is calculated as $(\overline{X}_{HP} - \overline{X}_{XtcOpt})$, and therefore positive values indicate that the headphone reproduction yields the best performance.


The second hypothesis test is whether the reduction in performance can be considered equal for all test subjects, expressed as

$$H_0 : \overline{X}_{\text{(HP-XtcOpt), Subject 1}} = \overline{X}_{\text{(HP-XtcOpt), Subject 2}} = \cdots = \overline{X}_{\text{(HP-XtcOpt), Subject 11}}, \tag{9.17}$$

and ANOVA is used in the same way as for the previous H0 hypothesis. The p-value was found to be 0.04, which means rejection of the H0 hypothesis. By looking at the test subjects’ mean ratings in table 9.3 and the box whisker plot in figure 9.8, it can be seen that there are two test subjects who obtain a better performance using the XTC system than using headphones. Furthermore, it is noticed that it is specifically test subject 1 that differs by having a large decrease in performance. This was confirmed by computing an ANOVA without test subject 1. This increased the p-value to 0.582, which would lead to acceptance of the H0 hypothesis.
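The observations drawn from table 9.3 can be reproduced directly from the tabulated ∆-ratings. The following sketch uses illustrative variable names and only the rounded values printed in the table.

```python
from statistics import mean

# Delta-rating (headphones minus XtcOpt) per test subject, from table 9.3.
delta_rating = {
    1: 1.04, 2: 0.04, 3: 0.16, 4: 0.22, 5: 0.42, 6: 0.22,
    7: 0.09, 8: 0.42, 9: -0.07, 10: -0.2, 11: 0.29,
}

# Subjects who rated better with the XTC system than with headphones.
better_with_xtc = [s for s, d in delta_rating.items() if d < 0]

# Subject with the largest drop in rating when changing to the XTC system.
largest_drop = max(delta_rating, key=delta_rating.get)

mean_delta = mean(delta_rating.values())
mean_without_outlier = mean(d for s, d in delta_rating.items() if s != largest_drop)

print(better_with_xtc, largest_drop, round(mean_delta, 2), round(mean_without_outlier, 2))
```

The mean of the tabulated differences is about 0.24, in line with the mean difference of 0.2404 used in equation (9.11), and it drops to about 0.16 when test subject 1 is left out.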

[Figure 9.8: box whisker plot. Horizontal axis: test subject (1 to 11). Vertical axis: average difference in rating between headphones and XTC system.]

Figure 9.8: Box whisker plot of whether the difference between the headphone performance and XTC system performance ($\overline{X}_{\text{(HP-XtcOpt), Subject } n}$) can be considered equal for all test subjects.

9.3.6 Comparison With Other XTC Systems

The last objective was to compare the performance against other XTC systems. The article [Bai and Lee, 2006a] studied different loudspeaker spans at different lateral positions, and in figure 9.9 results from this article are compared to the XTC system of this project. Since this article only tested the localization performance in the horizontal plane, only the results of the horizontal plane test are included in the comparison. Since only the directions in the horizontal plane are considered, the average performance is higher than in figure 9.6, which includes all directions. It should be noted that this comparison depends on how good the test subjects are at localizing. Furthermore, this article used two groups of test subjects. One group tested the 10° and 120° span, and the second tested the 60° and 120° span. The performance of the 120° span presented in figure 9.9 is the mean value for both groups, and the standard deviation is the RMS value for both groups.

[Figure 9.9: mean rating for transducer (all subjects, horizontal plane) on the vertical axis. Categories: HP; XtcOpt, span 10°, span 60° and span 120° at 0 cm displacement; XtcDis, span 10°, span 60° and span 120° at 5 cm lateral displacement.]

Figure 9.9: Comparison of the implemented XTC system with results from previous articles. The error bars indicate a distance of one standard deviation. The headphones are included as a reference.

It can be seen that in the optimal sweet spot position, the performance is comparable to the 60° and 120° span, and it is better than the 10° span. At the 5 cm lateral position, the average performance is better, so it seems that the XTC system developed in this project might be more robust to head displacements, i.e. yielding a larger sweet spot. The differences can however be considered small, and it cannot be ruled out that this is only due to uncertainties such as differences between subjects.

Summary and Discussion

The listening test was performed to address the questions of section 9.2.1. A total of 11 subjects participated in the listening test. The test was divided into three sessions: a headphone session for reference and two sessions testing the XTC system at the optimal sweet spot and the displaced position respectively. In the following, the statistical conclusions for the XTC system’s performance are compared with subjective impressions.

The debriefing questionnaires have been browsed through, and eight subjects found it moderate or difficult to localize in all three sessions. This may be because the binaural synthesis used the HRTF of a head and torso simulator and not the subject’s own. Another reason could be that the subjects were unfamiliar with the stimulus.


• Headphone comparison: Is the localization performance as good as headphone reproduction?

The XTC system was able to provide localization cues, although not as well as headphones: a similar overall decrease in localization performance was observed among the subjects in the XTC system sessions compared to the headphone session. This is in accordance with the results from the debriefing questionnaires, where five subjects answered that localization using headphones was easier than for the other sessions.

It must however be noted that there are disadvantages of headphone reproduction compared to loudspeakers. As an example, a minor head movement will give a difference in ITD when using loudspeakers, which can be used to help determine an ambiguous direction.

• Virtual directions: Is the localization performance equally good for all directions?

When judging in the frontal plane test and the 3D (both planes) test, the localization performance was decreased compared to the horizontal plane test. In the debriefing questionnaires, four of the test subjects reflected this result by noting that it was more difficult to localize in the frontal plane test. A similar decrease in performance was observed between the group tests for all three sessions. Therefore, it is assumed that it is not a limitation of the XTC system reproduction, but a general limitation of human localization. This is also in accordance with section 2.1.1, where it was found that the uncertainty is approximately twice as high when determining the elevation angle compared to determining the azimuth angle.

• Sweet spot performance: Does the entire sweet spot provide sufficient localization performance?

It was found that the localization performance obtained in a laterally displaced position was better than the performance in the optimal sweet spot. However, it is still not as good as in the headphone reproduction. The two positions in the sweet spot were found to provide similar performance. This is in accordance with the subjects’ answers, since eight of the subjects found it equally difficult to localize in the two loudspeaker sessions. On the basis of these results, it is assumed that the entire sweet spot provides sufficient performance, hence the XTC system seems robust to head movement.

• Individual performance: Is the localization performance dependent on the subjects?

The performances of the 11 subjects were found to be similar with the exception of a single subject, who performed considerably worse in the XtcOpt session compared to the HP session. This subject also answered that it was easier to localize using the headphones.


• Comparison with other XTC systems: How good is the performance compared to other XTC systems?

In the optimal sweet spot, the implemented XTC system was found to yield a performance similar to a 60° and 120° loudspeaker span and better than a 10° span. In the XtcDis session, it was found to perform better than the other systems.

• Final remarks:
It is concluded that the implemented XTC system is able to provide a listener with localization cues for 3D perception. The subjective listening test showed that the XTC system gave a satisfying performance, however there are issues regarding the listening test design which must be considered. Some of these issues concern the number of repetitions in each test group, the number of test subjects used and verification of their hearing by audiometry. There is a large variance in the performance of the subjects. If they had rated the directions more times, this variance could have been decreased. Furthermore, additional subjects could have been tested.

The session order might have influenced the localization performance, since the listeners naturally got more trained with each session. The XTC system showed a better performance in the displaced position, which might be due to this learning effect, since this session was always the last to be performed by the subjects. Had the order of sessions been randomized from subject to subject, this bias might have been avoided.


Chapter 10
Conclusions

Based on the drawbacks of using headphones for 3D sound reproduction in a multimedia setup, the suggestion of using loudspeakers for this purpose was motivated. This led to the goal of the project, which was to:

Design and implement a binaural reproduction system using loudspeakers, which provides good localization performance and has a crosstalk cancelation robust to minor head movement.

In order to fulfil this goal, a thorough analysis was performed.

The aspects of human sound localization were described together with an analysis of the HRTF and its properties related to binaural synthesis. This helped set requirements for the XTC system and provided a basis for the listening test design. Afterwards, the basic XTC principle was described together with an analysis of loudspeaker span based on the OSD principle.

At this point, a requirement specification was developed. It was decided to use a two-way loudspeaker setup positioned according to the OSD principle. The sweet spot size was defined as a lateral movement of ±5 cm and a frontal movement of 10 cm forward, where the CHSP performance was required to be better than -12 dB. Furthermore, the performance error was required to be within a tolerance of 3 dB. The frequency range of the XTC system was set from 160 Hz to 6 kHz.

A channel model of the XTC system was presented, whereafter the requirement for filter gain was described. Next, possible solution methods were presented as implementation algorithms. Here, the LS method was preferred over the GCC method.

Three implementation models were proposed, of which two were chosen for implementation: a model utilizing a crossover network (model B), and another relying completely on LS optimization (model C). Here LSfreq was used as implementation algorithm. To ensure that the XTC system would yield a large sweet spot, its channel transfer function was measured in 16 positions and used in the simulation. The final implementation specifications were stated for the system, where some of the parameters had to be found by simulation.

In order to obtain the optimal parameter set for the implementation, a simulation had to be performed. This was done by brute force, and performance was evaluated against constraints regarding filter power and performance error. Here, it was chosen to optimize for the largest sweet spot and optimal sweet spot. Simulation results showed that model C was preferred as implementation. For the simulation results from largest sweet spot, the best and worst average CHSPs were found to be -29.1/-29.6 dB for position 1 and -12.1/-8.9 dB for position 5 in the sweet spot. For the optimal sweet spot simulation, the best and worst average CHSPs were found to be -36.8/-37.6 dB for position 1 and -11.7/-9.1 dB for position 5 in the sweet spot. The average CHSPs are denoted as right/left. Thus, simulation results showed that the requirement of less than -12 dB CHSP within the specified sweet spot was not met. As expected, the lateral positions were the ones not fulfilling the requirement, though it was close. Since the largest sweet spot was preferred, this was implemented. The average PE was found to be 0.5 dB for position 1 in the largest sweet spot and 0.2 dB for position 1 in the optimal sweet spot.

For verification of the implementation, an objective test was performed. Here, the CHSP and PE were measured for comparison with the obtained simulation results. Average CHSP was found to be -20.1/-21.0 dB for position 1 and -11.1/-7.8 dB for position 5. The average PE was found to be 0.7 dB for position 1. The performance results of the objective test showed consistency with the expected performance simulated for largest sweet spot.
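The comparison of simulated and measured channel separation against the requirement can be summarized in a small check. The sketch below only restates the right/left average CHSP values quoted above for the largest sweet spot; the data layout and function name are illustrative assumptions.

```python
CHSP_REQUIREMENT_DB = -12.0  # required channel separation within the sweet spot

# Average CHSP in dB as (right, left), keyed by sweet spot position.
simulated_largest = {1: (-29.1, -29.6), 5: (-12.1, -8.9)}
measured = {1: (-20.1, -21.0), 5: (-11.1, -7.8)}


def meets_requirement(chsp_pair, limit_db=CHSP_REQUIREMENT_DB):
    """Both channels must be at or below the limit (more negative is better)."""
    return all(value <= limit_db for value in chsp_pair)


for position in sorted(measured):
    print(position,
          meets_requirement(simulated_largest[position]),
          meets_requirement(measured[position]))
```

Position 1 fulfils the requirement in both simulation and measurement, while position 5 does not in either case, which is the consistency referred to above.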

In order to gain insight into the localization performance of the XTC system, a subjective test was carried out. A total of 11 test subjects volunteered for this. The subjects were tested in three sessions: a loudspeaker test in the center position, a loudspeaker test in a displaced position, and a headphone test. A rating system was adopted to ease the analysis, which also made it possible to compare localization performance with other XTC systems. It was concluded that the XTC system was able to provide localization cues in both positions comparable to the headphone test. The hypothesis of being equally as good as headphones was, however, rejected. In comparison with other XTC systems, the implemented system performed equally well or better.

Thus, it can be summarized that binaural reproduction through loudspeakers is possible through use of XTC. Localization cues provided comparable performance to binaural reproduction through headphones and other XTC systems.

10.1 Perspective

At the current state of the XTC system, there are several issues to address in order to improve the performance. If the channel transfer function could be calibrated to the listener, it would be possible to individualize the system. Doing so, it is possible to increase the upper frequency range of the system. This is advantageous, since an accurate HRTF in the high frequency region is believed to reduce the number of cone of confusion errors. Another advantage of the individualized HRTF is that longer filter lengths can be utilized, thus making the match of the HRTF even better by increasing resolution. The XTC system was implemented using four identical full-range loudspeakers. However, it might be considered to use dedicated tweeters and bass drivers if the frequency range is extended. It is expected that the lower frequency limit could be easily extended by adding a subwoofer. Extending the low frequency limit of the XTC system is not believed to contribute localization cues; however, the increased range may enhance the listening experience. Thus, it is not necessary to provide XTC in this extended region.

The simulation was performed using a brute force method. Thus, it is limited by the rather large step size of the parameters. If the simulation was redefined as a constrained optimization problem, it would be both more efficient and give a more precise result. This is especially relevant if long filter lengths are simulated.

The listening test showed promising performance; however, the statistical basis of the listening test could be improved by repeating virtual directions additional times, or through use of additional test subjects. Furthermore, performing an audiometry of each subject would exclude the possibility of any subjects having hearing loss.

The listening test only addressed actual localization performance by playing short noise stimuli. Hence, it can be difficult to relate the test to the actual application usage. Therefore, it might be suggested to play back music or a movie soundtrack through the XTC system, and let the subjects rate the sound experience.

The listening test was performed in an anechoic room, which ensured that no influence of the room would deteriorate the reproduction. In a real-life situation, there would be reflections from the desk, walls etc. To determine if the XTC system is actually usable, it must be ensured that this influence is minimal.

To provide the listener with the advantage of dynamic localization, some kind of head tracking must be included. Since this application addresses multimedia setups, a webcam could be used for head tracking. However, before such an implementation can be realized, the XTC system would require real-time implementation. If this is implemented, the possibility of using the XTC system for computer games greatly enhances the versatility of the application.


Bibliography

[Aalborg Universitet, 2008] Aalborg Universitet. Studieordning for Kandidatuddannelserne indenfor Elektronik og IT 1.–4. semester, 2008. Included on the enclosure DVD.

[Algazi et al., 2001] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendani. The CIPIC HRTF database. IEEE Workshop, pages 2001–1 – 2001–4, October 2001.

[Bai and Lee, 2006a] M. R. Bai and C.-C. Lee. Comprehensive analysis of loudspeaker span effects on crosstalk cancellation in spatial sound reproduction. Audio Engineering Society - Convention Paper, 120(Convention, Paper 6701), May 2006a.

[Bai and Lee, 2006b] M. R. Bai and C.-C. Lee. Objective and subjective analysis of effects listening angle on crosstalk cancellation in spatial sound reproduction. Journal Audio Engineering Society, 120(4), October 2006b.

[Blauert, 1997] J. Blauert. Spatial Hearing. Massachusetts Institute of Technology, revised edition, 1997. ISBN 0-262-02413-6.

[Bovbjerg et al., 2000] B. P. Bovbjerg, F. Christensen, P. Minnaar, X. Chen, J. Plogsties, and C. Vestergaard. Measurement report - Measuring the head-related transfer functions of an artificial head with a high directional resolution, 2000.

[Brungart, 1999] D. S. Brungart. Auditory parallax effects in the HRTF for nearby sources. IEEE Workshop, pages 171–174, October 1999.

[Brungart and Rabinowitz, 1999] D. S. Brungart and W. M. Rabinowitz. Auditory localization of nearby sources. Head-related transfer functions. Journal of the Acoustical Society of America, 106(3):1465–1479, May 1999.

[Danish Sound Technology, 2002] Danish Sound Technology. Vifa M10md-39-08 datasheet, 2002. URL http://www.tymphany.com/m10md-39-08. An old datasheet with more specs is included on the enclosure DVD as VIFA M10md-39-08eOld.pdf.

[Gardner, 1997] W. G. Gardner. 3-D Audio Using Loudspeakers. Kluwer Academic Publishers, 1997. ISBN 0-7923-8156-4.

[Guang-Zheng et al., 2008] Y. Guang-Zheng, X. Bo-Sun, and R. Dan. Effect of sound source scattering on measurement of near-field head-related transfer functions. Journal of the Acoustical Society of America, 26(8):2926–2929, May 2008.

[Hammershøi, 1995] D. Hammershøi. Binaural Technique. Institute of Electronic Systems, 1995. ISBN 87-7307-516-7.

[Hammershøi and Møller, 2005] D. Hammershøi and H. Møller. Binaural Technique - Basic methods for Recording, Synthesis, and Reproduction, chapter 9, pages 223–254. Springer Berlin Heidelberg, 2005. ISBN 978-3-540-22162-3.

[Kirkeby, 1998] O. Kirkeby. The "Stereo Dipole" - a virtual source imaging system using two closely spaced loudspeakers. Journal of Acoustical Society of America, 49(5):387–395, May 1998.

[Kirkeby and Nelson, 1999] O. Kirkeby and P. A. Nelson. Digital filter design for inversion problems in sound reproduction. Journal of the Audio Engineering Society, 47(7/8):583–595, July/August 1999.

[Kirkeby et al., 1996] O. Kirkeby, P. A. Nelson, F. O. Bustamante, and H. Hamada. Local sound field reproduction using digital signal processing. Journal of the Acoustical Society of America, 100(3):1584–1593, September 1996.

[Kirkeby et al., 1998] O. Kirkeby, P. A. Nelson, H. Hamada, and F. O. Bustamante. Fast deconvolution of multi-channel systems using regularization, 1998. This article was handed out for the course "Inverse Filtering and Deconvolution P2-4" and this version of the article might not have been published.

[Lacouture, 2008] Y. P. Lacouture. Analysis of design parameters for crosstalk cancellation filters applied to different loudspeaker configurations. Audio Engineering Society - Convention Paper, 125(Convention, San Francisco), October 2008.


[MathWorks, 2008] MathWorks. norm - vector and matrix norms, 2008. Included on the enclosure DVD.

[Moore, 2003] B. C. J. Moore. An Introduction to the Psychology of Hearing. Academic Press, 5. edition, 2003. ISBN 0-12-505628-1.

[Møller, 1987] H. Møller. Den elektrodynamiske højttalers princip. 1987.

[Nelson et al., 1995] P. A. Nelson, F. O. Bustamante, and H. Hamada. Inverse filter design and equalization zones in multichannel sound reproduction. IEEE Transactions on Speech and Audio Processing, 3(3):185–192, May 1995.

[Oddershede et al., 2008] J. M. Oddershede, S. S. Villadsen, D. F. Álvarez, J. G. Romero, and V. Moreno Artagoitia. Digital directivity control of loudspeakers for sound reinforcement. Technical report, AAU Acoustics section, May 2008. Group 08gr860.

[Pedersen and Jørgensen, 2005] J. A. Pedersen and T. Jørgensen. Localization performance of real and virtual sound sources. Technical report, April 2005. URL http://handle.dtic.mil/100.2/ADA454835. Presented at NATO HFM-123 Symposium.

[Press et al., 1997] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2. edition, 1997. ISBN 0521431085. URL http://www.nr.com/oldverswitcher.html.

[Ross, 2004] S. M. Ross. Introduction to Probability and Statistics for Engineers and Scientists. Elsevier Academic Press, 3. edition, 2004. ISBN 0-12-598057-4.

[Rubak, 2008] P. Rubak. Lecture in "Inverse Filtering and Deconvolution P2-4", minimodule 4, 2008.

[Takeuchi and Nelson, 2000] T. Takeuchi and P. A. Nelson. Optimal source distribution for virtual acoustic imaging. Technical Report 288, Institute of Sound and Vibration, February 2000.

[Takeuchi and Nelson, 2002] T. Takeuchi and P. A. Nelson. Optimal source distribution for binaural synthesis over loudspeakers. Journal of Acoustical Society of America, 112(6):2786–2797, December 2002.


[Takeuchi et al., 2001] T. Takeuchi, P. A. Nelson, and H. Hamada. Robustness to head misalignment of virtual sound imaging systems. Journal of the Acoustical Society of America, 109(3):958–971, March 2001.

[Ward, 2000] D. B. Ward. Joint least squares optimization for robust acoustic crosstalk cancellation. IEEE Transactions on Speech and Audio Processing, 8(2):211–215, February 2000.


Appendices


Appendix A Computation of Singular Values

This appendix contains the calculations of the singular values of C, which are used to analyze the filter gain of the point source model stated in section 5.3. The values are calculated as the singular values of W and afterwards inverted to yield the singular values of C. This is possible according to the singular value decomposition (SVD), where a matrix is inverted by decomposing it into three sub-matrices by

$$ W = U \Sigma V^H . \tag{A.1} $$

Here U and V are unitary matrices (l2-norm = 1) and Σ is a matrix consisting of the singular values on the diagonal. By inverting Σ the matrix W is inverted

$$ W^{-1} = V \Sigma^{-1} U^H = C . \tag{A.2} $$

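The singular values of W can be calculated as the square root of the eigenvalues of $W^H W$

$$ W^H W = \begin{bmatrix} 1 & g e^{jk\Delta l} \\ g e^{jk\Delta l} & 1 \end{bmatrix} \begin{bmatrix} 1 & g e^{-jk\Delta l} \\ g e^{-jk\Delta l} & 1 \end{bmatrix} = \begin{bmatrix} 1 + g^2 & g\left(e^{jk\Delta l} + e^{-jk\Delta l}\right) \\ g\left(e^{jk\Delta l} + e^{-jk\Delta l}\right) & 1 + g^2 \end{bmatrix} . \tag{A.3} $$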

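Using the characteristic equation $\det(W^H W - \lambda I) = 0$ to find the eigenvalues, the expression becomes

$$
\begin{aligned}
0 &= \det\left(\begin{bmatrix} 1 + g^2 - \lambda & g\left(e^{jk\Delta l} + e^{-jk\Delta l}\right) \\ g\left(e^{jk\Delta l} + e^{-jk\Delta l}\right) & 1 + g^2 - \lambda \end{bmatrix}\right) \\
&= \left(1 + g^2 - \lambda\right)\left(1 + g^2 - \lambda\right) - \left(g\left(e^{jk\Delta l} + e^{-jk\Delta l}\right)\right)\left(g\left(e^{jk\Delta l} + e^{-jk\Delta l}\right)\right) \\
&= \lambda^2 - \left(2g^2 + 2\right)\lambda + g^4 - g^2 e^{j2k\Delta l} - g^2 e^{-j2k\Delta l} + 1 .
\end{aligned}
\tag{A.4}
$$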

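Equation (A.4) is a second order equation with roots that correspond to the eigenvalues

$$ \lambda_W = \left(1 + g e^{jk\Delta l}\right)\left(1 + g e^{-jk\Delta l}\right) \;\wedge\; \left(1 - g e^{jk\Delta l}\right)\left(1 - g e^{-jk\Delta l}\right) . \tag{A.5} $$

The singular values of W correspond to the square root of the eigenvalues of $W^H W$

$$ \sigma_W = \sqrt{\left(1 + g e^{jk\Delta l}\right)\left(1 + g e^{-jk\Delta l}\right)} \;\wedge\; \sqrt{\left(1 - g e^{jk\Delta l}\right)\left(1 - g e^{-jk\Delta l}\right)} . \tag{A.6} $$

The singular values of C can be determined as $\sigma_C = \frac{1}{\sigma_W}$.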
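As a quick numerical cross-check of equations (A.5) and (A.6), the following minimal Python sketch builds $W^H W$ for the point source model and compares the square roots of its eigenvalues with the closed-form expressions. The function name and the example values of g and k∆l are illustrative assumptions, not values taken from the report.

```python
import cmath
import math


def singular_values_of_w(g, k_delta_l):
    """Singular values of W = [[1, g e^{-jx}], [g e^{-jx}, 1]], with x = k * delta_l."""
    e = g * cmath.exp(-1j * k_delta_l)
    w = [[1 + 0j, e], [e, 1 + 0j]]
    # W^H is the conjugate transpose of W.
    wh = [[w[c][r].conjugate() for c in range(2)] for r in range(2)]
    whw = [[sum(wh[r][m] * w[m][c] for m in range(2)) for c in range(2)]
           for r in range(2)]
    # Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant.
    trace = (whw[0][0] + whw[1][1]).real
    det = (whw[0][0] * whw[1][1] - whw[0][1] * whw[1][0]).real
    root = math.sqrt(max(trace * trace / 4 - det, 0.0))
    eigenvalues = (trace / 2 + root, trace / 2 - root)
    return tuple(math.sqrt(max(lam, 0.0)) for lam in eigenvalues)


g, x = 0.8, 0.3  # assumed example values for g and k * delta_l
sigma_w = singular_values_of_w(g, x)

# Closed form of (A.6): sqrt((1 +/- g e^{jx})(1 +/- g e^{-jx})) = |1 +/- g e^{jx}|.
closed_form = sorted(
    (abs(1 + g * cmath.exp(1j * x)), abs(1 - g * cmath.exp(1j * x))),
    reverse=True,
)

# Singular values of C are the reciprocals of those of W.
sigma_c = tuple(1 / s for s in sigma_w)
```

The largest singular value of C corresponds to the smallest singular value of W, which is why the filter gain grows when $1 - g e^{jk\Delta l}$ approaches zero.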


Appendix B Additional Details of Least Square Solutions

Section 5.5 describes how the least square solutions are formulated and mainly addresses overall design considerations. This appendix explains the filter length, modeling delay and regularization used for the least square solutions. This gives the background for an actual implementation flow given at the end of this appendix.

Filter Length and Modeling Delay

To ensure that a channel can be inverted, the filter should have a sufficient length, and a suitable modeling delay must be used. An explanation of modeling delay and guidelines for the minimum required filter length are given in the following. LSfreq and LStime are similar in principle, but in terms of actual implementation they differ by e.g. requirements for filter length. Hence, they are described separately.

In general, a plant which contains short impulse responses will require shorter filters than a corresponding plant which contains long impulse responses. Therefore, it is advisable to remove any initial delay which is common to all impulse responses of the plant, i.e. the same number of samples from all impulse responses in the plant matrix.

It is possible to reduce the minimum number of filter coefficients even further if the impulse responses of the plant are cropped individually. However, this also means that the individual differences of delay must be taken into consideration during playback. This is considered too cumbersome, since the decrease in needed coefficients is small [Kirkeby et al., 1996, p. 1587].

Modeling Delay for LSfreq

The exact inversion using LSfreq can in general be considered ill-conditioned. The filter solution found will either be unstable or non-causal, and it will not necessarily have finite duration [Kirkeby et al., 1998, p. 4]. A stable solution is, however, guaranteed by the FIR filter formulation, and the infinite duration is dealt with by increasing the filter length to contain the most prominent part (this is explained in the following section). The remaining problem of non-causality is dealt with by introducing a modeling delay to the solution found; hence there will be a delay from when an input enters the XTC filter until it produces an output.

The Fourier transform assumes a periodic time signal. This is utilized when the modeling delay is introduced. Due to the periodicity, the initial response of the found solution is located at the end of the time sequence. This means that the modeling delay should shift the end of the sequence to the start instead of making a longer time response.

This modeling delay will be referred to as an FFT shift. Filters have generally been found to perform best when the modeling delay is set to values around half the filter length, but the exact value is not critical. The shift will typically concentrate most of the energy in the center of the filter.

Guidelines for Filter Length using LSfreq

For the LSfreq method, the minimum filter length is associated with avoiding wrap-around. The wrap-around occurs when the XTC filter solution is transformed back to the time domain. The initial responses of the plant can be represented with Nw taps in both time and frequency domain. However, the solutions can in general not be represented sufficiently well with the same number of coefficients when they are transformed back into time domain. When this is the case, a wrap-around effect will occur when the inverse FFT is calculated. To solve this, longer filters are needed, and this can in practice be handled by zero padding the input time sequence before the FFT.

A rule of thumb was given regarding the filter length in [Rubak, 2008].

• Set Nc = 4 · Nw for well-conditioned problems

• Set Nc = 10 · Nw for ill-conditioned problems

Here Nc is the length of an XTC filter, and Nw is the length of the impulse response describing the plant W.

As an example, an ill-conditioned problem occurs when the frequency spectrum of the inverse filter contains sharp spikes. Just as a narrow time signal (e.g. a Dirac impulse) gives a broad frequency spectrum, a narrow peak in the frequency domain results in a long time response. In the time response this can sometimes be seen as ringing.

Such an ill-conditioned problem can be identified in a pole/zero diagram when there are poles very close to the unit circle. In [Kirkeby et al., 1998] this was used for making an approximation of the needed filter length as

$$ \tau = \frac{1}{r_u} \; [\text{samples}] , \tag{B.1} $$

where ru is the distance to the unit circle from the pole which is closest to the unit circle. To ensure the filter length is sufficient to avoid wrap-around, the following can be verified by inspection.

• The impulse responses of the filters in C[n] should have decayed sufficiently at the start and end.

• The equalized channel should not alter the signal, and its autocorrelation should only have a peak at its center lag. Fluctuations should only occur near the center lags. Wrap-around will show up as fluctuations in the start and end of the autocorrelation.
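The first inspection criterion can be turned into a rough numerical indicator. The sketch below, given only as an illustration, measures how much of a filter's energy sits in its first and last taps. The function name, the 5 % edge width and the two synthetic example filters are assumptions made for this example; the report itself relies on visual inspection.

```python
def edge_energy_ratio(c, edge_fraction=0.05):
    """Fraction of the filter energy located in the first and last taps.

    A small ratio suggests the response has decayed at both ends, while a
    large ratio hints at wrap-around (the filter is too short).
    """
    n_edge = max(1, int(len(c) * edge_fraction))
    total = sum(x * x for x in c)
    if total == 0.0:
        return 0.0
    edges = sum(x * x for x in c[:n_edge]) + sum(x * x for x in c[-n_edge:])
    return edges / total


# Two synthetic filters centered at tap 32 (assumed shapes, not measured data):
well_decayed = [0.5 ** abs(n - 32) for n in range(64)]
poorly_decayed = [0.98 ** abs(n - 32) for n in range(64)]

ratio_good = edge_energy_ratio(well_decayed)
ratio_bad = edge_energy_ratio(poorly_decayed)
```

A suitable acceptance threshold depends on the application and would have to be chosen by the designer.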

Figures B.1 and B.2 show a channel for which the filter length is sufficient and one for which the filter length is insufficient.

[Figure B.1: two plots titled "Inverse Filter Response", showing filter coefficient value [−] against time [samples].]

(a) Short filter, which has not decayed sufficiently at start and end.

(b) Long filter, which has the largest coefficients in the center and much smaller values at start and end.

Figure B.1: Inspection of a short filter and a long filter solution. The short filter shows signs of problems from wrap-around. The figure is inspired by [Kirkeby et al., 1998, Fig. 6 and 8].


[Figure B.2: two plots titled "Autocorrelation", showing correlation power [V²] against lags [samples].]

(a) Short filter, where fluctuations occur at the outer lags.

(b) Long filter, where fluctuations are not located at the outer lags and the power is smaller than for the short filter.

Figure B.2: Inspection of a short filter and a long filter solution. The short filter shows signs of problems from wrap-around in the autocorrelation of the inverted channel. The figure is inspired by [Kirkeby et al., 1998, Fig. 6 and 8].

Guidelines for Filter Length using LStime

In [Nelson et al., 1995, p. 187] it is explained that an exact inversion of the plant is possible if it is non-singular and of square dimensions. The requirement of a square plant was derived into the following requirement for the minimum filter length

$$ I = \frac{\left(\sum_{r=1}^{R} J_r\right) - R}{S - R} , \tag{B.2} $$

where I is the number of coefficients for the inverse filter, S is the number of loudspeakers, R is the number of positions and Jr is the number of common filter taps needed to describe all the transmission paths to a specific position r. The equation is, however, not considered valid for S = R, since this makes the denominator equal to 0.

An example of Jr is explained from figure B.3. There are four loudspeakers, and their impulse responses are seen in the figure. It is seen that the response of the first loudspeaker starts at sample nmin and that the important information is kept within the next 15 samples. The response of the fourth loudspeaker starts latest and ends at nmax. Hereby it is possible to calculate the required number of samples as

$$ J_r = n_{\max} - n_{\min} . \tag{B.3} $$

An example of this is used in [Nelson et al., 1995]. Three loudspeakers and two positions were used. The loudspeakers were modeled as simple delays, and the difference in propagation resulted in J1 = 5 and J2 = 9. Using equation (B.2) this resulted in 12 samples. It should be noted that LStime does not suffer from wrap-around effects the same way LSfreq does. This is expected to be an important reason why LStime performs better with shorter filter lengths.
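The arithmetic of equation (B.2) for this example can be reproduced with a few lines of Python. This is only a sketch: the function name is hypothetical, and rounding a non-integer result up to the next whole tap is an assumption that is not stated in the report.

```python
import math
from fractions import Fraction


def min_inverse_filter_length(taps_per_position, num_loudspeakers):
    """Minimum number of inverse filter coefficients I from equation (B.2).

    taps_per_position holds J_r for each of the R positions; S is the
    number of loudspeakers. The expression is only used for S > R here.
    """
    num_positions = len(taps_per_position)
    if num_loudspeakers <= num_positions:
        raise ValueError("equation (B.2) requires more loudspeakers than positions")
    exact = Fraction(sum(taps_per_position) - num_positions,
                     num_loudspeakers - num_positions)
    return math.ceil(exact)  # assumption: round up to a whole number of taps


# Example from [Nelson et al., 1995]: S = 3, R = 2, J1 = 5, J2 = 9.
length = min_inverse_filter_length([5, 9], 3)  # (5 + 9 - 2) / (3 - 2) = 12
```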

[Figure B.3: impulse responses of four loudspeakers, amplitude [V/Pa] against time [samples], with nmin and nmax marked on the time axis.]

Figure B.3: The impulse responses of four loudspeakers to a specific position. It is seen that there is a propagation delay which is common to all loudspeaker channels (nmin), and after a specific sample there is no significant information in any of the impulse responses (nmax). The figure is made from arbitrary impulse responses generated from random sequences being exponentially weighted.

Modeling Delay for LStime

For an LStime implementation the modeling delay is employed to compensate for the non-minimum-phase components [Kirkeby and Nelson, 1999, p. 584]. It allows for the exact inverse LS solution to include a non-causal part [Kirkeby et al., 1996, p. 1586]. The modeling delay should be applied after the initial delay is removed, so the modeling delay yields a delay in between nmin and nmax shown in figure B.3. Two articles studying LStime filter design were found, and both showed that the performance was optimum near half the filter length, i.e. halfway between nmin and nmax [Nelson et al., 1995, p. 189], [Ward, 2000, p. 213, fig. 2]. However, the performance was similar for a wide range of values near half the filter length, so the exact value of the modeling delay is non-critical.

In practice the modeling delay and the target matrix are gathered in the dt vector of equation (5.39). In the case of a 2x2 target matrix with identity mapping and no frequency dependency, the dt vectors would end up as follows

$$
d_1 = \left[\begin{array}{c}
\mathrm{zeros}(N_c/2,\, 1) \\
1 \\
\mathrm{zeros}(N_c/2 - 1,\, 1) \\
\hdashline
\mathrm{zeros}(N_c,\, 1)
\end{array}\right] ,
\qquad
d_2 = \left[\begin{array}{c}
\mathrm{zeros}(N_c,\, 1) \\
\hdashline
\mathrm{zeros}(N_c/2,\, 1) \\
1 \\
\mathrm{zeros}(N_c/2 - 1,\, 1)
\end{array}\right] ,
\tag{B.4}
$$

where Nc is the filter length. It should be recalled that the desired response vector is stacked as indicated by the dashed lines in the vectors.
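The stacking of equation (B.4) can be written out directly. The sketch below uses plain Python lists instead of MATLAB's zeros; the function name is hypothetical and an even filter length is assumed.

```python
def desired_vectors(nc):
    """Build d1 and d2 of equation (B.4) for an even filter length nc.

    Each vector stacks two blocks of nc samples: a unit impulse delayed by
    nc/2 samples for the ear that should receive the signal, and silence
    for the other ear.
    """
    if nc <= 0 or nc % 2 != 0:
        raise ValueError("nc must be a positive even number")
    delayed_impulse = [0.0] * (nc // 2) + [1.0] + [0.0] * (nc // 2 - 1)
    silence = [0.0] * nc
    d1 = delayed_impulse + silence   # input 1 mapped to the first position
    d2 = silence + delayed_impulse   # input 2 mapped to the second position
    return d1, d2


d1, d2 = desired_vectors(8)
```

For nc = 8, the single non-zero entry of d1 is at index 4 and that of d2 is at index 12 (zero-based).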



Explanation of Regularization

There have been many references to regularization and here an explanation will be given.

Smoothing of a Single Channel before Inversion

An effect of regularization is a smoothing of the peaks in the filter solution. This will be illustrated with a basic inversion of a single channel based on LSfreq.

The effect of regularization is seen by comparing the regularized solution with an ordinary inversion of a complex number given by

$$ \frac{1}{W(z)} = \frac{W(z)^*}{W(z)^* \cdot W(z)} , \tag{B.5} $$

where the denominator is made real by extending the fraction with the complex conjugate. For a single channel with unity gain, equation (5.31) can be simplified to

$$ C(z) = \frac{W(z)^*}{W(z)^* \cdot W(z) + \beta} , \tag{B.6} $$

where the Hermitian transpose is now reduced to the complex conjugate. Since β is added in the denominator, it will always decrease the filter gain C(z) at the specific frequency. Hereby it corresponds to increasing the amplitude of the original frequency spectrum before the inversion. The increased spectrum will be

$$ W'(z) = \frac{W(z)^* \cdot W(z) + \beta}{W(z)^*} = W(z) + \frac{1}{W(z)^*} \cdot \beta . \tag{B.7} $$

From this equation it is seen that the increase is not a constant, but obtains larger values for small values of W(z).

The increased spectrum is illustrated using an arbitrary impulse response, which is seen in figure B.4a. The frequency response is shown as the black line in figures B.4b and B.4c. Notice the steep dips, which would require a large gain in case of inversion. The other line colors show the increased spectrum for different regularization values. The following can be noticed by comparing the non-regularized spectrum with the lowest value of β.

• The modified frequency spectrum is always increased, hence it will not require a large filter gain to invert.

• The increase is negligible where the spectrum already has a large value. It is easiest to see the increase being non-linear in the plot with linear scale.

When the value of β is increased, the dips will be increased so much that they can no longer be regarded as dips. This of course decreases the required filter gain, but it also means that the inverse filter no longer yields a flat frequency spectrum. From the plots in figure B.4, it should be relatively easy to imagine that the inverse filters will have fewer peaks.
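The behavior described above can be checked for individual frequency bins with a small Python sketch of equations (B.6) and (B.7). The function names and the two example bin values (one at a dip, one at a peak) are assumptions chosen for illustration only.

```python
import cmath


def regularized_inverse(w, beta):
    """Single-channel regularized inverse of one frequency bin, equation (B.6)."""
    return w.conjugate() / (w.conjugate() * w + beta)


def increased_spectrum(w, beta):
    """Equivalent modified spectrum W'(z) of equation (B.7)."""
    return w + beta / w.conjugate()


beta = 0.1
w_dip = 0.1 * cmath.exp(0.7j)    # assumed bin value at a steep dip
w_peak = 2.0 * cmath.exp(-1.2j)  # assumed bin value where the spectrum is large

gain_dip_exact = abs(1 / w_dip)                        # 10: large gain without regularization
gain_dip_reg = abs(regularized_inverse(w_dip, beta))   # much smaller gain
gain_peak_exact = abs(1 / w_peak)                      # 0.5
gain_peak_reg = abs(regularized_inverse(w_peak, beta)) # nearly unchanged

# The regularized filter is the exact inverse of the increased spectrum.
product = regularized_inverse(w_dip, beta) * increased_spectrum(w_dip, beta)
```

The gain reduction is large at the dip and small at the peak, in line with the two observations listed above.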

[Figure B.4: (a) impulse response, amplitude [linear] against time [samples]; (b) frequency spectrum, amplitude [linear] against normalized frequency; (c) frequency spectrum, magnitude [dB] against normalized frequency. Legend: no regularization, and regularization of β = 0.1, β = 0.5 and β = 1.]

(a) Impulse response.

(b) Linear scale.

(c) Logarithmic scale.

Figure B.4: An arbitrary impulse response is shown in (a) and the frequency spectrum is shown in (b) and (c). The black frequency spectrum is not regularized, and the colored frequency spectra have been modified using equation (B.7) so their deviations from the original spectrum correspond to a certain amount of regularization. The impulse response is an arbitrary example generated as a random function with normal distribution and afterwards weighted with an exponential decay.

Shortening Inverse Filters for LSfreq

The filter length for LSfreq should be sufficient to contain the time response of the inverse filter. Since the regularization attenuates peaks in the frequency response of the inverse filter, it can be used to shorten the filters found with the LSfreq method. In [Kirkeby et al., 1998, sec. 5] it is suggested to set the regularization factor β to the smallest value which ensures that all poles are at least at a distance of 0.01 from the unit circle. This value is denoted β0 and should minimize wrap-around effects sufficiently for filter lengths Nc = 10 · Nw.

This optimal value can be found iteratively by evaluating the pole positions from the denominator of equation (5.31), while increasing the value of β until all poles are at a sufficient distance. For long filters this approach is, however, considered too computationally demanding to be implemented.

In practice the exact value of β is not critical. Similar results are expected if β is kept within a range close to β0 (0.8 · β0 < β < 1.2 · β0), and the solution should still be usable as long as the ratio between β0 and β does not become too large (0.2 · β0 < β < 5 · β0) [Kirkeby et al., 1998, p. 10]. If the regularization is sufficient, the filter length should be long enough to avoid the wrap-around effects. Hence the same inspection methods already mentioned regarding the filter lengths (figure B.1) can be used.

Determining Regularization for LStime

The LStime solutions do not have the same problems with wrap-around, but the solutions can still be ill-conditioned. In [Kirkeby et al., 1996, p. 1585] it is recommended to set β as

$$ \frac{1}{5000} \max\left(\mathrm{eigen}(W^H W)\right) < \beta < \frac{1}{1000} \max\left(\mathrm{eigen}(W^H W)\right) , \tag{B.8} $$

since this should make the problem well-behaved without decreasing the performance too much. In [Press et al., 1997, p. 811] a hint for scaling of the regularization is given for the application of data fitting. Here, it is suggested to select the regularization factor according to the measurement noise. The regularization should be set such that the variance from the exact solution matches the variance of the noise in the measurements. For the XTC application, the variance of the inter-individual HRTFs could be used as noise. Such regularization would be frequency dependent.


Summary of Practical Implementation

The following summarizes the flow for practical implementation.

Implementation Flow of the Frequency Domain Method

1. Initial design and setup:

(a) Measure the impulse responses of the plant W.

(b) Crop the initial delay, which is common to all impulse responses.

(c) Apply zero padding, so the impulse responses either reach a desired filter length or at least ten times the length of the information in the impulse responses.

(d) Calculate the FFT of the plant's impulse responses and put these into W(z).

(e) Design a suitable shape factor B(z), which takes the limits of the loudspeaker and the desired reproduction into account.

(f) Determine the desired mapping of the input channels to obtain matrix A(z).

(g) Set a low value of β as initial value.

2. Calculation and verification:

(a) Calculate the C (z) matrix for each discrete frequency from equation (5.31).

(b) Calculate the IFFT of all elements of C(z), to obtain a C[n] matrix containing impulse responses.

(c) Apply the modeling delay to all impulse responses of C[n], by making a circular shift of half a filter length.

(d) Ensure that the impulse responses of C only have negligible values at the start and end. If the values seem too large, β should be increased, and the procedure is repeated from step 2a.

(e) Implement the XTC-filtering using these obtained impulse responses.
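The core of steps 1c, 1d and 2a to 2c can be sketched for a single channel in plain Python. This is a simplified illustration, not the implementation used in the project: it assumes one loudspeaker and one ear, unity shape factor B(z) and target A(z), uses a naive DFT instead of an FFT, and all function names and example values are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def ls_freq_inverse(w, nc, beta):
    """Regularized single-channel inverse filter of length nc."""
    padded = list(w) + [0.0] * (nc - len(w))                # step 1c: zero padding
    spectrum = dft(padded)                                  # step 1d: FFT of the plant
    c_spec = [wk.conjugate() / (wk.conjugate() * wk + beta)
              for wk in spectrum]                           # step 2a: equation (B.6)
    c = [v.real for v in dft(c_spec, inverse=True)]         # step 2b: IFFT
    half = nc // 2
    return c[-half:] + c[:-half]                            # step 2c: circular shift


def circular_convolve(a, b):
    """Circular convolution of a (zero padded) with b, both of length len(b)."""
    n = len(b)
    a = list(a) + [0.0] * (n - len(a))
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


w = [1.0, 0.5]      # assumed toy plant impulse response
nc = 32             # assumed filter length
c = ls_freq_inverse(w, nc, beta=0.001)
equalized = circular_convolve(w, c)   # should approximate a delayed unit impulse
```

With these toy values the equalized response is close to a unit impulse at sample nc/2, and the first and last taps of c are negligible, which corresponds to the check in step 2d.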



Implementation Flow of the Time Domain Method

1. Initial design and setup:

(a) Determine a mapping vector dt for each input signal. This is created from the desired mapping matrix A and a suitable modeling delay.

(b) Measure the impulse responses of the plant W.

(c) Crop the initial delay which is common to all impulse responses.

(d) Design a suitable regularization shape factor B(z), which takes the limits of the loudspeaker and the desired reproduction into account.

(e) Calculate the IFFT of B(z) to get an impulse response b[n].

(f) Generate the convolution matrices Wrs (equation (5.33)) from the measured impulse responses, and B from the determined regularization. Use an existing convmtx (convolution matrix) function for this.

(g) Generate the stacked W and B matrix from these.

2. Calculations and implementation:

(a) Calculate the matrices WHW, WH and BHB.

(b) Set the scaling of the regularization as β = (1/2500) · max(eigen($W^H W$)), in accordance with equation (B.8).

(c) Calculate a ct vector for the first mapping vector (dt) using equation (5.39) and the matrices from step 2a.

(d) Calculate ct for the remaining mapping vectors.

(e) Separate each of the stacked ct vectors into cst vectors of filters for each loudspeaker.

(f) Implement the XTC-filtering using the cst impulse responses.
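For a single channel the time domain flow can be sketched in plain Python as shown below. Equation (5.39) is not reproduced in this appendix, so the sketch assumes the common regularized least squares form c = (WᵀW + βI)⁻¹ Wᵀd with B equal to the identity. The desired vector here has the length of the full convolution result, and all helper names and example values are hypothetical.

```python
import math


def convolution_matrix(h, n_cols):
    """Matrix M such that M @ c equals the linear convolution of h and c."""
    n_rows = len(h) + n_cols - 1
    return [[h[r - c] if 0 <= r - c < len(h) else 0.0 for c in range(n_cols)]
            for r in range(n_rows)]


def gram(a):
    """Return A^T A for a real matrix given as a list of rows."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def max_eigenvalue(m, iterations=200):
    """Approximate largest eigenvalue of a symmetric PSD matrix (power iteration)."""
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iterations):
        mv = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        lam = math.sqrt(sum(x * x for x in mv))
        v = [x / lam for x in mv]
    return lam


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def ls_time_inverse(w, nc, beta_scale=1 / 2500):
    """Regularized least squares inverse of w with a modeling delay of nc/2."""
    w_mat = convolution_matrix(w, nc)
    d = [0.0] * len(w_mat)
    d[nc // 2] = 1.0                                   # delayed unit impulse target
    wtw = gram(w_mat)
    beta = beta_scale * max_eigenvalue(wtw)            # step 2b scaling
    lhs = [[wtw[i][j] + (beta if i == j else 0.0) for j in range(nc)]
           for i in range(nc)]
    rhs = [sum(w_mat[r][i] * d[r] for r in range(len(w_mat))) for i in range(nc)]
    return solve(lhs, rhs)


w = [1.0, 0.5]   # assumed toy plant impulse response
nc = 16          # assumed filter length
c = ls_time_inverse(w, nc)
equalized = [sum(wr * ci for wr, ci in zip(row, c))
             for row in convolution_matrix(w, nc)]
```

In MATLAB the convolution matrix would come from convmtx and the linear system from the backslash operator; the hand-written helpers above only keep the sketch free of external dependencies.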


Appendix C Measurement Report: Channel Transfer Function

The purpose of this measurement report is to measure the channel transfer function (CTF), which is the impulse response of the plant W. This includes the loudspeakers and the acoustical path to the ears of a head and torso simulator. The measurements are performed using a Valdemar head and torso simulator, as a blocked ear canal measurement. Furthermore, it was decided to include reference measurements, so it would be possible to extract HRTFs.

Measurement Theory

This section describes the theory behind the different settings chosen for the measurements.

Sampling Frequency

The sampling frequency of the recording system was chosen as 48 kHz, since the previously recorded HRTFs that will be used for binaural synthesis were recorded with this sampling frequency.

Stimulus

To measure the impulse response, it is chosen to use a Maximum Length Sequence (MLS), which is a pseudo random sequence of 0's and 1's. The MLS has a flat frequency spectrum and a high resistance to noise. It does not require the same amount of averaging as a random white noise signal, since the stimulus is known beforehand. The duration of the stimulus must be longer than the measured impulse response of the loudspeaker and ear. This duration is expected to be shorter than 200 ms, corresponding to 9600 samples. Hence, the MLS sequence must not be shorter than 9600 samples. This gives a frequency resolution of at least 5 Hz.


An mth order MLS has a length of $N = 2^m - 1$, which means the required order is

$$ N_{\min} = 2^m - 1 \;\Rightarrow\; \log_2(N_{\min} + 1) = m \;\Rightarrow\; m = \log_2(9601) = 13.2 , \tag{C.1} $$

so the minimum order is 14. This results in a sequence of 16383 samples, which corresponds to 341.3 ms. The stimulus is repeated 4 times and thus the length becomes 341.3 ms · 4 ≈ 1.4 s.
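The calculation in equation (C.1) can be reproduced with a short Python sketch. The function name is hypothetical; the input values are the 200 ms and 48 kHz stated above.

```python
import math


def mls_parameters(min_duration_ms, fs_hz):
    """Smallest MLS order whose length covers the given duration.

    Returns (order, length in samples, duration in seconds).
    """
    n_min = math.ceil(min_duration_ms * fs_hz / 1000)  # required samples
    order = math.ceil(math.log2(n_min + 1))            # equation (C.1), rounded up
    length = 2 ** order - 1
    return order, length, length / fs_hz


order, length, duration_s = mls_parameters(200, 48000)
total_s = 4 * duration_s  # the stimulus is repeated 4 times
```

For these inputs the order is 14, the length is 16383 samples and one period lasts about 0.3413 s, so four repetitions take about 1.37 s.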

Measurement Positions

The impulse response is measured at several positions of the head and torso simulator. This means that the head and torso simulator is moved and the impulse response is measured for every position. The measurement positions can be seen in figure C.1, where each point represents the center of the head.

[Figure C.1: grid of measurement positions numbered 1 to 16, with spacings of 10 cm and 5 cm indicated.]

Figure C.1: Measurement positions for impulse response measurements. Each point represents the center of the head and torso simulator.

Measurement Considerations

Room Setup
The measurements will be carried out in the anechoic room at Fredrik Bajers Vej 7, B5-101 at AAU. The anechoic room provides free field conditions down to about 200 Hz. During the measurements, the setup for the subjective test will also be located in the room. This means that the chair, monitor, curtains etc. are placed as they will be during the subjective test. The setup of the subjective test can be found in figure E.4 on page 149.

Loudspeaker Configuration
The measurements are made for the two loudspeaker configurations described in section 6.4. Both configurations consist of four loudspeakers placed in the same way. The difference is that in the first configuration all four loudspeakers play the same stimulus, one at a time. In the second configuration, the outer loudspeakers play the low pass filtered stimulus and the inner loudspeakers play the high pass filtered stimulus. This means one measurement for each separate loudspeaker in the loudspeaker configuration, and one where each side of the loudspeaker configuration is considered as a two-way loudspeaker with crossover filtering involved. This gives a total of six measurements that must be made for each measurement position.

Platform and Playback
The measurement platform, running MATLAB, is able to interface with the sound card and thereby play the stimuli. The stimuli have a peak amplitude of ±0.5 V before filtering. The sound card itself has a gain of 3.4, giving an amplitude of ±1.7 V to the loudspeakers. The platform is programmed to play the stimulus through all six loudspeakers in sequence with a small pause between each, so only one full recording has to be made for each measurement point. The channels are numbered from right to left, where the loudspeaker to the right is referred to as channel one; the same ordering is used for loudspeakers five and six.

Post Processing
The stimulus is played through the loudspeakers and recorded by the microphones placed in the ears of the head. The recorded signal is cross-correlated with the stimulus, which deconvolves the stimulus and yields the measured impulse response. In order to calculate an HRTF, a Pref measurement must also be recorded. An HRTF can be calculated by dividing Pear by Pref as described in section 2.2.1.
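The cross-correlation step can be illustrated with a short Python sketch. It assumes a periodic (steady-state) stimulus, so that convolution and correlation are circular, and it uses a 5th order MLS generated by a linear feedback shift register with taps (5, 3) purely to keep the example small; the function names and the toy impulse response are illustrative and do not come from the measurement platform.

```python
def mls(order, taps):
    """Generate one period of a +/-1 maximum length sequence with an LFSR."""
    state = [1] * order
    sequence = []
    for _ in range(2 ** order - 1):
        sequence.append(1.0 if state[-1] else -1.0)
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]       # XOR of the tapped register bits
        state = [feedback] + state[:-1]      # shift the register by one step
    return sequence


def circular_convolve(x, h):
    """Steady-state response of a system h to the periodic signal x."""
    n = len(x)
    return [sum(h[j] * x[(k - j) % n] for j in range(len(h))) for k in range(n)]


def circular_xcorr(y, s):
    """Circular cross-correlation of the recording y with the stimulus s."""
    n = len(s)
    return [sum(y[i] * s[(i - k) % n] for i in range(n)) for k in range(n)]


def impulse_response_from_mls(recorded, stimulus):
    """Recover the impulse response from one steady-state MLS period."""
    n = len(stimulus)
    r = circular_xcorr(recorded, stimulus)
    # The MLS autocorrelation is n at lag 0 and -1 elsewhere, which leaves a
    # constant offset in r; the offset equals -sum(r) and is removed here.
    offset = sum(r)
    return [(rk + offset) / (n + 1) for rk in r]


stimulus = mls(5, (5, 3))                    # 31 samples
h_true = [1.0, 0.5, -0.25]                   # toy "plant"
recorded = circular_convolve(stimulus, h_true)
h_est = impulse_response_from_mls(recorded, stimulus)
print([round(v, 6) for v in h_est[:4]])
```

The circular model only holds once the system has reached steady state, which is consistent with playing the stimulus several times and discarding the first period.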

Synchronization
To be able to measure the delay of the system, five output channels are needed: one for each loudspeaker and one for synchronization. This is done so that no cables have to be interchanged, which minimizes both the risk of recording errors and the measuring time. This also means that a four channel power amplifier is needed. All recordings are performed by the microphones placed in the ears, and the synchronization signal is measured directly at the output from the platform. The synchronization signal consists of a Dirac function played back in the synchronization channel one second before the MLS sequences are played in the measurement channels.

Recording System
The recording system is a multi channel system, where it is possible to record up to 16 channels simultaneously. The recording system has to be started manually, which means the recording should start before the platform plays back the stimulus. It should be stopped after the stimulus has been played through loudspeaker six. The microphones used in the Valdemar head and torso simulator are both 1/2-inch pressure microphones. The same kind of microphone will be used for the reference measurement Pref. The microphone preamplifiers must be fed with phantom power of 48 V from an external device to work. The settings for the two input channels to the measurement system from the ears are: DC-coupled, the power supply for the microphones disabled, and high pass filtering at 10 Hz enabled. The input range for the two measurement channels was set to a maximum of -6 dBV, i.e. 500 mVpeak, to avoid clipping.

Measurement Setup

During measurements, the setups in figures C.2 and C.3 will be used.

[Block diagram not reproduced. Labels in the diagram: MATLAB platform, DAC, power amplifier, loudspeakers 1-6, anechoic room, 75 cm, phantom power 48 V, recording system.]

Figure C.2: Measurement setup for the Pref measurement.

[Block diagram not reproduced. Labels in the diagram: MATLAB platform, DAC, power amplifier, loudspeakers 1-6, anechoic room, 75 cm, phantom power 48 V, recording system.]

Figure C.3: Measurement setup for the Pposition1 measurement.

Calibration

The calibration procedure for the microphones mounted in Valdemar is as follows. First remove the artificial ears, and mount the microphone in the calibrator. The calibrator supplies the microphone with a 1 kHz sine wave at 94 dB SPL. The measurement system is calibrated such that the output corresponds to the actual sound pressure in Pa. The same calibration procedure is used for the reference microphone. Channels one and two are calibrated in the same way. Channel three is the synchronization channel; this channel is not calibrated, but its input range is set to 6 dBV, i.e. 2 Vpeak.
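The level figures used for calibration and input ranges can be converted to linear units with the standard decibel definitions. The sketch below assumes the usual references of 20 µPa for dB SPL and 1 V for dBV; the helper names are illustrative, and whether a voltage is read as peak or RMS follows the convention of the recording system rather than anything in the code.

```python
P_REF_PA = 20e-6  # reference sound pressure for dB SPL, in Pa


def spl_to_pascal(level_db_spl):
    """Convert a sound pressure level in dB SPL to a pressure in Pa."""
    return P_REF_PA * 10 ** (level_db_spl / 20)


def dbv_to_volt(level_dbv):
    """Convert a level in dBV (re. 1 V) to a voltage in V."""
    return 10 ** (level_dbv / 20)


calibrator_pa = spl_to_pascal(94)      # calibrator tone, close to 1 Pa
range_ears_v = dbv_to_volt(-6)         # input range of the ear channels
range_sync_v = dbv_to_volt(6)          # input range of the sync channel
print(round(calibrator_pa, 3), round(range_ears_v, 3), round(range_sync_v, 3))
```

The 94 dB SPL calibrator tone corresponds to roughly 1 Pa, and the -6 dBV and 6 dBV ranges correspond to roughly 0.5 V and 2 V.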

Equipment

Description                           Type                        AAU number
Valdemar                              Head and torso simulator    2150-00
Microphone in Valdemar left           G.R.A.S. 40AD               33958
Microphone preamp Valdemar left       DPA 4037                    33957
Microphone in Valdemar right          G.R.A.S. 40AD               33959
Microphone preamp Valdemar right      DPA 4037                    33954
Reference microphone                  G.R.A.S. 40AD               33958
Reference microphone preamp           DPA 4037                    none
Calibrator                            B & K type 4231             33691
DA converter                          Behringer ADA 8000          56545
Power amplifier                       Rotel RB-976MkII            52633
Measurement system                    Sinus MSX16                 75500
Measurement laptop for Sinus MSX16    Lifebook E series           60923
Platform running MATLAB               Desktop computer            60893
Sound card                            Hammerfall HDSP 9632        2137-04
Tube loudspeakers                     No: 5, 6, 10, 12            none

Table C.1: Used equipment.

Measurement Procedure

• Reference measurement

1. Make the setup as shown in figure C.2, and place Valdemar in position 1 as shown in figure C.1.

2. Aim the microphone at the 1st loudspeaker.

3. Start the recording system and record the stimulus from the reference microphone. Start playing the stimulus through the loudspeaker the microphone is aimed at. Aim the microphone at the next loudspeaker and repeat this step until the 6th loudspeaker is measured. This is the Pref measurement.


• Measurement of channel transfer function

1. Make the setup of figure C.3.

2. Place the head in position 1 as shown in figure C.1.

3. Start the recording system and record the stimulus from the microphones placed in the ears of Valdemar. Start playing the stimulus through the loudspeakers. This is the Pposition1 measurement.

4. Place the head at the next measurement position, and repeat step 3 until position 16 is measured.

Results

All measurements are segmented and deconvolved with the MLS in such a way that the first MLS sequence in the stimuli is rejected. The frequency responses of all loudspeaker setups at position 1 are depicted for both “ears” in figure C.4. The HRTFs for positions 1-6 are depicted in figures C.6a and C.6b, for loudspeakers 1 and 5 respectively. During calculation of the HRTFs, DC correction has been applied. The division of the two transfer functions is carried out in the frequency domain, and just before the inverse FFT is taken, the DC value is set to 0. The extracted impulse responses are reduced to a length of 1024 samples. Figure C.5 shows the average and standard deviation of the transfer function for the right channel, loudspeaker 1, over positions 1 to 6.
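The HRTF extraction described here (frequency-domain division, DC bin set to zero, inverse transform, truncation) can be sketched as follows. A naive DFT is used only to keep the example dependency-free; the function names and the toy signals are illustrative and not taken from the project's MATLAB code.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * i / n)
                 for k in range(n)) / n).real
            for i in range(n)]


def hrtf_impulse_response(p_ear, p_ref, n_taps):
    """Divide P_ear by P_ref, zero the DC bin, and truncate the result."""
    ratio = [e / r for e, r in zip(dft(p_ear), dft(p_ref))]
    ratio[0] = 0.0                       # DC correction before the inverse transform
    return idft(ratio)[:n_taps]


# Toy data: the reference is a unit impulse, so the ratio equals the ear response.
p_ref = [1.0] + [0.0] * 7
p_ear = [0.5, 0.25] + [0.0] * 6
h = hrtf_impulse_response(p_ear, p_ref, 4)
print([round(v, 5) for v in h])
```

Setting the DC bin to zero removes the mean of the impulse response, which is why each sample in the toy example is shifted by the same small amount.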

[Plots not reproduced: SPL [dB re. 20 µPa] versus frequency [Hz], 100 Hz to 10 kHz.]

(a) Frequency response for the left ear at position 1.

(b) Frequency response for the right ear at position 1.

Figure C.4: Some of the results gathered from the measurements.

[Plot not reproduced: SPL [dB re. 20 µPa] in one-third octave bands from 63 Hz to 12.5 kHz.]

Figure C.5: Average and standard deviation of the plant, analyzed in one-third octave band.

[Plots not reproduced: magnitude [dB] versus frequency [Hz], 100 Hz to 10 kHz, with one curve for each of positions 1 to 6.]

(a) HRTF for loudspeaker number 1, to the right ear for position 1–6.

(b) HRTF for loudspeaker number 5, to the left ear for position 1–6.

Figure C.6: Some of the results gathered from the measurements.

Appendix D
Measurement Report: Channel Separation

The purpose of this measurement report is to measure the channel separation (CHSP) the XTC system is able to achieve. The measurements are made using a Valdemar head and torso simulator, as a blocked ear canal measurement. The results of the measurements are mainly placed in chapter 8 of the report, where they are compared with the simulations and the requirements.

Measurement Theory

This section describes the theory behind the different settings chosen for the measurements. The CHSP is measured with and without using the XTC system and calculated from equation (3.1) page 22. It is performed as follows:

A stimulus is played back in the right side loudspeakers only. The signal is pre-filtered such that the tweeters play only the high frequencies and the bass loudspeakers only the low frequencies. The CHSP is the sound pressure level difference between the right and left measurement channel in the head and torso simulator. This is the "natural" CHSP due to e.g. head shadowing and the acoustical path. Afterwards, the measurement is repeated with a stimulus that is XTC filtered. This gives the XTC CHSP. For both measurements the sound pressure at the right measurement microphone must be the same. The measurements are repeated with the left ear as the desired ear.

Stimulus

It is chosen to analyze the CHSP using a white noise stimulus, due to its flat frequency spectrum. The stimulus should be long enough that a power spectrum analysis can be made, i.e. an average of several FFTs in order to reduce the spectral fluctuations of the white noise. Each measurement stimulus is set to a length of 20 s, of which the last five seconds are used for analysis.


Sampling Frequency

The sampling frequency of the recording system is set to 48 kHz, which is the same as for the system implementation and for the rest of the measurements.

Measurement Positions

The CHSP is measured at the 16 positions that were measured for the channel transfer functions (see appendix C). The positions are also depicted in figure D.1.

[Figure not reproduced: grid of the 16 numbered measurement positions, with dimension labels of 10 cm and 5 cm.]

Figure D.1: Measurement positions for CHSP measurements, each point represents the center of the head and torso simulator.


Room Setup
The measurements will be carried out in the anechoic room at Fredrik Bajers Vej 7, B5-101 at AAU. The anechoic room provides free field conditions down to approximately 200 Hz. During the measurements, the setup for the subjective test will also be located in the room. This means that the chair, monitor, curtains etc. are placed as they will be during the subjective test. The setup of the subjective test can be found in appendix E, figure E.4.

Loudspeaker Configuration
The measurements are made for a four loudspeaker setup. The centered loudspeakers are tweeters, and the outer loudspeakers are the bass loudspeakers.

Platform and Playback
The measurement platform, running MATLAB, is able to interface with the sound card and thereby play the stimuli. The stimuli are adjusted such that the background noise is at least 10 dB lower than the sound pressure obtained in the contralateral ear for the best CHSP (position 1).

Post Processing
Since the measurement stimulus is white noise, the frequency responses cannot be obtained from a single FFT. The spectral fluctuations of the noise are too large to yield any meaningful results. Instead, the frequency response must be calculated as an average of several FFTs. For this measurement it is chosen to average over 25 FFTs, each with a length of 1024 logarithmically spaced samples.

Recording System
The recording system is a multi channel system, where it is possible to record up to 16 channels simultaneously. The microphones used in the Valdemar head and torso simulator are 1/2-inch pressure microphones. The microphone preamplifiers must be fed with phantom power of 48 V from an external device to work. The settings for the two input channels to the measurement system from the ears are: DC-coupled, the power supply for the microphones disabled, and high pass filtering at 10 Hz enabled. The input range for the two measurement channels was set to a maximum of -6 dBV, i.e. 500 mVpeak.

Test Evaluation
The CHSP is calculated from the performance measures given in section 3.2. The performance must be analyzed in the frequency domain to calculate the frequency dependent CHSP. In doing so, it is important to average the results over several FFTs in order to reduce the spectral fluctuations of the test signal.
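The averaging of several spectra followed by a level difference between the two ear channels can be sketched in Python as shown below. Several things here are assumptions rather than the report's method: equation (3.1) is not reproduced in this appendix, so the CHSP is taken as the level of the undesired ear minus the level of the desired ear (which matches the negative values reported in the results); the segments are non-overlapping and unwindowed; and a naive DFT with a short segment length stands in for the FFT.

```python
import cmath
import math
import random


def power_spectrum(segment):
    """One-sided power spectrum of a single segment (naive DFT)."""
    n = len(segment)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(segment[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def averaged_power_spectrum(signal, n_fft, n_avg):
    """Average the power spectra of n_avg consecutive segments."""
    total = [0.0] * (n_fft // 2 + 1)
    for a in range(n_avg):
        segment = signal[a * n_fft:(a + 1) * n_fft]
        for k, p in enumerate(power_spectrum(segment)):
            total[k] += p
    return [t / n_avg for t in total]


def chsp_db(p_undesired, p_desired):
    """Level difference per bin, undesired ear relative to desired ear."""
    return [10 * math.log10(u / d) for u, d in zip(p_undesired, p_desired)]


# Toy data: the undesired ear receives the same noise attenuated by a factor 10.
random.seed(0)
desired = [random.gauss(0.0, 1.0) for _ in range(25 * 16)]
undesired = [0.1 * v for v in desired]
chsp = chsp_db(averaged_power_spectrum(undesired, 16, 25),
               averaged_power_spectrum(desired, 16, 25))
print([round(v, 2) for v in chsp])
```

In the toy example the amplitude ratio of 0.1 gives -20 dB in every bin; with real recordings the averaging is what keeps the per-bin estimate from fluctuating with the noise.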

Measurement Setup

During measurements, the setup in figure D.2 is used.

[Block diagram not reproduced. Labels in the diagram: MATLAB platform, DAC, power amplifier, right ch., left ch., loudspeakers 1-6, anechoic room, 75 cm, phantom power 48 V, recording system.]

Figure D.2: Measurement setup for the CHSP measurement.


Calibration

The calibration procedure for the microphones mounted in Valdemar is as follows. First remove the artificial ears, and mount the microphone in the calibrator. The calibrator supplies the microphone with a 1 kHz sine wave at 94 dB SPL. The microphone is connected to the measurement system, where the corresponding channel is calibrated to the microphone in such a way that the output values correspond to the actual sound pressure in Pa. Both channels are calibrated in the same way.

Equipment


Description                           Type                        AAU number
Valdemar                              Head and torso simulator    2150-00
Microphone in Valdemar left           G.R.A.S. 40AD               33958
Microphone preamp Valdemar left       DPA 4037                    33957
Microphone in Valdemar right          G.R.A.S. 40AD               33959
Microphone preamp Valdemar right      DPA 4037                    33954
Calibrator                            B & K type 4231             33691
DA converter                          Behringer ADA 8000          56545
Power amplifier                       Rotel RB-976MkII            33980
Power amplifier                       Rotel RB-976MkII            33981
Measurement system                    Sinus MSX16                 75500
Measurement laptop for Sinus MSX16    Lifebook E series           60923
Platform running MATLAB               Desktop computer            60893
Sound card                            Hammerfall HDSP 9632        2137-04
Tube loudspeakers                     No: 5, 6, 10, 12            none

Table D.1: Used equipment.


1. Make the setup sketched in figure D.2.

2. Make calibration.

3. Position the head and torso simulator in position 1.

4. Play back the non-filtered stimuli, both where the right and where the left ear is desired (one at a time).

5. Play back the filtered stimuli, both where the right and where the left ear is desired (one at a time).

6. Repeat step 3 to 5 for all measurement positions.

7. Post process measured data, in total eight recordings for each position.

Results

[Plots not reproduced: magnitude SPL [dB re. 20 µPa] versus frequency [Hz], 500 Hz to 5 kHz, with curves for the equalized and the non-equalized loudspeaker. Panels: (a) Position one. (b) Position two. (c) Position 13. (d) Position 15.]

Figure D.3: Equalized and non-equalized loudspeaker responses for right ear.


[Plots not reproduced: CHSP [dB] versus frequency [Hz], 500 Hz to 5 kHz, with curves for the XTC CHSP and the "natural" CHSP. Panels: (a) Position one. (b) Position two. (c) Position 13. (d) Position 15.]

Figure D.4: CHSP performance with and without XTC for right ear.

Position   XTC CHSPRL   XTC CHSPLR   Natural CHSPRL   Natural CHSPLR
1          -20.06       -20.98       -2.59            -2.72
2          -12.16       -12.51       -3.48            -2.48
3           -9.81       -13.43       -4.21            -3.03
4          -16.12       -15.97       -3.09            -3.66
5          -11.08        -7.96       -2.79            -5.04
6          -11.64       -10.52       -2.86            -3.94
7          -10.59       -11.56       -2.82            -3.55
8          -15.90       -16.12       -3.09            -2.71
9          -12.21       -10.42       -3.02            -2.72
10          -6.04        -6.34       -4.24            -2.98
11          -5.34        -5.23       -5.35            -2.37
12          -8.14       -11.63       -5.61            -2.71
13         -11.22       -10.62       -3.92            -4.39
14         -10.56        -6.79       -2.97            -6.19
15          -2.48        -5.65       -1.92            -6.16
16          -3.47        -6.53       -2.14            -5.38

Table D.2: Measured CHSPs. For CHSPRL the right ear is the desired ear and the left the undesired. For CHSPLR the left ear is the desired ear. The center positions (1, 4, 8, 13) are performing best, as expected.

Appendix E
Measurement Report: Listening Test

This appendix is an addition to chapter 9. It includes considerations for the listening test along with the measurement procedure and additional results not included in the main report.

Measurement Theory

Stimuli

The stimuli consist of pink noise bursts. The bursts, however, introduce some high frequency noise. This makes the noise burst close to white at the onset and end of the noise segment. To avoid this, a 20 ms ramp function is used to smooth the onset and end of the noise segments, thus reducing the “whiteness” of the segments. This was done for all segments in the noise sequence. A plot of the pink noise sequence is seen in figure E.1.
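The ramping of a noise burst can be sketched as follows. The shape of the ramp is not stated in the text, so a linear fade is assumed here, and the function name `apply_ramps` is illustrative; at the 48 kHz sampling rate used in the report, 20 ms corresponds to 960 samples.

```python
def apply_ramps(burst, fs_hz, ramp_s=0.020):
    """Fade a burst in and out with linear ramps of ramp_s seconds each."""
    n_ramp = round(fs_hz * ramp_s)
    if 2 * n_ramp > len(burst):
        raise ValueError("burst is shorter than the two ramps")
    shaped = list(burst)
    for i in range(n_ramp):
        gain = i / n_ramp              # rises from 0 towards 1 over the ramp
        shaped[i] *= gain              # onset
        shaped[-1 - i] *= gain         # end, mirrored
    return shaped


# Toy burst: 100 ms of constant amplitude instead of pink noise.
burst = [1.0] * 4800
shaped = apply_ramps(burst, 48_000)
print(shaped[0], shaped[480], shaped[960], shaped[-1])
```

A smoother window shape (for example a raised cosine) would be a drop-in replacement for the linear gain if less spectral splatter is needed.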

[Plot not reproduced: amplitude [−] versus time [s], 0 to 3 s.]

Figure E.1: The pink noise stimuli.

Familiarization with the Noise Signal

The test subjects should know what type of stimuli they are going to listen to. Therefore the stimulus from a virtual location directly in front of the subject is presented before the listening test begins.

Synthesis of Localization Cues

To synthesize different virtual locations, the test stimuli will be convolved with an HRTF. Only the direction of the stimuli will vary, and no attempt will be made to modify the source distance incorporated in the HRTFs. Aalborg University's HRTF database will be used. It contains recordings where one of the Valdemar head and torso simulators is used at a distance of approximately 2 m [Bovbjerg et al., 2000, p. 3]. The HRTFs have a sample rate of 48 kHz and a length of 256 taps (5.33 ms), and the database has an angle resolution of 2 degrees. It can be noted that the impulse response of the XTC system's plant was measured (appendix C) using the same type of head and torso simulator as the HRTF database. However, this should not have any practical influence.
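The synthesis step amounts to convolving the mono stimulus with the left and right impulse responses for the chosen direction. A minimal Python sketch, with illustrative function names and short toy impulse responses in place of the 256 tap responses from the database:

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def binaural_synthesis(stimulus, hrir_left, hrir_right):
    """Return the (left, right) ear signals for one virtual direction."""
    return convolve(stimulus, hrir_left), convolve(stimulus, hrir_right)


# Toy example with two-tap impulse responses.
left, right = binaural_synthesis([1.0, 2.0, 3.0], [1.0, 1.0], [0.0, 0.5])
print(left, right)
```

Each output is longer than the stimulus by the impulse response length minus one sample, so a 256 tap response adds 255 samples, a little over 5 ms at 48 kHz.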

Prefiltered Test Stimuli

Since the filters of the XTC system are not implemented with real-time processing, the system will play back prerendered sound files. A sketch of this processing flow is seen in figure E.2.

Figure E.2: A diagram of the test stimuli preprocessing and playback.

Chosen Virtual Directions

Table E.1 lists the virtual directions chosen for the listening test in the three groups. Note that this is not the order used for playback during testing.

Azimuth θ [°]   Elevation φ [°]   Repeated

Horizontal Plane
   0               0
  30               0              †
  60               0
  90               0
 120               0
 150               0              †
 180               0
-150               0
-120               0
 -90               0              †
 -60               0
 -30               0

Frontal Plane
  90             -30
  90             -60              †
  90              60              †
 -90               0              †
 -90              30
 -90              60

3D (two planes)
  60              30
 120             -60
 150              60
 -60             -30              †
 -90               0              †
-120              30              †

Table E.1: The virtual directions which will be presented to the listener. Directions are listed as azimuth, elevation pairs and those repeated in the test are marked with a †.

Collecting Test Results

It is decided to handle playback control and collection of answers by presenting the subjects with a graphical user interface (GUI).


Figure E.3: Screenshot of the GUI presented to the listener during the test in 3D (two planes).

Prior to the test, the subject will have to adjust the stimuli volume. During the listening test the subject will be presented with a stimulus from either the XTC system or the headphones. On the screen the subjects will be presented with the GUI shown in figure E.3, representing the possible answers. The figure will vary according to which one of the three groups of virtual directions they are currently testing. It will be possible to replay the stimuli two times, in case they are in doubt. Furthermore, there will be a question mark button they can press if they are in need of assistance.

Answer Resolution

It is chosen to let the subjects answer with a resolution of 30°, i.e. equivalent to the spacing of the virtual directions in the horizontal plane. This resolution is used for both planes, although some of the answer options will never be played, since it is expected that removal of possible answers would bias the results.

The resolution of 30° is considerably lower than the 14° resolution, which is possible for localization in the horizontal plane as described in section 2.1.1. It is expected that the XTC system will reproduce localization cues with higher resolution than 30°. It is, however, less certain whether the subjects will be able to answer correctly using the GUI if a higher resolution is used. The subjective mapping from a perceived direction to a chosen direction is expected to introduce additional errors for higher resolutions. These errors will not be due to poor localization ability, but to the uncertainty of imagining what exact direction the answer corresponds to.

Measurement Setup and Equipment

The measurement setup along with the equipment used in this test is the same as used for the CHSP measurement described in appendix D, with the exceptions that only two positions are measured, and that the Valdemar head and torso simulator is replaced by human test subjects.

Figure E.4: Top view of the setup for the listening test.

The listening test will be carried out in the anechoic room Frb7-B5-101 and the setup can be seen in figure E.4.

Positioning Test Subjects

The test subject will be positioned in a chair directly in front of the loudspeaker setup, which should correspond to the optimal sweet spot of the XTC system. The neckrest of the chair is adjusted to give support to a head position at the optimal sweet spot. Around the chair is a curtain to ensure no visual cues are given to the subjects about the loudspeaker positions. For the same reason the subjects must enter the test room blindfolded.

The listening session where the performance of the XTC system is verified at a laterally displaced position will be carried out by moving the chair. This will be done during a break between listening sessions, and the subject will not be informed of this.

Considerations for the Headphone Session

The subjects' localization performance using the XTC system will be compared to their localization performance using headphones. The test using headphones is also carried out with the test subject positioned in the chair in the anechoic chamber. This ensures similar conditions for both tests, so impressions from the environment should not influence the subjects. The stimuli for the headphones will be bandpass filtered, just as the stimuli for the XTC system.

It is chosen to use Beyer Dynamic DT990 open headphones. This headphone model has a coupling with an impedance equivalent to air, which ensures that the coupling is similar for all subjects, so the headphone response will vary little between subjects [Hammershøi and Møller, 2005, p. 232].

Since it is chosen to equalize the frequency response of the loudspeakers used with the XTC system, the frequency response of the headphones is also equalized. An inverse filter was therefore designed to equalize the headphones. The frequency response to equalize was recorded on the Valdemar head and torso simulator, and details of the measurements and the equalizer design are found in the measurement report in appendix F.


Based on the considerations, the test procedure is:

1. Introduction to listening test - 5 min
First, the test subject will be given a short introduction to the environment they will perform the test in. They will enter a second anechoic chamber to ensure that the test subjects are more confident about being blindfolded. The test subject is given instructions on how to perform the test, and is encouraged to ask if they have any questions after reading the instructions. The subject will carry out a short pilot test in the control room, to become familiar with the GUI. The pilot test will present all parts of the test such that the test subjects know how to answer during the listening test.

2. Entering setup - 3 min
When the subject feels confident that they are able to perform the test, they are placed in the setup. They are blindfolded as they are led into the listening setup in the anechoic chamber. Before the blindfold is removed, the curtain will be closed around them to prevent visual cues. The subject is told to position their head in the optimal sweet spot, supported by the neckrest of the chair. The test subject is told to adjust the volume to a suitable level, and afterward the subject is left alone in the anechoic chamber.

3. Perform the listening test - 45 min
The subject will afterward perform the actual listening test, and judge directions in the following order:

(a) Session 1 - XTC system in optimal sweet spot - 12 min

i. Familiarization of stimuli

ii. Virtual directions group 1 - Horizontal plane - 24 answers

iii. Virtual directions group 2 - Frontal plane - 18 answers

iv. Virtual directions group 3 - Both planes - 18 answers

(b) Break - 4 min
After the subject has completed the session, the supervising group member will enter the anechoic room and help the test subject start the next part of the listening test with headphones.

(c) Session 2 - Headphone playback - 12 min
After the supervisor has left the room, the subject will determine directions as in Session 1.

(d) Break - 5 min
To ensure that the subject does not get tired, a 5 minute break is introduced after session 2. The subject will again be blindfolded to avoid visual cues as they are led out of the anechoic chamber. During the break, refreshments are available to the subject, and the chair in the setup is moved to a lateral displacement of 5 cm to the left.

(e) Session 3 - XTC system at laterally displaced position - 12 min
The subject is again led into the anechoic chamber and must determine directions as in Session 1.

4. Debriefing - 5 min
When it is seen that the subject has finished the test, they will be led out of the anechoic chamber and asked if they have any last questions about the test. There will furthermore be a questionnaire addressing subjective impressions of the reproduction of both systems. The debriefing questionnaire is found in appendix G.


Results

Direct Representation

In figure E.5, the results for the XTC system and headphones are presented for both the horizontal and the frontal plane. The plots show the relationship between the perceived and intended virtual directions, and the colors indicate the amount of answers. With perfect localization, all of the subjects' answers would fall on the red line on the diagonal. The two black lines are put on the figure to show where cone of confusion errors are located. All directions have been normalized according to their number of occurrences.

[Plots not reproduced: perceived direction [°] versus target direction [°], with a color scale showing answers for target direction [%] from 0 to 100. Panels: (a) XtcOpt for horizontal plane, 4.3 avg. rating. (b) XtcOpt for frontal plane, 3.9 avg. rating. (c) XtcDis for horizontal plane, 4.4 avg. rating. (d) XtcDis for frontal plane, 4 avg. rating. (e) HP for horizontal plane, 4.3 avg. rating. (f) HP for frontal plane, 4.3 avg. rating.]

Figure E.5: Localization performance for all test subjects using the different systems. The rating points in the title are explained in section 9.3.1.


Results Showing Influence of Direction

[Plot not reproduced: average rating (2 to 5) for each tested direction, grouped by plane, for XtcOpt (loudspeakers), XtcDis (loudspeakers displaced) and HP (headphones).]

Figure E.6: The average performance for a specific direction is plotted for both headphones and XTC system. It should be noticed that the directions are grouped for each of the tested planes.

[Plot not reproduced: percentage of ratings [%] for each tested direction, split into ratings 1 to 5.]

Figure E.7: The distribution of ratings for the different tested directions, in the XtcOpt listening session.

[Plot not reproduced: box whisker plot with one column for each of the repeated directions.]

Figure E.8: Box whisker plot of the ANOVA analysis of whether the difference between the headphone and XTC system's performance can be considered equal for all directions.


Results Showing Influence of Test Subject

[Plot not reproduced: average rating using the XTC system (1 to 5) for test subjects 1 to 11.]

Figure E.9: Box whisker plot of the ANOVA analysis of whether the subjects' performance using the XTC system can be considered similar.

[Plot not reproduced: average rating (2.5 to 4.5) for subjects 1 to 11, for XtcOpt (loudspeakers), XtcDis (loudspeakers displaced) and HP (headphones).]

Figure E.10: The average rating for each of the subjects in each of the three listening sessions.

Appendix F
Measurement Report: Headphone Transfer Function

The purpose of this measurement report is to measure the impulse response of the headphones to be used for the subjective listening test in section 9. A description of the headphones is found on page 150 in appendix E. The measurements are performed using a Valdemar head and torso simulator, as a blocked ear canal measurement. After the impulse responses are gathered, an inverse filter is made for each channel to equalize the headphones to a flat frequency response.

Measurement Theory

The theory behind the sampling frequency, stimulus, synchronization, playback platform, recording and calibration is described in the measurement report for the channel transfer function (CTF) in appendix C. The stimulus is played simultaneously through both channels of the headphones, since the crosstalk between the two headphone channels is considered negligible. The measurements are performed for ten fittings of one pair of headphones. The headphones are removed and put back on the head for every measurement. This is done to gain knowledge about the fitting of the headphones on the head and to get an overview of the influence the fitting has on the impulse response. The measurements are then compared, and the most representative measurement is chosen as the average headphone transfer function (HPTF). An inverse filter of the chosen measurement is then calculated using equation (B.6) on page 122. The stimulus is then filtered, played through the headphones and recorded again. This is done to verify that the equalization made the frequency response flatter.


Room Setup
The measurements are performed in the anechoic room at Fredrik Bajers Vej 7, B5-101, at AAU. This is the same room as used for the CTF measurement.

Platform and Playback
The measurement platform, running MATLAB, interfaces with the sound card and thereby plays the stimulus. The stimulus is adjusted to have a peak amplitude of ±0.5 V as input to the headphones.
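The peak adjustment can be illustrated with a short sketch. It assumes that the sample values map linearly to the voltage at the headphone input, which in practice depends on the gain of the DA converter and the power amplifier; the function name `scale_to_peak` and the sample values are hypothetical.

```python
def scale_to_peak(samples, target_peak=0.5):
    """Scale a signal so that its largest absolute sample equals target_peak.

    Assumption: one unit of sample value corresponds to one volt at the
    headphone input; the real chain needs a measured calibration gain.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        raise ValueError("cannot scale an all-zero signal")
    gain = target_peak / peak
    return [s * gain for s in samples]

stimulus = [0.0, 0.2, -0.8, 0.4]  # made-up samples
scaled = scale_to_peak(stimulus)
```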

Measurement Setup

The setup shown in figure F.1 is used during the measurements.

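[Figure: block diagram with the blocks MATLAB platform, DAC, power amplifier, anechoic room, 48 V phantom power and recording system. Two of the connections are marked as carrying 2 channels.]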

Figure F.1: Measurement setup for the HPTF measurement.


Equipment


Description                           Type                        Number
Valdemar                              Head and torso simulator    2150-00
Microphone in Valdemar left           G.R.A.S. 40AD               33958
Microphone preamp Valdemar left       DPA 4037                    33957
Microphone in Valdemar right          G.R.A.S. 40AD               33959
Microphone preamp Valdemar right      DPA 4037                    33954
Calibrator                            B & K type 4231             33691
DA converter                          Behringer ADA 8000          56545
Power amplifier                       Rotel RB-976MkII            52633
Measurement system                    Sinus MSX16                 75500
Measurement laptop for Sinus MSX16    Lifebook E series           60923
Platform running MATLAB               Desktop computer            60893
Sound card                            Hammerfall HDSP 9632        2137-04
Headphones                            Beyerdynamic DT 990 PRO     2036-7

Table F.1: Used equipment.


• Measurement of headphone transfer function

1. Make the setup of figure F.1.

2. Place the headphones on Valdemar.

3. Record the stimulus with the microphones placed in the ears of Valdemar while playing it through the headphones. This is the HPTF1 measurement.

4. Reposition the headphones on Valdemar, and repeat step 3 until ten measurements have been performed.

Results

The results are calculated in the same way as in the measurement report in appendix C. Figure F.4 depicts the frequency response after inverse filtering has been applied to the headphones. The inverse filter has been computed using equation (B.6), with a filter length of 1024 taps and β = 0.01.
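Equation (B.6) is given in appendix B and is not reproduced here. To illustrate why a regularization constant β is useful, the sketch below assumes a commonly used frequency-domain regularized inverse, C(k) = conj(H(k)) / (|H(k)|² + β). This assumed form may differ from equation (B.6), and the function name and the response values are hypothetical. Obtaining the 1024-tap time-domain filter would additionally require an inverse DFT, which is not shown.

```python
def regularized_inverse(h_bins, beta=0.01):
    """Per-bin regularized inverse: conj(H) / (|H|^2 + beta).

    Assumption: this is a generic regularized inversion, not necessarily
    identical to equation (B.6) of the report.
    Where |H|^2 is much larger than beta the result approaches 1/H; where H
    is close to zero the gain stays bounded instead of growing without limit.
    """
    return [h.conjugate() / (abs(h) ** 2 + beta) for h in h_bins]

# Made-up complex frequency response: a strong bin, a moderate bin and a deep notch.
h = [2.0 + 0.0j, 0.0 + 1.0j, 0.001 + 0.0j]
c = regularized_inverse(h, beta=0.01)

# Equalized response H(k) * C(k); close to 1 except in the notch.
equalized = [hk * ck for hk, ck in zip(h, c)]
```

In the notch, an unregularized inverse would demand a gain of 1000, whereas the regularized inverse stays below 0.1. The price is that the notch is left largely uncorrected.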


[Figure, two panels. Horizontal axis: frequency [Hz], logarithmic, 10^2 to 10^4. Vertical axis: SPL [dB re. 20 µPa], 50 to 110. Legend: Measurements, Measurement 4.]

(a) Frequency response for the right headphone loudspeaker.

(b) Frequency response for the left headphone loudspeaker.

Figure F.2: Measured frequency responses for the 10 measurements. Measurement 4 is the measurement chosen for use in the project.


[Figure: SPL [dB re. 20 µPa], 75 to 105, per one-third octave band with centre frequencies from 63 Hz to 12.5 kHz.]

Figure F.3: Average and standard deviation of the measured transfer functions, analyzed in one-third octave bands.
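The statistics in figure F.3 can be sketched as follows, assuming that a level in dB has already been computed for every measurement in every one-third octave band. The band edges use the base-2 definition, the sample standard deviation is an assumption (the report does not state which estimator was used), and all names and values are hypothetical.

```python
import statistics

# Nominal one-third octave centre frequencies in Hz, as on the axis of figure F.3.
NOMINAL_CENTRES_HZ = [
    63, 80, 100, 125, 160, 200, 250, 315, 400, 500, 630, 800,
    1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000, 6300, 8000, 10000, 12500,
]

def band_edges(fc):
    """Lower and upper edge of a one-third octave band centred on fc (base-2)."""
    half_band = 2.0 ** (1.0 / 6.0)
    return fc / half_band, fc * half_band

def band_statistics(levels_db):
    """Mean and sample standard deviation per band.

    levels_db[m][b] is the level in dB of measurement m in band b.
    Assumption: the statistics are taken directly on the dB values.
    """
    n_bands = len(levels_db[0])
    means = [statistics.mean([m[b] for m in levels_db]) for b in range(n_bands)]
    stds = [statistics.stdev([m[b] for m in levels_db]) for b in range(n_bands)]
    return means, stds

# Toy data: three measurements, two bands (values are made up).
toy_levels = [[90.0, 100.0], [92.0, 100.0], [94.0, 100.0]]
means, stds = band_statistics(toy_levels)
lower, upper = band_edges(1000.0)
```

Note that the nominal centre frequencies are rounded labels; the exact centres of a base-2 series deviate slightly from them.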

[Figure: frequency responses after inverse filtering. Horizontal axis: frequency [Hz], logarithmic, 10^2 to 10^4. Vertical axis: SPL [dB re. 20 µPa], 70 to 120. Legend: Measurements.]

Figure F.4: Measured frequency responses for 5 measurements after inverse filtering has been applied.


Appendix G
Questionnaires

This appendix contains the two questionnaires for the test subjects participating in the listening test. The answers can be found on the enclosed DVD.

Questionnaire for listening test 08gr960

Full name:
Height (only if over 180 cm):

1. I go to discotheques and concerts:
○ Three times a week or more
○ Twice a week or more
○ Once a week
○ Two to three times a month
○ Once a month or less

- When doing so, I can easily communicate with other people:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

- When doing so, I use hearing protectors:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

2. I daily use my headphones:
○ 0-1 hours  ○ 1-2 hours  ○ 2-3 hours  ○ 3-5 hours  ○ More than 5 hours

3. When I do so, I often play loud to suppress sounds from my surroundings:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

4. I often play so loud that other people can not get my attention:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

5. I often rotate my head in order to better hear where the sound originates from:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

6. I often seat myself in the front rows during lectures to better hear the lecturer:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

7. I can easily determine the location of an ambulance from its siren:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

8. When watching TV, people around me often say that the volume on the TV is high:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree

9. I have a documented hearing damage:
○ No  ○ Yes - Which?

- If yes: It affects my ability to communicate in daily situations:
○ Disagree  ○ Slightly disagree  ○ Moderate  ○ Slightly agree  ○ Agree


Debriefing questionnaire for listening test 08gr960

Full name:

To determine the direction in the first session was:
○ Very easy  ○ Easy  ○ Moderate  ○ Difficult  ○ Very difficult

To determine the direction in the second session was:
○ Very easy  ○ Easy  ○ Moderate  ○ Difficult  ○ Very difficult

To determine the direction in the third session was:
○ Very easy  ○ Easy  ○ Moderate  ○ Difficult  ○ Very difficult

If you have any general comments to the listening test, you are more than welcome to put these below.

Enclosure: DVD

A DVD is attached as an enclosure. It contains material to be studied if the reader desires additional insight into the different topics of the report.

Contents of the DVD:

• Project proposal.

• Project report (PDF & PS format).

• Datasheets.

• Additional literature.

• MATLAB scripts.

• Simulation results.

• Test results.

• Test signals.

• Pictures of system test setup.
