Enhancement of Speech Intelligibility using Beamforming ...832777/...Leela Krishna Gudupudi E-mail:...


Master Thesis

Electrical Engineering

Thesis no: MEE 2010-2012

Enhancement of Speech

Intelligibility using

Beamforming Techniques

Leela Krishna Gudupudi

School of Engineering

Blekinge Institute of Technology

371 79 Karlskrona

Sweden


This thesis is submitted to the School of Engineering at Blekinge Institute of

Technology in partial fulfillment of the requirements for the degree of Master of

Science in Electrical Engineering.


Contact Information:

Author:


E-mail: [email protected],

[email protected]

University advisors:

Dr. Nedelko Grbic

Department of Electrical Engineering

School of Engineering

Blekinge Institute of Technology,

Karlskrona, Sweden

Email: [email protected]

Dr. Benny Sällberg

Department of Electrical Engineering

School of Engineering

Blekinge Institute of Technology,

Karlskrona, Sweden

Email: [email protected]






ABSTRACT

Speech enhancement is the process of improving the intelligibility or quality of speech using audio processing algorithms. It is important in many applications, such as teleconferencing systems, speech recognition, VoIP, mobile phones, hearing aids, etc. [1]. The gains achievable with single-microphone speech enhancement algorithms are limited, and microphone arrays with beamforming techniques, which can achieve significantly higher signal-to-noise ratios (SNR), are expected to dominate the market. A considerable amount of research has already been done on the feasibility and advantages of using microphone arrays, and several beamforming techniques are widely applied by researchers around the world.

In this thesis, eight selected beamforming techniques have been applied to enhance the intelligibility of speech degraded by interference. To demonstrate the algorithms, a graphical user interface (GUI) has been developed in MATLAB, in which the user can select the number of sensors in the linear array, the spacing between them, the source and interference directions, the type of algorithm and so on. The user can also compare all the beamformers under a common default environment. The performance is measured in terms of signal-to-noise ratio (SNR), speech and noise distortion, and ITU-T PESQ values.

This thesis describes the opportunities and constraints of applying eight beamforming techniques using linear microphone arrays.

Keywords: Beamforming, Speech Enhancement, Microphone Arrays, PESQ, Delay-and-Sum


ACKNOWLEDGEMENTS

I would never have been able to finish my thesis without the guidance of my university faculty, help from friends, and support from my family and fiancée.

I would like to express my deepest gratitude to my supervisor, Dr. Nedelko Grbic, for his excellent guidance, care and patience, and for providing me with an excellent atmosphere for doing this work. I am also grateful to Dr. Benny Sällberg for his support and valuable advice throughout the thesis work.

I would like to thank Ramesh Telagareddy and Bulli KoteswaraRao, who, as good friends, were always willing to discuss the concepts and share their thoughts.

I would also like to thank my parents and family, who always supported me and encouraged me with their best wishes.

Finally, I would like to thank my fiancée, Vineela Musti. She was always there cheering me up and stood by me through the good times and bad. My thesis would not have been possible without her help.



To My Parents


Table of Contents

ABSTRACT

ACKNOWLEDGEMENTS

Table of Contents

List of Figures

List of Tables

List of Acronyms

INTRODUCTION

1 MICROPHONES

1.1 Microphone Polar Patterns

1.2 Microphone Arrays

1.3 Spatial Aliasing

2 BEAMFORMING TECHNIQUES

2.1 Fixed Beamforming

2.2 Adaptive Beamforming

2.3 Generalized Side-lobe Canceller (GSC)

2.4 The Wiener Beamformer

2.5 The Elko Beamformer

3 DESIGN AND SIMULATIONS

3.1 Simulation Procedure

4 PERFORMANCE METRICS

4.1 Signal-to-Noise Ratio

4.2 Normalized Speech Distortion

4.3 Normalized Noise Distortion

4.4 Perceptual Evaluation of Speech Quality (PESQ)

5 SIMULATION RESULTS

5.1 Delay-and-Sum Beamformer

5.2 Wiener Beamformer

5.3 DSB-DSB GSC

5.4 DSB-WBF GSC

5.5 WBF-DSB GSC

5.6 WBF-WBF GSC

5.7 The Elko Beamformer

5.8 The Elko-Wiener Beamformer

5.9 Comparison Plots

6 SUMMARY AND CONCLUSION

7 FUTURE WORK

REFERENCES


List of Figures

Figure 1.1: Omni-Directional Microphone polar pattern

Figure 1.2: Bi-Directional Microphone polar pattern

Figure 1.3: Cardioid microphone polar pattern

Figure 1.4: Example of spatial aliasing

Figure 2.1: Delay and Sum Beamforming

Figure 2.2: An adaptive beamformer

Figure 2.3: Griffiths-Jim Generalised Side-lobe Canceller

Figure 2.4: (a) Linear equispaced differential microphone array (b) Diagram of two element differential array composed of two omni-directional mic’s and a time delay

Figure 2.5: Directional responses of the two-element array shown in Fig. 2.4(b): (a) T = 0, (b) T = (d/c)/2, (c) T = d/c

Figure 2.6: Implementation of first-order differential microphone using back-to-back cardioids

Figure 2.7: Directional responses of back-to-back cardioid arrangement: (a) β = 0; (b) β = 0.383; (c) β = 1

Figure 3.1: The delay-and-sum beamforming

Figure 3.2: Desired and Interference Matrices for Wiener Beamforming

Figure 3.3: Pictorial form of the method followed

Figure 3.4: Wiener Beamformer Structure

Figure 3.5: Novel GSC structure

Figure 3.6: The structure of DSB-DSB GSC

Figure 3.7: Structure of the DSB-WBF GSC

Figure 3.8: Structure of the WBF-DSB GSC

Figure 3.9: Structure of the WBF-WBF GSC beamformer

Figure 3.10: Structure of the Elko-Wiener Beamformer

Figure 5.1: Simulation set-up in an anechoic chamber

Figure 5.2: The Delay and Sum Beamformer

Figure 5.3: Signal plots before and after DSB

Figure 5.4: DSB SNR plots

Figure 5.5: DSB Speech Distortion plots

Figure 5.6: DSB Noise Distortion plot

Figure 5.7: DSB PESQ MOS plot

Figure 5.8: The Wiener Beamformer

Figure 5.9: Signal plots before and after WBF

Figure 5.10: WBF SNR plot

Figure 5.11: WBF Speech Distortion plot

Figure 5.12: WBF Noise Distortion plot

Figure 5.13: WBF PESQ MOS plot

Figure 5.14: DSB-DSB GSC

Figure 5.15: Signal plots before and after DSB-DSB GSC

Figure 5.16: DSB-DSB GSC SNR plots

Figure 5.17: DSB-DSB GSC Speech Distortion plots

Figure 5.18: DSB-DSB GSC Noise Distortion plots

Figure 5.19: DSB-DSB GSC PESQ MOS plots

Figure 5.20: DSB-WBF GSC

Figure 5.21: Signal plots before and after DSB-WBF GSC

Figure 5.22: DSB-WBF GSC SNR plot

Figure 5.23: DSB-WBF GSC Speech Distortion plot

Figure 5.24: DSB-WBF GSC Noise Distortion plot

Figure 5.25: DSB-WBF GSC PESQ MOS plot

Figure 5.26: WBF-DSB GSC

Figure 5.27: Signal plots before and after WBF-DSB GSC

Figure 5.28: WBF-DSB GSC SNR plot

Figure 5.29: WBF-DSB GSC Speech Distortion plot

Figure 5.30: WBF-DSB GSC Noise Distortion plot

Figure 5.31: WBF-DSB GSC PESQ MOS plot

Figure 5.32: WBF-WBF GSC

Figure 5.33: Signal plots before and after WBF-WBF GSC

Figure 5.34: WBF-WBF GSC SNR plot

Figure 5.35: WBF-WBF GSC Speech Distortion plot

Figure 5.36: WBF-WBF GSC Noise Distortion plot

Figure 5.37: WBF-WBF GSC PESQ MOS plot

Figure 5.38: Signal plots before and after Elko Beamformer

Figure 5.39: Elko Beamformer SNR plot

Figure 5.40: Elko Beamformer Speech Distortion plot

Figure 5.41: Elko Beamformer Noise Distortion plot

Figure 5.42: Elko Beamformer PESQ MOS plot

Figure 5.43: Elko-Wiener Beamformer

Figure 5.44: Signal plots before and after Elko-Wiener Beamformer

Figure 5.45: Elko-Wiener Beamformer SNR plot

Figure 5.46: Elko-Wiener Beamformer Speech Distortion plot

Figure 5.47: Elko-Wiener Beamformer Noise Distortion plot

Figure 5.48: Elko-Wiener Beamformer PESQ MOS plot

Figure 5.49: SNR comparison plot

Figure 5.50: SNR comparison plot for Elko based beamformers


List of Tables

Table 4.1: Quality opinion scale used in the development of PESQ

Table 5.1: Delay and Sum Beamformer Observation

Table 5.2: DSB PESQ MOS Values

Table 5.3: Wiener Beamformer Observation

Table 5.4: WBF PESQ MOS Values

Table 5.5: DSB-DSB GSC Observation

Table 5.6: DSB-DSB GSC PESQ MOS Values

Table 5.7: DSB-WBF GSC Observation

Table 5.8: DSB-WBF GSC PESQ MOS Values

Table 5.9: WBF-DSB GSC Observation

Table 5.10: WBF-DSB GSC PESQ MOS Values

Table 5.11: WBF-WBF GSC Observation

Table 5.12: WBF-WBF GSC PESQ MOS Values

Table 5.13: Elko Beamformer Observation

Table 5.14: Elko Beamformer PESQ MOS Values

Table 5.15: Elko-Wiener Beamformer Observation

Table 5.16: Elko-Wiener Beamformer PESQ MOS Values


List of Acronyms

1. DSB Delay-and-Sum Beamformer

2. WBF Wiener Beamformer

3. GSC Generalized Side-lobe Canceller

4. SNR Signal to Noise Ratio

5. SNRI SNR Improvement

6. PESQ Perceptual Evaluation of Speech Quality

7. SD Speech Distortion

8. ND Noise Distortion

9. GUI Graphical User Interface

10. dB Decibels

11. NLMS Normalized Least Mean Square

12. PSD Power Spectral Density

13. NSD Normalized SD

14. NND Normalized ND

15. DOA Direction of Arrival

16. MOS Mean Opinion Score


INTRODUCTION

Beamforming is a well-established and widely used technique in which an array of sensors forms a beam in the direction of the desired speech: signals arriving from within this beam are passed, while signals from other directions are suppressed. In this study, a female voice recording is used as the desired source and a male voice recording as the interference. Eight beamforming algorithms have been applied to this scenario, with the main aim of recovering the desired female voice, arriving from the look direction, from the interference. This thesis summarizes the performance of the following eight beamforming techniques:

1. Delay and Sum Beamformer (DSB)

2. The DSB-DSB Generalised Side-lobe Canceller (GSC)

3. Wiener Beamformer (WBF)

4. The WBF-WBF GSC

5. The DSB-WBF GSC

6. The WBF-DSB GSC

7. Elko Beamformer

8. The Elko-Wiener Beamformer

Linear microphone arrays with a user-defined number of sensors and a constant, user-defined spacing between the sensors are used throughout the study. All the beamforming algorithms are implemented in the time domain. Fractional delays are taken into account when computing the sensor signals [14]. The performance measures are the SNR, spectral distortions and the ITU-T recommended PESQ (Perceptual Evaluation of Speech Quality). The entire project was implemented in MATLAB. No reflections were considered throughout the project, and the speed of sound was taken as 343 m/s.
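Fractional delays are needed because the inter-sensor delay d cos θ / c is rarely a whole number of samples. As a minimal sketch, assuming a Hann-windowed sinc FIR interpolator (one common design; reference [14] describes the approach the thesis relies on, which may differ), a signal can be delayed by a non-integer number of samples as follows. The function names, the 31-tap length, the 16 kHz sampling rate and the 5 cm spacing are illustrative assumptions.

```python
import math


def fractional_delay_taps(delay, num_taps=31):
    """Hann-windowed sinc FIR approximating a delay of `delay` samples.

    The filter is causal, so it adds a bulk delay of (num_taps - 1) / 2
    samples on top of the requested fractional delay."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - centre - delay  # offset from the shifted sinc peak
        sinc = 1.0 if abs(t) < 1e-12 else math.sin(math.pi * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * hann)
    dc_gain = sum(taps)
    return [h / dc_gain for h in taps]  # normalise to unity gain at 0 Hz


def fir_filter(taps, x):
    """Causal FIR filtering: y[n] = sum_k taps[k] * x[n - k]."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    fs, d, c = 16000.0, 0.05, 343.0  # sampling rate (Hz), spacing (m), speed of sound (m/s)
    tau = d / c * fs                 # endfire inter-sensor delay in samples
    whole, frac = int(tau), tau - int(tau)
    print(f"delay = {tau:.4f} samples = {whole} whole + {frac:.4f} fractional")
    taps = fractional_delay_taps(frac)
    print(f"{len(taps)} taps, DC gain = {sum(taps):.6f}")
```

In such a scheme the whole-sample part is applied as a plain index shift and only the remainder goes through the filter; because every channel uses the same tap length, the common (num_taps − 1)/2 bulk delay does not affect beam steering.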


1 MICROPHONES

A microphone is a transducer that converts sound into an electric signal. It is an electromechanical device that converts a vibration, usually caused by an air pressure wave, into an electric signal proportional to that vibration. Microphones are classified by their transducer principle, their physical characteristics and their directional characteristics.

1.1 Microphone Polar Patterns

Not all microphones are equally sensitive to sound arriving from all directions, so microphones are also categorized according to how well they pick up sound from particular directions. The three most common microphone polar patterns are described here:

Omni-directional:

An omnidirectional microphone is the simplest microphone design; it picks up all sounds regardless of their direction of arrival. Omnidirectional microphones are very easy to use and have a good frequency response. The smaller the diameter of the microphone, the better its omnidirectional characteristic at high frequencies, because the deviation from an omnidirectional response at high frequencies increases with the diameter of the microphone.

Figure 1.1: Omni-Directional Microphone polar pattern¹

¹ Image courtesy: www.prosoundweb.com


Bi-directional:

A bi-directional microphone is also called a “figure-of-eight” microphone. It picks up sound arriving from the front and the rear but not from the sides, i.e., at 90°. Most ribbon microphones are bi-directional. Their frequency response is just as good as that of an omnidirectional microphone, at least for sounds that are not too close to the microphone.

Figure 1.2: Bi-Directional Microphone polar pattern¹

Cardioid:

A cardioid microphone is most sensitive in the look direction and least sensitive at the rear, i.e., it mainly picks up the sound it is pointed at. It rejects unwanted ambient sound and is much more resistant to feedback than an omnidirectional microphone, which makes cardioid microphones particularly suitable for loud stages.

Figure 1.3: Cardioid microphone polar pattern¹
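The three patterns above are members of one first-order family, r(θ) = α + (1 − α) cos θ, with θ measured from the look direction: α = 1 gives the omnidirectional pattern, α = 0 the figure-of-eight and α = 0.5 the cardioid. The sketch below (function names hypothetical) evaluates this family in dB, clipping nulls at an assumed −60 dB floor so that they stay finite.

```python
import math


def first_order_pattern(alpha, theta_deg):
    """Normalised first-order polar response alpha + (1 - alpha) * cos(theta).

    alpha = 1.0 -> omnidirectional, 0.0 -> figure-of-eight, 0.5 -> cardioid.
    theta is measured from the look direction (0 deg = front)."""
    return alpha + (1.0 - alpha) * math.cos(math.radians(theta_deg))


def response_db(alpha, theta_deg, floor_db=-60.0):
    """Magnitude of the pattern in dB, clipped at floor_db so nulls stay finite."""
    mag = abs(first_order_pattern(alpha, theta_deg))
    return max(20.0 * math.log10(mag), floor_db) if mag > 0 else floor_db


if __name__ == "__main__":
    for name, alpha in [("omni", 1.0), ("figure-8", 0.0), ("cardioid", 0.5)]:
        row = [round(response_db(alpha, t), 1) for t in (0, 90, 180)]
        print(f"{name:9s} response at 0/90/180 deg: {row} dB")
```

The figure-of-eight has its nulls at the sides (90°), while the cardioid is 6 dB down at 90° and has its null directly at the rear.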


1.2 Microphone Arrays

To achieve better directionality, two or more microphones are positioned close together to form a microphone array. If the microphones are positioned along a line, the array is called a linear microphone array, and if the distance between adjacent microphones is equal, it is called a linear equispaced microphone array. Because an incoming sound wave arrives at each microphone at a slightly different time, a microphone array can achieve better directionality than a single microphone. Sounds are distinguished by the spatial locations of their sources by filtering and combining the individual microphone signals. The important concepts used in the design of microphone arrays are beamforming, array directivity and beam width.

Beamforming:

If all the microphone signals are combined to form a “beam”, the microphone array can act as a highly directional microphone. This beam can be steered electronically to point at the desired source. Beamforming is discussed in more detail in the next chapter.

Array Directivity:

The higher the directivity of the microphone array, the less ambient noise and reverberation it captures. The directivity pattern, i.e., the array response as a function of frequency and direction of arrival, of a linear equispaced array of N microphones with spacing d metres is given by [3], with w_n(f) the complex weight applied to the nth microphone, c the speed of sound and θ the direction of arrival measured from the array axis:

$$D(f, \theta) = \sum_{n=-\frac{N-1}{2}}^{\frac{N-1}{2}} w_n(f)\, e^{\,j \frac{2\pi f}{c} n d \cos\theta} \qquad (1)$$

From Eq. (1), the directivity pattern depends on:

o The number of microphones in the array, N

o Spacing between the mic’s, d

o The frequency, f
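To see how N, d and f shape the pattern, Eq. (1) can be evaluated directly. The sketch below is a minimal illustration, assuming uniform weights w_n(f) = 1/N, θ measured from the array axis (so 90° is broadside) and c = 343 m/s; the function name and the example values N = 8, d = 4 cm are hypothetical.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s, as in the thesis


def directivity(f, theta_deg, n_mics, d, weights=None):
    """Evaluate D(f, theta) of Eq. (1) for a linear equispaced array.

    Element indices n run from -(N-1)/2 to (N-1)/2 in steps of one; theta is
    measured from the array axis. Uniform weights 1/N are used if none given."""
    if weights is None:
        weights = [1.0 / n_mics] * n_mics
    cos_t = math.cos(math.radians(theta_deg))
    total = 0j
    for k, w in enumerate(weights):
        n = k - (n_mics - 1) / 2
        total += w * cmath.exp(1j * 2 * math.pi * f / C * n * d * cos_t)
    return total


if __name__ == "__main__":
    for f in (500.0, 1000.0, 2000.0, 4000.0):
        row = []
        for theta in (90, 60, 30, 0):
            mag = abs(directivity(f, theta, n_mics=8, d=0.04))
            row.append(f"{20 * math.log10(max(mag, 1e-6)):6.1f}")
        print(f"{f:6.0f} Hz, |D| in dB at 90/60/30/0 deg:", " ".join(row))
```

At broadside the uniformly weighted response is always 0 dB, while off-axis attenuation grows with frequency, which is the frequency dependence listed above.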


Beam Width:

The beam width and the side-lobe level are the two important characteristics of the directivity pattern. They depend on the inter-element spacing d, the frequency f and the number of microphones N. As the frequency increases, the beam width and the side-lobe level decrease. As the effective array length (L = N·d) increases, the beam width and the side-lobe level decrease. Thus, to obtain a constant beam width, the product f·d must remain approximately constant.
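The dependence on f·d can be made concrete for the broadside main lobe of a uniformly weighted array, whose first nulls lie where |cos θ| = c/(N f d). The null-to-null beam width is then 2 arcsin(c/(N f d)), which stays the same whenever the product N·f·d is unchanged. The sketch below assumes uniform weighting and c = 343 m/s; it covers only the main-lobe width, not the side-lobe level, and the function name is hypothetical.

```python
import math

C = 343.0  # speed of sound in m/s


def null_to_null_beamwidth_deg(n_mics, d, f):
    """Null-to-null width (degrees) of the broadside main lobe of a uniformly
    weighted linear equispaced array; the first nulls sit at |cos(theta)| = c / (N f d).

    Returns 180 when no first null exists (c / (N f d) >= 1), i.e. the main
    lobe spans the whole half-plane."""
    x = C / (n_mics * f * d)
    if x >= 1.0:
        return 180.0
    return 2.0 * math.degrees(math.asin(x))


if __name__ == "__main__":
    for f in (1000.0, 2000.0, 4000.0):
        width = null_to_null_beamwidth_deg(8, 0.04, f)
        print(f"N = 8, d = 4 cm, f = {f:.0f} Hz: {width:.1f} deg")
    # halving d while doubling f keeps f*d, and therefore the beam width, unchanged
    print(f"N = 8, d = 2 cm, f = 4000 Hz: {null_to_null_beamwidth_deg(8, 0.02, 4000.0):.1f} deg")
```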

1.3 Spatial Aliasing

Spatial aliasing is caused by insufficient sampling of the sound field along the spatial axis, and it results in the appearance of grating lobes. Analogous to temporal sampling, a microphone array samples the sound field spatially, and an analogous requirement must be met to avoid grating lobes in the directivity pattern:

$$f_s = \frac{1}{d} > 2 f_{\max} \qquad (2)$$

where f_s is the spatial sampling frequency in samples per metre and f_max is the highest spatial frequency component (in cycles per metre) of the signal. This leads to the relation

$$f_{\max} = \frac{1}{\lambda_{\min}} \qquad (3)$$

where λ_min is the minimum wavelength in the signal of interest, and consequently the requirement is

$$d < \frac{\lambda_{\min}}{2} \qquad (4)$$

Equation (4) is known as the spatial sampling theorem and must be satisfied to prevent spatial aliasing in the directivity pattern of a sensor array. Figure 1.4 illustrates the effect of spatial aliasing.
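Equation (4) translates directly into a design rule for the microphone spacing, using λ_min = c / f_max for the highest temporal frequency f_max of interest. A minimal sketch, assuming c = 343 m/s as elsewhere in the thesis (function names hypothetical):

```python
C = 343.0  # speed of sound in m/s, as in the thesis


def max_alias_free_spacing(f_max):
    """Upper bound on the spacing d (m) from Eq. (4): d < lambda_min / 2."""
    lambda_min = C / f_max  # shortest wavelength of interest
    return lambda_min / 2.0


def highest_alias_free_frequency(d):
    """Highest frequency (Hz) an array with spacing d (m) can handle without grating lobes."""
    return C / (2.0 * d)


if __name__ == "__main__":
    for f_max in (4000.0, 8000.0):
        print(f"f_max = {f_max:.0f} Hz -> d < {100 * max_alias_free_spacing(f_max):.2f} cm")
    print(f"d = 5 cm -> alias-free up to {highest_alias_free_frequency(0.05):.0f} Hz")
```

For telephone-band speech (f_max = 4 kHz) this gives d < 4.29 cm; doubling the bandwidth halves the permissible spacing.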


Figure 1.4: Example of spatial aliasing


2 BEAMFORMING TECHNIQUES

Spatially propagating signals are often accompanied by interference and noise. If the desired and interfering signals occupy the same temporal frequency band, temporal filtering cannot separate them. If they originate from different spatial locations, however, they can be separated using a beamformer. Beamforming is a spatial signal processing technique used to control the directionality of reception of a microphone array. It exploits wave interference: the signals from all the microphones of the array are combined in such a way that signals arriving from the desired angles interfere constructively, while signals from other angles interfere destructively. Beamforming techniques are algorithms for determining the complex sensor weights w_n(f) in Eq. (1) in order to implement a desired shaping and steering of the array directivity pattern [3].

Beamforming techniques are broadly classified into two types: data-independent and data-dependent.

Data-independent, or fixed, beamformers are algorithms whose parameters are fixed during operation. The delay-and-sum beamformer is an example of a fixed beamformer.

Data-dependent, or adaptive, beamformers continuously update their parameters based on the received input signals. The Griffiths-Jim beamformer is an example of an adaptive beamformer.

2.1 Fixed Beamforming

Delay and Sum Beamformer (DSB):

The DSB is the simplest of all beamforming techniques. It is based on the idea that, when a plane wave impinges on a linear equispaced microphone array, the output of each microphone is the same except that each one is delayed by a different amount. If the output of each microphone is delayed appropriately and all the outputs are then added together, the desired signal propagating through the array reinforces, while noise and interference from other directions tend to cancel. A block diagram is shown in Fig. 2.1.

Figure 2.1: Delay and Sum Beamforming

The delays are calculated from the inter-element spacing of the array and the look direction. The crucial factors in the DSB are the geometric arrangement of the microphones and the weights associated with each microphone. The signal-to-noise ratio (SNR) of the output signal is greater than that of any individual microphone signal.

The major drawback of the DSB is that a large number of microphones is required to improve the SNR. If the noise signals at the individual microphones are mutually uncorrelated and uncorrelated with the desired signal, each doubling of the number of microphones yields an additional 3 dB of SNR. Another disadvantage is that the DSB does not place nulls directly in the direction of the interfering signals.
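The following sketch illustrates both the alignment step of the DSB and the 3 dB-per-doubling rule. It assumes integer-sample delays (the thesis itself uses fractional delays [14]), a white-noise stand-in for speech, and noise that is independent from microphone to microphone; the function names, the 16 kHz sampling rate and the 3.43 cm spacing are hypothetical.

```python
import math
import random

C = 343.0  # speed of sound in m/s


def steering_delays(n_mics, d, theta_deg, fs):
    """Arrival delay of each microphone relative to the earliest one, rounded
    to whole samples, for a plane wave at angle theta from the array axis.
    Rounding is a simplification; fractional delays avoid it."""
    tau = d * math.cos(math.radians(theta_deg)) / C  # inter-sensor delay (s)
    raw = [round(m * tau * fs) for m in range(n_mics)]
    offset = min(raw)  # shift so that every delay is non-negative
    return [r - offset for r in raw]


def delay_and_sum(channels, delays):
    """Compensate each channel's arrival delay, then average the channels."""
    n_out = len(channels[0]) - max(delays)
    return [sum(ch[t + dl] for ch, dl in zip(channels, delays)) / len(channels)
            for t in range(n_out)]


def simulated_snr_gain_db(n_mics, n_samples=20000, seed=1):
    """SNR of the DSB output minus SNR of one microphone, for a white source
    plus independent unit-variance white noise at every microphone."""
    rng = random.Random(seed)
    delays = steering_delays(n_mics, 0.0343, 0.0, 16000.0)  # endfire arrival
    length = n_samples + max(delays)
    src = [rng.gauss(0.0, 1.0) for _ in range(length)]
    noise = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(n_mics)]
    # microphone m hears the source delays[m] samples late, plus its own noise
    channels = [[(src[t - dl] if t >= dl else 0.0) + nz[t] for t in range(length)]
                for dl, nz in zip(delays, noise)]
    out = delay_and_sum(channels, delays)
    residual = [y - src[t] for t, y in enumerate(out)]  # averaged noise only
    snr_in = sum(v * v for v in src) / sum(v * v for v in noise[0])
    snr_out = sum(src[t] ** 2 for t in range(len(out))) / sum(r * r for r in residual)
    return 10 * math.log10(snr_out / snr_in)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"N = {n:2d}: SNR gain = {simulated_snr_gain_db(n):.2f} dB "
              f"(10 log10 N = {10 * math.log10(n):.2f} dB)")
```

Because the aligned source adds coherently while the independent noise components add in power, the measured gain tracks 10 log10 N, i.e. about 3 dB per doubling. Against a single directional interferer the result is different, since the interference is then correlated across microphones.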


2.2 Adaptive Beamforming

Data-dependent beamforming techniques adaptively filter the incoming signals in order to pass the signal from the desired direction while rejecting interference and noise coming from other directions. An adaptive beamformer is a data-dependent beamformer that is able to separate signals that share the same frequency band but are separated in the spatial domain [2]. It automatically optimizes the microphone array pattern by adjusting the element weights until a prescribed objective function is satisfied. The choice of the adaptive algorithm used to derive the weights plays a key role in determining the convergence speed and system complexity. Common adaptive algorithms for beamforming include the Least Mean Squares (LMS), Normalised LMS (NLMS) and Recursive Least Squares (RLS) algorithms, each with its own advantages and disadvantages. A typical adaptive beamformer structure is shown in Fig. 2.2.

Figure 2.2: An adaptive beamformer

The weight vector w is adapted according to the input signal vector x(t). The main aim of the adaptive beamformer is to optimize the system response with respect to a prescribed criterion, so that the output signal y(t) is free from noise and interference. Common optimality criteria for adaptive beamformers are minimum mean square error, maximum signal-to-noise/interference ratio and minimum variance [2].

There are two basic adaptation approaches: block adaptation and continuous adaptation. In block adaptation, statistics are estimated from a temporal block of array data and used in an optimum weight equation. In continuous adaptation, the weights are updated as the input data are sampled, such that the resulting sequence of weight vectors converges to the optimum solution [13].
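As an illustration of continuous adaptation with one of the algorithms named above, the sketch below implements the NLMS update w ← w + μ e u / (ε + u·u) and uses it to identify an unknown FIR system. This is a generic system-identification example rather than a complete beamformer; in a GSC the same update would be driven by the blocking-matrix outputs. The step size μ = 0.5, the regulariser ε, the 3-tap target system and the function name are assumptions.

```python
import random


def nlms(x, d, n_taps, mu=0.5, eps=1e-8):
    """Sample-by-sample NLMS adaptation.

    For each n the regressor u holds x[n], x[n-1], ..., x[n-n_taps+1]; the
    weights move along u by mu * e / (eps + u.u), where e = d[n] - w.u."""
    w = [0.0] * n_taps
    errors = []
    for n in range(len(x)):
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        e = d[n] - sum(wk * uk for wk, uk in zip(w, u))  # a priori error
        step = mu * e / (eps + sum(uk * uk for uk in u))
        w = [wk + step * uk for wk, uk in zip(w, u)]
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.5, -0.3, 0.2]  # hypothetical unknown FIR system
    x = [rng.gauss(0.0, 1.0) for _ in range(3000)]
    d = [sum(target[k] * x[n - k] for k in range(len(target)) if n - k >= 0)
         for n in range(len(x))]
    w, errors = nlms(x, d, n_taps=3)
    print("estimated:", [round(v, 4) for v in w])
    print("final |error|:", abs(errors[-1]))
```

Normalising by u·u makes the effective step size independent of the input power, which is why NLMS is usually preferred over plain LMS for speech, whose level varies strongly over time.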

2.3 Generalized Side-lobe Canceller (GSC)

The GSC method provides a simple way of implementing Linearly Constrained Minimum Variance (LCMV) beamformers. The basic idea behind LCMV beamforming is to constrain the response of the beamformer so that signals from the look direction are passed with a specified gain and phase [13, 15]. The weights are chosen to minimize the output variance subject to this response constraint, which preserves the desired signal while minimizing the strength of the noise and interfering signals.
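For reference, the LCMV problem is commonly written as follows, where R_xx is the array input covariance matrix, C the constraint matrix and f the vector of desired responses (this notation is introduced here for illustration and is not taken from the thesis). With a single constraint C = a, the look-direction steering vector, and f = 1, it reduces to the minimum-variance distortionless-response beamformer.

$$\min_{\mathbf{w}} \; \mathbf{w}^H \mathbf{R}_{xx}\, \mathbf{w} \quad \text{subject to} \quad \mathbf{C}^H \mathbf{w} = \mathbf{f}$$

$$\mathbf{w}_{\text{LCMV}} = \mathbf{R}_{xx}^{-1} \mathbf{C} \left( \mathbf{C}^H \mathbf{R}_{xx}^{-1} \mathbf{C} \right)^{-1} \mathbf{f}$$

The GSC reaches the same optimum without imposing the constraint explicitly: the fixed upper branch satisfies the constraint, and the adaptive lower branch operates only in the subspace orthogonal to it, which is the subspace the blocking matrix provides.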

Figure 2.3: Griffiths-Jim Generalised Side-lobe Canceller


The block structure of the generalised side-lobe canceller is shown in Fig. 2.3 [2]. The GSC separates the adaptive beamformer into two main processing blocks [3]. The first implements a standard fixed beamformer (e.g., a DSB), which extracts the desired signal. The second is a set of adaptive filters that adaptively minimize the output power. A blocking matrix B is used to remove the desired signal from the input of this second block, ensuring that it is the noise power that is minimised.

Blocking Matrix:

The purpose of the blocking matrix is to prevent the desired signal from entering the second stage of the GSC while letting the interference and noise signals through. Assuming the array has been pre-steered so that the desired signal is time-aligned across the microphones, blocking occurs if the elements of each row of the blocking matrix sum to zero [3], i.e., sum(B(i,:)) = 0. The number of columns of the blocking matrix is equal to the number of elements in the array, N. The standard Griffiths-Jim blocking matrix, which has N−1 rows, is given by [2]

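$$\mathbf{B} = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & -1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & \cdots & 0 & 1 & -1 & 0 \\ 0 & \cdots & 0 & 0 & 1 & -1 \end{bmatrix} \qquad (11)$$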

The outputs of the blocking matrix are adaptively filtered and summed to form the final output of the lower block. Ideally this output contains only noise and interference, and it is subtracted from the output of the upper block to give the final result.
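A minimal sketch of Eq. (11), assuming the microphone signals have already been time-aligned towards the look direction: it builds the (N−1)×N Griffiths-Jim matrix, checks the zero-row-sum condition, and applies it to single array snapshots. The function names and the example snapshot values are hypothetical.

```python
def griffiths_jim_blocking_matrix(n_mics):
    """(N-1) x N matrix whose row i has +1 at column i and -1 at column i+1."""
    B = [[0.0] * n_mics for _ in range(n_mics - 1)]
    for i in range(n_mics - 1):
        B[i][i] = 1.0
        B[i][i + 1] = -1.0
    return B


def apply_matrix(B, x):
    """Matrix-vector product B @ x for one time-aligned array snapshot x."""
    return [sum(b * v for b, v in zip(row, x)) for row in B]


if __name__ == "__main__":
    B = griffiths_jim_blocking_matrix(5)
    print("row sums:", [sum(row) for row in B])
    aligned_target = [0.7] * 5                 # desired signal, identical on all channels
    interferer = [0.7, -0.2, 0.4, 0.1, -0.5]   # non-aligned snapshot (made-up values)
    print("target after B:", apply_matrix(B, aligned_target))      # all zeros: blocked
    print("interferer after B:", apply_matrix(B, interferer))      # non-zero: passed on
```

Any component that is identical on all channels, i.e. the time-aligned target, is removed by the differences; if the alignment is imperfect, part of the target leaks through, which is the leakage problem described below.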

In the original GSC method [2], the optimum array parameters are estimated adaptively from the available data. The LMS algorithm was used for the adaptive filtering (LMS GSC) because of its low computational complexity. The GSC is the most widely used adaptive beamformer because of its flexible structure. In practice, however, the GSC structure can distort the desired signal because of signal leakage through the blocking matrix: in many cases the blocking matrix fails to block the target signal completely, and this leakage results in distortion of the desired signal at the final output.

2.4 The Wiener Beamformer

The Wiener beamformer is also referred to as the minimum mean square error (MMSE) beamformer. Its weights are those that minimize the mean square difference between the beamformer output (when all sources are present) and a single reference microphone output (when only the desired signal is present) [5]:

$$\mathbf{W}_{\text{opt}} = \arg\min_{\mathbf{w}} E\left\{ \left| y[n] - s_r[n] \right|^2 \right\}, \qquad r \in \{1, 2, \ldots, N\} \qquad (11)$$

Equation (11) defines the optimal weights. The output y[n] of the beamformer is given by

$$y[n] = \sum_{i=1}^{N} \sum_{j=0}^{L-1} w_i[j]\, x_i[n-j] \qquad (12)$$

where L−1 is the order of the filters, w_i[j], j = 0, 1, ..., L−1, are the filter taps for the ith microphone, N is the number of microphones and x_i[n] is the ith microphone observation.

s_r[n] in Eq. (11) is the observation at the reference microphone r when only the desired signal is present. Assuming that the desired signal and the noise are uncorrelated, the optimal weights that minimize the mean square error between the output and the reference signal are given by [6]

$$\mathbf{w}_{\text{opt}} = \left( \mathbf{R}_{ss} + \mathbf{R}_{nn} \right)^{-1} \mathbf{r}_s \qquad (13)$$

where R_ss and R_nn are the autocorrelation matrices of the signal of interest and of the noise, respectively, given by [5]

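$$\mathbf{R}_{ss} = \begin{bmatrix} \mathbf{R}_{s_1 s_1} & \mathbf{R}_{s_1 s_2} & \cdots & \mathbf{R}_{s_1 s_N} \\ \mathbf{R}_{s_2 s_1} & \mathbf{R}_{s_2 s_2} & \cdots & \mathbf{R}_{s_2 s_N} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{R}_{s_N s_1} & \mathbf{R}_{s_N s_2} & \cdots & \mathbf{R}_{s_N s_N} \end{bmatrix} \qquad (14)$$

$$\mathbf{R}_{nn} = \begin{bmatrix} \mathbf{R}_{n_1 n_1} & \mathbf{R}_{n_1 n_2} & \cdots & \mathbf{R}_{n_1 n_N} \\ \mathbf{R}_{n_2 n_1} & \mathbf{R}_{n_2 n_2} & \cdots & \mathbf{R}_{n_2 n_N} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{R}_{n_N n_1} & \mathbf{R}_{n_N n_2} & \cdots & \mathbf{R}_{n_N n_N} \end{bmatrix} \qquad (15)$$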

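r_s is the cross-correlation vector, given by

$$\mathbf{r}_s = \begin{bmatrix} \mathbf{r}_1^T & \mathbf{r}_2^T & \cdots & \mathbf{r}_N^T \end{bmatrix}^T \qquad (16)$$

where each element is given by

$$r_i[k] = E\left\{ s_i[n-k]\, s_r^*[n] \right\}, \qquad i = 1, 2, \ldots, N, \quad r \in \{1, 2, \ldots, N\}, \quad k = 0, 1, \ldots, L-1 \qquad (17)$$

The cross-correlation vector r_s is the centre column of the autocorrelation matrix of the desired signal, R_ss. The optimal weights w are arranged as

$$\mathbf{w} = \begin{bmatrix} \mathbf{w}_1^T & \mathbf{w}_2^T & \cdots & \mathbf{w}_N^T \end{bmatrix}^T \qquad (18)$$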

The performance of this Wiener beamforming algorithm and the corresponding results are presented in the following sections.
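Once the correlation matrices are available, Eq. (13) is a single linear solve (in MATLAB this would typically be done with the backslash operator). The sketch below, with hypothetical function names, uses a small Gaussian-elimination routine and checks it on a case with a known answer: single-tap filters (L = 1), a desired signal of unit power arriving time-aligned at all N microphones, and spatially white noise of unit power. Then R_ss is an all-ones matrix, R_nn is the identity, r_s is the reference-microphone column of R_ss, and every weight equals 1/(N + 1).

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def wiener_weights(R_ss, R_nn, r_s):
    """w_opt = (R_ss + R_nn)^{-1} r_s, Eq. (13)."""
    R = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(R_ss, R_nn)]
    return solve_linear(R, r_s)


if __name__ == "__main__":
    n = 4
    R_ss = [[1.0] * n for _ in range(n)]                                   # aligned unit-power target
    R_nn = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # spatially white noise
    r_s = [row[0] for row in R_ss]                                         # microphone 1 as reference
    print(wiener_weights(R_ss, R_nn, r_s))
```

For spatially white noise the Wiener weights are simply the DSB weights 1/N scaled by N σ_s² / (N σ_s² + σ_n²), so in this special case the Wiener beamformer is a scaled delay-and-sum beamformer; the two differ once the noise is spatially correlated, for example when it comes from a directional interferer.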

2.5 The Elko Beamformer

Gary W. Elko designed an algorithm for directional microphone arrays [7]. In hands-free communications, background noise has a detrimental effect on the microphone output. By placing a null in the rear half-plane of the microphone array, a significant improvement in the signal-to-noise ratio can be obtained. This is the concept behind adaptive first-order differential microphones, which can be realized using two closely spaced omnidirectional microphones. Any first-order array can be realized as a weighted subtraction of these two outputs [7], and the null angle can be steered by choosing the combination weights appropriately. The Elko algorithm is very easy to implement and has low system complexity. The elements of the differential microphone array are combined with alternating signs so that the directionality can be realized. A differential microphone array is superdirectional, as it has a higher directivity than an omnidirectional microphone array. Therefore, these arrays are particularly suitable for personal communication devices and teleconferencing.

2.5.1 Design

Figure 2.4: (a) Linear equispaced differential microphone array; (b) two-element differential array composed of two omnidirectional microphones and a time delay.

When a plane sound wave s(t) with spectrum S(ω) is incident on a two-element linear array with element spacing d, at an angle θ to the array axis as shown in Fig. 2.4 (b), the wave reaches the two microphones at different times depending on the inter-element distance d. The time difference can be written as

$$\tau = \frac{d\cos(\theta)}{c} \tag{19}$$

where c is the speed of sound (343 m/s). The signal of the microphone that receives the sound first is delayed and subtracted from the other microphone signal to obtain the output y(t) [8]:

$$y(t) = s(t - \tau) - s(t - T) = s\left(t - \frac{d\cos(\theta)}{c}\right) - s(t - T) \tag{20}$$

Transforming Eqn. (20) into the frequency domain gives

$$Y(\omega, \theta) = S(\omega)\left(e^{-j\omega d\cos(\theta)/c} - e^{-j\omega T}\right) \tag{21}$$

The directional response of the array is shown in Fig. 2.5. The null location can be steered by changing the time delay T: to move the null from 90° to 0°, T is changed from 0 to d/c.

Figure 2.5: Directional responses of the two-element array shown in Fig. 2.4 (b): (a) T = 0, (b) T = (d/c)/2, (c) T = d/c
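The null steering described above can be checked numerically. Under Eqn. (21), |Y(ω, θ)| = 2|S(ω)| |sin(ω(d cos θ/c − T)/2)|, so the null lies where cos θ = cT/d: T = 0 gives 90°, T = d/(2c) gives 60° and T = d/c gives 0°. The Python sketch below assumes the 2.14 cm spacing used in Chapter 5 and a 1 kHz tone; the function names are illustrative only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, as in the text


def two_element_response(theta_deg, freq, d, delay):
    """|Y(w, theta)| / |S(w)| for Eqn. (21): exp(-j w d cos(theta) / c) - exp(-j w T)."""
    w = 2.0 * math.pi * freq
    tau = d * math.cos(math.radians(theta_deg)) / SPEED_OF_SOUND
    return abs(cmath.exp(-1j * w * tau) - cmath.exp(-1j * w * delay))


def null_angle(freq, d, delay, step=0.5):
    """Angle in degrees (on a grid from 0 to 180) where the response is smallest."""
    angles = [k * step for k in range(int(180 / step) + 1)]
    return min(angles, key=lambda a: two_element_response(a, freq, d, delay))


d = 0.0214  # metres
for delay in (0.0, d / (2 * SPEED_OF_SOUND), d / SPEED_OF_SOUND):
    print(f"T = {delay * 1e6:6.2f} us -> null at {null_angle(1000.0, d, delay)} deg")
```

Since ω·2d/c stays below 2π at this frequency and spacing, the response has a single null in 0° to 180°, so the grid minimum is the null.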

A cardioid response, shown in Fig. 2.5(c), is easy to obtain with very low computational complexity using the set-up of Fig. 2.4. The conditions for cardioid directivity are 1) a unit sample delay, T = 1, and 2) a sampling period equal to d/c. Combining two back-to-back cardioid microphones yields a first-order differential microphone [7].

Figure 2.6: Implementation of first-order differential microphone using back-to-back cardioids

The back-to-back cardioid arrangement is shown in Fig. 2.6, and the resulting directivity patterns are shown in Fig. 2.7.

Figure 2.7: Directional responses of the back-to-back cardioid arrangement: (a) β = 0; (b) β = 0.383; (c) β = 1

From Fig. 2.6 and Eqn. (20), with T = d/c, the forward and backward cardioid outputs can be written as

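$$C_F(t) = s(t) - s(t - \tau - T) = s(t) - s\left(t - \frac{d}{c} - \frac{d\cos(\theta)}{c}\right) \tag{22}$$

$$C_B(t) = s(t - \tau) - s(t - T) = s\left(t - \frac{d\cos(\theta)}{c}\right) - s\left(t - \frac{d}{c}\right) \tag{23}$$

$$y(t) = C_F(t) - \beta\, C_B(t) \tag{24}$$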

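Transforming Eqn. (24) into the frequency domain gives

$$Y(\omega, \theta) = C_F(\omega, \theta) - \beta\, C_B(\omega, \theta) \tag{25}$$

$$Y(\omega, \theta) = S(\omega)\left[1 - e^{-j\omega d(1+\cos\theta)/c} - \beta\left(e^{-j\omega d\cos\theta/c} - e^{-j\omega d/c}\right)\right] \tag{26}$$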

Here the null location can be steered from 180° to 90° by changing β from 0 to 1, with a unit sample delay (T = 1). The directional responses of the system in Fig. 2.6 for different values of β are plotted in Fig. 2.7.
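A similar numerical check applies to the back-to-back arrangement. The Python sketch below evaluates |Y(ω, θ)|/|S(ω)| from Eqn. (26) on a 0.5° grid and reports the angle of the minimum; with an assumed 1 kHz tone and d = 2.14 cm, β = 0 puts the null at 180°, β = 1 puts it at 90° and β = 0.383 places it in between. For kd ≪ 1 the null approximately satisfies cos θ = (β − 1)/(β + 1), which for β = 0.383 is about 116.5°. The function names are illustrative.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s


def back_to_back_response(theta_deg, freq, d, beta):
    """|Y(w, theta)| / |S(w)| from Eqn. (26), i.e. |C_F - beta * C_B| with T = d/c."""
    w = 2.0 * math.pi * freq
    cos_t = math.cos(math.radians(theta_deg))
    c_f = 1 - cmath.exp(-1j * w * d * (1 + cos_t) / SPEED_OF_SOUND)
    c_b = (cmath.exp(-1j * w * d * cos_t / SPEED_OF_SOUND)
           - cmath.exp(-1j * w * d / SPEED_OF_SOUND))
    return abs(c_f - beta * c_b)


def null_angle(freq, d, beta, step=0.5):
    """Angle in degrees (grid from 0 to 180) where the response is smallest."""
    angles = [k * step for k in range(int(180 / step) + 1)]
    return min(angles, key=lambda a: back_to_back_response(a, freq, d, beta))


for beta in (0.0, 0.383, 1.0):
    print(f"beta = {beta:5.3f} -> null at {null_angle(1000.0, 0.0214, beta)} deg")
```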

To make the system adaptive, a simple LMS algorithm is used to update β in the back-to-back cardioid adaptive first-order differential array. Squaring both sides of Eqn. (24) and differentiating with respect to β gives

$$\frac{d\,y^2(t)}{d\beta} = -2\,y(t)\,C_B(t) \tag{27}$$

Thus, the LMS update of β is

$$\beta_{t+1} = \beta_t + 2\mu\, y(t)\, C_B(t) \tag{28}$$

The normalized LMS (NLMS) update of β is therefore

$$\beta_{t+1} = \beta_t + 2\mu\, \frac{y(t)\, C_B(t)}{\left\langle C_B^2(t) \right\rangle} \tag{29}$$

where µ is the step size and $\left\langle C_B^2(t) \right\rangle$ denotes the time average of $C_B^2(t)$ used to normalize µ.
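The adaptation of Eqn. (29) can be simulated with a minimal Python sketch. It assumes two omnidirectional microphones with the sampling period equal to d/c and T = 1 sample, so a source at 0°, 90° or 180° reaches the second microphone a whole number of samples apart from the first; the desired source is placed at 0° and white Gaussian noise stands in for both speech and interference. The clipping of β to [0, 1], which keeps the null in the rear half-plane, and the exponential averaging used for ⟨C_B²(t)⟩ are assumptions of this sketch rather than details taken from the text; all names are hypothetical.

```python
import random


def two_mic_signals(desired, interferer, interferer_angle):
    """Two omni mics with spacing d = c * (sampling period): whole-sample delays only.
    The desired source is at 0 degrees; the interferer is at 90 or 180 degrees."""
    if interferer_angle not in (90, 180):
        raise ValueError("this sketch only supports interferers at 90 or 180 degrees")
    n = len(desired)
    x1, x2 = [0.0] * n, [0.0] * n
    for t in range(n):
        prev_s = desired[t - 1] if t > 0 else 0.0
        prev_i = interferer[t - 1] if t > 0 else 0.0
        # desired at 0 deg: mic 1 first, mic 2 one sample later
        # interferer at 90 deg: both mics at once; at 180 deg: mic 2 first
        x1[t] = desired[t] + (interferer[t] if interferer_angle == 90 else prev_i)
        x2[t] = prev_s + interferer[t]
    return x1, x2


def adapt_beta(x1, x2, mu=0.0005, alpha=0.99, eps=1e-8):
    """NLMS update of beta (Eqn. (29)) with the cardioids of Eqns. (22)-(23), T = 1 sample."""
    beta, power = 0.5, 1.0
    for t in range(1, len(x1)):
        c_f = x1[t] - x2[t - 1]                              # forward cardioid, null at 180 deg
        c_b = x2[t] - x1[t - 1]                              # backward cardioid, null at 0 deg
        y = c_f - beta * c_b                                 # Eqn. (24)
        power = alpha * power + (1.0 - alpha) * c_b * c_b    # running estimate of <C_B^2>
        beta += 2.0 * mu * y * c_b / (power + eps)
        beta = min(max(beta, 0.0), 1.0)                      # assumption: keep beta in [0, 1]
    return beta


rng = random.Random(0)
num_samples = 30000
speech = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]   # stand-in for the desired talker
noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]    # stand-in for the interferer
beta_90 = adapt_beta(*two_mic_signals(speech, noise, 90))
beta_180 = adapt_beta(*two_mic_signals(speech, noise, 180))
print(f"beta with interferer at 90 deg:  {beta_90:.3f}")
print(f"beta with interferer at 180 deg: {beta_180:.3f}")
```

With the interferer at 90° the update drives β towards 1 (the response with its null at 90°), and with the interferer at 180° it drives β towards 0 (the plain forward cardioid), in line with the null positions of Fig. 2.7.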


3 DESIGN AND SIMULATIONS

A graphical user interface (GUI) was designed in MATLAB to study, demonstrate and experiment with beamforming techniques. A linear equispaced microphone array is used throughout the thesis, and the user can choose the number of elements in the array and the inter-element distance. Two sound sources, s(n) and I(n), recordings of an English-speaking female and an English-speaking male, are used as the desired and interference signals, respectively. The user can also select the incidence angles of the desired and interference sources. A total of eight beamforming algorithms have been implemented: 1) the delay-and-sum beamformer (DSB), 2) the Wiener beamformer (WBF), 3) the DSB-DSB GSC, 4) the WBF-DSB GSC, 5) the DSB-WBF GSC, 6) the WBF-WBF GSC, 7) the Elko beamformer and 8) the Elko-Wiener beamformer. The performance measures used in the GUI are: 1) signal-to-noise ratio (SNR), 2) speech distortion plot, 3) noise distortion plot and 4) perceptual evaluation of speech quality (PESQ).

3.1 Simulation Procedure

Let s(n) and I(n) be the sequences of the desired and interference sound sources, arriving at angles of incidence θ1 and θ2, respectively, with respect to the centre of an N-element linear microphone array. The sound field is composed of plane waves. Let d metres be the inter-element distance; the speed of sound is c = 343 m/s. The arrival time differences of the two speech signals at each microphone in the array are computed, and the delayed signals are added to obtain the microphone response. The delays are determined by the steering direction of the array:

$$\tau_n = \frac{(n-1)\, d\cos(\theta)}{c} \tag{30}$$

where n is the microphone index, 1 ≤ n ≤ N, and τ_n is the time the sound wave takes to travel between the reference sensor and the n-th sensor. Obtaining the microphone array response is the common first step in the simulation of all the beamformers mentioned above.
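Eqn. (30) translates directly into code. The Python sketch below (the name `arrival_delays` is illustrative) returns τ_n in seconds relative to the first microphone. With the Chapter 5 settings (six microphones, 2.14 cm spacing, 16 kHz sampling) and θ = 45°, successive microphones are about 0.71 samples apart, so the simulation needs fractional delays. Because Eqn. (30) depends on θ only through cos θ, the angles θ and −θ produce identical delays for a linear array.

```python
import math


def arrival_delays(num_mics, spacing, theta_deg, c=343.0):
    """tau_n = (n - 1) * d * cos(theta) / c for n = 1..N (Eqn. (30)), in seconds."""
    cos_t = math.cos(math.radians(theta_deg))
    return [(n - 1) * spacing * cos_t / c for n in range(1, num_mics + 1)]


# Chapter 5 settings: 6 microphones, 2.14 cm spacing, 16 kHz sampling rate
fs = 16000.0
taus = arrival_delays(6, 0.0214, 45.0)
taus_in_samples = [t * fs for t in taus]
print([round(v, 3) for v in taus_in_samples])
```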


3.1.1 The Delay-and-sum Beamformer (DSB):

The delay-and-sum beamformer is easy to implement once each microphone response in the array is available. As the name indicates, appropriate delays are inserted after each microphone to compensate for the arrival time differences of the desired speech signal. The delayed outputs are time-aligned signals, which are then added together to obtain the beamformer output [see Fig. 2.2]. The appropriate delays follow from the input delay vector of the desired speech signal, calculated using Eqn. (30): this vector is flipped and each element is applied as the delay of the corresponding microphone. The microphone on which the desired signal impinges first therefore receives the largest delay. This reinforces the desired signal, whose components add in phase, while attenuating the interference, whose components are likely to be out of phase.

Figure 3.1: The delay-and-sum beamforming
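The effect of the compensating delays can be illustrated with a narrowband Python sketch, one frequency at a time rather than full-band speech; the function names are illustrative. It steers a six-element array with 2.14 cm spacing to 45° and evaluates the normalized response for a plane wave from another direction. Because Eqn. (30) depends only on cos θ, the example places the interferer at 135° rather than −45°. Using max(τ) − τ as the compensating delays equals the flipped delay vector when cos θ ≥ 0 and differs from it only by a common offset otherwise, which keeps every delay non-negative.

```python
import cmath
import math


def arrival_delays(num_mics, spacing, theta_deg, c=343.0):
    """Eqn. (30): tau_n = (n - 1) * d * cos(theta) / c, n = 1..N."""
    cos_t = math.cos(math.radians(theta_deg))
    return [(n - 1) * spacing * cos_t / c for n in range(1, num_mics + 1)]


def dsb_gain(freq, num_mics, spacing, look_deg, source_deg, c=343.0):
    """Normalized magnitude response of a delay-and-sum beamformer steered to look_deg,
    for a plane wave at frequency freq arriving from source_deg."""
    look = arrival_delays(num_mics, spacing, look_deg, c)
    comp = [max(look) - t for t in look]          # compensating (flipped) delays
    arrival = arrival_delays(num_mics, spacing, source_deg, c)
    w = 2.0 * math.pi * freq
    total = sum(cmath.exp(-1j * w * (a + b)) for a, b in zip(arrival, comp))
    return abs(total) / num_mics


print(dsb_gain(2000.0, 6, 0.0214, 45.0, 45.0))    # look direction
print(dsb_gain(2000.0, 6, 0.0214, 45.0, 135.0))   # interferer direction
```

In the look direction all delayed components add in phase and the normalized gain is 1; for the interferer the components add with different phases and are strongly attenuated at 2 kHz, whereas at low frequencies the attenuation is much weaker because the array aperture is small compared with the wavelength.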


3.1.2 The Wiener Beamformer (WBF):

The simulation procedure for Wiener beamforming starts before the desired and interference signals are added at each microphone to obtain the microphone response. The desired and interference signals at each microphone are kept separate and represented as matrices, M_S and M_I. Let the total number of samples of the speech and interference signals at each microphone be S. The desired and interference matrices are therefore of size N × S, as shown in Fig. 3.2, and each row represents the speech or interference response of the corresponding microphone.

Figure 3.2: Desired and Interference Matrices for Wiener Beamforming

Recall that Eqn. (13) gives the optimum weights that minimize the mean square error between the output and the reference signal. It requires the autocorrelation matrices of the desired and interference signals as well as the cross-correlation vector. These can be obtained from M_S and M_I with the method I used when implementing the Wiener beamformer. Let the order of the Wiener filter be 64, and consider an empty matrix X of size 64N × K, where K is the integer part of S/64. Take the first 64 columns of M_S, reshape this N × 64 matrix into a single column vector x1 and use it as the first column of X. Then take the next 64 columns (65 to 128) of M_S, convert them into a single column vector x2 and use it as the second column of X. Continue to the end of M_S to obtain X. Following the same method with M_I gives the matrix Y. This construction of X and Y is shown in Fig. 3.3.

Figure 3.3: Pictorial form of the method followed

Once X and Y are available, the autocorrelation matrices of the desired signal s(n), Rss, and of the interference signal I(n), Rnn, are calculated as

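$$\mathbf{R}_{ss} = \frac{1}{K}\sum_{i=1}^{K} \mathbf{x}_i \mathbf{x}_i^T \tag{31}$$

$$\mathbf{R}_{nn} = \frac{1}{K}\sum_{i=1}^{K} \mathbf{y}_i \mathbf{y}_i^T \tag{32}$$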
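The reshaping of Fig. 3.3 and the averages in Eqns. (31)-(32) can be sketched in plain Python as follows. The sketch assumes that each column vector lists the L samples of the first microphone, then those of the second, and so on, so that consecutive groups of L entries belong to one microphone. MATLAB's `(:)` operator reads a matrix column by column, which would interleave the microphones instead, so the ordering used in the GUI may differ. The helper names are hypothetical, and trailing samples beyond K·L are discarded, as in the text.

```python
def stack_blocks(m, block_len):
    """Reshape an N x S matrix (list of N rows) into K = S // block_len column vectors
    of length N * block_len (Fig. 3.3). Assumed ordering: the block of row 1,
    then the block of row 2, and so on."""
    num_blocks = len(m[0]) // block_len          # K = integer part of S / L
    columns = []
    for k in range(num_blocks):
        start = k * block_len
        col = []
        for row in m:
            col.extend(row[start:start + block_len])
        columns.append(col)
    return columns


def autocorrelation(columns):
    """R = (1/K) * sum_i x_i x_i^T, as in Eqns. (31)-(32)."""
    size = len(columns[0])
    r = [[0.0] * size for _ in range(size)]
    for x in columns:
        for a in range(size):
            for b in range(size):
                r[a][b] += x[a] * x[b]
    num_blocks = len(columns)
    return [[value / num_blocks for value in row] for row in r]


def centre_column(r):
    """Centre column of R, used as the cross-correlation vector r_s."""
    mid = len(r) // 2
    return [row[mid] for row in r]
```

For a 64-tap filter and six microphones, R is 384 × 384, so a vectorized product such as `X @ X.T / K` in NumPy would be the practical choice; the loops above only make the arithmetic explicit.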

The cross-correlation vector r_s of Eqn. (13) is given by the centre column of the autocorrelation matrix Rss. If the user wants to extract the interference instead, the cross-correlation vector is chosen as the centre column of Rnn. All the quantities in Eqn. (13) are now available to calculate the optimum weight vector W_opt, which in this case has size 64N × 1. Take the first 64 elements of W_opt, flip them upside down and call the result w_N. Then take the next 64 elements of W_opt, i.e. elements 65 to 128, flip them and call the result w_{N−1}. Continue in this way to the end of W_opt, i.e. until w_1 is obtained. These individual weight vectors are used to filter the N microphone responses, and adding all the filtered outputs gives the Wiener beamformer output s'(n), as shown in Fig. 3.4.

Figure 3.4: Wiener Beamformer Structure
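The splitting and flipping step can be written as a short Python function (the name `split_and_flip` is hypothetical). It follows the text literally: the k-th block of 64 (in general L) coefficients, reversed, becomes the filter of microphone N − k for k = 0, 1, .... The reversal presumably converts weights that multiply samples in increasing time order into FIR taps w_i[0], w_i[1], ... that multiply x_i[n], x_i[n−1], ... as in Eqn. (12); whether the first block belongs to microphone N depends on the row order of M_S.

```python
def split_and_flip(w_opt, num_mics, taps):
    """Split the stacked Wiener weights into per-microphone FIR filters (Sec. 3.1.2).
    Block k (k = 0, 1, ...) of `taps` coefficients is reversed and assigned to
    microphone N - k, so the first block becomes w_N and the last becomes w_1."""
    if len(w_opt) != num_mics * taps:
        raise ValueError("w_opt must contain num_mics * taps coefficients")
    filters = [None] * num_mics
    for k in range(num_mics):
        block = list(w_opt[k * taps:(k + 1) * taps])
        filters[num_mics - 1 - k] = block[::-1]   # list index of microphone N - k
    return filters
```

The resulting filters are then applied to the corresponding microphone responses and the outputs are summed, as in Eqn. (12).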


3.1.3 Generalized Side-lobe Canceller (GSC):

In this thesis, a few algorithms based on the GSC structure discussed in Section 2.3 are implemented, and the performance of these NLMS-based GSC realizations is illustrated in the context of beamforming applications. The major problem with the traditional GSC [2] is the sensitivity of the blocking matrix, which leads to target signal leakage and thereby to cancellation of the desired signal at the final output. This suggests using another beamformer in the second path of the GSC structure instead of a blocking matrix. Based on this idea, I implemented four kinds of beamformer structures using the DSB and the WBF. The structure of the beamformer with N microphones is shown in Fig. 3.5.

Figure 3.5: Novel GSC structure

Therefore, the four implemented beamformers are the DSB-DSB GSC, the DSB-

WBF GSC, the WBF-DSB GSC and the WBF-WBF GSC.


3.1.4 The DSB-DSB GSC:

The DSB-DSB GSC structure with N microphones is shown in Fig. 3.6. It includes two delay-and-sum beamformers, one in the upper path and the other in the lower path, and an adaptive filter. The upper-path DSB enhances the target signal; let its output be d1(n), where n is the sample index. The lower-path DSB enhances the interference signal and rejects the target signal; let its output be d2(n). The NLMS adaptive filter adaptively subtracts the components correlated with d2(n) from d1(n), giving the final output E(n). Here, the lower DSB acts as the blocking matrix. If the number of sound sources increases, the number of lower paths in the GSC structure increases accordingly, each lower path consisting of a beamformer and an adaptive filter. All the adaptive filter outputs are summed before being subtracted from the upper beamformer output d1(n).

Figure 3.6: The structure of DSB-DSB GSC
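The adaptive stage shared by all four GSC variants subtracts from d1(n) whatever part of it can be predicted from recent samples of d2(n). A minimal NLMS sketch in Python follows; the function name and the synthetic test signals are hypothetical. Chapter 5 uses a filter order of 20 and µ = 0.0001, whereas this toy example uses a shorter filter and a larger step size so that it converges within a few thousand samples.

```python
import random


def nlms_cancel(d1, d2, order=20, mu=0.1, eps=1e-8):
    """Return (E, w): E(n) = d1(n) - w^T u(n), with u(n) = [d2(n), ..., d2(n - order + 1)],
    and w updated by the normalized LMS rule after every sample."""
    w = [0.0] * order
    u = [0.0] * order
    e_out = []
    for a, b in zip(d1, d2):
        u = [b] + u[:-1]                                   # newest reference sample first
        estimate = sum(wk * uk for wk, uk in zip(w, u))    # part of d1 predicted from d2
        e = a - estimate
        norm = eps + sum(uk * uk for uk in u)
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
        e_out.append(e)
    return e_out, w


# Hypothetical test signals: d1 = target + filtered interference, d2 = interference only
rng = random.Random(1)
n = 10000
v = [rng.gauss(0.0, 1.0) for _ in range(n)]            # interference estimate (lower path)
s = [0.3 * rng.gauss(0.0, 1.0) for _ in range(n)]      # target component (upper path)
d1 = [s[t] + 0.8 * v[t] + 0.4 * (v[t - 1] if t > 0 else 0.0) for t in range(n)]
d2 = v
e_out, weights = nlms_cancel(d1, d2, order=4, mu=0.05)
print([round(wk, 2) for wk in weights])
```

After convergence the weights approach the leakage path [0.8, 0.4, 0, 0] and E(n) is close to the target component, which is the behaviour the GSC relies on when d2(n) contains little of the desired signal.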


3.1.5 The DSB-WBF GSC:

Target signal leakage can be expected from the DSB in the above method, especially in the lower path, and it leads to cancellation of the desired signal at the final output. Replacing the DSB in the lower path with a Wiener beamformer (WBF) leads to the DSB-WBF GSC, whose structure with N microphones and two sound sources is shown in Fig. 3.7. In this method, the DSB enhances the desired signal s(n); let its output be d1(n). The WBF in the lower path acts as a blocking matrix that extracts the interference signal and rejects the desired signal, so its output d2(n) is largely free from leakage problems. The NLMS adaptive filter adaptively subtracts the components correlated with d2(n) from d1(n), giving the final output E(n). Better signal enhancement is observed at the output compared to the DSB-DSB GSC.

Figure 3.7: Structure of the DSB-WBF GSC


3.1.6 The WBF-DSB GSC:

Swapping the beamformer sections of the above method places a Wiener beamformer in the upper path to enhance the desired signal and a delay-and-sum beamformer in the lower path to obtain the interference signal. As before, d1(n) and d2(n) are the outputs of the WBF and the DSB, respectively. The structure of the WBF-DSB GSC with N microphones and two sound sources is shown in Fig. 3.8. The NLMS adaptive filter adaptively subtracts the components correlated with d2(n) from d1(n), giving the final output E(n). A better signal-to-noise ratio is observed at the output than with the DSB-WBF GSC, owing to the robust performance of the WBF in the upper path.

Figure 3.8: Structure of the WBF-DSB GSC


3.1.7 The WBF-WBF GSC:

If there is no signal leakage in either the upper or the lower path of the GSC structure, the structure achieves the best signal-to-noise ratio. The WBF-WBF GSC aims at this and builds on the Wiener beamformer. It comprises two signal-flow paths: a WBF in the upper path enhances the target signal, while another WBF in the lower path estimates the interference signal. Let the outputs of the upper and lower path beamformers be d1(n) and d2(n), respectively. The interference estimate d2(n) is adaptively filtered by the NLMS filter and subtracted from the target estimate d1(n) to combine the two paths. The WBF-WBF GSC structure with an N-microphone array is shown in Fig. 3.9.

Figure 3.9: Structure of the WBF-WBF GSC beamformer.


3.1.8 The Elko Beamformer:

The design of the Elko beamformer was discussed in Section 2.5. The first implementation steps, up to obtaining the microphone array response, are the same as described in Section 3.1. Implementing the Elko beamformer then requires an alternating arrangement of forward- and backward-facing cardioids, with unit delays after each cardioid. Since two speech recordings are used throughout the thesis and the Elko beamformer is intended to suppress background sound sources, the desired speech is allowed to impinge from the forward direction (−90° to 90°) and the interference from the backward direction (−90° to −180° or 90° to 180°). After choosing the directions of arrival of the sound sources, adjacent pairs of microphones are considered and the angles of arrival at each microphone pair (θ1 and θ2 for the desired and interference sources) are found. These angles are used to obtain the forward and backward cardioid data:

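$$C_F(n) = S(n)\,\sin\left(\frac{kd\,(1+\cos\theta_1)}{2}\right) + I(n)\,\sin\left(\frac{kd\,(1+\cos\theta_2)}{2}\right) \tag{30}$$

$$C_B(n) = S(n)\,\sin\left(\frac{kd\,(1-\cos\theta_1)}{2}\right) + I(n)\,\sin\left(\frac{kd\,(1-\cos\theta_2)}{2}\right) \tag{31}$$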

where k = ω/c is the wave number, and θ1 and θ2 are the angles of arrival of the desired and interference signals at the selected pair of microphones.
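Treating Eqns. (30)-(31) as narrowband gains at a single frequency, one can compute the β that cancels the interferer for a given microphone pair. The Python sketch below uses the Chapter 5 geometry (2.14 cm spacing, desired source at 45°, interference at −135°) and an assumed 1 kHz analysis frequency; the function names are illustrative. For small kd the result approaches (1 + cos θ2)/(1 − cos θ2), about 0.17 for θ2 = −135°.

```python
import math


def cardioid_gains(theta_deg, freq, spacing, c=343.0):
    """Forward and backward cardioid gains sin(kd(1 +/- cos(theta))/2), Eqns. (30)-(31)."""
    k = 2.0 * math.pi * freq / c                 # wave number k = omega / c
    cos_t = math.cos(math.radians(theta_deg))
    g_forward = math.sin(k * spacing * (1.0 + cos_t) / 2.0)
    g_backward = math.sin(k * spacing * (1.0 - cos_t) / 2.0)
    return g_forward, g_backward


def beta_for_null(interf_deg, freq, spacing, c=343.0):
    """beta that removes the interferer from C_F - beta * C_B at one frequency."""
    g_f, g_b = cardioid_gains(interf_deg, freq, spacing, c)
    return g_f / g_b


# Chapter 5 geometry; the 1 kHz analysis frequency is an assumption
spacing, freq = 0.0214, 1000.0
beta = beta_for_null(-135.0, freq, spacing)
gf_s, gb_s = cardioid_gains(45.0, freq, spacing)      # desired source at 45 degrees
gf_i, gb_i = cardioid_gains(-135.0, freq, spacing)    # interferer at -135 degrees
desired_gain = gf_s - beta * gb_s
residual_interference = gf_i - beta * gb_i
print(f"beta = {beta:.4f}, desired gain = {desired_gain:.4f}")
```

The desired source, lying in the front half-plane, keeps most of its forward-cardioid gain, while the interferer in the rear half-plane is cancelled at this frequency.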

After obtaining the cardioid responses, Eqns. (24) and (29) give the Elko beamformer output y1(n) for the first pair of microphones. The Elko beamformer outputs for all the other microphone pairs in the array are obtained in the same way:

$$y(n) = y_1(n) + y_2(n) + \cdots + y_m(n) \tag{24}$$

Summing all the outputs gives the first-order differential microphone array response. If the number of microphones in the array is odd, overlapping pairs of forward- and backward-facing microphones are taken: if the first pair consists of a forward and a backward cardioid, the next pair consists of the backward cardioid from the previous pair and the next adjacent forward cardioid.

3.1.9 The Elko-Wiener Beamformer:

The Elko-Wiener beamformer builds on the Elko beamformer. It comprises two sections: the Elko beamformer, which minimizes the microphone output power by placing the single first-order null in the rear half-plane, and a standard Wiener beamformer, which enhances the desired speech signal in the Elko beamformer output. The structure of the N-microphone Elko-Wiener beamformer is shown in Fig. 3.10.

Figure 3.10: Structure of the Elko-Wiener Beamformer

Because of the Wiener beamforming stage, the intelligibility of the desired speech signal at the output is expected to be considerably higher than with the Elko beamformer alone. If the number of microphones in the array is odd, the microphone pairs used in the Elko algorithm are [(1, 2), (2, 3), (3, 4), ..., (N−2, N−1), (N−1, N)].

4 PERFORMANCE METRICS

A speech performance metric measures the performance of a hands-free speech enhancement system with respect to the quality of the speech it reproduces. An ideal speech communication system should produce the same perceived sound impression at the listener's side as in a situation where a speaker and a listener communicate at close range in a quiet anechoic chamber [9]. A system performing noise or interference suppression should preserve the quality of the speech, so evaluating speech enhancement systems requires measures of the overall quality of the received speech [10].

The objective measures I used in this thesis to evaluate the beamforming algorithms are: 1) the signal-to-noise ratio (SNR) at the input and output of each system, with output SNR versus input SNR graphs plotted where appropriate; 2) measures of the speech and noise spectral distortion caused by the beamforming filters; and 3) Perceptual Evaluation of Speech Quality (PESQ), an ITU-T standard. These metrics have the advantage of being repeatable and efficiently computable.

To compute the performance metrics, the system with a particular beamforming algorithm is first run with both sound sources (desired and interference speech) present. Once the system has converged, all the beamforming filter weights W are saved. The interference source is then disabled, only the desired speech signal is passed through the saved system, and the response is stored. Next, the desired source is disabled, only the interference is passed through the saved system, and that response is stored. The performance measures are computed from these two individual responses.

4.1 Signal-to-Noise Ratio

This metric measures the dominance of the speech signal over the noise or interference. Although no noise is added in this thesis, the interference is treated as noise, so the more familiar term signal-to-noise ratio (SNR) is used instead of signal-to-interference ratio (SIR). The input SNR is the ratio of the variance of the desired speech signal (E_Si) to the variance of the interference signal (E_Ni) at a particular reference microphone. Similarly, the output SNR is the ratio of the variance of the output when only desired speech is applied (E_So) to the variance of the output when only interference is applied (E_No).

$$SNR_{IN}\ (\text{dB}) = 10\log_{10}\left(\frac{E_{Si}}{E_{Ni}}\right) \tag{32}$$

$$SNR_{OUT}\ (\text{dB}) = 10\log_{10}\left(\frac{E_{So}}{E_{No}}\right) \tag{33}$$

The difference between the output and input SNR values therefore gives the SNR improvement (SNRI) of the beamforming algorithm:

$$SNRI = SNR_{OUT} - SNR_{IN} \tag{34}$$
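Eqns. (32)-(34) need only the four responses obtained with the frozen weights, as described at the start of this chapter. A minimal Python sketch using the population variance from the standard library follows; the function names and the toy signals are illustrative.

```python
import math
from statistics import pvariance


def snr_db(signal, noise):
    """10 * log10(var(signal) / var(noise)), as in Eqns. (32)-(33)."""
    return 10.0 * math.log10(pvariance(signal) / pvariance(noise))


def snr_improvement(s_in, n_in, s_out, n_out):
    """Return (SNR_IN, SNR_OUT, SNRI) in dB, Eqn. (34).
    s_in, n_in: desired-only and interference-only signals at the reference microphone;
    s_out, n_out: beamformer outputs for desired-only and interference-only inputs,
    computed with the converged weights frozen."""
    snr_in = snr_db(s_in, n_in)
    snr_out = snr_db(s_out, n_out)
    return snr_in, snr_out, snr_out - snr_in


# Toy example: the frozen beamformer passes the desired signal unchanged and
# scales the interference by 0.1, i.e. 20 dB of suppression.
s_ref = [1.0, -1.0] * 50
n_ref = [0.5, -0.5] * 50
s_out = list(s_ref)
n_out = [0.1 * x for x in n_ref]
snr_in, snr_out, snri = snr_improvement(s_ref, n_ref, s_out, n_out)
print(f"SNR_IN = {snr_in:.2f} dB, SNR_OUT = {snr_out:.2f} dB, SNRI = {snri:.2f} dB")
```

Because the beamformer is linear once its weights are frozen, scaling the interference by a constant changes SNR_IN and SNR_OUT by the same number of decibels. A fixed beamformer such as the DSB therefore yields the same SNRI at every input SNR.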

4.2 Normalized Speech Distortion

The normalized speech distortion (NSD) is defined as the deviation between the power spectral density (PSD) of the clean speech at the reference microphone and that of the processed speech signal at the output. To establish a power reference level, the enhanced output signal is normalized by the target speech gain of the beamforming structure [10]. The power normalization constant C_S is computed as

$$C_S = \frac{\operatorname{mean}\left(S(\Omega)\right)}{\operatorname{mean}\left(Y_S(\Omega)\right)} \tag{35}$$

where S(Ω) is the PSD of s(n) and Y_S(Ω) is the PSD of y_S(n), the output obtained when only the desired speech is applied.

$$Y_{S1}(\Omega) = C_S \cdot Y_S(\Omega) \tag{36}$$

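The speech distortion is then given by

$$NSD = 10\log_{10}\left(\frac{\operatorname{abs}\left(Y_{S1}(\Omega) - S(\Omega)\right)}{\operatorname{abs}\left(S(\Omega)\right)}\right) \tag{37}$$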

where abs(·) denotes the absolute value.
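A sketch of Eqns. (35)-(37) in Python follows; it applies equally to the noise distortion of Eqns. (38)-(40) when the interference PSDs are passed instead. Two points are assumptions: the PSDs are taken as given on a common frequency grid (the estimator, for example Welch's method, is not specified here), and since Eqn. (37) is written per frequency while the tables in Chapter 5 report one value per condition, the summary value uses the ratio of the summed deviations to the summed reference PSD. The function name is illustrative.

```python
import math


def normalized_distortion(ref_psd, out_psd, eps=1e-20):
    """NSD (Eqns. (35)-(37)) or NND (Eqns. (38)-(40)).

    ref_psd: PSD of the clean signal at the reference microphone, S(Omega) or I(Omega)
    out_psd: PSD of the corresponding beamformer output, Y_S(Omega) or Y_I(Omega)
    Returns (per-bin distortion in dB, single summary value in dB)."""
    if len(ref_psd) != len(out_psd):
        raise ValueError("PSDs must share the same frequency grid")
    scale = (sum(ref_psd) / len(ref_psd)) / (sum(out_psd) / len(out_psd))   # Eqn. (35)
    normalized = [scale * y for y in out_psd]                                 # Eqn. (36)
    diffs = [abs(y - s) for y, s in zip(normalized, ref_psd)]
    per_bin = [10.0 * math.log10((dv + eps) / (abs(s) + eps))                 # Eqn. (37)
               for dv, s in zip(diffs, ref_psd)]
    summary = 10.0 * math.log10((sum(diffs) + eps) / (sum(abs(s) for s in ref_psd) + eps))
    return per_bin, summary
```

A pure gain change between input and output is removed by the normalization constant, so it produces no distortion; only changes in spectral shape are counted.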


4.3 Normalized Noise Distortion

Normalized noise distortion (NND) is similar to NSD and is defined as the deviation between the PSD of the clean interference signal at the reference microphone and that of the processed interference signal at the output. To establish a power reference level, the output interference signal is normalized by the interference gain of the beamforming structure [10]. The power normalization constant C_I is computed as

$$C_I = \frac{\operatorname{mean}\left(I(\Omega)\right)}{\operatorname{mean}\left(Y_I(\Omega)\right)} \tag{38}$$

where I(Ω) is the PSD of I(n) and Y_I(Ω) is the PSD of y_I(n), the output obtained when only the interference is applied.

$$Y_{I1}(\Omega) = C_I \cdot Y_I(\Omega) \tag{39}$$

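The noise distortion is then given by

$$NND = 10\log_{10}\left(\frac{\operatorname{abs}\left(Y_{I1}(\Omega) - I(\Omega)\right)}{\operatorname{abs}\left(I(\Omega)\right)}\right) \tag{40}$$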

where abs(·) denotes the absolute value.

4.4 Perceptual Evaluation of Speech Quality (PESQ)

Subjective assessment of the speech quality of a transmission system is always a lengthy and expensive process. PESQ is an objective method recommended by the International Telecommunication Union as ITU-T P.862. It predicts the results of subjective listening tests, following the procedure described in the PESQ application guide, ITU-T P.862.3 [12].

Let X(n) be the original (unprocessed) signal and Y(n) the processed signal. The PESQ tool then estimates the quality of Y(n) using a model calibrated against a large database of subjective tests. The final PESQ score is a linear combination of the average disturbance value and the average asymmetrical disturbance value [12]. The perceived listening quality is expressed as a Mean Opinion Score (MOS), an average quality score over a large set of subjects. In most cases the output lies between 1.0 and 4.5. Most of the subjective experiments used in the development of PESQ used the Absolute Category Rating (ACR) opinion scale shown in Table 4.1.

PESQ has the advantage of being rapid and repeatable, and it requires little computational time.

Table 4.1: Quality opinion scale used in the development of PESQ

| Quality of the Speech | Score |
|---|---|
| Excellent | 4.5 |
| Good | 4 |
| Fair | 3 |
| Poor | 2 |
| Bad | 1 |

5 SIMULATION RESULTS

To evaluate the beamformers described in Chapter 3, I used a six-element linear microphone array. The distance between adjacent microphones was 2.14 cm and the sampling rate was 16 kHz. For all beamformers except the two Elko beamformers, the direction of arrival (DOA) of the target signal was set at 45° and that of the interference at −45°. All the beamformers were evaluated in a computer-simulated anechoic environment using the performance metrics described in Chapter 4. The normalized least mean square (NLMS) algorithm was used for adaptive filtering wherever applicable, with a filter order of 20 and a step size (µ) of 0.0001. Two sound sources were used in the simulation: a recording of a female English speaker as the desired source and a recording of a male English speaker as the interference. The simulation set-up is shown in Fig. 5.1.

Figure 5.1: Simulation set-up in an anechoic chamber


5.1 Delay-and-Sum Beamformer

Figure 5.2: The Delay and Sum Beamformer

The output obtained with the DSB is shown along with its input in Fig. 5.3, which shows that the signal was enhanced. A constant SNR improvement of 4.1984 dB was observed with the DSB for all input SNR conditions, as shown in Table 5.1 and Fig. 5.4. Table 5.2 gives the PESQ scores, which show the improvement in the desired speech quality at the output. The speech and noise distortion plots are shown in Fig. 5.5 and Fig. 5.6, respectively.

Table 5.1: Delay and Sum Beamformer Observation

The DSB results, in terms of the performance metrics discussed in Chapter 4, are presented below.

| Input SNR (dB) | SNRI (dB) | Speech Distortion (dB) | Noise Distortion (dB) |
|---|---|---|---|
| 0 | 4.1984 | -45.7430 | -32.8376 |
| 5 | 4.1984 | -45.7430 | -32.8376 |
| 10 | 4.1984 | -45.7430 | -32.8376 |
| 15 | 4.1984 | -45.7430 | -32.8376 |
| 20 | 4.1984 | -45.7430 | -32.8376 |


Figure 5.3: Signal plots before and after DSB

Figure 5.4: DSB SNR plots


Figure 5.5: DSB Speech Distortion plots

Figure 5.6: DSB Noise Distortion plot


Figure 5.7: DSB PESQ MOS plot

Table 5.2: DSB PESQ MOS Values

| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.574 | 2.179 |
| 5 | 1.883 | 2.504 |
| 10 | 2.215 | 2.832 |
| 15 | 2.543 | 3.140 |
| 20 | 2.865 | 3.453 |


5.2 Wiener Beamformer

Figure 5.8: The Wiener Beamformer

The input and output signals of the WBF are shown in Fig. 5.9, which indicates the strong performance of the WBF. The SNR improvements at various input SNR levels are given in Table 5.3; an average SNRI of about 13 dB was observed with the WBF, as shown in Fig. 5.10. The PESQ MOS scores in Table 5.4 show that the quality of the desired speech signal at the output was good; the PESQ scores for different input conditions are shown in Fig. 5.13. The speech and noise distortion plots of the WBF are shown in Fig. 5.11 and Fig. 5.12, respectively.

Table 5.3: Wiener Beamformer Observation

The WBF results, in terms of the performance metrics discussed in Chapter 4, are presented below.

| Input SNR (dB) | SNRI (dB) | Speech Distortion (dB) | Noise Distortion (dB) |
|---|---|---|---|
| 0 | 18.7719 | -40.5616 | -28.3315 |
| 5 | 14.1242 | -42.6910 | -28.3316 |
| 10 | 11.9452 | -46.2568 | -28.3316 |
| 15 | 11.1301 | -50.6987 | -28.3316 |
| 20 | 10.8564 | -55.4347 | -28.3316 |


Figure 5.9: Signal plots before and after WBF

Figure 5.10: WBF SNR plot


Figure 5.11: WBF Speech Distortion plot

Figure 5.12: WBF Noise Distortion plot


Figure 5.13: WBF PESQ MOS plot

Table 5.4: WBF PESQ MOS Values

| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.572 | 3.241 |
| 5 | 1.888 | 3.262 |
| 10 | 2.223 | 3.440 |
| 15 | 2.548 | 3.709 |
| 20 | 2.862 | 4.012 |


5.3 DSB – DSB GSC

Figure 5.14: DSB – DSB GSC

The input-output signal comparison of the DSB-DSB GSC is shown in Fig. 5.15. As shown in Table 5.5 and Fig. 5.16, an almost constant SNR improvement of about 5.3 dB was observed with the DSB-DSB GSC, roughly 1.1 to 1.2 dB more than with the plain DSB. The speech and noise distortion plots are shown in Fig. 5.17 and Fig. 5.18, respectively. PESQ MOS scores for different input SNR conditions are shown in Fig. 5.19 and Table 5.6; they show a slight improvement over the DSB.

Table 5.5: DSB-DSB GSC Observation

The DSB-DSB GSC results, in terms of the performance metrics discussed in Chapter 4, are presented below.

| Input SNR (dB) | SNRI (dB) | Speech Distortion (dB) | Noise Distortion (dB) |
|---|---|---|---|
| 0 | 5.3421 | -44.6611 | -33.0241 |
| 5 | 5.3384 | -44.5430 | -33.0058 |
| 10 | 5.3501 | -44.4329 | -33.0388 |
| 15 | 5.3614 | -44.3435 | -33.0733 |
| 20 | 5.3825 | -44.2812 | -33.0987 |


Figure 5.15: Signal plots before and after DSB-DSB GSC

Figure 5.16: DSB-DSB GSC SNR plots


Figure 5.17: DSB-DSB GSC Speech Distortion plots

Figure 5.18: DSB-DSB GSC Noise Distortion plots


Figure 5.19: DSB-DSB GSC PESQ MOS plots

Table 5.6: DSB-DSB GSC PESQ MOS Values

| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.574 | 2.188 |
| 5 | 1.883 | 2.514 |
| 10 | 2.215 | 2.847 |
| 15 | 2.543 | 3.160 |
| 20 | 2.865 | 3.473 |


5.4 DSB–WBF GSC

Figure 5.20: DSB–WBF GSC

The output obtained using the DSB-WBF GSC is plotted along with its input in Fig. 5.21. Table 5.7 and Fig. 5.22 show the SNR improvement at various input SNR levels; improvements between about 11.9 dB and 17.1 dB, about 13.5 dB on average, were observed with the DSB-WBF GSC. The better performance of the DSB-WBF GSC compared to the DSB-DSB GSC is due to the WBF in the second stage of the GSC, which performs well with very little leakage of the desired speech signal. The PESQ scores are shown in Fig. 5.25 and Table 5.8. The speech and noise distortion plots of the DSB-WBF GSC are shown in Fig. 5.23 and Fig. 5.24, respectively.

Table 5.7: DSB-WBF GSC Observation

The DSB-WBF GSC results, in terms of the performance metrics discussed in Chapter 4, are presented below.

| Input SNR (dB) | SNRI (dB) | Speech Distortion (dB) | Noise Distortion (dB) |
|---|---|---|---|
| 0 | 11.9192 | -34.5783 | -31.3902 |
| 5 | 11.9996 | -38.1477 | -31.5655 |
| 10 | 13.6731 | -42.1718 | -31.7685 |
| 15 | 17.0847 | -45.0234 | -32.0713 |
| 20 | 12.8110 | -46.1565 | -33.4017 |


Figure 5.21: Signal plots before and after DSB-WBF GSC

Figure 5.22: DSB-WBF GSC SNR plot


Figure 5.23: DSB-WBF GSC Speech Distortion plot

Figure 5.24: DSB-WBF GSC Noise Distortion plot


Figure 5.25: DSB-WBF GSC PESQ MOS plot

Table 5.8: DSB-WBF GSC PESQ MOS Values

| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.579 | 2.197 |
| 5 | 1.883 | 2.528 |
| 10 | 2.215 | 2.866 |
| 15 | 2.543 | 3.192 |
| 20 | 2.865 | 3.596 |


5.5 WBF-DSB GSC

Figure 5.26: WBF-DSB GSC

The input and output signals of the WBF-DSB GSC are plotted in Fig. 5.27. The SNR improvement of the WBF-DSB GSC at different input SNR levels is shown in Fig. 5.28 and Table 5.9; the values are slightly lower than those of the WBF. The PESQ scores given in Fig. 5.31 and Table 5.10 indicate good signal quality, but again slightly lower than with the WBF. This drop is attributed to the poorer performance of the DSB in the second stage of the GSC, which leads to partial cancellation of the desired speech at the output. The speech and noise distortion curves of the WBF-DSB GSC are shown in Fig. 5.29 and Fig. 5.30, respectively.

Table 5.9: WBF-DSB GSC Observation

The WBF-DSB GSC results, in terms of the performance metrics discussed in Chapter 4, are presented below.

| Input SNR (dB) | SNRI (dB) | Speech Distortion (dB) | Noise Distortion (dB) |
|---|---|---|---|
| 0 | 18.3303 | -40.4923 | -29.0132 |
| 5 | 14.0236 | -41.6520 | -28.7313 |
| 10 | 11.9201 | -41.8120 | -28.7319 |
| 15 | 11.1123 | -41.6466 | -28.8029 |
| 20 | 10.7507 | -41.4458 | -28.9159 |


Figure 5.27: Signal plots before and after WBF-DSB GSC

Figure 5.28: WBF-DSB GSC SNR plot


Figure 5.29: WBF-DSB GSC Speech Distortion plot

Figure 5.30: WBF-DSB GSC Noise Distortion plot


Figure 5.31: WBF-DSB GSC PESQ MOS plot

Table 5.10: WBF-DSB GSC PESQ MOS Values

| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.575 | 3.107 |
| 5 | 1.885 | 3.177 |
| 10 | 2.224 | 3.361 |
| 15 | 2.548 | 3.609 |
| 20 | 2.863 | 3.837 |


5.6 WBF-WBF GSC

Figure 5.32: WBF-WBF GSC

The output obtained using the WBF-WBF GSC is shown in Fig. 5.33. The SNR improvements of the WBF-WBF GSC under different input SNR levels are shown in Fig. 5.34 and Table 5.11; they are almost identical to those of the WBF alone. PESQ MOS scores are shown in Fig. 5.37 and Table 5.12; an interference (noise) gain mismatch between the two stages causes the slight fall in these values, but good quality is still maintained. The speech and noise distortion curves of the WBF-WBF GSC are shown in Fig. 5.35 and Fig. 5.36, respectively.

Table 5.11: WBF-WBF GSC Observation

The WBF-WBF GSC results, in terms of the performance metrics discussed in Chapter 4, are presented below.

| Input SNR (dB) | SNRI (dB) | Speech Distortion (dB) | Noise Distortion (dB) |
|---|---|---|---|
| 0 | 18.7720 | -40.5617 | -28.3316 |
| 5 | 14.1244 | -42.6912 | -28.3317 |
| 10 | 11.9453 | -46.2570 | -28.3317 |
| 15 | 11.1302 | -50.6988 | -28.3317 |
| 20 | 10.8565 | -55.4348 | -28.3317 |


Figure 5.33: Signal plots before and after WBF-WBF GSC

Figure 5.34: WBF-WBF GSC SNR plot


Figure 5.35: WBF-WBF GSC Speech Distortion plot

Figure 5.36: WBF-WBF GSC Noise Distortion plot


Figure 5.37: WBF-WBF GSC PESQ MOS plot

Table 5.12: WBF-WBF GSC PESQ MOS Values

| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.572 | 3.188 |
| 5 | 1.888 | 3.225 |
| 10 | 2.223 | 3.425 |
| 15 | 2.548 | 3.700 |
| 20 | 2.862 | 4.008 |


5.7 The Elko Beamformer

The simulation environment for the evaluation of the Elko-based beamformers consists of 6 cardioid microphones, with the forward- and backward-facing cardioids arranged alternately as shown. The DOA of the target signal was set at 45° and the interference at -135° relative to the centre of the array. The remaining properties are similar to those of the other beamformers discussed so far. The NLMS algorithm has been used to obtain the β values.
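The β adaptation mentioned above can be sketched using Elko's first-order formulation y = c_F - β c_B [7], where c_F and c_B are the forward- and backward-facing cardioid outputs and β is kept in [0, 1] so that the null stays in the rear half plane. This is a minimal sketch under stated assumptions: ideal, frequency-independent cardioid patterns, adaptation on an interference-only white-noise segment, and hypothetical values for the step size mu and the regulariser delta.

```python
import numpy as np

def cardioid_gains(theta_deg):
    """Gains of ideal forward/backward cardioids for a plane wave from theta_deg.

    Assumption: frequency-independent patterns, 0 degrees = forward axis.
    """
    c = np.cos(np.deg2rad(theta_deg))
    return 0.5 * (1.0 + c), 0.5 * (1.0 - c)

def elko_nlms(c_f, c_b, mu=0.05, delta=1e-6):
    """Adapt beta so that y = c_f - beta * c_b has minimum output power (NLMS)."""
    beta = 0.0
    y = np.zeros_like(c_f)
    for t in range(len(c_f)):
        y[t] = c_f[t] - beta * c_b[t]
        # d(y^2)/d(beta) = -2*y*c_b, so a normalised descent step increases beta by:
        beta += mu * y[t] * c_b[t] / (c_b[t] ** 2 + delta)
        # Constraining beta to [0, 1] keeps the null in the rear half plane.
        beta = min(max(beta, 0.0), 1.0)
    return y, beta

def null_angle_deg(beta):
    """Null direction solving 0.5*(1 + cos) = beta * 0.5*(1 - cos)."""
    return np.degrees(np.arccos((beta - 1.0) / (beta + 1.0)))

rng = np.random.default_rng(0)
n = rng.standard_normal(20000)         # interference only (white-noise stand-in)
g_f, g_b = cardioid_gains(-135.0)      # interference direction used in this section
y, beta = elko_nlms(g_f * n, g_b * n)
print(f"beta = {beta:.4f}, null at {null_angle_deg(beta):.1f} degrees")
```

With the interference at -135°, β settles at g_F/g_B ≈ 0.1716, which places the null 135° from the forward axis (the ideal cardioid patterns are symmetric, so this covers the -135° direction).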

Figure 5.38: Signal plots before and after Elko Beamformer

The observations made while performing Elko beamforming are presented in Table 5.13. The input and output signals of the Elko beamformer are shown in Fig. 5.38. As shown in Fig. 5.39 and Table 5.13, an almost constant SNR improvement of about 10.8 dB was observed. PESQ MOS scores after placing a null in the rear half plane of the array are shown in Fig. 5.42 and Table 5.14. The speech and noise distortion curves are shown in Fig. 5.40 and Fig. 5.41, respectively.


Table 5.13: Elko Beamformer Observation

Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion
0 | 10.7451 | -36.9375 | -32.6808
5 | 10.7778 | -37.0730 | -32.7357
10 | 10.8007 | -37.1781 | -32.7757
15 | 10.8129 | -37.2341 | -32.7973
20 | 10.8179 | -37.2572 | -32.8063

Figure 5.39: Elko Beamformer SNR plot


Figure 5.40: Elko Beamformer Speech Distortion plot

Figure 5.41: Elko Beamformer Noise Distortion plot


Figure 5.42: Elko Beamformer PESQ MOS plot

Table 5.14: Elko Beamformer PESQ MOS Values

Input SNR (dB) | Input PESQ MOS | Output PESQ MOS
0 | 1.573 | 2.358
5 | 1.887 | 2.638
10 | 2.223 | 2.902
15 | 2.547 | 3.195
20 | 2.862 | 3.432


5.8 The Elko-Wiener Beamformer

Figure 5.43: Elko-Wiener Beamformer

The output obtained using the Elko-Wiener beamformer is shown in Fig. 5.44. Wiener filtering the Elko output raises the SNR improvement to an almost constant level of about 28 dB, as shown in Fig. 5.45 and Table 5.15. Despite this large increase in SNR, the PESQ MOS scores shown in Fig. 5.48 and Table 5.16 improved to a lesser extent, owing to the gain of interference from the rear half plane of the array. The speech and noise distortion curves of the Elko-Wiener beamformer are shown in Fig. 5.46 and Fig. 5.47, respectively.
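This section does not state how the Wiener filter applied to the Elko output is designed. One common realisation is a single-channel, frequency-domain Wiener post-filter, and the sketch below assumes that form: each STFT bin is scaled by G = max(1 - P_n/P_y, G_min), with the noise spectrum P_n estimated from a noise-only segment. The frame length, hop, gain floor and the tone-in-noise test signal are hypothetical choices.

```python
import numpy as np

FRAME = 256
HOP = FRAME // 2
# Periodic Hann window: shifted copies at 50 % overlap sum exactly to 1.
WIN = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(FRAME) / FRAME)

def estimate_noise_psd(noise):
    """Average windowed periodogram over frames of a noise-only segment."""
    starts = range(0, len(noise) - FRAME + 1, HOP)
    return np.mean([np.abs(np.fft.rfft(WIN * noise[s:s + FRAME])) ** 2 for s in starts], axis=0)

def wiener_postfilter(y, noise_psd, gain_floor=0.1):
    """Scale each STFT bin by G = max(1 - Pn/Py, floor) and overlap-add."""
    padded = np.concatenate([np.zeros(HOP), y, np.zeros(FRAME)])
    out = np.zeros(len(padded))
    for start in range(0, len(padded) - FRAME + 1, HOP):
        Y = np.fft.rfft(WIN * padded[start:start + FRAME])
        p_y = np.maximum(np.abs(Y) ** 2, 1e-12)          # avoid division by zero
        gain = np.maximum(1.0 - noise_psd / p_y, gain_floor)
        out[start:start + FRAME] += np.fft.irfft(gain * Y, FRAME)
    return out[HOP:HOP + len(y)]                         # undo the leading pad

rng = np.random.default_rng(3)
n = np.arange(16000)
clean = np.sin(2 * np.pi * 32 * n / FRAME)   # tone standing in for the target at the Elko output
noise_psd = estimate_noise_psd(0.5 * rng.standard_normal(16000))
noisy = clean + 0.5 * rng.standard_normal(16000)
enhanced = wiener_postfilter(noisy, noise_psd)
print("MSE before:", np.mean((noisy - clean) ** 2), "after:", np.mean((enhanced - clean) ** 2))
```

The gain floor limits how far any bin is attenuated, trading some residual noise against the musical-noise artefacts that an unfloored Wiener gain tends to produce. With a zero noise estimate every gain is 1 and the overlap-add reproduces the input exactly.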

The observations made while performing Elko-Wiener beamforming are presented below.

Table 5.15: Elko-Wiener Beamformer Observation

Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion
0 | 28.7614 | -36.6123 | -28.4446
5 | 28.2470 | -37.0400 | -28.4938
10 | 27.9669 | -37.3921 | -28.5744
15 | 27.8048 | -37.6134 | -28.6413
20 | 27.7210 | -37.7185 | -28.6769


Figure 5.44: Signal plots before and after Elko-Wiener Beamformer

Figure 5.45: Elko-Wiener Beamformer SNR plot


Figure 5.46: Elko-Wiener Beamformer Speech Distortion plot

Figure 5.47: Elko-Wiener Beamformer Noise Distortion plot


Figure 5.48: Elko-Wiener Beamformer PESQ MOS plot

Table 5.16: Elko-Wiener Beamformer PESQ MOS Values

Input SNR (dB) | Input PESQ MOS | Output PESQ MOS
0 | 1.573 | 2.905
5 | 1.887 | 3.135
10 | 2.223 | 3.387
15 | 2.547 | 3.636
20 | 2.862 | 3.879


5.9 Comparison Plots

For ease of comparison, the SNR improvements of the different beamformers at different input SNR levels are shown in Fig. 5.49. The Elko-based beamformers are not included in this figure, to keep the comparison fair: their simulation environment differs from that of the rest, so they are compared separately in Fig. 5.50. Although the simulation environment is different, the evaluation procedure is the same as for the other beamformers.
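The SNR improvement compared in Fig. 5.49 and Fig. 5.50 is not redefined in this section. Assuming the conventional definition, in which the target and interference components can be processed separately in simulation, it can be written as follows (the symbols are introduced here for illustration):

```latex
\mathrm{SNRI} = 10\log_{10}\frac{\sum_n y_s^2[n]}{\sum_n y_v^2[n]}
              - 10\log_{10}\frac{\sum_n x_s^2[n]}{\sum_n x_v^2[n]} \quad \text{(dB)}
```

Here x_s and x_v are the target and interference components at a reference microphone, and y_s and y_v are the same components after passing through the beamformer, so a positive SNRI means the beamformer raised the target-to-interference power ratio.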

Figure 5.49: SNR comparison plot


Figure 5.50: SNR comparison plot for Elko based beamformers


6 SUMMARY AND CONCLUSION

Beamforming is a widely used technique for spatial filtering, in which an array of microphones collects samples of propagating waves for processing. In this thesis, several beamforming techniques for speech enhancement in an interference environment have been implemented and evaluated. The main goal of a beamformer is to extract the target signal arriving from the look direction while suppressing the interference. This thesis report summarizes the performance of eight beamforming algorithms: DSB, WBF, DSB-DSB GSC, DSB-WBF GSC, WBF-DSB GSC, WBF-WBF GSC, the Elko beamformer and the Elko-Wiener beamformer.

Two test environments have been considered in this work, one for the non-Elko-based beamformers and the other for the Elko-based beamformers. For the two Elko-based beamformers, the interference was placed in the rear half plane of the array, whereas for the other six beamformers the target signal and the interference were on the same side of the array. All the implemented beamformers approach optimum performance as long as the environment is stationary. Adaptive beamformers are especially attractive, as they adapt automatically to varying environments. All the algorithms were implemented in a computer-simulated anechoic chamber.

The following conclusions can be drawn from the simulation results:

DSB is the simplest of all the beamformers implemented and has an SNRI of 4.19 dB. Its performance would improve considerably if the microphone array were infinitely long, but this is practically unrealistic.

WBF provides promising results and sustains a high level of performance regardless of the length of the microphone array and the location of the speech sources. WBF achieves an SNRI of 18 dB at 0 dB input SNR.

The four adaptive beamforming algorithms based on the GSC architecture possess enhanced noise suppression capability. The blocking matrix in the GSC architecture was replaced with an additional beamformer, either a DSB or a WBF, that tracks the characteristics of the interfering signal, leading to high interference rejection.

The DSB-DSB GSC and the DSB-WBF GSC beamformers outperformed the standalone DSB in all the performance metrics.

The WBF-DSB GSC and the WBF-WBF GSC struggled to surpass the standalone WBF. Overall, the standalone WBF maintained its high level of performance.

The poorer performance of the WBF-DSB GSC is due to target signal leakage from the DSB, which leads to target signal cancellation at the GSC output.

The Elko beamformer, based on a first-order adaptive differential microphone array, has been presented; it is able to attenuate a noise/interference source at any location in the rear half plane of the array. It offers rapid adaptation at modest computational complexity.

The Elko-Wiener beamformer suits applications in which the user wants to cancel noise/interference arriving from behind while receiving signals from the desired direction. It yields an SNR improvement of about 28 dB (27.7 to 28.8 dB in Table 5.15) at the cost of increased computational complexity.
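The GSC variant described in the conclusions, in which the blocking matrix is replaced by a second beamformer aimed at the interference, can be sketched for the DSB-DSB case as follows. This is a minimal sketch under stated assumptions, not the thesis implementation: integer-sample delays, a single white interferer, an NLMS interference canceller adapted only while the target is silent and then frozen, and a delay D in the upper path so that the canceller stays causal. The delays, tap count L, step size mu and D are hypothetical choices.

```python
import numpy as np

def delay(sig, k):
    """Delay a signal by k >= 0 whole samples (zero-filled at the start)."""
    return np.concatenate([np.zeros(k), sig[:len(sig) - k]])

def dsb(x, delays):
    """Delay-and-sum: delay each channel by an integer amount, then average."""
    return np.mean([delay(ch, k) for ch, k in zip(x, delays)], axis=0)

def nlms_weights(primary, reference, L=32, mu=0.1, eps=1e-6):
    """Adapt an L-tap NLMS filter so that the filtered reference tracks the primary."""
    w = np.zeros(L)
    buf = np.zeros(L)                  # buf[k] = reference[n - k]
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - w @ buf
        w += mu * e * buf / (buf @ buf + eps)
    return w

def gsc_output(x, w, look, null, D):
    """GSC output for fixed canceller weights w (linear, so it can be applied per component)."""
    primary = delay(dsb(x, look), D)   # upper path, delayed so the canceller stays causal
    reference = dsb(x, null)           # lower path: beamformer steered at the interference
    return primary - np.convolve(reference, w)[:len(primary)]

def power_db(sig):
    return 10 * np.log10(np.mean(sig ** 2))

rng = np.random.default_rng(2)
N, M, D = 20000, 4, 16
tgt_delay = np.array([0, 1, 2, 3])     # hypothetical inter-mic delays (samples)
int_delay = np.array([0, 3, 6, 9])
look = tgt_delay.max() - tgt_delay     # aligns the target across the mics
null = int_delay.max() - int_delay     # aligns the interference across the mics

def mics(sig, delays):
    return np.array([delay(sig, k) for k in delays])

# 1) Adapt while the target is silent (interference + sensor noise only).
v_train = mics(rng.standard_normal(N), int_delay) + 1e-2 * rng.standard_normal((M, N))
w = nlms_weights(delay(dsb(v_train, look), D), dsb(v_train, null))

# 2) Freeze w and pass the target and noise components through separately.
S = mics(rng.standard_normal(N), tgt_delay)
V = mics(rng.standard_normal(N), int_delay) + 1e-2 * rng.standard_normal((M, N))
y_s = gsc_output(S, w, look, null, D)  # target component at the output
y_v = gsc_output(V, w, look, null, D)  # interference + noise at the output
snri = (power_db(y_s) - power_db(y_v)) - (power_db(S[0]) - power_db(V[0]))
print(f"SNRI = {snri:.1f} dB, output target power = {power_db(y_s):.1f} dB")
```

In this toy setup the lower-path DSB also passes part of the target, so the frozen canceller removes some target along with the interference (the output target power drops by roughly 1.7 dB here). This is the same leakage mechanism that the conclusions cite as the cause of target signal cancellation in the WBF-DSB GSC.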


7 FUTURE WORK

After implementing several beamformers in the context of speech applications, I would suggest a few directions for future work.

During the course of this thesis work, only a single interference source has been used. It is suggested that the performance of all the beamformers be evaluated for multiple spatially distinct noise/interference sources.

Apart from computer simulations, real-time implementations are also necessary to verify the performance of the beamformers.

While linear microphone arrays deliver proficient results, the beamformers should also be evaluated with other array geometries, such as circular and semicircular arrays.


REFERENCES

[1] J. Benesty, S. Makino and J. Chen (eds.), Speech Enhancement, pp. 1-8, Springer, 2005, ISBN 978-3540240396.

[2] L. J. Griffiths and C. W. Jim, “An alternative approach to linearly constrained adaptive beamforming,” IEEE Transactions on Antennas and Propagation, vol. 30, no. 1, pp. 27-34, 1982.

[3] I. A. McCowan, “Robust Speech Recognition using Microphone Arrays,” PhD Thesis, Queensland University of Technology, Australia, 2001.

[4] E. Ferrara Jr. and B. Widrow, “Multichannel adaptive filtering for signal enhancement,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 29, no. 3, pp. 766-770, Jun. 1981.

[5] N. Grbic, “Optimal and Adaptive Subband Beamforming, Principles and Applications,” Doctoral Dissertation Series No. 2001:01, ISSN 1650-2159, Blekinge Institute of Technology, 2001.

[6] S. Haykin, Adaptive Filter Theory, Prentice Hall Int. Inc., 1996, ISBN 0-13-397985-7.

[7] G. W. Elko, “A Simple Adaptive First-Order Differential Microphone,” Acoustics and Speech Research Dept., Bell Labs, Lucent Technologies, Murray Hill, NJ, Aug. 1999.

[8] L. Johansson and A. Törnqvist, “Multiple Sensors Speech Enhancement in Hearing Aids,” Blekinge Tekniska Högskola, Karlskrona, Sweden, 2004.

[9] A. Eriksson, “Speech Enhancement in Mobile Devices,” in International Workshop on Acoustics, Echo and Noise Control Proceedings, Plenary Talk, 2006.



[10] Z. Yermeche, “Soft-Constrained Subband Beamforming for Speech Enhancement,” Doctoral Dissertation Series No. 2007:14, ISSN 1653-2090, Blekinge Institute of Technology, 2007.

[11] B. Sällberg, “Applied methods to combat noise in human communication,” Licentiate Dissertation Series No. 2006:06, Blekinge Institute of Technology, ISBN 91-7295-087-0.

[12] ITU-T Recommendation P.862, “Perceptual Evaluation of Speech Quality (PESQ).”

[13] B. D. Van Veen and K. M. Buckley, “Beamforming: a versatile approach to spatial filtering,” IEEE ASSP Magazine, vol. 5, no. 2, pp. 4-24, Apr. 1988.

[14] J.-P. Thiran, “Recursive digital filters with maximally flat group delay,” IEEE Trans. Circuit Theory, vol. 18, no. 6, pp. 659-664, 1971.

[15] O. Hoshuyama, A. Sugiyama and A. Hirano, “A robust adaptive beamformer for microphone arrays with a blocking matrix using constrained adaptive filters,” IEEE Transactions on Signal Processing, vol. 47, no. 10, pp. 2677-2684, Oct. 1999.

[16] N. Grbic and S. Nordholm, “Soft Constrained Subband Beamforming for Hands-Free Speech Enhancement,” in IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, vol. 1, pp. 885-888, May 2002.

[17] R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays, John Wiley and Sons, New York, 1980.

[18] G. W. Elko, “Superdirectional Microphone Arrays,” in Acoustic Signal Processing for Telecommunication, J. Benesty and S. L. Gay (eds.), pp. 181-236, Kluwer Academic Publishers, 2000.


[19] T. S. N. U. V. Ramesh, “Speech Enhancement in Hands-Free Device (Hearing Aid) with emphasis on Elko’s Beamformer,” Blekinge Tekniska Högskola, Karlskrona, Sweden, 2012.

[20] S. S. V. S. Kotta and K. B. K. Rao, “Acoustic Beamforming for Hearing Aids using Multi Microphone Array by Designing Graphical User Interface,” Blekinge Tekniska Högskola, Karlskrona, Sweden, 2012.
