Master Thesis
Electrical Engineering
Thesis no: MEE 2010-2012
Enhancement of Speech
Intelligibility using
Beamforming Techniques
Leela Krishna Gudupudi
School of Engineering
Blekinge Institute of Technology
371 79 Karlskrona
Sweden
This thesis is submitted to the School of Engineering at Blekinge Institute of
Technology in partial fulfillment of the requirements for the degree of Master of
Science in Electrical Engineering.
Contact Information:
Author:
Leela Krishna Gudupudi
E-mail: [email protected],
University advisors:
Dr. Nedelko Grbic
Department of Electrical Engineering
School of Engineering
Blekinge Institute of Technology,
Karlskrona, Sweden
Email: [email protected]
Dr. Benny Sällberg
Department of Electrical Engineering
School of Engineering
Blekinge Institute of Technology,
Karlskrona, Sweden
Email: [email protected]
ABSTRACT
Speech enhancement is the process of improving the intelligibility or quality of
speech using audio processing algorithms. It is an important field for many
applications such as teleconferencing systems, speech recognition, VoIP, mobile
phones and hearing aids [1]. Interest in single-microphone speech enhancement
algorithms has waned, and the market is expected to be dominated soon by
microphone arrays with beamforming techniques, which achieve significantly
higher signal-to-noise ratios (SNR). A considerable amount of research has
already been done on the feasibility and advantages of microphone arrays, and
several beamforming techniques are widely applied by researchers around the globe.
In this thesis, eight selected beamforming techniques are applied to
enhance the intelligibility of speech degraded by interference. To
demonstrate the algorithms, a graphical user interface (GUI) has been developed in
MATLAB, in which the user can select the number of sensors in the linear array,
the spacing between them, the source and interference directions, the type of
algorithm and so on. The user can also compare all the beamformers in a
default environment. Performance is measured in terms of signal-to-noise
ratio (SNR), speech and noise distortion, and ITU-T PESQ values.
This thesis describes the opportunities and constraints of applying the eight
beamforming techniques with linear microphone arrays.
Keywords: Beamforming, Speech Enhancement, Microphone Arrays, PESQ, Delay-and-Sum
ACKNOWLEDGEMENTS
I would never have been able to finish my thesis without the guidance of my
university faculty, help from friends, and support from my family and fiancée.
I would like to express my deepest gratitude to my supervisor, Dr. Nedelko Grbic,
for his excellent guidance, care and patience, and for providing me with an
excellent atmosphere for doing this work. I am also grateful to Dr. Benny Sällberg
for his support and valuable advice throughout the thesis work.
I would like to thank Ramesh Telagareddy and Bulli KoteswaraRao, who, as good
friends, were always willing to discuss the concepts and share their thoughts.
I would also like to thank my parents and family, who always supported
and encouraged me with their best wishes.
Finally, I would like to thank my fiancée, Vineela Musti. She was always there
cheering me up and stood by me through the good times and bad. My thesis would
not have been possible without her help.
Leela Krishna Gudupudi
To My Parents
Table of Contents
ABSTRACT
ACKNOWLEDGEMENTS
Table of Contents
List of Figures
List of Tables
List of Acronyms
INTRODUCTION
1 MICROPHONES
1.1 Microphone Polar Patterns
1.2 Microphone Arrays
1.3 Spatial Aliasing
2 BEAMFORMING TECHNIQUES
2.1 Fixed Beamforming
2.2 Adaptive Beamforming
2.3 Generalized Side-lobe Canceller (GSC)
2.4 The Wiener Beamformer
2.5 The Elko Beamformer
3 DESIGN AND SIMULATIONS
3.1 Simulation Procedure
4 PERFORMANCE METRICS
4.1 Signal-to-Noise Ratio
4.2 Normalized Speech Distortion
4.3 Normalized Noise Distortion
4.4 Perceptual Evaluation of Speech Quality (PESQ)
5 SIMULATION RESULTS
5.1 Delay-and-Sum Beamformer
5.2 Wiener Beamformer
5.3 DSB-DSB GSC
5.4 DSB-WBF GSC
5.5 WBF-DSB GSC
5.6 WBF-WBF GSC
5.8 The Elko-Wiener Beamformer
5.9 Comparison Plots
6 SUMMARY AND CONCLUSION
7 FUTURE WORK
REFERENCES
List of Figures
Figure 1.1: Omni-Directional Microphone polar pattern
Figure 1.2: Bi-Directional Microphone polar pattern
Figure 1.3: Cardioid microphone polar pattern
Figure 1.4: Example of spatial aliasing
Figure 2.1: Delay and Sum Beamforming
Figure 2.2: An adaptive beamformer
Figure 2.3: Griffiths-Jim Generalised Side-lobe Canceller
Figure 2.4: (a) Linear equispaced differential microphone array (b) Diagram of two element differential array composed of two omni-directional mic’s and a time delay
Figure 2.5: Directional responses of the two-element array shown in Fig. 2.4 (b): (a) T=0, (b) T=(d/c)/2, (c) T=d/c
Figure 2.6: Implementation of first-order differential microphone using back-to-back cardioids
Figure 2.7: Directional responses of back-to-back cardioid arrangement (a) β=0; (b) β=0.383; (c) β=1
Figure 3.1: The delay-and-sum beamforming
Figure 3.2: Desired and Interference Matrices for Wiener Beamforming
Figure 3.3: Pictorial form of the method followed
Figure 3.4: Wiener Beamformer Structure
Figure 3.5: Novel GSC structure
Figure 3.6: The structure of DSB-DSB GSC
Figure 3.7: Structure of the DSB-WBF GSC
Figure 3.8: Structure of the WBF-DSB GSC
Figure 3.9: Structure of the WBF-WBF GSC beamformer
Figure 3.10: Structure of the Elko-Wiener Beamformer
Figure 5.1: Simulation set-up in an anechoic chamber
Figure 5.2: The Delay and Sum Beamformer
Figure 5.3: Signal plots before and after DSB
Figure 5.4: DSB SNR plots
Figure 5.5: DSB Speech Distortion plots
Figure 5.6: DSB Noise Distortion plot
Figure 5.7: DSB PESQ MOS plot
Figure 5.8: The Wiener Beamformer
Figure 5.9: Signal plots before and after WBF
Figure 5.10: WBF SNR plot
Figure 5.11: WBF Speech Distortion plot
Figure 5.12: WBF Noise Distortion plot
Figure 5.13: WBF PESQ MOS plot
Figure 5.14: DSB-DSB GSC
Figure 5.15: Signal plots before and after DSB-DSB GSC
Figure 5.16: DSB-DSB GSC SNR plots
Figure 5.17: DSB-DSB GSC Speech Distortion plots
Figure 5.18: DSB-DSB GSC Noise Distortion plots
Figure 5.19: DSB-DSB GSC PESQ MOS plots
Figure 5.20: DSB-WBF GSC
Figure 5.21: Signal plots before and after DSB-WBF GSC
Figure 5.22: DSB-WBF GSC SNR plot
Figure 5.23: DSB-WBF GSC Speech Distortion plot
Figure 5.24: DSB-WBF GSC Noise Distortion plot
Figure 5.25: DSB-WBF GSC PESQ MOS plot
Figure 5.26: WBF-DSB GSC
Figure 5.27: Signal plots before and after WBF-DSB GSC
Figure 5.28: WBF-DSB GSC SNR plot
Figure 5.29: WBF-DSB GSC Speech Distortion plot
Figure 5.30: WBF-DSB GSC Noise Distortion plot
Figure 5.31: WBF-DSB GSC PESQ MOS plot
Figure 5.32: WBF-WBF GSC
Figure 5.33: Signal plots before and after WBF-WBF GSC
Figure 5.34: WBF-WBF GSC SNR plot
Figure 5.35: WBF-WBF GSC Speech Distortion plot
Figure 5.36: WBF-WBF GSC Noise Distortion plot
Figure 5.37: WBF-WBF GSC PESQ MOS plot
Figure 5.38: Signal plots before and after Elko Beamformer
Figure 5.39: Elko Beamformer SNR plot
Figure 5.40: Elko Beamformer Speech Distortion plot
Figure 5.41: Elko Beamformer Noise Distortion plot
Figure 5.42: Elko Beamformer PESQ MOS plot
Figure 5.43: Elko-Wiener Beamformer
Figure 5.44: Signal plots before and after Elko-Wiener Beamformer
Figure 5.45: Elko-Wiener Beamformer SNR plot
Figure 5.46: Elko-Wiener Beamformer Speech Distortion plot
Figure 5.47: Elko-Wiener Beamformer Noise Distortion plot
Figure 5.48: Elko-Wiener Beamformer PESQ MOS plot
Figure 5.49: SNR comparison plot
Figure 5.50: SNR comparison plot for Elko based beamformers
List of Tables
Table 4.1: Quality opinion scale used in the development of PESQ
Table 5.1: Delay and Sum Beamformer Observation
Table 5.2: DSB PESQ MOS Values
Table 5.3: Wiener Beamformer Observation
Table 5.4: WBF PESQ MOS Values
Table 5.5: DSB-DSB GSC Observation
Table 5.6: DSB-DSB GSC PESQ MOS Values
Table 5.7: DSB-WBF GSC Observation
Table 5.8: DSB-WBF GSC PESQ MOS Values
Table 5.9: WBF-DSB GSC Observation
Table 5.10: WBF-DSB GSC PESQ MOS Values
Table 5.11: WBF-WBF GSC Observation
Table 5.12: WBF-WBF GSC PESQ MOS Values
Table 5.13: Elko Beamformer Observation
Table 5.14: Elko Beamformer PESQ MOS Values
Table 5.15: Elko-Wiener Beamformer Observation
Table 5.16: Elko Beamformer PESQ MOS Values
List of Acronyms
1. DSB Delay-and-Sum Beamformer
2. WBF Wiener Beamformer
3. GSC Generalized Side-lobe Canceller
4. SNR Signal to Noise Ratio
5. SNRI SNR Improvement
6. PESQ Perceptual Evaluation of Speech Quality
7. SD Speech Distortion
8. ND Noise Distortion
9. GUI Graphical User Interface
10. dB Decibels
11. NLMS Normalized Least Mean Square
12. PSD Power Spectral Density
13. NSD Normalized SD
14. NND Normalized ND
15. DOA Direction of Arrival
16. MOS Mean Opinion Score
INTRODUCTION
Beamforming is a well-established and widely used technique in which an array of
sensors forms a beam in the direction of the desired speech: signals arriving
within this beam are passed on for processing, while those from other directions
are suppressed. In this study, a female voice recording is used as the desired
source and a male voice recording as the interference. Eight beamforming
algorithms are applied to this scenario, with the aim of recovering the desired
female voice in the look direction from the interference. This thesis summarizes
the performance of the following eight beamforming techniques:
1. Delay and Sum Beamformer (DSB)
2. The DSB-DSB Generalised Side-lobe Canceller (GSC)
3. Wiener Beamformer (WBF)
4. The WBF-WBF GSC
5. The DSB-WBF GSC
6. The WBF-DSB GSC
7. Elko Beamformer
8. The Elko-Wiener Beamformer
Linear microphone arrays are used throughout the study, with a user-defined number
of sensors and a constant, user-defined spacing between them. All the
beamforming algorithms are implemented in the time domain. Fractional delays are
taken into account when calculating the sensor data [14]. The performance measures
are the SNR, spectral distortions and the ITU-T recommended PESQ (Perceptual
Evaluation of Speech Quality). The entire project was implemented in MATLAB. No
reflections are considered anywhere in the project, and the speed of sound
is taken as 343 m/s.
1 MICROPHONES
A microphone is a transducer that converts sound into an electric signal. It is an
electromechanical device in which a vibration, usually caused by an air pressure
wave, produces an electric signal proportional to that vibration. Microphones
are classified by their transducer principle, their physical characteristics and
their directional characteristics.
1.1 Microphone Polar Patterns
Not all microphones respond equally to sound arriving from all directions.
Microphones are therefore also categorized according to how well they pick up
sound from particular directions. The three most common microphone polar patterns
are described here:
Omni-directional:
An omnidirectional microphone is the simplest mic design and accepts
all sounds regardless of their point of origin. Such mics are very easy to use
and have a good frequency response. The smaller the diameter of the microphone,
the better its omnidirectional characteristics at high frequencies, because the
flattening of the response grows with the diameter of the microphone.
Figure 1.1: Omni-Directional Microphone polar pattern¹
¹ Image courtesy: www.prosoundweb.com
Bi-directional:
A bi-directional microphone is also called a “figure-of-eight” microphone.
A mic with a figure-of-eight polar pattern picks up sound arriving from the
front and from the rear, but not from the sides, i.e. at 90 degrees to its
axis. Most ribbon microphones are bi-directional. The frequency
response is just as good as that of an omnidirectional mic, at least for sounds
that are not too close to the mic.
Figure 1.2: Bi-Directional Microphone polar pattern¹
Cardioid:
A cardioid microphone is most sensitive in the look direction and least
sensitive at the back, i.e. it picks up the sounds it is pointed at. It
rejects unwanted ambient sound and is much more resistant to
feedback than omnidirectional mics. This makes cardioid mics
particularly suitable for loud stages.
Figure 1.3: Cardioid microphone polar pattern¹
1.2 Microphone Arrays
To achieve better directionality, two or more microphones are positioned close
together to form a microphone array. If they are positioned along a line, the
arrangement is called a linear microphone array, and if the distance between
adjacent mics is equal, it is called a linear equispaced microphone array.
Because an incoming sound wave arrives at each of the mics at a slightly
different time, a microphone array achieves better directionality than a
single microphone. Sounds are distinguished by the spatial location of their
sources by filtering and combining the individual microphone signals.
The important concepts used in the design of microphone arrays are
beamforming, array directivity and beam width.
Beamforming:
If a “beam” is formed by combining the signals of all the microphones, the
microphone array can act as a highly directional microphone.
This beam can be steered electronically to point at the desired source.
Beamforming is discussed in more detail in the next chapter.
Array Directivity:
The higher the directivity of the microphone array, the greater the reduction
in captured ambient noise and reverberation. Given the aperture response as a
function of frequency and direction of arrival, the directivity pattern of a
linear equispaced array of N microphones with spacing d meters is given by [3],
$$D(f,\theta) = \sum_{n=-\frac{N-1}{2}}^{\frac{N-1}{2}} w_n(f)\, e^{\,j\frac{2\pi f}{c}\, n d \cos\theta} \tag{1}$$
where w_n(f) is the complex weight of the nth microphone, c is the speed of sound
and θ is the direction of arrival measured from the array axis.
From the above equation we see that the directivity pattern depends upon:
o The number of microphones in the array, N
o The spacing between the mics, d
o The frequency, f
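The dependence on N, d and f can be explored numerically. The following is a minimal Python sketch of Eqn. (1), not the thesis's MATLAB implementation; the function name `directivity`, the uniform weights w_n = 1/N and the example values (5 mics, 4 cm spacing, 1 kHz) are assumptions made for illustration only.

```python
import cmath
import math

def directivity(f, theta, n_mics, d, c=343.0, weights=None):
    """Evaluate Eqn. (1): f in Hz, theta in radians from the array axis, d in metres."""
    if weights is None:
        # Assumption: uniform weights 1/N, so the broadside response is 1.
        weights = [1.0 / n_mics] * n_mics
    half = (n_mics - 1) / 2.0
    total = 0j
    for k in range(n_mics):
        n = k - half  # element index running from -(N-1)/2 to (N-1)/2
        phase = 2.0 * math.pi * f / c * n * d * math.cos(theta)
        total += weights[k] * cmath.exp(1j * phase)
    return total

# Hypothetical example: 5 mics, 4 cm apart, 1 kHz tone.
broadside = abs(directivity(1000.0, math.pi / 2, 5, 0.04))  # source at 90 degrees
endfire = abs(directivity(1000.0, 0.0, 5, 0.04))            # source along the axis
```

With uniform weights the unsteered array responds most strongly at broadside, so `endfire` comes out smaller than `broadside`; sweeping `theta` from 0 to π traces the full pattern.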
Beam Width:
The beam width and the side-lobe level are the two important characteristics
of the directivity pattern. They depend on the inter-element spacing (d), the
frequency (f) and the number of elements in the array (N).
As the frequency increases, the beam width and the side-lobe level
decrease. As the effective array length (L=N*d) increases, the beam width
and the side-lobe level also decrease. Thus, to obtain a constant beam
width, the product f*d must remain relatively constant.
1.3 Spatial Aliasing
Spatial aliasing results from insufficient sampling of the data along the space
axis and shows up as grating lobes in the directivity pattern. A microphone
array samples the sound field in space, so, by analogy with the temporal
sampling theorem, a requirement exists to avoid grating lobes in the directivity
pattern,
$$f_s = \frac{1}{d} \geq 2 f_{max} \tag{2}$$
where f_s is the spatial sampling frequency in samples per meter and f_max is the
highest spatial frequency component in the signal. This leads to the relation,
$$f_{max} = \frac{1}{\lambda_{min}} \tag{3}$$
where λ_min is the minimum wavelength in the signal of interest, and consequently
the requirement is
$$d \leq \frac{\lambda_{min}}{2} \tag{4}$$
Equation 4 is known as the spatial sampling theorem and must be satisfied in
order to prevent spatial aliasing in the directivity pattern of a
sensor array. Figure 1.4 illustrates the effect of spatial aliasing.
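As a worked example of the spatial sampling theorem, the sketch below computes the largest admissible spacing from the highest temporal frequency in the signal, using the standard relation λ = c/f. The function names and the 4 kHz upper frequency are assumptions chosen for illustration; they do not come from the thesis.

```python
def max_spacing(f_max_hz, c=343.0):
    """Largest inter-element spacing d (metres) allowed by the spatial sampling theorem."""
    lambda_min = c / f_max_hz  # shortest wavelength present in the signal
    return lambda_min / 2.0

def is_alias_free(d, f_max_hz, c=343.0):
    """True if a spacing of d metres avoids grating lobes up to f_max_hz."""
    return d <= max_spacing(f_max_hz, c)

# Hypothetical case: speech band-limited to 4 kHz, speed of sound 343 m/s.
d_limit = max_spacing(4000.0)  # 343 / 4000 / 2 = 0.042875 m
```

Under these assumptions a 4 cm spacing is free of spatial aliasing up to 4 kHz, while a 5 cm spacing is not.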
Figure 1.4: Example of spatial aliasing
2 BEAMFORMING TECHNIQUES
Spatially propagating signals are often corrupted by interference and noise.
If the desired signal and the interference occupy the same temporal frequency
band, temporal filtering cannot separate the signal from the interferers.
When the desired and interfering signals originate from different spatial
locations, however, they can be separated with a beamformer.
Beamforming is a spatial signal processing technique used to control the
directionality of reception of a microphone array. It exploits wave
interference to change the directionality of the array: the signals from all
the microphones of the array are combined in such a way that signals arriving
from the desired angles interfere constructively, while those from all other
angles interfere destructively. Beamforming techniques are algorithms for
determining the complex sensor weights w_n(f) (see Eqn. 1) in order to
implement a desired shaping and steering of the array directivity pattern [3].
Beamforming techniques are broadly classified into two types: data-independent
and data-dependent.
Data-independent or fixed beamformers are algorithms whose parameters are
fixed during operation. The delay-and-sum beamformer is an example of a
fixed beamformer.
Data-dependent or adaptive beamformers continuously update their parameters
based on the received input signals. The Griffiths-Jim beamformer is an example
of an adaptive beamformer.
2.1 Fixed Beamforming
Delay and Sum Beamformer (DSB):
DSB is the simplest of all beamforming techniques. It is based on the idea
that, for a linear equispaced microphone array, the output of each microphone
is the same except that each one is delayed by a different amount. So, if the
output of each mic is delayed appropriately and all the outputs are then added
together, the desired signal propagating across the array is reinforced, while
the noise or interference tends to be suppressed. A block diagram is shown in
Fig. 2.1.
Figure 2.1: Delay and Sum Beamforming
The delays are calculated from the inter-element distance of the array.
The crucial factors in DSB are the geometric arrangement of the
mics and the weights associated with each mic. The signal-to-noise ratio (SNR)
of the output signal is greater than that of any individual mic signal.
The major drawback of the DSB is that a large number of mics is required
to improve the SNR. If the interference or noise signals are completely
uncorrelated with the desired signal, every doubling of the number of
mics yields an additional 3 dB of SNR. Another disadvantage is that no nulls
are placed directly in the direction of the interference.
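The align-then-add idea and the 3 dB per doubling rule can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes integer sample delays (the thesis uses fractional delays), equal weights, and the ideal gain 10·log10(N) that holds when the noise is uncorrelated between mics. The function names are hypothetical.

```python
import math

def delay_and_sum(mic_signals, delays):
    """Undo each mic's integer delay (in samples) and average the aligned signals."""
    n_mics = len(mic_signals)
    # Only keep the samples for which every aligned channel has data.
    length = min(len(x) - k for x, k in zip(mic_signals, delays))
    return [sum(x[n + k] for x, k in zip(mic_signals, delays)) / n_mics
            for n in range(length)]

def ideal_snr_gain_db(n_mics):
    """Ideal SNR gain of a DSB for spatially uncorrelated noise."""
    return 10.0 * math.log10(n_mics)

# Hypothetical test signal: each mic receives the source 2 samples later than the previous one.
source = [math.sin(0.3 * n) for n in range(50)]
delays = [0, 2, 4, 6]
mics = [[0.0] * k + source for k in delays]
aligned = delay_and_sum(mics, delays)
```

In this noise-free case `aligned` reproduces `source`, and `ideal_snr_gain_db` grows by about 3 dB each time the number of mics is doubled.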
2.2 Adaptive Beamforming
Data-dependent beamforming techniques adaptively filter the incoming signals in
order to pass the signal from the desired direction while rejecting interference
or noise coming from other directions. An adaptive beamformer is a data-dependent
beamformer that is able to separate signals that share the same frequency band
but are separated in the spatial domain [2]. It automatically optimizes the
microphone array pattern by adjusting the element control weights until a
prescribed objective function is satisfied. The choice of the adaptive algorithm
used to derive the weights plays a key role in determining the convergence speed
and system complexity. Adaptive algorithms commonly used for beamforming include
the Least Mean Squares (LMS) algorithm, the Normalised LMS (NLMS) algorithm and
the Recursive Least Squares (RLS) algorithm. Each algorithm has its own
advantages and disadvantages. A typical adaptive beamformer structure is shown
in Fig. 2.2.
Figure 2.2: An adaptive beamformer
The weight vector w is chosen according to the input signal vector x(t). The
main aim of the adaptive beamformer is to optimize the system response with
respect to the prescribed criterion, so that the output signal y(t) is as free
as possible from noise or interference. The usual optimality criteria for
adaptive beamformers are Minimum Mean Square Error, Maximum
Signal-to-Noise/Interference Ratio and Minimum Variance [2].
There are two basic adaptive approaches: block adaptation and continuous
adaptation. In block adaptation, statistics are estimated from a
temporal block of array data and used in an optimum weight equation. In
continuous adaptation, the weights are updated as the input data is sampled,
such that the resulting weight vector sequence converges to the optimum solution
[13].
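Continuous adaptation can be illustrated with a single-channel NLMS filter, whose weight sequence converges sample by sample towards the optimum. The sketch below is a generic textbook NLMS update and not the thesis's beamformer code; the function name `nlms`, the step size `mu`, the regularisation `eps` and the three-tap test system are all assumptions.

```python
import random

def nlms(x, d, n_taps, mu=0.5, eps=1e-8):
    """Adapt FIR weights sample by sample so that the filtered x tracks d."""
    w = [0.0] * n_taps
    buf = [0.0] * n_taps           # most recent input samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                 # error against the desired signal
        norm = eps + sum(b * b for b in buf)
        # Normalised LMS update: step scaled by the input power in the buffer.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors

# Hypothetical system identification test with a known 3-tap response.
random.seed(0)
true_w = [0.5, -0.3, 0.1]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(true_w[j] * x[n - j] for j in range(3) if n - j >= 0)
     for n in range(len(x))]
w_est, errors = nlms(x, d, 3)
```

In this noise-free setting `w_est` approaches `true_w`. A block-adaptive method would instead estimate correlation statistics from the whole record and solve for the weights in one step.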
2.3 Generalized Side-lobe Canceller (GSC)
The GSC method provides a simple solution for implementing Linearly
Constrained Minimum Variance (LCMV) beamformers. The basic idea behind
LCMV beamforming is to constrain the response of the beamformer; signals from
the look-direction are passed with specified gain and phase [13, 15]. The weights
are chosen to minimize output variance subject to the response constraint. This has
the effect of preserving the desired signal while minimizing the noise or interfering
signal strength.
Figure 2.3: Griffiths-Jim Generalised Side-lobe Canceller
A block structure of the generalised side-lobe canceller is shown in Fig. 2.3 [2].
The GSC separates the adaptive beamformer into two main processing blocks [3].
The first implements a standard fixed beamformer (say, a DSB) which extracts
the desired signal. The second block is a set of adaptive filters that adaptively
minimize the power in the output. The blocking matrix B is used to eliminate the
desired signal from this second block, ensuring that it is the noise power that
is minimised.
Blocking Matrix:
The purpose of the blocking matrix is to prevent the desired signal from
entering the second stage of the GSC structure while letting the interference
or noise signals through. Blocking occurs if the elements of each row of the
blocking matrix sum to zero [3], i.e., sum(B(i,:))=0. The number of columns in
the blocking matrix is equal to the number of elements in the array. The standard
Griffiths-Jim blocking matrix is given by [2]
$$B = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & -1 & 0 & \cdots & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & \cdots & 0 & 1 & -1 & 0 \\ 0 & \cdots & 0 & 0 & 1 & -1 \end{bmatrix} \tag{11}$$
The output of the blocking matrix is adaptively filtered and summed to give
the lower block’s final output. This output contains only noise and
interference signals, and it is subtracted from the upper block’s
output to give the final result.
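The zero-row-sum property can be checked directly. The sketch below builds a Griffiths-Jim style blocking matrix in the usual (N-1) × N form, with each row holding an adjacent 1, -1 pair, and applies it to one time-aligned snapshot. The helper names and the snapshot values are hypothetical, and the N-1 row count is the standard textbook form assumed for this sketch.

```python
def griffiths_jim_blocking_matrix(n_mics):
    """Return an (N-1) x N matrix whose rows contain an adjacent 1, -1 pair."""
    rows = []
    for i in range(n_mics - 1):
        row = [0] * n_mics
        row[i], row[i + 1] = 1, -1   # difference of neighbouring mics
        rows.append(row)
    return rows

def apply_matrix(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

B = griffiths_jim_blocking_matrix(4)
# After time alignment the desired signal is identical on every mic.
desired_snapshot = [0.7, 0.7, 0.7, 0.7]
# Interference from another direction differs from mic to mic (made-up values).
interference_snapshot = [0.2, -0.1, 0.4, 0.3]
blocked = apply_matrix(B, desired_snapshot)
leaked = apply_matrix(B, interference_snapshot)
```

Here `blocked` is all zeros, so the desired signal does not reach the adaptive filters, while `leaked` is non-zero and serves as their noise reference. If the steering delays are imperfect, the desired signal is no longer identical across mics and some of it leaks through.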
In the original GSC method [2], the optimum array parameters are adaptively
estimated from the available data set. The LMS algorithm was used for adaptive
filtering (LMS GSC) because of its low computational complexity. The GSC is
the most widely used adaptive beamformer because of its structural flexibility.
In practice, however, the GSC structure distorts the desired signal because of
signal leakage through the blocking matrix. In most cases the blocking matrix
fails to block the target signal completely, and this leakage results in
distortion of the desired signal at the final output.
2.4 The Wiener Beamformer
The Wiener beamformer is also referred to as the minimum mean square error
beamformer. It is defined by the beamformer weights that minimize the
mean square difference between the beamformer output (when all sources are
present) and a single microphone output (when only the desired signal is present)
[5],
$$w_{opt} = \arg\min_{w} E\left\{ \left| y[n] - s_r[n] \right|^2 \right\}, \quad r \in \{1, 2, \ldots, N\} \tag{11}$$
The optimal weights that yield the optimum output signal are defined by equation
(11). The output y[n] of the beamformer is given by,
$$y[n] = \sum_{i=1}^{N} \sum_{j=0}^{L-1} w_i[j]\, x_i[n-j] \tag{12}$$
where L-1 is the order of the filters and w_i[j], j=0,1,...,L-1, are the filter
taps for the ith mic. N is the number of mics and x_i[n] is the ith mic
observation. s_r[n] in Eqn. (11) is the observation at the single reference mic r
when only the desired signal is present as input. The optimal weights which
minimize the mean square error between the output and the reference signal are
given by [6],
$$w_{opt} = \left[ R_{ss} + R_{nn} \right]^{-1} r_s \tag{13}$$
where R_ss and R_nn are the autocorrelation matrices of the signal of interest
and the noise respectively, given by [5],
$$R_{ss} = \begin{bmatrix} R_{s_1 s_1} & R_{s_1 s_2} & \cdots & R_{s_1 s_N} \\ R_{s_2 s_1} & R_{s_2 s_2} & \cdots & R_{s_2 s_N} \\ \vdots & \vdots & \ddots & \vdots \\ R_{s_N s_1} & R_{s_N s_2} & \cdots & R_{s_N s_N} \end{bmatrix} \tag{14}$$
$$R_{nn} = \begin{bmatrix} R_{n_1 n_1} & R_{n_1 n_2} & \cdots & R_{n_1 n_N} \\ R_{n_2 n_1} & R_{n_2 n_2} & \cdots & R_{n_2 n_N} \\ \vdots & \vdots & \ddots & \vdots \\ R_{n_N n_1} & R_{n_N n_2} & \cdots & R_{n_N n_N} \end{bmatrix} \tag{15}$$
r_s is the cross correlation vector and is given by,
$$r_s = \begin{bmatrix} r_1 & r_2 & \cdots & r_N \end{bmatrix} \tag{16}$$
where each element is given by,
$$r_i(k) = E\left\{ s_i[n]\, s_r^{*}[n+k] \right\}, \quad i = 1,2,\ldots,N, \quad r \in \{1,2,\ldots,N\}, \quad k = 0,1,\ldots,L-1 \tag{17}$$
The cross correlation vector r_s is the centre column of the autocorrelation
matrix of the desired signal, R_ss. The optimal weights w are arranged as,
$$w = \begin{bmatrix} w_1^T & w_2^T & \cdots & w_N^T \end{bmatrix}^T \tag{18}$$
The performance of this Wiener beamforming algorithm and the results are
presented in the following sections.
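The closed-form solution w = [R_ss + R_nn]^{-1} r_s can be illustrated on a toy problem. The sketch below is a deliberately reduced case and not the thesis's implementation: it assumes N = 2 mics, single-tap filters (L = 1), white Gaussian noise as a stand-in for speech, sample averages in place of expectations, and mic 1 as the reference. The helper names `solve` and `correlation` are hypothetical.

```python
import random

def solve(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def correlation(u, v):
    """Sample estimate of E{u[n] v[n]} (zero lag only, since L = 1)."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

random.seed(1)
n_samples = 5000
speech = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
s = [speech, speech]  # desired signal, time-aligned and identical at both mics
noise = [[random.gauss(0.0, 0.5) for _ in range(n_samples)] for _ in range(2)]

r_ss = [[correlation(s[i], s[k]) for k in range(2)] for i in range(2)]
r_nn = [[correlation(noise[i], noise[k]) for k in range(2)] for i in range(2)]
r_s = [correlation(s[i], s[0]) for i in range(2)]  # reference mic r = 1
r_sum = [[r_ss[i][k] + r_nn[i][k] for k in range(2)] for i in range(2)]
w_opt = solve(r_sum, r_s)
```

With unit-variance speech and noise variance 0.25 per mic, both weights come out close to 1/(2 + 0.25) ≈ 0.44, so the beamformer averages the two mics and scales the result down slightly to trade residual noise against signal distortion.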
2.5 The Elko Beamformer
Gary W. Elko designed an algorithm for directional microphone arrays [7]. In
hands-free communications, background noise has a detrimental effect on the
microphone output. By placing a null in the rear half-plane of the
microphone array, a significant improvement in the signal-to-noise ratio can be
obtained. This is the concept behind adaptive first-order differential
microphones. It can be achieved using two closely spaced omni-directional
microphones. Any first-order array can be realized as a weighted subtraction of
these two outputs [7]. The null angle can be steered by choosing the combination
weighting. The Elko algorithm is very easy to implement and its system
complexity is low. The elements of the differential microphone array are
combined with alternating signs so that the directionality can be realized. A
differential microphone array is super-directional, as it has high directivity
compared to an omni-directional microphone array. Therefore, these types of
arrays are most suitable for personal communication devices and teleconferencing.
2.5.1 Design
Figure 2.4: (a) Linear equispaced differential microphone array (b) Diagram of two element
differential array composed of two omni-directional mic’s and a time delay.
When a plane sound wave s(t) with spectrum S(ω) is incident on a two-element linear microphone array with element spacing d, making an angle ϴ with its axis as shown in Fig. 2.4 (b), the wave reaches the mics with different delays that depend on the inter-element distance d. The time difference can be written as,
$$\tau = a/c = d\cos(\theta)/c \qquad (19)$$
where c is the velocity of sound (343 m/s). The signal of the microphone that the sound reaches first should be delayed by T and subtracted from the other microphone signal to get the output y(t) [8],
$$y(t) = s(t - \tau) - s(t - T) = s(t - d\cos(\theta)/c) - s(t - T) \qquad (20)$$
By transforming Eqn. (20) into frequency domain,
$$Y(\omega, \theta) = S(\omega)\left[e^{-j\omega d\cos(\theta)/c} - e^{-j\omega T}\right] \qquad (21)$$
The directional response of the array is presented in Fig. 2.5. The null location can be steered by changing the time delay T. In order to move the null from 90° to 0°, the time delay T should be changed from 0 to d/c.
Figure 2.5: Directional responses of the two-element array shown in Fig. 2.4 (a) T=0, (b) T=(d/c)/2,
(c) T=d/c
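The null steering described above can be checked numerically. The following is a minimal sketch, assuming Eqn. (21) has the form Y(ω,ϴ) = S(ω)[e^{-jωd cos(ϴ)/c} - e^{-jωT}], so that the null sits where d cos(ϴ)/c = T. The function names, the 1 kHz test frequency and the 2.14 cm spacing (taken from Chapter 5) are choices made for this illustration only.

```python
import cmath
import math

C_SOUND = 343.0  # velocity of sound in m/s, as used in the text


def two_element_response(theta_deg, T, d, freq_hz):
    """Return |Y/S| for the two-element array, assuming the form of Eqn. (21)."""
    w = 2.0 * math.pi * freq_hz
    tau = d * math.cos(math.radians(theta_deg)) / C_SOUND
    return abs(cmath.exp(-1j * w * tau) - cmath.exp(-1j * w * T))


def null_angle_deg(T, d):
    """Angle at which the acoustic delay equals the inserted delay T."""
    ratio = C_SOUND * T / d
    ratio = min(1.0, max(-1.0, ratio))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(ratio))


d = 0.0214  # element spacing in metres (illustrative, from Chapter 5)
for T in (0.0, 0.5 * d / C_SOUND, d / C_SOUND):
    theta0 = null_angle_deg(T, d)
    # Print the null angle and the (near-zero) response at that angle
    print(round(theta0, 1), two_element_response(theta0, T, d, 1000.0))
```

Under these assumptions the three delays of Fig. 2.5 place the null at 90°, 60° and 0° respectively.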
A cardioid system, as shown in Fig. 2.5(c), is easy to develop with very low computational complexity using the set-up shown in Fig. 2.4. The constraints to get cardioid directivity are: 1) using a unit sample delay, T = 1, and 2) setting the sampling period to d/c. By combining two back-to-back cardioid mics, one can implement a first-order differential microphone [7].
Figure 2.6: Implementation of first-order differential microphone using back-to-back cardioids
The back-to-back cardioid arrangement is as shown in Fig. 2.6. The directivity
patterns using the arrangement in Fig. 2.6 are shown in Fig. 2.7.
Figure 2.7: Directional responses of back-to-back cardioid arrangement (a) β=0; (b)β=0.383;
(c) β=1
By examining Fig. 2.6 and Eqn. (20), we can write the expressions for forward
cardioid and backward cardioid,
$$C_F(t) = s(t) - s(t - T - \tau) = s(t) - s(t - (d/c) - d\cos(\theta)/c) \qquad (22)$$
$$C_B(t) = s(t - \tau) - s(t - T) = s(t - d\cos(\theta)/c) - s(t - (d/c)) \qquad (23)$$
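$$y(t) = C_F(t) - \beta\, C_B(t) \qquad (24)$$
Transforming Eqn. (24) into the frequency domain,
$$Y(\omega, \theta) = C_F(\omega, \theta) - \beta\, C_B(\omega, \theta) \qquad (25)$$
$$Y(\omega, \theta) = S(\omega)\left[1 - e^{-j\omega (d/c)(1+\cos\theta)} - \beta\left(e^{-j\omega d\cos(\theta)/c} - e^{-j\omega d/c}\right)\right] \qquad (26)$$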
Here the null location can be steered from 180° to 90° by changing the value of β from 0 to 1. A unit time delay (T=1) has been considered. Directional responses of the system in Fig. 2.6 are plotted in Fig. 2.7 for different values of β.
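The dependence of the null on β can be illustrated with a short numerical sketch. It assumes the back-to-back cardioid response of Eqn. (26) in the form Y/S = [1 - e^{-jφ(1+cos ϴ)}] - β[e^{-jφ cos ϴ} - e^{-jφ}], where φ = ωd/c is a symbol introduced here for brevity. The grid search, its 0.1° resolution and the function names are illustrative choices, not part of the original implementation.

```python
import cmath
import math


def first_order_response(theta_deg, beta, phi):
    """Return |Y/S| assuming the form of Eqn. (26); phi = w*d/c in radians."""
    c = math.cos(math.radians(theta_deg))
    forward = 1.0 - cmath.exp(-1j * phi * (1.0 + c))            # C_F / S
    backward = cmath.exp(-1j * phi * c) - cmath.exp(-1j * phi)  # C_B / S
    return abs(forward - beta * backward)


def find_null_deg(beta, phi):
    """Grid search for the response minimum over the rear half plane (90..180 deg)."""
    angles = [90.0 + i / 10.0 for i in range(901)]
    return min(angles, key=lambda a: first_order_response(a, beta, phi))


for beta in (0.0, 0.383, 1.0):
    print(beta, find_null_deg(beta, phi=0.4))
```

With these assumptions, β = 0 places the null at 180° and β = 1 at 90°, in line with the text; intermediate values of β fall in between.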
In order to make the system adaptive, a simple and easy LMS algorithm has been
used for the back-to-back cardioid adaptive first-order differential array.
Squaring and differentiating with respect to β on both sides of the Eqn. (24) gives,
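$$\frac{d\, y^2(t)}{d\beta} = -2\, y(t)\, C_B(t) \qquad (27)$$
Thus, the LMS version of the β is given as,
$$\beta_{t+1} = \beta_t + 2\mu\, y(t)\, C_B(t) \qquad (28)$$
The normalized LMS version of the β is therefore,
$$\beta_{t+1} = \beta_t + 2\mu\, \frac{y(t)\, C_B(t)}{\langle C_B^2(t) \rangle} \qquad (29)$$
where µ is the step size and ⟨C_B²(t)⟩ denotes the time average of C_B²(t), which is used to normalize µ.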
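The normalized update of Eqn. (29) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the time average of C_B²(t) is approximated by an exponential running average with forgetting factor alpha, β is clamped to [0, 1] so that the null stays in the rear half plane, and the names adapt_beta, alpha and eps are hypothetical. None of these details are given in the text.

```python
import math


def adapt_beta(c_f, c_b, mu=0.05, alpha=0.9, eps=1e-12):
    """Adapt beta sample by sample with a normalized LMS update (cf. Eqn. (29))."""
    beta = 0.0
    power = 0.0  # running estimate of the time average of C_B^2 (assumed form)
    for cf, cb in zip(c_f, c_b):
        y = cf - beta * cb                                # Eqn. (24)
        power = alpha * power + (1.0 - alpha) * cb * cb   # exponential average
        beta += 2.0 * mu * y * cb / (power + eps)         # normalized update
        beta = min(1.0, max(0.0, beta))                   # assumed constraint
    return beta


# Synthetic check: if C_F is exactly 0.5 * C_B, the output power is
# minimized at beta = 0.5, so the adaptation should settle there.
cb = [math.sin(0.3 * n) for n in range(2000)]
cf = [0.5 * v for v in cb]
print(adapt_beta(cf, cb))
```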
3 DESIGN AND SIMULATIONS
A graphical user interface (GUI) has been designed using MATLAB to study, demonstrate and experiment with beamforming techniques. A linear equispaced microphone array is used throughout the thesis, where the user is allowed to choose the number of elements in the array and the inter-element distance. Two sound sources, s(n) and I(n), recordings of an English-speaking female and male respectively, are used as the desired and interference signals. The user can also select the incidence angles of the desired and interference sources. A total of eight beamforming algorithms have been implemented: 1) the delay-and-sum beamformer (DSB), 2) the Wiener beamformer (WBF), 3) the DSB-DSB GSC, 4) the WBF-DSB GSC, 5) the DSB-WBF GSC, 6) the WBF-WBF GSC, 7) the Elko beamformer and 8) the Elko-Wiener beamformer. The performance measures used in the GUI are: 1) signal-to-noise ratio (SNR), 2) speech distortion plot, 3) noise distortion plot and 4) perceptual evaluation of speech quality (PESQ).
3.1 Simulation Procedure
Let s(n) and I(n) be the sequences of the desired and interference sound sources respectively, making angles of incidence ϴ1 and ϴ2 with the centre of an N-element linear microphone array. The sound field is composed of plane waves. Let d metres be the inter-element distance. The velocity of sound is c = 343 m/s. The arrival time differences of the two speech signals at each microphone in the array are computed, and the correspondingly delayed signals are added to get the microphone response. The delays are determined by the steering direction of the array:
$$\tau_n = (n-1)\,\frac{d\cos\theta}{c} \qquad (30)$$
where n is the microphone index, 1 ≤ n ≤ N, and τ_n is the time the sound wave takes to travel between the reference sensor and the nth sensor. Getting the microphone array response is the common step in the simulation procedure of all the mentioned beamformers.
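As a small worked example of Eqn. (30), the sketch below computes the arrival delays for a linear array. The function name and the example values (6 mics, 2.14 cm spacing, 45° incidence, 16 kHz sampling, all taken from Chapter 5) are used here for illustration only.

```python
import math


def arrival_delays(num_mics, d, theta_deg, c=343.0):
    """Delay in seconds at each mic relative to the reference mic, per Eqn. (30)."""
    cos_theta = math.cos(math.radians(theta_deg))
    return [(n - 1) * d * cos_theta / c for n in range(1, num_mics + 1)]


taus = arrival_delays(num_mics=6, d=0.0214, theta_deg=45.0)
fs = 16000.0
for n, tau in enumerate(taus, start=1):
    # Show each delay in seconds and in (fractional) samples at 16 kHz
    print(n, tau, tau * fs)
```

The delays are generally not whole numbers of samples, so an implementation has to either round them or use fractional-delay filtering; the text does not say which was used.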
3.1.1 The Delay-and-sum Beamformer (DSB):
The delay-and-sum beamformer is very easy to implement once the response of each microphone in the array is available. As the name indicates, appropriate delays are inserted after each microphone to compensate for the arrival time differences of the desired speech signal. The outputs of the delays are time-aligned signals, which are then added together to get the beamformer output [see Fig. 2.2]. The appropriate delays follow from the input delay vector of the desired speech signal, calculated using Eqn. (30): flip this vector and apply each element as the delay of the corresponding microphone. The microphone that the desired signal reaches first therefore gets the highest delay. This reinforces the desired signal, whose components add in phase, while attenuating the interference, whose components are likely to be out of phase.
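The flip-and-delay procedure above can be sketched as follows. This is a simplified illustration that assumes whole-sample delays and divides the sum by the number of mics to keep the output at the input level; both are assumptions of this sketch, and the helper names are hypothetical.

```python
def delay_signal(x, k):
    """Delay x by k whole samples, zero-padding the start and keeping the length."""
    if k <= 0:
        return list(x)
    return [0.0] * k + list(x[:len(x) - k])


def delay_and_sum(mics, arrival_delays):
    """Align the mic signals with the flipped delay vector and average them."""
    compensation = list(reversed(arrival_delays))  # first-hit mic gets the largest delay
    aligned = [delay_signal(m, k) for m, k in zip(mics, compensation)]
    n = len(mics)
    return [sum(column) / n for column in zip(*aligned)]


# Four mics that receive the same source 0, 1, 2 and 3 samples late
source = [float(i % 7) for i in range(50)]
delays = [0, 1, 2, 3]
mics = [delay_signal(source, k) for k in delays]
output = delay_and_sum(mics, delays)
print(output == delay_signal(source, 3))
```

After compensation every channel carries the source delayed by the same total amount, so the average reproduces the source shifted by the largest arrival delay.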
Figure 3.1: The delay-and-sum beamforming
3.1.2 The Wiener Beamforming (WBF):
The simulation procedure for Wiener beamforming starts before the desired and interference signals are added at each mic to get the microphone response. Consider the desired and interference signals separately at each mic and represent them in matrix form. Let the total number of samples of the speech and interference signals at each mic be S. The desired and interference matrices, M_S and M_I, are therefore of order N x S, as shown in Fig. 3.2. Each row of a matrix holds the speech/interference response of the corresponding mic.
Figure 3.2: Desired and Interference Matrices for Wiener Beamforming
Recall Eqn. (13), which gives the optimum weights that minimize the mean square error between the output and the reference signal. It requires the auto-correlation matrices of the desired and interference signals as well as the cross-correlation vector. The method I used to obtain these from M_S and M_I while implementing the Wiener beamformer is as follows. Let the order of the Wiener filter be 64. Consider an empty matrix X of order 64*N x K, where K is the integer part of S/64. Take the first 64 columns of M_S, reshape this N x 64 block into a single column vector x1, and make it the first column of X. Then take the next 64 columns (65 to 128) of M_S, reshape them into a single column vector x2, and make it the second column of X. Continue this procedure to the end of M_S to obtain the matrix X. Apply the same method to M_I to obtain the matrix Y. This method of finding the X and Y matrices is shown in Figure 3.3.
Figure 3.3: Pictorial form of the method followed
After getting the X and Y matrices, the auto-correlation matrices of the desired signal s(n) (Rss) and the interference signal I(n) (Rnn) can be calculated using the following equations, where x_i and y_i are the ith columns of X and Y:
$$\mathbf{R}_{ss} = \frac{1}{K}\sum_{i=1}^{K} x_i\, x_i^T \qquad (31)$$
$$\mathbf{R}_{nn} = \frac{1}{K}\sum_{i=1}^{K} y_i\, y_i^T \qquad (32)$$
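The block-stacking and averaging steps can be sketched as below. The text does not say in which order each N x 64 block is reshaped into a column; this sketch assumes mic-by-mic stacking (all samples of mic 1, then mic 2, and so on). The function names and the tiny 2 x 4 example with a block length of 2 are hypothetical and only meant to make the arithmetic easy to follow.

```python
def stack_blocks(m, order):
    """Cut the N x S matrix m into K = S // order blocks and stack each block
    mic by mic into one vector of length N * order (stacking order assumed)."""
    num_blocks = len(m[0]) // order
    return [[v for row in m for v in row[b * order:(b + 1) * order]]
            for b in range(num_blocks)]


def correlation_matrix(columns):
    """R = (1/K) * sum_i x_i x_i^T, as in Eqns. (31) and (32)."""
    k = len(columns)
    size = len(columns[0])
    return [[sum(x[r] * x[c] for x in columns) / k for c in range(size)]
            for r in range(size)]


m_s = [[1.0, 2.0, 3.0, 4.0],   # mic 1
       [5.0, 6.0, 7.0, 8.0]]   # mic 2
x_columns = stack_blocks(m_s, order=2)   # [[1, 2, 5, 6], [3, 4, 7, 8]]
r_ss = correlation_matrix(x_columns)
print(x_columns)
print(r_ss[0])
```

The same two functions applied to M_I would give Y and Rnn.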
The cross-correlation vector rs of Eqn. (13) is given by the centre column of the auto-correlation matrix Rss. If the user wants to concentrate on the interference alone, the cross-correlation vector should instead be chosen as the centre column of the auto-correlation matrix Rnn. We now have all the elements required by Eqn. (13) to calculate the optimum weight vector Wopt. In this particular case, Wopt is of order N*64 x 1. Take the first 64 elements of Wopt, flip them upside down, and let the result be wN. Then take the next 64 elements of Wopt, i.e. elements 65 to 128, flip them upside down, and let the result be wN-1. Continue in the same way to the end of Wopt, that is, until w1 is obtained. Use these individual weight vectors to filter the N microphone responses in the array. Adding all the filtered outputs gives the Wiener beamformer response s’(n), as shown in Fig. 3.4.
Figure 3.4: Wiener Beamformer Structure
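Solving Eqn. (13) amounts to solving a linear system. The sketch below assumes the reading w_opt = [Rss + Rnn]^{-1} rs with rs taken as the centre column of Rss, and uses plain Gaussian elimination so that it needs no external library. The function names, the choice of index n // 2 as the "centre" column, and the 3 x 3 example matrices are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - acc) / aug[r][r]
    return x


def wiener_weights(r_ss, r_nn):
    """Weights from (R_ss + R_nn) w = r_s, with r_s the centre column of R_ss."""
    n = len(r_ss)
    total = [[r_ss[i][j] + r_nn[i][j] for j in range(n)] for i in range(n)]
    r_s = [row[n // 2] for row in r_ss]  # centre column (index choice assumed)
    return solve_linear(total, r_s)


r_ss = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
r_nn = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(wiener_weights(r_ss, r_nn))
```

For this small example the solution is [1/7, 4/7, 1/7], which can be verified by substituting it back into the system.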
3.1.3 Generalized Side-lobe Canceller (GSC):
In this thesis, a few efficient algorithms are implemented based on the GSC structure discussed in section 2.3. The performance of the suggested realizations of the NLMS-based GSC methods is illustrated in the context of beamforming applications. The major problem with the traditional GSC [2] is the sensitivity of the blocking matrix, which leads to target signal leakage and thereby to cancellation of the desired signal at the final output. This raises the question of whether another beamformer can be used in the second path of the GSC structure instead of a blocking matrix. Based on this idea, and using the DSB and the WBF beamformers, I implemented four kinds of beamformer structures. The structure of the beamformer with N microphones is shown in Fig. 3.5.
Figure 3.5: Novel GSC structure
Therefore, the four implemented beamformers are the DSB-DSB GSC, the DSB-
WBF GSC, the WBF-DSB GSC and the WBF-WBF GSC.
3.1.4 The DSB-DSB GSC:
The DSB-DSB GSC structure with N microphones is shown in Fig. 3.6. The structure includes two delay-and-sum beamformers, one in the upper path and the other in the lower path, and an adaptive filter. The upper-path DSB enhances the target signal; let its output be d1(n), where n is the sample index. The lower-path DSB enhances the interference signal and rejects the target signal; let its output be d2(n). The NLMS adaptive filter adaptively subtracts the components of d1(n) that are correlated with d2(n), which gives the final output E(n). Here, the lower DSB acts as the blocking matrix. If the number of sound sources increases, the number of lower paths in the GSC structure increases accordingly, each lower path consisting of a beamformer and an adaptive filter. All the outputs of the adaptive filters are summed before being subtracted from the upper beamformer output d1(n).
Figure 3.6: The structure of DSB-DSB GSC
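The adaptive subtraction stage shared by all four GSC variants can be sketched as a standard NLMS canceller. This is a generic illustration, not the thesis code: the function name, the regularization constant eps, the filter order of 8 and the step size of 0.05 are assumptions chosen so that the toy example converges quickly (Chapter 5 reports an order of 20 and µ = 0.0001), and the synthetic signals stand in for the beamformer outputs d1(n) and d2(n).

```python
import math
import random


def nlms_cancel(d1, d2, order=20, mu=0.05, eps=1e-8):
    """Subtract from d1 the part that an adaptive FIR filter can predict from d2."""
    w = [0.0] * order
    buf = [0.0] * order  # most recent d2 samples, newest first
    out = []
    for primary, ref in zip(d1, d2):
        buf = [ref] + buf[:-1]
        estimate = sum(wi * xi for wi, xi in zip(w, buf))
        e = primary - estimate                     # final output E(n)
        norm = sum(xi * xi for xi in buf) + eps    # input power for normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out


rng = random.Random(0)
n = 4000
target = [0.5 * math.sin(0.05 * t) for t in range(n)]          # stands in for the speech
interference = [rng.gauss(0.0, 1.0) for _ in range(n)]         # stands in for d2(n)
d1 = [s + 0.8 * i for s, i in zip(target, interference)]       # upper-path output
e_out = nlms_cancel(d1, interference, order=8, mu=0.05)
residual = sum((e - s) ** 2 for e, s in zip(e_out[-1000:], target[-1000:])) / 1000
print(residual)
```

In this toy setting the reference contains no target component, so there is no leakage; the leakage problem discussed in the text arises when d2(n) still contains part of the desired signal.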
3.1.5 The DSB-WBF GSC:
Target signal leakage can be expected from the DSB in the above method, especially in the lower path, which leads to cancellation of the desired signal at the final output. Replacing the DSB in the lower path with a Wiener beamformer (WBF) leads to the DSB-WBF GSC. The structure of the DSB-WBF GSC with N microphones and two sound sources is shown in Fig. 3.7. In this method, the DSB enhances the desired signal s(n); let its output be d1(n). The WBF in the lower path acts like a blocking matrix, which extracts the interference signal and rejects the desired signal. The output of the WBF, d2(n), is free from leakage problems. The NLMS adaptive filter adaptively subtracts the components of d1(n) that are correlated with d2(n), which gives the final output E(n). Better signal enhancement is observed at the output compared to the DSB-DSB GSC.
Figure 3.7: Structure of the DSB-WBF GSC
3.1.6 The WBF-DSB GSC:
Now swap the beamformer sections of the above method, that is, use the Wiener beamformer in the upper path to enhance the desired signal and the delay-and-sum beamformer in the lower path to obtain the interference signal. As before, d1(n) and d2(n) are the outputs of the upper and lower paths, here the WBF and the DSB respectively. The structure of the WBF-DSB GSC with N microphones and two sound sources is shown in Fig. 3.8. The NLMS adaptive filter adaptively subtracts the components of d1(n) that are correlated with d2(n), which gives the final output E(n). A better signal-to-noise ratio is observed at the output compared to the DSB-WBF GSC, because of the robust performance of the WBF in the upper path.
Figure 3.8: Structure of the WBF-DSB GSC
3.1.7 The WBF-WBF GSC:
If there is no possibility of signal leakage in either the upper or the lower path of the GSC structure, the result is the ideal system structure with the best signal-to-noise ratio. The WBF-WBF GSC is such a system, and it is a natural improvement that builds on the Wiener beamformer. It comprises two signal flow paths: a WBF in the upper path enhances the target signal, while the other WBF in the lower path approximates the interference signal. Let the outputs of the upper- and lower-path beamformers be d1(n) and d2(n) respectively. The interference signal d2(n) is adaptively filtered using the NLMS filter and subtracted from the approximated target signal d1(n) to give the final output. The WBF-WBF GSC structure with an N-microphone array is shown in Fig. 3.9.
Figure 3.9: Structure of the WBF-WBF GSC beamformer.
3.1.8 The Elko Beamformer:
The design of the Elko beamformer was discussed in section 2.5. The initial step of the implementation, up to getting the microphone array response, is the same as described in section 3.1. The necessary steps in implementing the Elko beamformer include arranging forward- and backward-facing cardioids alternately and using unit delays after each cardioid. Two speech recordings are used throughout the thesis, and the Elko beamformer is used to suppress background sound sources; hence the desired speech is allowed to impinge from the forward direction (-90 to 90 degrees) and the interference from the backward direction (-90 to -180 or 90 to 180 degrees). After choosing the directions of arrival of the sound sources, consider adjacent pairs of microphones and find the individual angles of arrival (say ϴ1 and ϴ2 for the desired and interference sources) for each microphone pair. Use these new angles to get the forward and backward cardioid data.
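$$C_F(n) = S(n)\cdot\sin\!\left(\frac{kd\,(1+\cos\theta_1)}{2}\right) + I(n)\cdot\sin\!\left(\frac{kd\,(1+\cos\theta_2)}{2}\right) \qquad (30)$$
$$C_B(n) = S(n)\cdot\sin\!\left(\frac{kd\,(1-\cos\theta_1)}{2}\right) + I(n)\cdot\sin\!\left(\frac{kd\,(1-\cos\theta_2)}{2}\right) \qquad (31)$$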
where the wave number k = ω/c, and ϴ1 and ϴ2 are the angles of arrival of the desired and interference signals for the selected pair of microphones.
After getting the cardioid responses, Eqns. (24) and (29) of section 2.5.1 give the Elko beamformer output y1(n) for the first pair of microphones. The Elko beamformer outputs for all the other pairs of microphones in the array are obtained similarly.
$$y(n) = y_1(n) + y_2(n) + \cdots + y_m(n) \qquad (24)$$
Finally, summing up all the outputs gives the first-order differential microphone array response. If the number of microphones in the array is odd, take overlapping pairs of forward- and backward-facing microphones: if the first pair of microphones consists of a forward and a backward cardioid, the next pair consists of the backward cardioid from the previous pair and the next adjacent forward cardioid.
3.1.9 The Elko-Wiener Beamformer:
The Elko-Wiener beamformer is a natural improvement that builds on the Elko beamformer. It comprises two sections: the Elko beamformer, which minimizes the microphone output power by placing the single first-order microphone null in the rear-half plane, and a typical Wiener beamformer, which enhances the desired speech signal in the Elko beamformer output. The structure of the N-microphone Elko-Wiener beamformer is shown in Fig. 3.10.
Figure 3.10: Structure of the Elko-Wiener Beamformer
Owing to the Wiener beamforming, the intelligibility of the desired speech signal at the output is considerably higher than with the Elko beamformer alone. If the number of microphones in the array is odd, the pairs of microphones should be [(1, 2), (2, 3), (3, 4), ..., (N-2, N-1), (N-1, N)] to perform the Elko algorithm.
4 PERFORMANCE METRICS
A speech performance metric is a measure of the performance of the hands-free speech enhancement system with respect to the quality of the speech it reproduces. An ideal speech communication system should produce the same perceived sound impression on the listener’s side as in a situation where a speaker and a listener communicate by speech in a close and quiet anechoic chamber [9]. The system performing noise/interference suppression should preserve the quality of the speech. Evaluating the performance of speech enhancement systems requires the development of measures of the overall quality of the received speech [10].
The objective measures that I used in this thesis to evaluate the performance of the beamforming algorithms are: 1) the signal-to-noise ratio (SNR) at the input and output of the systems, for which output SNR vs. input SNR graphs are also plotted, 2) measures of the speech and noise spectral distortion caused by the beamforming filters and 3) perceptual evaluation of speech quality (PESQ), an ITU-T standard. These metrics have the advantage of being repeatable and efficiently computable.
To compute the performance metrics, a system with a particular beamforming algorithm is first executed with both sound sources (desired and interference speech) present. After the system has converged, all the beamforming filter weights W are saved. Next, the interference sound source is disabled, only the desired speech signal is passed through the saved system, and the response is saved. Then the desired sound source is disabled, only the interference is passed through the saved system, and the response is saved. The performance measures are computed from these two individual responses.
4.1 Signal-to-Noise Ratio
This metric measures the dominance of the speech signal with respect to the noise/interference. Although no noise is used in this thesis, the interference is treated as noise. So, instead of signal-to-interference ratio (SIR), I use the more familiar term signal-to-noise ratio (SNR). The input SNR is given by the ratio of the variance of the desired speech signal (ESi) to the variance of the interference signal (ENi) at a particular reference microphone. Similarly, the output SNR is given by the ratio of the variance of the output when only the desired speech is given as input (ESo) to the variance of the output when only the interference is given as input (ENo).
$$SNR_{IN}\,(dB) = 10\log_{10}\!\left(\frac{E_{Si}}{E_{Ni}}\right) \qquad (32)$$
$$SNR_{OUT}\,(dB) = 10\log_{10}\!\left(\frac{E_{So}}{E_{No}}\right) \qquad (33)$$
Therefore, the difference of the output SNR and input SNR values gives the SNR
improvement of the beamforming algorithm.
$$SNRI = SNR_{OUT} - SNR_{IN} \qquad (34)$$
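The SNR definitions above translate directly into code. The sketch below is a minimal illustration with hypothetical function names; the four input sequences correspond to the desired and interference signals at the reference mic and to the two separately recorded system responses described in the measurement procedure.

```python
import math


def variance(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def snr_db(signal, noise):
    """10*log10 of the variance ratio, as in Eqns. (32) and (33)."""
    return 10.0 * math.log10(variance(signal) / variance(noise))


def snr_improvement(s_in, n_in, s_out, n_out):
    """SNRI = SNR_OUT - SNR_IN, as in Eqn. (34)."""
    return snr_db(s_out, n_out) - snr_db(s_in, n_in)


# Toy example: the system leaves the speech untouched and halves the interference amplitude
speech = [1.0, -1.0] * 50
interference = [0.5, -0.5] * 50
interference_out = [0.25, -0.25] * 50
print(snr_db(speech, interference))
print(snr_improvement(speech, interference, speech, interference_out))
```

Halving the interference amplitude quarters its variance, so the improvement in this toy case is 10 log10(4), roughly 6 dB.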
4.2 Normalized Speech Distortion
The normalized speech distortion (NSD) is defined as the deviation between the power spectral density (PSD) of the clean speech at the reference microphone and that of the processed speech signal at the output. To establish a power reference level, the enhanced output signal is normalized by the target speech signal gain of the beamforming structure [10]. The power normalization constant CS is computed as
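$$C_S = \frac{\mathrm{mean}(S(\Omega))}{\mathrm{mean}(Y_S(\Omega))} \qquad (35)$$
where S(Ω) is the PSD of S(n) and YS(Ω) is the PSD of ys(n). The normalized output PSD is
$$Y_{S1}(\Omega) = C_S \cdot Y_S(\Omega) \qquad (36)$$
Now, the speech distortion is given by
$$NSD = 10\log_{10}\!\left(\frac{\mathrm{abs}(Y_{S1}(\Omega) - S(\Omega))}{\mathrm{abs}(S(\Omega))}\right) \qquad (37)$$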
where, abs() gives the absolute value.
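The normalization and distortion steps can be sketched as follows, assuming Eqns. (35) to (37) read C_S = mean(S)/mean(Y_S), Y_S1 = C_S·Y_S and NSD = 10 log10(abs(Y_S1 - S)/abs(S)). The sketch returns one value per frequency bin; how the thesis reduces this to the single figures reported in Chapter 5 is not stated, so no averaging is applied here. The function name, the small floor that avoids log10(0), and the example PSD values are hypothetical.

```python
import math


def normalized_distortion(ref_psd, out_psd, floor=1e-12):
    """Per-bin distortion in dB between a reference PSD and a gain-normalized output PSD.

    Assumes strictly positive reference PSD values.
    """
    c = (sum(ref_psd) / len(ref_psd)) / (sum(out_psd) / len(out_psd))  # Eqn. (35)
    distortion = []
    for s, y in zip(ref_psd, out_psd):
        y1 = c * y                                                      # Eqn. (36)
        distortion.append(10.0 * math.log10(max(abs(y1 - s), floor) / abs(s)))  # Eqn. (37)
    return distortion


# Toy PSDs: the output is the reference scaled by 2, with a small error in the first bin
s_psd = [1.0, 2.0, 3.0, 4.0]
y_psd = [2.2, 4.0, 6.0, 8.0]
print(normalized_distortion(s_psd, y_psd))
```

The same function applies to the noise distortion of section 4.3 when the interference PSDs are passed in instead.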
4.3 Normalized Noise Distortion
Normalized noise distortion (NND) is similar to NSD and is defined as the deviation between the PSD of the clean interference signal at the reference microphone and that of the processed interference signal at the output. To establish a power reference level, the output interference signal is normalized by the interference signal gain of the beamforming structure [10]. The power normalization constant CI is computed as
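$$C_I = \frac{\mathrm{mean}(I(\Omega))}{\mathrm{mean}(Y_I(\Omega))} \qquad (38)$$
where I(Ω) is the PSD of I(n) and YI(Ω) is the PSD of yI(n). The normalized output PSD is
$$Y_{I1}(\Omega) = C_I \cdot Y_I(\Omega) \qquad (39)$$
Now, the noise distortion is given by
$$NND = 10\log_{10}\!\left(\frac{\mathrm{abs}(Y_{I1}(\Omega) - I(\Omega))}{\mathrm{abs}(I(\Omega))}\right) \qquad (40)$$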
where, abs() gives the absolute value.
4.4 Perceptual Evaluation of Speech Quality (PESQ)
Subjective analysis of the speech quality of a transmission system is always a lengthy and expensive process. PESQ is an objective analysis method recommended by the International Telecommunication Union as ITU-T P.862. This tool predicts the results of subjective listening tests, following the procedure described in the PESQ application guide ITU-T P.862.3 [12].
Let X(n) be the pre-processed signal and Y(n) be the processed signal; the PESQ tool then determines the quality of Y(n) by using a large database of subjective tests. The final PESQ score is a linear combination of the average disturbance value and the average asymmetrical disturbance value [12]. The perceived listening quality is expressed in terms of the Mean Opinion Score (MOS), an average quality score over a large set of subjects. In most cases the output lies between 1.0 and 4.5. Most of the subjective experiments used in the development of PESQ used the Absolute Category Rating (ACR) opinion scale shown in Table 4.1.
PESQ has the advantage of being rapid and repeatable, and it requires little computation time.
Table 4.1: Quality opinion scale used in the development of PESQ
| Quality of the Speech | Score |
|---|---|
| Excellent | 4.5 |
| Good | 4 |
| Fair | 3 |
| Poor | 2 |
| Bad | 1 |
5 SIMULATION RESULTS
In order to evaluate the implemented beamformers described in Chapter 3, I used a 6-element linear microphone array. The distance between adjacent mics was 2.14 centimetres. The sampling rate was 16 kHz. For the six beamformers other than the Elko-based ones, the Direction of Arrival (DOA) of the target signal was set at 45° and the DOA of the interference was set at -45°. All the beamformers were evaluated in a computer-simulated anechoic environment using the performance metrics described in Chapter 4. The Normalized Least Mean Square (NLMS) algorithm has been used for adaptive filtering wherever applicable. The filter order was 20, with a step size (µ) of 0.0001. Two sound sources have been used in the simulation: a female English speaker recording as the desired source and a male English speaker recording as the interference. The simulation set-up is shown in Fig. 5.1.
Figure 5.1: Simulation set-up in an anechoic chamber
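A quick arithmetic check relates these parameters to the cardioid constraint of section 2.5.1 (sampling period equal to d/c). The sketch below only evaluates the numbers given above; reading the 2.14 cm spacing as having been chosen to satisfy that constraint is an inference of this note, not a statement from the thesis.

```python
C_SOUND = 343.0   # velocity of sound in m/s
d = 0.0214        # element spacing in metres
fs = 16000.0      # sampling rate in Hz

travel_time = d / C_SOUND                 # time for sound to cross one spacing
sample_period = 1.0 / fs                  # duration of one sample
half_wavelength_at_nyquist = C_SOUND / (fs / 2.0) / 2.0

print(travel_time, sample_period)             # both close to 6.25e-05 s
print(d, half_wavelength_at_nyquist)          # spacing vs. half wavelength at 8 kHz
```

The travel time d/c differs from one sample period by less than 1 %, and the spacing is just below half a wavelength at the 8 kHz Nyquist frequency.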
5.1 Delay-and-Sum Beamformer
Figure 5.2: The Delay and Sum Beamformer
The output obtained with the DSB is shown along with its input in Fig. 5.3, which shows that the signal was enhanced. A constant SNR improvement of 4.1984 dB was observed with the DSB for all input SNR conditions, as shown in Table 5.1. Table 5.2 shows the PESQ scores, which indicate the improvement in the desired speech quality at the output. The speech and noise distortion plots are shown in Fig. 5.5 and Fig. 5.6 respectively.
The observations obtained with the DSB are presented below using the performance metrics discussed in Chapter 4.
Table 5.1: Delay and Sum Beamformer Observation
| Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion |
|---|---|---|---|
| 0 | 4.1984 | -45.7430 | -32.8376 |
| 5 | 4.1984 | -45.7430 | -32.8376 |
| 10 | 4.1984 | -45.7430 | -32.8376 |
| 15 | 4.1984 | -45.7430 | -32.8376 |
| 20 | 4.1984 | -45.7430 | -32.8376 |
Figure 5.3: Signal plots before and after DSB
Figure 5.4: DSB SNR plots
Figure 5.5: DSB Speech Distortion plots
Figure 5.6: DSB Noise Distortion plot
Figure 5.7: DSB PESQ MOS plot
Table 5.2: DSB PESQ MOS Values
| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.574 | 2.179 |
| 5 | 1.883 | 2.504 |
| 10 | 2.215 | 2.832 |
| 15 | 2.543 | 3.140 |
| 20 | 2.865 | 3.453 |
5.2 Wiener Beamformer
Figure 5.8: The Wiener Beamformer
The input and output signals of the WBF are shown in Fig. 5.9, which clearly indicates the high performance of the WBF. The SNR improvements at various input SNR levels are given in Table 5.3. An average SNRI of about 13 dB was observed using the WBF, as shown in Fig. 5.10. The PESQ MOS scores in Table 5.4 indicate that the quality of the desired speech signal at the output was good. PESQ scores for different input conditions are shown in Fig. 5.13. The speech and noise distortion plots of the WBF are shown in Fig. 5.11 and Fig. 5.12 respectively.
The observations obtained with the WBF are presented below using the performance metrics discussed in Chapter 4.
Table 5.3: Wiener Beamformer Observation
| Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion |
|---|---|---|---|
| 0 | 18.7719 | -40.5616 | -28.3315 |
| 5 | 14.1242 | -42.6910 | -28.3316 |
| 10 | 11.9452 | -46.2568 | -28.3316 |
| 15 | 11.1301 | -50.6987 | -28.3316 |
| 20 | 10.8564 | -55.4347 | -28.3316 |
Figure 5.9: Signal plots before and after WBF
Figure 5.10: WBF SNR plot
Figure 5.11: WBF Speech Distortion plot
Figure 5.12: WBF Noise Distortion plot
Figure 5.13: WBF PESQ MOS plot
Table 5.4: WBF PESQ MOS Values
| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.572 | 3.241 |
| 5 | 1.888 | 3.262 |
| 10 | 2.223 | 3.440 |
| 15 | 2.548 | 3.709 |
| 20 | 2.862 | 4.012 |
5.3 DSB – DSB GSC
Figure 5.14: DSB – DSB GSC
The input-output signal comparison plot of the DSB-DSB GSC is shown in Fig. 5.15. As shown in Table 5.5, a nearly constant SNR improvement of about 5.3 dB was observed using the DSB-DSB GSC, which is roughly 1.1 to 1.2 dB more than with the plain DSB, as shown in Fig. 5.16. The speech and noise distortion plots are shown in Fig. 5.17 and Fig. 5.18 respectively. PESQ MOS scores for different input SNR conditions are shown in Fig. 5.19 and Table 5.6, which show a slight improvement compared to the DSB.
The observations obtained with the DSB-DSB GSC are presented below using the performance metrics discussed in Chapter 4.
Table 5.5: DSB-DSB GSC Observation
| Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion |
|---|---|---|---|
| 0 | 5.3421 | -44.6611 | -33.0241 |
| 5 | 5.3384 | -44.5430 | -33.0058 |
| 10 | 5.3501 | -44.4329 | -33.0388 |
| 15 | 5.3614 | -44.3435 | -33.0733 |
| 20 | 5.3825 | -44.2812 | -33.0987 |
Figure 5.15: Signal plots before and after DSB-DSB GSC
Figure 5.16: DSB-DSB GSC SNR plots
Figure 5.17: DSB-DSB GSC Speech Distortion plots
Figure 5.18: DSB-DSB GSC Noise Distortion plots
Figure 5.19: DSB-DSB GSC PESQ MOS plots
Table 5.6: DSB-DSB GSC PESQ MOS Values
| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.574 | 2.188 |
| 5 | 1.883 | 2.514 |
| 10 | 2.215 | 2.847 |
| 15 | 2.543 | 3.160 |
| 20 | 2.865 | 3.473 |
5.4 DSB–WBF GSC
Figure 5.20: DSB–WBF GSC
The output obtained using the DSB-WBF GSC is plotted along with its input in Fig. 5.21. Table 5.7 and Fig. 5.22 show the SNR improvement at various input SNR levels; improvements between roughly 12 dB and 17 dB were observed using the DSB-WBF GSC. The better performance of the DSB-WBF GSC compared to the DSB-DSB GSC is due to the WBF in the second stage of the GSC, which performs well and leaks very little of the desired speech signal. The PESQ scores are shown in Fig. 5.25 and in Table 5.8. The speech and noise distortion plots of the DSB-WBF GSC are shown in Fig. 5.23 and Fig. 5.24 respectively.
The observations obtained with the DSB-WBF GSC are presented below using the performance metrics discussed in Chapter 4.
Table 5.7: DSB-WBF GSC Observation
| Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion |
|---|---|---|---|
| 0 | 11.9192 | -34.5783 | -31.3902 |
| 5 | 11.9996 | -38.1477 | -31.5655 |
| 10 | 13.6731 | -42.1718 | -31.7685 |
| 15 | 17.0847 | -45.0234 | -32.0713 |
| 20 | 12.8110 | -46.1565 | -33.4017 |
Figure 5.21: Signal plots before and after DSB-WBF GSC
Figure 5.22: DSB-WBF GSC SNR plot
Figure 5.23: DSB-WBF GSC Speech Distortion plot
Figure 5.24: DSB-WBF GSC Noise Distortion plot
Figure 5.25: DSB-WBF GSC PESQ MOS plot
Table 5.8: DSB-WBF GSC PESQ MOS Values
| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.579 | 2.197 |
| 5 | 1.883 | 2.528 |
| 10 | 2.215 | 2.866 |
| 15 | 2.543 | 3.192 |
| 20 | 2.865 | 3.596 |
5.5 WBF-DSB GSC
Figure 5.26: WBF-DSB GSC
The input and output signals of the WBF-DSB GSC are plotted in Fig. 5.27. The improvement in the SNR using the WBF-DSB GSC at different input SNR levels is shown in Fig. 5.28 and Table 5.9. A slight decrease in the SNR improvement can be observed compared to the WBF. PESQ scores are given in Fig. 5.31 and Table 5.10; although they indicate a good-quality signal, a slight decrease in the PESQ scores compared to the WBF can again be observed. The reason for this drop is the poor performance of the DSB in the second stage of the GSC, which leads to cancellation of the desired speech at the output. The speech and noise distortion curves of the WBF-DSB GSC are shown in Fig. 5.29 and Fig. 5.30 respectively.
The observations obtained with the WBF-DSB GSC are presented below using the performance metrics discussed in Chapter 4.
Table 5.9: WBF-DSB GSC Observation
| Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion |
|---|---|---|---|
| 0 | 18.3303 | -40.4923 | -29.0132 |
| 5 | 14.0236 | -41.6520 | -28.7313 |
| 10 | 11.9201 | -41.8120 | -28.7319 |
| 15 | 11.1123 | -41.6466 | -28.8029 |
| 20 | 10.7507 | -41.4458 | -28.9159 |
Figure 5.27: Signal plots before and after WBF-DSB GSC
Figure 5.28: WBF-DSB GSC SNR plot
Figure 5.29: WBF-DSB GSC Speech Distortion plot
Figure 5.30: WBF-DSB GSC Noise Distortion plot
Figure 5.31: WBF-DSB GSC PESQ MOS plot
Table 5.10: WBF-DSB GSC PESQ MOS Values
| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.575 | 3.107 |
| 5 | 1.885 | 3.177 |
| 10 | 2.224 | 3.361 |
| 15 | 2.548 | 3.609 |
| 20 | 2.863 | 3.837 |
5.6 WBF-WBF GSC
Figure 5.32: WBF-WBF GSC
The output obtained using the WBF-WBF GSC is shown in Fig. 5.33. The SNR improvements using the WBF-WBF GSC at different input SNR levels are shown in Fig. 5.34 and Table 5.11. The improvement is almost the same as with the WBF alone. PESQ MOS scores are shown in Fig. 5.37 and Table 5.12; the interference/noise gain mismatch between the two stages causes a slight drop in the values, but good quality is still maintained. The speech and noise distortion curves of the WBF-WBF GSC are shown in Fig. 5.35 and Fig. 5.36 respectively.
The observations obtained with the WBF-WBF GSC are presented below using the performance metrics discussed in Chapter 4.
Table 5.11: WBF-WBF GSC Observation
| Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion |
|---|---|---|---|
| 0 | 18.7720 | -40.5617 | -28.3316 |
| 5 | 14.1244 | -42.6912 | -28.3317 |
| 10 | 11.9453 | -46.2570 | -28.3317 |
| 15 | 11.1302 | -50.6988 | -28.3317 |
| 20 | 10.8565 | -55.4348 | -28.3317 |
Figure 5.33: Signal plots before and after WBF-WBF GSC
Figure 5.34: WBF-WBF GSC SNR plot
Figure 5.35: WBF-WBF GSC Speech Distortion plot
Figure 5.36: WBF-WBF GSC Noise Distortion plot
Figure 5.37: WBF-WBF GSC PESQ MOS plot
Table 5.12: WBF-WBF GSC PESQ MOS Values
| Input SNR (dB) | Input PESQ MOS | Output PESQ MOS |
|---|---|---|
| 0 | 1.572 | 3.188 |
| 5 | 1.888 | 3.225 |
| 10 | 2.223 | 3.425 |
| 15 | 2.548 | 3.700 |
| 20 | 2.862 | 4.008 |
5.7 The Elko Beamformer
The simulation environment for the evaluation of the Elko-based beamformers consists of 6 cardioid microphones. The forward and backward cardioids are arranged alternately, as shown. The DOA of the target signal was set at 45° and that of the interference at -135° relative to the centre of the array. The remaining properties are the same as for the other beamformers discussed so far. The NLMS algorithm has been used for obtaining the β values.
Figure 5.38: Signal plots before and after Elko Beamformer
The observations obtained with Elko beamforming are presented below. The input and output signals of the Elko beamformer are shown in Fig. 5.38. As shown in Fig. 5.39 and Table 5.13, a nearly constant SNR improvement of about 10.8 dB was observed. PESQ MOS scores after placing a null in the rear-half plane of the array are shown in Fig. 5.42 and Table 5.14. The speech and noise distortion curves are shown in Fig. 5.40 and Fig. 5.41 respectively.
Table 5.13: Elko Beamformer Observation
| Input SNR (dB) | Output SNRI (dB) | Speech Distortion | Noise Distortion |
|---|---|---|---|
| 0 | 10.7451 | -36.9375 | -32.6808 |
| 5 | 10.7778 | -37.0730 | -32.7357 |
| 10 | 10.8007 | -37.1781 | -32.7757 |
| 15 | 10.8129 | -37.2341 | -32.7973 |
| 20 | 10.8179 | -37.2572 | -32.8063 |
Figure 5.39: Elko Beamformer SNR plot
Figure 5.40: Elko Beamformer Speech Distortion plot
Figure 5.41: Elko Beamformer Noise Distortion plot
75
Figure 5.42: Elko Beamformer PESQ MOS plot
Table 5.14: Elko Beamformer PESQ MOS Values
Input SNR (dB)   Input PESQ MOS   Output PESQ MOS
0                1.573            2.358
5                1.887            2.638
10               2.223            2.902
15               2.547            3.195
20               2.862            3.432
5.8 The Elko-Wiener Beamformer
Figure 5.43: Elko-Wiener Beamformer
The output obtained using the Elko-Wiener beamformer is shown in Fig. 5.44.
Wiener filtering the Elko output raises the SNR improvement to a nearly constant
28 dB, as shown in Fig. 5.45 and Table 5.15. Although the SNR improvement
increased substantially, the PESQ MOS scores shown in Fig. 5.48 and Table 5.16
did not improve to the same extent, owing to the gain of the interference from
the rear half plane of the array. The speech and noise distortion curves of the
Elko-Wiener beamformer are shown in Fig. 5.46 and Fig. 5.47, respectively.
The observations made while performing Elko-Wiener beamforming are presented below.
Table 5.15: Elko-Wiener Beamformer Observation
Input SNR (dB)   Output SNRI (dB)   Speech Distortion   Noise Distortion
0                28.7614            -36.6123            -28.4446
5                28.2470            -37.0400            -28.4938
10               27.9669            -37.3921            -28.5744
15               27.8048            -37.6134            -28.6413
20               27.7210            -37.7185            -28.6769
Figure 5.44: Signal plots before and after Elko-Wiener Beamformer
Figure 5.45: Elko-Wiener Beamformer SNR plot
Figure 5.46: Elko-Wiener Beamformer Speech Distortion plot
Figure 5.47: Elko-Wiener Beamformer Noise Distortion plot
Figure 5.48: Elko-Wiener Beamformer PESQ MOS plot
Table 5.16: Elko-Wiener Beamformer PESQ MOS Values
Input SNR (dB)   Input PESQ MOS   Output PESQ MOS
0                1.573            2.905
5                1.887            3.135
10               2.223            3.387
15               2.547            3.636
20               2.862            3.879
5.9 Comparison Plots
For ease of comparison, the SNR improvements of the different beamformers at
different input SNR levels are shown in Fig. 5.49. The Elko-based beamformers are
not included in this figure, for the sake of a fair comparison: their simulation
environment differs from that of the other beamformers, so they are compared
separately in Fig. 5.50. Although the simulation environment is different, the
evaluation procedure is the same as for the other beamformers.
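The SNR improvement values compared in these figures can be computed along the following lines. This is a minimal sketch, not the thesis code. It assumes that SNRI is the output SNR minus the input SNR in dB, and that the speech and interference components are passed separately through the same beamformer, as is possible in a simulation, so that their output powers can be measured individually. The function names are hypothetical; the exact metric definitions used in this thesis are those of Chapter 3.

```python
import numpy as np


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB from separate speech and noise components."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    return 10.0 * np.log10(np.sum(speech ** 2) / np.sum(noise ** 2))


def snr_improvement_db(speech_in, noise_in, speech_out, noise_out):
    """SNRI, assumed here to be output SNR minus input SNR (both in dB)."""
    return snr_db(speech_out, noise_out) - snr_db(speech_in, noise_in)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    speech = rng.standard_normal(16000)
    noise = rng.standard_normal(16000)
    # Hypothetical beamformer that leaves the speech untouched and
    # attenuates the noise amplitude by a factor of 10.
    print(snr_improvement_db(speech, noise, speech, 0.1 * noise))
```

Measuring the two components separately avoids having to estimate the noise power from the mixed output, which is only possible when the clean signals are available, as in a simulated environment.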
Figure 5.49: SNR comparison plot
Figure 5.50: SNR comparison plot for Elko based beamformers
6 SUMMARY AND CONCLUSION
Beamforming is a widely used technique for spatial filtering, in which an array of
microphones is used to collect samples of propagating waves to be processed. In
this thesis, several beamforming techniques for speech enhancement in an
interference environment have been implemented and their evaluations are
presented. The main goal when implementing a beamformer is to extract the target
signal in the look-direction from the interference. This thesis summarizes
the performance of eight beamforming algorithms: DSB, WBF, DSB-DSB GSC,
DSB-WBF GSC, WBF-DSB GSC, WBF-WBF GSC, the Elko beamformer and the
Elko-Wiener beamformer.
Two test environments have been considered in this work, one for the non-Elko
beamformers and the other for the Elko-based beamformers. For the two Elko-based
beamformers, the interference was placed in the rear half plane of the array,
whereas for the other six beamformers the target signal and the interference
were placed on the same side of the array. All the implemented beamformers
approach optimum performance as long as the environment is stationary. Adaptive
beamformers are especially attractive, as they can adapt automatically to
varying environments. All the algorithms were implemented in a computer-simulated
anechoic chamber.
The following conclusions can be drawn from the simulation results:
- DSB is the simplest of the beamformers implemented and has an SNRI of
  4.19 dB. The performance of the DSB would improve considerably if the
  microphone array were infinitely long, but this is not realizable in practice.
- WBF provides promising results and sustains a high level of performance
  regardless of the length of the microphone array and the location of the
  speech sources. WBF achieves an SNRI of 18 dB at 0 dB input SNR.
- The four adaptive beamforming algorithms based on the GSC architecture
  possess enhanced noise suppression capability. The blocking matrix in the
  GSC architecture was replaced with an additional beamformer, either DSB or
  WBF, that keeps track of the characteristics of the interfering signal,
  leading to high interference rejection.
- The DSB-DSB GSC and the DSB-WBF GSC beamformers outperformed the standalone
  DSB in all the performance metrics.
- The WBF-DSB GSC and the WBF-WBF GSC struggled to surpass the standalone
  WBF. Overall, the standalone WBF maintained its high level of performance.
- The poor performance of the WBF-DSB GSC is due to target signal leakage
  from the DSB, which leads to target signal cancellation at the GSC output.
- The Elko beamformer, which uses a first-order adaptive differential
  microphone array and is able to attenuate a noise / interference source at
  any location in the rear half plane of the array, has been presented. It
  offers rapid adaptation at modest computational complexity.
- The Elko-Wiener beamformer could be used in applications in which the user
  wants to null the noise / interference arriving from behind and to receive
  signals only from the desired direction. The Elko-Wiener beamformer
  attenuates interference by about 28 dB, at the cost of increased
  computational complexity.
7 FUTURE WORK
After implementing several beamformers in the context of speech applications,
I can offer a few suggestions for future work.
- Only a single interference source was used during the thesis work. The
  performance of all the beamformers should also be evaluated for multiple
  spatially distinct noise / interference sources.
- Apart from computer simulations, real-time implementations are also
  necessary to verify the performance of the beamformers.
- While linear microphone arrays deliver good results, the beamformers
  should also be evaluated with other array geometries such as circular and
  semicircular arrays.
REFERENCES
[1] J. Benesty, S. Makino, J. Chen (eds.), “Speech Enhancement,” pp. 1-8,
Springer, 2005, ISBN 978-3540240396.
[2] L. J. Griffiths, C. W. Jim, “An alternative approach to linearly
constrained adaptive beamforming,” IEEE Transactions on Antennas and
Propagation, vol. 30, no. 1, pp. 27-34, 1982.
[3] I. A. McCowan, “Robust Speech Recognition using Microphone Arrays,”
PhD Thesis, Queensland University of Technology, Australia, 2001.
[4] E. Ferrara Jr., B. Widrow, “Multichannel adaptive filtering for signal
enhancement,” IEEE Transactions on Acoustics, Speech and Signal
Processing, vol. 29, no. 3, pp. 766-770, Jun. 1981.
[5] N. Grbic, “Optimal and Adaptive Subband Beamforming, Principles and
Applications,” Doctoral Dissertation Series No. 2001:01, ISSN 1650-2159,
Blekinge Institute of Technology, 2001.
[6] S. Haykin, “Adaptive Filter Theory,” Prentice Hall Int. Inc., 1996, ISBN
0-13-397985-7.
[7] G. W. Elko, “A Simple Adaptive First-Order Differential Microphone,”
Acoust. and Speech Research Dept., Bell Labs, Lucent Technologies,
Murray Hill, NJ, Aug. 1999.
[8] L. Johansson, A. Törnqvist, “Multiple Sensors Speech Enhancement in
Hearing Aids,” Blekinge Tekniska Högskola, Karlskrona, Sweden, 2004.
[9] A. Eriksson, “Speech Enhancement in Mobile Devices,” in International
Workshop on Acoustics, Echo and Noise Control Proceedings, Plenary
Talk, 2006.
[10] Z. Yermeche, “Soft-Constrained Subband Beamforming for Speech
Enhancement,” Doctoral Dissertation Series No. 2007:14, ISSN 1653-2090,
Blekinge Institute of Technology, 2007.
[11] B. Sällberg, “Applied methods to combat noise in human
communication,” Licentiate Dissertation Series No. 2006:06, Blekinge
Institute of Technology, ISBN 91-7295-087-0.
[12] ITU-T P.862, “Perceptual Evaluation of Speech Quality (PESQ).”
[13] B. D. Van Veen, K. M. Buckley, “Beamforming: a versatile approach to
spatial filtering,” IEEE ASSP Magazine, vol. 5, no. 2, pp. 4-24, Apr. 1988.
[14] J.-P. Thiran, “Recursive digital filters with maximally flat group delay,”
IEEE Trans. Circ. Theory, vol. 18, no. 6, pp. 659-664, 1971.
[15] O. Hoshuyama, A. Sugiyama, A. Hirano, “A robust adaptive
beamformer for microphone arrays with a blocking matrix using
constrained adaptive filters,” IEEE Transactions on Signal Processing,
vol. 47, no. 10, pp. 2677-2684, Oct. 1999.
[16] N. Grbic, S. Nordholm, “Soft Constrained Subband Beamforming for
Hands-Free Speech Enhancement,” in IEEE International Conference on
Acoustics, Speech and Signal Processing Proceedings, vol. 1, pp. 885-888,
May 2002.
[17] R. A. Monzingo, T. W. Miller, “Introduction to adaptive arrays,” John
Wiley and Sons, New York, 1980.
[18] G. W. Elko, “Super directional Microphone Arrays,” in Acoustic Signal
Processing for Telecommunication, J. Benesty and S. L. Gay (eds.), pp.
181-236, Kluwer Academic Publishers, 2000.
[19] T. S. N. U. V. Ramesh, “Speech Enhancement in Hands-Free Device
(Hearing Aid) with emphasis on Elko’s Beamformer,”
Blekinge Tekniska Högskola, Karlskrona, Sweden, 2012.
[20] S. S. V. S. Kotta, K. B. K. Rao, “Acoustic Beamforming for Hearing Aids
using Multi Microphone Array by Designing Graphical User Interface,”
Blekinge Tekniska Högskola, Karlskrona, Sweden, 2012.