MEE-2010-2012
Acoustic Beamforming for Hearing Aids Using
Multi Microphone Array by Designing
Graphical User Interface
Master’s Thesis
S S V SUMANTH KOTTA
BULLI KOTESWARARAO KOMMINENI
This thesis is presented as part of the Degree of Master of Science in Electrical
Engineering with Emphasis on Signal Processing
Blekinge Institute of Technology
January-2012
Blekinge Institute of Technology
School of Engineering
Department of Electrical Engineering
Supervisor : Dr. Benny Sällberg
Examiner : Dr. Nedelko Grbic
Blekinge Tekniska Högskola
SE 371 41 Karlskrona.
Contact Information:
Author 1:
S S V Sumanth Kotta (880617-4499)
Email: [email protected]
Author 2:
Bulli Koteswararao Kommineni (880812-4690)
Email: [email protected]
Supervisor:
Dr. Benny Sällberg
Department of Electrical Engineering
School of Engineering, BTH
Blekinge Institute of Technology, Sweden
Email: [email protected]
Examiner:
Dr. Nedelko Grbic
Department of Electrical Engineering
School of Engineering, BTH
Blekinge Institute of Technology, Sweden
Email: [email protected]
ABSTRACT
Hearing-impaired persons lose the ability to distinguish speech in ambient
noise. The human hearing system is sensitive to interfering noise, which decreases the
quality and intelligibility of the speech signal and in turn makes speech communication
difficult. To make speech effective and useful for the hearing impaired, it must be
enhanced from the noisy speech signal. Speech enhancement is an emerging and useful
branch of signal processing that reduces noise and improves the perceptual quality
and intelligibility of the speech signal.
Several signal processing techniques have been widely used in hearing aids to enhance
the speech signal in noisy environments. The microphone array is one such technique,
implemented in hearing aids to provide a better solution to the problem
encountered by hearing-impaired persons when listening to speech in the presence of
background noise. The Generalized Sidelobe Canceller (GSC) is a powerful technique that enhances
the signal of interest while suppressing the interference signal and noise at the output of the
microphone array. The main focus of the thesis is to implement a GSC using a microphone
array in which the blocking matrix of the GSC is replaced with Elko’s algorithm. Elko’s algorithm is
used to track and attenuate interference or background noise located in the back half plane of
the microphone array.
The proposed system is implemented and validated. A clean
speech signal is corrupted by various background noises, namely multi-talker babble
noise, wind noise, car interior noise, destroyer engine room noise, tank noise, and interfering
male and female voices, at five different Signal-to-Noise Ratio (SNR) levels: 0 dB, 5 dB, 10 dB,
15 dB and 20 dB. Several objective tests, namely SNR, Signal-to-Noise Ratio
Improvement (SNRI), Perceptual Evaluation of Speech Quality (PESQ), Speech Distortion
(SD) and Noise Distortion (ND), are performed on the test set. The platform is built as a
Matlab Graphical User Interface (GUI), and all results are shown as plots produced
from Matlab code.
To Our Parents
Acknowledgment
We owe our deepest gratitude to our supervisor, Dr. Benny Sällberg, for his
encouragement and guidance. He provided us with the advice and support needed to complete
the thesis. His deep knowledge of the field allowed us to learn many things that were helpful
to us during the thesis work. We would like to express our utmost gratitude to our examiner,
Dr. Nedelko Grbić, for giving us the opportunity to pursue this Master's thesis. We would also
like to thank Dr. Gary W. Elko for his suggestions during the thesis work.
We would like to thank our parents and families for their support and encouragement
in completing the thesis. They helped us throughout our educational careers and motivated
us, both morally and financially. We would like to thank all of our friends
who supported us during the thesis work.
Lastly, we offer our regards to all of those who supported us in any respect during the
completion of the thesis.
S S V Sumanth Kotta,
Bulli Koteswararao Kommineni,
Karlskrona, January, 2012, Sweden.
Table of Contents
ABSTRACT
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 Objective of the Thesis
1.2 Problem Statement
1.3 Aim of the Thesis Work
1.4 Overview of the Proposed System
1.5 Outline of the Thesis:
CHAPTER 2
BACKGROUND
2.1 Microphone Array in Hearing Aids
2.2 Beamforming in Hearing Aids
2.3 Noise Reduction in Hearing Aids
2.4 Feedback Cancellation in Hearing Aids
CHAPTER 3
MICROPHONE ARRAY
3.1 Basics of Microphone
3.2 Microphone Array
3.3 Microphone array structure and connections
3.4 Physical Preliminaries
3.5 Trigonometric Solution
CHAPTER 4
ELKO ALGORITHM
4.1 Introduction
4.2 Aim of Elko Algorithm
4.3 Derivation of the Adaptive First-Order Array
4.4 LMS Version of Back-to-Back Cardioid Microphone
CHAPTER 5
BEAMFORMING AND GUI
5.1 Basics structure of Beamforming
5.2 Types of Beamformers
5.2.1 Fixed Beamforming
5.2.2 Adaptive Beamforming
5.2.3 Acoustic Beamforming
5.3 Generalized Sidelobe Canceller
5.4 Elko Based Generalized Sidelobe Canceller
5.5 Graphical User Interface
CHAPTER 6
EVALUATION
6.1 Test Data
6.1.1 Clean Speech Data
6.1.2 Noise Data
6.2 Objective Measures
6.2.1 Signal to Noise ratio
6.2.2 SNR Improvement
6.2.3 Perceptual Evaluation of Speech Quality
6.2.4 Measurement of Speech Distortion
6.2.5 Measure of Noise Distortion
6.3 Test Results with Various Noise Signals
6.3.1 Evaluation of Babble Noise
6.4.2 Evaluation of Car Interior Noise
6.4.3 Evaluation of Tank Noise
6.4.4 Evaluation of wind Noise
6.4.5 Evaluation of Man voice as interference Noise
6.4.6 Evaluation of Destroy Engine Noise
6.4.7 Evaluation of Female voice as interference Noise
CHAPTER 7
SUMMARY AND FUTURE WORK
7.1 Conclusion
7.2 Future work
BIBLIOGRAPHY
List of Figures
Fig. 1. 1. Basic overview of microphone array for recording the signals
Fig. 1. 2. Basic overview of the speech enhancement system
Fig. 1. 3. Block diagram of the proposed system
Fig. 2. 1. Head simulator with the three element microphone array
Fig. 3. 1. Electronic symbol of microphone
Fig. 3. 2. Microphone constellation in an array
Fig. 3. 3. Physical set up of the microphones
Fig. 3. 4. Diagram showing the possible situation of microphone and source
Fig. 4. 1. First-order sensor composed of two zero-orders and a delay
Fig. 4. 2. Directional response of the array for
Fig. 4. 3. Schematic implementation of an adaptive first-order differential microphone using the combination of forward and backward facing cardioids
Fig. 4. 4. Directional response of the array for
Fig. 4. 5. Directional response of the back-to-back cardioid microphone
Fig. 4. 6. Directional response of the adaptive array for
Fig. 5. 1. Block diagram of beamforming
Fig. 5. 2. Signal model for microphone array and beamforming
Fig. 5. 3. An Adaptive beamforming system
Fig. 5. 4. Block diagram of generalized sidelobe canceller
Fig. 5. 5. Detailed structure of generalized sidelobe canceller
Fig. 5. 6. Structure of Proposed Elko Based Generalized Sidelobe Canceller Model
Fig. 5. 7. GUI layout used for the design of the proposed model
Fig. 6. 1. Power spectrum of babble noise
Fig. 6. 2. Power spectrum of car interior noise
Fig. 6. 3. Power spectrum of destroy engine noise
Fig. 6. 4. Power spectrum of tank noise
Fig. 6. 5. Power spectrum of wind noise
Fig. 6. 6. Structure of perceptual evaluation of speech quality
Fig. 6. 7. Graph represents the position of microphones, source and noise signal
Fig. 6. 8. Graph represents a clean speech signal
Fig. 6. 9. Graph represents babble corrupted speech signal at 10dB, enhanced signal
Fig. 6. 10. Graph represents the SNR value for babble noise
Fig. 6. 11. Graph represents the PESQ value for babble noise
Fig. 6. 12. Graph represents the SD value for babble noise at 10dB
Fig. 6. 13. Graph represents the ND value for babble noise at 10dB
Fig. 6. 14. GUI Layout with babble as input noise at 10dB of input SNR
Fig. 6. 15. Graph represents car noise corrupted speech signal at 0dB, enhanced signal
Fig. 6. 16. Graph represents the SNR value for car noise
Fig. 6. 17. Graph shows the PESQ value for car noise
Fig. 6. 18. Graph represents the SD value for car noise at 0dB
Fig. 6. 19. Graph represents the ND value for car noise at 0dB
Fig. 6. 20. GUI Layout with car noise as input noise at 0dB of input SNR
Fig. 6. 21. Graph represents tank noise corrupted speech signal at 5dB, enhanced signal
Fig. 6. 22. Graph represents the SNR value for tank noise
Fig. 6. 23. Graphs represents the PESQ value for tank noise
Fig. 6. 24. Graph represents the SD value for tank noise at 5dB
Fig. 6. 25. Graph represents the ND value for tank noise at 5dB
Fig. 6. 26. GUI Layout with car noise as input noise at 5dB of input SNR
Fig. 6. 27. Graph represents wind noise corrupted speech signal at 0dB, enhanced signal
Fig. 6. 28. Graph represents the SNR value for wind noise
Fig. 6. 29. Graph shows the PESQ value for wind noise
Fig. 6. 30. Graph represents the SD value for wind noise at 0dB
Fig. 6. 31. Graph represents the ND value for wind noise at 0dB
Fig. 6. 32. GUI Layout with wind noise as input noise at 0dB of input SNR
Fig. 6. 33. Graph represents man noise corrupted speech signal at 5dB, enhanced signal
Fig. 6. 34. Graph shows the SNR value for man voice as interference noise
Fig. 6. 35. Graph shows the PESQ value for man voice as interference noise
Fig. 6. 36. Graph shows the SD value for man voice as interference noise at 5dB
Fig. 6. 37. Graph shows the ND value for man voice as interference noise at 5dB
Fig. 6. 38. GUI Layout with man noise as input noise at 5dB of input SNR
Fig. 6. 39. Graph represents engine noise corrupted speech signal at 0dB, enhanced signal
Fig. 6. 40. Graph shows the SNR value for destroy engine noise
Fig. 6. 41. Graph shows the PESQ value for destroy engine noise
Fig. 6. 42. Graph shows the SD value for destroy engine noise at 0dB
Fig. 6. 43. Graph shows the ND value for destroy engine noise at 0dB
Fig. 6. 44. GUI Layout with destroy engine noise as input noise at 0dB of input SNR
Fig. 6. 45. Graph represents female noise corrupted signal at 10dB, enhanced signal
Fig. 6. 46. Graph shows the SNR value for female voice as interference noise
Fig. 6. 47. Graph shows the PESQ value for female voice as interference noise
Fig. 6. 48. Graph shows the SD value for female voice as interference noise at 10dB
Fig. 6. 49. Graph shows the ND value for female voice as interference noise at 10dB
Fig. 6. 50. GUI Layout with female noise as input noise at 10dB of input SNR
List of Tables
Table 5.1 Basic components used in GUI
Table 6.1 Type of male and female sentences used for evaluation
Table 6.2 Represents the SNRI speech and noise distortion for babble noise
Table 6.3 Represents the SNRI speech and noise distortion for car interior noise
Table 6.4 Represents the SNRI speech and noise distortion for tank noise
Table 6.5 Represents the SNRI speech and noise distortion for wind noise
Table 6.6 Represents the SNRI speech and noise distortion for man interference noise
Table 6.7 Represents the SNRI speech and noise distortion for destroy engine noise
Table 6.8 Represents the SNRI speech and noise distortion for female interference noise
List of Abbreviations
GSC Generalized Sidelobe Canceller
SNR Signal-to-Noise Ratio
SNRI Signal-to-Noise Ratio Improvement
PESQ Perceptual Evaluation of Speech Quality
ANC Adaptive Noise Canceller
ADMA Adaptive Differential Microphone Array
NLMS Normalized Least Mean Square
GUI Graphical User Interface
VoIP Voice over Internet Protocol
DAT Digital Audio Tape
SD Speech Distortion
ND Noise Distortion
dB Decibels
Chapter-1
Chapter 1 Introduction
Hearing impairments affect 10% of the world population. Surveys in Sweden have
estimated that about 1.2 million people aged 18 years and older have mild hearing loss,
495,000 have moderate hearing loss and 120,000 have severe hearing loss [1]. About 367,000
Swedes with hearing damage use hearing aids. For people with a hearing
impairment, hearing aids amplify the acoustic signal so that the individual
can understand it more effectively. Most hearing-impaired
people who use hearing aids are not satisfied with them because of background
noise.
The poor performance of conventional hearing aids in background noise
motivated the use of microphone arrays to create directionally sensitive hearing aids that
amplify the signal arriving from a particular direction. A microphone array is used to improve
the desired speech signal when the interference arrives from other directions. The microphone
array is considered a preprocessor, followed by conventional hearing aid processing [2]. It
is used to improve the SNR and speech intelligibility. Microphone arrays are used in
various applications such as audio, teleconferencing and voice recognition [3]. Fig. 1. 1
represents the basic overview of a microphone array for recording signals.
Fig. 1. 1. Basic overview of microphone array for recording the signals
The speech signal recorded by the microphone array is of poor quality because of
the interfering noises that are also recorded and the distance between the speaker and the
microphones. The output of the microphone array must therefore be processed further to
enhance the speech signal. Speech enhancement is one of the key technologies used to enhance
the speech signal and to suppress the unwanted noise while maintaining the quality of the
speech signal. Fig. 1. 2 represents the basic overview of a speech enhancement system.
Fig. 1. 2. Basic overview of the speech enhancement system
1.1 Objective of the Thesis
The objective of the thesis is to improve perceptual aspects, such as the quality and
intelligibility, of the degraded signal in hearing aids by the use of a microphone array. This
project analyzes the achievable speech enhancement performance for a microphone
with a 1 cm aperture. The speech enhancement paradigm used exclusively
throughout the project is the GSC combined with the Elko algorithm.
1.2 Problem Statement
With a microphone array used as a preprocessor to hearing aids, the problem is to design a
system that enhances the desired speech signal from the interfering noise signals. The
interfering noise may be random noise, wind, background sounds from offices or cars, or
babble noise. The noise may affect the original signal in an additive, multiplicative or
convolutive manner. This thesis is concerned with two questions:
Firstly, how to design and implement a microphone array that suits the system and how to
determine the angle of arrival of the speech and noise signals at the microphones.
Secondly, how to suppress the noise by implementing a new speech
enhancement method that uses a GSC in which the blocking matrix is replaced with the
Elko algorithm, and how to develop a GUI layout that suits the proposed method.
1.3 Aim of the Thesis Work
The main aim of this thesis is to overcome the problem of interfering
noise signals in hearing aids and to improve the quality and intelligibility of the speech
signal by using a microphone array as a preprocessor to the hearing aids. The work is divided
into four major parts:
Designing a microphone array that suits the system and determining the angle of
arrival of the speech and noise signals at the microphones.
Implementing the Elko algorithm and the GSC, with the blocking matrix in the GSC
replaced by the Elko algorithm.
Analyzing the performance of the proposed system with
different interfering noises and performing objective tests on the system.
Implementing the proposed system in a Matlab GUI.
1.4 Overview of the Proposed System
The block diagram of the proposed system used in our thesis is shown in Fig. 1. 3.
Fig. 1. 3. Block diagram of the proposed system
The proposed system takes a speech signal and an interfering noise signal as inputs. It
uses an array of microphones placed along an arc, as shown in Fig. 1. 3.
The speech and noise signals are passed individually through the microphone array, and the
output of each microphone is a delayed version of the original signal. The Elko algorithm is
applied to the signals obtained from the microphone array. The Elko algorithm is a linear
system that is used to improve the SNR [4]. The speech and noise signals obtained from the
Elko algorithm are added together and applied to an adaptive beamformer, the GSC [5]. The GSC
is a popular adaptive beamformer used to enhance speech signals. For effective
performance of the system, the blocking matrix in the GSC is replaced with the Elko algorithm.
The modified GSC suppresses the noise in the noise-contaminated speech signal.
The output of the modified GSC is a speech signal that is presented to the listener. The
Elko algorithm and the GSC are described in detail in later chapters. The
system is implemented as a Matlab GUI. To validate the system, it is evaluated with
objective measures: SNR, SNRI, PESQ, SD and ND.
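The SNR and SNRI measures mentioned above can be illustrated with a small Python sketch (the thesis itself uses Matlab; Python is used here only for illustration). It assumes SNR is the ratio of mean speech power to mean noise power in dB and that SNRI is the output SNR minus the input SNR. The function names, the toy sine signals and the stand-in "enhancement" step are hypothetical and are not taken from the thesis code.

```python
import math

def mean_power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def snr_db(speech, noise):
    """SNR in dB, assuming SNR = 10*log10(P_speech / P_noise)."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))

def scale_noise_to_snr(speech, noise, target_db):
    """Scale the noise so that the speech-to-noise power ratio equals target_db."""
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (target_db / 10.0)))
    return [gain * v for v in noise]

# Toy signals standing in for recorded speech and noise (illustrative only).
speech = [math.sin(2 * math.pi * 0.01 * n) for n in range(1000)]
noise = [0.3 * math.sin(2 * math.pi * 0.137 * n + 1.0) for n in range(1000)]

# Corrupt the speech at an input SNR of 5 dB.
scaled_noise = scale_noise_to_snr(speech, noise, 5.0)
snr_in = snr_db(speech, scaled_noise)

# Stand-in for an enhancement stage: halve the noise amplitude, leave speech untouched.
residual_noise = [0.5 * v for v in scaled_noise]
snr_out = snr_db(speech, residual_noise)

# SNR improvement, assumed here to be output SNR minus input SNR.
snri = snr_out - snr_in
```

Processing speech and noise separately, as the system overview describes, is what makes the speech and noise powers at the output directly measurable.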
1.5 Outline of the Thesis:
This document is a report on the thesis presented as part of the Degree of Master of Science in
Electrical Engineering with Emphasis on Signal Processing. It is made up of seven chapters and
is organized as follows. Chapter 1 introduces the subjects handled in this thesis. It
comprises sub-chapters that deal with the objective of the thesis, the problem statement, the
aim of the thesis and an overview of the proposed system. Chapter 2 provides background
information on hearing aids and the various enhancement technologies used in them. Chapter 3
describes the microphones, the arrangement of the microphones and the angle
of arrival of the speech signal at the microphones. Chapter 4 describes the
Elko algorithm. It comprises sub-chapters that deal with first-order differential
microphones, the derivation of the adaptive first-order and second-order arrays, and the LMS
version of the differential microphone. Chapter 5 describes beamforming. It
comprises sub-chapters that deal with the different types of beamforming, the generalized
sidelobe canceller, the model proposed in the thesis and the GUI. Chapter 6
describes the testing and presents the results, including the
SNR, SNRI, PESQ, SD and ND values. Chapter 7 gives the
final conclusions and recommendations for future work. The references used in the
thesis are listed at the end.
Chapter-2
Chapter 2 Background
It is difficult even for a person with normal hearing to understand speech in
background noise. The problem is much more severe for a person with a hearing
impairment. Many algorithms have been developed to enhance the speech signal from the
background noise, and several signal processing techniques are used to increase the quality
and intelligibility of the speech signal in hearing aids. Signal processing has a wide range
of applications in hearing aids, which have been a subject of signal processing research for
more than 25 years.
In the past decades, the development of hearing aids has accelerated with the
development of sophisticated signal processing algorithms such as beamforming, noise
reduction and feedback cancellation. Other signal processing technologies such as
adaptive filtering, echo cancellation and array processing have also been widely used in hearing aids.
2.1 Microphone Array in Hearing Aids
A microphone array consists of multiple microphones arranged in the spatial domain.
Microphone array hearing aids provide a better solution for hearing-impaired persons
listening to speech in the presence of background noise. The aim of a microphone
array hearing aid is to increase the speech-to-interference ratio when the interference
arrives from directions other than that of the desired speech signal. Functionally, a
microphone array hearing aid consists of three interconnected components: the microphone
array, the processing unit and the receiver. The microphone array acts as a preprocessor
to the system and is followed by the speech enhancement system [2]. A microphone array is capable
of maintaining a high signal-to-noise ratio in a noisy environment. The advantage of
a microphone array is its ability to reduce noise by exploiting knowledge of the
position of the speech source. Fig. 2. 1 shows an example of a head simulator wearing a hearing
aid with a three-element microphone array [13].
Fig. 2. 1. Head simulator with the three element microphone array
2.2 Beamforming in Hearing Aids
Beamforming is a signal processing technique used for directional signal transmission or
reception. It creates constructive interference in a
particular direction and destructive interference in other directions. A hearing-impaired person
has difficulty with noise sources in different directions. Beamforming is used to create a null
in the direction of the noise source and to pass only the signal coming from a particular direction.
Beamforming is performed in hearing aids to enhance the SNR and to increase speech
intelligibility [6].
Several beamforming techniques have been developed for hearing aids to enhance the
desired speech signal in the presence of various types of noise. Fixed beamforming forms a
beam in a particular direction and does not change that direction when the direction of the
incident source changes. Adaptive beamforming is used in hearing aids to steer the directional
pattern towards the desired source and to maximize the attenuation of the noise source.
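The constructive and destructive interference behind a fixed beamformer can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-microphone uniform linear array, far-field plane waves, a sound speed of 343 m/s and a delay-and-sum structure; none of these choices are taken from the thesis, whose array is arranged along an arc. The function name `array_response` is hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value

def array_response(theta_deg, steer_deg, freq_hz, spacing_m, num_mics):
    """Magnitude response of a delay-and-sum beamformer on a uniform linear array.

    Angles are measured from the array axis; far-field plane waves are assumed.
    """
    total = 0j
    for m in range(num_mics):
        # Arrival delay at microphone m for a wave from theta, relative to microphone 0.
        tau = m * spacing_m * math.cos(math.radians(theta_deg)) / SPEED_OF_SOUND
        # Compensating delay that aligns a wave arriving from the steering direction.
        tau_steer = m * spacing_m * math.cos(math.radians(steer_deg)) / SPEED_OF_SOUND
        total += cmath.exp(-2j * math.pi * freq_hz * (tau - tau_steer))
    return abs(total) / num_mics

freq = 1000.0
spacing = SPEED_OF_SOUND / (2 * freq)  # half a wavelength at 1 kHz (illustrative only)

# Look direction (broadside): the two microphone signals add in phase.
look = array_response(90.0, 90.0, freq, spacing, 2)
# Along the array axis: the signals are half a period apart and cancel, forming a null.
null = array_response(0.0, 90.0, freq, spacing, 2)
```

The half-wavelength spacing is chosen only so that the null falls exactly on the array axis at 1 kHz; it is far larger than a hearing aid would allow. An adaptive beamformer would adjust its weights at run time so that the null follows the noise source.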
2.3 Noise Reduction in Hearing Aids
Noise is an unwanted signal that plays a major role in many applications. It exists in
several forms and creates problems in various fields such as telecommunications, radar,
sonar and medical applications. Hearing-impaired people have difficulty
understanding speech in the presence of noise because the SNR is an important factor for
speech understanding. A person with normal hearing can understand speech at an
SNR as low as -5 dB, whereas a hearing-impaired person needs an SNR of at least +5 dB.
Several noise reduction techniques have been developed to enhance the speech signal in
hearing aids. Where a reference signal is available, it is used to reduce the
noise.
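To put the two thresholds above on a linear scale, the following worked conversion assumes the usual power-ratio definition of SNR in decibels. The symbols P_s and P_n, for speech and noise power, are introduced here for illustration only.

```latex
\mathrm{SNR_{dB}} = 10\log_{10}\frac{P_s}{P_n}
\quad\Longrightarrow\quad
\frac{P_s}{P_n} = 10^{\mathrm{SNR_{dB}}/10}

-5\ \mathrm{dB}:\ \frac{P_s}{P_n} = 10^{-0.5} \approx 0.316,
\qquad
+5\ \mathrm{dB}:\ \frac{P_s}{P_n} = 10^{0.5} \approx 3.16
```

Under this definition, the 10 dB gap between the two figures corresponds to a tenfold difference in the speech-to-noise power ratio.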
Reliable and intelligent signal detection plays an important role in the success of
noise reduction. Hearing aids are sensitive to the presence of noise. Amplitude modulation is
the key property used to separate speech from noise. The approach works
on the principle that the desired speech signal has a harmonic structure and that the amplitude
of its harmonic components changes over time, which produces amplitude modulation. The
amplitude modulation of speech and noise differs. Speech has higher amplitude modulation
than stationary and pseudo-stationary noises. Pseudo-stationary noise has very low
amplitude modulation. Environmental noises such as babble and
traffic noise have higher amplitude modulation than stationary noise and lower amplitude
modulation than speech.
Amplitude modulation alone does not provide reliable signal detection because the
signal with the higher modulation need not be the desired signal. Without reliable signal
detection, it is not possible to enhance the speech signal while attenuating the noise. Intelligent signal
detection and noise reduction have been improved by using temporal and timing information
about the signal and noise in combination with amplitude modulation [7].
2.4 Feedback Cancellation in Hearing Aids
Feedback cancellation suppresses the feedback signal, which becomes a problem when the hearing
aid gain is larger than the attenuation of the feedback path between the hearing aid
output and its microphone input. The feedback compensation approach consists of a linear
adaptive filter that subtracts the feedback signal [6]. Controlling the adaptation of this
filter is the main challenge in feedback cancellation. The typical correlation between the
input signal and the feedback signal causes signal distortion at the hearing aid output.
Chapter-3
Chapter 3 Microphone Array
3.1 Basics of Microphone
A microphone is a transducer: a device that converts one form of energy into another, in this
case a non-electrical signal into an electrical signal. The
input to the microphone is sound information that exists as a pattern of air pressure, which
the microphone converts into a pattern of electric current. Microphones are
used in various applications such as hearing aids, telephones, tape recorders, radio and television,
and for non-acoustic purposes such as ultrasonic checking. The electronic symbol of a microphone
is shown in Fig. 3. 1.
Fig. 3. 1. Electronic symbol for microphone
3.2 Microphone Array
A microphone array consists of multiple microphones arranged in space that together act
as a single directional input device; their outputs are processed individually and added to
produce the desired output. A microphone array picks up distant sound better than a single
directional microphone. In applications where a speech signal is monitored by a microphone,
better performance can therefore be achieved by using an array of microphones. Microphone
array processing can be used effectively to reduce noise, to improve the signal-to-noise ratio
of the acquired sound, to pick up the desired speech with a flat spectrum response at an
arbitrary speaker position, and to detect the speech period in a noisy speech signal [8]. The
outputs from the microphone array are further processed in order to achieve the speech
enhancement.
Microphone arrays have been used in a wide range of fields such as speech acquisition in
hands-free communication, audio, teleconferencing and hearing aid applications [9].
Microphone array processing is a well-tested and well-understood way to enhance a distant,
noisy target signal. The main aim of a microphone array is to improve the quality of the input
signal and to reduce the effect of typical recording problems.
3.3 Microphone array structure and connections
Fig. 3. 2. Microphone constellation in an array
Fig. 3. 2. shows an 8-element semicircular microphone array with the sound source
located in the far field. The eight microphones are placed on a semicircle with a spacing
between adjacent microphones. The distance between the source and the microphones is greater
than the distance between the microphones, which indicates that the source is located in the
far field. The sound waves arriving from the source are therefore assumed to be parallel to
each other. They arrive at the microphones at different time instants because the distance
travelled from the source to each microphone differs.
Each microphone receives the input signal with some delay that depends on the distance
between the microphone and the source. Let the distance between the microphones be $d$. The
additional distance travelled by the source signal across the microphone array is
$d\cos\theta$, where $\theta$ is the angle of arrival of the source signal at the microphone.
The time delay to the microphone is denoted $\tau$ and is given as
$$\tau = \frac{d\cos\theta}{c}$$
where $c$ is the speed of sound. The input to the microphone is given as
The phase shift of the incoming signal is given as
From equation 2.3 substituting the value of in equation 2.3 then
By considering
equation 3.4 can be written as
In the similar manner a noise signal is passed through the microphone array
[10]. The total signal received by the microphone is the combination of the source signal
and the noise signal given as,
The output from the microphone array is passed to the Elko algorithm and then to the
beamformer, in order to enhance the speech signal from the noisy speech signal.
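The far-field delay described in this section can be sketched numerically. The snippet below is a minimal illustration, not the thesis implementation: it assumes the common plane-wave model in which the extra path length between two microphones is `spacing * cos(angle)`, a speed of sound of 343 m/s, and the hypothetical function names `arrival_delay` and `phase_shift`.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_delay(spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Far-field delay in seconds between two microphones spacing_m apart.

    Assumes a plane wave whose extra path length is spacing_m * cos(angle).
    """
    return spacing_m * math.cos(math.radians(angle_deg)) / c


def phase_shift(delay_s, freq_hz):
    """Phase shift in radians of a sinusoid of freq_hz delayed by delay_s."""
    return 2.0 * math.pi * freq_hz * delay_s


if __name__ == "__main__":
    for angle in (0.0, 45.0, 90.0):
        tau = arrival_delay(0.01, angle)          # assumed 1 cm spacing
        print(angle, tau, phase_shift(tau, 1000.0))
```

Under this model the delay is largest when the wave travels along the microphone axis and vanishes when it arrives broadside to the pair.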
3.4 Physical Preliminaries
To determine the angle of arrival of the source signal to the microphones let us
consider two microphones placed as shown in the Fig. 3. 3.
Fig. 3. 3. Physical set of the microphones
Fig. 3. 3. shows the physical setup of the two microphones. The source is represented
by S and is located in front of the microphones. To determine the angle of arrival of the
source signal at the microphones, an origin must be fixed; the midpoint O between the
microphones is taken as the origin. Consider the line OX that is orthogonal to the microphone
axis at the origin. The angle between the lines OX and OS is the angle of arrival of the
source signal at the microphone array [11].
From Fig. 3. 3. it is observed that the source is closer to one microphone than to
the other. The sound travelling from the source therefore reaches the nearer microphone first
and the farther microphone after a time delay, so each microphone receives its own delayed
copy of the source signal.
3.5 Trigonometric Solution
To determine the angle of arrival between the microphones and the source signal,
consider a point S with coordinates x and y, which are treated as variables. The
coordinates of the microphones and are considered as and
respectively. The distance between the microphones is considered as cm. The midpoint
between the microphones and is taken as origin.
The target is to determine the angle of arrival of the sound signal from the source
at the microphones. A signal coming from the source reaches the microphone in time
t. In the same moment, the signal travels from the source to the microphone [11]. The
number of samples between the two signals is expressed as
Fig. 3. 4. Diagram showing the possible situation of microphone and source
Fig. 3. 4. shows a possible arrangement of the microphones and the source. The
midpoint between the microphones is expressed as
The slope of the line joining the midpoint of the microphones and the source signal is
expressed as
The angle made by the line joining the midpoint of the microphones and the source
signal is obtained by taking arctangent of the slope of the line joining the point and is
expressed as
The angle obtained above is the angle that the line joining the midpoint of the
microphones and the source makes with the X-axis. The angle of arrival is the angle between
the Y-axis and this line, and is given as follows
If then
and if then
In this way the angle of arrival of the source signal at the microphones is
determined. The procedure is applied to every microphone to determine its angle of arrival.
The same procedure is applied when the noise signal is the input to the microphones.
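The trigonometric procedure of this section can be illustrated with a short sketch. It is only one possible reading of the steps: the function name `angle_of_arrival_deg` is hypothetical, the microphones are assumed to lie on the X-axis, the source is assumed to be in front of the array (larger y than the midpoint), and the sign convention for sources on the left or right of the array normal is an assumption.

```python
import math


def angle_of_arrival_deg(mic_a, mic_b, source):
    """Signed angle in degrees between the Y-axis and the line midpoint -> source.

    mic_a, mic_b and source are (x, y) tuples. Positive angles mean the source
    lies on the positive-x side of the array normal (assumed convention).
    """
    mid_x = 0.5 * (mic_a[0] + mic_b[0])
    mid_y = 0.5 * (mic_a[1] + mic_b[1])
    dx = source[0] - mid_x
    dy = source[1] - mid_y
    if dx == 0.0:
        return 0.0                               # source straight ahead
    alpha = math.degrees(math.atan(dy / dx))     # angle to the X-axis via the slope
    # Convert the angle to the X-axis into the angle to the Y-axis.
    return 90.0 - alpha if alpha > 0.0 else -90.0 - alpha


if __name__ == "__main__":
    mics = ((-0.005, 0.0), (0.005, 0.0))         # assumed 1 cm spacing
    for src in ((1.0, 1.0), (-1.0, 1.0), (0.0, 2.0)):
        print(src, angle_of_arrival_deg(mics[0], mics[1], src))
```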
Chapter 4 Elko Algorithm
4.1 Introduction
Communication devices are widely used in many environments, and the acoustic pick-up
of an electro-acoustic transducer requires a combination of a transducer and a signal
processing unit. During communication, the transmitted signal is affected by background noise,
which degrades the quality of the signal. Background noise is a ubiquitous problem in acoustic
signal transmission. To overcome it, conventional directional microphones are used to pick up
the signal from a particular direction, so that the background noise can be reduced.
Conventional directional microphones offer only a limited solution, because the noise does not
have a particular direction of arrival. A better solution can be obtained by taking advantage
of the ANC capabilities of the differential microphone array in combination with digital
signal processing [12]. An adaptive microphone system is designed so that it adjusts its
directivity pattern to maximize the SNR. An ADMA is used to suppress the background noise and
to maximize the SNR. ADMAs are able to adaptively track and attenuate possibly moving noise
sources that are located in the back half plane of the differential array.
4.2 Aim of Elko Algorithm
Elko has proposed a solution for an adaptive directional microphone. The Elko algorithm
covers the design and implementation of a novel adaptive first-order differential microphone
that minimizes the microphone output power. By attenuating sound from one direction it can
improve the SNR in an acoustic field. The adaptive differential microphone is implemented by
combining two omnidirectional elements to form back-to-back cardioid directional microphones.
The microphone signals and delayed versions of the microphone signals are combined such that
a null is placed in one direction; in this way any first-order array can be realized. The
adaptation process works under the constraint that a single null is placed in the rear half
plane.
4.3 Derivation of the Adaptive First-Order Array
When a plane wave with spectrum $S(\omega)$ and wave vector $\mathbf{k}$ is incident on a
two-element microphone array, as shown in Fig. 4. 1., the sound wave reaches one microphone
before the other. The time difference $\tau$ depends on the distance $d$ between the
microphones and the angle $\theta$ of the incident sound wave,
$$\tau = \frac{d\cos\theta}{c}$$
where $c$ is the speed of sound. The output is obtained by taking the difference between the
delayed microphone signal and the signal from the other microphone; by changing the time
delay $T$ it is possible to steer the null. The output signal can be written as
$$y(t) = s(t) - s\left(t - T - \frac{d\cos\theta}{c}\right)$$
Fig. 4. 1. First-order sensor composed of two zero-orders and a delay
Transforming the equation into the frequency domain we get,
$$Y(\omega,\theta) = S(\omega)\left[1 - e^{-j\omega\left(T + \frac{d\cos\theta}{c}\right)}\right]$$
The magnitude plot of this equation is shown in Fig. 4. 2. The plots represent the
directional response of the array for three different values of $T$. The time delay is
changed between 0 and $d/c$, so that the null is steered between $90^\circ$ and $180^\circ$.
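The effect of the internal delay on the null direction can be checked numerically. The sketch below evaluates the magnitude of a first-order differential response of the form 1 - exp(-jω(T + d cos θ / c)), which is the standard expression for a two-element differential array and is assumed here to match the response described above. The function name, the 1 cm spacing, the 1 kHz test frequency and the speed of sound are illustrative assumptions.

```python
import cmath
import math


def first_order_response(theta_deg, freq_hz, spacing_m, delay_s, c=343.0):
    """Magnitude |Y/S| of a two-element differential array for a plane wave."""
    omega = 2.0 * math.pi * freq_hz
    total_delay = delay_s + spacing_m * math.cos(math.radians(theta_deg)) / c
    return abs(1.0 - cmath.exp(-1j * omega * total_delay))


if __name__ == "__main__":
    d, c = 0.01, 343.0                       # assumed spacing (m) and speed of sound (m/s)
    for delay in (0.0, 0.5 * d / c, d / c):  # three internal delays between 0 and d/c
        row = [first_order_response(a, 1000.0, d, delay) for a in (0, 90, 120, 180)]
        print(delay, row)
```

With a delay of 0 the response vanishes at 90 degrees (a dipole), and with a delay of d/c it vanishes at 180 degrees (a cardioid).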
(a) (b) (c)
Fig. 4. 2. Directional response of the array for
The magnitude of the frequency and angular dependent response of the first-order
differential microphone to a single point source located in the far field is given as
$$|Y(\omega,\theta)| = 2\left|S(\omega)\sin\left(\frac{\omega\left[T + (d\cos\theta)/c\right]}{2}\right)\right|$$
If we assume a small element spacing and inter-element delay ($\omega d/c \ll \pi$ and
$\omega T \ll \pi$), the above equation can be written as
$$|Y(\omega,\theta)| \approx \omega\left|S(\omega)\left[T + \frac{d\cos\theta}{c}\right]\right|$$
The first-order differential array has a monopole term $T$ and a first-order dipole
term $(d\cos\theta)/c$. It is observed that the first-order array has a first-order
differentiator frequency dependency, which can be compensated by a first-order low-pass
filter [14]. The term in the brackets of the above equation has a directional response.
The adaptive algorithm combines the omnidirectional and dipole sensors in such a way
that the mean square output of the array is minimized. The dipole directivity pattern can be
realized by subtracting two closely spaced omnidirectional microphones. A low-pass filter is
implemented in the dipole path to compensate the inter-channel phase shift. With this
arrangement the adaptive algorithm can steer a null in the direction of the noise source.
By setting the sampling period equal to $d/c$ and using a fixed delay of one sample, we
get a cardioid directional pattern. A directional microphone array with two microphones
generates forward and backward cardioid signals. An adaptation factor $\beta$ is applied to
the backward cardioid, and the resulting signal is subtracted from the forward cardioid
signal to generate the output signal [4]. The output signal is applied to a low-pass filter,
which compensates the differentiator response of the differential microphone.
Fig. 4. 3. Schematic implementation of an adaptive first-order differential microphone using
the combination of forward and backward facing cardioids
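The output of the back-to-back cardioid microphone is obtained by setting the sampling
period equal to $d/c$. With sampling period $T = d/c$, and with $x_1(t)$ and $x_2(t)$
denoting the signals of the front and rear microphones, the expressions for the forward
facing cardioid and backward facing cardioid are
$$c_F(t) = x_1(t) - x_2(t - T), \qquad c_B(t) = x_2(t) - x_1(t - T)$$
The output can be given as
$$y(t) = c_F(t) - \beta\,c_B(t)$$
Transforming the above equations into the frequency domain, with $k = \omega/c$, we get,
$$C_F(\omega,\theta) = 2j\,S(\omega)\,e^{-jkd(1+\cos\theta)/2}\sin\left(\frac{kd(1+\cos\theta)}{2}\right)$$
$$C_B(\omega,\theta) = 2j\,S(\omega)\,e^{-jkd(1+\cos\theta)/2}\sin\left(\frac{kd(1-\cos\theta)}{2}\right)$$
Normalizing the output signal by the input spectrum results in
$$\left|\frac{Y(\omega,\theta)}{S(\omega)}\right| = 2\left|\sin\left(\frac{kd(1+\cos\theta)}{2}\right) - \beta\sin\left(\frac{kd(1-\cos\theta)}{2}\right)\right|$$
The time delay T is fixed; instead the value of β is changed between 0 and 1. The
magnitude plot of this equation is shown in Fig. 4. 4., where the directional pattern is
obtained for different values of β. By changing the value of β between 0 and 1 it is possible
to steer the null between 180° and 90°.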
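The relation between β and the null direction can be illustrated with a small sketch. It assumes the standard normalized back-to-back cardioid response 2|sin(kd(1+cos θ)/2) - β sin(kd(1-cos θ)/2)| and its small-spacing approximation, for which the null lies at cos θ = (β - 1)/(β + 1). The function names and the value kd = 0.05 are illustrative assumptions.

```python
import math


def normalized_response(theta_deg, beta, kd):
    """Normalized magnitude response of the back-to-back cardioid combination."""
    cos_theta = math.cos(math.radians(theta_deg))
    forward = math.sin(0.5 * kd * (1.0 + cos_theta))    # forward cardioid term
    backward = math.sin(0.5 * kd * (1.0 - cos_theta))   # backward cardioid term
    return 2.0 * abs(forward - beta * backward)


def null_angle_deg(beta):
    """Null direction in degrees for 0 <= beta <= 1 (small-kd approximation)."""
    return math.degrees(math.acos((beta - 1.0) / (beta + 1.0)))


if __name__ == "__main__":
    for beta in (0.0, 0.25, 0.5, 1.0):
        angle = null_angle_deg(beta)
        print(beta, angle, normalized_response(angle, beta, 0.05))
```

Under this approximation β = 0 places the null at 180 degrees and β = 1 places it at 90 degrees, which matches the steering range stated in the text.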
(a) (b) (c)
Fig. 4. 4. Directional response of the array for .
The directional response of the back-to-back cardioid microphone is shown in Fig. 4. 5.
Fig. 4. 5. Directional response of the back-to-back cardioid microphone
4.4 LMS Version of Back-to-Back Cardioid Microphone
The least mean square (LMS) algorithm is an adaptive algorithm that uses the gradient
method of steepest descent. LMS incorporates an iterative procedure that makes successive
corrections to the weight vector in the direction of the negative gradient vector, which
leads to the minimum mean square error. The LMS algorithm is commonly used because of its
simplicity and because it does not require the calculation of correlation functions. Here the
LMS algorithm is applied to the back-to-back cardioid adaptive first-order differential
array [15]. The output of the back-to-back cardioid microphone is given as
$$y(t) = c_F(t) - \beta\,c_B(t)$$
where $c_F(t)$ and $c_B(t)$ are the forward and backward cardioid signals. Squaring the above
equation on both sides we get,
$$y^2(t) = c_F^2(t) - 2\beta\,c_F(t)\,c_B(t) + \beta^2 c_B^2(t)$$
The minimum error is determined by using the steepest descent algorithm, stepping in the
direction opposite to the gradient of the error surface $E[y^2(t)]$ with respect to the
weight parameter $\beta$. The steepest descent update equation is given as,
$$\beta_{t+1} = \beta_t - \mu\frac{dE[y^2(t)]}{d\beta}$$
where $\mu$ is the update step-size and the derivative gives the gradient of the error
surface with respect to $\beta$. The LMS algorithm uses the instantaneous estimate of the
gradient, $dy^2(t)/d\beta$, rather than the gradient of the expectation value
$E[y^2(t)]$ [15]. Taking the derivative of $y^2(t)$ we get,
$$\frac{dy^2(t)}{d\beta} = -2c_F(t)\,c_B(t) + 2\beta\,c_B^2(t) = -2y(t)\,c_B(t)$$
The LMS update equation is given as
$$\beta_{t+1} = \beta_t + 2\mu\,y(t)\,c_B(t)$$
The LMS algorithm is modified by normalizing the update size. Therefore the LMS version
with normalized $\mu$ is given as
$$\beta_{t+1} = \beta_t + 2\mu\frac{y(t)\,c_B(t)}{\langle c_B^2(t)\rangle}$$
The brackets indicate a time average. The resulting directional pattern of the adaptive
array is shown in Fig. 4. 6.
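A minimal simulation can illustrate how a normalized LMS update drives β towards the value that places a null on a rear interferer. The sketch makes several assumptions that are not taken from the thesis: a single sinusoidal interferer, a 16 kHz sampling rate with the spacing chosen so that one sample equals d/c, an exponential average as the time average, and a clamp of β to [0, 1] to keep the null in the rear half plane. The function name `adapt_beta` and all parameter values are hypothetical.

```python
import math


def adapt_beta(noise_angle_deg, freq_hz=500.0, fs=16000.0, n_samples=2000,
               mu=0.05, alpha=0.9):
    """Adapt beta with a normalized LMS update for one sinusoidal interferer."""
    kd = 2.0 * math.pi * freq_hz / fs            # omega * T with T = d/c = 1/fs
    half = 0.5 * kd * math.cos(math.radians(noise_angle_deg))

    def mic1(n):                                 # front microphone sample
        return math.sin(kd * n + half)

    def mic2(n):                                 # rear microphone sample
        return math.sin(kd * n - half)

    beta, power = 0.0, 0.0
    for n in range(1, n_samples):
        c_f = mic1(n) - mic2(n - 1)              # forward cardioid
        c_b = mic2(n) - mic1(n - 1)              # backward cardioid
        y = c_f - beta * c_b                     # array output
        power = alpha * power + (1.0 - alpha) * c_b * c_b   # time average of c_b^2
        beta += 2.0 * mu * y * c_b / (power + 1e-12)        # normalized LMS step
        beta = min(1.0, max(0.0, beta))          # keep the null in the rear half plane
    return beta


if __name__ == "__main__":
    for angle in (100.0, 135.0, 170.0):
        print(angle, adapt_beta(angle))
```

For a single interferer the forward and backward cardioid signals are proportional to each other, so the update settles at the ratio of the two cardioid gains in the interferer direction, which is the β that cancels it.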
Fig. 4. 6. Directional response of the adaptive array for
Chapter 5 Beamforming and GUI
In speech communication systems such as hearing aids and wireless communication, as
well as in radar and sonar, the recorded signal is corrupted by various background noises.
The reason is that the recording microphone array is located at a certain distance from the
source, which causes the microphones to record the background noise as well. The background
noise arises from audio equipment and other speakers present. The impact of this background
noise on the speech quality depends on the acoustic environment. The intelligibility of the
recorded speech signal is degraded by the background noise, so the signal recorded by the
microphones should be enhanced to improve the quality and intelligibility of the speech.
Beamforming is one of the simplest methods for distinguishing signals based on their
physical location; it is used in combination with an array of sensors to provide a versatile
form of spatial filtering. The sensors collect spatial samples of the propagating wave, which
are then processed by the beamformer. The term beamforming derives from spatial filters that
are designed to form a beam in order to receive a signal from the direction of interest and
attenuate signals from other directions. The objective of beamforming is to estimate the
signal arriving from the desired direction in the presence of noise and interfering signals.
The desired signal and the interfering signals originate from different spatial
directions [27]. A beamformer performs spatial filtering to separate signals that have
overlapping frequency content but originate from different spatial locations. Beamforming is
applicable to either radiation or reception of energy.
Beamforming is used to extract a signal contaminated by interference on the basis of
directivity. The signal extraction is performed by processing the signals obtained from
multiple sensors, such as microphones, antennas and sonar transducers, located at different
positions in space. Beamforming can be used on both the transmitter and the receiver side.
During transmission, the beamformer controls the phase and amplitude of the signal at each
transmitter in order to obtain a pattern of constructive and destructive interference [16].
On the receiving side, information from the different sensors is combined to obtain the
desired radiation pattern. Beamforming is used in microphone arrays for speech enhancement.
Beamforming can be considered as multidimensional signal processing in space and
time. The general block diagram of beamforming in which sensors are placed at different
locations is as shown in Fig. 5. 1.
Fig. 5. 1. Block diagram of beamformer
The signals picked up by the sensors at a particular instant of time are considered a
snapshot. The beamformer combines the sensor signals so that signals arriving from a
particular direction are amplified, while signals from other directions are attenuated.
5.1 Basic Structure of Beamforming
Consider a microphone array followed by a beamformer, in which the desired signal is
received by the omnidirectional microphones as shown in Fig. 5. 2.
Fig. 5. 2. Signal model for microphone array and beamforming
Let us consider the source signal as , noise signal as and time delay
between the source and microphones as . Let us assume that the
microphone output as is the attenuated and delayed
version of the source and noise given by
where the source signal and noise signal are considered as statistically independent. The
frequency domain representation of the microphone output is given as
The vector representation of the microphone array is given by
The data vector is given as:
and
The represents array steering vector and depends on microphone and source location
and is given as
where is the gain scaling of microphone and is given as
and time delay is given as
where represents the distance between the microphone and reference
microphone respectively and represents the speed of sound. The source signal is
retrieved by processing with frequency domain filter weights . The weight vector
is given as:
The output of the beamformer is the sum of the weighted microphone outputs and is given as
where $(\cdot)^H$ represents the Hermitian transpose and is represented in vector form as
In this way a microphone array is used in combination with a beamformer to enhance the
speech signal from a noise-contaminated speech signal [17].
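The frequency-domain signal model of this section can be sketched for a single frequency bin. The snippet assumes a steering vector whose elements are a gain times exp(-jωτ) for each microphone, and a beamformer output formed as the Hermitian inner product of the weights and the microphone spectra. The function names, the matched weights and the example delays are illustrative assumptions rather than the notation of the thesis.

```python
import cmath
import math


def steering_vector(delays_s, gains, freq_hz):
    """Per-microphone response a_m * exp(-j*omega*tau_m) to the source."""
    omega = 2.0 * math.pi * freq_hz
    return [a * cmath.exp(-1j * omega * tau) for a, tau in zip(gains, delays_s)]


def matched_weights(delays_s, gains, freq_hz):
    """Weights that pass the steered source with unit gain (assumed design)."""
    d = steering_vector(delays_s, gains, freq_hz)
    norm = sum(abs(dm) ** 2 for dm in d)
    return [dm / norm for dm in d]


def beamformer_output(weights, mic_spectra):
    """Sum of conjugated weights times microphone spectra (w^H x)."""
    return sum(w.conjugate() * x for w, x in zip(weights, mic_spectra))


if __name__ == "__main__":
    delays = [0.0, 1e-4, 2e-4]                   # assumed source-to-microphone delays (s)
    gains = [1.0, 1.0, 1.0]
    d = steering_vector(delays, gains, 1000.0)
    w = matched_weights(delays, gains, 1000.0)
    source_bin = 2.0 + 1.0j                      # source spectrum at this frequency
    print(beamformer_output(w, [dm * source_bin for dm in d]))
```

With unit gains the matched weights reduce to a delay-and-sum beamformer, so a source from the steered direction is recovered unchanged while a source with different delays adds incoherently.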
5.2 Types of Beamformers
Beamforming techniques are divided into two types:
I. Fixed beamforming
II. Adaptive beamforming
5.2.1 Fixed Beamforming
Fixed beamforming uses a fixed set of weights and time delays to combine the
signals from the sensors in an array. This type of beamforming optimizes the microphone array
for a particular direction and does not change that direction when the incident source signal
changes. The beam is optimized for the direction of the desired source while suppressing
sound from other directions as much as possible. Thus the directional response of the array
is fixed to a particular angle of elevation. If the target source is non-stationary, the
signal enhancement performance is reduced as the source moves away from the steering
direction.
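A fixed beamformer of the delay-and-sum kind can be sketched in the time domain. The example below is a simplified illustration that assumes integer sample delays and equal weights; the function name `delay_and_sum` and the test signals are hypothetical.

```python
def delay_and_sum(channels, arrival_delays):
    """Align channels by integer sample delays and average them.

    channels[k][i] is sample i of microphone k. arrival_delays[k] is the assumed
    arrival delay, in samples, of the target at microphone k.
    """
    longest = max(arrival_delays)
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, delay in zip(channels, arrival_delays):
            j = i - (longest - delay)            # extra delay that lines this channel up
            acc += ch[j] if j >= 0 else 0.0
        out.append(acc / len(channels))
    return out


if __name__ == "__main__":
    pulse = [0.0] * 20
    pulse[5] = 1.0
    true_delays = [0, 1, 2, 3]
    chans = [[0.0] * d + pulse[:len(pulse) - d] for d in true_delays]
    print(max(delay_and_sum(chans, true_delays)))    # steered at the source
    print(max(delay_and_sum(chans, [0, 0, 0, 0])))   # steered elsewhere
```

When the assumed delays match the true arrival delays the channels add coherently; when the source is away from the steering direction the contributions are spread out in time and the peak is reduced, which is the performance loss described above.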
5.2.2 Adaptive Beamforming
A beamformer that adaptively forms its directivity pattern is called an adaptive
beamformer. Adaptive beamforming is a powerful technique for enhancing a signal of interest
while suppressing interfering signals and noise at the output of the sensor array. Adaptive
beamforming alters the directivity pattern according to changes in the acoustic environment,
and thus provides better performance than fixed beamforming. However, adaptive beamforming is
more sensitive than fixed beamforming to errors such as sensor mismatch, mis-steering and
correlated reflections [18].
A general adaptive beamforming system for an array of microphones is shown in Fig.
5. 3.
Fig. 5. 3. An adaptive beamforming system
Adaptive beamforming is used to create multiple beams towards the signal of interest
and to suppress the interfering signals from all other directions. The input signal
$\mathbf{x}(n)$ received by the microphones is multiplied by a weight vector $\mathbf{w}$ to
adjust the phase and amplitude of the incoming signal. The weighted signals are summed to
produce the array output $y(n)$. An adaptive algorithm is applied to minimize the error
between the desired signal $d(n)$ and the array output $y(n)$. The output of the beamformer
at time instant $n$ is given by the equation
$$y(n) = \mathbf{w}^H\mathbf{x}(n)$$
where, for $M$ microphones, $\mathbf{w} = [w_1, w_2, \ldots, w_M]^T$ and
$\mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_M(n)]^T$. The weights adjust the individual
signals so that, when added together, they produce the desired beam [19].
Adaptive beamformers have a higher capability for reducing noise from unknown
directions than fixed beamformers, and potentially provide better performance. However,
adaptive beamformers are sensitive to steering errors and may suffer from signal leakage and
degradation of the desired signal. Because of this, conventional adaptive beamforming has not
gained widespread acceptance for speech applications. Robust modifications that avoid signal
leakage and cancellation have been an important matter of interest in microphone
applications. The GSC is an adaptive beamforming solution that has been proposed for
microphone array processing.
5.2.3 Acoustic Beamforming
Acoustic beamforming is a technique where the microphone array is placed in the
far field. As a rule of thumb, the far field is defined as being further away from the source
than the array dimensions or diameter. The area between the near field and the far field
remains a grey zone. In the near field, sound waves behave like circular or spherical waves
whereas, in the far field, they become planar waves. Acoustic beamforming modifies the
propagation of sound by introducing a spatially dependent delay into the wave front. This
focuses incoming sound from a single source or direction into a small volume of space so that
it can be detected by a single transducer. Acoustic beamforming can efficiently enhance the
speech of interest while suppressing interference and background noise. It allows people to
move around freely without wearing or holding a microphone. Acoustic beamforming makes it
possible to enhance the signal from a specific individual while background noise (other
speech, motors, movement, etc.) is present. Acoustic beamforming is sometimes called “sum and
delay”, since it considers the relative delay of the sound wave reaching the different
microphone positions. Acoustic beamforming requires that all data is measured
simultaneously [28].
The main advantage of acoustic beamforming is its good spatial resolution; the main
disadvantage is that it does not perform well in the low frequency range. To work around this
disadvantage, a high frequency range above 8000 Hz is chosen [28].
5.3 Generalized Sidelobe Canceller
The generalized sidelobe canceller (GSC) is a common and successful approach that is
widely used in microphone array applications. The GSC is used to reduce interfering noise
from non-target locations in array beamforming [5]. It can be used as an adaptive noise
canceller in array processing. The structure is used with arrays that have been time-delay
steered such that the desired signal of interest appears in phase at the steered output. The
GSC is very susceptible to bursts of interfering noise. The block diagram of the GSC is shown
in Fig. 5. 4.
Fig. 5. 4. Block diagram of Generalized Sidelobe Canceller
The structure of the GSC consists of an adaptive filter and a non-adaptive filter. The
non-adaptive filter is steered in the direction of the desired input signal. The non-adaptive
part of the GSC consists of a fixed beamformer such as a delay-and-sum beamformer. The
adaptive part of the GSC is the cascade of a blocking matrix and an adaptive filter. The
adaptive part estimates the non-desired components by means of the blocking matrix, which
blocks the desired input signal and allows all other signals to pass through. The adaptive
filter is used to match the interference in the adaptive branch as closely as possible to the
interference in the non-adaptive branch. The noise reduction is performed by a simple
unconstrained NLMS algorithm [20]. Fig. 5. 5. depicts one simple realization of the GSC.
A signal flow diagram of the GSC is shown in Fig. 5. 4. The input signal is applied
to an array of microphones that is steered towards the desired focal point by means of time
delays. The upper part of the GSC is a delay-and-sum beamformer, which delays the signal
received at each microphone and sums the delayed signals. The lower part of the GSC consists
of a blocking matrix that processes the signals from the microphone array in order to
estimate the noise reference signals. A delay is applied to the output of the delay-and-sum
beamformer to compensate for the signal processing delay introduced by the adaptive filtering
in the lower part of the GSC.
Let us assume that the system consists of microphones. The output of the delay-
and-sum-beamformer is given as
Fig. 5. 5. Detailed structure of Generalized Sidelobe Canceller
In this case the blocking stage is realized by simply subtracting pairs of sensor
signals. The output of the blocking matrix is then
where is a blocking matrix. The output of the adaptive path can be written in terms of
and adaptive filter as
The total output of the GSC beamforming is given as
where the vector of the adaptive filter is weights for each blocking matrix and
is the blocking matrix output. The filter weights of the NLMS algorithm are
updated using
where is given by
The value of is given as for the stability of the system the value of the should
be very small.
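The GSC structure described in this section can be sketched end to end. The code below is a simplified illustration under stated assumptions: the channels are already time-aligned on the target, the fixed path is a plain average, the blocking matrix subtracts adjacent channels, all reference filters share one normalized LMS step, and no compensating delay is inserted in the fixed path. The function name `gsc_nlms` and the parameters `taps`, `mu` and `eps` are hypothetical.

```python
import math
import random


def gsc_nlms(mics, taps=8, mu=0.02, eps=1e-6):
    """Generalized sidelobe canceller with pairwise blocking and NLMS.

    mics is a list of equal-length channels that are time-aligned on the target.
    Returns the GSC output, one sample per input sample.
    """
    m, n = len(mics), len(mics[0])
    # Fixed path: delay-and-sum on aligned channels reduces to an average.
    fixed = [sum(ch[i] for ch in mics) / m for i in range(n)]
    # Blocking matrix: adjacent differences cancel the aligned target.
    refs = [[mics[k][i] - mics[k + 1][i] for i in range(n)] for k in range(m - 1)]
    weights = [[0.0] * taps for _ in refs]
    out = []
    for i in range(n):
        # Most recent samples of each reference (zeros before the start).
        bufs = [[r[i - j] if i - j >= 0 else 0.0 for j in range(taps)] for r in refs]
        estimate = sum(w * u for ws, us in zip(weights, bufs) for w, u in zip(ws, us))
        err = fixed[i] - estimate                # GSC output is the NLMS error
        norm = eps + sum(u * u for us in bufs for u in us)
        for ws, us in zip(weights, bufs):
            for j in range(taps):
                ws[j] += mu * err * us[j] / norm  # normalized LMS update
        out.append(err)
    return out


if __name__ == "__main__":
    random.seed(0)
    n = 6000
    target = [0.5 * math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]
    noise = [random.gauss(0.0, 1.0) for _ in range(n)]
    gains = [1.0, 0.6, 0.2, 1.4]                 # assumed interference gain per microphone
    mics = [[t + g * v for t, v in zip(target, noise)] for g in gains]
    out = gsc_nlms(mics)
    print(sum((o - t) ** 2 for o, t in zip(out[-1000:], target[-1000:])) / 1000.0)
```

Because the target is identical on all aligned channels, it does not appear in the reference signals, so the adaptive filters can only remove the interference that leaks into the fixed path.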
The adaptive path of the GSC reduces coherent noise but performs poorly on
non-coherent noise. For this reason the GSC is used for the rejection of unknown directional
interference [21]. In real-world applications, misadjustment of the microphone positions,
errors in the assumed source position and differences between the characteristics of the
microphones cause signal leakage into the blocking matrix output, which results in target
signal cancellation and reduces the SNR of the system. To decrease the signal leakage, the
GSC blocking matrix is replaced with the Elko algorithm. The structure of the modified GSC is
shown in Fig. 5. 6.
5.4 Elko Based Generalized Sidelobe Canceller
Fig. 5. 6. represents the proposed Elko based GSC. The proposed system is a
combination of a microphone array, the Elko algorithm and the adaptive part of the GSC. The
microphones are placed in an arc; the proposed model uses 8 microphones. A sound wave is
incident on the microphone array. The sound signal reaches the microphones with some time
delay because the source is placed at a distance from the microphones, in the far field of
the array. The angle of arrival of the speech signal at the microphones is calculated as
explained in Chapter 3. The output signals from the microphones are passed to the Elko
algorithm, which is applied to the output signals of pairs of microphones. The eight
microphones are grouped into five pairs, as shown in Fig. 5. 6., and the Elko algorithm is
applied to each pair. The Elko algorithm is described in Chapter 4; the version used here is
shown in Fig. 4. 3. A noise signal is applied using the same procedure as for the speech
signal. The speech and noise output signals from each pair of microphones are added together,
giving five Elko output signals. The output of the pair of microphones that directly faces
the source is considered as the output of the fixed path of the GSC, that is, the main lobe.
The other four Elko outputs are considered as the outputs of the blocking matrix of the GSC,
that is, the sidelobes, and are applied to the adaptive path of the GSC. The adaptive part of
the GSC consists of an unconstrained NLMS algorithm.
Fig. 5. 6. Structure of Proposed Elko Based Generalized Sidelobe Canceller Model
Let us assume that the signal arriving the and as and and the angle of
arrival of the speech signal is considered as and . The speech signal from the is
multiplied with forward cardioid and the signal from the is multiplied with backward
cardioid is given as
where is the distance between the microphones and is given as which is
equivalent to 1cm distance, ,
and is the speed of sound.
Elko algorithm is applied on the signals which are obtained by multiplying with
forward and backward cardioid. In the elko algorithm, initially a unit delay is applied to the
signals the delayed signals is considered as and . From these delayed signals the
forward and backward cardioids are obtained as
The output from the elko algorithm for speech as the input signal is given as
where is a constant.
Similarly noise signal is passed through the microphones and the output of the elko
algorithm with noise as input is given as . The output of the elko algorithm for and
with speech and noise as input is given as
In the similar manner elko algorithm is applied to all the other microphone pairs and
the outputs are named as and . To enhance the speech signal the
output of the elko algorithm is applied to the adaptive part of the GSC which consist of an
NLMS algorithm. For the NLMS algorithm is considered as the reference signal. Let
us consider that all the other elko are kept in a vector form as
The output vector of the elko algorithm is further applied to adaptive filter. The output
of the adaptive algorithm is given as
where is the number of microphones and represents the vector of the output elko
vector. The total output of the proposed system is given as
where is the weight vector of the adaptive filter for the elko vector . These
filter weights are updated by the NLMS algorithm. Weight update equation for the NLMS
algorithm is given as
where is given by
The value of is given as for the stability of the system the value of the should
be very small. In this way an elko based GSC is implemented to enhance the speech signal
from the noisy speech signal.
5.5 Graphical User Interface
MATLAB code is normally run from the command line, which makes it difficult to follow
the program during execution. Most users prefer to perform a task simply, with the
unnecessary clutter and technical detail of the program hidden. A user-friendly interface is
therefore needed to simplify the entry point of the program and to encapsulate its functional
behavior. A graphical front end, a GUI, is used in MATLAB for this purpose. A GUI gives a
pictorial representation of the program and uses graphics and text input to provide a
familiar environment in which the user executes the program. GUI-based programs must be
prepared for mouse clicks. Each control in the GUI has a user-written routine known as a
callback, which is used to call back into MATLAB and ask it to do things. The execution of a
callback is triggered by a user action such as clicking a mouse button, selecting a menu item
or pressing a button on the screen. The GUI then responds to these events; this type of
programming is called event-driven programming. In event-driven programming, callback
execution is asynchronous because it is triggered by events external to the software [22].
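The callback mechanism can be illustrated independently of MATLAB. The following Python sketch is only an analogy and does not use any MATLAB API: the `Control` class, its `trigger` method and the control names are hypothetical, and a user action is simulated by calling `trigger` directly.

```python
class Control:
    """A GUI control that stores a user-written callback (hypothetical helper)."""

    def __init__(self, name, callback=None):
        self.name = name
        self.callback = callback

    def trigger(self, event=None):
        """Simulate a user action such as a mouse click on this control."""
        if self.callback is not None:            # controls without a callback do nothing
            return self.callback(self, event)
        return None


log = []
start_button = Control("start", lambda src, evt: log.append(src.name + " pressed"))
signal_axes = Control("input signal axes")       # display only, no callback

start_button.trigger()                           # runs the registered callback
signal_axes.trigger()                            # no callback, nothing happens
print(log)
```

The program itself never decides when the callback runs; it only registers the routine and reacts when the event arrives, which is the essence of event-driven execution.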
The GUI enables the user to analyze the performance of the system using SNR values and
graphical representations of the input signal, the output signal, and the speech and noise
distortion. The layout of the GUI designed for evaluation of the proposed system is shown in
Fig. 5. 7.
Fig. 5. 7. GUI layout used for the design of the proposed model
Fig. 5.7 shows the layout of the GUI designed for our system. The components used in the
design of the GUI are push buttons, edit boxes, popup menus and axes. A brief description
of these components is given in Table 5.1.
TABLE 5.1
BASIC COMPONENTS USED IN THE GUI
Element       Description
Push button   Created by uicontrol. Triggers a callback when clicked with the mouse.
Edit box      Created by uicontrol. Displays a string and allows the user to modify it.
              Triggers a callback when the user presses the Enter key.
Popup menu    Created by uicontrol. Displays a series of text strings in response to a
              mouse click.
Axes          Creates a new set of axes on which data is displayed. Never triggers a
              callback.
Fig. 5.7 shows the GUI layout designed for our proposed model. In the layout, the start,
clear and close buttons are push buttons. The input noise signal, output signal and input
SNR selectors are popup menus, each used to select one value from a list of values. The
input signal, output signal, SD and ND plots are axes components that display the
information graphically. The Elko SNR, output SNR, SD and ND values are displayed in edit
boxes.
To run the GUI designed for our model, select the type of input noise given to the system,
the output signal from the system and the input SNR value, and enter the values of the
order and the step size. Pressing the start button executes the corresponding callback
program and displays the corresponding results. To clear the results of the previous
execution, press the clear button. To close the GUI, press the close button.
Chapter 6 Evaluation
This chapter deals with the performance evaluation of the speech enhancement system for
hearing aids proposed in the previous chapters. In speech enhancement, the quality of the
processed speech determines whether the effort is worthwhile. Evaluating speech
enhancement requires a series of objective measures to be applied to the proposed system;
these measures determine the quality of the output signal. The objective measures used
here are SNR, SNRI, the PESQ value defined in ITU-T P.862, SD and ND [23]. Objective
measures are widely used in speech enhancement. Their advantages are that the results can
easily be viewed for verification and that a large amount of test data can be evaluated
using a computer. Although the overall noise in the signal may be reduced, a small amount
of noise may remain in the processed signal. This chapter describes the tests employed and
the test data used.
6.1 Test Data
6.1.1 Clean Speech Data
The speech signals used for the test are sampled at 16 kHz. Each signal is a short speech
sample of 3 seconds. Two male voices and one female voice are used as the test data. The
speech files used throughout the test are listed in Table 6.1. Sentence.wav is used as the
main speech signal and the other two voice signals are used as interference signals.
TABLE 6.1
MALE AND FEMALE SENTENCES USED FOR EVALUATION
File Name      Type of Voice   Sentence
Sentence.wav   Male voice      “She sells seashells by the seashore”
Man.wav        Male voice      “Someone walking on the side walk with the rainbow”
Woman.wav      Female voice    “A good birthday has canoes with cap cakes cargoes in rainbow color”
6.1.2 Noise Data
Various noise signals are used for the evaluation of the proposed method. All the noise
signals are taken from the Noisex-92 database [24]. The noise signals are resampled from
their recorded sampling rate to 16 kHz, the sampling frequency of the speech signal. These
noise signals and the interfering male and female voices are added to the speech signal at
different SNR values. The input SNR is scaled to different levels, namely 0 dB, 5 dB, 10 dB,
15 dB and 20 dB, using the formula
$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\sigma_s^2}{\sigma_n^2}\right)$$
where $\sigma_s^2$ is the variance of the speech signal and $\sigma_n^2$ is the variance of
the noise signal. The value of SNR in the equation may be 0 dB, 5 dB, 10 dB, 15 dB or 20 dB.
A brief description of the noise signals used for the evaluation follows.
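Scaling a noise signal to a chosen input SNR can be sketched as follows. The sketch assumes that SNR is ten times the base-10 logarithm of the ratio of speech variance to noise variance, as the surrounding definitions suggest; it is written in Python for illustration and the function names `variance`, `snr_db`, `scale_noise` and `mix` are hypothetical, not taken from the thesis code.

```python
import math


def variance(x):
    """Population variance of a sequence of samples."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def snr_db(speech, noise):
    """SNR in dB as the ratio of speech variance to noise variance."""
    return 10.0 * math.log10(variance(speech) / variance(noise))


def scale_noise(speech, noise, target_snr_db):
    """Return a copy of `noise` scaled so that the pair has the target SNR."""
    ratio = 10.0 ** (target_snr_db / 10.0)          # required variance ratio
    gain = math.sqrt(variance(speech) / (variance(noise) * ratio))
    return [gain * v for v in noise]


def mix(speech, noise, target_snr_db):
    """Add the scaled noise to the speech, sample by sample."""
    scaled = scale_noise(speech, noise, target_snr_db)
    return [s + n for s, n in zip(speech, scaled)]
```

Because the gain is applied to the noise only, the clean speech is left untouched and can still serve as the reference for the distortion measures.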
6.1.2.1 Babble Noise
Babble noise is the most challenging interference noise for a speech system. This type of
noise is highly non-stationary and is obtained by recording the voices of people speaking
in a canteen. The samples are recorded from a B&K condenser microphone onto DAT. The room
radius is over two meters; therefore, individual
voices are slightly audible. The sound level during the recording process was . It is
the most difficult noise for speech enhancement. The power spectrum of the babble noise is
shown in Fig. 6.1.
Fig. 6. 1. Power Spectrum of Babble Noise
[Plot, Fig. 6.1: Power spectrum of Babble Noise; x-axis: Frequency in Hz; y-axis: Power in dB]
6.1.2.2 Car Interior Noise
This recording was made in Volvo car at , in the gear, on an
asphalt road, in rainy conditions. Many speech enhancement systems perform well with car
interior noise because of the low-pass nature of this noise. The power spectrum of the car
interior noise is shown in Fig. 6.2.
Fig. 6. 2. Power Spectrum of Car Interior Noise
6.1.2.3 Destroyer Engine Noise
This type of noise is obtained by recording samples from microphone on
DAT. Sound level during the recording process is . The power spectrum of this type
of noise is as shown in Fig. 6. 3.
Fig. 6.3. Power Spectrum of Destroyer Engine Noise
[Plot, Fig. 6.2: Power spectrum of Car Interior Noise; x-axis: Frequency in Hz; y-axis: Power in dB]
[Plot, Fig. 6.3: Power spectrum of Destroyer Engine Noise; x-axis: Frequency in Hz; y-axis: Power in dB]
6.1.2.4 Tank Noise
This type of noise is recorded from a tank using a B&K condenser
microphone onto DAT. The tank is moving at a speed of . The sound level
during the recording process was . The power spectrum of the tank noise is as shown
in Fig. 6. 4.
Fig. 6. 4. Power Spectrum of Tank Noise
6.1.2.5 Wind Noise
Wind noise is the noise caused by turbulent airflow over and around an object. When wind
strikes the surface of the microphone, it produces an effect called wind noise. The power
spectrum of the wind noise is shown in Fig. 6.5.
Fig. 6. 5. Power Spectrum of Wind Noise
[Plot, Fig. 6.4: Power spectrum of Tank Noise; x-axis: Frequency in Hz; y-axis: Power in dB]
[Plot, Fig. 6.5: Power spectrum of Wind Noise; x-axis: Frequency in Hz; y-axis: Power in dB]
6.2 Objective Measures
Various objective tests used to measure the performance of the proposed system are
described below.
6.2.1 Signal to Noise Ratio
Signal to noise ratio (SNR) compares the level of the desired signal with the level of the
background noise. The conventional way to measure the SNR is to compute the ratio of the
speech energy to the noise energy after the enhancement, given as
$$\mathrm{SNR} = 10 \log_{10}\left(\frac{\sigma_s^2}{\sigma_n^2}\right)$$
where $\sigma_s^2$ is the variance of the speech signal and $\sigma_n^2$ is the variance of
the noise signal.
6.2.2 SNR Improvement
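SNR improvement (SNRI) is measured by subtracting the input SNR value from the output SNR
value and is expressed as
$$\mathrm{SNRI} = 10 \log_{10}\left(\frac{\sigma_{s,\mathrm{out}}^2}{\sigma_{n,\mathrm{out}}^2}\right) - 10 \log_{10}\left(\frac{\sigma_{s,\mathrm{in}}^2}{\sigma_{n,\mathrm{in}}^2}\right)$$
where $\sigma_{s,\mathrm{out}}^2$ is the variance of the output speech signal,
$\sigma_{n,\mathrm{out}}^2$ is the variance of the output noise signal,
$\sigma_{s,\mathrm{in}}^2$ is the variance of the input speech signal and
$\sigma_{n,\mathrm{in}}^2$ is the variance of the input noise signal.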
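As a worked illustration of the SNR improvement, the following Python sketch computes it from four variances, assuming each SNR is ten times the base-10 logarithm of a speech-to-noise variance ratio. The function names and the variance values are made up for the example and are not measurements from this thesis.

```python
import math


def snr_from_variances(var_speech, var_noise):
    """SNR in dB from the speech and noise variances."""
    return 10.0 * math.log10(var_speech / var_noise)


def snri_db(var_s_out, var_n_out, var_s_in, var_n_in):
    """SNR improvement: output SNR minus input SNR, both in dB."""
    snr_out = snr_from_variances(var_s_out, var_n_out)
    snr_in = snr_from_variances(var_s_in, var_n_in)
    return snr_out - snr_in


# Hypothetical variances: equal speech and noise power at the input (0 dB),
# slightly attenuated speech and strongly attenuated noise at the output.
example_snri = snri_db(var_s_out=0.9, var_n_out=0.018, var_s_in=1.0, var_n_in=1.0)
```

With these hypothetical numbers the output variance ratio is 50, so the improvement is about 17 dB, which is of the same order as the values reported later in this chapter.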
6.2.3 Perceptual Evaluation of Speech Quality
Perceptual Evaluation of Speech Quality (PESQ) is the international standard for objective
speech quality measurement and is well known as an intrusive objective speech quality
assessment method. PESQ is an objective measure, but it is based on cognitive models of
the human hearing organ that form pseudo-subjective scores, and it has a high correlation
with real subjective tests. It is standardized as ITU-T P.862. PESQ operates on a
transmitted (input) signal and a received (output) speech signal to compute the perceptual
quality of the received signal. PESQ is used in Voice over Internet Protocol (VoIP), mobile
transmission and fixed networks to measure the quality of the speech signal. The evaluation
of a system using the PESQ measure is shown in Fig. 6.6.
Fig. 6. 6. Structure of Perceptual Evaluation of Speech Quality model
A number of objective measures were examined in a previous study for predicting the
intelligibility of speech in noisy conditions. The most widely used one is PESQ. Among all
the objective measures considered, the PESQ measure is the most complex to compute and is
the one recommended by a standardization body, the International Telecommunication Union
(ITU-T 2000), for speech quality assessment of 3.2 kHz (narrow-band) handset telephony and
narrow-band speech codecs [25, 29].
The PESQ measure is computed as follows:
The original (clean) and degraded signals are first level-equalized to a standard
listening level and filtered by a filter with a response similar to that of a standard
telephone handset. The signals are time-aligned to correct for time delays and then
processed through an auditory transform to obtain the loudness spectra. The difference in
loudness between the original and degraded signals is computed and averaged over time and
frequency to produce the prediction of the subjective quality rating.
Finally, the output of the system gives the PESQ value of the signal. PESQ delivers an
output value in the range -0.5 to 4.5. A PESQ value near -0.5 indicates poor quality of the
voice signal, and a value near 4.5 indicates excellent quality of the voice signal [26].
[Diagram, Fig. 6.6: blocks “System under Test” and “PESQ P.862”; signals s(n), v(n), x(n) and y(n); output “PESQ Score”]
6.2.4 Measurement of Speech Distortion
SD is defined as the spectral deviation between the power of the clean input speech signal
and the power of the processed speech signal at the output. A reference power level for
the enhanced output signal is obtained by normalizing the target speech signal. The
normalizing factor is given as
SD is given by
where is the power of the input speech signal and
is the power of the output speech signal.
6.2.5 Measure of Noise Distortion
ND is defined as the spectral deviation between the power of the input noise signal and
the power of the processed noise signal at the output. A reference power level for the
enhanced output signal is obtained by normalizing the target noise signal. The normalizing
factor is given as
ND is given by
where is the power of the input noise signal and
is the power of the output noise signal.
6.3 Test Results with Various Noise Signals
In this thesis, we consider a source signal, a noise signal and eight microphones placed
along an arc; the distance between the microphones is taken to be 1 cm, as shown in
Fig. 6.7. The red dot indicates the position of the source signal, the black dots indicate
the positions of the microphones and the blue dot indicates the position of the noise signal.
Fig. 6.7. Positions of the source signal, noise signal and microphones
We use a clean male speech signal sampled at 16 kHz as the test signal for validating the
system. This speech signal is corrupted with various noise signals, namely babble noise,
car interior noise, tank noise, wind noise, male voice as interference noise, destroyer
engine noise and female voice as interference noise, at 0 dB, 5 dB, 10 dB, 15 dB and 20 dB
for testing the system. The clean speech signal “she sells seashells by the seashore” is
shown in Fig. 6.8.
Fig. 6.8. Clean speech signal
[Plot, Fig. 6.7: Position of Source Signal, Microphones and Noise Signal]
Fig. 6.9 (a), 6.15 (a), 6.21 (a), 6.27 (a), 6.33 (a), 6.39 (a) and 6.45 (a) show the speech
signal corrupted with babble noise, car interior noise, tank noise, wind noise, male voice
as interference noise, destroyer engine noise and female voice as interference noise,
respectively. Fig. 6.9 (b), 6.15 (b), 6.21 (b), 6.27 (b), 6.33 (b), 6.39 (b) and 6.45 (b)
show the corresponding enhanced speech signals. Fig. 6.10, 6.16, 6.22, 6.28, 6.34, 6.40 and
6.46 show the measured SNR values. Fig. 6.11, 6.17, 6.23, 6.29, 6.35, 6.41 and 6.47 show
the PESQ scores of the input and output signals. Fig. 6.12, 6.18, 6.24, 6.30, 6.36, 6.42
and 6.48 show the SD between the clean speech signal and the enhanced speech signal.
Fig. 6.13, 6.19, 6.25, 6.31, 6.37, 6.43 and 6.49 show the ND between the input noise signal
and the output noise signal. Fig. 6.14, 6.20, 6.26, 6.32, 6.38, 6.44 and 6.50 show the GUI
layout of the proposed system for the various noise signals and input SNR values.
Tables 6.2 to 6.8 give the SNRI, SD and ND values for the different input SNR values.
6.3.1 Evaluation of Babble Noise
Fig. 6.9. (a) Corrupted speech with babble noise at 10 dB; (b) enhanced speech signal.
The enhanced speech signal produced by the proposed method is clean, with improved quality
and no audible noise.
Fig. 6. 10. Graph represents the SNR value for babble noise
Fig. 6. 11. Graph shows the PESQ value for babble noise
[Plot, Fig. 6.10: Graph Representing the SNR Value of Babble Noise; x-axis: Input SNR in dB; y-axis: Elko, Output SNR; legend: Input SNR, Elko SNR, Output SNR]
[Plot, Fig. 6.11: PESQ of Babble Noise; x-axis: SNR in dB; y-axis: PESQ Value; legend: input, output]
Fig. 6. 12. Graph shows the SD of babble noise at 10 dB
Fig. 6. 13. Graph shows the ND of babble noise at 10 dB
[Plot, Fig. 6.12: Speech Distortion Graph of Babble Noise; legend: Input Speech, Output Speech]
[Plot, Fig. 6.13: Noise Distortion Graph of Babble Noise; legend: Input Noise, Output Noise]
TABLE 6.2
SNRI, SPEECH DISTORTION AND NOISE DISTORTION FOR BABBLE NOISE
Input SNR   SNRI      Speech Distortion   Noise Distortion
0 dB        17.0935   -21.6603            -30.9882
5 dB        16.9475   -28.3759            -36.8680
10 dB       17.2390   -27.7635            -41.1443
15 dB       17.3293   -27.3468            -45.7790
20 dB       17.2475   -27.1291            -50.7423
Fig. 6. 14. GUI Layout with babble as input noise at 10dB of input SNR
The proposed system performs well with babble noise, producing approximately 17 dB of
SNRI. The PESQ values indicate that the quality of the output speech signal is good
compared with that of the input signal.
6.3.2 Evaluation of Car Interior Noise
Fig. 6.15. (a) Corrupted speech with car noise at 0 dB; (b) enhanced speech signal.
Fig. 6. 16. Graph represents the SNR value for car noise
[Plot, Fig. 6.16: Graph Representing the SNR Value of Car Noise; x-axis: Input SNR in dB; y-axis: Elko, Output SNR; legend: Input SNR, Elko SNR, Output SNR]
Fig. 6. 17. Graph shows the PESQ value for car noise
Fig. 6. 18. Graph shows the SD of car interior noise at 0 dB
[Plot, Fig. 6.17: PESQ of Car Noise; x-axis: SNR in dB; y-axis: PESQ Value; legend: input, output]
[Plot, Fig. 6.18: Speech Distortion Graph of Car Interior Noise; legend: Input Speech, Output Speech]
Fig. 6. 19. Graph shows the ND of car interior noise at 0 dB
TABLE 6.3
SNRI, SPEECH DISTORTION AND NOISE DISTORTION FOR CAR INTERIOR NOISE
Input SNR   SNRI      Speech Distortion   Noise Distortion
0 dB        21.6896   -29.0254            -43.9777
5 dB        22.2870   -28.4140            -47.3695
10 dB       23.1764   -27.8790            -51.1349
15 dB       23.3517   -27.5479            -56.8825
20 dB       22.7619   -27.3463            -64.8594
The proposed method performs very well with car interior noise as the input noise,
producing approximately 22 dB of SNRI. On listening, the output speech signal is free from
noise and clean speech is audible at the output. The GUI layout for car interior noise at
0 dB is shown in Fig. 6.20.
[Plot, Fig. 6.19: Noise Distortion Graph of Car Interior Noise; legend: Input Noise, Output Noise]
Fig. 6. 20. GUI Layout with car interior noise as input noise at 0dB of input SNR
6.3.3 Evaluation of Tank Noise
Fig. 6.21. (a) Corrupted speech with tank noise at 5 dB; (b) enhanced speech signal.
Fig. 6. 22. Graph represents the SNR value for tank noise
Fig. 6. 23. Graph shows the PESQ value for tank noise
[Plot, Fig. 6.22: Graph Representing the SNR Value of Tank Noise; x-axis: Input SNR in dB; y-axis: Elko, Output SNR; legend: Input SNR, Elko SNR, Output SNR]
[Plot, Fig. 6.23: PESQ of Tank Noise; x-axis: SNR in dB; y-axis: PESQ Value; legend: input, output]
Fig. 6. 24. Graph shows the SD of tank noise at 5 dB
Fig. 6. 25. Graph shows the ND of tank noise at 5 dB
[Plot, Fig. 6.24: Speech Distortion Graph of Tank Noise; legend: Input Speech, Output Speech]
[Plot, Fig. 6.25: Noise Distortion Graph of Tank Noise; legend: Input Noise, Output Noise]
TABLE 6.4
SNRI, SPEECH DISTORTION AND NOISE DISTORTION FOR TANK NOISE
Input SNR   SNRI      Speech Distortion   Noise Distortion
0 dB        18.0880   -29.0542            -31.7077
5 dB        18.4749   -28.4263            -36.0378
10 dB       18.8690   -27.9134            -40.3683
15 dB       18.8887   -27.5661            -45.2082
20 dB       18.5283   -27.3500            -50.5688
Fig. 6. 26. GUI Layout with tank noise as input noise at 5dB of input SNR
The proposed system performs well with tank noise, producing approximately 18 dB of SNRI.
The PESQ values indicate that the quality of the output speech signal is good compared
with that of the input signal. On listening, the output speech signal is free from noise
and clean speech is audible at the output.
6.3.4 Evaluation of Wind Noise
Fig. 6.27. (a) Corrupted speech with wind noise at 0 dB; (b) enhanced speech signal.
Fig. 6. 28. Graph represents the SNR value for wind noise
[Plot, Fig. 6.28: Graph Representing the SNR Value of Wind Noise; x-axis: Input SNR in dB; y-axis: Elko, Output SNR; legend: Input SNR, Elko SNR, Output SNR]
Fig. 6. 29. Graph shows the PESQ value for wind noise
Fig. 6. 30. Graph shows the SD of wind noise at 0 dB
[Plot, Fig. 6.29: PESQ of Wind Noise; x-axis: SNR in dB; y-axis: PESQ Value; legend: input, output]
[Plot, Fig. 6.30: Speech Distortion Graph of Wind Noise; legend: Input Speech, Output Speech]
Fig. 6. 31. Graph shows the ND of wind noise at 0 dB
TABLE 6.5
SNRI, SPEECH DISTORTION AND NOISE DISTORTION FOR WIND NOISE
Input SNR   SNRI      Speech Distortion   Noise Distortion
0 dB        18.3066   -29.0286            -34.3433
5 dB        18.8293   -28.3215            -38.2911
10 dB       19.3762   -27.7482            -42.4971
15 dB       19.6856   -27.3794            -47.3313
20 dB       19.6901   -27.1757            -52.5111
The proposed system performs well with wind noise, producing approximately 19 dB of SNRI.
The PESQ values indicate that the quality of the output speech signal is good compared
with that of the input signal. On listening, the output speech signal is free from noise
and clean speech is audible at the output.
[Plot, Fig. 6.31: Noise Distortion Graph of Wind Noise; legend: Input Noise, Output Noise]
Fig. 6. 32. GUI Layout with wind noise as input noise at 0db of input SNR
6.3.5 Evaluation of Man Voice as Interference Noise
Fig. 6.33. (a) Speech corrupted with man voice as interference noise at 5 dB; (b) enhanced speech signal.
Fig. 6. 34. Graph represents the SNR value for man voice as interference noise
Fig. 6. 35. Graph shows the PESQ value for man voice as interference noise
[Plot, Fig. 6.34: Graph Representing the SNR Value of Male Voice as Interference Noise; x-axis: Input SNR in dB; y-axis: Elko, Output SNR; legend: Input SNR, Elko SNR, Output SNR]
[Plot, Fig. 6.35: PESQ of Man Voice as Interference Noise; x-axis: SNR in dB; y-axis: PESQ Value; legend: input, output]
Fig. 6. 36. Graph shows the SD of man voice as interference noise at 5 dB
Fig. 6. 37. Graph shows the ND of man voice as interference noise at 5 dB
[Plot, Fig. 6.36: Speech Distortion Graph of Man Voice as Interference Noise; legend: Input Speech, Output Speech]
[Plot, Fig. 6.37: Noise Distortion Graph of Man Voice as Interference Noise; legend: Input Noise, Output Noise]
TABLE 6.6
SNRI, SPEECH DISTORTION AND NOISE DISTORTION FOR MAN VOICE AS INTERFERENCE NOISE
Input SNR   SNRI      Speech Distortion   Noise Distortion
0 dB        14.5147   -21.6962            -30.1394
5 dB        15.3550   -27.9082            -35.6224
10 dB       15.5771   -27.5811            -40.0245
15 dB       15.7168   -27.3656            -44.7639
20 dB       15.7653   -27.2448            -49.7774
Fig. 6. 38. GUI Layout with man voice as interference noise at 5db of input SNR
The proposed system performs well with man voice as interference noise, producing
approximately 15 dB of SNRI. The PESQ values indicate that the quality of the output
speech signal is good compared with that of the input signal. On listening, the output
speech signal is free from noise and clean speech is audible at the output.
6.3.6 Evaluation of Destroyer Engine Noise
Fig. 6.39. (a) Corrupted speech with destroyer engine noise at 0 dB; (b) enhanced speech signal.
Fig. 6.40. Graph represents the SNR value for destroyer engine noise
[Plot, Fig. 6.40: Graph Representing the SNR Value of Destroyer Engine Noise; x-axis: Input SNR in dB; y-axis: Elko, Output SNR; legend: Input SNR, Elko SNR, Output SNR]
Fig. 6.41. Graph shows the PESQ value for destroyer engine noise
Fig. 6.42. Graph shows the SD of destroyer engine noise at 0 dB
[Plot, Fig. 6.41: PESQ of Destroyer Engine Noise; x-axis: SNR in dB; y-axis: PESQ Value; legend: input, output]
[Plot, Fig. 6.42: Speech Distortion Graph of Destroyer Engine Noise; legend: Input Speech, Output Speech]
Fig. 6.43. Graph shows the ND of destroyer engine noise at 0 dB
TABLE 6.7
SNRI, SPEECH DISTORTION AND NOISE DISTORTION FOR DESTROYER ENGINE NOISE
Input SNR   SNRI      Speech Distortion   Noise Distortion
0 dB        16.4261   -29.7541            -33.4305
5 dB        12.3759   -28.8133            -37.7039
10 dB       11.3883   -28.0778            -42.1343
15 dB       12.5644   -27.6350            -46.9517
20 dB       13.4878   -27.3962            -51.9464
The proposed system performs well with destroyer engine noise, producing approximately
13 dB of SNRI. The PESQ values indicate that the quality of the output speech signal is
good compared with that of the input signal. On listening, the output speech signal is
free from noise and clean speech is audible at the output.
[Plot, Fig. 6.43: Noise Distortion Graph of Destroyer Engine Noise; legend: Input Noise, Output Noise]
Fig. 6.44. GUI Layout with destroyer engine noise at 0 dB of input SNR
6.3.7 Evaluation of Female Voice as Interference Noise
Fig. 6.45. (a) Speech corrupted with female voice as interference noise at 10 dB; (b) enhanced speech signal.
Fig. 6. 46. Graph represents the SNR value for female voice as interference noise
Fig. 6. 47. Graph shows the PESQ value for female voice as interference noise
[Plot, Fig. 6.46: Graph Representing the SNR Value of Female Voice as Interference Noise; x-axis: Input SNR in dB; y-axis: Elko, Output SNR; legend: Input SNR, Elko SNR, Output SNR]
[Plot, Fig. 6.47: PESQ of Female Voice as Interference Noise; x-axis: SNR in dB; y-axis: PESQ Value; legend: input, output]
Fig. 6. 48. Graph shows the SD of female voice as interference noise at 10 dB
Fig. 6. 49. Graph shows the ND of female voice as interference noise at 10 dB
[Plot, Fig. 6.48: Speech Distortion Graph of Woman Voice as Interference Noise; legend: Input Speech, Output Speech]
[Plot, Fig. 6.49: Noise Distortion Graph of Woman Voice as Interference Noise; legend: Input Noise, Output Noise]
TABLE 6.8
SNRI, SPEECH DISTORTION AND NOISE DISTORTION FOR FEMALE VOICE AS INTERFERENCE NOISE
Input SNR   SNRI      Speech Distortion   Noise Distortion
0 dB        13.5490   -21.7572            -30.3441
5 dB        15.3606   -27.9074            -35.4025
10 dB       15.4650   -27.6047            -40.0446
15 dB       15.4885   -27.4057            -44.8835
20 dB       15.4453   -27.2826            -49.8412
Fig. 6. 50. GUI Layout with female voice as interference noise at 10db of input SNR
The proposed system performs well with female voice as interference noise, producing
approximately 15 dB of SNRI. The PESQ values indicate that the quality of the output
speech signal is good compared with that of the input signal. On listening, the output
speech signal is free from noise and clean speech is audible at the output.
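The approximate SNRI figures quoted for each noise type can be checked by averaging the SNRI columns of Tables 6.2 to 6.8. The short Python sketch below does this; the values are copied from those tables, while the dictionary keys and the helper `mean` are names chosen only for this illustration.

```python
# SNRI values in dB copied from Tables 6.2 to 6.8,
# listed for input SNRs of 0, 5, 10, 15 and 20 dB.
SNRI_DB = {
    "babble": [17.0935, 16.9475, 17.2390, 17.3293, 17.2475],
    "car interior": [21.6896, 22.2870, 23.1764, 23.3517, 22.7619],
    "tank": [18.0880, 18.4749, 18.8690, 18.8887, 18.5283],
    "wind": [18.3066, 18.8293, 19.3762, 19.6856, 19.6901],
    "man voice": [14.5147, 15.3550, 15.5771, 15.7168, 15.7653],
    "destroyer engine": [16.4261, 12.3759, 11.3883, 12.5644, 13.4878],
    "female voice": [13.5490, 15.3606, 15.4650, 15.4885, 15.4453],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


# Mean SNRI per noise type over the five input SNR levels.
MEAN_SNRI_DB = {name: mean(vals) for name, vals in SNRI_DB.items()}

# Noise type with the largest mean improvement.
BEST_NOISE = max(MEAN_SNRI_DB, key=MEAN_SNRI_DB.get)
```

Rounded to one decimal place, the means are 17.2 dB (babble), 22.7 dB (car interior), 18.6 dB (tank), 19.2 dB (wind), 15.4 dB (man voice), 13.2 dB (destroyer engine) and 15.1 dB (female voice), with car interior noise giving the largest improvement.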
Chapter 7 Summary and Future work
7.1 Conclusion
This thesis focuses on the design, implementation and testing of an Elko-based GSC
system for enhancing the speech signal from a noisy speech signal. The proposed Elko-based
GSC has been implemented successfully, and the performance of the system has been measured
in five different noisy environments and with a male and a female voice as interference
signals to the main speech signal. The quality of the output signal is assessed by
objective measures performed on the system. Objective tests such as SNR, SNRI, SD, ND and
the PESQ value have been implemented for the evaluation of the proposed system. From the
results obtained, it can be concluded that the proposed system gives satisfactory results
in the objective tests conducted on it.
The system provides an acceptable improvement of SNR with car noise as the input test
noise, even at low input SNR values. The largest improvement, approximately 22 dB, is
obtained with car noise as the input test noise. On listening, the output speech signals
are clean and free from noise. The speech quality of the output signal is measured with
PESQ, which gives values in the range 4.1 to 4.4 for car noise at the different input SNR
values; this indicates that the quality of the output speech signal is good. Car noise
gives low speech distortion and high noise distortion values. The proposed system performs
well with babble noise, providing an improvement of 17 dB and PESQ values in the range 3.3
to 3.9 with acceptable speech and noise distortion. For tank and wind noise, the system
provides an improvement of approximately 18 dB to 19 dB and the PESQ value lies in the
range 3.5 to 4.1. For destroyer engine noise, the system provides an improvement of 13 dB
and the PESQ value lies in the range 1.9 to 3.8. For the male and female voices as
interference signals, the system provides an improvement of 15 dB and the PESQ value lies
in the range 2.7 to 4.1. On listening to the output, a small amount of the interference
signal is still present, but overall the interference signal has been reduced.
The proposed Elko-based GSC has low complexity and is computationally efficient. The
proposed system has been successfully implemented and validated. For clarity, the results
are presented in the form of tables and graphs.
7.2 Future work
Considering the results of the experiment, the proposed Elko-based GSC is efficient in
reducing noise from the noisy speech signal given as input to the system. The proposed
Elko-based GSC was implemented in offline mode; in future, the proposed method should be
implemented in real time. Further, the performance of the system should be tested in a
reverberant environment.
Bibliography
[1] S. Arlinger, A. Leijon, “Hearing Aids for Adults Benefits and Costs,” in The Swedish
council on Technology Assessment in Health Care, May 2003.
[2] M. Brandstein, D. Ward, “Microphone Array Signal Processing Techniques and
Applications,” Ed. New York: Springer, 2010.
[3] A. Wang, K. Yao, R. E. Hudson, D. Korompis, F. Lorenzelli, S. Soli and S. Gao,
“Microphone Array for Hearing Aid and Speech Enhancement Applications,” in Proc.
of Int. Conf. on Architectures and Processors, Los Angeles, CA, 19-21 Aug. 1996,
pp. 231-239.
[4] G. W. Elko, “A Simple Adaptive First-Order Differential Microphone,” in Acoust. And
speech Research Dept. Bell Labs, Lucent Technologies, Murray Hill, NJ, Aug. 1999.
[5] J. P. Townsend, K. D. Donohue, “Stability Analysis for the Generalized Sidelobe
Canceller,” in IEEE Signal Process. Lett. Lexington, KY, USA, June 2010, pp. 603-606.
[6] H. Puder, “Hearing Aids: An Overview of the State-of-the-Art, Challenges, and Future
Trends of an Interesting Audio Signal Processing Applications,” in Proc. of 6th Int.
Symp. on Image and Signal Process. and Anal., Erlangen, Germany, Sept. 2009, pp. 1-6.
[7] H. Luo, H. Arndt, “Digital Signal Processing Technology and Applications in Hearing
Aids,” in 6th Int. Conf. on Signal Process., Canada, Aug. 2002, pp. 1727-1730 vol. 2.
[8] K. Kiyohara, Y. Kaneda, S. Takahashi, H. Nomura and J. Kijima, “A Microphone
Array System for Speech Recognition,” in IEEE Int. Conf. on Acoust., Speech and
Signal Process., Musashino, Apr. 1997, pp. 215-218 vol. 1.
[9] A. Wang, K. Yao, R. E. Hudson, D. Korompis, F. Lorenzelli, S. F. Soli and S. Gao,
“A High Performance Microphone Array System for Hearing Aid Applications,” in
IEEE Int. Conf. on Acoust., Speech and Signal Process., Los Angeles, CA, May 1996,
pp. 3197-3200 vol. 6.
[10] David K. Campbell, “Adaptive Beamforming Using a Microphone Array for Hand-
Free Telephony,” M. S. Thesis, Dept. Elect. Eng., Virginia Polytechnic Institute and
State Univ., Blacksburg, Virginia, 1999.
[11] C. F. Scola, M. D. B. Ortega, “Direction of Arrival Estimation-A Two Microphone
approach,” M. S. Thesis, Dept. Elect. Eng., Blekinge Institute of Technology,
Karlskrona, Sweden, Sept. 2010.
[12] H. Teutsch, G. W. Elko, “First-and Second-Order Differential Microphone Array,” in
Acoust. And speech Research Dept. Bell Labs, Lucent Technologies, Murray Hill, NJ,
Aug. 1999.
[13] A. Acero, J. Droppo, M. Seltzer and I. Tashev. Audio Processing. [Online]. Available:
http://research.microsoft.com/en-us/projects/audioprocessing/default.aspx
[14] G. W. Elko, A. T. N. Pong, “A Simple Adaptive First-Order Differential
Microphone,” in IEEE ASSP Workshop on Applicat. of Signal Process. to Audio and
Acoust., Oct. 1995, pp. 169-172.
[15] G. W. Elko, “Noise-Reducing Directional Microphone Array,” U.S. Patent
US2009/0175466 A1, Jul. 2009.
[16] Basics of Beamforming [Online]. Available:
http://en.wikipedia.org/wiki/Beamforming
[17] I. Himawan, “Speech Recognition Using AD-HOC Microphone Arrays,” Ph.D.
dissertation, Dept. Elect. Eng., Queensland Univ., of Technology, Queensland, 2010.
[18] M. Zhang, M. H. Er, “Adaptive Beamforming by Microphone Array,” in IEEE Global
Telecomm. Conf., Nov. 1995, pp. 163-167 vol.1.
[19] A. Bouacha, F. Debbat and F. T. Bendimerad. (2008, Jan.). Modified Blind
Beamforming Algorithm For Smart Antenna System. [Online]. Available:
http://jre.cplire.ru/jre/jan08/3/text.html
[20] D. N. Johnson, D. E. Dudgeon, “Array Signal Processing,” Ed. New Jersey, Prentice-
Hall, 1993, Ch. 7, pp. 349-413.
[21] A. A. Gareta, “A Multi-Microphone Approach to Speech Processing in a Smart-room
Environment,” Ph.D. dissertation, Dept. Signal Theory and Commun., Universitat
Politècnica de Catalunya, Barcelona, 2007.
[22] “Matlab Creating Graphical User Interface.” Ed. Natick, The Math Works, 2011, Ch.
2, pp. 2.2-2.36.
[23] Y. Hu, P. C. Loizou, “Evaluation of Objective Quality Measures for Speech
Enhancement,” in IEEE Trans. On Audio, Speech and Language Process., Dallas, TX,
Jan.2008, pp. 229-238.
[24] Noisex-92 database, taken from Signal Process. Inform. Base. [Online]. Available:
http://spib.rice.edu/spib/select_noise.html
[25] A. W. Rix, J. G. Beerends, M. P. Hollier and A. P. Hekstra, “Perceptual Evaluation of
Speech Quality-A New Method for Speech Quality Assessment of Telephone
Networks and Codecs,” in IEEE Int. Conf. on Acoust., Speech and Signal Process.,
Ipswich, 2001, pp. 749-752 vol.2.
[26] P. Stefan, T. Uhl, “Quantifying the Suitability of Reference Signals for the PESQ
Algorithm,” in Third Int. Conf., on Commun. Theory, Rel. and Quality of Service, June
2010, pp. 110-115.
[27] B. D. Van Veen, K. M. Buckley, “Beamforming: A Versatile Approach to Spatial
Filtering,” in IEEE ASSP Mag., USA, April 1988, pp. 4-24.
[28] Description of Acoustic beamforming [Online]. Available:
http://www.lmsintl.com/acoustic-beamforming
[29] Y. Hu, P. C. Loizou, “Evaluation of Objective Quality Measures for Speech
Enhancement,” in IEEE Trans. On Audio, Speech and Language Process., Dallas, TX,
Jan.2008, pp. 229-238.