FEATURE EXTRACTION FROM HEART SOUND 31-12-2010

PSZ 19 :16 (Pind. 1/97)

UNIVERSITI TEKNOLOGI MALAYSIA

BORANG PENGESAHAN STATUS TESIS***

JUDUL: FEATURES EXTRACTION OF HEART SOUNDS USING TIME-FREQUENCY

DISTRIBUTION AND MEL-FREQUENCY CEPSTRUM COEFFICIENT

SESI PENGAJIAN : 2005/2006

Saya MASNANI BT MOHAMED_________ (HURUF BESAR)

mengaku membenarkan tesis (PSM/Sarjana/Doktor Falsafah)* ini disimpan di Perpustakaan Universiti

Teknologi Malaysia dengan syarat-syarat kegunaannya seperti berikut:

1. Tesis adalah hak milik Universiti Teknologi Malaysia.

2. Perpustakaan Universiti Teknologi Malaysia dibenarkan membuat salinan untuk tujuan pengajian

sahaja.

3. Perpustakaan dibenarkan membuat salinan tesis ini sebagai bahan pertukaran antara institusi

pengajian tinggi.

4. * * Sila tandakan ( �� )

Disahkan oleh

_____________________________ ______________________________

(TANDATANGAN PENULIS) (TANDATANGAN PENYELIA)

Alamat Tetap:

LOT 530, KAMPUNG TENDONG, PROF. IR. DR. SHEIKH HUSSAIN

17030 PASIR MAS, BIN SHAIKH SALLEH__________

KELANTAN Nama Penyelia

Tarikh : 2 MAY 2006 __ Tarikh : 2 MAY 2006__________

CATATAN : * Potong yang tidak berkenaan

** Jika tesis ini SULIT atau TERHAD, sila lampirkan surat daripada pihak

berkuasa/organisasi berkenaan dengan menyatakan sekali sebab dan tempoh

tesis ini perlu dikelaskan sebagai SULIT atau TERHAD.

*** Tesis dimaksudkan sebagai tesis bagi Ijazah Doktor Falsafah dan Sarjana secara

penyelidikan, atau disertai bagi pengajian secara kerja kursus atau

penyelidikan, atau Laporan Projek Sarjana Muda (PSM).

��

SULIT

TERHAD

TIDAK TERHAD

(Mengandungi maklumat yang berdarjah keselamatan

atau kepentingan Malaysia seperti yang termaktub di

dalam AKTA RAHSIA RASMI 1972)

(Mengandungi maklumat terhad yang telah ditentukan

oleh organisasi / badan di mana penyelidikan dijalankan)

“I/We* hereby declare that I/we* have read this thesis and in my/our* opinion this

thesis is sufficient in terms of scope and quality for the award of the degree of

Master of Engineering (Electrical-Electronic & Telecommunication)”

Signature : ....................................................................

Name of Supervisor : Prof. Ir. Dr. Sheikh Hussain Shaikh Salleh

Date : 2 May 2006

FEATURES EXTRACTION OF HEART SOUNDS USING TIME-

FREQUENCY DISTRIBUTION AND MEL-FREQUENCY CEPSTRUM

COEFFICIENT

MASNANI BT MOHAMED

A thesis submitted in fulfillment of the

requirements for the award of the degree of

Master of Engineering (Electrical – Electronic & Telecommunication)

Faculty of Electrical Engineering

Universiti Teknologi Malaysia

MAY 2006

ii

I declare that this thesis entitled “Features Extraction of Heart Sounds using Time-

Frequency Distribution and Mel-Frequency Cepstrum Coefficient ” is the result of

my own research except as cited in the references. The thesis has not been accepted

for any degree and is not concurrently submitted in candidature of any other degree.

Signature : ………………………..

Name : Masnani Bt. Mohamed

Date : 2 May 2006

iii

To my beloved parents, Mohamed B. Shafie and Siti Zaharah Bt. Othman, my lovely

brothers and sisters

iv

ACKNOWLEDGEMENTS

I would like to thank my supervisor, Prof. Ir. Dr. Sheikh Hussain B. Shaikh

Salleh for giving me the opportunity to work on this project and always giving me

lots of good advices. I would also like to thank En. Kamarulafizam B. Ismail for

giving me the opportunity to cooperate with him. He has been a great source of

information and his invaluable guidance and timely input has helped me throughout

my Master’s project. His good guidance has given me an exposure to learn a lot of

new things.

A great thank to Mr. Asasul Islam for providing me the heart sounds data for

my analysis. I would also like to thank all my colleagues in UTM & KUITTHO for

their supports and helping me to complete this project successfully. Last but not

least, a special thanks to my family especially my beloved parents, brothers and

sisters who have supported me all through their life and provided constant

encouragement.

v

ABSTRACT

Heart sounds analysis can provide lots of information about heart condition

whether it is normal or abnormal. Heart sounds signals are time-varying signals

where they exhibit some degree of non-stationary. Due to these characteristics,

therefore, two techniques have been proposed to analyze them. The first technique is

the Time-Frequency Distribution using B-Distribution, used to resolve signal’s

components in the time-frequency domain and specifies the frequency components

of the signal that changing over time. Another proposed technique is the Mel-

Frequency Cepstrum Coefficient, used to obtain the cepstrums coefficients by

resolving signal’s components in the frequency domain. An experiment is presented

to extract features of heart sounds using both mentioned techniques and compare

their performances. Both techniques are discussed in details and tested against ideal

simulations of 50 heart sound signals including normal and abnormal signals. All

simulations are done using Matlab software except for MFCC where it has used the

Microsoft Visual C++ software. A brief description of SVD is included to the

technique using time-frequency distribution. Also, a brief description of Neural

Network is used to verify and to compare the performances results of the two

techniques with regard to the values of hidden node, learning rate and momentum

coefficient. The results showed that performance of the TFD can be achieved up to

90% whereas MFCC is only 80%. Therefore, the TFD technique is chosen as the

best technique to analyze and to extract features of the non-stationary signals such as

the heart sounds signals.

vi

ABSTRAK

Analysis degupan jantung dapat memberikan banyak maklumat tentang

keadaan jantung sama ada ia normal atau tidak. Isyarat degupan jantung sentiasa

berubah-ubah, menunjukkan bahawa ia adalah isyarat yang tidak pegun. Disebabkan

oleh ciri-ciri tersebut, maka dua teknik khas telah disarankan untuk menganalisanya.

Teknik yang pertama adalah menggunakan taburan masa-frekuensi (TFD) dengan

jenis taburan-B (B-Distribution) untuk merungkaikan komponen-komponen isyarat

dalam domain masa-frekuensi. Satu lagi teknik yang disarankan adalah mengunakan

pekali Mel-Frekuensi Sepstrum (MFCC) bagi mendapatkan pekali sepstrum dengan

merungkaikan komponen-komponen isyarat dalam domain frekuensi. Satu

eksperimen telah dilakukan bagi mengekstrak ciri-ciri yang ada pada bunyi degupan

jantung menggunakan dua teknik tersebut dan membandingkan tahap pencapaian

yang diperolehi. Kedua-dua teknik telah dibincangkan dengan terperinci dan telah

diuji dengan mensimulasi sebanyak 50 isyarat degupan jantung yang terdiri daripada

isyarat normal dan abnormal. Kesemua teknik simulasi tersebut telah dilakukan

menggunakan perisian Matlab kecuali MFCC menggunakan perisian Microsoft

Visual C++. Terdapat penerangan ringkas tentang penguraian nilai tunggal (SVD)

yang digunakan bersama teknik TFD. Juga disertakan huraian mengenai rangkaian

saraf tiruan (ANN) untuk menentukan pencapaian kedua-dua teknik tersebut

berdasarkan kepada jumlah lapisan tersembunyi, kadar latihan dan kadar momentum.

Keputusan telah menunjukkan bahawa pencapaian teknik TFD telah mencecah 90%

manakala teknik MFCC pula hanya 80%. Jadi, teknik TFD merupakan teknik yang

terbaik untuk menganalisa dan mengekstrak ciri-ciri yang ada pada isyarat yang tidak

pegun seperti isyarat degupan jantung.

vii

TABLE OF CONTENTS

CHAPTER TITLE PAGE

DECLARATION ii

DEDICATION iii

ACKNOWLEDGEMENTS iv

ABSTRACT v

ABSTRAK vi

TABLE OF CONTENTS vii

LIST OF FIGURES xi

LIST OF TABLE xiii

LIST OF ABBREVIATIONS xiv

LIST OF SYMBOLS xvi

1 INTRODUCTION 1

1.1 Project Background 1

1.2 Project Objectives 2

1.3 Scope of Work 2

1.4 Heart Sounds 3

1.5 Time-Frequency Distributions 5

1.6 Mel-Frequency Cepstrum Coefficient 6

1.7 Thesis Outline 8

viii

2 LITERATURE REVIEW 8

2.1 Time-Frequency Representations/Distributions 8

2.2 Cross-terms Elimination (Signal-to-Interference Ratio) 11

2.3 B-Distribution 12

2.4 Singular Value Decomposition 14


2.6 Neural Network 16

3 TIME-FREQUENCY DISTRIBUTION TECHNIQUE 18

3.1 Introduction 18

3.1.1 Time-Frequency Transforms 19

3.2 General Signal Representations 20

3.3 General Form of Time-Frequency Distributions 21

3.3.1 Properties of the Cohen Class of Distributions 22

3.3.2 Reduced Interference Distributions 24

3.3.2.1 The B-Distribution 25

3.3.2.2 The B-Distribution Kernel 26

3.3.2.3 The B-Distribution Properties 27

3.4 Singular Value Decomposition 28

3.4.1 Theories 29

3.4.2 Define Items 30

3.4.3 Formula and Criteria 30

3.4.4 Benefits of using SVD 31

4 MEL-FREQUENCY CEPSTRUM COEFFICIENT

TECHNIQUE 33

4.1 Introduction 33

4.1.1 Discrete Fourier Transform 34

4.1.2 Convolution 36

4.2 Short-Term Analysis 37

4.3 Cepstrum 38


ix

5 NEURAL NETWORK 43

5.1 Introduction 43

5.2 The Multilayer Perceptron 44

5.3 The Logistic Activation (Sigmoid) Function 45

5.4 The Backpropagation (BP) Algorithm 46

5.4.1 Learning Mode 47

5.4.2 The Generalized Delta Rule (G.D.R) 47

5.5 Summary of the BP Algorithm 49

5.6 Parameters for Neural Network 50

6 METHODOLOGY 52

6.1 Introduction 52

6.2 Methodology 53

6.2.1 Time-Frequency Distribution Technique 54

6.2.2 SVD Implementation 57

6.2.3 Mel-Frequency Cepstrum Coefficient 60

6.2.4 Verification Technique 63

7 RESULTS AND DISCUSSION 65

7.1 Introduction 65

7.2 Time-Frequency Distribution 66

7.2.1 B-Distribution of Normal Heart Sounds 66

7.2.2 B-Distribution of Abnormal Heart Sounds 68

7.2.3 SVD of Normal Heart Sounds 70

7.2.4 SVD of Abnormal Hart Sounds 71


7.3.1 MFCC of Normal Heart Sounds 73

7.3.2 MFCC of Abnormal Heart Sounds 75

7.4 Neural Network 77

7.4.1 Verification of TFD 78

7.4.2 Verification of MFCC 80

7.5 Discussion 82

x

8 CONCLUSIONS AND RECOMMENDATION 84

8.1 Conclusions 84

8.2 Recommendation 86

8.2.1 Modified MFCC 87

REFERENCES 89

xi

LIST OF FIGURES

FIGURE NO. TITLE PAGE

1.1 Heart sounds components 3

4.1 Discrete Fourier Transform 35

4.2 Short-term analysis 38

4.3 Example of signal magnitude spectrum 39

4.4 Cepstrum 39

4.5 Computing of Mel Cepstrum 40

4.6 Triangular filter used to compute Mel Cepstrum 41

5.1 A two-layer network 44

5.2 The architecture of a typical MLP network 44

5.3 Sigmoid activation function with different values of c 46

5.4 Configuration for training ANN using BP algorithm 47

5.5 Typical ANN model 48

6.1 Feature extraction techniques and verification 53

6.2 Technique using TFD/SVD 55

6.3 Technique using MFCC 60

6.4 Steps of computing MFCC 61

6.5 Steps of backpropagation algorithm 64

7.1 Normal heart sound; e.g. 1 66

7.2 Normal heart sound; e.g. 2 67

7.3 Abnormal heart sound; e.g. 1 68




xii

7.7 SVD of normal heart sound; e.g.1 70

7.8 SVD of normal heart sound; e.g.2 71

7.9 SVD of abnormal heart sound; e.g.1 71

7.10 SVD of abnormal heart sound; e.g.2 72

7.11 MFCC of normal heart sound; e.g.1 73



7.14 MFCC of abnormal heart sound; e.g.1 75



7.17 Verification of TFD (number of hidden neurons) 78

7.18 Verification of TFD ( learning rate ) 78

7.19 Verification of TFD ( momentum coefficient ) 79

7.20 Verification of MFCC ( number of hidden neurons ) 80

7.21 Verification of MFCC ( learning rate ) 80

7.22 Verification of MFCC ( momentum coefficient ) 81

8.1 Basic structure of modified MFCC 87

xiii

LIST OF TABLE

TABLE NO. TITLE PAGE

7.1 Performance comparison between the TFD and MFCC 82

xiv

LIST OF ABBREVIATIONS

AF - Ambiguity Function

ANN - Artificial Neural Network

BoF - Bank of filters

BP - Backpropagation

BPN - Backpropagation Neural Network

DCT - Discrete Cosine Transform

DFT - Discrete Fourier Transform

DSP - Digital Signal Processing

FFT - Fast Fourier Transform

G.D.R - Generalized Delta Rule

GMM - Gaussian Mixture Model

IDFT - Inverse Discrete Fourier Transform

IF - Instantaneous Frequency

Im - Imaginary

MFCC - Mel Frequency Cepstrum Coefficient

MLP - Multilayer Perceptron

MLSA - Mel Log Spectrum Approximation

PC - Principal Component

PCA - Pricipal component analysis

PCG - Phono-cardiographic

PE - Processing element

PWVD - Pseudo-Wigner-Ville distribution

Re - Real

RMS - Root Mean Square

SBC - Subband Based Cepstral

xv

SCG - Scaled Conjugate Gradient

SNR - Signal-to-noise ratio

STFT - Short-Time Fourier Transform

SVD - Singular Value Decomposition

SWWVD - Smooth windowed Wigner-Ville distribution

TFA - Time Frequency Analysis

TFD - Time-Frequency Distribution

TFR - Time-Frequency Representation

VQ - Vector Quantization

WVD - Wigner-Ville distribution

WWVD - Windowed Wigner-Ville distribution

xvi

LIST OF SYMBOLS

Az (v,τ) - symmetrical ambiguity function

c - firing angle control

Ck - cosine function

Ei - instantaneous energy

Emin - minimum error

f - frequency

fi - instantaneous frequency

f(netj) - sigmoid function

G(n,m) - discrete-time expression of time-lag kernel

G(t,τ) - time-lag kernel

g(v,τ) - Kernel function

h[i] - number of samples of long system’s impulse response flipped

left-or-right

Hj[k] - transfer function of filter j

Hz - hertz

i,D - number of cepstrum coefficient

j,p - number of filter outputs

J(w) - lower threshold on the sum squared error

k - samples

kHz - kilo Hertz

l - latent variables dimension

m - number of rows of matrix X

Mag[k] - magnitudes notation

Mel(f) - Mel frequency

mj - log band-pass filter output amplitudes

xvii

ms - millisecond

n - number of columns of matrix X

N - number of filters

Ns - number of samples

Nw - window size

Oj - output of neuron j

Ok - output of neuron k

P(f ) - magnitude spectrum

Phase[k] - phase notation

P(M) - Mel spectrum

r - rank of matrix X

Rz (t,τ) - instantaneous autocorrelation function

s - second

S, σj - singular values

S(f) - frequency domain representation of a signal

Sk - sine function

s(t) - time domain representation of a signal

T - iteration number

t, τ - time

Tz - total signal duration

U - left singular vectors

V - right singular vectors

v - frequency variable

Wji (t+1) - adaptation weight between input (i) and hidden (j) layers

Wkj (t+1) - adaptation weight between output (k) and hidden (j) layers

w(n) - Hamming window function

X - dataset in matrix mxn

x[i] - input discrete signal

XT - transpose of matrix X

y[i] - output discrete signal

z(n) - discrete-time expression of analytic signal

z(t) - analytic signal (associated with real)

∆w - adaptation weights

θj - bias weights of neuron j

xviii

θk - bias weights of neuron k

ρ(t,f) - Cohen’s class of distributions

φ, fc, α - constant argument

λj - eigenvalues

δj - error signal through layer j

δk - error signal between the output and hidden layers

Γ(.) - gamma function

τg(f) - group delay

θ(f) - instantaneous phase in frequency domain

ϕ - instantaneous phase in time domain

η - learning rate

θ(M) - log mel spectrum

αm - momentum rate

ℜ{β} - real value of β

ℜ{γ} - real value of γ

β - smoothing parameter

CHAPTER 1

INTRODUCTION

1.1 Project Background

This project is focused on the problem of heart sounds analysis using an

integration of signal processing techniques and artificial neural networks. This

includes feature extraction technique, verification technique and estimation of

performance with related parameters. It has proposed two techniques for feature

extraction analysis. The first technique is emphasizing on Time-Frequency

Distributions (TFD). It used to choose a distribution from a group of bilinear time-

frequency distributions that satisfies the TFD properties. In that case, the B-

distribution was chosen because it satisfied the properties of TFD and it performed

well in reducing the cross-terms. Another technique is using Mel Frequency

Cepstrum Coefficient where the outputs are in terms of cepstrum coefficients. For

verification analysis, both of the techniques mentioned above are further simulating

using neural network and after that the performances were compared between the

both proposed techniques.

2

1.2 Project Objectives

The main objective of this work is to choose the best technique to extract

features of heart sounds signals. This can be achieved by comparing two proposed

techniques; Time-Frequency Distribution (B-distribution) and Mel Frequency

Cepstrum Coefficient. The best technique will be chosen according to the

performance accuracy.

1.3 Scope of Work

Different heart sounds were produced when the cardiac system is not in a

proper manner of working, which will produce the heart irregularities or heart

diseases. A good technique needs to be used to extract the features of heart sounds in

order to detect the diseases. Different features will represent different heart diseases.

This project has proposed two techniques that can be used for feature

extraction of heart sound signals. Both of them are outperformed their own classes

compared to others. The first technique is using Time-Frequency Distribution with

Singular Value Decomposition. The second technique is focusing on the Mel

Frequency Cepstrum Coefficient. The data used to implement both techniques are

taken from Centre of Biomedical in UTM, Skudai. They are actually the heart sound

signals including normal and abnormal signals. The normal heart sounds are taken

from healthy persons and the abnormal heart sounds are taken from patients that are

suffering from various kinds of diseases.

For the first technique, the heart sound signals are transformed into time-

frequency domain using bilinear time-frequency distribution. The transformation is

3

done using B-distribution with some parameters setting and the outputs from that

particular distribtuion are then dimensionality reduced using Singular Value

Decomposition. The results after that simulated further using neural network for

verification and performance analysis. All simulations are done using Matlab. The

second technique is different from the first one because the analysis is done based on

frequency analysis using Mel Frequency Cepstrum Coefficient. The heart sound

signals are extracted using MFCC with mel-frequency scaled. The simulation is

done using Microsoft Visual C++. The outputs of MFCC are actually the cepstrum

coefficients that were going to be simulated further using neural network for

performance analysis. Lastly, the performances accuracies from both techniques are

then compared to each other.

1.4 Heart Sounds

Figure1.1 Heart sound components

The heart sounds are generated by mechanical vibration of heart and

cardiovascular where they provide abundant information about them while the

measurement is noninvasive and low cost. Heart sounds and murmurs are the

important parameter used in diagnosing the heart condition and it can be captured by

using phonocardiogram or heart auscultation. Classically the sounds made by a

healthy heart are conceived as being a nearly periodic signal consisting of four

components. These four parts are referred to as the first, second, third and fourth

4

heart sounds. The first two heart sounds give rise to the familiar ‘lub-dup’ beating

sound of the heart and tend to dominate the Phono-CardioGraphic (PCG) signals.

The first heart sound is caused by the closure of the mitral and tricuspid valves. The

second heart sound is due to the closure of the aortic and pulmonary valves. The

four components of heart sounds are stated below:

1. First Heart Sound;

First heart sound is the effect of closing the tricuspid and mitral valve at the

beginning of ventricle systolic. There are four component of the first heart

sounds [22]:

(i) The first component is the effect of ventricular contraction and blood

movement towards atrio-ventricular valve. This occur at beginning of

ventricle systolic.

(ii) The second component is the effect of atrioventricle clossure.

(iii) The third component is reflected the opening of semilunar valve and

the beginning of blood ejection.

(iv) The fourth component represent the maximum blood ejection from

ventricle to aorta.

2. Second Heart Sound;

Second heart sound represents the vibrations as a result from closure of

semilunar valve at the end of ventricle systolic. Since there are two

component of semilunar valve, the second heart sound is a combination of

two components. The aortic valve closed earlier than the closing of

pulmonary valve.

3. Third Heart Sound;

The third heart sound result from vibration setting by early filling of ventricle

during ventricle diastole.

4. Fourth Heart Sound;

The fourth heart sound is caused by rapid filling of ventricle with blood

during atrium systole. It also marked the end of ventricle diastole.

5

Each beat is separated by an interval of the order of 1s, with each heart sound

having duration of roughly 50ms. The interval between beats varies even in a patient

at rest because of respiration. Similarly the exact nature of each beat varies from

beat to beat. The result is a signal which is non-periodic, even though it has a

repetitive character.

The heart murmurs occur as the additional components in the PCG signal,

most often arising in the interval between the first and second heart sound. Heart

murmurs are the result of turbulent blood flow, which produces a series of many

vibrations. The murmur signal is often of much smaller amplitude than either of the

heart sounds. Many murmurs are described as “whooshing” sounds and are believed

to be derived from flow noise. The heart murmurs will produce the abnormal heart

sounds. There are four main factor of producing murmurs [17]:

(i) High rates of flow through normal and abnormal valves

(ii) Forward flow through a constricted or irregular valve or into dilated vessels.

(iii) Backward flow through an incompetent valve, septal defect, or patient ductus

arteriosus.

(iv) Decreased viscosity, which causes increased turbulent and contributes to the

production and intensity of murmurs.

1.5 Time-Frequency Distributions

Many signals encountered in real-world situations are exhibit some degree of

non-stationarity where the frequency content changes over time. One of the most

common applications is heart sound signals processing. Classical signal analysis

tools, however, do not take this into account, assuming that the signal characteristics

are stationary. A solution to the problem of representing non-stationary signals is

found in their joint time and frequency representations which characterized the exact

6

behavior of the time-varying frequency content of the signal. Time-frequency

analysis methods are capable of detecting heart murmurs and vital information to the

classification of heart sounds and murmurs. Therefore, Time-Frequency Analysis is

used to represent the heart sounds in time-frequency domain by mapping the one-

dimensional time-domain signal into a two-dimensional function of time and

frequency.

The introduction of time-frequency analysis (TFA) has led to define new

tools to represent and characterize the time-varying contents of non-stationary

signals using time-frequency distributions (TFDs) [2, 7, 15], also for removing noise

and interference from a signal. Among the most studied time-frequency distributions

are the quadratic distributions. In this paper, a member of the quadratic class of

TFDs is proposed, referred to as the B-distribution, which can resolve close signals

in the time-frequency domain that other members fail to do so. In addition to that,

the B-distribution is shown to outperform existing reduced interference distributions

in suppressing the cross-terms of a multicomponent signal, while keeping a high

time-frequency resolution. The performance of this technique is depending on the

value of smoothing parameter applied to the signal analysis. This condition is

evaluated using the simulation on the heart sound signals using Matlab.

1.6 Mel-Frequency Cepstrum Coefficient

A representation of heart sounds using Mel-Frequency Cepstrum Coefficient

(MFCC) would be provided by a set of cepstrum coefficients. These coefficients are

the results of a cosine transform of the real logarithm of the short-term energy

spectrum expressed on a mel-frequency scale [32]. The MFCC are also an efficient

method to extract any kind of features [8]. The number of resulting mel-frequency

cepstrum coefficients is practically chosen relatively low, in the order of 12 to 20

coefficients. However, in many cases of MFCC analysis, the 0th coefficient of the

7

MFCC cepstrum is ignored because of its unreliability [24]. In fact, the 0th

coefficient can be regarded as a collection of average energies of each frequency

bands in the signal that is being analyzed. The energy of heart sound signal is also a

very important feature for pattern recognition. Many experiments have shown that

the performance can be improved when the energy information is added as another

model feature in addition to cepstrums.

Mel Frequency Cepstrum Coefficients (MFCC) is also used as a method that

analyzes how the Fourier transform extracts frequency components of a signal in the

time-domain. In addition, it is a representation defined as the real cepstrum of a

windowed short-time signal derived from the Discrete Fourier Transform (DFT) of

that signal. The difference from the real cepstrum is that a non-linear frequency, a

mel-scale is used. The mapping from linear frequency to mel frequency is done

using an equation as follows:

Mel(f) = 2595 log10 (1 + f/700) (1.1)

Basically, the analysis of the signal is done using Frequency Domain Analysis where

it converts a temporal signal to a frequency domain representation. The keywords

involve in this analysis as below:

• Cepstrum: a homomorphic signal processing technique that converts the

signal into a domain in which short-term and long-term variations in the

signal can be separated.

• FourierTransform: implements a variety of techniques for performing Fourier

Transforms, including the most effective fast transforms

• Spectrum: an umbrella class that encapsulates most of the frequency domain

techniques, and provides a uniform interface. This capability is used

extensively in many of our front end implementations.

8

1.7 Thesis Outline

This report has been organized into eight chapters. Chapter 1 outlines the

entire project giving a brief introduction to the time-frequency distribution technique

and Mel Frequency Cepstrum Coefficient technique. Chapter 2 provides the

literature reviews where the common references with some information that related

to the project are collected. Chapter 3 describes the time-frequency technique used

in this project by specifically elaborate the B-Distribution and its kernel. In addition,

a brief description about SVD is also included in this chapter. Chapter 4 is an

explanation about the MFCC principle and the steps involve in getting the MFCC.

Chapter 5 is having a detail explanation about neural network. The important

parameters involve in this chapter is explained further in order to get some ideas of

verification technique used in this project. Chapter 7 presents and explains the

results of signal processing experiments conducted on heart sound data including

normal and abnormal based on time-frequency distribution technique and MFCC

technique. The verification results are also attached to this chapter for performances

comparison. Chapter 8 is the last chapter of this thesis where it concludes this

project and provides suggestions for future recommendations and improvement.

CHAPTER 2

LITERATURE REVIEW

2.1 Time-Frequency Representations/Distributions

Hlawatsch and Boudreaux-Bartels (1992) have studied that Time-frequency

representations (TFRs) are powerful tools for the analysis and processing of “non-

stationary” signals for which separate time-domain and frequency-domain analyses

are not adequate. They have combined time-domain and frequency-domain analyses

to yield a potentially more revealing picture of temporal localization of a signal’s

spectral components. The TFRs are including both linear and quadratic

representations. The found that numerous TFRs which have been proposed may be

interpreted as smoothed versions of the WD, with the type of smoothing determining

the amount of attenuation of interference terms, loss of time-frequency concentration

and mathematical properties. Hence, the choice of the “best” TFR depends on the

nature of the signals to be analyzed. Once a specific TFR has been selected, the user

often has to select certain TFR parameters. Finally, the analysis result will also

depend upon the graphical representation of the TFR surface (e.g. 3D plots versus

contour-line plots).

10

White, Collis and Salmon (1996) have studied the use of time-frequency

methods in the detection and analysis of heart murmurs in Phono-CardioGraphic

(PCG) signals. These heart sounds can yield important diagnostic information. An

abnormality between the heart sounds is generically termed a murmur. Heart sounds

are clearly non-stationary signals and hence the natural analysis methods are those of

time-frequency and time-scale. In this application there is no evidence to suggest

that the analysis technique would benefit from a multi-resolution type analysis, so

they concentrated on time-frequency, rather than time-scale, methods. The method

exploits averaged versions of the Pseudo-Wigner-Ville distribution (PWVD). The

algorithms were shown to detect two types of heart murmurs and to be able to

distinguish between them. The general processing strategy they adopted consists of

initially segmenting the signal into individual beats. This segmentation process can

be performed in a variety of ways; the success of each approach depends critically on

the signal to noise ratio (SNR) of the recording under consideration. They have

presented the results illustrating that time-frequency methods are capable of

detecting heart murmurs and also of yielding information vital to the classification of

such murmurs. The methods used involved averaging of time-frequency plots, which

has intuitive value and can be interpreted in a theoretical fashion exploiting the

concepts of cyclo-stationarity.

Haghighi-Mood, and Torry (1997) have addressed the characteristics of heart

murmurs from signal theory point of view and suggests an appropriate signal

analysis method which is capable of describing the dynamics of heart murmurs. The

result of a pilot study using the proposed method indicates a distinctive pattern for

time-frequency distribution of heart murmurs which is expected to provide

information of diagnostic importance. The choice of time-frequency method is

mainly dictated by the time-bandwidth product of the signal under investigation.

While traditional Time-Frequency methods such as Short Time Fourier Transform

(STFT) and bank of filters (BoF) are known to be robust for signals of large time-

bandwidth product, their performance degrades when used with signals of fast-

varying spectra and short duration. An alternative approach to study such signals is

joint Time-Frequency Distribution (TFDs) methods which, unlike traditional

methods, make no assumption of stationarity at any time interval. For their study, a

11

number of TFD methods belonging to the Cohen class of distributions were applied

to heart murmurs. They found that the Choi-Williams distribution (CWD) with an

exponential kernel also can provide the best compromise between spectro-temporal

resolution and cross-term suppression and therefore the detection of time-frequency

dynamics of heart murmurs. Considering the generation mechanism and the

hydrodynamic models proposed for various types of murmurs, it is logical to predict

that the time-frequency pattern of murmurs contains valuable information which may

lead to non-invasive diagnosis of certain cardiac diseases such as valvular stenosis

and ventricular or atrial septal defect.

Cohen (1989) has presented a review and tutorial of the fundamental ideas

and methods of joint time-frequency distributions. He has introduced the types of

distributions and the method to obtain them. The objective of the field is to describe

how the spectral content of a signal is changing in time, and to develop the physical

and mathematical ideas needed to understand what a time-varying spectrum is. The

basic goal is to devise a distribution that represents the energy or intensity of a signal

simultaneously in time and frequency. The basic idea is to devise a joint function of

time and frequency, a distribution that will describe the energy density or intensity of

a signal simultaneously in time and frequency. This review is presented to be

understandable to the non-specialist with emphasis on the diversity of concepts and

motivations that have gone into the formation of the field.

2.2 Cross-terms Elimination (Signal-to-Interference Ratio)

Daliman and Sha’ameri (2003) have done the analysis of the heart sound and

murmurs using time frequency distribution method. The reason why they were using

time-frequency distribution because the heart sound signals are actually non-

stationary signals and time-varying signals that would be best analyzed in time-

frequency domain. The Windowed Wigner-Ville distribution (WWVD) and smooth

12

windowed Wigner-Ville distribution (SWWVD) have been used to obtain the time-

frequency representation of the signal. Determination of parameter setting of

WWVD and SWWVD will eliminate the cross-terms and improve time-frequency

representation. The accuracy of time-frequency representation is determined based

on the mainlobe width and signal-to-interference ratio. By comparing the two types

of the distributions, they found that the most accurate time-frequency representation

can be achieved using the SWWVD.

Sha’ameri and Salleh (2000) have performed an analysis of heart sounds and

murmur using time-frequency signal analysis. They used the technique using

Wigner-Ville distribution (WVD) and windowed Wigner-Ville distribution

(WWVD) that belonged to the bilinear class of time-frequency distribution. They

developed these techniques to provide high-resolution time-frequency representation

for time-varying signals. Due to the nonlinear operation involved, interference terms

were introduced in the time-frequency representation. The signals of' interest are

modeled as multicomponent signals and the characteristics of the signal in time-lag

plane are observed. From the time-lag plane, the interference components are

identified, and the appropriate window width is selected in the WWVD to remove

the interference. Results analyses showed that WWVD produced more accurate

time-frequency representation compared to the WVD and the signal-to-interference

is used to quantify the improvement. This happened because the WWVD has used

the window function to control the amount of interference present in the time-

frequency representation by removing the cross bilinear product terms.

2.3 B-Distribution

Sucic, Barkat and Boashash (1999) proved that B-distribution can resolve

close signals in the time-frequency domain that other members fail to do so. They

showed that the B-distribution is real, time and frequency shift invariant and its first

13

moment with respect to frequency yields the instantaneous frequency of the signal.

Using synthetic and real-life multicomponent signals, it has been shown that the B-

distribution achieves a better time-frequency resolution and energy concentration

around the instantaneous frequency of a signal, while still significantly suppressing

the cross-terms, than other commonly-chosen distributions for multicomponent

signals analysis. They have reviewed the fundamental concepts of the cross-terms

elimination using the ambiguity domain filtering, based on the B-distribution kernel.

The kernel has been defined in both the time-lag and the Doppler-lag domain and

they have proved that B-Distribution has satisfied most of the desirable properties

sought for a time-frequency distribution.

Boashash and Sucic (2000) have presented two novel results which are

significant for the application of time-frequency signal analysis techniques to real life

signals. First, they introduced a measure for comparing the resolution performance

of TFDs in separating closely spaced components in the time-frequency domain.

The measure takes into account key attributes of TFDs such as main-lobes, side-

lobes, and cross-terms. The introduction of this measure is an improvement of some

techniques that rely on visual inspection of plots. The performance comparison of

quadratic TFDs using the proposed resolution measure shows that the B-distribution

outperforms existing quadratic TFDs in resolving closely spaced components in the

time-frequency domain. The second part consists in proposing a methodology for

designing high resolution quadratic TFDs for the time-frequency analysis of

multicomponent signals when components are close to each other. By removing

limitations in the way desirable properties of quadratic TFDs were previously

chosen, a new set of design criteria has been defined. The combination of these two

results is an important break-through for the field of time-frequency signal analysis.

They have concluded that the B-Distribution outperforms other existing distributions

in terms of time-frequency resolution, as well as cross-terms suppression, when used

to represent signals with closely-spaced components in the time-frequency domain.

14

2.4 Singular Value Decomposition

Wall, Rechtsteiner and Rocha (2003) have described SVD methods for

visualization of gene expression data, representation of the data using a smaller

number of variables, and detection of patterns in noisy gene expression data. In

addition, they also described the precise relation between SVD analysis and Principal

Component Analysis (PCA). Their aimed is actually to provide definitions,

interpretations, examples, and references that will serve as resources for

understanding and extending the application of SVD and PCA to gene expression

analysis. An important capability distinguishing SVD and related methods from

other analysis methods is the ability to detect weak signals in the data. Even when

the structure of the data does not allow separation of data points, causing clustering

algorithms to fail, it may be possible to detect biologically meaningful patterns.

SVD allows obtaining the true dimensionality of the data, which is the rank r of

matrix X and a representation is using a reduced number of variables. This property

of the SVD is commonly referred to as dimensionality reduction.


Garcia and Garcia (2003) have done the experiments to present the

development of an automatic recognition system of infant cry, with the objective to

classify two types of cry: normal and pathological cry from dear babies. In this

study, they used acoustic characteristics obtained by the Mel-Frequency Cepstrum

technique and a feed-forward neural network as a classifier that was trained with

several learning methods, resulting better the Scaled Conjugate Gradient (SCG)

algorithm. The SCG method avoids a time consuming line-search per learning

iteration, which makes the algorithm faster than other second order Conjugate

Gradient algorithms. This work has shown good results, with the MFCC technique,

using neural network architecture.

15

Bahoura and Pelletier, (2004) have proposed a new tool based on the cepstral

analysis as feature extractor and the Gaussian Mixture Model for the classification

process. The Cepstral analysis is proposed with Gaussian Mixture Models (GMM)

method to classify respiratory sounds in two categories: normal and wheezing. The

sound signal is divided in overlapped segments, which are characterized by a reduced

dimension feature vectors using Mel-Frequency Cepstral Coefficients (MFCC) or

Subband based Cepstral parameters (SBC). The proposed schema is compared with

other classifiers: Vector Quantization (VQ) and Multi-Layer Perceptron (MLP)

neural networks. In order to improve the classification results, they also proposed the

postprocessing technique. There are also some explanation about two feature

extractors based on cepstral analysis (MFCC and SBC) are briefly presented in this

work. The best result is obtained by the MFCCGMM combination and note that the

postprocessing improves the classification result for all combinations.

Imai (1983) has presented a new technique of cepstral analysis synthesis on

the mel frequency scale. The log spectrum on the mel frequency scale (the mel log

spectrum) is considered to be an effective representation of the spectral envelope of

speech. This analysis synthesis system uses the mel log spectrum approximation

(MLSA) filter which was devised for the cepstral synthesis on the mel frequency

scale. The filter coefficients are easily obtained through a simple linear transform

from the mel cepstrum that defined as the Fourier cosine coefficients of the mel log

spectral envelope of speech. The MLSA filter has low coefficient sensitivity and

good coefficient quantization characteristics. The spectral distortion caused by

interpolation of the filter parameters of two successive frames is small. Accordingly,

the data rate of this system is very low. The log spectrum on a mel frequency scale

(the mel log spectrum) is considered to be a more effective representation of the

spectral envelope of speech than that on the linear frequency scale. The mel

cepstrum which is defined as the Fourier transform of a spectral envelope of the mel

log spectrum has a comparatively low order, hence it is an efficient parameter. The

Mel cepstrum also has the same good features as those of the conventional cepstrum.

By using the MLSA filter, a low bit rate mel cepstral analysis synthesis system was

16

obtained. In this system, the spectral envelope information is transmitted by the filter

parameter of the MLSA filter. The filter parameter is obtained by a simple linear

transform from the mel cepstrum which is defined as the Fourier cosine coefficients

of the mel log spectral envelope. The filter parameter has almost the same good

statistical properties as those of the mel cepstrum. Since the filter parameter

sensitivity of the mel log spectrum is very small, the filter parameter can be roughly

quantized. The system has fairly small spectral distortions.

2.6 Neural Network

Upadhyaya and Yan (1993) have explained about artificial neural networks

that are developed to simulate the most elementary functions of neurons in the

human brain, based on the present understanding of biological nervous systems.

These network models attempt to achieve good human-like performance such as:

learning from experiments and generalization from previous samples. A processing

element (PE) is analogous to a neuron in that it has many inputs from input signals or

from other PEs and combines (sum up) the values of the inputs, adjusted by their

weights. This sum is then subjected to a nonlinear transformation, often called a

transfer function that controls the output in accordance with the prescribed nonlinear

relationship. Back-propagation neural networks (BPN) were used to develop the

neural network models for artifact classification and defect parameters estimation.

There are several issues that need to be considered when utilizing the back-

propagation algorithm to train a neural network such as the selection of hidden layers

and nodes, and the learning options. The selection of number of hidden layers and

hidden nodes is one of the most important issues in back-propagation network

applications. The selection of hidden nodes for a fully-connected, feedforward

networks with one hidden layer is based on the two types of Rule-of-thumb.The

important algorithms for backpropagation network training are actually the learning

coefficient, momentum term that used to smooth the learning, the nonlinear transfer

function, the learning rule that specifies how connection weights are changed during

17

the learning process and Gaussian noise and RMS threshold value that adds a random

number within a special range to each node summation value in the layer.

CHAPTER 3

TIME-FREQUENCY DISTRIBUTION TECHNIQUE

3.1 Introduction

Time-frequency representations are used to analyze or characterize signals

whose energy distribution varies in time and frequency. They map the one-

dimensional time-domain signal into a two-dimensional function of time and

frequency. A time-frequency representation describes the variation of spectral

energy over time.

Time-frequency analysis provides time-localized spectral information for a

non-stationary signal as a distribution function in terms of time and frequency. Non-

stationary signal in this paper implies a signal which has time-varying frequency

components, not a statistically stationary signal. Due to the uncertainty principle,

which restricts the resolution in time and frequency, the time-frequency distribution

functions have been developed for the purpose of analysis. The basic idea is to

devise a joint function of time and frequency, a distribution, that will describe the

energy density or intensity of a signal simultaneously in time and frequency. In

addition, the method of relating a joint time-frequency distribution to a signal will be

a powerful tool for the construction of signals with desirable properties.

19

3.1.1 Time-Frequency Transforms

The Fourier Transform has become one of the most widely used signal-

analysis tools across many disciplines of science and engineering. The basic idea of

the Fourier Transform is that any arbitrary signal (of time, for instance) can always

be decomposed into a set of sinusoids of different frequencies. The Fourier

transform is generated by the process of projecting the signal onto a set of basis

functions, each of which is a sinusoid with a unique frequency. The resulting

projection values form the Fourier transform (or the frequency spectrum) of the

original signal. Its value at a particular frequency is a measure of the similarity of

the signal to the sinusoidal basis at that frequency. Therefore, the frequency

attributes of the signal can be revealed via the Fourier transform. In many

engineering applications, this has proven to be extremely useful in the

characterization, interpretation, and identification of signals.

While the Fourier transform is a very useful concept for stationary signals,

many signals encountered in real-world situations have frequency contents that

change over time. In this case, it is not always best to use simple sinusoids as basis

functions and characterize a signal by its frequency spectrum. Joint time-frequency

transforms were developed for the purpose of characterizing the time-varying

frequency content of a signal. The best-known time-frequency representation of a

time signal is known as the Short-Time Fourier Transforms (STFT). It is basically a

moving window Fourier transforms. By examining the frequency content of the

signal as the time window is moved, a two-dimensional time-frequency distribution

called the spectrogram is generated. The spectrogram contains information on the

frequency content of the signal at different time instances. One well-known

drawback of the STFT is the resolution limit imposed by the window function. A

shorter time window results in better time resolution, but leads to worse frequency

resolution, and vice versa. To overcome the resolution limit of the STFT, a wealth of

alternative time-frequency representations have been proposed.

20

There are various time-frequency transforms developed by researchers in the

signal processing community. They are broadly divided into two classes: linear

time-frequency transforms and quadratic (or bilinear) transforms. In this project, the

discussion is only concentrated on the quadratic time-frequency transform, that is B-

Distribution.

3.2 General Signal Representations

Heart sounds are actually generated by mechanical vibration of heart and

cardiovascular system. Commonly, vibration signals are represented in either the

time domain or the frequency domain. In general, the time domain representation of

a signal,

( ) ( ) ( ) ( )( )

+ ∫

==

t

i dfjtj etaetats 0

0 2 ττπφϕ (3.1)

allows a simple characterization of a signal in terms of its (instantaneous) energy,

( ) ( ) ( ) ( ) ( )tstststatEi *|| 22=== (3.2)

and instantaneous frequency,

( )( )dt

tdfi

ϕ

π2

1= (3.3)

that is, at time t the signal has an energy density of Ei(t) at a frequency of fi (t). For

multicomponent signals (i.e., signals which have more than one frequency at a given

instance of time) the ‘instantaneous frequency’ is the average frequency of the signal

at that time. The frequency domain representation of a signal,

21

( ) ( ) ( ) ( ) dtetsefAfS ftjfj πθ 2−∫== (3.4)

gives a perfect representation of a signal which consist of multiple harmonic

oscillators (i.e., with no amplitude or frequency modulation). However, it is not an

adequate representation of non-stationary signals (i.e., signals whose content change

with time).

A multicomponent non-stationary signal can be described as the

superposition of a number of monocomponent non-stationary signals, giving

( ) ( ) ( ) ( ) ( )( )

+ ∫

=== ∑∑∑

t

cc

c

dfj

c

ctj

c

c

c

c etaetatsts 0

2 ττπφϕ

(3.5)

In order to decompose (and understand) a signal of the form given, a joint time-

frequency domain representation of the signal is required.

3.3 General Form of Time-Frequency Distributions

In 1966, Cohen developed a generalized form of ‘phase-space’ distributions

from which all other time-frequency energy distributions could be derived. The

general form, which has since become known as Cohen’s class of distributions is

( ) ( ) ( ) τττ

τρ τπdvdudeuzuzvgft

fvtvuj −−

−

+= ∫∫∫

2

2*

2,, (3.6)

where g(ν, τ) is referred to as the kernel function that is used to define the properties

of the distribution and z(t) is the analytic signal associated with the real one. For

example, by setting the kernel g(ν, τ) = 1, equation (3.6) becomes

22

( ) ( ) τττ

ρ τπdvdudeuzuzft

fvtvuj −−

−

+= ∫∫∫

2

2*

2,

τττ τπ

detztzfj2

2*

2

−

−

+= ∫ (3.7)

which is the Wigner-Ville distribution and setting g(ν, τ) = |τ|βcosh

-2β(t) will give the

B-Distribution as shown below:

( )( )

( ) ττττ

ρ τπβ

β

dvdudeuzuzt

ftfvtvuj −−

−

+= ∫∫∫

2

2 2*

2cosh,

( )

ττττ τπ

β

β

dudeuzuzut

fj2

2 2*

2cosh

−

−

+

−= ∫∫ (3.8)

Rz (t,τ) is used to be the instantaneous autocorrelation function of z(t) and it is defined

as:

( )

−

+=

2*

2,

τττ tztztRz (3.9)

The symmetrical (“Sussman’s”) ambiguity function (AF), on the other hand, is

defined as the Fourier transform of Rz (t,τ) with respect to time t:

( ) dtetztzvAvtj

zπττ

τ 2

2*

2, −

−

+= ∫ (3.10)

3.3.1 Properties of the Cohen Class of Distributions

The number of distributions which can be generated from equation (3.6) is

infinite. Cohen proposed a number of restrictions on the properties of the

23

distributions to be studied and showed that certain constraints could be placed on the

kernel function, g(ν,τ) in equation (3.6) to ensure that a distribution meets the

restrictions. The constraints on the kernel function required to give certain desirable

properties have been studied and the range of desirable properties and the kernels

required to meet then is being continuously expended. An extensive range of

properties is given by [5] and [7]. Some of these are:

a) The signal energy is preserved

The signal energy is preserved if g(0,0)=1

b) The marginal condition in time

for integration over frequency=energy density in time, g(ν,0)=1

c) The marginal condition in frequency

for integration over time=energy density in frequency, g(0,τ)=1

d) Real valued distributions

the distribution will be real valued if g(ν,τ)=g* (ν,τ)

e) Invariance to time and frequency shifts

if two signals are identical except for a shift in time or frequency then the

distributions of the signals should also be identical except for a similar shift

in time or frequency. Cohen [7] showed this is true as long as the kernel

function is independent of time and frequency.

f) The first moment of the distribution equal the instantaneous frequency and

group delay

Boashash [5] showed that the first moment of a distribution in frequency is

equal to the instantaneous frequency and the first moment in time equals the

group-delay,

( ) ( )( )df

fdfg

θ

πτ

2

1−= (3.11)

24

( )

( )( )tf

dfft

dfftf

i=

∫∫

,

,

ρ

ρ and

( )

( )( )f

dtft

dtftt

gτρ

ρ=

∫∫

,

, (3.12)

if

( ) ( ),0|

,|

,00 =

∂

∂=

∂

∂== vt

v

tvg

t

tvg (3.13)

g(ν,0) = constant for all ν and g(0,τ) = constant for all τ

g) Recovery of signal

Cohen [7] showed that a signal could be recovered up to a constant phase

factor if kernel function g(ν, τ) is well defined at every point or has isolated

zeros but not regions where it is zero.

h) Finite time support

a distribution has finite time support if it is zero before the signal starts and

zero after the signal ends. For a signal to have finite time support the kernel

must met the condition:

( ) 0, 2 =−∫ dvevg vtj πτ for |τ|<2|t| (3.14)

3.3.2 Reduced Interference Distributions

Some of time-frequency distributions posses a number of mathematically

satisfying properties such as the Wigner-Ville distribution, however, it actually

produces large cross-terms when applied to multicomponent signals. This can make

interpretation difficult. Therefore, other type of distribution is used to overcome this

25

problem such as B-distribution that satisfying the properties as well as reducing

cross-terms.

3.3.2.1 The B-Distribution

The B-distribution outperforms the other distributions by achieving a better

time-frequency resolution than the spectrogram does, and by significantly

suppressing the interference that affects the Wigner-Ville distribution. Using real-

life multicomponent signals, it has been shown that the B-distribution achieves a

better time-frequency resolution and energy concentration around the instantaneous

frequency of a signal, while still significantly suppressing the cross-terms, than other

commonly-chosen distributions for multicomponent signals analysis.

The kernel function of the B-Distribution is chosen in the ambiguity domain

as a 2-dimensional function centered around the origin with sharp cut-off edges. In

this way, the kernel would allow to retain as many auto-terms energy as possible

while filtering out as much cross-terms “energy” as possible. The numbers of auto

terms and cross terms kept and filtered out are functions of the volume underneath

the two-dimensional function g(v,τ). This volume can be changed by varying a

single smoothing parameter, β. β is a real parameter that controls the sharpness of

cutoff of the two-dimensional filter in the ambiguity domain. The choice of its value

is application dependent; however, its range should be between zero and unity (0 < β

≤ 1). In general, it is found that small value of β gives the best results in terms of

cross term suppression and time-frequency resolution, but this value is given only as

an indication and is not optimal for all situations.

26

3.3.2.2 The B-Distribution Kernel

The property of the ambiguity function (AF) to have the auto-terms around

the origin in the ambiguity plane, and the cross-terms away from it, has inspired

searches for a kernel g(v, τ) such that the generalized ambiguity function, g(v, τ) Az(v,

τ), is enhanced in the vicinity of the origin and suppressed everywhere else. Barkat

and Boashash [4] recently proposed a kernel for a new quadratic TFD, known as the

B-distribution. They applied the technique of cross-terms filtering in the ambiguity

domain to the design of a low-pass filter-type kernel with a sharp cut-off, essential to

ensure the clear separation of the auto-terms from the cross-terms. The B-

distribution has its time-lag kernel defined in the following manner [4]:

( )( )

βτ

τ

=

ttG

2cosh, (3.15)

Application dependent parameter β, ( 0 ≤ β ≤1 ), controls the sharpness of the cut-off

of the filter and hence the trade-off between the cross-terms elimination and the time-

frequency resolution. The Doppler-lag kernel of B-distribution is defined as the

Fourier transform of the time-lag kernel with respect to time t:

( )( )

dtet

tvgvtj π

βτ 2

2cosh, −

∫

=

( )

dtt

evtj

∫−

=β

βτ

2

2

cosh (3.16)

Since the kernel is an even and a real-valued function, equation (3.16) can be further

simplified to:

( ) ( )( )

dtt

vttvg ∫

∞

=0

2cosh

2cos2,

β

β πτ (3.17)

Using [12] (3.985.1), the following formula is a possible solution to the integral

which must be calculated in (3.17):

27

( )( ) ( )

−Γ

+Γ

Γ=

−∞

∫ β

αγ

β

αγ

γββ

α γ

γ 2222

2

cosh

cos 2

0

jjdt

t

t (3.18)

for ℜ{β}>0, ℜ{γ}>0 and α>0.

Hence, the equation for the B-distribution kernel in the Doppler-lag domain may be

written in the following form:

( )( )

( ) ( )vjvjvg πβπββ

ττβ

β−Γ+Γ

Γ=

−

2

2,

12

(3.19)

3.3.2.3 The B-Distribution Properties

Ideally, as mentioned before, there are some desirable properties that a TFD

should satisfy [2, 14]. B-distribution has satisfied the following properties:

i) Realness

To show this property we must have [2]:

( ) ( )ττ −−= ,*, vgvg

In this case:

( )( )

( ) ( ) *}2

2{,*

12

vjvjvg πβπββ

ττβ

β+Γ−Γ

Γ−=−−

−

( )( )

( ) ( )vjvjvg πβπββ

ττβ

β−Γ+Γ

Γ=−−

−

2

2,*

12

(3.20)

Hence, we have ( ) ( )ττ −−= ,*, vgvg

ii) Time Shift Invariance

To prove that a distribution is time shift invariant, it is sufficient to note that

g(v,τ) does not depend on t [2]. From equation (3.19), the B-distribution

Doppler-lag kernel is not a function of time t.

28

iii) Frequency Shift Invariance

To prove this property, it is sufficient to note that g(v,τ) does not depend on f

[2]. Similarly, from equation (3.19), we see that the B-distribution Doppler-

lag kernel does not depend on frequency f.

iv) The Instantaneous Frequency

The first moment of the B-distribution yields the IF of the signal. That is,

( )

( )dfft

dfftff

z

z

i,

,

ρ

ρ∫=

This property is satisfied since [14]:

( )( )

( )( ) 0|

,|

,0,00,0 =

∂

∂=

∂

∂

τ

ττ vg

v

vg

and

g(v,τ) = constant ∀τ

3.4 Singular Value Decomposition

Depending on the nature of a given classification problem (the raw data, the

chosen features and the chosen classifier) dimension reduction may be fundamental

to successful classification performance. Dimensionality reduction is almost always

essential when time-frequency representations (TFRs) are used as a feature basis; the

descriptive ability of most TFRs comes at the expense of high dimensionality.

Dimensionality reduction is an essential step involved in many research areas

such as signal processing. It is difficult for researchers to manipulate and process

29

data with very high dimensions, and it is also difficult to visualize the data with more

than three dimensions. To overcome these shortcomings, many dimensionality

reduction methods have been explored, for instances, Principal Component Analysis

(PCA) and Singular Value Decomposition (SVD). To make these methods more

robust and general, the Kernel functions are adopted.

Singular Value Decomposition is an excellent unsupervised learning tool to

process source data in order to find clusters, reduce dimensionality, and find latent

variables. This means that field or experiment data can be converted into a form

which is free of noise or redundancy and that it can be better organized to reveal

hidden information.

3.4.1 Theories

Singular Value Decomposition (SVD) is the mathematical algorithm to

identify the pattern relationship in data. In addition, it is widely adopted to compress

data by reducing the number of dimensions without much loss of information.

The main procedure of SVD is to work out the eigenvalues and eigenvectors

of the covariance matrix of the entire dataset. The singular values in SVD provide

the variabilities of its respective eigenvectors. The larger the singular value is the

higher variability the respective eigenvector contains. Usually the first few

eigenvectors with higher variabilities are selected as significant Principal

Components (PCs) to compress the data into less number of dimensions [31]. Due to

the discarding of the rest feature elements, some information will lose when

recovering the original data.

30

3.4.2 Define Items

Let X represent the dataset, i.e. a m x n matrix. Let r indicates the rank of

matrix X. The equation for singular value decomposition of X is the following:

X = USVT (3.21)

where, U contains the left singular vectors. Each column vector uk in U is a left

singular vector and orthonormal to each other. V contains the right singular vectors.

Each column vector vk in V is a right singular vector and orthonormal to each other.

S is a diagonal matrix, and the non-zero elements on the diagonal are called singular

values. These elements are sorted in descending order, with the highest singular

value in the upper left index of S.

3.4.3 Formula and Criteria

To calculate U, S and V, we define the following functions:

eigValue(X) to get the eigenvalues of matrix X

eigVector(X) to get the eigenvectors of matrix X

We may calculate U, V and the singular value matrix S as followings:

)( TU XXeigValueS = , (3.22)

)( TXXeigVectorU = , (3.23)

( )XXeigValueST

V = (3.24)

( )XXeigVectorV T= (3.25)

31

It will only keep the real and positive singular values Sk in the results of equation

(3.23) and (3.25), and the corresponding uk and vk are kept. We ignore the rest

vectors in U and V. After removing the meaningless singular values in SU and SV, SU

or SV can be reduced to an r x r diagonal matrix. At last, we need to sort the singular

values Sk in descending order for both SU and SV to form SU new

and SV new

, and the

positions of the uk and vk are also changed accordingly. Now, from SUnew

and SV new

,

we may find the following:

S = SU new

= SV new

(3.26)

The singular values Sk indicate the importance of their corresponding uk and vk.

3.4.4 Benefits of using SVD

The principal motivation for dimensionality reduction is that it can help to

alleviate the worst effects of the curse of dimensionality. The main goal is to ensure

that as much of relevant information as possible is preserved in as few dimension as

possible. Other benefits of using Singular Value Decomposition are briefly

described as below:

i) Redundancy Reduction:

The sorted singular values in a Singular Value Decomposition help represent

the original matrix in less rows, by pushing the less significant rows to the

bottom of the orthogonal matrices. The matrix may thus be smoothed by

eliminating those vectors whose respective singular value is equal to, or

approaches zero. These are the vectors which least impact on the original

matrix.

32

ii) Exposing Components:

Singular Value Decomposition results in sorted eigenvectors (components).

These are very useful in analyzing patient encounters and revealing the most

predominant components. These are representative of the procedures which

most significantly describe the typical patient’s treatment for the chosen

patient profile. It is believed that the clusters formed when the source data is

projected in the dimensions of the first two components is indicative of their

constituents.

iii) Understanding Latent Variables:

Latent variables are hidden sources in the original data; they explain the

common traits in patient encounters, but cannot be measured directly. For the

observed data represented by matrix X, in n variables, the latent variables

may be expressed in l dimensions where l << n. If assume that the initial n

variables capture the whole domain, the product of SVD will be a set of

variables which includes the latent ones.

CHAPTER 4

MEL-FREQUENCY CEPSTRUM COEFFICIENT TECHNIQUE

4.1 Introduction

In this paper, the second technique approached is the Mel-Frequency

Cepstrum Coefficient (MFCC) feature. Mel Frequency Cepstrum Coefficients is a

representation defined as the real cepstrum of a windowed short-time signal derived

from the DFT of that signal. The difference from the real cepstrum is that a non-

linear frequency scale called mel-scale, need to be used for approximation. A

compact representation would be provided by a set of cepstrum coefficients that are

the results of a cosine transform of the real logarithm of the short-term energy

spectrum expressed on a mel-frequency scale [32]. The number of resulting mel-

frequency cepstrum coefficients is practically chosen relatively low, in the order of

12 to 20 coefficients. In many applications, the 0th coefficient of the MFCC

cepstrum is ignored because of its unreliability.

34

4.1.1 Discrete Fourier Transform

Fourier transform belongs to the family of linear transforms widely used in

DSP based on decomposing signal into sinusoids (sine and cosine waves). Usually

in DSP we use the Discrete Fourier Transform (DFT), a special kind of Fourier

transform used to deal with aperiodic discrete signals [26]. Actually there are an

infinite number of ways how signal can be decomposed but sinusoids are selected

because of their sinusoidal fidelity that means that sinusoidal input to the linear

system will produce sinusoidal output, only the amplitude and phase may change,

frequency and shape remain the same [26]. Discrete Fourier Transform changes an

Ns point input signal into two Ns /2 + 1 point output signals. The output signals

represent the amplitudes of the sine and cosine components scaled in a special way

that is represented by the equations:

[ ] ( )[ ] ( )sk

sk

NkiS

NkiC

/2sin

/2cos

π

π

⋅⋅=

⋅⋅= (4.1)

where Ck are Ns /2+1 cosine functions and Sk are Ns /2+1 sine functions, index k runs

from zero to Ns /2. These functions are called basis functions. Actually zero samples

in resulting signals are amplitudes for zero frequency waves, first samples for waves

which make one complete cycle in Ns points, second for waves which make two

cycles and so on. Signal represented in such a way is called to be in frequency

domain and obtained coefficients are called spectral coefficients or spectrum.

Frequency domain contains exactly the same information as the time domain and

every discrete signal can be moved back to the time domain, using operation called

Inverse Discrete Fourier Transform (IDFT). Because of this fact, the DFT is also

called Forward DFT [26]. Schematically DFT is represented in Figure 4.1.

35

Figure 4.1 Discrete Fourier Transform

The amplitudes for cosine waves are also called real part (denoted as Re[k])

and for sine waves are called imaginary part (denoted as Im[k]). This representation

of frequency domain is called rectangular notation. Alternatively, the frequency

domain can be expressed in the polar notation. In this form, real and imaginary parts

are replaced by magnitudes (denoted as Mag[k]) and phases (denoted as Phase[k])

respectively [26]. The equations for conversion from rectangular notation to the

polar notation are as follows:

[ ] [ ] [ ]( )

[ ] [ ][ ]

=

+=

k

kkPhase

xkkMag

Re

Imarctan

ImRe22

(4.2)

There are two main reasons why DFT became so popular in DSP. First is Fast

Fourier Transform (FFT) algorithm [26], developed by Cooley and Tukey in 1965,

which opened a new era in DSP because of the efficiency of the FFT algorithm. The

second reason is the convolution theorem [26], which states that convolution in time

domain is a multiplication in frequency domain and vice versa. This makes possible

to use high-speed convolution algorithm, which convolves two signals by passing

them through the Fast Fourier Transform, multiplying and using Inverse Fourier

Transform computing convolved signal.

N Samples

Input Signal

Inverse DFT

Time Domain

N s /2 +1 Samples

Sine Wave Amplitudes

N s /2 +1 Samples

Cosine Wave Amplitudes Forward DFT

Frequency Domain

Ns Samples

Input Signal

Inverse DFT

Time Domain

36

4.1.2 Convolution

An impulse is a signal composed of all zeros except one non-zero point.

Every signal can be decomposed into a group of impulses, each of them then passed

through a linear system and the resulting output components are synthesized or

added together [26]. The resulting signal is exactly the same as obtained by passing

the original signal through the system. Every impulse can be represented as a shifted

and scaled delta function, which is a normalized impulse, that is, sample number zero

has a value of one and all other samples have a value of zero. When the delta

function is passed through a linear system, its output is called impulse response. If

two systems are different they will have different impulse responses. According to

the properties of linear systems every impulse passed through it will result in the

scaled and shifted impulse response and scaling and shifting of the input are identical

to the scaling and shifting of the output [23, 26]. It means that knowing systems

impulse response will know everything about the system [9, 23, 26].

Convolution is a formal mathematical operation, which is used to describe

relationship between three signals of interest: input and output signals, and the

impulse response of the system. It is usually said that the output signal is the input

signal convolved with the system’s impulse response. Mathematical equation of

convolution for discrete signals is represented in the following (convolution is

denoted as a star):

[ ] [ ] [ ] [ ] [ ]jixjhihixiyM

j

−=∗= ∑−

=

1

0

(4.3)

where y[i] is the output discrete signal, x[i] is the input discrete signal and h[i] is M

samples long system’s impulse response flipped left-for-right. Index i goes through

the size of the output signal. Mathematics behind the convolution does not restrict

how long the impulse response is. It only says that the size of the output signal is the

size of the input signal plus the size of the impulse response minus one. Convolution

37

is very important concept in DSP. Based on the properties of linear systems it

provides the way of combining two signals to form a third signal [9, 26, 23].

4.2 Short-Term Analysis

Because of its nature, the heart sound signal is a time-varying signal or non-

stationary. It means that when it is examined over a sufficiently short period of time

(20-30 milliseconds) it has quite stable characteristics. It leads to the useful concept

of using an analysis called “short-term analysis”, where only a portion of the signal is

used to extract signal features at one time [9]. It works in the following way:

predefined length window (usually 20-30 milliseconds) is moved along the signal

with an overlapping (usually 30-50% of the window length) between the adjacent

frames. Overlapping is needed to avoid losing of information. Parts of the signal

formed in such a way are called frames. In order to prevent an abrupt change at the

end points of the frame, it is usually multiplied by a window function. The operation

of dividing signal into short intervals is called windowing and such segments are

called windowed frames (or sometime just frames). There are several window

functions can be used, but the most popular is Hamming window function, which is

described by the following equation:

( )

−−=

1

2cos46.054.0

wN

nnw

π (4.4)

where Nw is the size of the window or frame. A set of features extracted from one

frame is called feature vector. Overall overview of the short-term analysis approach

is represented in Figure 4.2.

38

Figure 4.2 Short-term analysis

4.3 Cepstrum

The cepstrum is representation of the signal normally having two components

that are resolved into two additive parts [9]. It is computed by taking the inverse

DFT of the logarithm of the magnitude spectrum of the frame. This is represented in

the following equation:

( ) ( )( )( )frameDFTIDFTframecepstrum log= (4.5)

Some explanation of the algorithm is therefore needed. Process of moving to the

frequency domain provides the operation changing from the convolution to the

multiplication. Then by taking logarithm, the operation is moving from the

39

multiplication to the addition. That is desired division into additive components.

Then it can be applied linear operator inverse DFT, knowing that the transform will

operate individually on these two parts and knowing what Fourier transform will do

with each part. The example of the process is as below:

Figure 4.3 Example of signal magnitude spectrum

From Figure 4.3, note that the signal magnitude spectrum is combined from slow and

quickly varying parts. But there is still one problem: multiplication is not a linear

operation. This can be solved by taking logarithm from the multiplication as

described earlier. Finally, the result of the inverse DFT is shown in Figure 4.4 [9]

where the two components are clearly distinctive now as the results of cepstrum

process.

Figure 4.4 Cepstrum

40


Mel-frequency cepstrum coefficient (MFCC) is also an example of feature

used to describe the non-stationary signal. It is based on the known evidence that the

information carried by low-frequency components of a signal is giving more

information rather than carried by high-frequency components [9]. The technique

used to compute MFCC is based on the short-term analysis where the signal is

divided into short time interval by windowing that produces the windowed frames,

and thus from each frame a MFCC vector is computed. MFCC extraction is similar

to the cepstrum calculation except that one special step is inserted, namely the

frequency axis is warped according to the mel-scale. Summing up, the process of

extracting MFCC is illustrated in Figure 4.5.

Figure 4.5 Computing of Mel Cepstrum

As described above, to place more emphasize on the low frequencies, one

special step before inverse DFT in calculation of cepstrum is inserted, namely mel-

scaling or mel-frequency warping. A “mel” is a unit of special measure or scale of

perceived pitch of a tone [9] and it is used to transform the power spectrum into Mel-

Frequency spectrum. It does not correspond linearly to the normal frequency, indeed

it is approximately linear below 1 kHz and logarithmic above [9]. The mapping from

linear frequency to Mel frequency is defined as:

Windowing DFT

Mel-

Frequency

Warping

Log Inverse

DFT

Input

Signals Windowed Frames

Magnitude

Spectrum Mel

Spectrum Log Mel

Spectrum MFCC

Input

Signals

41

( ) ( )700/1log2595 10 ffMel += (4.6)

One useful way to create mel-spectrum is to use a filter bank, one filter for each

desired mel-frequency component. The filter banks are linearly spaced in mel

frequencies. Every filter in this bank has triangular bandpass frequency response.

Such filters compute the average spectrum around each center frequency with

increasing bandwidths, as displayed in Figure 4.6. The triangular filters are actually

multiplied with the magnitude of the frame and log energy is computed (the resulted

warped power spectrum is then convolved with the triangular bandpass filter).

Figure 4.6 Triangular filter used to compute Mel Cepstrum

This filter bank is applied in frequency domain and therefore, it simply

amounts to taking these triangular filters on the spectrum. In practice the last step of

taking inverse DFT is replaced by taking discrete cosine transform (DCT) for

computational efficiency. The expression for computing the MFCC using the discrete

cosine transform is given as follows.

( ) ( )( )NjimNMFCCN

j

ji /5.0cos/21

−= ∑=

π (4.7)

42

where N is the number of filters, mj is the log band-pass filter output amplitudes and i

is number of cepstrum coefficients. The number of resulting mel-frequency

cepstrum coefficients is practically chosen relatively low, in the order of 12 to 20

coefficients. The zeroth coefficient is usually dropped out because it represents the

average log-energy of the frame and carries only a little information.

CHAPTER 5

NEURAL NETWORK

5.1 Introduction

Artificial Neural Networks (ANN) is composed of basic units somewhat

analogous to neurons. These units are linked to each other by connections whose

strength is modifiable as a result of a learning process algorithm. The most

commonly used of ANN architectures is the multilayer perceptron (MLP). In turn,

the learning algorithm which is almost always used to train the MLP is the

backpropagation algorithm, which is a stochastic approximation of the steepest

descent algorithm. When perceptrons are combined together in layers, it is more

common to use the logistic sigmoid function. A multilayer neural network with one

layer of hidden neurons is shown in Figure 5.1 (also called a two-layer network).

44

Figure 5.1 A two-layer network

5.2 The Multilayer Perceptron

The capabilities of single perceptrons however, are limited to linear decision

boundaries and are suitable only for problems requiring a simple linear

dichotomization of the pattern space. However, many problems require a nonlinear

partitioning of the pattern space. This can be achieved using a multilayer perceptron

network, which cascades two or more layers of perceptrons together, making it

possible to partition the pattern space with arbitrarily complex decision boundaries.

The individual perceptrons in the network are called neurons or nodes and usually

employ a logistic sigmoid function. A typical MLP network architecture is depicted

in below:

Figure 5.2 The architecture of a typical MLP network

45

The input vector feeds in to the first layer nodes; the outputs of this layer feed

into each of the second layer nodes and so on. Often, the nodes are fully connected

between layers. The terminology used here will not include the input vector as a

layer, so that the example network in the figure above is referred to as a three-layer

network. The multiple nodes in the output layer correspond to multiple classes in

pattern recognition problem. Many algorithms have been developed which adapt the

network weights so as to provide a suitable map between the set of input vectors and

the set of desired responses. In general, an algorithm is either supervised, in which

the desired response is available during the learning phase, or unsupervised, in which

clusters are formed from the input patterns. Normally, the supervised learning and

the backpropagation (BP) algorithm will be used to train the MLP.

5.3 The Logistic Activation (Sigmoid) Function

In many ANN applications, the activation functions play an important role as

in the early years, they are used to fire or unfire the neuron. However, in new neural

network paradigms, the activation functions are more sophisticatedly used.

Nowadays, many activation functions have been used to produce a continuous value

rather than discrete. One of the most popular activation functions used is the logistic

activation function or more popularly referred to as the sigmoid function. This

function is semilinear in characteristic, differentiable and produces a value between 0

and 1. The mathematical expression of this sigmoid function is:

( )jcnetj

enetf

−+

=1

1 (5.1)

where c controls the firing angle of the sigmoid. When c is large, the sigmoid

becomes like a threshold function and when c is small, the sigmoid becomes more

like a straight line (linear). When c is large learning is much faster but a lot of

information is lost, however when c is small, learning is very slow but information is

46

retained. Because this function is differentiable, it enables the BP algorithm to adapt

the lower layers of weights in a multilayer neural network. An example below shows

that the sigmoid activation function with different values of c.

Figure 5.3 Sigmoid activation function with different values of c

5.4 The Backpropagation (BP) Algorithm

The BP algorithm is perhaps the most popular and widely used neural

paradigm. The BP algorithm is based on the generalized delta rule. The BP

algorithm overcomes the limitations of the perceptron algorithm.

47

5.4.1 Learning Mode

Before the BP can be used, it requires target patterns or signals as it a

supervised learning algorithm. Training patterns are obtained from the samples of

the types of inputs to be given to the multilayer neural network and their answers are

identified by the researcher. The configuration for training a neural network using

the BP algorithm is shown in the figure below in which the training is done offline.

The objective is to minimize the error between the target and actual output and to

find ∆w. The error is calculated at every iteration and is backpropagated through the

layers of the ANN to adapt the weights. The weights are adapted such that the error

is minimized. Once the error has reached a justified minimum value, the training is

stopped, and the neural network is reconfigured in the recall mode to solve the task.

Figure 5.4 Configuration for training ANN using BP algorithm

5.4.2 The Generalized Delta Rule (G.D.R.)

The objective of the BP algorithm, like in other learning algorithms, is to find

the next value of the adaptation weights (∆w) which is also known as the

Generalized Delta Rule (G.D.R). Consider the following ANN model:

48

Figure 5.5 Typical ANN model

What is need is to obtain the following algorithm to adapt the weights between the

output (k) and hidden (j) layers:

( ) ( )TWOTW kjjkkj ∆+=+∆ αηδ1 G.D.R (5.2)

where the weights are adapted as follows:

( ) ( ) ( )11 +∆+=+ TWTWTW kjkjkj (5.3)

where T is the iteration number and δk is the error signal between the output and

hidden layers:

( )( )kkkkk OTOO −−= 1δ (5.4)

The adaptation between input (i) and hidden (j) layers as below:

( ) ( )TWOTW jijjji ∆+=+∆ αηδ1 G.D.R (5.5)

Then the new weight is thus become:

( ) ( ) ( )11 +∆+=+ TWTWTW jijiji (5.6)

49

where the error signal through layer j is:

( ) kj

k

kjjj WOO ∑−= δδ 1 (5.7)

Note that

( )

ji

j

jij

netjj

OWnet

enetfO

j

θ+=

+==

∑

−1

1

(5.8)

5.5 Summary of the BP Algorithm

The adaptation of the weights between output (k) and hidden (j) layers:

( ) ( ) ( )11 +∆+=+ TWTWTW kjkjkj

where ( ) ( )TWOTW kjjkkj ∆++∆ αηδ1

( )( )kkkkk OTOO −−= 1δ

And between the hidden (j) and input (i) layers:

( ) ( ) ( )11 +∆+=+ TWjiTWTW jiji

where ( ) ( )TWOtW jijjji ∆+=+∆ αηδ1

( )∑−=k

kjkjjj WOO δδ 1

50

Note that:

( )

( )k

j

netkk

kj

k

kjk

netjj

ji

j

jij

enetfO

OWnet

enetfO

OWnet

−

−

+=

+=

+=

+=

∑

∑

1

1

1

1

θ

θ

5.6 Parameters for Neural Network

Learning Rate: The learning rates can be uniform throughout the network,

or different for each layer or node. In general, it is difficult to determine the best

learning rate, but a useful rule of thumb is to make the learning rate for each node

inversely proportional to the average magnitude of vector feeding node. Many

schemes which adapt the learning rate as a function of the local curvature of the error

surface have been proposed [18]. The simplest approach and one that works quite

well in practice, is to add a momentum term of the form αm(wℓ,j,i(k) - wℓ,j,i(k-1)) to

each weight update, where 0< αm <1. This term makes the current search direction

an exponentially weighted average of past directions and helps keep the weights

moving across flat portions of the error surface after they have descended from steep

portion.

Hidden Layer Nodes: The optimum number of hidden layer nodes is

difficult to establish and is strongly dependent on the nature of the data. It is not

likely that this issue will be resolved for the general case, since each problem

demands different capabilities of the network. Having said this, choosing the proper

51

network size is important. If the network is too small, it will not capable of forming

a good model of the system. Conversely, if the network is too large, then it may be

too capable. That is, the network may be able to implement numerous solutions that

are consistent with the training data, but most of which are likely to be poor

approximations to the actual problem.

With little or no knowledge of the underlying details of the data, however,

one must determine the network size by trial and error. A methodical approach is

recommended. One may start with a very small network and gradually increase the

size until performance tends to level off, training each network independently.

Insofar as determining an upper bound on the number of hidden layer nodes, this

number should be much less than the number of training samples or the network will

simply “memorize” the training samples, resulting in poor generalization.

Stopping Criteria: The iterative process of computing the gradient and

adjusting the weights is continued until a minimum is found in the error surface (or a

point determined to be sufficiently close). Several measures are candidates for

stopping criteria. If the magnitude of the gradient falls below a chosen level, the

algorithm may be terminated, as this may indicate that the minimum is being

approached. Perhaps a more common stopping criterion is a lower threshold on the

sum squared error, J(w). This requires knowledge (or a decent estimate) of the

minimum value of J(w), which is not always available. One might consider stopping

when a chosen number of iterations have been performed.

Finally, the method of cross-validation can be used to monitor the

generalization performance during learning. This method entails splitting the data

into a training set and a test set. During learning, the performance of the network

upon the training set can only improve but the performance on the test data will only

improve to some point, beyond which it will start to degrade. It is at this point that

the network starts to overfit the data, and the algorithm should be terminated.

CHAPTER 6

METHODOLOGY

6.1 Introduction

This chapter provides an investigation of time-frequency technique together

with dimensionality reduction strategies and also Mel-frequency cepstrum coefficient

technique that are used to generate features for classification of the heart sound

signals. The performance of each case; feature extraction/dimensionality reduction

combination using TFD/SVD and feature extraction using MFCC will be examined

with respect to their statistical properties of the Neural Network parameters or

backpropagation algorithms. The feature sets under TFD are based upon time-

frequency domain features and the bilinear time-frequency transform that is the B-

distribution. While, the feature sets under MFCC are based on frequency domain

features and discrete cosine transform. The relative performance of each technique

will be compared, yielding a prescription of the best overall technique to extract

features of heart sound signals.

53

6.2 Methodology

Figure 6.1 Feature extraction techniques and verification

The intention of this work is to specify the most effective means of features

extraction technique of the heart sound signals. Figure 6.1 shows the related

procedure to achieve that objective proposed. This chapter will examine the

performance of the techniques approached based upon time-frequency domain; B-

Distribution and frequency domain; Mel-Frequency Cepstrum Coefficient. At the

OUTPUT:

TIME-FREQUENCY DOMAIN

B-DISTRIBUTION

VERIFICATION:

NEURAL NETWORK

OUTPUT: CEPSTRUM

COEFFICIENT

DATA:

HEART SOUNDS

SVD

MFCC

OUTPUT:

MFCC

PERFORMANCE

OUTPUT:

TFD

PERFORMANCE

54

final stage of this task, the neural network with multilayer perceptron network will be

employed. To complete all that tasks, it would require a number of significant data

that can demonstrate the extracted features techniques and relative performance

measure. There are two types of data used in this study:

i) Normal Heart Sounds:

These data were collected from the normal person where their heart

conduction system is in a well condition.

ii) Abnormal Heart Sounds:

Data are comprised of various irregularities of heart sound signals. They

were taken from the persons/patients that were suffering from various kinds

of diseases.

6.2.1 Time-Frequency Distribution Technique

Time-frequency representations (TFRs) combine time-domain and frequency

domain analyses to yield a potentially more revealing picture of the temporal

localization of a signal’s spectral characteristics. Time-frequency techniques are

broadly classified into two categories: Linear transforms and Quadratic (or bilinear)

transforms. As heart sound being a non-stationary signal of a relatively small time-

bandwidth product, classical methods or linear transforms methods for time-

frequency analysis, such as Short Time Fourier Transform is not capable of revealing

the time-frequency dynamics of heart sounds. Several bilinear transforms techniques

have been investigated to look at their suitability at identifying features in heart

sound signals. However, B-distribution has been chosen for this analysis technique

because it outperforms other distributions techniques. The steps involve in TFD

technique as below:

55

Figure 6.2 Technique using TFD/SVD

The general form of bilinear class of time-frequency distribution can be

defined as in equation below:

( ) ( ) ( ) τττ

τρ τπdvdudeuzuzvgft

fvtvuj −−

−

+= ∫∫∫

2

2*

2,, (6.1)

where g(ν, τ) is referred to as the kernel function and is used to define the properties

of the distribution and z(t) is the analytic signal associated with the real one. The

type of time-frequency distribution is actually determined by the choice of this kernel

function. The B-distribution can be derived using the types of time-lag kernel as in

equation (6.2).

Time-Frequency Domain

TFSA:

-B-Distribution

SVD

DATA:

Heart Sounds

-Normal

-Abnormal

VERIFICATION:

-Neural network

56

( )( )

βτ

τ

=

ttG

2cosh, (6.2)

Application dependent parameter β, ( 0 ≤ β ≤1 ), controls the sharpness of the cut-off

of the filter and hence the trade-off between the cross-terms elimination and the time-

frequency resolution. The Doppler-lag kernel is defined as the Fourier transform of

the time-lag kernel with respect to time t:

( ) ( ) dtetGvg vtj πττ 2,, −∫= (6.3)

( )

( ) ( )vjvj πβπββ

τβ

β−Γ+Γ

Γ=

−

2

2 12

(6.4)

Since in real-life analysis the practitioner is often required to deal with

discrete-time signals, therefore it has considered a discrete-time formulation of the

proposed TFD. Using (6.3) in the general formula (6.1), the proposed TFD can be

written as

( ) ( ) τττ

τρ τπdudeuzuztuGft

fj2

2*

2,, −

−

+−= ∫∫ (6.5)

The discrete-time version of the above expression is given by

( ) ( ) mfjM

Mm

M

Mp

em

pnzm

pnzmpGfnπρ 2

2*

2,, −

−= −=

−+

++= ∑ ∑ (6.6)

Here, the discrete-time expressions G(n,m) and z(n) are obtained by sampling G(t,τ)

and z(t) at a frequency fs = 1/T such that t = n⋅T and τ = m⋅T. The total signal

duration is Tz, with N being the total number of samples. For simplicity and without

loss of generality, assumed that T = 1. To derive the previous expression, the

commutative property of the convolution operation is needed to be used. An

interpolation of the analytic signal z(n) in (6.6) is necessary in order to obtain non-

57

integer values of the signal arguments. However, this interpolation can be avoided by

using a similar formulation as the one in [34], namely

( ) ( ) ( ) ( ) mfjM

Mm

M

Mp

empnzmpnzmpGfnπρ 2*,, −

−= −=

−+++= ∑ ∑ (6.7)

The resulting TFD in (6.7) is alias-free and periodic in f with a period equal to unity

[34]. Note that (2M+1) is the length of the rectangular window considered in the

implementation. From observation of the previous expression, the implementation

procedure of the proposed TFD requires three steps:

1) Formation of the quadratic quantity Kz(n,m) = z(n+m)⋅z*(n-m);

2) Discrete convolution in time n of Kz(n,m) with G(n,m);

3) Discrete Fourier transform (DFT), with respect to m, of the previous result.

6.2.2 SVD Implementation

In addition to an appropriate process, it is often essential that dimensionality

reduction be performed on the TFR, so that the information is sufficiently compact

for presentation to a classifier such as ANN. Although some dimensionality

reduction may occur as a result of transforming a signal into TFR, in general it does

not. Whereas, the goal of the TFR is to localize and concentrate the signal’s

information, it is the role of dimensionality reduction to optimally select or combine

the TFR coefficients. The main goal of SVD technique is to ensure that as much of

the relevant information as possible is preserved in as few dimensions as possible.

SVD techniques deal with singular matrices, or ones which are very close to

being singular. They are an extension of eigen decomposition to suit non-square

58

matrices. Any matrix may be decomposed into a set of characteristic eigenvector

pairs called the component factors, and their associated eigenvalues called the

singular values. The SVD equation for an (m x n) singular matrix X is:

X = USVT

(6.8)

where U is an (m x n) orthogonal matrix, V is an (n x n) orthogonal matrix and S is

an (m x n) diagonal matrix containing the singular values of X arranged in decreasing

order of magnitude. A vector in an orthogonal matrix can be expressed as a linear

combination of the other vectors. The equation for Singular Value Decomposition

needs to be further explained by showing the significance of the eigenvectors and the

singular values. Each of the orthogonal matrices is achieved by multiplying the

original (m x n) data matrix X by its transpose, once as XXT to get U, and once as X

TX

to get VT. The data is then available in square matrices, on which eigen

decomposition may be applied. This involves the repeated multiplication of a matrix

by itself, forcing its strongest feature to predominate. In SVD, this is done on both

the orthogonal matrices. The pairs of eigenvectors are the rows in U and the

columns in V. These are sorted in order of their strength; the respective singular

value acts as the required numerical indicator as further explained below.

The relationship between the singular values and the co-efficient matrix’s

eigenvalues:

XTX = (USV

T) T

(USVT) (6.9)

XTX = VS

TU

TUSV

T (6.10)

XTX = VS

TSV

T (6.11)

XTX = VS

2V

T (6.12)

S2 contains the eigenvalues of X

TX. Similarly, it may be shown that they are also the

eigenvalues of XXT. They may be arranged in decreasing order of magnitude:

λ1 ≥ λ2 ≥ λ3 ≥…….. ≥ λj ≥ 0 (6.13)

59

Singular values of a matrix X are defined as the square root of the corresponding

eigenvalues of the matrix XTX:

jj λσ = (6.14)

As with the eigenvalues, they are sorted according to their magnitude, and the

corresponding eigenvectors in the orthogonal matrices U and V follow the same

order. The SVD equation (6.8) takes the form:

X =

Some singular values are equal or very close to zero. A singular value σj describes

the importance of uj and vj , and when its value approaches zero it indicates that these

associated vectors are less significant.

To implement SVD using Matlab software, the SVD function was used to

process the dataset where:

[U,S,V] = svd(dataset);

This function produces a diagonal matrix S, of the same dimension as X and with

nonnegative diagonal elements in decreasing order, and unitary matrices U and V.

The first 2 or 3 columns matrix V were selected because these eigenvectors really

contain the most of the information among the samples.

60

6.2.3 Mel-Frequency Cepstrum Coefficient

Figure 6.3 Technique using MFCC

Figure 6.3 shows the method to extract features and to find the performance

accuracy with respect to the MFCC technique. The output to the MFCC is a signal

representation provided by a set of mel-frequency cepstrum coefficients. These

cepstrum coefficients are the result of a cosine transform of the real logarithm of the

short-time energy spectrum expressed on a Mel-frequency scale. In MFCC, the main

advantage is that it uses Mel-frequency scaling or Mel-frequency warping where the

high frequencies of the input signals are compressed since it is believed that the low

frequencies contain more useful information. The steps to compute MFCC are:

CHANNELS:

Heart Sounds

-Normal

-Abnormal

Cepstrum coefficients

EXTRACT

FEATURES:

MFCC

VERIFICATION:

-Neural network

61

Figure 6.4 Steps of computing MFCC

Preprocessing mainly includes framing, windowing and pre-emphasis. These

procedures are also known as the short-term analysis where the signals are divided

into short interval and they are examined over a short period of time, normally 20-30

milliseconds. Framing is used to cut the long-time signal to the short-time signal in

order to get relative stable frequency characteristics. Windowing is mainly to reduce

the aliasing effect, when cut the long signal to a short-time signal in frequency

domain. The windowing function that normally used is hamming window. Discrete

Fourier Transform (DFT) is used to convert the signal from time-domain to

frequency-domain. Therefore the analysis is based on the frequency-domain analysis

because some of the heart sound information is mainly depend on the frequency

characteristics of the signal, although the connection between them is complex.

The skill of frequency scaling is used to map linear frequency into Mel-

frequency scale which is linear below 1 kHz, and logarithmic above. Mel-scale

frequency analysis can be approximated by equation:

( ) ( )700/1log2595 10 ffMel += (6.15)

MFCC

Preprocessing:

Windowing

DFT

Mel-frequency

scaling/warping

Log Cepstrum:

Inverse DFT or Discrete

Cosine Transform

Windowed Frames

Mel

Spectrum Log Mel

Spectrum

Input

Signals

Magnitude

Spectrum

62

where f is the frequency based in hertz. Based on this assumption, the mel-frequency

cepstrum coefficient is proposed. The MFCC can be computed by the following

steps:

(1) The discrete Fourier transform (DFT) transforms the windowed signal

segment into the frequency domain. The real and imaginary components of the

short-term signal spectrum are squared and added to get the short-term power

spectrum P(f ) or magnitude spectrum.

[ ] [ ] NkjN

n

a enxkX/2

1

0

π−−

=∑= , 0 ≤ k ≤ N (6.16)

(2) The spectrum P(f ) is warped along its frequency axis f into the Mel-

frequency axis as P(M) where M is the Mel-frequency.

(3) Each resulted warped power spectrum is then convolved with the triangular

band-pass filter, P(M) into θ(M) and the results accumulated.

[ ] [ ]kHkXm j

N

k

aj

21

0

∑−

=

= 0 ≤ j ≤ p (6.17)

where Hj[k] is the transfer function of filter j.

(4) The MFCC is then the discrete cosine transform of the p filter outputs

computed as in Equation (6.18). Because the MFCC calculation compresses the

signal components into the lower dimensions, we often choose D<<N.

( ) ( )( )NjimNMFCCN

j

ji /5.0cos/21

−= ∑=

π , i=1…D (6.18)

63

6.2.4 Verification Technique

The focus of this work is not only upon the features extraction stage but also

upon the verification technique, where there are some important observations to be

made with respect to the neural algorithms. The representative network chosen is the

multilayer perceptron network because it is the simplest, most studied and most

widely-used of all neural network classifiers. The design of the signal verification

should be based on the training set. The separate test set can be employed to obtain

an unbiased estimate of the performance of the system thus designed. In order to

select design parameters with the intent of maximizing performance, the training set

must held back to form a validation set.

The learning algorithm which is almost always used to train the MLP is the

backpropagation algorithm, which is a stochastic approximation of the steepest

descent algorithm. The activation function is used to fire or unfire the neurons. One

of the most popular activation functions used is the logistic sigmoid function. For

better results, some of the neural network parameters are need to be taken into

consideration such as the learning rate, momentum term, hidden layer nodes and last

but not least the stopping criteria.

The learning rate and momentum coefficient must be set using the values

range between zeros to unity. The optimum number of hidden layer nodes is difficult

to establish and it is strongly dependent on the nature of the data. This can be test

out using trial and error method and the number should be much less than the number

of training sample. However, it should not be too small because it will not capable

of forming a good model system. Furthermore, it must not too large because it will

become too capable; numerous solutions that are consistent with the training data but

most likely to be poor approximations to the actual problem. The stopping criteria is

applied for computing the gradient and adjusting the weights until a minimum is

found in the error surface and it should be very minimum to achieve better

64

performance. The steps of backpropagation (BP) algorithm in the neural network

verification technique are as in figure 6.5:

Figure 6.5 Steps of backpropagation algorithm

NO

YES

Obtain a set of training patterns

Set up neural network model:

e.g. No. of Input neurons, Hidden neurons,

and Output Neurons.

Set learning rate η and momentum rate αm

Initialize all connection Wji, Wkj and bias

weights θj θk to random values

Set minimum error, Emin

Start training by applying input patterns

one at a time and propagate through

the layers then calculate total error.

Backpropagate error through hidden and

input layer and adapt weights

Backpropagate error through output and

hidden layer and adapt weights

Check if Error < Emin

Stop training

CHAPTER 7

RESULTS AND DISCUSSION

7.1 Introduction

The simulation of heart sound signals are done using some normal signals

from normal people and abnormal signals from some patients that suffer from

various kinds of diseases. The simulation is first tested using a technique known as

time-frequency distribution that involved B-Distribution technique to transform the

signals from time domain into time-frequency domain. After that, it is continued by

feature extraction procedure using Singular Value Decomposition. The second part

is using another technique which is known as the Mel-Frequency Cepstrum

Coefficient. It also used transformation technique but the signals are only

transformed from time domain into frequency domain. After features extraction, the

results are further simulating using neural network for verification. The most

important results from neural network simulations are the performances efficiency

contributed by both techniques. All simulations results from each of steps involved

are shown in Figure 7.1 until Figure 7.32.

66

7.2 Time-Frequency Distribution

Figure 7.1 until figure 7.12 show the results of the normal and abnormal heart

sounds in time-frequency domain. The raw data signals are transformed according to

bilinear transformation using B-distribution with smoothing parameter, β=0.01.

Simulations have shown that β = 0.01 gives visually most appealing results for

various multicomponent signals. It was found that such defined B-distribution

outperforms other existing distributions in terms of time-frequency resolution, as

well as cross-terms suppression, when used to represent signals with closely-spaced

components in the time-frequency domain.

7.2.1 B-Distribution of Normal Heart Sounds

Figure 7.1 Normal heart sound; e.g.1

67

Figure 7.2 Normal heart sound; e.g.2

In these examples (Figure 7.1 and Figure 7.2), the heart sound signals of

different length, N but same sampling frequency, fs (4000 Hz) are analyzed using B-

distribution with smoothing parameter, β = 0.01. These examples are resulted from

analysis of normal heart sounds taken from different healthy people. The rationale

behind the choice of this particular distribution is that it constitutes the two extremes

of the quadratic class in the sense of time-frequency resolution and cross-term

suppression that is, B-distribution is chosen because of its very high resolution in the

time-frequency domain in addition to maximum cross term reduction or suppression

between the closely-spaced components.

68

7.2.2 B-Distribution of Abnormal Heart Sounds

Figure 7.3 Abnormal heart sound; e.g.1


69



70

In these examples (Figure 7.3 until Figure 7.6), the heart sound signals of

different length, N with the same sampling frequency, fs (4000 Hz) are analyzed

using B-distribution with smoothing parameter, β = 0.01. These examples are

resulted from analysis of abnormal heart sounds taken from unhealthy people that

suffered from various kinds of diseases. Figure 7.3 is an example of the abnormality

due to the mitral regurgitation, Figure 7.4 is happened due to the pericardiatis, Figure

7.5 is caused by aortic regurgitation and Figure 7.6 is caused by hypertention. Their

patterns are slightly different from the normal heart sound in terms of their

components where they have some additional components in the signal. These extra

components are actually the abnormalities of the heart sounds or sometimes called

the murmurs.

7.2.3 SVD of Normal Heart Sounds

Figure 7.7 SVD of normal heart sound; e.g.1

71

Figure 7.8 SVD of normal heart sound; e.g.2

7.2.4 SVD of Abnormal Heart Sounds

Figure 7.9 SVD of abnormal heart sound; e.g.1

72

Figure 7.10 SVD of abnormal heart sound; e.g.2

Figure 7.7 until 7.10 illustrate plots that are derived from applying SVD to

normal and abnormal heart sounds data set. To perform the SVD, the data is pre-

processed by transforming into time-frequency domain using B-distribution. SVD is

always essential when time-frequency representations are used as a feature basis

because most of TFRs comes at the expense of high dimensionality. The plots in all

figures are the results from the first two columns of matrix V. These are because the

first two eigenvectors really contain the most of the information among the samples

as well as capture the similarity among the samples from the same class. The plots

reveal interesting patterns in the data. A pattern in the first two eigenvectors

normally shows the pattern with cyclic structure. Neither eigenvector is exactly like

the underlying sine or exponential pattern; each such pattern, however, is closely

approximated by a linear combination of the eigenvectors to form biologically

meaningful signals. The figures show that the plots of the abnormal heart sounds do

not give the smooth curve lines as compared to the normal plots. This eigenvectors

are dominated by high-frequency components that can only come from the abnormal

sounds or murmurs produced by the heart of unhealthy people.

73


Figure 7.11 until figure 7.16 show the Mel-Frequency Cepstrum Coefficient

of normal and abnormal heart sounds. The mel-frequency scale is understood as

linear frequency spacing below 1,000 Hz and a logarithmic spacing above 1,000 Hz,

so filters spaced linearly at low frequencies and logarithmically at high frequencies

have been used to capture the important characteristics of heart sound signals. Here

is the common used formula to approximately reflex the relation between the mel-

frequency and the physical frequency:

(7.1)

7.3.1 MFCC of Normal Heart Sounds

Figure 7.11 MFCC of normal heart sound; e.g.1

( )

+=

7001log2595 10

ffMel

74



Figure 7.11 until Figure 7.13 show the results of the Mel-Frequency cepstrum

coefficients technique that applied to the normal heart sound. In this experiment, it is

used feature vectors composed from 12 lowest Mel-frequency cepstral coefficients

75

(MFCC) computed using about 27 mel-spaced filters. However, these figures only

display the cepstrum coefficient 1 for viewing. The 0th coefficient was excluded

because it carries a little of signal specific information. Analysis frame was

windowed by Hamming window with some overlapping.

7.3.2 MFCC of Abnormal Heart Sounds

Figure 7.14 MFCC of abnormal heart sound; e.g.1

76



Figure 7.14 until Figure 7.16 are the results analysis of the abnormal heart

sound using MFCC. These figures only stated the cepstrum coefficient 1 for

viewing. In this analysis, the hearts sound signals are analyzed to extract the more

77

important features including in time domain. The heart sound wave is also filtered to

eliminate irrelevant or undesired information. Before transforming the heart sound

signals into its power spectrum, it must be divided into overlapping blocks. In this

work, the blocks consist in Hamming windows. Figure 7.14 is an example of the

abnormality due to mitral regurgitation, Figure 7.15 is happened due to aortic

regurgitation and Figure 7.16 is caused by ventricular septal defect and mitral

regurgitation.

7.4 Neural Network

Figure 7.17 until Figure 7.22 show the verification results of normal and

abnormal heart sounds after extracted features using TFD and MFCC. The minimum

error, Emin must be as small as possible. Therefore, this neural network simulation

technique has used the Emin=0.01 for better performances.

78

7.4.1 Verification of TFD

0

10

20

30

40

50

60

70

80

90

100

0 5 10 15 20 25 30 35 40 45 50 55

Hidden Neurons

Perf

orm

an

ce (%

)

Figure 7.17 Verification of TFD ( number of hidden neurons )

0

10

20

30

40

50

60

70

80

90

100

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Learning Rate(lr)

Pe

rfo

rma

nc

e (

%)

Figure 7.18 Verification of TFD ( learning rate )

79

0

10

20

30

40

50

60

70

80

90

100

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Momentum Coefficient (mc)

Perf

orm

ance (%

)

Figure 7.19 Verification of TFD ( momentum coefficient )

Figure 7.17 until Figure 7.19 shows the results of verification

technique using TFD. They showed that the highest performance they can achieved

is up to 90% at hidden nodes = 20, learning rate = 0.05 and momentum coefficient =

0.1. Using too large or too small value of hidden nodes will give less performance

accuracy. The learning rate and momentum coefficient values should lies in between

zero and unity but the values cannot be too large because they will give a poor

performance. If the learning rate is set too high, the algorithm may oscillate and

become unstable. If the learning rate is too small, the algorithm will take too long to

converge. At each epoch, new weights and biases are calculated using the current

learning rate and momentum coefficient.

80

7.4.2 Verification of MFCC

0

10

20

30

40

50

60

70

80

90

0 5 10 15 20 25 30 35 40 45 50 55

Hidden Neurons

Pe

rfo

rman

ce (

%)

Figure 7.20 Verification of MFCC ( number of hidden neurons )

0

10

20

30

40

50

60

70

80

90

0 0.05 0.1 0.15 0.2 0.25

Learning Rate (lr)

Perf

orm

an

ce (%

)

Figure 7.21 Verification of MFCC ( learning rate )

81

0

10

20

30

40

50

60

70

80

90

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Momentum Coefficient (mc)

Perf

orm

ance (%

)

Figure 7.22 Verification of MFCC ( momentum coefficient )

Figure 7.20 until Figure 7.22 shows the results of verification technique using

MFCC. They showed that the highest performance they can achieved is only up to

80% at hidden nodes = 20, learning rate = 0.05 and momentum coefficient = 0.1.

Using too large or too small value of hidden nodes will also give less performance

accuracy. The learning rate and momentum coefficient values should lies in between

zero and unity but the values also cannot be too large because they will affect the

performance.

82

7.5 Discussion

Table 7.1: Performance comparison between the TFD and MFCC

TFD

(B-DISTRIBUTION)

MFCC

INPUT TRAINING

295x50

345x50

INPUT TESTING

295x50

345x50

HIDDEN LAYER

(NEURONS)

20

20

LEARNING RATE (LR)

0.05

0.05

MOMENTUM

COEFFICIENT (MC)

0.1

0.1

PERFORMANCE

90.0%

80.0%

Performances of both techniques are depending on the specification of hidden

layer, learning rate and also momentum coefficient. Hidden layer will give the

number of hidden neurons for the network that gathers its weighted inputs and bias to

form its own scalar output. Note that it is common for the number of inputs to a

layer to be different from the number of neurons. A layer is not constrained to have

the number of its inputs equal to the number of its neurons. However, the number of

hidden neurons should not be exceeded the number of input. Furthermore, for a

better performance, hidden neurons should not be too high or too low. If such cases

happened, then the performances will slowly decreasing.

83

Learning rate is another factor that affected the efficiency of the techniques

used. The performances of both techniques are very sensitive to the proper setting of

the learning rate. Learning rate values normally in the range between 0 and 1.

However, if the learning rate is set too high, the algorithm of the extracted features

may oscillate and become unstable. If the learning rate is too small, the algorithm

will take too long to converge but still can achieve the minimum error, Emin. It is not

practical to determine the optimal setting for the learning rate before training, and, in

fact, the optimal learning rate changes during the training process, as the algorithm

moves across the performance surface. The performance of the algorithm can be

improved by allow the learning rate to change during the training process. The

learning rate chosen in these experiments are the optimum values for faster

convergence and for achieving the target.

Momentum coefficient is used to overcome the network stuck at the local

minimum of the error. Momentum allows a network to respond not only to the local

gradient, but also to recent trends in the error surface. Acting like a low-pass filter,

momentum allows the network to ignore small features in the error surface. Without

momentum a network may get stuck in a shallow local minimum. With momentum a

network can slide through such a minimum. The momentum coefficient, mc, can be

any number between 0 and 1. Both techniques proposed in this project had the

optimum mc values that satisfy the overall performance efficiency. The most

important thing is, never set the mc too high to get a better performance.

CHAPTER 8

CONCLUSIONS AND RECOMMENDATION

8.1 Conclusions

Heart sounds and murmurs are time-varying signals where they exhibit some

degree of non-stationary. Both techniques approached in this project are suitable for

analyzing and extracting the features of these non-stationary signals. This project is

actually proposed B-Distribution to perform the Time-Frequency Distribution

technique in order to resolve signal’s components in the time-frequency domain.

Another proposed technique is the Mel-Frequency Cepstrum Coefficient to obtain

the cepstrums coefficients that were resolving signal’s components in the frequency

domain. The important reason of choosing both mentioned techniques in this project

is because they are outperformed their own classes compared to others for example,

B-distribution is outperformed the other quadratic TFDs for signals with components

closely-spaced in the time-frequency plane. Other important points of using B-

Distribution are because it achieves a better time-frequency resolution and a better

energy concentration around the instantaneous frequency of the signal and still

significantly suppresses the cross-terms. This distribution is also satisfies most of the

desirable properties sought for a time-frequency distribution. The reasons of using

MFCC as the feature extraction technique are because it is good in error reduction

during the analyzing of the signals, it also helps to produce more robust features

85

when the input is affected by the noises or interferences and another advantage is it

performs more precise characterizing that can improve the performance.

The time-frequency algorithms can be divided into two categories which are

linear and bilinear (quadratic). Even though the linear time-frequency distributions

(TFD) map a signal between the time domain and the time-frequency domain, but the

bilinear transforms are even more powerful and useful for real-time applications

rather than the linear TFD. Therefore, in this paper, the revision about fundamental

concepts of the bilinear time-frequency distribution with the reducing interferences

distribution based on the B-distribution kernel was proposed. The kernel has been

defined in both the time-lag and the Doppler-lag domain, and some of important

properties that it satisfies have been presented. In addition, the singular value

decomposition (SVD) was introduced as a method to identify the pattern relationship

in data and to compress the data by reducing the number of dimensions without

much loss of information. The SVD was also introduced as a method which directly

decomposes a non-square matrix. This enables decomposition of a TFD without the

need for covariance representation. This has the advantage of increasing the

dynamic range of the decomposition. In addition to the advantage of rectangular

decomposition, the SVD also provides a characterization of the vector spaces

outlined above. This enables us to identify features for the row-space and column

space of a TFD as well as enabling us to ignore the null-space components.

Furthermore, the revision about fundamental principles of the second technique

(Mel-Frequency Cepstrum Coefficient) is presented in this paper with briefly

explained the related processing steps which leaded to this technique. It starts from

the basic definitions used in DSP which related to the processing part and then

followed by the explanations of feature extraction steps with a type of feature named,

MFCC. MFCC vector is a representation provided by a set of cepstrum coefficients

that are the results of a cosine transform of the real logarithm of the short-term

energy spectrum expressed on a mel-frequency scale. It actually has the ability to

capture the important characteristics of the signals. The difference from the real

cepstrum is that a non-linear frequency scale, called mel-scale, is used for

86

approximation. The cepstrum is one homomorphic transformation that allows the

separation of the source from the filter. The cepstrum of a signal segment can be

computed by windowing the signal.

In conclusion, this project has developed a work for investigating the

performances of the above two techniques by considering artificial neural network

analysis. It compares the performance accuracy of TFD method and MFCC method

with regard to the features extraction of heart sound signals. The performance result

obtained using MFCC technique was slightly lower than the TFD. This may be

attributed to the features used. Whereby, TFD is an approach that widely applied in

heart sounds analysis and resulted in high performance accuracy. Therefore, the

TFD technique is still chosen as the best technique to analyze and to extract features

of the non-stationary signals such as the heart sounds signals.

8.2 Recommendation

Basically, the MFCC technique gives slightly lower performance rather than

the TFD technique. Noticed that if the MFCC is being modified, its may be able to

perform well and increase the performance accuracy. Therefore, this modification is

recommended for future work. The steps involve in modified MFCC technique will

be further define in the next section.

87

Input signal

8.2.1 Modified MFCC

The method described here is based on both of MFCC and the masking effect

to modify MFCC in order to improve robustness as well as performance accuracy of

signals recognition system. The basic structure includes Preprocessing and DFT,

Masking Threshold, Decimation, Mel-frequency scale, Cepstrum. It’s shown in

figure 8.1.

Figure 8.1 Basic structure of modified MFCC

Modified MFCC

Preprocessing:

Windowing

DFT

Masking threshold

Decimation

Mel-frequency scaling

Log

Cepstrum

88

The differences between Modified-MFCC and original MFCC are the part of

Masking-Threshold and Decimation. The basic idea of these two parts is:

1) After the step DFT, the peak value of spectrum in each critical band is

calculated. According to the empirical formula of noise masking tone (Here is

assume that the peak value is tone), the masking threshold of the peak value is

determined. The final masking threshold would consider both of the masking

threshold and absolute threshold of hearing. The larger one between them is selected

for modifying spectrum.

2) After getting the threshold, the spectrum of signal is compared with the value

of threshold. The spectrum whose value is less than threshold is deleted. And the

new spectrum is obtained after deleting.

Finally, the same method used in MFCC is used to get the feature of the

signal. It throws away the useless spectrums. The effect of noise that is under the

threshold is ignored and then the robustness of system is improved.

89

REFERENCES

1. Abdi, H. (1994). A Neural Network Primer. Journal of Biological Systems.

2(3): 247-283.

2. Boashash, B. (1992). Time Frequency Signal Analysis: Methods and

Applications. Melbourne, Australia: Longman Cheshire.

3. Boashash, B. and Sucic, V. (2000). A Resolution Performance Measure for

Quadratic Time-Frequency Distributions. Proceedings of the Tenth IEEE

Workshop on Statistical Signal and Array Processing. August 14 -16. 584 -588.

4. Barkat B. and Boashash B. (2001). A High-Resolution Quadratic Time–

Frequency Distribution for Multicomponent Signals Analysis. IEEE

Transactions on Signal Processing. 49(10). October 2001. 2232-2239.

5. Boashash, B. (2003). Time Frequency Signal Analysis and Processing: A

Comprehensive Reference. Oxford, UK: Elsevier.

6. Bahoura, M. and Pelletier, C. (2004). Respiratory Sounds Classification using

Cepstral Analysis and Gaussian Mixture Models. Proceedings of the 26th

Annual International Conference of the IEEE EMBS. September 1-5. San

Francisco:IEEE. 9-12.

7. Cohen L. (1989). Time-Frequency Distributions: A Review. Proceedings of the

IEEE. 77(7). July 1989. 941-981.

8. Davis, S. B. and Mermelstein, P. (1980) Comparison of Parametric

Representations for Monosyllabic Word Recognition in Continuously Spoken

Sentences. IEEE Transactions on Acoustics, Speech and Signal Processing.

28(4). August 1980. 357-366.

9. Deller, J. R., Hansen, J. H. L. and Proakis, J. G. (2000). Discrete-Time

Processing of Speech Signals. Wiley-IEEE Press.

90

10. Daliman, S. and Sha’ameri, A. Z. (2003). Time-Frequency Analysis of Heart

Sounds using Windowed and Smooth Windowed Wigner-Ville Distribution.

Proceedings Seventh International Symposium on Signal Processing and its

Application. 2. July 1-4. 625-626.

11. Daliman, S. and Sha’ameri, A. Z. (2003). Time Frequency Analysis of Heart

Sounds and Murmurs. Proceedings of the 2003 Joint Conference of the Fourth

International Conference on Information, Communications and Signal

Processing 2003 and The Fourth Pasific Rim Conference on Multimedia.

December 15-18. Singapore:IEEE. 840-843.

12. Gradshteyn, I. S. and Ryzhik, I. M. (1980). Tables of Integrals, Series and

Products. Academic Press.

13. Garcia, J. O and Garcia, C. A. R. (2003). Mel-Frequency Cepstrum

Coefficients Extraction from Infant Cry for Classification of Normal and

Pathological Cry with Feed-Forward Neural Networks. Proceedings of the

International Joint Conference on Neural Network. July 20-24. 3140-3145.

14. Haykin, S. (1991). Advances in Spectrum Analysis and Array Processing.

Prentice Hall. 418-517.

15. Hlawatsch, F. and Boudreaux-Bartels, G. F. (1992). Linear and Quadratic

Time-Frequency Signal Representations. IEEE Signal Processing Magazine.

9(2). April 1992. 21-67.

16. Imai S. (1983). Cepstral Analysis Synthesis on the Mel Frequency Scale. IEEE

International Conference on ICASSP ’83, Acoustics, Speech and Signal

Conference. April 1983. 93-96.

17. Jozef Wartak, M.D. (1972). Phonocardiology: Integrated Study of Heart Sound

and Murmurs, Harper and Row Publisher.

18. Jacobs, R. A. (1988). Increased Rates of Convergence Through Learning Rate

Adaptation, Neural Network. 1(4). 295-308.

19. Lin, Z. (1997). An Introduction to Time-Frequency Signal Analysis.

EmeraldFulltext. 17 (1): 46-53.

20. Molau, S., Pitz, M., Schluter, R. and Ney, H. (2001). Computing Mel-

Frequency Cepstral Coefficients on the Power Spectrum. Proceedings (ICASSP

’01) 2001 IEEE International Conference on Acoustics, Speech and Signal

Processing. May 7-11. 73-76.

91

21. Mak, B. (2002). A Mathematical Relationship between Full-Band and

Multiband Mel-Frequency Cepstral Coefficients. IEEE Signal Processing

Letters. 9(8). August 2002. 241- 244.

22. Malarvili, MB. (2003). Heart Sound Segmentation Based on Instantaneous

Energy of Electrocardiogram. Computers in Cardiology. 327-330.

23. Proakis, J. G. and Manolakis, D. G. (1992). Digital Signal Processing,

Principles, Algorithms, and Applications. New York. Macmillan Publishing

Company.

24. Picone, J. W. (1993). Signal Modeling Techniques in Speech Recognition.

Proceedings of the IEEE. 81(9). September 1993. 1215-1247.

25. Sucic, V., Barkat B. and Boashash, B. (1999). Performance Evaluation of the

B-Distribution. Fifth International Symposium on Signal Processing and its

Applications. August 22-25. Brisbane, Australia: IEEE, 267-270.

26. Smith, S. W. (1999). The scientist and Engineer’s Guide to Digital

SignalProcessing. California Technical Publishing. http://www.dspguide.com

27. Sha’ameri, A. Z. and Salleh, S. H. S. (2000). Window Width Estimation and

the Application of the Windowed Wigner-Ville Distribution in the Analysis of

Heart Sounds and Murmurs. Proceedings of the TENCON 2000. September 24

-27. 114 -119.

28. Upadhyaya, B. R. and Yan, W. Hybrid Digital Signal Processing and Neural

Networks for Automated Diagnostics using NDE Methods. Master Thesis.

University of Tennessee. 1993.

29. Wood, J. C., Buda, A. J. and Barry D. T. (1992). Time-Frequency Transforms:

A New Approach to First Heart Sound Frequency Dynamics. IEEE

Transactions on Biomedical Engineering. 39(7). July 1992. 730-740.

30. White, P. R., Collis, W. B. and Salmon, A. P. (1996). Analysing Heart

Murmurs using Time-frequency Methods. Proceedings of the IEEE-SP

International Symposium on Time-Frequency and Time-Scale Analysis. June

18-21. 385-388.

31. Wall, M. E., Rechtsteiner, A. and Rocha, L. M. (2003). Singular Value

Decomposition and Principal Component Analysis. in D. P. Berrar, W.

Dubitzky, and M. Granzow, A Practical Approach to Microarray Data

Analysis, Kluwer: Norwell. 91-109.

92

32. Zheng, F. and Zhang G. (2000). Integrating the Energy Information into

MFCC. International Conference on Spoken Language Processing. October

16-20. Beijing, China: IEEE . I-389-292.

33. Zhang, L., Marron, J. S., Shen, H. and Zhu, Z. (2005). Singular Value

Decomposition and its Visualization. 1-19.

34. Jeong, J. and Williams, W. J. (1992). Alias-free generalized discrete-time time-

frequency distributions. IEEE Transactions on Signal Processing, 40(11).

November 1992. 2757–2765.

FEATURE EXTRACTION FROM HEART SOUND 31-12-2010

Documents

Transcript of FEATURE EXTRACTION FROM HEART SOUND 31-12-2010