
[IEEE 2012 5th International Symposium on Communications, Control and Signal Processing (ISCCSP), Rome, Italy, 2-4 May 2012]

VOCAL FOLDS PARALYSIS CLASSIFICATION USING FLDA AND PCA ALGORITHMS SUPPORTED BY AN ADAPTED BLOCK MATCHING ALGORITHM

Amaia Méndez Zorrilla, Eneko Lopetegui Alba, Begoña García Zapirain

DeustoTech Institute of Technology. DeustoTech-LIFE Unit. University of Deusto, Bilbao, Spain

{amaia.mendez, eneko.lopetegui, mbgarciazapi}@deusto.es

ABSTRACT

The study of movement in vocal fold recordings is fundamental for detecting pathologies related to movement, especially vocal fold paralysis. This approach involves four processing steps: 1) a pre-processing stage, 2) analysis of the image textures applying Gabor filtering for the segmentation of the glottal area, 3) an adapted block matching algorithm using the Exhaustive Search method, and 4) classification using FLDA (Fisher's Linear Discriminant Analysis) and PCA (Principal Component Analysis) techniques. The block matching algorithm is adapted because of the heterogeneous nature of the ROI in each frame of the video sequence. The results show that our proposal correctly and automatically detects vocal folds with paralysis and distinguishes them from healthy or pathological vocal folds with accuracy over 95%. The correct pathology is also identified in over 65% of the cases.

Index Terms— FLDA, PCA, Vocal Folds Paralysis, Block Matching

1. INTRODUCTION

According to the National Statistics Institute, in Spain 5% of the population has diseases or disorders of the voice [1]. The scientific community accepts two methods to evaluate and register the movement and characteristics of the vocal folds: high speed video recordings [2], and low speed recordings illuminated with a stroboscopic light. High speed video recording allows the acquisition of stills of the vocal folds at a frame rate over 2000 pictures per second, while the low speed (videostroboscopic) recording rate is between 25 and 50 frames per second.

The authors of this paper use the second method because of its widespread use among specialists. The quality of the images is the main obstacle to analyzing them and applying image processing algorithms, because they suffer from several problems: rotation of the camera, side movements of the laryngoscope and movements of the patient during the recording. Another source of variability is the degree of illumination, which depends on the instrumentation used.

Regarding the frames' motion, many algorithms exist to study movement in various contexts [3]. The algorithms used in image coding and compression (such as the MPEG standards) are the basis of this study [4], but all of them have to be adapted for application to vocal fold images. The main objective is to classify vocal fold pathologies and distinguish paralyzed (unilateral or bilateral) from healthy vocal folds, establishing a method based on a two-step classifier using FLDA and PCA. The specific aims of this study are the following:

o To process the video sequences from a commercial database: "Laryngeal Videostroboscopic Images" (Dr. Wendy LeBorgne; Plural Publishing). The selected recordings contain healthy vocal fold sequences and sequences with polyps, nodules or cysts. All of them are from adult patients.

o To segment the glottal space without user interaction.

o To apply the block matching methodology, correctly establishing the search window and macro block sizes.

o To distinguish, in a reliable and robust way, paralyzed from healthy vocal cords (or those with pathologies not related to movement).

This paper is divided into the following sections: Section 2 describes the methods and the proposed design, Section 3 presents the results, and Section 4 the project's conclusions.

2. METHODS AND PROPOSED SYSTEM

This section is divided into two parts: the background of the main methods used in this research and the proposed system.

2.1 Background

A) Block Matching Algorithms

Block Matching algorithms are widely used for motion estimation and video compression, and are also

Proceedings of the 5th International Symposium on Communications, Control and Signal Processing, ISCCSP 2012, Rome, Italy, 2-4 May 2012

978-1-4673-0276-0/12/$31.00 ©2012 IEEE


implemented in various standards: MPEG1/MPEG2/MPEG4 or H.261/H.263 [4-5]. Block Matching techniques are the most popular and efficient of the various motion estimation techniques [6]. In this research, the authors aim to detect and measure the movement, not to predict it. The proposed block matching algorithm is not applied to the whole image, but only along the vocal cords. For these reasons the best option is the Exhaustive Search (ES) Algorithm. In the literature, we can find several modifications of this algorithm applied in different areas [7-8].

B) Gabor filtering

Gabor filters [9] are bandpass filters used in image processing for feature extraction and texture analysis. These filters provide the capacity to highlight an image's features at a certain orientation and frequency. However, a single Gabor filter yields information on only one orientation and frequency. Therefore, it is necessary to build a Gabor filter bank, in which different orientations and frequencies are combined so that, in the following step, all the necessary information can be extracted to compare two texture blocks with a reasonable guarantee that all the image's features are being taken into consideration. The Gabor filter bank is defined in the equation below:

$$g(x, y; f, \theta) = \exp\left(-\frac{x'^2 + y'^2}{2\sigma^2}\right)\exp\left(j 2\pi f x'\right) \qquad (1)$$

Where:

$$x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta, \quad j = \sqrt{-1}$$

x and y are the coordinates of the image's pixels in the ranges (-x/2, x/2) and (-y/2, y/2). The pass-band filter's central frequency is f, the spatial orientation is θ, and the parameter σ determines the filter's bandwidth.

C) Fisher's Linear Discriminant Analysis (FLDA)

FLDA is a method used in statistics, pattern recognition and machine learning. The aim of the algorithm is to construct a feature vector onto which the data of two or more classes are projected. Features are then extracted from the projected data, and based on these new features the classes can be differentiated correctly. For a two-class separation the following expression is used [10]:

$$S = \frac{\sigma_{\text{between}}^2}{\sigma_{\text{within}}^2} = \frac{\left(\vec{w}\cdot\vec{\mu}_1 - \vec{w}\cdot\vec{\mu}_0\right)^2}{\vec{w}^{T}\Sigma_1\vec{w} + \vec{w}^{T}\Sigma_0\vec{w}} \qquad (2)$$

Where $\vec{\mu}_0, \vec{\mu}_1$ are the classes' means and $\Sigma_0, \Sigma_1$ are the classes' covariances.

So, the maximum separation occurs when:

$$\vec{w} = \left(\Sigma_0 + \Sigma_1\right)^{-1}\left(\vec{\mu}_1 - \vec{\mu}_0\right) \qquad (3)$$
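Equations (2)-(3) translate directly into a few lines of NumPy. The sketch below is a generic two-class Fisher discriminant (function names and the nearest-projected-mean decision rule are illustrative, not the authors' trained classifier):

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher direction w = (S0 + S1)^-1 (mu1 - mu0), as in eq. (3).
    X0 and X1 hold one feature vector per row for each class."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    w = np.linalg.solve(S0 + S1, mu1 - mu0)
    return w, mu0, mu1

def classify(x, w, mu0, mu1):
    """Project x onto w and assign it to the nearest projected class mean."""
    p = x @ w
    return 0 if abs(p - mu0 @ w) < abs(p - mu1 @ w) else 1
```

In the paper's setting the rows of X0 and X1 would be motion-vector features of non-paralyzed and paralyzed folds, respectively.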

Figure 1. Proposed System

D) Principal Component Analysis Classification (PCA)

To carry out pathology classification, Principal Component Analysis (PCA), also known as the Eigenfaces technique, has been chosen. This technique comes from multivariate data analysis and its development started in 1901 in a different context [11]. Data projection onto the principal component subspace is also named the Hotelling Transform or Karhunen-Loève (KLT) Transform, according to the literature in [12-14]. This method has been very useful in data recognition (especially of faces), as can be seen in [15]. In this research work, it is oriented towards vocal fold pathology classification.

2.2 Proposed System

The proposed system block diagram can be seen in Figure 1; it is composed of four main stages: pre-processing, segmentation, block matching and a two-step classification.

A) Pre-Processing Stage

The first step is to extract all frames from the video in order to work with them independently. Up to now, we have worked with grayscale images to improve the algorithm's simplicity and efficiency. Converting the image to grayscale does not resolve the problems with brightness and illumination; even within a single video sequence we can find frames with different characteristics of these parameters. Because of that, it is essential to equalize them in order to obtain a uniform histogram in each frame [16].
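The equalization step can be sketched as follows for an 8-bit grayscale frame. This is a generic histogram-equalization routine, not necessarily the authors' exact implementation:

```python
import numpy as np

def equalize(gray):
    """Histogram-equalize an 8-bit grayscale frame so that all frames in a
    sequence share a comparable brightness distribution."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Classic equalization mapping: stretch the CDF to the full 0-255 range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]
```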

B) Glottal Space Segmentation

In the second stage, the gradient image of the greyscale, previously equalized frame is calculated. The gradient is computed with the Sobel operator [16]. Before the Gabor filtering technique, a basic morphological transformation, erosion, is applied, following the definition "a pixel will have the value 1 in the processed image if this pixel and all of its neighbours are 1 in the original image" and the literature [17]. The result is the input of the Gabor filter bank, which provides the capacity to highlight, at a certain orientation and frequency, the characteristics of the image to be studied. The orientation applied in the previously explained filter is θ = π/3.
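Two pieces of this stage can be sketched in NumPy: the erosion rule quoted above, and a single Gabor kernel built from eq. (1) at the orientation θ = π/3 (a filter bank is obtained by varying f and θ). This is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def erode(binary):
    """3x3 binary erosion: a pixel keeps the value 1 only if it and all
    eight of its neighbours are 1 in the input (zero-padded border)."""
    h, w = binary.shape
    p = np.pad(binary, 1)
    out = np.ones_like(binary)
    for di in range(3):
        for dj in range(3):
            out &= p[di:di + h, dj:dj + w]
    return out

def gabor_kernel(size, f, theta, sigma):
    """Real part of one Gabor kernel from eq. (1); a bank is built by
    repeating this for several frequencies f and orientations theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * f * xr)

# Orientation used in the paper's segmentation stage:
theta = np.pi / 3
```

The kernel would be applied to the eroded gradient image by 2-D convolution (e.g. with an FFT-based routine) to obtain the texture response used for segmentation.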

"


Figure 2. Block-Matching Stage. Overview

The segmentation is the first step to objectify vocal fold characteristics (in this case movement features) using image processing techniques.

C) Block Matching

The general purpose of a block matching algorithm is to find a matching block (called a Macro Block) from a frame k in some other frame (k-1 for this study), applying a similarity criterion and finding the maximum likelihood for the Macro Block within a defined Search Window (see Figure 2). For this study several changes have been made to this general algorithm.

C1. Adapted MBSize & Search Window Size

The input of this block is the images with glottal space segmentation. MBSize (Macro Block Size) depends on the amplitude of the glottal space at different positions along the cords (see Figure 3). The main goal is to keep the point of interest in the center of the macro block.

Figure 3. Sample Image. Search Window in Green, and Macro Block in Blue

C2. Similarity Criterion

The chosen cost function is the Mean Squared Error (MSE) in (4):

$$MSE = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(C_{ij} - R_{ij}\right)^2 \qquad (4)$$

where N is the side of the macro block, and C_ij and R_ij are the pixels being compared in the current and reference macro blocks, respectively.
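A compact sketch of exhaustive-search block matching with the MSE cost of eq. (4), assuming grayscale frames as 2-D NumPy arrays (names and window handling are illustrative, not the authors' code):

```python
import numpy as np

def mse(block_a, block_b):
    """Cost function of eq. (4): mean squared error between two macro blocks."""
    d = block_a.astype(float) - block_b.astype(float)
    return np.mean(d * d)

def exhaustive_search(ref, cur, top, left, mb, radius):
    """Find the displacement of the macro block cur[top:top+mb, left:left+mb]
    inside a (2*radius+1)^2 search window of the reference frame, testing
    every candidate position (exhaustive search)."""
    block = cur[top:top + mb, left:left + mb]
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + mb > ref.shape[0] or x + mb > ref.shape[1]:
                continue  # candidate block falls outside the frame
            cost = mse(block, ref[y:y + mb, x:x + mb])
            if cost < best:
                best, best_dy, best_dx = cost, dy, dx
    return best_dy, best_dx, best
```

The returned (dy, dx) pair is one motion vector; repeating the search at several points along the folds yields the vector fields of Figures 4-5.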

C3. Exhaustive Block Matching

Exhaustive Block Matching has been chosen because the algorithm is evaluated at only two points, approximately 2% of the image, so the computational cost that usually penalizes this technique is low enough to be ignored. Discarding the time consumption, the Exhaustive Search technique is the best block matching technique. This step is applied at several points along the vocal cords in order to obtain a reliable movement estimate (see Figures 4-5).

D) Classification

D1. PCA

In this case, we have the original frames and the corresponding mask as input, and the pre-diagnosis is the system's result. For Principal Component Analysis to work correctly, all images must be on a comparable scale and have the same orientation. For this purpose, two points of interest are taken: the interior vertex of the vocal folds and the vocal cavity's centre of gravity. The authors crop the images taking these points into account, and the left and right folds are evaluated independently. The main modules of the PCA classifier are described below:

Feature extractor module. After some pre-processing, the vocal fold image is presented to the feature extraction module in order to find the key features that are going to be used for classification. In other words, this module is responsible for composing a feature vector that is good enough to represent the vocal fold image.

Training Set. Training sets are used during the "learning phase" of the recognition process. The feature extraction and classification modules adjust their parameters to achieve optimum recognition performance by making use of training sets. Our final training set (we tested others, see Table 1) is composed of 25*2 (left and right folds, independently) images: 5 healthy vocal fold images, 6 with polyps, 5 with nodules, 5 with cysts, 2 with oedema and 2 with paralysis. All of them have been pre-processed and standardized previously as well. In this first classification stage, the main objective is to distinguish between vocal folds with morphological pathologies and those without them.
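The classifier just described can be sketched in an eigenfaces style, assuming cropped, aligned grayscale images flattened into NumPy rows. This outline (function names, nearest-neighbour decision rule) is an illustration, not the authors' implementation:

```python
import numpy as np

def pca_train(X, k):
    """Fit a PCA subspace on training images (one flattened image per row),
    keeping the k leading principal components (eigenfaces-style)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k]

def pca_project(x, mean, comps):
    """Coordinates of one image in the principal-component subspace."""
    return comps @ (x - mean)

def nearest_class(x, mean, comps, gallery, labels):
    """Classify by the nearest projected training image."""
    q = pca_project(x, mean, comps)
    dists = [np.linalg.norm(q - pca_project(g, mean, comps)) for g in gallery]
    return labels[int(np.argmin(dists))]
```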

D2. FLDA

The second step uses FLDA. To establish the parameters of the classifier, it has to be trained with several motion vectors to distinguish between paralyzed and non-paralyzed vocal cords. Once the features are set, the testing data can be fed into the classifier.

3. RESULTS


The proposed algorithm has been tested with 362 images from different videos. All the images have a resolution of 360x288 pixels and all of them belong to a commercial video and image database, "Laryngeal Videostroboscopic Images" (Dr. Wendy LeBorgne; Plural Publishing). The percentage of frames correctly segmented is 95.9%.

Figure 4. Motion Vectors in a Sequence with paralysis

Figure 5. Motion Vectors in a sequence without paralysis

Figure 4 shows the motion vectors along the vocal cords (in red) and their summation (in blue) for paralyzed vocal cords. It can be seen that they still move because the paralysis is not complete. In contrast, Figure 5 shows healthy vocal cords, with a more synchronous movement. In the PCA classification, the number of images contained in the training set was the most important decision. It was decided to use 16, 18 and 25 images (*2 means that the authors process the right and left folds independently at this stage, see Table 1) in this research (not many, due to the processing time).

Images in        Images in   % classified with       % correctly classified
Training Set     Test        the correct pathology   (healthy or pathological)
25*2             330*2       65%                     95%
18*2             330*2       64%                     90%
16*2             330*2       37%                     80%

Table 1. First-step classification results

We will now compare the results obtained using training sets of different sizes. Not all the available images were used: those of very low quality were eliminated, as well as those in which the glottis is practically closed. The results can be seen in Table 1. The motion vectors are the input of the second classifier, both for the training step and for the classification step. For this classifier, the training set has to contain at least 15 frames of paralyzed and 15 frames of non-paralyzed vocal folds, because only those images that the first classifier has tagged as "without morphological pathology" are analyzed. In those cases, FLDA distinguishes perfectly between paralyzed and healthy vocal folds in 100% of the images.

4. CONCLUSIONS

This research is part of a complete proposal to obtain an automatic and robust process to diagnose vocal fold pathologies. It can be concluded from these results that the designed algorithm works appropriately without any user interaction. It is precisely in this lack of user interaction that this software differs from the few commercial packages available dealing with the issue that concerns us. The proposed system consists of a two-step classifier based on a pathology classifier and a second, more specific classifier. In future work, this second classifier can be converted into a multiclass classifier to distinguish several different pathologies.

5. REFERENCES

[1] INE. http://www.ine.es/
[2] Manfredi, C., Bocchi, L., Bianchi, S., Migali, N., Cantarella, G. "Objective vocal fold vibration assessment from videokymographic images". Biomedical Signal Processing and Control, Vol. 1, Issue 2, April 2006, pp. 129-136.
[3] Zhou, S., Huang, Q., Shi, J. "State space modeling of dimensional variation propagation in multistage machining process using differential motion vectors". IEEE Trans. Robotics and Automation, 19(2): 296-309, April 2003.
[4] Richardson, I.E.G. "H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia". Wiley.
[5] Richardson, I.E.G. "Video Codec Design". West Sussex: John Wiley & Sons Ltd., 2002.
[6] Barjatya, A. "Block Matching Algorithms for Motion Estimation". 2004. http://www.asb.bth.se/MultiSensorSystem/siamak.khatibi/computerVisionBTH10Summer/papers/BlockMatchingAlgorithmsForMotionEstimation.pdf
[7] Jain, J. R., Jain, A. K. "Displacement measurement and its application in interframe image coding". IEEE Trans. Commun., vol. COM-29, pp. 1799-1808, Dec. 1981.
[8] Hariharakrishnan, K., Schonfeld, D. "Fast object tracking using adaptive block matching". IEEE Trans. Multimedia, vol. 7(5), pp. 853-859, 2005.
[9] Palm, C., Keysers, D., Lehmann, T., Spitzer, K. "Gabor Filtering of Complex Hue/Saturation Images for Color Texture Classification". In Proc. JCIS, Atlantic City, USA, pp. 45-49, 2000.
[10] Bishop, C.M. "Pattern Recognition and Machine Learning". Springer, 2006.
[11] Pearson, K. "On lines and planes of closest fit to systems of points in space". Philosophical Magazine, 2:559-572, 1901.
[12] Hotelling, H. "Analysis of a complex of statistical variables into principal components". Journal of Educational Psychology, 24:417-441, 498-520, 1933.
[13] Karhunen, K. "Über lineare Methoden in der Wahrscheinlichkeitsrechnung". Annales Academiae Scientiarum Fennicae, Series A1: Mathematica-Physica, 37:3-79, 1947.
[14] Loève, M. "Probability Theory". Van Nostrand, New York, 1963.
[15] De la Torre, F., Black, M. J. "Robust Principal Component Analysis for Computer Vision". Int. Conf. on Computer Vision (ICCV 2001), Vancouver, Canada, July 2001.
[16] Gonzalez, R.C., Woods, R.E. "Digital Image Processing". 3rd Edition. Prentice Hall, 2008.
[17] Van den Boomgaard, R., Van Balen, R. "Methods for Fast Morphological Image Transforms Using Bitmapped Images". Computer Vision, Graphics, and Image Processing: Graphical Models and Image Processing, Vol. 54, No. 3, May 1992, pp. 252-254.