
Hybrid Appearance Based Disease Recognition of Human Brains

Leyla Zhuhadar, IEEE Member, and Gopi Chand Nutakki, IEEE Member, University of Louisville, Louisville, USA.

Abstract—Magnetic resonance imaging (MRI) is a diagnostic and treatment-evaluation tool used very widely across medicine. MRI provides very high quality images of brain tissue and so can be used to study brain conditions. This paper proposes an efficient technique for classifying brain MRI images. Examining MRI brain images manually is not only slow but also error prone, so to speed up the process while maintaining the quality of the results, a high-quality automatic classification system is needed. In this work, classification techniques based on the well-known SIFT and Gabor features are applied to brain images. Our analysis shows that a hybrid feature derived from SIFT and Gabor features yields a higher accuracy than Gabor features alone.

Index Terms—Bioinformatics, SIFT, Gabor, Image Processing, PCA.

1 INTRODUCTION

Images can be classified by extracting image features and then comparing the similarity between those features. Two of the many existing feature extraction techniques are SIFT and Gabor features. In our research, we extracted image features using these two techniques and then obtained a hybrid feature by concatenating the SIFT features at the end of the Gabor features. The resulting accuracy was high.

Scale Invariant Feature Transform (SIFT) performs image recognition by extracting a local image feature vector. The feature vector is invariant to scaling, translation, and rotation, and partially invariant to changes in illumination and affine transformations of the image, and so is well suited to classifying MRI brain images. The features are calculated by a multi-stage filtering process that discovers interest points in scale space. SIFT features are local and based on the appearance of the object at particular interest points. They are robust to noise and to minor changes in viewpoint. In addition, they are highly distinctive, relatively easy and straightforward to extract, allow for correct object identification with low probability of mismatch, and are easy to match against a huge set of local features.

• Gopi Chand Nutakki is a prospective student at the Department of CECS, University of Louisville.

• Dr. Leyla Zhuhadar is an Adjunct Assistant Professor at the University of Louisville.

Object description by a set of SIFT features is also robust to partial occlusion. SIFT follows four steps: scale-invariant feature detection; feature matching and indexing; cluster identification by Hough transform voting [11]; and model verification by linear least squares. In our research, instead of using the SIFT interest points, we divided an image into blocks and accumulated the resulting gradient magnitudes of each point into one of 8 orientation bins per block.

A Gabor filter is a linear filter used for edge detection. Gabor filters can be generated from one mother wavelet by dilation and rotation. Frequency and orientation representations of Gabor filters are similar to those of the human visual system, and they have been found to be particularly appropriate for texture representation and discrimination. In the spatial domain, a 2D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave. In our research we created Gabor masks for various scale and orientation combinations, convolved each mask with the image blocks, and populated the Gabor feature from the results.

Appearance-based stage recognition has recently received attention [4] because of its potential for accelerating the discovery of gene-gene interaction patterns. There are two main strategies for appearance-based recognition [2]: i) global appearance based, and ii) local appearance based. The former assumes that a human brain image is standardized to the same orientation and scale, and represents an image as a single feature vector used as the similarity measure between images. The latter does not make this assumption, but it requires an interest point detector [2] to localize a set of distinct local features that usually correlate with certain geometric/structural information. An important observation about human brain images is that their appearance can be dominated by textural information, which challenges the local appearance based strategy. The recognition methods proposed in [4] follow the global appearance strategy.


2 IMAGE STANDARDIZATION

Image standardization is an important step in providing a reliable and dimensionally synchronized image database [1] for the classification of human brain images. Human brain images are captured using different scientific devices, which can exaggerate the effect of illumination variation; in our research we used human brain MRIs. We also need to deal with issues such as noise, occlusion, and inconsistent orientation in the images.

2.1 Human Brain Image Extraction

With very few exceptions, the brain image and the background have significantly different local texture properties. Brain regions have a rougher texture with high local variance, while the darker background has smooth tonal variations, meaning pixels with low local variance. The variance of pixel intensity in a window of a given size (say 3 × 3), centred at each pixel of the image, is calculated, and the pixel is set as foreground if the value is above a fixed threshold. It is quite common for brain pixels to be assigned as background, mainly in the center region of the brain. Thus, after obtaining the binary image, a morphological binary operator is applied to “fill the holes” inside the brain region. In our research we manually isolated the brain by extracting the brain image using graphical tools and recreating another image containing only the brain. We used the Whole Brain Atlas of Harvard Medical School; these images were standardized but also included images that do not provide enough detail about the diseases.
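As an illustration, the thresholding and hole-filling just described can be sketched in a few lines; this is a minimal Python sketch assuming a grayscale slice as a NumPy array and a hand-tuned variance threshold, not the exact tooling used in our experiments:

import numpy as np
from scipy import ndimage

def extract_brain_mask(image, window=3, var_threshold=50.0):
    """Label pixels with high local variance as foreground, then fill holes."""
    img = image.astype(np.float64)
    # Local mean and local mean-of-squares over a window x window neighborhood.
    mean = ndimage.uniform_filter(img, size=window)
    mean_sq = ndimage.uniform_filter(img * img, size=window)
    variance = mean_sq - mean * mean            # local variance at each pixel
    binary = variance > var_threshold           # rough foreground: textured brain tissue
    # Morphological "fill the holes": brain pixels wrongly labeled as
    # background inside the brain region are reassigned to foreground.
    return ndimage.binary_fill_holes(binary)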

2.2 Isolating Human Brain

A human brain image set taken using an MRI scanning device usually contains a few dozen brain images, with various interesting brain parts spread across each image. The brain in an image is generally occluded by the human skull and the neck bone. To extract the human brain, we can use the well known “watershed transform” to partition the foreground region of the binary image. However, due to the noisy borders and concave shapes of the human head, the watershed approach with a bad initial state tends to “over-segment” the brain. We can instead perform a shrink-expand processing of the foreground region: first the region is repeatedly eroded until we find two separated regions. These two partitions of the foreground region are then the initial state for the watershed flooding algorithm. The algorithm “grows” the regions back until they touch again, creating a watershed. For an image with more than enough data in the form of the human skull, jaws and neckbone, the “shrink-expand” algorithm is recursively applied over the foreground region, keeping only the center-most region at each recursion step, until the algorithm yields only one region.
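A minimal sketch of this shrink-expand idea, assuming a binary foreground mask and scikit-image's watershed; the seed test and the distance-transform flooding surface are our assumptions for illustration:

import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def shrink_expand_split(mask):
    """Erode the foreground until it splits in two, then grow the pieces
    back with watershed flooding to separate the two regions."""
    eroded = mask.copy()
    while True:
        labels, count = ndimage.label(eroded)
        if count >= 2:
            break          # two separated seed regions found
        if not eroded.any():
            return None    # eroded away without ever splitting
        eroded = ndimage.binary_erosion(eroded)
    # Flood outward from the seeds, constrained to the original mask; the
    # distance transform lets regions grow back toward their old borders.
    distance = ndimage.distance_transform_edt(mask)
    return watershed(-distance, markers=labels, mask=mask)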

Figure 1. Human brain images taken from individuals with different diseases: (a) a brain image with one disease; (b) with another disease; (c) with yet another disease. The images are standardized.

We can also apply a straightforward manual method to extract the main brain image: using the MATLAB image processing tool, we set the boundaries of the brain alone and then extract the highlighted region. Points are placed along the boundary of the brain, forming an enclosed polygon that is close to the shape of an ellipse. The MRI scanned images we obtained from Harvard were pre-standardized and have been used in many research projects; automatic preprocessing to obtain only a specific brain part of an MRI scan is a future extension of our research.

2.3 Image Registration

The extracted brain images can have different positions, orientations, scales and shapes in the original MRI scans. For a better comparison between patterns in the extracted brain images, we can perform an image registration step to transform the images, so that the comparison can be


performed regardless of their original position, orientation, scale and shape. Since our research is designed to ignore the orientation of the brain and to treat all human brain images as being at a standard scale, it is crucial to have all brain images pre-normalized with respect to the major axis and the top and bottom of the head. Once the human brain is highlighted in the original image, a new image of the required dimensions with a dark background is created, and the extracted brain is placed in it so that the brain is close to the center with its major axis vertical, the head upright and the neck section pointing downwards. Figure 1(a) shows a human brain. The image contains a dark region surrounding the brain, which varies in size as the size of the MRI-scanned brain varies. This means the dimensions of the whole picture remain constant while the brain inside the image varies in size.

3 FEATURE EXTRACTION

3.1 Orientation Histogram

The orientation histogram works very much like the SIFT algorithm. In order to preserve the maximum possible detail of the human brain, instead of considering only keypoints as SIFT does, the orientation histogram processes all of the image pixels and stores the information in the histogram.

The orientation histogram is a low-level statistical representation of a local region. The motivation for using a statistical representation comes from the texture-dominant appearance of human brain images. One implementation of the orientation histogram is based on gradient vectors, as shown in Figure 2. The upper 4 × 4 window is a zoomed-in illustration of a sub-block in the lower window. A rectangle in the upper window represents a pixel, and the associated arrow denotes the gradient vector of the pixel. Note that the direction of the gradient vector represents the local orientation of the pixel, and the length of the vector represents the magnitude of the local variation of pixel values.

A way to construct an orientation histogram is to accumulate the gradient magnitudes in the same direction, as illustrated in Figure 2 (the lower window) and Figure 3. In the lower window of Figure 2, an arrow represents the accumulated gradient magnitude of one direction. Similar to [3], we discretize the angular space from 0° to 360° in steps of 45°. Therefore we have 8 bins, as shown in Figure 3, in which to accumulate the gradient magnitudes. The algorithm takes the gradient magnitude and orientation of each pixel of the image and accumulates the gradient magnitude into the respective orientation bin. Each image is divided into 8 × 8 blocks, and each block yields 8 bins representing the 8 angles, populated with the gradient magnitudes.

For each pixel in a given block, the gradient magnitude and the orientation of the pixel are calculated: the magnitude is m = sqrt(dx² + dy²) and the orientation is the arc-tangent of dy/dx, where dx and dy are the horizontal and vertical pixel-value differences around the pixel. Once the gradient magnitude and orientation of a pixel are calculated, the magnitude is added to the corresponding bin of the block to which the pixel belongs. All the bins from all the blocks are then concatenated to obtain the complete image feature, which has a length of 8 times the number of blocks. Because the feature is extracted using every pixel of the image, most of the image detail is preserved in the feature, and the more information is preserved, the more distinct the feature is from other image features. The feature is finally normalized into a unit vector, which helps in computing the similarity between feature vectors. Once the feature vector for an image is obtained, similarity is measured as the vector dot product with the other image features.
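A compact sketch of this feature extraction, assuming a grayscale image divided into an 8 × 8 grid of blocks and central-difference gradients (one plausible reading of the description above; the paper's MATLAB/MEX implementation may differ in detail):

import numpy as np

def orientation_histogram(image, grid=8, bins=8):
    """Accumulate per-block gradient magnitudes into 8 orientation bins,
    concatenate all blocks, and L2-normalize to a unit feature vector."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)                   # vertical/horizontal differences
    magnitude = np.sqrt(gx**2 + gy**2)          # gradient magnitude per pixel
    angle = np.arctan2(gy, gx) % (2 * np.pi)    # orientation in [0, 2*pi)
    bin_idx = (angle / (2 * np.pi / bins)).astype(int) % bins  # 45-degree bins
    h, w = img.shape
    bh, bw = h // grid, w // grid
    feature = np.zeros(grid * grid * bins)
    for by in range(grid):
        for bx in range(grid):
            block = (slice(by * bh, (by + 1) * bh), slice(bx * bw, (bx + 1) * bw))
            hist = np.bincount(bin_idx[block].ravel(),
                               weights=magnitude[block].ravel(), minlength=bins)
            feature[(by * grid + bx) * bins:(by * grid + bx + 1) * bins] = hist[:bins]
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature

For a 256 × 256 image this yields 64 blocks × 8 bins = 512 values, and the similarity between two images is then the dot product of their unit feature vectors.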

The strategy used in the construction of SIFT [3], concatenating orientation histograms of multiple regions, has two advantages over using a single orientation histogram: i) it is more robust to inaccurate localization, and ii) it tends to preserve more information. The second advantage is somewhat subtle, since using a larger number of sub-regions not only increases the dimension of the low-level feature but also brings in more noisy information.

3.2 Gabor Filter Features

The Gabor filter has significant applications in signal processing [5], and it has also been shown that it can be applied in image processing for head pose identification [6], scene analysis [7], human identification [8], etc. It is also applied in biomedical physics and geophysics to better understand signals [9].

Gabor filters have been used in many applications, such as texture segmentation, target detection, fractal dimension management, fingerprint matching, edge detection, image coding and image reconstruction. A Gabor filter is a linear filter created by modulating a sinusoid with a Gaussian. Figures 4(a)-4(d) visualize the Gabor function with all parameters held constant except the orientation, which varies from −π to π.

g(x, y; λ, θ, φ, σ, γ) = exp(−(x′² + γ²y′²)/σ²) cos(2πx′/λ + φ)

where

x′ = x cos(θ) + y sin(θ)
y′ = −x sin(θ) + y cos(θ)

and the arguments x and y specify the position of a light impulse in the visual field, while σ, γ, λ, θ and φ are parameters as follows:

590

Page 4: [IEEE 2012 16th International Conference on Information Visualisation (IV) - Montpellier, France (2012.07.11-2012.07.13)] 2012 16th International Conference on Information Visualisation

Figure 2. Demonstration of a block of 4 × 4 sub-blocks used to compute an orientation histogram, where an arrow indicates a gradient vector.

Figure 3. An orientation histogram is an 8-dimensional vector, where each dimension represents a bin associated with a specific direction, from 0° to 315° in steps of 45°.

• σ is the standard deviation of the Gaussian factor and determines the (linear) size of its receptive field.

• λ specifies the wavelength of the cosine factor of the Gabor filter.

• θ specifies the orientation of the normal to the parallel stripes of the Gabor filter.

• φ is the phase offset of the cosine factor and determines the symmetry of the Gabor filter.

• γ is called the spatial aspect ratio and specifies the ellipticity of the Gaussian factor.
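Sampling this function on a discrete grid gives a usable mask directly; a sketch with the 8 × 8 mask size used later and illustrative parameter defaults:

import numpy as np

def gabor_mask(size=8, lam=4.0, theta=0.0, phi=0.0, sigma=2.0, gamma=0.5):
    """Sample g(x, y; lambda, theta, phi, sigma, gamma) on a size x size grid."""
    half = size / 2.0
    y, x = np.mgrid[-half:half, -half:half] + 0.5   # pixel-centered coordinates
    xp = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinate x'
    yp = -x * np.sin(theta) + y * np.cos(theta)     # rotated coordinate y'
    gaussian = np.exp(-(xp**2 + gamma**2 * yp**2) / sigma**2)
    return gaussian * np.cos(2 * np.pi * xp / lam + phi)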

The Gabor filter feature is likewise a low-level statistical representation of a local region, and the motivation for using a statistical representation again comes from the texture-dominant appearance of the images. Our implementation of the Gabor filter features consists of constructing a Gabor mask and sliding the mask over the image.

Figure 4(a) is the visualization of the Gabor function for an orientation of 0° with the other parameters held constant. The Gabor function is obtained as the product of the Gaussian and cosine functions, and the resulting function picks out the dominating orientations and areas of the image.

When constructing Gabor filter features we consider multiple scales and orientations.

Figure 4. Gabor function visualization from 0° to 135°: (a) visualization at 0°; (b) at 45°; (c) at 90°; (d) at 135°.

For each scale and orientation combination we create a mask and use it to obtain a coefficient by convolving the mask with an image block of the same size as the mask. The mask is an 8 × 8 matrix populated with double values obtained from the product of the Gaussian and cosine functions for a given scale and orientation. The scale can be any positive integer; we considered only 1 scale, though 3 is common. The orientation varies from 0° to 135° in steps of 45°, which gives 4 different orientations with which the image is sampled. In total, the image is sampled once for each combination of scale and orientation. The Gabor filter thus effectively samples the image with different scale and orientation combinations, obtaining a very distinctive feature for a given image.

Once the mask for a given scale and orientation is populated, the mask is slid over each block of the image, resulting in a set of coefficients from the convolution of the mask with the image blocks. All the coefficients resulting from the convolutions for all combinations of scales and orientations are concatenated to obtain the Gabor filter features. Since the Gabor filter is very close to the way the human vision system works, the image features obtained with this procedure are highly distinctive for a given image.
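Combining the mask above with the block-wise convolution described here gives the complete feature. A sketch, assuming one scale, the four 45° orientations, non-overlapping 8 × 8 blocks, and the gabor_mask helper from the previous sketch:

import numpy as np

def gabor_features(image, mask_size=8, orientations=(0, 45, 90, 135)):
    """One coefficient per (block, orientation): the block correlated with
    the Gabor mask.  Coefficients are concatenated and L2-normalized."""
    img = image.astype(np.float64)
    h, w = img.shape
    coeffs = []
    for deg in orientations:
        mask = gabor_mask(size=mask_size, theta=np.deg2rad(deg))
        for by in range(0, h - mask_size + 1, mask_size):   # slide block-by-block
            for bx in range(0, w - mask_size + 1, mask_size):
                block = img[by:by + mask_size, bx:bx + mask_size]
                coeffs.append(np.sum(block * mask))  # one convolution coefficient
    feature = np.asarray(coeffs)
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature

For a 256 × 256 image this gives 32 × 32 blocks × 4 orientations = 4096 coefficients.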

Figure 5 shows the setup for extracting the image features using the Gabor mask. The image is divided into a number of sub-blocks, each of a size equal to the Gabor masks, which are 8 × 8 pixels. A coefficient results when an image block is convolved with the Gabor mask, and this coefficient is used to populate the Gabor image feature. The complete image is convolved with each of the given Gabor masks, each obtained from a specific combination of scale and orientation. The features are populated with the coefficients obtained from the convolution of the image blocks with the Gabor masks.


Figure 5. Gabor features extraction from the image using the Gabor mask.

Figure 6. Spatial distribution of the data.


4 DIMENSION REDUCTION

4.1 Principal Component Analysis

Principal component analysis (PCA) is appropriate when you have obtained measures on a number of observed variables and wish to derive a smaller number of artificial variables (called principal components) that account for most of the variance in the observed variables. The principal components may then be used as predictor or criterion variables in subsequent analyses. PCA is a variable reduction procedure, useful when there is some redundancy in the data, that is, when some of the variables are correlated with one another, possibly because they measure the same construct. Because of this redundancy, it should be possible to reduce the observed variables to a smaller number of principal components.

Figure 7. Principal Components, Z1 and Z2

Figure 8. Projection of the data onto the first PC.

Specifically, given data in an m-dimensional space, PCA describes the location and shape of the m-dimensional data cloud. Two steps are involved: translation of the data cloud to the origin and rotation about it. Translation is done by mean-centering the data; if the data is not mean-centered, the PC axes describe not only the shape of the data but also its location. Rotation is done by aligning the first PC axis with the longest axis through the data set. A principal component can then be defined as a linear combination of optimally weighted observed variables.

Figure 6 visualizes the spatial distribution of sample data along X1 and X2. Each point represents a vector in the space. The data is distributed in the form of an ellipse to better demonstrate the principal components. The idea is to find the axis passing through the bulk of the data, called the major axis, and to define a minor axis perpendicular to it, which covers less of the data. As the major axis passes through the majority of the data, the data can be projected onto it while still representing the complete data set. Figure 7 visualizes the major and minor axes, Z1 and Z2. Figure 8 visualizes the projections of the data points onto the major axis.


4.1.1 Covariance

Covariance is always measured between two dimensions. If we calculate the covariance between one dimension and itself, we get the variance. So, for a 3-dimensional data set (x, y, z), measuring the covariance of x with x, y with y, and z with z would give the variances of the x, y and z dimensions respectively. The formula for covariance is:

cov(X) = Σ (Xi − X̄)(Xi − X̄)ᵀ / (n − 1), where the sum runs over i = 1, . . . , n

When a data set has more than two dimensions, there is more than one covariance that can be calculated. A useful way to get all the possible covariance values between the different dimensions is to calculate them all and put them in a matrix.
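As a concrete check, this matrix can be computed directly from the formula above and compared with NumPy's built-in; the data values here are arbitrary:

import numpy as np

# Rows are observations, columns are the dimensions (x, y, z).
data = np.array([[2.5, 2.4, 1.1],
                 [0.5, 0.7, 0.9],
                 [2.2, 2.9, 1.0],
                 [1.9, 2.2, 1.4]])
n = data.shape[0]
centered = data - data.mean(axis=0)          # subtract the mean of each dimension
cov = centered.T @ centered / (n - 1)        # 3x3 matrix of all pairwise covariances
assert np.allclose(cov, np.cov(data, rowvar=False))  # matches NumPy's built-in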

4.1.2 SVD of Covariance

Let xi, i = 1, . . . , n be a set of training data points, where n is the number of data points. The matrix X = [x1, x2, . . . , xn] is the data matrix. Let x̄ be the centroid of the training data points. By subtracting the centroid from each data point (i.e., the translation), we get the zero-mean data matrix X given as:

X = [x1 − x̄, x2 − x̄, . . . , xn − x̄]

Then we construct the covariance matrix

C = XXᵀ

where ᵀ denotes the transpose of a matrix. SVD is now applied to the covariance matrix C to obtain the eigen-decomposition of C. A projection matrix P consists of the set of eigenvectors associated with the largest eigenvalues. With the projection matrix P, we can obtain the subspace representation of a data point x in the original m-dimensional space as Px.

A set of images is taken and each is converted to a linear vector. The average of these vectors is computed, and a new set of vectors is computed as the differences between the original image vectors and the average vector. These vectors are then grouped into a single matrix, and the SVD is applied to it. By reshaping the rows of the resulting matrix, which are vectors, into the dimensions of the original image, we get the eigen brain images.
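A sketch of this eigen-brain computation, assuming all images share one shape; taking the SVD of the centered data matrix instead of forming the covariance matrix explicitly is a standard equivalent shortcut when the pixel count is large:

import numpy as np

def eigen_brains(images, num_components):
    """Flatten images, center them, and take the top eigenvectors of the
    covariance via SVD; each eigenvector reshapes into an 'eigen brain'."""
    shape = images[0].shape
    X = np.stack([im.ravel().astype(np.float64) for im in images], axis=1)
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                                  # zero-mean data matrix
    # The left singular vectors of Xc are the eigenvectors of C = Xc Xc^T,
    # so we never need to build the huge pixels x pixels covariance matrix.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    P = U[:, :num_components]                      # projection matrix
    eigenimages = [P[:, k].reshape(shape) for k in range(num_components)]
    project = lambda image: P.T @ (image.ravel() - mean.ravel())  # x -> Px
    return eigenimages, project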

The variance of the nth principal component is the nth eigenvalue. Therefore, the total variation exhibited by the data is equal to the sum of all eigenvalues. In the data library, eigenvalues are normalized so that the sum of all eigenvalues equals 1. A normalized eigenvalue then indicates the percentage of the total variance explained by its corresponding structure. Structures have also been normalized so that the root mean square equals 1; this way, the structures can be expressed in terms of standard deviations.

Singular values are equal to the square roots of the eigenvalues. Since eigenvalues are automatically normalized in the data library, they do not directly provide information about the total amount of variance they explain. However, the total variance explained by each EOF can be calculated by squaring the singular values.

In the data library there is a time series associated with each structure; these time series are the principal components. The first time series is calculated by projecting the data matrix onto the first eigenvector of the variance-covariance matrix of the data, the second by projecting onto the second eigenvector, and so on. The time series values indicate the amount of the given structure needed to reconstruct the data field. It follows that the structure (dimensionless) multiplied by the time series value at a single point in time (in the units of the data), summed over all structures, yields the original data at that point in time.

Using a superscript ᵀ to denote the transpose of a vector or matrix, we say two vectors x and y are orthogonal if

xᵀy = 0

In two or three dimensional space, this simply means that the vectors are perpendicular. Let A be a square matrix whose columns x are mutually orthogonal vectors of unit length, i.e.

xᵀx = 1

Then A is an orthogonal matrix and

AᵀA = I

the identity matrix. For simpler notation, assume that the matrix A has at least as many rows as columns (M ≥ N).

A singular value decomposition of an M × N matrix A is any factorization of the form

A = UDVᵀ

where U is an M × M orthogonal matrix, V is an N × N orthogonal matrix, and D is an M × N diagonal matrix in which all elements off the diagonal are zero.
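These definitions are easy to verify numerically; a short example (the matrix is random and purely illustrative):

import numpy as np

A = np.random.rand(6, 4)                        # M = 6 rows >= N = 4 columns
U, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros((6, 4))
np.fill_diagonal(D, s)                          # M x N, zeros off the diagonal
assert np.allclose(A, U @ D @ Vt)               # A = U D V^T
assert np.allclose(U.T @ U, np.eye(6))          # U^T U = I (orthogonal)
# Singular values are the square roots of the eigenvalues of A^T A:
assert np.allclose(np.sort(np.linalg.eigvalsh(A.T @ A))[::-1], s**2)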

5 EXPERIMENTS

5.1 Classifier

K Nearest Neighbor [10] (KNN from now on) is one of those algorithms that are very simple to understand but work incredibly well in practice. It is also surprisingly versatile, with applications ranging from vision to proteins to computational geometry to graphs.

KNN is a non-parametric lazy learning algorithm [10], meaning that it makes no assumptions about the underlying data distribution. This is quite useful, since in the real world most practical data does not obey the typical theoretical assumptions (e.g. Gaussian mixtures, linear separability, etc.). Non-parametric algorithms like KNN are of great use here.

It is also a lazy algorithm: it does not use the training data points to do any generalization. In other words,


there is no explicit training phase, or it is very minimal, which means the training phase is fast. The lack of generalization means that KNN keeps all the training data; more exactly, all the training data is needed during the testing phase. This is in contrast to other popular techniques such as Support Vector Machines, where all non-support vectors can be discarded without any problem. Most lazy algorithms, and KNN especially, make decisions based on the entire training data set (at best a subset of it). The dichotomy is obvious: a non-existent or minimal training phase but a costly testing phase, in terms of both time and memory. More time may be needed because, in the worst case, all data points take part in the decision; more memory is needed because all the training data must be stored.

KNN assumes that the data lies in a feature space; more exactly, the data points are in a metric space. The data can be scalars or multidimensional vectors. Since the points are in a feature space, they have a notion of distance. This need not be the Euclidean distance, although that is the one commonly used, and it is the one we used in our experiments. The training data consists of a set of vectors with a class label associated with each vector. In the simplest case there are just two labels, + and − (positive and negative classes), but KNN works equally well with an arbitrary number of classes. We also supply a single number "k", which decides how many neighbours (defined by the distance metric) influence the classification.
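A minimal KNN classifier along these lines, with Euclidean distance and a majority vote; this is an illustrative sketch, not the MEX-compiled implementation used in our experiments:

import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=1):
    """Label a query feature vector by the majority class of its k nearest
    training vectors under Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]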

In this section, we test the performance of the proposed features for stage recognition. Our dataset contains over 1000 human brain images obtained from the Harvard Whole Brain Atlas, covering 41 different cases of diseases related to the human brain. Figure 9 shows an example of the human brain images. These images have been standardized with respect to orientation and scale, and their dimensions are 256 × 256. The classifier we used is K nearest neighbor (K-NN), and we use 2-fold cross validation to estimate the recognition accuracy. We compare the recognition accuracy of RGB features, the orientation histogram, the subspace representation of the orientation histogram, and the Gabor features, and we also study their recognition efficiency.
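The 2-fold cross-validation estimate can then be sketched as follows, reusing the knn_predict sketch above; the shuffling and bookkeeping details are our assumptions:

import numpy as np

def two_fold_accuracy(X, y, k=1, seed=0):
    """Split the data in half, test each half against the other, average."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    halves = np.array_split(order, 2)
    correct, total = 0, 0
    for test_idx, train_idx in ((halves[0], halves[1]), (halves[1], halves[0])):
        for i in test_idx:
            if knn_predict(X[train_idx], y[train_idx], X[i], k=k) == y[i]:
                correct += 1
            total += 1
    return 100.0 * correct / total   # accuracy on the same 0-100 scale as the tables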

The experiments were performed on the MATLAB platform using the Image Processing toolkit. To attain higher performance, the classification algorithms were written in C and compiled with the MEX compiler, which enables using MATLAB tools inside C programs.

5.2 Orientation Histogram

The orientation histogram builds its features in a fashion very close to the SIFT features. Figure 11 shows that the orientation histogram classification accuracy was very close for all the nearest-neighbour values; the optimal accuracy was 88, under K-NN with K=1.

Figure 9. Visualization of a single individual’s brain, using an MRI scanner. Each image is a cross-sectional representation of the individual’s brain.

The orientation histograms are reduced using PCA with the number of PCs varying from 2 to 20. Figure 11 shows the optimal classification accuracy for the different settings. The PCA run time is considerably lower than that of the original orientation histogram, as the features are shorter. Although PCA is a reliable and robust technique, reducing the feature from 8 × 8 × 8 dimensions to fewer than 20 resulted in fewer accurate matches than the original orientation histogram: the classification operates on a reduced amount of image detail. The small number of training samples in the test also affects the classification accuracy.

Figure 10 visualizes the orientation histogram accuracies under different numbers of nearest neighbours along with the reduced-feature accuracies. The features are reduced from a length of 8 × 8 × 8 to the chosen number of principal components, which varies from 2 to 20. The accuracies after the reduction are lower than the original unreduced feature accuracies because image detail is lost; the dimension of the reduced features is significantly lower than that of the original features. As we can see, the brain images are surrounded by a dark region that varies from one image to another.


Figure 10. The orientation histograms reduced by PCA. Accuracy for the number of principal components from 2 to 20 and nearest neighbors from 1 to 5.

Figure 11. The orientation histograms for nearest neighbors from 1 to 5.

If an image contains a lot of dark region, its feature comparatively carries less information, and when the features are reduced, the dark region may dominate, which can result in a near-null/zero feature. Classifying these near-null features is very difficult. Figure 11 shows the optimal reduced-feature accuracies under K=1 nearest neighbors, and Figure 12 visualizes the accuracies of the optimal reduced features under different numbers of nearest neighbours.

Table 1 shows the accuracies of the original features against the reduced features under different k values of the KNN algorithm. Table 2 shows the time costs to finish the classification using the original features against the reduced features under the same k values.

The orientation histogram generally takes slightly longer to classify the images. We can safely ignore this extra time, as a considerable amount of time is needed to create the reduced features. The orientation histogram involves calculating the gradient magnitude and orientation of each pixel and then populating the correct slot of the histogram.

Figure 12. Accuracy comparison of orientation histograms, PCA optimal values when classification is performed under 1 nearest neighbor. Accuracy for PCs = 2 to 20.

Table 1. Accuracy comparison of orientation histogram and PCA at the optimal number of principal components, for numbers of neighbors from 1 to 5.

K   Orientation histogram   PCA on orientation histogram
1   88                      20 (2 PCs)
2   83                      19 (2 PCs)
3   83                      23 (2 PCs)
4   82                      24 (2 PCs)
5   79                      24 (2 PCs)

Table 2. Time cost comparison of orientation histogram and PCA at the optimal number of principal components, for numbers of neighbors from 1 to 5. Time costs are in milliseconds and include the time taken to generate the image features.

K   Orientation histogram   PCA of orientation histogram
1   1002                    11245 (2 PCs)
2   1134                    10264 (2 PCs)
3   1156                    10426 (2 PCs)
4   1242                    10918 (2 PCs)
5   1211                    10290 (2 PCs)

Each image sub-block populates its 8 bins, which are part of the larger complete image histogram. When the orientation histogram is reduced from length 512 to a length varying between 2 and 20 using the PCA algorithm, the classification time cost remains neutral, as the reduced feature lengths are similar. The orientation histogram outperforms the RGB features because it captures features that distinguish the image not only by appearance but also by the orientation of the pixels. Since more distinct information is fed into the histogram, the orientation histogram outperforms the RGB features


Figure 13. The Gabor features reduced by PCA. Accuracy for the number of principal components from 2 to 20 and nearest neighbors from 1 to 5.

Figure 14. The Gabor features for nearest neighbors from 1 to 5.

which depend solely on the appearance.

5.3 Gabor Filter Features

The Gabor features are built in a very different fashion from the RGB features and the orientation histogram, but in a very effective manner. Figure 14 shows that the Gabor feature classification accuracy was very close for all the nearest-neighbor values; the optimal accuracy was 83, under K-NN with K=1. The Gabor features are reduced using PCA with the number of PCs varying from 2 to 20. Figure 13 shows the classification accuracy of the reduced features for K values from 1 to 5. The PCA run time is considerably lower than that of the Gabor features, as the features are shorter. Dimension reduction shrinks the feature to a length varying between 2 and 20, which resulted in fewer accurate matches than the original features: the classification operates on a reduced amount of image detail. The small number of training samples in the test also affects the classification accuracy.

Table 3 shows the accuracies of the original Gabor features against the reduced features under different k values of the KNN algorithm.

Figure 15. Accuracy comparison of Gabor image features, PCA optimal values when classification is performed under 5 nearest neighbors. Accuracy for PCs = 2 to 20.

Table 3. Accuracy comparison of Gabor features and PCA of Gabor features at the optimal number of principal components, for numbers of neighbors from 1 to 5.

K   Gabor   PCA of Gabor features
1   83      16 (2 PCs)
2   79      15 (2 PCs)
3   81      16 (2 PCs)
4   77      14 (2 PCs)
5   75      12 (2 PCs)

Table 4. Time cost comparison of Gabor features and PCA at the optimal number of principal components, for numbers of neighbors from 1 to 5. Time costs are in milliseconds and include the time taken to generate the image features.

K   Gabor   PCA of Gabor features
1   60234   13245 (17 PCs)
2   60244   15324 (11 PCs)
3   61039   12456 (12 PCs)
4   62372   13958 (4 PCs)
5   61231   13290 (13 PCs)

Table 4 shows the time costs to finish the classification using the original features against the reduced features under the same k values.

Figure 15 visualizes the Gabor features under different numbers of nearest neighbours along with the reduced-feature accuracies. The features are reduced from a length of 20 × 15 × scales × orientations × blocks to the chosen number of principal components, which varies from 2 to 20. The accuracies after the reduction are lower than the original unreduced feature accuracies because image detail is lost; the dimension of the reduced features is drastically lower than that of the original features. Figure 14 shows the optimal feature accuracies under different numbers of nearest neighbours, and Figure 15 visualizes the accuracies of the reduced features under different numbers of nearest neighbours.


Table 5. Accuracy comparison of hybrid, orientation histogram and Gabor features.

K   Hybrid   Orientation histogram   Gabor
1   87       88                      83
2   80       83                      79
3   80       83                      81
4   77       82                      77
5   75       79                      75

Table 6. Time cost comparison of hybrid, orientation histogram and Gabor features.

K   Hybrid   Orientation histogram   Gabor
1   62212    1002                    60234
2   62847    1134                    60244
3   63077    1156                    61039
4   62895    1242                    62372
5   62857    1211                    61231



5.4 Hybrid Features

The hybrid features are derived from the orientation histograms and the Gabor features. As we have seen in the previous sections, each image is run through the feature extraction techniques and a feature is obtained representing the image. As a rule of thumb, a well-distinguished set of features provides the best classification environment, and the idea of the hybrid feature is to make the feature more distinctive. We used simple feature concatenation, appending an orientation histogram feature to the end of each Gabor feature. In our experiments the Gabor feature length was 4096 and the orientation histogram length was 512. By simply combining the two features, the classification accuracy was 87, compared to 83 when only Gabor features were used; note that the hybrid feature is dominated by the Gabor feature. We also applied the PCA dimension reduction technique to the hybrid feature.
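The concatenation itself is a one-liner; a sketch reusing the gabor_features and orientation_histogram sketches from earlier, where the final re-normalization is our assumption rather than something specified above:

import numpy as np

def hybrid_feature(image):
    """Append the orientation histogram to the end of the Gabor feature."""
    gabor = gabor_features(image)            # length 4096 for a 256x256 image
    hist = orientation_histogram(image)      # length 512
    feature = np.concatenate([gabor, hist])  # Gabor first, SIFT-style histogram after
    norm = np.linalg.norm(feature)           # re-normalize to a unit vector (assumption)
    return feature / norm if norm > 0 else feature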

Table 5 and Figure 16 show the comparison of all thedifferent features. Table 6 shows the computation costsof all the different features.

The effects of brain images not being aligned correctly, and of normalization of the image with respect to uprightness, clearly play a major role in the classification of the images.

Figure 16. Accuracy comparison of hybrid, orientation histogram and Gabor features.

Considering alignment, an image that is not properly aligned produces an image feature whose values differ from those of a properly aligned image. As the features are populated with respect to the image blocks, any variation affects the location of the vector in the space, giving an inappropriate result. An image incorrectly normalized with respect to head and neck will produce a feature which is the reverse of what the correctly normalized image would have produced.

6 CONCLUSION AND FUTURE WORK

Using the orientation histogram features alone provides a good rate of accuracy. The performance can be improved further by applying the PCA dimensionality reduction technique to the orientation histogram features and the Gabor features and combining them into a hybrid feature. Training the algorithm with a sufficient number of image samples may improve the classification accuracy of the reduced orientation histogram features. The human brain looks very similar under different disease conditions, with variations in only a part of the brain; when the features are reduced, this distinctiveness is lost and the reduced feature is dominated by the common regions, which may explain the poor performance of the PCA-reduced features. The subspace representation of the orientation histogram features considerably reduces the computation cost.

In the future, we plan to study stage classification under unconstrained localization (i.e., without the assumption of orientation- and scale-normalized human brain images).

REFERENCES

[1] S. Kumar, K. Jayaraman, S. Panchanathan, R. Gurunathan, A. Marti-Subirana and S.J. Newfeld, BEST: A novel computational approach for comparing gene expression patterns from early stages of Drosophila melanogaster development, Genetics, Vol. 162(4), 2037-2047, 2002.

[2] Q. Li, J. Ye, M. Li, and C. Kambhamettu, Adaptive appearance based face recognition, IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 677-684, 2006.

[3] D. Lowe, Distinctive image features from scale-invariant keypoints, IJCV, Vol. 60(2), 91-110, 2004.

[4] J. Ye, J. Chen, Q. Li, and S. Kumar, Classification of Drosophila embryonic developmental stage range based on gene expression pattern images, Computational Systems Bioinformatics Conference (CSB2006), 2006.

[5] O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE Signal Processing Magazine.

[6] L.L. Shen, L. Bai and M. Fairhurst, Gabor wavelets and general discriminant analysis for face identification and verification, Image and Vision Computing.

[7] L. Itti, C. Koch and E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002.

[8] Y. Zhu, T. Tan and Y. Wang, Biometric personal identification based on iris patterns, IEEE Computer Society, 2000.

[9] G.F. Margrave and M.P. Lamoureux, Gabor deconvolution, CSEG Annual Meeting, Expanded Abstract, 2002.

[10] M.L. Zhang and Z.H. Zhou, ML-KNN: A lazy learning approach to multi-label learning, Elsevier, 2007.

[11] H. Yamada, K. Yamamoto and K. Hosokawa, Directional mathematical morphology and reformalized Hough transformation for the analysis of topographic maps, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993.