Final Report Indian Academy of Sciences Summer Fellowship


Eight-Week Report

Registration / Application No: ENGS2229S

Name: Amogha P

Institution: National Institute of Technology, Surathkal

Abstract:

Alzheimer's disease (AD) is one of the major health problems confronting the world today. According to a recent study, 1 out of 8 Americans suffers from Alzheimer's disease. The situation is also serious in developing countries such as India: according to a study by Alzheimer's Disease International, 58% of people with dementia currently live in developing countries, and this share is projected to rise to 71% by 2050. A reliable diagnostic method is therefore needed so that AD can be identified. The only way to confirm AD definitively is autopsy of the brain, so in vivo methods of identifying AD are required. The present study aims to identify AD with the help of a support vector machine (SVM). Axial MRI images of 302 controls, 54 MCI patients and 62 AD patients, in which hippocampal atrophy was visible, were used. The images were segmented into four regions (grey matter, white matter, skull, and voids comprising atrophy and background) using an improved fuzzy c-means method, and the segmented images were then classified with a support vector machine, achieving an accuracy of 79.18%. A better classifier, a hidden neural network, was therefore studied, and with it an accuracy of 85.56% was achieved. Since this is a three-class pattern classification problem, this accuracy is very good.


Table of Contents

1. Introduction
2. Methodology and Discussion
   a. Preprocessing
   b. Image Segmentation
   c. Training
   d. Testing
3. Results
   a. Support Vector Machine
   b. Hidden Neural Network
4. Conclusion
5. References


Introduction

Alzheimer's disease is one of the most common neurodegenerative diseases[1]. However, the disease can be confirmed only after autopsy of the brain; hence, there is a need for an in vivo method for the diagnosis of Alzheimer's disease[1,2]. The imaging methods presently used are MRI, MRS, PET, CT, etc. However, diagnosis is based on atrophy ratings or visual features assessed by an expert, which introduces bias, as each expert may rate atrophy differently. Hence, there is a need for an automated method to diagnose Alzheimer's disease. The automated method I am trying to implement identifies Alzheimer's disease from the MRI image of the test subject. The problem to be solved here is a pattern recognition problem, and many classifiers are available in the literature. The classifier I have used is the support vector machine, because in this study the sample size (the number of subjects on whom MRI was done, 400) is much smaller than the dimensionality of each sample (the number of pixels in the image, 256*256).

This is a classic problem of pattern recognition. Hence, naturally it involves these steps[3]:

1. Preprocessing of the MRI image

2. Segmentation of Image

3. Training the classifier from MRI images

4. Validation

Preprocessing is done to remove noise and improve the contrast of the image. The information in the image must be usable for SVM-based classification; hence, segmentation into the different anatomical parts of the brain is a way to extract features for the SVM. These segmented images are used to train the SVM to distinguish between AD and control subjects. Validation is done to test the accuracy of the SVM.


Methodology

1. Preprocessing

The MRI images used were obtained from www.oasis-brains.org, which provides MRI images of 416 subjects. The subjects were also administered the MMSE (Mini Mental State Exam), and a CDR (Clinical Dementia Rating) is available for each subject; a CDR of 1 and above indicates probable Alzheimer's disease. The images were processed using a standard procedure, by applying the inverse Fourier transform and connected component analysis. For my study, I chose the coronal image of the brain, as medial temporal atrophy can be detected in this view. Histogram equalization was applied to enhance the contrast of the image; however, adaptive histogram equalization was found to outperform ordinary histogram equalization.

Histogram Equalization:

Histogram equalization is a common image contrast enhancement method. Consider a discrete grayscale image {x} and let n_i be the number of occurrences of gray level i. The probability of occurrence of a pixel of level i in the image is

p_x(i) = n_i / n, 0 <= i < L,

where L is the total number of gray levels in the image, n is the total number of pixels in the image, and p_x(i) is in fact the image's histogram for pixel value i, normalized to [0,1].

The cumulative distribution function corresponding to p_x is

cdf_x(i) = sum_{j=0}^{i} p_x(j),

which is also the image's accumulated normalized histogram.

A transformation of the form y = T(x) produces a new image {y} such that its CDF is linearized across the value range, i.e.


cdf_y(i) = i * K

for some constant K. The properties of the CDF allow us to perform such a transform; it is defined as

y = T(x) = cdf_x(x).

In order to map the values back into their original range, the following simple transformation needs to be applied to the result[4]:

y' = y * (max{x} - min{x}) + min{x}.
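For illustration only (this is not the code used in the report, whose implementation is unspecified), the transform above can be sketched in Python/NumPy for an 8-bit grayscale image by building the normalized histogram, its CDF and a lookup table:

```python
import numpy as np

def equalize_hist(img, n_levels=256):
    """Global histogram equalization of a uint8 grayscale image (illustrative sketch)."""
    hist = np.bincount(img.ravel(), minlength=n_levels)    # n_i: occurrences of each gray level
    p = hist / img.size                                     # p_x(i): normalized histogram
    cdf = np.cumsum(p)                                      # cdf_x(i): accumulated histogram
    lut = np.round(cdf * (n_levels - 1)).astype(np.uint8)   # map levels through the CDF and rescale
    return lut[img]
```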

Adaptive Histogram Equalization (CLAHE):

CLAHE differs from ordinary adaptive histogram equalization in its contrast limiting. This

feature can also be applied to global histogram equalization, giving rise to contrast-limited

histogram equalization (CLHE), which is rarely used in practice. In the case of CLAHE, the

contrast limiting procedure has to be applied for each neighborhood from which a transformation

function is derived. CLAHE was developed to prevent the overamplification of noise that

adaptive histogram equalization can give rise to.

This is achieved by limiting the contrast enhancement of AHE. The contrast amplification in the

vicinity of a given pixel value is given by the slope of the transformation function. This is

proportional to the slope of the neighbourhood cumulative distribution function (CDF) and

therefore to the value of the histogram at that pixel value. CLAHE limits the amplification by

clipping the histogram at a predefined value before computing the CDF. This limits the slope of

the CDF and therefore of the transformation function. The value at which the histogram is

clipped, the so-called clip limit, depends on the normalization of the histogram and thereby on

the size of the neighbourhood region. Common values limit the resulting amplification to

between 3 and 4. It is advantageous not to discard the part of the histogram that exceeds the clip

limit but to redistribute it equally among all histogram bins. The redistribution will push some

bins over the clip limit again, resulting in an effective clip limit that is larger than the prescribed limit; the exact value of the effective limit depends on the image. If this is undesirable, the redistribution procedure can be repeated recursively until the excess is negligible[5].
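As a hypothetical illustration of how CLAHE might be applied (the report does not say which implementation was used; the library choice and file name below are assumptions), scikit-image exposes a contrast-limited adaptive equalization routine:

```python
from skimage import exposure, io

# Hypothetical input file; equalize_adapthist performs CLAHE, with clip_limit
# bounding the local contrast amplification.
img = io.imread("coronal_slice.png", as_gray=True)
img_clahe = exposure.equalize_adapthist(img, clip_limit=0.02)
```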


When normal histogram equalization was applied to an image with larger voids, the contrast equalization made the white matter intensity dull, which could cause problems in segmentation. Hence, adaptive histogram equalization was used. Figures 1(a) and 1(b) show the results of both methods; adaptive enhancement performed better and was therefore applied.

Fig. 1: Preprocessed images. 1(a): image equalized by normal histogram equalization. 1(b): image enhanced by adaptive histogram equalization.

Noise Removal:

After image contrast adjustment the noise gets boosted; hence, noise removal is important for proper segmentation. Noise removal was done with an adaptive Wiener filter. The Wiener filter estimates the local mean and variance around each pixel[6]:

mu = (1 / (N*M)) * sum_{(n1,n2) in eta} a(n1, n2)

sigma^2 = (1 / (N*M)) * sum_{(n1,n2) in eta} a(n1, n2)^2 - mu^2

where eta is the N-by-M local neighborhood of each pixel in the image A. A pixelwise Wiener filter is then created using these estimates:

b(n1, n2) = mu + ((sigma^2 - nu^2) / sigma^2) * (a(n1, n2) - mu)

where nu^2 is the noise variance. The noise variance is estimated as the average of all the locally estimated variances.
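A minimal sketch of this denoising step, assuming SciPy's adaptive Wiener filter and a 5x5 neighbourhood (the report does not state the window size actually used):

```python
import numpy as np
from scipy.signal import wiener

# img: contrast-enhanced grayscale slice as a float array (e.g. the CLAHE output above).
# wiener() estimates the local mean and variance in each 5x5 window and applies the
# pixelwise filter b(n1, n2) described above.
denoised = wiener(np.asarray(img, dtype=float), mysize=(5, 5))
```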


2. Segmentation of Image

The FCM algorithm assigns pixels to each category by using fuzzy memberships. Let X = {x_1, x_2, ..., x_N} denote an image with N pixels to be partitioned into c clusters, where x_j represents the multispectral (feature) data. The algorithm is an iterative optimization that minimizes the cost function defined as

J = sum_{j=1..N} sum_{i=1..c} u_ij^m * ||x_j - v_i||^2,

where u_ij represents the membership of pixel x_j in the ith cluster, v_i is the ith cluster center, || || is a norm metric, and m is a constant. The parameter m controls the fuzziness of the resulting partition, and m = 2 is used in this study.

The cost function is minimized when pixels close to the centroid of their cluster are assigned high membership values, and low membership values are assigned to pixels far from the centroid. The membership function represents the probability that a pixel belongs to a specific cluster. In the FCM algorithm, this probability depends solely on the distance between the pixel and each individual cluster center in the feature domain. The membership functions and cluster centers are updated as follows[7]:

u_ij = 1 / sum_{k=1..c} ( ||x_j - v_i|| / ||x_j - v_k|| )^(2/(m-1))

v_i = ( sum_{j=1..N} u_ij^m * x_j ) / ( sum_{j=1..N} u_ij^m )


Starting with an initial guess for each cluster center, the FCM converges to a solution for vi

representing the local minimum or a saddle point of the cost function. Convergence can be

detected by comparing the changes in the membership function or the cluster center at two

successive iteration steps.
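A minimal NumPy sketch of these update rules, applied to a flattened intensity image with c = 4 clusters as in this study (the initialization, tolerance and iteration cap below are assumptions, not values from the report):

```python
import numpy as np

def fcm(x, c=4, m=2.0, n_iter=100, tol=1e-4, seed=0):
    """Standard fuzzy c-means on a 1-D feature vector x (e.g. pixel intensities)."""
    rng = np.random.default_rng(seed)
    v = rng.choice(x, size=c, replace=False)               # initial cluster centers
    for _ in range(n_iter):
        d = np.abs(x[None, :] - v[:, None]) + 1e-12         # ||x_j - v_i||, shape (c, N)
        u = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1)), axis=1)
        v_new = (u ** m @ x) / np.sum(u ** m, axis=1)       # cluster-center update
        if np.max(np.abs(v_new - v)) < tol:                 # stop on small center change
            v = v_new
            break
        v = v_new
    return u, v
```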

However, this method treats the whole image as a column vector and ignores spatial structure. In an MRI it is a known fact that the white matter forms a connected set, as does the grey matter. Hence, a small change was made to the algorithm to account for this: the membership matrix is convolved with a 5x5 ones matrix, and the update rule is modified so that the memberships of the neighbouring pixels are added to that of the given pixel, which encourages connectivity[8].

This spatial relationship is important in clustering, but it is not utilized in a standard FCM algorithm. To exploit the spatial information, a spatial function is defined as

h_ij = sum_{k in NB(x_j)} u_ik,

where NB(x_j) represents a square window centered on pixel x_j in the spatial domain. A 5x5 window was used throughout this work. Just like the membership function, the spatial function h_ij represents the probability that pixel x_j belongs to the ith cluster. The spatial function of a pixel for a cluster is large if the majority of its neighborhood belongs to the same cluster. The spatial function is incorporated into the membership function as follows:

u'_ij = (u_ij^p * h_ij^q) / sum_{k=1..c} (u_kj^p * h_kj^q),

where p and q are parameters to control the relative importance of both functions. In a

homogenous region, the spatial functions simply fortify the original membership, and the

clustering result remains unchanged. However, for a noisy pixel, this formula reduces the

weighting of a noisy cluster by the labels of its neighboring pixels. As a result, misclassified

pixels from noisy regions or spurious blobs can easily be corrected. The spatial FCM with parameters p and q is denoted sFCM_{p,q}. Note that sFCM_{1,0} is identical to the conventional FCM.


The clustering is a two-pass process at each iteration. The first pass is the same as that in

standard FCM to calculate the membership function in the spectral domain. In the second pass,

the membership information of each pixel is mapped to the spatial domain, and the spatial

function is computed from that. The FCM iteration proceeds with the new membership that is

incorporated with the spatial function. The iteration is stopped when the maximum difference

between two cluster centers at two successive iterations is less than a threshold (0.02). After the

convergence, defuzzification is applied to assign each pixel to a specific cluster for which the

membership is maximal[8].
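One way the spatial function and the combined membership could be sketched (assuming the memberships u are stored as a (c, N) array for an image of the given 2-D shape; this is an illustration, not the author's code):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_fcm_step(u, shape, p=1, q=1):
    """One spatial weighting pass of sFCM_{p,q}: accumulate memberships over a 5x5
    window and fold them back into the membership matrix."""
    c = u.shape[0]
    h = np.empty_like(u)
    for i in range(c):
        # uniform_filter gives the 5x5 local mean, which is proportional to the 5x5 sum;
        # the constant factor cancels in the normalization below.
        h[i] = uniform_filter(u[i].reshape(shape), size=5).ravel()
    w = (u ** p) * (h ** q)
    return w / w.sum(axis=0, keepdims=True)   # renormalize so memberships sum to 1 per pixel
```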

Fig. 2: 2(a) original image; 2(b) segmented by fuzzy c-means; 2(c) segmented by the improved fuzzy c-means algorithm. The arrows in the third image indicate the atrophy (medial temporal atrophy). The second segmentation method thus outperforms the first.


3. Classifier Training:

1. Support Vector machine:

Support vector machines are among the most common pattern recognition methods used in machine learning. The SVM can be considered a successor to logistic regression because it is based on maximizing class separability.

Mathematical Basis of Support Vector Machine:

a. An Intuition:

The support vector machine aims at maximizing the distance between the vectors of the two classes and the discriminatory function. Hence the method is less prone to errors in the testing phase than a method such as logistic regression, which only maximizes the expected likelihood of the training data. Thus, support vector machines are called margin maximizers[9].

b. Mathematical definition

Problem:

Given vectors x_i with corresponding labels y_i, i = 1, ..., l, where the label for a given vector is provided by a trusted source, we need to find the underlying probability distribution P(x, y), assuming that the data are IID (independently drawn and identically distributed).

Solution:

Consider a machine, or simply a function, f(x, alpha), where alpha is the training parameter. The expectation of the test error is

R(alpha) = integral of (1/2) * |y - f(x, alpha)| dP(x, y).


R(alpha) cannot be calculated, as P(x, y) is not known, so a bound is to be found such that the bound approximates R(alpha).

Consider the empirical risk over the l training samples,

R_emp(alpha) = (1 / (2l)) * sum_{i=1..l} |y_i - f(x_i, alpha)|.

Now it can be proved that, with probability 1 - eta,

R(alpha) <= R_emp(alpha) + sqrt( (h * (log(2l/h) + 1) - log(eta/4)) / l ).

We need to find h such that the right-hand side is close to the left-hand side; then R(alpha) can be estimated. h is called the VC dimension and is an integer, so a perfect fit cannot be described, but a best fit is obtained by varying the VC dimension.

Linear SVM:

It is one of the most important models used in neuro-imaging applications, as the sample size is smaller than the dimensionality of the data in many cases.

Mathematical Basis:

If w defines the hyperplane, then since the hyperplane (the discriminatory function) is linear, w . x + b = 0, we must have for every training sample

y_i * (w . x_i + b) - 1 >= 0.

The optimum w is decided by optimizing the "margin", i.e., the distance between the two parallel hyperplanes w . x + b = +1 and w . x + b = -1 on which the closest points of the two classes lie. The distance optimization gives a margin of 2 / ||w||; hence ||w||^2 needs to be minimized to maximize the distance[10].

However, doing this optimization directly is difficult. Hence, convex optimization tools are used to obtain the solution. One approach commonly followed is Lagrange multipliers.


Lagrange Multipliers[11]:

Lagrange multipliers are used in optimization problems of the form: minimize f(x, y) subject to g(x, y) = 0.

Consider a two-dimensional problem to be solved by Lagrange multipliers. z = f(x, y) is a surface in three dimensions; imposing g(x, y) = 0 restricts it to a constrained curve. Differentiating along the constraint with respect to x, we get

df/dx = ∂f/∂x + (∂f/∂y)(dy/dx) = 0.   (1)

Consider the two-dimensional tangent vector to the constraint curve, T = (1, dy/dx), and the gradient vector grad f = (∂f/∂x, ∂f/∂y). Equation (1) can be written as grad f . T = 0. Since T is tangent to the constraint curve, we also have grad g . T = 0. This means that grad f and grad g are parallel to each other, which can equivalently be written as

grad f = lambda * grad g,

together with the constraint g(x, y) = 0. Solving these three equations simultaneously gives the value of the Lagrange multiplier lambda; substituting this value back into the same set of equations gives the values of x and y, i.e. the point at which the constrained minimum occurs and the corresponding minimum value.
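As a hypothetical worked example of this procedure (not taken from the report), minimizing f(x, y) = x^2 + y^2 subject to x + y = 1 can be solved symbolically; the use of sympy here is an assumption purely for illustration:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2          # objective to minimize
g = x + y - 1            # constraint g(x, y) = 0

# Stationarity of L = f - lam*g encodes grad f = lam * grad g, plus the constraint itself.
L = f - lam * g
eqs = [sp.diff(L, x), sp.diff(L, y), g]
sol = sp.solve(eqs, (x, y, lam), dict=True)
print(sol)               # x = y = 1/2, lam = 1 -> constrained minimum at (1/2, 1/2), value 1/2
```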

Now, we can use the concept of Lagrange multipliers shown above to solve the problem of ||w||

minimization. Here, we write the equation in terms of Lagrange multiplier αi, where i=1,2,...l


L_P = (1/2) * ||w||^2 - sum_{i=1..l} alpha_i * y_i * (x_i . w + b) + sum_{i=1..l} alpha_i,

where L_P is the (primal) Lagrangian and alpha is the Lagrange multiplier vector.

The solution of this equation is obtained by quadratic programming. The points for which alpha_i > 0 are called support vectors; these points lie on the two hyperplanes H1 and H2. The points on these hyperplanes are important because even if the points not on the hyperplanes (points far away from the discriminatory function) are removed, the discriminatory function does not change; however, if the points on the hyperplanes are removed, the discriminatory function changes. Hence, in feature selection it is important to make sure that the feature matrix does not exclude the support vectors[12].

Figure: the two dotted lines are the hyperplanes H1 and H2 on which the support vectors lie; margin maximization yields the support vectors and the classifier.

The method discussed above is theoretical; it is an analytical way of solving for the solution. However, once the sample size becomes large it is hard to solve the equations analytically: as seen from the example, even a two-variable problem requires the simultaneous solution of three equations, and there is no guarantee that the equations are linear. Hence, an algorithmic, iterative method is required to solve for the solution[13].


Dimensionality reduction:

To apply the SVM algorithm effectively, the dimensionality has to be reduced, since the feature matrix has very high dimensionality when an image is used directly. Therefore PCA (principal component analysis) is applied to the data to reduce the dimension. PCA projects the data onto the eigenvectors of the covariance matrix and discards the components whose variance falls below a threshold, retaining only the directions of highest variance. LDA (linear discriminant analysis) is then applied to align the data so as to give greater class separability[14].
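A rough sketch of this reduction step, assuming scikit-learn (the report does not name its tooling); X is the matrix of flattened segmented images and y the class labels, and the 400 components match the dimensionality quoted in the Results section:

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# X: (n_subjects, n_pixels) feature matrix, y: labels {0: control, 1: MCI, 2: AD}.
pca = PCA(n_components=400)                         # keep the 400 highest-variance components
X_pca = pca.fit_transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)    # at most n_classes - 1 discriminant axes
X_feat = lda.fit_transform(X_pca, y)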

2. Hidden Neural Network

HMMs are based on a number of assumptions that limit their classification abilities. Combining

the CHMM framework with neural networks can lead to a more flexible and powerful model for

classification. The basic idea of the HNN presented here is to replace the probability parameters

of the CHMM by state-specific multilayer perceptrons that take the observations as input. Thus,

in the HNN, it is possible to assign up to three networks to each state: (1) a match network

outputting the “probability” that the current observation matches a given state, (2) a transition

network that outputs transition “probabilities” dependent on observations, and (3) a label

network that outputs the probability of the different labels in this state[15].

The HNN gives very good results for problems of various sizes. In this problem we have data from 416 subjects; 60% of the data was used for training (selected using a random seed), 15% for validation and 25% for testing. The random selection was repeated multiple times to reduce bias in the result, and the median of the results was taken as the final result.

To validate the results, the number of hidden neurons was varied over 10, 15 and 20, and the best neural network was chosen. The network was trained by gradient-descent fitting, subject to iteration-count and minimum-error-change stopping constraints. The neural network thus formed generalizes well, and hence the test results are very good.
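The report does not give the implementation used. As a rough illustration of the split-and-select protocol described above, a simple feedforward classifier could be set up as follows; the scikit-learn MLP stands in for the HNN's state-specific networks, so this is an analogy under stated assumptions, not the actual model:

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# X_feat: reduced feature matrix, y: labels {0: control, 1: MCI, 2: AD}.
# The 60/15/25 split and hidden sizes 10/15/20 follow the text; the solver and iteration cap are assumptions.
X_train, X_tmp, y_train, y_tmp = train_test_split(X_feat, y, train_size=0.60, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.625, random_state=0, stratify=y_tmp)

best = None
for n_hidden in (10, 15, 20):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), solver='sgd', max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    score = net.score(X_val, y_val)          # select the architecture on the validation set
    if best is None or score > best[0]:
        best = (score, n_hidden, net)

print("best hidden size:", best[1], "test accuracy:", best[2].score(X_test, y_test))
```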


3. Results:

As discussed in the previous sections, pattern classification was carried out with two models: the support vector machine and the hidden neural network.

a. Support Vector Machine:

The image data was preprocessed and its dimensionality was reduced to 400, as described in the previous sections. The training was done in two ways: a twofold approach and an n-fold approach.

1. Twofold approach:

Here the image data was randomly divided into two data sets; support vector training was done with one of them and testing with the remaining data set. Hence, 50% of the data was used for training and 50% for testing.

2. n-fold approach:

Here the image data was randomly divided into n data sets, with n = 10. n-1 data sets were used for training and the remaining data set for testing. This process was repeated over the n combinations of training and testing sets, and the best model out of the n models was chosen as the support vector machine.

Method   Accuracy (%)   Comments
2-fold   75.46          Lower accuracy, as the training sample is small
n-fold   79.18          Greater accuracy due to cyclic model selection and a larger training set

Hence, the support vector machine model was able to classify the images as control, MCI and AD with an accuracy of 79.18%, which is good for a three-class problem.
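For illustration, an n-fold evaluation of this kind could be run as below, assuming scikit-learn, the reduced features X_feat from the earlier sketch, and a linear kernel (the report does not state the kernel used); note that cross_val_score reports all fold accuracies rather than selecting the best fold's model as the report does:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# 10-fold cross-validation of a linear SVM on the reduced features.
svm = SVC(kernel='linear', C=1.0)
scores = cross_val_score(svm, X_feat, y, cv=10)
print("per-fold accuracy:", scores, "mean accuracy:", scores.mean())
```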


b. Hidden Neural Network:

The hidden neural network was trained with:

a. a 60% training sample

b. a 70% training sample

The number of hidden neurons was taken as:

a. 10

b. 15

c. 20

Training and testing were done for these combinations, keeping the validation sample at 15% and the test sample at 25% and 15% respectively; the best results were obtained for 70% training with 15 hidden neurons. To confirm the results, the training was redone with different random data selections to ensure robustness, and the accuracies obtained were 93.76%, 87.98% and 85.5% for the three random training/testing splits. The model with the lowest accuracy was chosen, to eliminate bias in the data: by choosing favourable data points for training, a high accuracy can always be achieved, but what matters more is obtaining good accuracy for any data point or image. Hence, the model with the lowest accuracy was chosen.

Figure: graph showing the reduction in error with increasing epochs for the training, validation and test sets; the best network is chosen.


Figure: gradient and validation checks at the last epoch.

Figure: complete confusion matrix for the model (class 1: control, class 2: MCI, class 3: AD patient).
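Continuing the hypothetical scikit-learn sketch from the previous section, a confusion matrix over the three classes could be produced for the held-out test set like so (variable names follow that sketch, not the report's code):

```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Predictions of the selected network on the held-out test set.
y_pred = best[2].predict(X_test)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
ConfusionMatrixDisplay(cm, display_labels=["Control", "MCI", "AD"]).plot()
```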


4. Conclusion:

The method discussed above gives a very good accuracy of 85.5%. This means we can effectively diagnose AD and also track its progression, since the method also classifies subjects into the MCI category. Identifying MCI is very important in AD, as most MCI patients, about 80%, convert to AD within a period of 5 years; hence precautionary steps can be taken for such patients[16].

This method could be further improved with a region-of-interest based approach, for example using only the hippocampal area. However, such an approach requires high-quality MRI images. In this work I used images with a resolution of 176x176, in which the hippocampus covers a mere 10x10 area; the results would not be good because segmentation is not possible for such low-resolution images, as a clear boundary cannot be drawn. If higher-resolution images are available, this approach could be tried[17].


5. References:

[1]Mayeux R. Early Alzheimer's disease. N Engl J Med. 2010 Jun 10;362(4):2194-2201

[2] P. Scheltens, “Early diagnosis of dementia: neuro-imaging,” J Neuro, vol. 246, pp. 16-20,

1999.

[3] Evanthia E. et al., "A supervised method to assist the diagnosis of Alzheimer's Disease based on functional Magnetic Resonance Imaging," Proceedings of the 29th Annual International Conference of the IEEE EMBS, Cité Internationale, Lyon, France, pp. 3426-3428, August 23-26, 2007.

[4] http://en.wikipedia.org/wiki/Histogram_equalization, accessed on 25th of July, 2012

[5] Stephen M. Pizer, E. Philip Amburn, John D. Austin, Robert Cromartie, Ari Geselowitz,

Trey,Greer, Bart ter Haar Romeny, John B. Zimmerman, Karel Zuiderveld, Adaptive histogram

equalization and its variations, Computer Vision, Graphics, and Image Processing, Volume 39,

Issue 3, September 1987, Pages 355-368

[6] A. Khireddine, K. Benmahammed, W. Puech, Digital image restoration by Wiener filter in

2D case, Advances in Engineering Software, Volume 38, Issue 7, July 2007, Pages 513-516,

ISSN 0965-9978

[7] Dao-Qiang Zhang, Song-Can Chen, A novel kernelized fuzzy C-means algorithm with

application in medical image segmentation, Artificial Intelligence in Medicine, Volume 32, Issue

1, September 2004, Pages 37-50, ISSN 0933-3657

[8] Keh-Shih Chuang, Hong-Long Tzeng, Sharon Chen, Jay Wu, Tzong-Jer Chen, Fuzzy c-

means clustering with spatial information for image segmentation, Computerized Medical

Imaging and Graphics, Volume 30, Issue 1, January 2006, Pages 9-15, ISSN 0895-6111

[9] Vapnik, V.N., 2000. The Nature of Statistical Learning Theory, Springer, NY

[10] Nello Cristianini, John Shawe-Taylor.,2000. An Introduction to Support Vector Machines

and other kernel-based learning methods, Cambridge University Press

[11] Lagrange Multipliers, Com S 477/577 ,Nov 18, 2008


[12] Christopher J.C. Burges ,A Tutorial on Support Vector Machines for Pattern Recognition,

Kluwer Academic Publishers, Boston

[13] http://nlp.stanford.edu/IR-book/html/htmledition/nonlinear-svms-1.html, accessed on may

23, 2012 at 3:34pm IST.

[14] A. Blum and P. Langley. Selection of relevant features and examples in machine learning.

Artificial Intelligence, 97(1-2):245–271, December 1997.

[15] Terence D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward

neural network, Neural Networks, Volume 2, Issue 6, 1989, Pages 459-473, ISSN 0893-6080

[16] Pravat K. Mandal ,Magnetic resonance spectroscopy (MRS) and its application in

Alzheimer's disease, Concepts in Magnetic Resonance Part, Volume 30A, Issue 1, pages 40–

64, January 200