
Diabetic Retinopathy Identification and Severity Classification

Sagar Honnungar, Sanyam Mehra, Samuel Joseph
{sagarh, sanyam, josamuel}@stanford.edu

Abstract—Manual examination of retina images for the diagnosis of diabetic retinopathy is a time-consuming and error-prone process, requiring identification of inconspicuous anomalies like micro-aneurysms and exudates. In this work, we explore machine learning techniques for automatic identification and severity classification of diabetic retinopathy from retina images. The presented approach involves image pre-processing, feature extraction using the bag of visual words model, and a multi-class classifier to classify the image into different DR stages. We have considered SURF, LBP and HoG features for constructing the bag of visual words. For the multi-class classification, we have implemented multinomial logistic regression, SVM and random forests.

I. INTRODUCTION

DIABETIC Retinopathy (DR) is one of the most frequent causes of visual impairment in developed countries and is the leading cause of new cases of blindness in the working-age population. Altogether, nearly 75 people go blind every day as a consequence of DR [6]. Effective treatment of DR requires early diagnosis and continuous monitoring of diabetic patients, but this is a challenging task, as the disease shows few symptoms until it is too late to provide treatment.

Currently, diagnosis of DR is performed by manual evaluation of retinal images by expert clinicians, who identify the presence of lesions in the eye such as micro-aneurysms (red lesions), hemorrhages and exudates (bright lesions). This is a slow and demanding process. Further, the expertise and equipment required for such evaluation may be lacking in many areas with a large DR-affected population.

Because of these reasons, it is evident that an automated system for DR detection from color fundus images could have a huge impact in making timely treatment accessible to more patients. Towards this end, in this work we explore machine learning techniques to automatically detect the severity of DR using information from retina images. Our approach is to use a bag of visual words model with local feature descriptors such as SURF, LBP and HoG to extract important features from the image, which are fed into a multi-class classifier that predicts the severity of DR in the eye as Class 0 (No DR), Class 1 (Mild DR) or Class 2 (Severe DR). Fig. 1 shows example images of each class.

Fig. 1. Example images of different stages of DR: (a) No DR (b) Mild DR (c) Severe DR. These images also illustrate the variance in image characteristics within the dataset: the images differ in dimensions, positioning of the retina and color profile.

Note that only a highly trained and experienced doctor can classify these images accurately; even the authors could not classify most images correctly by eye. This makes it a highly challenging image classification problem compared to tasks like classifying animals or identifying sign language.

II. RELATED WORK

Kaggle conducted a competition on the same problem in 2015 [2]. All the top-performing Kaggle teams used sophisticated neural network models that require many days of training on high-end GPUs [5], [3], [4]. Instead of trying to replicate those methods, we focused on devising a simpler method that can give comparable results. We focus on pre-processing the images and feature engineering using traditional image processing techniques, and utilise classifiers to solve the multi-class classification problem.


Fig. 2. Entire methodology pipeline.

Other than the well-known Kaggle competition, there has been other work on DR detection and classification [8], [9]. However, most of these approaches have utilised smaller datasets (for example, [9] uses 94 images) or other metadata (such as patient history).

III. DATASET

We use the dataset provided by Kaggle [2]. It contains images belonging to 5 different classes: normal (0), mild (1), moderate (2), severe (3) and proliferative (4). For the purpose of evaluating our algorithm, we cluster these into three categories:

• Class 0 – This category corresponds to healthy eyes (0).

• Class 1 – This category corresponds to moderate retinopathy (1 and 2). In other words, these patients do not have a very bad prognosis.

• Class 2 – This category corresponds to patients who have severe retinopathy (3 and 4) and require immediate medical attention.

IV. METHODOLOGY

The proposed method uses the bag of words approach to classify the images into different stages of DR. The pipeline involves image pre-processing, feature extraction, codebook generation and classification. Each of these steps is described in detail below.

Fig. 3. CLAHE applied to a Category 3 image. It can be seen that the contrast in the image has increased, which makes the exudates more visible.

A. Image Pre-processing

The dataset has images taken using different types of cameras, which results in differences in visual appearance and other image parameters. According to the source, there is also noise in both the images and the labels. Some of the images may contain artifacts, be out of focus, underexposed, or overexposed.

Because of these factors, it is essential that we pre-process and standardize the dataset to a certain extent before extracting any information from the images for classification.

Firstly, we crop out the black background in each of the images and ensure that the eye is centered and occupies the maximum area in the image. After cropping, we also resize the images to a uniform size of 448 × 448 pixels.
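A minimal MATLAB sketch of this cropping and resizing step; the report does not give its exact procedure, so the intensity threshold, file name, and largest-region heuristic below are our assumptions:

    % Crop the black background and resize to 448 x 448 (threshold is illustrative).
    I = imread('retina.jpeg');
    mask = rgb2gray(I) > 10;                       % pixels brighter than the black background
    stats = regionprops(mask, 'BoundingBox', 'Area');
    [~, largest] = max([stats.Area]);              % assume the retina is the largest region
    I = imcrop(I, stats(largest).BoundingBox);
    I = imresize(I, [448 448]);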

Secondly, we observed that the contrast tends to diminish toward the edges of the images. Contrast enhancement is therefore vital to clearly distinguish and identify important features indicative of DR. For this, we use Contrast-Limited Adaptive Histogram Equalization (CLAHE) [1]. CLAHE operates on small regions of the image and improves their contrast by transforming the intensity through localised histogram equalization. After performing histogram equalization in the small regions, neighboring regions are combined using bilinear interpolation. Figure 3 shows the application of CLAHE to one of the images in the dataset.

Fig. 4. Feature extraction based on keypoints applied to a pre-processed image. Note that the exudates are captured as keypoints.

Thirdly, we extract only the green channel from the image for further processing, as it has been observed that the lesions show the largest contrast in the green channel and hence are most easily identifiable there.

Finally, we subtract out the background image estimated by a median filter. This step ensures that prominent features common to all eye images, which are not indicative of DR, are subdued, and only the relevant differentiating features are accentuated [10].
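The remaining pre-processing steps can be sketched in MATLAB as follows. Since adapthisteq operates on a single channel, CLAHE is shown here applied to the green channel; the clip limit and median-filter window size are our assumptions, not values from the report:

    % Steps 2-4: green channel, CLAHE, and median-filter background subtraction.
    green = I(:,:,2);                              % lesions show the highest contrast here
    green = adapthisteq(green, 'ClipLimit', 0.02); % CLAHE over local tiles
    bg = medfilt2(green, [25 25]);                 % coarse background estimate
    processed = imsubtract(green, bg);             % keep only differentiating features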

B. Feature Extraction

Accurately capturing local information about the lesions present in images with DR, such as micro-aneurysms, hemorrhages and hard exudates, requires careful selection of features. We have used three different feature descriptors, which capture the local information in different ways, and assessed their relative performance in our experiments.

1) SURF features: Speeded Up Robust Features (SURF) is a popular algorithm for detecting and describing local features of images. It captures keypoints in the image and provides a “feature description” of the image using local features of these keypoints, also known as keypoint descriptors. The algorithm selects keypoints using the Hessian blob detector, and the feature descriptors are obtained using the sum of the Haar wavelet responses around the interest point. For each interest point, a SURF descriptor vector of length 128 is obtained, so the dimension of the SURF descriptors per image is: number of interest points × 128.
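As a sketch (not the report's exact code), extended 128-dimensional SURF descriptors can be computed in MATLAB on a regular grid, matching the dense-SURF variant used in Section V; the 16-pixel grid spacing is our assumption:

    % Extended (128-D) SURF descriptors on a regular grid ("dense SURF").
    [gx, gy] = meshgrid(16:16:432, 16:16:432);
    gridPts = SURFPoints([gx(:), gy(:)]);
    [feats, validPts] = extractFeatures(processed, gridPts, 'FeatureSize', 128);
    % feats holds one 128-D row per valid interest point.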

2) HoG features: Histogram of Oriented Gradients (HoG) is another popular feature descriptor for object detection. It counts the number of occurrences of different gradient orientations in localized parts of the image. The image is divided into blocks of size 32 × 32 and a histogram of size 31 is computed for each block. Hence the dimension of the HoG descriptors for each image is: number of blocks × 31.
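A MATLAB sketch of per-block HoG extraction. Note that MATLAB's extractHOGFeatures produces 36-dimensional blocks (4 cells × 9 orientations) rather than the 31-dimensional variant described above, so the dimensions here are indicative only:

    % HoG over 32 x 32 cells; the result is one long vector that can be
    % reshaped into per-block descriptors.
    hogFeats = extractHOGFeatures(processed, 'CellSize', [32 32]);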

3) LBP features: Local Binary Patterns (LBP) have been found to be powerful local descriptors for texture classification. An LBP descriptor is a string of bits, one for each neighborhood pixel, where each bit is 1 or 0 depending on whether the corresponding neighborhood pixel has greater intensity than the central pixel. LBP features are captured from local image patches of size 32 × 32, similar to HoG, but the feature length for each patch is 58. Hence the feature matrix for each image is of size: number of patches × 58.
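A corresponding MATLAB sketch; extractLBPFeatures with 8 neighbors returns a 59-bin histogram per cell (the 58 uniform patterns plus one catch-all bin), close to but not exactly the 58-dimensional descriptor quoted above:

    % Uniform LBP histograms over 32 x 32 cells (59 bins per cell in MATLAB).
    lbpFeats = extractLBPFeatures(processed, 'CellSize', [32 32]);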

C. Visual Codebook Generation

After capturing the features from each image, a dictionary is constructed using K-means clustering from the pool of all image features. This generates a visual codebook containing words, which are the cluster centroids resulting from K-means clustering. Then, each image feature is quantized to the nearest word in the codebook. This results in a histogram for each image which counts the number of features closest to each visual word in the codebook. The resulting image histogram is the feature vector which we use as input to our classifiers.
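A minimal MATLAB sketch of codebook construction and quantization, assuming allFeats stacks the descriptors of the training images row-wise and imgFeats holds one image's descriptors (both names are ours):

    % Build the visual codebook with K-means, then quantize one image's features.
    K = 100;                                  % vocabulary size (see Section V)
    [~, codebook] = kmeans(allFeats, K, 'MaxIter', 500);
    idx = knnsearch(codebook, imgFeats);      % nearest codeword for each descriptor
    h = histcounts(idx, 1:K+1);               % bag-of-visual-words histogram
    h = h / sum(h);                           % normalize to a frequency vector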

D. Classification

Once the distinctive features (image histograms generated from the visual codebook) have been extracted from the fundus images, the stage is set to train a classifier that can accurately identify the different classes of DR severity using these features.


TABLE I
TRAIN AND TEST ACCURACIES WITH DENSE SURF AND VOCABULARY SIZE 100

Model                 Train Accuracy   Test Accuracy
Logistic Regression        0.85             0.68
Random Forest              0.90             0.73
SVM                        0.83             0.72

TABLE II
CONFUSION MATRIX WITH DENSE SURF, SVM AND VOCABULARY SIZE 100

Known \ Predicted    Class 0   Class 1   Class 2
Class 0                0.62      0.26      0.12
Class 1                0.38      0.48      0.14
Class 2                0         0.03      0.97

For this multi-class classification problem, we have used three different classifiers:

1) Multinomial Logistic Regression: The first model we tried, as a baseline, was logistic regression. Since we have more than two classes, we consider its natural extension, multinomial logistic regression, which uses the softmax function to predict the probability of each class. It was implemented using the mnrfit() and mnrval() functions in MATLAB.
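A sketch of that MATLAB usage (variable names are ours; ytrain is assumed to hold labels 1-3):

    % Multinomial logistic regression via mnrfit/mnrval.
    B = mnrfit(Xtrain, categorical(ytrain));  % fit softmax coefficients
    probs = mnrval(B, Xtest);                 % N x 3 matrix of class probabilities
    [~, pred] = max(probs, [], 2);            % predicted class = most probable one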

2) Support Vector Machines: The optimal margin classifier with L1 regularization is used:

\min_{w, b, \zeta} \; \frac{1}{2}\|w\|^2 + C \sum_i \zeta_i \quad (1)

\text{s.t.} \; y_i (w^T \phi(x_i) + b) \geq 1 - \zeta_i, \quad \zeta_i \geq 0 \quad (2)

The value of C is tuned using cross-validation. We have implemented SVM with a linear kernel:

K(x, z) = x^T z \quad (3)
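A hedged MATLAB sketch of the multi-class linear SVM; the report does not state its exact implementation, and fitcecoc with a linear-kernel template is one standard route, with C = 8 taken from Section V:

    % Multi-class SVM: one-vs-one linear SVMs with box constraint C = 8.
    t = templateSVM('KernelFunction', 'linear', 'BoxConstraint', 8);
    svmModel = fitcecoc(Xtrain, ytrain, 'Learners', t);
    svmPred = predict(svmModel, Xtest);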

3) Random Forests: This is an ensemble learning method which uses decision trees. Random forests build a number of decision trees from new datasets sampled with replacement from the original one (bootstrapping) and predict a class by taking the majority vote of the trees. They also restrict the features considered at every split to a random subset of a certain size.
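A sketch using MATLAB's TreeBagger with the hyperparameters reported in Section V (the final conversion assumes numeric class labels):

    % Random forest: 10 bagged trees, minimum leaf size 8 (see Section V).
    rf = TreeBagger(10, Xtrain, ytrain, 'MinLeafSize', 8);
    rfPred = str2double(predict(rf, Xtest));  % predict returns labels as a cell array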

V. RESULTS AND DISCUSSION

We have 400 images of each category (class 0, class 1 and class 2). We used 40% of the images as our training set for generating the bag of visual words. Then we used hold-out cross-validation to tune the number of clusters in the K-means clustering algorithm, which corresponds to the vocabulary size of our codebook. We found that K = 100 is the optimal number of clusters based on this analysis. Further, we found that all three feature descriptors (SURF, LBP and HoG) gave similar accuracy results. Fig. 5 shows the results of this experiment. Hence, for the performance analysis of the different classifiers, we have considered SURF features with a vocabulary size of 100 codewords.

Fig. 5. Performance of SURF, LBP and HoG with SVM for varying vocabulary size (number of clusters).

Table I shows the accuracy of the different classifiers on the training and test sets. For SVM, the optimal value of C was found using cross-validation to be 8. Similarly, the hyperparameters for the random forest model were tuned based on validation errors as follows: number of trees = 10 and minimum leaf size = 8.

Table II shows the confusion matrix for SVM using a vocabulary size of 100 and SURF features. It is observed that class 2 is classified correctly with very high accuracy (97%), whereas class 1 is often misclassified as class 0. This suggests that the proposed method can successfully ascertain cases of severe DR without manual intervention, which could substantially reduce the number of images requiring manual screening. Misclassifying class 0 as class 1 or 2 is incorrect, but not harmfully consequential for the patient. On the other hand, misclassifying class 2 as class 1 or class 0 is unacceptable, since patients requiring immediate medical attention would be missed. The results obtained are in line with this understanding, as is also observable from the confusion matrix.

VI. CONCLUSION AND FUTURE WORK

In this project, we developed a classification method using the bag of words model for automatically diagnosing the severity of diabetic retinopathy. We were able to achieve high accuracy in detecting cases of severe DR, and believe that this method could be very useful as an initial screening step: it would augment and speed up the manual detection process, increase confidence, and reduce misclassification due to human error.

In addition, we deliberately looked at methods other than deep neural networks, both out of curiosity and for learning purposes. For image classification problems like this one, convolutional neural networks (CNNs) have been shown to perform very well. To extend the project without using CNNs, other image processing techniques like the Moat Operator [7] and the recursive region growing segmentation (RRGS) algorithm could be used for better detection of exudates and other lesions. To improve classification accuracy without utilizing neural networks, we could also combine different classifiers through an ensemble approach.

VII. ACKNOWLEDGEMENTS

The authors would like to thank Mike Chrzanowski (Baidu) and Darvin Yi (Stanford) for introducing them to the nuances of the problem and for helping them make a strong start. The authors also thank the project TA Bo Wang for his feedback and suggestions throughout the course of the project. Lastly, we thank Prof. Andrew Ng and Prof. John Duchi for their guidance and support.

REFERENCES

[1] Adaptive histogram equalization - Wikipedia. https://en.wikipedia.org/wiki/Adaptive_histogram_equalization. (Accessed on 12/16/2016).

[2] Diabetic retinopathy detection - Kaggle. https://www.kaggle.com/c/diabetic-retinopathy-detection. (Accessed on 12/16/2016).

[3] Diabetic retinopathy winners' interview: 4th place, Julian & Daniel - No Free Hunch. http://blog.kaggle.com/2015/08/14/diabetic-retinopathy-winners-interview-4th-place-julian-daniel/. (Accessed on 12/16/2016).

[4] Diagnosing diabetic retinopathy with deep learning - deepsense.io. https://deepsense.io/diagnosing-diabetic-retinopathy-with-deep-learning/. (Accessed on 12/16/2016).

[5] Machine learning for diabetic retinopathy detection - Alexander Rakhlin - LinkedIn. https://www.linkedin.com/pulse/machine-learning-diabetic-retinopathy-detection-alexander-rakhlin. (Accessed on 12/16/2016).

[6] Balint Antal and Andras Hajdu. An ensemble-based system for automatic screening of diabetic retinopathy. Knowledge-Based Systems, 60:20–27, 2014.

[7] Jyothis Jose and Jinsa Kuruvilla. Detection of red lesions and hard exudates in color fundus images. International Journal of Engineering and Computer Science, 3:8583–8588.

[8] Kwang Baek Kim. Extraction of canine cataract object for developing handy pre-diagnostic tool with fuzzy stretching and ART2 learning. International Journal of Fuzzy Logic and Intelligent Systems, 16(1):21–26, 2016.

[9] R. Priya and P. Aruna. Diagnosis of diabetic retinopathy using machine learning techniques. Journal on Soft Computing, 3(4):563–575.

[10] Ibrahim Sadek, Desire Sidibe, and F. Meriaudeau. Automatic discrimination of color retinal images using the bag of words approach. In SPIE Medical Imaging, pages 94141J–94141J. International Society for Optics and Photonics, 2015.