PRELIMINARY STUDY OF DIABETIC RETINOPATHY CLASSIFICATION
FROM FUNDUS IMAGES USING DEEP LEARNING MODEL
BY
HOE YEAN SAM
A REPORT
SUBMITTED TO
Universiti Tunku Abdul Rahman
in partial fulfillment of the requirements
for the degree of
BACHELOR OF COMPUTER SCIENCE (HONS)
Faculty of Information and Communication Technology (Kampar Campus)
MAY 2020
BCS (Hons) Computer Science i
Faculty of Information and Communication Technology (Kampar Campus), UTAR
UNIVERSITI TUNKU ABDUL RAHMAN
REPORT STATUS DECLARATION FORM
Title: __________________________________________________________
__________________________________________________________
__________________________________________________________
Academic Session: _____________
I __________________________________________________________
(CAPITAL LETTER)
declare that I allow this Final Year Project Report to be kept in
Universiti Tunku Abdul Rahman Library subject to the regulations as follows:
1. The dissertation is a property of the Library.
2. The Library is allowed to make copies of this dissertation for academic purposes.
Verified by,
_________________________ _________________________
(Author’s signature) (Supervisor’s signature)
Address:
__________________________
__________________________ _________________________
__________________________ Supervisor’s name
Date: _____________________ Date: ____________________
Preliminary Study of Diabetic Retinopathy Classification from Fundus
Images using Deep Learning Model
May 2020
HOE YEAN SAM
15, Persiaran Gemilang 1,
Taman Gemilang, 35500
Bidor, Perak.
5/9/2020 5 September 2020
Sayed Ahmad Zikri Bin Sayed Aluwee
PRELIMINARY STUDY OF DIABETIC RETINOPATHY CLASSIFICATION
FROM FUNDUS IMAGES USING DEEP LEARNING MODEL
By
Hoe Yean Sam
A REPORT
SUBMITTED TO
Universiti Tunku Abdul Rahman
in partial fulfillment of the requirements
for the degree of
BACHELOR OF COMPUTER SCIENCE (HONS)
Faculty of Information and Communication Technology (Kampar Campus)
MAY 2020
DECLARATION OF ORIGINALITY
I declare that this report entitled “PRELIMINARY STUDY OF DIABETIC
RETINOPATHY CLASSIFICATION FROM FUNDUS IMAGES USING DEEP
LEARNING MODEL” is my own work except as cited in the references. The report
has not been accepted for any degree and is not being submitted concurrently in
candidature for any degree or other award.
Signature : _________________________
Name : _________________________
Date : _________________________
Hoe Yean Sam
5/9/2020
ACKNOWLEDGEMENTS
I would like to express my sincere thanks and appreciation to my supervisor, Dr. Sayed
Ahmad Zikri Bin Sayed Aluwee who has given me this bright opportunity to engage in
a deep learning project. It is my first step to establish a career in deep learning field. A
million thanks to you.
To a very special person in my life, Tang Mee Thye, for her patience, unconditional
support and love, and for standing by my side during hard times. Finally, I must say
thanks to my parents and my family for their love, support and continuous
encouragement throughout the course.
ABSTRACT
Over the years, diabetes cases in Malaysia have increased drastically, and diabetic retinopathy has emerged among diabetic patients as a result. Diabetic retinopathy is a chronic eye disease caused by diabetes; it impairs eyesight and can even lead to blindness. Although the disease is becoming more common, doctors still conduct disease screening manually, which carries a risk of patients being diagnosed incorrectly. Doctors continue to use this traditional diagnostic method because of the lack of local prediction data on diabetic retinopathy progression, which in turn makes research on the diagnosis difficult to conduct.
Therefore, this project presents a preliminary study of classifying the severity levels of diabetic retinopathy from fundus images using a deep learning model. Deep learning is a technique that can learn from a training set of fundus images and automatically make predictions on a similar test set. The architecture used to train the dataset was DenseNet, a Convolutional Neural Network (CNN) based architecture. During development, various image pre-processing methods were applied to enhance the images for training. In addition, data validation and image transformation techniques, including data augmentation and test-time augmentation (TTA), were used to evaluate training results and to reduce overfitting respectively.
The project tested the prediction on each image and examined the effects of data augmentation and TTA by observing the quadratic weighted kappa values. By the end of the project, a prediction model able to predict and classify the severity labels of fundus images had been built using deep learning. The model achieved a quadratic weighted kappa score of 0.9308, while the overall accuracies attained were higher than 74% (estimated) without TTA on the APTOS test dataset and 65% on the Messidor-2 dataset, which is moderately accurate.
TABLE OF CONTENTS
REPORT STATUS DECLARATION FORM i
TITLE PAGE ii
DECLARATION OF ORIGINALITY iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
TABLE OF CONTENTS vi
LIST OF FIGURES ix
LIST OF TABLES xii
LIST OF ABBREVIATIONS xiii
CHAPTER 1: INTRODUCTION 1
1.1 Problem statement 1
1.2 Background 1
1.3 Motivation 2
1.4 Objectives 3
1.5 Proposed approach 4
1.6 Report organisation 6
CHAPTER 2: LITERATURE REVIEW 7
2.1 Previous works on Deep Learning 7
2.1.1 U-Net: Convolutional Networks for Biomedical Image Segmentation 7
2.1.2 Development and Validation of a Deep Learning System for Diabetic Retinopathy and Related Eye Diseases Using Retinal Images from Multiethnic Populations With Diabetes 9
2.1.3 Automated Detection of Diabetic Retinopathy using Deep Learning 11
2.2 Previous works on image pre-processing 13
2.2.1 Color Retinal Image Enhancement Based on Luminosity and Contrast Adjustment with Image Fusion Technique 13
2.2.2 A Retinal Image Enhancement Technique for Blood Vessel Segmentation Algorithm 15
CHAPTER 3: SYSTEM DESIGN 17
3.1 Project pre-development 18
3.2 Data pre-processing 26
3.3 Model training architecture building and Data training 36
3.4 Prediction on test dataset 42
CHAPTER 4: EXPERIMENTS AND RESULTS 44
4.1 Methodology 44
4.2 Tools and Requirements 45
4.3 Analysis 46
4.3.1 Model training 46
4.3.2 Post-training evaluation 51
4.3.3 Prediction testing 53
4.3.4 Verification of prediction results 57
CHAPTER 5: CONCLUSION 62
5.1 Project review 62
5.2 Problems encountered 63
5.3 Future work 65
5.4 Conclusion 66
BIBLIOGRAPHY 67
APPENDIX A: POSTER A-1
APPENDIX B: PLAGIARISM CHECK RESULT B-1
APPENDIX C: FYP 2 CHECKLIST C-1
LIST OF FIGURES
Figure Number Title Page
Figure 1-1 General development flow of the project. 4
Figure 2-1 U-net architecture. 7
Figure 2-2 Image before enhancement. 14
Figure 2-3 Image after enhancement. 14
Figure 2-4 Original image before SUACE. 15
Figure 2-5 Output image after SUACE. 15
Figure 3-1 Phases of project development. 17
Figure 3-2 The process flow of project pre-development phase. 18
Figure 3-3 Retina with no diabetic retinopathy. 20
Figure 3-4 Retina with mild diabetic retinopathy. 20
Figure 3-5 Retina with moderate diabetic retinopathy. 20
Figure 3-6 Retina with severe diabetic retinopathy. 20
Figure 3-7 Retina with proliferative diabetic retinopathy. 20
Figure 3-8 Sample image of APTOS. 21
Figure 3-9 Sample image of Messidor-2. 21
Figure 3-10 Records in train CSV file of APTOS. 22
Figure 3-11 Records in test CSV file of Messidor-2. 22
Figure 3-12 Bar chart of number of images to label distribution of APTOS train dataset. 23
Figure 3-13 Test CSV after modification. 24
Figure 3-14 The process flow of data pre-processing phase. 26
Figure 3-15 The content of the CSV generated by find dark images function. 27
Figure 3-16 Raw image before cropping. 27
Figure 3-17 Image after cropping. 28
Figure 3-18 Image after resized to 512x512. 28
Figure 3-19 Sample image after Gaussian Blur with SigmaX = 10. 29
Figure 3-20 Sample image after Gaussian Blur with SigmaX = 30. 30
Figure 3-21 Sample image in green channel. 31
Figure 3-22 Sample image applied with green channel extraction and Gaussian Blur with sigmaX = 10. 31
Figure 3-23 Sample image applied with Gaussian Blur with sigmaX = 10, followed by green channel extraction. 32
Figure 3-24 The Java-based code that contained the actions to be done by ImageJ. 33
Figure 3-25 Sample of the output image by ImageJ. 33
Figure 3-26 Sample image of the output of SUACE. 34
Figure 3-27 224x224 sample image pre-processed with Gaussian Blur with sigmaX = 10, applied with Lanczos. 35
Figure 3-28 The process flow of model training architecture building phase. 36
Figure 3-29 Summary of the model. 38
Figure 3-30 Images after data augmentation with number 5 combination. 41
Figure 3-31 The process flow of prediction on test dataset phase. 42
Figure 4-1 General structure of DenseNet. 44
Figure 4-2 The information generated during training process. 46
Figure 4-3 The result of the dataset when the highest Kappa value was attained during training. 47
Figure 4-4 Training details when the highest Kappa value was attained, with the implementation of number 5 data augmentation. 50
Figure 4-5 Line graph of mean square error (MSE) of loss versus epoch. 51
Figure 4-6 Line graph of accuracy versus epoch in terms of accuracy and validation accuracy. 51
Figure 4-7 Line graph of accuracy versus epoch in terms of kappa score. 52
Figure 4-8 The progression of each severity level. 53
Figure 4-9 The format of prediction results in CSV. 53
Figure 4-10 Bar graph of the number of images versus the severity labels of prediction without TTA. 54
Figure 4-11 Bar graph of the number of images versus the severity labels of prediction with TTA. 55
Figure 4-12 Test images from each predicted label from prediction without TTA. 56
Figure 4-13 Test images from each predicted label from prediction with TTA. 56
Figure 4-14 The format of prediction results in CSV (Messidor-2). 57
Figure 4-15 Bar graph of the number of images versus the severity labels from the predicted results of Messidor-2. 58
Figure 4-16 Bar graph of the number of images versus the severity labels from the actual results of Messidor-2. 59
Figure 4-17 Confusion matrix of actual and predicted results. 60
Figure 4-18 Information of the accuracy of the model on Messidor-2 test dataset. 61
Figure 4-19 Images from Messidor-2 test dataset on each predicted severity label. 61
Figure 5-1 The inconsistency of ophthalmologists on the judgements on severity labels of a batch of images. 64
LIST OF TABLES
Table Number Title Page
Table 3-1 The table of numbered scales to the severity levels of diabetic retinopathy and corresponded images. 20
Table 3-2 The tested combinations of data augmentation. 40
Table 4-1 Highest quadratic weighted kappa values achieved by each pre-processed dataset. 47
Table 4-2 Highest quadratic weighted kappa values achieved by each data augmentation combination. 49
Table 4-3 Overall accuracy of each threshold range in prediction with and without TTA. 57
LIST OF ABBREVIATIONS
CNN Convolutional Neural Network
TTA Test-time Augmentation
A.I Artificial Intelligence
APTOS Asia Pacific Tele-Ophthalmology Society
CLAHE Contrast Limited Adaptive Histogram Equalisation
SUACE Speeded Up Adaptive Contrast Enhancement
EDA Exploratory Data Analysis
MSE Mean Square Error
Chapter 1: Introduction
1.1 Problem statement
The first problem addressed by this project is the drastic increase in diabetes in Malaysia over the years. According to statistics from the Ministry of Health Malaysia, the prevalence of diabetes has been rising drastically with the population of Malaysia (Bt. Ngah et al. 2017); in fact, the current figures already exceed the earlier projections. This is partly because most Malaysians receive treatment solely at the primary level of healthcare, which leads to ineffective diabetes screening, as primary clinics often lack the proper equipment to diagnose the disease.
The next problem is the lack of prediction data on diabetic retinopathy progression in Malaysia. Although Malaysia has the highest diabetes rate in Asia (Rakin 2018), it is difficult to find related research or data on the prediction of diabetic retinopathy. One reason is that the majority of diabetics are not proactive about diabetic retinopathy screening, since they are unaware that diabetes can cause this disease, making the prediction data extremely limited (Bt. Ngah et al. 2017).
Finally, the manual prediction of diabetic retinopathy is challenging. The signs of diabetic retinopathy found in each image of a patient's eye are extremely small and hard to detect, so there is a risk of false positive or false negative screening results, especially for inexperienced readers.
1.2 Background
Diabetic retinopathy is an eye disease that affects vision. It is caused by diabetes, a condition in which the blood glucose level exceeds the normal level. The most natural precaution is to go for an eye check-up as early as possible to catch the disease. With the rise of Artificial Intelligence (A.I), the A.I technique called deep learning has become involved in predicting diseases from medical images, allowing more efficient use of resources as well as accurate results. Understanding the disease itself is therefore required first, in order to gain an in-depth understanding of the relationship between these chronic conditions and how they are handled in deep learning.
When a diabetic patient has an outrageously high blood glucose level for an extended time, it damages the blood vessels of the retina, resulting in abnormal growth of blood vessels on the retina, which affects eyesight. The symptoms include poor night vision and blindness (Boyd 2019). Diabetic retinopathy is a severe disease, as it can result in malfunctioning of the eyes. Therefore, prediction of this disease by examining retinal images is carried out at regular intervals to identify its presence or severity, especially for people with diabetes. In the past, detection was conducted solely by ophthalmologists, who examined the images manually, relying only on their knowledge and judgement. This made the prediction process relatively inefficient, since it was labour-intensive and prone to mistakes. Utilising deep learning is one of the best solutions to automate the task while maintaining the precision and accuracy of the results.
Deep learning is a machine learning technique in which the algorithm learns by itself from the input dataset and builds a prediction model; it works with a neural network that mimics the way humans think and learn (Reyes 2020). Deep learning can speed up the analysis and interpretation of large datasets (Brush et al. 2016), and in making predictions from images it is capable of recognising specific patterns and solving complex problems.
1.3 Motivation
The motivation behind this project was to develop a diabetic retinopathy prediction tool that could assist doctors by providing detection results for reference. Such a tool could drastically increase the efficiency of the diagnostic process and enable patients to receive early detection of this disease.
Moreover, the project was motivated by the fact that Malaysia has not adopted the latest technology, such as deep learning prediction, in diabetic retinopathy screening. While countries like the United Kingdom and India are utilising deep learning prediction technology in their healthcare, it is rare to see Malaysian healthcare acquire and make use of it to provide better diagnosis and treatment for this disease.
Besides, the project aimed to automate the detection of diabetic retinopathy using deep learning, reducing the workload of doctors while maintaining results that are comparable to a doctor's judgement.
1.4 Objectives
The first objective of this project was to analyse suitable deep learning architectures for the prediction of diabetic retinopathy. There are many deep learning architectures, and each performs differently on different problems. For the prediction of diabetic retinopathy, various deep learning techniques suited to image classification would be analysed to find the one that performs best.
The next objective was to conduct a prediction of diabetic retinopathy using a deep learning based architecture. The project made use of the chosen technique to train the dataset and construct a prediction model encoding the information needed to detect diabetic retinopathy in retinal images. From the model, results would be generated and validated.
Furthermore, the project aimed to determine the severity levels of diabetic retinopathy through classification of the retinal images. By harnessing the power of deep learning, the severity levels would be classified into several categories, including absence, moderate and severe, indicating the risk for each patient. This information is useful because it simplifies differentiating the images by severity and shows the differences between retinal images and the features that define the severities.
1.5 Proposed approach
Figure 1-1 General development flow of the project
Figure 1-1 illustrates the general development flow of this project. To kick-start development, datasets were sourced for model training. The datasets used were the APTOS (Asia Pacific Tele-Ophthalmology Society) 2019 blindness detection competition dataset for training and testing, and the Messidor-2 dataset for further prediction testing and validation, since the labels for the APTOS test dataset were unknown. Both APTOS and Messidor-2 contain fundus retinal images taken using fundus photography, together with a CSV file containing a label for each image in the range 0 to 4 representing the severity of the diagnosis; the higher the number, the more serious the disease. The datasets were then loaded into the temporary storage of Google Colab from Google Drive.
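As an illustration of the label layout just described, the severity column can be inspected before training. A minimal sketch, using a toy frame in place of the real CSV (the APTOS 2019 release uses `id_code`/`diagnosis` columns; a real run would call `pd.read_csv` on the file loaded into Colab):

```python
# Sketch of the label-inspection step; a toy DataFrame stands in for the
# real APTOS train CSV (columns: id_code, diagnosis with grades 0-4).
import pandas as pd

train_df = pd.DataFrame({
    "id_code": ["a", "b", "c", "d", "e"],
    "diagnosis": [0, 2, 2, 4, 0],   # severity grades 0-4
})

SEVERITY = {0: "No DR", 1: "Mild", 2: "Moderate",
            3: "Severe", 4: "Proliferative DR"}

# Count how many images carry each severity grade.
counts = train_df["diagnosis"].value_counts().sort_index()
summary = {int(k): int(v) for k, v in counts.items()}
print(summary)  # {0: 2, 2: 2, 4: 1}
```

The same distribution is what Figure 3-12 plots for the full APTOS train set.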
Next, choosing a model architecture for training was also a crucial step. There are many candidate architectures for this application. Hence, various deep learning network architectures were evaluated by referring to the quadratic weighted kappa value, which served as the evaluation metric for prediction performance; the architecture achieving the highest value, DenseNet, was chosen for this project.
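The property that distinguishes DenseNet from other CNNs is that each layer's output is concatenated with all earlier feature maps, so every layer sees everything computed before it. A framework-free toy sketch of a dense block (1-D features and random weights, purely illustrative, not the project's actual network):

```python
# Toy dense block: each "layer" computes new features from ALL features
# so far, then concatenates them on (the dense connection).
import numpy as np

rng = np.random.default_rng(0)

def dense_block(x, n_layers=3, growth=4):
    """x: (features,) vector; each layer appends `growth` new features."""
    for _ in range(n_layers):
        W = rng.standard_normal((growth, x.shape[0]))
        new = np.maximum(W @ x, 0)        # linear map + ReLU
        x = np.concatenate([x, new])      # concatenate instead of replace
    return x

out = dense_block(np.ones(8), n_layers=3, growth=4)
print(out.shape)  # (20,) = 8 input features + 3 layers * growth 4
```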
In the core development of the project, the datasets were pre-processed with various methods; pre-processing helps enhance the clarity and contrast of the images. Besides, data augmentation and test-time augmentation (TTA) were introduced to prevent overfitting during training and to improve prediction performance on the test dataset respectively.
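One such pre-processing step, cropping away the black background around the fundus (Figures 3-16 and 3-17), can be sketched in pure NumPy. This is an illustrative re-implementation under assumed thresholds, not the project's exact code; the project's pipeline additionally resizes to 512x512 and applies Gaussian blur with sigmaX = 10:

```python
# Fundus photos sit on a near-black background, so keep only the
# bounding box of pixels brighter than a small tolerance.
import numpy as np

def crop_dark_borders(img: np.ndarray, tol: int = 7) -> np.ndarray:
    """Crop rows/columns that are entirely darker than `tol`."""
    gray = img if img.ndim == 2 else img.mean(axis=2)
    mask = gray > tol
    if not mask.any():               # completely dark image: keep as-is
        return img
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Toy 6x6 "image": a bright 2x3 patch surrounded by a black border.
demo = np.zeros((6, 6), dtype=np.uint8)
demo[2:4, 1:4] = 200
cropped = crop_dark_borders(demo)
print(cropped.shape)  # (2, 3)
```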
Then followed building the model training architecture. This is the step in which the layers of the network, the activation function, the training batch size and the number of epochs were defined, ready to train on the pre-processed dataset.
Furthermore, with a train/test split evaluation in place, the model was assessed by studying the training accuracy, training loss, validation accuracy, validation loss and quadratic weighted kappa value produced at each epoch, to observe performance during training and investigate issues such as overfitting. Hyperparameters were then tuned to push the performance further.
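The quadratic weighted kappa metric used throughout can be computed directly from the confusion matrix. A self-contained NumPy sketch, equivalent in spirit to `sklearn.metrics.cohen_kappa_score(..., weights="quadratic")` (which the project may well have called instead):

```python
# Quadratic weighted kappa: agreement between rater labels, penalising
# disagreements by the squared distance between grades.
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    O = np.zeros((n_classes, n_classes))          # observed matrix
    for t, p in zip(y_true, y_pred):
        O[t, p] += 1
    # Quadratic penalty, 0 on the diagonal, growing with |i - j|^2.
    w = np.array([[(i - j) ** 2 for j in range(n_classes)]
                  for i in range(n_classes)], dtype=float)
    w /= (n_classes - 1) ** 2
    hist_t = O.sum(axis=1)
    hist_p = O.sum(axis=0)
    E = np.outer(hist_t, hist_p) / O.sum()        # expected by chance
    return 1.0 - (w * O).sum() / (w * E).sum()

print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4]))  # 1.0
```

Perfect agreement scores 1.0; predictions that miss by one grade are penalised far less than predictions that miss by four, which is why the metric suits ordinal severity labels.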
The entire training process was repeated until the highest quadratic weighted kappa value was obtained. The model was then validated on the test dataset to observe its performance in terms of accuracy. All implementations in this project were done in Google Colab, a cloud-based Python notebook environment with GPU acceleration enabled.
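Test-time augmentation, as used in the flow above, amounts to averaging the model's outputs over augmented copies of each test image. A minimal sketch with a stand-in predictor (in the project the batches would go through the trained DenseNet; the flips chosen here are illustrative assumptions):

```python
# TTA sketch: predict on several augmented views, average the outputs.
import numpy as np

def model(batch):
    # Dummy predictor: derives 5 class scores from mean intensity.
    m = batch.mean(axis=(1, 2))
    return np.stack([m + k for k in range(5)], axis=1)

def predict_tta(images):
    views = [images, images[:, :, ::-1], images[:, ::-1, :]]  # flips
    preds = np.stack([model(v) for v in views])
    return preds.mean(axis=0)      # average over augmented views

imgs = np.ones((2, 4, 4))
p = predict_tta(imgs)
print(p.shape)  # (2, 5): one score vector per image
```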
1.6 Report organisation
Chapter 1 of this report is the introduction. It covers the problem statements, background, motivation, objectives and the proposed approach for the development of the project.
Next, Chapter 2 is the literature review of previous works related to the project, covering previous works on deep learning and on image pre-processing. Comparisons between the previous works and the proposed study are also discussed.
Moreover, Chapter 3 presents the system design, organised into phases: project pre-development, data pre-processing, model training architecture building and data training, and prediction on the test dataset.
Furthermore, Chapter 4 presents the experiments and results. It covers the methodology, the tools and requirements, and the analysis of model training, post-training evaluation, prediction testing and verification of the prediction results.
Lastly, Chapter 5 concludes the report with a project review, a discussion of problems encountered, future work and the overall conclusion of the project.
Chapter 2: Literature Review
2.1 Previous works on Deep Learning
2.1.1 U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf et al. (2015) proposed a deep convolutional network technique for segmenting biomedical images. Development of such techniques had been restricted by the size of the networks as well as the available training data. Convolutional networks are usually used to classify an image with a single class label, but in segmentation a class label must be assigned to each pixel of the image, a process called localisation. An earlier paper had proposed a sliding-window setup to perform this classification, but that network was slow, low in localisation accuracy, and limited in the context the network could examine.
Building on these examples, the paper introduced a "fully convolutional network" named the U-net architecture, which works with limited training data while increasing segmentation precision. Figure 2-1 shows the overall U-net architecture.
Figure 2-1 U-net architecture
The downside of this architecture is that it is GPU-intensive, especially for large datasets; GPU memory therefore becomes the determining factor for the resolution of the outputs. The U-net architecture is applicable to various types of biomedical segmentation, which has made it relatively popular in healthcare.
The main difference between this paper and the current project is that they trained the dataset on a local GPU, whereas the current project prefers a GPU-supported cloud service. A local GPU is extremely costly, although it speeds up training drastically and makes training efficient. Since the current project utilises the GPU built into Google Colab, which offers rather limited GPU power, U-net is not practical for this cloud-based training: the limited resources could not handle such a GPU-intensive task and would quickly hit the GPU's bottleneck.
2.1.2 Development and Validation of a Deep Learning System for Diabetic
Retinopathy and Related Eye Diseases Using Retinal Images from Multiethnic
Populations With Diabetes
(Quelleca et al. 2017) were inspired by the 2015 Kaggle Diabetic Retinopathy competition, a deep learning competition to create a system that automates the detection of diabetic retinopathy from retinal images. The paper investigated the top-ranked solutions on the leaderboard, which used ConvNets, also known as Convolutional Neural Networks (CNNs), to build the prediction model. However, the paper noted that professional physicians did not find this technique trustworthy. Hence, a new solution was proposed: creating a pixel-level heatmap of referable diabetic retinopathy (moderate or higher severity) and its features. The paper proposed a modified ConvNet that integrates non-mydriatic retinographs and algorithms to automate the diagnosis of this disease. It also mentioned that although the solution was questioned as a replacement for manual detection and is suitable only for reference, it has the potential to uncover new findings in the images.
In the paper, prediction focused solely on referable diabetic retinopathy images, as this is the most meaningful context: it determines whether a diabetic should be referred to an ophthalmologist. However, some patients who are close to the moderate severity level might not get the opportunity to receive treatment and could miss the golden window for medication. Attention should be given to all severity levels in order to provide a holistic severity reference and make early diagnosis truly purposeful. On the other hand, the heatmap form of the solution eases identification of the severity and can be applied to the majority of related problems, saving considerable time compared with developing an algorithm from the ground up.
Overall, the proposed solution of the paper was based on a CNN because of its promising performance. Although the solution does not require expert knowledge of diabetic retinopathy and only needs decisions referred from evaluation records, it still requires experts in image segmentation of this disease to further optimise and tune the detection in order to improve overall performance. On top of that, the training dataset used in the paper was not entirely graded by retinal specialists, which would ultimately affect predictive performance in the real world. In addition, identifying diabetic retinopathy traits still requires clinical examination in order to confirm possible cases of the disease from the fundus images.
In contrast to the paper's heatmap approach to detecting and classifying the severity of diabetic retinopathy, the current project focuses solely on classifying the diagnosed severity of the disease and outputs the results to a CSV file. Besides, generated results need to be verified by specialists; to build the best-performing model, this project uses only specialist-verified datasets. In other words, all datasets used in this project were graded by professional ophthalmologists.
2.1.3 Automated Detection of Diabetic Retinopathy using Deep Learning
(Lam et al. 2018) demonstrated automated detection of diabetic retinopathy based on a Convolutional Neural Network (CNN), one of the deep learning techniques. The paper achieved 95% validation sensitivity during performance testing. In addition, transfer learning was applied to other CNN architectures, including GoogLeNet and AlexNet models pretrained on ImageNet, to determine the best-performing architecture for predicting this disease. In terms of data, the paper used retinal images obtained from Kaggle with severity levels classified into 5 classes, from normal to end stage, and also used the physician-verified Messidor-1 dataset to validate the algorithms.
The paper introduced image pre-processing and data augmentation. For pre-processing, the images were cropped to 256x256 and the retina was extracted using Otsu's method, followed by normalisation. Next, the paper utilised the Contrast Limited Adaptive Histogram Equalisation (CLAHE) algorithm to adjust the contrast of the images. Data augmentation was done with zero padding, zoom, rolling and rotation to reduce overfitting and improve the localisation of the network.
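Otsu's method mentioned here selects the grayscale threshold that maximises the between-class variance of the image histogram, which is what lets it split the bright retina from the dark background. A pure-NumPy sketch of the algorithm (the paper itself would likely have used a library routine such as OpenCV's `cv2.threshold` with the `THRESH_OTSU` flag):

```python
# Otsu's threshold: scan all thresholds t, keep the one maximising the
# between-class variance w0*w1*(mu0 - mu1)^2 of the histogram split.
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum_n = np.cumsum(hist)                      # pixel counts <= t
    cum_sum = np.cumsum(hist * np.arange(256))   # intensity sums <= t
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0 = cum_n[t] / total
        w1 = 1 - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_sum[t] / cum_n[t]
        mu1 = (cum_sum[-1] - cum_sum[t]) / (total - cum_n[t])
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Dark background (value 10) with a bright "retina" patch (value 200).
img = np.full((8, 8), 10, dtype=np.uint8)
img[2:6, 2:6] = 200
t = otsu_threshold(img)
print((img > t).sum())  # 16: exactly the bright patch is kept
```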
The reason this paper used architectures from ImageNet was that they were widely
used and optimised to detect features in biological images. Although CNNs were able
to achieve high variance and low bias in the prediction models, increasing the number
of dataset classes or performing multi-stage classification would decrease the
performance of the models. Besides, the paper also mentioned that the CNN was unable
to detect obscure features, which caused incorrect classification of images between
normal and mild severity. Therefore, verification by experts would be required to
improve the performance of this particular classification. Furthermore, the paper stated
that the input data was insufficient for prediction, as a CNN requires a vast amount of
data to achieve high detection accuracy. The GPU model used was a Tesla K80.
A strength of this paper was that alternative CNN-based architectures were tested using
the transfer learning method to find the most suitable technique for predicting this
disease, as well as to improve the pre-trained model even further. Besides, the network
was validated further using the physician-verified dataset to make sure it could be used
with retinal images from various sources.
Overall, the approach of this paper was rather similar to the current development of this
project, including the number of severity level classes, the reduction of image sizes, the
GPU model, the use of data augmentation and a physician-verified dataset, and the
limitations mentioned in the paper, such as insufficient input data. The insufficiency of
input data was inevitable considering the limited available resources in terms of GPU
power and storage, as well as the availability and reliability of dataset sources. On the
other hand, the network used for this project was DenseNet, which delivered great
performance even with a limited input dataset and computing power. Furthermore, the
CLAHE algorithm used in the paper was tested in this project as well; it turned out that
the dataset with normalised colouration worked best, since the CLAHE algorithm
adjusted not only the contrast of image details but also the noise.
2.2 Previous works on image pre-processing
2.2.1 Color Retinal Image Enhancement Based on Luminosity and Contrast
Adjustment with Image Fusion Technique
(Vanmathi & Devarajan 2017) were motivated by the fact that the algorithms that
classify the types of diabetes required high-definition retinal images as input. Therefore,
the enhancement of distorted or low-quality images via image enhancing algorithms
was necessary to avoid false diagnoses. The paper introduced several retinal image
enhancing techniques, including luminosity enhancement, image fusion and Contrast
Limited Adaptive Histogram Equalisation (CLAHE). The difficulties in extracting the
necessary traits from images were mainly due to image noise and low contrast, which
made the diagnosing process even more challenging for retinal specialists.
Image fusion was a technique that combined images with different focus points, which
could retain information in the image. Moreover, the paper also implemented RGB
extraction, a method that separated the colour of an image into its red, green and blue
channels; having more colour planes aided the enhancement of the images. On the other
hand, due to the uneven distribution of luminosity in the images, luminosity
enhancement was introduced to give the whole image balanced luminosity. In order to
improve the clarity of the images even further, CLAHE was used to enhance the
contrast of grey images as well as reduce the distortion of coloured images. By applying
these methods, enhancement of the retinal structures was achieved while the raw
information in the images was retained. Figure 2-2 and Figure 2-3 showed the retinal
image before and after enhancement respectively.
Figure 2-2 Image before enhancement Figure 2-3 Image after enhancement
This paper proposed a rather effective and interesting set of image enhancement
methods, but the entire process was relatively time consuming for this project, as some
of the algorithms required detailed investigation and tuning to obtain the best
enhancement parameters for a particular dataset. Therefore, the current project applied
prebuilt algorithms suitable for the chosen dataset, such as retinal image cropping and
Gaussian blur, to enhance the clarity and contrast of the fundus images.
2.2.2 A Retinal Image Enhancement Technique for Blood Vessel Segmentation
Algorithm
Segmentation of retinal images was a process that involved separating the blood vessels
from the background. (Bandara & Giragama) proposed an algorithm to improve the
accuracy of image segmentation by improving the quality of the retinal images. In order
to produce a high-quality image, the contrast must be consistent throughout the entire
image. There were many contrast enhancing methods, such as contrast limited adaptive
histogram equalisation (CLAHE), but such methods could erase important details and
make unwanted details prominent instead.
Hence, the paper proposed a method named Speeded Up Adaptive Contrast
Enhancement (SUACE) to solve this problem in the segmentation of blood vessels from
retinal images. This method converted the image into greyscale and transformed noise
that resembled discontinuities in the blood vessels, in order to ease the removal of noise.
Figure 2-4 and Figure 2-5 showed the original image and the output image after SUACE
respectively.
Figure 2-4 Original image before SUACE Figure 2-5 Output image after SUACE
Although this method was capable of enhancing the quality of retinal images, SUACE
was not suitable for the application of diabetic retinopathy prediction. This was because
the current project focused on the features of this disease, which were mostly scattered
throughout the entire retinal surface, while the paper solely focused on the clarity of
blood vessels. Therefore, this technique would have suppressed some of the important
features related to the disease. For the current project, contrast enhancement of the
coloured image would be more effective in highlighting the features of diabetic
retinopathy, as well as giving a better presentation. In order to achieve this, the Gaussian
blur technique was implemented to enhance the image contrast and normalise the
overall colouration of the images.
Chapter 3: System Design
The processes of the project were categorised into different development phases:
project pre-development, data pre-processing, model training architecture building and
data training, and prediction on the test dataset. Figure 3-1 illustrated the phases of the
development. The implementations of this project were done in Google Colab with
Python and GPU acceleration enabled.
Figure 3-1 Phases of project development
3.1 Project pre-development
Figure 3-2 showed the process flow of the pre-development phase, and each process
inside the flow was explained in detail.
Figure 3-2 The process flow of project pre-development phase
• Data sourcing
This was the phase whereby the datasets of fundus images of the retina for this
project were identified and acquired. Two datasets were required: a training dataset
and a testing dataset. The training dataset would be the input in the training phase,
used to train the deep learning network in order to create a prediction model. On
the other hand, the testing dataset would be used to evaluate and validate the
performance of the classification. Besides, the severity label of the disease for each
image was also required for data training and performance evaluation. The labels
were scaled from 0 to 4, with each scale representing the severity of diabetic
retinopathy in a retinal image; the higher the scale, the more severe the disease.
Table 3-1 showed the severity and image of diabetic retinopathy for each scale.
Scale Severity level of diabetic retinopathy and image
0 No
Figure 3-3 Retina with no diabetic retinopathy
1 Mild
Figure 3-4 Retina with mild diabetic retinopathy
2 Moderate
Figure 3-5 Retina with moderate diabetic retinopathy
3 Severe
Figure 3-6 Retina with severe diabetic retinopathy
4 Proliferative
Figure 3-7 Retina with proliferative diabetic retinopathy
Table 3-1 The numbered scales of the severity levels of diabetic retinopathy and
corresponding images
In order to fulfil these criteria, the APTOS (Asia Pacific Tele-Ophthalmology
Society) 2019 blindness detection dataset, a Kaggle competition dataset, was
acquired for the development. The APTOS dataset had both training and testing
datasets that contained the filenames of the fundus retinal images with the severity
labels of the diagnoses in separate train and test CSV files, but only the training
images were labelled with severity scales. The dataset contained 3663 coloured
train images and 1928 coloured test images, amounting to 9.52GB of data. Because
only the training dataset was labelled with severity, the testing dataset could not be
used for validation, as the actual severity of each test image was unknown. On top
of that, the severity labels in the test CSV file were not published to the public even
after the competition ended, which made evaluation and validation of the results
impossible for this project. Therefore, another dataset with similar fundus images
and labels needed to be acquired to act as the test dataset for this project.
Hence, the Messidor-2 dataset was acquired to act as the second test dataset. The
dataset contained 1748 coloured fundus retinal images, amounting to 2.30GB of
data. However, Messidor-2 did not come with severity labels in a CSV file together
with the images. Therefore, a CSV file with the labels, graded by retina specialists,
was downloaded from the Kaggle dataset repository. Since the train and test
datasets came from different sources, the evaluation of the prediction results could
simulate real-world conditions, where fundus retinal images come from many
different sources. Figure 3-8, Figure 3-9, Figure 3-10 and Figure 3-11 showed the
sample image of APTOS, the sample image of Messidor-2, the records in the train
CSV file of APTOS and the records in the test CSV file of Messidor-2 respectively.
Figure 3-8 Sample image of APTOS Figure 3-9 Sample image of Messidor-2
Figure 3-10 Records in train CSV file of APTOS
Figure 3-11 Records in test CSV file of Messidor-2
• Data loading
The datasets were uploaded to Google Drive and then transferred to Google Colab.
This was done because Google Colab did not support persistent storage; therefore,
Google Drive was integrated with Google Colab to act as its persistent storage.
This method sped up the transfer tremendously because both platforms were
cloud-based, which meant the data was transferred from cloud to cloud. On the
contrary, manually uploading from a local machine to Google Colab would be
extremely time consuming, as that scenario transferred from local to cloud.
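The Drive-to-Colab transfer described above can be sketched as follows. This snippet only runs inside a Google Colab notebook; the Drive path and archive name are placeholders, not the project's actual file names.

```python
# Colab-only sketch: mount Google Drive, then copy the zipped dataset onto the
# Colab VM's local disk so reads during training stay cloud-to-cloud and fast.
# "aptos_dataset.zip" and the MyDrive path below are hypothetical placeholders.
from google.colab import drive

drive.mount("/content/drive")
!cp "/content/drive/MyDrive/aptos_dataset.zip" /content/
!unzip -q /content/aptos_dataset.zip -d /content/data
```

The `!` lines are IPython shell escapes, which Colab notebooks support natively.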
• Data labelling
The CSV files were transformed into data frames that labelled the images in terms
of image name and severity level accordingly.
• Exploratory data analysis (EDA)
The purpose of EDA was to explore and understand the data before any
modification or development was done to the dataset. The main feature to look for
was the number of images at each severity level, in order to observe the overall
distribution of labels in the train dataset. Figure 3-12 illustrated the bar chart of the
number of images per label in the APTOS train dataset.
Figure 3-12 Bar chart of number of images to label distribution of APTOS train
dataset
In Figure 3-12, the labels ordered from the most images to the least were 0, 2, 1, 4
and finally 3. Label 0 had significantly more images than the other labels, which
meant there was a drastic difference in the number of images between label 0 and
the rest. Such a difference showed that the dataset had an imbalanced label
distribution.
Furthermore, the resolution of each image was noted in order to consider whether
resizing was needed to prevent resource exhaustion, and samples of the dataset
were displayed to make sure the images were loaded correctly.
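The label-distribution check can be sketched with pandas. The `id_code`/`diagnosis` column names follow the APTOS train CSV layout; the toy frame below is illustrative data, not real counts.

```python
import pandas as pd

def label_distribution(df: pd.DataFrame, label_col: str = "diagnosis") -> pd.Series:
    """Count the images per severity label (0-4) to expose class imbalance."""
    return df[label_col].value_counts().sort_index()

# Toy frame shaped like the APTOS train CSV (id_code, diagnosis)
toy = pd.DataFrame({"id_code": list("abcdef"),
                    "diagnosis": [0, 0, 0, 2, 1, 4]})
print(label_distribution(toy))
```

A bar chart like Figure 3-12 then follows from `label_distribution(df).plot(kind="bar")`.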
Next, to ensure the images could be loaded by matching the filename records in the
CSV file, the records were checked to have the same filenames as the image files.
On top of that, the generic format of the filenames was also studied in order to
create a function to load all the images automatically.
From Figure 3-11, there were more columns in the test CSV of Messidor-2
compared to the APTOS train CSV (Figure 3-10). Hence, the test CSV needed to
be modified to match the columns of the train CSV. The adjudicated_dr_grade
column was the same as the diagnosis column in the train CSV, representing the
grade or severity scale of diabetic retinopathy of the images. The additional
columns were adjudicated_dme and adjudicated_gradable. The adjudicated_dme
column represented referable diabetic macular edema; this column would be
removed because it was not a feature required in this project. On the other hand,
adjudicated_gradable represented the image quality grade, where 1 meant gradable
and 0 meant ungradable. Therefore, rows with adjudicated_gradable of 0 would be
removed, since their adjudicated_dr_grade was empty, meaning the image was not
gradable and could not be used for evaluation purposes. Ultimately, 1744 images
would be used as the test dataset. After that, the entire adjudicated_gradable
column was removed as well. Figure 3-13 showed the modified test CSV.
Figure 3-13 Test CSV after modification
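The Messidor-2 CSV clean-up just described can be sketched in pandas. The `adjudicated_*` column names come from the source; the `image_id` column name is a hypothetical stand-in for whatever the filename column is actually called.

```python
import pandas as pd

def prepare_messidor_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only gradable rows, drop the DME column, and rename the grade
    column so the frame matches the APTOS train CSV layout."""
    df = df[df["adjudicated_gradable"] == 1]
    df = df.drop(columns=["adjudicated_dme", "adjudicated_gradable"])
    return df.rename(columns={"adjudicated_dr_grade": "diagnosis"}).reset_index(drop=True)
```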
• Identify and evaluate model architecture
In this process, various types of model architecture were identified and evaluated
in order to find a suitable network architecture for this project. Repositories of
notebooks regarding the APTOS competition were referred to on GitHub, and the
quadratic weighted kappa values achieved by the different network architectures
tested by the notebook authors were investigated. Hence, the quadratic weighted
kappa value would act as the evaluation parameter describing the prediction
performance of the architectures.
• Selecting the model architecture for development
After the model architectures were identified and evaluated, the architecture that
achieved the highest quadratic weighted kappa would be chosen for the
development of this project, which was a CNN-based architecture called DenseNet.
3.2 Data pre-processing
The reason of conducting data pre-processing was due to the fact that not all images
were perfectly captured, some images were not preferable due to certain circumstances
such as noises, unnecessary background, overexposed and underexposed. Hence, it
could help in enhancing the images in terms of contrast and clarity. Figure 3-14 showed
the process flow of data pre-processing phase and each process inside the flow was
explained in details. In this phase, libraries regarding to computer vision including
Pillow and cv2 were used to allow the system to recognise image format and made
modifications on the images.
Figure 3-14 The process flow of data pre-processing phase
• Find dark images
This function identified the dark images in the dataset, which were not usable for
training because low-quality images would provide incorrect information when the
network was learning the data and affect the performance of the prediction model.
A CSV file would be generated after the identification of dark or unusable images.
The file had a “black” column that listed the type of each image: 0 was not dark,
while 1 was a dark image. In the APTOS dataset, all images were usable. Figure
3-15 showed the content of the CSV generated by the function. The Messidor-2
dataset did not require this function because its CSV already stated the gradeability
of the images.
Figure 3-15 The content of the CSV generated by find dark images function
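A minimal sketch of such a dark-image check is a mean-intensity test. The cut-off value of 10 is an illustrative assumption, not the project's actual threshold.

```python
import numpy as np

def find_dark_images(images, names, threshold=10):
    """Label each image 1 (dark/unusable) or 0 (usable) by mean intensity.
    `threshold` is an illustrative guess, not the project's exact cut-off."""
    flags = [int(np.asarray(img).mean() < threshold) for img in images]
    return list(zip(names, flags))  # rows for the "black" column of the CSV
```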
• Image borders cropping
The purpose of implementing this function was to crop the excessive dark borders
from the images. This was due to the fact that only the retina was necessary for the
prediction, and the dark borders would become unwanted details or noise in the
image. Figure 3-16 illustrated the raw image before cropping and Figure 3-17
illustrated the image after cropping.
Figure 3-16 Raw image before cropping
Figure 3-17 Image after cropping
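Border cropping of this kind reduces to finding the bounding box of the non-black pixels. A sketch, with an assumed darkness tolerance:

```python
import numpy as np

def crop_dark_borders(img, tol=7):
    """Drop rows and columns that contain only near-black pixels, keeping
    the bounding box of the retina. `tol` is an assumed darkness tolerance."""
    gray = img.mean(axis=2)
    ys, xs = np.where(gray > tol)
    if ys.size == 0:          # completely dark image: nothing to keep
        return img
    return img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```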
• Image resizing
The images were resized to dimensions of 512x512, one of the common choices in
the image-based deep learning domain. Resizing unified the dimensions, so that all
the images were square. Besides, resizing could also downsize the resolution of the
images in order to reduce the consumption of computational resources and avoid
system crashes. Figure 3-18 showed the image after resizing to 512x512.
Figure 3-18 Image after resized to 512x512
• Various image enhancements
In order to determine the best image enhancement method for the project, different
types of image enhancing methods were applied to the train and test datasets by
trial and error. The following methods were applied to enhance the images:
a) Gaussian Blur with sigmaX = 10
This method was able to enhance the overall contrast of the image and make
the features of the retina more apparent, including blood vessels and diabetic
retinopathy spots such as exudates. Besides, Gaussian Blur normalised the
colour of the images to ensure the pixels of the images had a similar
distribution (B 2017). The parameter sigmaX was the standard deviation of
the Gaussian kernel in the X-axis direction. The kernel was implemented
from the cv2 library.
In fact, this method was used together with image border cropping and
packaged as Ben's Pre-processing in other sources. Figure 3-19 was the
sample image after pre-processing with Gaussian Blur with sigmaX = 10.
Figure 3-19 Sample image after Gaussian Blur with sigmaX = 10
b) Gaussian Blur with sigmaX = 30
This was a similar process to Gaussian Blur with sigmaX = 10, except that
the sigmaX = 30 parameter was set. The processed image was yellowish in
colour and brighter compared to sigmaX = 10. Figure 3-20 showed the
sample image after Gaussian Blur with sigmaX = 30.
Figure 3-20 Sample image after Gaussian Blur with sigmaX = 30
c) Green channel extraction
The images in the project were made up of 3 colour channels: red, green and
blue. Among these, the green channel was able to provide better contrast in
illustrating the image details (Sisodia, Nair & Khobragade 2017). Hence,
contrast limited adaptive histogram equalisation (CLAHE) was applied to
the green channel extracted from the images, which resulted in images that
appeared greyscale. In addition, the network architecture required 3 colour
channels, but the processed images only had 1 channel. Therefore, the same
colour channel was merged three times to make the images contain 3
channels without distorting them. Figure 3-21 showed the sample image in
the green channel.
Figure 3-21 Sample image in green channel
d) Green channel extraction, followed by Gaussian Blur with sigmaX = 10
Figure 3-22 showed the sample image after green channel extraction
followed by Gaussian Blur with sigmaX = 10. In Figure 3-22, the image
appeared to be overexposed, which made the noise more apparent.
Figure 3-22 Sample image applied with green channel extraction and
Gaussian Blur with sigmaX = 10
e) Gaussian Blur with sigmaX = 10, followed by green channel extraction
Figure 3-23 showed the sample image after Gaussian Blur with sigmaX =
10 followed by green channel extraction. In Figure 3-23, the image appeared
slightly brighter than the one with green channel extraction only.
Figure 3-23 Sample image applied with Gaussian Blur with sigmaX = 10,
followed by green channel extraction
f) External pre-processing with ImageJ
Instead of processing the datasets in Google Colab, external processing
using ImageJ, an image processing program, was done to enhance the
images on a local machine. The reason was that ImageJ had features that
were not yet available in Python and was more user friendly. The
enhancements done to the datasets, in sequence, were:
i. Channels splitting
ii. Green channel selection
iii. Sharpening
iv. “Northeast” shadows applied
In order to automate the enhancements in the intended sequence, an ImageJ
built-in feature called Macros was used. By using this feature, a generic
Java-based sequence was written to allow the program to read all the images
and apply the enhancements accordingly. Figure 3-24 and Figure 3-25
showed the Java-based code that contained the actions to be done by ImageJ
and the sample output image from ImageJ.
Figure 3-24 The Java-based code that contained the actions to be done by
ImageJ
Figure 3-25 Sample of the output image by ImageJ
g) Speeded Up Adaptive Contrast Enhancement (SUACE)
This method was inspired by a paper reviewed in the literature review, and
its source code was publicly available on GitHub. The method was based
on C++ and OpenCV to conduct retinal image enhancement. Since it would
be run on a local machine, a C++ compiler and the OpenCV library for C++
had to be available. The method was initially developed to enhance the
details of blood vessels in the retina, but the author mentioned that it was
also applicable to other image enhancement applications because SUACE
was an algorithm for enhancing image contrast. Figure 3-26 showed the
sample output image of SUACE. In Figure 3-26, the image appeared
flattened and some noise was smoothed out, yet some of the important
details, including the features of diabetic retinopathy, were also removed.
Figure 3-26 Sample image of the output of SUACE
• Create and upload pre-processed datasets
After the pre-processing of the images was completed, the images were output and
packaged as a dataset. Then, the prepared datasets were uploaded to Google Drive,
ready for training.
• Image resizing (224x224) [optional]
The images were further resized to 224x224 due to the RAM and GPU memory
limitations in Google Colab, which could not handle the 512x512 dataset. To
prevent session crashes due to lack of RAM and GPU memory, reducing the image
dimensions was necessary to reduce resource usage. Since reducing the image
dimensions also reduced image resolution, this process was optional.
• Applying Lanczos filter [optional]
This was an optional process. Lanczos was an anti-aliasing filter that “smoothed”
out the edges of the pixels. Applying the filter was optional because Lanczos
showed no noticeable difference in presentation compared to an image without the
filter; therefore, it was better to compare the prediction results with and without
the filter. Figure 3-27 showed the 224x224 sample image pre-processed with
Gaussian Blur with sigmaX = 10 and Lanczos applied.
This process was placed after the creation of the datasets because the filter was for
trial-and-error comparison, hence another dataset did not need to be created.
Figure 3-27 224x224 sample image pre-processed with Gaussian Blur with sigmaX = 10 and Lanczos applied
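With Pillow, the Lanczos resampling filter is applied through the `resize` call:

```python
from PIL import Image

def resize_lanczos(img: Image.Image, size: int = 224) -> Image.Image:
    """Downscale with the Lanczos anti-aliasing filter to smooth pixel edges."""
    return img.resize((size, size), Image.LANCZOS)
```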
3.3 Model training architecture building and Data training
The packages used in the development of the architecture included TensorFlow, Keras
and Scikit-learn. Figure 3-28 showed the process flow of the model training
architecture building and data training phase, and each process inside the flow was
explained in detail.
Figure 3-28 The process flow of model training architecture building phase
• Creating train and test arrays
The x_train, y_train and x_test arrays were defined, whereby x was the dataset and
y was the labels of the dataset.
• Transforming labels to multi-labels
By default, each y_train label array was in a one-hot format such as [0 0 1 0 0],
whereby the position of the “1” indicated the severity of the disease. For instance,
in [0 0 1 0 0], the “1” was located at the third position, which indicated moderate
severity for that image. The problem with this kind of single labelling was that it
carried no evidence of the progression of the severity scales. Single labelling also
carried less information, since the labels treated the problem as a binary
classification: either no disease or a certain level of severity. Hence, the labels were
transformed into multi-labels, which resulted in an array format such as [1 1 1 0 0].
In multi-labels, the “1”s filled the previous zeroes up to the given level, so the
progression to a certain level of severity was shown, and the progression could be
treated as an indication of the indexes of each level.
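The one-hot to multi-label transformation above can be sketched in a few lines of NumPy, starting from the integer severity grades:

```python
import numpy as np

def to_multilabel(severities, num_classes=5):
    """Turn integer grades into cumulative multi-labels:
    severity 2 -> [1 1 1 0 0] instead of the one-hot [0 0 1 0 0]."""
    severities = np.asarray(severities)
    # position i is 1 whenever the grade reaches at least level i
    return (np.arange(num_classes) <= severities[:, None]).astype(int)
```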
• Train/test data splitting for validation
The train/test data splitting technique was used to carry out model validation during
data training. The train dataset was split 85/15 for validation, meaning 85% of the
data was used for training while 15% was used for testing. This technique was
useful for generating the validation accuracy and loss in each training epoch.
Besides, this type of validation technique also consumed fewer resources.
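The 85/15 split can be sketched with scikit-learn; the toy arrays and the `random_state` value are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays standing in for the image tensors and multi-labels
x = np.arange(100).reshape(100, 1)
y = np.arange(100)

# 85/15 hold-out split; random_state is assumed for reproducibility
x_tr, x_val, y_tr, y_val = train_test_split(x, y, test_size=0.15, random_state=42)
print(len(x_tr), len(x_val))  # 85 15
```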
• Defining model architecture
This process initialised the model architecture. In this project, DenseNet121 was
defined with weights from ImageNet and without the fully connected layer at the
top of the model, in order to take in a custom input shape. Since the input dataset
was resized to 224x224 and contained 3 channels, the input shape was defined as
(224, 224, 3).
Next, the layers of the network were defined and built as a sequential model. The
activation function of the final activation layer was sigmoid. This was because the
project was treated as a multi-label classification, and sigmoid was able to produce
an independent probability for each severity level in the array (Draelos 2019). This
allowed observation of the progression of the indexes of each level. Furthermore,
the model was compiled with the binary cross-entropy loss function, an Adam
optimizer instance with a learning rate of 0.00005, and accuracy as the prediction
metric. Figure 3-29 listed the summary of the model.
Figure 3-29 Summary of the model
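The model definition above can be sketched in Keras. The pooling and dropout head is an assumption: the report only specifies DenseNet121 without the top layer, a sigmoid output, binary cross-entropy and Adam at learning rate 0.00005.

```python
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

def build_model(weights="imagenet"):
    """DenseNet121 backbone without its top layer, plus a small sigmoid head.
    The GlobalAveragePooling/Dropout head is a guessed detail."""
    base = DenseNet121(weights=weights, include_top=False,
                       input_shape=(224, 224, 3))
    model = Sequential([
        base,
        GlobalAveragePooling2D(),
        Dropout(0.5),
        Dense(5, activation="sigmoid"),  # independent probability per level
    ])
    model.compile(loss="binary_crossentropy",
                  optimizer=Adam(learning_rate=5e-5),
                  metrics=["accuracy"])
    return model
```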
• Creating training call-back metrics
The purpose of creating call-back metrics was to define the actions and metrics to
be computed at the end of each epoch. When an epoch completed, the call-back
function would compute the quadratic weighted kappa score, which indicated the
performance of the model. If the quadratic weighted kappa value in the current
epoch was higher than the previous best, the function would save the model in h5
format, to be used later in prediction testing.
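The kappa computation at the heart of that call-back can be sketched with scikit-learn. Collapsing multi-labels back to grades by counting active outputs is an assumption consistent with the label format described earlier.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def qwk_from_multilabel(y_true, y_pred_prob, threshold=0.5):
    """Collapse cumulative multi-labels back to 0-4 grades (count of active
    outputs minus one) and score them with quadratic weighted kappa."""
    pred = (np.asarray(y_pred_prob) > threshold).sum(axis=1) - 1
    true = np.asarray(y_true).sum(axis=1) - 1
    return cohen_kappa_score(true, pred, weights="quadratic")
```

Inside a `keras.callbacks.Callback` subclass, `on_epoch_end` would compute this score on the validation split and call `model.save("model.h5")` only when the score beats the best so far.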
• Data augmentation
Data augmentation was a technique that transformed the data to increase the
amount of data, which eventually diversified the data. This method could avoid
overfitting and result in higher accuracy and lower loss for the prediction model.
Hence, a function called ImageDataGenerator was implemented to achieve this.
There were various combinations of transformations to apply to the data. In this
project, 5 different combinations of data augmentation were tested with a batch
size of 32 in order to obtain the most suitable combination. Table 3-2 listed the
tested data augmentations and Figure 3-30 showed the images after data
augmentation with combination number 5 from Table 3-2.
Number  Combination of data transforming functions
1       zoom_range = 0.15, fill_mode = 'constant', cval = 0.,
        horizontal_flip = True, vertical_flip = True
2       rotation_range = 360, brightness_range = [0.5, 1.5],
        zoom_range = [1, 1.2], zca_whitening = True,
        horizontal_flip = True, vertical_flip = True,
        fill_mode = 'constant'
3       featurewise_center = True, featurewise_std_normalization = True,
        rotation_range = 20, width_shift_range = 0.2,
        height_shift_range = 0.2, horizontal_flip = True
4       featurewise_center = True, horizontal_flip = True,
        fill_mode = 'nearest', zoom_range = 0.1, rotation_range = 45
5       rotation_range = 360, horizontal_flip = True,
        vertical_flip = True, width_shift_range = 0.2,
        height_shift_range = 0.2
Table 3-2 The tested combinations of data augmentation
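As a concrete sketch, combination number 5 would be passed to Keras roughly as below. The variable names (datagen, train_gen, x_train, y_train) are illustrative, not taken from the report's code.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Combination number 5 from Table 3-2.
datagen = ImageDataGenerator(rotation_range=360,
                             horizontal_flip=True,
                             vertical_flip=True,
                             width_shift_range=0.2,
                             height_shift_range=0.2)

# Batches of 32 augmented images would then be drawn with, e.g.:
# train_gen = datagen.flow(x_train, y_train, batch_size=32)
```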
Figure 3-30 Images after data augmentation with number 5 combination
• Model training
This process executed the training of the model after the architecture for model
training was built. When the model was trained for the first time, 50 epochs
were chosen to observe the loss, accuracy, quadratic weighted kappa values and
overall performance of the training in order to determine whether the training
had the potential to improve further. To ensure the training achieved the best
possible result, the epoch number was then increased to 150 to allow the data
more passes through the training process. The number of steps in an epoch was
set as the number of rows in the split train dataset divided by the batch size.
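The step count above is simple arithmetic. Assuming an illustrative split train size of 3112 rows (the report does not state the exact count) and a batch size of 32, ceiling division gives 98 steps while floor division gives 97, which may explain the 98/97 figure shown later in Figure 4-2:

```python
import math

batch_size = 32
n_rows = 3112  # illustrative row count of the split train dataset

steps_ceil = math.ceil(n_rows / batch_size)   # covers every row, incl. partial batch
steps_floor = n_rows // batch_size            # drops the final partial batch
```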
3.4 Prediction on test dataset
Figure 3-31 showed the process flow of the prediction on test dataset phase, and
each process inside the flow was explained in detail.
Figure 3-31 The process flow of prediction on test dataset phase
• Loading model
After the training was completed, the saved model would be loaded.
• Test-time augmentation (TTA)
TTA was a data augmentation technique that transformed the test dataset. In this
project, TTA reused the combinations configured previously in data
augmentation. TTA was set with 6 steps, which indicated that the test dataset
would go through TTA 6 times. Six steps was already the maximum in this
development due to the limitation of RAM. TTA was treated as an alternative
path before predicting the test dataset because there was no guarantee that
TTA would improve the performance; it could worsen the performance instead.
Therefore, the prediction was done both with and without TTA in order to
compare the output results and investigate which performed better.
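A minimal TTA sketch, averaging the model's outputs over several augmented copies of one image. Here predict_fn and augment_fn are placeholders for the trained model and the Keras augmentation pipeline, not functions from the report.

```python
import numpy as np

def predict_with_tta(predict_fn, augment_fn, image, n_steps=6):
    """Average predictions over n_steps augmented copies of the image."""
    preds = [predict_fn(augment_fn(image)) for _ in range(n_steps)]
    return np.mean(preds, axis=0)
```

Averaging smooths out augmentation-sensitive predictions, but, as noted above, it is not guaranteed to help.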
• Predicting test dataset
The loaded model would be used to predict both the APTOS and Messidor-2 test
datasets. The prediction threshold was set at greater than 0.5, which meant
that only predicted severity levels whose prediction index exceeded 0.5 would
be counted. In other words, a prediction outcome would be confirmed as a final
result only if the model had greater than 50% confidence in the prediction
index of that level. The threshold could therefore be deemed an adjustable
prediction sensitivity parameter. After the prediction results were generated,
they would be appended to the test CSV and outputted as a separate result CSV.
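Assuming the common ordinal multi-label encoding (level k activates sigmoid outputs 0 through k; the report's exact encoding is not shown here), the 0.5 threshold turns the per-level confidences into one severity grade by counting:

```python
import numpy as np

def decode_severity(probs, threshold=0.5):
    """Count the levels whose confidence exceeds the threshold and map
    that count back to a severity grade in the range 0-4."""
    n_active = int((np.asarray(probs) > threshold).sum())
    return max(n_active - 1, 0)
```

For example, confidences of [0.97, 0.91, 0.78, 0.32, 0.05] decode to grade 2 (moderate), because only the first three levels exceed the threshold.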
Chapter 4: Experiments and Results
4.1 Methodology
The network architecture utilised in this project was DenseNet, one of the more
recent Convolutional Neural Network (CNN) architectures. In many CNN-based
architectures, the input information could vanish as it travelled through the
network and approached the end. DenseNet tackled this problem by connecting each
layer to every other layer in a feed-forward pattern (Chablani 2017). Each layer
received inputs from all preceding layers and passed its own feature maps on to
all subsequent layers at the same time. In other words, every layer concatenated
the information from the other layers in the network in order to maximise the
information flow in the network. For illustration, Figure 4-1 showed the
structure of the DenseNet architecture (Tsang 2018).
Figure 4-1 General structure of DenseNet
As a result, DenseNet was able to reduce the number of channels and parameters,
which eliminated the relearning of redundant features in each layer. On top of
that, DenseNet could also reduce overfitting when training on smaller datasets,
and increased the efficiency of the model computation by making the network more
compact in general (Huang et al. 2018).
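The dense connectivity can be illustrated with a toy NumPy sketch, where each "layer" sees the concatenation of all earlier feature maps. Here layer_fn is a stand-in for the real BN-ReLU-Conv composite, not the actual DenseNet implementation.

```python
import numpy as np

def dense_block(x, n_layers, layer_fn):
    """Each layer receives the concatenation of all previous outputs,
    and the block returns everything concatenated (DenseNet-style)."""
    features = [x]
    for _ in range(n_layers):
        out = layer_fn(np.concatenate(features, axis=-1))
        features.append(out)
    return np.concatenate(features, axis=-1)
```

With a growth rate of k channels per layer, the channel count grows only linearly (input channels + n_layers * k), which is one reason DenseNet stays parameter-efficient.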
4.2 Tools and Requirements
1. Asus X556UF laptop with specifications:
• Processor: Intel Core i5-6200U 2.30 GHz
• GPU: Nvidia GeForce 930M with 2GB memory
• Memory: 8GB RAM
• Storage: 256GB SSD
• Operating system: Windows 10
2. Google Colaboratory (Colab) on Google Chrome with specifications:
• Processor: Intel Xeon 2.30 GHz
• GPU: Tesla K80 with 11.4GB memory
• Memory: 13GB RAM
• Storage: 35GB
3. Google Drive
4. ImageJ 1.53c
5. Microsoft Visual Studio 2015 with C++ compiler and OpenCV library
6. Sublime Text 3
4.3 Analysis
4.3.1 Model training
Figure 4-2 showed the information generated during the execution of the training
process.
Figure 4-2 The information generated during training process
In Figure 4-2, the number 98/97 highlighted in the red rectangle was the number
of steps in an epoch. This meant that the training process divided the image
dataset into 98 batches of 32 images each and trained batch by batch in order to
prevent the system from being overwhelmed. With the implementation of the
train/test split data validation method, the details about the loss, accuracy,
validation loss and validation accuracy were generated, as highlighted in the
orange rectangle. The accuracy referred to the accuracy of the model on the
split train dataset during training, while the validation accuracy was the
accuracy on the split test dataset during model validation, which could be
regarded as the expected accuracy. By the same concept, the loss referred to the
error of the model during training while the validation loss was the error
during model validation. The purpose of observing these values was to detect
overfitting on the dataset.
On the other hand, the validation kappa value highlighted in the green rectangle
acted as the indicator or benchmark of the model performance. This was because
kappa measured classification accuracy and was more suitable for problems with
imbalanced classes (Brownlee 2016). On top of that, kappa compared the predicted
instances against the ground truth in order to measure how close the predictions
were to the actual results. Once a trained model with a higher validation kappa
value was generated, that particular model would be saved, overwriting the
previous model (if any), and would be used for evaluation and severity label
prediction.
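The quadratic weighted kappa used as this benchmark can be computed from the standard formula. Below is a self-contained NumPy version; in practice scikit-learn's cohen_kappa_score with weights='quadratic' computes the same quantity.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes=5):
    """1 - (weighted observed disagreement / weighted expected disagreement),
    with a quadratic penalty on the distance between grades."""
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # Penalty matrix: 0 on the diagonal, grows with (i - j)^2.
    weights = np.array([[(i - j) ** 2 for j in range(n_classes)]
                        for i in range(n_classes)]) / (n_classes - 1) ** 2
    # Expected counts under chance agreement (outer product of marginals).
    expected = np.outer(observed.sum(axis=1),
                        observed.sum(axis=0)) / observed.sum()
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```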
In order to obtain the best model, each of the created pre-processed datasets
was trained with data augmentation combination number 1 from Table 3-2 in order
to generate a model. The raw dataset with the same data augmentation combination
was also trained to act as a control. Table 4-1 listed the highest quadratic
weighted kappa value achieved by each pre-processed dataset.
Number  Pre-processed dataset                                       Highest quadratic
                                                                    weighted kappa value
1       Gaussian Blur with sigmaX = 10                              0.9288
2       Gaussian Blur with sigmaX = 10, followed by Lanczos filter  0.9231
3       Gaussian Blur with sigmaX = 30                              0.9271
4       Green channel extraction                                    0.9172
5       Green channel extraction, followed by Gaussian Blur
        with sigmaX = 10                                            0.9144
6       Gaussian Blur with sigmaX = 10, followed by green
        channel extraction                                          0.9217
7       External pre-processing with ImageJ                         0.8650
8       Speeded Up Adaptive Contrast Enhancement (SUACE)            0.8018
9       Raw dataset                                                 0.9265
Table 4-1 Highest quadratic weighted kappa values achieved by each pre-processed
dataset
From Table 4-1, the model trained on the dataset pre-processed with Gaussian
Blur with sigmaX = 10 obtained the highest quadratic weighted kappa in
comparison, at 0.9288. Hence, this dataset was selected for further analysis. In
addition, Figure 4-3 showed the result of the dataset when it attained the
highest kappa value during training; the validation loss (highlighted in the
orange rectangle) was slightly high, at 0.1192, which was an indication of
overfitting.
Figure 4-3 The result of the dataset when the highest Kappa value was attained
during training
Next, each data augmentation combination was used to transform the previously
selected dataset, which was then trained to generate a model in order to
investigate which combination achieved the highest quadratic weighted kappa
value. Table 4-2 listed the highest quadratic weighted kappa values achieved by
each data augmentation combination on the dataset.
Number  Combination of data transforming functions            Highest quadratic
                                                              weighted kappa value
1       zoom_range = 0.15, fill_mode = 'constant',
        cval = 0., horizontal_flip = True,
        vertical_flip = True                                  0.9288
2       rotation_range = 360, brightness_range = [0.5, 1.5],
        zoom_range = [1, 1.2], zca_whitening = True,
        horizontal_flip = True, vertical_flip = True,
        fill_mode = 'constant'                                0.9241
3       featurewise_center = True,
        featurewise_std_normalization = True,
        rotation_range = 20, width_shift_range = 0.2,
        height_shift_range = 0.2, horizontal_flip = True      0.9265
4       featurewise_center = True, horizontal_flip = True,
        fill_mode = 'nearest', zoom_range = 0.1,
        rotation_range = 45                                   0.9161
5       rotation_range = 360, horizontal_flip = True,
        vertical_flip = True, width_shift_range = 0.2,
        height_shift_range = 0.2                              0.9308
Table 4-2 Highest quadratic weighted kappa values achieved by each data
augmentation combination
From Table 4-2, combination number 5 obtained the highest quadratic weighted
kappa value in comparison, at 0.9308. In addition, Figure 4-4 showed the
training details when the highest kappa value was attained. The figure showed
that the current approach still had slight overfitting, because the validation
loss (highlighted in the orange rectangle) was still considered a bit high,
although it was slightly reduced to 0.1188.
Figure 4-4 Training details when the highest Kappa value was attained, with the
implementation of number 5 data augmentation
In conclusion, the best dataset configuration for this project included data
pre-processing with the Gaussian Blur (sigmaX = 10) image enhancement method and
data augmentation combination number 5 from Table 4-2.
4.3.2 Post-training evaluation
The statistics of the training were created and evaluated by using the best dataset
configuration. Figure 4-5, Figure 4-6 and Figure 4-7 illustrated the line graphs
regarding to the loss and validation loss in graph of mean square error (MSE) of loss
versus epoch, accuracy and validation accuracy in graph of accuracy versus epoch and
Kappa score in graph accuracy versus epoch respectively.
Figure 4-5 Line graph of mean square error (MSE) of loss versus epoch
Figure 4-6 Line graph of accuracy versus epoch in terms of accuracy and validation
accuracy
Figure 4-7 Line graph of accuracy versus epoch in terms of kappa score
From the statistics, the loss was decreasing while the accuracy was increasing.
This showed that the model was improving over the epochs during training, which
was a good sign. Yet, there were fluctuations in the validation loss, validation
accuracy and kappa value over the epochs; in fact, this was a sign of
overfitting. It indicated that the model learnt not only the details but the
noise as well, which would negatively affect the performance of the model.
Reportedly, increasing the batch size to 64 could help stabilise the
fluctuations, but this project was unable to do so due to the limitation of GPU
memory.
4.3.3 Prediction testing
The model that trained with the best dataset configuration was loaded and performed
severity labels prediction on the APTOS’s test dataset. Since the project had converted
the labels from single labels to multi-labels, the progression of the predicted severity
levels could be observed. Therefore, the progressions of an image were illustrated into
a bar graph in Figure 4-8.
Figure 4-8 The progression of each severity level
From Figure 4-8, the image would be considered as moderate severity instead of severe
or proliferative diabetic retinopathy. This was due to the fact that the threshold of
prediction was set at greater than 0.5. Therefore, the levels that were not greater than
0.5 would not be counted in. At this point, the threshold would not be adjusted.
Figure 4-9 showed the format of the prediction results in the outputted CSV.
Figure 4-9 The format of prediction results in CSV
On the other hand, the prediction on the test dataset was divided into 2 types,
prediction without TTA and prediction with TTA, in order to investigate the
differences in the results and determine which method performed better.
• Prediction without TTA
The number of images for each severity label was illustrated in Figure 4-10 as a
bar graph. From Figure 4-10, the severity labels were ranked as below, from
highest to lowest number of images:
1. Level 2
2. Level 1
3. Level 0
4. Level 3
5. Level 4
Level 2 had drastically more images compared to the other levels, while level 4
had the least.
Figure 4-10 Bar graph of the number of images versus the severity labels of
prediction without TTA
• Prediction with TTA
Figure 4-11 showed the bar graph of the number of images versus the severity
labels. From Figure 4-11, the severity labels were ranked as below, from highest
to lowest number of images:
1. Level 2
2. Level 3
3. Level 0
4. Level 1
5. Level 4
Similar to the prediction without TTA, level 2 had significantly more images
compared to the other levels while level 4 had the least. On the contrary, the
predictions with and without TTA disagreed on levels 0, 1 and 3, the middle
severity levels.
Figure 4-11 Bar graph of the number of images versus the severity labels of
prediction with TTA
In addition, Figure 4-12 and Figure 4-13 showed the test images from each predicted
label from prediction without TTA and prediction with TTA respectively.
Figure 4-12 Test images from each predicted label from prediction without TTA
Figure 4-13 Test images from each predicted label from prediction with TTA
However, the correctness of the predicted labels could not be verified because
the actual labels of the APTOS test dataset were unavailable, so the comparison
between predicted and actual labels could not be done. Therefore, Messidor-2
became the crucial test dataset for the verification of the prediction outcomes.
In fact, a competitor in the Kaggle competition achieved a quadratic weighted
kappa score of 0.9265, which resulted in a final prediction accuracy of 74%
without TTA. Theoretically, since this project attained a quadratic weighted
kappa score of 0.9308, the final prediction accuracy was estimated to be higher
than 74% without TTA.
4.3.4 Verification of prediction results
Similar to prediction testing, the verification of the prediction results also
included the severity label progression and prediction with and without TTA,
and used the same model as in prediction testing. The difference in this section
was that the test dataset was switched to Messidor-2, so the validity of the
prediction results could be verified since the actual labels of the dataset were
available. On top of that, the Messidor-2 test dataset could simulate real-world
prediction since the model was trained on the APTOS train dataset, which came
from a different source; real-world situations would involve images from various
sources.
In this section, the prediction threshold was tuned within the range of 0.5 to
0.3 for the predictions with and without TTA in order to obtain the threshold
that attained the highest possible overall accuracy. This range was chosen
because it avoided prediction sensitivities that were too low or too high, which
would reduce the overall accuracy drastically.
Figure 4-14 showed the format of the prediction results in the outputted CSV
during the prediction of the Messidor-2 dataset, whereas Table 4-3 listed the
overall accuracy at each threshold for prediction with and without TTA.
Figure 4-14 The format of prediction results in CSV (Messidor-2)
Threshold        0.5     0.4     0.3
Without TTA      0.63    0.64    0.64
With TTA         0.63    0.64    0.65
Table 4-3 Overall accuracy of each threshold in prediction with and without TTA
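The sweep in Table 4-3 amounts to re-decoding the same sigmoid outputs at each threshold and measuring accuracy. Below is a sketch under two assumptions flagged here: the ordinal-count decoding of the multi-label outputs, and the toy data in the usage example (both illustrative, not the report's data).

```python
import numpy as np

def sweep_thresholds(probs, y_true, thresholds=(0.5, 0.4, 0.3)):
    """Overall accuracy of the decoded severity grades at each threshold."""
    probs, y_true = np.asarray(probs), np.asarray(y_true)
    acc = {}
    for th in thresholds:
        # Decode: count levels above threshold, map count to grade 0-4.
        preds = np.maximum((probs > th).sum(axis=1) - 1, 0)
        acc[th] = float((preds == y_true).mean())
    return acc
```

Lowering the threshold lets borderline levels count, so an image whose level-1 confidence sits at, say, 0.45 flips from grade 0 to grade 1 when the threshold drops from 0.5 to 0.4.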
From Table 4-3, the prediction with TTA at the threshold of 0.3 achieved the
highest overall accuracy, 0.65, and this particular setup was selected for
further analysis. On the other hand, Table 4-3 also showed that TTA improved the
overall accuracy only inconsistently and by a slight margin.
Figure 4-15 showed the bar graph of the number of images versus the severity
labels from the predicted results of Messidor-2. From Figure 4-15, the severity
labels were ranked as below, from highest to lowest number of images:
1. Level 0
2. Level 2
3. Level 1
4. Level 3
5. Level 4
Figure 4-15 Bar graph of the number of images versus the severity labels from the
predicted results of Messidor-2
Figure 4-15 illustrated that level 0 had far more images than the other levels,
making it the majority. Figure 4-16 showed the bar graph of the number of images
versus the severity labels from the actual results of Messidor-2. From Figure
4-16, the severity labels were ranked as below, from highest to lowest number of
images:
1. Level 0
2. Level 2
3. Level 1
4. Level 3
5. Level 4
Figure 4-16 Bar graph of the number of images versus the severity labels from the
actual results of Messidor-2
Similarly, level 0 was also the majority in the actual results, although the
actual level 0 count was much smaller than the predicted one. This showed that
the predicted results were biased towards level 0. In general, both results were
similar in terms of proportion.
Furthermore, a confusion matrix of actual and predicted results was constructed in
Figure 4-17.
Figure 4-17 Confusion matrix of actual and predicted results
In Figure 4-17, there were 5 images that had the actual severity label of level
0 but were predicted as level 4, as highlighted in the red circle. This was
extremely dangerous in real-world diagnosis because the level difference was too
large, and it would result in a false positive diagnosis for those patients.
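A confusion matrix like Figure 4-17 can be built directly (rows are actual grades, columns are predicted grades; scikit-learn's confusion_matrix is the equivalent library call), which makes dangerous far-off cells such as actual 0 / predicted 4 easy to inspect:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=5):
    """cm[i, j] counts images with actual grade i predicted as grade j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```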
Moreover, Figure 4-18 showed the information of the accuracy of the model on
Messidor-2 test dataset.
Figure 4-18 Information of the accuracy of the model on Messidor-2 test dataset
The f1-score (highlighted in the green rectangle) served as the reference for
the overall accuracy of the prediction. From the f1-score values, only level 0
was on par, scoring 0.80, whereas the other levels had rather low f1-scores.
Ultimately, the overall accuracy achieved was 0.65 or 65%, which was considered
moderately accurate. Figure 4-19 showed the images from the Messidor-2 test
dataset corresponding to each predicted severity label.
Figure 4-19 Images from Messidor-2 test dataset on each predicted severity label
Chapter 5: Conclusion
5.1 Project review
In general, the project involved several phases that were crucial to the
development in order to achieve the prediction and classification of the
severity labels.
In the project pre-development phase, datasets from APTOS and Messidor-2
containing fundus retinal images and image labels were sourced and explored. On
top of that, a suitable model architecture was evaluated and selected for the
training of the APTOS train dataset: DenseNet, a CNN-based network architecture.
Moreover, in the data pre-processing phase, the images of the datasets were
processed with dark image finding, border cropping, resizing and, most
importantly, each image enhancement method was applied by trial and error to
find the quadratic weighted kappa value of each pre-processed dataset in data
training.
Furthermore, in the model training architecture building and data training
phase, the labels were converted to multi-labels. Next, the train/test data
splitting technique was utilized for data validation and the model architecture
was built. Before data training, the images also went through the data
augmentation process, which transformed them in order to reduce overfitting.
In the last phase, prediction on the test dataset, the model underwent
prediction testing on the APTOS and Messidor-2 test datasets to evaluate the
accuracy of the prediction model with and without TTA, and the results were
analysed.
Ultimately, a prediction model that could predict the severity of each fundus
retinal image was built. In terms of validity of the prediction results, the
highest quadratic weighted kappa achieved was 0.9308, which resulted in an
estimated prediction accuracy of higher than 74% without TTA on the APTOS test
dataset and 65% on the Messidor-2 dataset.
5.2 Problems encountered
There were some problems encountered during the development of this project.
1. Resource limitations
Google Colab had rather limited RAM and GPU memory. This caused some
restrictions during the fine-tuning of hyperparameters such as the batch size as
well as the image resolution used. Consequently, the prediction model could not
be improved further despite having the potential to be refined. Besides, the
insufficient computation power of Google Colab also made the training process
extremely time-consuming and reduced the efficiency of the project development;
a training run of 50 epochs took about 3 hours to complete. Although Google
Colab offered integration with Google Cloud resources, the subscription fee was
rather expensive.
2. Insufficient training dataset
In the real world, there were many fundus retinal images from different sources,
and a larger dataset would contribute to better training and prediction
performance. In this project, the size of the dataset used was relatively small
and insufficient to build a prediction model deployable in an actual medical
application. Even if sufficient datasets were available, the computational
resources would have to be upgraded first to ensure they were capable of
training the datasets.
3. Unstandardized severity labels that affected prediction accuracy
The prediction on the Messidor-2 test dataset was only moderately accurate and
had lower accuracy compared to the prediction testing on the APTOS test dataset.
This was due to the unstandardized severity labels between the APTOS train
dataset and the Messidor-2 test dataset. Because the datasets came from
different sources, they were graded by different ophthalmologists who had
different opinions when labelling the severity levels. For instance, the APTOS
train dataset might label an image as level 2 severity while the Messidor-2
dataset labelled a similar image as level 1. In this scenario, the prediction
result would be judged as incorrect and would eventually affect the overall
accuracy of the model. Such inconsistency was reflected in Figure 5-1, which
showed that ophthalmologists were inconsistent in their judgements of the
severity labels of a batch of images.
Figure 5-1 The inconsistency of ophthalmologists on the judgements on severity
labels of a batch of images
On the other hand, the accuracy of the prediction model was also affected by the
use of multi-class prediction instead of binary prediction. Prediction would be
easier if the options were only true or false; when the prediction had a range
of options scaling from 0 to 4, it was rather challenging for the model to
choose the correct option due to the lower probability of a correct prediction
compared to binary prediction. Eventually, this challenge largely affected the
overall accuracy of the prediction.
5.3 Future work
In order to further improve the accuracy of the prediction model, some
improvements and hyperparameter fine-tuning could be made:
• Use larger dataset
• Discover alternative image pre-processing methods
• Discover alternative data augmentation combinations
• Switch model architecture and activation function
• Switch loss function
• Switch optimizer
• Find a more suitable learning rate
• Use larger batch size
• Use larger image resolution
• Switch to cross validation
• Adopt transfer learning technique
5.4 Conclusion
Diabetic retinopathy was one of the most common diseases among Malaysians due to
the increasing of diabetic patients. Due to the advancing of A.I, deep learning was
widely used in image-based prediction in medical industry, especially eye disease
prediction. This was because deep learning was a technique that able to automate the
learning process on the data and conduct prediction on similar cases.
In this project, deep learning was applied to the prediction and classification
of the severity labels of diabetic retinopathy from fundus retinal images. To
achieve this, a CNN-based architecture named DenseNet was built to train the
datasets and generate a prediction model capable of predicting the severities of
the test images. Besides, various pre-processing methods were tested in order to
enhance the contrast of the images. On top of that, data augmentation and TTA
were also utilised and tested during the development with the purpose of
observing the effects of these techniques on the prediction accuracy.
Ultimately, the prediction results were validated; the prediction model's
highest quadratic weighted kappa score was 0.9308, which led to an estimated
final accuracy of higher than 74% without TTA on the APTOS test dataset and 65%
on the Messidor-2 test dataset. Apparently, the prediction model still had huge
room for improvement.
In conclusion, the development of this project required in-depth deep learning
knowledge and much research and patience in deciding the hyperparameters and
techniques to be used. Besides, this project provided exposure and insight into
how deep learning could help in predicting the severities by learning from
fundus retinal images, as well as into the encountered problems such as the
unstandardized severity labels, which would be useful for relevant research and
development in the future.
BIBLIOGRAPHY
B, N., 2017. Image Data Pre-Processing for Neural Networks. [Online]
Available at: https://becominghuman.ai/image-data-pre-processing-for-neural-
networks-498289068258
[Accessed 10 June 2020].
Bandara, A. M. R. R. & Giragama, P. W. G. R. M. P. B., 2017. A Retinal Image
Enhancement Technique for Blood. IEEE International Conference on
Industrial and Information Systems , p. 6.
Brownlee, J., 2019. Machine Learning Evaluation Metrics in R. [Online]
Available at: https://machinelearningmastery.com/machine-learning-
evaluation-metrics-in-r/
[Accessed 2 August 2020].
Bt. Ngah, N. F. et al., 2017. DIABETIC RETINOPATHY SCREENING. 2nd ed.
Ampang: Ministry of Health Malaysia.
Boyd, K., 2019. What Is Diabetic Retinopathy?. [Online]
Available at: https://www.aao.org/eye-health/diseases/what-is-diabetic-
retinopathy
[Accessed 30 January 2020].
Brush, K., Burns, E. & Rouse, M., 2016. deep learning. [Online]
Available at: https://searchenterpriseai.techtarget.com/definition/deep-
learning-deep-neural-network
[Accessed 2 February 2020].
Chablani, M., 2017. DenseNet. [Online]
Available at: https://towardsdatascience.com/densenet-2810936aeebb
[Accessed 21 July 2020].
Draelos, R., 2019. Multi-label vs. Multi-class Classification: Sigmoid vs. Softmax.
[Online]
Available at: https://glassboxmedicine.com/2019/05/26/classification-sigmoid-
vs-softmax/
[Accessed 20 July 2020].
Huang, G., Liu, Z., Maaten, L. v. d. & Weinberger, K. Q., 2018. Densely Connected
Convolutional Networks. p. 9.
Lam, C., Yi, D., Guo, M. & Lindsey, T., 2018. Automated Detection of Diabetic
Retinopathy using Deep Learning. AMIA Joint Summits, Volume 2018, p. 9.
Ronneberger, O., Fischer, P. & Brox, T., 2015. U-Net: Convolutional Networks for
Biomedical Image Segmentation. Computer Vision and Pattern Recognition, p. 8.
Quellec, G. et al., 2017. Deep Image Mining for Diabetic Retinopathy Screening.
Medical Image Analysis, Volume 39, p. 19.
Rakin, E., 2018. Malaysia has the highest rate of diabetes in Asia – doctors have
classified the disease as another 'silent killer'. [Online]
Available at: https://www.businessinsider.my/malaysia-highest-rate-diabetes-silent-killer-asia
[Accessed 25 January 2020].
Reyes, K., 2020. What is Deep Learning and How Does Deep Learning Work. [Online]
Available at: https://www.simplilearn.com/what-is-deep-learning-article
[Accessed 2 February 2020].
Sisodia, D. S., Nair, S. & Khobragade, P., 2017. Diabetic Retinal Fundus Images:
Preprocessing and Feature Extraction for Early Detection of Diabetic
Retinopathy. [Online]
Available at: https://biomedpharmajournal.org/vol10no2/diabetic-retinal-fundus-images-preprocessing-and-feature-extraction-for-early-detection-of-diabetic-retinopathy/
[Accessed 5 July 2020].
Tsang, S.-H., 2018. Review: DenseNet — Dense Convolutional Network (Image
Classification). [Online]
Available at: https://towardsdatascience.com/review-densenet-image-classification-b6631a8ef803
[Accessed 21 July 2020].
Vanmathi, P. & Devarajan, D., 2017. Color Retinal Image Enhancement Based on
Luminosity. Middle-East Journal of Scientific Research, p. 11.
APPENDIX A: POSTER
APPENDIX B: PLAGIARISM CHECK RESULT
Universiti Tunku Abdul Rahman
Form Title: Supervisor's Comments on Originality Report Generated by Turnitin for Submission of Final Year Project Report (for Undergraduate Programmes)
Form Number: FM-IAD-005    Rev No.: 0    Effective Date: 01/10/2013    Page No.: 1 of 1

FACULTY OF INFORMATION AND COMMUNICATION TECHNOLOGY

Full Name(s) of Candidate(s): Hoe Yean Sam
ID Number(s): 16ACB04891
Programme / Course: CS
Title of Final Year Project: Preliminary Study of Diabetic Retinopathy Classification from Fundus Images using Deep Learning Model

Similarity
Overall similarity index: 8 %
Similarity by source – Internet Sources: 5 %, Publications: 5 %, Student Papers: 4 %
Number of individual sources listed of more than 3% similarity: -

Supervisor's Comments (Compulsory if parameters of originality exceed the limits approved by UTAR):

Parameters of originality required and limits approved by UTAR are as follows:
(i) Overall similarity index is 20% and below, and
(ii) Matching of individual sources listed must be less than 3% each, and
(iii) Matching texts in continuous block must not exceed 8 words
Note: Parameters (i) – (ii) shall exclude quotes, bibliography and text matches which are less than 8 words.
Note: Supervisor/Candidate(s) is/are required to provide softcopy of full set of the originality report to Faculty/Institute.

Based on the above results, I hereby declare that I am satisfied with the originality of the Final Year Project Report submitted by my student(s) as named above.

Signature of Supervisor                Signature of Co-Supervisor
Name: Sayed Ahmad Zikri Bin Sayed Aluwee    Name:
Date: 5 September 2020                 Date:
APPENDIX C: FYP 2 CHECKLIST
UNIVERSITI TUNKU ABDUL RAHMAN
FACULTY OF INFORMATION & COMMUNICATION TECHNOLOGY (KAMPAR CAMPUS)

CHECKLIST FOR FYP2 THESIS SUBMISSION

Student Id: 16ACB04891
Student Name: Hoe Yean Sam
Supervisor Name: Dr Sayed Ahmad Zikri Bin Sayed Aluwee

TICK (√) DOCUMENT ITEMS
Your report must include all the items below. Put a tick in the left column after you have checked your report with respect to the corresponding item.
Front Cover
Signed Report Status Declaration Form
Title Page
Signed form of the Declaration of Originality
Acknowledgement
Abstract
Table of Contents
List of Figures (if applicable)
List of Tables (if applicable)
List of Symbols (if applicable)
List of Abbreviations (if applicable)
Chapters / Content
Bibliography (or References)
All references in bibliography are cited in the thesis, especially in the chapter of literature review
Appendices (if applicable)
Poster
Signed Turnitin Report (Plagiarism Check Result - Form Number: FM-IAD-005)
*Include this form (checklist) in the thesis (Bind together as the last page)

I, the author, have checked and confirmed all the items listed in the table are included in my report.
(Signature of Student)    Date: 5/9/2020

Supervisor verification. Report with incorrect format can get 5 mark (1 grade) reduction.
(Signature of Supervisor)    Date: 5 September 2020