i.cs.hku.hki.cs.hku.hk/fyp/2017/report/final_report/Du Haiyang_118… · Web view5.1 Matlab Version...

COMP4801 Final Year Project Object Recognition by Deep Learning Neural Networks Final Report Du Haiyang (3035087124) Supervisor: Dr. KP Chan


COMP4801 Final Year Project

Object Recognition by Deep Learning

Neural Networks

Final Report

Du Haiyang (3035087124)

Supervisor: Dr. KP Chan

Submitted on Apr 15th, 2018


ABSTRACT

Deep learning neural networks are commonly used in the field of object recognition. This final report gives a detailed overview of the Final Year Project “Object Recognition by Deep Learning Neural Networks”. The objective of the project is to reproduce R-CNN in Python. To achieve this goal, the project team uses public datasets to train and evaluate the algorithm. As the final result, a Python version of R-CNN has been implemented that achieves 45.2% mAP.


ACKNOWLEDGEMENT

The progress of this project and this final report would not have been possible without the kind support and help of others.

We would like to express our gratitude to Dr. KP Chan, our project supervisor, for providing useful guidance, valuable hardware resources and inspiring advice throughout the project.


TABLE OF CONTENTS

1. INTRODUCTION

2. PROJECT BACKGROUND AND RELATED STUDIES
    2.1 Related Studies
    2.2 Project Background

3. SCOPE
    3.1 R-CNN Focused
    3.2 Single Controlled Variable
    3.3 Datasets for Research
    3.4 Code Base Only

4. PREREQUISITES

5. METHODOLOGY
    5.1 Matlab Version Source Code Implementation
    5.2 Python Version Implementation

6. PROGRESS
    6.1 Stages of Work
    6.3 Timeframe

7. FINAL RESULT

8. DELIVERABLES

9. CHALLENGES AND MITIGATIONS
    10.1 Hardware Constraint
    10.2 Uncertainty in Python Version Performance
    10.3 Modifications on Selective Search

10. FUTURE WORK
    LDA Version R-CNN mAP > 49.6%
    LDA Version R-CNN mAP = 49.6%
    LDA Version R-CNN mAP < 49.6%

11. CONCLUSION

REFERENCE


LIST OF FIGURES

Figure 1: R-CNN Process

Figure 2: Selective Search Stage of Work

Figure 3: Colour Scale Example

Figure 4: Texture Scale Example

Figure 5: Fit Scale Example

Figure 6: Selective Search Visual Example

Figure 7: Interim Result Example

Figure 8: CNN

Figure 9: Pre-train CNN

Figure 10: Fine-tune CNN

Figure 11: Train SVM


LIST OF TABLES

Table 1: Comparison on Performance of R-CNN and Its Advanced Versions

Table 2: Project Schedule

Table 3: Comparison on Performance of Original R-CNN and Python Version R-CNN


LIST OF FORMULAS

Formula 1

Formula 2

Formula 3

Formula 4

Formula 5


GLOSSARY

CNN – Convolutional Neural Network

R-CNN – Region-based Convolutional Neural Network (Regions with CNN features)

LDA – Latent Dirichlet Allocation

mAP – Mean Average Precision

RPN – Region Proposal Network

SVM – Support Vector Machine


1. INTRODUCTION

Human beings can identify objects by sight with little effort, even when the objects differ from each other only slightly. Moreover, the human visual system can recognize objects from different viewpoints, even when they are partially occluded. The same tasks are extremely difficult for a computer system to imitate, even though computers nowadays easily surpass humans in calculation speed and memory size. The technology that captures and recognizes objects in an image or a series of images using computer vision algorithms is called object recognition, and extensive research in this field has been conducted over the past decades.

It is generally acknowledged that the best-performing algorithms for object recognition tasks are based on convolutional neural networks (CNN), a widely used class of deep learning neural networks that has proven successful in processing visual images. A CNN learns representations of its input data directly from training examples rather than relying on hand-crafted, task-specific features. It typically consists of an input layer, an output layer and several hidden layers, most of which are convolutional. This multi-layer design requires relatively little pre-processing of the input and, to a large extent, mimics the behaviour of biological neurons.

Region-based Convolutional Neural Network (R-CNN) applies deep learning neural networks to the identification of target objects [1]. The key objective of this Final Year Project is to implement the R-CNN algorithm in Python. R-CNN consists of four individual modules, namely image input, proposal extraction, CNN computation and SVM classification. This project tackles each of the four modules in Python and assembles them into a complete Python-based R-CNN, which to our knowledge had not been done before. Although the original Matlab version has its own benefits, we believe a Python version offers better extensibility and provides a platform for the many programmers who are unfamiliar with Matlab but comfortable with Python to study the topic of “Object Recognition by Deep Learning Neural Networks”.

This final report presents a comprehensive account of the Final Year Project. It starts by examining the project background and previous related studies in the field of object recognition using deep learning neural networks. It then outlines the scope and prerequisites of the project. After that, it discusses in detail the approach applied to conduct the project and reports the progress of the project in its different phases together with the corresponding results. It also presents the challenges we met during the research and how we overcame them. Finally, it proposes directions for future work on this topic and concludes the report.


2. PROJECT BACKGROUND AND RELATED STUDIES

This section gives a first look at the Final Year Project in two parts: studies related to the project and the project background.

2.1 RELATED STUDIES

This subsection covers related studies on R-CNN.

o R-CNN

R-CNN operates four individual modules: image input, selective search on the image, CNN feature extraction and object identification through a Support Vector Machine (SVM) classifier [1]. This process is illustrated in Figure 1 below.

Figure 1: R-CNN Process [1]

As shown in Figure 1, R-CNN first reads the input image from its data source and then extracts region proposals that may contain objects. It then computes the CNN features of each region and finally classifies each region according to those features. The main data sources for image input used with R-CNN are the PASCAL VOC 2007 and PASCAL VOC 2012 datasets. Region proposal extraction is based on the selective search algorithm, which is expected to generate better proposals. Classification is done by linear SVM classifiers that take the CNN features as input.

Since the introduction of R-CNN, research on improving the performance of the original algorithm has attracted growing interest. This research has concentrated on reducing training and test time and raising recognition accuracy. Fast R-CNN is a variation of R-CNN that speeds up the process by computing the convolutional features of the whole image once and pooling the features of each region proposal from the shared feature map, and that improves recognition accuracy by replacing the original SVM classifier with a softmax classifier [2]. Faster R-CNN goes further by introducing a Region Proposal Network (RPN) that shares convolutional features with Fast R-CNN, so that region proposals are generated by the network itself [3].


                          R-CNN         FAST R-CNN    FASTER R-CNN
TEST TIME PER IMAGE       50 seconds    2 seconds     0.2 seconds
SPEEDUP                   1x            25x           250x
MAP (VOC 2007, %)         N/A           66.9          66.9
MAP (VOC 2012, %)         49.6          ~66.0         N/A

Table 1: Comparison on Performance of R-CNN and Its Advanced Versions [2][4]

Table 1 above gives an overview of the comparative performance of R-CNN, Fast R-CNN and Faster R-CNN. Accuracy is reported as Mean Average Precision (mAP), the mean over all classes of the average precision, where the average precision of a class is the area under its precision-recall curve. A higher mAP indicates a higher success rate in identifying objects. Table 1 shows that Fast R-CNN yields a significantly higher mAP than R-CNN on VOC 2012, and that Faster R-CNN matches Fast R-CNN on VOC 2007. On top of that, the test time per image drops from 50 seconds for R-CNN to 2 seconds for Fast R-CNN and 0.2 seconds for Faster R-CNN.
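To make the mAP definition concrete, the sketch below computes the average precision of one class as the area under an interpolated precision-recall curve and then averages over classes. It is a minimal illustration, not the evaluation code used in this project: the function names and the input layout (one confidence score and one true-positive flag per detection) are assumptions, and the exact interpolation protocol of the PASCAL VOC evaluation may differ.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    scores, is_true_positive: one entry per detection (hypothetical layout).
    num_ground_truth: number of annotated objects of this class.
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    false_pos = 0
    recalls = []
    precisions = []
    for i in order:
        if is_true_positive[i]:
            true_pos += 1
        else:
            false_pos += 1
        recalls.append(true_pos / num_ground_truth)
        precisions.append(true_pos / (true_pos + false_pos))
    # Replace each precision by the maximum precision at that rank or later.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum the rectangle areas wherever recall increases.
    ap = 0.0
    previous_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(per_class_ap):
    """Mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, four ranked detections that are correct, wrong, correct, wrong for a class with two annotated objects give an average precision of 5/6.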


2.2 PROJECT BACKGROUND

This subsection covers the background of the project, our motivations for the project and its potentially beneficial outcomes.

o Background of this project

Although more sophisticated algorithms such as Fast R-CNN and Faster R-CNN have been introduced, this project bases its framework on R-CNN, given that it is the original algorithm and is more extensible than its successors. Considering the technological constraints and the time limit, this project cannot make substantial improvements to the R-CNN algorithm as Fast R-CNN and Faster R-CNN do. The scope of the research covered in this project is introduced in the next section.

o Motivations toward this project

Object recognition has been studied extensively, and since the introduction of R-CNN in 2014, which is not long ago, tremendous improvements have been made and spectacular results achieved. R-CNN may therefore no longer be at the cutting edge of research.

However, we choose R-CNN mainly because of its modularity. The four individually working modules allow us to build a version that can be of great use for future study, that can be tested and compared module by module, and that lets us divide our work easily. Our literature review on Fast R-CNN also inspired us to examine how each module can be modified to improve the accuracy of object recognition. The classifier is clearly one place to explore, and we propose replacing the SVM with a Topic Model as future work on this project.

Moreover, we re-implement R-CNN in Python rather than another language because Python is widely adopted by programmers exploring computer vision and machine learning. An implementation in Python helps fellow programmers exploit the modular design of R-CNN, work on the module they want to improve, and recombine the modules for further advancement in this field.

o Potentially beneficial outcomes

The first potentially beneficial outcome is a Python implementation of R-CNN. This benefits programmers who want to start exploring the field of object recognition and are only familiar with Python.

The second potential outcome is a platform for future research on the role the classifier plays in improving recognition accuracy. We propose using a Topic Model as a direction for future research.


3. SCOPE

After careful consideration of the complexity of this research, the scope of the project is limited to the following four parts and is subject to changes that may occur as the research progresses:

3.1 R-CNN Focused

Although improved versions of R-CNN, such as Fast R-CNN and Faster R-CNN, have emerged, this project focuses solely on R-CNN because of its better extensibility.

3.2 Single Controlled Variable

Among the different approaches to improving the performance of the R-CNN algorithm, this project concentrates only on raising the mAP of R-CNN. This is done by controlling the variables of the selective search algorithm and the CNN embedded in R-CNN.

3.3 Datasets for Research

Among the numerous public datasets available for object recognition study, two are used in this project. PASCAL VOC 2007 is used for training the algorithm and PASCAL VOC 2012 is used to evaluate the performance of the implementations.


3.4 Code Base Only

Given the nature of the project, no user interface is developed. Instead, a code base is provided.

In short, the scope of this project is limited to training the algorithm and evaluating its performance on two pre-selected public datasets, with the final deliverable being a code-based algorithm that aims to achieve a better mAP.

In light of the challenges from both the hardware and the code perspective, several demanding prerequisites are required, which are listed in the next section.


4. PREREQUISITES

Two prerequisites on software and one prerequisite on source code for this project are stated as follows:

1. Caffe [5] with Python layer and the pycaffe interface.

2. GNU Parallel [6] for running jobs in parallel.

3. The R-CNN source code, which is based on Matlab.


5. METHODOLOGY

In order to conduct this project successfully, the research process is separated into two phases as follows:

5.1 Matlab Version Source Code Implementation

The R-CNN source code is originally implemented in Matlab. This version is used as the benchmark for the project. It is trained on PASCAL VOC 2007 and evaluated on PASCAL VOC 2012 so that the training and evaluation environments are consistent across versions.

5.2 Python Version Implementation

R-CNN is re-implemented in Python based on the Matlab version. Python is more commonly used than Matlab and is therefore accessible to more programmers. Moreover, the Python environment offers more extensibility, so that further study based on this research project can be conducted easily. The same training and evaluation environment is provided for this replicated version. The evaluation result is recorded for comparison with the Matlab version’s result and is also set as another benchmark for the project. The Python version is expected to achieve roughly the same training speed and accuracy as the Matlab version, although a 5% fluctuation is allowed given the change of language environment. Note that deviations from this expectation may be encountered, and the project schedule is subject to change for this reason as the research progresses.


To summarize, the research is conducted in two phases, starting with the Matlab version R-CNN implementation and followed by the Python version R-CNN reproduction. The performance of the Python version is expected to be generally in line with that of the original Matlab version. The progress of the Final Year Project is reported in the next section.


6. PROGRESS

This section gives a detailed overview of all stages of work done during the research and reports the corresponding results of each stage. It then steps back and presents the overall timeframe of the project.

6.1 Stages of Work

This subsection reports all stages of work completed during this Final Year Project, namely environment setup, Matlab version R-CNN implementation, and the selective search, CNN and SVM classifier implementations in Python version R-CNN.

o Environment Setup

A computer with an Intel i7 processor and 32 GB of RAM was provided. Ubuntu was chosen as the operating system because a Linux environment can run both Matlab and Python. Considering the demanding hardware requirements of the R-CNN algorithm, 332 GB of disk space was granted to this project. In addition, a GeForce GTX GPU was set up to facilitate the large-scale image processing used in this project.

On top of the hardware, Python, Caffe and Matlab were installed on the system.

o Matlab Version R-CNN Implementation

The original Matlab version of R-CNN was retrieved from GitHub, an open source code hosting platform. This version was trained on the PASCAL VOC 2007 dataset and evaluated on the PASCAL VOC 2012 dataset. The evaluation result was 49.6% mAP, close to the value reported in the original research paper. This indicates that the software and hardware environment was sufficient to run Matlab version R-CNN. Moreover, 49.6% mAP is set as one of the benchmarks for the whole project.

o Selective Search Implementation in Python Version R-CNN

Figure 2: Selective Search Stage of Work [1] (reproduced on top of the original figure in the paper)

Figure 2 illustrates the selective search stage of the Python version R-CNN implementation, highlighted by the red square. The first part was to set up the environment and prepare the PASCAL VOC 2007 and PASCAL VOC 2012 datasets for image input. The second part was to implement the selective search algorithm in Python.

Selective search is an algorithm used in object recognition that proposes class-independent candidate object locations [9]. The main rationale behind selective search is that images are naturally hierarchical, and that objects can be identified through four scales: colour, size, texture and fit.

Figure 3: Colour Scale Example [9]

Figure 3 illustrates an example of identifying objects through colour. The two cats in the image can be distinguished by their colour rather than by their size or texture.


Figure 4: Texture Scale Example [9]

Figure 4 gives an example of identifying objects through texture. The chameleon can be told apart from the leaves, which have a similar colour, through its texture.

Figure 5: Fit Scale Example [9]


Figure 5 offers an example of identifying objects through fit. The tyre of the red car is recognized as part of the car because a tyre naturally fits into a car, not because of its colour, texture or size.

Moving forward, for the Python version R-CNN algorithm to compute the similarity for each scale, several mathematical formulas were used.

Formula 1 [9]

Formula 1 denotes the similarity between any two regions i and j. As mentioned before, regions can be distinguished from each other through the four scales, and thus the similarity between two regions is computed as the sum of the similarities of those four scales. The constant terms a1, a2, a3 and a4 can only be 0 or 1, depending on whether the corresponding scale is actually used to compute the similarity.

The detailed formula for the similarity of each scale is given below:


Formula 2 [9]

Formula 2 denotes the similarity of colour between any two regions i and j, which is calculated as the sum, over all bins of the two regions' colour histograms, of the smaller of the two bin values.

Formula 3 [9]

Formula 3 denotes the similarity of texture between any two regions i and j, which is calculated in the same way from the two regions' texture histograms.

Formula 4 [9]


Formula 4 denotes the similarity of size between any two regions i and j, which reflects the size of region i and region j relative to the whole image.

Formula 5 [9]

Formula 5 denotes the similarity of fit between any two regions i and j, which measures how well the two regions fit into each other and thus helps assemble the parts of an object.
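The equation images for Formulas 1 to 5 are not reproduced in this text. As a reference, the five similarity measures can be written as follows, based on the definitions in the selective search paper [9]. The symbols are taken from that paper rather than from this report and should be read as assumptions here: c and t are the normalised colour and texture histograms with n bins, size(·) is an area in pixels, BB_ij is the tight bounding box around both regions, and im is the whole image. What this report calls "fit" is called "fill" in [9].

```latex
\begin{align}
s(r_i, r_j) &= a_1\, s_{\text{colour}}(r_i, r_j) + a_2\, s_{\text{texture}}(r_i, r_j)
             + a_3\, s_{\text{size}}(r_i, r_j) + a_4\, s_{\text{fit}}(r_i, r_j),
             \quad a_k \in \{0, 1\} \\
s_{\text{colour}}(r_i, r_j) &= \sum_{k=1}^{n} \min\left(c_i^k, c_j^k\right) \\
s_{\text{texture}}(r_i, r_j) &= \sum_{k=1}^{n} \min\left(t_i^k, t_j^k\right) \\
s_{\text{size}}(r_i, r_j) &= 1 - \frac{\text{size}(r_i) + \text{size}(r_j)}{\text{size}(im)} \\
s_{\text{fit}}(r_i, r_j) &= 1 - \frac{\text{size}(BB_{ij}) - \text{size}(r_i) - \text{size}(r_j)}{\text{size}(im)}
\end{align}
```

The size and fit terms both grow as the two regions get smaller or overlap less empty space, so small, tightly fitting regions are encouraged to merge early.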

With these similarity measures, selective search can decide quantitatively which regions to merge. A higher similarity value indicates that two neighbouring regions are more likely to belong to the same object, so the most similar pair is merged first. By repeatedly comparing the similarity values computed through the formulas above and merging regions, selective search arrives at a reduced set of region proposals for R-CNN. Below is a visual explanation of how selective search works on an image.


Figure 6: Selective Search Visual Example [9]

Figure 6 illustrates how selective search works on an input image. Starting from the leftmost part of the figure, it first segments the image into different regions and converts the regions to boxes, which represent the regions that are likely to contain objects. Then, according to the four scales (colour, size, texture and fit) mentioned before, selective search merges regions wherever suitable and reduces the number of boxes. Eventually, as shown in the rightmost part of Figure 6, it outputs a small number of regions that may contain objects.
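The merging procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's implementation: each region is a hypothetical dictionary holding its pixel count, bounding box and normalised colour and texture histograms, and for brevity every pair of regions is compared, whereas selective search as published only considers neighbouring regions.

```python
def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def box_union(b1, b2):
    """Tightest box around two boxes given as (x_min, y_min, x_max, y_max)."""
    return (min(b1[0], b2[0]), min(b1[1], b2[1]),
            max(b1[2], b2[2]), max(b1[3], b2[3]))


def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def similarity(ri, rj, image_size, a=(1, 1, 1, 1)):
    """Weighted sum of colour, texture, size and fit similarity."""
    s_colour = histogram_intersection(ri["colour"], rj["colour"])
    s_texture = histogram_intersection(ri["texture"], rj["texture"])
    s_size = 1.0 - (ri["size"] + rj["size"]) / image_size
    joint_box = box_union(ri["box"], rj["box"])
    s_fit = 1.0 - (box_area(joint_box) - ri["size"] - rj["size"]) / image_size
    return a[0] * s_colour + a[1] * s_texture + a[2] * s_size + a[3] * s_fit


def merge(ri, rj):
    """Merge two regions; histograms are averaged weighted by region size."""
    size = ri["size"] + rj["size"]

    def blend(hi, hj):
        return [(ri["size"] * x + rj["size"] * y) / size for x, y in zip(hi, hj)]

    return {"size": size,
            "box": box_union(ri["box"], rj["box"]),
            "colour": blend(ri["colour"], rj["colour"]),
            "texture": blend(ri["texture"], rj["texture"])}


def greedy_grouping(regions, image_size):
    """Repeatedly merge the most similar pair; return every region produced.

    Simplification: all pairs are compared, not only neighbouring regions.
    """
    active = list(regions)
    proposals = list(regions)
    while len(active) > 1:
        pairs = [(similarity(active[p], active[q], image_size), p, q)
                 for p in range(len(active))
                 for q in range(p + 1, len(active))]
        _, p, q = max(pairs, key=lambda item: item[0])
        merged = merge(active[p], active[q])
        active = [r for k, r in enumerate(active) if k not in (p, q)]
        active.append(merged)
        proposals.append(merged)
    return proposals
```

Starting from n initial regions, the loop performs n - 1 merges, so the returned list holds 2n - 1 regions whose boxes can serve as proposals at different scales.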

The result of this stage is that Python version R-CNN is now able to process image input and apply the selective search algorithm to extract a small set of regions that are most likely to contain objects.


Figure 7: Interim Result Example [9]

Figure 7 gives a visual example of the result of this stage. In the figure, selective search is applied to an input image with two dairy cows and a fence. The proposed regions most likely to contain an object are marked in green. The less likely regions are marked in blue and are kept for the CNN computation implemented in the later phases.

o CNN Implementation in Python Version R-CNN

The CNN is a crucial part of this project. It takes the region proposals extracted through selective search and passes them through five convolutional layers and two fully connected layers to compute features. The structure of the CNN we used is illustrated below:


Figure 8: CNN [11]

We downloaded a pre-trained CNN model from the official Caffe website to ensure its compatibility. Unlike what the paper suggests, the model we downloaded is pre-trained on the ILSVRC 2013 dataset instead of the ILSVRC 2012 dataset. This process is illustrated below:

Figure 9: Pre-train CNN [12]

Then we built the whole CNN module, enabling it to take as input each proposed region warped to a 227 x 227 pixel image.
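The fixed input size means every region proposal, whatever its shape, has to be cropped and resized before it reaches the CNN. The sketch below shows the idea with nearest-neighbour sampling on an image stored as nested lists. The function name, the box layout and the sampling method are assumptions chosen to keep the example dependency-free, and details such as context padding around the box are left out.

```python
def warp_region(image, box, out_size=227):
    """Crop `box` from `image` and resize the crop to out_size x out_size.

    image: list of rows, each row a list of pixel values (hypothetical layout).
    box:   (x_min, y_min, x_max, y_max) with exclusive maxima.
    Nearest-neighbour sampling is used here only for brevity.
    """
    x_min, y_min, x_max, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    if width <= 0 or height <= 0:
        raise ValueError("box must have positive width and height")
    warped = []
    for row in range(out_size):
        # Map the output row back to a source row inside the box.
        src_y = y_min + (row * height) // out_size
        warped.append([image[src_y][x_min + (col * width) // out_size]
                       for col in range(out_size)])
    return warped
```

Because the horizontal and vertical scale factors are computed independently, the crop is stretched to a square regardless of its aspect ratio.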


Finally, we used PASCAL VOC 2007 to fine-tune the CNN so that it is ready for object detection. This step is illustrated below:

Figure 10: Fine-tune CNN [12]

o SVM Classifier Implementation in Python Version R-CNN

An SVM serves as the classifier in this project: it takes the features computed by the CNN and classifies them on a per-class basis. We first set up 20 SVMs corresponding to the 20 object classes. We then trained each class’s SVM on the CNN features of the extracted region proposals together with their training labels, to prepare it for the final evaluation. This step is illustrated below:

Figure 11: Train SVM [12]
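The per-class training described above can be sketched as one binary linear SVM per class, each trained to separate its own class from everything else. The code below is an illustrative stand-in, not the project's classifier: it uses plain sub-gradient descent on the hinge loss, and the function names, hyperparameters and input layout (a list of feature vectors and a list of integer class ids) are assumptions.

```python
def train_linear_svm(features, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Train one binary linear SVM; labels are +1 or -1."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            # L2 regularisation shrinks the weights on every step.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if y * score < 1.0:
                # Hinge-loss sub-gradient step for a margin violation.
                weights = [w + learning_rate * y * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def train_per_class_svms(features, class_ids, num_classes):
    """Train one one-vs-rest SVM per class."""
    svms = []
    for c in range(num_classes):
        labels = [1 if cid == c else -1 for cid in class_ids]
        svms.append(train_linear_svm(features, labels))
    return svms


def classify(svms, x):
    """Return the class whose SVM gives the highest score for feature x."""
    scores = [sum(w * xi for w, xi in zip(weights, x)) + bias
              for weights, bias in svms]
    return max(range(len(scores)), key=lambda c: scores[c])
```

With 20 classes, `train_per_class_svms(features, class_ids, 20)` would return 20 weight vectors, and each region is assigned to the class with the highest score.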


6.3 Timeframe

This timeframe has changed since we first proposed it. As the research went on, we gradually realized the difficulty underlying this project and adopted a flexible timeframe.

For example, we were halted when we tried to obtain a Matlab license for the machine we used to run the project. It took several weeks before we received the reply that we could not be granted a license there. We therefore had to postpone our original plan and seek a new way to familiarize ourselves with the Matlab version’s code.

Table 2 below gives a detailed overview on the time schedule of this

project:

Time Frame                           Task

Sep. 2017 (Completed)                ● Detailed project plan submission
                                     ● Project webpage creation
                                     ● Initial meeting with Dr. KP Chan to confirm research topic and set up regular meeting time slots
                                     ● Related research paper reading

Oct. 2017 (Completed)                ● Download R-CNN Matlab version source code and familiarize with the algorithm
                                     ● Train and evaluate Matlab version R-CNN

Nov. 2017 – Jan. 2018 (Completed)    ● Python version R-CNN implementation
                                     ● Check hardware status and decide on whether higher standard hardware will be used
                                     ● Build up Python version R-CNN
                                     ● Interim report submission

Jan. 2018 – Apr. 2018 (Completed)    ● Train and evaluate Python version R-CNN
                                     ● Experiment on Topic Model to see if it is plausible for use in object recognition

Apr. 2018 (Completed)                ● Final report submission
                                     ● Final project presentation

Table 2: Project Schedule


7. FINAL RESULT

We evaluated our Python version of R-CNN on the PASCAL VOC 2012 dataset. The result is reasonably satisfactory, with a best mAP of 45.2%. The comparison between our result and the original paper’s result is as follows:

                       ORIGINAL R-CNN    PYTHON VERSION R-CNN
TEST TIME PER IMAGE    50 seconds        N/A
SPEEDUP                1x                N/A
MAP (VOC 2012)         49.6%             45.2%

Table 3: Comparison of Performance of Original R-CNN and Python Version R-CNN [2][4]

Although the mAP is slightly lower than that of the original R-CNN, this is understandable given that we changed the programming language. Also, our implementation may not be as refined as the one in the paper, and further refinement may help raise the mAP.
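
For reference, the mAP figures above are means of per-class average precision (AP). The sketch below shows one common way to compute AP from ranked detections, as the area under the monotonically non-increasing envelope of the precision-recall curve. It is an illustration, not the official PASCAL VOC evaluation code. It assumes each detection has already been marked as a true or false positive by matching against ground truth, and all names and numbers in it are hypothetical.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall envelope for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth object.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    ranked = sorted(zip(scores, is_true_positive),
                    key=lambda pair: pair[0], reverse=True)
    recalls, precisions = [], []
    tp = fp = 0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    ap = 0.0
    prev_recall = 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """Average the per-class AP values into a single mAP figure."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Four hypothetical detections for one class with two ground-truth objects.
example_ap = average_precision([0.9, 0.8, 0.7, 0.6],
                               [True, False, True, False], 2)
```

In this toy case the first object is found at precision 1 and the second at precision 2/3, so the AP is 0.5 × 1 + 0.5 × 2/3 = 5/6.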


8. DELIVERABLES

This project delivers a Python version of the R-CNN algorithm for use in object recognition research, in the form of a code base.

The algorithm achieved an mAP of around 45.2% when evaluated on the PASCAL VOC 2012 dataset. Project progress can be checked at:

http://i.cs.hku.hk/fyp/2017/fyp17015/


9. CHALLENGES AND MITIGATIONS

The challenges met and overcome in this project were hardware constraints, uncertainty in the Python version’s performance, and modifications to selective search. The obstacles encountered and the corresponding mitigations are detailed as follows:

9.1 Hardware Constraint

The Matlab version of R-CNN requires substantial hardware support, such as a high-performance GPU and a large amount of disk space (around 200 GB) to cache images and feature vectors. The requirements may differ when the algorithm is implemented in a Python environment and, as a consequence, delays to the original project schedule and lower-than-expected performance may occur.

Mitigation:

1) The hardware provided by our supervisor was tested, and additional support such as more memory was provided by Dr. KP Chan, which helped considerably.

2) A flexible project schedule was adopted.

9.2 Uncertainty in Python Version Performance

Due to the change in language environment, the Python version of R-CNN performed worse than the original Matlab version. Slower processing and lower accuracy were encountered.


Mitigation:

1) As the difference was not large, the result is considered satisfactory and can serve as a basis for further refinement.

9.3 Modifications on Selective Search

The selective search algorithm used in this project generates around 2000 regions of different sizes and shapes. However, the CNN requires all input regions to have exactly the same size. To resolve this conflict, two measures can be taken.

Mitigation:

1) Warp all the region images to a fixed size after generating them. However, this may mislead the CNN. Because the size and proportions of the object are modified, a small tree may be recognized and treated as a big wooden door after resizing.

2) Pad the picture with a background color. This may also cause confusion, since the neural network takes the whole picture, including the padding, into consideration.
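
The two options above can be sketched on a tiny grid of pixel values. This is a minimal illustration under stated assumptions, not the project's preprocessing code: regions are plain nested lists, resizing uses nearest-neighbour sampling, and the function names and the 2 x 4 example region are hypothetical.

```python
def warp(region, size):
    """Resize a 2-D region to size x size by nearest-neighbour sampling.

    The aspect ratio is not preserved, so objects can look stretched.
    """
    h, w = len(region), len(region[0])
    return [[region[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def pad_to_square(region, fill=0):
    """Centre the region on a square canvas filled with a background value."""
    h, w = len(region), len(region[0])
    side = max(h, w)
    top = (side - h) // 2
    left = (side - w) // 2
    canvas = [[fill] * side for _ in range(side)]
    for r in range(h):
        for c in range(w):
            canvas[top + r][left + c] = region[r][c]
    return canvas


def pad_then_resize(region, size, fill=0):
    """Pad to a square first, then resize, so the aspect ratio is kept."""
    return warp(pad_to_square(region, fill), size)


# A hypothetical 2 x 4 region proposal (pixel values are arbitrary).
proposal = [[1, 2, 3, 4],
            [5, 6, 7, 8]]
warped = warp(proposal, 4)             # rows are duplicated: object is stretched
padded = pad_then_resize(proposal, 4)  # shape is kept, background rows are added
```

Warping fills the whole input with object pixels at the cost of distortion, while padding keeps the shape but feeds background values to the network, which mirrors the trade-off described above.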


10. FUTURE WORK

There are many directions for future work based on this project. We propose one that may be especially beneficial and promising: using a Topic Model with Latent Dirichlet Allocation (LDA) as the classifier instead of the original SVM classifier.

LDA, one of the topic models, is generally used for text mining. However, we suspect that replacing the SVM with it may help raise object recognition accuracy. The replacement should not be too difficult, because we adopted the modular design of the original R-CNN: the classifier is an individual module and can be replaced without affecting the rest of the algorithm.
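
The modularity argument can be sketched as a small interface. This is a hypothetical illustration and not the project's code: the class and method names are assumptions, and a simple nearest-centroid classifier stands in for the SVM or LDA stage purely to keep the example runnable.

```python
from typing import Protocol, Sequence


class RegionClassifier(Protocol):
    """Any object with fit/predict can serve as the classifier stage."""

    def fit(self, features: Sequence[Sequence[float]],
            labels: Sequence[str]) -> None: ...

    def predict(self, feature: Sequence[float]) -> str: ...


class NearestCentroidClassifier:
    """Stand-in classifier used only to make the sketch runnable."""

    def __init__(self):
        self.centroids = {}

    def fit(self, features, labels):
        grouped = {}
        for x, y in zip(features, labels):
            grouped.setdefault(y, []).append(x)
        # The centroid of a class is the per-dimension mean of its features.
        self.centroids = {
            y: [sum(col) / len(xs) for col in zip(*xs)]
            for y, xs in grouped.items()
        }

    def predict(self, feature):
        def sq_dist(c):
            return sum((a - b) ** 2
                       for a, b in zip(feature, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


class DetectionPipeline:
    """Feature extraction is fixed; the classifier is injected."""

    def __init__(self, extract_features, classifier: RegionClassifier):
        self.extract_features = extract_features
        self.classifier = classifier

    def train(self, regions, labels):
        self.classifier.fit([self.extract_features(r) for r in regions], labels)

    def classify(self, region):
        return self.classifier.predict(self.extract_features(region))


# Identity "feature extractor" and toy regions, both hypothetical.
pipeline = DetectionPipeline(lambda region: region, NearestCentroidClassifier())
pipeline.train([[0, 0], [0, 1], [5, 5], [6, 5]], ["a", "a", "b", "b"])
```

Swapping in a different classifier then only means passing another object with the same `fit` and `predict` methods to `DetectionPipeline`; the proposal and feature stages stay untouched.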

LDA is one of the most popular topic models and is based on the Dirichlet distribution [10]. Topic models are broadly used in document classification and reading recommendation. By applying topic modelling to a reader’s reading history, content providers can learn which topics the reader is more interested in and then offer similar documents to that reader.

LDA works by treating each document as a collection of words without sequential order. It is based on the idea that a document is composed of topics and that each word belongs to one of these topics. This is quite similar to the idea of image classification, so LDA may be applicable there as well.

To obtain comparable results, an R-CNN version with an LDA classifier can be trained and evaluated on exactly the same PASCAL datasets, in order to keep the comparison apples-to-apples. Based on the output, specific parameters can be calibrated to achieve optimal performance.


Some potential results and the corresponding explanations are as follows:

LDA Version R-CNN mAP > 49.6%

This would indicate that the LDA classifier indeed improves object recognition accuracy. In other words, the classifier stage is the key to the accuracy problem, and the LDA classifier is more powerful than the original SVM classifier. In summary, the research could be considered successful if it achieves mAP > 49.6%.

LDA Version R-CNN mAP = 49.6%

This would indicate that the LDA classifier only matches the classification capability of the original SVM classifier. Under this outcome, the classifier may not be the key to improving accuracy.

LDA Version R-CNN mAP < 49.6%

This would indicate that the LDA classifier produces a worse result than the original classifier. This may occur for two possible reasons: incompatibility between the LDA classifier and the CNN computation, or incompatibility between the selective search segmentation and the LDA classifier. To address this, additional effort would be needed to modify the selective search segmentation or the CNN computation accordingly.


11. CONCLUSION

This project addresses the active research topic of object recognition. It re-implements the widely used R-CNN algorithm in Python, aiming to provide a better platform for future work in the form of a code base. The project has completed all of the following parts: environment setup, Matlab version R-CNN implementation, and the selective search, CNN and SVM classifier implementations in the Python version of R-CNN.

Future research will focus on exploiting the modular design and on investigating how different modules can help increase accuracy.

In light of the potential to improve object identification accuracy through future development, this project is regarded as scientifically meaningful. It is believed that the final deliverable can inspire researchers in the field of object recognition by deep learning neural networks.


REFERENCES

[1] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580-587).

[2] Girshick, R. (2015). Fast R-CNN. In Proceedings of the IEEE international conference on computer vision (pp. 1440-1448).

[3] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems (pp. 91-99).

[4] Artificial Intelligence. Retrieved from https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/object_localization_and_detection.html

[5] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ... & Darrell, T. (2014, November). Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 675-678). ACM.

[6] Ludwig, T. (2004). Research trends in high performance parallel input/output for cluster environments.

[7] Gu, S., Tan, Y., & He, X. (2010). Discriminant analysis via support vectors. Neurocomputing, 73(10), 1669-1675.

[8] Bellingegni, A. D., Gruppioni, E., Colazzo, G., Davalli, A., Sacchetti, R., Guglielmelli, E., & Zollo, L. (2017). NLR, MLP, SVM, and LDA: a comparative analysis on EMG data from people with trans-radial amputation. Journal of neuroengineering and rehabilitation, 14(1), 82.

[9] Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International journal of computer vision, 104(2), 154-171.

[10] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of machine learning research, 3(Jan), 993-1022.

[11] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).

[12] Picture retrieved from https://zhuanlan.zhihu.com/p/32564990
