COMP4801 Final Year Project
Object Recognition by Deep Learning
Neural Networks
Final Report
Du Haiyang (3035087124)
Supervisor: Dr. KP Chan
Submitted on Apr 15th, 2018
ABSTRACT
Deep Learning Neural Networks have been commonly used in the field of
object recognition. This final report gives a detailed overview of the Final
Year Project “Object Recognition by Deep Learning Neural Networks”. The
ultimate objective of this project is to reproduce R-CNN in Python. To achieve
this goal, the project team utilized public datasets to train and evaluate the
algorithm. As the final result, a Python version of R-CNN has been
implemented, achieving a 45.2% mAP.
ACKNOWLEDGEMENT
Progress of this project and this final report would not have been possible
without kind support and help.
We would like to express our gratitude to Dr. KP Chan, our project supervisor,
for providing useful guidance, valuable hardware resources and inspiring
advice throughout the project.
TABLE OF CONTENTS
1. INTRODUCTION
2. PROJECT BACKGROUND AND RELATED STUDIES
  2.1 Related Studies
  2.2 Project Background
3. SCOPE
  3.1 R-CNN Focused
  3.2 Single Controlled Variable
  3.3 Datasets for Research
  3.4 Code Base Only
4. PREREQUISITES
5. METHODOLOGY
  5.1 Matlab Version Source Code Implementation
  5.2 Python Version Implementation
6. PROGRESS
  6.1 Stages of Work
  6.2 Timeframe
7. FINAL RESULT
8. DELIVERABLES
9. CHALLENGES AND MITIGATIONS
  9.1 Hardware Constraint
  9.2 Uncertainty in Python Version Performance
  9.3 Modifications on Selective Search
10. FUTURE WORK
11. CONCLUSION
REFERENCE
LIST OF FIGURES
Figure 1: R-CNN Process
Figure 2: Selective Search Stage of Work
Figure 3: Colour Scale Example
Figure 4: Texture Scale Example
Figure 5: Fit Scale Example
Figure 6: Selective Search Visual Example
Figure 7: Interim Result Example
Figure 8: CNN
Figure 9: Pre-train CNN
Figure 10: Fine-tune CNN
Figure 11: Train SVM
LIST OF TABLES
Table 1: Comparison on Performance of R-CNN and Its Advanced Versions
Table 2: Project Schedule (Completed)
Table 3: Comparison on Performance of Original R-CNN and Python Version R-CNN
LIST OF FORMULAS
Formula 1: Combined Region Similarity
Formula 2: Colour Similarity
Formula 3: Texture Similarity
Formula 4: Size Similarity
Formula 5: Fit Similarity
GLOSSARY
CNN – Convolutional Neural Networks
R-CNN – Regional Convolutional Neural Networks
LDA – Latent Dirichlet Allocation
mAP – Mean Average Precision
RPN – Region Proposal Network
SVM – Support Vector Machine
1. INTRODUCTION
Human beings are capable of identifying objects through their eyes with little
effort, even when the objects differ from each other only slightly. Moreover,
the human visual system can recognize objects from different viewpoints, even
when the objects are partially obstructed. However, the same kinds of task are
extremely difficult for computer systems to imitate, despite the fact that
computers nowadays can easily surpass humans in many ways, given their
powerful calculation speed and enormous memory size.
The technology to capture and recognize objects in an image or a series of
images by utilizing computer vision algorithm is called object recognition and
extensive research in this field of study has been conducted over the past
decades.
It is generally acknowledged that the best algorithms for object recognition
tasks are based on convolutional neural networks (CNN), a widely recognized
class of deep learning neural networks that has proven successful in
processing visual images. A CNN is a methodology for machines to understand
and interpret data representations without task-specific rules being enforced.
It typically consists of an input layer, an output layer and several hidden
convolutional layers. The multi-layer design enables it to pre-process at
minimal cost and, to a large extent, mimic the behaviour of biological
neurons.
Regional Convolutional Neural Network (R-CNN) represents the usage of deep
learning neural networks for identification of target objects [1]. The key
objective of this Final Year Project is to implement the R-CNN algorithm in a
Python language environment. The R-CNN architecture generally consists of four
individual modules, namely image input, proposal extraction, CNN computation
and SVM classification. This Final Year Project will focus on tackling each
one of the four modules in the Python language and assembling them together to
complete a Python-based R-CNN, which has never been done before. Although the
original Matlab version has its own benefits, we believe the Python version
serves the purpose of better extensibility and provides a platform for
studying the topic of “Object Recognition by Deep Learning Neural Networks” to
more programmers, who may be unfamiliar with Matlab but comfortable with
Python.
This final report intends to present a comprehensive illustration of this Final
Year Project. It starts by examining the project background and previous related
studies in the field of object recognition using deep learning neural networks.
Then, it outlines the scope and prerequisites of the final year project. After that,
it discusses in detail the approach applied to conduct the project and reports
the progress of the project in different phases with corresponding results.
Moreover, it presents some challenges we met during the research and how we
overcame them. Eventually, it proposes potential directions for future work on
this Final Year Project topic and gives a conclusion on the result.
2. PROJECT BACKGROUND AND RELATED STUDIES
This section gives a first glance at this Final Year Project by presenting two
separate parts: related studies and the corresponding project background.
2.1 RELATED STUDIES
Related studies on R-CNN will be covered in this section.
o R-CNN
R-CNN operates four individual modules, image input, selective
search on image, CNN feature extraction and object identification
through Support Vector Machine (SVM) classifier [1]. This process is
illustrated in Figure 1 below.
Figure 1: R-CNN Process [1]
In Figure 1, R-CNN first inputs the image from its data source and then
extracts region proposals that may contain objects. Then it computes the CNN
features of each region and eventually classifies each region according to its
specific characteristics. The main data sources for image input used in R-CNN
are the PASCAL VOC 2007 and PASCAL VOC 2012 datasets. Region proposal
extraction is based on the selective search algorithm, in expectation of
better proposal generation. The classification is done by a linear SVM
classifier, based on the CNN-computed feature output.
Since the introduction of R-CNN, research on improving the algorithm's
performance over the original version has attracted rising interest in this
field of study. The main objectives of this research have concentrated on
accelerating training time and raising the recognition accuracy rate. Fast
R-CNN, a variation of R-CNN, speeds up training by sharing convolutional
computation across region proposals and improves the recognition accuracy
rate by replacing the original SVM classifier with a Softmax classifier [2].
In addition, Faster R-CNN introduces a Region Proposal Network (RPN) that
shares convolutional features with the detection network, generating better
performance still [3].
                     R-CNN        Fast R-CNN   Faster R-CNN
Test time per image  50 seconds   2 seconds    0.2 seconds
Speedup              1x           25x          250x
mAP (VOC 2007)       N/A          66.9         66.9
mAP (VOC 2012)       49.6         ~66.0        N/A
Table 1: Comparison on Performance of R-CNN and Its Advanced Versions [2][4]
Table 1 above gives an overview of the comparative performance among R-CNN,
Fast R-CNN and Faster R-CNN. The accuracy rate, represented as Mean Average
Precision (mAP), is calculated as the mean of all classes' average precision
under the precision-recall curve. A higher mAP indicates a higher success rate
in identifying objects. It can therefore be seen from Table 1 that both Fast
R-CNN and Faster R-CNN yield significantly better accuracy than R-CNN on two
different evaluation databases. On top of that, processing speed has improved
tremendously in Fast R-CNN and Faster R-CNN.
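To make the mAP computation concrete, here is a minimal sketch in Python. The
detection scores and labels are hypothetical, and the interpolation details of
the official VOC evaluation protocol are omitted:

```python
def average_precision(scores, labels):
    """Area under the precision-recall curve for one class, sampled at each
    true positive (the official VOC metric adds interpolation on top)."""
    # Rank detections by descending confidence score.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:  # detection i is a true positive
            true_positives += 1
            precisions.append(true_positives / rank)
    total_positives = sum(labels)
    return sum(precisions) / total_positives if total_positives else 0.0

# Hypothetical per-class detections: a score, and 1 if correct else 0.
ap_per_class = [
    average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1]),
    average_precision([0.95, 0.5], [1, 1]),
]
mean_ap = sum(ap_per_class) / len(ap_per_class)  # mAP: mean over classes
```

A single false positive ranked above true positives lowers the class's AP,
which is why mAP rewards both correct classification and confident ranking.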
2.2 PROJECT BACKGROUND
This subsection will cover the background of this project, our motivations
toward this project and the potentially beneficial outcomes of this project.
o Background of this project
Although sophisticated algorithms such as Fast R-CNN and Faster R-CNN have
been introduced, this project bases its framework on R-CNN, given its
originality and better extensibility compared to the more advanced
algorithms. Considering the technological constraints and time limit, this
project cannot make substantial improvements to the R-CNN algorithm as Fast
R-CNN and Faster R-CNN did. The scope of the research covered in this project
will be introduced in the next section.
o Motivations toward this project
Study of object recognition has been conducted extensively, and since the
introduction of R-CNN in 2014, tremendous improvements have been made with
spectacular results. Therefore, R-CNN may no longer be on the cutting edge of
research.
However, the reason we chose R-CNN is mainly its modularity. The four
individually working modules allow us to build a version that can be of great
use for future study, can be tested in comparative experiments module by
module, and allows us to divide our work easily. It was also the literature
review on Fast R-CNN that made us realize we want to examine how each module
can be modified to help improve the accuracy rate of object recognition. The
classifier is clearly a spot to explore, and we propose to use a Topic Model
to replace the SVM in future study on this project.
Moreover, the reason we chose to re-implement R-CNN in Python rather than
another language is Python's wide acceptance among programmers exploring the
fields of computer vision and machine learning. An implementation in Python
would help fellow programmers exploit the modular design of R-CNN, research
the module they want to improve, and combine modules for further advancement
in this field of study.
o Potentially beneficial outcomes
The first potentially beneficial outcome is a Python language implementation
of R-CNN. This is beneficial to all programmers who want to start exploring
the field of object recognition but are only familiar with the Python
language.
The second potential outcome is providing a platform for future research on
how the classifier plays an important role in improving recognition accuracy.
We propose a future research direction of using a Topic Model for further
development.
3. SCOPE
Given thorough consideration of the complexity of conducting this research,
the scope of this project is currently limited to the following four parts and
will be subject to changes that may occur during the actual research progress:
3.1 R-CNN Focused
Although improved versions of R-CNN, such as Fast R-CNN and Faster
R-CNN, have emerged, this FYP project will focus solely on R-CNN,
considering the better extensibility of R-CNN.
3.2 Single Controlled Variable
Despite different approaches to improve the performance of R-CNN
algorithm, this project will concentrate only on raising the mAP accuracy rate
of R-CNN. The research will proceed by controlling variables of the selective
search algorithm and the CNN algorithm embedded in R-CNN.
3.3 Datasets for Research
Among numerous public datasets available for object recognition study,
two datasets will be leveraged in this project. PASCAL VOC 2007 will
be used for training the algorithm and PASCAL VOC 2012 will be
utilized to evaluate the performance for implementations.
3.4 Code Base Only
Given the project nature, no user interface will be developed. Instead, a
code base will be provided for implementation.
In short, the scope of this project is limited to training and evaluating the
algorithm's performance with two pre-selected public datasets, with the
ultimate deliverable being a code-based algorithm that generates a better mAP
accuracy rate.
In light of the challenging aspects from both the hardware and code
perspectives, several high-standard prerequisites are required, as listed in
the next section.
4. PREREQUISITES
Two prerequisites on hardware and one prerequisite on source code for this
project are stated as follows:
1. Caffe [5] with Python layer and the pycaffe framework.
2. GNU Parallel [6] for GPU parallel computing.
3. The original Matlab-based R-CNN source code.
5. METHODOLOGY
In order to successfully conduct this project, the research process is
separated into two phases as follows:
5.1 Matlab Version Source Code Implementation
The R-CNN source code was originally implemented in a Matlab environment. This
version will be used as a benchmark for the project; it will be trained on
PASCAL VOC 2007 and evaluated on PASCAL VOC 2012 to keep all training and
evaluation environments consistent.
5.2 Python Version Implementation
R-CNN will be reprogrammed in a Python environment based on the Matlab
version. It is generally recognized that Python is more accessible to
programmers, since it is more commonly used than Matlab. Moreover, the Python
environment offers more extensibility, so that further study based on this
research project can be conducted easily. The same training and evaluation
environment will be provided for this replicated version. The evaluation
result will be recorded as well, to compare with the Matlab version's result,
and also set as another benchmark for the project. It is expected to achieve
roughly the same training speed and accuracy rate as the Matlab version;
however, a 5% fluctuation will be allowed, given the transition in language
environment.
Note that deviations from this expectation may be encountered, and as the
research progresses, the project schedule will be subject to changes for this
reason.
To summarize, the research will be conducted in two phases, starting with the
Matlab version R-CNN implementation, then the Python version R-CNN
reproduction. The performance of the Python version R-CNN is expected to be
generally in line with that of the original Matlab version. Progress of the
final year project is reported in the next section.
6. PROGRESS
This section gives a detailed overview of all stages of work done during the
research process and reports the corresponding results of each stage. It then
steps back and provides the whole timeframe of the project.
6.1 Stages of Work
This subsection reports all stages of work achieved during this Final Year
Project, namely: environment setup, Matlab version R-CNN implementation,
selective search implementation in the Python version R-CNN, CNN
implementation in the Python version R-CNN, and SVM classifier implementation
in the Python version R-CNN.
o Environment Setup
A computer with an Intel i7 CPU and 32 GB of RAM was provided. Ubuntu was
chosen as the operating system, given that a Linux environment can process
both the Matlab and Python languages. Considering the high-standard hardware
requirements of the R-CNN algorithm, 332G of disk space was granted to this
final year project. In addition, a GeForce GTX GPU was also set up to
facilitate the large-scale image processing used in this project.
On top of the hardware, Python and Caffe were installed in the system, as well
as Matlab.
o Matlab Version R-CNN Implementation
The original Matlab version of R-CNN was retrieved from GitHub, an open source
code community. This version was trained on the PASCAL VOC 2007 dataset and
evaluated on the PASCAL VOC 2012 dataset. The evaluation result was 49.6% mAP,
close to the value reported in the original research paper. This shows that
the software and hardware environment was sufficient to run the Matlab version
of R-CNN. Moreover, 49.6% mAP is set as one of the benchmarks for the whole
project.
o Selective Search Implementation in Python Version R-CNN
Figure 2: Selective Search Stage of Work [1] (Figure was reproduced on top of
original graph in paper)
Figure 2 illustrates the selective search stage of the Python version R-CNN
implementation, highlighted in the red square. The first part was to set up
the environment and prepare the PASCAL VOC 2007 and PASCAL VOC 2012 datasets
for image input. The second part was to implement the selective search
algorithm in Python.
Selective search is an algorithm used in object recognition that is designed
to propose all class-independent object locations [9]. The main rationale
behind selective search is that images are naturally hierarchical, and all
objects can be identified through four complementary measures: colour, size,
texture and fit.
Figure 3: Colour Scale Example [9]
Figure 3 illustrates an example of identifying objects through colour. The two
cats in the image can be distinguished by their colour, rather than by their
size or texture.
Figure 4: Texture Scale Example [9]
Figure 4 gives an example of identifying objects through texture. The
chameleon can be distinguished through texture from the leaves, which have a
colour similar to its own.
Figure 5: Fit Scale Example [9]
Figure 5 offers an example of identifying objects through fit. The tyre of the
red car is recognized as part of the car because a tyre naturally fits into a
car, not because of its colour, texture or size.
Moving forward, for the Python version R-CNN algorithm to compute the
similarity of each measure, several mathematical formulas were leveraged to
conduct the computation.
Formula 1 [9]:
s(r_i, r_j) = a_1 * s_colour(r_i, r_j) + a_2 * s_texture(r_i, r_j) + a_3 * s_size(r_i, r_j) + a_4 * s_fill(r_i, r_j)
Formula 1 denotes the similarity between any two regions r_i and r_j. As
mentioned before, regions can be distinguished from each other through four
measures, and thus the similarity between two regions is computed as the
weighted sum of the similarities of those four measures. The constant terms
a_1, a_2, a_3 and a_4 can only be 0 or 1, depending on whether the
corresponding measure is actually used to compute the similarity.
The detailed formulas for the similarity of each measure are illustrated
below:
Formula 2 [9]:
s_colour(r_i, r_j) = sum over k = 1..n of min(c_i^k, c_j^k)
Formula 2 denotes the colour similarity between any two regions r_i and r_j,
calculated as the sum, over all colour histogram bins, of the minimum of the
two regions' bin values (histogram intersection).
Formula 3 [9]:
s_texture(r_i, r_j) = sum over k = 1..n of min(t_i^k, t_j^k)
Formula 3 denotes the texture similarity between any two regions r_i and r_j,
calculated in the same way over the texture (gradient) histogram bins.
Formula 4 [9]:
s_size(r_i, r_j) = 1 - (size(r_i) + size(r_j)) / size(im)
Formula 4 denotes the size similarity between any two regions r_i and r_j,
which measures the fraction of the image the two regions jointly occupy and
encourages small regions to merge early.
Formula 5 [9]:
s_fill(r_i, r_j) = 1 - (size(BB_ij) - size(r_i) - size(r_j)) / size(im)
Formula 5 denotes the fit similarity between any two regions r_i and r_j,
where BB_ij is the tight bounding box around both regions; it measures how
well the two regions fit into each other and aims at assembling parts of an
object together.
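The similarity measures above can be sketched in Python as follows. The
histogram inputs are assumed to be normalised lists of bin values, and the
function signatures are illustrative, not the project's actual data
structures:

```python
def s_colour(hist_i, hist_j):
    # Formula 2: histogram intersection over colour histogram bins.
    return sum(min(a, b) for a, b in zip(hist_i, hist_j))

def s_texture(hist_i, hist_j):
    # Formula 3: the same intersection measure over texture histograms.
    return sum(min(a, b) for a, b in zip(hist_i, hist_j))

def s_size(size_i, size_j, size_im):
    # Formula 4: regions jointly occupying little of the image score high,
    # so small regions merge early.
    return 1.0 - (size_i + size_j) / size_im

def s_fill(size_i, size_j, size_bb, size_im):
    # Formula 5: size_bb is the area of the tight bounding box around both
    # regions; regions that fit snugly together score high.
    return 1.0 - (size_bb - size_i - size_j) / size_im

def similarity(colour_i, colour_j, texture_i, texture_j,
               size_i, size_j, size_bb, size_im, a=(1, 1, 1, 1)):
    # Formula 1: weighted combination, each a_k being 0 or 1.
    return (a[0] * s_colour(colour_i, colour_j)
            + a[1] * s_texture(texture_i, texture_j)
            + a[2] * s_size(size_i, size_j, size_im)
            + a[3] * s_fill(size_i, size_j, size_bb, size_im))
```

Setting an entry of `a` to 0 switches the corresponding measure off, which is
how the combinations of measures can be compared experimentally.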
With this similarity computation in place, selective search is able to
quantitatively select the minimal set of regions most likely to contain
objects. A higher similarity value between two regions indicates that they
should be merged; by repeatedly comparing the similarity values computed
through the formulas presented above, the R-CNN algorithm arrives at a minimal
set of region proposals. Below is a visual explanation of how selective search
works on an image.
Figure 6: Selective Search Visual Example [9]
Figure 6 illustrates how selective search works on an input image. Starting
from the left-most column of the figure, the image is first segmented into
different regions, which are converted to boxes representing areas with a high
possibility of containing objects. Then, according to the four measures
(colour, size, texture and fit) mentioned before, selective search merges
regions wherever suitable, reducing the number of boxes. Eventually, as shown
in the right-most column of Figure 6, it outputs the minimal set of regions
that may contain objects.
The result of this stage is that the Python version R-CNN is able to process
image input and apply the selective search algorithm to segment the minimal
set of regions with the highest possibility of containing objects.
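The merging procedure shown in Figure 6 can be sketched as a greedy loop. This
is a simplified version of the hierarchical grouping in [9]: the regions here
are toy pixel sets, every pair is treated as neighbouring, and the similarity
function is supplied by the caller:

```python
def hierarchical_grouping(regions, similarity):
    """Greedily merge the most similar pair of regions until one region
    remains, recording every intermediate region as a candidate location."""
    proposals = list(regions)
    while len(regions) > 1:
        # Find the most similar pair of regions.
        i, j = max(
            ((i, j) for i in range(len(regions))
                    for j in range(i + 1, len(regions))),
            key=lambda pair: similarity(regions[pair[0]], regions[pair[1]]),
        )
        merged = regions[i] | regions[j]  # union of the two pixel sets
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
        proposals.append(merged)
    return proposals

# Toy regions as sets of pixel coordinates; the similarity used here simply
# prefers merging the smallest combined region first.
toy = [{(0, 0)}, {(0, 1)}, {(5, 5), (5, 6), (6, 5)}]
out = hierarchical_grouping(toy, lambda a, b: -len(a | b))
```

Because every intermediate merge is kept, the proposals span all object
scales, which is the property the paper relies on.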
Figure 7: Interim Result Example [9]
Figure 7 gives a visual example of the result of this stage. In the figure, an
input image with two dairy cows and a fence was passed through selective
search. The proposed regions with the highest possibility of containing
objects are circled in green. The less likely regions are circled in blue for
later use in the CNN computation, which would be implemented in later phases.
o CNN Implementation in Python Version R-CNN
The CNN is a crucial part of this project: it takes the region proposals
extracted through selective search and computes features for them through five
convolutional layers and two fully connected layers. The structure of the CNN
we used is illustrated below:
Figure 8: CNN [11]
We downloaded a pre-trained CNN model from the Caffe official website to
ensure compatibility. Slightly unlike what the paper suggested, the model we
downloaded was pre-trained on the ILSVRC 2013 dataset instead of the ILSVRC
2012 dataset. This process can be illustrated as below:
Figure 9: Pre-train CNN [12]
Then we built up the whole CNN module, enabling it to take in a 227 x 227
pixel warped input from each proposed region.
Eventually, we used PASCAL VOC 2007 to fine-tune the CNN so that it was ready
for object detection. This step can be illustrated as below:
Figure 10: Fine-tune CNN [12]
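During fine-tuning, the paper labels a proposal as a positive example when its
intersection-over-union (IoU) overlap with a ground-truth box is at least
0.5 [1]. A minimal sketch of that labelling rule, with illustrative box
coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    union = area(box_a) + area(box_b) - intersection
    return intersection / union if union else 0.0

def is_positive_for_finetuning(proposal, ground_truth_boxes, threshold=0.5):
    """Positive if the proposal overlaps any ground-truth box by >= threshold."""
    return any(iou(proposal, gt) >= threshold for gt in ground_truth_boxes)
```

The threshold is a tunable parameter; the paper uses a different, stricter
setting when building training sets for the SVM stage [1].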
o SVM Classifier Implementation in Python Version R-CNN
The SVM serves as the classifier in this project: it takes the features
computed by the CNN and classifies them on a per-class basis. Here, we first
set up 20 SVMs corresponding to 20 different classes. Then we used the region
proposals that had been extracted and passed through the CNN computation,
together with training labels, to train each class's SVM, preparing it for the
final evaluation. This step can be illustrated as below:
Figure 11: Train SVM [12]
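The 20-classifier, one-vs-rest structure can be sketched as follows. Note that
the toy perceptron below is only a stand-in for the project's linear SVMs,
used so the sketch stays self-contained; the one-classifier-per-class layout
is the part that mirrors R-CNN:

```python
class LinearClassifier:
    """Stand-in for a linear SVM: a simple perceptron trained per class."""

    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def score(self, x):
        # Signed distance proxy; higher means "more likely this class".
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def train(self, feats, labels, epochs=10, lr=0.1):
        for _ in range(epochs):
            for x, y in zip(feats, labels):
                target = 1 if y else -1
                if target * self.score(x) <= 0:  # misclassified: update
                    self.w = [wi + lr * target * xi
                              for wi, xi in zip(self.w, x)]
                    self.b += lr * target

def train_one_vs_rest(feats, class_labels, num_classes, dim):
    # One binary classifier per class: features of that class are positives,
    # everything else is a negative -- the layout used for the 20 SVMs.
    classifiers = []
    for c in range(num_classes):
        clf = LinearClassifier(dim)
        clf.train(feats, [label == c for label in class_labels])
        classifiers.append(clf)
    return classifiers
```

At test time, a region is assigned to the class whose classifier gives the
highest score over its CNN feature vector.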
6.2 Timeframe
This timeframe has been changing since we first proposed it. As the research
went on, we gradually realized the difficulty underlying this project and
adopted a flexible timeframe.
For example, we were halted from proceeding while trying to obtain a Matlab
license for the machine used to run the project. It took several weeks before
we received the reply that we could not be granted a license. Therefore, we
had to push back our original plan and seek a new way to familiarize ourselves
with the Matlab version's code.
Table 2 below gives a detailed overview of the time schedule of this project:
Time Frame Task
Sep. 2017
(Completed)
● Detailed project plan submission
● Project webpage creation
● Initial meeting with Dr. KP Chan to confirm
research topic and setup regular meeting time
slots
● Related research paper reading
Oct. 2017
(Completed)
● Download R-CNN Matlab version source code
and familiarize with the algorithm
● Train and evaluate Matlab version R-CNN
Nov. 2017 – Jan. 2018
(Completed)
● Python version R-CNN implementation
● Check hardware status and decide on whether
higher standard hardware will be used
● Build up Python version R-CNN
● Interim report submission
Jan. 2018 – Apr.
2018
(Completed)
● Train and evaluate Python version R-CNN
● Experiment on Topic Model to see if it is
plausible for using in object recognition
Apr. 2018
(Completed)
● Final report submission
● Final project presentation
Table 2: Project Schedule
7. FINAL RESULT
We evaluated our Python version R-CNN on the PASCAL VOC 2012 dataset. The
result we achieved is roughly satisfying, with the best mAP rate being 45.2%.
The comparison between our result and the original paper's result is as
follows:
                     Original R-CNN   Python Version R-CNN
Test time per image  50 seconds       N/A
Speedup              1x               N/A
mAP (VOC 2012)       49.6             45.2
Table 3: Comparison on Performance of Original R-CNN and Python Version
R-CNN [2][4]
Although the mAP is slightly lower than that of the original R-CNN, this is
understandable, considering that we changed the programming language used.
Also, our implementation may not be as polished as the paper's, and future
refinement may help bring up the mAP rate.
8. DELIVERABLES
This project delivers a Python version of the R-CNN algorithm for use in
object recognition study, in the form of a code base.
The algorithm achieved a mAP accuracy rate of around 45.2%, evaluated on the
PASCAL VOC 2012 dataset. Project progress can be checked at:
http://i.cs.hku.hk/fyp/2017/fyp17015/
9. CHALLENGES AND MITIGATIONS
The challenges met and overcome during this project are the hardware
constraint, uncertainty in the Python version's performance, and modifications
to selective search. The detailed obstacles that occurred and the
corresponding mitigations are as follows:
9.1 Hardware Constraint
The Matlab version of R-CNN requires sophisticated hardware support, such as a
high-performance GPU and large disk space (around 200G) to cache images and
feature vectors. The requirements may vary when implementing it in a Python
environment; as a consequence, potential delays to the original project
schedule and lower-than-expected performance may occur.
Mitigation:
1) The hardware provided by our supervisor was tested, and additional support
such as more memory space was provided by Dr. KP Chan, which helped a lot.
2) A flexible project schedule was adopted.
9.2 Uncertainty in Python Version Performance
Due to the change in language environment, the Python version of R-CNN
performed worse than the original Matlab version: slower processing time and a
lower accuracy rate were encountered.
Mitigation:
1) As the difference was not significant, the project is considered
satisfactory and can be refined further.
9.3 Modifications on Selective Search
The selective search algorithm used in this final year project generates
around 2000 segments of different sizes and shapes. Nonetheless, the CNN
requires all input segments to have exactly the same size. To resolve this
conflict, two measures can be taken.
Mitigation:
1) Warp all the images to a fixed size after generating them. However, this
may cause some confusion for the CNN: because the size of the object has been
modified, a small tree may be recognized and treated as a big wooden door
after resizing.
2) Pad the picture with some background colour. This may also cause confusion,
since the neural network takes the whole picture into consideration.
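The two measures can be sketched in pure Python on toy images represented as
lists of pixel rows; real code would use an image library, and 227 x 227 is
the CNN input size:

```python
def warp(image, out_h=227, out_w=227):
    """Measure 1: resize a region crop to the fixed CNN input size by
    nearest-neighbour sampling (an anisotropic warp that distorts shape)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def pad(image, out_h=227, out_w=227, background=0):
    """Measure 2: centre the crop on a fixed-size canvas filled with a
    background colour (shape is preserved, but the context is artificial)."""
    in_h, in_w = len(image), len(image[0])
    top, left = (out_h - in_h) // 2, (out_w - in_w) // 2
    canvas = [[background] * out_w for _ in range(out_h)]
    for r in range(in_h):
        for c in range(in_w):
            canvas[top + r][left + c] = image[r][c]
    return canvas
```

Warping trades shape fidelity for full use of the input resolution, while
padding does the opposite, which is exactly the trade-off described above.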
10. FUTURE WORK
The possibilities for future work based on this project are tremendous. We
hereby propose one direction that may be very beneficial and promising: using
a Topic Model with Latent Dirichlet Allocation (LDA) as the classifier in
place of the original SVM classifier.
LDA, one of the topic models, is generally used for text mining. However, we
suspect that replacing the SVM with it could help raise the object recognition
accuracy rate. The replacement would not be too messy, given that we adopted
the original R-CNN modular design, so the classifier, as an individual module,
can be replaced without influencing the whole algorithm.
LDA is one of the most popular topic models, based on the Dirichlet
distribution [10]. Topic model technology is broadly used in document
classification and reading recommendation. By applying topic modelling to a
reader's reading history, content providers can learn which topics the reader
is more interested in and then offer similar documents to that reader.
LDA works by treating documents as collections of words without sequential
order. Its underlying idea, that a document is composed of topics and each
word belongs to one of these topics, is quite similar to the idea of image
classification and thus may be applicable.
To achieve a comparable result, an LDA-classifier version of R-CNN can be
trained and evaluated on the exact same PASCAL datasets, in order to keep an
apples-to-apples comparison. Based on the output results, specific parameters
can be calibrated to achieve optimal performance.
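Before an LDA topic model could replace the SVM, the continuous CNN features
would have to be discretised into "visual words". A hypothetical sketch of
that quantisation step; the codebook and how it would be built are our
assumption, not part of the current project:

```python
def quantize(feature, codebook):
    """Map a continuous CNN feature vector to its nearest codebook entry,
    producing a discrete 'visual word' (an index into the codebook)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(feature, codebook[k]))

def bag_of_words(features, codebook):
    """Represent a region's feature set as word counts -- the 'document' an
    LDA topic model would consume in place of the raw SVM input."""
    counts = [0] * len(codebook)
    for f in features:
        counts[quantize(f, codebook)] += 1
    return counts
```

An off-the-shelf LDA implementation could then be fit on these count vectors,
with the inferred topic mixture of a region serving as its class evidence.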
Some potential results and corresponding explanation are as follows:
LDA Version R-CNN mAP > 49.6%
This would indicate that the LDA classifier indeed enhances the object
recognition accuracy rate. To interpret, the classifier is actually the key to
the accuracy rate problem, and the LDA classifier is more powerful than the
original SVM classifier. In summary, the research could be concluded
successful if it achieves mAP > 49.6%.
LDA Version R-CNN mAP = 49.6%
This would indicate that the LDA classifier only performs with the same
classification capability as the original SVM classifier. Under this
condition, the classifier may not be the key to improving the accuracy rate.
LDA Version R-CNN mAP < 49.6%
This would indicate that the LDA classifier generates a worse result than the
original classifier. This result could occur for two possible reasons:
incompatibility between the LDA classifier and the CNN computation, or
incompatibility between the selective search segmentation and the LDA
classifier. To address this issue, additional effort should be expected to
make corresponding modifications to the selective search segmentation or the
CNN computation.
11. CONCLUSION
This project researches the popular topic of object recognition, targeting the
re-implementation of the widely used R-CNN algorithm in the Python language
and aiming to provide a better platform for future study in the form of a code
base. The project has finished all of the following parts: environment setup,
Matlab version R-CNN implementation, selective search implementation in the
Python version R-CNN, CNN implementation in the Python version R-CNN, and SVM
classifier implementation in the Python version R-CNN.
Future research will focus on tapping the modularity of the design, and on how
different modules can help increase the accuracy rate.
In light of the potential improvement in object identification accuracy
through future development, this project is regarded as scientifically
meaningful. It is believed that the ultimate deliverable can inspire
researchers in the field of object recognition by deep learning neural
networks.
REFERENCE
[1]Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature
hierarchies for accurate object detection and semantic segmentation.
In Proceedings of the IEEE conference on computer vision and pattern
recognition (pp. 580-587).
[2]Girshick, R. (2015). Fast r-cnn. In Proceedings of the IEEE international
conference on computer vision (pp. 1440-1448).
[3]Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards
real-time object detection with region proposal networks. In Advances in
neural information processing systems (pp. 91-99).
[4] Artificial Intelligence. Retrieved from
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/
object_localization_and_detection.html
[5] Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., ... &
Darrell, T. (2014, November). Caffe: Convolutional architecture for fast
feature embedding. In Proceedings of the 22nd ACM international
conference on Multimedia (pp. 675-678). ACM.
[6]Ludwig, T. (2004). Research trends in high performance parallel
input/output for cluster environments.
[7]Gu, S., Tan, Y., & He, X. (2010). Discriminant analysis via support
vectors. Neurocomputing, 73(10), 1669-1675.
[8]Bellingegni, A. D., Gruppioni, E., Colazzo, G., Davalli, A., Sacchetti, R.,
Guglielmelli, E., & Zollo, L. (2017). NLR, MLP, SVM, and LDA: a
comparative analysis on EMG data from people with trans-radial
amputation. Journal of neuroengineering and rehabilitation, 14(1), 82.
[9]Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013).
Selective search for object recognition. International journal of computer
vision, 104(2), 154-171.
[10] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet
Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.
[11] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet
classification with deep convolutional neural networks. In Advances in
neural information processing systems (pp. 1097-1105).
[12] Picture retrieved from https://zhuanlan.zhihu.com/p/32564990