Thai Digit Recognition on License Plates using YoloV3


Thai Digit Recognition on License Plates using YoloV3

by

Nadimpalli Lakshmi Manasa

A thesis submitted in partial fulfillment of the requirements for the

degree of Master of Engineering in

Microelectronics and Embedded Systems

Examination Committee: Dr. Mongkol Ekpanyapong (Chairperson)

Dr. Matthew N. Dailey

Dr. A. M. Harsha S. Abeykoon

Nationality: India

Previous Degree: B.Tech Electronics and Communications Engineering

Jawaharlal Nehru Technological University,

Hyderabad

Telangana, India

Scholarship Donor: AIT Fellowship

Asian Institute of Technology

School of Engineering and Technology

Thailand

May 2019


ACKNOWLEDGEMENTS

I am grateful to my family for their love, support, and motivation.

I would like to take this opportunity to thank my advisor, Dr. Mongkol Ekpanyapong, whose support, valuable suggestions, guidance, and encouragement have helped me throughout the completion of my thesis.

I am grateful to Mr. Chatchai and Mr. Clifford for their assistance in assembling the GPU and CPU. I would like to thank the committee members, Dr. A. M. Harsha S. Abeykoon and Dr. Matthew N. Dailey, for their valuable comments and suggestions on my thesis work. I thank Mr. Vasan Timpong and Mr. Teerapon for sharing the data taken from the Thai military base.

N.L.Manasa

May 2019


ABSTRACT

This thesis proposes a deep-learning method for identifying license plates, Thai numerals, and the plate logo. Thai numeral characters (0-9) are collected to build a dataset, which is then used to train a convolutional neural network (CNN). CNNs have achieved state-of-the-art results in tasks such as optical character recognition, generic object recognition, real-time face detection and pose estimation, speech recognition, and license plate recognition. The trained CNN model detects and recognizes all Thai numeral characters (0-9), and a character recognition system is designed and implemented with it. In this thesis we use the object detection algorithm YOLOv3 built on the Darknet network architecture. The proposed method performs character segmentation and recognition in a given license plate image. The experimental results indicate that the proposed deep-learning method is effective in detecting the characters.

Keywords: Thai Digit Recognition, License plate, deep learning, convolutional neural

networks, YOLOv3


TABLE OF CONTENTS

PAGE

CHAPTER TITLE

Title page i

Acknowledgements ii

Abstract iii

Table of Contents iv

List of Figures v

List of Tables vi

1 Introduction

1.1 Description 1

1.2 Problem statement 2

1.3 Objectives 2

1.4 Limitations and Scope 2

2 Literature Review

2.1 Background 3

2.2 License Plate Detection 4

2.3 Uses of license plate recognition 5

2.4 Universal OCR 7

2.5 Character Recognition 7

2.6 Thai License Plate 8

2.7 Convolutional neural networks 10

2.8 Yolo v3 11

3 Methodology

3.1 Data collection 13

3.2 Annotating Images 16

3.3 Training and Testing 18

4 Experimental results 23

4.1 Results 23

5 Conclusion and Recommendations

5.1 Conclusion 39

5.2 Recommendations 39

References 40


LIST OF FIGURES

FIGURE TITLE PAGE

1.1 Image and Characters segmented 1

2.1 Thai License Plate 8

2.2 License Plate with 2 letters and 4 serial numbers 9

2.3 License Plate with a number followed by 2 letters and 4 serial numbers 9

2.4 Examples of Thai military license plate 9

2.5 Typical CNN architecture 10

2.6 Yolo v3 architecture 12

3.1 Data collection for Thai license plates from a video

3.2 Image labelled using Bbox-label tool 16

3.3 Image labelled using LabelImg 17

3.4 Yolo format of txt files 17

3.5 Intersection over union 18

3.6 Screenshot taken from terminal during training of Images 19

3.7 Representation of algorithm 20

3.8 Display text on Image 21

3.9 Display text on video frame 22

4.1 (a) Screenshot from prediction window for detecting single digit 24

4.1 (b) Screenshot from prediction window for single digit 24

4.2 (a) License plate detection from an Image 25

4.2 (b) Confidence of predicted image in terminal 25

4.3 (a) License plate detection from an Image 26

4.3 (b) Confidence of predicted image in terminal 26

4.4 (a) Detection of all digits in license plate 27

4.4 (b) Confidence of each digit observed in terminal 27

4.5 (a) Detection of all digits in license plate 28

4.5 (b) Miss prediction in detecting digits 28

4.6 Wrong prediction of license plate in an image 29

4.7 Detection tested on full car image 29

4.8 Screenshot of terminal window after prediction 30

4.9 Result tested under different light conditions 31

4.10 Missed predictions of digits 31

4.11 Confidence of predicted digits of an image from East Entrance gate 31

4.12 Results displaying whole license plate number on the image 32

4.13 Graphical representation of average loss vs number of iterations 35


LIST OF TABLES

TABLE TITLE

PAGE

2.1 Comparison for different approaches of license plate detection 6

2.2 Comparison of ALPR using deep learning techniques 6

4.1 Testing videos on new data and results obtained 33

4.2 Overall accuracy on videos tested on new data 34

4.3 Precision values of cropped license plate with Thai digits 36

4.4 Precision values of Thai digits along with alpha-numeric numbers 37

4.5 True positive, False positive, False negative comparison 38


CHAPTER 1

INTRODUCTION

1.1 Description

Automatic number-plate recognition (ANPR) is a technology that uses optical character recognition on images to read vehicle registration plates and create vehicle location data. It can use existing closed-circuit television, road-rule enforcement cameras, or cameras specifically designed for the task. ANPR is mainly used by police forces around the world for law enforcement purposes, including checking whether a vehicle is registered or licensed. It is also used for electronic toll collection on pay-per-use roads and as a method of monitoring traffic movement, for example by highways agencies.

ANPR can be used to store the images captured by the cameras as well as the text from the license plate, and it can also be configured to capture and store a photograph of the vehicle's driver. Existing systems commonly use infrared lighting to allow the camera to take the picture at any time of day or night. ANPR technology must take into account plate variations from place to place.

The main concerns about these systems are privacy fears of government tracking of citizens' movements, misidentification, high error rates, and increased government spending. To identify the characters on a number plate, the software involves several algorithms that accurately locate and recognize the license plate: plate localization, plate orientation and sizing, normalization, character segmentation, and optical character recognition. The complexity of each of these stages determines the accuracy of the system. During normalization, some systems use an edge detection approach to increase the contrast between the letters and the plate.

Figure 1.1: Image and Characters segmented

(source: https://en.wikipedia.org/wiki/Automatic_number-plate_recognition)

Text recognition from natural scene images belongs to the field of pattern recognition, an application of image processing. Pattern recognition studies how a system or machine can observe and analyze its environment and learn to distinguish patterns. Machine learning is used for detecting the patterns of various objects and comes in two types: supervised learning and unsupervised learning. Supervised learning learns from labelled past experience, as if with the help of a teacher, while unsupervised learning draws inferences from an unlabelled dataset. Text recognition from natural scene images begins with image acquisition, i.e., taking images from different sources, followed by pre-processing, character segmentation, and feature extraction. Pre-processing includes removing noise and resizing the image for proper segmentation. Various methods have been described in the literature for character segmentation. For achieving high recognition performance, the most important factor is the selection of the feature extraction method. Several feature extraction methods appear in the literature: a diagonal-based feature learning method combined with a genetic algorithm provides good recognition accuracy, and template matching is another method for feature extraction.

1.2 Problem statement

In general, number plates in most countries contain alpha-numeric characters. These characters uniquely identify the vehicle within the issuing region's database. However, some license plates also include characters from the local language of the country; these are harder to recognize because they use different scripts and fonts, so different algorithms must be trained to identify the characters accurately. In the past, classical optical character recognition was used for this task; nowadays, machine learning algorithms can be trained on any kind of data and recognize the characters more efficiently than the existing methods.

1.3 Objectives

The aim is to recognize Thai numerals on a license plate image by creating a dataset of Thai-language characters and training a YOLO model on it. The specific objectives are:

• Design a system which can detect Thai digits in a license plate image using YOLOv3.

• Collect a large dataset from the Thai military base to train the model, covering different light conditions from all environments, including low light.

1.4 Limitations and Scope

Various researchers have developed several methods and techniques for this task; however, each technique has its own advantages and disadvantages. Moreover, each country has its own system of license plate numbering, background, size, colors, and character language. Although some studies have been conducted on license plate detection and recognition, this work differs from previous studies because a deep learning architecture, represented by a CNN model, is used for both detection and recognition. To improve the system, future work will focus on raising the detection and recognition accuracy under the various constraints mentioned above. In addition, the real-time system could be developed further with new technologies (smartphone, tablet, etc.) so that it can be exploited in a mobile environment.


CHAPTER 2

LITERATURE REVIEW

2.1 Background

Automatic License Plate Recognition (ALPR) recognizes the number plate of a vehicle efficiently, without the need for significant human resources or intervention, and it has become increasingly important in recent years. There are several reasons why the need for such identification has grown. There is a growing number of cars on the roads, and all of them carry license plates. The rapid development of digital image processing technology has made it possible to detect and identify license plates quickly: the whole process can be done in less than 50 ms, i.e., 20 frames per second, which is sufficient to process real-time video streams.

Identification of vehicle number plates is useful for many different operators. For example, it is used by government agencies to find cars and other vehicles that are involved in crime, to check whether annual fees are paid, or to identify the owners of vehicles who violate traffic rules. Many countries such as the U.S., Japan, Germany, Italy, the U.K., and France have successfully implemented ALPR in their traffic management. Several private operators may also benefit from ALPR systems. One such case, and the inspiration behind systems developed in related projects, is a parking ticket payment system established by WTW AS, a company localized in Trondheim and used by major parking companies in Norway. The system allows users to register the license plate numbers of their cars either through a mobile application or through a message sent along with the parking time they want to pay for. If a parking attendant wants to check whether a car has a valid parking ticket, he must manually enter the car's license plate number and search a database. Each entry takes no more than a few seconds, but since this is the attendant's main task and hundreds of vehicles must be entered during a workday, it becomes a heavy burden, and there is an upper limit to how many cars a parking attendant can check in a working day. One way to solve this problem and reduce the manual work is to mount cameras on vehicles that drive around the parking lot and photograph or film the parked cars' license plates. The ALPR system's main purpose is then to recognize the license plate from the image or video stream, look it up in a database, and see whether the parking ticket for the vehicle is valid. The requirements for such an ALPR system are high accuracy when reading the license plates and fast processing time.

The difficulty of recognizing license plates varies between test sets and affects the measured accuracy of a system, so directly comparing accuracies without considering the complexity of the test data is meaningless. As pointed out in the literature, it is inappropriate to declare which method gives the highest performance because of the lack of uniform ways to evaluate them.

A typical ALPR system can be split into three major stages:

1. License plate detection - detect the plate in the captured image.

2. Character/digit segmentation - extract the alphanumeric characters from the plate.

3. Digit recognition - recognize each individual digit on the license plate.

Each stage has been implemented using various machine learning techniques. Traditional machine learning techniques rely on features chosen by humans to represent the underlying content of the image.


2.2 License Plate Detection

The license plate detection step greatly influences the accuracy of the subsequent steps. The input is an image containing none, one, or multiple license plates, and the output should be only the portion of the image containing the license plates. Many methods and algorithms have been proposed in recent years to solve this challenging problem. Accuracy has always been an issue and has improved in recent years, but locating a license plate in images captured from an arbitrary viewpoint, under factors such as occlusion and varying illumination, is still a challenge. The brute-force method of processing every pixel in an image leads to very high processing time. As an alternative, the commonly used approach is to exploit notable features of the license plate and process only the pixels that have these features, reducing the processing time considerably.

License plate recognition combined with optical character recognition is an integration of hardware and software that reads the license plate of a vehicle without the need for human intervention. The main purpose is to identify and locate the vehicle correctly and to replace manual systems with automated ones based on license plates. Such systems can use closed-circuit television, road-rule enforcement cameras, or cameras specifically designed for the task. They are used by various police forces, as a method of toll collection on pay-per-use roads, and to monitor traffic activity in large cities. The images captured by the cameras are stored together with the text read from the license plate. The systems commonly use infrared lighting so that the cameras can take pictures at any time of day, and they tend to be region specific because plates vary from place to place.

License plate recognition (LPR) has received tremendous interest in past and recent years as a challenging research topic. This is because the conditions (e.g., light, color, dirt, shadows, character sharpness, language) and the types of license plates vary from place to place. LPR has become an important part of many applications, for example road safety enforcement, automatic parking lot control, automatic toll collection, speed limit enforcement, and vehicle tracking and identification. An LPR system may be installed as part of a traffic monitoring system working together with traffic lights in order to identify cars that break traffic rules or to detect prohibited vehicles. In many cases, an LPR system is useful for detecting motorcycles with harmful behaviors, for example riding against the traffic direction, riding over the speed limit, or not wearing a helmet. A huge number of cars and vehicles are used in Thailand, where these activities are regularly observed and the number of accidents arising from them is extremely high. Nowadays motorcycles are widely used as household vehicles and by students to go to school. Therefore, road safety enforcement is now receiving high attention as an important issue in order to reduce the possibility of accidents caused by irresponsible motorists. Although many robust approaches have been employed in prior research, the deep learning approach has gained dramatic attention in recent years.

Traditionally, OCR tasks are solved through several steps: text detection, segmentation with various pre-processing, and finally character recognition, which involves feature extraction and classification. Different text detection methods depend on feature engineering techniques using boundary features (e.g., the boundary of the license plate), color features (the specific color of the plate), texture features (color transitions on the plate), or character features. There are various feature extraction methods for character recognition that depend on the image representation form (grey level, binary, vector), such as template matching. Modern approaches to character recognition use the most advanced techniques in deep learning; as an example, deep convolutional neural networks (CNNs) are used for the multi-digit number recognition task.

[C. Anagnostopoulos, I. Anagnostopoulos, V. Loumos, and E. Kayafas], "A license plate-recognition algorithm for intelligent transportation system applications." In this paper, a new algorithm for vehicle license plate identification is proposed, based on an adaptive image segmentation technique (sliding concentric windows) and connected component analysis combined with a character recognition neural network. The algorithm was tested with natural-scene grey-level vehicle images of different backgrounds and ambient illumination. The camera focused on the plate, while the angle of view and the distance from the vehicle varied according to the experimental setup.

[Kaushik Deb, Md. Ibrahim Khan, Anik Saha, and KangHyun Jo, (2012)]. A segmentation technique based on sliding concentric windows (SCW) is used. The license plate is extracted from its natural properties by finding vertical and horizontal edges in the vehicle region. A novel adaptive image segmentation technique detects the candidate regions, and color verification of the candidate regions uses hue and intensity in the HSI color model, verifying green and yellow plates and white plates, respectively. The authors focused on a new artificial neural network (ANN) algorithm based on the Korean number plate system.

[M. I. Khalil, (2010)]. This LPR system consists of several modules: image acquisition, license plate extraction, and segmentation and recognition of individual characters. After the license plate extraction phase, an information recognition phase is applied using a "moving window technique". To recognize the license plate image, the country name is loaded as the source image, then the first image entry of the country image set is loaded as an object, and the moving window technique is applied to detect that object within the image.

[J. M. Guo and Y. F. Liu], "License plate localization and character segmentation with feedback self-learning and hybrid binarization techniques." License plate localization (LPL) and character segmentation (CS) play key roles in a license plate (LP) recognition system, and this paper addresses both issues. In LPL, histogram equalization is employed to solve the low-contrast and dynamic-range problems; texture properties such as aspect ratio and color similarity are used to locate the LP; and the Hough transform is adopted to correct the rotation problem.

2.3 Uses of license plate recognition

Check on the authenticity of abandoned vehicles: This method is used by police forces for law enforcement in different scenarios. The application enables them to quickly check, via mobile devices, the authenticity of number plates installed on abandoned vehicles, which can curb illegal or anti-social acts. An additional fixed camera focuses on the driver's face and captures and saves the image for future security checks. Moreover, unlike technologies that install a transmitter in each vehicle, this technology does not need a separate installation for every individual vehicle.

Automation of electronic toll collection: LPR can be used to automate the entry of vehicles through toll barriers or toll gates. It is combined with an access control system to recognize number plates listed in the toll collection database.


Table 2.1: Comparison of different approaches to license plate detection

Proposed method: SSD detector
Methodology: The license plate is detected by an SSD detector, and a vertical projection method is then applied for character segmentation.
Results: Model: LeNet. Testing on the MNIST dataset: 99.03%; on custom data: 73.27%.

Proposed method: CNN
Methodology: Bangla character recognition is performed by training with a gradient-based learning algorithm. Large samples of each character are collected and features are extracted. Performance is evaluated using a MATLAB-based deep learning framework.
Results: Achieved 82.2% accuracy when tested on different samples. The limitation was small memory and computational power.

Proposed method: Brazilian license plate detection using a CNN model
Methodology: Two CNN networks are created, one (FV/LPD-NET) to detect the car front view and one (LPS/CR-NET) to recognize the characters.
Results: All seven characters are detected correctly in 63.18% of cases; considering partial matches, around 90.55%.

Proposed method: Smart LPR based on image processing using a neural network
Methodology: A fusion technique is used to extract the license plate, isolate the characters, and identify them using an ANN.
Results: The neural network recognizes the correct characters in 95% of cases with added noise.

Proposed method: Number plate recognition without segmentation using CNN
Methodology: A network is constructed which includes convolutional, pooling, and FC layers.
Results: Achieved an accuracy of 88.6% on a test set of 700 plate images.

Comparison: The table below compares different convolutional architectures used for object detection, showing the prediction accuracy (mAP) and the processing speed in frames per second (FPS) on the VOC2007 dataset.

Table 2.2: Comparison of ALPR using deep learning techniques

Architecture          mAP    FPS
R-CNN                 66.0   0.05
Fast R-CNN            70.0   0.5
YOLO                  63.4   45
YOLOv2 (288 x 288)    69.0   91
YOLOv2 (352 x 352)    73.7   81
YOLOv3 (proposed)     85     30


2.4 Universal OCR

There has been no prominent attempt at universal OCR. In most cases, a two-class classifier is employed in the pre-processing step to separate input characters (or words) into handwritten and machine-printed characters, and each character is then fed into an OCR module for handwritten or machine-printed characters according to its type. One recent attempt at this separation task is Zagoris et al. A possible reason why universal OCR has not been tried is the difference between the distributions of handwritten and machine-printed characters. Roughly speaking, handwritten characters follow a rather anisotropic and wider Gaussian distribution (or Gaussian mixture), whereas machine-printed characters follow a more isotropic and narrower one. This difference might lead to the use of different recognition techniques.

2.5 Character Recognition

The last step is to recognize each segmented character, also known as optical character recognition (OCR). This can be seen as an image classification problem with one class per alpha-numeric character. In total there are 36 possible classes when analyzing most Western license plates: 26 letters and 10 digits. Existing methods can be split into two categories, template-matching-based and learning-based approaches, each with its own advantages and disadvantages.

Learning-based approaches use machine learning techniques to discriminate characters based on one or multiple features. Jiao et al. use a neural network that learns to discriminate based on image density. A number of CNN architectures also work well for this task; CNNs use multiple features and do not require the features to be defined in advance. In some cases, a pre-trained 9-layer CNN model sliding across the bounding boxes is used.

Related Research

Many researchers are trying to apply new techniques to detect a car license plate. The techniques and procedures used for license plate detection include:

1. Sobel edge detection combined with texture and moving windows to detect the license plate location.
2. A region-based technique to find the license plate location.
3. Gray level variation to detect a vehicle license plate.
4. A genetic algorithm technique to find a car license plate.

All of the license plate detection techniques mentioned above are the first step toward identifying an exact location and a clear image of a license plate before the recognition processes.


2.6 Thai License Plate

The Department of Land Transportation of Thailand has imposed rules, regulations, and specifications for Thai license plates. Each car must be classified by its purpose of service, and the plate is designed so that the service can be identified at a single glance through the color of the printed characters and the background color of the plate. For example, a private car has a plate with a white background and black printed characters; a public car such as a taxi has black characters printed on a yellow plate; and a special servicing car or limousine has black characters on a green plate. Figure 2.1 shows an example of a Thai license plate.

Figure 2.1: Thai License Plate

(source: https://www.beamng.com/resources/thailand-licence-plate-pack.2386/)

There are two rows of characters on the plate, clearly printed at the center of each row. The upper row is divided into two parts: the category on the left and the running number on the right. The category part consists of one or two Thai consonants, each from one of 29 characters (out of all 44 consonants), or a number between 0 and 9. The running number is made up of one to four digits, each between 0 and 9. The lower row shows the province in which the car is registered, printed with smaller characters in the same color as those in the upper row.

A normal Thai license plate is rectangular and consists of two lines, the upper line and the lower line. The upper line is divided into two parts: the first consists of two characters, which can be a letter and a number, and the second consists of four numbers. The upper line identifies the car, while the lower line shows the name of the Thai province in which the car has been registered. The plate is 15 by 34 centimeters in size, with a colored and embossed outline. The registration ID consists of two series letters followed by a serial number of up to four digits, from 1 to 9999, without leading zeros, e.g., "กข 1" or "กข 1234". A digit may be added in front of the two letters if the letter pool has been exhausted, as is the case in Bangkok since 2012, giving the format "1กข 1234"; both license plate styles are shown in Figures 2.2 and 2.3. For this reason, the new plates, since 2012, have a reduced text size to keep the overall license plate size unchanged. The province of registration is displayed below the registration ID.

Figure 2.2: License Plate with 2 letters and 4 serial numbers

Figure 2.3: License Plate with a number followed by 2 letters and 4 serial numbers

Thai military vehicle number plates: The Thai military base located in Chonburi, Thailand, has vehicles whose license plates are entirely different from normal Thai license plates. The vehicles include large trucks for transporting goods, cars, and ambulances. Figure 2.4 shows samples of Thai military license plates.

Figure 2.4: Examples of Thai military license plates


2.7 Convolutional neural networks

A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyzing visual images. There are a number of different CNN architectures, such as GoogLeNet, ResNet, AlexNet, VGG, etc. CNNs are made up of neurons with learnable weights; each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity. There are basically two learning models for image processing tasks: supervised and unsupervised learning. Supervised learning refers to learning through pre-labelled inputs, which act as targets; the goal of supervised training is to reduce the model's overall classification error by correctly computing the output value of each training example. Unsupervised learning uses a training set that does not contain any labels. A convolutional neural network consists of an input layer and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, ReLU (activation) layers, pooling layers, and fully connected layers. Each layer has a specific function, explained as follows.

Figure 2.5: Typical CNN architecture

(source: https://data-flair.training/blogs/cnn-tensorflow-cifar-10/)

Convolution layer

The convolution layer is the main building block of a convolutional neural network. Convolution is the first layer to extract features from an input image; it preserves the relationship between pixels by learning image features using small squares of input data. It is a mathematical operation that takes two inputs, an image matrix and a filter (or kernel). As we go deeper into the network, the filters of later convolution layers compute dot products with the outputs of the previous convolution layers.

Pooling layer

Pooling layers reduce the number of parameters when the images are too large. Spatial pooling, also called subsampling or downsampling, reduces the dimensionality of each feature map while retaining the important information. Spatial pooling can be of different types, such as max pooling, sum pooling, and average pooling. Max pooling takes the largest element from each region of the feature map, average pooling takes the average of the elements, and taking the total sum of all the elements in a region is known as sum pooling.

Fully connected layer

After multiple layers of convolution and pooling, we need the output in the form of a class. The convolution and pooling layers only extract features and reduce the number of parameters relative to the original image. To generate the final output, we need a fully connected layer that produces an output with as many values as there are classes; it is difficult to reach that number with convolution layers alone. Convolution layers generate 3D activation maps, while we just need an output indicating whether or not an image belongs to a particular class.
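As an illustration of the layer types just described (not the network used in this thesis), a minimal CNN sketch in PyTorch might look like the following; the input size, channel counts, and the 12-class output are assumed example values.

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN: convolution -> ReLU -> pooling -> fully connected layer."""
    def __init__(self, num_classes=12):  # 12 classes is an assumed example value
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution extracts features
            nn.ReLU(),                                    # non-linearity (activation)
            nn.MaxPool2d(2),                              # pooling halves the spatial size
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully connected layer maps to classes

    def forward(self, x):
        x = self.features(x)       # 3D activation maps
        x = torch.flatten(x, 1)    # flatten before the fully connected layer
        return self.classifier(x)  # one score per class

# Example: a batch of one 32 x 32 RGB image (assumed size)
logits = TinyCNN()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 12])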

2.8 YOLO V3

YOLOv3 is the latest and most advanced model in the YOLO family. Before it appeared, YOLO9000 was the fastest and one of the most accurate algorithms available, and the improvements made in YOLOv3 exceed those of the previous methods. The model performs multi-label classification, using independent classifiers to decide which labels apply to a detected object. It also uses bounding box priors for predicting the output; in YOLOv3, one bounding box prior is assigned to each ground-truth object. YOLOv3 predicts boxes at three different scales and extracts features from those scales.

YOLOv2 uses the Darknet-19 architecture, a 19-layer network with 11 additional layers for object detection. However, this architecture lacks some of the most important features now present in modern detectors. YOLOv3 adds the features missing from the earlier versions, such as up-sampling, skip connections, and residual blocks.

YOLOv3 uses a variant of Darknet, which originally is a 53-layer network trained on ImageNet. For the task of detection, 53 more layers are stacked onto it, giving a 106-layer fully convolutional underlying architecture for YOLOv3. The architecture is shown in Figure 2.6.


Figure 2.6: Yolo v3 architecture

YOLO is a fully convolutional network, and its eventual output is generated by applying a 1 x 1 kernel on a feature map. In YOLOv3, detection is done by applying 1 x 1 detection kernels on feature maps of three different sizes at three different places in the network. The shape of the detection kernel is 1 x 1 x (B x (5 + C)). Here B is the number of bounding boxes a cell on the feature map can predict, "5" is for the 4 bounding box attributes plus one object confidence, and C is the number of classes. In YOLOv3 trained on COCO, B = 3 and C = 80, so the kernel size is 1 x 1 x 255.
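As a quick check of that formula, a small sketch follows; the 12-class value is an assumption matching the class count used later in this thesis.

def yolo_detection_depth(num_boxes: int, num_classes: int) -> int:
    """Depth of the 1 x 1 detection kernel: B x (5 + C)."""
    return num_boxes * (5 + num_classes)

print(yolo_detection_depth(3, 80))  # 255, as for YOLOv3 trained on COCO
print(yolo_detection_depth(3, 12))  # 51, assuming 12 classes (digits 0-9, logo, license plate)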

The first detection is made by the 82nd layer. For the first 81 layers, the image is downsampled by the network, so that the 81st layer has a stride of 32. If the input image is 416 x 416, the resultant feature map is of size 13 x 13. One detection is made here using the 1 x 1 detection kernel, giving a detection feature map of 13 x 13 x 255. Then the feature map from layer 79 is subjected to a few convolutional layers before being upsampled by 2x to dimensions of 26 x 26. This feature map is depth-concatenated with the feature map from layer 61, and the combined feature map is again subjected to a few 1 x 1 convolutional layers to fuse the features from the earlier layer (61). The second detection is then made by the 94th layer, yielding a detection feature map of 26 x 26 x 255.

A similar procedure is followed again: the feature map from layer 91 is subjected to a few convolutional layers before being depth-concatenated with the feature map from layer 36. As before, a few 1 x 1 convolutional layers follow to fuse the information from the previous layer (36). The final of the three detections is made at the 106th layer, yielding a feature map of size 52 x 52 x 255.


CHAPTER 3

METHODOLOGY

3.1 Data collection

This work involved collecting images of Thai license plates from a Thai military base using a surveillance camera placed at an angle that gives a clear view of the vehicle number plates. The camera is accessed remotely, and data is sent every day in the form of videos. The images are collected from two gates of the military base under all light conditions, and snapshots are taken from the camera footage. We collected 600 images of Thai license plates and trained on them. Testing is done with new videos from other days at the Thai military base that are not included in the training dataset.

The videos are taken from the East gate entrance, the Main gate entrance, and the East gate exit of the Thai military base. The camera is fixed in a position such that the license plates are seen clearly as a vehicle moves towards it. Data was collected every day over 8 months, with about 100-140 videos of 3 minutes each per day. Videos containing Thai digit number plates are selected from them, and each vehicle of this type is captured and then labelled using the LabelImg tool. From all the captured images, first only the license plates are cropped, and later images containing the license plate together with the whole vehicle are also used.

Figure 3.1: Data collection for Thai license plates from a video


Images collected from the East gate entrance and exit

The Thai military base has many entrance and exit gates through which vehicles continuously move in and out. The dataset is taken from all the possible gates, and the pictures above are taken from the East entrance and exit gates. The vehicles passing through them include cars with Thai number plates that have Thai digits printed on them along with numerals.


Images collected from the Main gate entrance and exit

The pictures above are taken from the Main gate entrance and exit from different positions, with the camera placed at various angles. The data is captured from morning to evening under different light conditions. The cars passing through this gate also have Thai as well as normal numbers printed on their plates.


3.2 Annotating Images

In the first stage we annotated the images using the BBox-Label tool, but we faced a problem with the labels, as it was difficult to convert them to YOLO format, and the BBox-Label tool has no option for labelling more than one class at a time. Since labelling each class separately takes more time, we switched to the open-source LabelImg software, annotated the images, and saved all the label files in txt format. There are ten digit classes (0-9); including the logo present on the plate, a total of 11 classes appear on the license plate.

Figure 3.2: Image labelled using Bbox-label tool


Figure 3.3: Image labelled using LabelImg

Data Preparation

In YOLOv3, the annotation values are fed into the system in a specified format: the data is stored in txt files containing the class index and four normalized box values per object, computed as shown below.

Each line of a YOLO-format txt file follows the order: class, x, y, w, h

x = Absolute x / width of total image

y = Absolute y / height of total image

w = Absolute width / width of total image

h = Absolute height / height of total image

Where Absolute x, Absolute y, Absolute width, Absolute height are given below.

Absolute x = (Xmin + (Absolute width/2))

Absolute y = (Ymin + (Absolute height/2))

Absolute width = abs(Xmax - Xmin)

Absolute height = abs(Ymax - Ymin)

Figure 3.4: Yolo format of txt files
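A small sketch of that conversion is shown below, assuming the absolute box corners (Xmin, Ymin, Xmax, Ymax) in pixels and the image size are already known; the function name and the example numbers are illustrative, not part of LabelImg or darknet.

def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute corner coordinates (pixels) to a YOLO-format label line."""
    abs_w = abs(xmax - xmin)      # Absolute width
    abs_h = abs(ymax - ymin)      # Absolute height
    abs_x = xmin + abs_w / 2      # Absolute x (box centre)
    abs_y = ymin + abs_h / 2      # Absolute y (box centre)
    # Normalize by the total image width and height
    x, y = abs_x / img_w, abs_y / img_h
    w, h = abs_w / img_w, abs_h / img_h
    return f"{class_id} {x:.6f} {y:.6f} {w:.6f} {h:.6f}"

# Example: digit class 3 in a 1920 x 1080 frame (assumed values)
print(to_yolo_line(3, 850, 400, 910, 470, 1920, 1080))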


3.3 Training and Testing

Since our work uses neural networks, it requires a lot of computational power. We use an Intel Core i7-7700K CPU at 4.2 GHz on an Asus Prime Z270-A motherboard with an Asus ROG Strix GTX 1080 Ti GPU; the system has 16 GB of RAM for the CPU and 11 GB of memory on the GPU. This setup is well suited to performing high-end computations without interruption.

In YOLOv3, the number of filters in the convolutional layers immediately before the yolo layers must be changed to match the number of classes, and is given by the formula

• filters = (classes + 5) * 3

The values in the YOLOv3 configuration file are set to batch 64, subdivisions 16, learning rate 0.001, saturation 1.5, exposure 1.5, hue 0.1, steps 400000,450000, scales 0.1,0.1, and the anchors of the COCO dataset. The width and height are set to 416 x 416, to which all input images are resized. Training is then started; the number of iterations is bounded by the max_batches setting.
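For reference, a sketch of the corresponding excerpt of a yolov3.cfg file is shown below; the filters value assumes the 12-class setup (digits 0-9, logo, and license plate) described in Chapter 4, so filters = (12 + 5) * 3 = 51, and the max_batches value is an assumed example since training is actually stopped based on the average loss.

[net]
batch=64
subdivisions=16
width=416
height=416
learning_rate=0.001
saturation=1.5
exposure=1.5
hue=.1
max_batches=500200        # assumed value; training is stopped earlier based on average loss
steps=400000,450000
scales=.1,.1

# ... intermediate layers omitted ...

[convolutional]
size=1
stride=1
pad=1
filters=51                # (12 classes + 5) * 3
activation=linear

[yolo]
classes=12
num=9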

When to stop training depends on the average loss. The average loss decreases gradually until it reaches a point where there is no further change over a few iterations, at which point training can be stopped. If training is not stopped there, the average loss may rise again after some more iterations and then start decreasing again, and this process goes on (Redmon, Mar 26, 2018).

Thresh is the minimum IOU (Intersection over Union) threshold considered during training. IOU is the ratio of the area of overlap to the area of union.

Figure 3.5: Intersection over union

(source: https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/)
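A minimal sketch of the IOU computation for two axis-aligned boxes, with the (xmin, ymin, xmax, ymax) representation assumed, is:

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Coordinates of the overlap rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)            # area of overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                          # area of union
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142857...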


The input images are resized to a width and height of 416, the learning rate is set to 0.001, and the saturation and exposure values to 1.5. Training is then started, and its progress can be seen in the terminal window. There is no fixed end point for the number of iterations, so we must judge when the network has converged. This depends on the average loss, which is very high in the first iterations and keeps decreasing as the iteration count grows; when it is close to zero and remains constant for the next few epochs, we can stop the training.

The weights are saved every 100 iterations, and this interval can be changed manually as required. As soon as training stops, we check the mean average precision (mAP), which measures the accuracy of the model. To do this precisely, the mAP is checked for all the saved weight files, and the weights with the highest precision are used for detection. They are then tested on a test image.

Figure 3.6: Screenshot taken from terminal during training of Images


Flow chart for training and testing YOLOV3 model on custom data

Figure 3.7: Representation of algorithm


Flowchart for displaying text on Image

The drawing function is modified to display the detected class text at the top of the image to increase readability.

Figure 3.8: Display text on image


Flow chart for displaying text on video frames

Figure 3.9: Text display on video frame


CHAPTER 4

EXPERIMENTAL RESULTS

4.1 Results

After testing on images, we can see the output produced for a given input image. For every weights file, we check the precision and the confidence percentage of each class to evaluate the performance of the system. From all the results, the system performance can be clearly observed, and the accuracy is good overall: for license plates the accuracy is about 85 percent, and for the digits about 83 percent. The videos were collected over whole days, and in some cases the output includes mispredictions.

Precision and recall are useful ways to measure the quality of predictions. In pattern recognition, precision, also known as positive predictive value, is the fraction of relevant instances among all retrieved instances.

Precision = TP / (TP + FP)

TP: case was positive and predicted positive
FP: case was negative but predicted positive
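A small sketch of these metrics is given below; recall, TP / (TP + FN), is included as well since it is referred to above, and the example counts are assumed values.

def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of actual positives that were predicted positive."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Assumed example counts
print(precision(90, 10))  # 0.9
print(recall(90, 15))     # 0.857...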

For detection in video, we added alpha-numeric plates along with the Thai military license plates, because both types of vehicles move through the gate continuously in the given videos. To improve detection, the model is also trained on normal digits for several iterations and then tested on video. The video runs at 30-40 fps, and the saved result is stored in a file. The accuracy for images of Thai military plates is 85%, whereas after adding alpha-numeric plates it increases to 87%. There are also some mispredictions in all cases: some license plates are not clear, and the digits are not clearly visible on all of them. Final testing is done on newly collected videos that are not included in the training set. The output results from each testing case are shown below.

Figures 4.1(a) and 4.1(b) show the output when the model is trained for single digit detection. Figures 4.2(a), 4.2(b), 4.3(a), and 4.3(b) show the license plate detected in a car image and the confidence percentage of the detected license plate in the terminal. Figures 4.4(a) and 4.5(a) show all the Thai digits detected, including the logo, in a cropped license plate, and Figure 4.4(b) shows the confidence of each predicted digit. Figure 4.5(b) displays wrong detections, or mispredictions, taking place. Figure 4.6 shows dual detections for a single license plate, and Figure 4.7 is taken when a full car image is input and the digits are detected.


Figure 4.1 (a): Screenshot from prediction window for detecting single digit

Figure 4.1 (b): Screenshot from prediction window for detecting single digit


Figure 4.2 (a): License plate detection from an Image

Figure 4.2 (b): Confidence of predicted image in terminal


Figure 4.3 (a): License plate detection from an image

Figure 4.3 (b): Confidence of predicted image in terminal


Figure 4.4 (a): Detection of all digits in license plate

Figure 4.4 (b): Confidence of each digit observed in terminal


Figure 4.5(a): Detection of all digits in license plate

Figure 4.5(b): Miss prediction in detecting digits


Figure 4.6: Wrong prediction of license plate in an image

Figure 4.7: Detection tested on full car image


Results of predictions from various gates and confidence score of detected digits

Figure 4.8: Screenshot of terminal window after prediction

These images are tested on Thai vehicles coming from different gates; the digits and the license plate are detected, and the confidence percentage of each digit is captured in a screenshot of the terminal window.


Figure 4.9: Result tested under different light conditions

Figure 4.10: Missed predictions of digits

Figure 4.11: Confidence of predicted digits of an image from East Entrance gate


Figure 4.12: Results displaying the whole license plate number on the image

The detected digits are combined, and to show the whole license plate number, code is written that draws a rectangular box and displays the detected number inside it using OpenCV drawing functions.


Table 4.1: Testing videos on new data and results obtained

Testing video | Number of cars in video | Detected license plates | Miss predictions
Test video 1 | 8 | All license plates detected | -
Test video 2 | 6 | All plates detected | -
Test video 3 | 12 | 10 license plates detected | Cars with red license plates; numbers not detected
Test video 4 | 8 | All license plates detected | -
Test video 5 | 9 | 4-5 license plates detected | Pole obstructing the camera, and a sign board detected as a license plate
Test video 6 | 22 | 19 plates detected correctly | 1 plate not clear but detecting random numbers, 1 plate not clear
Test video 7 | 5 | All digits in LP detected correctly | -
Test video 8 | 2 | All license plates detected | -


When testing on videos, each digit has its own confidence value, which keeps changing with each detection. As the video runs, the license plate number is displayed in the left corner of the frame for as long as the car is visible. There are 12 classes in total: the logo, the license plate, and the digits 0-9. Testing is done on several videos in which cars with Thai plates are present.

Table 4.2: Overall accuracy on videos tested on new data

Test video   Total cars   Correct   Incorrect   Accuracy (%)

1 8 8 - 100

2 6 6 - 100

3 12 10 2 83

4 22 21 1 95

5 9 7 2 77

6 8 8 - 100

7 5 5 - 100

8 3 3 - 100

9 5 4 1 80

10 4 3 1 75

11 7 7 - 100

12 7 6 1 85

13 3 3 - 100

14 7 6 1 83

Data is taken from another day at the military base for final testing, and each video is 3 minutes long. The total number of cars moving in each video is noted, and license plates which are not detected are counted as mispredictions. Each plate has a minimum of 4 digits for normal plates and 6 for Thai plates. The accuracy of each video is noted manually based on true detections, and the overall accuracy is around 91% for the tested videos.


To display the full number on the image, a rectangular box is first drawn using OpenCV drawing functions with the required dimensions, and the detected data is then printed inside it. The detected digits are stored in a string, and the labels in that string are used to print the detected output inside the rectangle. The color of the rectangle can be changed by giving the required fill values.
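A minimal OpenCV sketch of this overlay step is given below; the image path, box coordinates, colors, and the detected string are assumed example values, not the exact code used in this work.

import cv2

frame = cv2.imread("car.jpg")        # assumed input image path
plate_text = "3 0 4 7 8 1"           # detected digits joined into a string (assumed example)

# Draw a filled rectangle as the background box (BGR color, thickness -1 fills it)
cv2.rectangle(frame, (10, 10), (360, 60), (0, 0, 0), -1)
# Print the detected license plate number inside the box
cv2.putText(frame, plate_text, (20, 45), cv2.FONT_HERSHEY_SIMPLEX,
            1.0, (255, 255, 255), 2)
cv2.imwrite("car_with_text.jpg", frame)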

Figure 4.13: Graphical representation of average loss vs. number of iterations for license plate

The training and validation accuracy is calculated by noting the values every 1000 iterations and plotting them manually. The training curve is shown in blue and the validation curve in red.


Table 4.3: Precision values of cropped license plate with Thai digits

Number of iterations   Precision (%)
1000    90
2000    93
3000    93
4000    92
5000    90
6000    89
7000    89
8000    90
9000    89
10000   89

The license plates containing only Thai digits are cropped, each single digit is labelled, and the model is trained for 13,800 iterations; the result is tested on randomly chosen test images, and the output precision is noted every 1000 iterations.


Table 4.4: Precision values of Thai digits along with alpha-numeric numbers

Number of iterations   Precision (%)
1000    49
2000    85
3000    86
4000    80
5000    85
6000    87
7000    87
8000    86
9000    85
10000   86

The vehicle images contain both Thai digit license plates and alpha-numeric numbers. The model is trained for up to 15,000 iterations and tested on a test image, and these weights are used when testing on video, since there are more normal cars than Thai military cars in the videos.


Table 4.5: True positive, false positive, and false negative comparison table for YOLOv3

Epochs   TP    FP    FN

1000 384 273 227

1500 507 101 104

2000 544 76 67

2500 544 89 67

3000 545 69 66

3500 547 65 64

4000 526 77 85

4500 540 76 71

5000 548 69 63

5500 545 71 66

6000 550 62 61

6500 552 63 59

7000 549 65 62

7500 543 69 68

8000 549 70 62

8500 548 66 63

9000 546 73 65

9500 549 66 62

10000 548 56 63

The above is the confusion table for all the digits predicted in the images, with values noted at each listed checkpoint. It lists the true positives obtained when detecting the images, along with the false positives and false negatives.
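As an illustration of how precision and recall follow from this table, take the last row (10000 iterations): precision = 548 / (548 + 56) = 548 / 604 ≈ 0.91 (91%), and recall = 548 / (548 + 63) = 548 / 611 ≈ 0.90 (90%).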


CHAPTER 5

CONCLUSION AND RECOMMENDATIONS

5.1 Conclusion

Thai digit recognition is performed by collecting data from different environments in a real-time scenario, and the system is trained and tested on images and videos under various light conditions. Using YOLOv3, license plates are detected with about 85 percent accuracy and Thai digits with about 83 percent accuracy on images, and the overall accuracy on the tested videos is around 91 percent.

5.2 Recommendations

This work can be extended to recognize characters along with the digits and to other languages.


REFERENCES

1. U. Yadav, S. Verma, D. K. Xaxa and C. Mahobiya, "A deep learning based character recognition system from multimedia document," 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), Vellore, 2017.

2. Richard G. Casey and Eric Lecolinet, "A survey of methods and strategies in character segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, July 1996.

3. Anil K. Jain and Torfinn Taxt, "Feature extraction methods for character recognition - a survey," Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.

4. Z. Selmi, M. Ben Halima and A. M. Alimi, "Deep Learning System for Automatic License Plate Detection and Recognition," 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, 2017.

5. C. Pornpanomchai and N. Anawatmongkon, "Thai License Plate Detection from a Video Frame," 2009 WRI Global Congress on Intelligent Systems, Xiamen, 2009.

6. Shen-Zheng Wang and Hsi-Jian Lee, "Detection and recognition of license plate characters with different appearances," Proceedings of the 2003 IEEE International Conference on Intelligent Transportation Systems, 2003.

7. Pisit Phokharatkul and Chom Kimpan, "Recognition of handprinted Thai characters using the cavity features of character based on neural network," Circuits and Systems, 1998.

8. S. Z. Masood, G. Shu, A. Dehghan and E. G. Ortiz, "License plate detection and recognition using deeply learned convolutional neural networks," CoRR, vol. abs/1703.07330, 2017.

9. S. Uchida, S. Ide, B. K. Iwana and A. Zhu, "A Further Step to Perfect Accuracy by Training CNN with Larger Data," 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, 2016.