
Offline Handwritten Character Recognition using Neural Network

KONATHALA YOGITHA #1, JAGGAPU SWETHA #2, GHATTAMANENI PRAHARSHA #3, CHAKKA SWAPNA #4, GUBBALA SANDHYA #5

#1, 2, 3, 4 B.Tech Scholars, Department of Computer Science and Engineering, Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ, Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046.

#5 Assistant Professor, Department of Computer Science and Engineering, Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ, Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046.

ABSTRACT

The ultimate objective of handwritten character recognition is to simulate human reading capabilities so that a computer can read, understand, edit and work with text as humans do, using neural networks. Handwritten character recognition (HCR) has been one of the most challenging research areas in the field of image processing. Even though a lot of research work has been done in the field of HCR, achieving the best accuracy remains a problem. This paper describes techniques for converting textual content from a paper document into a machine-readable format. The purpose is to develop software with a very high accuracy rate and with minimal time and space complexity.

Key Words:

Neural Networks, Geometric Feature Extraction, Segmentation, Handwriting Character Recognition, Textual Content.

I. INTRODUCTION

It is very simple for human beings to understand handwritten characters or typed documents because we have the ability to learn. This ability can also be induced in machines by using machine learning and artificial intelligence. The field which deals with this problem is known as optical character recognition (OCR). It is the process of converting electronic and image text into digital characters that can be read by machines[1].

Mukt Shabd Journal
Volume IX, Issue IV, APRIL/2020
ISSN NO : 2347-3150
Page No : 3890

Handwritten Character Recognition (HCR) is an area of pattern recognition concerned with the ability of a machine to analyse patterns and identify characters[2]. Pattern recognition is the science of making inferences from perceptual data based on prior knowledge or statistical information[3].

One of the most successful applications of neural networks is character recognition. There are two categories of character recognition[4]-[6]:

Offline character recognition

Online character recognition

Offline character recognition deals with scanned handwritten documents. Online character recognition deals with the conversion of characters that are written using a special digitizer; it takes the input at runtime.

There are two main areas in Character Recognition:

Printed Character Recognition

Handwritten Character Recognition

Figure 1. Classification of Optical Character Recognition


Printed character recognition covers all printed text in newspapers, magazines and books, as well as the output of typewriters, printers or plotters. This paper presents a system for the recognition of offline handwritten characters using a neural network in MATLAB.

II. LITERATURE WORK

In this section we discuss the background work related to offline handwritten character recognition using neural networks.

MOTIVATION

An early notable attempt in the area of character recognition was made by Grimsdale in 1959. Research work in the early sixties was based upon an approach called the analysis-by-synthesis method, suggested by Eden in 1968. Eden formally proved that all handwritten characters are formed by a finite number of schematic features, a point that was implicitly included in previous works[7]. Later, this notion was used in all syntactic approaches to character recognition.

K. Gaurav, Bhatia P. K. et al. deal with the various pre-processing techniques involved in character recognition for different kinds of images, ranging from simple handwritten form-based documents to documents containing coloured and complex backgrounds and varied intensities. Different pre-processing techniques are used, such as skew detection, together with image enhancement techniques for contrast stretching, binarization and noise removal[8].

Salvador Espana-Boquera et al. proposed the use of a hybrid Hidden Markov Model (HMM) to recognize handwritten text in offline mode. The structural part of the optical model was modelled with Markov chains, and a multilayer perceptron was used to estimate the probabilities[9].

In the future, character recognition systems might serve as a key factor in creating a paperless environment by digitizing and processing existing paper documents.


III. THE PROPOSED METHODOLOGIES

In this section we discuss the methodologies used for offline handwritten character recognition using neural networks.

PRELIMINARY KNOWLEDGE

The proposed offline handwritten character recognition system using neural networks has four main steps. They are as follows:

Image Acquisition and Pre-processing

Segmentation

Feature Extraction

Classification and Recognition

1. IMAGE ACQUISITION AND PRE-PROCESSING

In the image acquisition stage, the input image is provided to the recognition system. The input can be an image file in a format such as JPEG or BMP, or an image obtained from a scanner, a digital camera or any other suitable digital input device.

PRE-PROCESSING

The series of operations performed on the scanned input image is called pre-processing. It enhances the image and prepares it for segmentation. Normalization and noise filtering are done in this step, which also produces a compact representation of the pattern. Binarization is the process of converting a gray-scale image into a binary image. Dilation of edges in the binarized image is done using the Sobel technique[10].

A) NOISE REMOVING:

Noise removal is used to eliminate unwanted patterns from the image. Techniques such as uniform and non-uniform filtering are used.
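As an illustration of uniform filtering, the following minimal Python sketch replaces every pixel by the mean of its 3x3 neighbourhood. The paper does not state the window size or how borders are handled, so the 3x3 window, the border handling (only the available neighbours are averaged) and the function name `uniform_filter_3x3` are assumptions made for this example; the authors' MATLAB implementation is not shown in the paper.

```python
def uniform_filter_3x3(image):
    """Return a smoothed copy of a 2-D list of gray values.

    Assumption: a 3x3 mean (uniform) window; border pixels average only
    the neighbours that actually exist inside the image.
    """
    rows, cols = len(image), len(image[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            result[r][c] = sum(window) / len(window)
    return result


# A single bright speck is spread out and strongly attenuated.
noisy = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = uniform_filter_3x3(noisy)
```

A uniform filter suppresses isolated specks at the cost of blurring thin strokes, so the window is usually kept small.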

B) BINARIZATION:

All input characters are first converted into a gray-scale picture. Each character image is then captured vertically once the gray-scale image has been converted into a binary matrix.
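The conversion from a gray-scale image to a binary matrix can be sketched as a simple global threshold. This is only an illustration in Python: the paper does not give the thresholding rule, so the fixed threshold of 128, the 8-bit gray range and the convention that dark ink maps to 1 are assumptions of this example.

```python
def binarize(gray, threshold=128):
    """Convert a 2-D list of 8-bit gray values into a binary matrix.

    Assumption: dark ink on light paper, so pixels darker than the
    threshold become 1 (ink) and all others become 0 (background).
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


gray_image = [[0, 200],
              [127, 128]]
binary_image = binarize(gray_image)  # [[1, 0], [1, 0]]
```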


C) NORMALIZATION:

The process of translating picture data into the standard required form is called normalization. Size normalization rescales the image to a pre-defined fixed size. Skew arises during scanning, when the text deviates from the baseline; to remove it, skew detection and the corresponding correction are required.
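Size normalization can be sketched with nearest-neighbour resampling, shown below in Python. The paper does not say which interpolation method or target size is used, so nearest-neighbour sampling and the hypothetical function name `resize_nearest` are assumptions; skew correction is not covered by this sketch.

```python
def resize_nearest(image, out_rows, out_cols):
    """Resize a 2-D list to out_rows x out_cols by nearest-neighbour sampling.

    Each output pixel copies the input pixel whose index is the output
    index scaled by the size ratio (integer arithmetic, rounded down).
    """
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]


small = [[1, 2],
         [3, 4]]
enlarged = resize_nearest(small, 4, 4)
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```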

Figure 2. Block Diagram of the Proposed Architecture

2. HWCR SEGMENTATION

In this stage, an image of characters is decomposed into sub-images of individual characters. The pre-processed input is segmented into isolated characters by assigning a number to each character using a labeling process. This labeling process also provides the number of characters in the image. Every individual character in the given image is then resized to a fixed number of pixels. The segmentation techniques used in this system are line segmentation, word segmentation and character segmentation.
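The labeling process described above can be sketched as connected-component labeling on the binary image. The Python example below is an illustration under two assumptions that the paper does not state: pixels are grouped with 8-connectivity, and each character forms exactly one connected component (characters that touch each other, or characters with detached parts such as the dot of an "i", would need extra handling).

```python
from collections import deque


def label_components(binary):
    """Assign a number to every connected group of ink pixels.

    Returns (labels, count): labels has the same shape as the input,
    with 0 for background and 1..count for the components found.
    Assumption: 8-connectivity between neighbouring ink pixels.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                # Breadth-first flood fill of the current component.
                while queue:
                    cr, cc = queue.popleft()
                    for dr in (-1, 0, 1):
                        for dc in (-1, 0, 1):
                            nr, nc = cr + dr, cc + dc
                            if (0 <= nr < rows and 0 <= nc < cols
                                    and binary[nr][nc] == 1
                                    and labels[nr][nc] == 0):
                                labels[nr][nc] = count
                                queue.append((nr, nc))
    return labels, count


page = [[1, 0, 0, 1],
        [1, 0, 0, 1],
        [0, 0, 0, 0]]
labels, number_of_characters = label_components(page)  # two components
```

The returned count corresponds to the number of characters in the image, and the label matrix can be used to cut out each character as its own sub-image.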


3. FEATURE EXTRACTION

Feature extraction extracts the different line types that form a particular character and also concentrates on the positional features of those lines. The technique was tested using a neural network trained with the feature vectors obtained from the proposed system.

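1. UNIVERSE OF DISCOURSE:

It is defined as the shortest and smallest matrix that fits the entire character skeleton. Every character image has its own set of line segments, and the extracted features include the positions of these segments. Cropping to the universe of discourse therefore makes every character in the scanned image independent of its image size.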
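The universe of discourse can be sketched as a bounding-box crop of the binary character image. The Python function below is a minimal illustration; the name `universe_of_discourse` and the choice to return an empty list for a blank image are assumptions of this example.

```python
def universe_of_discourse(binary):
    """Crop a binary character image to the smallest matrix that
    contains every ink pixel (value 1)."""
    ink_rows = [r for r, row in enumerate(binary) if any(row)]
    if not ink_rows:
        return []  # assumption: a blank image yields an empty matrix
    ink_cols = [c for c in range(len(binary[0]))
                if any(row[c] for row in binary)]
    top, bottom = ink_rows[0], ink_rows[-1]
    left, right = ink_cols[0], ink_cols[-1]
    return [row[left:right + 1] for row in binary[top:bottom + 1]]


padded = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 1, 0],
          [0, 0, 0, 0]]
cropped = universe_of_discourse(padded)  # [[1, 1], [0, 1]]
```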

2. ZONING:

Zoning divides the image into fixed windows of equal size. This is done after the universe of discourse has been selected. During system implementation two types of zoning were used. The image was zoned into 9 windows of equal size. Feature extraction was applied to the individual zones rather than to the complete image. To extract the different line segments in a zone, the entire skeleton in that zone should be traversed. For this purpose, certain pixels in the character skeleton were defined as starters, intersections and minor starters.
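Zoning into 9 equal windows can be sketched as a 3x3 grid split. The paper states the number of zones but not their arrangement, so the 3x3 layout is an assumption; the Python sketch also assumes that the image height and width are multiples of 3 (any remaining rows or columns would be dropped).

```python
def split_into_zones(image, zones_per_side=3):
    """Split a 2-D list into zones_per_side x zones_per_side equal windows.

    Zones are returned in row-major order (left to right, top to bottom).
    Assumption: image dimensions are multiples of zones_per_side.
    """
    rows, cols = len(image), len(image[0])
    zone_rows, zone_cols = rows // zones_per_side, cols // zones_per_side
    zones = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            zone = [
                row[zc * zone_cols:(zc + 1) * zone_cols]
                for row in image[zr * zone_rows:(zr + 1) * zone_rows]
            ]
            zones.append(zone)
    return zones


# A 6x6 test image whose pixel value encodes its position.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
zones = split_into_zones(image)  # 9 zones of 2x2 pixels each
```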

3. STARTERS:

These are the pixels with one neighbor in the character skeleton. Before character

traversal starts, all the starters in the zone are found and made into a list.

Figure 3. Starters for Character ‘H’
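Finding the starters can be sketched by counting the skeleton neighbours of every pixel. The Python example below assumes that "neighbor" means any of the 8 surrounding pixels and, for simplicity, scans a whole skeleton rather than a single zone; the 5x5 "H" used here is a hand-made toy skeleton, not data from the paper.

```python
def count_neighbours(skeleton, r, c):
    """Count the ink pixels among the 8 pixels surrounding (r, c)."""
    rows, cols = len(skeleton), len(skeleton[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                total += skeleton[nr][nc]
    return total


def find_starters(skeleton):
    """List the (row, column) positions of skeleton pixels that have
    exactly one neighbour, in row-major order."""
    return [
        (r, c)
        for r, row in enumerate(skeleton)
        for c, pixel in enumerate(row)
        if pixel == 1 and count_neighbours(skeleton, r, c) == 1
    ]


H = [[1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1]]
starters = find_starters(H)  # the four stroke ends of the "H"
```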


4. INTERSECTIONS:

The definition of an intersection is a little more complicated. An intersection pixel must have more than one neighbor. Neighboring pixels are classified into two categories: direct pixels and diagonal pixels. The pixels in the neighborhood of the pixel under consideration that lie in the horizontal and vertical directions are called direct pixels. The pixels in the neighborhood that lie in a diagonal direction are called diagonal pixels. The number of neighbors of a skeleton pixel also plays a key role: pixels are classified as having 3, 4, or 5 or more neighbors. Once the intersections of an image are identified, they are made into a list.

Figure 4. Intersections for Character ‘H’
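The split of a pixel's neighbours into direct and diagonal pixels can be sketched as follows. This Python example only performs the neighbour classification; the rules that decide whether a pixel with a given number of neighbours really is an intersection are not spelled out in the paper, so they are deliberately not implemented here. The function name and the toy "H" skeleton are assumptions of this example.

```python
DIRECT_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # up, down, left, right
DIAGONAL_OFFSETS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # the four corners


def classify_neighbours(skeleton, r, c):
    """Return (direct, diagonal): the positions of the ink pixels around
    (r, c) that lie in horizontal/vertical and in diagonal directions."""
    rows, cols = len(skeleton), len(skeleton[0])

    def ink_at(offsets):
        return [
            (r + dr, c + dc)
            for dr, dc in offsets
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and skeleton[r + dr][c + dc] == 1
        ]

    return ink_at(DIRECT_OFFSETS), ink_at(DIAGONAL_OFFSETS)


H = [[1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1],
     [1, 1, 1, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 0, 0, 0, 1]]
direct, diagonal = classify_neighbours(H, 2, 1)
# direct   -> [(2, 0), (2, 2)]
# diagonal -> [(1, 0), (3, 0)]
```

A pixel is a candidate intersection only if the two lists together hold more than one neighbour; further checks on how the direct and diagonal pixels relate to each other would be needed to confirm it.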

5. MINOR STARTERS:

Minor starters are created when the pixel under consideration has more than two neighbors. They are found along the course of traversal of the character skeleton. Two conditions may occur: intersection and non-intersection.

Figure 5. Minor Starters for Character ‘H’


After the line segments are extracted, they have to be classified into the following types: horizontal line, vertical line, right diagonal line and left diagonal line. Once the line type of each segment is known, a feature vector is formed. Every zone has a corresponding feature vector. For the proposed algorithm every zone has a feature vector of length 9, consisting of:

1) Number of horizontal lines.

2) Number of vertical lines.

3) Number of right diagonal lines.

4) Number of left diagonal lines.

5) Normalized length of all horizontal lines.

6) Normalized length of all vertical lines.

7) Normalized length of all right diagonal lines.

8) Normalized length of all left diagonal lines.

9) Normalized area of the skeleton.

The number of any particular line type is normalized using the following method:

Value = 1 - ((number of lines / 10) x 2)

The normalized length of any particular line type is found by:

Length = (Total pixels in that line type) / (Total zone pixels)
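The two formulas given for Value and Length can be combined into a per-zone feature vector, as in the following Python sketch. It assumes that the line segments of the zone have already been extracted and classified, so the counts and pixel totals are plain inputs. The paper does not define the normalized area of the skeleton; computing it as skeleton pixels divided by total zone pixels is an assumption of this example, as are the function and key names.

```python
LINE_TYPES = ("horizontal", "vertical", "right_diagonal", "left_diagonal")


def zone_feature_vector(line_counts, line_pixels, skeleton_pixels, zone_pixels):
    """Build the 9-element feature vector of one zone.

    line_counts / line_pixels: dicts keyed by LINE_TYPES holding the number
    of lines and the total pixels of each line type in the zone.
    """
    # Value = 1 - ((number of lines / 10) x 2)
    counts = [1 - (line_counts[t] / 10) * 2 for t in LINE_TYPES]
    # Length = (total pixels in that line type) / (total zone pixels)
    lengths = [line_pixels[t] / zone_pixels for t in LINE_TYPES]
    # Assumption: area is normalized by the zone size, like the lengths.
    area = skeleton_pixels / zone_pixels
    return counts + lengths + [area]


features = zone_feature_vector(
    line_counts={"horizontal": 1, "vertical": 2,
                 "right_diagonal": 0, "left_diagonal": 0},
    line_pixels={"horizontal": 5, "vertical": 10,
                 "right_diagonal": 0, "left_diagonal": 0},
    skeleton_pixels=15,
    zone_pixels=100,
)
```

With the Value formula, a zone with no lines of a given type gets 1 and a zone with 10 such lines gets -1, so the count features stay in the range -1 to 1 for up to 10 lines per type.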


Each feature vector described here is extracted individually for each zone. So, if there are N zones, there will be 9N elements in the feature vector for each character image. For the proposed system, the original image was first divided into 9 zones by dividing the image matrix, and the features were then extracted for each zone.
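As a small worked example of the 9N count: with N = 9 zones and 9 features per zone, the character is described by 9 x 9 = 81 values. The Python sketch below joins the per-zone vectors in zone order; the function name and the length check are assumptions of this example.

```python
FEATURES_PER_ZONE = 9


def character_feature_vector(zone_vectors):
    """Concatenate the per-zone feature vectors into one flat vector.

    For N zones the result has 9N elements (81 for the 9 zones used here).
    """
    for vector in zone_vectors:
        if len(vector) != FEATURES_PER_ZONE:
            raise ValueError("every zone must contribute exactly 9 features")
    return [value for vector in zone_vectors for value in vector]


# Dummy zone vectors: zone i is filled with the value i.
dummy_zones = [[float(i)] * FEATURES_PER_ZONE for i in range(9)]
full_vector = character_feature_vector(dummy_zones)  # 81 elements
```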

4. CLASSIFICATION AND RECOGNITION

Classification is the main decision-making part of the system. Classifiers are used to assign the feature vectors to predefined classes. A classifier is first trained on a training set of pattern samples to build a model, which is then used to recognize the test samples. In this paper, a neural network is used for classifying and recognizing handwritten characters.
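The recognition step of a trained neural network can be sketched as a forward pass followed by picking the class with the highest score. The paper does not specify the network architecture, the activation function or the training procedure, so the single hidden layer, the sigmoid activation and the hand-picked toy weights below are all assumptions made for illustration; in practice the weights would come from training on labeled feature vectors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def classify(features, hidden_w, hidden_b, output_w, output_b, class_labels):
    """Run the forward pass and return (predicted label, output scores)."""
    hidden = layer(features, hidden_w, hidden_b)
    scores = layer(hidden, output_w, output_b)
    best = max(range(len(scores)), key=scores.__getitem__)
    return class_labels[best], scores


# Toy network with 2 inputs, 2 hidden units and 2 classes (weights are
# hand-picked for the example, not trained).
hidden_w = [[5.0, -5.0], [-5.0, 5.0]]
hidden_b = [0.0, 0.0]
output_w = [[4.0, -4.0], [-4.0, 4.0]]
output_b = [0.0, 0.0]

label, scores = classify([1.0, 0.0], hidden_w, hidden_b,
                         output_w, output_b, ["A", "B"])
```

For the system described here the input would be the 81-element feature vector and there would be one output unit per character class.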

IV. EXPERIMENTAL REPORTS

We conducted experiments on several samples of offline data using the MATLAB R2017a simulator to show the performance of the proposed application. MATLAB (Matrix Laboratory) is a multi-paradigm numerical computing environment and proprietary programming language developed by MathWorks. We developed an application that demonstrates the performance of the proposed approach by taking a sample of handwritten text with both lowercase and uppercase letters and trying to identify the characters.

MAIN PAGE


From the above window we can see the main page, which has a load image option for loading an image containing text. Once the image is loaded, the train option is run and finally, using neural networks, the text is extracted separately and shown as output.

IMAGE IS PRE-PROCESSED AND APPLIED NEURAL NETWORKS

From the above figure, we can see that an image with a sample text is selected and that the neural network method is applied to extract the characters from that image.

From the above figure, we can see that the input image is pre-processed, which results in noise removal, dilation and image filling of the given scanned input image.


From the above figure, we can see that the text is extracted successfully from the input image and stored in Notepad.

V. CONCLUSION

We can conclude that the diagonal and direction techniques of feature extraction are better at generating high-accuracy results than many of the traditional feature extraction methods. The system has produced good results for images containing handwritten text written in different styles, sizes and alignments with varying backgrounds. The system is developed in MATLAB and evaluated on a set of sample images containing handwritten text.

VI. ACKNOWLEDGMENT

We would like to acknowledge the Department of Computer Science and Engineering, Vignan’s Institute of Engineering for Women, for the guidance and support it has provided, especially Mrs. G. Sandhya, Assistant Professor, Computer Science and Engineering.

VII. REFERENCES

1. Neha Sahu, Nitin Kali Raman, “An Efficient Handwritten Devnagari Character Recognition System Using Neural Network”, IEEE, pp. 173-177, 2013.

2. Rajneesh Rani, Renu Dhir, Gurpreet Singh Lehal, “Script Identification of Pre-Segmented Multi-Font Characters and Digits”, IEEE 12th International Conference on Document Analysis and Recognition, pp. 1150-1154, 2013.


3. Gabriela Castellano, Mark B. Sandler, “Handwritten Digits Recognition Using Hough Transform and Neural Networks”, IEEE International Conference on Intelligent Processing Systems, pp. 313-316, 1996.

4. R. Plamondon, S. N. Srihari, “Online and off-line handwriting recognition: a comprehensive survey”, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 63-84, 2000.

5. U. Pal, N. Sharma, T. Wakabayashi, F. Kimura, “Recognition of Six Popular Indian Scripts”, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Vol. 2, 2007.

6. U. Pal, B. B. Chaudhuri, “Indian script character recognition: a survey”, Pattern Recognition, pp. 1887-1899, September 2004.

7. J. Mantas, “An overview of character recognition methodologies”, Pattern Recognition, pp. 425-430, January 1986.

8. Rafael C. Gonzalez, Richard E. Woods, Barry R. Masters, “Digital Image Processing, Third Edition”, Journal of Biomedical Optics, 2009.

9. R. G. Casey, E. Lecolinet, “A survey of methods and strategies in character segmentation”, IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 690-706, July 1996.

10. A. Rajavelu, M. T. Musavi, M. V. Shirvaikar, “A neural network approach to character recognition”, Neural Networks, pp. 387-393, January 1989.

VIII. ABOUT THE AUTHORS

1) KONATHALA YOGITHA is currently pursuing her final year B.Tech in Computer Science and Engineering at Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ, Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046. Her areas of interest include Image Processing and Android Applications.

2) JAGGAPU SWETHA is currently pursuing her final year B.Tech in Computer Science and Engineering at Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ, Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046. Her area of interest is Networking.


3) GHATTAMANENI PRAHARSHA is currently pursuing her final year B.Tech in Computer Science and Engineering at Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ, Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046. Her areas of interest include Networks and Internet of Things.

4) CHAKKA SWAPNA is currently pursuing her final year B.Tech in Computer Science and Engineering at Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ, Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046. Her area of interest is Data Structures.

5) GUBBALA SANDHYA is currently working as an Assistant Professor in the Department of Computer Science and Engineering at Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ, Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046. She has more than 3 years of teaching experience and her research interests include Data Mining and Image Processing.
