Offline Handwritten Character Recognition using
Neural Network
KONATHALA YOGITHA #1, JAGGAPU SWETHA #2, GHATTAMANENI PRAHARSHA #3, CHAKKA SWAPNA #4,
GUBBALA SANDHYA #5
#1, 2, 3, 4 B.Tech Scholars, Department of Computer Science and Engineering,
Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ
Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046.
#5 Assistant Professor, Department of Computer Science and Engineering,
Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi post, Backside of VSEZ
Kapujaggaraju Peta, Visakhapatnam, Andhra Pradesh, 530046.
ABSTRACT
The ultimate objective of handwritten character recognition is to simulate human reading
capabilities so that a computer can read, understand, edit and work with text as humans do,
using neural networks. Handwritten character recognition (HCR) has been one of the most
challenging research areas in the field of image processing. Even though a lot of research work
has been done in the field of HCR, achieving the best accuracy remains a problem.
This paper describes techniques for converting textual content from a paper document into a
machine-readable format. The purpose is to develop software with a very high accuracy rate
and with minimal time and space complexity.
Key Words:
Neural Networks, Geometric Feature Extraction, Segmentation, Handwriting Character
Recognition, Textual Content.
I. INTRODUCTION
It is very simple for human beings to understand handwritten characters or typed
documents, as we have the ability to learn. This ability can also be induced in machines by using
machine learning and artificial intelligence. The field which deals with this problem is
known as optical character recognition (OCR). It is the system used for changing electronic and
image text into digital characters that can be read by machines[1].
Mukt Shabd Journal
Volume IX, Issue IV, APRIL/2020
ISSN NO : 2347-3150
Page No : 3890
Handwritten Character Recognition (HCR) is an area of pattern recognition concerned with
the ability of a machine to analyse patterns and identify characters[2]. Pattern
recognition is the science of making inferences from perceptual data based on prior knowledge
or statistical information[3].
One of the most successful applications of neural networks is character recognition.
There are two categories of character recognition[4]-[6]:
Offline character recognition
Online character recognition
Offline character recognition deals with scanned handwritten documents. Online
character recognition deals with the conversion of characters that are written using a special
digitizer; it takes the input at runtime.
There are two main areas in Character Recognition:
Printed Character Recognition
Handwritten Character Recognition
Figure 1. Classification of Optical Character Recognition
Printed character recognition covers all printed text from newspapers, magazines,
books, and the output of typewriters, printers or plotters. This paper presents a system for the
recognition of offline handwritten characters using a neural network in MATLAB.
II. LITERATURE WORK
In this section we discuss the background work related to offline handwritten character
recognition using neural networks.
MOTIVATION
In the early stage, a notable attempt in the area of character recognition was made by
Grimsdale in 1959. The research work in the early sixties was based upon an approach called
the analysis-by-synthesis method, suggested by Eden in 1968. Eden formally proved that all
handwritten characters are formed by a finite number of schematic features, a point that was
implicitly included in previous works[7]. Later, this notion was used in all methods in
syntactic approaches to character recognition. K. Gaurav, P. K. Bhatia et al. deal
with the various pre-processing techniques involved in character recognition for
different kinds of images, ranging from simple handwritten form-based documents to
documents containing coloured and complex backgrounds and varied intensities. Different
pre-processing techniques are used, such as skew detection, along with image enhancement
techniques for contrast stretching, binarization and noise removal[8].
Salvador Espana-Boquera et al. proposed the use of a hybrid Hidden Markov Model (HMM)
to recognize handwritten text in offline mode. The structural part of the optical model was
modelled with Markov chains, and a multilayer perceptron was used to estimate the
probabilities[9].
In the future, character recognition systems might serve as a key factor in creating a paperless
environment by digitizing and processing existing paper documents.
III. THE PROPOSED METHODOLOGY
In this section we discuss the methodology proposed for offline handwritten character
recognition using neural networks.
PRELIMINARY KNOWLEDGE
The proposed offline handwritten character recognition system using neural networks has four
main steps. They are as follows:
Image Acquisition and Pre-processing
Segmentation
Feature Extraction
Classification and Recognition
1. IMAGE ACQUISITION AND PRE-PROCESSING
In the image acquisition stage, the input image is provided to the recognition system. The
input can be in an image format such as JPEG, BMP, etc., obtained from a scanner, a digital
camera or any other suitable digital input device.
PRE-PROCESSING
The series of operations performed on the scanned input image is called pre-processing. It
enhances the image and prepares it for segmentation. Normalization and noise
filtering are done in this step, which also yields a compact representation of the pattern.
Binarization is the process of converting a gray scale image into a binary image. Dilation of
edges in the binarized image is done using the Sobel technique[10].
A) NOISE REMOVAL:
Noise removal is used to eliminate unwanted patterns. Techniques such as
uniform and non-uniform filtering are used.
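As an illustration of uniform filtering, the following is a minimal Python sketch of a 3x3 mean filter. It is not the MATLAB code used in this work: the function name `uniform_filter_3x3`, the 3x3 window and the list-of-lists image format are assumptions made for the example.

```python
def uniform_filter_3x3(image):
    """Replace every pixel by the integer mean of its 3x3 neighbourhood.

    `image` is a list of rows of grey levels (0-255). At the borders only
    the neighbours that exist are averaged (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total, count = 0, 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += image[rr][cc]
                        count += 1
            out[r][c] = total // count
    return out
```

An isolated bright pixel is spread out and strongly attenuated by such a filter, which is why it suppresses speckle noise at the cost of some blurring.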
B) BINARIZATION:
All input characters are first converted into a grey-scale picture. Every character image is
captured vertically after the gray scale image is converted into a binary matrix.
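A minimal Python sketch of binarization by global thresholding is given below. The fixed threshold of 128, the assumption of dark ink on a light background, and the name `binarize` are illustrative choices and are not taken from the implementation described in this paper.

```python
def binarize(gray, threshold=128):
    """Convert a grey-scale image (0-255) into a binary matrix.

    Pixels darker than `threshold` are treated as ink (1), the rest as
    background (0). Both the threshold and the polarity are assumptions.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]
```

For unevenly lit scans, a data-driven threshold would usually be preferable to a fixed one.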
C) NORMALIZATION:
The process of translating picture data into the standard required form is called
normalization. Size normalization rescales the image to a pre-defined fixed size. Skew
normalization is needed when the text deviates from the base line during scanning; for this,
skew detection and the corresponding correction are required.
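Size normalization can be sketched in Python with nearest-neighbour resampling, as below. The interpolation method, the function name `resize_nearest` and the target size are assumptions for illustration; the paper does not state which resizing method was used.

```python
def resize_nearest(image, out_rows, out_cols):
    """Rescale a 2-D image to out_rows x out_cols by nearest-neighbour sampling."""
    in_rows, in_cols = len(image), len(image[0])
    return [
        [image[r * in_rows // out_rows][c * in_cols // out_cols]
         for c in range(out_cols)]
        for r in range(out_rows)
    ]
```

For example, `resize_nearest(char_image, 30, 30)` would bring every character to the same 30x30 grid (the size 30 is hypothetical) so that later stages see inputs of identical dimensions.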
Figure 2. Block Diagram of the Proposed Architecture
2. HWCR SEGMENTATION
In this stage, an image of characters is decomposed into sub-images of individual characters.
The pre-processed input image is segmented into isolated characters by assigning a number to
each character using a labeling process. This labeling provides information about the number of
characters in the image. Every individual character in the given image is resized to a fixed
number of pixels. The segmentation techniques used in this system are line segmentation, word
segmentation and character segmentation.
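The labeling process can be sketched as connected-component labeling. The Python example below assumes 8-connectivity and a binary image in which ink pixels are 1; the function name `label_components` is hypothetical, and the sketch is not the MATLAB routine used in this work.

```python
from collections import deque


def label_components(binary):
    """Label 8-connected groups of 1-pixels with the numbers 1, 2, 3, ...

    Returns (labels, count): a matrix of labels (0 = background) and the
    number of components found, scanning rows from top to bottom.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 1 and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] == 1
                                    and labels[ny][nx] == 0):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count
```

Under this assumption each component corresponds to one character, so `count` gives the number of characters. Characters made of disconnected strokes, or characters that touch each other, would need extra handling.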
3. FEATURE EXTRACTION
Feature extraction extracts the different line types that form a particular character and also
concentrates on the positional features of these lines. The technique was tested using a
neural network trained with the feature vectors obtained from the proposed system.
1. UNIVERSE OF DISCOURSE:
It is defined as the smallest matrix that fits the entire character skeleton.
Every character image has its own line segments, and the extracted features include their
positions. So, every character image should be independent of its size.
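A minimal Python sketch of selecting the universe of discourse is shown below: it crops a binary skeleton to the smallest rectangle that contains all of its 1-pixels. The function name `universe_of_discourse` and the list-of-lists representation are assumptions for illustration.

```python
def universe_of_discourse(skeleton):
    """Crop a binary skeleton to the tightest bounding box around its 1-pixels.

    Returns an empty list when the skeleton contains no foreground pixel.
    """
    rows = [r for r, row in enumerate(skeleton) if any(row)]
    if not rows:
        return []
    cols = [c for c in range(len(skeleton[0]))
            if any(row[c] for row in skeleton)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in skeleton[top:bottom + 1]]
```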
2. ZONING:
Zoning divides the image into fixed windows of equal size. This is done after the universe
of discourse is selected. At the time of system implementation two types of zoning were used.
The image was zoned into 9 equal-sized windows. Feature extraction was applied to individual
zones rather than to the complete image. To extract the different line segments in a zone, the
entire skeleton in that zone should be traversed. For this purpose, certain pixels in the
character skeleton were defined as starters, intersections and minor starters.
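Zoning into 9 windows can be sketched in Python as a 3x3 grid split. The function name `split_into_zones` and the row-major ordering of the zones are assumptions of this example; when the image size is not a multiple of 3, the integer boundaries used here give zones that differ by at most one pixel.

```python
def split_into_zones(image, grid=3):
    """Split a 2-D image into grid x grid zones, returned in row-major order."""
    rows, cols = len(image), len(image[0])
    zones = []
    for zr in range(grid):
        r0, r1 = zr * rows // grid, (zr + 1) * rows // grid
        for zc in range(grid):
            c0, c1 = zc * cols // grid, (zc + 1) * cols // grid
            zones.append([row[c0:c1] for row in image[r0:r1]])
    return zones
```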
3. STARTERS:
These are the pixels with one neighbor in the character skeleton. Before character
traversal starts, all the starters in the zone are found and made into a list.
Figure 3. Starters for Character ‘H’
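The starter definition can be sketched in Python as follows. The sketch assumes an 8-neighbourhood and a binary skeleton stored as a list of rows; the names `count_neighbors` and `find_starters` are hypothetical.

```python
def count_neighbors(skeleton, r, c):
    """Count the skeleton pixels among the 8 neighbours of (r, c)."""
    rows, cols = len(skeleton), len(skeleton[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and skeleton[rr][cc] == 1:
                total += 1
    return total


def find_starters(skeleton):
    """Return, in row-major order, all skeleton pixels with exactly one neighbour."""
    return [(r, c)
            for r, row in enumerate(skeleton)
            for c, value in enumerate(row)
            if value == 1 and count_neighbors(skeleton, r, c) == 1]
```

For a small 'H'-shaped skeleton, the four stroke ends are the only pixels with a single neighbour, so they are the starters.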
4. INTERSECTIONS:
The definition of an intersection is a little more complicated. An intersection pixel should
have more than one neighbor. Neighboring pixels are classified into two categories, i.e. direct
pixels and diagonal pixels. All pixels in the neighborhood of the pixel under consideration
that lie in the horizontal and vertical directions are called direct pixels. The pixels in the
neighborhood which lie in a diagonal direction to the pixel under consideration are called
diagonal pixels. The number of neighbors of a skeleton pixel also plays a key role. The
pixels are classified as having 3, 4, or 5 or more neighbors. Once the intersections of an image
are identified, they are made into a list.
Figure 4. Intersections for Character ‘H’
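The split into direct and diagonal pixels can be sketched in Python as below. The sketch only separates the neighbours of one pixel into the two categories; it does not implement the full intersection test, which the text does not specify in detail. The function name `split_neighbors` is hypothetical.

```python
def split_neighbors(skeleton, r, c):
    """Return (direct, diagonal) lists of skeleton neighbours of pixel (r, c).

    Direct pixels share a row or column with (r, c); diagonal pixels touch
    it only at a corner.
    """
    rows, cols = len(skeleton), len(skeleton[0])
    direct, diagonal = [], []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and skeleton[rr][cc] == 1:
                if dr == 0 or dc == 0:
                    direct.append((rr, cc))
                else:
                    diagonal.append((rr, cc))
    return direct, diagonal
```

A pixel is a candidate intersection when `len(direct) + len(diagonal)` is greater than one; the total can then be used to place it in the 3, 4, or 5 and above groups.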
5. MINOR STARTERS:
Minor starters are found along the course of traversal of the character skeleton. They are
created when the pixel under consideration has more than two neighbors. Two conditions
may occur, i.e. intersection and non-intersection.
Figure 5. Minor Starters for Character ‘H’
After the line segments are extracted, they have to be classified into one of the following
types: horizontal line, vertical line, right diagonal line, left diagonal line. After the line
type of each segment is known, a feature vector is formed. Every zone has a
corresponding feature vector. For the proposed algorithm every zone has a feature vector of
length 9. Its elements are:
1) Number of horizontal lines.
2) Number of vertical lines.
3) Number of right diagonal lines.
4) Number of left diagonal lines.
5) Normalized length of all horizontal lines.
6) Normalized length of all vertical lines.
7) Normalized length of all right diagonal lines.
8) Normalized length of all left diagonal lines.
9) Normalized area of the skeleton.
The number of any particular line type is normalized using the following method:
Value = 1 - ((number of lines / 10) x 2)
The normalized length of any particular line type is found by:
Length = (Total pixels in that line type) / (Total zone pixels)
Each feature vector explained here is extracted individually for each zone. So, if there are
N zones, there will be 9N elements in the feature vector for each character image. For the
proposed system, the original image was first divided into 9 zones by dividing the image
matrix. Then the features were extracted for each zone.
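The assembly of the feature vector can be sketched in Python using the Value and Length formulas of this section. The line counts and pixel totals per line type are taken as inputs, since line-segment extraction itself is not shown. The names `zone_feature_vector` and `character_feature_vector` are hypothetical, and the normalized skeleton area is assumed to be skeleton pixels divided by zone pixels, which the text does not define.

```python
LINE_TYPES = ("horizontal", "vertical", "right_diagonal", "left_diagonal")


def zone_feature_vector(line_counts, line_pixels, skeleton_pixels, zone_pixels):
    """Build the 9-element feature vector of one zone.

    line_counts / line_pixels map each line type to its number of lines and
    its total pixel count in the zone.
    """
    # Value = 1 - ((number of lines / 10) x 2)
    counts = [1 - (line_counts[t] / 10) * 2 for t in LINE_TYPES]
    # Length = (total pixels in that line type) / (total zone pixels)
    lengths = [line_pixels[t] / zone_pixels for t in LINE_TYPES]
    # Assumption: area is the share of zone pixels covered by the skeleton.
    area = skeleton_pixels / zone_pixels
    return counts + lengths + [area]


def character_feature_vector(zone_vectors):
    """Concatenate the per-zone vectors; N zones give 9N elements."""
    return [value for zone in zone_vectors for value in zone]
```

With 9 zones this yields 9 x 9 = 81 values per character, which form the input to the classifier.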
4. CLASSIFICATION AND RECOGNITION
Classification is the decision-making part of the system. Classifiers are used to
classify the feature vectors into predefined classes. A classifier is first trained on a training
set of pattern samples to prepare a model, which is then used to recognize the test samples. In
this paper, a neural network is used for classifying and recognizing handwritten characters.
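The train-then-recognize cycle can be sketched with a small feed-forward network in Python. This is an illustrative sketch only: the paper does not state the network architecture, so the single tanh hidden layer, the softmax output, the learning rate and the names `TinyMLP` and `train` are all assumptions, and the code is not the MATLAB network used in the experiments.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyMLP:
    """One hidden tanh layer and a softmax output, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        p = softmax([sum(w * v for w, v in zip(row, h)) + b
                     for row, b in zip(self.w2, self.b2)])
        return h, p

    def train_step(self, x, label, lr):
        """One stochastic gradient step on the cross-entropy loss."""
        h, p = self.forward(x)
        d_out = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Hidden-layer error is computed before the output weights change.
        d_hid = [(1 - h[j] ** 2)
                 * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j in range(len(h)):
                self.w2[k][j] -= lr * dk * h[j]
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i in range(len(x)):
                self.w1[j][i] -= lr * dj * x[i]
            self.b1[j] -= lr * dj
        return -math.log(p[label])

    def predict(self, x):
        _, p = self.forward(x)
        return p.index(max(p))


def train(model, samples, labels, epochs=300, lr=0.2):
    """Train on all samples for several epochs; return (first, last) mean loss."""
    history = []
    for _ in range(epochs):
        losses = [model.train_step(x, y, lr) for x, y in zip(samples, labels)]
        history.append(sum(losses) / len(losses))
    return history[0], history[-1]
```

In the setting of this paper the input would be the 81-element feature vector of a character and each output unit would stand for one character class; `TinyMLP(81, n_hidden, n_classes)` would be constructed with hidden and output sizes chosen by the experimenter.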
IV. EXPERIMENTAL REPORTS
We have conducted experiments on several offline data samples using MATLAB
R2017a to show the performance of the proposed application. MATLAB (Matrix
Laboratory) is a multi-paradigm numerical computing environment and proprietary
programming language developed by MathWorks. Finally, we developed an application which
demonstrates the performance of the proposed approach by taking a sample of handwritten
text with both lowercase and uppercase letters and identifying the characters.
MAIN PAGE
The above window shows the main page, which has a load image option used to load an
image containing text. Once the image is loaded, we run the train option, and finally the
text is extracted using neural networks and shown as output.
IMAGE IS PRE-PROCESSED AND APPLIED NEURAL NETWORKS
The above figure shows that an image containing sample text is selected and that the
neural network method is applied to extract the characters from that image.
The above figure shows that the input image is pre-processed, which involves
noise removal, dilation and image filling of the given scanned input image.
The above figure shows that the text is extracted successfully from the input
image and stored in Notepad.
V. CONCLUSION
We can conclude that the diagonal and direction techniques of feature extraction generate
higher accuracy results than many of the traditional feature extraction
methods. The system has produced good results for images containing handwritten text written in
different styles, sizes and alignments with varying backgrounds. The system is developed
in MATLAB and evaluated on a set of sample images containing handwritten text.
VI. ACKNOWLEDGMENT
We would like to acknowledge the Department of Computer Science and
Engineering, Vignan’s Institute of Engineering for Women, for the guidance and support
provided, especially Mrs. G. Sandhya, Assistant Professor, Computer Science and
Engineering.
VII. REFERENCES
1. Ms. Neha Sahu, Mr. Nitin Kali Raman “An Efficient Handwritten Devnagari Character
Recognition System Using Neural Network” IEEE, pp.173-177, 2013.
2. Rajneesh Rani, Renu Dhir, Gurpreet Singh Lehal “Script Identification of Pre-Segmented
Multi-Font Characters and Digits” IEEE 12th International Conference on Document
Analysis and Recognition, pp.1150-1154, 2013.
3. Gabriela Castellano, Mark B. Sandler “Handwritten Digits Recognition Using Hough
Transform and Neural Networks” IEEE International Conference on Intelligent Processing
Systems, pp.313-316, 1996.
4. R. Plamondon, S. N. Srihari “Online and off-line handwriting recognition: a comprehensive
survey” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.63-84, 2000.
5. U. Pal, N. Sharma, T. Wakabayashi, F. Kimura “Recognition of Six Popular Indian Scripts”
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007),
Vol. 2, 2007.
6. U. Pal, B. B. Chaudhuri “Indian script character recognition: a survey” Pattern
Recognition, pp.1887-1899, Sep. 2004.
7. J. Mantas “An overview of character recognition methodologies” Pattern Recognition,
pp.425-430, Jan. 1986.
8. Rafael C. Gonzalez, Richard E. Woods, Barry R. Masters “Digital Image Processing, Third
Edition” Journal of Biomedical Optics, 2009.
9. R. G. Casey, E. Lecolinet “A survey of methods and strategies in character segmentation”
IEEE Transactions on Pattern Analysis and Machine Intelligence, pp.690-706, Jul. 1996.
10. A. Rajavelu, M. T. Musavi, M. V. Shirvaikar “A neural network approach to character
recognition” Neural Networks, pp.387-393, Jan. 1989.
VIII. ABOUT THE AUTHORS
1) KONATHALA YOGITHA is currently pursuing her final year B.Tech in Computer
Science and Engineering at Vignan’s Institute of Engineering for Women, Duvvada,
Vadlapudi post, Backside of VSEZ Kapujaggaraju Peta, Visakhapatnam, Andhra
Pradesh, 530046. Her areas of interest include Image Processing and Android
Applications.
2) JAGGAPU SWETHA is currently pursuing her final year B.Tech in Computer Science
and Engineering at Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi
post, Backside of VSEZ Kapujaggaraju Peta, Visakhapatnam, Andhra
Pradesh, 530046. Her areas of interest include Networking.
3) GHATTAMANENI PRAHARSHA is currently pursuing her final year B.Tech in
Computer Science and Engineering at Vignan’s Institute of Engineering for Women,
Duvvada, Vadlapudi post, Backside of VSEZ Kapujaggaraju Peta, Visakhapatnam,
Andhra Pradesh, 530046. Her areas of interest include Networks and Internet of Things.
4) CHAKKA SWAPNA is currently pursuing her final year B.Tech in Computer Science
and Engineering at Vignan’s Institute of Engineering for Women, Duvvada, Vadlapudi
post, Backside of VSEZ Kapujaggaraju Peta, Visakhapatnam, Andhra
Pradesh, 530046. Her areas of interest include Data Structures.
5) GUBBALA SANDHYA is currently working as an Assistant Professor in the
Department of Computer Science and Engineering at Vignan’s Institute of Engineering
for Women, Duvvada, Vadlapudi post, Backside of VSEZ Kapujaggaraju Peta,
Visakhapatnam, Andhra Pradesh, 530046. She has more than 3 years of teaching
experience and her research interests include Data Mining and Image Processing.