09_chapter_01.pdf

CHAPTER 1

INTRODUCTION

A Neural Network Based Handwritten Character Recognition for Marathi Script

1. Introduction 2

Pattern recognition is a set of mathematical, statistical and heuristic techniques

used in executing ‘man-like’ tasks on computers. Pattern recognition plays an important

role in many applications such as document processing, robot vision, optical character

recognition, writer identification and verification etc. Automation of pattern recognition

helps to speed up processing time as well as to automate processes without human

intervention. Computer recognition of characters or word is one of the most successful

applications in computer vision. It is an automated process that uses pattern recognition

and machine learning techniques to recognize characters or words. With the ever

increasing computational speed, memory, technical advances in scanning devices and

maturity of recognition techniques, a tremendous progress is seen in the development of

character recognition. Section 1.1 discusses optical character recognition. Section 1.2

presents the types of handwriting OCR systems. The research in OCR systems is

discussed in Section 1.3. The components of an OCR system are described in Section

1.4. Section 1.5 presents the evolution of OCR. Marathi script and its properties are

discussed in Section 1.6. The Institutes in India doing research in Devanagari are

mentioned in Section 1.7. The challenges in Marathi handwriting recognition are

mentioned in Section 1.8. Section 1.9 puts forth the problem statement, while Section

1.10 and 1.11 discusses the objectives and the scope of the proposed work respectively.

Section 1.12 presents the organization of the thesis and finally Section 1.13 presents the

concluding remarks.

1.1 Optical character recognition

Optical character recognition (OCR) is the recognition of printed or handwritten

text by a computer. This involves photo scanning of the text character-by-character,

analysis of the scanned-in image, and then translation of the character image into

character codes, such as ASCII, commonly used in data processing. In OCR processing,

the scanned-in image or bitmap is analyzed for light and dark areas in order to identify

each alphabetic letter or numeric digit. When a character is recognized, it is converted

into an ASCII code. Special circuit boards and computer chips designed especially for

OCR are used to speed up the recognition process. Research in OCR is popular for its

various application potentials in office automation, cheque verification in banks, post-

offices, and a large variety of business and data entry applications. Other applications


1. Introduction 3

involve reading aid for the blind, library automation, language processing and

multimedia design.

1.2 Types of handwriting OCR systems

There are two major problem domains in handwriting recognition: online and

offline. Online handwriting recognition involves the automatic conversion of text as it is

written on a special digitizer or personal digital assistant (PDA), where a sensor picks up

the pen-tip movements as well as pen-up/pen-down switching. That kind of data is

known as digital ink and can be regarded as a dynamic representation of handwriting.

The obtained signal is converted into letter codes which are usable within computer and

text-processing applications. The elements of an on-line handwriting recognition

interface typically include:

• a pen or stylus for the user to write with.

• a touch sensitive surface, which may be integrated with, or adjacent to, an output

display.

• a software application which interprets the movements of the stylus across the

writing surface, translating the resulting strokes into digital text.

Online character recognition is sometimes confused with OCR. OCR is an

instance of offline character recognition, where the system recognizes the fixed static

shape of the character, while on-line character recognition instead recognizes the

dynamic motion during handwriting. For example, online recognition, such as that used

for gestures in the Tablet PC can tell whether a horizontal mark was drawn right-to-left,

or left-to-right. Online character recognition is also referred to by other terms such as

dynamic character recognition, real-time character recognition, and Intelligent Character

Recognition or ICR. On-line systems for recognizing hand-printed text on the fly have

become well-known as commercial products in recent years. Among these are the input

devices for personal digital assistants such as those running Palm OS. The Apple Newton

pioneered this product. The algorithms used in these devices take advantage of the fact

that the order, speed, and direction of individual lines segments at input are known. Also,

the user can be retrained to use only specific letter shapes. These methods cannot be used

in software that scans paper documents, so accurate recognition of offline character

recognition is still largely an open problem.


1. Introduction 4

In offline handwriting recognition, digitized spatial information is available to the

recognition system, e.g. the image of the address scanned from an envelope or an amount

shown on a cheque. Off-line handwriting recognition involves the automatic conversion

of text in an image into letter codes which are usable within computer and text-

processing applications. Off-line handwriting recognition is comparatively difficult, as

different people have different handwriting styles and the information available to the

recognition system is limited.

In terms of processing word data, there are two approaches: the analytical

approach which treats the word as an orderly collection of parts. In such approaches, the

word image is segmented into subunits based on some properties or rules, e.g. character-

like properties or shapes that match a part list e.g. alphabet. Before recognition, the word

has to be reconstructed. Segmenting a word image into meaningful parts is itself a

difficult task; there are no techniques that are able to reliably segment a word into

characters. Although excessive research has been done along this direction, it remains an

open question as to whether it is the most promising approach.

Another approach is the holistic approach which considers the word as a single

entity. This avoids the problem of character segmentation, but is an area which has been

less studied than the analytical approach. However, research along this line has recently

gained considerable interest.

1.3 Research in OCR systems

Handwriting Recognition has an active community of academics studying it. The

biggest conferences for handwriting recognition are the International Conference on

Frontiers in Handwriting Recognition (ICFHR), held in even-numbered years, and the

International Conference on Document Analysis and Recognition (ICDAR), held in odd-

numbered years. Both the conferences are scrutinized by the IEEE. Active areas of

research in these conferences include:

• Character Recognition

• Graphics Recognition

• Document Image Analysis

• Document Understanding

• Camera-based Document Processing


1. Introduction 5

• Document Databases and Digital Libraries

• Sketching Interfaces

• Historical Documents

• Handwriting Recognition Techniques

• Preprocessing and Segmentation Techniques

• Classifiers and their Combinations

• Multiple Sources and Multiple Experts

• Innovative Approaches in Handwriting Recognition

• Soft Computing for Handwriting Processing and Understanding

• Error Reduction and Performance Enhancement

• Writer Verification and Identification

• Multimedia Systems

• Color Information Utilization

• WWW Applications

The International Journal on Document Analysis and Recognition (IJDAR)

sponsored by the International Association for Pattern Recognition is focused on

publishing articles that cover all areas related to document analysis and recognition. This

includes contributions dealing with computer recognition of characters, symbols, text,

lines, graphics, images, handwriting, signatures, as well as automatic analyses of the

overall physical and logical structures of documents, with the ultimate objective of a

high-level understanding of their semantic content.

1.4 Components of an OCR system

Similar to any pattern recognition system, the character recognition system

contains the basic components as shown in Figure 1.1.

Figure 1.1 Components of an OCR

Pre-

processing

Feature

Extraction

Recognition Character

image


1. Introduction 6

The character image is captured using an acquisition device such as flatbed

scanner. In pre-processing, the system carries out data cleaning tasks which may include

noise reduction, skew detection and correction, slant detection and correction,

normalization, binarization etc. This is usually followed by a segmentation process to

separate the lines, words and characters in the image.

Feature extraction is of vital importance to character recognition systems,

particularly handwriting recognition systems. It serves two main purposes: extracting the

most representative features that are used by classifiers, and reducing redundancies in

data. In online systems, input is composed of unit width line-segments with dynamic

information so that feature extraction is a relatively easy task. In offline systems, feature

extraction is affected by several factors:

• Different background of documents

• Non-uniform illumination of the scanner

• Noise introduced by electronics and writing tools

• Different qualities of paper and types of ink

In order to overcome these problems, many methods have been proposed for

offline handwriting recognition. In general, there are four basic approaches for pattern

recognition, namely, statistical, structural, artificial neural network and soft-computing.

In the statistical or decision-theoretic approach, the recognition is based on the

decision boundaries established in the feature space by statistical distributions of the

patterns. A decision is usually made by maximizing a posteriori probability, where the

recognition error of this approach is called Bayes error.

In structural (syntactic) approaches, each pattern is defined by using structural

descriptions or representations. The recognition is performed according to structural

similarities. This is based upon the fact that structural relationships between features are

also essential to recognize patterns.

Since the mid 1980s, neural networks have become popular in handwriting

recognition as they learn from examples, are robust, insensitive to noise and have

generalization ability. Among various architectures, the feed-forward neural networks

remain dominant because the well-known training methods, back-propagation of errors

and approximation function are available. Multi-layer perceptron (MLP) classifiers are

able to form complex hyper plane decision regions that can classify large number of

classes. In comparison with the three approaches introduced above, there is strong


1. Introduction 7

evidence that structural approaches and neural-network based approaches may offer

better solution to the problems in handwriting recognition. Statistical approaches are

more suitable to the problems with large random variations in data, such as speech

recognition. Words and characters are highly structured entities and many uncertainties

in handwriting are not random by nature or may be difficult to model by traditional

probabilistic techniques. For such uncertainties statistical approaches may fail. In order

to develop handwriting systems that are able to make use of structural information and

handle different types of uncertainty, we need to combine different paradigms.

Recently, soft computing approaches are used by researchers to develop hybrid

handwriting recognition systems. It is a new problem-solving paradigm that combines

emerging techniques and theories such as neural networks, fuzzy logic, genetic

algorithms and other evolutionary methods. Such handwriting recognition systems can

be classified into two categories: in the first category, they recognize patterns based on a

single classifier that is formed by combining different soft computing methods. They

generally are multi-stage classifiers. The recognition systems belonging to the second

category make decisions based on results of several classifiers to produce higher

recognition rates. The outputs from the multiple classifiers are combined using various

combining rules like: majority vote, sum, max, product and median.

1.5 Evolution of OCR

The origin of character recognition can be found in 1870 when Carey invented

the retina scanner, and image transmission system using a mosaic of photocells. Later in

1890, Nipkow invented the sequential scanner which was a major breakthrough both for

modern television and reading machines. Character recognition as an aid to the visually

handicapped was at first attempted by the Russian scientist Tyurin in1900.

The OCR technology took a major turn in the middle of 1950s with the

development of digital computer and improved scanning devices. For the first time OCR

was realized as a data processing approach, with particular applications to the business

world. From that perspective, David Shepard, founder of the Intelligent Machine

Research Co. can be considered as a pioneer of the development of commercial OCR

equipment. Currently, PC-based systems are commercially available to read printed

documents of single font with very high accuracy and documents of multiple fonts with


1. Introduction 8

reasonable accuracy. Most of the available systems work on European scripts which are

based on Roman alphabets. Research reports on oriental language scripts are few, except

for Korean, Chinese and Japanese scripts. Depending on versatility, robustness and

efficiency, the commercial OCR systems can be divided into four generations.

The first generation systems can be characterized by the constrained letter shapes

which the OCRs read. Such machines appeared in the beginning of 1960s. The first

widely commercialized OCR of this generation was the IBM 1418, which was designed

to read a special IBM font, 407. The recognition method was logical template matching

where the positional relationship was fully utilized.

The next generation is characterized by the recognition capabilities of a set of

regular machine printed characters as well as hand-printed characters. At the early stages,

the scope was restricted to numerals only. Such machines appeared in early1970s. In this

generation, the first and famous OCR system was IBM 1287, which was exhibited at the

1965 New York world fair. In terms of hardware configuration, the system was a hybrid

one, combining analog and digital technology. The first automatic letter-sorting machine

for postal code numbers of Toshiba was also developed during this period. The methods

were based on the structural analysis approach.

The third generation can be characterized by the OCR of poor print quality

characters, and hand-printed characters for a large category character set. Commercial

OCR systems with such capabilities appeared roughly during the decade 1975 to 1985.

The fourth generation can be characterized by the OCR of complex documents

intermixing with text, graphics, table and mathematical symbols, unconstrained

handwritten characters, color document, low quality noise documents like photocopy and

fax etc. some pieces of work on complex documents provided good results. Although

many pieces of work on unconstrained handwritten character are available in the

literature, the recognition accuracy hardly exceeds 85%.

Among other commercial products, postal address readers are available in the

market. In the United States, about 60% of the hand printed is sorted automatically.

Reading aid for the blind is also available. An integrated OCR with speech output system

for the blind has been marketed by Xerox-Kurxweil for English language. At present,

more sophisticated optical readers are available for Roman, Chinese, Japanese and

Arabic text. These readers can process a document which has been typewritten or

printed. They can recognize characters with different fonts and sizes as well as different

formats including intermixed text and graphics. Although lot of research is carried out


1. Introduction 9

for the OCR in these scripts, no OCR systems are found for the recognition of

handwritten Indian scripts. Extensive research is being carried out in these languages

recently for the recognition of handwritten characters and words.

In a multi-lingual country like India, which has many languages with their own

distinctive scripts and rich literary traditions, it is particularly important to develop

computer systems that allow users to interact with them in Indian languages. There are

14 Indic scripts and there is a huge untapped potential for Indian population to access

Information Technology through Indian languages. Handwriting being a natural interface

to computers, recognition of handwritten Indian documents offers a huge area for

research. In spite of widespread use of computers, paper documents will continue to

remain important for a long period of time and hence it is necessary to have computer

systems that can seamlessly integrate paper documents with other electronically created

ones.

1.6 Marathi script and its properties

Unconstrained handwritten character recognition for Indian scripts is an area of

extensive research over recent years. This pattern recognition task is quite challenging

due to the variability in the writing style, similarity in the character shapes, presence of

modifiers and various other features of Indian scripts. Research is being carried out

extensively in Bangla and Devanagari scripts, the two most popular scripts in India. In

India, there are twenty two Indian official (Indian constitution accepted) languages,

namely Assamese, Bangla, Gujarati, Hindi, Konkani, Kannada, Kashmiri, Malayalam,

Marathi, Nepali, Oriya, Punjabi, Bodo, Dogri, Maithili, Manipuri, Santhali, Sindhi,

Sanskrit, Tamil, Telugu and Urdu. Different scripts are used for writing these official

languages. Most Indian scripts originated from ancient Brahmi through various

transformations. Two or more of these languages may be written in one script. For

example, Devanagari is used to write Hindi, Marathi, Rajasthani, Sanskrit and Nepali

while Bangla script is used to write Assamese and Bangla (Bengali) languages.

In the proposed work, we aim at developing a comprehensive system for

recognition of unconstrained handwritten Marathi characters.

Marathi is a language spoken by the Maharashtrian people of western India. It is

the official language of the state of Maharashtra and is the 4th most spoken language in


1. Introduction 10

India. It is spoken by about 63 million people. It is derived from Devanagari script and it

consists of 16 vowels and 36 consonants. Figure 1.2 and Figure 1.3 present the vowels

and the consonants in Marathi script respectively. Vowels are combined with consonants

with the help of specific characteristic marks. These marks occur in line, at the top, or at

the bottom of a character in a word and are called as modifiers. An illustration of how

the vowels combine with the consonants is shown in Figure 1.4.

Marathi is written from left to right. It has no upper and lower case characters as

in English; however the alphabet itself contains more number of symbols than that of

English. While line segments (strokes) are the predominant features for English, most of

the characters in Devanagari script are formed by curves, holes, and also strokes. Marathi

has conjunct characters which are formed by joining two or more consonants. Every

character has a horizontal line at the top called as the header line. The header line joins

the characters in a word. This script has two-dimensional compositions of symbols: core

characters in the middle strip, optional modifiers above and/or below core characters.

Figure 1.2 Marathi vowels

Figure 1.3 Marathi consonants

Figure 1.4 Vowels combined with consonants


1. Introduction 11

Figure 1.5 shows a Marathi word partitioned into three character zones: A core

zone that contains most consonant, half consonant, vowel and conjunct forms (core

components), an upper zone containing ascenders or upper modifiers and a lower zone

containing descenders or lower modifiers. The core and upper zones are separated by the

header line.

1.7 Institutes in India doing research in Devanagari

Some of the leading institutes in India doing research in Devanagari OCR are:

• Indian Statistical Institute, Kolkata,

• International Institute of Information Technology, Hyderabad,

• Indian Institute of Science, Bangalore, and

• Indian Institute of Technology, New Delhi.

1.8 Challenges in Marathi handwriting recognition

The Marathi script has large number of characters and vowels. There are various

issues other than this which make the recognition of handwritten characters a challenging

task and affect the recognition rate to a considerable extent. Some examples of such

issues are:

• Variations in the writing style of the writers,

• Variations in the font size, pen width, pen ink,

• Shape changes due to pre-processing parameters,

• Variations in the character spacing, skew and slant,

• In case of compound characters the strategies for joining two or more consonants

are different. Characters may also get split during pre-processing,

Figure 1.5 An example of Marathi word


1. Introduction 12

• Some character pairs are very much similar to each other,

• The header line may or may not be drawn by the writer,

• Segmentation of modifiers,

• Segmentation of touching characters.

This demands an efficient system which takes care of these issues at all the stages

in the OCR system, from pre-processing to recognition.

1.9 Problem statement

As discussed earlier, there is a need of developing OCR for handwritten Indian

languages. A lot of research is required for handwritten Marathi script for development

of sophisticated systems. In the last few years, there has been a great interest in applying

artificial neural network (ANN) technology in various fields of conventional computing.

This is due to the fact that ANN provides parallelism, they learn from examples, has the

capacity of handling a classification problem comprising of large number of classes and

has generalization ability. This motivates the implementation of neural network for the

recognition of handwritten Marathi characters.

Figure 1.6 Characters used in the proposed system


1. Introduction 13

The proposed system aims at recognizing the unconstrained handwritten Marathi

characters using neural networks. The input to the system is an image containing

handwritten characters. The output of the system is the recognized characters displayed

as text in Kiran font. The Marathi characters used in the proposed OCR system are

shown in Figure 1.6. They include vowels, consonants, and compound characters along

with split characters.

1.10 Objectives

The following are the objectives of the proposed work:

• To develop a system for handwritten Marathi character recognition in an

authoritative manner outlining all the issues involved.

• To select pre-processing algorithms that retains the shape of the character.

• To work at feature extraction level and classification level and select combination

that gives best results.

• To analyze the performance using various neural network classifiers and their

combination for the robust recognition of the handwritten characters.

• To design hybrid systems that combine several classification and/or recognition

techniques along with neural network.

• To analyze the efficiency of these classifiers and find the optimum network

1.11 Scope

The scope of the proposed work is as follows:

• The proposed work focuses on the recognition of isolated handwritten Marathi

characters without any modifiers.

• A multilayer perceptron neural network is used for recognition along with other

types of recognition methods.

• Two evaluation parameters are to be used in the comparative analysis namely,

recognition rate and processing time.


1. Introduction 14

1.12 Organization of the thesis

The rest of the thesis is organized as follows. Chapter 2 presents the literature

review. Chapter 3 puts forth the problem identification. The handwritten Marathi

numeral recognition system is discussed in Chapter 4. Chapter 5 describes the

handwritten Marathi character recognition system. Chapter 6 discusses the optimization

techniques for performance improvement. The hybrid models for accuracy improvement

are presented in Chapter 7. Chapter 8 describes the models for handwritten Marathi

compound characters. Finally Chapter 9 puts forth the discussions and conclusions on the

proposed work.

1.13 Concluding remarks

A brief introduction of OCR technology and its evolution is introduced in this

chapter. Marathi script and its features are introduced as well. Further the challenges and

the need for handwritten Marathi character recognition is emphasized.

The stages in the OCR system are discussed in brief. The soft computing

techniques are adopted recently to improve the recognition rate of offline handwritten

character recognition. The problem statement, objectives and the scope of the proposed

work are also discussed. The detailed literature survey related to Devanagari and Marathi

character recognition is presented in the next chapter.

09_chapter_01.pdf

Documents

Transcript of 09_chapter_01.pdf