Arabic Handwritten Script Recognition Towards Generalization: A Survey


Authors:

Randa I. M. Elanwar, Assistant Researcher, Electronic Research Institute

Prof. Dr. Mohsen A. A. Rashwan, Professor of Digital Signal Processing, Electronics and Communications Dept., Cairo University

Prof. Dr. Samia A. A. Mashali, Head of Computers and Systems Dept., Electronic Research Institute

Presentation Contents

Introduction

Paper Objective

Arabic handwriting recognition problem

Main Challenges

Recent off-line Arabic handwriting recognition systems

Recent on-line Arabic handwriting recognition systems

Summary and Conclusion

Introduction

Handwriting recognition can be defined as the task of transforming text represented in the spatial form of graphical marks into its symbolic representation.

The main components of a recognizer are:

1. Data capture and acquisition

2. Preprocessing & segmentation

3. Defining patterns and model selection

4. Feature Extraction

5. Training

6. Classification

• First, the input device captures an image and converts it to a usable format.

• The data are then preprocessed to eliminate noise and simplify them without losing relevant information; they may also be segmented into smaller data units.

• The information in each data unit is sent to a feature extractor, which reduces it by measuring certain “features” or “properties”.

• Patterns (or classes) should be defined and models should be selected. These models are trained using the extracted features.

• The model for a pattern may be a single specific set of features.

• Recognizing (or classifying) a novel pattern means recovering the model that generated it, based on the extracted features.
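A minimal sketch of the simplest case described above, assuming each class model is a single feature vector (here, the mean of that class's training vectors) and that recognition recovers the model nearest to the novel pattern's features. The class names and feature values are made up for illustration; real recognizers use much richer models.

```python
import math

def train_class_means(samples):
    """samples: dict mapping class label -> list of feature vectors (tuples).
    Returns one model per class: the mean feature vector (an assumed model form)."""
    models = {}
    for label, vectors in samples.items():
        dim = len(vectors[0])
        models[label] = tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))
    return models

def classify(models, x):
    """Recover the model (class) whose mean is nearest to x in Euclidean distance."""
    return min(models, key=lambda label: math.dist(models[label], x))

# Hypothetical 2-D feature vectors for two classes.
training = {
    "class_I": [(1.0, 1.0), (1.5, 0.5), (0.5, 1.5)],
    "class_II": [(4.0, 4.0), (4.5, 3.5), (3.5, 4.5)],
}
models = train_class_means(training)
print(models)                        # {'class_I': (1.0, 1.0), 'class_II': (4.0, 4.0)}
print(classify(models, (1.2, 0.8)))  # class_I
```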

The feature extractor has reduced the data unit to a point, or feature vector X, in a 2D feature space (or observation space).

Classification rule: Classify the input as Class I if its feature vector falls below the decision boundary shown, and as Class II otherwise.
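The rule above can be written as a sign test against the boundary. This sketch assumes a straight-line boundary x2 = a·x1 + b with made-up coefficients a and b. The boundary in the original figure is not specified, and in general it may be curved.

```python
def classify_2d(x, a=-1.0, b=5.0):
    """Return 'Class I' if the point x = (x1, x2) lies below the line
    x2 = a*x1 + b, and 'Class II' otherwise (points on the line go to Class II).
    The coefficients a and b are illustrative assumptions."""
    x1, x2 = x
    return "Class I" if x2 < a * x1 + b else "Class II"

print(classify_2d((1.0, 1.0)))  # Class I  (1 < -1 + 5 = 4)
print(classify_2d((3.0, 4.0)))  # Class II (4 >= -3 + 5 = 2)
```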

The problem is that a very complex recognizer is unlikely to generalize well, since it tends to be “tuned” to the particular training samples.

The question is how to optimize this tradeoff: generalization versus classifier complexity?

Usually an action is taken based on the classification decision, and each action is assigned a certain cost.

We design the decision boundary (classification rule) so that, on average, the risk is as small as possible.

The risk (R) is the expected value of the cost.

Minimizing R can lead to complex decision boundaries.

The question is how to optimize this tradeoff: generalization versus minimum risk?
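The risk-minimizing rule can be made concrete with the standard Bayes decision rule, which is a textbook formulation rather than a method from the surveyed papers. Suppose we have posterior probabilities P(ω_j | x) and a cost λ(α_i | ω_j) for taking action α_i when the true class is ω_j. The conditional risk of α_i is R(α_i | x) = Σ_j λ(α_i | ω_j) P(ω_j | x), and the rule picks the action with the smallest conditional risk. The cost matrix and posteriors below are invented for illustration.

```python
def conditional_risks(cost, posteriors):
    """cost[i][j]: cost of taking action i when the true class is j.
    posteriors[j]: P(class j | x). Returns R(action i | x) for every action i."""
    return [sum(c_ij * p_j for c_ij, p_j in zip(row, posteriors)) for row in cost]

def bayes_action(cost, posteriors):
    """Index of the action with minimum conditional risk."""
    risks = conditional_risks(cost, posteriors)
    return min(range(len(risks)), key=risks.__getitem__)

# Two classes, two actions (decide class 0 / decide class 1).
# Assumed costs: calling a class-1 pattern "class 0" costs 5; the reverse costs 1.
cost = [[0.0, 5.0],
        [1.0, 0.0]]
posteriors = [0.7, 0.3]
print(conditional_risks(cost, posteriors))  # [1.5, 0.7]
print(bayes_action(cost, posteriors))       # 1
```

Even though class 0 is more probable here, the asymmetric costs make "decide class 1" the lower-risk action. With equal (zero-one) costs, the same rule reduces to choosing the most probable class.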

To achieve a general-purpose (unbiased) recognizer, we need a sufficient number of training samples (N) for each class in the data set.

A theoretical estimate states that N ≅ 100 / P, where P is the probability of misclassification.

For example, for P ≈ 0.01, N ≈ 10,000, and for P ≈ 0.03, N ≈ 3,300.

Such a large data set (if available) requires large storage and long processing time (time complexity).

The question is how to optimize this tradeoff: generalization versus complexity?

Paper Objective

Our concern in this paper is to:

1. provide a comprehensive review of recent off-line and on-line trends in Arabic cursive handwriting recognition (publications from the last 10 years);

2. clarify the challenges that stand against obtaining a reliable, accurate, simple, general-purpose recognizer based on these trends.

Arabic Handwriting Recognition Problem

Arabic Script Recognition Systems are categorized as:

1. On-line or Off-line

2. Writer Dependent or Writer Independent

3. Open-vocabulary or closed-vocabulary

Types of Recognition:

When the input device is a digitizer tablet that transmits the signal in real time, or that includes timing information together with the pen position, this is usually referred to as on-line or dynamic recognition.

When the input device is a still camera or a scanner, which captures the position of digital ink on the page but not the order in which it was laid down, this is defined as off-line or image-based OCR.

Special Characteristics of Arabic Script:

Always written from right to left

An Arabic word consists of one or more portions (pieces of Arabic words, PAWs), each with one or more characters

Many characters differ only in the position and number of attached dots

Every character has more than one shape, depending on its position in the word

Characters overlap

Existence of ligatures

Because of these special characteristics, Arabic handwriting recognition systems still need more research before they can be established commercially.

Main Challenges

Feature Extraction

Noise

Model Selection and Complexity

Segmentation

Context

Evidence Pooling

Costs and Risks

Computational Complexity

Learning and Adaptation

Feature Extraction:

A good feature set should help distinguish a class from other classes, be invariant to irrelevant differences, and contain no redundant information.

… How do we know which features are most promising?

… Is there a way to automatically learn which features are best for a classifier?

The feature set should be limited in size, both for computational ease and to limit the amount of training data required.

… How many features should be used?

… How do we train or use a classifier when some features are missing?

Noise:

Random error in a pixel value (deformation), which may be signal-independent, signal-dependent, or salt-and-pepper noise.

Noise cannot always be eliminated completely, but it can be reduced by smoothing.

… Is the deformation in a given signal noise, or natural variation in the true models?

… How can we use this information to improve our classifier?
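Salt-and-pepper noise is commonly smoothed with a median filter, which replaces each pixel by the median of its neighbourhood. The sketch below is a plain-Python 3×3 median filter on a small binary image (1 = ink, 0 = background). The image and the border handling (only the neighbours inside the image are used) are illustrative choices, not taken from the surveyed systems.

```python
from statistics import median_low

def median_filter_3x3(img):
    """Replace each pixel of a 2-D list image by the median of its 3x3
    neighbourhood. At the border only pixels inside the image are used;
    median_low keeps the output binary for even-sized border windows."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = median_low(window)
    return out

# Made-up image: an isolated "salt" dot at the top-left corner and a
# "pepper" hole inside a thick horizontal stroke.
noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]
for row in median_filter_3x3(noisy):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 0, 0, 0, 0, 0]
# [1, 1, 1, 1, 1, 1]
# [1, 1, 1, 1, 1, 1]
# [1, 1, 1, 1, 1, 1]
```

The same filter would also erase a stroke only one pixel thick, since such a stroke fills just 3 of the 9 window cells. This is a small instance of the question above: telling noise apart from genuine variation in the writing.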

Model Selection and Complexity:

Determining the complexity of the model: not so simple that it cannot explain the differences between the categories, yet not so complex that it gives poor classification on novel patterns.

… How do we know when to reject a class of models and try another one?

… Are there principled methods for finding the best complexity for a classifier?

… Is it a matter of random trial and error, not even guided by expectations of performance?

Segmentation:

Segmentation subdivides an image into its constituent regions or objects. Segmentation should stop when the objects of interest in an application have been isolated.

… How do we know where one character “ends” and the next one “begins”?

… Should we segment the images before they have been categorized, or categorize them before they have been segmented?
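For off-line Arabic, a common first segmentation step is to split a binary word image into connected components. These roughly correspond to the word's PAWs and their dots. The sketch below uses breadth-first search with 8-connectivity on a tiny invented image. It does not answer the harder question above of where one character ends inside a PAW.

```python
from collections import deque

def connected_components(img):
    """Label 8-connected foreground (1) pixels of a binary 2-D list image.
    Returns a list of components, each a sorted list of (row, col) pixels,
    in the order they are first met when scanning row by row."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):          # visit all 8 neighbours
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(sorted(pixels))
    return components

# Made-up image: a one-pixel "dot" above two separate ink bodies.
word = [
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1, 0],
]
parts = connected_components(word)
print(len(parts))  # 3
```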

Context:

The accuracy of automatic handwriting recognition systems based purely on visual information seems to have a ceiling.

Incorporating semantic and syntactic knowledge sources into the automatic recognition of text can offer potential improvements in performance.

… How, precisely, should we incorporate such information?
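Lexicon-driven post-processing is one simple way to add linguistic knowledge. The raw character string produced by the recognizer is replaced by the closest valid word in a lexicon, measured by edit (Levenshtein) distance. This is only one illustrative option, not the approach of a specific surveyed system. The lexicon and the misread output are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions that turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def correct_with_lexicon(raw, lexicon):
    """Return the lexicon word closest to the raw recognizer output
    (ties go to the word listed first)."""
    return min(lexicon, key=lambda word: edit_distance(raw, word))

# Hypothetical lexicon; the recognizer misread the final د as ر.
LEXICON = ["محمد", "محمود", "معلم"]
print(correct_with_lexicon("محمر", LEXICON))  # محمد
```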

Evidence Pooling:

For high classification performance, or for increased class coverage, different classification tools are developed either in parallel or sequentially.

When there are several component classifiers and they agree on a particular pattern, there is no difficulty. But suppose they disagree!

… How should a “super” classifier pool the evidence from the component recognizers to achieve the best decision?

… How would the “super” classifier know when to base a decision on a minority opinion?
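Two common pooling rules are majority voting over the component classifiers' top choices and the sum rule over their per-class scores. The sketch shows both, using invented outputs from three hypothetical classifiers. On these outputs the two rules disagree, and neither rule answers when a minority opinion should win.

```python
from collections import Counter

def majority_vote(labels):
    """Most frequent label among the component classifiers' top choices
    (ties are broken by the order in which labels first appear)."""
    return Counter(labels).most_common(1)[0][0]

def sum_rule(score_dicts):
    """Add the per-class scores (e.g. posterior estimates) of all classifiers
    and return the class with the largest total."""
    totals = Counter()
    for scores in score_dicts:
        totals.update(scores)   # adds each class's score to its running total
    return max(totals, key=totals.get)

# Made-up scores from three classifiers for the letters ب and ت.
scores = [
    {"ب": 0.60, "ت": 0.40},
    {"ب": 0.10, "ت": 0.90},
    {"ب": 0.55, "ت": 0.45},
]
top_choices = [max(s, key=s.get) for s in scores]
print(majority_vote(top_choices))  # ب  (two of three classifiers)
print(sum_rule(scores))            # ت  (one confident classifier outweighs two hesitant ones)
```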

Costs and Risks:

A classifier is generally used to recommend actions, each action having an associated cost or risk.

We often design our classifier to recommend actions that minimize some total expected cost or risk.

… How do we incorporate knowledge about such risks, and how will they affect the classification decision?

… Is there a way to estimate the total risk and thus tell whether our classifier is acceptable even before we field it?

Computational Complexity:

Even if we could achieve error-free recognition, the time and storage requirements might be prohibitive.

Some pattern recognition problems can be solved, but only with algorithms that are highly impractical.

… What is the tradeoff between computational ease and performance?

… How can we optimize an excellent recognizer within engineering constraints?

Learning and Adaptation:

Any method that incorporates information from training samples in the design of a classifier employs learning.

If the models are extremely complicated, the classifier will have complex decision boundaries.

To overcome this, more training samples are needed to obtain a better estimate of the true underlying features.

With limited training samples, we should incorporate knowledge of the problem domain. The production representation (a model of how the patterns are generated) is the “best” representation for classification.

… How many training samples are needed for good generalization?

… How can we ensure that the learning algorithm favors “simple” solutions rather than complicated ones?

Recent off-line Arabic handwriting recognition systems

Example: Pechwitz et al. [17] proposed a recognition system based on a semi-continuous 1-D HMM, using the IFN/ENIT database of handwritten Tunisian town/village names.

Preprocessing:

1. The image contour is extracted and noise-reduction filtering is performed.

2. Skeletonization and normalization are performed.

3. Baseline estimation and word length normalization are performed.

Feature Extraction:

1. A rectangular window is shifted from right to left across the normalized gray-level script image.

2. A Karhunen-Loève transformation (KLT) is performed on the gray values of each frame to reduce the number of features.
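The two feature-extraction steps can be sketched with NumPy. Frames are cut right to left from a normalized gray-level image. A KLT basis (equivalently, PCA) is then estimated from the frame covariance matrix, and each frame is projected onto its leading eigenvectors. The window width, shift, number of retained components and the random stand-in image are all assumptions, not Pechwitz et al.'s parameters. In a real system the basis would be estimated from training frames, not from a single word.

```python
import numpy as np

def sliding_window_frames(image, width=8, shift=4):
    """Cut a gray-level word image (H x W array) into frames `width` columns
    wide, moving right to left by `shift` columns, and flatten each frame.
    width and shift are illustrative values."""
    _, w = image.shape
    frames = [image[:, right - width:right].ravel()
              for right in range(w, width - 1, -shift)]  # right edge of each window
    return np.array(frames, dtype=float)

def fit_klt(frames, n_components):
    """Karhunen-Loeve transform: keep the eigenvectors of the frame
    covariance matrix that have the largest eigenvalues."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames, rowvar=False)       # one column per pixel feature
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, top]

def apply_klt(frames, mean, basis):
    """Project each mean-centred frame onto the retained eigenvectors."""
    return (frames - mean) @ basis

# Random stand-in for a normalized 16 x 40 word image.
rng = np.random.default_rng(0)
image = rng.random((16, 40))
frames = sliding_window_frames(image)          # 9 frames of 16*8 = 128 values
mean, basis = fit_klt(frames, n_components=4)
features = apply_klt(frames, mean, basis)      # 9 feature vectors of length 4
print(frames.shape, features.shape)            # (9, 128) (9, 4)
```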

Modeling:

1. An HMM is generated for each character shape (in all possible positions), giving up to 160 different HMMs.

2. Semi-continuous HMMs with 7 states per character are used.

Database:

1. The database is split into four sets: A, B, C, and D.

2. The four sets contain 26,459 images of segmented Tunisian town names (115,585 PAWs) handwritten by 411 unique writers.

3. There are 946 unique word labels and 762 unique PAW labels.

4. Ground-truth information is available for each image.

Lexicon:

The character-shape HMMs are combined into valid word models using a tree-structured lexicon of all 946 different Tunisian town/village names.
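A tree-structured lexicon (a trie) shares common prefixes between words, so a lexicon-constrained decoder only expands continuations that lead to a valid entry. The minimal trie below works over Unicode characters. In the actual system the nodes would stand for character-shape HMMs. The three place names are illustrative entries, not the IFN/ENIT label list.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.word = None     # set when a complete lexicon entry ends here

def build_lexicon_tree(words):
    """Insert every word; words with a common prefix share those nodes."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root

def count_nodes(node):
    """Number of nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())

def allowed_next(root, prefix):
    """Characters that may follow `prefix` in some lexicon word, i.e. the
    models a lexicon-constrained decoder would expand next."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return set()
    return set(node.children)

tree = build_lexicon_tree(["تونس", "توزر", "سوسة"])
print(count_nodes(tree) - 1)          # 10 character nodes instead of 12 in a flat list
print(sorted(allowed_next(tree, "تو")))
```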

Recognition:

The standard Viterbi algorithm is used together with the lexicon.
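A minimal Viterbi decoder for one left-to-right HMM is sketched below. For brevity it assumes discrete observation symbols and a 3-state model with made-up probabilities. Pechwitz et al. used semi-continuous densities, 7 states per character, and decoding over the whole lexicon network. The code runs in log space so long observation sequences do not underflow.

```python
import math

def _log(p):
    """Natural log that maps probability 0 to -inf (a forbidden transition)."""
    return math.log(p) if p > 0 else -math.inf

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence of a discrete HMM for the symbol list obs.
    start_p[s], trans_p[s][t] and emit_p[s][o] are probabilities.
    Returns (state path, log probability of that path)."""
    n = len(start_p)
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        best_prev, new_score = [], []
        for t in range(n):
            s_best = max(range(n), key=lambda s: score[s] + _log(trans_p[s][t]))
            best_prev.append(s_best)
            new_score.append(score[s_best] + _log(trans_p[s_best][t]) + _log(emit_p[t][o]))
        backpointers.append(best_prev)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    best_log_prob = score[state]
    path = [state]
    for best_prev in reversed(backpointers):
        state = best_prev[state]
        path.append(state)
    return path[::-1], best_log_prob

# Toy 3-state left-to-right model over 2 observation symbols (made-up numbers).
start = [1.0, 0.0, 0.0]
trans = [[0.6, 0.4, 0.0],
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]]
emit = [[0.9, 0.1],
        [0.2, 0.8],
        [0.9, 0.1]]
path, log_prob = viterbi([0, 0, 1, 1, 0], start, trans, emit)
print(path)  # [0, 0, 1, 1, 2]
```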

The authors applied the recognition algorithm to the database twice: once using the baseline from the ground truth (GT), and once using the baseline they estimated.

Results:

Recognition rates of 82–89% are obtained using the estimated baseline.

Recognition rates of 89–95% are obtained using the GT baseline.

Challenges:

1. Working on an available database sidesteps the challenge of limited training samples.

2. It is not easy to generalize this classifier to open-vocabulary applications, because it works on a limited lexicon of words (a segmentation-free recognizer); otherwise, context would be a must.

3. Model selection and complexity: the same HMM structure is generated for all characters and ligatures. We think it would be much better to vary the model structure according to each character's requirements (for example, ض shouldn't have the same model as ة).

4. Feature Extraction: The idea of normalizing the word width to use a sliding-window feature extractor is good, except for its heavy dependency on baseline estimation, which is itself a major source of error.

Recent on-line Arabic handwriting recognition systems

Example: the research of Biadsy et al. [24]

Preprocessing:

1. A geometric processing phase minimizes handwriting variations.

2. A low-pass filter is used to reduce noise and remove imperfections caused by acquisition devices.

3. Writing speed is normalized by re-sampling the resulting point sequences.
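The low-pass filtering and speed-normalization steps can be sketched in plain Python. This assumes a moving-average filter and re-sampling at a fixed arc-length spacing. The text does not say which filter or spacing Biadsy et al. used, so both functions and their parameters are illustrative choices.

```python
import math

def smooth(points, k=1):
    """Moving-average low-pass filter (an assumed choice): each point becomes
    the mean of itself and up to k neighbours on each side (fewer at the ends)."""
    out = []
    for i in range(len(points)):
        window = points[max(0, i - k): i + k + 1]
        out.append((sum(p[0] for p in window) / len(window),
                    sum(p[1] for p in window) / len(window)))
    return out

def resample(points, spacing):
    """Re-sample a pen trajectory so consecutive points lie `spacing` apart
    along the curve, removing the dependence on writing speed."""
    result = [points[0]]
    carry = 0.0   # arc length travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = spacing - carry   # distance into this segment of the next sample
        while pos <= seg:
            t = pos / seg
            result.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += spacing
        carry = seg - (pos - spacing)
    return result

# A stroke written with uneven point density: one long segment, then another.
stroke = [(0, 0), (3, 0), (3, 4)]
print(len(resample(stroke, 1.0)))  # 8 points, 1 unit apart along the stroke
```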

Feature Extraction:

Mainly angles with the x-axis, and loop presence.
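Angle features can be sketched as the direction of each pen segment relative to the x-axis. Because Biadsy et al. use discrete HMMs, the sketch also quantizes each angle into a small set of direction codes. The 8-direction codebook and the y-up coordinate convention are assumptions; a tablet with y pointing down would mirror the codes. Loop detection is not covered here.

```python
import math

def direction_codes(points, n_bins=8):
    """Angle of each pen segment with the x-axis, quantised into n_bins
    direction codes (0 = along +x, counting counter-clockwise, y axis up).
    The codebook size n_bins is an illustrative assumption."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)   # in [0, 2*pi)
        codes.append(round(angle / (2 * math.pi / n_bins)) % n_bins)
    return codes

# A small square traced counter-clockwise: right, up, left, down.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(direction_codes(square))  # [0, 2, 4, 6]
```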

Modeling:

1. The recognition framework uses discrete left-to-right HMMs to represent each Arabic letter shape (isolated, initial, medial, and final).

2. The number of states in each letter-shape model is based on the geometric complexity of the letter shape; it varies from 5 to 11 states.

For example, 11 states are assigned to isolated ش, and 5 states to isolated أ.
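The sketch below shows how a per-shape state count could translate into a left-to-right topology, where each state may only loop or advance to the next state. Only the counts for isolated ش (11) and isolated أ (5) come from the text. The no-skip topology, the 0.5 self-loop probability and the dictionary keys are assumptions.

```python
def left_to_right_transitions(n_states, p_stay=0.5):
    """Transition matrix of a left-to-right HMM without skips: state i either
    stays in i (probability p_stay, an assumed value) or moves to i + 1;
    the last state only loops."""
    a = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        a[i][i] = p_stay
        a[i][i + 1] = 1.0 - p_stay
    a[-1][-1] = 1.0
    return a

# State counts quoted in the text; other shapes would get between 5 and 11
# states according to their geometric complexity (hypothetical keys).
STATES_PER_SHAPE = {"ش_isolated": 11, "أ_isolated": 5}
models = {shape: left_to_right_transitions(n) for shape, n in STATES_PER_SHAPE.items()}
print({shape: len(a) for shape, a in models.items()})
```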

Lexicon:

1. The Arabic dictionary D is subdivided into a set of sub-dictionaries {D1, D2, …, Dn} based on the number of word parts in each word.

2. Letter-shape models are embedded in a network that represents a word-part dictionary. The segmentation of word parts into letter shapes and their recognition are performed simultaneously in an integrated process.

D = {وسام، هل، معلم، محمود، محمد، فادى، رواية، جامعة، ثقافة، التحدى، انسان}

Sub-dictionaries of D:

D1 = {هل، معلم، محمد}

D2 = {محمود، جامعة، ثقافة}

D3 = {وسام، فادى، التحدى، انسان}

D4 = {رواية}

Word-Part Dictionary for D3:

WPD3,1 = {و، فا، ا}

WPD3,2 = {سا، د، لتحد، نسا}

WPD3,3 = {م، ى، ن}
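The sub-dictionaries and word-part dictionaries in this example can be reproduced by splitting each word after every letter that does not connect to the following letter. A minimal sketch, assuming the set of non-connecting letters below. It covers the letters in this example but is not a complete list (for instance, standalone hamza ء is omitted).

```python
from collections import defaultdict

# Letters that do not join the following letter; a word part (PAW) ends
# after each of them. Assumed set, covering the common cases only.
NON_CONNECTING = set("اأإآدذرزوؤ")

def word_parts(word):
    """Split an Arabic word into its word parts (PAWs)."""
    parts, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts

def sub_dictionaries(dictionary):
    """Group words by their number of word parts: {n: set of words} (D1, D2, ...)."""
    groups = defaultdict(set)
    for word in dictionary:
        groups[len(word_parts(word))].add(word)
    return dict(groups)

def word_part_dictionaries(words):
    """For words with the same number of parts, collect the k-th parts:
    {k: set of parts} (WPDn,1, WPDn,2, ...)."""
    wpd = defaultdict(set)
    for word in words:
        for k, part in enumerate(word_parts(word), start=1):
            wpd[k].add(part)
    return dict(wpd)

D = ["وسام", "هل", "معلم", "محمود", "محمد", "فادى", "رواية", "جامعة", "ثقافة", "التحدى", "انسان"]
subs = sub_dictionaries(D)
wpd3 = word_part_dictionaries(subs[3])
print({n: len(words) for n, words in sorted(subs.items())})  # {1: 3, 2: 3, 3: 4, 4: 1}
```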

Database:

1. Four trainers are asked to write 800 selected words each.

2. For testing, 10 testers (the 4 trainers plus 6 new volunteers) are asked to write 280 words that are not in the training data (2,358 words in total).

3. Five dictionary sizes (5K, 10K, 20K, 30K, and 40K words), selected from different Arabic websites, are used. The 280 test words are present in all dictionary sizes.

Recognition:

Writer-dependent (WD) and writer-independent (WI) experiments are performed, and average word recognition rates of 88–96% are obtained. Performance degrades as ambiguity (dictionary size) increases.

Challenges:

1. Feature Extraction: The features they use are not sufficient for satisfactory classification of general, unconstrained handwriting, so the system must work with a limited vocabulary: word parts must be present in the dictionary, or they will not be recognized.

2. The database they use looks unnatural. Volunteers are asked to follow a restricted writing methodology, which affects their individual writing style. Besides, the system handles only a limited variety of handwriting, because only a few volunteers wrote the database.

Summary and Conclusion

Recognizers for foreign (non-Arabic) scripts have been on the market as commercial products for years, while Arabic recognizers still need more time.

In the case of Arabic handwritten words, many researchers use a specific, more or less small data set of their own, so it is impossible to compare their results, although such comparisons would be important for improving existing methods.

The complexity of the problem is greatly increased by noise and by the infinite variability of handwriting.

Cursive script requires the segmentation of words into characters or parts of characters, i.e., graphemes, and then the detection of individual features.

Generally, the holistic approach can be used if the vocabulary is small (such as the recognition of the legal amount on cheques).

The character-based approach is the preferred method for recognition applications that are unconstrained or involve large vocabularies, to ensure good generalization together with reasonable complexity.

Thank You
