Online Arabic Handwriting Recognition Using Hidden Markov Models
by
Fadi Biadsy
Jun 2005
Thesis Submitted as Part of the Requirements for the
Master of Science Degree in Computer Science at Ben-Gurion University of the Negev
Abstract
This thesis describes an online writer-independent handwriting recognition system for
the Arabic script. Recognition of Arabic script is a difficult problem since it is naturally
both cursive and unconstrained. The analysis of Arabic script is further complicated in
comparison with other scripts, since most Arabic letters include dots/delayed strokes that
are placed above or below letters. The number and placement of these dots/delayed
strokes are crucial parameters used to distinguish words – any minor change may
completely alter one word into an entirely different one. This work introduces a Hidden
Markov Model (HMM) based system to provide solutions for most of the difficulties
inherent in recognizing Arabic script including: the natural connectivity of writing, the
changing letter-shape depending on position in the word, and the dots/delayed stroke
problem. This work consists of several phases. Initially, a preprocessing phase reduces hardware imperfections and writing variation. A feature extraction phase then converts the preprocessed data into observation sequences, which are provided to an HMM structure for recognition or training. To the best of the author's knowledge, this is the first HMM-based work on recognizing online Arabic handwriting. We report successful experimental results for both writer-dependent and writer-independent recognition, using data collected from four trainers and tested by ten users.
Keywords:
Online Handwriting Recognition, Arabic Script, Hidden Markov Model, Writer
Independent.
Acknowledgments
I am deeply indebted to Dr. Jihad El-Sana, my advisor, for his wonderful support,
guidance, encouragement and inspiration which made this study possible, enjoyable, and
highly successful. In addition to being an excellent professor, he is everything one could want in an advisor. I am exceedingly thankful to Dr. Nizar Habash for his
constant support, advice, and the time given while editing and revising. Without their
support, these pages would not have been written, and my thoughts would not have found a way to advance beyond inception. I would also like to sincerely thank Mr. Meni Adler for
assisting me with his expertise in statistical models. I would like to give special thanks to
my professor Dr. Michael Elhadad who directed me to find the right references and
material from time to time, in addition to his useful discussion and comments. I would
like to thank my best friends: Azzam Marai, Gabriel Zaccak, Guy Wiener, Nasim Biadsy
and Wisam Dakka for the support and encouragement they always give, in addition to
their marvelous excitement when a new step is implemented in my system. Their
constructive discussion made this work interesting and enjoyable. I am also thankful to
my close friends Jennifer Miller and Vinnee Tong for reviewing my thesis and providing
valuable feedback. Finally, I would like to thank my family for the great support and the
warm care they always provide.
TABLE OF CONTENTS

Chapter 1 – Introduction
  1.1. Offline versus Online Recognition
  1.2. Writer-Dependent versus Writer-Independent
  1.3. Lexicon-Dependent versus Lexicon-Independent
  1.4. Arabic Script Background
    1.4.1. History of Arabic Language and Script
    1.4.2. Characteristics of Arabic Script
    1.4.3. Dots and Additional Strokes
  1.5. Recognition Difficulties and Ambiguity
  1.6. Dot Problem in Words
  1.7. Absence of Diacritics
Chapter 2 – Previous Work
  2.1. Data Acquisition
  2.2. Segmentation
  2.3. Modeling Methods
    2.3.1. Template Matching Models
    2.3.2. Statistical Models
    2.3.3. Neural Networks Models
  2.4. Postprocessing
  2.5. Related Work
  2.6. Previous Handwriting Recognition System Summarization
  2.7. Online Handwriting Recognition Work for Arabic Script
Chapter 3 – Background – Hidden Markov Model
  3.1. Discrete Markov Process
  3.2. Extension to Hidden Markov Models
  3.3. The Three Fundamental Problems for HMMs
    3.3.1. The Evaluation Problem
    3.3.2. The Decoding Problem – Viterbi Algorithm
    3.3.3. The Training Algorithm – Baum-Welch Algorithm
Chapter 4 – Preprocessing
  4.1. Step 1: Noise Reduction
  4.2. Step 2: Data Simplification
  4.3. Step 3: Normalization
    4.3.1. Size and Orientation Normalization
    4.3.2. Speed Normalization
Chapter 5 – Feature Extraction
  5.1. Local Feature
  5.2. Semi-Local Features
  5.3. Global Features
    5.3.1. Loop Determination
  5.4. Feature Vector Construction
  5.5. Feature Vector to Observation Code
  5.6. Delayed Strokes
    5.6.1. Delayed-Stroke Projection
Chapter 6 – The Recognition Framework
  6.1. Framework Models
    6.1.1. Letter Models
    6.1.2. Word-Part Models
    6.1.3. Word Models
  6.2. Word and Word-Part Dictionaries
  6.3. Arabic-Word Recognizer
    6.3.1. Word Recognizer - Algorithm I
    6.3.2. Word Recognizer - Algorithm II
  6.4. Optimized Grammar Network
    6.4.1. Optimized Word-Part Network
    6.4.2. Word Network
    6.4.3. Word-Dictionary Database Architecture
  6.5. Optimized Recognizer
    6.5.1. Word Recognizer - Algorithm III
  6.6. Support for Writing Style
  6.7. Model Training
    6.7.1. Training Problems
    6.7.2. The Training Process
Chapter 7 – Implementation
Chapter 8 – Results
Chapter 9 – Future Work
Chapter 10 – Conclusion
LIST OF FIGURES
Figure 1-1: Development of scripts, courtesy: Mamoun Sakkal [3]
Figure 1-2: Example of Arabic calligraphy. Two words meaning 'love peace', courtesy: [3]
Figure 1-3: Word (a) is a result of moving the dot to the left from word (b)
Figure 1-4: Word (a) is a result of eliminating the dot above the first letter from word (b)
Figure 1-5: Delayed strokes in Arabic script
Figure 1-6: Illustration of words with the same meaning written in different styles
Figure 1-7: Illustration of an ambiguous Arabic word
Figure 1-8: Unclear Arabic sentence because of the last word
Figure 1-9: The graph of the recognized possible sentences of the sentence in Figure 1-8
Figure 2-1: Digital tablet
Figure 2-2: Optical pen
Figure 2-3: CrossPad
Figure 2-4: PDA
Figure 2-5: Tablet PC
Figure 2-6: Handwriting styles in English, ordered from easy to difficult in terms of recognition [10]
Figure 2-7: The block diagram of the recognition system in [18]
Figure 2-8: 7-state HMM used to model each character, courtesy: [6]
Figure 2-9: Illustration of contextual effects on a cursively written "i". The left "i" is written after "m", the right "i" is written after "v", courtesy: [12]
Figure 2-10: Connecting strokes (in dashed lines), courtesy: [6]
Figure 2-11: Illustration of the three high-level features, courtesy: [19]
Figure 2-12: A grammar network implementing a word dictionary, courtesy: [19]
Figure 3-1: Illustration of a Markov chain with N = 3 states
Figure 3-2: The weather Markov chain
Figure 3-3: Illustration of a hidden Markov model with three states (N = 3) and two observable symbols (M = 2)
Figure 4-1: Illustration of applying a low-pass filter on a word that was written in a moving train
Figure 4-2: Polyline simplification
Figure 4-3: Word boundary lines
Figure 4-4: Illustration of a word written in the top-down writing style. The left word is the printed style of the right: (bHajm) 'in size'. In the right word, each letter is written above the letter that follows it
Figure 4-5: Illustration of preprocessing steps applied on the handwritten word (jAmEp) 'university'
Figure 4-6: Illustration of preprocessing steps applied on the handwritten word (vqAfp) 'education'
Figure 5-1: Illustration of the αi-angle (the angle between the vector (pi-1pi) and the X-axis)
Figure 5-2: Illustration of running Douglas & Peucker's algorithm. Figure (a) is a handwritten word (mElm) 'teacher'; red points in Figure (b) are the skeleton points returned by Douglas & Peucker's algorithm applied to the point sequence in Figure (a)
Figure 5-3: Illustration of the skeleton points and vectors; the red points are the skeleton points, the red segments show the skeleton vectors
Figure 5-4: Illustration of the β-angle (the angle between a skeleton vector and the X-axis)
Figure 5-5: Illustration of the three high-level features, courtesy: [19]
Figure 5-6: Illustration of loops (in red) in an Arabic word
Figure 5-7: Illustration of an accidental loop in a letter that should not include a loop
Figure 5-8: Illustration of how fi is extracted from each point pi
Figure 5-9: Figure (a) is a division of the angle space into 16 directions; Figure (b) is a division into 8 directions
Figure 5-10: Illustration of how observation i is computed
Figure 5-11: Figure (a): five delayed strokes for word-part body 1. Figure (b): two delayed strokes for word-part body 3. Figure (c): three delayed strokes for word-part body 1. Figure (d): one delayed stroke for word-part body 1
Figure 5-12: Illustration of the delayed-stroke projection; black points are the letter body, blue points are the delayed stroke, red points are virtual points
Figure 5-13: Illustration of converting a point sequence with delayed strokes to an observation sequence
Figure 5-14: Delayed-stroke projection in the handwritten word (AlAst$rAq) 'orientalism'
Figure 5-15: Delayed-stroke projection in the handwritten word (AlAnTbAE) 'the impression'
Figure 6-1: Left-to-right HMM
Figure 6-2: Word-part model consisting of n letter models
Figure 6-3: Illustration of the word model for the word (jAmEap) 'university': a word-part model for its first word part (two letters) and a word-part model for its second word part (three letters)
Figure 6-4: Optimized grammar network implementing a word-part dictionary of k word parts, grouping all shared suffixes and replacing each node by its corresponding letter model
Figure 6-5: Illustration of running wordPartDictionaryToWPN on a four-entry word-part dictionary
Figure 6-6: Illustration of the word network of the sub-dictionary D3 described in Example 6-2
Figure 6-7: The word-dictionary architecture
Figure 6-8: Illustration of four letter classes of the medial letter shape (h)
Figure 6-9: Multiple-letter-class model containing n letter-class models
Figure 6-10: The word-part network with each node replaced by a multiple-letter-class model
Figure 6-11: A screen shot of the trainer system; the green vertical lines are the split lines. The dashed lines are automatically given by the system to avoid word-part overlapping
Figure 6-12: Multiple-letter-class model chain of m word parts
Figure 6-13: Screen shot of our automatic letter-splitting system
Figure 6-14: Illustration of a letter-class model chain
Figure 7-1: Screen shot 1 of the handwriting recognition system
Figure 7-2: Screen shot 2 of the handwriting recognition system
Figure 7-3: The global architecture of our system
Figure 8-1: The average of the word recognition results of all users in each dictionary
Figure 8-2: Illustration of correctly recognized words written by different users, using a 40K-word dictionary
Figure 8-3: Illustration of the same words written by different users; all of these words were correctly recognized by our system with a 40K word-dictionary size
LIST OF TABLES
Table 1-1: Offline versus online features
Table 1-2: The basic Arabic alphabet
Table 1-3: Special Arabic letters
Table 1-4: Buckwalter transliterations for Arabic letters and diacritics [5]
Table 5-1: Category I – 31 letter shapes that cannot be written without loops
Table 5-2: Category II – 17 letter shapes for which it is unclear whether they are written with loops
Table 8-1: The four users who trained our system
Table 8-2: The users who tested the system (different from those who did the training)
Table 8-3: The writer-dependent word results with 5K, 10K, 20K, 30K and 40K word-dictionary sizes
Table 8-4: The writer-independent word results with 5K, 10K, 20K, 30K and 40K word-dictionary sizes
Table 8-5: The word-part recognition results with 5K, 10K and 20K word-dictionary sizes for all users
Table 8-6: The word-part recognition results with 30K and 40K word-dictionary sizes for all users
Table 8-7: The word recognition results of the three options with 5K, 10K and 20K word-dictionary sizes
Table 8-8: The word recognition results of the three options with 30K and 40K word-dictionary sizes
Table 8-9: Incorrectly recognized words of "User 1" with 5K word-dictionary size
Table 8-10: Incorrectly recognized words of "User 2" with 5K word-dictionary size
Table 8-11: Incorrectly recognized words of "User 3" with 5K word-dictionary size
Table 8-12: Incorrectly recognized words of "User 4" with 5K word-dictionary size
Table 8-13: Incorrectly recognized words of "User 5" with 5K word-dictionary size
Table 8-14: Incorrectly recognized words of "User 6" with 5K word-dictionary size
Table 8-15: Incorrectly recognized words of "User 7" with 5K word-dictionary size
Table 8-16: Incorrectly recognized words of "User 8" with 5K word-dictionary size
Table 8-17: Incorrectly recognized words of "User 9" with 5K word-dictionary size
Table 8-18: Incorrectly recognized words of "User 10" with 5K word-dictionary size
Table 8-19: Examples of correctly recognized words with 40K word-dictionary size
Table 9-1: Possible errors reported by our recognizer. Sometimes the word-parts/letter-shapes in the left column are confused with the corresponding letter-shapes/word-parts in the right column, and vice versa
Table 9-2: Unsupported letters
Chapter 1
Introduction

Keyboards and mice are the basic input devices for computers today. Still,
these devices may not endure as the only prevalent means of transmitting electronic data
to computers. Other methods may become necessary, particularly with regard to the size
of newer mobile devices and the method of transmission. Hand-held computers and mobile technology, for example, present significant opportunities for alternative input devices that work in form factors smaller than the traditional keyboard and mouse. In addition, the need for
more natural human-machine interfaces becomes ever more important as computer use
reaches a larger number of people. The need grows particularly acute in developing
countries, where new adopters use computers in an effort to improve living conditions. Here a potential problem arises: millions of ordinary people do not know how to type. Two alternatives to typing come to mind: speech and handwriting, universal and natural methods of communication that have been used for thousands of years. They are therefore better alternatives for many new adopters as a way of interacting with computers. Note, however, that speech does not currently provide an adequate interface for ensuring privacy and confidentiality; these requirements can be met by handwriting.
The following three sections explore three aspects of handwriting recognition systems: mode of operation (online versus offline processing), writer dependence, and lexicon dependence. Section 1.4 gives background on Arabic script and the way it is written. The recognition difficulties and ambiguity, particularly for Arabic script, are discussed in Section 1.5. Finally, the dot separation problem and the absence of diacritics in Arabic script are explored in Sections 1.6 and 1.7, respectively.
1.1. Offline versus Online Recognition
Handwriting recognition can be divided into two categories, which differ in how the data is presented to the system: offline handwriting recognition and online handwriting recognition.
In offline handwriting recognition systems, the input is a digitized image of a scanned page of handwritten or printed text. There is no interaction with the user; the text can be written or printed at any time before the recognition process starts. In the case of Optical Character Recognition (OCR), the text was printed mechanically in a uniform font.
In online handwriting recognition systems, the user writes on a digital device using a special stylus, and the system samples and records the point sequence [(x0, y0), …, (xn, yn)] as it is being written. Online handwriting samples therefore contain additional temporal data that is not present in offline sampled data.
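For concreteness, such a recording can be sketched as a timestamped point sequence. This is a minimal illustration of ours, not code from the thesis; the `PenSample` type and its field names are our own naming:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PenSample:
    """One sampled pen point: position plus the temporal data that
    distinguishes online input from an offline scanned image."""
    x: float
    y: float
    t: float  # timestamp, e.g. milliseconds since the first sample

def point_sequence(samples: List[PenSample]) -> List[Tuple[float, float]]:
    """Reduce a recording to the bare [(x0, y0), ..., (xn, yn)] sequence."""
    return [(s.x, s.y) for s in samples]

# A short trace sampled at 10 ms intervals.
trace = [PenSample(0.0, 0.0, 0.0), PenSample(1.0, 0.5, 10.0), PenSample(2.0, 1.5, 20.0)]
print(point_sequence(trace))  # [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
```

Keeping the timestamps alongside the coordinates is what preserves the temporal information (such as writing speed) that offline images lack.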
When implementing a handwriting recognition system, the differences between online and offline input should be taken into account. We review these differences next.
1. Current scanning technology digitizes images into a discrete grid, which eliminates the continuity of the original shapes. Such continuity is maintained in online (temporal) recognition systems.
2. Offline recognition systems are sensitive to noise over the entire scanned page, while online recognition systems are sensitive to the smoothness of the drawn shapes.
3. An essential step in any offline recognition system is to normalize the thickness of letters/words (reduce the pen thickness to one pixel). This step usually involves image-processing algorithms of high complexity and low robustness. In contrast, the notion of pen thickness does not exist in online recognition systems.
4. Pen features such as pen-up and pen-down are used to segment the input point sequence. Obviously, these features are not available to offline recognition systems.
5. Online recognition systems are interactive in nature, so recognition time is critical for them. Real-time performance, however, is not crucial for offline recognition systems.
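Point 4 can be made concrete with a small sketch (our own illustration, not the thesis's code; the `(x, y, pen_down)` point representation is an assumption):

```python
from typing import List, Tuple

# A sampled point: (x, y, pen_down). pen_down is False for points recorded
# while the stylus is lifted (if the hardware reports them at all).
Point = Tuple[float, float, bool]

def split_into_strokes(points: List[Point]) -> List[List[Tuple[float, float]]]:
    """Group consecutive pen-down points into strokes; each pen-up event
    closes the current stroke. An offline image carries no such signal."""
    strokes: List[List[Tuple[float, float]]] = []
    current: List[Tuple[float, float]] = []
    for x, y, down in points:
        if down:
            current.append((x, y))
        elif current:
            strokes.append(current)
            current = []
    if current:  # trailing stroke not followed by a pen-up sample
        strokes.append(current)
    return strokes

# Two strokes separated by one pen-up sample.
pts = [(0, 0, True), (1, 1, True), (1, 1, False), (5, 0, True), (6, 1, True)]
print(split_into_strokes(pts))  # [[(0, 0), (1, 1)], [(5, 0), (6, 1)]]
```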
Table 1-1 summarizes the comparison between offline and online recognition systems. For a detailed comparison between online and offline handwriting recognition, see [1].
Feature               | Offline recognition system       | Online recognition system
Data acquisition      | Optical scanner / digital camera | Digital tablet / optical pen / CrossPad / Tablet PC
Input                 | Bit-map image                    | Point time-sequence
Noise type            | Global                           | Local
Stroke thickness      | Exists                           | Does not exist
Pen features          | Do not exist                     | Exist
Real-time recognition | Not critical                     | Required
User interaction      | No                               | Yes
Table 1-1: Offline versus online features
1.2. Writer-Dependent versus Writer-Independent
There is a wide range of variation among the handwriting of writers within a given script (Latin, Arabic, Hebrew, Hangul, etc.). These variations are perceived to be individualistic and even expository of personal traits such as emotional indicators, mental ability, indicators of self-image, thinking styles, attitude, and modes of communication [2]. Handwriting recognition systems can be classified into two categories according to how these variations are handled.
1. Writer-dependent systems recognize the handwriting of users who have trained the system in their own handwriting style. They usually achieve high accuracy only for those users, since variant features of those users/trainers are taken into consideration. Examples of well-known variant features are writing speed and pen pressure.
2. Writer-independent systems are capable of recognizing the handwriting of users who have not trained the system. They therefore have to learn global features that are fundamentally common to all users and invariant to handwriting style. A writer-independent system is more difficult to develop than a writer-dependent one, since the variant features must be removed carefully to avoid destroying useful data. Every script usually has its own characteristic global features; for example, Latin scripts have different characteristics than Arabic script.
1.3. Lexicon-Dependent versus Lexicon-Independent
Handwriting recognition systems are divided into two types: Lexicon-dependent (LD)
and Lexicon-independent (LI).
1. Lexicon-dependent systems recognize a handwritten word by searching a given lexicon (of arbitrary size) for the word that best matches the input point sequence. LD systems can be divided into two types: systems with a static lexicon and systems with a dynamic lexicon. A static lexicon usually requires direct modeling of every word in the lexicon; thus, adding a new word to a static-lexicon system may require architectural changes or explicit new training for that word. In contrast, dynamic-lexicon systems allow new words to be added to the lexicon database with no architectural changes and without retraining. However, adding a large number of words to the lexicon increases the search space considerably, leading to higher error rates and degraded performance.
2. Lexicon-independent systems are capable of recognizing a handwritten word that consists of any permutation of letters, without the constraint of being in a lexicon.
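The lexicon-dependent search can be sketched as a best-match selection over the lexicon. This is our own illustration, not the thesis's recognizer: `toy_score` is a hypothetical stand-in for a real match model (such as an HMM likelihood), and the dynamic lexicon is simply a word list that can grow without retraining:

```python
from typing import Callable, List, Sequence, Tuple

PointSeq = Sequence[Tuple[float, float]]

def recognize_ld(points: PointSeq, lexicon: List[str],
                 score: Callable[[PointSeq, str], float]) -> str:
    """Lexicon-dependent recognition: return the lexicon word that best
    matches the input point sequence under the given scoring model."""
    return max(lexicon, key=lambda word: score(points, word))

# Toy stand-in score: prefer words whose length matches the input length.
def toy_score(points: PointSeq, word: str) -> float:
    return -abs(len(points) - len(word))

lexicon = ["cat", "catalog"]
lexicon.append("cats")  # dynamic lexicon: adding a word needs no retraining
print(recognize_ld([(0.0, 0.0)] * 4, lexicon, toy_score))  # cats
```

Note how a growing lexicon enlarges the set `max` searches over, which is exactly the search-space growth the paragraph above warns about.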
1.4. Arabic Script Background
This section explains the Arabic script and the way it is written. Without this background, the key issues in recognizing Arabic script may be difficult to follow.
1.4.1. History of Arabic Language and Script1
The language of Arabic originated from the earliest-known alphabet, the North
Semitic alphabet that was developed in Syria around 1700 B.C. The North Semitic
alphabet consisted of 22 consonants. Similarly, the Arabic alphabet is comprised solely of
consonants. In Arabic, vowels are represented by diacritics, which are accent marks used
to denote pronunciation.
Arabic was one of three languages, along with Hebrew and Phoenician, that were
developed from the North Semitic alphabet. In 1000 B.C., the Greeks took the Phoenician
alphabet as a model and added vowels to it. About two hundred years later, the Etruscans
used the Greek alphabet as a model for theirs. Then, the ancient Romans used this model,
and that became the basis for all Western alphabets. Figure 1-1 shows the script
development with the alphabet of each script.
1 The material presented in this section is based on [3]
Figure 1-1: Development of scripts, courtesy: Mamoun Sakkal [3]
The North Arabic script eventually became the Arabic script of the Quran. As Islam
spread outside the Arab world, the language of the Quran spread with it. Several non-
Arab nations adopted Arabic for their own languages. Examples include Farsi in Iran,
where four letters were added (پ، چ، ژ، گ), and the Ottoman Turks, who added yet
another letter. Other languages that also used the Arabic alphabet at times include Urdu,
Malay, Swahili, Hausa, and others.
With the rise of Islam in the 7th century, Arabic developed into an art form. There are
two main categories of calligraphic style (as illustrated in Figure 1-2). One is the dry
style, which is generally called the Kufic. The other is soft cursive, which includes
Nasikh, Thuluth, Nastaliq and many others.
Figure 1-2: Example of Arabic calligraphy, two words: ‘love peace’, courtesy: [3]
1.4.2. Characteristics of Arabic Script
This section explains the nature of Arabic script, which consists of 28 basic letters,
12 additional special letters, and 8 diacritics2, and is written from right to left. Arabic words
are written (machine printed and handwritten) in a cursive style. Discrete styles of
handwriting in Arabic have been proposed but never became popular [4]. Most letters are
written in four different shapes depending on their position in a word; e.g. the letter ع (E) is
written isolated: ع, initial: عـ, medial: ـعـ, and final: ـع (see the alphabet in Table 1-2). There
are 6 basic letters { ا (A), د (d), ذ (*), ر (r), ز (z), و (w) } that have no
medial and initial shape; these letters do not connect to the following letter and will be
referred to as disconnective <Discon-letter>. The appearance of these letters interrupts the
continuity of the graphic form of the word. We denote letters in a word
that are connected together as a word part. If a word part is composed of only one letter,
this letter will be in its isolated shape. Arabic script is also similar to English in that it
uses spaces and punctuation marks to separate words.
2 The diacritics are not explored here, since they are almost never used in handwriting.
Example 1-1: Consider the Arabic word: مرتفعات (mrtfEAt)3 ‘heights’
• By separating the letters we will get: مـ ـر تـ ـفـ ـعـ ـا ت.
• This word consists of 7 letters: م, ر, ت, ف, ع, ا, and ت
• This word includes three word parts:
o مر – this is a word part because it ends with the letter ر (r), which
has no medial and initial shape.
o تفعا – this is a word part because it ends with the letter ا (A), which
has no medial and initial shape.
o ت – this is a word part because it contains only one letter.
Formally, an Arabic word can be described as follows:
<Word> ::= {<Word-part>}+
<Word-Part> ::= <Con-letter>.initial • <Con-letter>.medial* • <Letter>.final
<Letter> ::= <Con-letter> || <Discon-letter>
<Con-letter> ::= ب || ت || ث || ج || ح || خ || س || ش || ص || ض || ط || ظ || ع || غ || ف || ق || ك || ل || م || ن || ه || ي || ئ
<Discon-letter> ::= ا || د || ذ || ر || ز || و || ء || ة || ى || أ || إ || آ || ؤ || لا || لأ || لإ || لآ
3 In this work, all Arabic words and letters are transliterated in Buckwalter’s Arabic transliteration format (without diacritics) [5]
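The grammar above implies a simple segmentation rule: a word part ends at every disconnective letter. This can be sketched as a single scan over the letters of a word; the function name and the restriction to the six basic disconnective letters are illustrative choices, not part of the thesis (the special disconnective letters of Table 1-3 would be handled the same way).

```python
# Split an Arabic word into its word parts. A word part ends at every
# disconnective letter, i.e. a letter that has no initial/medial shape.
# Sketch covering only the six basic disconnective letters.
BASIC_DISCONNECTIVES = {
    "\u0627",  # ا (A)
    "\u062F",  # د (d)
    "\u0630",  # ذ (*)
    "\u0631",  # ر (r)
    "\u0632",  # ز (z)
    "\u0648",  # و (w)
}

def word_parts(word):
    parts, current = [], ""
    for letter in word:
        current += letter
        if letter in BASIC_DISCONNECTIVES:
            parts.append(current)   # a disconnective letter closes the part
            current = ""
    if current:                     # trailing connected letters form the last part
        parts.append(current)
    return parts

# Example 1-1: mrtfEAt 'heights' splits into three word parts
print(word_parts("مرتفعات"))
```

Running this on the word of Example 1-1 yields its three word parts, in writing order.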
Basic Arabic connective letters (isolated, initial, medial, final):
Ba	ب	بـ	ـبـ	ـب
Ta	ت	تـ	ـتـ	ـت
Tha	ث	ثـ	ـثـ	ـث
Jim	ج	جـ	ـجـ	ـج
H’a	ح	حـ	ـحـ	ـح
Kha	خ	خـ	ـخـ	ـخ
Sin	س	سـ	ـسـ	ـس
Shin	ش	شـ	ـشـ	ـش
Sad	ص	صـ	ـصـ	ـص
Dad	ض	ضـ	ـضـ	ـض
Ta	ط	طـ	ـطـ	ـط
Zha	ظ	ظـ	ـظـ	ـظ
A’yn	ع	عـ	ـعـ	ـع
Ghayn	غ	غـ	ـغـ	ـغ
Fa	ف	فـ	ـفـ	ـف
Qaf	ق	قـ	ـقـ	ـق
Kaf	ك	كـ	ـكـ	ـك
Lam	ل	لـ	ـلـ	ـل
Mim	م	مـ	ـمـ	ـم
Nun	ن	نـ	ـنـ	ـن
Ha	ه	هـ	ـهـ	ـه
Ya	ي	يـ	ـيـ	ـي

Basic Arabic disconnective letters (isolated, final):
Alef	ا	ـا
Dal	د	ـد
Dhal	ذ	ـذ
Ra	ر	ـر
Zay	ز	ـز
Waw	و	ـو

Table 1-2: The basic Arabic alphabet
Table 1-3: Special Arabic letters

Hamza Ala Korsi (isolated, initial, medial, final):	ئ	ئـ	ـئـ	ـئ
Special letter – does not appear at the beginning of words

Additional special disconnective letters (isolated, final):
Alef + Hamza above	أ	ـأ
Alef + Hamza below	إ	ـإ
Alef + Mada	آ	ـآ
Lam Alef	لا	ـلا
Lam Alef + Hamza a.	لأ	ـلأ
Lam Alef + Hamza b.	لإ	ـلإ
Lam Alef + Mada	لآ	ـلآ
Waw + Hamza	ؤ	ـؤ

Special letters written only at the end of the word (no medial and initial shape; isolated, final):
Ta Marbota	ة	ـة
Alef Maksora	ى	ـى

Hamza (isolated only):	ء
Special letter – written only in an isolated shape, does not appear at the beginning of words
Letter / Buckwalter transliteration (three column pairs):
ء '	ذ *	ل l
آ |	ر r	م m
أ >	ز z	ن n
ؤ &	س s	ه h
إ <	ش $	و w
ئ }	ص S	ى Y
ا A	ض D	ي y
ب b	ط T	ً F
ة p	ظ Z	ٌ N
ت t	ع E	ٍ K
ث v	غ g	َ a
ج j	ـ _	ُ u
ح H	ف f	ِ i
خ x	ق q	ّ ~
د d	ك k	ْ o
Table 1-4: Buckwalter transliterations for Arabic letters and diacritics [5]
1.4.3. Dots and Additional Strokes
Most Arabic letters contain dots in addition to the letter body, such as ش ($), which
consists of the س (s) letter body and three dots above it. Some other letters are composed of
strokes attached to the letter body, such as ط and ك. These dots and
strokes are called delayed strokes, since they are usually drawn last in a handwritten
word-part/word. This is similar to the handwriting of Latin scripts, where the cross in the
letter t, the slash in the letter x, and the dots in the letters i and j are also usually drawn last.
Dots in Arabic script are very important for distinguishing among letters that have the
same letter body but differ in the number of dots placed under or above it.
Eliminating, adding or moving a dot could produce a totally different letter sound, as
illustrated in Figure 1-3 and Figure 1-4.
(a) عزام (b) غرام
Figure 1-3: Word (a) is a result of moving the dot to the left from word (b)
Word (a): (EzAm) ‘lion’
Word (b): (grAm) ‘love’
(a) عرب (b) غرب
Figure 1-4: Word (a) is a result of eliminating the dot above the first letter from word (b)
Word (a): (Erb) ‘Arab’
Word (b): (grb) ‘west’
Figure 1-5: Delayed strokes in Arabic script
Figure 1-5 illustrates the delayed strokes possibilities – above or under the letter body
(bold line) in two main and common styles:
a. One dot under the letter body (e.g. ج)
b. One dot above the letter body (e.g. ن)
c. Two dots or horizontal line above the letter body (e.g. ق)
d. Two dots or horizontal line under the letter body (e.g. ي)
e. Three dots or a little “hat” above the letter body (e.g. ش)
f. One vertical line above the letter body (e.g. ط )
g. Hamza (ء) above the letter body (e.g. ك)
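As an illustration of how dot count and placement select a letter, the following sketch maps dot configurations of the shared tooth-shaped (ب-like) letter body to letters. The letter/dot mapping is standard Arabic; the function and its interface are hypothetical, not part of the thesis.

```python
# Letters sharing the tooth-shaped letter body are distinguished solely by
# the number and position of their dots (standard Arabic mapping).
BEH_BODY_LETTERS = {
    (1, "below"): "ب",  # b
    (1, "above"): "ن",  # n
    (2, "above"): "ت",  # t
    (3, "above"): "ث",  # v
    (2, "below"): "ي",  # y
}

def classify(dot_count, position):
    """Return the letter for this dot configuration, or None if unknown."""
    return BEH_BODY_LETTERS.get((dot_count, position))

print(classify(2, "above"))
```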
1.5. Recognition Difficulties and Ambiguity
Arabic words can be written in different styles. A common word-writing style is the
top-down style, as shown in Figure 1-6-(a)/(c)/(d). In this style, some letters are written
above the following letters. Such a style causes a problem in estimating the word baseline,
which is usually used for normalization in the preprocessing phase (discussed in Chapter
4).
(a) (b) (c) (d)
Figure 1-6: Illustration of words with the same meaning written in different styles
1.6. Dot Problem in Words
In some cases, mainly in the top-down writing style, dots may make handwritten
words ambiguous; it may be impossible to recognize such words without
knowing the sentence. As shown in Figure 1-7, if the second dot from the right belongs to
the first letter, the word will be تحل (tHl) ‘to solve’ or ‘to be solved’; if the second dot
belongs to the second letter, the word will be نخل (nxl) ‘palms’.
Figure 1-7: Illustration of an ambiguous Arabic word
1.7. Absence of Diacritics
Arabic diacritization is optional and almost never written. This leads to a reduction in
the average length of words, which increases word similarity and ambiguity. The absence of
these diacritics allows the same written word to be pronounced differently, which may lead to
different meanings. For example, the Arabic word مدرسة without diacritics has two
meanings: the first means ‘teacher’, with diacritics مُدَرِّسَة (mudar~isap), and the second
means ‘school’, with diacritics مَدْرَسَة (madorasap).
It has been shown in [6] that language models (e.g. bi-grams) significantly reduce perplexity
in the handwriting recognition problem. However, the absence of diacritics may complicate the
selected language model(s), since this absence increases the number of meanings for
a single word. The following example is chosen to illustrate the difficulty in a bi-gram
model (using statistics on the probability of one word following another):
Figure 1-8: Unclear Arabic sentence because of the last word
Let us assume that running a certain handwriting recognizer on the handwritten
sentence in Figure 1-8 returns a maximum probability for the first two words to their
correct words in the lexicon; however, it returns an equal probability for the last word to
be واسعة (wAsEp) ‘large’ or رائعة (ra'Ep) ‘outstanding’. To eliminate this confusion, an
additional model should be used.
Figure 1-9: The graph of the recognized possible sentences of the sentence in Figure 1-8
هذه – This is
مدرسة – Teacher or School
رائعة – Outstanding
واسعة – Large
The possible sentences could be obtained from the graph in Figure 1-9:
- This is an outstanding school – grammatically and semantically correct.
- This is an outstanding teacher – grammatically and semantically correct.
- This is a wide school – grammatically and semantically correct.
- This is a large teacher – grammatically correct (according to the lexical
classification), but nonsense at the Arabic semantic level.
Knowing that the second word is ‘teacher’, the language model leads to an immediate
elimination of the ‘large’ option. However, as mentioned, this cannot be known without
the diacritics. This problem is beyond the scope of this work, since language models
are not supported.
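The disambiguation a bi-gram model would perform on the graph of Figure 1-9 can be sketched as follows. English glosses stand in for the Arabic words, and all probabilities are invented purely for illustration; they are not statistics from any real corpus.

```python
# Sketch of bi-gram disambiguation over a sentence graph like Figure 1-9.
# All probabilities below are invented for illustration only.
bigram = {
    ("this_is", "teacher"): 0.5, ("this_is", "school"): 0.5,
    ("teacher", "outstanding"): 0.9, ("teacher", "large"): 0.1,
    ("school", "outstanding"): 0.5, ("school", "large"): 0.5,
}

def best_sentence(paths):
    """Score each candidate word sequence by the product of its bi-grams."""
    def score(words):
        p = 1.0
        for w1, w2 in zip(words, words[1:]):
            p *= bigram.get((w1, w2), 0.0)
        return p
    return max(paths, key=score)

candidates = [("this_is", "teacher", "outstanding"),
              ("this_is", "teacher", "large"),
              ("this_is", "school", "outstanding"),
              ("this_is", "school", "large")]
print(best_sentence(candidates))
```

With these illustrative probabilities, the ‘teacher … outstanding’ path wins, mirroring the elimination of the ‘large’ option described above.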
Chapter 2
Previous Work
This chapter explores several of the issues and methods involved in typical online
handwriting recognition which have been investigated in previous work. Typically,
handwriting recognition systems in past studies have been implemented using part or all
of the following phases: data acquisition, segmentation, preprocessing, feature
extraction, recognition/modeling, and postprocessing. Sections 2.1, 2.2, 2.3, and 2.4
describe data acquisition devices, segmentation, modeling methods, and postprocessing
respectively. Chapters 4 and 5 are dedicated to an in-depth analysis of
preprocessing and feature extraction, respectively, in both previous works and this work.
For a more complete survey of techniques that have been used in handwriting recognition
see [7]- [9].
2.1. Data Acquisition
Online handwritten data are acquired using a special device. Typically, a digital tablet
(see Figure 2-1) is used to sample the pen position with a sampling rate of 80 to 200
samples per second. This generates a point (x- and y-coordinate) sequence of the
handwriting data. The pen pressure of every sampled point is additionally captured by
some devices and used by several systems. Recently, other devices have been used to collect
these data, such as an optical pen that writes anywhere on ordinary paper (see Figure 2-2),
touch screens – used in many PDAs and Tablet PCs (see Figure 2-4 and Figure 2-5), and the
CrossPad (see Figure 2-3). The advantage of optical pens and CrossPads compared to the
other devices is that the writing surface is not part of the hardware, and writing on paper
feels more natural.
Figure 2-1: Digital Tablet
Figure 2-2: Optical Pen Figure 2-3: CrossPad
Figure 2-4: PDA Figure 2-5: Tablet PC
2.2. Segmentation
The difficulty of handwriting recognition tasks forced different researchers in the past
to make some assumptions regarding the style and manner of writing. Figure 2-6
illustrates different writing styles in English. The writing style of the first three lines is
generally referred to as discrete handwriting, in which the writer is asked to write each
letter within a bounding box or to isolate each letter. The writing style of the fourth line is
commonly referred to as pure cursive or connected handwriting, in which the writers are
asked to connect all of the lower case letters within a word. Usually, people write in a
mixed style, a combination of printed and cursive/connected styles, similar to the writing
of the fifth line. Cursive style (fourth and fifth lines) is the most difficult style to
recognize, since the recognizer has to segment each word into its component letters, either
in a phase separate from recognition or simultaneously with the recognition process.
Figure 2-6: Handwriting styles in English. The order is from easy to difficult in terms of recognition [10]
Depending on the form in which data is provided to a recognition system,
segmentation of the input may be required. Data given in the form of a single
word requires letter segmentation if writing is modeled at the letter level (not the
word level), while data given in the form of several words – a line, or even an
entire page – requires a segmentation method to separate the words if writing is modeled
at the word level (not the sentence level).
For well-separated letters, a simple spatial gap threshold can be used to segment
letters. However, a more complicated method is required for connected or overlapping
letters. One method begins with over-segmentation, in which candidate segmentation points
are selected [11]. A graph of possible segmentation sequences is constructed by
connecting consecutive segments to form sequences of possible whole letters. Each of
these combined segments is then classified into a letter class by the recognizer, and the
“best” letter sequence is identified during the postprocessing phase [9]. A further
extension of this approach is segmentation using hidden Markov models, which is also
adopted in our work and described in Chapter 6.
When writing sentences, a word-separation method is necessary, particularly for the
mixed writing style, since it is not clear whether a subword belongs to the current word,
belongs to the next one, or is itself a separate word. This is one reason why the mixed
writing style is the most difficult style to recognize. A spatial gap threshold can be
used to segment words. Some works, such as [6] and [12], used HMMs to find the most
likely hypothesis sentence (described in detail in Section 2.6.2).
As mentioned in Chapter 1, there is no discrete style in Arabic and words may contain
several word parts. Thus, Arabic script belongs to the mixed writing style. Sentence
segmentation in our work has been done by a spatial gap threshold depending on the size
of the word.
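The gap-threshold segmentation described above can be sketched as follows, with each stroke reduced to its horizontal extent. The right-to-left ordering, the 0.3 ratio, and the function interface are illustrative assumptions, not the thesis' tuned values.

```python
# Sketch of word segmentation by a spatial gap threshold. Each stroke is
# reduced to its horizontal extent (x_min, x_max). Arabic is written right
# to left, so strokes are ordered by decreasing x. A horizontal gap larger
# than `ratio` times the current word's width starts a new word.
def segment_words(strokes, ratio=0.3):
    if not strokes:
        return []
    strokes = sorted(strokes, key=lambda s: -s[1])  # right to left
    words = [[strokes[0]]]
    for cur in strokes[1:]:
        word = words[-1]
        width = max(s[1] for s in word) - min(s[0] for s in word)
        gap = min(s[0] for s in word) - cur[1]      # empty space to the left
        if gap > ratio * max(width, 1e-6):
            words.append([cur])                     # large gap: new word
        else:
            word.append(cur)
    return words

# Two strokes around x in [80, 100] form one word; x in [10, 30] is another.
print(segment_words([(90, 100), (80, 92), (10, 30)]))
```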
2.3. Modeling Methods
Modeling methods are the core of a handwriting recognizer. Three main modeling
methods have been used in previous work: template-matching models, statistical models,
and neural-network models, described in the following sections respectively.
2.3.1. Template Matching Models
Template matching is widely used in the pattern-recognition field. A template is
any type of pattern that can be provided to the recognizer. Template matching has been
employed for offline handwriting recognition, usually in conjunction with other
techniques. Some previous work has applied template matching to online handwriting
recognition, as described next.
In the training phase, template-matching systems define a template dictionary of
basic handwriting segments and represent them by feature vectors used for
recognition. These templates are denoted predefined templates; an unknown pattern refers
to the input data shape that needs to be recognized.
In the recognition phase, template-matching systems compute a distance measure
from the segment of data to templates from the predefined-template dictionary. Different
distance measures have been introduced in the past, such as some minimum-distance
approach from the input data to the predefined template, creating a likelihood
value depending on the closeness of the unknown pattern to a predefined template. One
very popular matching algorithm which has been used widely in the literature is elastic
matching [13]. The basic notion behind elastic matching is the computation of the
minimum distance between a set of points and a predefined template that does
not necessarily have the same number of points or size [7]. In addition, the distance
measure between an unknown pattern and the predefined templates can be found by
applying a number of local transformations to the unknown or the predefined template;
the amount of distortion between the unknown and the predefined templates is then
used as the distance measure.
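Elastic matching of this kind can be sketched as a dynamic-programming recursion over the two point sequences, allowing them to differ in length. The Euclidean point cost and the interface below are illustrative assumptions.

```python
# Sketch of elastic matching: a dynamic-time-warping style distance between
# an unknown point sequence and a predefined template, where the sequences
# need not have the same number of points.
import math

def elastic_distance(unknown, template):
    n, m = len(unknown), len(template)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(unknown[i - 1], template[j - 1])
            # stretch, shrink, or match: the classic elastic recursion
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A horizontal line matches a shorter horizontal template better than a
# template shifted away from it.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(elastic_distance(line, [(0, 0), (3, 0)]))
```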
2.3.2. Statistical Models
Statistical models such as hidden Markov models (HMMs) were first applied to
speech recognition [14]-[17] with great success. Recently, researchers have taken
advantage of the similarity between the online handwriting recognition problem and
speech recognition to apply similar techniques to handwriting
recognition [6], [12], [18], [20]-[22]. HMMs are widely used because of the time-sequential
nature of online scripts as well as their capability of modeling shape variability in
probabilistic terms. Typically, each letter class is represented by one left-to-right
HMM, with or without state skipping, and with various techniques for choosing the number
of states. These models are trained using either Viterbi or forward-backward training algorithms.
These letter models are embedded in a connected graph to represent the word/sub-word
dictionary. The Viterbi algorithm is used to find the hypothesis word in such a graph that
best matches the input observation sequence. The results of handwriting recognition for
English cursive script have been better than those achieved in speech recognition [19].
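The Viterbi decoding step at the core of such recognizers can be sketched on a toy discrete HMM. The two-state left-to-right model and its probabilities below are invented for illustration and are far smaller than a real letter model.

```python
# Sketch of Viterbi decoding for a discrete left-to-right HMM.
def viterbi(obs, start, trans, emit):
    """Return the most likely state sequence for the observation list `obs`."""
    states = list(start)
    # delta[s] = best probability of any path ending in state s
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] * trans[r].get(s, 0.0))
            delta[s] = prev[best] * trans[best].get(s, 0.0) * emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

start = {"s0": 1.0, "s1": 0.0}
trans = {"s0": {"s0": 0.6, "s1": 0.4}, "s1": {"s1": 1.0}}   # left-to-right
emit = {"s0": {"a": 0.9, "b": 0.1}, "s1": {"a": 0.2, "b": 0.8}}
print(viterbi(["a", "a", "b"], start, trans, emit))
```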
2.3.3. Neural Networks Models
Artificial Neural Networks (ANN) have been used successfully in many particular
problems, such as learning to recognize spoken words, learning to recognize faces, OCR,
equation recognition, signature verification, and other pattern recognition problems.
Conventional ANNs have been employed to recognize online handwriting for discrete
letters. However, they have not been employed to recognize cursive and unconstrained words
– unless templates of each word are made. This is due to the problem of letter
segmentation that exists in cursive and unconstrained handwritten words (see Section
2.2). To resolve this problem, many researchers use Time-Delay Neural Networks
(TDNNs) in different manners, described next.
TDNNs were first introduced for speech recognition and are well suited to sequential
signal processing. The TDNN is known to be an appropriate architecture for time-series
prediction, due to its internal memory for representing temporal relationships.
TDNNs have been explored for the online handwriting recognition problem. A set of
networks are used to look at adjacent data frames. The outputs of these networks are then
passed along to another network which has outputs corresponding to each character in the
alphabet that may be assigned to a point sequence. The output of each character node
represents the likelihood of the data being a part of that character [7].
2.4. Postprocessing
The goal of the postprocessing phase is to use additional contextual information about
the characters or words being recognized, which has not been taken into account in
previous phases, in order to correct possible errors. In addition, postprocessing is used in
some systems to determine the “best” characters, words or sentences among some
candidates returned by the recognizer. For instance, some systems strip off some data,
such as dots and delayed strokes, in the preprocessing phase and then use it in the
postprocessing phase to find the correct candidate. Postprocessing may also be used to
correct commonly-made mistakes that are well-understood in terms of detection and
correction. More advanced postprocessing techniques can be used to involve language
models such as a grammar, bi-gram or Markov chains to determine the best sentence that
matches predefined rules.
2.5. Related Work
• Signature Verification: Biometric authentication methods have been explored
recently, such as voice identification, fingerprint identification, face recognition,
and retina scans. Signature verification is an additional biometric-authentication
technology that is used to positively identify a person from their handwritten
signature. Similarly to handwriting recognition, there are two types of signature
verification systems: online and offline. With online signature verification, it is
not the shape or look of the signature that is meaningful; it is the changes in
speed, pressure, and timing that occur during the act of signing. Only the original
signer can recreate the changes in timing and X, Y, and Z (pressure).
• Equation Recognition: A handwriting-based equation editor is a system that
allows users to enter handwritten mathematical formulas using one of the
acquisition devices described above. Typically, such a system uses online character
recognition techniques and a graph grammar to generate an internal parse tree of
the input, which can then be converted into an output representation such as
LATEX.
2.6. Previous Handwriting Recognition System Summarization
The aim of this section is to summarize previously published works with an emphasis
on the implemented methodologies to give the reader a sense of how online handwriting
recognition systems are constructed. Furthermore, our research has implemented
concepts and ideas similar to those previously utilized in these articles.
2.6.1. Real-Time Online Unconstrained Handwriting Recognition Using Statistical Methods
Krishna S. Nathan and his colleagues in [18] discussed a general handwriting
recognition system based on hidden Markov models for a large English vocabulary and
writer-independent, unconstrained handwriting in any combination of styles (discrete,
cursive or both). A key characteristic of the system was that it performed recognition in
real time on PC 486 platforms without requiring large amounts of memory. Language
models were not employed in this work.
o Preprocessing: The acquired point sequence was normalized to a standard size and
re-sampled spatially.
o Feature Extraction: A feature vector consisting of Δx, Δy, cos(θ) and sin(θ)
was constructed at equi-spaced points along the trajectory, where θ is the angle at the
sample point. Contextual information is incorporated by splicing several individual
feature vectors into one large feature vector, so that it spans a window of adjacent
points, called a frame.
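The per-point feature vector and frame splicing described above can be sketched as follows; the window size and function names are illustrative assumptions.

```python
# Sketch of the per-point feature vector (dx, dy, cos(theta), sin(theta))
# computed along a pen trajectory, with adjacent vectors spliced into frames.
import math

def features(points):
    vecs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        theta = math.atan2(dy, dx)       # local writing direction
        vecs.append((dx, dy, math.cos(theta), math.sin(theta)))
    return vecs

def frames(vecs, window=3):
    """Splice `window` adjacent feature vectors into one large frame vector."""
    return [sum(vecs[i:i + window], ()) for i in range(len(vecs) - window + 1)]

pts = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
print(frames(features(pts), window=2)[0])
```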
o Real Time: To obtain real-time performance, they divided the search process into two
phases: fast match and detailed match. The fast-match search uses a simpler, degenerate
single-state model to generate a short list of candidate hypothesis strings. The
detailed-match search uses a computationally more expensive model to reorder
each word in the short list.
o Recognition Framework: A character was represented by a set of left-to-right
HMMs. Each HMM in this set represents a specific writing style for the character
(called a lexeme). Either Viterbi or forward-backward training was used to train the lexeme
HMMs. The diagram of the recognition system implemented in this work is shown in
Figure 2-7.
o Delayed Strokes: The proposed solution to the delayed-strokes problem was to
strip them off before training the HMMs. In decoding, the search mechanism first
expands the non-delayed strokes based on the frame probabilities and the character
models. The delayed strokes were incorporated based on their position relative to the
non-delayed strokes and their fast match probabilities.
o Data Set: Approximately 100,000 characters of data were collected from a pool of
100 writers. The training set consisted of words chosen from a 20,000+ word
dictionary and discrete characters written in isolation.
o Recognition Results: The test set was composed of data collected from a set of 25
writers (different from those who provided the training data) and consisted uniquely
of words chosen at random from the same word dictionary. The test task was applied
to three dictionaries consisting of 3K, 12K and 21K words, with 9%, 12%, and 18.9%
error rates respectively.
Figure 2-7: The block diagram of the recognition system in [18]
2.6.2. Online Cursive Handwriting Recognition Using Speech Recognition Methods
J. Makhoul and his colleagues in [6] applied an HMM-based continuous speech
recognition system to an on-line writer-dependent cursive handwriting recognition task
for English script. The base system was not modified except for using handwriting
feature vectors instead of speech. A 1.1% word error rate was achieved for a 3050 word
dictionary, and 3%-5% word error rates were obtained for six different writers in a
25,595 word dictionary. The segmentation of words into characters occurs simultaneously
with the recognition process. One of the special characteristics of this work is that the
segmentation of sentences into words occurs naturally by incorporating the use of a
dictionary and a language model into the recognition process.
o Preprocessing: Two preprocessing steps were used on the point sequence data.
The first is a simple noise filter which required that the pen traverse over one
hundredth of an inch before allowing a new sample. The second step padded each pen
stroke to a minimum size of ten samples.
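The two preprocessing steps can be sketched as follows. The exact padding scheme of [6] is not specified here, so repeating the last sample is an assumption, as are the concrete threshold values.

```python
# Sketch of the two preprocessing steps: a distance filter that drops samples
# closer than a minimum pen movement, and padding of short strokes to a
# minimum number of samples. Units and thresholds are illustrative.
import math

def filter_points(stroke, min_dist=0.01):
    """Keep a sample only if the pen moved at least `min_dist` since the last kept one."""
    kept = [stroke[0]]
    for p in stroke[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept

def pad_stroke(stroke, min_samples=10):
    """Repeat the last sample until the stroke has `min_samples` points (assumed scheme)."""
    return stroke + [stroke[-1]] * max(0, min_samples - len(stroke))

stroke = [(0.0, 0.0), (0.001, 0.0), (0.02, 0.0), (0.05, 0.01)]
clean = filter_points(stroke)
print(len(clean), len(pad_stroke(clean)))
```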
o Feature Extractor: Six features were extracted: the writing angle at the sample,
the change in the writing angle, Δx, Δy, a pen up/down bit, and sgn(x − max(x)).
o Recognition Framework: Each character was represented by a 7-state left-
to-right HMM, as shown in Figure 2-8. Since the penning of a script letter often
differs depending on the letters written before and after it, as shown in Figure 2-9,
additional HMMs are used to model these contextual effects.
Figure 2-8: 7-state HMM used to model each character, courtesy: [6]
Figure 2-9: Illustration of contextual effects on a cursively written “i”. The left “i” is written after “m”, the right “i” is written after “v”, courtesy: [12]
o Delayed Strokes: As described above, the idea was to use the base speech
recognition system with no modification. This made it necessary to simulate a continuous-
time feature vector by arbitrarily connecting the samples from pen-up to pen-down
with a straight line. This line was sampled ten times. The data became one long
criss-crossing stroke for the entire sentence, where words run together, "i" and "j"
dots and "t" and "x'' crosses cause back-tracing over previously drawn script, as
shown in Figure 2-10.
Figure 2-10: Connecting strokes (in dashed lines), courtesy: [6]
o Language Model: This work has shown that statistical grammars (e.g. bi-gram)
significantly reduce the perplexity. Recognition with no grammar but with context
produced an error rate of 4.2%. When the grammar was added and context not used,
the error rate dropped to 2.2%. However, the best result used both context and a
grammar for a word error rate of 1.1% in ATIS (Airline Travel Information Service)
corpus of 3050 words.
o Limitations: The rules given to the subjects were: sentences should be written in pure
cursive; the body of a word should be connected (lifting the pen in the middle of a
word is not allowed); and crossing and dotting should be done after completing the
body of a word.
o Recognition Results: This system was trained and tested by the same six users.
The bi-gram model used was constructed from approximately two million sentences
collected from the Wall Street Journal. The average recognition error was reported at
4.2% with a dictionary size of 25,595 words.
2.6.3. Writer Independent Online Handwriting Recognition Using an HMM Approach
J. Hu and her colleagues in [19] developed an HMM-based writer-independent
handwriting recognition system for English script. Writer independence was achieved by
two processes: (1) a preprocessing step to remove much of the variation in handwriting due
to varying personal styles and writing influences, and (2) feature invariance to
reduce sensitivity to the remaining variations.
o Preprocessing: The preprocessing phase was divided into two steps: noise reduction
and normalization of size and orientation. Noise reduction was achieved by spline
filtering for smoothing, after the standard wild-point reduction and dehooking
procedures. Normalization was achieved by first estimating the base line (joining the
bottoms of small lower-case letters such as "a") and the core line (joining the tops of small
lower-case letters) based on the Hough transform. Then, the word was rotated such that
the base line is horizontal, and scaled such that the distance between the base and core
lines equals a predefined value. Finally, an equi-arc-length resampling procedure was
applied to remove variation in writing speed.
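Equi-arc-length resampling of this kind can be sketched as follows; linear interpolation between samples and the interface are illustrative assumptions.

```python
# Sketch of equi-arc-length resampling: new points are placed at equal
# distances along the pen trajectory, removing variation in writing speed.
import math

def resample(points, step):
    out = [points[0]]
    residual = 0.0                       # arc length since the last output point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.dist((x0, y0), (x1, y1))
        d = step - residual              # arc position of next output on this segment
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        residual = (residual + seg) % step
    return out

# Unevenly spaced samples on a line, re-placed every 1.0 units of arc length
print(resample([(0, 0), (0.2, 0), (3, 0)], step=1.0))
```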
o Feature Extraction: Seven features were computed at each sample point. Two of
these features are conventional ones widely used in HMM-based online handwriting
recognizers: the tangent slope angle and the normalized vertical coordinate. Two invariant
features, normalized curvature and ratio of tangents, were introduced; these features
are invariant under arbitrary similitude transformations, adopted from the
machine-vision world. Finally, three high-level features commonly used in handwriting
recognition were extracted: crossings, cusps, and loops, as
shown in Figure 2-11.
Loops Crossings Cusps
Figure 2-11: Illustration of the three high-level features, courtesy: [19]
o Recognition Framework: A left-to-right HMM without state skipping was selected
to represent subcharacters and characters. These character models were embedded in
a grammar network, which can represent a variety of grammatical constraints, e.g.
word dictionaries, statistical N-gram models and context-free grammars. Figure 2-12
shows a grammar network implementing a dictionary; each arc in this network represents
a character, and each path from the start node to the final node corresponds to a word
in the dictionary. To recognize a word, the Viterbi algorithm was used to search for the
most likely state sequence corresponding to the given observation sequence of this
word.
Figure 2-12: A grammar network implementing a word dictionary, courtesy: [19]
o Training: Models were trained using the iterative segmental training method based
on Viterbi decoding. The training flow was divided into four phases. The first phase is
initial character training, to train a specific isolated character with a given specific
class. The second phase is lattice character training, to train a specific isolated
character with any class. The third phase is linear word training, to train a specific
cursive word, given the class sequence corresponding to the word's characters. The last
phase is lattice word training, to train a specific word with any character classes.
o Delayed Strokes: Delayed strokes were treated as special characters in the
alphabet. A word with delayed strokes was given alternative spellings to accommodate
different sequences with delayed strokes drawn in different orders. However, this
solution can dramatically increase the hypothesis space and is impractical for
a large-vocabulary task. A two-stage approach was used to solve this problem. The
first stage applies the N-best decoding algorithm with simplified delayed-stroke
modeling to narrow down the choice of words. The second, detailed-matching stage
constructs a grammar network covering the top N candidates using exact delayed-stroke
modeling. Afterwards, a best-path search is applied to the reduced network to find
the final optimal candidate.
o Recognition Results: Word-recognition results on small, medium and large
vocabularies have been reported: recognition rates of 91.8%–98.4% with a 500-word
dictionary, 83.2%–94.5% with a 5,000-word dictionary, and 76.3%–91.0% with a
20,000-word dictionary.
2.7. Online Handwriting Recognition Work for Arabic Script
Several previous works address offline handwriting recognition for Arabic script,
e.g. [23]- [26]. However, little work known to the author has tackled the
difficulties of online Arabic handwriting recognition.
S. Al-Emami and M. Usher [27] proposed an online Arabic handwriting recognition
system based on decision-tree techniques to recognize Arabic words. It involves a
preprocessing stage, in which features of the letters are selected and the slope of each
stroke is categorized into one of four directions, followed by a learning stage in which
the preprocessed data are entered into a decision tree. The same process is applied for
recognizing an unknown letter. The system has been tested only with 13 Arabic letter
shapes that compose the four words: (��, ���, ق ,دار�tز), which are used as postcode
terms in some countries.
Chapter 3
Background – Hidden Markov Models

The Hidden Markov Model (HMM) is the statistical model used extensively in this work;
therefore, this chapter is dedicated to giving basic background on the concepts,
problems, and solutions associated with this model. HMMs have been the mainstay of the
statistical modeling used in modern speech recognition systems and are regarded as
the most successful technique in this domain [30]. HMMs have also been applied to many
types of problems in molecular sequence analysis.
HMMs are divided into two types according to their observation densities: discrete-
density and continuous-density HMMs. Discrete-density HMMs refer to when the
observations are described as discrete symbols chosen from a finite alphabet. Continuous-
density HMMs refer to when the observations are described as continuous signals or
vectors. For simplicity, this chapter is restricted to describe only HMMs with discrete-
observation densities. For a complete survey of HMM see [14] and [30].
Hidden Markov models can be presented as extensions of the discrete-state Markov
process which is described in the next section. The extension of the concepts of the
discrete-state Markov process to HMMs is described in Section 3.2. Section 3.3 is
dedicated to discussing the three fundamental problems of HMMs and their solutions.
3.1. Discrete Markov Process
A Markov Process is a stochastic process where the next state depends only on the
current state (or more generally, on a finite number of immediate past states), thus, it
satisfies the Markov condition (or the state-independence assumption). A discrete
Markov process, which is also known as the Markov chain, is a system which is
described at any time as being in one of a set of N distinct states, S1, S2, … , SN, as shown
in Figure 3-1 (where N = 3). We denote the actual state at time t by qt, where t = 1, 2,
3, … denotes the time instants associated with state changes. The probability of the
process being in state Si at time t is denoted by P(qt=Si). Specifically, in a first-order
Markov chain, the probability of being in Sj, Si, Sk, … at times t, t−1, t−2, … respectively
is truncated to the probability of being only in Sj, Si at times t, t−1 respectively, i.e.,

P(qt=Sj | qt-1=Si, qt-2=Sk, …) = P(qt=Sj | qt-1=Si)
Figure 3-1: Illustration of a Markov chain with N = 3 states
Formally, the Markov process is defined as the pair λ = {A, ∏}, where:

∏ = {πi}, πi = P(q1 = Si), 1 ≤ i ≤ N, called the initial state probabilities.
A = {ai,j}, ai,j = P(qt+1 = Sj | qt = Si), where 1 ≤ i, j ≤ N and t > 0, are called the
transition probabilities. Each row of A sums to one:

∑j=1..N ai,j = 1, ∀ 1 ≤ i ≤ N
Example 3-1: Suppose today’s weather tells us something about tomorrow’s weather (but
yesterday’s weather does not tell us anything about tomorrow’s weather). Suppose that
the weather can either be Raining, Cloudy or Sunny.
S1 = Raining, S2 = Cloudy, S3 = Sunny
Figure 3-2 shows the weather Markov chain for the following transition probability
matrix and initial probability vector:
A = {ai,j} =
| 0.5 0.4 0.1 |
| 0.6 0.2 0.2 |
| 0.1 0.2 0.7 |

∏ = {πi} = [0.3 0.2 0.5]
Figure 3-2: The weather Markov chain
If it's sunny today, what is the probability that it will rain the day after tomorrow, given
the Markov chain in Figure 3-2?
Answer:
Let’s denote q0 to be the state of today, q1 to be the state of tomorrow and q2 to be the state
of the day after tomorrow, so we are interested in: P(q2 = R | q0 = S)
P(q2 = R | q0 = S) =
P(q1 = R | q0 = S) × P(q2 = R | q1 = R) +
P(q1 = C | q0 = S) × P(q2 = R | q1 = C) +
P(q1 = S | q0 = S) × P(q2 = R | q1 = S)
= 0.1×0.5 + 0.2×0.6 + 0.7×0.1 = 0.24
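The computation above can be checked with a few lines of Python (a sketch; state names are abbreviated to R, C, S):

```python
# Transition probabilities from Example 3-1 as nested dicts:
# A[i][j] = P(next state = j | current state = i).
A = {
    'R': {'R': 0.5, 'C': 0.4, 'S': 0.1},
    'C': {'R': 0.6, 'C': 0.2, 'S': 0.2},
    'S': {'R': 0.1, 'C': 0.2, 'S': 0.7},
}

# P(q2 = R | q0 = S): marginalize over tomorrow's state q1.
p = sum(A['S'][q1] * A[q1]['R'] for q1 in 'RCS')
print(round(p, 2))  # 0.24
```

Summing over the intermediate state is exactly the (S, R) entry of the two-step transition matrix A².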
3.2. Extension to Hidden Markov Models
Thus far, each state in a Markov chain corresponds to one observable event; a
deterministic observation is associated with each state, i.e., the symbol ot is fully
determined when the process is in state qt. This model is too limited for many problems
of interest. In this section, the concept of the Markov chain is extended to the case
where the observation is a probabilistic function of the state. In this extended model,
the observation does not uniquely determine the state sequence; in general, one
observation sequence may be produced by many different state sequences. Hence, the
state sequence is hidden.
An HMM is formally defined as the triple λ = {A, B, ∏} where,
1) N is the number of states of the model.
2) M is the number of distinct observation symbols per state, i.e., the discrete
alphabet size. These symbols are denoted as V = {v1, v2, …, vM}.
(Footnote 4: The material presented in this section is based on [14].)
3) A is the state-transition probability distribution as defined in the Markov process,
(see previous section).
4) B = {bj(k)} is the observation-symbol probability distribution in state j, where

bj(k) = P(vk at t | qt = Sj), 1 ≤ j ≤ N, 1 ≤ k ≤ M

i.e., the probability of observing the symbol vk when the process is in state Sj. B is
also known as the observation-probability matrix.
5) ∏ is the initial state distribution, as defined in the Markov process (see previous
section).
Figure 3-3: Illustration of a hidden Markov model with three states (N = 3) and two observable
symbols (M = 2)
3.3. The Three Fundamental Problems for HMMs
Given the HMM of the previous section, three basic problems of interest have to be
solved to make this model very useful in a real-world application. These problems are
described as follows:
• The Evaluation Problem: Given the observation sequence O = [o1, o2,…,oT],
and a model λ = {A, B, ∏}, how do we efficiently compute P(O|λ), the probability
of the observation sequence, given the model?
• The Decoding Problem: Given the observation sequence O = [o1, o2,…,oT], and
a model λ = {A, B, ∏}, what is the corresponding optimal state sequence Q = [q1,
q2, ... , qT] which maximizes P(O, Q|λ)? (i.e., the state sequence that best
“explains” the observation sequence)
• The Training Problem: Given the observation sequence O = [o1, o2,…,oT], how
do we adjust the model parameters λ = {A, B, ∏} to maximize P(O|λ)?
3.3.1. The Evaluation Problem
Given the observation sequence O = [o1, o2,…,oT], and the model λ = {A, B, ∏}, we
need to compute P(O|λ).
The most direct way of doing this is to enumerate every possible state
sequence of length T. However, this solution requires an exponential number of
computations, on the order of 2T·N^T, where N is the number of states. Therefore, a
more efficient algorithm, known as the Forward-Backward procedure, is used to solve
this problem. Consider the forward variable:

αt(i) = P(o1 o2 … ot, qt = Si | λ)

αt(i) is the probability of the partial observation sequence [o1, o2, …, ot] and state
Si at time t, given the model λ. αt(i) is computed inductively as follows:
1) Initialization:

α1(i) = πi bi(o1), 1 ≤ i ≤ N

This step initializes the forward probabilities as the joint probability of state Si
and the initial observation o1.

2) Induction:

αt+1(j) = [ ∑i=1..N αt(i) ai,j ] bj(ot+1), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N

3) Termination:

P(O|λ) = ∑i=1..N αT(i)
Using the above algorithm we efficiently obtain the answer to the evaluation problem.
The number of computations involved is on the order of T·N^2 instead of 2T·N^T. For
in-depth analysis of the forward procedure, see [14] and [30].
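As an illustrative sketch (variable and function names are mine, not the thesis's), the three steps above translate directly into Python:

```python
def forward(A, B, pi, O):
    """Forward procedure: returns P(O | model) in O(T*N^2) time.

    A[i][j]: transition probability, B[i][k]: probability of symbol k
    in state i, pi[i]: initial state probability,
    O: observation sequence as a list of symbol indices.
    """
    N = len(pi)
    # 1) Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    # 2) Induction over the remaining observations
    for o in O[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # 3) Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)
```

For long sequences a practical implementation would scale the alphas (or work in log space) to avoid numerical underflow.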
3.3.2. The Decoding Problem – Viterbi Algorithm
Given the observation sequence O= [o1, o2,…,oT], and the model λ = {A, B, ∏}, we need
to find the optimal state sequence (path) Q = [q1, q2, ... , qT] which maximizes P(O, Q|λ).
The well-known Viterbi algorithm, based on dynamic programming, exists to find the
whole state sequence with maximum likelihood. In order to facilitate the computation we
define an auxiliary variable,
δt(i) = max over q1,q2,…,qt−1 of P(q1 q2 … qt−1, qt = Si, o1 o2 … ot | λ)

δt(i) denotes the highest probability along an optimal partial state sequence
Q = [q1, q2, ..., qt] ending in state Si at time t while observing the observation
sequence [o1, o2,…,ot]. δt(i) is computed inductively much like the forward variable
αt(i). Since the optimal state sequence Q = [q1, q2, ..., qt] itself is required, we
need to keep track of the argument which maximizes δt(i); the matrix ψt(j) is used to
save this backtracking data. The Viterbi algorithm is described as follows:
1) Initialization:

δ1(i) = πi bi(o1), 1 ≤ i ≤ N
ψ1(i) = 0

2) Recursion:

δt(j) = max 1≤i≤N [ δt−1(i) ai,j ] bj(ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N
ψt(j) = argmax 1≤i≤N [ δt−1(i) ai,j ], 2 ≤ t ≤ T, 1 ≤ j ≤ N

3) Termination:

P* = max 1≤i≤N [ δT(i) ]
qT* = argmax 1≤i≤N [ δT(i) ]

4) Path (state sequence) backtracking:

qt* = ψt+1(qt+1*), t = T−1, T−2, …, 1
Given a model λ = {A, B, ∏}, and an observation sequence O = [o1, o2,…,oT], the
Viterbi algorithm provides the following:
• The optimal path Q* = [q1*, q2*, …, qT*] corresponding to the observation sequence O.
• The accumulated likelihood score P* along the optimal path.
• The state segmentation along the optimal path corresponding to the observation
sequence O.
The number of computations involved in the Viterbi algorithm is O(N^2·T).
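The four steps above can be sketched in Python as follows (an illustrative implementation; names and data layout are my own):

```python
def viterbi(A, B, pi, O):
    """Viterbi decoding: returns (P*, optimal state path) for O."""
    N = len(pi)
    # 1) Initialization: delta_1(i) = pi_i * b_i(o_1), psi_1(i) = 0
    delta = [pi[i] * B[i][O[0]] for i in range(N)]
    psi = []  # psi[t][j]: best predecessor of state j at step t+1
    # 2) Recursion
    for o in O[1:]:
        best = [max(range(N), key=lambda i: delta[i] * A[i][j])
                for j in range(N)]
        delta = [delta[best[j]] * A[best[j]][j] * B[j][o]
                 for j in range(N)]
        psi.append(best)
    # 3) Termination: pick the most likely final state
    q = max(range(N), key=lambda i: delta[i])
    p_star = delta[q]
    # 4) Backtracking along the stored psi matrix
    path = [q]
    for best in reversed(psi):
        q = best[q]
        path.append(q)
    return p_star, path[::-1]
```

Besides P* and the path, the returned state sequence directly yields the state segmentation of the observation sequence mentioned above.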
3.3.3. The Training Algorithm – Baum-Welch Algorithm
Given the observation sequence O = [o1, o2,…,oT], we need to determine a method to
adjust the model parameters {A, B, ∏} to maximize the probability of the observation
sequence given the model λ, i.e., P(O|λ). The training problem is considered the most
difficult among the three fundamental problems. In fact, there is no known
analytical solution for this optimization problem. However, an iterative procedure
known as the Baum-Welch algorithm (or forward-backward algorithm) exists that locally
maximizes P(O|λ). The Baum-Welch algorithm is a special case of the EM (Expectation-
Maximization) algorithm [28], [29]. The Baum-Welch algorithm is described next.
In order to describe the Baum-Welch algorithm, auxiliary variables are defined.
Analogous to the forward variable, a backward variable is defined first as follows:

βt(i) = P(ot+1 ot+2 … oT | qt = Si, λ)

βt(i) is the probability of the partial observation sequence [ot+1, ot+2, …, oT], given
state Si at time t and the model λ. It is computed inductively as follows:
1) Initialization:

βT(i) = 1, 1 ≤ i ≤ N

2) Induction:

βt(i) = ∑j=1..N ai,j bj(ot+1) βt+1(j), 1 ≤ t ≤ T−1, 1 ≤ i ≤ N

The above procedure computes the backward variable βt(i) in on the order of T·N^2
operations.
Second, we define the auxiliary variable ξt(i, j) as the probability of being in state Si
at time t and in state Sj at time t+1, given the model and the observation sequence, i.e.,

ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ)
From the definitions of the forward and backward variables we get:

ξt(i, j) = αt(i) ai,j bj(ot+1) βt+1(j) / P(O|λ)
         = αt(i) ai,j bj(ot+1) βt+1(j) / [ ∑i=1..N ∑j=1..N αt(i) ai,j bj(ot+1) βt+1(j) ]
Given the model λ = {A, B, ∏} and the observation sequence O, the model can be
reestimated iteratively to maximize P(O|λ). The newly generated model is denoted
λ̄ = {Ā, B̄, ∏̄}, where,
π̄i = expected frequency (number of times) in state Si at time (t = 1)
    = ∑j=1..N ξ1(i, j)

āi,j = expected number of transitions from state Si to state Sj
       / expected number of transitions from state Si
     = [ ∑t=1..T−1 ξt(i, j) ] / [ ∑t=1..T−1 ∑j=1..N ξt(i, j) ]

b̄i(vk) = expected number of times in state Si observing symbol vk
          / expected number of times in state Si
        = [ ∑t: ot=vk ∑j=1..N ξt(i, j) ] / [ ∑t=1..T ∑j=1..N ξt(i, j) ]
Baum and his colleagues have proved that this procedure yields one of two outcomes:
either (1) the initial model λ defines a critical point of the likelihood function, in
which case λ̄ = λ; or (2) model λ̄ is more likely than model λ in the sense that
P(O|λ̄) > P(O|λ). Hence, if λ̄ iteratively replaces λ and the reestimation calculation
is repeated, the probability of O being observed from the model improves until a limit
point is reached.
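One reestimation iteration can be sketched in pure Python for a single observation sequence (function and variable names are mine; a practical implementation would train on many sequences and scale the α/β values to avoid underflow):

```python
def baum_welch_step(A, B, pi, O):
    """One Baum-Welch reestimation step; returns updated (A, B, pi).

    Combines the forward (alpha) and backward (beta) passes with the
    xi counts defined above. Assumes len(O) >= 2.
    """
    N, M, T = len(pi), len(B[0]), len(O)
    # Forward pass
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    # Backward pass
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j]
                             for j in range(N))
    p_obs = sum(alpha[T-1])  # P(O | model)
    # xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, model)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # gamma[t][i] = P(q_t = S_i | O, model)
    gamma = [[sum(xi[t][i]) for i in range(N)] for t in range(T - 1)]
    gamma.append([alpha[T-1][i] * beta[T-1][i] / p_obs for i in range(N)])
    # Reestimation of pi, A and B from the expected counts
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(M)]
             for i in range(N)]
    return new_A, new_B, new_pi
```

Iterating this step until P(O|λ) stops improving gives the locally optimal model described above.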
Chapter 4
Preprocessing

The goals of the preprocessing phase are: (1) to reduce or remove imperfections
caused by acquisition devices, and (2) to minimize handwriting variations irrelevant for
pattern classification which may exist in the acquired data. The preprocessing phase has a
great influence on subsequent processing, and a real impact on the recognition rate
(see [31] for a good survey of preprocessing techniques). Typically, preprocessing of
online handwriting recognition in previous research has been classified into three
types [31]: noise reduction, data simplification, and normalization. The preprocessing
phase consists of three steps corresponding to each of these types respectively. The three
steps are described in detail in the following sections.
4.1. Step 1: Noise Reduction
Noise reduction attempts to reduce imperfections caused mainly by hardware (digital
device) limitations. Such reduction is usually performed by various geometric operations
such as smoothing, wild point reduction, and hook removal, which are described next.
• Smoothing is used to remove or reduce the effect of hardware limitations and
erratic hand motion. Strokes captured by acquisition devices tend to contain
noise; one could experience such noise while writing on a digital tablet while
traveling by train, airplane, or car. Several approaches have been introduced
to reduce such noise by applying filters such as spline filters [19] and low-pass
filters [32].
• Wild Point Reduction removes wild points, the occasional spurious points
detected by digital devices, mainly due to hardware problems. Major improvements
to digital devices have reduced these kinds of imperfections, but software
processing is still required to completely eliminate this problem.
• Hook Removal is used to remove small hooks from captured strokes. Strokes
captured by a digital device tend to include small hooks created by quick pen
motion when the pen is lowered or raised. Hooks are partially removed by
hardware devices; careful software-based removal of leftover hooks is necessary
in the preprocessing phase.
In this preprocessing step, a low-pass filter with a rectangular-window impulse
response is used for noise reduction. The width of this window is chosen empirically,
as illustrated in Figure 4-1. The low-pass filter is used mainly for smoothing the
input point sequence, while it is assumed that wild points are reduced by the input
hardware. No direct or specific task is required to remove hooks, since the low-pass
filter is adequate at hook removal.
Figure 4-1: Illustration of applying a low-pass filter on a word written in a moving train.
Word (b) is the result of applying a low-pass filter on word (a).
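A minimal sketch of such a rectangular-window (moving-average) low-pass filter, assuming strokes are lists of (x, y) samples; the half-width parameter w is illustrative, not the empirically chosen value from the thesis:

```python
def smooth(points, w=2):
    """Rectangular-window low-pass filter (moving average) over a stroke.

    points: list of (x, y) samples; w: half-width of the averaging
    window (window width = 2*w + 1, clipped at the stroke ends).
    """
    out = []
    for i in range(len(points)):
        lo, hi = max(0, i - w), min(len(points), i + w + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out
```

Averaging each sample with its neighbors attenuates high-frequency jitter such as the train vibration shown in Figure 4-1.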
4.2. Step 2: Data Simplification
Data simplification is the process of cutting down the number of data points acquired
by a digital device by eliminating redundant points irrelevant for pattern
classification. This processing directly affects recognizer performance. In this work,
Douglas&Peucker's algorithm [33] has been adopted to simplify a given polyline (point
sequence). Douglas&Peucker's algorithm finds the skeleton points of the point sequence
produced by Step 1 (Section 4.1). Skeleton points are a subset of the original points
and represent the global geometric shape; eliminating one of them may greatly alter
the geometric shape. Skeleton vectors are defined as the vectors connecting each two
consecutive skeleton points, as illustrated in Figure 4-2. We have applied
Douglas&Peucker's algorithm with a tolerance τ1, determined empirically.
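A minimal recursive sketch of Douglas&Peucker's algorithm (illustrative; the thesis's tolerance τ1 corresponds to the tol parameter):

```python
import math

def douglas_peucker(points, tol):
    """Polyline simplification (Douglas & Peucker): keep the skeleton
    points whose removal would move the curve by more than tol."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    seg_len = math.hypot(x2 - x1, y2 - y1) or 1e-12
    # Perpendicular distance of each interior point to the chord
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        d = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / seg_len
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tol:
        # All interior points lie within tol of the chord: drop them
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and recurse on both halves
    left = douglas_peucker(points[:idx + 1], tol)
    right = douglas_peucker(points[idx:], tol)
    return left[:-1] + right
```

The returned points are the skeleton points; pairing consecutive ones gives the skeleton vectors of Figure 4-2.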
Figure 4-2: Polyline simplification (original polyline, skeleton points, simplified polyline, and skeleton vectors)

4.3. Step 3: Normalization

The normalization task is used to reduce the effects of handwriting variations and to
simplify the recognition process. Essentially, this processing is used to reduce
variations in
handwriting speed, word size, word orientation, and word skew. These features are called
variant features since they often vary between writers and even with the same writer. In
a writer-dependent system, all or part of these useful features are fed to the training
and recognition engines (for more details see Section 1.2). However, when implementing a
writer-independent system, normalizing these features is a necessity to reduce the
recognition-process complexity and to minimize the shape-variation domain. It has been
shown that basic recognition and segmentation algorithms perform best for a consistent
size of writing [34]. In the following two sections, two types of normalization are
discussed: size/orientation normalization and speed normalization.
4.3.1. Size and Orientation Normalization
In Latin script, four boundary lines are defined for every handwritten word: the base
line, the core line, the ascender line, and the descender line. The base line joins the
bottom of small lower-case letters such as “c”. The core line joins the top of small
lower-case letters. The ascender line joins the top of letters with ascenders such as
“l”. The descender line joins the bottom of letters with descenders such as “g” [19],
[31], and [34]. These boundary lines are illustrated in Figure 4-3. Previous work assumed
that base lines and core lines structure all words, while the ascender and descender
lines do not [19] and [34]. For each word, these boundary lines are estimated. This
estimation has
been identified as a difficult problem in handwriting recognition, particularly when
tackling unconstrained script [19]. Different techniques have been proposed to resolve
this problem. The main approaches include the histogram-based methods [31], [35]- [37],
linear-regression methods [38], word-model methods [39], and Hough transform
methods [19]. Upon estimating these boundary lines, the word is rotated so that the base
line becomes horizontal, and scaled so that the core height (the distance between the base
and core lines) equals a predefined value.
Figure 4-3: Word boundary lines
In Arabic script, boundary lines do not necessarily exist in handwritten words,
particularly in the very common top-down style5. The same letter may be written on the
base line, and at times above the core line, depending on the subsequent letter, as
shown in Figure 4-4. In other words, there is no predefined area in which each letter is
located within a word. Since there are no well-defined boundary lines in Arabic, the
normalization process is considerably complicated.
Size and orientation normalization problems are outside the scope of this work. These
issues will be explored as part of our future work.
Figure 4-4: Illustration of a word written in the top-down writing style. The left word is the printed style of the right. �� (bHajm) ‘in size’. In the right word, the letter ـ� is above the letter ـ, which is ,ـabove the letter ــ, which is above the letter ـ�.
5 The authors of [34] assumed that boundary lines always exist, even in Arabic, Farsi and similar scripts, which is not accurate.
4.3.2. Speed Normalization
Digital devices sample at equal intervals of time rather than equal distances;
therefore, strokes written more slowly than others include more samples. The data point
sequence therefore needs to be re-sampled. Upon finishing Step 2 (finding the skeleton
points), each skeleton vector of length l is divided into ⌊l/f⌋+2 points, where f is
a predefined fragment length and the additional 2 accounts for the original skeleton
points.
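A sketch of this re-sampling step, assuming skeleton points are (x, y) tuples and f is the predefined fragment length (the names are mine, not the thesis's):

```python
import math

def resample(skeleton, f):
    """Speed normalization: re-sample each skeleton vector of length l
    into int(l / f) equally spaced interior points plus the two original
    skeleton endpoints, as described above."""
    out = [skeleton[0]]
    for (x1, y1), (x2, y2) in zip(skeleton, skeleton[1:]):
        l = math.hypot(x2 - x1, y2 - y1)
        n = int(l / f) + 1  # number of equal fragments on this vector
        for k in range(1, n):
            out.append((x1 + (x2 - x1) * k / n, y1 + (y2 - y1) * k / n))
        out.append((x2, y2))
    return out
```

After this step, points are roughly equidistant along the trajectory regardless of the original writing speed.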
Figure 4-5: Illustration of preprocessing steps applied on the handwritten word (jAmEp) ج���‘university’
Figure 4-6: Illustration of preprocessing steps applied on the handwritten word ���� (vqAfp) ‘education’
All preprocessing steps are illustrated in Figure 4-5 and Figure 4-6. In both figures,
panel (a) is the original point sequence acquired by a digital tablet. Panel (b) is
created by applying the smoothing step described in Section 4.1. Panel (c) is the result
of applying Douglas&Peucker's algorithm to the point sequence in (b) (the skeleton
point sequence). Finally, panel (d) is the result of re-sampling the skeleton vectors
in (c). All of these figures were produced by our system.
Chapter 5
Feature Extraction

Upon completion of the preprocessing phase, three types of features are extracted from
each preprocessed point sequence Ps = [p1, p2, …, pn]: local, semi-local, and global
features.
These features are described in the following three sections respectively. In Section 5.4, a
feature vector is constructed for each data point from Ps. Section 5.5 describes how an
observation sequence is constructed from the feature vectors. Finally, Section 5.6
describes how dots and delayed strokes are incorporated in the observation sequence.
5.1. Local Feature
One local feature is adopted in our work: the angle between each vector v = pi-1pi
(where i > 1) and the X-axis, denoted the α-angle. Together with the assumption that
the preprocessed points are equidistant along the trajectory, the α-angle feature is
influential, since the preprocessed point sequence can be reproduced unequivocally from
it; in other words, there is no data loss.
Figure 5-1 illustrates the α-angle between v and the x-axis.
Figure 5-1: Illustration of the αi-angle (the angle between the vector pi-1pi and the X-axis)
α1 = the angle between p1p2 and the X-axis
αi = the angle between pi-1pi and the X-axis, where 1 < i ≤ N
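A sketch of the α-angle computation, assuming (x, y) point tuples (per the definition above, α1 is the angle of p1p2 and therefore equals α2):

```python
import math

def alpha_angles(points):
    """alpha-angle of each vector p_{i-1}p_i with the X-axis, in degrees
    in [0, 360). Assumes at least two points; the first point has no
    preceding vector, so alpha_1 is taken as the angle of p1p2."""
    angles = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360.0)
    return [angles[0]] + angles
```

With equidistant points, these angles (plus the fragment length f) suffice to rebuild the trajectory, which is the "no data loss" property noted above.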
5.2. Semi-Local Features
The local feature described in the previous section represents the information of each
segment individually; it provides no information concerning its environment, such as
its neighboring segments. Several previous works, such as [6] and [12], used the
difference between the writing angles of a data point and its preceding data point (the
first angle is set to 0). One could also use the angle between segment i and segment
i+2 in the preprocessed point sequence. These features provide environmental
information; however, they provide no additional information when k > 2 points lie
almost on the same line, which is the usual case in handwriting. This leads us to
introduce a new type of feature, the semi-local feature, which provides geometric
information wide enough to determine its segment group. This is the only feature in
this work that describes the connectivity between a set of adjacent points. In our
experience, adding this environmental information to the feature vector gives better
results than omitting it.
The semi-local feature is computed by applying Douglas&Peucker's algorithm to the
preprocessed point sequence to find the skeleton points with a tolerance τ2 > τ1, where
τ1 is the tolerance used in the preprocessing phase (Section 4.2); τ2 has been
determined empirically (as shown in Figure 5-2 and Figure 5-3). The semi-local feature
is the angle between each skeleton vector and the X-axis, denoted the β-angle, as
illustrated in Figure 5-4.
Figure 5-2: Illustration of running Douglas&Peucker's algorithm. Figure (a) is a handwritten word ���� (mElm) ‘teacher’, red points in Figure (b) are the skeleton points returned by Douglas&Peucker's algorithm applied on the point sequence in Figure (a).
Figure 5-3: Illustration of the skeleton points and vectors, the red points are the skeleton points, the red segments show the skeleton vectors.
Figure 5-4: Illustration of the β-angle (the angle between a skeleton vector and X-axis)
Let us assume that the skeleton points of Ps are Qs = [q1, q2,…, qM], Qs ⊆ Ps. Without
loss of generality, we can assume the points are distinct; otherwise we index them.
βi is formally defined as follows:
βi = the angle between qjqj+1 and the X-axis, if pi is a skeleton point, pi = qj ∈ Qs;

βi = the angle between qjqj+1 and the X-axis, if qj = pk and qj+1 = pl with k < i < l,
where qj, qj+1 ∈ Qs, k is maximal and l is minimal (i.e., pi is a point between the
two consecutive skeleton points qj and qj+1).
5.3. Global Features
Global features or high-level features provide global information about the geometric
shape of words, word-parts, or letters. Three common global features used in previous
handwriting-recognition work are loops, cusps and crossings [19], [22] and [40] (as
shown in Figure 5-5). In this work, we have adopted only the loop feature. Arabic letter
shapes are classified into three categories according to loop content:
• Category I contains Arabic letter shapes that include loops as an integral part (see
Table 5-1).
• Category II includes letter shapes that can be written with or without loops,
depending on writing style (see Table 5-2).
• Category III includes the remaining letter shapes, which do not contain loops.
This category division suggests a global feature that immediately reduces the search
space. In this work, the recognizer returns a low probability when trying to match a
handwritten shape that does not include a loop to a letter shape that includes one
(described in Chapter 6).
Figure 5-5: Illustration of the three high-level features (loops, crossings, cusps), courtesy: [19]
kـ ط ـj ـiـ hـ ض ـg ـfـ eـ ص
ـu ـ�ـ ـs ـ2ـ ـo ـnـ mـ ظ ـl ـ�ـ
F ة ـ� ـ` ـ� ـ}ـ ـr ـqـ ـ, ـ ـ
هـ
Table 5-1: Category I – 31 letter shapes that cannot be written without loops
ـ� ؤ و ـ$ـ م �ـ ق tـ ف �ـ
خ [ـ ح �ـ ج جـ ـ�
Table 5-2: Category II – 17 letter shapes that may be written with or without loops
Figure 5-6: Illustration of loops (in red) in an Arabic word
Figure 5-6 shows a preprocessed point sequence of the handwritten word �� �. It consists
of four letters, three of which (the first, second and fourth) include loops. The loops
are drawn as red dots; the first letter shape belongs to Category II, and the second and
fourth letter shapes belong to Category I.
5.3.1. Loop Determination
Due to variations in writing speed and style, small loops may be added unintentionally
to letter shapes that should not include loops. This usually happens when trying to
rewrite part of a word or when the pen direction is changed. Often, these accidental
loops are elliptical in shape; therefore, they are removed by checking whether the ratio
of the loop's area to its diameter is less than a predefined threshold t. Figure 5-7
illustrates a very common case in which such a loop is added. In panel (a), a
preprocessed point sequence represents the letter shape zـ, which belongs to
Category III. The red line in panel (b) is the diameter of the loop. The red area in
panel (c) is the area of the accidental loop.
if (area/diameter < t) then
    the loop is not considered a real loop
else
    the loop is considered a real loop
Figure 5-7: Illustration of an accidental loop in a letter that should not include a loop
5.4. Feature Vector Construction
After computing the three features described above, a three-dimensional feature vector
fi is constructed for each data point pi ∈ Ps, where 1 < i ≤ N, as shown in Figure 5-8.
fi = (αi , βi, is-loopi), where:
• αi = α-angle of point pi
• βi = β-angle of point pi
• is-loopi = 1 if pi is in a loop, otherwise it’s 0.
Figure 5-8: Illustration of how fi is extracted from each point pi
5.5. Feature Vector to Observation Code
A Discrete Hidden Markov Model (HMM) is used for the training/recognition task,
which is discussed in Chapter 6. The input to this type of model is a sequence of discrete
values – observation sequence (see Chapter 3 for more details). Thus, a discretization
process is required to convert the feature vector sequence, extracted from a given shape,
to a discrete observation sequence. The three-dimensional feature-vector domain is
discretized into 260 integer values: 256 for the (α, β, is-loop) vectors described
above, and 4 additional integer values used to represent delayed strokes (described in
the next section). The feature vector f = (α, β, is-loop) is discretized as follows:
• α is a real angle value between (1…360); this angle is discretized to 16 directions
similar to [21], as shown in Figure 5-9.(a). For example:
o α = 82.2o => α-code = 4
o α = 320o => α-code = 14
• β is a real angle value between (1…360): This angle is discretized to 8 directions,
as shown in Figure 5-9.(b). For example:
o β = 82.2o => β-code = 2
o β = 320o => β -code = 7
• is-loop = 1/0 (one bit)
Figure 5-9: Panel (a) shows the division of the angle space into 16 directions; panel (b) shows the division into 8 directions.
The α-code is coded in 4 bits, the β-code in 3 bits, and is-loop in one bit; thus the
coded feature has the following form, where each cell represents a bit:
α-code β-code is-loop
Example 5-1:
Using the same shape as in Figure 5-1, α7 = 247°, β7 = 240.5° and p7 is not in a loop,
=> α7-code = 11, β7-code = 5 and is-loop = 0.
1 0 1 1 1 0 1 0   (α-code = 11, β-code = 5, is-loop = 0)
The discretized feature vector of f7 is therefore o7 = 10111010 (binary) = 186
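A sketch of the discretization: the bit packing follows the layout above, while the rounding rule round(angle/width) is my inference from the worked examples, not an explicit formula from the thesis:

```python
def discretize(alpha, beta, is_loop):
    """Pack (alpha, beta, is-loop) into one observation code:
    4 bits for the alpha-code, 3 for the beta-code, 1 for the loop bit.
    The nearest-direction rounding is inferred from the examples above."""
    a_code = round(alpha / 22.5) % 16  # 16 directions of 22.5 deg each
    b_code = round(beta / 45.0) % 8    # 8 directions of 45 deg each
    return (a_code << 4) | (b_code << 1) | (1 if is_loop else 0)
```

This rule reproduces the codes in the text: 82.2° → α-code 4, 320° → α-code 14, and the Example 5-1 vector (247°, 240.5°, no loop) → codes (11, 5, 0).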
Figure 5-10: Illustration of how the observation oi is computed
5.6. Delayed Strokes
We have seen in Section 1.4.3 why dots/delayed strokes are very important in handwriting recognition for Arabic script. Three methods have been proposed in previous work to solve this problem for Latin script. In the first, dots/delayed strokes are stripped off in the preprocessing phase (before the training/recognition process) [18]. In the second method, the end of a word is connected to the dots/delayed strokes with a special connecting stroke. This stroke indicates that the pen is raised [6], [12] (as illustrated in Figure 2-10). In the last method, delayed strokes are treated as special characters in the alphabet. A word with delayed strokes is given alternative spellings to accommodate the delayed strokes being drawn in different orders [19]. These three methods are not adequate for the task of recognizing Arabic script. The first method cannot be employed, since the number and position of the dots is precisely the information that distinguishes one letter from another. Eliminating delayed strokes would cause considerable ambiguity, particularly when the letter's body is not written clearly. Furthermore, some Arabic letters have the same shape as a composition of other letters; for example, the letter (s) سـ has the same shape as the three letters (b + t + y) بتي written without dots. The second and third methods also cannot be employed, since Arabic words may contain many dots.
These methods would dramatically increase the hypothesis space, since each word would have to be represented in all its permutations. For example, the word حقيقية (Hqyqyp) ‘real’ contains 10 dots; thus, 10! representations would be required.
To solve the delayed-stroke problem in Arabic, a novel method called delayed-stroke projection has been introduced. Delayed strokes are written after the word-part body; thus, the first written point sequence is the word-part body. Every subsequent stroke that satisfies one of the following four conditions is considered a delayed stroke:
a. If the whole stroke is above or under the word-part body (e.g. Figure 5-11.a).
b. If the stroke is written after the minX (left) of the word-part body and it is a dot
(e.g. Figure 5-11.b).
c. If the stroke is written before the maxX (right) of the word-part body (e.g.
Figure 5-11.c).
d. If the stroke intersects with the word-part body (e.g. Figure 5-11.d).
Figure 5-11: Figure (a): five delayed strokes for word-part body 1. Figure (b): two delayed strokes for word-part body 3. Figure (c): three delayed strokes for word-part body 1. Figure (d): one delayed stroke for word-part body 1.
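The four conditions can be sketched with bounding-box tests. This is only an approximation of the thesis's geometry: the coordinate convention (y grows downward), the dot-size threshold, and the use of bounding-box intersection for condition (d) are all assumptions.

```java
// Sketch of conditions (a)-(d) above; the dot threshold and the bbox
// intersection test for (d) are assumptions, not the thesis's exact geometry.
public class DelayedStrokeDetector {
    // min/max of coordinate c (0 = x, 1 = y) over a stroke's points.
    static double min(double[][] s, int c) { double m = s[0][c]; for (double[] p : s) m = Math.min(m, p[c]); return m; }
    static double max(double[][] s, int c) { double m = s[0][c]; for (double[] p : s) m = Math.max(m, p[c]); return m; }

    // A dot is approximated as a stroke with a tiny bounding box.
    static boolean isDot(double[][] s) {
        return max(s, 0) - min(s, 0) < 5 && max(s, 1) - min(s, 1) < 5;
    }

    // True if `stroke` should be treated as a delayed stroke of `body`.
    static boolean isDelayed(double[][] stroke, double[][] body) {
        // (a) the whole stroke lies above or below the word-part body
        if (max(stroke, 1) < min(body, 1) || min(stroke, 1) > max(body, 1)) return true;
        // (b) it starts after (left of, for right-to-left writing) the body's minX and is a dot
        if (stroke[0][0] < min(body, 0) && isDot(stroke)) return true;
        // (c) it starts before (right of) the body's maxX
        if (stroke[0][0] > max(body, 0)) return true;
        // (d) its bounding box intersects the body's
        return !(max(stroke, 0) < min(body, 0) || min(stroke, 0) > max(body, 0));
    }
}
```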
5.6.1. Delayed-Stroke Projection
After determining the delayed strokes, the following procedure is applied:
• The first point of the delayed stroke (denoted q) is vertically projected onto the body; the projected point on the body is denoted p.
• A virtual vector is connected from p to q (as shown in Figure 5-12.b).
• An additional virtual vector is connected from the last point of the delayed stroke to p (as shown in Figure 5-12.b).
• These two virtual vectors are discretized to a predefined number of virtual points (as shown in Figure 5-12.c).
Figure 5-12: Illustration of the delayed-stroke projection, black points are the letter body, blue points are the delayed stroke, red points are virtual points.
Four special observation codes representing the virtual points are added to the 256 observation codes (described in Section 5.5):
• o = 256 represents a virtual point that belongs to a virtual vector directed up from the body to the delayed stroke, as shown in Figure 5-12.(b) – a bottom-up vector.
• o = 257 represents a virtual point that belongs to a virtual vector directed down from the delayed stroke to the body, as shown in Figure 5-12.(b) – a top-down vector.
• o = 258 represents a virtual point that belongs to a virtual vector directed down from the body to the delayed stroke.
• o = 259 represents a virtual point that belongs to a virtual vector directed up from the delayed stroke to the body.
Figure 5-13: Illustration of converting a point sequence with delayed strokes into an observation sequence
Figure 5-13 illustrates the flow of incorporating the delayed strokes into the observation sequence. Figure (a) is the preprocessed point sequence representing the letter ن. Figure (b) illustrates how the virtual vectors are projected from the body to the dot and from the dot to the body. Figure (c) illustrates how the virtual vectors are discretized to virtual points (in red) – note that in this example, each red point represents two points, one for the bottom-up vector and one for the top-down vector. Finally, Figure (d) illustrates how the whole observation sequence for this letter is built: o11 to o14 with observation code = 256, and o16 to o19 with observation code = 257.
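The projection steps above can be sketched as follows. The helper interpolates the virtual points between the projected point p and the stroke endpoint, and wraps a delayed stroke's own observations between the two runs of virtual-point codes, as in Figure 5-13 where the 256-codes precede the dot's observation and the 257-codes follow it. The names and the above/below flag are mine, not the thesis's.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the delayed-stroke projection (Section 5.6.1); names and the
// handling of the above/below cases are assumptions based on the text.
public class DelayedStrokeProjection {
    static final int BOTTOM_UP = 256;  // body -> stroke, upward
    static final int TOP_DOWN  = 257;  // stroke -> body, downward
    static final int BODY_DOWN = 258;  // body -> stroke, downward
    static final int STROKE_UP = 259;  // stroke -> body, upward

    // n evenly spaced virtual points strictly between p and q.
    static double[][] virtualPoints(double[] p, double[] q, int n) {
        double[][] pts = new double[n][2];
        for (int i = 1; i <= n; i++) {
            double t = (double) i / (n + 1);
            pts[i - 1][0] = p[0] + t * (q[0] - p[0]);
            pts[i - 1][1] = p[1] + t * (q[1] - p[1]);
        }
        return pts;
    }

    // Wrap the delayed stroke's own observations between the observation
    // codes of the two virtual vectors; `above` = stroke lies above the body.
    static List<Integer> withVirtualCodes(List<Integer> strokeObs, boolean above, int n) {
        int toStroke = above ? BOTTOM_UP : BODY_DOWN;  // p -> q vector
        int toBody   = above ? TOP_DOWN  : STROKE_UP;  // last point -> p vector
        List<Integer> obs = new ArrayList<>();
        for (int i = 0; i < n; i++) obs.add(toStroke);
        obs.addAll(strokeObs);
        for (int i = 0; i < n; i++) obs.add(toBody);
        return obs;
    }
}
```

With n = 4 virtual points and a one-observation dot, this yields the nine-observation pattern of Figure 5-13 (four 256s, the dot's observation, four 257s).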
Every word part defines one observation sequence computed from a feature-vector
sequence. For example, Figure 5-14 defines five observation sequences, and Figure 5-15
defines four observation sequences.
Figure 5-14: Delayed-stroke projection in the handwritten word الاستشراق (AlAst$rAq) ‘orientalism’
Figure 5-15: Delayed-stroke projection in the handwritten word الانطباع (AlAnTbAE) ‘the impression’
In both Figure 5-14 and Figure 5-15: Figure (a) illustrates the original word which has
been captured by a digital tablet in our system; Figure (b) is the preprocessed point
sequences of the word in Figure (a) – red points in Figure (b) are the added virtual points.
We have seen in this chapter that the feature vector extracted from each data point is discretized to one of [1…260] observations. The reason for selecting such a coarse discretization is the lack of training samples for online Arabic handwriting systems. A larger observation domain would require many more samples to train the Hidden Markov Model utilized in this work, which is described in the next chapter.
Chapter 6
The Recognition Framework

Our recognition framework uses discrete Hidden Markov Models (HMMs) to
represent letters which are embedded in a grammar network that represents the word-part
dictionary. The segmentation and recognition of handwritten word parts are performed
simultaneously in an integrated process, similar to [6], [12], [18] and [19]. In this chapter we describe the recognition algorithms together with the framework models/architecture. The first section describes the recognition framework models, followed
by two word-recognition algorithms, an optimized grammar network, and the dictionary
database structure. Finally, we discuss the writing style support and model training.
6.1. Framework Models
The next three sections describe the basic models utilized in this work: letter models,
word-part models and word models.
6.1.1. Letter Models
Left-to-right HMM without state skipping (aij = 0 for j ≠ i+1 and j ≠ i; πi = 0 for i > 1),
as shown in Figure 6-1, has been adopted to model each letter. We have selected this
simple topology because it has been successfully used in speech recognition, and there is insufficient evidence that more complex topologies would necessarily lead to better recognition performance [19]. Furthermore, this topology can effectively model the
time-dependent property in the handwriting observation sequence. The number of states
in a letter model is selected automatically based on the training set for this letter. More
details about this selection are described in Section 6.7.
Figure 6-1: left-to-right HMM
Each Arabic letter has two or four shapes depending on its position in the word (see Table 1-2). We have chosen to treat these letter shapes as different letters. For example, associated with the letter (h) are four letter models, for ه, هـ, ـهـ, and ـه, corresponding to h's isolated, initial, medial, and final shapes respectively.
6.1.2. Word-Part Models
An Arabic word may contain several word parts (see Section 1.4.2 for more details).
Each word part is written with a single continuous stroke for its whole body, followed by a number of delayed strokes. Thus, the number of word parts in a word forms a lower bound on the number of pen lifts.
A model for a word part consisting of the letters L1, L2, …, Ln is built by concatenating
the letter-model Li (MLi) with the letter-model Li+1 (for 1 ≤ i <n) by linking the last node
of MLi to the first node of MLi+1 with a null transition (as shown in Figure 6-2).
Figure 6-2: Word-part model consisting of n letter models
6.1.3. Word Models
To build a word model for a given word of k word parts, we simply build k word-part
models, one for each word part, as shown in Figure 6-3.
Figure 6-3: Illustration of the word model for the word جامعة (jAmEap) ‘university’: a word-part model for جا (two letters: جـ and ـا) and a word-part model for معة (three letters: مـ, ـعـ, and ـة).
6.2. Word and Word-Part Dictionaries
The dictionary is the list of words representing the domain of the search used in the
recognition task. The dictionary of Arabic words D could be divided into sub-
dictionaries, D = {D1, D2, …, Dn}, where, Di is a dictionary of all the words that consist
of i word parts.
Di = {w ∈D | w consists of i word parts}
Example 6-1: Consider the following word-dictionary:
D = {�� �}`د , �}V , روا"� , � , � ا �aن , ����� , هz , ج�� {ال�Vي ,
D is divided into the following sub-dictionaries:
=> D1 = {�� � , V{� , zه}
D2 = {د`{� , � {����� , ج��
D3 = {ن�a ي , اVال�}
D4 = { �روا" }
We refer to the word-part dictionary WPDi,j as the list of word parts placed at index j (starting from the right) of the words in Di.
Example 6-2:
D3 = {ب ، ر �ن ، �ري ، ��رق ، ��وق� is a {��دي ، �دي ، ��رس ، ��دي ، !���ن ، "���ن ، "��رن ، "��ون ، ر
word dictionary consists of words with three word parts.
=> WPD3, 1 = {ر ،��" ، ��! ، �� ، � ، ��}
WPD3, 2 = {� ،و ، � {د ، ر،
WPD3, 3 = {ي ، س ، ن ، ق}
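Since every word part except possibly the last ends at one of the six non-connecting letters, both the partition into sub-dictionaries Di and the word-part dictionaries WPDi,j can be derived mechanically from the word text. The following is a sketch under that assumption, with the set of non-connecting letters (ا, د, ذ, ر, ز, و; hamza and madda variants omitted) passed as a parameter:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the word-part splitting and the D_i / WPD_{i,j} constructions
// of Section 6.2; the treatment of alef variants is simplified.
public class WordPartDictionary {
    // The six letters that do not connect to a following letter.
    static final String ARABIC_DISCONNECTIVE = "ادذرزو";

    // Split a word into word parts: a part ends at a disconnective letter.
    static List<String> wordParts(String word, String disconnective) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : word.toCharArray()) {
            cur.append(c);
            if (disconnective.indexOf(c) >= 0) { parts.add(cur.toString()); cur.setLength(0); }
        }
        if (cur.length() > 0) parts.add(cur.toString());
        return parts;
    }

    // D_i: the words of D consisting of exactly i word parts.
    static List<String> subDictionary(Collection<String> d, int i, String disc) {
        List<String> di = new ArrayList<>();
        for (String w : d) if (wordParts(w, disc).size() == i) di.add(w);
        return di;
    }

    // WPD_{i,j}: the word parts at index j (1-based, from the start of the word).
    static Set<String> wpd(Collection<String> di, int j, String disc) {
        Set<String> s = new LinkedHashSet<>();
        for (String w : di) s.add(wordParts(w, disc).get(j - 1));
        return s;
    }
}
```

For example, مدرسة (mdrsp) splits into the three word parts مد | ر | سة.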
6.3. Arabic-Word Recognizer
The Viterbi algorithm not only gives the optimal path corresponding to a given observation sequence; it also gives an accumulated likelihood score and the state segmentation along this optimal path (see Chapter 3).
We have seen in Chapter 5 how to compute the observation sequences Os = [O1, O2, …,
Ok] from a given handwritten Arabic word, where Oi = [oi,1, oi,2,…,oi,m] is an observation
sequence calculated from the handwritten word-part i. Based on the word model
described in Section 6.1.3 and the Viterbi algorithm, we introduce two algorithms to
recognize an Arabic word given its observation sequences (Os), and a word-dictionary D.
The first algorithm is described in the next section; the second, an optimized version of the first, is described in Section 6.3.2.
6.3.1. Word Recognizer - Algorithm I
Algorithm 6.1 (algorithm I) accepts Os = [O1, O2, …, Ok] (k observation sequences)
such that Oi corresponds to word-part i, and searches for the word in Dk that maximizes
the probability of observing Os.
Algorithm 6.1:
0. WordRecognizer-I (Os=[O1, O2, …, Ok], D) : Word {
1. max_prob = 0;
2. max_word = null;
3. for each w ∈ Dk {
4. p = 1;
5. for (i = 1 to k) {
6. M = buildWordPartModel(w.getWordPartText(i));
7. p = p * M.viterbi (Oi);
8. }
9. if (p > max_prob) {
10. max_prob = p;
11. max_word = w;
12. }
13. }
14. return max_word;
15. }
• buildWordPartModel(word-part) is a method that returns the word-part model of word-part, as described in Section 6.1.2.
• M.viterbi(O): the Viterbi algorithm, which returns the maximum probability of observing an observation sequence O given the model M.
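The call M.viterbi(Oi) can be sketched as a standard discrete-HMM Viterbi computation. Plain probabilities are used here for clarity; a practical implementation would work in log space, since the product underflows on long observation sequences.

```java
// Minimal discrete-HMM Viterbi (maximum path probability), a sketch of
// the M.viterbi(O) call above; log-space scaling is omitted for clarity.
public class DiscreteHmm {
    final double[][] a;  // a[i][j]: transition probability from state i to j
    final double[][] b;  // b[i][o]: probability of emitting symbol o in state i
    final double[] pi;   // initial state distribution

    DiscreteHmm(double[][] a, double[][] b, double[] pi) {
        this.a = a; this.b = b; this.pi = pi;
    }

    // Maximum probability of `obs` along any single state path.
    double viterbi(int[] obs) {
        int n = pi.length;
        double[] delta = new double[n];
        for (int s = 0; s < n; s++) delta[s] = pi[s] * b[s][obs[0]];
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double best = 0.0;
                for (int i = 0; i < n; i++) best = Math.max(best, delta[i] * a[i][j]);
                next[j] = best * b[j][obs[t]];
            }
            delta = next;
        }
        double best = 0.0;
        for (double d : delta) best = Math.max(best, d);
        return best;
    }
}
```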
Algorithm I is not only simple to implement, but it also does not require large
amounts of memory, since the word model for the iterated word is always created
dynamically and there is no need to keep it in memory. Note that updating the word dictionary is independent of the algorithm and its associated data structures.
The main limitation of algorithm I is its computational redundancy. Word parts are shared by different words and therefore should not be computed more than once (using the Viterbi algorithm – line 7). For example, in the Arabic words مدرسة (mdrsp) ‘school’ and مدرب (mdrb) ‘trainer’, the first word part مد and the second ر are the same in both words, and in the same positions in the two words, index = 0 and index = 1 respectively. This redundancy also extends to shared prefixes and suffixes of word parts, which are not taken into account and thus are recomputed. For example, the Arabic words محمد (mHmd) ‘Mohammad’ and مجمد (mjmd) ‘frozen’ have a shared prefix مـ and also a shared suffix ـمد.
6.3.2. Word Recognizer - Algorithm II
We can overcome the limitation of algorithm I by caching the computed Viterbi word-part probabilities in a map f: key → value, where
key = <word-part text, the index of the word part in the word (index = i)>
value = the maximum probability of observing Oi given the model M
(Oi is the observation sequence computed from the handwritten word-part i).
This word-part probability map may be implemented by a hash table or a balanced binary search tree. For example, consider a sub-dictionary containing three words, D3 = {�ر�V� ، �Vرب ، ر����}. In processing the first word, the word parts مد and ر are computed, and they will not be recomputed for the second word. The word part �� is computed in the
first word, and it will not be computed for the third word. By applying this optimization
on algorithm I, we obtain algorithm 6.2 (algorithm II) described next.
Algorithm 6.2:
0. WordRecognizer-II (Os=[O1, O2, …, Ok], D) : Word {
1. max_word_prob = 0; max_word = null;
2. f[] = new Array[k] of Map;
3. for each w ∈ Dk {
4. p = 1;
5. for (i = 1 to k) {
6. word_part_text = w.getWordPartText(i);
7. if (f[i].find(word_part_text) == EXIST) {
8. word_part_prob = f[i].get(word_part_text);
9. } else {
10. M = buildWordPartModel(word_part_text);
11. word_part_prob = M.viterbi(Oi);
12. f[i].put(word_part_text, word_part_prob);
13. }
14. p = p * word_part_prob;
15. if (p < max_word_prob)
16. break;
17. }
18. if (p > max_word_prob) {
19. max_word_prob = p;
20. max_word = w;
21. }
22. }
23. return max_word;
24. }
To further optimize algorithm II, we added lines 15 and 16 to terminate the loop (in line 5) when the probability is no longer relevant (i.e., already less than max_word_prob).
6.4. Optimized Grammar Network
Algorithm II does not address the second limitation of algorithm I, namely that it
recomputes uncached shared prefixes and suffixes of word parts. This section describes a
new model network developed to overcome this limitation.
6.4.1. Optimized Word-Part Network
The left-to-right HMM letter models (described in Section 6.1) are embedded in a
grammar network which represents the word-part dictionary. This network has been
optimized such that all shared suffixes are grouped.
Figure 6-4: Optimized grammar network implementing a word-part dictionary of k word parts, with all shared suffixes grouped and each node replaced by its corresponding letter model
Figure 6-4 gives the simplified diagram of the grammar network representing a word-
part dictionary. In this diagram, each node represents a letter shape, and each path from
the start node to a leaf corresponds to a unique word part. Each leaf contains the word-
part text WPi (where, 1 ≤ i ≤ k) to represent the path from the start node to this leaf. Since
this network is a tree, for each leaf there is exactly one path that starts from the root (start
node) and ends with this leaf. We shall refer to this network as a word-part network
(WPN). WPN* refers to a WPN in which each letter node is replaced by its corresponding letter model (described in Section 6.1).
The WPN for a given word-part dictionary is built simply by algorithm 6.3, as illustrated in Figure 6-5.
Algorithm 6.3:
0. wordPartDictionaryToWPN(WPD) : WPN {
1. root = new Node(“start-node”);
2. for each wp in WPD {
3. addWordPart(wp, root);
4. }
5. return root;
6. }
where,
0. addWordPart(wp = [L1, L2, …, Ln], root) {
1. curr_node = root;
2. for (i = n down to 1) {
3. if (curr_node contains a child C with letter Li) {
4. curr_node = C;
5. } else {
6. C = new Node(Li);
7. curr_node.addChild(C);
8. curr_node = C;
9. }
10. }
11. curr_node.setWordPartText(description(wp));
12. }
Figure 6-5: Illustration of running wordPartDictionaryToWPN({��, ��&, �ر ,'� })
Either the word-part suffix or the prefix could be used to build the word-part network. However, the decision to use the suffix rather than the prefix of word parts rests on the fact that Arabic word parts (except possibly the last in a word) always end with one of the six disconnective letters (see Section 1.4.2). This fact guarantees that at least one letter is shared in each word part, which leads to a reduction in the size of the WPN.
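Algorithm 6.3 is a trie built over reversed word parts, so shared suffixes share nodes. A compact sketch follows (the node layout and the method names are mine, not the thesis's):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Sketch of the WPN construction (algorithm 6.3): a trie built from the
// LAST letter of each word part backwards, so shared suffixes share nodes.
public class WordPartNetwork {
    final Map<Character, WordPartNetwork> children = new HashMap<>();
    String wordPartText;  // set on the leaf reached by the full word part

    void addWordPart(String wp) {
        WordPartNetwork node = this;
        for (int i = wp.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(wp.charAt(i), c -> new WordPartNetwork());
        }
        node.wordPartText = wp;
    }

    static WordPartNetwork build(Collection<String> wpd) {
        WordPartNetwork root = new WordPartNetwork();  // the start node
        for (String wp : wpd) root.addWordPart(wp);
        return root;
    }

    int nodeCount() {
        int n = 1;
        for (WordPartNetwork c : children.values()) n += c.nodeCount();
        return n;
    }
}
```

For example, two transliterated word parts "mad" and "rad" share the suffix path for 'd' and 'a', giving 5 nodes instead of the 7 needed without sharing.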
6.4.2. Word Network
The word network of a sub-dictionary Dk (denoted by WNk) is an array of word-part networks, where WNk[j] = the word-part network (WPN) of WPDk,j, constructed by the wordPartDictionaryToWPN method described in the previous section, for 1 ≤ j ≤ k (as shown in Figure 6-6). We shall refer to WNk*[j] as the WPN* of WPDk,j, for 1 ≤ j ≤ k.
Figure 6-6: Illustration of the word network of the sub-dictionary D3 described in Example 6-2
6.4.3. Word-Dictionary Database Architecture
The word-dictionary database (WDB) represents the data structure utilized in the
search procedure performed in the recognition task. WDB is represented by an array of
word networks (WN), where WDB[i] = WNi (1 ≤ i ≤ k), and k is the maximum number of word parts over all words in the word dictionary.
Figure 6-7 shows the word database (WDB) structure.
Figure 6-7: The Word-dictionary architecture
6.5. Optimized Recognizer
In this section, we introduce a new Arabic word recognizer using the database
architecture described in the previous section and the Viterbi algorithm. Given the observation sequences Os = [O1, O2, …, Ok] of a handwritten word, where Oi = [oi,1, oi,2,…,oi,Ti] is the observation sequence of the handwritten word-part i, the recognition task is to find the word W = [wp1, wp2, …, wpk] (wpi is word-part i in W) in a given sub-dictionary Dk that maximizes the following probability:
P(W | Os) = ∏i=1..k P(wpi | Oi)
where,
P(wpi | Oi) = P(Oi | wpi) P(wpi) / P(Oi).
Since P(Oi) is the same for all word parts and assuming that all word parts in the dictionary occur with equal probability, the problem is reduced to maximizing P(Oi | wpi) for all 1 ≤ i ≤ k, which can be computed efficiently by the Viterbi algorithm given WNk [i]. The
Viterbi algorithm is used to compute the probability of all paths for a given word-part network and an observation sequence O. The Viterbi algorithm computes δt(S), which refers to the best score (highest probability) along a single path at time t, accounting for the first t observations and ending in state S (see Chapter 3 for more details). In particular, we are only interested in the accumulated likelihood scores of the leaf states at time Ti in WNk* [i]. Therefore,
P(Oi | wp) = δTi(q), where q is a leaf state in WPNi* and q.wordPartText = wp.
We select the word part (wp) in WPDi that maximizes P(Oi | wp).
6.5.1. Word Recognizer - Algorithm III
Algorithm 6.4 accepts an observation sequence Oi = [o1,i, o2,i,…,oTi,i] generated from a
handwritten word-part i, and a WPN* as an input, and returns a word-part probability
map:
g: key → value, where,
key = <the word-part text of a leaf S in WPN*>
value = δTi(S)
The word-part probability map (g) contains a key for each leaf in WPN*. The word-part
text of the leaf state S can be accessed by the method: S.getWordPartText().
Using algorithm 6.4, an optimized Arabic word recognizer is described in algorithm
6.5 (algorithm III).
Algorithm 6.4:
0. computeWPProbabilityMap(Oi = [o1,i, o2,i,…,oT,i], WPN*) : Map {
1. Map wp_to_probability_map;
2. δ = WPN*.viterbi(Oi);
3. for each leaf S in WPN* {
4. p = δT(S)
5. wp_to_probability_map.put(S.getWordPartText(), p);
6. }
7. return wp_to_probability_map;
8. }
Algorithm 6.5:
0. WordRecognizer-III (Os = [O1, O2, …, Ok], WDB) : Word {
1. WN* = WDB[k];
2. for (i = 1 to k) {
3. wp_to_probability_maps[i] = computeWPProbabilityMap(Oi, WN*[i]);
4. }
5. max_prob = 0;
6. for each w ∈ Dk {
7. p = 1;
8. for (i = 1 to k) {
9. wp_prob = wp_to_probability_maps[i].get(w.part(i));
10. p = p * wp_prob;
11. }
12. if (p > max_prob) {
13. max_prob = p;
14. max_word = w;
15. }
16. }
17. return max_word;
18. }
For each observation sequence Oi (where 1 ≤ i ≤ k), the word-part probability map is built in lines 2-4. These maps are stored in the array wp_to_probability_maps. Lines 6-15 search the sub-dictionary Dk for the word that maximizes the probability of observing Os.
6.6. Support for Writing Style
Supporting letter styles is a significant requirement for writer-independent systems
(see Section 1.2). This section describes how writing-style support is achieved by modifying the WPN.
Writing styles of letters are called letter classes, where each letter class refers to a
letter with a specific writing style, as illustrated in Figure 6-8. The number of classes assigned to each letter is decided during the training process, discussed in Section 6.7.
Figure 6-8: Illustration of four letter classes of the medial letter shape ـهـ (h)
In this work, writing styles are supported by assigning a letter model (described in
Section 6.1.1) for every letter class, denoted as letter-class model, similar to [18], [19]
and [21]. For example, four letter-class models are assigned for the four letter classes in
Figure 6-8. All letter-class models for the same letter are grouped in one model that has
one “input” and one “output”, called multiple-letter-class model. The input is a null
transition that connects the first states of all the letter-class models in this group.
Similarly, the output is a null transition that connects all the last states in this group, as
shown in Figure 6-9.
Figure 6-9: A multiple-letter-class model containing n letter-class models
The multiple-letter-class model is embedded in the word-part network (WPN). Instead
of replacing each letter node by its corresponding letter model (as shown in Figure 6-4),
now each letter node in WPN is replaced by its corresponding multiple-letter-class model,
as shown in Figure 6-10. After applying this change to the WPN, we can use algorithm 6.5 without any modification to recognize handwritten words written in different styles.
Figure 6-10: The word-part network with each node replaced by a multiple-letter-class model
6.7. Model Training
The goal of the training process is to train the HMM parameters, λ = (A, B, π) for each
letter class. The Baum-Welch training algorithm is used for this task (for details about
Baum-Welch algorithm see Section 3.3.3). To train the models, we initialize the model
parameters as follows:
The initial state distribution π = {πi} is initialized to:
• π1 = 1.0 and πi = 0 for 1 < i ≤ N (where N is the number of states in the model)
The transition probability matrix A = {ai,j} is initialized to:
• ai,i = 0.5 and ai,i+1 = 0.5 for i < N
• ai,j = 0.0 otherwise (i.e., for j ≠ i and j ≠ i+1)
• aN,N = 1.0
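The initialization above can be written down directly. The boolean mask below encodes the global geometric knowledge (e.g. marking the dot/virtual-point observation codes impossible for a dot-free letter); the mask representation is my own device, not the thesis's:

```java
// Sketch of the HMM parameter initialization of Section 6.7.
public class LetterModelInit {
    // pi: all mass on the first state of the left-to-right model.
    static double[] initPi(int n) {
        double[] pi = new double[n];
        pi[0] = 1.0;
        return pi;
    }

    // A: self-loop and advance with probability 0.5 each; last state absorbs.
    static double[][] initA(int n) {
        double[][] a = new double[n][n];
        for (int i = 0; i < n - 1; i++) { a[i][i] = 0.5; a[i][i + 1] = 0.5; }
        a[n - 1][n - 1] = 1.0;
        return a;
    }

    // B: uniform over the symbols not ruled out by the letter's geometry.
    static double[][] initB(int n, int m, boolean[] impossible) {
        int allowed = 0;
        for (boolean imp : impossible) if (!imp) allowed++;
        double[][] b = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int o = 0; o < m; o++)
                b[i][o] = impossible[o] ? 0.0 : 1.0 / allowed;
        return b;
    }
}
```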
Good initial estimates for the observation probability matrix B = {bi(oj)} are helpful for the training task [14]. Since some global geometric information about a letter is known before running the training process, the observation symbols oj that are impossible for that letter are eliminated by assigning bi(oj) = 0.0, and the remaining observation symbols are initialized to reflect a uniform distribution. Global geometric information that can be extracted from a letter includes:
• A letter without dots leads to a direct elimination of the observations oj corresponding to dots in its letter model – bi(oj) = 0.
• Loop-free letters (similarly eliminating the is-loop observations).
6.7.1. Training Problems
Two difficult problems need to be solved for the training process. First, the system must automatically determine how many styles there are for each letter – in other words, how many letter-class models are required to represent the training letter samples. The second problem is how to determine the number of states required for each letter-class model. Both of these problems are addressed as follows:
• Determining The Number of Letter Classes
The training samples for a specific letter are written in different styles by different users. The α-angles of each sample are discretized to 16 directions (see Section 5.5), denoted direction codes. To determine the number of letter classes for a specific letter, we first cluster the samples that have similar direction codes. Afterwards, we use the number of resulting clusters as the number of classes needed to represent this letter model. In addition, the clustering technique determines which sample will be used to train which class model (discussed in Section 6.7.2). The agglomerative (bottom-up) clustering method based on [21] is used for this task.
To use this type of clustering algorithm, a distance function between two samples must first be defined. The distance function chosen is the minimal edit distance between the two direction-code sequences, described as follows. Given two samples A and B corresponding to the same letter:
A = (a1, a2, …, aN), where ai is a direction code of A.
B = (b1, b2, …, bM), where bj is a direction code of B.
The distance between A and B is defined, based on [21], as:
D(A, B) = D(aN, bM), where
D(ai, bj) = min { D(ai-1, bj-1) + δ(ai, bj) × SP,
D(ai-1, bj) + DP,
D(ai, bj-1) + IP }
where SP, DP, and IP are the substitution, deletion, and insertion penalties respectively, and
δ(ai, bj) = 0 if d(ai, bj) < X; δ(ai, bj) = 1 otherwise,
where d(ai, bj) is the directional code difference between the direction codes ai and bj, and X is a predefined threshold chosen empirically.
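The recurrence above is a weighted edit distance over direction-code sequences. A sketch follows; the circular 16-direction code difference is an assumption about d(ai, bj), and SP, DP, IP, and X are the empirically chosen parameters from the text:

```java
// Sketch of the clustering distance of Section 6.7.1: edit distance over
// 16-direction code sequences with a thresholded substitution cost.
public class DirectionCodeDistance {
    // Circular difference between two of the 16 direction codes (assumed).
    static int codeDiff(int a, int b) {
        int d = Math.abs(a - b) % 16;
        return Math.min(d, 16 - d);
    }

    // D(A, B) with substitution/deletion/insertion penalties sp/dp/ip and
    // match threshold x (delta = 0 when the codes differ by less than x).
    static int distance(int[] a, int[] b, int sp, int dp, int ip, int x) {
        int n = a.length, m = b.length;
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = d[i - 1][0] + dp;
        for (int j = 1; j <= m; j++) d[0][j] = d[0][j - 1] + ip;
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++) {
                int subst = d[i - 1][j - 1] + (codeDiff(a[i - 1], b[j - 1]) < x ? 0 : sp);
                d[i][j] = Math.min(subst, Math.min(d[i - 1][j] + dp, d[i][j - 1] + ip));
            }
        return d[n][m];
    }
}
```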
• Number of HMM States
The number of HMM states in a letter-class model is an important parameter for recognition performance. Previous researchers have shown that assigning more states to letter models with more complicated shapes than to those with simpler shapes leads to better recognition results than assigning the same number of states to all letter models [19]. The number of HMM states in [19] was selected empirically. To automate this selection, algorithm 6.6 is introduced to test all the possible values of the number of states.
According to our observations, the number of states varies from 3 to 10. Our system assigned 10 states to the initial letter shape of (h) هـ, and 3 states to the isolated letter shape of (A) ا. The letter-model training algorithm 6.6 is described as follows:
Algorithm 6.6
Input: the training observation sequences of a letter class LC.
Output: optimal trained model represents LC.
0. ComputeOptimalModel(train_obs_seqs, LC) {
1. optimal_model = null;
2. max_prob = 0;
3. for n = 3 to 10 {
4. M = build a letter model consists of n states;
5. M.initialize(LC); //initialize the model as described above
6. M.trainModel(train-obs-seqs);
7. p = M.computeLikelihoodMean(train_obs_seqs);
8. if (p >= max_prob) {
9. optimal_model = M;
10. max_prob = p;
11. }
12. }
13. return optimal_model;
14. }
Algorithm 6.6 constructs a letter model with n ∈ {3, 4, …, 10} states and trains it using the corresponding training samples. The algorithm then computes the mean recognition likelihood over the same training samples using the Viterbi algorithm, given the constructed model. This process is repeated for every possible number of states n ∈ {3, 4, …, 10}, and the model that gives the maximum likelihood mean is selected to represent the letter class.
6.7.2. The Training Process
The training process is divided into four stages, as described in [19]: supervised training data collection, training of the letter-class models, semi-unsupervised training data collection, and final training of the letter-class models, discussed in turn as follows:
I. Supervised training data collection: In this stage the trainer (writer) is asked to handwrite (using a digital tablet) a list of predetermined words. Then, he/she is asked to split each word manually into its component letters, such that all the delayed strokes lie between the two split lines. The system indicates to the writer which two letters should be split, as shown in Figure 6-11. The words are split into letter samples, and every sample is tested to determine whether it satisfies predetermined letter rules (e.g. the number of dots placed above or under the letter body) in order to avoid improper samples. The letter and the sample are stored in the trained-sample database.
Figure 6-11: A screen shot of the trainer system; the green vertical lines are the split lines. The dashed lines are automatically given by the system to avoid word-part overlapping.
II. The training of letter-class models: After collecting the training samples from
stage-I, algorithm 6.7 is performed to obtain all the trained models for each letter class.
III. Semi-unsupervised training data collection: The collection of the training data in stage I is a time-consuming process, particularly for a large training data set, since the trainer is asked to "split" each word manually. Therefore, another stage is introduced to accelerate this process, based mainly on the output models of stage II, which are used to automatically estimate the split points between every two letters of a handwritten word, given the word text. For each word part, the multiple-letter-class models (described in Section 6.6) corresponding to its letters are connected into one chain model, such that the output of the multiple-letter-class model of letter i is linked to the input of multiple-letter-class model i+1, as shown in Figure 6-12. The Viterbi algorithm is then used to compute the state segmentation along the optimal path in this chain model, as shown in Figure 6-13. Running the Viterbi algorithm on the chain model, given the observation sequence (O = [o1, o2, …, oT]) of a word part, yields for each letter Li the subsequence Oi = [oi,1, oi,2,…, oi,k] ⊆ O that describes Li best. In addition, it gives the correspondence of the optimal state sequence in the chain model to Oi. This state segmentation is used to segment the letters of a word part, as shown in Figure 6-13 and Figure 6-14. Note that
Algorithm 6.7:
for each letter's sample set SL {
1. Perform the preprocessing phase (Chapter 4) on all the samples in SL.
2. Perform the clustering method described in Section 6.7.1
=> Clusters(L) = {c1, c2,…, ck}
3. Perform the feature-extraction phase (Chapter 5).
4. Run algorithm 6.6 on each ci in Clusters(L) to get a trained model for each letter class.
}
this stage does not produce new letter classes; it is only used to improve the letter-class models produced in stage I.
IV. Final training of the letter-class models: Using the training data processed by stage III together with the original training data (used in stage II), each letter-class model is constructed and trained again by running algorithm 6.6 to obtain new trained models.
Figure 6-12: Multiple-letter-class model chain of m word parts
Figure 6-13: Screen shot of our automatic letter splitting system
Figure 6-13 is a screen shot of the word-trainer system of stage III; the red text was written by a user, and the green lines are the split lines estimated automatically. The user is asked to decide whether the estimation is appropriate.
Figure 6-14: Illustration of letter-class model chain
Figure 6-14 illustrates the multiple-letter-class model chain of the word written in Figure 6-13. From left to right: the first letter has three letter-class models, the second letter has two letter-class models, the third letter has one letter-class model, and the fourth letter has three letter-class models. The orange states form the optimal path returned by the Viterbi algorithm for the word written in Figure 6-13.
Chapter 7
Implementation

An online Arabic handwriting recognition system has been implemented during the two
years of this research. The code was written in Java 1.4 (Sun Microsystems).

The system includes the following applications:
• Word-sample collection for training: This application was built to
collect the word samples that are provided to the training process. It utilizes the
supervised training-data collection method described in Section 6.7.2. An XML
file is created for each trainer containing his/her personal information and all
word samples he or she has written. For each word sample, the 2D point
sequences are saved along with the word-sample text and the split points.
• Letter-Sample Tester: After the word samples are collected in the previous step,
each letter sample is tested using this application to automatically verify that it
satisfies basic rules. In addition, this application allows manual testing of
each letter.
• Trainer: After each sample is tested, the preprocessing (Chapter 4) and feature-
extraction (Chapter 5) phases are applied to each letter sample, and then
algorithm 6.6 is used to construct and train the letter models. Each letter-model
object is serialized to a file utilizing Java object serialization.
• Word-sample collection for testing: This application allows users
to record words from a given dictionary. It saves the point sequences for each
word along with its text in an XML file for each user. These samples are used to
report the recognizer's performance and recognition results.
• Arabic Word Recognizer: The preprocessing (Chapter 4), feature extraction
(Chapter 5), and algorithm 6.2 (algorithm-II) are implemented in this
application. The preprocessing and feature-extraction phases are applied to each
word sample collected in the previous step, and then algorithm-II is run. If
the recognition result matches the associated word text of a sample, the
word is saved in a success-log file; otherwise it is saved in an error-log
file. The recognizer architecture is shown in Figure 7-3.
• Arabic Word Recognizer with GUI: This application extends the previous
one with a GUI (Graphical User Interface) that enables any user to use
the system, as shown in Figure 7-1 and Figure 7-2.
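The Trainer's persistence step (serializing each letter-model object with Java object serialization, as mentioned above) might look roughly like the sketch below. The `LetterModel` fields and names are illustrative assumptions, and the sketch uses current Java idioms (try-with-resources) rather than the Java 1.4 of the original system.

```java
import java.io.*;

/** Illustrative sketch of persisting trained letter models with Java
 *  object serialization, as the Trainer application does. The
 *  LetterModel fields are hypothetical. */
public class ModelStore {

    /** A trained letter-class model; Serializable so it can be written
     *  to disk by the Trainer and reloaded by the recognizer. */
    public static class LetterModel implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String letterClass;     // e.g. "alef-initial-c1" (hypothetical)
        public final double[][] transitions; // HMM transition matrix
        public LetterModel(String letterClass, double[][] transitions) {
            this.letterClass = letterClass;
            this.transitions = transitions;
        }
    }

    /** Writes one letter model to its own file. */
    public static void save(LetterModel m, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(m);
        }
    }

    /** Reads a letter model back for use by the recognizer. */
    public static LetterModel load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (LetterModel) in.readObject();
        }
    }
}
```

A save/load round trip preserves the model's fields, which is all the recognizer needs to rebuild its model set at start-up.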
Figure 7-1: Screen shot 1 of the handwriting recognition system
Figure 7-2: Screen shot 2 of the handwriting recognition system
Figure 7-1 and Figure 7-2 are screen shots of the “Online Arabic Handwriting
Recognition” system developed in this work. The yellow background is the area that
users write in; the red printed sentences are the highest-probability matches for the
handwritten words.
Figure 7-3: The global architecture of our system
[Figure 7-3 depicts the data flow of the Online Arabic Handwriting Recognition System:
the input point sequences of a handwritten word, [[p1,1, p1,2, …, p1,n1], …, [pk,1, pk,2, …, pk,nk]],
pass through the Preprocessor (Chapter 4), the Delayed-Stroke Processor and the Features
Extractor (Chapter 5), producing the observation sequences [[o1,1, o1,2, …, o1,m1], …,
[ol,1, ol,2, …, ol,ml]]; the Sub-Dictionary Classifier (Chapter 6) selects a sub-dictionary
from the Word Dictionary DB, and the Word Recognizer (Chapter 6) outputs the recognized
word.]
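Read as code, the architecture of Figure 7-3 is a linear composition of stages (preprocess, extract features, classify sub-dictionary, recognize). The interface sketch below is purely illustrative; none of these type or method names come from the thesis.

```java
/** Hypothetical sketch of the recognizer pipeline of Figure 7-3 as a
 *  composition of typed stages. Stage names follow the chapters; the
 *  generic machinery is illustrative only. */
public class Pipeline {

    /** One processing stage: point sequences in, something out. */
    interface Stage<I, O> {
        O apply(I input);
    }

    /** Chains two stages into one, so preprocess -> features ->
     *  recognize composes into a single end-to-end function. */
    static <A, B, C> Stage<A, C> then(Stage<A, B> first, Stage<B, C> second) {
        return input -> second.apply(first.apply(input));
    }
}
```

The real stages would map point sequences to observation sequences and finally to a word; the composition mechanism is the same regardless of the payload types.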
Chapter 8
Results

For English, a publicly available corpus with abundant data samples for training and
testing online handwriting recognition systems already exists: it was addressed by
UNIPEN, a project supported by the US National Institute of Standards and Technology
(NIST) and the Linguistic Data Consortium (LDC) [41]. Unfortunately, there is no
reference to a similar type of data for Arabic script. Therefore, part of this thesis
includes collecting and organizing data samples for training and testing.
Four users (see Table 8-1) were asked to write 600 predetermined words which were
used in the training process.
User Name Gender Hand Age
User 1 Male Right 24
User 2 Male Right 28
User 3 Male Right 26
User 4 Male Right 25
Table 8-1: The four users who trained our system
User Name Gender Hand Age
User 5 Male Right 22
User 6 Female Right 31
User 7 Male Left 23
User 8 Male Right 29
User 9 Male Right 12
User 10 Male Right 25
Table 8-2: The users who tested the system, (different from those who did the training)
After running the training phase, the same users who trained the system were asked to
handwrite a number of words (different from those used in the training phase) to
attain writer-dependent results. Six other users (see Table 8-2) were asked to handwrite
predefined words (from the dictionary, and different from those used in the training
phase) to obtain writer-independent results.
Algorithm 6.2 returns the word that best matches the observation sequences, i.e., the
word that gives the highest match probability. This algorithm has been modified to
return the three words with the highest match probabilities. The recognized word with
the highest probability is denoted the 1st option; the 2nd option refers to the word
with the 2nd highest probability, and the 3rd option to the word with the 3rd highest
probability.
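Keeping the three best options amounts to sorting candidate words by their match log-probability and taking a prefix; a minimal sketch, assuming the scores have already been computed by algorithm 6.2:

```java
import java.util.*;

/** Sketch of ranking dictionary words by match probability and keeping
 *  the top k options, as the modified recognition algorithm does.
 *  Scores are assumed to be log-probabilities already computed by the
 *  word recognizer. */
public class TopOptions {
    static List<String> topK(Map<String, Double> scores, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Sort descending by log-probability: the 1st option comes first.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++)
            out.add(entries.get(i).getKey());
        return out;
    }
}
```

With k = 3, the head of the returned list is the 1st option and the next two entries are the 2nd and 3rd options used throughout this chapter's tables.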
The reported results in this chapter utilize five different word-dictionary sizes:
5K, 10K, 20K, 30K, and 40K words, selected from the Arabic Treebank [42], twenty
random articles from Al-Arabi Magazine [43], and ten random articles from the news
channel Aljazeera Net [44].
Table 8-3 presents the word recognition results for the four users who trained the
system to attain writer-dependent results (see Table 8-1). These results are given by
testing the system with the five word-dictionary sizes.
Dictionary size | User name | Number of test words | Number of correctly recognized words | Number of error words | Word recognition rate (%)
User 1 253 245 8 96.84
5K User 2 179 170 9 94.97
User 3 258 252 6 97.67
User 4 249 240 9 96.39
User 1 253 244 9 96.44
10K User 2 179 168 11 93.85
User 3 258 248 10 96.12
User 4 249 238 11 95.58
User 1 253 241 12 95.26
20K User 2 179 159 20 88.83
User 3 258 243 15 94.19
User 4 249 232 17 93.17
User 1 253 238 15 94.07
30K User 2 179 153 26 85.47
User 3 258 238 20 92.25
User 4 249 228 21 91.57
User 1 253 237 16 93.68
40K User 2 179 148 31 82.68
User 3 258 236 22 91.47
User 4 249 227 22 91.16
Table 8-3: The writer-dependent word results with 5K, 10K, 20K, 30K and 40K word-dictionary size
Table 8-4 presents the word recognition results for the six users (see Table 8-2) to
obtain writer-independent results. These results are given by testing the system with the
five word-dictionary sizes.
Dictionary size | User name | Number of test words | Number of correctly recognized words | Number of error words | Word recognition rate (%)
User 5 279 271 8 97.13
User 6 138 135 3 97.83
5K User 7 241 226 15 93.78
User 8 316 308 8 97.47
User 9 196 187 9 95.41
User 10 249 242 7 97.19
User 5 279 269 10 96.42
User 6 138 133 5 96.38
10K User 7 241 222 19 92.12
User 8 316 300 16 94.94
User 9 196 187 9 95.41
User 10 249 239 10 95.98
User 5 279 259 20 92.83
User 6 138 130 8 94.20
20K User 7 241 215 26 89.21
User 8 316 291 25 92.09
User 9 196 183 13 93.37
User 10 249 233 16 93.57
User 5 279 250 29 89.61
User 6 138 124 14 89.86
30K User 7 241 206 35 85.48
User 8 316 279 37 88.29
User 9 196 182 14 92.86
User 10 249 229 20 91.97
User 5 279 246 33 88.17
User 6 138 121 17 87.68
40K User 7 241 204 37 84.65
User 8 316 270 46 85.44
User 9 196 179 17 91.33
User 10 249 226 23 90.76
Table 8-4: The writer-independent word results with 5K, 10K, 20K, 30K and 40K word-dictionary size
Figure 8-1 summarizes the average of the word recognition results of all users for
each dictionary.
Recognition Rate (%), writer dependent (WD) vs. writer independent (WI):
5K: WD 96.47, WI 96.28
10K: WD 95.50, WI 95.21
20K: WD 92.86, WI 92.55
30K: WD 90.84, WI 89.68
40K: WD 89.75, WI 88.01
Figure 8-1: The average of the word recognition results of all users in each dictionary.
Writer Dependent (WD) in left columns, writer independent (WI) in right columns
Table 8-5 and Table 8-6 present the recognition results in terms of word parts for all
users, tested with the five word dictionaries. The fourth column in these tables gives
the number of word parts that were correctly recognized in the 1st option; the fifth
column gives the number of word parts that were incorrectly recognized in the 1st option.
Dictionary size | User name | Number of word parts | Number of correctly recognized word parts | Number of error word parts | Word-part recognition rate (%)
User 1 666 658 8 98.80
User 2 472 462 10 97.88
User 3 714 707 7 99.02
User 4 664 651 13 98.04
5K User 5 733 724 9 98.77
User 6 351 348 3 99.15
User 7 644 627 17 97.36
User 8 806 798 8 99.01
User 9 506 495 11 97.83
User 10 664 656 8 98.80
User 1 666 657 9 98.65
User 2 472 458 14 97.03
User 3 714 702 12 98.32
User 4 664 649 15 97.74
10K User 5 733 719 14 98.09
User 6 351 346 5 98.58
User 7 644 621 23 96.43
User 8 806 786 20 97.52
User 9 506 496 10 98.02
User 10 664 651 13 98.04
User 1 666 654 12 98.20
User 2 472 448 24 94.92
User 3 714 697 17 97.62
User 4 664 642 22 96.69
20K User 5 733 709 24 96.73
User 6 351 341 10 97.15
User 7 644 613 31 95.19
User 8 806 776 30 96.28
User 9 506 491 15 97.04
User 10 664 643 21 96.84
Table 8-5: The recognition of word-part results with 5K, 10K, 20K word-dictionary size of all users
Dictionary size | User name | Number of word parts | Number of correctly recognized word parts | Number of error word parts | Word-part recognition rate (%)
User 1 666 651 15 97.75
User 2 472 441 31 93.43
User 3 714 691 23 96.78
User 4 664 635 29 95.63
30K User 5 733 699 34 95.36
User 6 351 333 18 94.87
User 7 644 601 43 93.32
User 8 806 761 45 94.42
User 9 506 489 17 96.64
User 10 664 638 26 96.08
User 1 666 650 16 97.60
User 2 472 435 37 92.16
User 3 714 689 25 96.50
User 4 664 634 30 95.48
40K User 5 733 694 39 94.68
User 6 351 330 21 94.02
User 7 644 601 43 93.32
User 8 806 753 53 93.42
User 9 506 483 23 95.45
User 10 664 634 30 95.48
Table 8-6: The recognition of word-part results with 30K and 40K word-dictionary size of all users
Table 8-7 and Table 8-8 present the word results for the three options (1st, 2nd and
3rd), i.e., the three words returned by the recognizer that best match the handwritten
word. These results motivate including an additional model in our system based on
Natural Language Processing (NLP) (e.g., a bi-gram model) when recognizing an entire
sentence, to improve the recognition rate, since many words that were incorrectly
recognized in the 1st option were correctly recognized in the 2nd or 3rd option.
Dictionary size | User name | Number of test words | Number of correctly recognized words (1st option) | Num. correctly recognized in the 2nd option | Num. correctly recognized in the 3rd option
User 1 253 245 6 1
User 2 179 170 7 0
User 3 258 252 4 0
User 4 249 240 6 1
5K User 5 279 271 3 2
User 6 138 135 1 1
User 7 241 226 12 1
User 8 316 308 5 1
User 9 196 187 4 0
User 10 249 242 5 1
User 1 253 244 7 1
User 2 179 168 7 2
User 3 258 248 8 0
User 4 249 238 6 2
10K User 5 279 269 5 1
User 6 138 133 2 1
User 7 241 222 15 1
User 8 316 300 10 3
User 9 196 187 4 0
User 10 249 239 5 4
User 1 253 241 6 5
User 2 179 159 16 1
User 3 258 243 12 1
User 4 249 232 11 2
20K User 5 279 259 11 2
User 6 138 130 4 2
User 7 241 215 15 5
User 8 316 291 16 3
User 9 196 183 7 1
User 10 249 233 10 4
Table 8-7: The word recognition results of the three options with 5K, 10K and 20K word-dictionary size
Dictionary size | User name | Number of test words | Number of correctly recognized words (1st option) | Num. correctly recognized in the 2nd option | Num. correctly recognized in the 3rd option
User 1 253 238 9 4
User 2 179 153 20 2
User 3 258 238 14 3
User 4 249 228 13 2
30K User 5 279 250 16 5
User 6 138 124 9 1
User 7 241 206 21 6
User 8 316 279 22 6
User 9 196 182 6 2
User 10 249 229 12 3
User 1 253 237 10 0
User 2 179 148 22 3
User 3 258 236 15 3
User 4 249 227 13 1
40K User 5 279 246 14 10
User 6 138 121 11 1
User 7 241 204 19 8
User 8 316 270 28 7
User 9 196 179 7 2
User 10 249 226 13 3
Table 8-8: The word recognition results of the three options with 30K and 40K word-dictionary size
The following tables show, for each user, the words that were incorrectly recognized
by our system. In addition, these tables report the minimal Levenshtein edit distance
(EMD) from the correct word to the recognized one for the 1st, 2nd and 3rd options,
which indicates the quality of letter-based recognition. Note that if the EMD is 0 for
an option, the word was correctly recognized in that option. These results were
obtained by running the recognizer with the 5K word dictionary.
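The EMD reported in these tables is the standard Levenshtein distance; a minimal, self-contained sketch of its computation (this is the textbook dynamic-programming formulation, not the thesis code):

```java
/** Standard Levenshtein edit distance, the EMD measure used in the
 *  error tables: the minimal number of insertions, deletions and
 *  substitutions needed to turn one word into the other. Uses two
 *  rolling rows instead of the full DP table. */
public class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from empty prefix
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }
}
```

An EMD of 0 means the two words are identical, matching the convention used in the tables that follow.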
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
���� ����� 1 ���� 0 ��� 2
� 0 و�� 1 و�� و�� 2 و�
� ��� 3 �� 1 � 0
4 ی����� 0 ی��� 2 ی���� ی���
4 ی����� 0 ی��� 2 ی���� ی���
3 ت�ر 0 ن��ا 2 ن�ا ن��ا
��� !��" 3 ��� 0 !���� 2
4 وش�'� 4 وث�! 4 ون�%� وت$#�
Table 8-9: Incorrectly recognized words of “User 1” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ص�ل! 0 %�ل! 1 ��ل! %�ل!
2 و�� 0 س,ف 1 س,�� س,ف
2 اث��ء 0 اث�ر 1 اش�ر اث�ر
2 ���0ة 0 #�0ة 3 ���ة #�0ة
5 ن%�� 0 ل0"�1 1 "�1 ل0"�1
2 ن��ی! 0 ث�ی! 1 �ی! ث�ی!
3 ری��� 4 ال��3! 4 ال��2 ال����
5 ا5�ق 0 ال"� �! 2 ال"�ص! ال"� �!
�ت 4 7س���3ف 2 ���7ت 7ن��ج�9 4
Table 8-10: Incorrectly recognized words of “User 2” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ال���ی! 0 ال��ی! 2 اب�ی! ال��ی!
6 وا��ی�� 0 وال;�ب! 3 وال>�ی� وال;�ب!
4 ا97�ن 3 ا7 �ك 2 ا7س�ك ا7س�<�ك
� �� 1 ��� 3 2� 2
2 ی���@ 0 ی%�@ 1 ی%��@ ی%�@
A�وال � 3 وا�� 0 وال�A 2 وال
Table 8-11: Incorrectly recognized words of “User 3” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
B� B�� 1 B� 0 B2 ی��
��ی! 0 9��ب! 3 ��0ی! 9��ب! 3
5 ان��ل� 1 اش�ر 4 ان��ء ث�را
Cواس�ل� D3 وال�0ل C5 وروس�� 0 واس�ل�
��5 D4 ن���! 4 ش�� !��9 4
1 رج@ 0 اج@ 2 ا��@ اج@
A و F� 1 و A 2 و�� 0 و
�ل 4 اس�;�ل ال���ل 0 ال���ل 4 اس�
Aال0%�دی A2 ال0%�رب� A0 ال0%�دی A3 ال0�9,ی
Table 8-12: Incorrectly recognized words of “User 4” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ال�0@ 0 ال�0, 1 ال�%, ال�0,
0 ��د 2 ص�ى 1 ه�د ��د
�9 K�9 2 @ 3 ن�F 2 ن
3 وه�K 0 و�� 1 و�� و��
�ب� 5 ت�ی�� 3 ه��ي ل;,ي 5
2 ت%� 0 ت�� 3 ت'��� ت��
� @� 2 @�� 3 �� 1
0 ل��,ج� 6 �ب 6 ت�ی� ل��,ج�
Table 8-13: Incorrectly recognized words of “User 5” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
�رت�� �N 0ات 2 ,ات �Nات 5
O�%ت P3 ت�� O1 ت�� O�%0 ت
3 ا7ب�ء 3 ا7ن��ء 2 ا7ش��ر ا7ث�ر
Table 8-14: Incorrectly recognized words of “User 6” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
B� B�� 1 B� 0 ���� 2
4 ان��ل 0 اث�ر 1 اش�ر اث�ر
1 ��'�ة 0 ��0�ة 1 ���0ة ��0�ة
4 و ����ت 0 وت<��ت 2 وت���ت وت<��ت
4 ا ��آ�! 0 ال��ی! 1 ال�,ی! ال��ی!
6 ال����D 0 رئ���� 1 رئ��D رئ����
��5 � 2 ��� 0 ��5 4 ن%�
1S1 2 ب�� ی� 0 ی1S 2 ی
3 خ>,ات� �N 0ات 2 �رت �Nات
O�%ت P3 ت�� O�%0 ت O1 ت��
U�>3 ت���� ل�� U�>6 ش���! 0 ل��
@N�9 !2 ���@ 2 �9ش 3 آ��
P��� !�# 4 P��� 0 !���" 4
3 ال���F 0 ال"��@ 3 ان���B ال"��@
Table 8-15: Incorrectly recognized words of “User 7” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
B� C4 �ی%� 2 ی� B�� 1
4 ال'�2 0 ال���� 4 ال��2 ال����
3 ال�� 0 اج@ 2 ا��@ اج@
A 2 وه�� و A 0 و A1 و�
���� ����� 1 ���� 0 ���S 2
�9 K�9 2 ��3 ن �9 0
1 و�� 1 و�� 2 و9� و��
� �� 1 � 0 ��� 3
Table 8-16: Incorrectly recognized words of “User 8” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
4 ���� 3 خ>�N 4 C���A ص;��
5 رثو 1 9,دت� 1 9,دت�� 9,دت<�
4 ی�$$,ن 0 ی�'�ن 1 ی�'�,ن ی�'�ن
1 ال�,ی! 0 ال��ی! 2 اب�ی! ال��ی!
X3 ب��ی! 3 ج�ی! ب�ی A3 ج�ی�
O3 اث��� ان� ! 4 ا ��! 5 وث�
O�%2 ت���3 ت K��"4 ت �� 3 ت%
4 یUال 0 ی��0اى 2 ی��اس ی��0اى
P��� !�# 4 P��� 0 !<� 4
Table 8-17: Incorrectly recognized words of “User 9” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ن��ی! 0 ث�ی! 1 �ی! ث�ی!
3 خ�� 0 خ��<� 1 خ��� خ��<�
K�"ت � 0 ت"�K 4 ت%>0� 3 ت%
Aال0%�دی A2 ال0%�رب� A0 ال0%�دی A3 ال0�9,ی
4 ال0$��$�! 4 ال����$�! 3 ال���$�! ال<�$�!
3 وتZآ� 0 وت��9� 1 وت��9� وت��9�
2 ص�ى 0 ��د 1 ه�د ��د
Table 8-18: Incorrectly recognized words of “User 10” with 5K word-dictionary size
Figure 8-2 and Figure 8-3 demonstrate various words handwritten by different users
that were correctly recognized by our system. Table 8-19 lists some additional words that
were correctly recognized. The system used the 40K word-dictionary size.
Figure 8-2: Illustration of correctly recognized words written by different users utilizing 40K word dictionary
Figure 8-3: Illustration of the same words written by different users, all of these words were correctly recognized by our system with 40K word-dictionary size
Dس�$�! ادب� �>�� ت'��,ا ال$
,اده� ت$�@ ال$,ن��س اده1
ن�ء ث��ت� ال0�رس! اس��ذا
Aا7ب A�0ال��0ه ����� Dه
واس��ب خ�,ص�! ال0'�ل ا�7$�ر
وال>�ی� س��D ال0,ازن! ا77ت
وال<��و��5��! ض�3@ اتال��و ال���ب!
و��1 9�� ب�<0! ال�"�ج
ودب�, �س�,ن 9� ب�واءیA ال�����0ت
Aوس�1 �5دوا ب��9ی! ال%�ب�
D%ب��>! ال A�9�� Dو�
وآ�� ��ن� بS^ ال���د
و , �ت �1 ب'���0ت ال��ی!
D'ت�ری"�� ال� ��� Pون�
%��ج,نی ��A0 ت���P ال��ب!
ی%�@ ل;,ي ت��ب��� ال>�,ر
ی�'�وا %�دث�ت ت�ا�^ ال'<,د
ی��� %0� ت�� ال��ی�
Table 8-19: Example of correctly recognized words with 40K word-dictionary size
Chapter 9
Future Work

Some points and problems have not been investigated in this work; our future research
will explore them. These points are discussed in this chapter:
• As we discussed in Chapter 5, boundary lines are used for size, orientation
and skew normalization in Latin scripts. Since boundary lines do not necessarily
exist in unconstrained Arabic script, an alternative solution must be found to
apply such normalization.
• In our work we have chosen only the loops as the global feature. We intend to
add more global features, such as cusps and crosses, to our system, as was done
in [19], to determine whether better results are achieved.
• Our work does not address punctuation marks (e.g. (.), (,), (;), (:), (!), (?),
(#), ($), (%)) or the digits (0, 1, ..., 9). Further research is needed to
recognize that these symbols are not part of words, since there is often no
space between them and the written words.
• To achieve better time performance, we divided the dictionary database in terms
of word parts. However, when a sizeable lexicon (e.g. 100,000 words or more) is
used, additional methods should be employed to reduce the search space. Our
future research will test our system with a lexicon of this size and improve it by
exploring the method proposed in [45].
• Some word parts/letters have been recognized incorrectly. The common error
cases are listed in Table 9-1. To reduce the number of such errors, we
propose an additional postprocessing phase that detects these cases using
geometric-computation techniques.
Table 9-1: Possible errors reported by our recognizer. Sometimes the word-parts/letter-shapes in the left column are confused with the letter-shapes/word-parts in the right column, and vice versa.
• Letters that include ء (hamza) or ~ (madda), listed in Table 9-2, are not
supported by our recognizer. Supporting these letters will also be part of our
future work; they will be handled by applying the same method that incorporates
delayed strokes within the observation sequence.
� ب
ت تـ�
ث �ـ�
ن ـ�
� ـ�
ـU ـ�ـ�
ـX ـWـ�
ـ{ ـ|ـ�
د ل
zـ Vـ
Table 9-2: Unsupported letters
• Optimizing the WPN for words that consist of only one word part by handling
both suffixes and prefixes in the same network, similar to [19].
Alef + Hamza above (أ)
Alef + Hamza below (إ)
Alef + Madda
Lam-Alef + Hamza above
Lam-Alef + Hamza below
Lam-Alef + Madda
Chapter 10
Conclusion

The primary aim of this research was to investigate how an online Arabic handwriting
recognition system may be built from Hidden Markov Models, and how capable such a
system is at resolving the complexities of Arabic handwriting recognition. In
addition, we analyzed the special characteristics of Arabic script that differentiate
it from other script categories.
To this end, the preprocessing phase was explored and discussed in Chapter 4; in this
phase the Douglas-Peucker algorithm was selected to reduce the amount of data after
applying a low-pass filter, and the data point sequences were then re-sampled.
In Chapter 5, three types of features (local, semi-local and global) were extracted
from the preprocessed point sequences. These feature sequences were discretized into
observation sequences. In addition, we have shown how delayed strokes are detected
and incorporated into the observation sequence.
The recognition framework was described in Chapter 6. A left-to-right HMM was
selected to represent each Arabic letter shape. Using this model, three algorithms
were introduced to recognize an Arabic word given its observation sequences. The
second algorithm is an optimized version of the first: for optimization reasons, the
letter models were embedded in a grammar network representing the word-part network.
The third algorithm was then proposed to search for the most probable word given the
observation sequences and the corresponding word-part networks. We then showed how to
recognize different writing styles (letter classes). Additionally, Chapter 6 explored
how the training process works, including the automatic determination of the number
of classes for each letter and of the number of states assigned to the letter-class
models.
During the two years of research, an online Arabic handwriting recognition system
was developed, as described in Chapter 7. Chapter 8 focused on the collection of word
samples by four users. Writer-dependent results were reported with averages of 96.47%
for a dictionary of 5K words, 95.50% for 10K, 92.86% for 20K, 90.84% for 30K and
89.75% for 40K words. Writer-independent results of six other writers were obtained
with average recognition rates of 96.28% for 5K words, 95.21% for 10K, 92.55% for 20K,
89.68% for 30K and 88.01% for 40K words. Finally, Chapter 9 was dedicated to the
points and problems that remain to be investigated in our future studies.
References
[1] R. Plamondon and S. Srihari. “On-line and off-line handwriting recognition: A comprehensive survey”. IEEE Trans. On Pattern Analysis and Machine Recognition, 22(1):63–84, January 2000.
[2] http://www.handwriting.org
[3] M. Sakkal, “The Art of Arabic Calligraphy” - Seattle Art Museum Resource Room, display boards; Seattle, March 1993. http://www.sakkal.com/ArtArabicCalligraphy.html
[4] Y. Haralambous, “Simplification of the Arabic Script,”, Vol. 1375 archive Proceedings of the 7th International Conference on Electronic Publishing, Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography, Pages: 138 - 156, 1998, ISBN:3-540-64298-6 , Verlag London, UK.
[5] Buckwalter Transliteration for Arabic Letters: http://www.ldc.upenn.edu/myl/morph/buckwalter.html
[6] J. Makhoul, T. Starner, R. Schwartz, G. Chou. “On-line cursive handwriting recognition using speech recognition methods”, in Proceeding of IEEE ICASSP’94 Adelaide, Australia, April 1994, pp. v125-v128.
[7] Homayoon S.M. Beigi, “An Overview of Handwriting Recognition,” Proceedings of the 1st Annual Conference on Technological Advancements in Developing Countries, Columbia University, New York, July 24-25, 1993, pp. 30-46.
[8] C.C. Tappert, C.Y. Suen, T. Wakahara, “The State of the Art in On-Line Handwriting Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 8, pp. 787-808, Aug. 1990.
[9] Scott D. Connell, “Online Handwriting Recognition Using Multiple Pattern Class Models”, PhD thesis Michigan State University, 2000.
[10] C. C. Tappert, “Adaptive on-line handwriting recognition,” presented at The 7th International Conference on Pattern Recognition, Montreal , Canada, pp. 1004-1007, July-Aug. 1984.
[11] J.J. Brault and R. Plamondon, “Segmenting Handwritten Signatures at Their Perceptually Important Points,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, pp. 953-957, September 1993.
[12] Han Shu, “On-Line Handwriting Recognition Using Hidden Markov Models,” M.S. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1996.
[13] C. C. Tappert, “Cursive Script Recognition by Elastic Matching,” IBM Journal of Research and Development, Vol.26, Nov. 1982, pp. 765-771.
[14] L. R. Rabiner. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” In A. Waibel and K.-F. Lee, editors, Readings in Speech Recognition, pages 267–296. Kaufmann, San Mateo, CA, 1990.
[15] D. Nahamoo and M. Analoui, “Speech Recognition Using Hidden Markov Models,” First Annual Conference on Technological Advancement in Developing Countries, Columbia University, New York, June 24-25, 1993.
[16] L.R. Bahl, P.F. Brown, P.V. deSouza, and R.L. Mercer, “Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition,” Proc. ICASSP'86, Tokyo, Japan, pp. 49-52, Oct. 1986.
[17] L. R. Bahl, F. Jelinek, and R. L. Mercer, “A Maximum Likelihood Approach to Continuous Speech Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, no. 3, pp. 179-190, March 1983.
[18] K.S. Nathan, H.S.M. Beigi, J. Subrahmonia, G.J. Clary, H. Maruyama. Real-Time on-line unconstrained handwriting recognition using statistical methods, in: Proceeding of IEEE ICASSP’95, Detroit, USA, June 1995, pp. 2619-2622.
[19] J. Hu, S.G. Lim, M.K. Brown, “Writer independent on-line handwriting recognition using an HMM approach,” Pattern Recognition 33 (2000) 133-147.
[20] J. Hu, M. K. Brown, W. Turin, “Handwriting Recognition with Hidden Markov Models and Grammatical Constraints,” AT&T Bell Laboratories, Murray Hill, New Jersey 07974.
[21] J. J. Lee, J. Kim and J. H. Kim, “Data-Driven Design Of HMM Topology For Online Handwriting Recognition,” International Journal of Pattern Recognition and Artificial Intelligence Vol. 15, No. 1 (2001) 107-121 © World Scientific Publishing Company doi:10.1142/S021800140100076z.
[22] S. Bercu and G. Lorette, “On-line handwritten word recognition: An approach based on hidden Markov models”. In Proceeding Third Int. Work-shop on Frontiers in Handwriting Recognition,
pages 385–390, Buffalo, USA, May 1993.
[23] A. Amin. “Recognition of Printed Arabic Text Based on Global Features and Decision Tree Learning Techniques,” Pattern Recognition, 33(8):1309–1323, August 2000.
[24] A. Elgammal and M.A. Ismail. “A Graph-Based Segmentation and Feature Extraction Framework for Arabic Text Recognition,” In ICDAR’01, 2001.
[25] S. S. El-Dabi, R. Ramsis, and A. Kuwait. “Arabic Character Recognition System: a Statistical Approach for Recognizing Cursive Typewritten Text,” Pattern Recognition, 23(5):485–495, 1990.
[26] E. J. Erlandson, J. M. Trenkle, and R. C. Vogt. “Word-Level Recognition of Multifont Arabic Text Using a Feature Vector Matching Approach.” In Proc. SPIE, Document Recognition III, Luc M. Vincent; Jonathan J. Hull; Eds., volume 2660, pages 63–70, March 1996.
[27] S. Al-Emami and M. Urber, “On-line Recognition of Handwritten Arabic Characters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 7, July 1990, pp. 704-710.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, vol. 39, pp. 1-21, 1977.
[29] L. E. Baum and G. R. Sell, “Growth Functions for Transformations on Manifolds,” Pac. J. Math., vol. 27, pp. 221-227, 1968.
[30] C. D. Manning and H. Schütze, “Foundations of Statistical Natural Language Processing,” MIT Press, 1999, pp. 317-338.
[31] W. Guerfali and R. Plamondon. “Normalizing and Restoring On-line Handwriting”. Pattern
Recognition, 26(3):419 – 431, 1993.
[32] H. Beigi, “Pre-Processing the Dynamics of On-Line Handwriting Data, Feature Extraction and Recognition,” Proc. 5th Int. Workshop on Frontiers in Handwriting Recognition, Colchester, England, pp. 255-258, Sept. 1996.
[33] D. Douglas & T. Peucker, “Algorithms for the Reduction of the Number of Points Required to Represent a Digitized line or its Caricature”, The Canadian Cartographer 10(2), 112-122 (1973).
[34] H. S.M. Beigi, K. Nathan, Gregory J. Clary, and Jayashree Subrahmonia, “Challenges of Handwriting Recognition in Farsi, Arabic and Other Languages with Similar Writing styles, An On-line Digit Recognizer”.
[35] D. J. Burr. “A Normalizing Transform for Cursive Script Recognition”. In Proc. 6th ICPR, volume 2, pages 1027-- 1030, Munich, October 1982.
[36] M. K. Brown and S. Ganapathy. Preprocessing techniques for cursive script recognition. Pattern Recognition, 16(5):447--458, November 1983.
[37] H. S. M. Beigi, K. Nathan, G. J. Clary, and J. Subrahmonia. “Size Normalization in On-line Unconstrained Handwriting Recognition”. In Proc. ICASSP'94, pages 169--172, Adelaide, Australia, April 1994.
[38] M. Schenkel, I. Guyon, and D. Henderson. “On-line Cursive Script Recognition Using Time delay Neural Networks and Hidden Markov Models”. In R. Plamondon, editor, Special Issue of Machine Vision and Applications on Cursive Script Recognition. Springer Verlag, 1995.
[39] Y. Bengio and Y. LeCun. “Word Normalization for Online Handwritten Word Recognition”. In Proc. 12th ICPR, volume 2, pages 409--413, Jerusalem, October 1994.
[40] S. A. Guberman and V. V. Rozentsveig, “Algorithm for the Recognition of Handwritten Text,” Automation and Remote Control, 37(5):751-757, May 1976.
[41] I. Guyon, L. Schomaker, R. Plamondon, M. Liberman, and S. Janet, “UNIPEN project of on-line data exchange and benchmarks,” presented at International Conference on Pattern Recognition, ICPR'94, Jerusalem, Israel, 1994.
[42] M. Maamouri, A. Bies, H. Jin, and T. Buckwalter. 2003. Arabic treebank: Part 1 v 2.0. Distributed by the Linguistic Data Consortium. LDC Catalog No.: LDC2003T06.
[43] Alarabi Magazine: http://www.alarabimag.com/main.htm
[44] www.aljazeera.net
[45] A. Leroy and Irisa, “Lexicon Reduction Based On Global Features, For On-Line Handwriting,” Campus de Beaulieu - 35042 RENNES cedex - France - aleroy@irisa.fr