Online Arabic Handwriting Recognition Using Hidden Markov Models
by
Fadi Biadsy
Jun 2005
Thesis Submitted as Part of the Requirements for the
Master of Science Degree in Computer Science at Ben-Gurion University of the Negev
Abstract
This thesis describes an online writer-independent handwriting recognition system for
the Arabic script. Recognition of Arabic script is a difficult problem since it is naturally
both cursive and unconstrained. The analysis of Arabic script is further complicated in
comparison with other scripts, since most Arabic letters include dots/delayed strokes that
are placed above or below letters. The number and placement of these dots/delayed
strokes are crucial parameters used to distinguish words – any minor change may
completely alter one word into an entirely different one. This work introduces a Hidden
Markov Model (HMM) based system to provide solutions for most of the difficulties
inherent in recognizing Arabic script including: the natural connectivity of writing, the
changing letter-shape depending on position in the word, and the dots/delayed stroke
problem. This work consists of several phases. Initially, a preprocessing phase reduces hardware imperfections and writing variation. A feature extraction phase then converts the preprocessed data into observation sequences, which are provided to an HMM structure for recognition or training. To the best of the author's knowledge, this is the first HMM-based work on recognizing online Arabic handwriting. We report successful experimental results for both writer-dependent and writer-independent recognition, using data collected from four trainers and tested by ten users.
Keywords:
Online Handwriting Recognition, Arabic Script, Hidden Markov Model, Writer
Independent.
Acknowledgments
I am deeply indebted to Dr. Jihad El-Sana, my advisor, for his wonderful support,
guidance, encouragement and inspiration which made this study possible, enjoyable, and
highly successful. In addition to being an excellent professor, he is everything one could want in an advisor. I am exceedingly thankful to Dr. Nizar Habash for his
constant support, advice, and the time given while editing and revising. Without their
support, these pages would not have been written, and my thoughts would not have found a way to advance beyond inception. I would also like to sincerely thank Mr. Meni Adler for
assisting me with his expertise in statistical models. I would like to give special thanks to
my professor Dr. Michael Elhadad who directed me to find the right references and
material from time to time, in addition to his useful discussion and comments. I would
like to thank my best friends: Azzam Marai, Gabriel Zaccak, Guy Wiener, Nasim Biadsy
and Wisam Dakka for the support and encouragement they always give, in addition to
their marvelous excitement when a new step is implemented in my system. Their
constructive discussion made this work interesting and enjoyable. I am also thankful to
my close friends Jennifer Miller and Vinnee Tong for reviewing my thesis and providing
valuable feedback. Finally, I would like to thank my family for the great support and the
warm care they always provide.
TABLE OF CONTENTS

Chapter 1 – Introduction
  1.1. Offline versus Online Recognition
  1.2. Writer-Dependent versus Writer-Independent
  1.3. Lexicon-Dependent versus Lexicon-Independent
  1.4. Arabic Script Background
    1.4.1. History of Arabic Language and Script
    1.4.2. Characteristics of Arabic Script
    1.4.3. Dots and Additional Strokes
  1.5. Recognition Difficulties and Ambiguity
  1.6. Dot Problem in Words
  1.7. Absence of Diacritics
Chapter 2 – Previous Work
  2.1. Data Acquisition
  2.2. Segmentation
  2.3. Modeling Methods
    2.3.1. Template Matching Models
    2.3.2. Statistical Models
    2.3.3. Neural Networks Models
  2.4. Postprocessing
  2.5. Related Work
  2.6. Previous Handwriting Recognition System Summarization
  2.7. Online Handwriting Recognition Work for Arabic Script
Chapter 3 – Background – Hidden Markov Model
  3.1. Discrete Markov Process
  3.2. Extension to Hidden Markov Models
  3.3. The Three Fundamental Problems for HMMs
    3.3.1. The Evaluation Problem
    3.3.2. The Decoding Problem – Viterbi Algorithm
    3.3.3. The Training Algorithm – Baum-Welch Algorithm
Chapter 4 – Preprocessing
  4.1. Step 1: Noise Reduction
  4.2. Step 2: Data Simplification
  4.3. Step 3: Normalization
    4.3.1. Size and Orientation Normalization
    4.3.2. Speed Normalization
Chapter 5 – Feature Extraction
  5.1. Local Feature
  5.2. Semi-Local Features
  5.3. Global Features
    5.3.1. Loop Determination
  5.4. Feature Vector Construction
  5.5. Feature Vector to Observation Code
  5.6. Delayed Strokes
    5.6.1. Delayed-Stroke Projection
Chapter 6 – The Recognition Framework
  6.1. Framework Models
    6.1.1. Letter Models
    6.1.2. Word-Part Models
    6.1.3. Word Models
  6.2. Word and Word-Part Dictionaries
  6.3. Arabic-Word Recognizer
    6.3.1. Word Recognizer - Algorithm I
    6.3.2. Word Recognizer - Algorithm II
  6.4. Optimized Grammar Network
    6.4.1. Optimized Word-Part Network
    6.4.2. Word Network
    6.4.3. Word-Dictionary Database Architecture
  6.5. Optimized Recognizer
    6.5.1. Word Recognizer - Algorithm III
  6.6. Support for Writing Style
  6.7. Model Training
    6.7.1. Training Problems
    6.7.2. The Training Process
Chapter 7 – Implementation
Chapter 8 – Results
Chapter 9 – Future Work
Chapter 10 – Conclusion
LIST OF FIGURES
Figure 1-1: Development of scripts, courtesy: Mamoun Sakkal [3]
Figure 1-2: Example of Arabic calligraphy. Two words meaning 'love peace', courtesy: [3]
Figure 1-3: Word (a) is a result of moving the dot to the left from word (b)
Figure 1-4: Word (a) is a result of eliminating the dot above the first letter from word (b)
Figure 1-5: Delayed strokes in Arabic script
Figure 1-6: Illustration of words with the same meaning written in different styles
Figure 1-7: Illustration of an ambiguous Arabic word
Figure 1-8: Unclear Arabic sentence because of the last word
Figure 1-9: The graph of the recognized possible sentences of the sentence in Figure 1-8
Figure 2-1: Digital tablet
Figure 2-2: Optical pen
Figure 2-3: CrossPad
Figure 2-4: PDA
Figure 2-5: Tablet PC
Figure 2-6: Handwriting styles in English, ordered from easy to difficult in terms of recognition [10]
Figure 2-7: The block diagram of the recognition system in [18]
Figure 2-8: 7-state HMM used to model each character, courtesy: [6]
Figure 2-9: Illustration of contextual effects on a cursively written "i". The left "i" is written after "m", the right "i" is written after "v", courtesy: [12]
Figure 2-10: Connecting strokes (in dashed lines), courtesy: [6]
Figure 2-11: Illustration of the three high-level features, courtesy: [19]
Figure 2-12: A grammar network implementing a word dictionary, courtesy: [19]
Figure 3-1: Illustration of a Markov chain with N = 3 states
Figure 3-2: The weather Markov chain
Figure 3-3: Illustration of a hidden Markov model with three states (N = 3) and two observable symbols (M = 2)
Figure 4-1: Illustration of applying a low-pass filter on a word that was written in a moving train
Figure 4-2: Polyline simplification
Figure 4-3: Word boundary lines
Figure 4-4: Illustration of a word written in the top-down writing style. The left word is the printed style of the right: (bHajm) 'in size'. In the right word, each letter is written above the letter that follows it
Figure 4-5: Illustration of preprocessing steps applied on the handwritten word (jAmEp) 'university'
Figure 4-6: Illustration of preprocessing steps applied on the handwritten word (vqAfp) 'education'
Figure 5-1: Illustration of the αi-angle (the angle between the vector (pi-1pi) and the X-axis)
Figure 5-2: Illustration of running Douglas & Peucker's algorithm. Figure (a) is a handwritten word (mElm) 'teacher'; red points in Figure (b) are the skeleton points returned by Douglas & Peucker's algorithm applied to the point sequence in Figure (a)
Figure 5-3: Illustration of the skeleton points and vectors; the red points are the skeleton points, the red segments show the skeleton vectors
Figure 5-4: Illustration of the β-angle (the angle between a skeleton vector and the X-axis)
Figure 5-5: Illustration of the three high-level features, courtesy: [19]
Figure 5-6: Illustration of loops (in red) in an Arabic word
Figure 5-7: Illustration of an accidental loop in a letter that should not include a loop
Figure 5-8: Illustration of how fi is extracted from each point pi
Figure 5-9: Figure (a) is a division of the angle space into 16 directions; Figure (b) is a division into 8 directions
Figure 5-10: Illustration of how observation i is computed
Figure 5-11: Figure (a): five delayed strokes for word-part body 1. Figure (b): two delayed strokes for word-part body 3. Figure (c): three delayed strokes for word-part body 1. Figure (d): one delayed stroke for word-part body 1
Figure 5-12: Illustration of the delayed-stroke projection; black points are the letter body, blue points are the delayed stroke, red points are virtual points
Figure 5-13: Illustration of converting a point sequence with delayed strokes to an observation sequence
Figure 5-14: Delayed-stroke projection in the handwritten word (AlAst$rAq) 'orientalism'
Figure 5-15: Delayed-stroke projection in the handwritten word (AlAnTbAE) 'the impression'
Figure 6-1: Left-to-right HMM
Figure 6-2: Word-part model consisting of n letter models
Figure 6-3: Illustration of the word model for the word (jAmEap) 'university': a word-part model for its first word part (two letters) and a word-part model for its second word part (three letters)
Figure 6-4: Optimized grammar network implementing a word-part dictionary of k word parts, grouping all shared suffixes and replacing each node by its corresponding letter model
Figure 6-5: Illustration of running wordPartDictionaryToWPN on a four-entry word-part dictionary
Figure 6-6: Illustration of the word network of the sub-dictionary D3 described in Example 6-2
Figure 6-7: The word-dictionary architecture
Figure 6-8: Illustration of four letter classes of the medial letter shape (h)
Figure 6-9: Multiple-letter-class model containing n letter-class models
Figure 6-10: The word-part network with each node replaced by a multiple-letter-class model
Figure 6-11: A screen shot of the trainer system; the green vertical lines are the split lines. The dashed lines are automatically given by the system to avoid word-part overlapping
Figure 6-12: Multiple-letter-class model chain of m word parts
Figure 6-13: Screen shot of our automatic letter-splitting system
Figure 6-14: Illustration of a letter-class model chain
Figure 7-1: Screen shot 1 of the handwriting recognition system
Figure 7-2: Screen shot 2 of the handwriting recognition system
Figure 7-3: The global architecture of our system
Figure 8-1: The average of the word recognition results of all users in each dictionary
Figure 8-2: Illustration of correctly recognized words written by different users, using a 40K-word dictionary
Figure 8-3: Illustration of the same words written by different users; all of these words were correctly recognized by our system with a 40K word-dictionary size
LIST OF TABLES
Table 1-1: Offline versus online features
Table 1-2: The basic Arabic alphabet
Table 1-3: Special Arabic letters
Table 1-4: Buckwalter transliterations for Arabic letters and diacritics [5]
Table 5-1: Category I – 31 letter shapes that cannot be written without loops
Table 5-2: Category II – 17 letter shapes for which it is unclear whether they are written with loops
Table 8-1: The four users who trained our system
Table 8-2: The users who tested the system (different from those who did the training)
Table 8-3: The writer-dependent word results with 5K, 10K, 20K, 30K and 40K word-dictionary sizes
Table 8-4: The writer-independent word results with 5K, 10K, 20K, 30K and 40K word-dictionary sizes
Table 8-5: The word-part recognition results with 5K, 10K and 20K word-dictionary sizes for all users
Table 8-6: The word-part recognition results with 30K and 40K word-dictionary sizes for all users
Table 8-7: The word recognition results of the three options with 5K, 10K and 20K word-dictionary sizes
Table 8-8: The word recognition results of the three options with 30K and 40K word-dictionary sizes
Table 8-9: Incorrectly recognized words of "User 1" with 5K word-dictionary size
Table 8-10: Incorrectly recognized words of "User 2" with 5K word-dictionary size
Table 8-11: Incorrectly recognized words of "User 3" with 5K word-dictionary size
Table 8-12: Incorrectly recognized words of "User 4" with 5K word-dictionary size
Table 8-13: Incorrectly recognized words of "User 5" with 5K word-dictionary size
Table 8-14: Incorrectly recognized words of "User 6" with 5K word-dictionary size
Table 8-15: Incorrectly recognized words of "User 7" with 5K word-dictionary size
Table 8-16: Incorrectly recognized words of "User 8" with 5K word-dictionary size
Table 8-17: Incorrectly recognized words of "User 9" with 5K word-dictionary size
Table 8-18: Incorrectly recognized words of "User 10" with 5K word-dictionary size
Table 8-19: Examples of correctly recognized words with 40K word-dictionary size
Table 9-1: Possible errors reported by our recognizer. Sometimes the word-parts/letter-shapes in the left column are confused with the corresponding letter-shapes/word-parts in the right column, and vice versa
Table 9-2: Unsupported letters
Chapter 1
Introduction

Keyboards and mice are the basic input devices for computers today. Still,
these devices may not endure as the only prevalent means of transmitting electronic data
to computers. Other methods may become necessary, particularly with regard to the size
of newer mobile devices and the method of transmission. Hand-held computers and mobile technology, for example, present significant opportunities for alternative input devices that work in form factors smaller than the traditional keyboard and mouse. In addition, the need for
more natural human-machine interfaces becomes ever more important as computer use
reaches a larger number of people. The need grows particularly acute in developing
countries, where new adopters use computers in an effort to improve living conditions. Here a potential problem arises: millions of ordinary people do not know how to type. Two alternatives to typing come to mind: speech and handwriting, universal and natural methods of communication that have been used for thousands of years. They are therefore better alternatives for many new adopters as a way of interacting with computers. Note, however, that speech does not currently provide an adequate interface for ensuring privacy and confidentiality; these requirements can be met by handwriting.
The following three sections explore three aspects of handwriting recognition systems: mode of operation (online versus offline processing), writer dependence, and lexicon dependence. Section 1.4 gives background on Arabic script and the way it is written. The recognition difficulties and ambiguity, particularly for Arabic script, are discussed in Section 1.5. Finally, the dot separation problem and the absence of diacritics in Arabic script are explored in Sections 1.6 and 1.7, respectively.
1.1. Offline versus Online Recognition
Handwriting recognition can be divided into two categories, which differ in how the data is presented to the system: offline handwriting recognition and online handwriting recognition.
In offline handwriting recognition systems, the input is a digitized image of a scanned page of handwritten or printed text. There is no interaction with the user; the text can be written or printed at any time before the recognition process starts. In the case of Optical Character Recognition (OCR), the text was printed mechanically in a uniform font.
In online handwriting recognition systems, the user writes on a digital device using a special stylus, and the system samples and records the point sequence [(x0, y0), …, (xn, yn)] as it is being written. Online handwriting samples therefore contain additional temporal data that is not present in offline sampled data.
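For concreteness, such a recording can be sketched as a timestamped point sequence. This is a minimal illustration of ours, not code from the thesis; the `PenSample` type and its field names are our own naming:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PenSample:
    """One sampled pen point: position plus the temporal data that
    distinguishes online input from an offline scanned image."""
    x: float
    y: float
    t: float  # timestamp, e.g. milliseconds since the first sample

def point_sequence(samples: List[PenSample]) -> List[Tuple[float, float]]:
    """Reduce a recording to the bare [(x0, y0), ..., (xn, yn)] sequence."""
    return [(s.x, s.y) for s in samples]

# A short trace sampled at 10 ms intervals.
trace = [PenSample(0.0, 0.0, 0.0), PenSample(1.0, 0.5, 10.0), PenSample(2.0, 1.5, 20.0)]
print(point_sequence(trace))  # [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5)]
```

Keeping the timestamps alongside the coordinates is what preserves the temporal information (such as writing speed) that offline images lack.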
When implementing a handwriting recognition system, the differences between online and offline input should be taken into account. We review these differences next.
1. Current scanning technology digitizes images into a discrete grid, which eliminates the continuity of the original shapes. Such continuity is maintained in online (temporal) recognition systems.
2. Offline recognition systems are sensitive to noise over the entire scanned page, while online recognition systems are sensitive to the smoothness of the drawn shapes.
3. An essential step in any offline recognition system is to normalize the thickness of letters/words (reduce the pen thickness to one pixel). This step usually involves image-processing algorithms of high complexity and low robustness. In contrast, the notion of pen thickness does not exist in online recognition systems.
4. Pen features such as pen-up and pen-down are used to segment the input point sequence. Obviously, these features are not available to offline recognition systems.
5. Online recognition systems are interactive in nature, so recognition time is critical for them. Real-time performance, however, is not crucial for offline recognition systems.
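Point 4 can be made concrete with a small sketch (our own illustration, not the thesis's code; the `(x, y, pen_down)` point representation is an assumption):

```python
from typing import List, Tuple

# A sampled point: (x, y, pen_down). pen_down is False for points recorded
# while the stylus is lifted (if the hardware reports them at all).
Point = Tuple[float, float, bool]

def split_into_strokes(points: List[Point]) -> List[List[Tuple[float, float]]]:
    """Group consecutive pen-down points into strokes; each pen-up event
    closes the current stroke. An offline image carries no such signal."""
    strokes: List[List[Tuple[float, float]]] = []
    current: List[Tuple[float, float]] = []
    for x, y, down in points:
        if down:
            current.append((x, y))
        elif current:
            strokes.append(current)
            current = []
    if current:  # trailing stroke not followed by a pen-up sample
        strokes.append(current)
    return strokes

# Two strokes separated by one pen-up sample.
pts = [(0, 0, True), (1, 1, True), (1, 1, False), (5, 0, True), (6, 1, True)]
print(split_into_strokes(pts))  # [[(0, 0), (1, 1)], [(5, 0), (6, 1)]]
```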
Table 1-1 summarizes the comparison between offline and online recognition systems. For a detailed comparison between online and offline handwriting recognition, see [1].
Feature               | Offline recognition system       | Online recognition system
Data acquisition      | Optical scanner / digital camera | Digital tablet / optical pen / CrossPad / Tablet PC
Input                 | Bit-map image                    | Point time-sequence
Noise type            | Global                           | Local
Stroke thickness      | Exists                           | Does not exist
Pen features          | Do not exist                     | Exist
Real-time recognition | Not critical                     | Required
User interaction      | No                               | Yes
Table 1-1: Offline versus online features
1.2. Writer-Dependent versus Writer-Independent
There is a wide range of variation among the handwriting of writers within a given script (Latin, Arabic, Hebrew, Hangul, etc.). These variations are perceived to be individualistic and even expository of personal traits such as emotional indicators, mental ability, indicators of self-image, thinking styles, attitude, and modes of communication [2]. Handwriting recognition systems can be classified into two categories according to how these variations are handled.
1. Writer-dependent systems recognize the handwriting of users who have trained the system in their own handwriting style. They usually achieve high accuracy only for those users, since variant features of those users/trainers are taken into consideration. Examples of well-known variant features are writing speed and pen pressure.
2. Writer-independent systems are capable of recognizing the handwriting of users who have not trained the system. They therefore have to learn global features that are fundamentally common to all users and invariant to handwriting style. A writer-independent system is more difficult to develop than a writer-dependent one, since the variant features must be removed carefully to avoid destroying useful data. Every script usually has its own characteristic global features; for example, Latin scripts have different characteristics than Arabic script.
1.3. Lexicon-Dependent versus Lexicon-Independent
Handwriting recognition systems are divided into two types: Lexicon-dependent (LD)
and Lexicon-independent (LI).
1. Lexicon-dependent systems recognize a handwritten word by searching a given lexicon (of arbitrary size) for the word that best matches the input point sequence. LD systems can be divided into two types: systems with a static lexicon and systems with a dynamic lexicon. A static lexicon usually requires direct modeling of every word in the lexicon; thus, adding a new word to a static-lexicon system may require architectural changes or explicit new training for that word. In contrast, dynamic-lexicon systems allow new words to be added to the lexicon database with no architectural changes and without retraining. However, adding a large number of words to the lexicon increases the search space considerably, leading to higher error rates and degraded performance.
2. Lexicon-independent systems are capable of recognizing a handwritten word that consists of any permutation of letters, without the constraint of being in a lexicon.
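The lexicon-dependent search can be sketched as a best-match selection over the lexicon. This is our own illustration, not the thesis's recognizer: `toy_score` is a hypothetical stand-in for a real match model (such as an HMM likelihood), and the dynamic lexicon is simply a word list that can grow without retraining:

```python
from typing import Callable, List, Sequence, Tuple

PointSeq = Sequence[Tuple[float, float]]

def recognize_ld(points: PointSeq, lexicon: List[str],
                 score: Callable[[PointSeq, str], float]) -> str:
    """Lexicon-dependent recognition: return the lexicon word that best
    matches the input point sequence under the given scoring model."""
    return max(lexicon, key=lambda word: score(points, word))

# Toy stand-in score: prefer words whose length matches the input length.
def toy_score(points: PointSeq, word: str) -> float:
    return -abs(len(points) - len(word))

lexicon = ["cat", "catalog"]
lexicon.append("cats")  # dynamic lexicon: adding a word needs no retraining
print(recognize_ld([(0.0, 0.0)] * 4, lexicon, toy_score))  # cats
```

Note how a growing lexicon enlarges the set `max` searches over, which is exactly the search-space growth the paragraph above warns about.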
1.4. Arabic Script Background
This section explains the Arabic script and the way it is written. Without this background, the key issues in recognizing Arabic script may be difficult to follow.
1.4.1. History of Arabic Language and Script1
The language of Arabic originated from the earliest-known alphabet, the North
Semitic alphabet that was developed in Syria around 1700 B.C. The North Semitic
alphabet consisted of 22 consonants. Similarly, the Arabic alphabet is comprised solely of
consonants. In Arabic, vowels are represented by diacritics, which are accent marks used
to denote pronunciation.
Arabic was one of three languages, along with Hebrew and Phoenician, that were
developed from the North Semitic alphabet. In 1000 B.C., the Greeks took the Phoenician
alphabet as a model and added vowels to it. About two hundred years later, the Etruscans
used the Greek alphabet as a model for theirs. Then, the ancient Romans used this model,
and that became the basis for all Western alphabets. Figure 1-1 shows the script
development with the alphabet of each script.
1 The material presented in this section is based on [3]
Figure 1-1: Development of scripts, courtesy: Mamoun Sakkal [3]
The North Arabic script eventually became the Arabic script of the Quran. As Islam
spread outside the Arab world, the language of the Quran spread with it. Several non-
Arab nations adopted Arabic for their own languages. Examples include Farsi in Iran,
where four letters were added (پ، چ، ژ، گ), and the Ottoman Turks, who added yet
another letter. Other languages that also used the Arabic alphabet at times include Urdu,
Malay, Swahili, Hausa, and others.
With the rise of Islam in the 7th century, Arabic developed into an art form. There are
two main categories of calligraphic style (as illustrated in Figure 1-2). One is the dry
style, which is generally called the Kufic. The other is soft cursive, which includes
Nasikh, Thuluth, Nastaliq and many others.
Figure 1-2: Example of Arabic calligraphy, two words: ‘love peace’, courtesy: [3]
1.4.2. Characteristics of Arabic Script
This section explains the nature of Arabic script, which consists of 28 basic letters,
12 additional special letters, and 8 diacritics2, and is written from right to left. Arabic words
are written (machine printed and handwritten) in a cursive style. Discrete styles of
handwriting in Arabic have been proposed but never became popular [4]. Most letters are
written in four different shapes depending on their position in a word; e.g. the letter ع (E) is
written isolated: ع, initial: عـ, medial: ـعـ, and final: ـع (see the alphabet in Table 1-2). There
are 6 basic letters { ا (A), د (d), ذ (*), ر (r), ز (z), و (w) } that have no
medial and initial shape; these letters do not connect to the following letter and will be
referred to as disconnective <Discon-letter>. The appearance of these letters interrupts the
continuity of the graphic form of the word. We denote letters in a word
that are connected together as a word part. If a word part is composed of only one letter,
this letter will be in its isolated shape. Arabic script is also similar to English in that it
uses spaces and punctuation marks to separate words.
2 The diacritics are not explored here, since they are almost never used in handwriting.
Example 1-1: Consider the Arabic word: مرتفعات (mrtfEAt)3 ‘heights’
• By separating the letters we will get: مـ ـر تـ ـفـ ـعـ ـا ت.
• This word consists of 7 letters: م, ر, ت, ف, ع, ا, and ت
• This word includes three word parts:
o مر – this is a word part because it ends with the letter ر (r), which
has no medial and initial shape.
o تفعا – this is a word part because it ends with the letter ا (A), which
has no medial and initial shape.
o ت – this is a word part because it contains only one letter.
Formally, an Arabic word can be described as follows:
<Word> ::= {<Word-part>}+
<Word-Part> ::= <Con-letter>.initial • <Con-letter>.medial* • <Letter>.final
<Letter> ::= <Con-letter> || <Discon-letter>
<Con-letter> ::= ب || ت || ث || ج || ح || خ || س || ش || ص || ض || ط || ظ || ع || غ || ف || ق || ك || ل || م || ن || ه || ي || ئ
<Discon-letter> ::= ا || د || ذ || ر || ز || و || ء || ة || ى || أ || إ || آ || ؤ || لا || لأ || لإ || لآ
3 In this work, all Arabic words and letters are transliterated in Buckwalter’s Arabic transliteration format (without diacritics) [5]
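The grammar above implies a simple segmentation rule: a word part ends at every disconnective letter. This can be sketched as a single scan over the letters of a word; the function name and the restriction to the six basic disconnective letters are illustrative choices, not part of the thesis (the special disconnective letters of Table 1-3 would be handled the same way).

```python
# Split an Arabic word into its word parts. A word part ends at every
# disconnective letter, i.e. a letter that has no initial/medial shape.
# Sketch covering only the six basic disconnective letters.
BASIC_DISCONNECTIVES = {
    "\u0627",  # ا (A)
    "\u062F",  # د (d)
    "\u0630",  # ذ (*)
    "\u0631",  # ر (r)
    "\u0632",  # ز (z)
    "\u0648",  # و (w)
}

def word_parts(word):
    parts, current = [], ""
    for letter in word:
        current += letter
        if letter in BASIC_DISCONNECTIVES:
            parts.append(current)   # a disconnective letter closes the part
            current = ""
    if current:                     # trailing connected letters form the last part
        parts.append(current)
    return parts

# Example 1-1: mrtfEAt 'heights' splits into three word parts
print(word_parts("مرتفعات"))
```

Running this on the word of Example 1-1 yields its three word parts, in writing order.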
Basic Arabic connective letters (isolated, initial, medial, final):
Ba	ب	بـ	ـبـ	ـب
Ta	ت	تـ	ـتـ	ـت
Tha	ث	ثـ	ـثـ	ـث
Jim	ج	جـ	ـجـ	ـج
H’a	ح	حـ	ـحـ	ـح
Kha	خ	خـ	ـخـ	ـخ
Sin	س	سـ	ـسـ	ـس
Shin	ش	شـ	ـشـ	ـش
Sad	ص	صـ	ـصـ	ـص
Dad	ض	ضـ	ـضـ	ـض
Ta	ط	طـ	ـطـ	ـط
Zha	ظ	ظـ	ـظـ	ـظ
A’yn	ع	عـ	ـعـ	ـع
Ghayn	غ	غـ	ـغـ	ـغ
Fa	ف	فـ	ـفـ	ـف
Qaf	ق	قـ	ـقـ	ـق
Kaf	ك	كـ	ـكـ	ـك
Lam	ل	لـ	ـلـ	ـل
Mim	م	مـ	ـمـ	ـم
Nun	ن	نـ	ـنـ	ـن
Ha	ه	هـ	ـهـ	ـه
Ya	ي	يـ	ـيـ	ـي

Basic Arabic disconnective letters (isolated, final):
Alef	ا	ـا
Dal	د	ـد
Dhal	ذ	ـذ
Ra	ر	ـر
Zay	ز	ـز
Waw	و	ـو

Table 1-2: The basic Arabic alphabet
Table 1-3: Special Arabic letters

Hamza Ala Korsi (isolated, initial, medial, final):	ئ	ئـ	ـئـ	ـئ
Special letter – does not appear at the beginning of words

Additional special disconnective letters (isolated, final):
Alef + Hamza above	أ	ـأ
Alef + Hamza below	إ	ـإ
Alef + Mada	آ	ـآ
Lam Alef	لا	ـلا
Lam Alef + Hamza a.	لأ	ـلأ
Lam Alef + Hamza b.	لإ	ـلإ
Lam Alef + Mada	لآ	ـلآ
Waw + Hamza	ؤ	ـؤ

Special letters written only at the end of the word (no medial and initial shape; isolated, final):
Ta Marbota	ة	ـة
Alef Maksora	ى	ـى

Hamza (isolated only):	ء
Special letter – written only in an isolated shape, does not appear at the beginning of words
Letter / Buckwalter transliteration (three column pairs):
ء '	ذ *	ل l
آ |	ر r	م m
أ >	ز z	ن n
ؤ &	س s	ه h
إ <	ش $	و w
ئ }	ص S	ى Y
ا A	ض D	ي y
ب b	ط T	ً F
ة p	ظ Z	ٌ N
ت t	ع E	ٍ K
ث v	غ g	َ a
ج j	ـ _	ُ u
ح H	ف f	ِ i
خ x	ق q	ّ ~
د d	ك k	ْ o
Table 1-4: Buckwalter transliterations for Arabic letters and diacritics [5]
1.4.3. Dots and Additional Strokes
Most Arabic letters contain dots in addition to the letter body, such as ش ($), which
consists of the س (s) letter body and three dots above it. Some other letters are composed of
strokes attached to the letter body, such as ط and ك. These dots and
strokes are called delayed strokes, since they are usually drawn last in a handwritten
word-part/word. This is similar to the handwriting of Latin scripts, where the cross in the
letter t, the slash in the letter x, and the dots in the letters i and j are also usually drawn last.
Dots in Arabic script are very important for distinguishing among letters that have the
same letter body but differ in the number of dots placed under or above it.
Eliminating, adding or moving a dot could produce a totally different letter sound, as
illustrated in Figure 1-3 and Figure 1-4.
(a) عزام (b) غرام
Figure 1-3: Word (a) is a result of moving the dot to the left from word (b)
Word (a): (EzAm) ‘lion’
Word (b): (grAm) ‘love’
(a) عرب (b) غرب
Figure 1-4: Word (a) is a result of eliminating the dot above the first letter from word (b)
Word (a): (Erb) ‘Arab’
Word (b): (grb) ‘west’
Figure 1-5: Delayed strokes in Arabic script
Figure 1-5 illustrates the delayed strokes possibilities – above or under the letter body
(bold line) in two main and common styles:
a. One dot under the letter body (e.g. ج)
b. One dot above the letter body (e.g. ن)
c. Two dots or horizontal line above the letter body (e.g. ق)
d. Two dots or horizontal line under the letter body (e.g. ي)
e. Three dots or a little “hat” above the letter body (e.g. ش)
f. One vertical line above the letter body (e.g. ط )
g. Hamza (ء) above the letter body (e.g. ك)
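As an illustration of how dot count and placement select a letter, the following sketch maps dot configurations of the shared tooth-shaped (ب-like) letter body to letters. The letter/dot mapping is standard Arabic; the function and its interface are hypothetical, not part of the thesis.

```python
# Letters sharing the tooth-shaped letter body are distinguished solely by
# the number and position of their dots (standard Arabic mapping).
BEH_BODY_LETTERS = {
    (1, "below"): "ب",  # b
    (1, "above"): "ن",  # n
    (2, "above"): "ت",  # t
    (3, "above"): "ث",  # v
    (2, "below"): "ي",  # y
}

def classify(dot_count, position):
    """Return the letter for this dot configuration, or None if unknown."""
    return BEH_BODY_LETTERS.get((dot_count, position))

print(classify(2, "above"))
```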
1.5. Recognition Difficulties and Ambiguity
Arabic words can be written in different styles. A common word-writing style is the
top-down style, as shown in Figure 1-6-(a)/(c)/(d). In this style, some letters are written
above the following letters. Such a style causes a problem in estimating the word baseline,
which is usually used for normalization in the preprocessing phase (discussed in Chapter
4).
(a) (b) (c) (d)
Figure 1-6: Illustration of words with the same meaning written in different styles
1.6. Dot Problem in Words
In some cases, mainly in the top-down writing style, dots may make handwritten
words ambiguous; it may be impossible to recognize such words without
knowing the sentence. As shown in Figure 1-7, if the second dot from the right belongs to
the first letter, the word will be تحل (tHl) ‘to solve’ or ‘to be solved’; if the second dot
belongs to the second letter, the word will be نخل (nxl) ‘palms’.
Figure 1-7: Illustration of an ambiguous Arabic word
1.7. Absence of Diacritics
Arabic diacritization is optional and almost never written. This leads to a reduction in
the average length of words, which increases word similarity and ambiguity. The absence of
these diacritics allows the same written word to be pronounced differently, which may lead to
different meanings. For example, the Arabic word مدرسة without diacritics has two
meanings: the first means ‘teacher’, with diacritics مُدَرِّسَة (mudar~isap), and the second
means ‘school’, with diacritics مَدْرَسَة (madorasap).
It has been shown in [6] that language models (e.g. bi-grams) significantly reduce perplexity
in the handwriting recognition problem. However, the absence of diacritics may complicate the
selected language model(s), since this absence increases the number of meanings for
a single word. The following example is chosen to illustrate the difficulty in a bi-gram
model (using statistics on the probability of one word following another):
Figure 1-8: Unclear Arabic sentence because of the last word
Let us assume that running a certain handwriting recognizer on the handwritten
sentence in Figure 1-8 returns a maximum probability for the first two words to their
correct words in the lexicon; however, it returns an equal probability for the last word to
be واسعة (wAsEp) ‘large’ or رائعة (ra'Ep) ‘outstanding’. To eliminate this confusion, an
additional model should be used.
Figure 1-9: The graph of the recognized possible sentences of the sentence in Figure 1-8
هذه – This is
مدرسة – Teacher or School
رائعة – Outstanding
واسعة – Large
The possible sentences could be obtained from the graph in Figure 1-9:
- This is an outstanding school – grammatically and semantically correct.
- This is an outstanding teacher – grammatically and semantically correct.
- This is a wide school – grammatically and semantically correct.
- This is a large teacher – grammatically correct (according to the lexical
classification), but nonsense at the Arabic semantic level.
Knowing that the second word is ‘teacher’, the language model leads to an immediate
elimination of the ‘large’ option. However, as mentioned, this cannot be known without
the diacritics. This problem is beyond the scope of this work, since language models
are not supported.
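The disambiguation a bi-gram model would perform on the graph of Figure 1-9 can be sketched as follows. English glosses stand in for the Arabic words, and all probabilities are invented purely for illustration; they are not statistics from any real corpus.

```python
# Sketch of bi-gram disambiguation over a sentence graph like Figure 1-9.
# All probabilities below are invented for illustration only.
bigram = {
    ("this_is", "teacher"): 0.5, ("this_is", "school"): 0.5,
    ("teacher", "outstanding"): 0.9, ("teacher", "large"): 0.1,
    ("school", "outstanding"): 0.5, ("school", "large"): 0.5,
}

def best_sentence(paths):
    """Score each candidate word sequence by the product of its bi-grams."""
    def score(words):
        p = 1.0
        for w1, w2 in zip(words, words[1:]):
            p *= bigram.get((w1, w2), 0.0)
        return p
    return max(paths, key=score)

candidates = [("this_is", "teacher", "outstanding"),
              ("this_is", "teacher", "large"),
              ("this_is", "school", "outstanding"),
              ("this_is", "school", "large")]
print(best_sentence(candidates))
```

With these illustrative probabilities, the ‘teacher … outstanding’ path wins, mirroring the elimination of the ‘large’ option described above.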
Chapter 2
Previous Work
This chapter explores several of the issues and methods involved in typical online
handwriting recognition which have been investigated in previous work. Typically,
handwriting recognition systems in past studies have been implemented using part or all
of the following phases: data acquisition, segmentation, preprocessing, feature
extraction, recognition/modeling, and postprocessing. Sections 2.1, 2.2, 2.3, and 2.4
describe data acquisition devices, segmentation, modeling methods, and postprocessing
respectively. Chapters 4 and 5 are dedicated to an in-depth analysis of
preprocessing and feature extraction, respectively, in both previous works and this work.
For a more complete survey of techniques that have been used in handwriting recognition
see [7]- [9].
2.1. Data Acquisition
Online handwritten data are acquired using a special device. Typically, a digital tablet
(see Figure 2-1) is used to sample the pen position with a sampling rate of 80 to 200
samples per second. This generates a point (x- and y-coordinate) sequence of the
handwriting data. The pen pressure of every sampled point is additionally captured by
some devices and used by several systems. Recently, other devices have been used to collect
these data, such as an optical pen that writes anywhere on ordinary paper (see Figure 2-2),
touch screens – used in many PDAs and Tablet PCs (see Figure 2-4 and Figure 2-5), and the
CrossPad (see Figure 2-3). The advantage of optical pens and CrossPads compared to the
other devices is that the writing surface is not part of the hardware, and writing on paper
feels more natural.
Figure 2-1: Digital Tablet
Figure 2-2: Optical Pen Figure 2-3: CrossPad
Figure 2-4: PDA Figure 2-5: Tablet PC
2.2. Segmentation
The difficulty of handwriting recognition tasks forced different researchers in the past
to make some assumptions regarding the style and manner of writing. Figure 2-6
illustrates different writing styles in English. The writing style of the first three lines is
generally referred to as discrete handwriting, in which the writer is asked to write each
letter within a bounding box or to isolate each letter. The writing style of the fourth line is
commonly referred to as pure cursive or connected handwriting, in which the writers are
asked to connect all of the lower case letters within a word. Usually, people write in a
mixed style, a combination of printed and cursive/connected styles, similar to the writing
of the fifth line. Cursive style (fourth and fifth lines) is the most difficult style to
recognize, since the recognizer has to segment each word into its component letters, either
in a phase separate from recognition or simultaneously with the recognition process.
Figure 2-6: Handwriting styles in English. The order is from easy to difficult in terms of recognition [10]
Depending on the form in which data is provided to a recognition system,
segmentation of the input may be required. Data given in the form of a single
word requires letter segmentation if writing is modeled at the letter level (not the
word level), while data given in the form of several words – a line, or even an
entire page – requires a segmentation method to separate the words if writing is modeled
at the word level (not the sentence level).
For well-separated letters, a simple spatial gap threshold can be used to segment
letters. However, a more complicated method is required for connected or overlapping
letters. One method begins with over-segmentation, in which candidate segmentation points
are selected [11]. A graph of possible segmentation sequences is constructed by
connecting consecutive segments to form sequences of possible whole letters. Each of
these combined segments is then classified into a letter class by the recognizer, and the
“best” letter sequence is identified during the postprocessing phase [9]. A further
extension of this approach is segmentation using hidden Markov models, which is also
adopted in our work and described in Chapter 6.
When writing sentences, a word-separation method is necessary, particularly for the
mixed writing style, since it is not clear whether a subword belongs to the current word,
belongs to the next one, or is itself a separate word. This is one reason why the mixed
writing style is the most difficult style to recognize. A spatial gap threshold can be
used to segment words. Some works, such as [6] and [12], used HMMs to find the most
likely hypothesis sentence (described in detail in Section 2.6.2).
As mentioned in Chapter 1, there is no discrete style in Arabic and words may contain
several word parts. Thus, Arabic script belongs to the mixed writing style. Sentence
segmentation in our work has been done by a spatial gap threshold depending on the size
of the word.
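The gap-threshold segmentation described above can be sketched as follows, with each stroke reduced to its horizontal extent. The right-to-left ordering, the 0.3 ratio, and the function interface are illustrative assumptions, not the thesis' tuned values.

```python
# Sketch of word segmentation by a spatial gap threshold. Each stroke is
# reduced to its horizontal extent (x_min, x_max). Arabic is written right
# to left, so strokes are ordered by decreasing x. A horizontal gap larger
# than `ratio` times the current word's width starts a new word.
def segment_words(strokes, ratio=0.3):
    if not strokes:
        return []
    strokes = sorted(strokes, key=lambda s: -s[1])  # right to left
    words = [[strokes[0]]]
    for cur in strokes[1:]:
        word = words[-1]
        width = max(s[1] for s in word) - min(s[0] for s in word)
        gap = min(s[0] for s in word) - cur[1]      # empty space to the left
        if gap > ratio * max(width, 1e-6):
            words.append([cur])                     # large gap: new word
        else:
            word.append(cur)
    return words

# Two strokes around x in [80, 100] form one word; x in [10, 30] is another.
print(segment_words([(90, 100), (80, 92), (10, 30)]))
```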
2.3. Modeling Methods
Modeling methods are the core of a handwriting recognizer. Three main modeling
methods have been used in previous work: template-matching models, statistical models,
and neural-network models, described in the following sections respectively.
2.3.1. Template Matching Models
Template matching is widely used in the pattern-recognition field. A template is
any type of pattern that can be provided to the recognizer. Template matching has been
employed for offline handwriting recognition, usually in conjunction with other
techniques. Some previous work has applied template matching to online handwriting
recognition, as described next.
In the training phase, template-matching systems define a template dictionary of
basic handwriting segments and represent them by feature vectors used for
recognition. These templates are denoted predefined templates; an unknown pattern refers
to the input data shape that needs to be recognized.
In the recognition phase, template-matching systems compute a distance measure
from the segment of data to templates from the predefined-template dictionary. Different
distance measures have been introduced in the past, such as some minimum-distance
approach from the input data to the predefined template, creating a likelihood
value depending on the closeness of the unknown pattern to a predefined template. One
very popular matching algorithm which has been used widely in the literature is elastic
matching [13]. The basic notion behind elastic matching is the computation of the
minimum distance between a set of points and a predefined template that does
not necessarily have the same number of points or size [7]. In addition, the distance
measure between an unknown pattern and the predefined templates can be found by
applying a number of local transformations to the unknown or the predefined template;
the amount of distortion between the unknown and the predefined templates is then
used as the distance measure.
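Elastic matching of this kind can be sketched as a dynamic-programming recursion over the two point sequences, allowing them to differ in length. The Euclidean point cost and the interface below are illustrative assumptions.

```python
# Sketch of elastic matching: a dynamic-time-warping style distance between
# an unknown point sequence and a predefined template, where the sequences
# need not have the same number of points.
import math

def elastic_distance(unknown, template):
    n, m = len(unknown), len(template)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(unknown[i - 1], template[j - 1])
            # stretch, shrink, or match: the classic elastic recursion
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A horizontal line matches a shorter horizontal template better than a
# template shifted away from it.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(elastic_distance(line, [(0, 0), (3, 0)]))
```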
2.3.2. Statistical Models
Statistical models such as hidden Markov models (HMMs) were first applied to
speech recognition [14]-[17] with great success. Recently, researchers have taken
advantage of the similarity between the online handwriting recognition problem and
speech recognition to apply similar techniques to handwriting
recognition [6], [12], [18], [20]-[22]. HMMs are widely used because of the time-sequential
nature of online scripts as well as their capability of modeling shape variability in
probabilistic terms. Typically, each letter class is represented by one left-to-right
HMM, with or without state skipping, and with various techniques for choosing the number
of states. These models are trained using either Viterbi or forward-backward training algorithms.
These letter models are embedded in a connected graph to represent the word/sub-word
dictionary. The Viterbi algorithm is used to find the hypothesis word in such a graph that
best matches the input observation sequence. The results of handwriting recognition for
English cursive script have been better than those achieved in speech recognition [19].
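The Viterbi decoding step at the core of such recognizers can be sketched on a toy discrete HMM. The two-state left-to-right model and its probabilities below are invented for illustration and are far smaller than a real letter model.

```python
# Sketch of Viterbi decoding for a discrete left-to-right HMM.
def viterbi(obs, start, trans, emit):
    """Return the most likely state sequence for the observation list `obs`."""
    states = list(start)
    # delta[s] = best probability of any path ending in state s
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] * trans[r].get(s, 0.0))
            delta[s] = prev[best] * trans[best].get(s, 0.0) * emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # trace back from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

start = {"s0": 1.0, "s1": 0.0}
trans = {"s0": {"s0": 0.6, "s1": 0.4}, "s1": {"s1": 1.0}}   # left-to-right
emit = {"s0": {"a": 0.9, "b": 0.1}, "s1": {"a": 0.2, "b": 0.8}}
print(viterbi(["a", "a", "b"], start, trans, emit))
```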
2.3.3. Neural Networks Models
Artificial Neural Networks (ANN) have been used successfully in many particular
problems, such as learning to recognize spoken words, learning to recognize faces, OCR,
equation recognition, signature verification, and other pattern recognition problems.
Conventional ANNs have been employed to recognize online handwriting for discrete
letters. However, they have not been employed to recognize cursive and unconstrained words
– unless templates of each word are made. This is due to the problem of letter
segmentation that exists in cursive and unconstrained handwritten words (see Section
2.2). To resolve this problem, many researchers use Time-Delay Neural Networks
(TDNNs) in different manners, described next.
TDNNs were first introduced for speech recognition and are well suited to sequential
signal processing. The TDNN is known to be an appropriate architecture for time-series
prediction, due to its internal memory for representing temporal relationships.
TDNNs have been explored for the online handwriting recognition problem. A set of
networks are used to look at adjacent data frames. The outputs of these networks are then
passed along to another network which has outputs corresponding to each character in the
alphabet that may be assigned to a point sequence. The output of each character node
represents the likelihood of the data being a part of that character [7].
2.4. Postprocessing
The goal of the postprocessing phase is to use additional contextual information about
the characters or words being recognized, which has not been taken into account in
previous phases, in order to correct possible errors. In addition, postprocessing is used in
some systems to determine the “best” characters, words or sentences among some
candidates returned by the recognizer. For instance, some systems strip off some data,
such as dots and delayed strokes, in the preprocessing phase and then use it in the
postprocessing phase to find the correct candidate. Postprocessing may also be used to
correct commonly-made mistakes that are well-understood in terms of detection and
correction. More advanced postprocessing techniques can be used to involve language
models such as a grammar, bi-gram or Markov chains to determine the best sentence that
matches predefined rules.
2.5. Related Work
• Signature Verification: Biometric authentication methods have been explored
recently, such as voice identification, fingerprint identification, face recognition,
and retina scans. Signature verification is an additional biometric-authentication
technology that is used to positively identify a person from their handwritten
signature. Similarly to handwriting recognition, there are two types of signature
verification systems: online and offline. With online signature verification, it is
not the shape or look of the signature that is meaningful; it is the changes in
speed, pressure, and timing that occur during the act of signing. Only the original
signer can recreate the changes in timing and X, Y, and Z (pressure).
• Equation Recognition: A handwriting-based equation editor is a system that
allows users to enter handwritten mathematical formulas using one of the
acquisition devices described above. Typically, such a system uses online character
recognition techniques and a graph grammar to generate an internal parse tree of
the input, which can then be converted into an output representation such as
LATEX.
2.6. Previous Handwriting Recognition System Summarization
The aim of this section is to summarize previously published works with an emphasis
on the implemented methodologies to give the reader a sense of how online handwriting
recognition systems are constructed. Furthermore, our research has implemented
concepts and ideas similar to those previously utilized in these articles.
2.6.1. Real-Time Online Unconstrained Handwriting Recognition Using Statistical Methods
Krishna S. Nathan and his colleagues in [18] discussed a general handwriting
recognition system based on hidden Markov models for a large English vocabulary and
writer-independent, unconstrained handwriting in any combination of styles (discrete,
cursive or both). A key characteristic of the system was that it performed recognition in
real time on PC 486 platforms without requiring large amounts of memory. Language
models were not employed in this work.
o Preprocessing: The acquired point sequence was normalized to a standard size and
re-sampled spatially.
o Feature Extraction: A feature vector consisting of Δx, Δy, cos(θ) and sin(θ)
was constructed at equi-spaced points along the trajectory, where θ is the angle at the
sample point. Contextual information is incorporated by splicing several individual
feature vectors into one large feature vector, so that it spans a window of adjacent
points, called a frame.
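The per-point feature vector and frame splicing described above can be sketched as follows; the window size and function names are illustrative assumptions.

```python
# Sketch of the per-point feature vector (dx, dy, cos(theta), sin(theta))
# computed along a pen trajectory, with adjacent vectors spliced into frames.
import math

def features(points):
    vecs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        theta = math.atan2(dy, dx)       # local writing direction
        vecs.append((dx, dy, math.cos(theta), math.sin(theta)))
    return vecs

def frames(vecs, window=3):
    """Splice `window` adjacent feature vectors into one large frame vector."""
    return [sum(vecs[i:i + window], ()) for i in range(len(vecs) - window + 1)]

pts = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2)]
print(frames(features(pts), window=2)[0])
```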
o Real Time: To obtain real-time performance, they divided the search process into two
phases: fast match and detailed match. The fast-match search uses a simpler, degenerate
single-state model to generate a short list of candidate hypothesis strings. The
detailed-match search uses a computationally more expensive model to reorder
each word in the short list.
o Recognition Framework: A character was represented by a set of left-to-right
HMMs. Each HMM in this set represents a specific writing style for the character
(called a lexeme). Either Viterbi or forward-backward training was used to train the lexeme
HMMs. The diagram of the recognition system implemented in this work is shown in
Figure 2-7.
o Delayed Strokes: The proposed solution to the delayed-strokes problem was to
strip them off before training the HMMs. In decoding, the search mechanism first
expands the non-delayed strokes based on the frame probabilities and the character
models. The delayed strokes were incorporated based on their position relative to the
non-delayed strokes and their fast match probabilities.
o Data Set: Approximately 100,000 characters of data were collected from a pool of
100 writers. The training set consisted of words chosen from a 20,000+ word
dictionary and discrete characters written in isolation.
o Recognition Results: The test set was composed of data collected from a set of 25
writers (different from those who provided the training data) and consisted uniquely
of words chosen at random from the same word dictionary. The test task was applied
to three dictionaries consisting of 3K, 12K and 21K words, with 9%, 12%, and 18.9%
error rates respectively.
Figure 2-7: The block diagram of the recognition system in [18]
2.6.2. Online Cursive Handwriting Recognition Using Speech Recognition Methods
J. Makhoul and his colleagues in [6] applied an HMM-based continuous speech
recognition system to an on-line writer-dependent cursive handwriting recognition task
for English script. The base system was not modified except for using handwriting
feature vectors instead of speech. A 1.1% word error rate was achieved for a 3050 word
dictionary, and 3%-5% word error rates were obtained for six different writers in a
25,595 word dictionary. The segmentation of words into characters occurs simultaneously
with the recognition process. One of the special characteristics of this work is that the
segmentation of sentences into words occurs naturally by incorporating the use of a
dictionary and a language model into the recognition process.
o Preprocessing: Two preprocessing steps were used on the point sequence data.
The first is a simple noise filter which required that the pen traverse over one
hundredth of an inch before allowing a new sample. The second step padded each pen
stroke to a minimum size of ten samples.
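The two preprocessing steps can be sketched as follows. The exact padding scheme of [6] is not specified here, so repeating the last sample is an assumption, as are the concrete threshold values.

```python
# Sketch of the two preprocessing steps: a distance filter that drops samples
# closer than a minimum pen movement, and padding of short strokes to a
# minimum number of samples. Units and thresholds are illustrative.
import math

def filter_points(stroke, min_dist=0.01):
    """Keep a sample only if the pen moved at least `min_dist` since the last kept one."""
    kept = [stroke[0]]
    for p in stroke[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept

def pad_stroke(stroke, min_samples=10):
    """Repeat the last sample until the stroke has `min_samples` points (assumed scheme)."""
    return stroke + [stroke[-1]] * max(0, min_samples - len(stroke))

stroke = [(0.0, 0.0), (0.001, 0.0), (0.02, 0.0), (0.05, 0.01)]
clean = filter_points(stroke)
print(len(clean), len(pad_stroke(clean)))
```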
o Feature Extractor: Six features were extracted: the writing angle at the sample,
the change in the writing angle, Δx, Δy, a pen up/down bit, and sgn(x − max(x)).
o Recognition Framework: Each character was represented by a 7-state left-
to-right HMM, as shown in Figure 2-8. Since the penning of a script letter often
differs depending on the letters written before and after it, as shown in Figure 2-9,
additional HMMs are used to model these contextual effects.
Figure 2-8: 7-state HMM used to model each character, courtesy: [6]
Figure 2-9: Illustration of contextual effects on a cursively written “i”. The left “i” is written after “m”, the right “i” is written after “v”, courtesy: [12]
o Delayed Strokes: As described above, the idea was to use the base speech
recognition system with no modification. This made it necessary to simulate a continuous-
time feature vector by arbitrarily connecting the samples from pen-up to pen-down
with a straight line. This line was sampled ten times. The data became one long
criss-crossing stroke for the entire sentence, where words run together, "i" and "j"
dots and "t" and "x'' crosses cause back-tracing over previously drawn script, as
shown in Figure 2-10.
Figure 2-10: Connecting strokes (in dashed lines), courtesy: [6]
o Language Model: This work has shown that statistical grammars (e.g. bi-gram)
significantly reduce the perplexity. Recognition with no grammar but with context
produced an error rate of 4.2%. When the grammar was added and context not used,
the error rate dropped to 2.2%. However, the best result used both context and a
grammar for a word error rate of 1.1% in ATIS (Airline Travel Information Service)
corpus of 3050 words.
o Limitations: The rules given to the subjects were: sentences should be written in pure
cursive; the body of a word should be connected (lifting the pen in the middle of a
word is not allowed); and crossing and dotting should be done after completing the
body of a word.
o Recognition Results: This system was trained and tested by the same six users.
The bi-gram model used was constructed from approximately two million sentences
collected from the Wall Street Journal. The average recognition error was reported at
4.2% with a dictionary size of 25,595 words.
2.6.3. Writer Independent Online Handwriting Recognition Using an HMM Approach
J. Hu and her colleagues in [19] developed an HMM-based writer-independent
handwriting recognition system for English script. Writer independence was achieved by
two processes: (1) a preprocessing step to remove much of the variation in handwriting due
to varying personal styles and writing influences, and (2) feature invariance to
reduce sensitivity to the remaining variations.
o Preprocessing: The preprocessing phase was divided into two steps: noise reduction
and normalization of size and orientation. Noise reduction was achieved by spline
filtering for smoothing, after the standard wild-point reduction and dehooking
procedures. Normalization was achieved by first estimating the base line (joining the
bottoms of small lower-case letters such as "a") and the core line (joining the tops of small
lower-case letters) based on the Hough transform. Then, the word was rotated such that
the base line is horizontal, and scaled such that the distance between the base and core
lines equals a predefined value. Finally, an equi-arc-length resampling procedure was
applied to remove variation in writing speed.
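Equi-arc-length resampling of this kind can be sketched as follows; linear interpolation between samples and the interface are illustrative assumptions.

```python
# Sketch of equi-arc-length resampling: new points are placed at equal
# distances along the pen trajectory, removing variation in writing speed.
import math

def resample(points, step):
    out = [points[0]]
    residual = 0.0                       # arc length since the last output point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.dist((x0, y0), (x1, y1))
        d = step - residual              # arc position of next output on this segment
        while d <= seg:
            t = d / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += step
        residual = (residual + seg) % step
    return out

# Unevenly spaced samples on a line, re-placed every 1.0 units of arc length
print(resample([(0, 0), (0.2, 0), (3, 0)], step=1.0))
```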
o Feature Extraction: Seven features were computed at each sample point. Two of
these features are conventional ones widely used in HMM-based online handwriting
recognizers: the tangent slope angle and the normalized vertical coordinate. Two invariant
features, normalized curvature and ratio of tangents, were introduced; these features
are invariant under arbitrary similitude transformations, adopted from the
machine-vision world. Finally, three high-level features commonly used in handwriting
recognition were extracted: crossings, cusps, and loops, as
shown in Figure 2-11.
Loops Crossings Cusps
Figure 2-11: Illustration of the three high-level features, courtesy: [19]
o Recognition Framework: A left-to-right HMM without state skipping was selected
to represent subcharacters and characters. These character models were embedded in
a grammar network, which can represent a variety of grammatical constraints, e.g.
word dictionaries, statistical N-gram models and context-free grammars. Figure 2-12
shows a grammar network implementing a dictionary; each arc in this network represents
a character, and each path from the start node to the final node corresponds to a word
in the dictionary. To recognize a word, the Viterbi algorithm was used to search for the
most likely state sequence corresponding to the given observation sequence of this
word.
Figure 2-12: A grammar network implementing a word dictionary, courtesy: [19]
o Training: Models were trained using the iterative segmental training method based
on Viterbi decoding. The training flow was divided into four phases. The first phase is
initial character training, to train a specific isolated character with a given specific
class. The second phase is lattice character training, to train a specific isolated
character with any class. The third phase is linear word training, to train a specific
cursive word, given the class sequence corresponding to the word's characters. The last
phase is lattice word training, to train a specific word with any character classes.
o Delayed Strokes: Delayed strokes were treated as special characters in the
alphabet. A word with delayed strokes was given alternative spellings to accommodate
different sequences with delayed strokes drawn in different orders. However, this
solution can dramatically increase the hypothesis space and is impractical for
a large-vocabulary task. A two-stage approach was used to solve this problem. The
first stage applies the N-best decoding algorithm with simplified delayed-stroke
modeling to narrow down the choice of words. The second, detailed-matching stage
constructs a grammar network covering the top N candidates using exact delayed-stroke
modeling. Afterwards, a best-path search is applied to the reduced network to find
the final optimal candidate.
o Recognition Results: Word-recognition results on small, medium and large
vocabularies have been reported: recognition rates of 91.8%–98.4% with a 500-word
dictionary, 83.2%–94.5% with a 5,000-word dictionary, and 76.3%–91.0% with a
20,000-word dictionary.
2.7. Online Handwriting Recognition Work for Arabic Script
Several previous works address offline handwriting recognition for Arabic script,
e.g. [23]- [26]. However, little work known to the author has tackled the
difficulties of online Arabic handwriting recognition.
S. Al-Emami and M. Usher [27] proposed an online Arabic handwriting recognition
system based on decision-tree techniques to recognize Arabic words. It involves a
preprocessing stage, in which features of the letters are selected and the slope of each
stroke is categorized into one of four directions, followed by a learning stage in which
the preprocessed data are entered into a decision tree. The same process is applied for
recognizing an unknown letter. The system has been tested only with 13 Arabic letter
shapes that compose the four words: (��, ���, ق ,دار�tز), which are used as postcode
terms in some countries.
Chapter 3
Background – Hidden Markov Models

The Hidden Markov Model (HMM) is the statistical model used extensively in this work;
therefore, this chapter is dedicated to giving basic background on the concepts,
problems, and solutions associated with this model. HMMs have been the mainstay of the
statistical modeling used in modern speech recognition systems and are regarded as
the most successful technique in this domain [30]. HMMs have also been applied to many
types of problems in molecular sequence analysis.
HMMs are divided into two types according to their observation densities: discrete-
density and continuous-density HMMs. Discrete-density HMMs refer to when the
observations are described as discrete symbols chosen from a finite alphabet. Continuous-
density HMMs refer to when the observations are described as continuous signals or
vectors. For simplicity, this chapter is restricted to describe only HMMs with discrete-
observation densities. For a complete survey of HMM see [14] and [30].
Hidden Markov models can be presented as extensions of the discrete-state Markov
process which is described in the next section. The extension of the concepts of the
discrete-state Markov process to HMMs is described in Section 3.2. Section 3.3 is
dedicated to discussing the three fundamental problems of HMMs and their solutions.
3.1. Discrete Markov Process
A Markov Process is a stochastic process where the next state depends only on the
current state (or more generally, on a finite number of immediate past states), thus, it
satisfies the Markov condition (or the state-independence assumption). A discrete
Markov process, which is also known as the Markov chain, is a system which is
described at any time as being in one of a set of N distinct states, S1, S2, … , SN, as shown
in Figure 3-1 (where N = 3). We denote the actual state at time t by qt, where t = 1, 2,
3, … denotes the time instants associated with state changes. The probability of the
process being in state Si at time t is denoted by P(qt=Si). Specifically, in a first-order
Markov chain, the probability of being in Sj, Si, Sk, … at times t, t−1, t−2, … respectively
is truncated to the probability of being only in Sj, Si at times t, t−1 respectively, i.e.,

P(qt=Sj | qt-1=Si, qt-2=Sk, …) = P(qt=Sj | qt-1=Si)
Figure 3-1: Illustration of a Markov chain with N = 3 states
Formally, the Markov process is defined as the pair λ = {A, ∏}, where:

∏ = {πi}, πi = P(q1 = Si), 1 ≤ i ≤ N, called the initial state probabilities.
A = {ai,j}, ai,j = P(qt+1 = Sj | qt = Si), where 1 ≤ i, j ≤ N and t > 0, are called the
transition probabilities. Each row of A sums to one:

∑j=1..N ai,j = 1, ∀ 1 ≤ i ≤ N
Example 3-1: Suppose today’s weather tells us something about tomorrow’s weather (but
yesterday’s weather does not tell us anything about tomorrow’s weather). Suppose that
the weather can either be Raining, Cloudy or Sunny.
S1 = Raining, S2 = Cloudy, S3 = Sunny
Figure 3-2 shows the weather Markov chain for the following transition probability
matrix and initial probability vector:
A = {ai,j} =
| 0.5 0.4 0.1 |
| 0.6 0.2 0.2 |
| 0.1 0.2 0.7 |

∏ = {πi} = [0.3 0.2 0.5]
Figure 3-2: The weather Markov chain
If it's sunny today, what is the probability that it will rain the day after tomorrow, given
the Markov chain in Figure 3-2?
Answer:
Let’s denote q0 to be the state of today, q1 to be the state of tomorrow and q2 to be the state
of the day after tomorrow, so we are interested in: P(q2 = R | q0 = S)
P(q2 = R | q0 = S) =
P(q1 = R | q0 = S) × P(q2 = R | q1 = R) +
P(q1 = C | q0 = S) × P(q2 = R | q1 = C) +
P(q1 = S | q0 = S) × P(q2 = R | q1 = S)
= 0.1×0.5 + 0.2×0.6 + 0.7×0.1 = 0.24
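The computation above can be checked with a few lines of Python (a sketch; state names are abbreviated to R, C, S):

```python
# Transition probabilities from Example 3-1 as nested dicts:
# A[i][j] = P(next state = j | current state = i).
A = {
    'R': {'R': 0.5, 'C': 0.4, 'S': 0.1},
    'C': {'R': 0.6, 'C': 0.2, 'S': 0.2},
    'S': {'R': 0.1, 'C': 0.2, 'S': 0.7},
}

# P(q2 = R | q0 = S): marginalize over tomorrow's state q1.
p = sum(A['S'][q1] * A[q1]['R'] for q1 in 'RCS')
print(round(p, 2))  # 0.24
```

Summing over the intermediate state is exactly the (S, R) entry of the two-step transition matrix A².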
3.2. Extension to Hidden Markov Models
Thus far, each state in a Markov chain corresponds to one observable event; a
deterministic observation is associated with each state, i.e., the symbol ot is fully
determined when the process is in state qt. This model is too limited for many problems
of interest. In this section, the concept of the Markov chain is extended to the case
where the observation is a probabilistic function of the state. In this extended model,
the observation does not uniquely determine the state sequence; in general, one
observation sequence may be produced by many different state sequences. Hence, the
state sequence is hidden.
An HMM is formally defined as the triple λ = {A, B, ∏} where,
1) N is the number of states of the model.
2) M is the number of distinct observation symbols per state, i.e., the discrete
alphabet size. These symbols are denoted as V = {v1, v2, …, vM}.
(Footnote 4: The material presented in this section is based on [14].)
3) A is the state-transition probability distribution as defined in the Markov process,
(see previous section).
4) B = {bj(k)} is the observation-symbol probability distribution in state j, where

bj(k) = P(vk at t | qt = Sj), 1 ≤ j ≤ N, 1 ≤ k ≤ M

i.e., the probability of observing the symbol vk when the process is in state Sj. B is
also known as the observation-probability matrix.
5) ∏ is the initial state distribution, as defined in the Markov process (see previous
section).
Figure 3-3: Illustration of a hidden Markov model with three states (N = 3) and two observable
symbols (M = 2)
3.3. The Three Fundamental Problems for HMMs
Given the HMM of the previous section, three basic problems of interest have to be
solved to make this model very useful in a real-world application. These problems are
described as follows:
• The Evaluation Problem: Given the observation sequence O = [o1, o2,…,oT],
and a model λ = {A, B, ∏}, how do we efficiently compute P(O|λ), the probability
of the observation sequence, given the model?
• The Decoding Problem: Given the observation sequence O = [o1, o2,…,oT], and
a model λ = {A, B, ∏}, what is the corresponding optimal state sequence Q = [q1,
q2, ... , qT] which maximizes P(O, Q|λ)? (i.e., the state sequence that best
“explains” the observation sequence)
• The Training Problem: Given the observation sequence O = [o1, o2,…,oT], how
do we adjust the model parameters λ = {A, B, ∏} to maximize P(O|λ)?
3.3.1. The Evaluation Problem
Given the observation sequence O = [o1, o2,…,oT], and the model λ = {A, B, ∏}, we
need to compute P(O|λ).
The most direct way of doing this is to enumerate every possible state
sequence of length T. However, this solution requires an exponential number of
computations, on the order of 2T·N^T, where N is the number of states. Therefore, a
more efficient algorithm, known as the Forward-Backward procedure, is used to solve
this problem. Consider the forward variable:

αt(i) = P(o1 o2 … ot, qt = Si | λ)

αt(i) is the probability of the partial observation sequence [o1, o2, …, ot] and state
Si at time t, given the model λ. αt(i) is computed inductively as follows:
1) Initialization:

α1(i) = πi bi(o1), 1 ≤ i ≤ N

This step initializes the forward probabilities as the joint probability of state Si
and the initial observation o1.

2) Induction:

αt+1(j) = [ ∑i=1..N αt(i) ai,j ] bj(ot+1), 1 ≤ t ≤ T−1, 1 ≤ j ≤ N

3) Termination:

P(O|λ) = ∑i=1..N αT(i)
Using the above algorithm we efficiently obtain the answer to the evaluation problem.
The number of computations involved is on the order of T·N^2 instead of 2T·N^T. For
in-depth analysis of the forward procedure, see [14] and [30].
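As an illustrative sketch (variable and function names are mine, not the thesis's), the three steps above translate directly into Python:

```python
def forward(A, B, pi, O):
    """Forward procedure: returns P(O | model) in O(T*N^2) time.

    A[i][j]: transition probability, B[i][k]: probability of symbol k
    in state i, pi[i]: initial state probability,
    O: observation sequence as a list of symbol indices.
    """
    N = len(pi)
    # 1) Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][O[0]] for i in range(N)]
    # 2) Induction over the remaining observations
    for o in O[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # 3) Termination: P(O | model) = sum_i alpha_T(i)
    return sum(alpha)
```

For long sequences a practical implementation would scale the alphas (or work in log space) to avoid numerical underflow.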
3.3.2. The Decoding Problem – Viterbi Algorithm
Given the observation sequence O= [o1, o2,…,oT], and the model λ = {A, B, ∏}, we need
to find the optimal state sequence (path) Q = [q1, q2, ... , qT] which maximizes P(O, Q|λ).
The well-known Viterbi algorithm, based on dynamic programming, exists to find the
whole state sequence with maximum likelihood. In order to facilitate the computation we
define an auxiliary variable,
δt(i) = max over q1,q2,…,qt−1 of P(q1 q2 … qt−1, qt = Si, o1 o2 … ot | λ)

δt(i) denotes the highest probability along an optimal partial state sequence
Q = [q1, q2, ..., qt] ending in state Si at time t while observing the observation
sequence [o1, o2,…,ot]. δt(i) is computed inductively much like the forward variable
αt(i). Since the optimal state sequence Q = [q1, q2, ..., qt] itself is required, we
need to keep track of the argument which maximizes δt(i); the matrix ψt(j) is used to
save this backtracking data. The Viterbi algorithm is described as follows:
1) Initialization:

δ1(i) = πi bi(o1), 1 ≤ i ≤ N
ψ1(i) = 0

2) Recursion:

δt(j) = max 1≤i≤N [ δt−1(i) ai,j ] bj(ot), 2 ≤ t ≤ T, 1 ≤ j ≤ N
ψt(j) = argmax 1≤i≤N [ δt−1(i) ai,j ], 2 ≤ t ≤ T, 1 ≤ j ≤ N

3) Termination:

P* = max 1≤i≤N [ δT(i) ]
qT* = argmax 1≤i≤N [ δT(i) ]

4) Path (state sequence) backtracking:

qt* = ψt+1(qt+1*), t = T−1, T−2, …, 1
Given a model λ = {A, B, ∏}, and an observation sequence O = [o1, o2,…,oT], the
Viterbi algorithm provides the following:
• The optimal path Q* = [q1*, q2*, …, qT*] corresponding to the observation sequence O.
• The accumulated likelihood score P* along the optimal path.
• The state segmentation along the optimal path corresponding to the observation
sequence O.
The number of computations involved in the Viterbi algorithm is O(N^2·T).
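The four steps above can be sketched in Python as follows (an illustrative implementation; names and data layout are my own):

```python
def viterbi(A, B, pi, O):
    """Viterbi decoding: returns (P*, optimal state path) for O."""
    N = len(pi)
    # 1) Initialization: delta_1(i) = pi_i * b_i(o_1), psi_1(i) = 0
    delta = [pi[i] * B[i][O[0]] for i in range(N)]
    psi = []  # psi[t][j]: best predecessor of state j at step t+1
    # 2) Recursion
    for o in O[1:]:
        best = [max(range(N), key=lambda i: delta[i] * A[i][j])
                for j in range(N)]
        delta = [delta[best[j]] * A[best[j]][j] * B[j][o]
                 for j in range(N)]
        psi.append(best)
    # 3) Termination: pick the most likely final state
    q = max(range(N), key=lambda i: delta[i])
    p_star = delta[q]
    # 4) Backtracking along the stored psi matrix
    path = [q]
    for best in reversed(psi):
        q = best[q]
        path.append(q)
    return p_star, path[::-1]
```

Besides P* and the path, the returned state sequence directly yields the state segmentation of the observation sequence mentioned above.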
3.3.3. The Training Algorithm – Baum-Welch Algorithm
Given the observation sequence O = [o1, o2,…,oT], we need to determine a method to
adjust the model parameters {A, B, ∏} to maximize the probability of the observation
sequence given the model λ, i.e., P(O|λ). The training problem is considered the most
difficult among the three fundamental problems. In fact, there is no known
analytical solution for this optimization problem. However, an iterative procedure
known as the Baum-Welch algorithm (or forward-backward algorithm) exists that locally
maximizes P(O|λ). The Baum-Welch algorithm is a special case of the EM (Expectation-
Maximization) algorithm [28], [29]. The Baum-Welch algorithm is described next.
In order to describe the Baum-Welch algorithm, auxiliary variables are defined.
Analogous to the forward variable, a backward variable is defined first as follows:

βt(i) = P(ot+1 ot+2 … oT | qt = Si, λ)

βt(i) is the probability of the partial observation sequence [ot+1, ot+2, …, oT], given
state Si at time t and the model λ. It is computed inductively as follows:
1) Initialization:

βT(i) = 1, 1 ≤ i ≤ N

2) Induction:

βt(i) = ∑j=1..N ai,j bj(ot+1) βt+1(j), 1 ≤ t ≤ T−1, 1 ≤ i ≤ N

The above procedure computes the backward variable βt(i) in on the order of T·N^2
operations.
Second, we define the auxiliary variable ξt(i, j) as the probability of being in state Si
at time t and in state Sj at time t+1, given the model and the observation sequence, i.e.,

ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ)
From the definitions of the forward and backward variables we get:

ξt(i, j) = αt(i) ai,j bj(ot+1) βt+1(j) / P(O|λ)
         = αt(i) ai,j bj(ot+1) βt+1(j) / [ ∑i=1..N ∑j=1..N αt(i) ai,j bj(ot+1) βt+1(j) ]
Given the model λ = {A, B, ∏} and the observation sequence O, the model can be
reestimated iteratively to maximize P(O|λ). The newly generated model is denoted
λ̄ = {Ā, B̄, ∏̄}, where,
π̄i = expected frequency (number of times) in state Si at time (t = 1)
    = ∑j=1..N ξ1(i, j)

āi,j = expected number of transitions from state Si to state Sj
       / expected number of transitions from state Si
     = [ ∑t=1..T−1 ξt(i, j) ] / [ ∑t=1..T−1 ∑j=1..N ξt(i, j) ]

b̄i(vk) = expected number of times in state Si observing symbol vk
          / expected number of times in state Si
        = [ ∑t: ot=vk ∑j=1..N ξt(i, j) ] / [ ∑t=1..T ∑j=1..N ξt(i, j) ]
Baum and his colleagues have proved that this procedure yields one of two outcomes:
either (1) the initial model λ defines a critical point of the likelihood function, in
which case λ̄ = λ; or (2) model λ̄ is more likely than model λ in the sense that
P(O|λ̄) > P(O|λ). Hence, if λ̄ iteratively replaces λ and the reestimation calculation
is repeated, the probability of O being observed from the model improves until a limit
point is reached.
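One reestimation iteration can be sketched in pure Python for a single observation sequence (function and variable names are mine; a practical implementation would train on many sequences and scale the α/β values to avoid underflow):

```python
def baum_welch_step(A, B, pi, O):
    """One Baum-Welch reestimation step; returns updated (A, B, pi).

    Combines the forward (alpha) and backward (beta) passes with the
    xi counts defined above. Assumes len(O) >= 2.
    """
    N, M, T = len(pi), len(B[0]), len(O)
    # Forward pass
    alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                      for j in range(N)])
    # Backward pass
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j]
                             for j in range(N))
    p_obs = sum(alpha[T-1])  # P(O | model)
    # xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, model)
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_obs
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    # gamma[t][i] = P(q_t = S_i | O, model)
    gamma = [[sum(xi[t][i]) for i in range(N)] for t in range(T - 1)]
    gamma.append([alpha[T-1][i] * beta[T-1][i] / p_obs for i in range(N)])
    # Reestimation of pi, A and B from the expected counts
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1)) for j in range(N)]
             for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
              sum(gamma[t][i] for t in range(T)) for k in range(M)]
             for i in range(N)]
    return new_A, new_B, new_pi
```

Iterating this step until P(O|λ) stops improving gives the locally optimal model described above.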
Chapter 4
Preprocessing

The goals of the preprocessing phase are: (1) to reduce or remove imperfections
caused by acquisition devices, and (2) to minimize handwriting variations irrelevant for
pattern classification which may exist in the acquired data. The preprocessing phase has a
great influence on subsequent processing, and a real impact on the recognition rate
(see [31] for a good survey of preprocessing techniques). Typically, preprocessing of
online handwriting recognition in previous research has been classified into three
types [31]: noise reduction, data simplification, and normalization. The preprocessing
phase consists of three steps corresponding to each of these types respectively. The three
steps are described in detail in the following sections.
4.1. Step 1: Noise Reduction
Noise reduction attempts to reduce imperfections caused mainly by hardware (digital
device) limitations. Such reduction is usually performed by various geometric operations
such as smoothing, wild point reduction, and hook removal, which are described next.
• Smoothing is used to remove or reduce the effect of hardware limitations and
erratic hand motion. Strokes captured by acquisition devices tend to contain
noise; one could experience such noise while writing on a digital tablet while
traveling by train, airplane, or car. Several approaches have been introduced
to reduce such noise by applying filters such as spline filters [19] and low-pass
filters [32].
• Wild Point Reduction removes wild points, the occasional spurious points
detected by digital devices, mainly due to hardware problems. Major improvements
to digital devices have reduced these kinds of imperfections, but software
processing is still required to completely eliminate this problem.
• Hook Removal is used to remove small hooks from captured strokes. Strokes
captured by a digital device tend to include small hooks created by quick pen
motion when the pen is lowered or raised. Hooks are partially removed by
hardware devices; careful software-based removal of leftover hooks is necessary
in the preprocessing phase.
In this preprocessing step, a low-pass filter with a rectangular-window impulse
response is used for noise reduction. The width of this window is chosen empirically,
as illustrated in Figure 4-1. The low-pass filter is used mainly for smoothing the
input point sequence, while it is assumed that wild points are reduced by the input
hardware. No direct or specific task is required to remove hooks, since the low-pass
filter is adequate at hook removal.
Figure 4-1: Illustration of applying a low-pass filter on a word written in a moving train.
Word (b) is the result of applying a low-pass filter on word (a).
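A minimal sketch of such a rectangular-window (moving-average) low-pass filter, assuming strokes are lists of (x, y) samples; the half-width parameter w is illustrative, not the empirically chosen value from the thesis:

```python
def smooth(points, w=2):
    """Rectangular-window low-pass filter (moving average) over a stroke.

    points: list of (x, y) samples; w: half-width of the averaging
    window (window width = 2*w + 1, clipped at the stroke ends).
    """
    out = []
    for i in range(len(points)):
        lo, hi = max(0, i - w), min(len(points), i + w + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out
```

Averaging each sample with its neighbors attenuates high-frequency jitter such as the train vibration shown in Figure 4-1.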
4.2. Step 2: Data Simplification
Data simplification is the process of cutting down the number of data points acquired
by a digital device by eliminating redundant points irrelevant for pattern
classification. This processing directly affects recognizer performance. In this work,
Douglas&Peucker's algorithm [33] has been adopted to simplify a given polyline (point
sequence). Douglas&Peucker's algorithm finds the skeleton points of the point sequence
produced by Step 1 (Section 4.1). Skeleton points are a subset of the original points
and represent the global geometric shape; eliminating one of them may greatly alter
the geometric shape. Skeleton vectors are defined as the vectors connecting each two
consecutive skeleton points, as illustrated in Figure 4-2. We have applied
Douglas&Peucker's algorithm with a tolerance τ1, determined empirically.
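A minimal recursive sketch of Douglas&Peucker's algorithm (illustrative; the thesis's tolerance τ1 corresponds to the tol parameter):

```python
import math

def douglas_peucker(points, tol):
    """Polyline simplification (Douglas & Peucker): keep the skeleton
    points whose removal would move the curve by more than tol."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    seg_len = math.hypot(x2 - x1, y2 - y1) or 1e-12
    # Perpendicular distance of each interior point to the chord
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        d = abs((x2 - x1) * (y1 - y0) - (x1 - x0) * (y2 - y1)) / seg_len
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tol:
        # All interior points lie within tol of the chord: drop them
        return [points[0], points[-1]]
    # Otherwise split at the farthest point and recurse on both halves
    left = douglas_peucker(points[:idx + 1], tol)
    right = douglas_peucker(points[idx:], tol)
    return left[:-1] + right
```

The returned points are the skeleton points; pairing consecutive ones gives the skeleton vectors of Figure 4-2.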
Figure 4-2: Polyline simplification (original polyline, skeleton points, simplified polyline, and skeleton vectors)

4.3. Step 3: Normalization

The normalization task is used to reduce the effects of handwriting variations and to
simplify the recognition process. Essentially, this processing is used to reduce
variations in
handwriting speed, word size, word orientation, and word skew. These features are called
variant features since they often vary between writers and even with the same writer. In
a writer-dependent system, all or part of these useful features are fed to the training
and recognition engines (for more details see Section 1.2). However, when implementing a
writer-independent system, normalizing these features is a necessity to reduce the
recognition-process complexity and to minimize the shape-variation domain. It has been
shown that basic recognition and segmentation algorithms perform best for a consistent
size of writing [34]. In the following two sections, two types of normalization are
discussed: size/orientation normalization and speed normalization.
4.3.1. Size and Orientation Normalization
In Latin script, four boundary lines are defined for every handwritten word: the base
line, the core line, the ascender line, and the descender line. The base line joins the
bottom of small lower-case letters such as “c”. The core line joins the top of small
lower-case letters. The ascender line joins the top of letters with ascenders such as
“l”. The descender line joins the bottom of letters with descenders such as “g” [19],
[31], and [34]. These boundary lines are illustrated in Figure 4-3. Previous work assumed
that base lines and core lines structure all words, while the ascender and descender
lines do not [19] and [34]. For each word, these boundary lines are estimated. This
estimation has
been identified as a difficult problem in handwriting recognition, particularly when
tackling unconstrained script [19]. Different techniques have been proposed to resolve
this problem. The main approaches include the histogram-based methods [31], [35]- [37],
linear-regression methods [38], word-model methods [39], and Hough transform
methods [19]. Upon estimating these boundary lines, the word is rotated so that the base
line becomes horizontal, and scaled so that the core height (the distance between the base
and core lines) equals a predefined value.
Figure 4-3: Word boundary lines
In Arabic script, boundary lines do not necessarily exist in handwritten words,
particularly in the very common top-down style5. The same letter may be written on the
base line, and at times above the core line, depending on the subsequent letter, as
shown in Figure 4-4. In other words, there is no predefined area in which each letter is
located within a word. Since there are no well-defined boundary lines in Arabic, the
normalization process is considerably complicated.
Size and orientation normalization problems are outside the scope of this work. These
issues will be explored as part of our future work.
Figure 4-4: Illustration of a word written in the top-down writing style. The left word is the printed style of the right. �� (bHajm) ‘in size’. In the right word, the letter ـ� is above the letter ـ, which is ,ـabove the letter ــ, which is above the letter ـ�.
5 The authors of [34] assumed that boundary lines always exist, even in Arabic, Farsi and similar scripts, which is not accurate.
4.3.2. Speed Normalization
Digital devices sample at equal intervals of time rather than equal distances;
therefore, strokes written more slowly than others include more samples. The data point
sequence therefore needs to be re-sampled. Upon finishing Step 2 (finding the skeleton
points), each skeleton vector of length l is divided into ⌊l/f⌋+2 points, where f is
a predefined fragment length and the additional 2 accounts for the original skeleton
points.
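A sketch of this re-sampling step, assuming skeleton points are (x, y) tuples and f is the predefined fragment length (the names are mine, not the thesis's):

```python
import math

def resample(skeleton, f):
    """Speed normalization: re-sample each skeleton vector of length l
    into int(l / f) equally spaced interior points plus the two original
    skeleton endpoints, as described above."""
    out = [skeleton[0]]
    for (x1, y1), (x2, y2) in zip(skeleton, skeleton[1:]):
        l = math.hypot(x2 - x1, y2 - y1)
        n = int(l / f) + 1  # number of equal fragments on this vector
        for k in range(1, n):
            out.append((x1 + (x2 - x1) * k / n, y1 + (y2 - y1) * k / n))
        out.append((x2, y2))
    return out
```

After this step, points are roughly equidistant along the trajectory regardless of the original writing speed.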
Figure 4-5: Illustration of preprocessing steps applied on the handwritten word (jAmEp) ج���‘university’
Figure 4-6: Illustration of preprocessing steps applied on the handwritten word ���� (vqAfp) ‘education’
All preprocessing steps are illustrated in Figure 4-5 and Figure 4-6. In both figures,
panel (a) is the original point sequence acquired by a digital tablet. Panel (b) is
created by applying the smoothing step described in Section 4.1. Panel (c) is the result
of applying Douglas&Peucker's algorithm to the point sequence in (b) (the skeleton
point sequence). Finally, panel (d) is the result of re-sampling the skeleton vectors
in (c). All of these figures were produced by our system.
Chapter 5
Feature Extraction

Upon completion of the preprocessing phase, three types of features are extracted from
each preprocessed point sequence Ps = [p1, p2, …, pn]: local, semi-local, and global
features.
These features are described in the following three sections respectively. In Section 5.4, a
feature vector is constructed for each data point from Ps. Section 5.5 describes how an
observation sequence is constructed from the feature vectors. Finally, Section 5.6
describes how dots and delayed strokes are incorporated in the observation sequence.
5.1. Local Feature
One local feature is adopted in our work: the angle between each vector v = pi-1pi
(where i > 1) and the X-axis, denoted the α-angle. Together with the assumption that
the preprocessed points are equidistant along the trajectory, the α-angle feature is
influential, since the preprocessed point sequence can be reproduced unequivocally from
it; in other words, there is no data loss.
Figure 5-1 illustrates the α-angle between v and the x-axis.
Figure 5-1: Illustration of the αi-angle (the angle between the vector pi-1pi and the X-axis)
α1 = the angle between p1p2 and the X-axis
αi = the angle between pi-1pi and the X-axis, where 1 < i ≤ N
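A sketch of the α-angle computation, assuming (x, y) point tuples (per the definition above, α1 is the angle of p1p2 and therefore equals α2):

```python
import math

def alpha_angles(points):
    """alpha-angle of each vector p_{i-1}p_i with the X-axis, in degrees
    in [0, 360). Assumes at least two points; the first point has no
    preceding vector, so alpha_1 is taken as the angle of p1p2."""
    angles = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        angles.append(math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360.0)
    return [angles[0]] + angles
```

With equidistant points, these angles (plus the fragment length f) suffice to rebuild the trajectory, which is the "no data loss" property noted above.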
5.2. Semi-Local Features
The local feature described in the previous section represents the information of each
segment individually; it provides no information concerning its environment, such as
its neighboring segments. Several previous works, such as [6] and [12], used the
difference between the writing angles of a data point and its preceding data point (the
first angle is set to 0). One could also use the angle between segment i and segment
i+2 in the preprocessed point sequence. These features provide environmental
information; however, they provide no additional information when k > 2 points lie
almost on the same line, which is the usual case in handwriting. This leads us to
introduce a new type of feature, the semi-local feature, which provides geometric
information wide enough to determine its segment group. This is the only feature in
this work that describes the connectivity between a set of adjacent points. In our
experience, adding this environmental information to the feature vector gives better
results than omitting it.
The semi-local feature is computed by applying Douglas&Peucker's algorithm to the
preprocessed point sequence to find the skeleton points with a tolerance τ2 > τ1, where
τ1 is the tolerance used in the preprocessing phase (Section 4.2); τ2 has been
determined empirically (as shown in Figure 5-2 and Figure 5-3). The semi-local feature
is the angle between each skeleton vector and the X-axis, denoted the β-angle, as
illustrated in Figure 5-4.
Figure 5-2: Illustration of running Douglas&Peucker's algorithm. Figure (a) is a handwritten word ���� (mElm) ‘teacher’, red points in Figure (b) are the skeleton points returned by Douglas&Peucker's algorithm applied on the point sequence in Figure (a).
Figure 5-3: Illustration of the skeleton points and vectors, the red points are the skeleton points, the red segments show the skeleton vectors.
Figure 5-4: Illustration of the β-angle (the angle between a skeleton vector and X-axis)
Let us assume that the skeleton points of Ps are Qs = [q1, q2,…, qM], Qs ⊆ Ps. Without
loss of generality, we can assume the points are distinct; otherwise we index them.
βi is formally defined as follows:
βi = the angle between qjqj+1 and the X-axis, if pi is a skeleton point, pi = qj ∈ Qs;

βi = the angle between qjqj+1 and the X-axis, if qj = pk and qj+1 = pl with k < i < l,
where qj, qj+1 ∈ Qs, k is maximal and l is minimal (i.e., pi is a point between the
two consecutive skeleton points qj and qj+1).
5.3. Global Features
Global features or high-level features provide global information about the geometric
shape of words, word-parts, or letters. Three common global features used in previous
handwriting-recognition work are loops, cusps and crossings [19], [22] and [40] (as
shown in Figure 5-5). In this work, we have adopted only the loop feature. Arabic letter
shapes are classified into three categories according to loop content:
• Category I contains Arabic letter shapes that include loops as an integral part (see
Table 5-1).
• Category II includes letter shapes that can be written with or without loops,
depending on writing style (see Table 5-2).
• Category III includes the remaining letter shapes, which do not contain loops.
This category division suggests a global feature that immediately reduces the search
space. In this work, the recognizer returns a low probability when trying to match a
handwritten shape that does not include a loop to a letter shape that includes one
(described in Chapter 6).
Figure 5-5: Illustration of the three high-level features (loops, crossings, cusps), courtesy: [19]
kـ ط ـj ـiـ hـ ض ـg ـfـ eـ ص
ـu ـ�ـ ـs ـ2ـ ـo ـnـ mـ ظ ـl ـ�ـ
F ة ـ� ـ` ـ� ـ}ـ ـr ـqـ ـ, ـ ـ
هـ
Table 5-1: Category I – 31 letter shapes that cannot be written without loops
ـ� ؤ و ـ$ـ م �ـ ق tـ ف �ـ
خ [ـ ح �ـ ج جـ ـ�
Table 5-2: Category II – 17 letter shapes that may be written with or without loops
Figure 5-6: Illustration of loops (in red) in an Arabic word
Figure 5-6 shows a preprocessed point sequence of the handwritten word �� �. It consists
of four letters, three of which (the first, second and fourth) include loops. The loops
are drawn as red dots; the first letter shape belongs to Category II, and the second and
fourth letter shapes belong to Category I.
5.3.1. Loop Determination
Due to variations in writing speed and style, small loops may be added unintentionally
to letter shapes that should not include loops. This usually happens when trying to
rewrite part of a word or when the pen direction is changed. Often, these accidental
loops are elliptical in shape; therefore, they are removed by checking whether the ratio
of the loop's area to its diameter is less than a predefined threshold t. Figure 5-7
illustrates a very common case in which such a loop is added. In panel (a), a
preprocessed point sequence represents the letter shape zـ, which belongs to
Category III. The red line in panel (b) is the diameter of the loop. The red area in
panel (c) is the area of the accidental loop.
if (area/diameter < t) then
    the loop is not considered a real loop
else
    the loop is considered a real loop
Figure 5-7: Illustration of an accidental loop in a letter that should not include a loop
5.4. Feature Vector Construction
After computing the three features described above, a three-dimensional feature vector
fi is constructed for each data point pi ∈ Ps, where 1 < i ≤ N, as shown in Figure 5-8.
fi = (αi , βi, is-loopi), where:
• αi = α-angle of point pi
• βi = β-angle of point pi
• is-loopi = 1 if pi is in a loop, otherwise it’s 0.
Figure 5-8: Illustration of how fi is extracted from each point pi
5.5. Feature Vector to Observation Code
A Discrete Hidden Markov Model (HMM) is used for the training/recognition task,
which is discussed in Chapter 6. The input to this type of model is a sequence of discrete
values – observation sequence (see Chapter 3 for more details). Thus, a discretization
process is required to convert the feature vector sequence, extracted from a given shape,
to a discrete observation sequence. The three-dimensional feature-vector domain is
discretized into 260 integer values: 256 for the (α, β, is-loop) vectors described
above, and 4 additional integer values used to represent delayed strokes (described in
the next section). The feature vector f = (α, β, is-loop) is discretized as follows:
• α is a real angle value between (1…360); this angle is discretized to 16 directions
similar to [21], as shown in Figure 5-9.(a). For example:
o α = 82.2o => α-code = 4
o α = 320o => α-code = 14
• β is a real angle value between (1…360): This angle is discretized to 8 directions,
as shown in Figure 5-9.(b). For example:
o β = 82.2o => β-code = 2
o β = 320o => β -code = 7
• is-loop = 1/0 (one bit)
Figure 5-9: Panel (a) shows the division of the angle space into 16 directions; panel (b) shows the division into 8 directions.
The α-code is coded in 4 bits, the β-code in 3 bits, and is-loop in one bit; thus the
coded feature has the following form, where each cell represents a bit:
α-code β-code is-loop
Example 5-1:
Using the same shape as in Figure 5-1, α7 = 247°, β7 = 240.5° and p7 is not in a loop,
=> α7-code = 11, β7-code = 5 and is-loop = 0.
1 0 1 1 1 0 1 0   (α-code = 11, β-code = 5, is-loop = 0)
The discretized feature vector of f7 is therefore o7 = 10111010 (binary) = 186
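A sketch of the discretization: the bit packing follows the layout above, while the rounding rule round(angle/width) is my inference from the worked examples, not an explicit formula from the thesis:

```python
def discretize(alpha, beta, is_loop):
    """Pack (alpha, beta, is-loop) into one observation code:
    4 bits for the alpha-code, 3 for the beta-code, 1 for the loop bit.
    The nearest-direction rounding is inferred from the examples above."""
    a_code = round(alpha / 22.5) % 16  # 16 directions of 22.5 deg each
    b_code = round(beta / 45.0) % 8    # 8 directions of 45 deg each
    return (a_code << 4) | (b_code << 1) | (1 if is_loop else 0)
```

This rule reproduces the codes in the text: 82.2° → α-code 4, 320° → α-code 14, and the Example 5-1 vector (247°, 240.5°, no loop) → codes (11, 5, 0).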
Figure 5-10: Illustration of how the observation oi is computed
5.6. Delayed Strokes
We have seen in Section 1.4.3 why dots/delayed strokes are very important in handwriting recognition for Arabic script. Three methods have been proposed in previous work to solve this problem for Latin script. In the first, dots/delayed strokes are stripped off in the preprocessing phase (before the training/recognition process) [18]. In the second method, the end of a word is connected to the dots/delayed strokes with a special connecting stroke. This stroke indicates that the pen is raised [6], [12] (as illustrated in Figure 2-10). In the last method, delayed strokes are treated as special characters in the alphabet. A word with delayed strokes is given alternative spellings to accommodate the delayed strokes being drawn in different orders [19]. These three methods are not adequate for the task of recognizing Arabic script. The first method cannot be employed, since the number and position of the dots is precisely the information that distinguishes one letter from another. Eliminating delayed strokes would cause considerable ambiguity, particularly when the letter's body is not written clearly. Furthermore, some Arabic letters have the same shape as a composition of other letters; for example, the letter (s) سـ has the same shape as the three letters (b + t + y) بتي written without dots. The second and third methods also cannot be employed, since Arabic words may contain many dots.
These methods would dramatically increase the hypothesis space, since each word would have to be represented in all its permutations. For example, the word حقيقية (Hqyqyp) ‘real’ contains 10 dots; thus, 10! representations would be required.
To solve the delayed-stroke problem in Arabic, a novel method called delayed-stroke projection has been introduced. Delayed strokes are written after the word-part body; thus, the first written point sequence is the word-part body. Every subsequent stroke that satisfies one of the following four conditions is considered a delayed stroke:
a. If the whole stroke is above or under the word-part body (e.g. Figure 5-11.a).
b. If the stroke is written after the minX (left) of the word-part body and it is a dot
(e.g. Figure 5-11.b).
c. If the stroke is written before the maxX (right) of the word-part body (e.g.
Figure 5-11.c).
d. If the stroke intersects with the word-part body (e.g. Figure 5-11.d).
Figure 5-11: Figure (a): five delayed strokes for word-part body 1. Figure (b): two delayed strokes for word-part body 3. Figure (c): three delayed strokes for word-part body 1. Figure (d): one delayed stroke for word-part body 1.
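The four conditions can be sketched with bounding-box tests. This is only an approximation of the thesis's geometry: the coordinate convention (y grows downward), the dot-size threshold, and the use of bounding-box intersection for condition (d) are all assumptions.

```java
// Sketch of conditions (a)-(d) above; the dot threshold and the bbox
// intersection test for (d) are assumptions, not the thesis's exact geometry.
public class DelayedStrokeDetector {
    // min/max of coordinate c (0 = x, 1 = y) over a stroke's points.
    static double min(double[][] s, int c) { double m = s[0][c]; for (double[] p : s) m = Math.min(m, p[c]); return m; }
    static double max(double[][] s, int c) { double m = s[0][c]; for (double[] p : s) m = Math.max(m, p[c]); return m; }

    // A dot is approximated as a stroke with a tiny bounding box.
    static boolean isDot(double[][] s) {
        return max(s, 0) - min(s, 0) < 5 && max(s, 1) - min(s, 1) < 5;
    }

    // True if `stroke` should be treated as a delayed stroke of `body`.
    static boolean isDelayed(double[][] stroke, double[][] body) {
        // (a) the whole stroke lies above or below the word-part body
        if (max(stroke, 1) < min(body, 1) || min(stroke, 1) > max(body, 1)) return true;
        // (b) it starts after (left of, for right-to-left writing) the body's minX and is a dot
        if (stroke[0][0] < min(body, 0) && isDot(stroke)) return true;
        // (c) it starts before (right of) the body's maxX
        if (stroke[0][0] > max(body, 0)) return true;
        // (d) its bounding box intersects the body's
        return !(max(stroke, 0) < min(body, 0) || min(stroke, 0) > max(body, 0));
    }
}
```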
5.6.1. Delayed-Stroke Projection
After determining the delayed strokes, the following procedure is applied:
• The first point of the delayed stroke (denoted q) is vertically projected onto the body; the projected point on the body is denoted p.
• A virtual vector is connected from p to q (as shown in Figure 5-12.b).
• An additional virtual vector is connected from the last point of the delayed stroke to p (as shown in Figure 5-12.b).
• These two virtual vectors are discretized to a predefined number of virtual points (as shown in Figure 5-12.c).
Figure 5-12: Illustration of the delayed-stroke projection, black points are the letter body, blue points are the delayed stroke, red points are virtual points.
Four special observation codes representing the virtual points are added to the 256 observation codes (described in Section 5.5):
• o = 256 represents a virtual point that belongs to a virtual vector directed up from the body to the delayed stroke, as shown in Figure 5-12.(b) – a bottom-up vector.
• o = 257 represents a virtual point that belongs to a virtual vector directed down from the delayed stroke to the body, as shown in Figure 5-12.(b) – a top-down vector.
• o = 258 represents a virtual point that belongs to a virtual vector directed down from the body to the delayed stroke.
• o = 259 represents a virtual point that belongs to a virtual vector directed up from the delayed stroke to the body.
Figure 5-13: Illustration of converting a point sequence with delayed strokes into an observation sequence
Figure 5-13 illustrates the flow of incorporating the delayed strokes into the observation sequence. Figure (a) is the preprocessed point sequence representing the letter ن. Figure (b) illustrates how the virtual vectors are projected from the body to the dot and from the dot to the body. Figure (c) illustrates how the virtual vectors are discretized to virtual points (in red) – note that in this example, each red point represents two points, one for the bottom-up vector and one for the top-down vector. Finally, Figure (d) illustrates how the whole observation sequence for this letter is built: o11 to o14 with observation code = 256, and o16 to o19 with observation code = 257.
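The projection steps above can be sketched as follows. The helper interpolates the virtual points between the projected point p and the stroke endpoint, and wraps a delayed stroke's own observations between the two runs of virtual-point codes, as in Figure 5-13 where the 256-codes precede the dot's observation and the 257-codes follow it. The names and the above/below flag are mine, not the thesis's.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the delayed-stroke projection (Section 5.6.1); names and the
// handling of the above/below cases are assumptions based on the text.
public class DelayedStrokeProjection {
    static final int BOTTOM_UP = 256;  // body -> stroke, upward
    static final int TOP_DOWN  = 257;  // stroke -> body, downward
    static final int BODY_DOWN = 258;  // body -> stroke, downward
    static final int STROKE_UP = 259;  // stroke -> body, upward

    // n evenly spaced virtual points strictly between p and q.
    static double[][] virtualPoints(double[] p, double[] q, int n) {
        double[][] pts = new double[n][2];
        for (int i = 1; i <= n; i++) {
            double t = (double) i / (n + 1);
            pts[i - 1][0] = p[0] + t * (q[0] - p[0]);
            pts[i - 1][1] = p[1] + t * (q[1] - p[1]);
        }
        return pts;
    }

    // Wrap the delayed stroke's own observations between the observation
    // codes of the two virtual vectors; `above` = stroke lies above the body.
    static List<Integer> withVirtualCodes(List<Integer> strokeObs, boolean above, int n) {
        int toStroke = above ? BOTTOM_UP : BODY_DOWN;  // p -> q vector
        int toBody   = above ? TOP_DOWN  : STROKE_UP;  // last point -> p vector
        List<Integer> obs = new ArrayList<>();
        for (int i = 0; i < n; i++) obs.add(toStroke);
        obs.addAll(strokeObs);
        for (int i = 0; i < n; i++) obs.add(toBody);
        return obs;
    }
}
```

With n = 4 virtual points and a one-observation dot, this yields the nine-observation pattern of Figure 5-13 (four 256s, the dot's observation, four 257s).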
Every word part defines one observation sequence computed from a feature-vector
sequence. For example, Figure 5-14 defines five observation sequences, and Figure 5-15
defines four observation sequences.
Figure 5-14: Delayed-stroke projection in the handwritten word الاستشراق (AlAst$rAq) ‘orientalism’
Figure 5-15: Delayed-stroke projection in the handwritten word الانطباع (AlAnTbAE) ‘the impression’
In both Figure 5-14 and Figure 5-15: Figure (a) illustrates the original word which has
been captured by a digital tablet in our system; Figure (b) is the preprocessed point
sequences of the word in Figure (a) – red points in Figure (b) are the added virtual points.
We have seen in this chapter that the feature vector extracted from each data point is discretized to one of [1…260] observations. The reason for selecting such a coarse discretization is the lack of training samples for online Arabic handwriting systems. A larger observation domain would require many more samples to train the Hidden Markov Model utilized in this work, which is described in the next chapter.
Chapter 6
The Recognition Framework

Our recognition framework uses discrete Hidden Markov Models (HMMs) to
represent letters which are embedded in a grammar network that represents the word-part
dictionary. The segmentation and recognition of handwritten word parts are performed
simultaneously in an integrated process, similar to [6], [12], [18] and [19]. In this chapter we describe the recognition algorithms together with the framework models/architecture. The first section describes the recognition framework models, followed
by two word-recognition algorithms, an optimized grammar network, and the dictionary
database structure. Finally, we discuss the writing style support and model training.
6.1. Framework Models
The next three sections describe the basic models utilized in this work: letter models,
word-part models and word models.
6.1.1. Letter Models
Left-to-right HMM without state skipping (aij = 0 for j ≠ i+1 and j ≠ i; πi = 0 for i > 1),
as shown in Figure 6-1, has been adopted to model each letter. We have selected this
simple topology because it has been successfully used in speech recognition, and there is insufficient evidence that more complex topologies would necessarily lead to better recognition performance [19]. Furthermore, this topology can effectively model the
time-dependent property in the handwriting observation sequence. The number of states
in a letter model is selected automatically based on the training set for this letter. More
details about this selection are described in Section 6.7.
Figure 6-1: left-to-right HMM
Each Arabic letter has two or four shapes depending on its position in the word (see Table 1-2). We have chosen to treat these letter shapes as different letters. For example, associated with the letter (h) are four letter models, for ه, هـ, ـهـ, and ـه, corresponding to h's isolated, initial, medial, and final shapes respectively.
6.1.2. Word-Part Models
An Arabic word may contain several word parts (see Section 1.4.2 for more details).
Each word part is written with a single continuous stroke for its whole body, followed by a number of delayed strokes. Thus, the number of word parts in a word forms a lower bound on the number of pen lifts.
A model for a word part consisting of the letters L1, L2, …, Ln is built by concatenating
the letter-model Li (MLi) with the letter-model Li+1 (for 1 ≤ i <n) by linking the last node
of MLi to the first node of MLi+1 with a null transition (as shown in Figure 6-2).
Figure 6-2: Word-part model consisting of n letter models
6.1.3. Word Models
To build a word model for a given word of k word parts, we simply build k word-part
models, one for each word part, as shown in Figure 6-3.
Figure 6-3: Illustration of the word model for the word جامعة (jAmEap) ‘university’: a word-part model for جا (two letters: جـ and ـا) and a word-part model for معة (three letters: مـ, ـعـ, and ـة).
6.2. Word and Word-Part Dictionaries
The dictionary is the list of words representing the domain of the search used in the
recognition task. The dictionary of Arabic words D could be divided into sub-
dictionaries, D = {D1, D2, …, Dn}, where, Di is a dictionary of all the words that consist
of i word parts.
Di = {w ∈D | w consists of i word parts}
Example 6-1: Consider the following word-dictionary:
D = {�� �}`د , �}V , روا"� , � , � ا �aن , ����� , هz , ج�� {ال�Vي ,
D is divided into the following sub-dictionaries:
=> D1 = {�� � , V{� , zه}
D2 = {د`{� , � {����� , ج��
D3 = {ن�a ي , اVال�}
D4 = { �روا" }
We refer to the word-part dictionary WPDi,j as the list of word parts placed at index j (starting from the right) of the words in Di.
Example 6-2:
D3 = {ب ، ر �ن ، �ري ، ��رق ، ��وق� is a {��دي ، �دي ، ��رس ، ��دي ، !���ن ، "���ن ، "��رن ، "��ون ، ر
word dictionary consists of words with three word parts.
=> WPD3, 1 = {ر ،��" ، ��! ، �� ، � ، ��}
WPD3, 2 = {� ،و ، � {د ، ر،
WPD3, 3 = {ي ، س ، ن ، ق}
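Since every word part except possibly the last ends at one of the six non-connecting letters, both the partition into sub-dictionaries Di and the word-part dictionaries WPDi,j can be derived mechanically from the word text. The following is a sketch under that assumption, with the set of non-connecting letters (ا, د, ذ, ر, ز, و; hamza and madda variants omitted) passed as a parameter:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the word-part splitting and the D_i / WPD_{i,j} constructions
// of Section 6.2; the treatment of alef variants is simplified.
public class WordPartDictionary {
    // The six letters that do not connect to a following letter.
    static final String ARABIC_DISCONNECTIVE = "ادذرزو";

    // Split a word into word parts: a part ends at a disconnective letter.
    static List<String> wordParts(String word, String disconnective) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : word.toCharArray()) {
            cur.append(c);
            if (disconnective.indexOf(c) >= 0) { parts.add(cur.toString()); cur.setLength(0); }
        }
        if (cur.length() > 0) parts.add(cur.toString());
        return parts;
    }

    // D_i: the words of D consisting of exactly i word parts.
    static List<String> subDictionary(Collection<String> d, int i, String disc) {
        List<String> di = new ArrayList<>();
        for (String w : d) if (wordParts(w, disc).size() == i) di.add(w);
        return di;
    }

    // WPD_{i,j}: the word parts at index j (1-based, from the start of the word).
    static Set<String> wpd(Collection<String> di, int j, String disc) {
        Set<String> s = new LinkedHashSet<>();
        for (String w : di) s.add(wordParts(w, disc).get(j - 1));
        return s;
    }
}
```

For example, مدرسة (mdrsp) splits into the three word parts مد | ر | سة.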
6.3. Arabic-Word Recognizer
The Viterbi algorithm not only gives the optimal path corresponding to a given observation sequence; it also gives an accumulated likelihood score and the state segmentation along this optimal path (see Chapter 3).
We have seen in Chapter 5 how to compute the observation sequences Os = [O1, O2, …,
Ok] from a given handwritten Arabic word, where Oi = [oi,1, oi,2,…,oi,m] is an observation
sequence calculated from the handwritten word-part i. Based on the word model
described in Section 6.1.3 and the Viterbi algorithm, we introduce two algorithms to
recognize an Arabic word given its observation sequences (Os), and a word-dictionary D.
The first algorithm is described in the next section; the second, an optimized version of the first, is described in Section 6.3.2.
6.3.1. Word Recognizer - Algorithm I
Algorithm 6.1 (algorithm I) accepts Os = [O1, O2, …, Ok] (k observation sequences)
such that Oi corresponds to word-part i, and searches for the word in Dk that maximizes
the probability of observing Os.
Algorithm 6.1:
0. WordRecognizer-I (Os=[O1, O2, …, Ok], D) : Word {
1. max_prob = 0;
2. max_word = null;
3. for each w ∈ Dk {
4. p = 1;
5. for (i = 1 to k) {
6. M = buildWordPartModel(w.getWordPartText(i));
7. p = p * M.viterbi (Oi);
8. }
9. if (p > max_prob) {
10. max_prob = p;
11. max_word = w;
12. }
13. }
14. return max_word;
15. }
• buildWordPartModel(word-part) is a method that returns the word-part model of word-part, as described in Section 6.1.2.
• M.viterbi(O): the Viterbi algorithm, which returns the maximum probability of observing an observation sequence O given the model M.
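The call M.viterbi(Oi) can be sketched as a standard discrete-HMM Viterbi computation. Plain probabilities are used here for clarity; a practical implementation would work in log space, since the product underflows on long observation sequences.

```java
// Minimal discrete-HMM Viterbi (maximum path probability), a sketch of
// the M.viterbi(O) call above; log-space scaling is omitted for clarity.
public class DiscreteHmm {
    final double[][] a;  // a[i][j]: transition probability from state i to j
    final double[][] b;  // b[i][o]: probability of emitting symbol o in state i
    final double[] pi;   // initial state distribution

    DiscreteHmm(double[][] a, double[][] b, double[] pi) {
        this.a = a; this.b = b; this.pi = pi;
    }

    // Maximum probability of `obs` along any single state path.
    double viterbi(int[] obs) {
        int n = pi.length;
        double[] delta = new double[n];
        for (int s = 0; s < n; s++) delta[s] = pi[s] * b[s][obs[0]];
        for (int t = 1; t < obs.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double best = 0.0;
                for (int i = 0; i < n; i++) best = Math.max(best, delta[i] * a[i][j]);
                next[j] = best * b[j][obs[t]];
            }
            delta = next;
        }
        double best = 0.0;
        for (double d : delta) best = Math.max(best, d);
        return best;
    }
}
```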
Algorithm I is not only simple to implement, but it also does not require large
amounts of memory, since the word model for the iterated word is always created
dynamically and there is no need to keep it in memory. Note that updating the word dictionary is independent of the algorithm and its associated data structures.
The main limitation of algorithm I is its computational redundancy. Word parts are shared by different words and therefore should not be computed more than once (using the Viterbi algorithm – line 7). For example, in the Arabic words مدرسة (mdrsp) ‘school’ and مدرب (mdrb) ‘trainer’, the first word part مد and the second ر are the same in both words, and in the same positions in the two words, index = 0 and index = 1 respectively. This redundancy also extends to shared prefixes and suffixes of word parts, which are not taken into account and thus are recomputed. For example, the Arabic words محمد (mHmd) ‘Mohammad’ and مجمد (mjmd) ‘frozen’ have a shared prefix مـ and also a shared suffix ـمد.
6.3.2. Word Recognizer - Algorithm II
We can overcome the limitation of algorithm I by caching the computed Viterbi word-part probabilities in a map f: key → value, where
key = <word-part text, the index of the word part in the word (index = i)>
value = the maximum probability of observing Oi given the model M
(Oi is the observation sequence computed from the handwritten word-part i).
This word-part probability map may be implemented by a hash table or a balanced binary search tree. For example, consider a sub-dictionary containing three words, D3 = {�ر�V� ، �Vرب ، ر����}. In processing the first word, the word parts مد and ر are computed, and they will not be recomputed for the second word. The word part �� is computed in the
first word, and it will not be computed for the third word. By applying this optimization
on algorithm I, we obtain algorithm 6.2 (algorithm II) described next.
Algorithm 6.2:
0. WordRecognizer-II (Os=[O1, O2, …, Ok], D) : Word {
1. max_word_prob = 0; max_word = null;
2. f[] = new Array[k] of Map;
3. for each w ∈ Dk {
4. p = 1;
5. for (i = 1 to k) {
6. word_part_text = w.getWordPartText(i);
7. if (f[i].find(word_part_text) == EXIST) {
8. word_part_prob = f[i].get(word_part_text);
9. } else {
10. M = buildWordPartModel(word_part_text);
11. word_part_prob = M.viterbi(Oi);
12. f[i].put(word_part_text, word_part_prob);
13. }
14. p = p * word_part_prob;
15. if (p < max_word_prob)
16. break;
17. }
18. if (p > max_word_prob) {
19. max_word_prob = p;
20. max_word = w;
21. }
22. }
23. return max_word;
24. }
To further optimize algorithm II, we added lines 15 and 16 to terminate the loop (in line 5) when the probability is no longer relevant (i.e., already less than max_word_prob).
6.4. Optimized Grammar Network
Algorithm II does not address the second limitation of algorithm I, namely that it
recomputes uncached shared prefixes and suffixes of word parts. This section describes a
new model network developed to overcome this limitation.
6.4.1. Optimized Word-Part Network
The left-to-right HMM letter models (described in Section 6.1) are embedded in a
grammar network which represents the word-part dictionary. This network has been
optimized such that all shared suffixes are grouped.
Figure 6-4: Optimized grammar network implementing a word-part dictionary of k word parts, with all shared suffixes grouped and each node replaced by its corresponding letter model
Figure 6-4 gives the simplified diagram of the grammar network representing a word-
part dictionary. In this diagram, each node represents a letter shape, and each path from
the start node to a leaf corresponds to a unique word part. Each leaf contains the word-
part text WPi (where, 1 ≤ i ≤ k) to represent the path from the start node to this leaf. Since
this network is a tree, for each leaf there is exactly one path that starts from the root (start
node) and ends with this leaf. We shall refer to this network as a word-part network
(WPN). WPN* refers to a WPN in which each letter node is replaced by its corresponding letter model (described in Section 6.1).
The WPN for a given word-part dictionary is built simply by algorithm 6.3, as illustrated in Figure 6-5.
Algorithm 6.3:
0. wordPartDictionaryToWPN(WPD) : WPN {
1. root = new Node(“start-node”);
2. for each wp in WPD {
3. addWordPart(wp, root);
4. }
5. return root;
6. }
where,
0. addWordPart(wp = [L1, L2, …, Ln], root) {
1. curr_node = root;
2. for (i = n down to 1) {
3. if (curr_node contains a child C with letter Li) {
4. curr_node = C;
5. } else {
6. C = new Node(Li);
7. curr_node.addChild(C);
8. curr_node = C;
9. }
10. }
11. curr_node.setWordPartText(description(wp));
12. }
Figure 6-5: Illustration of running wordPartDictionaryToWPN({��, ��&, �ر ,'� })
Either the word-part suffix or the prefix could be used to build the word-part network. However, the decision to use the suffix rather than the prefix of word parts rests on the fact that Arabic word parts (except possibly the last in a word) always end with one of the six disconnective letters (see Section 1.4.2). This fact guarantees that at least one letter is shared in each word part, which leads to a reduction in the size of the WPN.
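Algorithm 6.3 is a trie built over reversed word parts, so shared suffixes share nodes. A compact sketch follows (the node layout and the method names are mine, not the thesis's):

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Sketch of the WPN construction (algorithm 6.3): a trie built from the
// LAST letter of each word part backwards, so shared suffixes share nodes.
public class WordPartNetwork {
    final Map<Character, WordPartNetwork> children = new HashMap<>();
    String wordPartText;  // set on the leaf reached by the full word part

    void addWordPart(String wp) {
        WordPartNetwork node = this;
        for (int i = wp.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(wp.charAt(i), c -> new WordPartNetwork());
        }
        node.wordPartText = wp;
    }

    static WordPartNetwork build(Collection<String> wpd) {
        WordPartNetwork root = new WordPartNetwork();  // the start node
        for (String wp : wpd) root.addWordPart(wp);
        return root;
    }

    int nodeCount() {
        int n = 1;
        for (WordPartNetwork c : children.values()) n += c.nodeCount();
        return n;
    }
}
```

For example, two transliterated word parts "mad" and "rad" share the suffix path for 'd' and 'a', giving 5 nodes instead of the 7 needed without sharing.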
6.4.2. Word Network
The word network of a sub-dictionary Dk (denoted by WNk) is an array of word-part networks, where WNk[j] = the word-part network (WPN) of WPDk,j, constructed by the wordPartDictionaryToWPN method described in the previous section, for 1 ≤ j ≤ k (as shown in Figure 6-6). We shall refer to WNk*[j] as the WPN* of WPDk,j, for 1 ≤ j ≤ k.
Figure 6-6: Illustration of the word network of the sub-dictionary D3 described in Example 6-2
6.4.3. Word-Dictionary Database Architecture
The word-dictionary database (WDB) represents the data structure utilized in the
search procedure performed in the recognition task. WDB is represented by an array of
word networks (WN), where WDB[i] = WNi (1 ≤ i ≤ k), and k is the maximum number of word parts over all words in the word dictionary.
Figure 6-7 shows the word database (WDB) structure.
Figure 6-7: The Word-dictionary architecture
6.5. Optimized Recognizer
In this section, we introduce a new Arabic word recognizer using the database
architecture described in the previous section and the Viterbi algorithm. Given the observation sequences Os = [O1, O2, …, Ok] of a handwritten word, where Oi = [oi,1, oi,2,…,oi,Ti] is the observation sequence of the handwritten word-part i, the recognition task is to find the word W = [wp1, wp2, …, wpk] (wpi is word-part i in W) in a given sub-dictionary Dk that maximizes the following probability:
P(W | Os) = ∏i=1..k P(wpi | Oi)
where,
P(wpi | Oi) = P(Oi | wpi) P(wpi) / P(Oi).
Since P(Oi) is the same for all word parts and assuming that all word parts in the dictionary occur with equal probability, the problem is reduced to maximizing P(Oi | wpi) for all 1 ≤ i ≤ k, which can be computed efficiently by the Viterbi algorithm given WNk [i]. The
Viterbi algorithm is used to compute the probability of all paths for a given word-part network and an observation sequence O. The Viterbi algorithm computes δt(S), which refers to the best score (highest probability) along a single path at time t, accounting for the first t observations and ending in state S (see Chapter 3 for more details). In particular, we are only interested in the accumulated likelihood scores of the leaf states at time Ti in WNk* [i]. Therefore,
P(Oi | wp) = δTi(q), where q is a leaf state in WPNi* and q.wordPartText = wp.
We select the word part (wp) in WPDi that maximizes P(Oi | wp).
6.5.1. Word Recognizer - Algorithm III
Algorithm 6.4 accepts an observation sequence Oi = [o1,i, o2,i,…,oTi,i] generated from a
handwritten word-part i, and a WPN* as an input, and returns a word-part probability
map:
g: key → value, where,
key = <the word-part text of a leaf S in WPN*>
value = δTi(S)
The word-part probability map (g) contains a key for each leaf in WPN*. The word-part
text of the leaf state S can be accessed by the method: S.getWordPartText().
Using algorithm 6.4, an optimized Arabic word recognizer is described in algorithm
6.5 (algorithm III).
Algorithm 6.4:
0. computeWPProbabilityMap(Oi = [o1,i, o2,i,…,oT,i], WPN*) : Map {
1. Map wp_to_probability_map;
2. δ = WPN*.viterbi(Oi);
3. for each leaf S in WPN* {
4. p = δT(S)
5. wp_to_probability_map.put(S.getWordPartText(), p);
6. }
7. return wp_to_probability_map;
8. }
Algorithm 6.5:
0. WordRecognizer-III (Os = [O1, O2, …, Ok], WDB) : Word {
1. WN* = WDB[k];
2. for (i = 1 to k) {
3. wp_to_probability_maps[i] = computeWPProbabilityMap(Oi, WN*[i]);
4. }
5. max_prob = 0;
6. for each w ∈ Dk {
7. p = 1;
8. for (i = 1 to k) {
9. wp_prob = wp_to_probability_maps[i].get(w.part(i));
10. p = p * wp_prob;
11. }
12. if (p > max_prob) {
13. max_prob = p;
14. max_word = w;
15. }
16. }
17. return max_word;
18. }
For each observation sequence Oi (where 1 ≤ i ≤ k), the word-part probability map is built in lines 2-4. These maps are stored in the array wp_to_probability_maps. Lines 6-15 search the sub-dictionary Dk for the word that maximizes the probability of observing Os.
6.6. Support for Writing Style
Supporting letter styles is a significant requirement for writer-independent systems
(see Section 1.2). This section describes how writing-style support is achieved by modifying the WPN.
Writing styles of letters are called letter classes, where each letter class refers to a
letter with a specific writing style, as illustrated in Figure 6-8. The number of classes assigned to each letter is decided during the training process, discussed in Section 6.7.
Figure 6-8: Illustration of four letter classes of the medial letter shape ـهـ (h)
In this work, writing styles are supported by assigning a letter model (described in
Section 6.1.1) for every letter class, denoted as letter-class model, similar to [18], [19]
and [21]. For example, four letter-class models are assigned for the four letter classes in
Figure 6-8. All letter-class models for the same letter are grouped in one model that has
one “input” and one “output”, called multiple-letter-class model. The input is a null
transition that connects the first states of all the letter-class models in this group.
Similarly, the output is a null transition that connects all the last states in this group, as
shown in Figure 6-9.
Figure 6-9: A multiple-letter-class model containing n letter-class models
The multiple-letter-class model is embedded in the word-part network (WPN). Instead
of replacing each letter node by its corresponding letter model (as shown in Figure 6-4),
now each letter node in WPN is replaced by its corresponding multiple-letter-class model,
as shown in Figure 6-10. After applying this change to the WPN, we can use algorithm 6.5 without any modification to recognize handwritten words written in different styles.
Figure 6-10: The word-part network with each node replaced by a multiple-letter-class model
6.7. Model Training
The goal of the training process is to train the HMM parameters, λ = (A, B, π) for each
letter class. The Baum-Welch training algorithm is used for this task (for details about
Baum-Welch algorithm see Section 3.3.3). To train the models, we initialize the model
parameters as follows:
The initial state distribution π = {πi} is initialized to:
• π1 = 1.0 and πi = 0 for 1 < i ≤ N (where N is the number of states in the model)
The transition probability matrix A = {ai,j} is initialized to:
• ai,i = 0.5 and ai,i+1 = 0.5 for i < N
• ai,j = 0.0 otherwise (i.e., for j ≠ i and j ≠ i+1)
• aN,N = 1.0
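The initialization above can be written down directly. The boolean mask below encodes the global geometric knowledge (e.g. marking the dot/virtual-point observation codes impossible for a dot-free letter); the mask representation is my own device, not the thesis's:

```java
// Sketch of the HMM parameter initialization of Section 6.7.
public class LetterModelInit {
    // pi: all mass on the first state of the left-to-right model.
    static double[] initPi(int n) {
        double[] pi = new double[n];
        pi[0] = 1.0;
        return pi;
    }

    // A: self-loop and advance with probability 0.5 each; last state absorbs.
    static double[][] initA(int n) {
        double[][] a = new double[n][n];
        for (int i = 0; i < n - 1; i++) { a[i][i] = 0.5; a[i][i + 1] = 0.5; }
        a[n - 1][n - 1] = 1.0;
        return a;
    }

    // B: uniform over the symbols not ruled out by the letter's geometry.
    static double[][] initB(int n, int m, boolean[] impossible) {
        int allowed = 0;
        for (boolean imp : impossible) if (!imp) allowed++;
        double[][] b = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int o = 0; o < m; o++)
                b[i][o] = impossible[o] ? 0.0 : 1.0 / allowed;
        return b;
    }
}
```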
Good initial estimates for the observation probability matrix B = {bi(oj)} are helpful for the training task [14]. Since some global geometric information about a letter is known before running the training process, the observation symbols oj that are impossible for that letter are eliminated by assigning bi(oj) = 0.0, and the remaining observation symbols are initialized to reflect a uniform distribution. Global geometric information that can be extracted from a letter includes:
• A letter without dots leads to a direct elimination of the observations oj corresponding to dots in its letter model – bi(oj) = 0.
• Loop-free letters (similarly eliminating the is-loop observations).
6.7.1. Training Problems
Two difficult problems need to be solved for the training process. First, the system must automatically determine how many styles there are for each letter – in other words, how many letter-class models are required to represent the training letter samples. The second problem is how to determine the number of states required for each letter-class model. Both of these problems are addressed as follows:
• Determining The Number of Letter Classes
The training samples for a specific letter are written in different styles by different users. The α-angles of each sample are discretized to 16 directions (see Section 5.5), denoted direction codes. To determine the number of letter classes for a specific letter, we first cluster the samples that have similar direction codes. Afterwards, we use the number of resulting clusters as the number of classes needed to represent this letter model. In addition, the clustering technique determines which sample will be used to train which class model (discussed in Section 6.7.2). The agglomerative (bottom-up) clustering method based on [21] is used for this task.
To use this type of clustering algorithm, a distance function between two samples must first be defined. The distance function chosen is the minimal edit distance between the two direction-code sequences, described as follows. Given two samples A and B corresponding to the same letter:
A = (a1, a2, …, aN), where ai is a direction code of A.
B = (b1, b2, …, bM), where bj is a direction code of B.
The distance between A and B is defined, based on [21], as:
D(A, B) = D(aN, bM), where
D(ai, bj) = min { D(ai-1, bj-1) + δ(ai, bj) × SP,
D(ai-1, bj) + DP,
D(ai, bj-1) + IP }
where SP, DP, and IP are the substitution, deletion, and insertion penalties respectively, and
δ(ai, bj) = 0 if d(ai, bj) < X; δ(ai, bj) = 1 otherwise,
where d(ai, bj) is the directional code difference between the direction codes ai and bj, and X is a predefined threshold chosen empirically.
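The recurrence above is a weighted edit distance over direction-code sequences. A sketch follows; the circular 16-direction code difference is an assumption about d(ai, bj), and SP, DP, IP, and X are the empirically chosen parameters from the text:

```java
// Sketch of the clustering distance of Section 6.7.1: edit distance over
// 16-direction code sequences with a thresholded substitution cost.
public class DirectionCodeDistance {
    // Circular difference between two of the 16 direction codes (assumed).
    static int codeDiff(int a, int b) {
        int d = Math.abs(a - b) % 16;
        return Math.min(d, 16 - d);
    }

    // D(A, B) with substitution/deletion/insertion penalties sp/dp/ip and
    // match threshold x (delta = 0 when the codes differ by less than x).
    static int distance(int[] a, int[] b, int sp, int dp, int ip, int x) {
        int n = a.length, m = b.length;
        int[][] d = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) d[i][0] = d[i - 1][0] + dp;
        for (int j = 1; j <= m; j++) d[0][j] = d[0][j - 1] + ip;
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++) {
                int subst = d[i - 1][j - 1] + (codeDiff(a[i - 1], b[j - 1]) < x ? 0 : sp);
                d[i][j] = Math.min(subst, Math.min(d[i - 1][j] + dp, d[i][j - 1] + ip));
            }
        return d[n][m];
    }
}
```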
• Number of HMM States
The number of HMM states in a letter-class model is an important parameter for recognition performance. Previous researchers have shown that assigning more states to letter models with more complicated shapes than to those with simpler shapes leads to better recognition results than assigning the same number of states to all letter models [19]. The number of HMM states in [19] was selected empirically. To automate this selection, algorithm 6.6 is introduced to test all the possible values of the number of states.
According to our observations, the number of states varies from 3 to 10. Our system assigned 10 states to the initial letter shape of (h) هـ, and 3 states to the isolated letter shape of (A) ا. The letter-model training algorithm 6.6 is described as follows:
Algorithm 6.6
Input: the training observation sequences of a letter class LC.
Output: optimal trained model represents LC.
0. ComputeOptimalModel(train_obs_seqs, LC) {
1. optimal_model = null;
2. max_prob = 0;
3. for n = 3 to 10 {
4. M = build a letter model consists of n states;
5. M.initialize(LC); //initialize the model as described above
6. M.trainModel(train-obs-seqs);
7. p = M.computeLikelihoodMean(train_obs_seqs);
8. if (p >= max_prob) {
9. optimal_model = M;
10. max_prob = p;
11. }
12. }
13. return optimal_model;
14. }
Algorithm 6.6 constructs a letter model with n ∈ {3, 4, …, 10} states and trains it using the corresponding training samples. The algorithm then computes the mean recognition likelihood over the same training samples using the Viterbi algorithm, given the constructed model. This process is repeated for every possible number of states n ∈ {3, 4, …, 10}, and the model that gives the maximum likelihood mean is selected to represent the letter class.
6.7.2. The Training Process
The training process is divided into four stages, as described in [19]: supervised training data collection, training of the letter-class models, semi-unsupervised training data collection, and final training of the letter-class models, discussed in turn as follows:
I. Supervised training data collection: In this stage the trainer (writer) is asked to handwrite (using a digital tablet) a list of predetermined words. Then, he/she is asked to split each word manually into its component letters, such that all the delayed strokes lie between the two split lines. The system indicates to the writer which two letters should be split, as shown in Figure 6-11. The words are split into letter samples, and every sample is tested to determine whether it satisfies predetermined letter rules (e.g. the number of dots placed above or under the letter body) in order to avoid improper samples. The letter and the sample are stored in the trained-sample database.
Figure 6-11: A screen shot of the trainer system; the green vertical lines are the split lines. The dashed lines are automatically given by the system to avoid word-part overlapping.
II. The training of letter-class models: After collecting the training samples from
stage-I, algorithm 6.7 is performed to obtain all the trained models for each letter class.
III. Semi-unsupervised training data collection: The collection of the training data in stage I is a time-consuming process, particularly for a large training data set, since the trainer is asked to "split" each word manually. Therefore, another stage is introduced to accelerate this process, based mainly on the output models of stage II, which are used to automatically estimate the split points between every two letters of a handwritten word, given the word text. For each word part, the multiple-letter-class models (described in Section 6.6) corresponding to its letters are connected into one chain model, such that the output of the multiple-letter-class model of letter i is linked to the input of multiple-letter-class model i+1, as shown in Figure 6-12. The Viterbi algorithm is then used to compute the state segmentation along the optimal path in this chain model, as shown in Figure 6-13. Running the Viterbi algorithm on the chain model, given the observation sequence (O = [o1, o2, …, oT]) of a word part, yields for each letter Li the subsequence Oi = [oi,1, oi,2,…, oi,k] ⊆ O that describes Li best. In addition, it gives the correspondence of the optimal state sequence in the chain model to Oi. This state segmentation is used to segment the letters of a word part, as shown in Figure 6-13 and Figure 6-14. Note that
Algorithm 6.7:
for each letter's sample set SL {
1. Perform the preprocessing phase (Chapter 4) on all the samples in SL.
2. Perform the clustering method described in Section 6.7.1
=> Clusters(L) = {c1, c2,…, ck}
3. Perform the feature-extraction phase (Chapter 5).
4. Run algorithm 6.6 on each ci in Clusters(L) to get a trained model for each letter class.
}
this stage does not produce new letter classes; it is only used to improve the letter-class models produced in stage I.
IV. Final training of the letter-class models: Using the training data processed by stage III together with the original training data (used in stage II), each letter-class model is constructed and trained again by running algorithm 6.6 to obtain new trained models.
Figure 6-12: Multiple-letter-class model chain of m word parts
Figure 6-13: Screen shot of our automatic letter splitting system
Figure 6-13 is a screen shot of the word-trainer system of stage III; the red text was written by a user, and the green lines are the split lines estimated automatically. The user is asked to decide whether the estimation is appropriate.
Figure 6-14: Illustration of letter-class model chain
Figure 6-14 illustrates the multiple-letter-class model chain of the word written in Figure 6-13. From left to right: the first letter has three letter-class models, the second letter has two letter-class models, the third letter has one letter-class model, and the fourth letter has three letter-class models. The orange states form the optimal path returned by the Viterbi algorithm for the word written in Figure 6-13.
Chapter 7
Implementation

An online Arabic handwriting recognition system has been implemented during the two
years of this research. The code was written in Java 1.4 (Sun Microsystems).

The system includes the following applications:
• Word-sample collection for training: This application was built to
collect the word samples that are provided to the training process. It utilizes the
supervised training-data collection method described in Section 6.7.2. An XML
file is created for each trainer containing his/her personal information and all
word samples he or she has written. For each word sample, the 2D point
sequences are saved along with the word-sample text and the split points.
• Letter-Sample Tester: After the word samples are collected in the previous step,
each letter sample is tested using this application to automatically verify that it
satisfies basic rules. In addition, this application allows manual testing of
each letter.
• Trainer: After each sample is tested, the preprocessing (Chapter 4) and feature-
extraction (Chapter 5) phases are applied to each letter sample, and then
algorithm 6.6 is used to construct and train the letter models. Each letter-model
object is serialized to a file utilizing Java object serialization.
• Word-sample collection for testing: This application allows users
to record words from a given dictionary. It saves the point sequences for each
word along with its text in an XML file for each user. These samples are used to
report the recognizer's performance and recognition results.
• Arabic Word Recognizer: The preprocessing (Chapter 4), feature extraction
(Chapter 5), and algorithm 6.2 (algorithm-II) are implemented in this
application. The preprocessing and feature-extraction phases are applied to each
word sample collected in the previous step, and then algorithm-II is run. If
the recognition result matches the associated word text of a sample, the
word is saved in a success-log file; otherwise it is saved in an error-log
file. The recognizer architecture is shown in Figure 7-3.
• Arabic Word Recognizer with GUI: This application extends the previous
one with a GUI (Graphical User Interface) that enables any user to use
the system, as shown in Figure 7-1 and Figure 7-2.
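The Trainer's persistence step (serializing each letter-model object with Java object serialization, as mentioned above) might look roughly like the sketch below. The `LetterModel` fields and names are illustrative assumptions, and the sketch uses current Java idioms (try-with-resources) rather than the Java 1.4 of the original system.

```java
import java.io.*;

/** Illustrative sketch of persisting trained letter models with Java
 *  object serialization, as the Trainer application does. The
 *  LetterModel fields are hypothetical. */
public class ModelStore {

    /** A trained letter-class model; Serializable so it can be written
     *  to disk by the Trainer and reloaded by the recognizer. */
    public static class LetterModel implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String letterClass;     // e.g. "alef-initial-c1" (hypothetical)
        public final double[][] transitions; // HMM transition matrix
        public LetterModel(String letterClass, double[][] transitions) {
            this.letterClass = letterClass;
            this.transitions = transitions;
        }
    }

    /** Writes one letter model to its own file. */
    public static void save(LetterModel m, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(m);
        }
    }

    /** Reads a letter model back for use by the recognizer. */
    public static LetterModel load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (LetterModel) in.readObject();
        }
    }
}
```

A save/load round trip preserves the model's fields, which is all the recognizer needs to rebuild its model set at start-up.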
Figure 7-1: Screen shot 1 of the handwriting recognition system
Figure 7-2: Screen shot 2 of the handwriting recognition system
Figure 7-1 and Figure 7-2 are screen shots of the “Online Arabic Handwriting
Recognition” system developed in this work. The yellow background is the area that
users write in; the red printed sentences are the highest-probability matches for the
handwritten words.
Figure 7-3: The global architecture of our system
[Figure 7-3 depicts the data flow of the Online Arabic Handwriting Recognition System:
the input point sequences of a handwritten word, [[p1,1, p1,2, …, p1,n1], …, [pk,1, pk,2, …, pk,nk]],
pass through the Preprocessor (Chapter 4), the Delayed-Stroke Processor and the Features
Extractor (Chapter 5), producing the observation sequences [[o1,1, o1,2, …, o1,m1], …,
[ol,1, ol,2, …, ol,ml]]; the Sub-Dictionary Classifier (Chapter 6) selects a sub-dictionary
from the Word Dictionary DB, and the Word Recognizer (Chapter 6) outputs the recognized
word.]
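Read as code, the architecture of Figure 7-3 is a linear composition of stages (preprocess, extract features, classify sub-dictionary, recognize). The interface sketch below is purely illustrative; none of these type or method names come from the thesis.

```java
/** Hypothetical sketch of the recognizer pipeline of Figure 7-3 as a
 *  composition of typed stages. Stage names follow the chapters; the
 *  generic machinery is illustrative only. */
public class Pipeline {

    /** One processing stage: point sequences in, something out. */
    interface Stage<I, O> {
        O apply(I input);
    }

    /** Chains two stages into one, so preprocess -> features ->
     *  recognize composes into a single end-to-end function. */
    static <A, B, C> Stage<A, C> then(Stage<A, B> first, Stage<B, C> second) {
        return input -> second.apply(first.apply(input));
    }
}
```

The real stages would map point sequences to observation sequences and finally to a word; the composition mechanism is the same regardless of the payload types.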
Chapter 8
Results

For English, a publicly available corpus with abundant data samples for training and
testing online handwriting recognition systems already exists: it was addressed by
UNIPEN, a project supported by the US National Institute of Standards and Technology
(NIST) and the Linguistic Data Consortium (LDC) [41]. Unfortunately, there is no
reference to a similar type of data for Arabic script. Therefore, part of this thesis
includes collecting and organizing data samples for training and testing.
Four users (see Table 8-1) were asked to write 600 predetermined words which were
used in the training process.
User Name Gender Hand Age
User 1 Male Right 24
User 2 Male Right 28
User 3 Male Right 26
User 4 Male Right 25
Table 8-1: The four users who trained our system
User Name Gender Hand Age
User 5 Male Right 22
User 6 Female Right 31
User 7 Male Left 23
User 8 Male Right 29
User 9 Male Right 12
User 10 Male Right 25
Table 8-2: The users who tested the system, (different from those who did the training)
After running the training phase, the same users who trained the system were asked to
handwrite a number of words (different from those used in the training phase) to
attain writer-dependent results. Six other users (see Table 8-2) were asked to handwrite
predefined words (from the dictionary, and different from those used in the training
phase) to obtain writer-independent results.
Algorithm 6.2 returns the word that best matches the observation sequences, i.e., the
word that gives the highest match probability. This algorithm has been modified to
return the three words with the highest match probabilities. The recognized word with
the highest probability is denoted the 1st option; the 2nd option refers to the word
with the 2nd highest probability, and the 3rd option to the word with the 3rd highest
probability.
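Keeping the three best options amounts to sorting candidate words by their match log-probability and taking a prefix; a minimal sketch, assuming the scores have already been computed by algorithm 6.2:

```java
import java.util.*;

/** Sketch of ranking dictionary words by match probability and keeping
 *  the top k options, as the modified recognition algorithm does.
 *  Scores are assumed to be log-probabilities already computed by the
 *  word recognizer. */
public class TopOptions {
    static List<String> topK(Map<String, Double> scores, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Sort descending by log-probability: the 1st option comes first.
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++)
            out.add(entries.get(i).getKey());
        return out;
    }
}
```

With k = 3, the head of the returned list is the 1st option and the next two entries are the 2nd and 3rd options used throughout this chapter's tables.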
The reported results in this chapter utilize five different word-dictionary sizes:
5K, 10K, 20K, 30K, and 40K words, selected from the Arabic Treebank [42], twenty
random articles from Al-Arabi Magazine [43], and ten random articles from the news
channel Aljazeera Net [44].
Table 8-3 presents the word recognition results for the four users who trained the
system to attain writer-dependent results (see Table 8-1). These results are given by
testing the system with the five word-dictionary sizes.
Dictionary size | User name | Number of test words | Number of correctly recognized words | Number of error words | Word recognition rate (%)
User 1 253 245 8 96.84
5K User 2 179 170 9 94.97
User 3 258 252 6 97.67
User 4 249 240 9 96.39
User 1 253 244 9 96.44
10K User 2 179 168 11 93.85
User 3 258 248 10 96.12
User 4 249 238 11 95.58
User 1 253 241 12 95.26
20K User 2 179 159 20 88.83
User 3 258 243 15 94.19
User 4 249 232 17 93.17
User 1 253 238 15 94.07
30K User 2 179 153 26 85.47
User 3 258 238 20 92.25
User 4 249 228 21 91.57
User 1 253 237 16 93.68
40K User 2 179 148 31 82.68
User 3 258 236 22 91.47
User 4 249 227 22 91.16
Table 8-3: The writer-dependent word results with 5K, 10K, 20K, 30K and 40K word-dictionary size
Table 8-4 presents the word recognition results for the six users (see Table 8-2) to
obtain writer-independent results. These results are given by testing the system with the
five word-dictionary sizes.
Dictionary size | User name | Number of test words | Number of correctly recognized words | Number of error words | Word recognition rate (%)
User 5 279 271 8 97.13
User 6 138 135 3 97.83
5K User 7 241 226 15 93.78
User 8 316 308 8 97.47
User 9 196 187 9 95.41
User 10 249 242 7 97.19
User 5 279 269 10 96.42
User 6 138 133 5 96.38
10K User 7 241 222 19 92.12
User 8 316 300 16 94.94
User 9 196 187 9 95.41
User 10 249 239 10 95.98
User 5 279 259 20 92.83
User 6 138 130 8 94.20
20K User 7 241 215 26 89.21
User 8 316 291 25 92.09
User 9 196 183 13 93.37
User 10 249 233 16 93.57
User 5 279 250 29 89.61
User 6 138 124 14 89.86
30K User 7 241 206 35 85.48
User 8 316 279 37 88.29
User 9 196 182 14 92.86
User 10 249 229 20 91.97
User 5 279 246 33 88.17
User 6 138 121 17 87.68
40K User 7 241 204 37 84.65
User 8 316 270 46 85.44
User 9 196 179 17 91.33
User 10 249 226 23 90.76
Table 8-4: The writer-independent word results with 5K, 10K, 20K, 30K and 40K word-dictionary size
Figure 8-1 summarizes the average of the word recognition results of all users for
each dictionary.
Recognition Rate (%), writer dependent (WD) vs. writer independent (WI):
5K: WD 96.47, WI 96.28
10K: WD 95.50, WI 95.21
20K: WD 92.86, WI 92.55
30K: WD 90.84, WI 89.68
40K: WD 89.75, WI 88.01
Figure 8-1: The average of the word recognition results of all users in each dictionary.
Writer Dependent (WD) in left columns, writer independent (WI) in right columns
Table 8-5 and Table 8-6 present the recognition results in terms of word parts for all
users, tested with the five word dictionaries. The fourth column in these tables gives
the number of word parts that were correctly recognized in the 1st option; the fifth
column gives the number of word parts that were incorrectly recognized in the 1st option.
Dictionary size | User name | Number of word parts | Number of correctly recognized word parts | Number of error word parts | Word-part recognition rate (%)
User 1 666 658 8 98.80
User 2 472 462 10 97.88
User 3 714 707 7 99.02
User 4 664 651 13 98.04
5K User 5 733 724 9 98.77
User 6 351 348 3 99.15
User 7 644 627 17 97.36
User 8 806 798 8 99.01
User 9 506 495 11 97.83
User 10 664 656 8 98.80
User 1 666 657 9 98.65
User 2 472 458 14 97.03
User 3 714 702 12 98.32
User 4 664 649 15 97.74
10K User 5 733 719 14 98.09
User 6 351 346 5 98.58
User 7 644 621 23 96.43
User 8 806 786 20 97.52
User 9 506 496 10 98.02
User 10 664 651 13 98.04
User 1 666 654 12 98.20
User 2 472 448 24 94.92
User 3 714 697 17 97.62
User 4 664 642 22 96.69
20K User 5 733 709 24 96.73
User 6 351 341 10 97.15
User 7 644 613 31 95.19
User 8 806 776 30 96.28
User 9 506 491 15 97.04
User 10 664 643 21 96.84
Table 8-5: The recognition of word-part results with 5K, 10K, 20K word-dictionary size of all users
Dictionary size | User name | Number of word parts | Number of correctly recognized word parts | Number of error word parts | Word-part recognition rate (%)
User 1 666 651 15 97.75
User 2 472 441 31 93.43
User 3 714 691 23 96.78
User 4 664 635 29 95.63
30K User 5 733 699 34 95.36
User 6 351 333 18 94.87
User 7 644 601 43 93.32
User 8 806 761 45 94.42
User 9 506 489 17 96.64
User 10 664 638 26 96.08
User 1 666 650 16 97.60
User 2 472 435 37 92.16
User 3 714 689 25 96.50
User 4 664 634 30 95.48
40K User 5 733 694 39 94.68
User 6 351 330 21 94.02
User 7 644 601 43 93.32
User 8 806 753 53 93.42
User 9 506 483 23 95.45
User 10 664 634 30 95.48
Table 8-6: The recognition of word-part results with 30K and 40K word-dictionary size of all users
Table 8-7 and Table 8-8 present the word results for the three options (1st, 2nd and
3rd), i.e., the three words returned by the recognizer that best match the handwritten
word. These results motivate including an additional model in our system based on
Natural Language Processing (NLP) (e.g., a bi-gram model) when recognizing an entire
sentence, to improve the recognition rate, since many words that were incorrectly
recognized in the 1st option were correctly recognized in the 2nd or 3rd option.
Dictionary size | User name | Number of test words | Number of correctly recognized words (1st option) | Num. correctly recognized in the 2nd option | Num. correctly recognized in the 3rd option
User 1 253 245 6 1
User 2 179 170 7 0
User 3 258 252 4 0
User 4 249 240 6 1
5K User 5 279 271 3 2
User 6 138 135 1 1
User 7 241 226 12 1
User 8 316 308 5 1
User 9 196 187 4 0
User 10 249 242 5 1
User 1 253 244 7 1
User 2 179 168 7 2
User 3 258 248 8 0
User 4 249 238 6 2
10K User 5 279 269 5 1
User 6 138 133 2 1
User 7 241 222 15 1
User 8 316 300 10 3
User 9 196 187 4 0
User 10 249 239 5 4
User 1 253 241 6 5
User 2 179 159 16 1
User 3 258 243 12 1
User 4 249 232 11 2
20K User 5 279 259 11 2
User 6 138 130 4 2
User 7 241 215 15 5
User 8 316 291 16 3
User 9 196 183 7 1
User 10 249 233 10 4
Table 8-7: The word recognition results of the three options with 5K, 10K and 20K word-dictionary size
Dictionary size | User name | Number of test words | Number of correctly recognized words (1st option) | Num. correctly recognized in the 2nd option | Num. correctly recognized in the 3rd option
User 1 253 238 9 4
User 2 179 153 20 2
User 3 258 238 14 3
User 4 249 228 13 2
30K User 5 279 250 16 5
User 6 138 124 9 1
User 7 241 206 21 6
User 8 316 279 22 6
User 9 196 182 6 2
User 10 249 229 12 3
User 1 253 237 10 0
User 2 179 148 22 3
User 3 258 236 15 3
User 4 249 227 13 1
40K User 5 279 246 14 10
User 6 138 121 11 1
User 7 241 204 19 8
User 8 316 270 28 7
User 9 196 179 7 2
User 10 249 226 13 3
Table 8-8: The word recognition results of the three options with 30K and 40K word-dictionary size
The following tables show, for each user, the words that were incorrectly recognized
by our system. In addition, these tables report the minimal Levenshtein edit distance
(EMD) from the correct word to the recognized one for the 1st, 2nd and 3rd options,
which indicates the quality of letter-based recognition. Note that if the EMD is 0 for
an option, the word was correctly recognized in that option. These results were
obtained by running the recognizer with the 5K word dictionary.
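The EMD reported in these tables is the standard Levenshtein distance; a minimal, self-contained sketch of its computation (this is the textbook dynamic-programming formulation, not the thesis code):

```java
/** Standard Levenshtein edit distance, the EMD measure used in the
 *  error tables: the minimal number of insertions, deletions and
 *  substitutions needed to turn one word into the other. Uses two
 *  rolling rows instead of the full DP table. */
public class EditDistance {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from empty prefix
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }
}
```

An EMD of 0 means the two words are identical, matching the convention used in the tables that follow.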
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
���� ����� 1 ���� 0 ��� 2
� 0 و�� 1 و�� و�� 2 و�
� ��� 3 �� 1 � 0
4 ی����� 0 ی��� 2 ی���� ی���
4 ی����� 0 ی��� 2 ی���� ی���
3 ت�ر 0 ن��ا 2 ن�ا ن��ا
��� !��" 3 ��� 0 !���� 2
4 وش�'� 4 وث�! 4 ون�%� وت$#�
Table 8-9: Incorrectly recognized words of “User 1” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ص�ل! 0 %�ل! 1 ��ل! %�ل!
2 و�� 0 س,ف 1 س,�� س,ف
2 اث��ء 0 اث�ر 1 اش�ر اث�ر
2 ���0ة 0 #�0ة 3 ���ة #�0ة
5 ن%�� 0 ل0"�1 1 "�1 ل0"�1
2 ن��ی! 0 ث�ی! 1 �ی! ث�ی!
3 ری��� 4 ال��3! 4 ال��2 ال����
5 ا5�ق 0 ال"� �! 2 ال"�ص! ال"� �!
�ت 4 7س���3ف 2 ���7ت 7ن��ج�9 4
Table 8-10: Incorrectly recognized words of “User 2” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ال���ی! 0 ال��ی! 2 اب�ی! ال��ی!
6 وا��ی�� 0 وال;�ب! 3 وال>�ی� وال;�ب!
4 ا97�ن 3 ا7 �ك 2 ا7س�ك ا7س�<�ك
� �� 1 ��� 3 2� 2
2 ی���@ 0 ی%�@ 1 ی%��@ ی%�@
A�وال � 3 وا�� 0 وال�A 2 وال
Table 8-11: Incorrectly recognized words of “User 3” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
B� B�� 1 B� 0 B2 ی��
��ی! 0 9��ب! 3 ��0ی! 9��ب! 3
5 ان��ل� 1 اش�ر 4 ان��ء ث�را
Cواس�ل� D3 وال�0ل C5 وروس�� 0 واس�ل�
��5 D4 ن���! 4 ش�� !��9 4
1 رج@ 0 اج@ 2 ا��@ اج@
A و F� 1 و A 2 و�� 0 و
�ل 4 اس�;�ل ال���ل 0 ال���ل 4 اس�
Aال0%�دی A2 ال0%�رب� A0 ال0%�دی A3 ال0�9,ی
Table 8-12: Incorrectly recognized words of “User 4” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ال�0@ 0 ال�0, 1 ال�%, ال�0,
0 ��د 2 ص�ى 1 ه�د ��د
�9 K�9 2 @ 3 ن�F 2 ن
3 وه�K 0 و�� 1 و�� و��
�ب� 5 ت�ی�� 3 ه��ي ل;,ي 5
2 ت%� 0 ت�� 3 ت'��� ت��
� @� 2 @�� 3 �� 1
0 ل��,ج� 6 �ب 6 ت�ی� ل��,ج�
Table 8-13: Incorrectly recognized words of “User 5” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
�رت�� �N 0ات 2 ,ات �Nات 5
O�%ت P3 ت�� O1 ت�� O�%0 ت
3 ا7ب�ء 3 ا7ن��ء 2 ا7ش��ر ا7ث�ر
Table 8-14: Incorrectly recognized words of “User 6” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
B� B�� 1 B� 0 ���� 2
4 ان��ل 0 اث�ر 1 اش�ر اث�ر
1 ��'�ة 0 ��0�ة 1 ���0ة ��0�ة
4 و ����ت 0 وت<��ت 2 وت���ت وت<��ت
4 ا ��آ�! 0 ال��ی! 1 ال�,ی! ال��ی!
6 ال����D 0 رئ���� 1 رئ��D رئ����
��5 � 2 ��� 0 ��5 4 ن%�
1S1 2 ب�� ی� 0 ی1S 2 ی
3 خ>,ات� �N 0ات 2 �رت �Nات
O�%ت P3 ت�� O�%0 ت O1 ت��
U�>3 ت���� ل�� U�>6 ش���! 0 ل��
@N�9 !2 ���@ 2 �9ش 3 آ��
P��� !�# 4 P��� 0 !���" 4
3 ال���F 0 ال"��@ 3 ان���B ال"��@
Table 8-15: Incorrectly recognized words of “User 7” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
B� C4 �ی%� 2 ی� B�� 1
4 ال'�2 0 ال���� 4 ال��2 ال����
3 ال�� 0 اج@ 2 ا��@ اج@
A 2 وه�� و A 0 و A1 و�
���� ����� 1 ���� 0 ���S 2
�9 K�9 2 ��3 ن �9 0
1 و�� 1 و�� 2 و9� و��
� �� 1 � 0 ��� 3
Table 8-16: Incorrectly recognized words of “User 8” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
4 ���� 3 خ>�N 4 C���A ص;��
5 رثو 1 9,دت� 1 9,دت�� 9,دت<�
4 ی�$$,ن 0 ی�'�ن 1 ی�'�,ن ی�'�ن
1 ال�,ی! 0 ال��ی! 2 اب�ی! ال��ی!
X3 ب��ی! 3 ج�ی! ب�ی A3 ج�ی�
O3 اث��� ان� ! 4 ا ��! 5 وث�
O�%2 ت���3 ت K��"4 ت �� 3 ت%
4 یUال 0 ی��0اى 2 ی��اس ی��0اى
P��� !�# 4 P��� 0 !<� 4
Table 8-17: Incorrectly recognized words of “User 9” with 5K word-dictionary size
Test word | 1st option | EMD from the 1st option | 2nd option | EMD from the 2nd option | 3rd option | EMD from the 3rd option
2 ن��ی! 0 ث�ی! 1 �ی! ث�ی!
3 خ�� 0 خ��<� 1 خ��� خ��<�
K�"ت � 0 ت"�K 4 ت%>0� 3 ت%
Aال0%�دی A2 ال0%�رب� A0 ال0%�دی A3 ال0�9,ی
4 ال0$��$�! 4 ال����$�! 3 ال���$�! ال<�$�!
3 وتZآ� 0 وت��9� 1 وت��9� وت��9�
2 ص�ى 0 ��د 1 ه�د ��د
Table 8-18: Incorrectly recognized words of “User 10” with 5K word-dictionary size
Figure 8-2 and Figure 8-3 demonstrate various words handwritten by different users
that were correctly recognized by our system. Table 8-19 lists some additional words that
were correctly recognized. The system used the 40K word-dictionary size.
Figure 8-2: Illustration of correctly recognized words written by different users utilizing 40K word dictionary
Figure 8-3: Illustration of the same words written by different users, all of these words were correctly recognized by our system with 40K word-dictionary size
Dس�$�! ادب� �>�� ت'��,ا ال$
,اده� ت$�@ ال$,ن��س اده1
ن�ء ث��ت� ال0�رس! اس��ذا
Aا7ب A�0ال��0ه ����� Dه
واس��ب خ�,ص�! ال0'�ل ا�7$�ر
وال>�ی� س��D ال0,ازن! ا77ت
وال<��و��5��! ض�3@ اتال��و ال���ب!
و��1 9�� ب�<0! ال�"�ج
ودب�, �س�,ن 9� ب�واءیA ال�����0ت
Aوس�1 �5دوا ب��9ی! ال%�ب�
D%ب��>! ال A�9�� Dو�
وآ�� ��ن� بS^ ال���د
و , �ت �1 ب'���0ت ال��ی!
D'ت�ری"�� ال� ��� Pون�
%��ج,نی ��A0 ت���P ال��ب!
ی%�@ ل;,ي ت��ب��� ال>�,ر
ی�'�وا %�دث�ت ت�ا�^ ال'<,د
ی��� %0� ت�� ال��ی�
Table 8-19: Example of correctly recognized words with 40K word-dictionary size
Chapter 9
Future Work

Some points and problems have not been investigated in this work; our future research
will explore them. These points are discussed in this chapter:
• As we discussed in Chapter 5, boundary lines are used for size, orientation
and skew normalization in Latin scripts. Since boundary lines do not necessarily
exist in unconstrained Arabic script, an alternative solution must be found to
apply such normalization.
• In our work we have chosen only the loops as the global feature. We intend to
add more global features, such as cusps and crosses, to our system, as was done
in [19], to determine whether better results are achieved.
• Our work does not address punctuation marks (e.g. (.), (,), (;), (:), (!), (?),
(#), ($), (%)) or the digits (0, 1, ..., 9). Further research is needed to
recognize that these symbols are not part of words, since there is often no
space between them and the written words.
• To achieve better time performance, we divided the dictionary database in terms
of word parts. However, when a sizeable lexicon (e.g. 100,000 words or more) is
used, additional methods should be employed to reduce the search space. Our
future research will test our system with a lexicon of this size and improve it by
exploring the method proposed in [45].
• Some word parts/letters have been recognized incorrectly. The common error
cases are listed in Table 9-1. To reduce the number of such errors, we
propose an additional postprocessing phase that detects these cases using
geometric-computation techniques.
Table 9-1: Possible errors reported by our recognizer. Sometimes the word-parts/letter-shapes in the left column are confused with the letter-shapes/word-parts in the right column, and vice versa.
• Letters that include ء (hamza) or ~ (madda), listed in Table 9-2, are not
supported by our recognizer. Supporting these letters will also be part of our
future work; they will be handled by applying the same method that incorporates
delayed strokes within the observation sequence.
� ب
ت تـ�
ث �ـ�
ن ـ�
� ـ�
ـU ـ�ـ�
ـX ـWـ�
ـ{ ـ|ـ�
د ل
zـ Vـ
Table 9-2: Unsupported letters
• Optimizing the WPN for words that consist of only one word part by handling
both suffixes and prefixes in the same network, similar to [19].
Alef + Hamza above (أ)
Alef + Hamza below (إ)
Alef + Madda
Lam-Alef + Hamza above
Lam-Alef + Hamza below
Lam-Alef + Madda
Chapter 10
Conclusion

The primary aim of this research was to investigate how an online Arabic handwriting
recognition system may be built from Hidden Markov Models, and how capable such a
system is at resolving the complexities of Arabic handwriting recognition. In
addition, we analyzed the special characteristics of Arabic script that differentiate
it from other script categories.
To this end, the preprocessing phase was explored and discussed in Chapter 4; in this
phase the Douglas-Peucker algorithm was selected to reduce the amount of data after
applying a low-pass filter, and the data point sequences were then re-sampled.
In Chapter 5, three types of features (local, semi-local and global) were extracted
from the preprocessed point sequences. These feature sequences were discretized into
observation sequences. In addition, we have shown how delayed strokes are detected
and incorporated into the observation sequence.
The recognition framework was described in Chapter 6. A left-to-right HMM was
selected to represent each Arabic letter shape. Using this model, three algorithms
were introduced to recognize an Arabic word given its observation sequences. The
second algorithm is an optimized version of the first: for optimization reasons, the
letter models were embedded in a grammar network representing the word-part network.
The third algorithm was then proposed to search for the most probable word given the
observation sequences and the corresponding word-part networks. We then showed how to
recognize different writing styles (letter classes). Additionally, Chapter 6 explored
how the training process works, including the automatic determination of the number
of classes for each letter and of the number of states assigned to the letter-class
models.
During the two years of research, an online Arabic handwriting recognition system
was developed, as described in Chapter 7. Chapter 8 focused on the collection of word
samples by four users. Writer-dependent results were reported with averages of 96.47%
for a dictionary of 5K words, 95.50% for 10K, 92.86% for 20K, 90.84% for 30K and
89.75% for 40K words. Writer-independent results of six other writers were obtained
with average recognition rates of 96.28% for 5K words, 95.21% for 10K, 92.55% for 20K,
89.68% for 30K and 88.01% for 40K words. Finally, Chapter 9 was dedicated to the
points and problems that remain to be investigated in our future studies.
References
[1] R. Plamondon and S. Srihari. “On-line and off-line handwriting recognition: A comprehensive survey”. IEEE Trans. On Pattern Analysis and Machine Recognition, 22(1):63–84, January 2000.
[2] http://www.handwriting.org
[3] M. Sakkal, “The Art of Arabic Calligraphy” - Seattle Art Museum Resource Room, display boards; Seattle, March 1993. http://www.sakkal.com/ArtArabicCalligraphy.html
[4] Y. Haralambous, “Simplification of the Arabic Script,”, Vol. 1375 archive Proceedings of the 7th International Conference on Electronic Publishing, Held Jointly with the 4th International Conference on Raster Imaging and Digital Typography, Pages: 138 - 156, 1998, ISBN:3-540-64298-6 , Verlag London, UK.
[5] Buckwalter Transliteration for Arabic Letters: http://www.ldc.upenn.edu/myl/morph/buckwalter.html
[6] J. Makhoul, T. Starner, R. Schwartz, G. Chou. “On-line cursive handwriting recognition using speech recognition methods”, in Proceeding of IEEE ICASSP’94 Adelaide, Australia, April 1994, pp. v125-v128.
[7] Homayoon S.M. Beigi, “An Overview of Handwriting Recognition,” Proceedings of the 1st Annual Conference on Technological Advancements in Developing Countries, Columbia University, New York, July 24-25, 1993, pp. 30-46.
[8] C.C. Tappert, C.Y. Suen, T. Wakahara, “The State of the Art in On-Line Handwriting Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 8, pp. 787-808, Aug. 1990.
[9] Scott D. Connell, “Online Handwriting Recognition Using Multiple Pattern Class Models”, PhD thesis Michigan State University, 2000.
[10] C. C. Tappert, “Adaptive on-line handwriting recognition,” presented at The 7th International Conference on Pattern Recognition, Montreal , Canada, pp. 1004-1007, July-Aug. 1984.
[11] J.J. Brault and R. Plamondon, “Segmenting Handwritten Signatures at Their Perceptually Important Points,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, pp. 953-957, September 1993.
[12] Han Shu, “On-Line Handwriting Recognition Using Hidden Markov Models,” M.S. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1996.
[13] C. C. Tappert, “Cursive Script Recognition by Elastic Matching,” IBM Journal of Research and Development, Vol.26, Nov. 1982, pp. 765-771.
[14] L. R. Rabiner. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” In A. Waibel and K.-F. Lee, editors, Readings in Speech Recognition, pages 267–296. Kaufmann, San Mateo, CA, 1990.
[15] D. Nahamoo and M. Analoui, “Speech Recognition Using Hidden Markov Models,” First Annual Conference on Technological Advancement in Developing Countries, Columbia University, New York, June 24-25, 1993.
[16] L.R. Bahl, P.F. Brown, P.V. deSouza, and R.L. Mercer, “Maximum Mutual Information Estimation of Hidden Markov Model Parameters for Speech Recognition,” Proc. ICASSP'86, Tokyo, Japan, pp. 49-52, Oct. 1986.
[17] L. R. Bahl, F. Jelinek, and R. L. Mercer, “A Maximum Likelihood Approach to Continuous Speech Recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 5, no. 3, pp. 179-190, March 1983.
[18] K.S. Nathan, H.S.M. Beigi, J. Subrahmonia, G.J. Clary, H. Maruyama. Real-Time on-line unconstrained handwriting recognition using statistical methods, in: Proceeding of IEEE ICASSP’95, Detroit, USA, June 1995, pp. 2619-2622.
[19] J. Hu, S.G. Lim, M.K. Brown, “Writer independent on-line handwriting recognition using an HMM approach,” Pattern Recognition 33 (2000) 133-147.
[20] J. Hu, M. K. Brown, W. Turin, “Handwriting Recognition with Hidden Markov Models and Grammatical Constraints,” AT&T Bell Laboratories, Murray Hill, New Jersey 07974.
[21] J. J. Lee, J. Kim and J. H. Kim, “Data-Driven Design Of HMM Topology For Online Handwriting Recognition,” International Journal of Pattern Recognition and Artificial Intelligence Vol. 15, No. 1 (2001) 107-121 © World Scientific Publishing Company doi:10.1142/S021800140100076z.
[22] S. Bercu and G. Lorette, “On-line handwritten word recognition: An approach based on hidden Markov models”. In Proceeding Third Int. Work-shop on Frontiers in Handwriting Recognition,
pages 385–390, Buffalo, USA, May 1993.
[23] A. Amin. “Recognition of Printed Arabic Text Based on Global Features and Decision Tree Learning Techniques,” Pattern Recognition, 33(8):1309–1323, August 2000.
[24] A. Elgammal and M.A. Ismail. “A Graph-Based Segmentation and Feature Extraction Framework for Arabic Text Recognition,” In ICDAR’01, 2001.
[25] S. S. El-Dabi, R. Ramsis, and A. Kuwait. “Arabic Character Recognition System: a Statistical Approach for Recognizing Cursive Typewritten Text,” Pattern Recognition, 23(5):485–495, 1990.
[26] E. J. Erlandson, J. M. Trenkle, and R. C. Vogt. “Word-Level Recognition of Multifont Arabic Text Using a Feature Vector Matching Approach.” In Proc. SPIE, Document Recognition III, Luc M. Vincent; Jonathan J. Hull; Eds., volume 2660, pages 63–70, March 1996.
[27] S. Al-Emami and M. Urber, “On-line Recognition of Handwritten Arabic Characters,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 7, July 1990, pp. 704-710.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Journal of the Royal Statistical Society, vol. 39, pp. 1-21, 1977.
[29] L. E. Baum and G. R. Sell, “Growth Functions for Transformations on Manifolds,” Pac. J. Math., vol. 27, pp. 221-227, 1968.
[30] C. D. Manning and H. Schütze, “Foundations of Statistical Natural Language Processing,” MIT Press, 1999, pp. 317-338.
[31] W. Guerfali and R. Plamondon. “Normalizing and Restoring On-line Handwriting”. Pattern
Recognition, 26(3):419 – 431, 1993.
[32] H. Beigi, “Pre-Processing the Dynamics of On-Line Handwriting Data, Feature Extraction and Recognition,” Proc. 5th Int. Workshop on Frontiers in Handwriting Recognition, Colchester, England, pp. 255-258, Sept. 1996.
[33] D. Douglas & T. Peucker, “Algorithms for the Reduction of the Number of Points Required to Represent a Digitized line or its Caricature”, The Canadian Cartographer 10(2), 112-122 (1973).
[34] H. S.M. Beigi, K. Nathan, Gregory J. Clary, and Jayashree Subrahmonia, “Challenges of Handwriting Recognition in Farsi, Arabic and Other Languages with Similar Writing styles, An On-line Digit Recognizer”.
[35] D. J. Burr. “A Normalizing Transform for Cursive Script Recognition”. In Proc. 6th ICPR, volume 2, pages 1027-- 1030, Munich, October 1982.
[36] M. K. Brown and S. Ganapathy. Preprocessing techniques for cursive script recognition. Pattern Recognition, 16(5):447--458, November 1983.
[37] H. S. M. Beigi, K. Nathan, G. J. Clary, and J. Subrahmonia. “Size Normalization in On-line Unconstrained Handwriting Recognition”. In Proc. ICASSP'94, pages 169--172, Adelaide, Australia, April 1994.
[38] M. Schenkel, I. Guyon, and D. Henderson. “On-line Cursive Script Recognition Using Time delay Neural Networks and Hidden Markov Models”. In R. Plamondon, editor, Special Issue of Machine Vision and Applications on Cursive Script Recognition. Springer Verlag, 1995.
[39] Y. Bengio and Y. LeCun. “Word Normalization for Online Handwritten Word Recognition”. In Proc. 12th ICPR, volume 2, pages 409--413, Jerusalem, October 1994.
[40] S. A. Guberman and V. V. Rozentsveig, “Algorithm for the Recognition of Handwritten Text,” Automation and Remote Control, 37(5):751-757, May 1976.
[41] I. Guyon, L. Schomaker, R. Plamondon, M. Liberman, and S. Janet, “UNIPEN project of on-line data exchange and benchmarks,” presented at International Conference on Pattern Recognition, ICPR'94, Jerusalem, Israel, 1994.
[42] M. Maamouri, A. Bies, H. Jin, and T. Buckwalter. 2003. Arabic treebank: Part 1 v 2.0. Distributed by the Linguistic Data Consortium. LDC Catalog No.: LDC2003T06.
[43] Alarabi Magazine: http://www.alarabimag.com/main.htm
[44] www.aljazeera.net
[45] A. Leroy and Irisa, “Lexicon Reduction Based On Global Features, For On-Line Handwriting,” Campus de Beaulieu - 35042 RENNES cedex - France - aleroy@irisa.fr