Recognition of Cursive Roman Handwriting – Past, Present and Future
H. Bunke
Department of Computer Science, University of Bern
Neubrückstrasse 10, CH-3012 Bern, Switzerland
Acknowledgments:
- S. Günter, T. Varga, M. Zimmermann
- Swiss National Science Foundation (20-5287.97 and IM2)
Recognition of Cursive Roman Handwriting – Past, Present and Future – p.1/61
Introduction
[Figure: taxonomy of optical character recognition (OCR), drawn as a tree with successive splits:
  Oriental script | Roman script
  machine printed text | handwritten text
  on-line | off-line
  isolated | cursive]
Introduction
(why) is it difficult?
• large variation in personal handwriting style
• different writing instruments
• segmentation problem
• large vocabulary (possibly open)
Introduction
"hundert" (German for "hundred")
Introduction
is there any future need for automatic handwriting recognition?
• applications with commercial potential: address, form and check reading
• digital libraries, transcription of historical archives
• "non-death" of paper and new devices for handwriting acquisition
Contents
1. Introduction
2. State of the Art
3. Current Developments
4. Future Trends
5. Conclusion
Document Image Preprocessing
standard operations include
• noise filtering
• binarization
• thinning
• skew correction
• slant correction
• estimation of baseline and main writing zones
• horizontal and vertical scaling
• additional problem dependent methods to separate handwriting from background
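As an illustration of the binarization step, here is a minimal sketch that uses Otsu's global threshold. The slides do not say which binarization method was used, so the choice of Otsu, the function names, and the assumption of dark ink on a light background with gray levels 0 to 255 are all illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance
    of the two classes {p <= t} and {p > t} (Otsu's method)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of the gray values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a gray-level image (list of rows) to 1 = ink, 0 = background.
    Assumption: ink is darker than the background."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


# Hypothetical 2 x 3 gray-level patch: three dark ink pixels, three light ones.
patch = [[250, 20, 240],
         [30, 245, 25]]
print(binarize(patch))
```

A global threshold is only the simplest option; documents with uneven illumination or printed backgrounds usually need the problem dependent methods mentioned in the last bullet.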
Document Image Preprocessing
[Figure: preprocessing example. Panels: original image, binarized image, thinned image, estimation of slant, deslanted image, deslanted and deskewed image, estimation of writing zones, final result.]
Isolated Character Recognition
• usually cast as a classification problem
• consists of preprocessing, feature extraction, and classification
features for isolated character recognition:
• raw pixels
• derived from series expansion, moments, etc.
• projection based features, contour based features
• structural features: end points, forks, junctions, etc.
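To make one of these feature families concrete, the following sketch computes projection based features from a binary character image. It is an assumed, minimal variant (row and column ink counts concatenated into one vector); the slides do not specify the exact features used.

```python
def projection_features(image):
    """Return the horizontal and vertical projection profiles of a binary
    image (1 = ink), concatenated into a single feature vector."""
    row_profile = [sum(row) for row in image]        # ink pixels per row
    col_profile = [sum(col) for col in zip(*image)]  # ink pixels per column
    return row_profile + col_profile


# Hypothetical 3 x 3 image of a plus sign.
plus = [[0, 1, 0],
        [1, 1, 1],
        [0, 1, 0]]
print(projection_features(plus))
```

In practice the image would first be scaled to a fixed size so that all characters yield vectors of the same length.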
Isolated Character Recognition
classifiers for isolated character recognition:
• nearest-neighbor
• Bayes classifier
• neural nets
• SVM, etc.
which classifier is best?
• depends on many factors, for example, available training set, number of free parameters, time & memory constraints, etc.
Cursive Word Recognition
• major problem: segmentation
• Sayre’s paradox
• three approaches
  − holistic
  − segmentation-based (oversegment and merge)
  − segmentation-free (Hidden Markov Models, HMM)
Hidden Markov Models (HMMs)
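[Figure: a sliding window moves from left to right across the word image. At each position t (t = 0, 1, ..., 6 in the slide animation) it yields a feature vector x_t = (x_t1, ..., x_tn)^T, which is passed to an HMM with states S1, S2, S3, ..., self-transition probabilities P11, P22, P33, forward transition probabilities P12, P23, and an output distribution P(X) in each state.]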
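The sliding-window idea in the figure can be sketched in a few lines. This is a simplified illustration, not the recognizer from the talk: the window is one pixel column wide, each column is reduced to a single discrete symbol instead of a real-valued feature vector, and the likelihood P(X | model) is computed with the standard forward algorithm for a discrete HMM. All names and model parameters are hypothetical.

```python
def column_symbols(image):
    """Slide a one-column window left to right over a binary image and emit
    one symbol per position: 0 = empty, 1 = some ink, 2 = mostly ink."""
    height = len(image)
    symbols = []
    for col in zip(*image):
        ink = sum(col)
        if ink == 0:
            symbols.append(0)
        elif 2 * ink <= height:
            symbols.append(1)
        else:
            symbols.append(2)
    return symbols


def forward_likelihood(obs, start, trans, emit):
    """Forward algorithm: P(obs | model) for a discrete HMM.
    start[s]    = probability of starting in state s
    trans[r][s] = probability of moving from state r to state s
    emit[s][o]  = probability of emitting symbol o in state s"""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


# Hypothetical left-to-right model with three states S1, S2, S3.
start = [1.0, 0.0, 0.0]
trans = [[0.5, 0.5, 0.0],   # P11, P12
         [0.0, 0.5, 0.5],   # P22, P23
         [0.0, 0.0, 1.0]]   # P33
emit = [[0.8, 0.1, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8]]

image = [[0, 1, 1],
         [0, 0, 1],
         [0, 0, 1],
         [0, 0, 1]]
obs = column_symbols(image)
print(obs, forward_likelihood(obs, start, trans, emit))
```

For recognition, one such model per word (or per character, concatenated into words) would be evaluated, and the model with the highest likelihood selected; long observation sequences would also call for log-space or scaled arithmetic.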
General Text Recognition
• segmentation-based: segment the line of text into individual words, then use a cursive word recognizer
• segmentation-free: segmentation and recognition are integrated
  − concatenate word HMMs into word sequence (or sentence) models
  − use constraints to narrow down the search space, for example, soft constraints derived from n-gram language models
Segmentation-free Word Sequence Recognition
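• concatenation of HMMs
[Figure: lattice of word models. At every word position any of the vocabulary words w1, w2, ..., wn may occur; the paths are weighted by p(w_i^1) for the first word, p(w_i^2 | w_j^1) for the second word, p(w_i^3 | w_j^2) for the third word, and so on.]
• bi-gram language model
word   next word   probability
to     the         0.009333
to     be          0.002239
to     a           0.000138
to     have        0.000105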
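The weighting of word sequences by a bi-gram model can be sketched as follows. The toy corpus, the function names, and the relative-frequency estimate without smoothing are assumptions made for illustration; the slides do not describe how the probabilities in the table were estimated.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Estimate p(w) and p(next | w) by relative frequencies."""
    unigram = Counter()
    bigram = defaultdict(Counter)
    for words in sentences:
        unigram.update(words)
        for prev, nxt in zip(words, words[1:]):
            bigram[prev][nxt] += 1
    total = sum(unigram.values())
    p_uni = {w: c / total for w, c in unigram.items()}
    p_bi = {prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
            for prev, nxts in bigram.items()}
    return p_uni, p_bi


def sequence_probability(words, p_uni, p_bi):
    """p(w1) * p(w2 | w1) * p(w3 | w2) * ...
    Unseen words or word pairs get probability 0 (no smoothing)."""
    prob = p_uni.get(words[0], 0.0)
    for prev, nxt in zip(words, words[1:]):
        prob *= p_bi.get(prev, {}).get(nxt, 0.0)
    return prob


# Hypothetical three-sentence corpus.
corpus = [["to", "be"], ["to", "the"], ["to", "the"]]
p_uni, p_bi = train_bigram(corpus)
print(sequence_probability(["to", "the"], p_uni, p_bi))
```

In a recognizer these probabilities would be combined with the HMM scores during the search, typically in the log domain and with a smoothed model so that unseen word pairs are not ruled out entirely.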
Recognition Experiment
[Figure: word recognition rate [%] (y-axis, 40 to 80) versus vocabulary size [n] (x-axis, 0 to 8000) for three language models: simple sentence model, unigram model, bigram model.]
Some Recent Trends
• databases for development and performance evaluation
• multiple classifier systems
• synthetic training data
Databases
• isolated characters and words:
  − CEDAR
  − NIST
  − CENPARMI
  − ELT9
  − IRESTE
  − ...
• cursively handwritten text:
  − Senior/Robinson, PAMI 1998
  − Elliman/Sherkat, ICDAR 2001
  − IAM, collection in progress (since about 1997)
Some Details of the IAM Database
• more than 1,500 scanned pages of handwritten text
• material from over 600 individual writers
  − 95,000 correctly segmented words
  − over 13,000 lines of text
  − over 5,000 complete sentences
• covering a vocabulary of over 12,000 words
• ground truth and lexical tags available (LOB corpus)
Multiple Classifier Systems
• motivation: use a group of experts rather than a single expert
• many approaches to handwriting recognition using multiple classifier systems (MCS’s) have been proposed
• often the basic classifiers are constructed ’by hand’
• recently so-called ensemble methods have been proposed:
  − they require only a single classifier to be constructed by hand
  − the classifier ensemble is generated automatically
Multiple Classifier Systems (2)
"classical" approach
[Figure: the input is passed to the classifiers c1, ..., cn; a combiner merges their outputs into the result.]
Multiple Classifier Systems (3)
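ensemble method
[Figure: the classifiers c1, ..., cn are generated automatically from a single base classifier c; the input is passed to c1, ..., cn and a combiner merges their outputs into the result.]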
Issues in MCS’s
• ensemble generation
  − bagging
  − feature subspace
  − boosting
  − others
• combination
  − voting
  − rank sum
  − weighted voting
  − trainable classifier
Some Results
recognition rates achieved by various ensemble generation methods
algorithm            recognition rate
Bagging              68.11%
AdaBoost             68.67%
random subspace      67.35%
feature selection    71.58%
original classifier  66.23%
Synthetic Generation of Training Data
• all recognizers need to be trained
• the larger the training set, the better the performance ("you never have enough training data")
• but collection of training data is expensive
• previous work on generation of synthetic training data:
  − machine printed OCR [Baird et al.]
  − Arabic and Chinese OCR
  − isolated characters
  − (synthetic handwriting for other purposes [Guyon, Plamondon])
Synthetic Generation of Training Data
• no work on synthetic training data generation for cursive Roman handwriting recognition
• two approaches:
  − using templates
  − applying geometric distortions to existing handwritten text
Synthetic Handwriting from Templates
• templates extracted from forms
• templates extracted from running text, using HMM in forced alignment mode
Synthetic Handwriting from Templates (3)
• disadvantages:
  − all instances of a character are identical
  − no ligatures
Synthetic Handwriting from N-Grams
• compile a list of frequent 3- and 2-tuples from an electronic corpus
• extract templates of these tuples from a handwritten text, using forced alignment
• split the given text into available tuples and generate the synthetic handwriting
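The splitting step can be sketched as a greedy left-to-right match that prefers 3-tuples, then 2-tuples. The greedy strategy and the fall-back to single characters when no tuple matches are assumptions; the slides only state that the text is split into available tuples.

```python
def split_into_tuples(text, available):
    """Split text into pieces for which handwritten templates exist.

    available is the set of 3- and 2-character tuples with templates.
    Assumption: a single-character template exists for every character,
    which serves as the fall-back when no tuple matches."""
    pieces = []
    i = 0
    while i < len(text):
        for n in (3, 2):
            chunk = text[i:i + n]
            if len(chunk) == n and chunk in available:
                pieces.append(chunk)
                i += n
                break
        else:
            pieces.append(text[i])  # no tuple matched at this position
            i += 1
    return pieces


# Hypothetical inventory of tuples with templates.
templates = {"the", "an", "re"}
print(split_into_tuples("there", templates))
print(split_into_tuples("than", templates))
```

Concatenating the template images for the resulting pieces yields the synthetic word; because tuples carry their internal ligatures, this addresses part of the "no ligatures" drawback of single-character templates.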
Some Results
[Figure: recognition rate [%] (y-axis, 60 to 74) for each training set (x-axis, 0 to 5).]
• 1193 word instances; 16 writers; 357 word vocabulary
• 80% training; 20% testing; 5-fold cross validation
• training sets: 1 = natural training data; 2, 3, 4 = synthetic training data
• test data: always natural
• except for the training data (natural/synthetic) identical conditions for all experiments (same training/test words; same size of training/test set etc.)
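The 80%/20% split with 5-fold cross validation can be sketched as below. How the folds were actually drawn is not stated on the slide; assigning every fifth item to the same fold is an assumption, and in practice the items would usually be shuffled first.

```python
def k_fold_splits(items, k=5):
    """Yield (training, test) lists for k-fold cross validation.
    Each item appears in exactly one test fold; with k = 5 each round
    uses about 80% of the items for training and 20% for testing."""
    for fold in range(k):
        test = items[fold::k]
        train = [x for i, x in enumerate(items) if i % k != fold]
        yield train, test


# 1193 word instances, as in the experiment; the items are placeholders.
words = list(range(1193))
for train, test in k_fold_splits(words):
    print(len(train), len(test))
```

The reported recognition rate would then be the average over the five rounds.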
Future Perspectives
• some random comments:
  − MCS’s
  − synthetic training data
  − enhanced HMMs (for example, 2D)
  − enhanced language models
  − etc.
Future Perspectives
• to reach a new quality of recognition we need to go from text transcription to text understanding:
  − include syntactic and semantic text analysis
  − include task specific knowledge (in addition to statistical parameter estimation)
Who can read this?
When I was in high school, my physics teacher - whose name was Mr. Bader - called me down one day after physics class and said, "You look bored; I want to tell you something interesting." Then he told me something which I found fascinating, and have, since then, always found fascinating.... The subject is this - the principle of least action.
Richard P. Feynman: The Feynman Lectures, Volume II.
Who can read this?
Középiskolás koromban, egy nap a fizikatanárom - Bader úrnak hívták - magához hívott fizikaóra után és azt mondta: "Unottnak látszol; szeretnék mondani neked valami érdekeset." Majd elmondott valamit, amit elbűvölőnek találtam, és azóta is mindig elbűvölőnek találom ... A legkisebb hatás elvéről van szó.
Richard P. Feynman: The Feynman Lectures, Volume II.
Integration of Grammatical Knowledge
• prerequisites:
  − a word sequence recognizer that produces an n-best list (see before)
  − a stochastic context free grammar
  − a parser to compute the probability of a sentence or the most probable parse tree
• procedure:
  − reorder the n-best list from the recognizer taking parse probabilities into account
    final score = recognition score + γ · f(parse probability)
    where γ is a normalization factor and f(.) is a normalization function
Example of Grammatical Knowledge Integration
Rank Recognition Score Candidate Sentence
1 23923.6 She has put up the value other money .
2 23921.8 She has put up the value of her money .
3 23890.3 She had put up the value other money .
4 23888.4 She had put up the value of her money .
5 23854.3 She has put up the value at her money .
Rank Parse Prob. Candidate Sentence
1 1.58352e-19 She had put up the value of her money .
2 4.62861e-20 She has put up the value of her money .
3 1.12458e-21 She has put up the value at her money .
4 2.63105e-22 She had put up the value other money .
5 7.69052e-23 She has put up the value other money .
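The reordering rule, final score = recognition score + γ f(parse probability), can be tried on the five candidates from the tables above. The slides do not give γ or f, so the values used here (γ = 1 and f = natural logarithm) are assumptions chosen only to make the example concrete.

```python
import math


def rerank(candidates, gamma=1.0, f=math.log):
    """Reorder an n-best list by
        final score = recognition score + gamma * f(parse probability).
    candidates: (sentence, recognition_score, parse_probability) triples.
    gamma and f are assumed settings, not taken from the slides."""
    scored = [(rec + gamma * f(parse_prob), sentence)
              for sentence, rec, parse_prob in candidates]
    return [sentence for _, sentence in sorted(scored, reverse=True)]


# Scores and parse probabilities copied from the example tables.
n_best = [
    ("She has put up the value other money .", 23923.6, 7.69052e-23),
    ("She has put up the value of her money .", 23921.8, 4.62861e-20),
    ("She had put up the value other money .", 23890.3, 2.63105e-22),
    ("She had put up the value of her money .", 23888.4, 1.58352e-19),
    ("She has put up the value at her money .", 23854.3, 1.12458e-21),
]

for sentence in rerank(n_best):
    print(sentence)
```

With these assumed settings the candidate "She has put up the value of her money ." moves from rank 2 to rank 1; with gamma = 0 the original recognizer ranking is reproduced. In a real system γ would be tuned on validation data.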
Some Experimental Results
[Figure: sentence recognition rate [%] (y-axis, 6 to 34) versus rank [n] (x-axis, 0 to 100) for the reordered 100-best list and the baseline system.]
Future Challenge
• to deal with human factors (i.e. errors and abnormalities introduced by humans)
  − statistical modeling has proven very useful
  − however we also need to incorporate task specific knowledge provided by human experts
Conclusions
• the recognition of cursive Roman handwriting has been a subject of research for several decades
• for specific tasks some level of maturity has been reached and commercial systems have become available
• some other tasks, particularly the recognition of unconstrained general text, need much more research
• these tasks are interesting for practical applications
• there do exist promising directions to further develop the field