Signal Processing and Analysis: Image Analysis III
Benny Thörnberg
Associate Professor in Electronics
Copyright (c) Benny Thörnberg
Outline
• Shape and texture of objects
• Computation of image gradients
• EHD – Edge Histogram Descriptor
• HOG – Histogram of Oriented Gradients
• Optical Character Recognition (OCR)
• Fundamental steps of OCR
• Training sets
• Minimum distance classifier
• Extension of the feature vector to improve OCR
• Summary of performance for OCR
• Principal Component Analysis
Shape and texture of objects
Is there a method to compute a descriptor that is compact enough, yet still provides enough information for a computerized classifier to identify objects in pictures?
The shape and texture of objects seem to capture enough information for a human to distinguish between different kinds of objects present in an image.
Computation of image gradients
Convolving image I with the Sobel matrices gives the gradient vector:

G_X = [ -1 0 1 ; -2 0 2 ; -1 0 1 ] * I
G_Y = [ -1 -2 -1 ; 0 0 0 ; 1 2 1 ] * I
G = (G_X, G_Y)

Gradient magnitude and orientation (angle):

mag(G) = |G| = sqrt(G_X² + G_Y²)
θ = cos⁻¹( G_X / mag(G) )

Approximation used for the Sobel operator:

|G| ≈ |G_X| + |G_Y|
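The formulas above can be sketched in code. This is a minimal numpy-only version: the 3x3 convolution is written out by hand (a "valid" convolution, so the output is 2 pixels smaller in each direction), and `arctan2` is used for the angle instead of the slide's arccos expression, since it avoids a division by zero where the magnitude vanishes; both give the gradient orientation. Function names are illustrative, not from any library.

```python
import numpy as np

SOBEL_GX = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)
SOBEL_GY = np.array([[-1, -2, -1],
                     [ 0,  0,  0],
                     [ 1,  2,  1]], dtype=float)

def convolve3x3(image, kernel):
    """Minimal 'valid' 3x3 convolution (kernel flipped, per the definition)."""
    k = np.flipud(np.fliplr(kernel))
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * k)
    return out

def sobel_gradients(image):
    """Gradient magnitude and orientation from the Sobel operator."""
    gx = convolve3x3(image, SOBEL_GX)
    gy = convolve3x3(image, SOBEL_GY)
    mag = np.sqrt(gx**2 + gy**2)             # mag(G) = sqrt(Gx^2 + Gy^2)
    mag_approx = np.abs(gx) + np.abs(gy)     # cheap approximation |Gx| + |Gy|
    theta = np.degrees(np.arctan2(gy, gx))   # orientation angle in degrees
    return mag, mag_approx, theta
```

Note that |G_X| + |G_Y| always upper-bounds the true magnitude, which is why it is acceptable as a cheap hardware-friendly approximation.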
EHD – Edge Histogram Descriptor
If the statistical distribution of all gradient vectors in a neighborhood is collected into a histogram, we have created a descriptor that captures the local salient texture in an image. But how do we create a descriptor that also captures the shape of an object, such as the woman in the picture?
EHD – Edge Histogram Descriptor
[Figure: an image divided into 32 numbered blocks (4 columns by 8 rows); the per-block histograms are appended into one vector indexed 1–32.]
Divide the image into sub-image blocks. Compute a histogram for each block and append all histograms into one large vector. A spatial coding is thus created such that the combined histograms can capture information about the globally salient shapes of objects.
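The block-and-append procedure can be sketched as below. This is an illustrative construction, not the exact MPEG-7 EHD: the grid size, bin count, and magnitude threshold are assumed parameters, and the histogram is taken over gradient angles of pixels whose gradient magnitude exceeds the threshold.

```python
import numpy as np

def edge_histogram_descriptor(theta, mag, grid=(4, 8), bins=5, thresh=1.0):
    """Divide the gradient-angle image into grid blocks; in each block,
    histogram the orientations of pixels whose gradient magnitude
    exceeds a threshold, then append all block histograms."""
    rows, cols = theta.shape
    bh, bw = rows // grid[0], cols // grid[1]
    descriptor = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            block_theta = theta[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            block_mag = mag[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            edges = block_theta[block_mag > thresh]    # salient pixels only
            hist, _ = np.histogram(edges, bins=bins, range=(0.0, 180.0))
            descriptor.append(hist)
    return np.concatenate(descriptor)  # length = grid[0] * grid[1] * bins
```

The concatenation order fixes the spatial coding: bin i of block j always lands at the same position of the descriptor, which is what lets a classifier relate histogram entries to image locations.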
EHD – Edge Histogram Descriptor
[Figure: the combined edge histogram (bins 1–32) is fed to a classifier that outputs the type of object (human, giraffe, bird, …). Large datasets of images are used for training.]
L. Touil, A.B. Abdelali and M. Abdelatif, “A hardware acceleration of real time video processing”, Proc. of 16th IEEE Mediterranean Electrotechnical Conference, 28 March 2012, Yasmine Hammamet, Tunisia.
H. Ayad, S.N.H.S. Abdulah and A. Abdullah, “Visual Object Categorization based on Orientation Descriptor”, Proc. of 6th Asia Modelling Symposium (AMS 2012), 29-31 May 2012, Bali, Indonesia.
Histograms of Oriented Gradients - HOG
Reference: Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”
[Figure: detection window divided into cells, grouped into overlapping blocks.]
Pipeline: input image → normalize gamma and colour → compute gradient vectors → weighted vote into histograms of gradient orientations, one histogram for each cell → contrast-normalize cells within overlapping blocks → collect HOGs into a descriptor vector for the whole detection window → linear classifier (SVM) → human detected / no human
Histograms of Oriented Gradients - HOG
Reference: Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”
v = V_k / ||V_k||₂

Vector v is a normalized block histogram, computed from the histograms V_k built from all cell histograms belonging to a single block.
Finally, a large feature vector is built by appending all normalized block histograms into one long feature vector for the whole detection window.
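The block normalization above can be written as a one-liner. A small epsilon is added inside the square root, as in the Dalal–Triggs L2 normalization scheme, to guard against division by zero for empty blocks; the function name is illustrative.

```python
import numpy as np

def normalize_block(cell_hists, eps=1e-6):
    """L2-normalize the concatenated cell histograms of one block:
    v = V_k / ||V_k||_2 (eps guards against division by zero)."""
    v = np.concatenate(cell_hists).astype(float)
    return v / np.sqrt(np.sum(v**2) + eps**2)
```

Because each cell belongs to several overlapping blocks, every cell histogram appears several times in the final descriptor, each time normalized against a different local neighbourhood.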
Pipeline (as on the previous slide): input image → normalize gamma and colour → compute gradient vectors → weighted vote into per-cell orientation histograms → contrast-normalize cells within overlapping blocks → collect HOGs into a descriptor vector for the whole detection window → linear classifier (SVM) → human detected / no human
Cell = 5x5 pixels
Block = 3x3 cells
Detection window = 128x64 pixels
Histograms of Oriented Gradients - HOG
Examples from a training set of 924 images made available at MIT (Massachusetts Institute of Technology).
Ref: http://cbcl.mit.edu/software-datasets/PedestrianData.html
This “large” collection of images can be used to train a classifier to recognise pedestrians in images based on computed image descriptors. It provides a test bench for researchers to compare the performance of different methods for pedestrian detection.
Histograms of Oriented Gradients - HOG
Reference: Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”
Dalal and Triggs evaluated performance and achieved a miss rate of 10% at 10⁻⁴ False Positives Per Window (FPPW).
Cell size was 4x4 pixels, block size was 2x2 cells, block stride 8 pixels and detection window 64x128 pixels.
Voting used 9 bins per local HOG, with weights linearly proportional to gradient strength.
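The 9-bin magnitude-weighted vote for one cell can be sketched as follows. This simplified version assigns each pixel wholly to one bin over the unsigned 0–180 degree range; the full Dalal–Triggs scheme additionally interpolates the vote bilinearly between neighbouring bins, which is omitted here for clarity.

```python
import numpy as np

def cell_histogram(theta, mag, bins=9):
    """Vote each pixel of a cell into one of `bins` orientation bins
    (unsigned gradient, 0-180 degrees), weighted by gradient magnitude."""
    angles = np.mod(theta, 180.0)                              # unsigned orientation
    idx = np.minimum((angles / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), mag.ravel())                  # magnitude-weighted vote
    return hist
```

Weighting by magnitude means strong edges dominate the histogram, so the descriptor emphasizes salient contours over weak texture.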
Optical Character Recognition (OCR)
We will investigate and show how scanned copies of printed letters can automatically be recognised as a string of letters.
Fundamental steps of the OCR system
Image acquisition
Preprocessing
Segmentation
Feature extraction
Classification
Labeling
Image acquisition: a scanner is used to capture images of papers having sequences of letters printed on them.
Fundamental steps of the OCR system: Preprocessing
The gradient vector is computed using the Sobel matrices.
Fundamental steps of the OCR system: Segmentation
The gradient image is segmented into a binary image by thresholding the gradient magnitude.
Fundamental steps of the OCR system: Labeling
Labelling identifies each letter as a single image component.
Fundamental steps of the OCR system: Feature extraction
Histograms of the gradient orientations (angles) are built from the gradient image produced during preprocessing, using only pixels belonging to an image component. This step generates an Edge Histogram Descriptor (EHD) for each segmented letter.
EHD from training sets for letters R and B
[Figure: two “Histogram Of Gradients” plots, probability versus gradient angle (degrees, 0–180), one for letter R and one for letter B.]
EHD from training sets for letters T and S
[Figure: two “Histogram Of Gradients” plots, probability versus gradient angle (degrees, 0–180), one for letter T and one for letter S.]
EHD from training sets for letter O
[Figure: “Histogram Of Gradients” plot, probability versus gradient angle (degrees, 0–180), for letter O.]
Feature vectors for letters R, B, T, S and O
[Figure: “Histogram Of Gradients” plots, probability versus gradient angle (degrees, 0–180), for letters R, B, T, S and O.]
These graphs are generated for all letters within a training set of letters. The width of each line reveals the statistical distribution among letters belonging to the same class.
Classification
From the EHD feature vectors and their statistical distribution over a training set of images, we can classify each feature vector as belonging to a letter (class) with an estimated probability of correctness.
Classification
This graph shows a 3-dimensional feature space with five clearly separable classes.
Minimum Distance Classifier
H. Lin and A.N. Venetsanopoulos, “A Weighted Minimum Distance Classifier for Pattern Recognition”, Canadian Conference on Electrical and Computer Engineering, vol. 2, pp. 904–907, 1993.
Compute the Euclidean distances from the feature vector to the mean vectors of all classes, then select the class giving the shortest distance.
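The minimum distance rule is a few lines of numpy. The dictionary-of-means interface is an assumption for illustration; any mapping from class label to mean feature vector works.

```python
import numpy as np

def minimum_distance_classify(x, class_means):
    """Assign feature vector x to the class whose mean vector is
    nearest in Euclidean distance."""
    labels = list(class_means)
    dists = [np.linalg.norm(x - class_means[c]) for c in labels]
    return labels[int(np.argmin(dists))]
```

The class means would be estimated by averaging the EHD feature vectors of each letter over the training set.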
Classification of a string of letters
SRBRTSTTRROOBBOOSS
[Figure: input image of printed letters and the classified output string.]
From experiments, the classification success rate using an extended 15-bin EHD feature vector covering 360 degrees is estimated at 70 percent for handwritten capital letters A to Z.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
EHD feature for machine-printed letters
If machine-printed letters are used instead, the classification success rate improves to 87 percent. Letters such as R and B are still hard to distinguish.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
Extension of the feature vector
How can the EHD feature vector be extended with additional features to improve the possibility of distinguishing between letters such as R and B?
One option is the displacement vector between the centre of gravity and the centre of the bounding box; two elements are thus added to the feature vector. If this vector is given as a fraction of the bounding-box side lengths, the feature becomes scale invariant.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
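The displacement feature can be sketched as below: the centre of gravity of the letter pixels is compared to the geometric centre of the bounding box, and the difference is divided by the box side lengths so the result is scale invariant. The (row, column) convention and function name are illustrative assumptions.

```python
import numpy as np

def displacement_feature(binary_letter):
    """Displacement from the bounding-box centre to the centre of
    gravity of the letter pixels, as fractions of the bounding-box
    side lengths (scale invariant); adds two feature elements."""
    ys, xs = np.nonzero(binary_letter)
    cog = np.array([ys.mean(), xs.mean()])          # centre of gravity
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    centre = np.array([(top + bottom) / 2.0, (left + right) / 2.0])
    size = np.array([max(bottom - top, 1), max(right - left, 1)], dtype=float)
    return (cog - centre) / size
```

For a letter like B, mass is distributed fairly evenly, while for R the lower-right region is sparse, so the two letters produce different displacement vectors even when their edge histograms are similar.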
Extension of the feature vector
A four-bin zonal histogram is built from the probabilities of letter pixels falling within the four indicated areas. These zones are defined from the bounding-box parameters. This feature has four dimensions and thus adds four more elements to the feature vector.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
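A minimal sketch of the zonal histogram, assuming the four areas are the quadrants of the bounding box (the slides define the zones from the bounding-box parameters but the exact partition is not spelled out here):

```python
import numpy as np

def zonal_histogram(binary_letter):
    """Four-bin zonal histogram: the fraction of letter pixels that
    falls in each quadrant of the bounding box."""
    ys, xs = np.nonzero(binary_letter)
    mid_y = (ys.min() + ys.max()) / 2.0
    mid_x = (xs.min() + xs.max()) / 2.0
    zones = np.array([
        np.sum((ys <= mid_y) & (xs <= mid_x)),   # top-left
        np.sum((ys <= mid_y) & (xs >  mid_x)),   # top-right
        np.sum((ys >  mid_y) & (xs <= mid_x)),   # bottom-left
        np.sum((ys >  mid_y) & (xs >  mid_x)),   # bottom-right
    ], dtype=float)
    return zones / zones.sum()                   # four extra features
```

Normalizing by the total pixel count turns the bin values into probabilities, keeping the feature independent of letter size.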
After extension of the feature vector
After adding the displacement vector and the zonal histogram, the classification success rate becomes 86 percent for handwritten letters and close to 100 percent for machine-printed capital letters.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
Summary of performance for OCR
Success rate (percent):
                   EHD    EHD + geometrical features
Handwritten         70     86
Machine printed     87    ~100
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
Overview of feature space
[Figure: 3D scatter plot of PCA components 1–3, showing clusters for letters B, R, S and T.]
How can we get an overview of a feature space that has 21 dimensions, as in the previous OCR example? Typically, there is correlation between the variables in multi-dimensional data used as input to classification. If so, a large amount of the variance can be described by projecting the, e.g., 21-dimensional data onto two or three new variables. We call these new variables principal components.
The 3D example graph shows data clusters corresponding to the letters B, R, S and T. Still, 58% of the variance of the original data is described by this graph.
Principal Component Analysis: Input data
x̄_n is a data vector having K dimensions (only three are shown in the graph): x̄_n = (x_{n,k}), k ∈ 1…K
The 2D data matrix X is a set of N data vectors, where each vector x̄_n represents one statistical observation:
X = (x_{n,k}), n ∈ 1…N ∧ k ∈ 1…K
Principal Component Analysis: Mean vector
Each point is a data vector of length K, so the input data matrix X represents a swarm of N points in a K-dimensional space.
A mean vector m̄ is computed as
m̄_k = (1/N) Σ_{n=1}^{N} x_{n,k}, ∀k ∈ 1…K
Principal Component Analysis: Centering of data
Subtract the mean vector m̄ from all data vectors. All data vectors are thus equally translated in the K-dimensional space such that the centre of the point cloud is relocated to the origin.
Principal Component Analysis: Scaling of data
Divide each dimension of the data vectors by its standard deviation s̄_k, where the variance is
s̄_k² = (1/N) Σ_{n=1}^{N} (x_{n,k} − m̄_k)², ∀k ∈ 1…K
This means that after scaling, all K dimensions will have unit variance:
(1/N) Σ_{n=1}^{N} (x_{n,k} − m̄_k)² = 1, ∀k ∈ 1…K
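The centering and scaling steps together are ordinary standardization and can be sketched as one function. Dividing by the standard deviation (the square root of the variance above) is what yields unit variance per dimension; the guard for constant columns is an added practical detail.

```python
import numpy as np

def standardize(X):
    """Centre each column of the N-by-K data matrix on its mean and
    scale it to unit variance, as preparation for PCA."""
    m = X.mean(axis=0)      # mean vector, one entry per dimension
    s = X.std(axis=0)       # standard deviation per dimension
    s[s == 0] = 1.0         # guard against constant columns
    return (X - m) / s
```

Without this step, dimensions measured on large numeric scales would dominate the principal components regardless of how informative they are.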
Principal Component Analysis: First component
• A principal component (PC) is a vector in the K-dimensional X-space that passes through the origin.
• Scores are projections of data vectors onto the PC (the blue point in the figure).
• The PC is oriented such that the scores approximate the original data as well as possible.
Principal Component Analysis: Second component
• The second PC also passes through the origin.
• It is oriented to improve the approximation of the original data as much as possible, under the constraint that the second PC is orthogonal to the first one.
• The two PCs describe a plane, which we can think of as a window into X-space.
Principal Component Analysis: Second component (continued)
• The blue point is a score, computed as a projection onto the plane defined by PC1 and PC2.
• Scores are approximations of their corresponding data points.
• The model used for approximation is, in this case, a plane.
• If more PCs are included, the model becomes a hyperplane.
• More PCs gradually improve the approximation of the data.
X-space is illustrated here with three axes. Remember that real-world X-spaces can have hundreds or thousands of dimensions.
Principal Component Analysis: Loadings
• The direction of each PC is given by the cosines of its angles to all K axes in X-space.
• In this example, the direction of PC1 is given by the cosines of the angles α₁, α₂ and α_K.
• When scores are computed from input data, these cosines define how much each dimension of X-space (each variable) contributes to the score values; see the projection formula:
p̄ · x̄ = |p̄| |x̄| cos α
• For that reason, the cosines of the directions are called “loadings”.
Principal Component Analysis: Loading plot
• A loading plot summarizes how the X-variables “load” on each PC.
• Points far from the origin have a larger impact on the model than points close to the origin.
• If two variable loadings are located very close in this plot, the variables are positively correlated.
• If two variable loadings are located on opposite sides of the origin, in diagonally opposed quadrants, the variables are negatively correlated.
• It is the correlation between variables that makes it possible to summarize hundreds of variables in a few PCs.
[Figure: 3D loading plot over components 1–3 for the OCR features, with loadings labelled Z1–Z4, G1–G15, EX1 and EX2.]
Principal Component Analysis: Summary

X = X̄ + T Pᵀ + E

• T: scores, coordinates in a hyperplane, used to approximate the data (structure of data).
• P: a set of loading vectors that together define the orientation of the hyperplane; each loading vector defines the orientation of one PC.
• E: a residual term capturing the variation of the data that cannot be described by the model (residuals contain noise).
• X̄: the centre of the X-data, typically moved to the origin prior to computation of the PCs (centering of data).
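The decomposition X = X̄ + T Pᵀ + E can be sketched with a singular value decomposition of the centred data, which is one standard way to compute the scores and loadings (the slides do not prescribe an algorithm, so this is an illustrative choice):

```python
import numpy as np

def pca(X, n_components):
    """Decompose centred data as X - X_mean = T @ P.T + E,
    where T holds the scores and P the loadings."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centering of data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                         # loadings: one column per PC
    T = Xc @ P                                      # scores: projections onto the PCs
    E = Xc - T @ P.T                                # residual not captured by the model
    explained = (S[:n_components]**2).sum() / (S**2).sum()
    return T, P, E, explained
```

The `explained` ratio is the fraction of total variance captured by the retained PCs, i.e. the quantity quoted as 58% for the three-component OCR plot earlier in these slides.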