Introduction to Pattern Recognition
For the Pattern Recognition Winter School of 정보과학회 (Korean Institute of Information Scientists and Engineers)
February 2011
Jin Hyung Kim (김진형), Department of Computer Science, KAIST, http://ai.kaist.ac.kr/~jkim
2
What is Pattern Recognition?
A pattern is an object, process, or event that can be given a name.
Pattern recognition: assignment of a physical object or event to one of several prespecified categories -- Duda & Hart
A subfield of Artificial Intelligence
human intelligence is based on pattern recognition
3
Examples of Patterns
4
Pattern Recognition
Related fields: machine learning, mathematical statistics, neural networks, signal processing, robotics and vision, cognitive science, nonlinear optimization, exploratory data analysis, fuzzy and genetic algorithms, detection and estimation theory, formal languages, structural modeling, biological cybernetics, computational neuroscience, …
Application areas: image processing/segmentation, computer vision, speech recognition, automated target recognition, optical character recognition, seismic analysis, man-machine dialogue, fingerprint identification, industrial inspection, medical diagnosis, ECG signal analysis, data mining, gene sequence analysis, protein structure analysis, remote sensing, aerial reconnaissance, …
5
Examples of Pattern Recognition Applications
Computer-aided diagnosis
Medical imaging, EEG, ECG, X-ray mammography
Image recognition: factory automation, robot navigation, face identification, gesture recognition, automatic target recognition
Speech recognition: speaker identification, speech recognition, Google Maps Navigation (Beta): search by voice
6
Biometric Recognition (생체 인식)
Person identification using invariant biometric features
Static patterns: fingerprint, iris, face, palm print, …, DNA
Dynamic patterns: signature, voiceprint, typing pattern
Uses: access control, e-commerce authentication
Applications of Pattern Recognition
7
Gesture Recognition
Text editing on Pen ComputersTele-operations
Control remote by gesture inputTV control by hand motion
Sign language Interpretation
Camera2D Projection
Gesture (last)
Hand tracking Gesture spotting
8
Extracting Patterns from Data: Data Mining
Demographics, point-of-sale data, ATM records, financial statistics, credit information, documents, intelligence data, medical records, physical-examination records
Data → Information → Decision making
80% of the buyers of product A also buy product B (CRM); automobile purchasing power in the US market fell for six months; sales of product A grew twice as fast as those of product B; dehydration symptoms indicate danger
What advertising strategy? How to display the products? What is the optimal budget allocation? How to expand market share? How to prevent customer churn? What prescription?
Korean example: preventing the use of lost credit cards by learning card-usage patterns
9
e-Book, Tablet PC, iPad, Smart-phone
Smart Phone with Rich Sensors
Comparison of Online Hangul Recognizers
11
KAIST Math Expression Recognizer : Demo
12
MathTutor-SE Demo
13
14
Historical Document Recognition (古文書 認識): the Seungjeongwon Ilgi (承政院日記)
15
Document Recognition (文書認識): Verification & Correction Interface
16
Mail Sorter
Scene Text Recognition
17
18
Autonomous Land Vehicle
(DARPA’s GrandChallenge contest)
http://www.youtube.com/watch?v=yQ5U8suTUw0
19
Protein Structure Analysis
20
21
Types of PR problems
Classification: assigning an object to a class. Output: a class label. Ex: classifying a product as 'good' or 'bad' in quality control.
Clustering: organizing objects into meaningful groups. Output: a (hierarchical) grouping of objects. Ex: a taxonomy of species.
Regression: predicting a value based on observations. Ex: predicting stock prices, forecasting.
Description: representing an object in terms of a series of primitives. Output: a structural or linguistic description. Ex: labeling ECG signals, video indexing, protein structure indexing.
From Ricardo Gutierrez-Osuna,Texas A&M Univ.
22
Pattern Class
A collection of “similar” (not necessarily identical) objects
Inter-class variability
Intra-class variability
Pattern class model: a description of each class/population (e.g., a probability density such as a Gaussian)
23
Classification vs Clustering
Classification (known categories) Clustering (creation of new categories)
Category “A”
Category “B”
Classification (Recognition) (Supervised Classification)
Clustering (Unsupervised Classification)
24
Pattern Recognition : Key Objectives
Process the sensed data to eliminate noise (data vs. noise)
Hypothesize models that describe each class population; then we may recover the process that generated the patterns
Choose the best-fitting model for the given sensed data and assign the class label associated with that model
25
A Typical Classification Process
Sensor → signal → Feature Extractor → feature → Classifier → class membership
26
Example : Salmon or Sea Bass
Sort incoming fish on a belt into two classes: salmon or sea bass.
Steps:
Preprocessing (segmentation)
Feature extraction (measure features or properties)
Classification (make the final decision)
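The three steps above can be sketched in code. The feature values and the lightness threshold below are hypothetical, and the assumption that salmon are the darker fish is made only for illustration:

```python
# Sketch of the preprocessing -> feature extraction -> classification pipeline.
# Feature values and the threshold are made up for illustration.

def extract_features(fish_image):
    """Stand-in for segmentation + measurement; returns (length, lightness)."""
    return fish_image["length"], fish_image["lightness"]

def classify(features, lightness_threshold=5.0):
    """Decide 'salmon' or 'sea bass' from a single lightness threshold
    (assuming, for illustration, that salmon are darker)."""
    _, lightness = features
    return "salmon" if lightness < lightness_threshold else "sea bass"

fish = {"length": 32.0, "lightness": 3.1}   # toy "sensed" fish
print(classify(extract_features(fish)))      # salmon
```

In a real system each step would of course be far more involved; the point is only the shape of the pipeline.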
27
Sea bass vs Salmon (by Image)
Length, lightness, width, number and shape of fins, position of the mouth, …
28
Salmon vs. Sea Bass (by length)
29
Salmon vs. Sea Bass (by lightness)
Best Decision Strategy with lightness
30
Cost of Misclassification
There are two possible classification errors:
(1) deciding a sea bass is a salmon;
(2) deciding a salmon is a sea bass.
Which error is more important? This is generalized as a loss function; we then look for the decision of minimum risk.
Risk = Expected Loss
Loss function L(truth, decision):
                  decide Salmon   decide Sea bass
truth Salmon            0              -10
truth Sea bass        -20                0
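A minimal risk computation along these lines. The slide's -10/-20 entries are treated here as costs of magnitude 10 and 20, and the posterior P(class | x) values are assumed for illustration:

```python
# Conditional risk R(decision | x) = sum over truths of L(truth, decision) * P(truth | x),
# with the slide's loss table read as positive costs. Posteriors are hypothetical.

LOSS = {  # LOSS[truth][decision]
    "salmon":   {"salmon": 0,  "sea bass": 10},
    "sea bass": {"salmon": 20, "sea bass": 0},
}

def conditional_risk(decision, posterior):
    """Expected loss of a decision under the posterior P(truth | x)."""
    return sum(LOSS[truth][decision] * p for truth, p in posterior.items())

def decide(posterior):
    """Choose the decision with minimum conditional risk."""
    return min(("salmon", "sea bass"), key=lambda d: conditional_risk(d, posterior))

posterior = {"salmon": 0.3, "sea bass": 0.7}
print(decide(posterior))  # sea bass: risk 3.0 vs 14.0 for deciding salmon
```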
31
Classification with more features (by length and lightness)
It is possibly better.
Really ??
32
How Many Features and Which?
The choice of features determines the success or failure of the classification task. For a given feature, we may compute the best decision strategy from the (training) data.
This is called training, parameter adaptation, or learning: machine learning issues.
Issues with feature extraction:
Correlated features do not improve performance.
It might be difficult to extract certain features.
It might be computationally expensive to extract many features.
The 'curse' of dimensionality …
33
Feature and Feature Vector
34
− Length
− Lightness
− Width
− Number and shape of fins
− Position of the mouth
− …
Goodness of Feature
35
Features and separability
36
Developing PR system
Sensors and preprocessing. Feature extraction aims to create discriminative features good for classification. A classifier. A teacher provides information about the hidden state: supervised learning. A learning algorithm sets up the PR system from training examples.
Pattern → Sensors and preprocessing → Feature extraction → Classifier → Class assignment
Teacher → Learning algorithm (adapts the classifier)
37
PR Approaches
Template matching: the pattern to be recognized is matched against a stored template.
Statistical PR: based on an underlying statistical model of patterns (features) and pattern classes.
Structural PR (syntactic pattern recognition): pattern classes are represented by formal structures such as grammars, automata, and strings; used not only for classification but also for description.
Neural networks: the classifier is represented as a network of cells modeling the neurons of the human brain (connectionist approach); knowledge is stored in the connectivity and strength of the synaptic weights.
Statistical structure analysis: combining structural and statistical analysis; exploits probabilistic frameworks such as Bayesian networks and MRFs.
…Modified From Vojtěch Franc
38
Template Matching
Template
Input scene
39
Deformable Template Matching: Snake
Prototype registration to the low-level segmented image
Shape training set; prototype and variation learning
Prototype warping
Example : Corpus Callosum Segmentation
40
From Ricardo Gutierrez-Osuna, Texas A&M Univ.
41
Classifier
The task of a classifier is to partition the feature space into class-labeled decision regions.
The borders between decision regions are the decision boundaries; classification amounts to determining the decision region into which a feature vector x falls.
42
Representation of classifier
A classifier is typically represented as a set of discriminant functions G_i(x), i = 1, …, |Y|.
The classifier assigns a feature vector x to the i-th class if G_i(x) > G_j(x) for all j ≠ i.
Feature vector x → discriminant functions G_1(x), G_2(x), …, G_|Y|(x) → max → class identifier y
From Vojtěch Franc
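A sketch of a discriminant-function classifier: evaluate every G_i(x) and take the arg-max. The two linear discriminants below are made-up examples:

```python
# Classifier as a set of discriminant functions: assign x to the class
# whose G_i(x) is largest. The weights and biases are illustrative only.

def make_linear_discriminant(w, b):
    """G(x) = w . x + b."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

discriminants = {                 # class label -> G_i(x)
    "A": make_linear_discriminant((1.0, 0.0), 0.0),
    "B": make_linear_discriminant((0.0, 1.0), -0.5),
}

def classify(x):
    """Arg-max over the discriminant functions."""
    return max(discriminants, key=lambda label: discriminants[label](x))

print(classify((0.2, 0.9)))  # B: G_B = 0.4 > G_A = 0.2
```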
43
Classification of Classifiers by the Form of the Discriminant Function G_i(x)
Discriminant function : Classifier
A posteriori probability P(y_i | x) : Bayesian
Linear function : Linear Discriminant Analysis, Support Vector Machine
Non-linear function : Non-linear Discriminant Analysis
Output of an artificial neuron : Artificial Neural Network
44
Bayesian Decision Making
Statistical approach: the optimal classifier with minimum error, assuming the complete statistical model is known.
Decision given the posterior probabilities, where x is an observation:
if P(ω1 | x) > P(ω2 | x), decide state of nature = ω1
if P(ω1 | x) < P(ω2 | x), decide state of nature = ω2
45
Searching Decision Boundary
46
Bayesian Rule: from P(x | ω1) to P(ω1 | x)
P(ωi | x) = p(x | ωi) P(ωi) / p(x) = p(x | ωi) P(ωi) / Σj p(x | ωj) P(ωj)
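The rule above sketched numerically; the likelihood and prior values are hypothetical:

```python
# Bayes rule for a two-class problem: posteriors from class-conditional
# likelihoods p(x | w_i) and priors P(w_i). Numbers are made up.

def posteriors(likelihoods, priors):
    """P(w_i | x) = p(x | w_i) P(w_i) / sum_j p(x | w_j) P(w_j)."""
    joint = {w: likelihoods[w] * priors[w] for w in priors}
    evidence = sum(joint.values())            # p(x), the normalizer
    return {w: joint[w] / evidence for w in joint}

post = posteriors(likelihoods={"w1": 0.6, "w2": 0.2},   # p(x | w_i)
                  priors={"w1": 0.4, "w2": 0.6})        # P(w_i)
print(post)  # w1: 2/3, w2: 1/3
```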
47
Limitations of Bayesian approach
The statistical model p(x, y) is mostly not known; we must learn to estimate p(x, y) from training examples {(x1, y1), …, (xℓ, yℓ)}.
Usually p(x, y) is assumed to have a parametric form.
Ex: multivariate normal distribution
Non-parametric estimation of p(x, y) requires a large set of training samples.
Non-Bayesian methods offer equally good results (??)
From Vojtěch Franc
48
Polynomial Discriminative Function approaches
Assume that G(x) is a polynomial function:
a linear function (Linear Discriminant Analysis, LDA), or
a quadratic function.
Classifier design then amounts to determining the separating hyperplane.
From Vojtěch Franc
49
LDA Example: separating jockeys (J) from basketball players (H)
Task: separate jockeys (J) from basketball players (H).
Features: height and weight. The set of hidden states is Y = {H, J}; the feature space is X = R².
Training examples: {(x1, y1), …, (xℓ, yℓ)}
Linear classifier:
q(x) = H if w·x + b ≥ 0
q(x) = J if w·x + b < 0
The decision boundary is the hyperplane w·x + b = 0.
From Vojtěch Franc
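A minimal Fisher-style sketch of such a linear classifier, taking w = S_w⁻¹(m_H − m_J) with b placed midway between the class means. The particular LDA variant and all height/weight numbers are illustrative assumptions:

```python
# Fisher-style linear discriminant for the jockey/basketball-player example.
# Data and the specific LDA variant are illustrative assumptions.

def mean(xs):
    n = len(xs)
    return [sum(x[i] for x in xs) / n for i in (0, 1)]

def scatter(xs, m):
    """2x2 within-class scatter: sum of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        d = [x[0] - m[0], x[1] - m[1]]
        for i in (0, 1):
            for j in (0, 1):
                s[i][j] += d[i] * d[j]
    return s

def solve2(a, v):
    """Solve the 2x2 system a w = v by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(v[0] * a[1][1] - v[1] * a[0][1]) / det,
            (a[0][0] * v[1] - a[1][0] * v[0]) / det]

H = [(200, 95), (195, 90), (205, 100)]   # basketball players (height cm, weight kg)
J = [(160, 50), (155, 48), (165, 55)]    # jockeys

mH, mJ = mean(H), mean(J)
SH, SJ = scatter(H, mH), scatter(J, mJ)
Sw = [[SH[i][j] + SJ[i][j] for j in (0, 1)] for i in (0, 1)]
w = solve2(Sw, [mH[0] - mJ[0], mH[1] - mJ[1]])          # w = Sw^-1 (mH - mJ)
b = -sum(wi * (h + j) / 2 for wi, h, j in zip(w, mH, mJ))

def q(x):
    """Linear classifier q(x) from the slide."""
    return "H" if w[0] * x[0] + w[1] * x[1] + b >= 0 else "J"

print(q((198, 92)), q((158, 49)))  # H J
```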
50
Artificial Neural Network Design
For a given structure, find the weight set w that minimizes the sum-of-squared-error criterion J(w) over the training examples {(x1, y1), …, (xℓ, yℓ)}:
J(w) = (1/2) Σk (tk − zk)²
where tk is the target output and zk is the network output.
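A sketch of minimizing J(w) by gradient descent for a single linear neuron z = w0 + w1·x; the data set and learning rate are made up:

```python
# Gradient descent on J(w) = 1/2 * sum_k (t_k - z_k)^2 for one linear neuron.
# Toy data follows t = 1 + 2x exactly, so the weights should recover 1 and 2.

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # (x, target t)

w0, w1, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    g0 = g1 = 0.0
    for x, t in data:
        z = w0 + w1 * x        # neuron output
        g0 += -(t - z)         # dJ/dw0
        g1 += -(t - z) * x     # dJ/dw1
    w0 -= lr * g0
    w1 -= lr * g1

print(round(w0, 3), round(w1, 3))  # 1.0 2.0
```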
51
PR design cycle
Data collection: probably the most time-intensive component of a project. How many examples are enough?
Feature choice: critical to the success of the PR project; requires basic prior knowledge and engineering sense.
Model choice and design: statistical, neural, or structural; parameter settings.
Training: given a feature set and a 'blank' model, adapt the model to explain the training data; supervised, unsupervised, or reinforcement learning.
Evaluation: how well does the trained model do? Overfitting vs. generalization.
52
Learning for PR system
Which features are good for classifying the given classes? → Feature analysis
Can we get the required probabilities or boundaries? → Learning from the training data
Pattern → Sensors and preprocessing → Feature extraction → Classifier → Class assignment
Teacher → Learning algorithm (adapts the classifier)
53
Learning
A change in the contents and organization of a system's knowledge that enables it to improve its performance on a task - Simon. It occurs when the system acquires new knowledge from its environment.
Learning from observation: from trivial memorization to the creation of scientific theories.
Inductive inference: a new consistent interpretation of the data (observations); a general conclusion from examples; inferring an association between input and output with some confidence.
Data mining: learning rules from large sets of data. The availability of large databases allows the application of machine learning to real problems.
54
Learning Algorithm Categorization Depending on Available
Feedback
Supervised learning: examples of correct input/output pairs are available; induction.
Unsupervised learning: no hint at all about the correct outputs; clustering or consistent interpretation.
Reinforcement learning: receives no examples, but rewards or punishments at the end.
Semi-supervised learning: training with both labeled and unlabeled examples.
55
Issues on Learning Algorithm
Prior knowledge: prior knowledge can help in learning, e.g., assumptions on parametric forms and ranges of values.
Incremental learning: update old knowledge whenever a new example arrives.
Batch learning: apply the learning algorithm to the entire set of examples.
Analytic approach: find the optimal parameter values by analysis. Iterative adaptation: improve the parameter values from an initial guess.
56
Learning Algorithms
General idea: tweak the parameters so as to optimize a performance criterion. In the course of learning, the parameter vector traces a path that (hopefully) ends at the best parameter vector.
57
Inductive Learning
For given training examples (correct input-output pairs),
recover the unknown underlying function from which the training data were generated.
Generalization ability for unseen data is required.
Forms of the function: logical sentences, polynomials, sets of weights (neural networks), …
Given the form of the function, adjust its parameters to minimize the error.
58
Theory of Inductive Inference
Inductive bias: constraints on the hypothesis space; a table of all observations is not a viable choice.
Restricted hypothesis-space biases; preference biases.
Occam's razor (Ockham): the simplest hypothesis is best.
Concept C ⊆ X. Examples are given as (x, y) where x ∈ X and
y = 1 if x ∈ C, y = 0 if x ∉ C. Find F such that F(x) = 1 if x ∈ C and F(x) = 0 if x ∉ C.
59
Consistent hypotheses
William of Ockham (also Occam), 1285-1349, English scholastic philosopher.
Prefer the simplest hypothesis consistent with the data; defining 'simple' is not easy.
There is a tradeoff between the complexity of a hypothesis and its degree of fit.
60
Model Complexity
Decision boundaries for salmon and sea bass: which is better, A or B?
61
Model Complexity
We can get perfect classification performance on the training data by choosing sufficiently complex models.
This raises the issue of generalization.
62
Generalization
The main goal of a pattern classification system is to suggest the class of objects yet unseen: generalization.
Some complex decision boundaries are not good at generalization; some simple boundaries are not good either.
The tradeoff between performance and simplicity is the core of statistical pattern recognition.
63
Generalization Strategy
How can we improve generalization performance ?
More training examples (i.e., better pdf estimates).
Simpler models (i.e., simpler classification boundaries) usually yield better performance.
Simplify the decision boundary!
64
Overfitting and underfitting
Panels: underfitting, good fit, overfitting.
Problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.
From Vojtěch Franc
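A small sketch of this point: a memorizing classifier attains zero empirical risk yet a large true risk, while a simple model generalizes. The noisy-label data set is hypothetical:

```python
# Zero training error does not imply low test error: a lookup-table
# "model" vs a simple threshold on synthetic noisy-label data.
import random

random.seed(0)

def sample(n):
    """Class is 1 when x > 0, but 20% of the labels are flipped (noise)."""
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        y = 1 if x > 0 else 0
        if random.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

train, test = sample(50), sample(5000)
memory = dict(train)                 # memorizing "model": zero training error

def memorizer(x):
    return memory.get(x, 0)          # arbitrary default for unseen x

def simple(x):
    return 1 if x > 0 else 0         # the simple threshold model

def error(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

print(error(memorizer, train))   # 0.0  (small empirical risk)
print(error(memorizer, test))    # ~0.5 (large true risk)
print(error(simple, test))       # ~0.2 (the label-noise rate)
```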
65
Curse of Dimensionality
Increasing the number of functions reduces the training error: classifier performance on the training data improves.
But when training on a limited amount of data, increasing the number of features reduces generalization ability; the amount of training data required for adequate generalization grows rapidly with the feature dimension.
For a finite set of training data, finding the optimal set of features is a difficult problem.
66
Maximize the outcome from two slot machines of unknown return rates.
How many coins should be spent to find the better machine?
Two Slot Machine Problem
67
Optimal Number of Cells (example)
68
Implications of the Curse of Dimensionality for PR System Design
With finite training samples, be cautious about adding features.
Use the features with the highest discrimination power first; feature analysis is mandatory.
A simple neural network is generally better: a small number of hidden nodes and links.
Tips for structure simplification: parameter tying; eliminate links during learning.
69
Cross-Validation
Validate the learned model on a different set to assess its generalization performance, guarding against overfitting.
Partition the training set into an estimation subset for learning parameters and a validation subset.
Cross-validation is used for selecting the best model and for determining when to stop training.
Leave-one-out validation: N−1 examples for training, 1 for validation, taking turns; this helps overcome a small training set.
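Leave-one-out validation sketched in code; the 1-nearest-neighbour model and the toy data are illustrative assumptions:

```python
# Leave-one-out cross-validation: each example takes a turn as the
# validation set while the other N-1 examples train the model.

def nearest_neighbour_label(x, train):
    """1-NN on a 1-D feature: label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def leave_one_out_error(data):
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]               # N-1 for training
        if nearest_neighbour_label(x, rest) != y:    # 1 for validation
            errors += 1
    return errors / len(data)

data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.8, "b"), (0.9, "b"), (1.0, "b")]
print(leave_one_out_error(data))  # 0.0
```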
70
Unsupervised learning. Input: training examples {x1, …, xℓ} without information about the hidden state.
Clustering: the goal is to find clusters of data sharing similar properties.
The learning algorithm takes {x1, …, xℓ} and produces the parameters θ; the classifier q : X × Θ → Y then outputs the labels {y1, …, yℓ}.
A broad class of unsupervised learning algorithms follows this scheme.
From Vojtěch Franc
71
Example of unsupervised learning algorithm
k-Means clustering:
Classifier: q(x) = argmin_{i=1,…,k} ||x − m_i||
Goal: minimize the total distortion Σ_i ||x_i − m_{q(x_i)}||²
Learning algorithm: m_i = (1/|I_i|) Σ_{j ∈ I_i} x_j, where I_i = {j : q(x_j) = i}
Input: {x1, …, xℓ}; parameters: θ = {m1, …, mk}; outputs: {y1, …, yℓ}
From Vojtěch Franc
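A minimal k-means sketch matching the two steps above, on made-up 1-D data with k = 2:

```python
# Minimal k-means: alternate the classifier step (assign each x to its
# nearest mean) and the learning step (recompute each mean). Toy 1-D data.

def kmeans(xs, means, iterations=20):
    for _ in range(iterations):
        # classifier step: q(x) = argmin_i |x - m_i|
        clusters = [[] for _ in means]
        for x in xs:
            i = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[i].append(x)
        # learning step: m_i = mean of the points assigned to cluster i
        means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
    return means

xs = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
print(kmeans(xs, means=[0.0, 10.0]))  # ≈ [1.0, 8.0]
```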
Other Issues in Pattern Recognition
73
Difficulty of Class Modeling
74
Context processing is essential for recognition
75
Context Processing in Character Recognition
Without context, the human recognition rate for English handwriting is about 95%.
76
Global Consistency
Local decision is not enough
77
Combining Multiple Classifiers
Approaches for improving the performance of a group of experts.
Best single classifier vs. combining multiple classifiers: two heads (experts, classifiers) are better than one.
The classifier output is either the best (single) class, a ranking, or a score for each class.
Methods for generating multiple classifiers: correlated classifiers would not help.
Methods for combining multiple classifiers: majority rule, Borda count, decorrelated combination, etc.
78
Evaluating Pattern Recognition Performance
Counts of recognition results against the truth:
p = A recognized as a (correct), q = not-A recognized as not-a (correct),
r = A recognized as not-a (miss), s = not-A recognized as a (false alarm)
Correct recognition rate = (p+q)/(p+q+r+s)
Error rate = (r+s)/(p+q+r+s)
Miss detection = r/(p+r)
False alarm = s/(p+s)
Recall = p/(p+r)
Precision = p/(p+s)
Also: rejection rate (refusing to make a decision) and throughput.
Case A: 20% rejected, with 0.5% error in the results. Case B: 10% rejected, with 1.0% error. Which is better?
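The measures above, computed from assumed counts p, q, r, s:

```python
# Evaluation measures from the slide; the counts are hypothetical.

p, q, r, s = 90, 880, 10, 20   # correct A, correct not-A, misses, false alarms

total = p + q + r + s
correct_rate   = (p + q) / total
error_rate     = (r + s) / total
miss_detection = r / (p + r)
false_alarm    = s / (p + s)
recall         = p / (p + r)
precision      = p / (p + s)

print(correct_rate, recall, precision)  # 0.97 0.9 0.8181...
```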
79
Improving Pattern Recognition Performance
Performance as a function of time and effort: toward 100%
Appendix
81
Resources
Professional association: International Association for Pattern Recognition (IAPR)
Text books: Pattern Classification by Richard O. Duda, Peter E. Hart, and David G. Stork
Journals: IEEE Transactions on Pattern Analysis and Machine Intelligence; Pattern Recognition; Pattern Recognition Letters; Artificial Intelligence and Pattern Recognition; …
Conferences and workshops: International Conference on Pattern Recognition; Int'l Conference on Document Analysis and Recognition; Int'l Workshop on Frontiers in Handwriting Recognition; IEEE Computer Vision and Pattern Recognition; …