Non-Metric Methods
1
Non-Metric Methods
Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University
2
Non-Metric Descriptions
Nominal data
– Discrete
– Without a natural notion of similarity or even ordering
Property d-tuple
– With lists of attributes
– e.g., { red, shiny, sweet, small }
– i.e., color = red, texture = shiny, taste = sweet, size = small
3
Non-Metric Descriptions
Strings of nominal attributes
– e.g., base sequences in DNA segments: "AGCTTCAGATTCCA"
– Might themselves be the output of other component classifiers
– e.g., a Chinese character recognizer and a neural network for classifying component brush strokes
4
Non-Metric Methods
Learn categories from non-metric data
Represent structures in strings
Toward discrete problems addressed by
– Rule-based pattern recognition methods
– Syntactic pattern recognition methods
5
Decision Trees
6
Benefits of Decision Trees
Interpretability
Rapid classification
– Through a sequence of simple queries
Natural way to incorporate prior knowledge from human experts
7
Interpretability
Conjunctions and disjunctions
For any particular test pattern
– e.g., properties: { taste, color, shape, size }
– x = { sweet, yellow, thin, medium }
– (color = yellow) AND (shape = thin)
For category description
– e.g., Apple = (green AND medium) OR (red AND medium)
Rule reduction
– e.g., Apple = (medium AND NOT yellow)
8
Tree Construction
Given
– Set D of labeled training data
– Set of properties for discriminating patterns
Goal
– Organize the tests into a tree
9
Tree Construction
Split samples progressively into smaller subsets
Pure subset
– All samples have the same category label
– Could terminate that portion of the tree
Subset with a mixture of labels
– Decide either to stop, or to select another property and grow the tree further
10
CART
Classification and regression trees
A general framework for decision trees
General questions in CART
– Number of decision outcomes at a node
– Property tested at a node
– Declaration of a leaf
– When and how to prune
– Decision for an impure leaf node
– Handling of missing data
11
Branching Factor and Binary Decisions
Branching factor (branching ratio) B
– Number of links descending from a node
Binary decisions
Every decision can be represented using just binary decisions
– e.g., query of color (B = 3)
– color = green? color = yellow?
– Universal expressive power
12
Binary Trees
13
Geometrical Interpretation of Trees for Numerical Data
14
Fundamental Principle
Prefer decisions leading to a simple, compact tree with few nodes
– A version of Occam's razor
Seek a property query T at each node N
– Make the data reaching the immediate descendent nodes as pure as possible
– i.e., achieve the lowest impurity
Impurity i(N)
– Zero if all patterns bear the same label
– Large if the categories are equally represented
15
Entropy Impurity (Information Impurity)
Most popular measure of impurity
i(N) = −Σ_j P(ω_j) log2 P(ω_j)
where P(ω_j) = P̂(ω_j | N), the fraction of patterns at node N in category ω_j
16
Variance Impurity for Two-Category Case
Particularly useful in the two-category case
i(N) = P(ω1) P(ω2)
17
Gini Impurity
Generalization of variance impurity
Applicable to two or more categories
Expected error rate at node N
i(N) = Σ_{i≠j} P(ω_i) P(ω_j) = (1/2)[1 − Σ_j P²(ω_j)]
18
Misclassification Impurity
Minimum probability that a training pattern would be misclassified at N
Most strongly peaked at equal probabilities
i(N) = 1 − max_j P(ω_j)
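To make these measures concrete, here is a minimal Python sketch (the function names are mine, not from the slides); each function takes the estimated class probabilities P(ω_j) at a node:

```python
import math

def entropy_impurity(p):
    """i(N) = -sum_j P(w_j) log2 P(w_j): zero for a pure node."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

def variance_impurity(p):
    """Two-category case: i(N) = P(w1) P(w2)."""
    return p[0] * p[1]

def gini_impurity(p):
    """i(N) = (1/2)[1 - sum_j P(w_j)^2]; CART often omits the 1/2,
    which rescales the measure but never changes which split wins."""
    return 0.5 * (1.0 - sum(pj ** 2 for pj in p))

def misclassification_impurity(p):
    """i(N) = 1 - max_j P(w_j): most strongly peaked at equal probabilities."""
    return 1.0 - max(p)

# Every measure is zero for a pure node and maximal at equal probabilities:
for f in (entropy_impurity, variance_impurity,
          gini_impurity, misclassification_impurity):
    print(f.__name__, f([0.5, 0.5]), f([1.0, 0.0]))
```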
19
Impurity for Two-Category Case
*Adjusted in scale and offset for comparison
20
Heuristic to Choose Query
If entropy impurity is used, the impurity reduction corresponds to an information gain
The reduction of entropy impurity due to a split cannot be greater than 1 bit
Choose the query value s to maximize
Δi(T) = i(N) − P_L i(N_L) − (1 − P_L) i(N_R)
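A hedged sketch of this heuristic for one real-valued feature (the helper names and the use of Gini impurity are my choices): sort the samples, try each of the n − 1 split positions, and keep the threshold with the largest drop Δi:

```python
def impurity(labels):
    # Gini impurity of a list of class labels (any measure could be used)
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(xs, labels):
    """Return (threshold, gain) maximizing i(N) - P_L i(N_L) - (1-P_L) i(N_R)."""
    pairs = sorted(zip(xs, labels))
    parent, n = impurity(labels), len(pairs)
    best_gain, best_t = 0.0, None
    for k in range(1, n):                      # n - 1 candidate positions
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        gain = parent - (k / n) * impurity(left) - ((n - k) / n) * impurity(right)
        if gain > best_gain:
            # split halfway between neighboring sample values
            best_gain, best_t = gain, (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_t, best_gain

print(best_binary_split([1.0, 2.0, 3.0, 4.0], ["w1", "w1", "w2", "w2"]))  # (2.5, 0.5)
```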
21
Finding Extrema
Nominal attributes
– Perform extensive or exhaustive search over all possible subsets of the training set
Real-valued attributes
– Use gradient descent algorithms to find a splitting hyperplane
– A one-dimensional optimization problem for binary trees
22
Tie Breaking
Nominal data
– Choose randomly
Real-valued data
– Assume a split lying in x_l < x_s < x_u
– Choose either the middle point or the weighted average x_s = (1 − P) x_l + P x_u
– P is the probability a pattern goes to the "left" under the decision
Computational simplicity may be a determining factor
23
Greedy Method
Get a local optimum at each node
No assurance that successive locally optimal decisions lead to the global optimum
No guarantee that we will have the smallest tree
For reasonable impurity measures and learning methods
– Often continue to split further to get the lowest possible impurity at the leaves
24
Favoring Gini Impurity over Misclassification Impurity
Example: 90 patterns in ω1 and 10 in ω2
Misclassification impurity: 0.1
Suppose no split guarantees an ω2 majority in either of the two descendent nodes
– Misclassification impurity remains at 0.1 for all splits
An attractive split: 70 ω1, 0 ω2 to the right and 20 ω1, 10 ω2 to the left
Gini impurity shows that this is a good split
25
Twoing Criterion
For multiclass binary tree creation
Find "supercategories" C1 and C2
– C1 = { ω_i1, ω_i2, …, ω_ik }, C2 = C − C1
Compute Δi(s, C1) as though it corresponded to a standard two-class problem
Find the split s*(C1) that maximizes the change, and thereby the supercategory C1*
26
Practical Considerations
The choice of impurity function rarely affects the final classifier and its accuracy
The stopping criterion and pruning methods are more important in determining final accuracy
27
Multiway Splits
Δi(s) = i(N) − Σ_{k=1..B} P_k i(N_k)
To avoid favoring decisions with large B, base the decision on the gain ratio:
Δi_B(s) = Δi(s) / ( −Σ_{k=1..B} P_k log2 P_k )
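A small sketch of the gain ratio (function name is mine; the fractions P_k are the shares of patterns sent to each of the B children):

```python
import math

def gain_ratio(parent_impurity, child_impurities, child_fractions):
    """Delta-i(s) scaled by the split entropy -sum_k P_k log2 P_k,
    so that splits with many outcomes B are not automatically favored."""
    gain = parent_impurity - sum(p * i for p, i in
                                 zip(child_fractions, child_impurities))
    split_info = -sum(p * math.log2(p) for p in child_fractions if p > 0)
    return gain / split_info if split_info > 0 else 0.0

# A B = 4 split must buy its extra gain against log2(4) = 2 bits of split info:
print(gain_ratio(1.0, [0.0, 0.0], [0.5, 0.5]))            # 1.0
print(gain_ratio(1.0, [0.0, 0.0, 0.0, 0.0], [0.25] * 4))  # 0.5
```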
28
Importance of Stopping Criteria
Fully grown trees are typically overfit
Extreme case: each leaf corresponds to a single training point
– The full tree is merely a look-up table
– Does not generalize well in noisy problems
Early stopping
– Error on training data not sufficiently low
– Performance may suffer
29
Stopping by Checking Validation Error
Use a subset of the data (e.g., 90%) for training and the remaining 10% as a validation set
Continue splitting until the error on the validation data is minimized
30
Stopping by Setting a Threshold
Stop if max_s Δi(s) < β
Benefits
– Uses all training data
– Leaves can lie at different levels
Fundamental drawback
– Difficult to determine the threshold β
An alternative simple method
– Stop when a node represents fewer than some threshold number of points, e.g., a fixed percentage of the total training set
31
Stopping by Checking a Global Criterion
Stop when a global criterion reaches its minimum
Minimum description length
The criterion balances complexity and uncertainty:
α · size + Σ_{leaf nodes} i(N)
32
Stopping Using Statistical Tests
Suppose a split s sends P n patterns to the left and (1 − P) n to the right
χ² = Σ_{i=1..2} (n_iL − n_ie)² / n_ie
– n_iL: number of ω_i patterns sent to the left
– n_ie = P n_i: expected number sent left under a random split
If the most significant split does not yield a χ² exceeding the chosen threshold, stop splitting
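A sketch of this test (the 3.84 critical value, chi-squared with one degree of freedom at the 0.05 level, is my illustrative choice, not from the slides):

```python
def split_chi_squared(n_left, n_node, p_left):
    """n_left[i]: omega_i patterns sent left; n_node[i]: omega_i patterns
    at the node; p_left: fraction of all patterns the split sends left."""
    chi2 = 0.0
    for n_il, n_i in zip(n_left, n_node):
        n_ie = p_left * n_i              # expected count under a random split
        if n_ie > 0:
            chi2 += (n_il - n_ie) ** 2 / n_ie
    return chi2

# 60 of 90 omega_1 and 10 of 30 omega_2 patterns sent left, so P = 70/120:
chi2 = split_chi_squared([60, 10], [90, 30], 70 / 120)
print(chi2 > 3.84)   # True: significant at the 0.05 level -> keep splitting
```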
33
Horizon Effect
The determination of the optimal split at a node is not influenced by decisions at its descendent nodes
A stopping condition may be met too early for overall optimal recognition accuracy
– Biases toward trees in which the greatest impurity is near the root node
34
Pruning
Grow a tree fully first
All pairs of neighboring leaf nodes are considered for elimination
– If the elimination yields a satisfactory (small) increase in impurity, the common antecedent node is declared a leaf
– Called merging or joining
35
Rule Pruning
Each leaf has an associated rule
Some rules can be simplified if a series of decisions is redundant
Can improve generalization and interpretability
Allows us to distinguish between the contexts in which a node is used
36
Example 1: A Simple Tree
37
Example 1: A Simple Tree
Impurity of the root node: i(N_root) = −Σ_{i=1,2} P(ω_i) log2 P(ω_i) = 1.0
Consider splits of the form "x_i ≤ x_is"
Exhaustively check n − 1 positions for x1 and x2, respectively
38
Example 1: A Simple Tree
39
Example 1: A Simple Tree
40
Computational Complexity
Training
– Root node
  Sorting: O(dn log n)
  Entropy computation: O(n) + (n − 1) O(d)
  Total: O(dn log n)
– Level-1 node
  Average case: O(dn log(n/2))
– Total number of levels: O(log n)
– Total average complexity: O(dn (log n)²)
Recall and classification
– O(log n)
41
Feature Choice
42
Feature Choice
43
Multivariate Decision Trees
44
Multivariate Decision Trees Using General Linear Decisions
45
Priors and Costs
Priors
– Weight samples to correct for the prior frequencies
Costs
– Cost matrix λ_ij
– Incorporate costs into the impurity, e.g.,
  i(N) = Σ_{i,j} λ_ij P(ω_i) P(ω_j)
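A minimal sketch of this cost-weighted impurity, with an illustrative (made-up) cost matrix λ:

```python
def cost_weighted_impurity(p, cost):
    """i(N) = sum_{i,j} lambda_ij P(w_i) P(w_j)."""
    return sum(cost[i][j] * p[i] * p[j]
               for i in range(len(p)) for j in range(len(p)))

cost = [[0.0, 1.0],   # lambda_12 = 1: calling an omega_2 pattern omega_1 is cheap
        [5.0, 0.0]]   # lambda_21 = 5: calling an omega_1 pattern omega_2 is costly
print(cost_weighted_impurity([0.9, 0.1], cost))  # ~0.54
```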
46
Training and Classification with Deficient Patterns
Training
– Proceed as usual
– Calculate impurities at a node using only the attribute information present
Classification
– Use the traditional ("primary") decision whenever possible
– Use surrogate splits when the test pattern is missing some features
– Or use virtual values
47
Example 2: Surrogate Splits and Missing Attributes
48
Example 2: Surrogate Splits and Missing Attributes
49
Algorithm ID3
Interactive dichotomizer
For use with nominal (unordered) inputs only
– Real-valued variables are handled by bins
Gain ratio impurity is used
Continues until all nodes are pure or there are no more variables
Pruning can be incorporated
50
Algorithm C4.5
Successor and refinement of ID3
Real-valued variables are treated as in CART
Gain ratio impurity is used
Uses pruning based on the statistical significance of splits
51
Treating Deficient Patterns in C4.5
Branching factor B at node N
Follow all B possible answers to the descendent nodes and ultimately B leaf nodes
The final decision is based on the labels of the B leaf nodes, weighted by the decision probabilities at N
52
Example of C4.5Rules
IF (0.04 x1 + 0.16 x2 < 0.11)
AND (0.27 x1 + 0.44 x2 < 0.02)
AND (0.96 x1 + 1.77 x2 < 0.45)
AND (5.43 x1 + 13.33 x2 < 6.03)
THEN x ∈ ω1
After pruning the redundant conditions:
IF (0.04 x1 + 0.16 x2 < 0.11)
AND (5.43 x1 + 13.33 x2 < 6.03)
THEN x ∈ ω1
53
Basic Concepts for String Recognition
Strings
– "AGCTTCGAATC"
Characters and alphabet
– { A, G, C, T }
Words
– x = "AGCTTC"
Texts
Factor
– "GCT" in "AGCTTC"
54
String Problems of Importance for Pattern Recognition
String matching
– e.g., classify a book by keywords
Edit distance
String matching with errors
– e.g., classify texts with misspelled words
String matching with the "don't-care" symbol
– e.g., classify a DNA sequence according to whether it contains a protein, with some DNA segments inert or having no function
55
General String Matching Problem
56
Naïve String Matching
initialize A, x, text, n ← length[text], m ← length[x]
s ← 0
while s ≤ n − m
  if x[1…m] = text[s + 1 … s + m]
    then print "pattern occurs at shift" s
  s ← s + 1
return
end
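The same algorithm as runnable Python (0-indexed, where the pseudocode is 1-indexed):

```python
def naive_string_match(text, x):
    n, m = len(text), len(x)
    shifts = []
    for s in range(n - m + 1):
        if text[s:s + m] == x:     # compare the pattern to the window at shift s
            shifts.append(s)
    return shifts

print(naive_string_match("AGCTTCAGATTCCA", "TTC"))  # [3, 9]
```

The worst case is O(nm) character comparisons, which is what Boyer-Moore improves on.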
57
Boyer-Moore String Matching
initialize A, x, text, n ← length[text], m ← length[x]
F(x) ← last-occurrence function
G(x) ← good-suffix function
s ← 0
while s ≤ n − m
  do j ← m
    while j > 0 and x[j] = text[s + j]
      do j ← j − 1
    if j = 0
      then print "pattern occurs at shift" s
        s ← s + G(0)
      else s ← s + max[ G(j), j − F(text[s + j]) ]
return
end
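A runnable sketch using only the last-occurrence function F (the bad-character heuristic); the good-suffix function G is omitted here for brevity, so on a full match or a backward skip we conservatively advance by one:

```python
def boyer_moore(text, x):
    n, m = len(text), len(x)
    last = {c: i for i, c in enumerate(x)}        # last-occurrence function F
    shifts, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and x[j] == text[s + j]:     # compare right to left
            j -= 1
        if j < 0:
            shifts.append(s)
            s += 1                                # G(0) would give a larger skip
        else:
            s += max(1, j - last.get(text[s + j], -1))
    return shifts

print(boyer_moore("when_chris_beats_the_drum", "beat"))  # [11]
```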
58
Boyer-Moore String Matching
59
Subset-Superset Problem
Search for several strings
– Some short strings are factors of long ones
Example
– Find "beat", "eat", "be"
– In "when_chris_beats_the_drum"
Return only the longest target strings
60
Edit Distance
Minimum number of fundamental operations required to transform string x into string y
Fundamental operations
– Substitutions
– Insertions
– Deletions
61
Edit Distance Computation
initialize A, x, y, m ← length[x], n ← length[y]
C[0, 0] ← 0
i ← 0
do i ← i + 1
  C[i, 0] ← i
until i = m
j ← 0
do j ← j + 1
  C[0, j] ← j
until j = n
62
Edit Distance Computation
i ← 0; j ← 0
do i ← i + 1
  do j ← j + 1
    C[i, j] ← min[ C[i − 1, j] + 1, C[i, j − 1] + 1, C[i − 1, j − 1] + 1 − δ(x[i], y[j]) ]
  until j = n
until i = m
return C[m, n]
end
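The dynamic program in Python; the term 1 − δ(x[i], y[j]) becomes a 0/1 substitution cost:

```python
def edit_distance(x, y):
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = i                                   # i deletions
    for j in range(1, n + 1):
        C[0][j] = j                                   # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            C[i][j] = min(C[i - 1][j] + 1,            # deletion
                          C[i][j - 1] + 1,            # insertion
                          C[i - 1][j - 1] + (x[i - 1] != y[j - 1]))  # substitution
    return C[m][n]

print(edit_distance("excused", "exhausted"))  # 3: insert h, substitute c->a, insert t
```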
63
Edit Distance Computation
64
String Matching with Errors
Given a pattern x and text
Find the shift for which the edit distance between x and a factor of text is minimum
Apply an algorithm similar to the one that computes the edit distance between two strings, but use a matrix of costs E
Matrix of costs E
– E[i, j] = min[ C(x[1…i], y[1…j]) ]
– E[0, j] = 0
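A sketch of the E-matrix variant (the test strings are my own): the recurrence is identical to the edit-distance program, but E[0, j] = 0 lets a match begin at any shift, and the minimum of the last row gives the best match against any factor of the text:

```python
def min_edit_match(x, text):
    m, n = len(x), len(text)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        E[i][0] = i                 # first column as before; first row stays 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = min(E[i - 1][j] + 1,
                          E[i][j - 1] + 1,
                          E[i - 1][j - 1] + (x[i - 1] != text[j - 1]))
    best_j = min(range(n + 1), key=lambda j: E[m][j])
    return E[m][best_j], best_j     # best distance and where that factor ends

print(min_edit_match("TTC", "AGCTACAGATTGCA"))  # (1, 6): "TAC" matches with one error
```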
65
String Matching with Errors
66
String Matching with "Don't-Care" Symbols
The naïve algorithm can be modified to handle the don't-care symbol
Extending the Boyer-Moore algorithm is difficult and inefficient
See the Bibliography for better methods
67
Grammars
Structures of hierarchical rules to generate strings
Examples
– "The history book clearly describes several wars"
– Telephone numbers
Provide constraints for a full system that uses a statistical recognizer as a component
– e.g., a mathematical equation recognizer
– ASR for phone numbers
68
Grammars
Symbols
– Alphabet A
– Null string ε
Variables I
Root symbol
– Taken from a set S
Production rules P
Grammar G = {A, I, S, P}
Language L(G)
69
Example Language 1
A = { a, b, c }
I = { A, B }
S = S
P = {
  p1: S → aSBA OR aBA
  p2: AB → BA
  p3: bB → bb
  p4: bA → bc
  p5: cA → cc
  p6: aB → ab
}
70
Example Language 1
Derivation of "abc":
root: S
p1: aBA
p6: abA
p4: abc
Derivation of "aabbcc":
root: S
p1: aSBA
p1: aaBABA
p6: aabABA
p2: aabBAA
p3: aabbAA
p4: aabbcA
p5: aabbcc
L(G) = { aⁿbⁿcⁿ | n ≥ 1 }
71
Example Language 2
A = { the, history, book, sold, over, 1000, copies }
I = { ⟨noun⟩, ⟨verb⟩, ⟨noun phrase⟩, ⟨verb phrase⟩, ⟨adjective⟩, ⟨adverb⟩, ⟨adverbial phrase⟩ }
S = ⟨sentence⟩
72
Example Language 2
P = {
  ⟨sentence⟩ → ⟨noun phrase⟩⟨verb phrase⟩
  ⟨noun phrase⟩ → ⟨adjective⟩⟨noun phrase⟩
  ⟨verb phrase⟩ → ⟨verb phrase⟩⟨adverbial phrase⟩
  ⟨noun⟩ → book OR theorem OR …
  ⟨verb⟩ → describes OR buys OR holds OR …
  ⟨adverb⟩ → over OR …
}
The grammar can also generate nonsense such as "Squishy green dreams hop heuristically"
73
Derivation Tree
74
Types of String Grammars
Type 0: Free or unrestricted
Type 1: Context-sensitive
– αIβ → αxβ
Type 2: Context-free
– I → x
Type 3: Finite state or regular
– α → zβ OR z
Grammars of type i include all grammars of type i + 1
75
Chomsky Normal Form (CNF)
CNF
– A → BC and A → z
For every context-free grammar G, there is another G′ in CNF such that L(G) = L(G′)
76
Example 3
To pronounce any number between 1 and 999,999
A = { one, two, …, ten, eleven, …, twenty, thirty, …, ninety, hundred, thousand }
I = { digits6, digits3, digits2, digit, teens, tys }
S = digits6
77
Example 3
P = {
  digits6 → digits3 thousand digits3
  digits6 → digits3 thousand OR digits3
  digits3 → digit hundred digits2
  digits3 → digit hundred OR digits2
  digits2 → teens OR tys OR tys digit OR digit
  digit → one OR two OR … OR nine
  teens → ten OR eleven OR … OR nineteen
  tys → twenty OR thirty OR … OR ninety
}
*Type 2 grammar
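A sketch that mirrors the grammar's structure (the word lists and function names follow the variables; filling in tys entries beyond the slide's examples is my addition):

```python
ONES  = "one two three four five six seven eight nine".split()
TEENS = ("ten eleven twelve thirteen fourteen fifteen "
         "sixteen seventeen eighteen nineteen").split()
TYS   = "twenty thirty forty fifty sixty seventy eighty ninety".split()

def digits2(n):          # 1..99: teens OR tys OR tys digit OR digit
    if n < 10:
        return ONES[n - 1]
    if n < 20:
        return TEENS[n - 10]
    t, d = divmod(n, 10)
    return TYS[t - 2] + (" " + ONES[d - 1] if d else "")

def digits3(n):          # 1..999: digit hundred digits2 OR digit hundred OR digits2
    h, rest = divmod(n, 100)
    words = [ONES[h - 1], "hundred"] if h else []
    if rest:
        words.append(digits2(rest))
    return " ".join(words)

def digits6(n):          # 1..999,999: digits3 thousand digits3 OR ...
    t, rest = divmod(n, 1000)
    words = [digits3(t), "thousand"] if t else []
    if rest:
        words.append(digits3(rest))
    return " ".join(words)

print(digits6(639_014))  # six hundred thirty nine thousand fourteen
```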
78
Example 3
79
Recognition Using Grammars
Classify a test sentence x according to the grammar that generated it
The grammar is among c different ones: G1, G2, …, Gc
Parsing
– Find a derivation in G that leads to x
80
Bottom-Up Parsing Algorithm
initialize G = {A, I, S, P} in CNF, x = x1 x2 … xn
i ← 0
do i ← i + 1
  V_i1 ← { A | A → x_i }
until i = n
j ← 1
do j ← j + 1
  i ← 0
  do i ← i + 1
81
Bottom-Up Parsing Algorithm
      V_ij ← ∅
      k ← 0
      do k ← k + 1
        V_ij ← V_ij ∪ { A | A → BC, B ∈ V_ik and C ∈ V_{i+k, j−k} }
      until k = j − 1
    until i = n − j + 1
  until j = n
if S ∈ V_1n then print "parse of" x "successful in" G
return
end
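A compact Python rendering of this parser (the rule encoding as tuples is my choice); V[i][j] collects the variables that derive the length-j substring starting at position i:

```python
def cyk_parse(x, terminal_rules, binary_rules, root="S"):
    """terminal_rules: set of (A, a) for A -> a;
    binary_rules: set of (A, B, C) for A -> BC; grammar must be in CNF."""
    n = len(x)
    V = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i in range(n):                              # V_i1 = { A | A -> x_i }
        V[i][1] = {A for (A, a) in terminal_rules if a == x[i]}
    for j in range(2, n + 1):                       # substring length
        for i in range(n - j + 1):                  # start position
            for k in range(1, j):                   # split point
                for (A, B, C) in binary_rules:
                    if B in V[i][k] and C in V[i + k][j - k]:
                        V[i][j].add(A)
    return root in V[0][n]
```

The triple loop over length, start, and split point gives the familiar cubic behavior in the string length.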
82
Example
A = { a, b }
I = { A, B, C }
S = S
P = {
  p1: S → AB OR BC
  p2: A → BA OR a
  p3: B → CC OR b
  p4: C → AB OR a
}
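Feeding this grammar to the cyk_parse sketch above (the test string "baaba" is my choice, a classic CYK illustration):

```python
terminal_rules = {("A", "a"), ("B", "b"), ("C", "a")}
binary_rules = {("S", "A", "B"), ("S", "B", "C"),
                ("A", "B", "A"), ("B", "C", "C"), ("C", "A", "B")}
print(cyk_parse("baaba", terminal_rules, binary_rules))  # True: S derives "baaba"
```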
83
Example
84
Example
85
Example
86
Finite State Machine
[Figure: a finite state machine implementing a type 3 grammar G, with productions of the form S → the A, A → mouse B OR cow B, …]
87
Grammatical Inference
Learn a grammar from example sentences it generates
Differences from statistical methods
– Usually an infinite number of grammars are consistent with the training data
– Often cannot recover the source grammar uniquely
88
Main Techniques
Use both positive and negative instances, D+ and D−, respectively
– Generalization for multicategory cases
Impose conditions and constraints
– The alphabet contains only those symbols appearing in training sentences
– Every production rule is used
– Seek the "simplest" grammar that explains the training sentences
89
Grammatical Inference (Overview)
initialize D+, D−, n ← |D+|, S
A ← set of characters in D+
initialize G (as simple as possible)
i ← 0
do i ← i + 1
  read x_i+ from D+
  if x_i+ cannot be parsed by G
90
Grammatical Inference (Overview)
    then do propose additional productions to P and variables to I
      accept updates if G parses x_i+ but no sentences in D−
until i = n
eliminate redundant productions
return G = {A, I, S, P}
end
91
Example 4: Grammatical Inference
D+ = { a, aaa, aaab, aab }
D− = { ab, abc, abb, aabb }
A = { a, b }
I = { A }
S = S
P = { S → A }
G0 = { A, I, S, P }
92
Example 4: Grammatical Inference
i | x_i+  | P                          | P produces a sentence in D−?
1 | a     | S → A, A → a               | No
2 | aaa   | S → A, A → a, A → aA       | No
3 | aaab  | S → A, A → a, A → aA, A → ab | Yes: ab is in D−
93
Example 4: Grammatical Inference
i | x_i+  | P                            | P produces a sentence in D−?
3 | aaab  | S → A, A → a, A → aA, A → aab | No
4 | aab   | S → A, A → a, A → aA, A → aab | No
94
Rule-Based Methods
For classes characterized by general relationships among entities, rather than by instances per se
Integral to expert systems
Modest use in pattern recognition
Focus on IF-THEN rules for representing and learning such relationships
95
Example: Arch
96
IF-THEN Rules
Example
– IF Swims(x) AND HasScales(x) THEN Fish(x)
Predicates
– e.g., Man(.), HasTeeth(.), AreMarried(.,.), Touch(.,.), Supports(.,.,.)
Variables
– e.g., IF Eats(x, y) AND HasFlesh(x) THEN Carnivore(y)
97
Propositional Logic and First-Order Logic
Propositional logic
– e.g., IF Male(Bill) AND IsMarried(Bill) THEN IsHusband(Bill)
First-order logic examples
– IF Male(x) AND IsMarried(x, y) THEN IsHusband(x)
– IF Parent(x, y) AND Parent(y, z) THEN GrandParent(x, z)
– IF Spouse(x, y) THEN Spouse(y, x)
98
Functions and Terms
Example
– IF Male(x) AND (Age(x) < 16) THEN Boy(x)
99
Applications in Pattern Classification
Example
– IF IsBlock(x) AND IsBlock(y) AND IsBlock(z) AND Touch(x, y) AND Touch(x, z) AND NotTouch(y, z) AND Supports(x, y, z) THEN Arch(x, y, z)
Algorithms to implement the predicates or functions may be difficult
100
Learning Rules
Train a decision tree and then simplify the tree to extract rules
Infer rules via grammatical inference
Sequential covering learning
– Given sets of positive and negative examples
– Deletes examples that the rules explain, and iterates
– Leads to a disjunctive set of rules that cover the training data
– Simplifies the results by standard logical methods
101
Learning Rules