Grammar Induction: Techniques and Theory - UNIV - NANTES
Slide 1
Colin de la Higuera, Université de Saint-Étienne
Tim Oates, University of Maryland
Grammar Induction: Techniques and Theory
Slide 2
Acknowledgements
• Laurent Miclet, Tim Oates, Jose Oncina, Rafael Carrasco, Paco Casacuberta, Pedro Cruz, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jean-Christophe Janodet, Thierry Murgue, Frédéric Tantini, Franck Thollard, Enrique Vidal, ...
• … and a lot of other people to whom we are grateful
Slide 3
Outline
1 An introductory example
2 About grammatical inference
3 Some specificities of the task
4 Some techniques and algorithms
5 Open issues and questions
Slide 4
1 How do we learn languages?
A very simple example
Carmel and Markovitch 98 & 99, http://www.cs.technion.ac.il/~carmel/papers.html
Slide 5
The problem:
• An agent must take cooperative decisions in a multi-agent world.
• Its decisions will depend:
– on the actions of the other agents;
– on what it hopes to win or lose.
Slide 6
Hypothesis: the opponent follows a rational strategy, given by a DFA/Moore machine.
[figure: a Moore machine mapping my symbols e (equations) and p (pictures) to your actions l (listen) and d (doze); e.g., eepep → l, eee → d]
You: listen or doze.
Me: equations or pictures.
Slide 7
Example: (the prisoner’s dilemma)
• Each prisoner can admit (a) or stay silent (s)
• If both admit: 3 years each;
• If A admits but not B, A=0 years, B=5 years;
• If B admits but not A, B=0 years, A=5 years;
• If neither admits: 1 year each.
Slide 8
Payoffs (A, B), in years lost:
            B: a        B: s
A: a    (−3, −3)    (0, −5)
A: s    (−5, 0)     (−1, −1)
Slide 9
• Here, an iterated version is played against an opponent who follows a rational strategy.
• Gain function: limit of means.
• A game is a word in (His_moves × My_moves)*!
Slide 10
The general problem
• We suppose that the strategy of the opponent is given by a deterministic finite automaton.
• Can we imagine an optimal strategy?
Slide 11
Suppose we know the opponent's strategy. Then (game theory):
• Consider the opponent's graph, in which we weight the edges by our own gain.
Slide 12
[figure: the opponent's automaton with edges weighted by our gains −3, 0, −5, −1]
Slide 13
Then:
1 Find the cycle of maximum mean weight.
2 Find the best path leading to this cycle.
3 Follow the path and stay in the cycle.
All that is needed is to find the opponent's automaton! (A sketch of step 1 follows.)
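Step 1 can be made concrete. Below is a minimal Python sketch (mine, not the tutorial's) of Karp's algorithm for the maximum mean cycle; the dict-based graph encoding and the toy two-state opponent are illustrative assumptions.

def max_mean_cycle(graph, source):
    # graph[u] = [(v, weight), ...]; every state is assumed reachable from source
    states = list(graph)
    n = len(states)
    NEG = float("-inf")
    # d[k][v] = maximum weight of a walk with exactly k edges from source to v
    d = [{v: NEG for v in states} for _ in range(n + 1)]
    d[0][source] = 0.0
    for k in range(1, n + 1):
        for u in states:
            if d[k - 1][u] == NEG:
                continue
            for v, w in graph[u]:
                d[k][v] = max(d[k][v], d[k - 1][u] + w)
    # Karp's formula: mu* = max_v min_{0 <= k < n} (d_n(v) - d_k(v)) / (n - k)
    best = NEG
    for v in states:
        if d[n][v] == NEG:
            continue
        cands = [(d[n][v] - d[k][v]) / (n - k) for k in range(n) if d[k][v] > NEG]
        if cands:
            best = max(best, min(cands))
    return best

# hypothetical two-state opponent graph, edges weighted by our gain
g = {0: [(0, -3.0), (1, 0.0)], 1: [(0, -5.0), (1, -1.0)]}
print(max_mean_cycle(g, 0))   # -1.0: the best cycle here is the self-loop of mean -1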
Slide 14
[figure: the weighted automaton with the best path into the cycle of maximum mean weight highlighted; Mean = −0.5]
Slide 15
Question
• Having seen a game of this opponent…
• Can we reconstruct his strategy?
Slide 16
Data (him, me): {aa as sa aa as ss ss ss sa}
HIM: a a s a a s s s s
ME:  a s a a s s s a
Example: I have played asa; his move is a.
Slide 17
Observations (prefix of my moves → his next move):
λ → a
a → a
as → s
asa → a
asaa → a
asaas → s
asaass → s
asaasss → s
asaasssa → s
[initial machine: one state with output a]
Slides 18–31
[The same observation list is repeated on each slide while the opponent's Moore machine is built incrementally: starting from a single state with output a, states and transitions labeled a or s are added or merged so that every observed prefix of my moves reaches a state whose output is his recorded move. Slide 31 shows the final machine. A sketch of the prefix-tree construction follows.]
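What slides 18–31 do by hand can be bootstrapped mechanically. The sketch below (my own helper, not the tutorial's code) builds the prefix-tree Moore machine that is exactly consistent with the observations; merging compatible states, as on the slides, would then generalize it.

def prefix_tree_moore(observations):
    # observations: my-move prefix -> his recorded reply ('' is lambda)
    states = sorted(observations, key=len)
    delta = {(p[:-1], p[-1]): p for p in states if p}   # transition on my last move
    output = dict(observations)                          # Moore output of each state
    return states, delta, output

obs = {'': 'a', 'a': 'a', 'as': 's', 'asa': 'a', 'asaa': 'a',
       'asaas': 's', 'asaass': 's', 'asaasss': 's', 'asaasssa': 's'}
states, delta, output = prefix_tree_moore(obs)
print(output['asa'])        # 'a' -- after I play asa, his move is a
print(delta[('as', 'a')])   # 'asa'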
Slide 32
How do we get hold of the learning data?
a) through observation
b) through exploration
Slide 33
An open problem: the strategy is probabilistic.
[figure: a probabilistic Moore machine with transitions labeled a: 70% / s: 30%, a: 50% / s: 50%, a: 20% / s: 80%]
Slide 34
Tit for Tat
[figure: the two-state Tit-for-Tat machine: repeat whatever the opponent played last]
Slide 35
2 Specificities of grammatical inference
Grammatical inference consists (roughly) of finding the (or a) grammar or automaton that has produced a given set of strings (or sequences, trees, terms, graphs).
Slide 36
The goal/idea
• The ancient Greeks: a whole is more than the sum of its parts.
• Gestalt theory: a whole is different from the sum of its parts.
Slide 37
Better said
• There are cases where the data cannot be analyzed by considering it in bits
• There are cases where intelligibility of the pattern is important
Slide 38
What do people know about formal language theory?
[scale from "nothing" to "lots"]
Slide 39
A small reminder on formal language theory
• the Chomsky hierarchy
• pros and cons of grammars
Slide 40
A crash course in Formal language theory
• Symbols
• Strings
• Languages
• Chomsky hierarchy
• Stochastic languages
Slide 41
Symbols are taken from some alphabet Σ.
Strings are sequences of symbols from Σ.
Slide 42
Languages are sets of strings over Σ:
languages are subsets of Σ*.
Slide 43
Special languages
• Are recognised by finite state automata
• Are generated by grammars
Slide 44
[figure: a DFA over {a, b}]
DFA: Deterministic Finite State Automaton
Slide 45
[the same DFA, accepting the string abab]
abab ∈ L
(A one-scan membership sketch follows.)
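As an aside (not on the slides), acceptance by a DFA is a single left-to-right scan. A minimal sketch, with a hypothetical two-state DFA accepting the even-length strings standing in for the automaton drawn above:

def dfa_accepts(delta, start, finals, w):
    q = start
    for ch in w:
        q = delta.get((q, ch))
        if q is None:             # missing transition: reject
            return False
    return q in finals

delta = {(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 0}
print(dfa_accepts(delta, 0, {0}, "abab"))   # True, in time linear in |w|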
Slide 46
What is a context-free grammar?
A 4-tuple (Σ, S, V, P) such that:
– Σ is the alphabet;
– V is a finite set of non-terminals;
– S ∈ V is the start symbol;
– P ⊆ V × (V ∪ Σ)* is a finite set of rules.
Slide 47
Example of a grammar: the Dyck1 grammar
– (Σ, S, V, P)
– Σ = {a, b}
– V = {S}
– P = {S → aSbS, S → λ}
(A membership-test aside follows.)
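A small aside (mine, not on the slides): for this particular grammar, membership can be tested with a single counter, since L(S → aSbS | λ) is exactly the set of well-balanced strings with a as the opening and b as the closing symbol.

def in_dyck1(w):
    depth = 0
    for ch in w:
        depth += 1 if ch == 'a' else -1
        if depth < 0:             # a closing b with no matching a
            return False
    return depth == 0

print(in_dyck1("aabb"))   # True  (derived on the next slide)
print(in_dyck1("abba"))   # False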
Slide 48
Derivations and derivation trees
S → aSbS → aaSbSbS → aabSbS → aabbS → aabb
[derivation tree for aabb]
Slide 49
Chomsky Hierarchy
• Level 0: no restriction
• Level 1: context-sensitive
• Level 2: context-free
• Level 3: regular
Slide 50
Chomsky hierarchy
• Level 0: whatever Turing machines can do
• Level 1: e.g. {a^n b^n c^n : n ∈ ℕ}, {a^n b^m c^n d^m : n, m ∈ ℕ}, {uu : u ∈ Σ*}
• Level 2: context-free, e.g. {a^n b^n : n ∈ ℕ}, bracket languages
• Level 3: regular, e.g. regular expressions (GREP)
Slide 51
The membership problem
• Level 0: undecidable
• Level 1: decidable
• Level 2: polynomial
• Level 3: linear
Slide 52
The equivalence problem
• Level 0: undecidable
• Level 1: undecidable
• Level 2: undecidable
• Level 3: polynomial, but only when the representation is a DFA
Slide 53
[figure: a probabilistic automaton over {a, b} with fractional transition, initial and final probabilities]
PFA: Probabilistic Finite (state) Automaton
Slide 54
[figure: a deterministic probabilistic automaton over {a, b} with probabilities 0.1, 0.3, 0.35, 0.65, 0.7, 0.9]
DPFA: Deterministic Probabilistic Finite (state) Automaton
(A probability-computation sketch follows.)
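A quick sketch of what determinism buys (the automaton and its numbers are hypothetical, not the one drawn above): a DPFA assigns to a string the product of the transition probabilities along its unique path, times the final probability of the state reached.

def dpfa_prob(delta, final, start, w):
    # delta: (state, symbol) -> (next state, probability); final: state -> halting probability
    q, p = start, 1.0
    for ch in w:
        if (q, ch) not in delta:
            return 0.0
        q, t = delta[(q, ch)]
        p *= t
    return p * final[q]

delta = {(0, 'a'): (1, 0.7), (0, 'b'): (0, 0.2), (1, 'a'): (1, 0.5), (1, 'b'): (0, 0.3)}
final = {0: 0.1, 1: 0.2}   # outgoing plus halting probabilities sum to 1 in each state
print(round(dpfa_prob(delta, final, 0, "ab"), 6))   # 0.021 = 0.7 * 0.3 * 0.1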
Slide 55
What is nice with grammars?
• Compact representation
• Recursivity
• Says how a string belongs, not just if it belongs
• Graphical representations (automata, parse trees)
Slide 56
What is not so nice with grammars?
• Even the easiest class (level 3) contains SAT, Boolean functions, parity functions…
• Noise is very harmful:
– think of adding edit noise to the language {w : |w|_a ≡ 0 (mod 2) ∧ |w|_b ≡ 0 (mod 2)}
Slide 57
The field
[diagram: grammatical inference at the crossroads of inductive inference, pattern recognition and machine learning, with links to computational linguistics, computational biology and web technologies]
Slide 58
The data
• Strings, trees, terms, graphs
• Structural objects
• Basically the same gap of information as in programming between tables/arrays and data structures
Slide 59
Alternatives to grammatical inference
Two steps:
– extract features from the strings;
– use a very good method over R^n.
Slide 60
Examples of strings
A string in Gaelic and its translation to English:
• Tha thu cho duaichnidh ri èarràirde de a’ coisich deas damh
• You are as ugly as the north end of a southward traveling ox
Slides 61–62
[figures]
Slide 63
>A BAC=41M14 LIBRARY=CITB_978_SKBAAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTAGTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAATGGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCAGAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGACACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCTGGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGGGCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCAGGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAACAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACACACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGATGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAATGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATAAAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCATCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAGTATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTACTAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGAGGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGATGGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAAAAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCCTGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGACTAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCATGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT
Slide 64
[content lost in transcription]
Slides 65–66
[figures]
Slide 67
<book><part><chapter>
<sect1/><sect1><orderedlist numeration="arabic"><listitem/><f:fragbody/>
</orderedlist></sect1>
</chapter></part>
</book>
Slide 68
<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
<!ELEMENT pagina (titulus?, poema)>
<!ELEMENT titulus (#PCDATA)>
<!ELEMENT auctor (praenomen, cognomen, nomen)>
<!ELEMENT praenomen (#PCDATA)>
<!ELEMENT nomen (#PCDATA)>
<!ELEMENT cognomen (#PCDATA)>
<!ELEMENT poema (versus+)>
<!ELEMENT versus (#PCDATA)>
]>
<pagina>
<titulus>Catullus II</titulus>
<auctor><praenomen>Gaius</praenomen><nomen>Valerius</nomen><cognomen>Catullus</cognomen></auctor>
Slide 69
[figure]
Slide 70
A logic program learned by GIFT:
color_blind(Arg1) :-
    start(Arg1, X), p11(Arg1, X).
start(X, X).
p11(Arg1, P) :-
    mother(M, P), p4(Arg1, M).
p4(Arg1, X) :-
    woman(X), father(F, X), p3(Arg1, F).
p4(Arg1, X) :-
    woman(X), mother(M, X), p4(Arg1, M).
p3(Arg1, X) :-
    man(X), color_blind(X).
Slide 71
3 Hardness of the task
• One thing is to build algorithms; another is to be able to state that they work.
• Some questions:
– Does this algorithm work?
– Do I have enough learning data?
– Do I need some extra bias?
– Is this algorithm better than the other?
– Is this problem easier than the other?
Slide 72
Alternatives to answer these questions:
– Use benchmarks
– Solve a real problem
– Prove things
Slide 73
Theory
• Because you may want to be able to say something more than "seems to work in practice".
Slide 74
Convergence
• Does my algorithm converge, in some sense, to a best solution?
• To be able to answer, we have to admit the existence of a best solution.
Slide 75
Issues
• Getting close to the best:
– metrics;
– distributions over strings.
• PAC-related models and similar: very negative results.
Slide 76
Identification in the limit
• L: a class of languages; G: a class of grammars; L: the naming function (L(G) is the language denoted by G).
• Pres ⊆ ℕ → X: the admissible presentations; yields(f) is the language presented by f, with f(ℕ) = g(ℕ) ⇒ yields(f) = yields(g).
• φ: a learner; learning succeeds when L(φ(f)) = yields(f).
Slide 77
L is identifiable in the limit in terms of G from Pres iff ∀L ∈ L, ∀f ∈ Pres(L):
from the successive values f(1), f(2), …, f(n) the learner produces hypotheses h(1), h(2), …, h(n), and there is a point n beyond which ∀i ≥ n: h(i) ≡ h(n) and L(h(n)) = L.
Slide 78
"He did not want to compose another Quixote (which would be easy) but the Quixote. Needless to add, he never contemplated a mechanical transcription of the original; he did not intend to copy it. His admirable ambition was to produce pages that would coincide, word for word and line for line, with those of Miguel de Cervantes.
[…]
"My undertaking is not essentially difficult," I read elsewhere in the letter. "It would be enough to be immortal to carry it out."
Jorge Luis Borges (1899–1986), Pierre Menard, autor del Quijote (El jardín de senderos que se bifurcan), Ficciones
Slide 79
4 Algorithmic ideas
Slide 80
The space of GI problems
• Type of input (strings)
• Presentation of input (batch)
• Hypothesis space (subset of the regular grammars)
• Success criteria (identification in the limit)
Slide 81
Types of input
• Strings: the cat hates the dog
• Structural examples: [parse tree over "the cat hates the dog"]
• Graphs: [a positive (+) and a negative (−) example graph]
Slide 82
Types of input - oracles
• Membership queries: is string s in the target language?
• Equivalence queries: is my hypothesis correct? If not, provide a counter-example.
• Subset queries: is the language of my hypothesis a subset of the target language?
Slide 83
Presentation of input
• Arbitrary order
• Shortest to longest
• All positive and negative examples up to some length
• Sampled according to some probability distribution
Slide 84
Presentation of input
• Text presentation
– A presentation of all strings in the target language
• Complete presentation (informant)
– A presentation of all strings over the alphabet of the target language labeled as + or -
Slide 85
Hypothesis space
• Regular grammars
– A welter of subclasses
• Context free grammars
– Fewer subclasses
• Hyper-edge replacement graph grammars
Slide 86
Success criteria
• Identification in the limit
– Text or informant presentation
– After each example, learner guesses language
– At some point, guess is correct and never changes
• PAC learning
Slide 87
Theorems due to Gold
• The good news: any recursively enumerable class of languages can be learned in the limit from an informant (Gold, 1967).
• The bad news: a language class is superfinite if it includes all finite languages and at least one infinite language. No superfinite class of languages can be learned in the limit from text (Gold, 1967). That includes the regular and the context-free languages.
Slide 88
A picture
[chart spanning "poor languages" to "rich languages" and "little information" to "a lot of information"; plotted on it: sub-classes of the regular languages from positive data; DFA from positive and negative data; DFA from queries; context-free languages from positive data; mildly context-sensitive languages from queries]
Slide 89
Algorithms
• RPNI
• k-Reversible
• GRIDS
• SEQUITUR
• L*
Slide 90
4.1 RPNI
• Regular Positive and Negative Grammatical Inference
Identifying regular languages in polynomial time
Jose Oncina & Pedro García 1992
Slide 91
• It is a state-merging algorithm;
• it identifies any regular language in the limit;
• it works in polynomial time;
• it admits polynomial characteristic sets.
Slide 92
The algorithm

function rmerge(A, p, q)
    A = merge(A, p, q)
    while ∃a ∈ Σ, ∃r: p, q ∈ δ_A(r, a), p ≠ q do
        A = rmerge(A, p, q)
Slide 93
A = PTA(X+); Fr = {δ(q0, a) : a ∈ Σ}; K = {q0}
while Fr ≠ ∅ do
    choose q from Fr
    if ∃p ∈ K : L(rmerge(A, p, q)) ∩ X− = ∅
        then A = rmerge(A, p, q)
        else K = K ∪ {q}
    Fr = {δ(q, a) : q ∈ K, a ∈ Σ} − K
(A runnable sketch follows.)
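The following runnable sketch implements the idea above (the union-find encoding, helper names and promotion order are my own simplifications of RPNI):

def build_pta(positive):
    states = sorted({w[:i] for w in positive for i in range(len(w) + 1)},
                    key=lambda s: (len(s), s))                 # length-lex order
    delta = {(p[:-1], p[-1]): p for p in states if p}
    return states, delta, set(positive)

def rpni(positive, negative):
    states, delta, finals = build_pta(positive)
    part = {q: q for q in states}                              # union-find: state -> parent

    def find(q, p):
        while p[q] != q:
            q = p[q]
        return q

    def try_merge(p0, q0):
        # merge the blocks of p0 and q0, fold to restore determinism, and
        # return the new partition -- or None if a negative string gets accepted
        trial = dict(part)
        stack = [(p0, q0)]
        while stack:
            x, y = stack.pop()
            x, y = find(x, trial), find(y, trial)
            if x == y:
                continue
            if (len(y), y) < (len(x), x):                      # shorter prefix is representative
                x, y = y, x
            trial[y] = x
            succ = {}
            for (s, a), t in delta.items():                    # fold nondeterministic successors
                key, ft = (find(s, trial), a), find(t, trial)
                if key in succ and succ[key] != ft:
                    stack.append((succ[key], ft))
                else:
                    succ[key] = ft
        succ = {(find(s, trial), a): find(t, trial) for (s, a), t in delta.items()}
        fin = {find(s, trial) for s in finals}
        for w in negative:                                     # compatibility with X-
            cur = find("", trial)
            for a in w:
                cur = succ.get((cur, a))
                if cur is None:
                    break
            if cur in fin:
                return None
        return trial

    red = [states[0]]                                          # the initial state (lambda)
    for q in states[1:]:
        if find(q, part) != q:
            continue                                           # already merged away
        merged = None
        for p in red:
            merged = try_merge(p, q)
            if merged is not None:
                part = merged
                break
        if merged is None:
            red.append(q)                                      # promote: no safe merge

    succ = {(find(s, part), a): find(t, part) for (s, a), t in delta.items()}
    return ({find(q, part) for q in states}, succ,
            find("", part), {find(s, part) for s in finals})

# the sample of slide 94 ('' stands for lambda)
Xp = ["", "aaa", "aaba", "ababa", "bb", "bbaaa"]
Xn = ["aa", "ab", "aaaa", "ba"]
Q, succ, q0, F = rpni(Xp, Xn)
print(len(Q))   # 3, matching the automaton of slide 116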
Slide 94
X+ = {λ, aaa, aaba, ababa, bb, bbaaa}, X− = {aa, ab, aaaa, ba}
[figure: the prefix tree acceptor PTA(X+), states numbered 1 to 15]
Slides 95–116 run RPNI on this sample, one merge attempt per slide (X− = {aa, ab, aaaa, ba} throughout; each slide shows the current automaton):
• Slide 95: try to merge 2 and 1.
• Slide 96: more merging is needed for determinization.
• Slide 97: but now the string aaaa is accepted, so the merge must be rejected.
• Slide 98: try to merge 3 and 1.
• Slide 99: this requires merging 6 with {1,3}.
• Slide 100: and then merging 2 with 10.
• Slide 101: and then merging 4 with 13.
• Slide 102: and finally merging 7 with 15.
• Slide 103: no counter-example is accepted, so the merges are kept.
• Slide 104: the next merge to be checked is {4,13} with {1,3,6}.
• Slide 105: more merging for determinization is needed.
• Slide 106: but now aa is accepted.
• Slide 107: so we try {4,13} with {2,10}.
• Slide 108: after determinizing, the negative string aa is again accepted.
• Slide 109: so we try 5 with {1,3,6}.
• Slide 110: but again we accept ab.
• Slide 111: so we try 5 with {2,10}.
• Slide 112: which is OK; the next possible merge is {7,15} with {1,3,6}.
• Slide 113: which is OK; now try merging {8,12} with {1,3,6,7,15}.
• Slide 114: and ab is accepted.
• Slide 115: so try merging {8,12} with {4,9,13}.
• Slide 116: this is OK, and no more merges are possible, so the algorithm halts with the three states {1,3,6,7,11,14,15}, {2,5,10} and {4,8,9,12,13}.
Slide 117
Definitions
• Let ≤ be the length-lexicographic ordering over Σ*.
• Let Pref(L) be the set of all prefixes of strings of a language L.
Slide 118
Short prefixes
Sp(L) = {u ∈ Pref(L) : ∀v, δ(q0, v) = δ(q0, u) ⇒ u ≤ v}
• There is one short prefix per useful state.
[figure: a DFA with Sp(L) = {λ, a}]
Slide 119
Kernel sets
N(L) = {ua ∈ Pref(L) : u ∈ Sp(L), a ∈ Σ} ∪ {λ}
• There is an element of the kernel set for each useful transition.
[figure: the same DFA with N(L) = {λ, a, b, ab}]
Slide 120
A characteristic sample
A sample is characteristic (for RPNI) if:
– ∀x ∈ Sp(L), ∃u : xu ∈ X+
– ∀x ∈ Sp(L), ∀y ∈ N(L): δ(q0, x) ≠ δ(q0, y) ⇒ ∃z ∈ Σ* : (xz ∈ X+ ∧ yz ∈ X−) ∨ (xz ∈ X− ∧ yz ∈ X+)
Slide 121
About characteristic samples
• If you add more strings to a characteristic sample, it remains characteristic;
• there can be many different characteristic samples;
• change the ordering (or the exploring function in RPNI) and the characteristic sample will change.
Slide 122
Conclusion
• RPNI identifies any regular language in the limit;
• RPNI works in polynomial time: its complexity is in O(‖X+‖³ · ‖X−‖);
• there are many significant variants of RPNI;
• RPNI can be extended to other classes of grammars.
Slide 123
Open problems
• RPNI’s complexity is not a tight upper bound. Find the correct complexity.
• The definition of the characteristic set is not tight either. Find a better definition.
Slide 124
Algorithms
• RPNI
• k-Reversible
• GRIDS
• SEQUITUR
• L*
Slide 125
4.2 The k-reversible languages
• The class was proposed by Angluin (1982).
• The class is identifiable in the limit from text.
• It consists of the regular languages that can be accepted by a DFA whose reversal is deterministic with a look-ahead of k.
Slide 126
Let A = (Σ, Q, δ, I, F) be an NFA. We denote by A^T = (Σ, Q, δ^T, F, I) the reversal automaton, with:
δ^T(q, a) = {q′ ∈ Q : q ∈ δ(q′, a)}
Slide 127
[figure: a five-state NFA A and its reversal A^T]
Slide 128
Some definitions
• u is a k-successor of q if |u| = k and δ(q, u) ≠ ∅.
• u is a k-predecessor of q if |u| = k and δ^T(q, u^T) ≠ ∅.
• λ is a 0-successor and a 0-predecessor of every state.
Slide 129
[figure: the automaton A]
• aa is a 2-successor of 0 and of 1, but not of 3.
• a is a 1-successor of 3.
• aa is a 2-predecessor of 3, but not of 1.
Slide 130
An NFA is deterministic with look-ahead k iff for all q, q′ ∈ Q with q ≠ q′:
if q, q′ ∈ I, or q, q′ ∈ δ(q″, a) for some state q″ and symbol a, then no string u with |u| = k is a k-successor of both q and q′.
Slide 131
Prohibited:
[figure: two distinct a-successors, 1 and 2, of the same state sharing a k-successor u, |u| = k]
Slide 132
Example
This automaton is not deterministic with look-ahead 1, but is deterministic with look-ahead 2.
[figure: the five-state automaton of slide 127]
Slide 133
k-reversible automata
• A is k-reversible if A is deterministic and A^T is deterministic with look-ahead k.
• Example: [figure: a three-state automaton that is deterministic and whose reversal is deterministic with look-ahead 1]
Slide 134
Violation of k-reversibility
Two states q, q′ violate the k-reversibility condition iff
• they violate the determinism condition: q, q′ ∈ δ(q″, a); or
• they violate the look-ahead condition:
– q, q′ ∈ F and some u ∈ Σ^k is a k-predecessor of both; or
– δ(q, a) = δ(q′, a) for some a, and some u ∈ Σ^k is a k-predecessor of both q and q′.
Slide 135
Learning k-reversible automata
• Key idea: the order in which the merges are performed does not matter!
• Just merge states that do not comply with the conditions for k-reversibility.
Slide 136
k-RL algorithm (φ_k-RL)
Data: k ∈ ℕ, X a sample of a k-reversible language L
A = PTA(X)
while ∃q, q′ violating k-reversibility do
    A = merge(A, q, q′)
(A runnable sketch for k = 0 follows.)
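A runnable sketch for the special case k = 0 (the zero-reversible learner; the union-find encoding and helper names are mine). Starting from PTA(X), merge all final states (λ is a 0-predecessor of every state), then keep merging any pair of states that makes the automaton or its reversal nondeterministic:

def zero_reversible(positive):
    prefs = sorted({w[:i] for w in positive for i in range(len(w) + 1)},
                   key=lambda s: (len(s), s))
    delta = {(p[:-1], p[-1]): p for p in prefs if p}           # the PTA
    parent = {q: q for q in prefs}
    def find(q):
        while parent[q] != q:
            q = parent[q]
        return q
    def union(x, y):
        x, y = find(x), find(y)
        if x != y:
            parent[y] = x
            return True
        return False
    finals = list(positive)
    for w in finals[1:]:
        union(finals[0], w)                                    # all final states merge
    changed = True
    while changed:
        changed = False
        fwd, bwd = {}, {}
        for (p, a), q in delta.items():
            fp, fq = find(p), find(q)
            if fwd.setdefault((fp, a), fq) != fq:              # a state with two a-successors:
                changed |= union(fwd[(fp, a)], fq)             # merge the targets
            if bwd.setdefault((fq, a), fp) != fp:              # two states sharing an a-successor:
                changed |= union(bwd[(fq, a)], fp)             # merge the sources
    Q = {find(q) for q in prefs}
    E = {(find(p), a, find(q)) for (p, a), q in delta.items()}
    return Q, E, find(""), {find(w) for w in finals}

Q, E, q0, F = zero_reversible(["a", "aa", "abba"])
print(len(Q), len(E))   # 2 3: the inferred automaton accepts (a + bb)*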
Slides 137–139
Let X = {a, aa, abba, abbbba} and k = 2.
[figures: starting from PTA(X), pairs of states violating 2-reversibility are merged, first the violators for u = ba, then the violators for u = bb, until no violation remains]
Slide 140
Properties (1)
• ∀k ≥ 0, ∀X, φ_k-RL(X) is a k-reversible language.
• L(φ_k-RL(X)) is the smallest k-reversible language that contains X.
• The class L_k-RL is identifiable in the limit from text.
Slide 141
Properties (2)
A regular language L is k-reversible iff, for all u1, u2 and all v with |v| = k:
(u1 v)^−1 L ∩ (u2 v)^−1 L ≠ ∅ ⇒ (u1 v)^−1 L = (u2 v)^−1 L
(if two strings share a suffix v of length k and have a common continuation, then they are Nerode-equivalent)
Slide 142
Properties (3)
• L_k-RL(X) ⊂ L_(k+1)-RL(X)
• L_k-TSS(X) ⊂ L_(k−1)-RL(X)
Slide 143
Properties (4)
• The time complexity is O(k‖X‖³).
• The space complexity is O(‖X‖).
• The algorithm is not incremental.
Slide 144
Properties (4) Polynomial aspects
• Polynomial characteristic sets
• Polynomial update time
• But not necessarily a polynomial number of mind changes
Slide 145
Extensions• Sakakibara built an extension for context-free grammars whose tree language is k-reversible
• Marion & Besombes propose an extension to tree languages.
• Different authors propose to learn these automata and then estimate the probabilities as an alternative to learning stochastic automata.
Slide 146
Exercises
• Construct a language L that is not k-reversible for any k ≥ 0.
• Prove that the class of all k-reversible languages is not in TxtEx.
• Run φ_k-RL on X = {aa, aba, abb, abaaba, baaba} for k = 0, 1, 2, 3.
Slide 147
Solution (idea)
• L_k = {a^i : i ≤ k}
• Then for each k, L_k is k-reversible but not (k−1)-reversible.
• And ∪_k L_k = a*
• So there is an accumulation point…
Slide 148
Algorithms
• RPNI
• k-Reversible
• GRIDS
• SEQUITUR
• L*
Slide 149
4.3 Active learning: learning DFA from membership and equivalence queries (the L* algorithm)
Slide 150
The classes C and H
• sets of examples;
• representations of these sets;
• the computation of L(x) (and h(x)) must take place in time polynomial in |x|.
Slide 151
Correct learning
A class C is identifiable with a polynomial number of queries of type T if there exists an algorithm φ that:
1) for every L ∈ C, identifies L with a polynomial number of queries of type T;
2) does each update in time polynomial in |f| and in Σ|xi|, where {xi} are the counter-examples seen so far.
Slide 152
Algorithm L*
• Angluin's papers
• some talks by Rivest
• Kearns and Vazirani
• Balcázar, Díaz, Gavaldà & Watanabe
Slide 153
Some references
• D. Angluin, Learning regular sets from queries and counter-examples, Information and Computation 75, 87–106, 1987.
• D. Angluin, Queries and concept learning, Machine Learning 2, 319–342, 1988.
• D. Angluin, Negative results for equivalence queries, Machine Learning 5, 121–150, 1990.
Slide 154
The Minimal Adequate Teacher
• You are allowed:
– strong equivalence queries;
– membership queries.
Slide 155
General idea of L*
• Find a consistent table (representing a DFA);
• submit it as an equivalence query;
• use the counter-example to update the table;
• submit membership queries to make the table complete;
• iterate.
(A compact sketch of this loop follows.)
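A compact runnable sketch (implementation details are mine; member and equiv stand for the two oracle types of the MAT, and the toy equivalence oracle below is faked by exhaustive testing up to length 5):

from itertools import product

def lstar(alphabet, member, equiv):
    S, E, T = [""], [""], {}
    def row(s):
        return tuple(T[s + e] for e in E)
    def fill():
        for s in S + [s + a for s in S for a in alphabet]:
            for e in E:
                if s + e not in T:
                    T[s + e] = member(s + e)
    while True:
        fill()
        rows_S = {row(s) for s in S}
        # closed? every frontier row must appear among the S rows
        t = next((s + a for s in S for a in alphabet
                  if row(s + a) not in rows_S), None)
        if t is not None:
            S.append(t)
            continue
        # consistent? equal S rows must stay equal after appending a symbol
        bad = next(((a, e) for s1 in S for s2 in S if row(s1) == row(s2)
                    for a in alphabet for e in E
                    if T[s1 + a + e] != T[s2 + a + e]), None)
        if bad is not None:
            a, e = bad
            E.append(a + e)
            continue
        reps = {}
        for s in S:
            reps.setdefault(row(s), s)
        delta = {(reps[row(s)], a): reps[row(s + a)]
                 for s in S for a in alphabet}
        q0, fin = reps[row("")], {r for r in reps.values() if T[r]}
        cex = equiv((q0, delta, fin))
        if cex is None:
            return q0, delta, fin
        for i in range(1, len(cex) + 1):   # add all prefixes of the counter-example
            if cex[:i] not in S:
                S.append(cex[:i])

def accepts(dfa, w):
    q0, delta, fin = dfa
    q = q0
    for ch in w:
        q = delta[(q, ch)]
    return q in fin

target = lambda w: w.count("a") % 2 == 0     # even number of a's
def equiv(dfa):
    for n in range(6):
        for w in map("".join, product("ab", repeat=n)):
            if accepts(dfa, w) != target(w):
                return w
    return None

q0, delta, fin = lstar("ab", target, equiv)
print(len({q for q, _ in delta}))   # 2: the two-state parity DFA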
Slide 156
An observation table
[table: rows indexed by the prefixes λ, a, ab, aab; columns by the experiments λ, a; each 0/1 entry tells whether row·column belongs to L]
Slide 157
[the same table: the upper rows form the states S (or test set), the lower rows the transitions T, and the columns the experiments E]
Slide 158
Meaning: δ(q0, λ·λ) ∈ F ⇔ λ ∈ L
[the (λ, λ) entry of the table is 1, so λ ∈ L]
Slide 159
δ(q0, ab·a) ∉ F ⇔ aba ∉ L
[the (ab, a) entry of the table is 0, so aba ∉ L]
Slide 160
Equivalent prefixes
The rows of λ and ab are equal, hence δ(q0, λ) = δ(q0, ab).
[table with the two equal rows highlighted]
Slide 161
Building a DFA from a table
[the table and the automaton states derived from the distinct rows of S]
Slide 162
[the DFA with its transitions on a and b read off the table]
Slide 163
Some rules
• The set S is prefix-closed;
• the set E is suffix-closed;
• SΣ \ S = T.
[the table annotated accordingly]
Slide 164
An incomplete table
[table with missing entries]
Slide 165
Good idea: we can complete the table by making membership queries…
Membership query: for a row u and an experiment v, ask whether uv ∈ L.
Slide 166
A table is closed if every row of T corresponds to some row of S.
[table with a row of T equal to no row of S: not closed]
Slide 167
And a table that is not closed
[the same situation seen on the automaton: a transition with no state to go to]
Slide 168
What do we do when we have a table that is not closed?
• Let s be the row (of T) that does not appear in S.
• Add s to S, and ∀a ∈ Σ add sa to T.
Slide 169
An inconsistent table
[table in which the rows of a and b are equal, but their one-symbol extensions differ]
Are a and b equivalent?
Slide 170
A table is consistent if every pair of equal rows of S remains equal after appending any symbol:
row(s1) = row(s2) ⇒ ∀a ∈ Σ, row(s1 a) = row(s2 a)
Slide 171
What do we do when we have an inconsistent table?
• Let a ∈ Σ be such that row(s1) = row(s2) but row(s1 a) ≠ row(s2 a).
• If row(s1 a) ≠ row(s2 a), they differ for some experiment e.
• Then add the experiment ae to the table.
Slide 172
What do we do when we have a closed and consistent table?
• We build the corresponding DFA.
• We make an equivalence query!
Slide 173
What do we do if we get a counter-example?
• Let u be this counter-example.
• ∀w ∈ Pref(u): add w to S, and ∀a ∈ Σ such that wa ∉ Pref(u), add wa to T.
Slide 174
Run of the algorithm
[initial table: S = {λ}, E = {λ}, rows λ, a, b all marked 1; the table is closed and consistent, giving a one-state DFA that accepts everything]
Slide 175
An equivalence query is made!
[the one-state DFA is submitted]
The counter-example baa is returned.
Slide 176
[after adding the prefixes of baa to S, the table is not consistent: two equal rows have different one-symbol extensions]
Slide 177
[after adding the experiment a, the table is closed and consistent; the conjectured DFA has the three states λ, ba and baa]
Slide 178
Proof of the algorithm
Sketch only
Understanding the proof is important for further algorithms
Balcazar et al. is a good place for that.
Slide 179
Termination / correctness
• For every regular language there is a unique minimal DFA that recognizes it.
• Given a closed and consistent table, one can generate a consistent DFA.
• A DFA consistent with a table has at least as many states as there are different rows in S.
• If the algorithm has built a table with n different rows in S (n the number of states of the minimal target DFA), then it is the target.
Slide 180
Finiteness
• Each closure failure adds one different row to S.
• Each inconsistency failure adds one experiment, which also creates a new row in S.
• Each counterexample adds one different row to S.
Slide 181
Polynomial
• |E| ≤ n
• at most n-1 equivalence queries
• |membership queries| ≤ n(n-1)m where m is the length of the longest counter-example returned by the oracle
Slide 182
Conclusion
• With an MAT you can learn DFA:
– but also a variety of other classes of grammars;
– it is difficult to see how powerful an MAT really is: probably as much as PAC learning;
– it is easy to find a class and a set of queries and provide an algorithm that learns with them;
– it is more difficult for this to be meaningful.
• Discussion: why are these queries meaningful?
Slide 183
Algorithms
• RPNI
• k-Reversible
• GRIDS
• SEQUITUR
• L*
Slide 184
4.4 SEQUITUR
(http://sequence.rutgers.edu/sequitur/)
(Nevill-Manning & Witten, 97)
Idea: construct a context-free grammar from a very long string w such that L(G) = {w}:
– no generalization;
– (roughly) linear time;
– good compression rates.
Slide 185
Principle
Invariants of the grammar with respect to the string:
– rule utility: each rule has to be used at least twice;
– digram uniqueness: no sub-string of length 2 appears twice.
(A toy sketch of digram substitution follows the example on slide 187.)
Slide 186
Examples
• S → abcdbc becomes S → aAdA with A → bc.
• For S → aabaaab, two grammars respecting the invariants: S → AbAab with A → aa, and S → AaA with A → aab.
Slide 187
abcabdabcabd
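A toy offline sketch of the digram-uniqueness idea, run on the string above (this is closer in spirit to RePair than to the real incremental SEQUITUR, which processes its input left to right; the code and names are mine): repeatedly replace a repeated digram by a fresh non-terminal until no digram occurs twice.

from collections import Counter

def digram_grammar(s):
    rules = {}
    seq = list(s)
    fresh = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    while True:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        digram, n = counts.most_common(1)[0]
        if n < 2:
            break
        nt = next(fresh)
        rules[nt] = list(digram)
        out, i = [], 0
        while i < len(seq):          # replace non-overlapping occurrences
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == digram:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    rules["S"] = seq
    return rules

for lhs, rhs in sorted(digram_grammar("abcabdabcabd").items()):
    print(lhs, "->", "".join(rhs))
# one possible output: A -> ab, B -> Ac, C -> BA, D -> Cd, S -> DD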
Slide 188
In the beginning, God created the heavens and the earth.
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
And God said, Let there be light: and there was light.
And God saw the light, that it was good: and God divided the light from the darkness.
And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.
And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters.
And God made the firmament, and divided the waters which were under the firmament from the waters which were above the firmament: and it was so.
And God called the firmament Heaven. And the evening and the morning were the second day.
Slide 189
[figure: the grammar SEQUITUR induces from this text]
Slide 190
Sequitur options:
• appending a symbol to rule S;
• using an existing rule;
• creating a new rule;
• and deleting a rule.
Slide 191
Results on text (bits per character):
– SEQUITUR: 2.82 bpc
– compress: 3.46 bpc
– gzip: 3.25 bpc
– PPMC: 2.52 bpc
Slide 192
Algorithms
• RPNI
• k-Reversible
• GRIDS
• SEQUITUR
• L*
Slide 193
4.5 Using a simplicity bias (Langley & Stromsten, 00)
Based on the algorithm GRIDS (Wolff, 82).
Main characteristics:
– MDL principle;
– not characterizable;
– not tested on large benchmarks.
Slide 194
Two learning operators
1. Creation of non-terminals and rules:
Before:  NP → ART ADJ NOUN;  NP → ART ADJ ADJ NOUN
After:   NP → ART AP1;  NP → ART ADJ AP1;  AP1 → ADJ NOUN
Slide 195
2. Merging two non-terminals:
Before:  NP → ART AP1;  NP → ART AP2;  AP1 → ADJ NOUN;  AP2 → ADJ AP1
After:   NP → ART AP1;  AP1 → ADJ NOUN;  AP1 → ADJ AP1
Slide 196
• Scoring function: the MDL principle, |G| + Σ_{w∈T} |d(w)|, i.e. the size of the grammar plus the total size of the derivations of the corpus (a sketch follows).
• Algorithm:
– find the best merge that improves the current grammar;
– if no such merge exists, find the best creation;
– halt when there is no improvement.
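A minimal sketch of that scoring function (the encoding choices are mine, and the derivation sizes are assumed to come from a parser):

def grammar_size(rules):
    # rules: list of (lhs, rhs tuple); one symbol counted per lhs and per rhs token
    return sum(1 + len(rhs) for lhs, rhs in rules)

def mdl_score(rules, derivation_sizes):
    # derivation_sizes: |d(w)| for each corpus string w, as given by a parser
    return grammar_size(rules) + sum(derivation_sizes)

rules = [("NP", ("ART", "AP1")), ("AP1", ("ADJ", "NOUN")), ("AP1", ("ADJ", "AP1"))]
print(mdl_score(rules, [2, 3, 3]))   # 9 + 8 = 17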
Slide 197
Results
• On subsets of English grammar (15 rules, 8 non-terminals, 9 terminals): 120 sentences to converge.
• On (ab)*: all (15) strings of length ≤ 30.
• On Dyck1: all (65) strings of length ≤ 12.
Slide 198
Algorithms
• RPNI
• k-Reversible
• GRIDS
• SEQUITUR
• L*
Slide 199
5 Open questions and conclusions
• dealing with noise;
• classes of languages that adequately mix Chomsky's hierarchy with the edit distance;
• stochastic context-free grammars;
• polynomial learning from text;
• learning POMDPs;
• fast algorithms…