What Was Wrong with the Chomsky Hierarchy?
Makoto Kanazawa
Hosei University
Chomsky Hierarchy of Formal Languages
regular ⊂ context-free ⊂ context-sensitive ⊂ recursively enumerable
Hopcroft and Ullman 1979
Enduring impact of Chomsky’s work on theoretical computer science.
15 pages devoted to the Chomsky hierarchy.
[Scanned index page (p. 450) from Sipser 2013, entries from "Carmichael" to "Computation history": it lists "Chomsky normal form" and "Chomsky, Noam" but has no entry for the Chomsky hierarchy.]
Sipser 2013
No mention of the Chomsky hierarchy.
The same with the latest edition of Hopcroft, Motwani, and Ullman (2006).
Semi-Thue System/String Rewriting System
N: finite alphabet of nonterminal symbols
Σ: finite alphabet of terminal symbols
Production: α → β, where α, β ∈ (N ∪ Σ)*
One-step rewriting: γαδ ⇒ γβδ
Language: L(G) = { w ∈ Σ* ∣ S ⇒* w }
Type 0: unrestricted
Type 1: |α| ≤ |β|
Type 2: α ∈ N
Type 3: α ∈ N, β ∈ ΣN
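The definitions on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every symbol is a single character, the function names one_step and grammar_type are made up for this example, and the sample grammar S → aSb, S → ab is not from the talk. The type test checks the conditions exactly as listed on the slide, nothing more.

```python
from typing import List, Set, Tuple

Production = Tuple[str, str]  # (alpha, beta), one character per symbol (assumption)


def one_step(s: str, productions: List[Production]) -> Set[str]:
    """All strings obtained from s by one step: gamma alpha delta => gamma beta delta."""
    results = set()
    for alpha, beta in productions:
        start = s.find(alpha)
        while start != -1:
            results.add(s[:start] + beta + s[start + len(alpha):])
            start = s.find(alpha, start + 1)
    return results


def grammar_type(productions: List[Production],
                 nonterminals: Set[str], terminals: Set[str]) -> int:
    """Highest-numbered type whose condition (as listed on the slide) every production meets."""
    def is_type3(alpha: str, beta: str) -> bool:
        # alpha in N, beta in Sigma N: one terminal followed by one nonterminal
        return (alpha in nonterminals and len(beta) == 2
                and beta[0] in terminals and beta[1] in nonterminals)

    if all(is_type3(a, b) for a, b in productions):
        return 3
    if all(a in nonterminals for a, _ in productions):   # alpha in N
        return 2
    if all(len(a) <= len(b) for a, b in productions):    # |alpha| <= |beta|
        return 1
    return 0


# Hypothetical example grammar with N = {S} and Sigma = {a, b}.
productions = [("S", "aSb"), ("S", "ab")]
print(sorted(one_step("aSb", productions)))        # ['aaSbb', 'aabb']
print(grammar_type(productions, {"S"}, {"a", "b"}))  # 2
```

Note that one_step only ever replaces a fixed substring by another fixed string, which is the property the later slides contrast with indexed grammars.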
rewriting systems / languages / machine models
type 0: recursively enumerable, Turing machines
type 1: context-sensitive, LBA
type 2: context-free, PDA
type 3: regular, FA
LBA: linear-bounded automata = NSPACE(n) Turing machines
PDA: pushdown automata
FA: finite automata
[Diagram: regular (type 3) ⊂ context-free (type 2) ⊂ context-sensitive (type 1, labeled PSPACE-complete) ⊂ recursively enumerable (type 0)]
Context-sensitive languages are computationally very complex.
Some context-sensitive languages are PSPACE-complete.
• Context-sensitive languages are computationally very complex.
• There is a huge gap between context-free and context-sensitive.
• Many intermediate classes of languages have been considered in formal language theory.
• In particular, “mildly context-sensitive” grammars have been investigated by computational linguists.
Indexed Grammars
Hopcroft and Ullman 1979
[Diagram: the same hierarchy, with "indexed??" placed between context-free (type 2) and context-sensitive (type 1)]
What type of rewriting system is an indexed grammar?
Indexed
Production: T → ABC
One-step rewriting: T[fg] ⇒ A[fg] B[fg] C[fg]
Type 0
Production: α → β
One-step rewriting: γαδ ⇒ γβδ
An indexed grammar is not an instance of a type 0 grammar.
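The contrast can be made concrete with a small Python sketch of the indexed rewriting step. It is an illustration only, assuming upper-case letters are nonterminals and that a stack of indices is written as a string with its top first; the name rewrite_indexed and the data layout are hypothetical. The point it shows is that the single production T → ABC applies to T carrying any stack and copies that stack to every nonterminal daughter, which a single semi-Thue production α → β over a fixed string cannot express.

```python
from typing import Dict, List, Tuple, Union

# A terminal is a plain one-character string; a nonterminal is a
# (symbol, stack) pair, with the index stack written as a string (top first).
Item = Union[str, Tuple[str, str]]


def rewrite_indexed(form: List[Item], pos: int, rules: Dict[str, str]) -> List[Item]:
    """Rewrite the nonterminal at position pos, copying its stack to each nonterminal daughter."""
    symbol, stack = form[pos]
    daughters: List[Item] = [(d, stack) if d.isupper() else d for d in rules[symbol]]
    return form[:pos] + daughters + form[pos + 1:]


# The slide's example: T[fg] => A[fg] B[fg] C[fg]
print(rewrite_indexed([("T", "fg")], 0, {"T": "ABC"}))
# [('A', 'fg'), ('B', 'fg'), ('C', 'fg')]
```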
[Diagram: the same hierarchy, with indexed (labeled NP-complete) placed between context-free (type 2) and context-sensitive (type 1)]
Indexed languages are also computationally complex: some indexed languages are NP-complete.
push items onto, pop items from, and copy the stack. What we end up with now is no longer equivalent to the CF-PSGs but is significantly more powerful, namely the indexed grammars (Aho, 1968). This class of grammars has been alluded to a number of times in the recent linguistic literature: by Klein (1981) in connection with nested comparative constructions, by Dahl (1982) in connection with topicalised pronouns, by Engdahl (1982) and Gazdar (1982) in connection with Scandinavian unbounded dependencies, by Huybregts (1984) and Pulman and Ritchie (1984) in connection with Dutch, by Marsh and Partee (1984) in connection with variable binding, and doubtless elsewhere as well.
Indexed grammars fall in between CF-PSGs and context-sensitive grammars in their generative capacity. Every context-free language (CFL) is an indexed language (IL), but not conversely. Thus aⁿbⁿcⁿ is an IL, for example, but it is not a CFL. And every IL is a context-sensitive language, but not conversely. Until recently, there were no good arguments to suggest that the natural languages (NLs) fell outside the CFLs (see Pullum and Gazdar, 1982, for defense of this claim), but work by Culy (1985), Huybregts (1984) and Shieber (1985) does now indicate rather strongly that they do. However, no NL phenomena are known which would imply that NLs exist which fall outside the indexed class (see Gazdar and Pullum, 1985, for a survey).
Hopcroft and Ullman write that "of the many generalizations of context-free grammars that have been proposed, a class called 'indexed' appears the most natural, in that it arises in a wide variety of contexts" (1979:389). One purpose of the present paper is to make indexed grammars more accessible to linguists and computational linguists, and more directly relevant to their concerns. It assumes some passive competence in mathematical linguistics, but not very much. In accord with the purpose just mentioned, I shall present grammars informally as sets of rules, rather than as the official n-tuples. The start symbol will always be "S" or and the relevant sets of terminals, indices, and nonterminals can simply be inferred from looking at what appears in the rules.
2. THE FORM OF RULES: A STACK-ORIENTED NOTATION
In exhibiting schematic rules and trees below, I shall maintain a number of orthographic conventions: nonterminal symbols are indicated by upper case letters (A, B, C), terminal symbols by lower case letters (a, b, c), possibly empty strings of terminals and nonterminals by W, W1, W2, etc., indices by lower case italic letters (i, j, k), and stacks of indices by square brackets and periods ([], [..], [i,..]) where [i,..] is a stack whose topmost index is i, [] is an empty stack, and [..] is a possibly empty stack of indices. As for the rules themselves, Aho (1968) uses one notation, and Hopcroft and Ullman (1979) use another. The notation used below is essentially just a redundant, and intendedly more perspicuous, variant of that employed by Hopcroft and Ullman.
In the standard formulations, an indexed grammar can contain rules of three different sorts:
(1) i. A[..] -> W[..]
    ii. A[..] -> B[i,..]
    iii. A[i,..] -> W[..]
I shall refer to rules that have one or other of these three forms as H&U rules. The first type of rule simply copies the stack to all nonterminal daughters. The second type of rule pushes a new index onto the stack handed down to its unique nonterminal daughter. And the third type of rule pops an index off the stack and distributes what is left to its nonterminal daughters. In the rules, A and B are nonterminal symbols (not necessarily distinct) and W is a string of terminal and/or nonterminal symbols. A compound symbol of the form A[..] means that the nonterminal A bears the stack [..]. A compound symbol of the form W[..] stands for a string of terminal and/or nonterminal symbols each nonterminal symbol of which bears the stack [..]. Terminal symbols cannot bear stacks. Thus, if W = BcDe, then W[..] = B[..]cD[..]e. Stacks of indices are thus associated with nonterminals and transmitted by rules. Indeed, the ILs are exactly characterised by a class of automata known as (one-way nondeterministic) nested stack automata (Aho, 1969). The indices that make up stacks are drawn from some finite vocabulary though the stacks themselves are not bounded in size. Any upper bound on the size of stacks restricts the class of grammars to the context-free class.
The types of rule shown in (1) are just those permitted in Hopcroft and Ullman's definition of the class of grammars--what I shall refer to as the standard definition. Notice that this definition (a) copies all or most of the
GERALD GAZDAR
UNIVERSITY OF SUSSEX
SCHOOL OF SOCIAL SCIENCES
BRIGHTON
APPLICABILITY OF INDEXED GRAMMARS TO NATURAL LANGUAGES*
1. INTRODUCTION
If we take the class of context-free phrase structure grammars (CF-PSGs) and modify it so that (i) grammars are allowed to make use of finite feature systems and (ii) rules are permitted to manipulate the features in arbitrary ways, then what we end up with is equivalent to what we started out with. Suppose, however, that we take the class of context-free phrase structure grammars and modify it so that (i) grammars are allowed to employ a single designated feature that takes stacks of items drawn from some finite set as its values, and (ii) rules are permitted to
* This paper originates in a talk given to the Workshop on Scandinavian and the Theory of Grammar held in Trondheim in the Summer of 1982 and organized by Lars Hellan. I am indebted to Jens Erik Fenstad for drawing my attention to an error in my presentation there. A subsequent version was given to the Santa Cruz Workshop on the Mathematics of Grammars and Languages organized by Geoffrey K. Pullum in June 1985. I am grateful to Fernando Pereira, Carl Pollard, and Stuart Shieber for relevant conversations, to Bill Marsh and Geoff Pullum for their detailed comments on earlier drafts, and to Dikran Karageuzian and his colleagues for the work they did on the diagrams and figures found below. Support for work on this paper was provided by the ESRC (UK), by grants to Stanford University from the National Science Foundation (BNS-8102406) and the Sloan Foundation, by the Center for the Study of Language and Information, and by grants from the Sloan Foundation and System Development Foundation to the Center for Advanced Study in the Behavioral Sciences.
U. Reyle and C. Rohrer (eds.), Natural Language Parsing and Linguistic Theories, 69-94. © 1988 by D. Reidel Publishing Company.
stack to all nonterminal daughters in all rule types, and (b) restricts additions to the stack to cases where the mother category has but a single daughter. Neither aspect of the definition appears to be essential, and things are probably only done that way in order to facilitate doing proofs. Suppose we allow the rule types shown in (2) in addition to those listed in (1):
(2) i. A[..] -> W1[] B[..] W2[]
    ii. A[..] -> W1[] B[i,..] W2[]
    iii. A[i,..] -> W1[] B[..] W2[]
Rules of type (2.i) would allow the stack to be carried down onto a single nonterminal designated daughter, and rules of type (2.ii) and (2.iii) would also allow such a designated daughter to have its stack incremented or decremented, respectively. As shown in the appendix, permitting such additional rule types has no consequences for the class of languages such grammars can generate.
3. SOME EXAMPLE GRAMMARS
The best way to see how indexed grammars work is to look at an example, such as (3), which shows a grammar for aⁿbⁿ.
(3) S[..] -> aA[z,..]
    A[..] -> aA[a,..]
    A[..] -> B[..]
    B[a,..] -> bB[..]
    B[z,..] -> b
This looks more complicated than it actually is. All it does is generate trees such as that shown in (4):
(4) [Tree diagram not reproduced: a right-branching tree rooted at S[] whose lowest nonterminal node is B[z], dominating b.]
As we go down the tree, terminal a's are produced, and index a's are loaded onto the stack. When we get to the middle, signalled by the change of category from A to B, then terminal b's are produced, and index a's are removed from the stack. The stack thus records how many terminal a's were originally produced, and this record can then be used to produce exactly as many terminal b's. One slight complication is the use of the end-marker index z. This is necessary because the standard formulations of indexed grammars allow a non-empty stack to simply disappear if the category the stack appears on only has terminal daughters. We could avoid the need for end-marker indices of this kind by requiring that the stack distributed over daughters must be empty when every daughter is terminal. As shown in the appendix, such a constraint would not affect the class of languages generated, but it does simplify the formulation of certain kinds of grammar. Thus we could simplify the grammar in (3) above by replacing each occurrence of z with a. When I come to discuss certain abstract tree configurations later in the paper, I shall simply ignore end-marker indices.
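The way grammar (3) uses its stack as a counter can be traced with a short Python sketch. This is only an illustration: the function name derive and its pushes parameter are hypothetical, and the code follows the one derivation shape that grammar (3) allows (push while in category A, then pop while in category B) instead of implementing indexed grammars in general.

```python
def derive(pushes: int) -> str:
    """Terminal string derived by grammar (3) when A[..] -> aA[a,..] is applied `pushes` times."""
    output = ["a"]            # S[..] -> aA[z,..]: emit a, push the end-marker z
    stack = ["z"]             # the top of the stack is the end of the list
    for _ in range(pushes):   # A[..] -> aA[a,..]: emit a, push index a
        output.append("a")
        stack.append("a")
    # A[..] -> B[..]: the category changes, the stack is handed down unchanged
    while stack:
        top = stack.pop()     # B[a,..] -> bB[..]  or  B[z,..] -> b
        output.append("b")
        if top == "z":        # the end-marker signals the last b
            break
    return "".join(output)


print([derive(k) for k in range(3)])  # ['ab', 'aabb', 'aaabbb']
```

Each push is matched by exactly one pop, so the number of b's always equals the number of a's.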
Notice that (4) is a right-linear (and hence finite state) tree, and that
Gazdar 1988
The productions in an Indexed Grammar can be written as (refer [Gazdar 85a], the original definition of Indexed Grammars appears in [Aho 68])
X[..] → X1[i,..] ... Xn[i,..] (push) or
X[i,..] → X1[..] ... Xn[..] or
In an Indexed Grammar, a stack of symbols (called the indices) is associated with nonterminals during derivations. [..] is used to represent the present stack associated with a nonterminal. In the first production the parent (l.h.s. of the production) distributes copies of its stack to its children after pushing the symbol i at the top of the stack. Thus, we can see how the stack information is shared by all the symbols in the right-hand side of the production.
In an LIG the productions have the form
X[i,..] → X1[] ... Xj[..] ... Xn[] (pop) or
LIG differs from IG in that the stack is passed on to only one of the children. [] corresponds to a new but empty stack. The productions of LIG's can be generalized to be of the form
Vijayashanker 1987
“Convergence” of Mildly Context-Sensitive Grammar Formalisms
TAG ≡ LIG ≡ HG
Vijay-Shanker and Weir 1994
TAG: tree-adjoining grammar; LIG: linear indexed grammar; HG: head grammar
[Diagram: the same hierarchy, with "linear indexed (type 1.5?)" placed between context-free (type 2) and context-sensitive (type 1)]
There is a book where the author calls the linear indexed grammars “type 1.5”.
Linear indexed
Production: T[] → AB[]C
One-step rewriting: T[fg] ⇒ AB[fg]C
Type 0
Production: α → β
One-step rewriting: γαδ ⇒ γβδ
A linear indexed grammar is not an instance of a type 0 grammar.
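The linear indexed step can be sketched in Python as follows. This is an illustration under assumptions: upper-case letters are nonterminals, a stack is written as a string, and the name rewrite_linear and its heir parameter (the position of the one daughter that inherits the stack) are hypothetical. Following the quoted description that [] is a new but empty stack, the other nonterminal daughters are given empty stacks.

```python
from typing import List, Tuple, Union

# A terminal is a plain one-character string; a nonterminal is a (symbol, stack) pair.
Item = Union[str, Tuple[str, str]]


def rewrite_linear(form: List[Item], pos: int, rhs: str, heir: int) -> List[Item]:
    """Rewrite the nonterminal at pos; only the daughter at index `heir` of rhs inherits the stack."""
    _, stack = form[pos]
    daughters: List[Item] = []
    for i, d in enumerate(rhs):
        if not d.isupper():
            daughters.append(d)                               # terminals bear no stack
        else:
            daughters.append((d, stack if i == heir else ""))  # others start empty
    return form[:pos] + daughters + form[pos + 1:]


# The slide's example: T[fg] => A B[fg] C
print(rewrite_linear([("T", "fg")], 0, "ABC", heir=1))
# [('A', ''), ('B', 'fg'), ('C', '')]
```

As with indexed grammars, the one production stands for a step that works on any stack content, so it is not a rewriting of a fixed substring.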
[Diagram: regular (type 3) ⊂ context-free (type 2) ⊂ linear indexed ⊂ indexed ⊂ context-sensitive (type 1) ⊂ recursively enumerable (type 0)]
There are actually almost no other grammar formalisms that are instances of Chomsky’s type 0 grammars.
Thue→Post→Chomsky
§ 1. GENERATIVE GRAMMARS AND LINGUISTIC COMPETENCE
ments may provide useful, in fact, compelling evidence for such a theory.
To avoid what has been a continuing misunderstanding, it is perhaps worth while to reiterate that a generative grammar is not a model for a speaker or a hearer. It attempts to characterize in the most neutral possible terms the knowledge of the language that provides the basis for actual use of language by a speaker-hearer. When we speak of a grammar as generating a sentence with a certain structural description, we mean simply that the grammar assigns this structural description to the sentence. When we say that a sentence has a certain derivation with respect to a particular generative grammar, we say nothing about how the speaker or hearer might proceed, in some practical or efficient way, to construct such a derivation. These questions belong to the theory of language use, the theory of performance. No doubt, a reasonable model of language use will incorporate, as a basic component, the generative grammar that expresses the speaker-hearer's knowledge of the language; but this generative grammar does not, in itself, prescribe the character or functioning of a perceptual model or a model of speech production. For various attempts to clarify this point, see Chomsky (1957), Gleason (1961), Miller and Chomsky (1963), and many other publications.
Confusion over this matter has been sufficiently persistent to suggest that a terminological change might be in order. Nevertheless, I think that the term "generative grammar" is completely appropriate, and have therefore continued to use it. The term "generate" is familiar in the sense intended here in logic, particularly in Post's theory of combinatorial systems. Furthermore, "generate" seems to be the most appropriate translation for Humboldt's term erzeugen, which he frequently uses, it seems, in essentially the sense here intended. Since this use of the term "generate" is well established both in logic and in the tradition of linguistic theory, I can see no reason for a revision of terminology.
Chomsky 1965
THE JOURNAL OF SYMBOLIC LOGIC Volume 12, Number 1, March 1947
RECURSIVE UNSOLVABILITY OF A PROBLEM OF THUE
EMIL L. POST
Alonzo Church suggested to the writer that a certain problem of Thue [6]¹ might be proved unsolvable by the methods of [5]. We proceed to prove the problem recursively unsolvable, that is, unsolvable in the sense of Church [1], but by a method meeting the special needs of the problem.
Thue's (general) problem is the following. Given a finite set of symbols a_1, a_2, ..., a_μ, we consider arbitrary strings (Zeichenreihen) on those symbols, that is, rows of symbols each of which is in the given set. Null strings are included. We further have given a finite set of pairs of corresponding strings on the a_i's, (A_1, B_1), (A_2, B_2), ..., (A_n, B_n). A string R is said to be a substring of a string S if S can be written in the form URV, that is, S consists of the letters, in order of occurrence, of some string U, followed by the letters of R, followed by the letters of some string V. Strings P and Q are then said to be similar if Q can be obtained from P by replacing a substring A_i or B_i of P by its correspondent B_i, A_i. Clearly, if P and Q are similar, Q and P are similar. Finally, P and Q are said to be equivalent if there is a finite set R_1, R_2, ..., R_r of strings on a_1, ..., a_μ such that in the sequence of strings P, R_1, R_2, ..., R_r, Q each string except the last is similar to the following string. It is readily seen that this relation between strings on a_1, ..., a_μ is indeed an equivalence relation. Thue's problem is then the problem of determining for arbitrarily given strings A, B on a_1, ..., a_μ, whether, or no, A and B are equivalent.
This problem, at least for the writer, is more readily placed if it is restated in terms of a special form of the canonical systems of [3]. In that notation, strings C and D are similar if D can be obtained from C by applying to C one of the following operations:
P A_i Q produces P B_i Q, P B_i Q produces P A_i Q, i = 1, 2, ..., n. (1)
In these operations the operational variables P, Q represent arbitrary strings. Strings A and B will then be equivalent if B can be obtained from A by starting with A, and applying in turn a finite sequence of operations (1). That is, A and B are equivalent if B is an assertion in the "canonical system"² with primitive assertion A and operations (1). Thue's general problem thus becomes the decision problem for the class of all canonical systems of this "Thue type."
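The notions of "similar" and "equivalent" defined above can be sketched in Python. This is a hedged illustration, not Post's method: the names similar and equivalent_within, the step bound max_steps, and the example pair ("ab", "ba") are all assumptions made for the sketch. Because the paper proves the general problem recursively unsolvable, the search is deliberately bounded, and a False result only means that no chain was found within the bound.

```python
from typing import List, Set, Tuple


def similar(p: str, pairs: List[Tuple[str, str]]) -> Set[str]:
    """Strings obtained from p by replacing one occurrence of A_i by B_i, or of B_i by A_i."""
    out = set()
    for a, b in pairs:
        for lhs, rhs in ((a, b), (b, a)):
            i = p.find(lhs)
            while i != -1:
                out.add(p[:i] + rhs + p[i + len(lhs):])
                i = p.find(lhs, i + 1)
    return out


def equivalent_within(a: str, b: str, pairs: List[Tuple[str, str]], max_steps: int) -> bool:
    """True if b is reachable from a by at most max_steps similarity steps."""
    seen = {a}
    frontier = {a}
    for _ in range(max_steps):
        if b in seen:
            return True
        # Expand one more layer of the breadth-first search.
        frontier = {q for p in frontier for q in similar(p, pairs)} - seen
        seen |= frontier
    return b in seen


pairs = [("ab", "ba")]  # hypothetical system: adjacent a and b may be swapped
print(equivalent_within("aab", "baa", pairs, max_steps=2))  # True
print(equivalent_within("aab", "baa", pairs, max_steps=1))  # False (bound too small)
```

Dropping the reversed direction (b, a) from the inner loop gives the one-way operations of the semi-Thue type discussed next.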
This general problem could easily be proved recursively unsolvable if, instead of the pair of operations for each i of (1), we merely had the first operation of each pair.³ In fact, by direct methods such as those of [3], we easily reduce the decision problem of an arbitrary "normal system" [3] to the decision problem of such a system of "semi-Thue type," the known recursive unsolvability of the
Received October 26, 1946. Presented to the American Mathematical Society November 2, 1946.
¹ Numbers in brackets refer to the bibliography at the end of the paper.
² Null assertions, however, now being allowed.
³ That is, using the language of propositions instead of operations, if we merely had an implication where (1) has an equivalence.
How did Chomsky come to use semi-Thue systems as the most general type of grammar?
I believe Post (1947) coined the term “semi-Thue system”.
Post Canonical Systems
FORMAL REDUCTIONS OF THE GENERAL COMBINATORIAL DECISION PROBLEM.*
By EMIL L. POST.
1. Introduction. It is not new to the literature that the usual form of a symbolic logic with its parenthesis notation and infinite set of variables can be transformed into one in which the enunciations, i.e., formulas of the system, are finite sequences of letters,¹ the different letters constituting a once-and-for-all given finite set. If the primitive letters of such a system are represented by a_1, a_2, ..., a_μ, an arbitrary enunciation of the system will take the form a_{i_1} a_{i_2} ... a_{i_n}, n = 1, 2, 3, ..., i_j = 1, 2, ..., μ. In describing the basis of such a system it is convenient to use new letters to represent finite sequences of the above primitive letters. If then A, B, ..., E represent the sequences a_{i_1} a_{i_2} ... a_{i_n}, a_{j_1} a_{j_2} ... a_{j_s}, ..., a_{m_1} a_{m_2} ... a_{m_t} respectively, AB...E will represent the sequence a_{i_1} a_{i_2} ... a_{i_n} a_{j_1} a_{j_2} ... a_{j_s} ... a_{m_1} a_{m_2} ... a_{m_t}.
We shall say that such a system is in canonical form if its basis has the following structure.² The primitive assertions of the system are a specified finite set of enunciations of the above form. The operations of the system are a specified finite set of productions, each of the following form:
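g_{11} P_{i'_1} g_{12} P_{i'_2} ... g_{1m_1} P_{i'_{m_1}} g_{1(m_1+1)}
g_{21} P_{i''_1} g_{22} P_{i''_2} ... g_{2m_2} P_{i''_{m_2}} g_{2(m_2+1)}
...
g_{k1} P_{i^{(k)}_1} g_{k2} P_{i^{(k)}_2} ... g_{km_k} P_{i^{(k)}_{m_k}} g_{k(m_k+1)}
produce
g_1 P_{i_1} g_2 P_{i_2} ... g_m P_{i_m} g_{m+1}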
* Received November 14, 1941; Revised April 11, 1942.
¹ More exactly, "strings" of "marks," to use terms of C. I. Lewis (A Survey of Symbolic Logic, Berkeley, 1918: chapter VI, sec. III).
2 This formulation stems from the "Generalization by Postulation" of the writer's
"Introduction to a general theory of elementary propositions," American Journal of Mathematics, vol. 43 (1921), pp. 163-185 (see p. 176). We take this opportunity to
make the following Emendation: Lemma 1 thereof (pp. 177-178) requires the added
condition that the expressions replacing the r's do not involve any letter upon which a
substitution is made in the given deductive process. This necessitates several minor
changes in the proof of the theorem there following. Actually, both Lemma 1 and its
companion Lemma 2 admit of further simplification, with the proof of the theorem then
being valid as it stands.
197
An indexed grammar is an instance of a Post canonical system.
Why Didn’t Chomsky Use Post Canonical Systems?
REDUCTIONS OF THE COMBINATORIAL DECISION PROBLEM. 199
the production whenever it contains the enunciations represented by the several premises of the production.
A very special case of the canonical form is what we term the normal form. A system in canonical form will be said to be in normal form if it has but one primitive assertion, and each of its productions is in the form
gP
produces
Pg'.
The main purpose of the present paper is to demonstrate that every system in canonical form can formally be reduced to a system in normal form. The two forms may therefore in fact be said to be equipotent. More precisely, we prove the following
THEOREM. Given a system in canonical form with primitive letters a_1, a_2, …, a_μ, a system in normal form with primitive letters a_1, a_2, …, a_μ, a'_1, a'_2, …, a'_{μ'} can be set up such that the assertions of the system in canonical form are exactly those assertions of the system in normal form which involve no other letters than a_1, a_2, …, a_μ.
As a result of this theorem the decision problem for a system in canonical form is reduced to the decision problem for the corresponding system in normal form. For an enunciation of the former system is an assertion when and only when it is an assertion of the latter system. Hence any procedure which could effectively determine for an arbitrary enunciation of the system in normal form whether it is or is not an assertion thereof would automatically do the same for the system in canonical form. Now by methods such as those referred to in the opening sentence of this introduction, it can be shown that the problem of determining for an arbitrary well-formed formula in the λ-calculus of Church whether it has or has not a normal form (Church⁵) can be reduced to the decision problem for a particular system in our canonical form. While Church has proved the above problem unsolvable in a certain technical sense, in the interest of economy we invoke his identification of λ-definability with effective calculability to conclude that as a result the decision problem for that particular system in canonical form, and hence for the class of systems in canonical form, is unsolvable. We are thus led to the more surprising result
⁵ Alonzo Church, "An unsolvable problem of elementary number theory," American Journal of Mathematics, vol. 58 (1936), pp. 345-363.
Post 1947
Perhaps because of this striking result?
recursively enumerable (type 0)
context-sensitive (type 1)
indexed
linear indexed
context-free (type 2)
regular (type 3)
Should Context-Free Grammars Be Thought of as String Rewriting Systems?
[S [NP Kim] [VP [V saw] [NP Sandy]]]
S ⇒ NP VP ⇒ Kim VP ⇒ Kim V NP ⇒ Kim saw NP ⇒ Kim saw Sandy
S ⇒ NP VP ⇒ Kim VP ⇒ Kim V NP ⇒ Kim V Sandy ⇒ Kim saw Sandy
S ⇒ NP VP ⇒ NP V NP ⇒ Kim V NP ⇒ Kim saw NP ⇒ Kim saw Sandy
S ⇒ NP VP ⇒ NP V NP ⇒ NP saw NP ⇒ Kim saw NP ⇒ Kim saw Sandy
S ⇒ NP VP ⇒ NP V NP ⇒ NP V Sandy ⇒ Kim V Sandy ⇒ Kim saw Sandy
S ⇒ NP VP ⇒ NP V NP ⇒ Kim V NP ⇒ Kim V Sandy ⇒ Kim saw Sandy
S ⇒ NP VP ⇒ NP V NP ⇒ NP saw NP ⇒ NP saw Sandy ⇒ Kim saw Sandy
S ⇒ NP VP ⇒ NP V NP ⇒ NP V Sandy ⇒ NP saw Sandy ⇒ Kim saw Sandy
It is silly to define well-formed sentences in terms of derivations if what we really want is to assign tree structures to sentences.
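The point about derivations versus trees can be checked mechanically. The following is a minimal sketch, assuming the toy grammar read off the example tree (S → NP VP, VP → V NP, NP → Kim | Sandy, V → saw); the names `RULES` and `derivations` are illustrative only. It enumerates every rewriting sequence from S to "Kim saw Sandy"; all of them differ only in the order in which the same rules are applied.

```python
# Toy grammar assumed from the example tree; not taken verbatim from the slides.
RULES = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP")],
    "NP": [("Kim",), ("Sandy",)],
    "V": [("saw",)],
}

def derivations(form, target):
    """Yield every rewriting sequence (list of sentential forms) from form to target.

    Terminates because this toy grammar has no recursive nonterminals.
    """
    if form == target:
        yield [form]
        return
    for i, sym in enumerate(form):
        for rhs in RULES.get(sym, []):
            nxt = form[:i] + rhs + form[i + 1:]
            for rest in derivations(nxt, target):
                yield [form] + rest

TARGET = ("Kim", "saw", "Sandy")
all_derivations = list(derivations(("S",), TARGET))
print(len(all_derivations))  # prints 8
for d in all_derivations:
    print(" ⇒ ".join(" ".join(form) for form in d))
```

Eight distinct derivations correspond to a single tree, which is why the derivation relation carries more bookkeeping than the tree assignment needs.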
Bottom-up Interpretation of Context-Free Rules
Bi ⇒G* vi (i = 1,…,n)    A → w0 B1 w1 … Bn wn ∈ P
--------------------------------------------------
A ⇒G* w0 v1 w1 … vn wn
L(G) = { w ∈ Σ* | S ⇒G* w }
All we need is the subrelation of ⇒G* restricted to N × Σ*.
Just a general type of inductive definition.
Inductive Definition = CFG Interpreted Bottom-up
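The bottom-up reading can be sketched as a least-fixed-point computation. This is only an illustration under stated assumptions: the grammar is the toy "Kim saw Sandy" grammar, facts (A, v) are restricted to substrings v of one input so that the set stays finite, and the names `RULES`, `NONTERMINALS`, and `bottom_up` are hypothetical.

```python
from itertools import product

# Toy grammar assumed for illustration: pairs (A, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("Kim",)),
    ("NP", ("Sandy",)),
    ("V", ("saw",)),
]
NONTERMINALS = {lhs for lhs, _ in RULES}

def bottom_up(rules, words):
    """Return all facts (A, v) with A ⇒* v, for v a substring of words."""
    n = len(words)
    substrings = {words[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    facts = set()
    while True:
        new = set()
        for lhs, rhs in rules:
            # For each right-hand-side symbol: known strings for a nonterminal,
            # or the terminal itself.
            choices = [
                [v for (b, v) in facts if b == sym] if sym in NONTERMINALS else [(sym,)]
                for sym in rhs
            ]
            for parts in product(*choices):
                v = tuple(w for part in parts for w in part)
                if v in substrings and (lhs, v) not in facts:
                    new.add((lhs, v))
        if not new:
            return facts
        facts |= new

facts = bottom_up(RULES, ("Kim", "saw", "Sandy"))
for lhs, v in sorted(facts):
    print(f"{lhs} ⇒* {' '.join(v)}")
```

No sentential forms mixing terminals and nonterminals ever appear; only pairs in N × Σ* are computed, and the input is accepted when (S, w) is among them.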
CFGs as Logic Programs on Strings
A → w0 B1 w1 … Bn wn
A(w0 x1 w1 … xn wn) ← B1(x1),…,Bn(xn)
Horn clause
L(G) = { w ∈ Σ* | G ⊢ S(w) }
Derivation = Proof Tree
S(Kim saw Sandy)
NP(Kim) VP(saw Sandy)
V(saw) NP(Sandy)
G ⊢ S(Kim saw Sandy)
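The correspondence between derivations and proof trees can be sketched with a small backward-chaining prover. This is a hedged illustration, not a general parser: it assumes the toy grammar above, binds each variable to a nonempty substring (so it would loop on grammars with ε-rules or unit cycles), and the names `prove`, `match`, and `show` are made up for the example.

```python
# Toy grammar assumed for illustration; each rule A → w0 B1 w1 … Bn wn is read
# as the Horn clause A(w0 x1 w1 … xn wn) ← B1(x1),…,Bn(xn).
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("Kim",)),
    ("NP", ("Sandy",)),
    ("V", ("saw",)),
]
NONTERMINALS = {lhs for lhs, _ in RULES}

def prove(pred, words):
    """Return a proof tree (conclusion, subproofs) for pred(words), or None."""
    for lhs, rhs in RULES:
        if lhs == pred:
            for children in match(rhs, words):
                return (f"{pred}({' '.join(words)})", children)
    return None

def match(rhs, words):
    """Yield lists of subproofs for each way rhs can cover words exactly."""
    if not rhs:
        if not words:
            yield []
        return
    head, rest = rhs[0], rhs[1:]
    if head in NONTERMINALS:
        for k in range(1, len(words) + 1):  # assumption: nonempty bindings only
            sub = prove(head, words[:k])
            if sub is not None:
                for tail in match(rest, words[k:]):
                    yield [sub] + tail
    elif words and words[0] == head:
        yield from match(rest, words[1:])

def show(tree, depth=0):
    """Print a proof tree with one level of indentation per inference step."""
    conclusion, children = tree
    print("  " * depth + conclusion)
    for child in children:
        show(child, depth + 1)

proof = prove("S", ("Kim", "saw", "Sandy"))
show(proof)
```

The printed tree has S(Kim saw Sandy) at the root with NP(Kim) and VP(saw Sandy) below it, mirroring the proof tree shown above.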
It’s a small step to use n-ary predicates for n ≥ 2.
Multiple Context-Free Grammars
A(α1,…,αq) ← B1(x1,1,…,x1,q1),…,Bn(xn,1,…,xn,qn)
n ≥ 0, q, qi ≥ 1
αk ∈ (Σ ∪ { xi,j | i ∈ [1,n], j ∈ [1,qi] })*
each xi,j occurs exactly once in (α1,…,αq)
• q = dim(A) (dimension of A)
• dim(S) = 1
• L(G) = { w ∈ Σ* | G ⊢ S(w) }
It’s best to think of an MCFG as a kind of logic program.
Each rule is a definite clause.
Nonterminals are predicates on strings.
S(x1#x2) ← D(x1, x2)
D(ε, ε) ←
D(x1y1, y2x2) ← E(x1,x2), D(y1,y2)
E(ax1ā, āx2a) ← D(x1,x2)
{ w#wᴿ | w ∈ D1* }
S(aaāāaā#āaāāaa)
  D(aaāāaā, āaāāaa)
    E(aaāā, āāaa)
      D(aā, āa)
        E(aā, āa)
          D(ε, ε)
        D(ε, ε)
    D(aā, āa)
      E(aā, āa)
        D(ε, ε)
      D(ε, ε)
2-MCFG, 2-ary branching
derivation tree
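The four clauses of this 2-MCFG can be run directly as a bottom-up logic program on pairs of strings. The sketch below makes some assumptions for illustration: the letter ā is written as "A", tuples are kept only while each component has at most `max_len` letters (so the fixed point is finite), and the function name `dyck_mirror_language` is hypothetical.

```python
def dyck_mirror_language(max_len):
    """Strings w#wR derivable with len(w) <= max_len ("A" stands for ā)."""
    D = {("", "")}  # D(ε, ε) ←
    E = set()
    while True:
        # E(a x1 ā, ā x2 a) ← D(x1, x2)
        new_E = {("a" + x1 + "A", "A" + x2 + "a") for (x1, x2) in D}
        # D(x1 y1, y2 x2) ← E(x1, x2), D(y1, y2)
        new_D = {(x1 + y1, y2 + x2) for (x1, x2) in E for (y1, y2) in D}
        # Both components always have equal length, so checking one suffices.
        new_E = {t for t in new_E if len(t[0]) <= max_len} - E
        new_D = {t for t in new_D if len(t[0]) <= max_len} - D
        if not new_E and not new_D:
            break
        E |= new_E
        D |= new_D
    # S(x1 # x2) ← D(x1, x2)
    return {x1 + "#" + x2 for (x1, x2) in D}

language = dyck_mirror_language(6)
print("aaAAaA#AaAAaa" in language)  # prints True
print(sorted(language, key=len))
```

Each derived pair consists of a Dyck word and its reversal, built simultaneously; that simultaneous construction of two components is exactly what a single unary predicate on strings cannot express in this form.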
S(x1…xm) ← A(x1,…,xm)
A(ε,…,ε) ←
A(a1 x1 a2,…,a2m−1 xm a2m) ← A(x1,…,xm)
non-branching m-MCFG
{ a1^n a2^n … a2m−1^n a2m^n | n ≥ 0 } ∈ m-MCFL − (m−1)-MCFL
CFL = 1-MCFL ⊊ 2-MCFL ⊊ … ⊊ (m−1)-MCFL ⊊ m-MCFL
Seki et al. 1991
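As a small worked illustration of the non-branching m-MCFG, the sketch below applies the recursive clause n times to the tuple (ε,…,ε) and then the S clause. The function name `counting_word` and the representation of letters as strings "a1", "a2", … are assumptions made for the example.

```python
def counting_word(m, n):
    """Derive a1^n a2^n … a2m^n with the non-branching m-MCFG, as a list of letters."""
    # A(ε,…,ε) ←
    components = [[] for _ in range(m)]
    for _ in range(n):
        # A(a1 x1 a2, …, a2m−1 xm a2m) ← A(x1,…,xm)
        components = [
            [f"a{2 * k + 1}"] + x + [f"a{2 * k + 2}"]
            for k, x in enumerate(components)
        ]
    # S(x1…xm) ← A(x1,…,xm)
    return [letter for x in components for letter in x]

print(" ".join(counting_word(2, 2)))  # prints a1 a1 a2 a2 a3 a3 a4 a4
```

The k-th component grows as a2k−1^n a2k^n, and all m components are kept in step by a single predicate of dimension m.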
MCFGs as Logic Programs on Strings
Annius Groenink
Surface without Structure: Word order and tractability issues in natural language analysis
..., daß Frank Julia Fred schwimmen helfen sah
...that Frank saw Julia help Fred swim
...dat Frank Julia Fred zag helpen zwemmen
Elementary Formal Systems
Smullyan 1961
Elementary Formal Systems
“Elementary Formal Systems can be looked at as variants of the canonical languages of Post …. The reason for our choice of elementary formal systems, in lieu of Post’s canonical systems, is that their structure is more simply described, and their techniques are more easily applied. The general notion of “production” used in Post systems is replaced by the simpler logistic rules of substitution and detachment (modus ponens), which are more easily formalized.”
Smullyan 1961
Chomsky Hierarchy
Rewriting Systems | Machines         | Logic Programs on Strings                                            | Languages
Type 0            | Turing           | Elementary Formal Systems (Smullyan 1961)                            | r.e.
Type 1            | LBA              | Length-Bounded EFS (Arikawa et al. 1989)                             | CSL = NSPACE(n)
-                 | Poly-time Turing | Simple LMG (Groenink 1997) / Hereditary EFS (Ikeda and Arimura 1997) | P
-                 | -                | MCFG                                                                 | MCFL
Type 2            | PDA              | Simple EFS (Arikawa 1970)                                            | CFL
Type 3            | FA               | -                                                                    | Reg
recursively enumerable
context-sensitive
multiple context-free
context-free
regular
head
Chomsky on Mathematical Linguistics (1979)
Chomsky (2004): Generative Enterprise Revisited
. . .
. . .
A long quote, from an interview held in the year when Hopcroft and Ullman’s textbook was published.
Summary
• The four levels of the Chomsky hierarchy are not of equal importance.
• The choice of semi-Thue systems as the most general form of grammar hampered investigation of some important language classes.
• Smullyan’s elementary formal system provides the right level of generality and simplicity.
• Bottom-up semantics of rules is basic.
• Natural generalization of “inductive definitions”.
What is the Significance of Grammar Formalisms?
• Abstract conception of grammars.
• Makes possible investigation of algorithmic properties of grammars.
• Should have good formal properties (like familiar closure properties).
• The choice of a grammar formalism should not be thought of as a tight characterization of possible human grammars.
• A grammar formalism does not have explanatory power.