Why Syntax is Impossible
Mike Dowman
Syntax
Languages have tens of thousands of words
Some combinations of words make valid sentences
Others don’t
No one understands the grammar of any language
Syntax is Complicated!
I saw Bill with Mary yesterday.
You saw WHO with Mary yesterday?!
Who did you see with Mary yesterday?
I saw Bill and Mary yesterday.
You saw WHO and Mary yesterday?!
*Who did you see and Mary yesterday?
Generative Grammar
An explicit formal system that defines the set of valid sentences in a language
And maybe also explains what each one means
Generative grammar is the core research topic in linguistics
Includes strongly nativist theories and theories proposing that languages are primarily learned
Grammar Writing
Linguists take a selection of possible sentences
And obtain grammaticality judgments for those sentences
Then they produce a grammar that accounts for all the data
Grammar Coverage
Linguists’ grammars only work for selected sentences
They can’t explain most naturally occurring sentences
The more data we consider, the more surprising quirks of syntax emerge
Children’s Language Acquisition
Kids observe a limited number of example sentences
But quickly internalize a system that correctly characterizes the whole language
[Diagram: E-language, LAD (language acquisition device), I-language]
How can kids do syntax when linguists can’t?
Innate component of language (provided by genes)
Learned component of language (provided by language data)
Linguists have to infer both
Children only have to infer the learned component
Information Theory
Both components of language must contain some amount of information
The data available to children must provide at least as much information as the learned component contains
This puts a limit on the complexity of the learned component of language
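The bound can be made concrete with a minimal sketch, assuming each observation available to the child carries at most a fixed number of bits (`bits_per_observation` is a hypothetical parameter; the slides give no such figure). A learned component chosen uniformly from all n-bit strings holds n bits, so dividing gives a floor on how many observations are needed:

```python
import math


def min_observations(component_bits: float, bits_per_observation: float) -> int:
    """Lower bound on the number of observations needed to convey
    component_bits of information when each observation carries at most
    bits_per_observation bits (a hypothetical per-observation cap)."""
    if bits_per_observation <= 0:
        raise ValueError("each observation must carry a positive amount of information")
    return math.ceil(component_bits / bits_per_observation)


# A learned component chosen uniformly from all 1,000-bit strings holds 1,000 bits.
print(min_observations(1_000, 1.0))      # 1000
print(min_observations(1_000_000, 1.0))  # 1000000
```

This is a necessary condition only: meeting the bound does not mean a real learner extracts every available bit.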
Linguists’ Task
Linguists need at least as much information as the learned and innate components contain together
They can use data from multiple languages to try to characterize the innate component
And they can use both positive and negative data
Correspondence to Linguistic Theories
Small learned component = parameter setting
Large learned component = learned languages
Small innate component = general learning mechanism
Large innate component = universal grammar
Size of Each Component
(learn = how hard the language is for children to learn; ling = how hard it is for linguists to write a grammar)
                    Innate component
Learned component   small                large                huge
small               learn = easy         learn = easy         learn = easy
                    ling = easy          ling = hard          ling = impossible
large               learn = hard         learn = hard         learn = hard
                    ling = hard          ling = hard          ling = impossible
huge                learn = impossible   learn = impossible   learn = impossible
                    ling = impossible    ling = impossible    ling = impossible
Which component is large?
As no one has yet managed to produce a complete generative grammar, at least one of the innate or learned components must be large
Children learn relatively easily, so the learned component can’t be too big
How big could the innate component be?
The genome contains about 3 billion base pairs = 6 billion bits (2 bits per base pair)
Cell metabolism adds more information
Bases can also be chemically modified (e.g. by DNA methylation)
Huge amount of information!
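The slide’s figure follows from simple arithmetic, sketched below: with four possible bases, each base pair encodes log2(4) = 2 bits. This measures raw storage capacity only and says nothing about how much of it could encode language.

```python
import math

BASE_PAIRS = 3_000_000_000          # approximate genome size, from the slide
BITS_PER_BASE_PAIR = math.log2(4)   # four possible bases (A, C, G, T) -> 2 bits

genome_bits = BASE_PAIRS * BITS_PER_BASE_PAIR
genome_megabytes = genome_bits / 8 / 1_000_000

print(f"{genome_bits:.0f} bits")     # 6000000000 bits
print(f"{genome_megabytes:.0f} MB")  # 750 MB
```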
What could be in a huge innate component?
Not word forms: these vary from language to language
Grammaticality patterns: the rules of syntax would be hugely complex
Impossibility of Syntax
On average, each grammaticality judgment can provide no more than one bit of information
If syntax is hugely complex, there will be many grammars that are compatible with any given body of data
But all but one of these grammars would fail when tested on enough new data
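A toy illustration, under assumptions not taken from the slides: grammars are all 12-bit integers, and a grammar’s verdict on a sentence is one bit of a SHA-256 hash (`judgment` and `consistent_grammars` are hypothetical names). The binary-entropy function shows why a yes/no judgment carries at most one bit, and the count shows each judgment eliminating roughly half of the remaining candidates, so a hugely complex grammar needs a correspondingly huge number of judgments before only one candidate survives:

```python
import hashlib
import math
import random


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no judgment that comes out 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def judgment(grammar: int, sentence: int) -> bool:
    """Hypothetical grammaticality function: one pseudo-random bit of a SHA-256 hash."""
    digest = hashlib.sha256(f"{grammar}:{sentence}".encode()).digest()
    return (digest[0] & 1) == 1


def consistent_grammars(true_grammar: int, grammar_bits: int,
                        num_judgments: int, seed: int = 0) -> int:
    """Count the grammar_bits-bit grammars that agree with true_grammar on
    num_judgments randomly chosen 100-bit sentences."""
    rng = random.Random(seed)
    sentences = [rng.getrandbits(100) for _ in range(num_judgments)]
    data = [(s, judgment(true_grammar, s)) for s in sentences]
    return sum(
        all(judgment(g, s) == ok for s, ok in data)
        for g in range(2 ** grammar_bits)
    )


print(binary_entropy(0.5))  # 1.0: a balanced judgment carries at most one bit
print(binary_entropy(0.9))  # less than 1 bit when judgments are lopsided
for k in (0, 4, 8, 12, 16):
    # Starts at 4096 candidates and shrinks roughly by half per extra judgment.
    print(k, consistent_grammars(true_grammar=123, grammar_bits=12, num_judgments=k))
```

The true grammar always survives, so the count never drops below one; the point is how many judgments it takes to get there.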
A Concrete Example
A multi-agent model
Each agent has:
  an innate component
  a learned component
Both are bit strings of fixed length
Sentences are 100-bit strings
Deciding on the Grammaticality of a Sentence 1
Treat the sentence as a binary number s and find:
$b_i = s \bmod n_i$
$b_l = s \bmod n_l$
b is an index to a bit in the innate ($b_i$) or learned ($b_l$) component
n is the number of bits in the innate ($n_i$) or learned ($n_l$) component
s is the sentence, interpreted as a binary number
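A minimal sketch of the index calculation, treating the sentence itself as the binary number s, so that b_i = s mod n_i and b_l = s mod n_l. The function name and the string-of-bits input format are assumptions for illustration; a 4-bit sentence is shown alongside a 100-bit one so the arithmetic can be checked by hand:

```python
def selected_bit_indices(sentence: str, n_innate: int, n_learned: int) -> tuple[int, int]:
    """Return (b_i, b_l) for a sentence given as a string of '0'/'1' characters."""
    s = int(sentence, 2)  # treat the sentence as a binary number
    return s % n_innate, s % n_learned


print(selected_bit_indices("1101", 8, 5))            # (5, 3): 13 mod 8 = 5, 13 mod 5 = 3
print(selected_bit_indices("1" * 100, 1000, 1000))   # (375, 375): 2**100 - 1 ends in ...375
```

Because the index is taken modulo the component size, changing any bit of the sentence can change which innate and learned bits are consulted.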
Deciding on the Grammaticality of a Sentence 2
A pseudo-random function maps from the two selected bits plus the sentence to a Boolean grammaticality judgment
It’s therefore typically necessary to know every bit of the sentence and both the innate and learned bits to predict the grammaticality of the sentence
Every bit counts
Usually about half of all sentences are grammatical and half ungrammatical
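The slides do not say which pseudo-random function the model uses; the sketch below assumes SHA-256 over the two selected bits and the sentence (`is_grammatical` is a hypothetical name). It checks two properties stated above: roughly half of random sentences come out grammatical, and flipping the single relevant learned bit changes the verdict for roughly half of the sentences, which is what "every bit counts" amounts to here:

```python
import hashlib
import random


def is_grammatical(sentence: int, innate: list[int], learned: list[int]) -> bool:
    """Toy grammaticality judgment (SHA-256 stands in for the unspecified
    pseudo-random function)."""
    innate_bit = innate[sentence % len(innate)]     # bit at index b_i
    learned_bit = learned[sentence % len(learned)]  # bit at index b_l
    # The verdict depends on both selected bits and on the whole sentence.
    key = f"{innate_bit}{learned_bit}:{sentence}".encode()
    return (hashlib.sha256(key).digest()[0] & 1) == 1


rng = random.Random(0)
innate = [rng.randint(0, 1) for _ in range(1000)]
learned = [rng.randint(0, 1) for _ in range(1000)]
sentences = [rng.getrandbits(100) for _ in range(10_000)]

share = sum(is_grammatical(s, innate, learned) for s in sentences) / len(sentences)
print(f"share judged grammatical: {share:.2f}")  # close to one half

# Flip only the relevant learned bit and see how often the verdict changes.
flipped = list(learned)
changed = 0
for s in sentences:
    j = s % len(learned)
    flipped[j] ^= 1
    changed += is_grammatical(s, innate, flipped) != is_grammatical(s, innate, learned)
    flipped[j] ^= 1  # restore the bit before the next sentence
change_rate = changed / len(sentences)
print(f"verdict changed by one flipped bit: {change_rate:.2f}")  # also close to one half
```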
4 Kinds of Agent
Teacher     Innate: 10101000   Learned: 10010101
Related     Innate: 10101000   Learned: 11110001
Unrelated   Innate: 10110101   Learned: 00111000
Linguist    Innate: 00110100   Learned: 10001100
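The example bit strings show how the agents relate: the related agent’s innate string is identical to the teacher’s, while the unrelated agent and the linguist have different innate strings. The sketch below sets up agents that way; random initialisation of the other components, the `Agent` class, and `make_agents` are assumptions for illustration:

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    innate: list[int]   # fixed-length innate bit string
    learned: list[int]  # fixed-length learned bit string


def random_bits(n: int, rng: random.Random) -> list[int]:
    return [rng.randint(0, 1) for _ in range(n)]


def make_agents(n_innate: int, n_learned: int, seed: int = 0) -> dict[str, Agent]:
    rng = random.Random(seed)
    teacher = Agent(random_bits(n_innate, rng), random_bits(n_learned, rng))
    return {
        "teacher": teacher,
        # Related: copies the teacher's innate component; learned bits start random.
        "related": Agent(list(teacher.innate), random_bits(n_learned, rng)),
        # Unrelated: both components independent of the teacher's.
        "unrelated": Agent(random_bits(n_innate, rng), random_bits(n_learned, rng)),
        # Linguist: starts from arbitrary guesses for both components.
        "linguist": Agent(random_bits(n_innate, rng), random_bits(n_learned, rng)),
    }


agents = make_agents(8, 8)
for name, agent in agents.items():
    print(f"{name:<10} innate: {''.join(map(str, agent.innate))}  "
          f"learned: {''.join(map(str, agent.learned))}")
```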
Learning by Related and Unrelated Agents
Observe a sentence from the teacher
Work out if it is grammatical according to the agent’s current I-language
If not, invert the relevant bit of the learned component
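A minimal sketch of this error-driven rule, assuming the teacher only utters sentences it judges grammatical, that "the relevant bit" is the learned bit at index b_l = s mod n_l, and that SHA-256 stands in for the model’s pseudo-random function (`learn` and `is_grammatical` are hypothetical names):

```python
import hashlib
import random


def is_grammatical(sentence: int, innate: list[int], learned: list[int]) -> bool:
    """Toy grammaticality test; SHA-256 is an assumed pseudo-random function."""
    ib = innate[sentence % len(innate)]
    lb = learned[sentence % len(learned)]
    digest = hashlib.sha256(f"{ib}{lb}:{sentence}".encode()).digest()
    return (digest[0] & 1) == 1


def learn(learner_innate: list[int], initial_learned: list[int],
          teacher_innate: list[int], teacher_learned: list[int],
          num_sentences: int, seed: int = 0) -> list[int]:
    """Flip the relevant learned bit whenever a sentence heard from the
    teacher is judged ungrammatical by the learner."""
    rng = random.Random(seed)
    learned = list(initial_learned)  # copy so the caller's list is untouched
    heard = 0
    while heard < num_sentences:
        s = rng.getrandbits(100)
        # Assumption: the teacher only utters sentences it judges grammatical.
        if not is_grammatical(s, teacher_innate, teacher_learned):
            continue
        heard += 1
        if not is_grammatical(s, learner_innate, learned):
            learned[s % len(learned)] ^= 1  # invert bit b_l = s mod n_l
    return learned


rng = random.Random(1)
teacher_innate = [rng.randint(0, 1) for _ in range(50)]
teacher_learned = [rng.randint(0, 1) for _ in range(50)]
start = [rng.randint(0, 1) for _ in range(50)]

# A related learner shares the teacher's innate component.
related = learn(teacher_innate, start, teacher_innate, teacher_learned, 5_000)
print(related == teacher_learned)
```

When the learner shares the teacher’s innate component, a correct learned bit always reproduces the teacher’s verdict, so this rule only ever flips wrong bits; with enough sentences the related learner should converge on the teacher’s learned component. An unrelated learner has no such guarantee, because its innate bits can disagree with the teacher’s.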
Grammar Inference by Linguists
Choose random sentences
Ask the teacher if they are grammatical
Store all sentences and grammaticality judgments
Search for a setting of innate and learned components that assigns the correct grammaticality rating to every sentence
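The slides do not say how the linguist searches, so the sketch below uses plain exhaustive search over every innate and learned setting, which is only feasible for toy sizes: the space has 2^(n_i + n_l) members. The teacher’s components, their sizes, and the SHA-256 grammaticality function are all hypothetical:

```python
import hashlib
import itertools
import random


def is_grammatical(sentence, innate, learned) -> bool:
    """Toy grammaticality test; SHA-256 is an assumed stand-in for the
    model's pseudo-random function."""
    ib = innate[sentence % len(innate)]
    lb = learned[sentence % len(learned)]
    digest = hashlib.sha256(f"{ib}{lb}:{sentence}".encode()).digest()
    return (digest[0] & 1) == 1


def consistent_settings(judgments, n_innate, n_learned):
    """Brute-force search over all 2**(n_innate + n_learned) settings, keeping
    every (innate, learned) pair that reproduces all stored judgments."""
    found = []
    for innate in itertools.product((0, 1), repeat=n_innate):
        for learned in itertools.product((0, 1), repeat=n_learned):
            if all(is_grammatical(s, innate, learned) == ok for s, ok in judgments):
                found.append((innate, learned))
    return found


rng = random.Random(2)
teacher = ((1, 0, 1), (1, 0, 0, 1))  # hypothetical teacher: 3 innate bits, 4 learned bits

# Choose random sentences, ask the teacher, and store every judgment
# (grammatical and ungrammatical answers alike).
sentences = [rng.getrandbits(100) for _ in range(40)]
judgments = [(s, is_grammatical(s, *teacher)) for s in sentences]

candidates = consistent_settings(judgments, n_innate=3, n_learned=4)
print(len(candidates), teacher in candidates)

# Check each surviving candidate against fresh, unseen sentences.
fresh = [rng.getrandbits(100) for _ in range(1_000)]
for innate, learned in candidates:
    agree = sum(is_grammatical(s, innate, learned) == is_grammatical(s, *teacher)
                for s in fresh)
    print(innate, learned, agree / len(fresh))
```

With 1,000-bit components the same loops would need 2^2000 iterations, so any realistic search has to be heuristic; the final loop shows the kind of check on fresh sentences that reveals whether a consistent setting actually generalizes.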
[Plot: 1,000 Bit Innate and Learned Components. x-axis: Number of Example Sentences (0 to 20,000); y-axis from 0.6 to 1; lines: related, unrelated, linguist]
[Plot: 1,000 Bit Innate Component, 1,000,000 Bit Learned Component. x-axis: Number of Example Sentences (0 to 20,000); y-axis from 0.6 to 1; lines: related, unrelated, linguist]
[Plot: 1,000,000 Bit Innate Component, 1,000 Bit Learned Component. x-axis: Number of Example Sentences (0 to 20,000); y-axis from 0.6 to 1; lines: related, unrelated, linguist]
Implications of Impossible Syntax
A linguist can write a grammar that will adequately characterize any body of data
But it will fail when tested on new data
Partial grammars are not a stepping stone to complete generative grammars
A Universal Law of Generative Grammar
Generative grammar is impossible if:
H(learned component) + H(innate component) > H(language data)
Unless we can use information from another source (genetic, neuroscientific, psycholinguistic)
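The inequality can be read as a necessary condition, sketched below with hypothetical names. `h_other` stands for information from another source (genetic, neuroscientific, psycholinguistic); the entropy figures are illustrative only, borrowing component sizes from the plots and treating 20,000 judgments as at most 20,000 bits:

```python
def has_enough_information(h_learned: float, h_innate: float,
                           h_data: float, h_other: float = 0.0) -> bool:
    """Necessary (not sufficient) condition for a complete generative grammar:
    the information available must cover both components together."""
    return h_learned + h_innate <= h_data + h_other


# Illustrative numbers only: 20,000 judgments at no more than 1 bit each.
h_data = 20_000 * 1.0
print(has_enough_information(1_000, 1_000, h_data))       # True
print(has_enough_information(1_000, 1_000_000, h_data))   # False
print(has_enough_information(1_000, 1_000_000, h_data, h_other=1_000_000))  # True
```

Returning True does not mean a complete grammar can be written, only that this information-theoretic obstacle is absent.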
Why do Syntax?
Studying generative grammar may tell us something about the human mind
It won’t help us build natural language processing systems
Is studying rare and obscure constructions the best way to do syntax?
Conclusion
The idea that we can characterize a language by considering enough linguistic data is a hypothesis
It’s very unlikely that it’s possible to write a complete generative grammar