PARSING TECHNIQUES - linguistics.byu.edu/classes/ling581dl/humansynsem.pdf

PARSING TECHNIQUES (Human sentence processing) 1


Comprehension in syntax is:
• Incremental, word-by-word, on-line
• Time course becomes important
• Interleaved with semantics
• Incredibly fast
• Robust
• Subject to limitations

Review
• Reading, saccades
• Garden-path sentences
• Unproblematic ambiguities
• Uneven processing-time profile
• Parsing can be done top-down or bottom-up
• What do we do?
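The top-down/bottom-up distinction can be sketched with a toy grammar. Everything below (the grammar, lexicon, and the greedy reduction strategy) is illustrative, not from the lecture:

```python
# Toy grammar and lexicon, invented for illustration.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}
LEXICON = {"the": "Det", "dog": "N", "cat": "N", "bit": "V"}

def top_down(cat, words):
    """Recursive-descent (top-down): expand categories first, then
    check the words. Returns leftover words on success, None on failure."""
    if cat in LEXICON.values():  # preterminal: must match the next word
        return words[1:] if words and LEXICON.get(words[0]) == cat else None
    for rhs in GRAMMAR.get(cat, []):
        rest = words
        for sub in rhs:
            rest = top_down(sub, rest)
            if rest is None:
                break
        else:
            return rest
    return None

def bottom_up(words):
    """Naive shift-reduce (bottom-up): shift a word, then reduce greedily.
    Greedy reduction commits too early on 'the dog bit the cat' (it
    closes the VP at 'bit'), a mechanical analogue of a garden path."""
    stack, buf = [], list(words)
    while buf or stack != ["S"]:
        reduced = False
        for lhs, rules in GRAMMAR.items():
            for rhs in rules:
                if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]
                    reduced = True
                    break
            if reduced:
                break
        if not reduced:
            if not buf:
                return False
            stack.append(LEXICON[buf.pop(0)])  # shift the next word
    return True
```

The greedy bottom-up recognizer succeeds on "the dog bit" but fails on "the dog bit the cat" because it reduces too early; which strategy (or mixture) humans actually use is exactly the question the slides raise.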

Timed reading experiments
• More difficult => more time to process
• Bar-pressing experiments
• Word-wise presentation: press the space bar for the next word
• NOT equal times across the sentence
• Ends of phrases and clauses take noticeably longer to process (i.e., some integration going on)
• Longer for nouns, verbs; less time for function words (prepositions, determiners, etc.)

4

5

English sentence types
• Active agent-type
• Passive agent-type

What's different about these?

• Active "fear"-type
• Active "frighten"-type
• Atypical realization
• Experiencer-Stimulus and vice versa
• More WM required

6

Processing difficulty

• Actives: easier than passives
• Intransitives: easier than transitives, ditransitives


AD and verbal processing
• Verb-specific deficits common
• Effect of working memory, attention: controversial
• Working-memory decline
• Syntax doesn't seem too affected, though
• Almost as good as unimpaired
• Semantics is problematic, though
• How?
• So far all AD findings: grammar not affected, resources OK

10

The stimuli

12

Results

13

Perception and cognition
• Sensory information
• Receive input
• Organize signal
• Identify patterns
• Interpret output
• Learning
• Memory
• Expectation
• Process perception
• Apply knowledge
• Attitudes
• Thought
• Agency
• Intelligence
• Learning
• Memory
• Attention
• Goals, agendas, plans

14

The problem: bottleneck
• Working memory (WM) is finite
• Evanescent: subject to decay
• Limited: can be overwhelmed
• Central: all processes funneled through it
• Strategic: attention must be focused

15

Issues
• On-line parsing (vs. batch)
• Syntactic well-formedness
• Syntactic complexity
• Overgeneration, undergeneration
• Strategies
• Crosslinguistic differences
• Interactions with other processing

16

Limitations: processing overload


Phenomena and judgements

• Ambiguity (e.g. PP attachments)
• I saw a man with a telescope.
• Subcategorization
• She kept the dogs on the beach.
• Syntactic processing phenomena
• Unproblematic (local ambiguity gets resolved)
• Problematic (garden-path)
• Modeling (statistical, cognitive)

19

Unproblematic ambiguities
• I knew the man.
• I knew the man hated me passionately.
• When the boys strike the dog kills.
• When the boys strike the dog the cat hisses.
• Without her we failed.
• Without her contributions we failed.
• Is the block in the box?
• Is the block in the box red?
• The building blocks are red.
• The building blocks the sun.

20


Garden Path Constructions (1)
• GP1: Since Jay always jogs a mile seems like a short distance to him.

• GP2: The girls believe the man who claims the ugly boys struck the dog killed the cat.

• GP3: I believe that John smokes annoys Mary.

• GP4: Before the boy kills the man the dog bites strikes.

• GP5: When the horse kicks the boy the dog bites the man.

• GP6: Without her contributions failed to come in.

• GP7: I convinced her professors hate me.

• GP8: The doctor warned the patient would be contagious.

• GP9: John gave the boy the dog bit a dollar.

• GP10: Sue gave the man who was racing the car.

• GP11: The psychologist told the wife that he was having trouble with to leave.

• GP12: I sent the letters to Ron to Rex.

• GP13: The psychologist told the wife that he was having trouble with her husband.

• GP14: The horse raced past the barn fell.

22

Garden Path Constructions (2)
• GP15: The boat floated sank.

• GP16: The woman brought the flowers smiled broadly.

• GP17: The dog that was fed next to the cat walked to the park chewed the bone.

• GP18: The building blocks the sun faded are red.

• GP19: The granite rocks by the seashore with the waves.

• GP20: The cotton clothing is made of grows in Mississippi.

• GP21: The old train the young.

• GP22: The boy got fat melted.

• GP23: Before she knew that she went to the store.

• GP24: I saw that white moose are ugly.

• GP25: That coffee tastes terrible surprised John.

• GP26: Have the boys given gifts by their friends.

23

Parser Breakdown
• PB1: The man that the woman that the dog bit likes eats fish.
• PB2: The man the woman the dog bit likes eats fish.
• PB3: Who did John donate the furniture that the repairman that the dog bit found?

• PB4: The man that the woman that won the race likes eats fish.
• PB5: That that Joe left bothered Susan surprised Max.
• PB6: The woman that for John to smoke would annoy works in the office.
• PB7: The company hired the woman that for John to smoke would annoy.
• PB8: Mary’s belief that for John to smoke would be annoying is apparent from her expression.

• PB9: John’s suspicion that a rumor that the election had not been run fairly was true motivated him to investigate further.

• PB10: The man who the possibility that students are dangers frightens is nice.
• PB11: Who does the information that the weapons that the government built don’t work properly affect most?

• PB12: It is the enemy’s defense strategy that the information that the weapons that the government built didn’t work properly affected.

• PB13: It is the enemy’s strategy that for the weapons to work would affect.
• PB14: What the information that the weapons that the government built didn’t work properly affected was the enemy’s defense strategy.

• PB15: What for the weapons to work properly would affect is the enemy’s defense strategy.

• PB16: Surprising though the information that the weapons that the government built didn’t work properly was, no one took advantage of the mistakes.

• PB17: Surprising though for the weapons to work properly would be for the general populace, it would not surprise some military officials.

24

Bever’s strategy
• The canonical sentoid strategy:
• The first N..V..(N).. clause is the main clause, unless the verb is marked as subordinate.
• #The horse raced past the barn fell.
• #The editor authors the newspaper hired liked laughed.
• But predicts bad: The desert trains are especially tough on young people.

25

Frazier’s principles
• Minimal attachment
• Attach incoming material using the fewest nodes, consistent with the grammar.
• Late closure
• When possible, attach incoming words into the phrase or clause currently being processed.
• MA has preference
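As a sketch, the two principles can be read as a ranked preference over candidate attachment sites. The tuple encoding and the node counts below are invented for illustration:

```python
def choose_attachment(candidates):
    """Rank candidate attachment sites: Minimal Attachment prefers the
    fewest new nodes; Late Closure breaks ties in favor of the phrase
    currently being processed. Returns the index of the winner.
    Each candidate is (new_nodes, attaches_into_current_phrase)."""
    return min(
        range(len(candidates)),
        key=lambda i: (candidates[i][0], not candidates[i][1]),
    )

# "She kept the dogs on the beach": attach the PP under the VP
# (hypothetically 1 new node) or inside the object NP (2 new nodes).
np_attach = (2, False)
vp_attach = (1, True)
assert choose_attachment([np_attach, vp_attach]) == 1  # MA picks the VP site
```

MA fires first (fewest nodes); only when node counts tie does Late Closure decide, matching the slide's note that MA has preference.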

26

Deterministic parsing
• Three rules:
• All syntactic substructures created by the parser are permanent
• All syntactic substructures must be used
• There must be no temporary, throw-away substructures
• No need for backtracking, parallel results

27

Marcus parsing
• PARSIFAL (1980)
• First strictly deterministic parser
• Limited lookahead
• Three-constituent lookahead buffer
• Allows attachment decisions to be delayed
• Pattern-action rules make attachment decisions
• Parsing cost is linear w.r.t. input length
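The value of the three-cell buffer can be illustrated with Marcus's well-known "Have the students…" diagnostic. The rule below is a hypothetical pattern-action rule in the spirit of PARSIFAL, not its actual rule set, and the word lists are placeholders:

```python
# Illustrative word lists; a real parser would consult morphology.
PARTICIPLES = {"taken", "given", "eaten"}
BARE_FORMS = {"take", "give", "eat"}

def diagnose(buffer):
    """buffer: three constituent cells, e.g. ['have', NP, verb].
    With 'have' in cell 1 and an NP in cell 2, the verb form in
    cell 3 settles the analysis without guessing: a past participle
    signals a yes/no question, a bare form an imperative."""
    c1, c2, c3 = buffer
    if c1 == "have" and c2.startswith("NP"):
        if c3 in PARTICIPLES:
            return "yes/no question"
        if c3 in BARE_FORMS:
            return "imperative"
    return "undecided"

assert diagnose(["have", "NP:the-students", "taken"]) == "yes/no question"
assert diagnose(["have", "NP:the-students", "take"]) == "imperative"
```

Compare GP26 above ("Have the boys given gifts by their friends."): a parser without such lookahead must commit before seeing the disambiguating verb form.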

28

Minimal commitment
• Deterministic parsing, kind of…
• Dispense with lookahead
• Use D-theory instead
• Underspecification: not trees, but rather dominance relations
• Some problems: requires reworking of temporary structures

29

Yngve’s complexity measure
• Basis: # of categories
• On RHS of unexpanded rules
• That have not yet been found
• Assumes TD, DFS
• NL maximal complexity: 7 ± 2
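Yngve's measure can be computed from a parse tree by crediting each word with the categories predicted but not yet found when it is reached. The tuple tree encoding below is an assumption for illustration:

```python
def yngve_depths(tree, pending_above=0):
    """Per-word Yngve depth under a top-down, depth-first, left-to-right
    traversal: for each word, the number of RHS categories that have
    been predicted but not yet found. Trees are ("Cat", child, ...)
    tuples, with plain strings as words."""
    if isinstance(tree, str):
        return [pending_above]
    _cat, *children = tree
    depths = []
    for i, child in enumerate(children):
        still_awaited = len(children) - i - 1  # right siblings not yet found
        depths.extend(yngve_depths(child, pending_above + still_awaited))
    return depths

tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "bit"),
               ("NP", ("Det", "the"), ("N", "cat"))))
assert yngve_depths(tree) == [2, 1, 1, 1, 0]
assert max(yngve_depths(tree)) == 2   # well under the 7 ± 2 ceiling
```

Right-branching structure keeps the depth low; left-branching structure stacks up pending predictions, which is what pushes sentences toward the 7 ± 2 limit.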

30

Left-corner parsing
• Combines TD, BU parsing
• Parses:
• Leftmost category on RHS of rule, via BU parsing
• Rest of grammar rule from top down
• Doesn’t require huge stack space

31

Other complexity measures
• Node ratios: words/sentence
• # of interruptions in grammatical relations
• The man the cat the dog chased saw died.
• Two sentences: can parse at most 2 S’s at any one time (Kimball)

32

The Sausage Machine
• Two-stage parsing model
• Preliminary Phrase Packager (PPP)
• View input via a window, 6 words at a time; assign structure within this window
• Sentence Structure Supervisor (SSS)
• Consult the grammar to piece windows together
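A minimal sketch of the PPP's windowing idea; treating the windows as non-overlapping chunks is a simplification for illustration:

```python
def ppp_package(words, window=6):
    """Preliminary Phrase Packager sketch: view the input through a
    window of roughly six words at a time; structure is assigned only
    within each window. (The SSS would then consult the grammar to
    piece the packaged chunks together.)"""
    return [words[i:i + window] for i in range(0, len(words), window)]

chunks = ppp_package("the horse raced past the barn fell".split())
assert chunks == [["the", "horse", "raced", "past", "the", "barn"], ["fell"]]
```

Note how "fell" lands outside the window holding "the horse raced past the barn", leaving the PPP free to package the first six words as a complete clause: attachment trouble arises when related words fall in different windows.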

33

Frazier again (1985)
• Local nonterminal count
• Sum of the values of all nonterminals across any three adjacent terminals
• S or S’ get 1.5, others get 1
• Sentence value: maximal LNC
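The count can be computed from a parse tree; crediting each nonterminal to its leftmost word is one common reading of the measure, and the tree encoding is an assumption:

```python
def local_nonterminal_count(tree):
    """Sketch of the local nonterminal count: credit each nonterminal
    to its leftmost word (S and S' count 1.5, others 1); the sentence
    value is the maximum sum over any three adjacent words."""
    values = []                              # per-word nonterminal value

    def walk(node):
        if isinstance(node, str):            # a word: open a new position
            values.append(0.0)
            return len(values) - 1
        cat, *children = node
        leftmost = None
        for child in children:
            pos = walk(child)
            if leftmost is None:
                leftmost = pos
        values[leftmost] += 1.5 if cat in ("S", "S'") else 1.0
        return leftmost

    walk(tree)
    spans = range(max(1, len(values) - 2))
    return max(sum(values[i:i + 3]) for i in spans)

tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "bit"),
               ("NP", ("Det", "the"), ("N", "cat"))))
assert local_nonterminal_count(tree) == 6.5
```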

34

Lewis
• Max. # of constituents of a given category
• Theory, computational instantiation
• English (implemented: NL-Soar), other languages (theoretically)

35

Gibson
• Syntactic Prediction Locality Theory: memory load
• Sum of word-based loads for obligatory items
• Sum of discourse referents

36

Modeling syntactic processing

• (X)NL-Soar cognitive modeling system for natural language

• Most complete X-bar model consistent with lexical properties, syntactic principles

• Non-productive partial structures are later discarded

• Input for semantic processing

37

Those dogs chew leashes.

Modeling semantic processing

• Also done on a word-by-word basis
• Uses lexical-conceptual structure
• Leverages syntax
• Builds linkages between concepts
• Previous versions used 8 semantic primitives
• Coverage useful but inadequate
• Difficult to encode adequate distinctions
• WordNet lexfile names now used as semantic categories

38

Preliminary semantic objects

• Pieces of conceptual structure

• Correspond to lexical/phrasal constructions in syntactic model

• Compatible pieces fused together as appropriate

39

Pragmatic constraints

• Enforce compatibility of pieces of semantic model

• Reflect limited disambiguation
• Based on semantic classes
• Ensure proper linkages, reject improper ones
• Implemented as preferences for potential attachments

40

Final semantic model

• Most fully connected linkage
• Includes other sem-related properties not illustrated here

• Serves as input for further processing (discourse/dialogue, extralinguistic task-specific functions, etc.)

41

Word-sense disambiguation
• Word sense
• Choosing the most correct sense for a word in context
• Problem: WordNet senses too narrow (large # of senses)
• Avg. 4.74 senses for nouns (not a big problem)
• Avg. 8.63, high of 41 senses for verbs (a problem)
• Semantic classes
• Select the appropriate WordNet semantic class of a word in context
• An easier, more plausible task
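The class-selection idea can be sketched with the lecture's own examples. The miniature sense inventory and the allowed-subject table for v-stative are assumptions for illustration:

```python
# Miniature sense inventory: WordNet-style class names per word,
# most frequent sense first (illustrative, not the real inventory).
SENSES = {
    "yawn":     ["v-body", "v-stative"],
    "chair":    ["n-artifact", "n-person"],
    "woman":    ["n-person"],
    "crevasse": ["n-object"],
}
# Verb class -> allowed subject classes. The v-body row follows the
# slide's constraint (people, animals, groups); v-stative is assumed.
SUBJECT_OK = {
    "v-body":    {"n-person", "n-animal", "n-group"},
    "v-stative": {"n-object", "n-artifact", "n-person"},
}

def resolve(subject, verb):
    """Return the first compatible (subject class, verb class) pair,
    trying senses in frequency order; None if nothing is compatible."""
    for vclass in SENSES[verb]:
        for nclass in SENSES[subject]:
            if nclass in SUBJECT_OK.get(vclass, set()):
                return nclass, vclass
    return None

assert resolve("woman", "yawn") == ("n-person", "v-body")        # basic case
assert resolve("chair", "yawn") == ("n-person", "v-body")        # noun backs off
assert resolve("crevasse", "yawn") == ("n-object", "v-stative")  # verb backs off
```

The three assertions mirror the three worked examples on the following slides: the most frequent classes succeed for "the woman yawned", the noun backs off to n-person for "the chair yawned", and the verb backs off to v-stative for "the crevasse yawned".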

42

Sem constraint for #29 v-body
Most frequent verbs in class: wear, sneeze, yawn, wake up
• (Most frequent) Subjects:

• People• Animals• Groups

• Direct Objects:• Body Parts• Artifacts

• Indirect Objects: none

Subject Constraint

sp {top*access*body*external
   (state <g> ^top-state <ts> ^op <o>)
   (<o> ^name access)
   (<ts> ^sentence <word>)
   (<word> ^word-id.word-name <wordname>)
   (<word> ^wndata.vals.sense.lxf v-body)
-->
   (<word> ^semprofile <sempro> + &)
   (<sempro> ^category v-body
             ^annotation verbclass + &
             ^psense <wordname>
             ^external <subject>)
   (<subject> ^category *
              ^semcat n-animal + &
              ^semcat n-person + &
              ^psense *
              ^internal *empty*)
}

43

Sample sentence: The woman yawned
(basic case: most frequent senses succeed)

Syntax:
• First tree works.

Semantics:
• v-body & n-person match.
• v-stative never tried.

44

Example #2: The chair yawned
(most frequent noun sense inappropriate)

Syntax:
• chair(verb) rejected
• chair(noun) accepted

Semantics:
• chair(verb) senses rejected
• n-artifact incompatible w/ v-body
• n-person accepted

45

[Diagrams: candidate linkages v-social(chair); v-body(yawn)-E-n-artifact(chair), rejected; v-body(yawn)-E-n-person(chair), accepted]

Example #3: The crevasse yawned
(most frequent verb sense inappropriate)

46

Syntax: first tree works

Semantics:
• All noun senses incompatible w/ v-body
• n-object matches with v-stative

[Diagrams: v-body(yawn)-E-n-object(crevasse), rejected; v-stative(yawn)-E-n-object(crevasse), accepted]