1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning...

87
1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition

Transcript of 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning...

Page 1: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

1

David Chen & Raymond MooneyDepartment of Computer Sciences

University of Texas at Austin

Learning to Sportscast: A Test of Grounded Language Acquisition

Page 2: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

2

Motivation

• Constructing annotated corpora for language learning is difficult

• Children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment

Page 3: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

3

Goals

• Learn to ground the semantics of language

Block

• Learn language through correlated linguistic and visual inputs

Page 4: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

4

Challenge

Page 5: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

5

Challenge

Page 6: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

6

Challenge

A linguistic input may correspond to many possible events

Block ?

??

Page 7: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

7

Overview

• Sportscasting task

• Tactical generation

• Strategic generation

• Human evaluation

Page 8: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

8

Learning to Sportscast

• Robocup Simulation League games

• No speech recognition– Record commentaries in text form

• No computer vision– Ruled-based system to automatically extract

game events in symbolic form

• Concentrate on linguistic issues

Page 9: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

9

Robocup Simulation League

Page 10: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

10

Robocup Simulation League

Pink4’s pass was intercepted by Purple6

Page 11: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

11

Learning to Sportscast

• Learn to sportscast by observing sample human sportscasts

• Build a function that maps between natural language (NL) and meaning representation (MR)– NL: Textual commentaries about the game– MR: Predicate logic formulas that represent

events in the game

Page 12: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

12

Mapping between NL/MR

NL: “Purple3 passes the ball to Purple5”

MR: Pass ( Purple3, Purple5 )

Semantic Parsing (NL MR)

Tactical Generation (MR NL)

Page 13: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

13

Robocup Sportscaster TraceNatural Language Commentary Meaning Representation

Purple goalie turns the ball over to Pink8

badPass ( Purple1, Pink8 )

Pink11 looks around for a teammate

Pink8 passes the ball to Pink11

Purple team is very sloppy today

Pink11 makes a long pass to Pink8

Pink8 passes back to Pink11

turnover ( Purple1, Pink8 )

pass ( Pink11, Pink8 )

pass ( Pink8, Pink11 )

ballstopped

pass ( Pink8, Pink11 )

kick ( Pink11 )

kick ( Pink8)

kick ( Pink11 )

kick ( Pink11 )

kick ( Pink8 )

Page 14: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

14

Robocup Sportscaster TraceNatural Language Commentary Meaning Representation

Purple goalie turns the ball over to Pink8

badPass ( Purple1, Pink8 )

Pink11 looks around for a teammate

Pink8 passes the ball to Pink11

Purple team is very sloppy today

Pink11 makes a long pass to Pink8

Pink8 passes back to Pink11

turnover ( Purple1, Pink8 )

pass ( Pink11, Pink8 )

pass ( Pink8, Pink11 )

ballstopped

pass ( Pink8, Pink11 )

kick ( Pink11 )

kick ( Pink8)

kick ( Pink11 )

kick ( Pink11 )

kick ( Pink8 )

Page 15: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

15

Robocup Sportscaster TraceNatural Language Commentary Meaning Representation

Purple goalie turns the ball over to Pink8

badPass ( Purple1, Pink8 )

Pink11 looks around for a teammate

Pink8 passes the ball to Pink11

Purple team is very sloppy today

Pink11 makes a long pass to Pink8

Pink8 passes back to Pink11

turnover ( Purple1, Pink8 )

pass ( Pink11, Pink8 )

pass ( Pink8, Pink11 )

ballstopped

pass ( Pink8, Pink11 )

kick ( Pink11 )

kick ( Pink8)

kick ( Pink11 )

kick ( Pink11 )

kick ( Pink8 )

Page 16: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

16

Robocup Sportscaster TraceNatural Language Commentary Meaning Representation

Purple goalie turns the ball over to Pink8

P6 ( C1, C19 )

Pink11 looks around for a teammate

Pink8 passes the ball to Pink11

Purple team is very sloppy today

Pink11 makes a long pass to Pink8

Pink8 passes back to Pink11

P5 ( C1, C19 )

P2 ( C22, C19 )

P2 ( C19, C22 )

P0

P2 ( C19, C22 )

P1 ( C22 )

P1( C19 )

P1 ( C22 )

P1 ( C22 )

P1 ( C19 )

Page 17: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

17

Robocup Data

• Collected human textual commentary for the 4 Robocup championship games from 2001-2004.– Avg # events/game = 2,613– Avg # sentences/game = 509

• Each sentence matched to all events within previous 5 seconds.– Avg # MRs/sentence = 2.5 (min 1, max 12)

• Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).

Page 18: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

18

Overview

• Sportscasting task

• Tactical generation

• Strategic generation

• Human evaluation

Page 19: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

19

Tactical Generation

• Learn how to generate NL from MR

• Example:

• Two steps1. Disambiguate the training data

2. Learn a language generator

Pass(Pink2, Pink3) “Pink2 kicks the ball to Pink3”

Page 20: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

20

System Overview

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( Purple5, Purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

Page 21: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

21

System Overview

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( Purple5, Purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

Semantic Parser Learner

Initial SemanticParser

Page 22: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

22

System Overview

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

Initial SemanticParser

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink2 )

Kick ( pink5 )

Page 23: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

23

System Overview

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

SemanticParser

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink2 )

Kick ( pink5 )

Semantic Parser Learner

Page 24: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

24

System Overview

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

SemanticParser

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink5 )

Semantic Parser Learner

Turnover ( purple7 , pink2 )

Page 25: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

25

System Overview

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

SemanticParser

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink5 )

Semantic Parser Learner

Turnover ( purple7 , pink2 )

Page 26: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

26

System Overview

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

SemanticParser

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Semantic Parser Learner

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Page 27: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

27

Semantic Parser Learners

• Learn a function from NL to MR

NL: “Purple3 passes the ball to Purple5”

MR: Pass ( Purple3, Purple5 )

Semantic Parsing (NL MR)

Tactical Generation (MR NL)

•We experiment with two semantic parser learners–WASP (Wong & Mooney, 2006; 2007)–KRISP (Kate & Mooney, 2006)

Page 28: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

28

WASP: Word Alignment-based Semantic Parsing

• Uses statistical machine translation techniques– Synchronous context-free grammars (SCFG)

(Wu, 1997; Melamed, 2004; Chiang, 2005)– Word alignments (Brown et al., 1993; Och &

Ney, 2003)

• Capable of both semantic parsing and tactical generation

Page 29: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

29

KRISP: Kernel-based Robust Interpretation by Semantic Parsing

• Productions of MR language are treated like semantic concepts

• SVM classifier is trained for each production with string subsequence kernel

• These classifiers are used to compositionally build MRs of the sentences

• More resistant to noisy supervision but incapable of tactical generation

Page 30: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

30

Matching

• Ability to find correct NL/MR pair• 4 Robocup championship games from 2001-2004.

– Avg # events/game = 2,613– Avg # sentences/game = 509

• Leave-one-game-out cross-validation• Metric:

– Precision: % of system’s annotations that are correct– Recall: % of gold-standard annotations correctly

produced– F-measure: Harmonic mean of precision and recall

Page 31: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

31

Systems

Learner

KRISPER

(Kate & Mooney, 2007)

KRISP

WASPER WASP

WASPER-GEN WASP’s language generator

Page 32: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

32

KRISPER and WASPER

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

SemanticParser

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink5 )

Semantic Parser Learner

(KRISP/WASP)

Turnover ( purple7 , pink2 )

Page 33: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

33

Systems

Learner

KRISPER

(Kate & Mooney, 2007)

KRISP

WASPER WASP

WASPER-GEN WASP’s language generator

Page 34: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

34

WASPER-GEN

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

TacticalGenerator

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink5 )

Tactical Generator Learner(WASP)

Turnover ( purple7 , pink2 )

Page 35: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

35

Matching Results

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Average results on leave-one-game-out cross-validation

F-m

easu

re randomKRISPERWASPERWASPER-GEN

Page 36: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

36

Overview

• Sportscasting task

• Tactical generation

• Strategic generation

• Human evaluation

Page 37: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

37

Strategic Generation

• Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation).

• For automated sportscasting, one must be able to effectively choose which events to describe.

Page 38: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

38

Example of Strategic Generation

pass ( purple7 , purple6 )

ballstopped

kick ( purple6 )

pass ( purple6 , purple2 )

ballstopped

kick ( purple2 )

pass ( purple2 , purple3 )

kick ( purple3 )

badPass ( purple3 , pink9 )

turnover ( purple3 , pink9 )

Page 39: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

39

Example of Strategic Generation

pass ( purple7 , purple6 )

ballstopped

kick ( purple6 )

pass ( purple6 , purple2 )

ballstopped

kick ( purple2 )

pass ( purple2 , purple3 )

kick ( purple3 )

badPass ( purple3 , pink9 )

turnover ( purple3 , pink9 )

Page 40: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

40

Strategic Generation

• For each event type (e.g. pass, kick) estimate the probability that it is described by the sportscaster.

• Requires correct NL/MR matching– Use estimated matching from tactical

generation– Iterative Generation Strategy Learning

Page 41: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

41

Iterative Generation Strategy Learning (IGSL)

• Directly estimates the likelihood of an event being commented on

• Self-training iterations to improve estimates

• Uses events not associated with any NL as negative evidence

Page 42: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

42

Strategic Generation Performance

• Evaluate how well the system can predict which events a human comments on

• Metric:– Precision: % of system’s annotations that are correct

– Recall: % of gold-standard annotations correctly produced

– F-measure: Harmonic mean of precision and recall

Page 43: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

43

Strategic Generation Results

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Average results on leave-one-game-out cross-validation

F-m

easu

re

inferred fromWASPinferred fromKRISPERinferred fromWASPERinferred fromWASPER-GENIGSL

inferred fromgold matching

Page 44: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

44

Overview

• Sportscasting task

• Tactical generation

• Strategic generation

• Human evaluation

Page 45: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

45

• 4 fluent English speakers as judges• 8 commented game clips

– 2 minute clips randomly selected from each of the 4 games

– Each clip commented once by a human, and once by the machine

• Presented in random counter-balanced order• Judges were not told which ones were human or

machine generated

Human Evaluation (Quasi Turing Test)

Page 46: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

46

Demo Clip

• Game clip commentated using WASPER-GEN with IGSL, since this gave the best results for generation.

• FreeTTS was used to synthesize speech from textual output.

Page 47: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

47

Human Evaluation

ScoreEnglish Fluency

Semantic Correctness

Sportscasting Ability

5 Flawless Always Excellent

4 Good Usually Good

3 Non-native Sometimes Average

2 Disfluent Rarely Bad

1 Gibberish Never Terrible

Page 48: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

48

Human Evaluation

CommentatorEnglishFluency

Semantic Correctness

SportscastingAbility

Human 3.94 4.25 3.63

Machine 3.44 3.56 2.94

Difference 0.5 0.69 0.69

ScoreEnglish Fluency

Semantic Correctness

Sportscasting Ability

5 Flawless Always Excellent

4 Good Usually Good

3 Non-native Sometimes Average

2 Disfluent Rarely Bad

1 Gibberish Never Terrible

Page 49: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

49

Future Work

• Expand MRs to beyond simple logic formulas• Apply approach to learning situated language in a

computer video-game environment (Gorniak & Roy, 2005)

• Apply approach to captioned images or video using computer vision to extract objects, relations, and events from real perceptual data (Fleischman & Roy, 2007)

Page 50: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

50

Conclusion

• Current language learning work uses expensive, unrealistic training data.

• We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment.

• We have evaluated it on the task of learning to sportscast simulated Robocup games.

• The system learns to sportscast almost as well as humans.

Page 51: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

51

Backup Slides

Page 52: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

52

Systems

Disambiguation Learning language generator

WASP Random WASP

KRISPER

(Kate & Mooney, 2007)

KRISP N/A

WASPER WASP WASP

KRISPER-WASP KRISP WASP

WASPER-GEN WASP’s language generator

WASP

WASP with gold matching

N/A WASP

Page 53: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

53

Disambiguation Learning language generator

WASP Random WASP

KRISPER

(Kate & Mooney, 2007)

KRISP N/A

WASPER WASP WASP

KRISPER-WASP KRISP WASP

WASPER-GEN WASP’s language generator

WASP

WASP with gold matching

N/A WASP

Lower baseline

Upper baseline

Systems

Page 54: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

54

KRISPER and WASPER

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

SemanticParser

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink5 )

Semantic Parser Learner

(KRISP/WASP)

Turnover ( purple7 , pink2 )

Page 55: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

55

Disambiguation Learning language generator

WASP Random WASP

KRISPER

(Kate & Mooney, 2007)

KRISP N/A

WASPER WASP WASP

KRISPER-WASP KRISP WASP

WASPER-GEN WASP’s language generator

WASP

WASP with gold matching

N/A WASP

Lower baseline

Upper baseline

Systems

Page 56: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

56

WASPER-GEN

Purple7 loses the ball to Pink2

Sportscaster Robocup Simulator

Ambiguous Training Data

Pink2 kicks the ball to Pink5

Pink5 makes a long pass to Pink8

Pink8 shoots the ball

Turnover ( purple7 , pink2 )

Pass ( pink5 , pink8)

Pass ( purple5, purple7 )

Kick ( pink2 )Pass ( pink2 , pink5 )Kick ( pink5 )

BallstoppedKick ( pink8 )

TacticalGenerator

Purple7 loses the ball to Pink2

Unambiguous Training Data

Pink2 kicks the ball to Pink5Pink5 makes a long pass

to Pink8Pink8 shoots the ball Kick ( pink8 )

Pass ( pink2 , pink5 )

Kick ( pink5 )

Tactical Generator Learner(WASP)

Turnover ( purple7 , pink2 )

Page 57: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

57

Disambiguation Learning language generator

WASP Random WASP

KRISPER

(Kate & Mooney, 2007)

KRISP N/A

WASPER WASP WASP

KRISPER-WASP KRISP WASP

WASPER-GEN WASP’s language generator

WASP

WASP with gold matching

N/A WASP

Lower baseline

Upper baseline

Systems

Page 58: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

58

Disambiguation Learning language generator

WASP Random WASP

KRISPER

(Kate & Mooney, 2007)

KRISP N/A

WASPER WASP WASP

KRISPER-WASP KRISP WASP

WASPER-GEN WASP’s language generator

WASP

WASP with gold matching

N/A WASP

Lower baseline

Upper baseline

Systems

Matching

Page 59: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

59

Matching

• 4 Robocup championship games from 2001-2004.– Avg # events/game = 2,613

– Avg # sentences/game = 509

• Leave-one-game-out cross-validation• Metric:

– Precision: % of system’s annotations that are correct

– Recall: % of gold-standard annotations correctly produced

– F-measure: Harmonic mean of precision and recall

Page 60: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

60

Matching Results

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

Average results on leave-one-outexperiments

F-m

easu

re WASPKRISPERWASPERWASPER-GEN

Page 61: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

61

Disambiguation Learning language generator

WASP Random WASP

KRISPER

(Kate & Mooney, 2007)

KRISP N/A

WASPER WASP WASP

KRISPER-WASP KRISP WASP

WASPER-GEN WASP’s language generator

WASP

WASP with gold matching

N/A WASP

Lower baseline

Upper baseline

Systems

Page 62: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

62

Disambiguation Learning language generator

WASP Random WASP

KRISPER

(Kate & Mooney, 2007)

KRISP N/A

WASPER WASP WASP

KRISPER-WASP KRISP WASP

WASPER-GEN WASP’s language generator

WASP

WASP with gold matching

N/A WASP

Lower baseline

Upper baseline

SystemsTactical

Generation

Page 63: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

63

Tactical Generation

• 4 Robocup championship games from 2001-2004.– Avg # events/game = 2,613– Avg # sentences/game = 509

• Leave-one-game-out cross-validation• NIST score

– Evaluate the quality of machine translations based on matching n-grams

– BLEU metric with some modifications– More weight for rarer n-grams– Less sensitive to translation length

Page 64: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

64

Tactical Generation Results

0

1

2

3

4

5

6

Average results on leave-one-out experiments

NIS

T

WASP

WASPER

KRISPER-WASPWASPER-GEN

WASP with goldmatching

Page 65: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

65

Parsing Results

010203040

5060708090

Average results on leave-one-out experiments

F-m

easu

re

WASP

KRISPER

WASPER

KRISPER-WASPWASPER-GEN

WASP with goldmatching

Page 66: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

66

Matching Results

Page 67: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

67

Parse Results

Page 68: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

68

Generation Results

Page 69: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

69

Strategic Generation Results

Page 70: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

70

Grounded Language Learning in Robocup Robocup Simulator

Sportscaster

Simulated Perception

Perceived Facts

Score!!!!Grounded

Language LearnerLanguageGenerator

SemanticParser

SCFG Score!!!!

Page 71: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

71

“Mary is on the phone”

???

Page 72: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

72 72

“Mary is on the phone”???

Page 73: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

73 73

“Mary is on the phone”???

Ironing(Mommy, Shirt)

Page 74: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

74 74

“Mary is on the phone”???

Ironing(Mommy, Shirt)

Working(Sister, Computer)

Page 75: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

75 75

“Mary is on the phone”???

Ironing(Mommy, Shirt)

Working(Sister, Computer)

Carrying(Daddy, Bag)

Page 76: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

76 76

“Mary is on the phone”???

Ironing(Mommy, Shirt)

Working(Sister, Computer)

Carrying(Daddy, Bag)

Talking(Mary, Phone)

Sitting(Mary, Chair)

Page 77: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

77

Grounding Language

“Spanish goalkeeper Iker Casillas blocks the ball”

Page 78: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

78

Grounding Language

“Spanish goalkeeper Iker Casillas blocks the ball”

Page 79: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

79

Grounding Language

Blocks

• WordNet - (v) parry, block, deflect (impede the movement of (an opponent or a ball)) "block an attack“

• Merriam-Webster - intransitive verb: to block an opponent in sports

Page 80: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

80

Grounding Language

Blocks

• WordNet - (v) parry, block, deflect (impede the movement of (an opponent or a ball)) "block an attack“

• Merriam-Webster - intransitive verb: to block an opponent in sports

Page 81: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

81

Grounding Language

Blocks

Page 82: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

82

Robocup Sportscaster Trace

purple6 passes to purple2

purple3 loses the ball to pink9

purple2 makes a short pass to purple3

ballstopped

kick ( purple6 )

pass ( purple6 , purple2 )

turnover ( purple3 , pink9 )

kick ( purple2 )

pass ( purple2 , purple3 )

kick ( purple3 )

Natural Language Commentary Meaning Representation

Page 83: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

83

Robocup Sportscaster Trace

purple6 passes to purple2

purple3 loses the ball to pink9

purple2 makes a short pass to purple3

pass ( purple6 , purple2 )

pass ( purple2 , purple3 )

Natural Language Commentary Meaning Representation

Page 84: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

84

Robocup Sportscaster Trace

purple6 passes to purple2

purple3 loses the ball to pink9

purple2 makes a short pass to purple3

kick ( purple6 )

kick ( purple2 )

kick ( purple3 )

Natural Language Commentary Meaning Representation

Page 85: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

85

Robocup Sportscaster Trace

purple6 passes to purple2

purple3 loses the ball to pink9

purple2 makes a short pass to purple3

kick ( purple 3 )

ballstopped

kick ( purple6 )

pass ( purple6 , purple2 )

kick ( purple2 )

turnover ( purple3 , pink9 )

kick ( purple2 )

pass ( purple2 , purple3 )

kick ( purple3 )

kick (purple 3

Natural Language Commentary Meaning Representation

Page 86: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

86

Robocup Sportscaster Trace

purple6 passes to purple2

purple3 loses the ball to pink9

purple2 makes a short pass to purple3

kick ( purple 3 )

kick ( purple6 )

kick ( purple2 )

kick ( purple2 )

kick ( purple3 )

kick (purple 3

Natural Language Commentary Meaning Representation

Page 87: 1 David Chen & Raymond Mooney Department of Computer Sciences University of Texas at Austin Learning to Sportscast: A Test of Grounded Language Acquisition.

87

Robocup Sportscaster Trace

purple6 passes to purple2

purple3 loses the ball to pink9

purple2 makes a short pass to purple3

kick ( purple 3 )

kick ( purple6 )

kick ( purple2 )

kick ( purple2 )

kick ( purple3 )

kick (purple 3 )

Natural Language Commentary Meaning Representation

Negative Evidence