Predicting object and scene descriptions with an...

Transcript of Predicting object and scene descriptions with an...

Page 1: Predicting object and scene descriptions with an ... langcog.stanford.edu/papers/frank-vss-2010.pdf

Central claim: By assuming that speakers attempt to communicate optimally in context, listeners and learners can infer meanings even from ambiguous messages.

Conclusions

Predicting object and scene descriptions with an information-theoretic model of pragmatics

Michael C. Frank, Avril Kenney, Noah D. Goodman, Joshua Tenenbaum, Antonio Torralba, & Aude Oliva

Formalizing Grice’s maxims

•  Normative maxims:
– Quantity: Be informative
– Quality: Be truthful
– Relation: Be relevant
– Manner: Be perspicuous
•  Used by listeners to make inferences about speakers’ intentions
•  Our goal is to formalize these maxims using information theory

“One of my avowed aims is to see talking as a special case or variety of purposive, indeed rational, behavior…”

[Figure: communication schematic. In context C, a speaker with intended meaning M_S produces a message w (“the red one!”); the listener arrives at an inferred meaning M_L.]

$$p(w \mid M_S, C) \propto e^{-D_{KL}(M_S \,\|\, w)} \propto \frac{1}{|w|}$$

[Figure: mean bet vs. model prediction, both axes 50 to 80. r = .96 (r2 = 0.92). N=221, T=1486]
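
As a minimal sketch of the speaker model stated earlier, p(w | M_S, C) ∝ 1/|w|, the Python below scores each word that is true of the intended object by the inverse of the number of objects in the context it applies to, then normalizes. Reading |w| as the size of the word's extension in context C is an assumption here, as are the toy context, the vocabulary, and the function name `speaker_probs`; none of them come from the poster.

```python
from fractions import Fraction


def speaker_probs(target, context, vocabulary):
    """Return p(w | target, context), proportional to 1/|w|.

    |w| is taken to be the number of objects in `context` that the word
    applies to (an assumption of this sketch). Only words that are true
    of the target receive any probability.
    """
    weights = {}
    for word, applies in vocabulary.items():
        if applies(target):
            extension_size = sum(1 for obj in context if applies(obj))
            weights[word] = Fraction(1, extension_size)
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


# Hypothetical context of (color, shape) objects; not the poster's stimuli.
context = [("red", "circle"), ("blue", "circle"), ("blue", "square")]
vocabulary = {
    "red": lambda obj: obj[0] == "red",
    "blue": lambda obj: obj[0] == "blue",
    "circle": lambda obj: obj[1] == "circle",
    "square": lambda obj: obj[1] == "square",
}

probs = speaker_probs(context[0], context, vocabulary)
for word, p in probs.items():
    print(word, float(p))
```

In this toy context "red" picks out one object and "circle" picks out two, so the sketch assigns "red" twice the probability of "circle" when the speaker means the red circle.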

Informativeness inferences in children

•  3-4 year olds
•  Novel substance and texture properties
– counterbalance which one is named
•  Two trials/child
•  Conditions:
– context
– baseline
•  Test whether children can make these inferences

“this one is feppy.”

[Figure: percent correct (0 to 100) in the informativeness and baseline conditions; stimulus labels: cork, squiggle, spiral, straw, dowel, popsicle. n=32]

“shiny surface; translucent top hemisphere with surfer inside; opaque bottom hemisphere with red green blue yellow stripes”

“surfer in half, other half yellow/blue/red stripes”

“half transparent, half opaque (colorful), surfer inside clear part”

“surfer with horizontal rainbow stripes on bottom half of ball”

“clay figurine of man surfing on light blue/gray water glass. reflects light yellow. blue, red claylike bottom”

[Figure: tag-word probability vs. model prediction, both axes 0.0 to 1.0; points labeled with description words (9lives through zebra, including surfer, stripes, translucent, superball). r = .51. Further axis labels recovered from this span: informativeness; probability of reference. N=44, T=1320]

•  Task: choose objects to pick a scene out of a set of contexts

•  Goal: predict which words are chosen using informativeness model

•  LabelMe (online database of hand-segmented images) provides ground truth

•  Analysis: match descriptions to objects, calculate probability of referring to a particular object.
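
The analysis step above (match descriptions to objects, then calculate the probability of referring to each object) might look roughly like the sketch below. It assumes the simplest possible matching rule, exact word match between a description and an object label; the poster does not say how matching was actually done. The example descriptions, the labels, and the function name `reference_probabilities` are all hypothetical.

```python
from collections import Counter


def reference_probabilities(descriptions, object_labels):
    """Fraction of descriptions that mention each object label.

    Matching is by exact lowercase word match, which is an assumption of
    this sketch rather than the poster's procedure.
    """
    counts = Counter()
    for text in descriptions:
        words = set(text.lower().split())
        for label in object_labels:
            if label in words:
                counts[label] += 1
    n = len(descriptions)
    return {label: counts[label] / n for label in object_labels}


# Hypothetical descriptions of one street scene.
descriptions = [
    "car parked by a tree",
    "red car on the road",
    "building with a tree",
]
labels = ["car", "tree", "road", "building"]

ref_probs = reference_probabilities(descriptions, labels)
for label in labels:
    print(label, round(ref_probs[label], 2))
```

A real analysis would also need to handle multi-word labels such as "traffic light" and synonyms, which this word-set comparison does not.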

hand-labeled object boundaries (Russell, Torralba, et al., 2008)

Context condition

No Context condition

[Figure: proportion of subordinate uses (0.0 to 0.6) vs. log informativeness (−4.0 to −1.5); points labeled with street-scene objects (e.g. building, car, door, parkingmeter, road, tree, window).]

Subordinate use in context

sometimes there is no good subordinate label

but if an object is informative you never use a subordinate

“car” / “door”

“silver minivan” / “door”

[Figure: probability of reference in context vs. probability of reference without context, both axes 0.0 to 1.0; points labeled with indoor-scene objects (e.g. bottle, bowl, cabinet, sofa, table, window). r = 0.94]

Indoor scenes had a smaller context effect

[Figure: effect of context (−0.5 to 0.5) vs. log informativeness (−5.0 to −2.0); points labeled with indoor-scene objects (e.g. bread, cabinet, sofa, table, window). r = 0.27]

•  Tested predictions of communication framework
–  more realistic stimuli: real-world objects and scenes
–  more natural response format: keywords and sentences
•  Hypothesize features are chosen by the product of two things
–  How relevant they are
–  How informative they are in context
•  Factoring this product is often difficult
–  When we can measure each factor independently we can predict responses

Dale & Reiter (1996)
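
The hypothesis that features are chosen by the product of relevance and informativeness can be illustrated with a small sketch. It assumes that both factors have been measured independently as non-negative scores and that choice probabilities are obtained by normalizing the product; the normalization, the numbers, and the name `choice_probs` are assumptions made for illustration, not values or methods from the poster.

```python
def choice_probs(relevance, informativeness):
    """Normalize relevance * informativeness into choice probabilities."""
    scores = {f: relevance[f] * informativeness[f] for f in relevance}
    total = sum(scores.values())
    return {f: score / total for f, score in scores.items()}


# Hypothetical, independently measured factors for three scene objects.
relevance = {"car": 0.5, "tree": 0.25, "road": 0.25}
informativeness = {"car": 0.5, "tree": 1.0, "road": 0.5}

predicted = choice_probs(relevance, informativeness)
for feature, p in predicted.items():
    print(feature, round(p, 2))
```

In this toy case "tree" is less relevant than "car" but more informative, so the two end up equally likely to be chosen, which shows why the factors are hard to separate from responses alone.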

Look at the following set of objects:

How many red objects are there? How many circular objects are there?

Now imagine someone is talking to you in a foreign language. You don't know the meaning of the adjective, daxy, that he uses to refer to the object with the box around it.

Your job is to guess the meaning of daxy. Your guess should take the form of "bets." Imagine that you have $100 to spend betting on the meaning of the word. You should divide your money among the possible meanings -- the amount of money bet on each option should correspond to how confident you are that it is correct. Bets must sum to 100!

The one with the box around it is daxy. What do you think daxy means?

red: _______ circular: ________
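
One way to turn the informativeness idea into predicted bets for this task is sketched below. It assumes that the bet on each candidate meaning is proportional to 1 divided by the number of objects that meaning applies to, so a meaning that singles out the boxed object more sharply attracts more money. The object counts, the function name `predicted_bets`, and the proportionality rule as applied to bets are assumptions of this sketch.

```python
def predicted_bets(extension_sizes, budget=100):
    """Split `budget` across meanings in proportion to 1/|extension|."""
    weights = {meaning: 1 / size for meaning, size in extension_sizes.items()}
    total = sum(weights.values())
    return {meaning: budget * w / total for meaning, w in weights.items()}


# Hypothetical answers to the two counting questions:
# one red object and three circular objects in the display.
bets = predicted_bets({"red": 1, "circular": 3})
for meaning, amount in bets.items():
    print(f"{meaning}: {amount:.0f}")
```

With these made-up counts the sketch puts three times as much money on "red" as on "circular", and the bets sum to the $100 budget as the instructions require.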

Word learning

Object descriptions Scene descriptions

Scenario

KL Divergence as measure

distribution                          probabilities      D_KL(M_S || w)
speaker’s intended meaning M_S        1                  0.00 bits
possible meanings for listener M_L    .25 .25 .25 .25    2.00 bits
extension of “circle”                 .5 .5              1.00 bits
extension of “red”                    1                  0.00 bits
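
The bit values listed above can be checked with a short calculation. The sketch below assumes a context of four objects in which the speaker means object 0, "red" applies only to object 0, and "circle" applies to objects 0 and 1; that layout and the helper name `kl_bits` are assumptions made for the example.

```python
import math


def kl_bits(p, q):
    """KL divergence D_KL(p || q) in bits, skipping terms where p is zero."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Distributions over four hypothetical objects; the speaker means object 0.
m_s = [1.0, 0.0, 0.0, 0.0]          # speaker's intended meaning
uniform = [0.25, 0.25, 0.25, 0.25]  # listener with no information
circle = [0.5, 0.5, 0.0, 0.0]       # uniform over the extension of "circle"
red = [1.0, 0.0, 0.0, 0.0]          # uniform over the extension of "red"

for name, q in [("M_S", m_s), ("uniform", uniform), ("circle", circle), ("red", red)]:
    print(name, kl_bits(m_s, q), "bits")
```

Because the intended meaning puts all of its mass on one object, each divergence reduces to the base-2 logarithm of the number of objects the word could refer to, which is why a word with a smaller extension is more informative.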

Grice (1975)

“Grice’s maxims taken collectively mean ‘Don’t include elements that don’t do anything.’ Under a goal-oriented view of language generation, there is no need to explicitly follow such a directive at all; the desired behaviour just falls out of the mechanism.”

[Figure: probability of reference in context vs. probability of reference without context, both axes 0.0 to 1.0; points labeled with street-scene objects (e.g. building, car, person, road, sidewalk, tree, window). r = 0.82]

10 pictures x 50 descriptions = 500 datapoints

Context: relevance and informativeness

[Figure labels: informativeness; “relevance”. r = .82]

[Figure: Residual context effects vs. informativeness. Effect of context (−0.5 to 0.5) vs. log informativeness (−4.0 to −1.5); points labeled with street-scene objects (e.g. building, car, road, tree, window). r = .69]