Source: langcog.stanford.edu/papers/frank-vss-2010.pdf
Central claim: By assuming that speakers attempt to communicate optimally in context, listeners and learners can infer meanings even from ambiguous messages.
Conclusions
Predicting object and scene descriptions with an information-theoretic model of pragmatics
Michael C. Frank, Avril Kenney, Noah D. Goodman, Joshua Tenenbaum, Antonio Torralba, & Aude Oliva
Formalizing Grice’s maxims
• Normative maxims:
  – Quantity: Be informative
  – Quality: Be truthful
  – Relation: Be relevant
  – Manner: Be perspicuous
• Used by listeners to make inferences about speakers’ intentions
• Our goal is to formalize these maxims using information theory
“One of my avowed aims is to see talking as a special case or variety of purposive, indeed rational, behavior…”
[Diagram: a speaker with intended meaning $M_S$ produces a message $w$ (“the red one!”) in context $C$; the listener infers a meaning $M_L$.]
$p(w \mid M_S, C) \propto e^{-D_{KL}(M_S \,\|\, w)} \propto \frac{1}{|w|}$
[Figure: mean bet vs. model prediction, both axes 50 to 80; r = .96, r² = 0.92; N=221, T=1486]
Informativeness inferences in children
• 3-4 year olds
• Novel substance and texture properties
  – counterbalance which one is named
• Two trials/child
• Conditions:
  – context
  – baseline
• Test whether children can make these inferences
“this one is feppy.”
[Figure: percent correct (0 to 100) in the informativeness and baseline conditions; stimulus labels: cork, squiggle, spiral, straw, dowel, popsicle; n=32]
“shiny surface; translucent top hemisphere with surfer inside; opaque bottom hemisphere with red green blue yellow stripes”
“surfer in half, other half yellow/blue/red stripes”
“half transparent, half opaque (colorful), surfer inside clear part”
“surfer with horizontal rainbow stripes on bottom half of ball”
“clay figurine of man surfing on light blue/gray water glass. reflects light yellow. blue, red claylike bottom”
[Figure: tag-word probability vs. model prediction, both axes 0.0 to 1.0; points are labeled with tag words (e.g., surfer, stripes, translucent, transparent); r = .51]
[Figure: probability of reference vs. informativeness; N=44, T=1320]
• Task: choose objects to pick a scene out of a set of contexts
• Goal: predict which words are chosen using informativeness model
• LabelMe (online database of hand-segmented images) provides ground truth
• Analysis: match descriptions to objects, calculate probability of referring to a particular object.
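The analysis step above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each description has already been reduced to a list of object words, and the function name `reference_probabilities` and the toy data are hypothetical.

```python
from collections import Counter

def reference_probabilities(descriptions, object_labels):
    """Fraction of descriptions that mention each labeled object at least once."""
    label_set = set(object_labels)
    counts = Counter()
    for words in descriptions:
        # Count each object at most once per description.
        counts.update(set(words) & label_set)
    n = len(descriptions)
    return {label: counts[label] / n for label in object_labels}

# Hypothetical toy data: four descriptions of one scene with three labeled objects.
labels = ["building", "car", "tree"]
descriptions = [["car", "tree"], ["car"], ["building", "car"], ["tree"]]
probs = reference_probabilities(descriptions, labels)
print(probs)
```

Matching free-text descriptions to LabelMe object names would need extra handling (synonyms, plurals, multi-word labels) that this sketch leaves out.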
hand-labeled object boundaries (Russell, Torralba, et al., 2008)
Context condition
No Context condition
[Figure: proportion of subordinate uses (0.0 to 0.6) vs. log informativeness (−4.0 to −1.5); points are labeled with street-scene objects (e.g., building, car, door, parkingmeter, road, tree, window)]
Subordinate use in context
• Sometimes there is no good subordinate label
• But if an object is informative, you never use a subordinate
“car” / “door”
“silver minivan” / “door”
[Figure: probability of reference in context vs. probability of reference without context, both axes 0.0 to 1.0, indoor scenes; points are labeled with object names (e.g., sofa, table, window, counter); r = 0.94]
Indoor scenes had a smaller context effect
[Figure: effect of context (−0.5 to 0.5) vs. log informativeness (−5.0 to −2.0), indoor scenes; points are labeled with object names (e.g., cabinet, sofa, table, window); r = 0.27]
• Tested predictions of the communication framework with
  – more realistic stimuli: real-world objects and scenes
  – a more natural response format: keywords and sentences
• We hypothesize that features are chosen according to the product of two things:
  – How relevant they are
  – How informative they are in context
• Factoring this product is often difficult
  – When we can measure each factor independently, we can predict responses
Dale & Reiter (1996)
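The product hypothesis above can be made concrete with a small sketch. It assumes relevance and informativeness are each available as one non-negative number per object and that choice probability is the normalized product; the function name and all numbers are hypothetical, not values from the poster.

```python
def predicted_choice_probabilities(relevance, informativeness):
    """Normalize relevance * informativeness into a distribution over objects."""
    scores = {obj: relevance[obj] * informativeness[obj] for obj in relevance}
    total = sum(scores.values())
    return {obj: score / total for obj, score in scores.items()}

# Hypothetical values: "car" is the most relevant object,
# but "door" is more informative in this context.
relevance = {"car": 0.6, "door": 0.2, "tree": 0.2}
informativeness = {"car": 0.1, "door": 0.5, "tree": 0.2}
pred = predicted_choice_probabilities(relevance, informativeness)
print(pred)
```

With these made-up numbers the less relevant but more informative "door" gets the highest predicted probability, which is the kind of trade-off the product form is meant to capture.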
Look at the following set of objects:
How many red objects are there? How many circular objects are there?
Now imagine someone is talking to you in a foreign language. You don't know the meaning of the adjective, daxy, that he uses to refer to the object with the box around it.
Your job is to guess the meaning of daxy. Your guess should take the form of "bets." Imagine that you have $100 to spend betting on the meaning of the word. You should divide your money among the possible meanings -- the amount of money bet on each option should correspond to how confident you are that it is correct. Bets must sum to 100!
The one with the box around it is daxy. What do you think daxy means?
red: _______ circular: ________
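A rough sketch of how model bets could be derived for this task, assuming a uniform prior over the two candidate meanings and a speaker who picks a word with probability proportional to 1/|w|, where |w| is the number of objects the word applies to. The extension sizes used here (one red object, two circular objects) are assumed for illustration, since the displayed object set is not part of this transcript.

```python
def predicted_bets(extension_sizes, total=100.0):
    """Split `total` across candidate meanings in proportion to 1 / extension size."""
    weights = {meaning: 1.0 / size for meaning, size in extension_sizes.items()}
    norm = sum(weights.values())
    return {meaning: total * w / norm for meaning, w in weights.items()}

# Assumed context: the boxed object is the only red one, and one of two circular ones.
bets = predicted_bets({"red": 1, "circular": 2})
print(bets)
```

Under these assumptions the meaning with the smaller extension receives the larger bet, because using that word would have been the more informative choice for the speaker.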
Word learning
Object descriptions Scene descriptions
Scenario
KL Divergence as measure
Each row gives a distribution, its probabilities, and $D_{KL}(M_S \,\|\, w)$:
• speaker’s intended meaning $M_S$: 1 → 0.00 bits
• possible meanings for listener $M_L$: .25 .25 .25 .25 → 2.00 bits
• extension of “red”: 1 → 0.00 bits
• extension of “circle”: .5 .5 → 1.00 bits
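The divergences listed above can be checked with a short computation. This sketch assumes a four-object context in which the speaker's intended referent is the first object, "red" applies only to it, and "circle" applies to it and one other object; that layout and the helper name `kl_bits` are assumptions for illustration.

```python
import math

def kl_bits(p, q):
    """D_KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return math.inf  # p puts mass where q has none
            total += p_i * math.log2(p_i / q_i)
    return total

m_s = [1.0, 0.0, 0.0, 0.0]             # speaker's intended meaning M_S
listener_prior = [0.25, 0.25, 0.25, 0.25]  # listener with no message
red = [1.0, 0.0, 0.0, 0.0]             # uniform over the extension of "red"
circle = [0.5, 0.5, 0.0, 0.0]          # uniform over the extension of "circle"

print(kl_bits(m_s, listener_prior))  # 2.0
print(kl_bits(m_s, red))             # 0.0
print(kl_bits(m_s, circle))          # 1.0
```

Because $M_S$ is a point mass and each word's distribution is uniform over its extension, the divergence reduces to $\log_2 |w|$, so lower divergence corresponds to a word with a smaller extension.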
Grice (1975)
“Grice’s maxims taken collectively mean ‘Don’t include elements that don’t do anything.’ Under a goal-oriented view of language generation, there is no need to explicitly follow such a directive at all; the desired behaviour just falls out of the mechanism.”
[Figure: probability of reference in context vs. probability of reference without context, both axes 0.0 to 1.0, street scenes; points are labeled with object names (e.g., building, car, person, road, tree, window); r = 0.82]
10 pictures x 50 descriptions = 500 datapoints
Context: relevance and informativeness
informativeness
“relevance”
r = .82
[Figure: effect of context (−0.5 to 0.5) vs. log informativeness (−4.0 to −1.5), street scenes; points are labeled with object names (e.g., building, car, person, road, tree, window); r = 0.69]
Residual context effects vs. informativeness