Niket Tandon - Max Planck SocietyNiket Tandon Ph.D. Supervisor: Gerhard Weikum Max Planck Institute...

Commonsense Knowledge Acquisition and Applications Niket Tandon Ph.D. Supervisor: Gerhard Weikum Max Planck Institute for Informatics Towards Commonsense Enriched Machines

  • Commonsense Knowledge Acquisition and Applications

    Niket Tandon

    Ph.D. Supervisor: Gerhard Weikum

    Max Planck Institute for Informatics

    Towards Commonsense Enriched Machines

  • 4

    [Figure: a rock-climbing scene, annotated with what humans and machines perceive]

    Humans: hard, brown rock (property); hand, leg (part of a person); climber is a person; climbing a rock (scene), an adventurous activity

    Machines: 1 rock, 2 hands, 2 legs, 1 person

    Human-Machine Knowledge Gap

    Commonsense of objects

    Commonsense of relationships

    Commonsense of interactions

  • 5

    How will the machines be smarter if we fill this knowledge gap?

    Smarter robots: "Get me a coffee" (where?)

    Smarter vision: better classifiers, e.g. monitor or TV, given mouse and keyboard?

    Smarter IR: "adventurous activities"

  • 6

    Encyclopedic Knowledge: facts about instances/events

    Facts about instances: A. Honnold, married, Lisa Honnold

    Their events: A. Honnold, married on, 19.08.2016

    Commonsense Knowledge: facts about classes/activities

    Can we fill the human-machine knowledge gap using existing encyclopedic KBs like Freebase?

  • 7

    Encyclopedic Knowledge: facts about instances

    1. EKB acquisition: unimodal

    2. EKB curation: textual verification

    3. EKB completion: negative training assumptions hold

       If $(e_i, r_k, e_j)$ holds, then $(e_i, r_k, e_{j'})$ with $e_{j'} \neq e_j$ is a negative example,

       e.g. (A. Honnold, bornIn, US) holds, so (A. Honnold, bornIn, UK) is negative.

    Commonsense Knowledge: facts about classes

    1. CKB acquisition: multimodal

    2. CKB curation: textual + visual

    3. CKB completion: negative training assumptions fail,

       e.g. climber, at location, {mountain, university}
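
The negative training assumption above can be sketched in a few lines of Python. This is only an illustration: the function name `functional_negatives`, the relation name `atLocation` and the toy entity sets are assumptions, not part of the talk. The sketch corrupts the object of each known triple, which is sound only when the relation is functional.

```python
import random

def functional_negatives(triples, entities, k=1, seed=0):
    """Corrupt the object of each (e_i, r_k, e_j) to get assumed negatives.

    Only sound if r_k is functional (one object per subject).
    """
    rng = random.Random(seed)
    known = set(triples)
    negatives = []
    for subj, rel, obj in triples:
        # Every other entity is assumed to be a wrong object for (subj, rel).
        candidates = sorted(
            e for e in entities if e != obj and (subj, rel, e) not in known
        )
        for alt in rng.sample(candidates, min(k, len(candidates))):
            negatives.append((subj, rel, alt))
    return negatives

# Functional relation: the generated negative is really false.
ekb = [("A. Honnold", "bornIn", "US")]
ekb_negatives = functional_negatives(ekb, {"US", "UK"})

# Non-functional relation: the "negative" is actually a plausible fact.
ckb = [("climber", "atLocation", "mountain")]
ckb_negatives = functional_negatives(ckb, {"mountain", "university"})
```

For the commonsense triple the sketch labels (climber, atLocation, university) as negative although a climber can well be at a university, which is the failure mode the slide points at.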

  • 8

    Encyclopedic Knowledge vs. Commonsense Knowledge

    EKBs have several functional relations, hence the negative training assumption holds for EKB completion. For CKB completion the assumption fails.

    [Chart: functional vs. non-functional relations in EKB and CKB, y-axis from 0 to 1]

  • Commonsense knowledge acquisition is different and harder

    Humans hardly express the obvious: Scarce & Implicit

    Spread across multiple modalities: Multimodal

    Unusual reported more than usual: Reporting Bias

    Culture specific, Location specific: Contextual


  • KBs possessing commonsense knowledge

    Need: automatically constructed, semantically organized Commonsense KB

    KB                      | Supervision                   | Pros                                                     | Cons
    Cyc                     | manually curated              | accuracy                                                 | cost, coverage
    ConceptNet              | semi-automated                | coverage                                                 | accuracy, less organized
    Tandon et al., AAAI’11  | bootstrapped using ConceptNet | coverage                                                 | noise, less organized
    Desiderata              | minimal supervision           | organized, high accuracy (> 80%), high coverage (> 10M)  | ---

  • Need: robust techniques to automatically construct semantically organized Commonsense KB

  • Three research questions: Investigate robust techniques to acquire:

    RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.

    RQ 2. Commonsense of relationships between objects: part-whole relation, comparative relation…

    RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.

  • Research question 1

    RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.

    Previous work:

    • lumps these properties together

    • does not distinguish the meanings of the words

    • has low coverage

  • 18

    Input: large text corpus, containing e.g. "summit is crisp"

    Output: triples $\langle w1_n^s, r, w2_a^s \rangle$, e.g. $\langle \mathit{summit}_n^2, \mathit{hasTemperature}, \mathit{crisp}_a^3 \rangle$

  • 19

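    Input: large text corpus, containing e.g. "summit is crisp"

    Output: triples $\langle w1_n^s, r, w2_a^s \rangle$, e.g. $\langle \mathit{summit}_n^2, \mathit{hasTemperature}, \mathit{crisp}_a^3 \rangle$

    Disambiguated $n$: the noun is mapped to one of its senses 1.), 2.), 3.)

    Fine-grained relations $r \in R$: hasAppearance, hasSound, hasTaste, hasTemperature, evokesEmotion

    Disambiguated $a$: the adjective is mapped to one of its senses 1.), 2.), 3.)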

  • 20

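    Our approach

    1. Extract generic hasProperty triples over input, using patterns of the form verb [adv], e.g. "summit is crisp"

    2. Disambiguate args and classify triple

    Example pairs: $\langle \mathit{summit}, \mathit{crisp} \rangle$, $\langle \mathit{mountain}, \mathit{cold} \rangle$, $\langle \mathit{chili}, \mathit{hot} \rangle$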
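
A rough sketch of the first step, extracting generic noun-adjective pairs with a "verb [adv]" pattern, might look as follows. The regular expression, the verb list and the adverb list are illustrative assumptions; a real extractor would rely on part-of-speech tagging rather than a fixed word list.

```python
import re

# Hypothetical surface pattern: <noun> <linking verb> [adverb] <adjective>.
# The verb and adverb lists are small placeholders, not the actual patterns.
PATTERN = re.compile(
    r"\b(?:the\s+)?(\w+)\s+"
    r"(?:is|was|are|were|feels|seems)\s+"
    r"(?:(?:very|quite|really|so)\s+)?"
    r"(\w+)",
    re.IGNORECASE,
)

def extract_pairs(text):
    """Return undisambiguated (noun, adjective) candidates for hasProperty."""
    return [(noun.lower(), adj.lower()) for noun, adj in PATTERN.findall(text)]

pairs = extract_pairs("The summit is crisp. The mountain was very cold.")
print(pairs)
```

The output is still noisy and at the surface-form level; disambiguating the arguments and picking the fine-grained relation is the job of the second step.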

  • Extract generic hasProperty triples over input

    Disambiguate args and classify triple

    Typically requires training data

  • 22

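    Extract generic hasProperty triples over input: $\langle w1_n, w2_a \rangle$, e.g. $\langle \mathit{summit}, \mathit{crisp} \rangle$

    Disambiguate args and classify triple: $\langle w1_n^s, r, w2_a^s \rangle$

    Suppose $r = \mathit{hasTemperature}$:

    • $\mathit{range}(r)$ inference: $\mathit{crisp}_a^3, \mathit{hot}_a^1, \mathit{cold}_a^1, \mathit{icy}_a^2, \ldots$

    • $\mathit{domain}(r)$ inference, $\langle w1_n^s, r, * \rangle$: $\mathit{beach}_n^3, \mathit{summit}_n^2, \mathit{metal}_n^1, \mathit{metal}_n^2, \ldots$

    • $\mathit{assertion}(r)$ inference: $\langle \mathit{summit}_n^2, \mathit{crisp}_a^3 \rangle, \langle \mathit{beach}_n^1, \mathit{hot}_a^1 \rangle, \ldots$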

  • $\mathit{domain}(r)$, $\mathit{range}(r)$, $\mathit{assertion}(r)$ inference

    Noisy, surface form candidates for $r$ → graph construction → graph inference

  • An instance of the problem: $\mathit{range}(r)$

    Counts per adjective (rows) and noun (columns):

               summit   mountain   dancer
    cold         20        50         3
    hot          30        40        10
    crisp        15        15         1

  • An instance of the problem: $\mathit{range}(r)$

    $\mathit{crisp}_a^1$: clearly defined

    $\mathit{crisp}_a^3$: cold and invigorating temperature

    $\mathit{cold}_a^1$: low or inadequate temperature

  • An instance of the problem: $\mathit{range}(r)$

    sense #1   sense #2   sense #3
      1/2        1/3        1/4

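  • Label propagation for graph inference, given few seeds

    Label per node = in / not in range of hasTemperature

    Similar nodes, similar labels

    But: limited training data, e.g. $\langle \mathit{summit}, \mathit{crisp} \rangle$, $\langle \mathit{mountain}, \mathit{cold} \rangle$, $\langle \mathit{salsa}, \mathit{hot} \rangle$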

  • Label Propagation: Loss function (Talukdar et al. 2009)

    The loss combines three terms:

    • Seed label loss

    • Similar node, different label loss

    • Label prior loss (high-degree nodes are noise)
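
To make the three loss terms concrete, here is a simplified sketch of label propagation on a small sense graph. It is not the exact formulation of Talukdar et al.: the per-node update rule, the weights `mu1`, `mu2`, `mu3`, the edge weights and the node `clear-a1` are all assumptions chosen for illustration. Each node's score is the weighted average of its seed label (if any), its neighbours' scores and a prior.

```python
def propagate(edges, seeds, mu1=1.0, mu2=1.0, mu3=0.01, prior=0.0, iters=100):
    """Iteratively score nodes for membership in range(hasTemperature)."""
    # Build a symmetric adjacency map from the weighted edge list.
    nbrs = {}
    for (u, v), w in edges.items():
        nbrs.setdefault(u, {})[v] = w
        nbrs.setdefault(v, {})[u] = w

    scores = {v: seeds.get(v, prior) for v in nbrs}
    for _ in range(iters):
        new_scores = {}
        for v, neighbours in nbrs.items():
            num, den = mu3 * prior, mu3            # label prior term
            if v in seeds:                         # seed label term
                num += mu1 * seeds[v]
                den += mu1
            for u, w in neighbours.items():        # smoothness term
                num += mu2 * w * scores[u]
                den += mu2 * w
            new_scores[v] = num / den
        scores = new_scores
    return scores

# Toy graph: crisp-a3 is close to cold-a1, crisp-a1 is close to clear-a1.
edges = {
    ("cold-a1", "crisp-a3"): 0.9,
    ("crisp-a3", "crisp-a1"): 0.1,
    ("crisp-a1", "clear-a1"): 0.9,
}
seeds = {"cold-a1": 1.0, "clear-a1": 0.0}  # 1.0 = in range, 0.0 = not in range
scores = propagate(edges, seeds)
print({node: round(score, 2) for node, score in scores.items()})
```

With only two seeds, the temperature sense of crisp ends up with a high score and the "clearly defined" sense with a low one, which is the transductive effect the slides rely on.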

  • WebChild: Model recap

    Noisy, surface form candidates for $r$ → graph construction → graph inference → clean, disambiguated triples in $r$

  • Resulting KB

    Domain (hasShape): mountain-n1, leaf-n1, ...

    Range (hasShape): triangular-a1, tapered-a1, ...

    Assertions (hasShape): (lens-n1, spherical-a2), (palace-n2, domed-a1), ...

    WebChild: large (~5 million), semantically organized, accurate (0.82 sampled precision)

  • Summary of property commonsense

    WebChild: First commonsense KB with fine-grained relations and disambiguated arguments; 4.6 million assertions including domain and range for 19 relations.

    Take away message: Transductive methods help overcome sparsity of commonsense in text.

  • Research question 3

    RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.

    Previous work:

    • largely discusses events, and covers activities only at small scale

    • does not organize the attributes of the activities

    • does not distinguish the meanings of the attribute values

  • 35

    An Activity frame

    Activity: {Climb up a mountain, Hike up a hill}

    Participants: climber, boy, rope

    Location: camp, forest, sea shore

    Time: day, holiday

    Visuals

  • 36

    Semantic organization of Activity frames

    Activity: {Climb up a mountain, Hike up a hill}

    Participants: climber, boy, rope

    Location: camp, forest, sea shore

    Time: day, holiday

    Visuals

    Parent activity: Go up an elevation ..

    Previous activity: Get to village ..

    Next activity: Reach the top ..
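
One possible in-memory representation of such a frame and its links is sketched below. The class name `ActivityFrame` and its field names are assumptions made for illustration; the talk does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)  # identity comparison, so linked frames never recurse
class ActivityFrame:
    phrases: List[str]                      # paraphrases of the activity
    participants: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    times: List[str] = field(default_factory=list)
    parent: Optional["ActivityFrame"] = None         # more general activity
    previous: Optional["ActivityFrame"] = None       # typically happens before
    next_activity: Optional["ActivityFrame"] = None  # typically happens after

go_up = ActivityFrame(["go up an elevation"])
village = ActivityFrame(["get to village"])
reach = ActivityFrame(["reach the top"])

climb = ActivityFrame(
    phrases=["climb up a mountain", "hike up a hill"],
    participants=["climber", "boy", "rope"],
    locations=["camp", "forest", "sea shore"],
    times=["day", "holiday"],
    parent=go_up,
    previous=village,
    next_activity=reach,
)
print(climb.parent.phrases, climb.previous.phrases, climb.next_activity.phrases)
```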

  • 38

    Hollywood narratives are good

    Contain events but not activity knowledge

    May contain activities but no visuals and varying granularity of scene boundaries, transitions.

  • 39

    Semantic parsing of scripts

    Graph construction

  • 40

    Input: text in a scene taken from a semi-structured movie script, e.g. "He began to shoot a video on the summit"

    Output: disambiguated semantic roles, e.g.

      the man: agent

      began to shoot: action

      a video: patient

      summit: location

    SRL systems are computationally expensive and domain specific

    Semantic parsing of scripts

    Graph construction

  • 41

    State-of-the-art WSD customized for phrases:

      the man: man.1, man.2

      began to shoot: shoot.1, shoot.4

      a video: video.1

  • 42

    Can we use two different information sources to perform SRL given no training data?

    State-of-the-art WSD customized for phrases:

      the man: man.1, man.2

      began to shoot: shoot.1, shoot.4

      a video: video.1

    VerbNet contains curated semantic roles for verbs, with selectional restrictions:

      shoot.vn.1: NP VP NP, agent.animate, patient.animate

      shoot.vn.3: NP VP NP, agent.animate, patient.inanimate

  • 43

    Jointly leverage:

    State-of-the-art WSD customized for phrases:

      the man: man.1, man.2

      began to shoot: shoot.1, shoot.4

      a video: video.1

    Syntactic and semantic role semantics from VerbNet:

      shoot.vn.1: NP VP NP, agent.animate, patient.animate

      shoot.vn.3: NP VP NP, agent.animate, patient.inanimate

    WordNet class hierarchy (e.g. Thing / inanimate)

    WordNet-VerbNet linkage

  • 44

    [Figure as on slide 43: jointly leverage WSD for phrases, VerbNet roles, the WordNet class hierarchy and the WordNet-VerbNet linkage]

    Binary decision variable

  • 45

    [Figure as on slide 43: jointly leverage WSD for phrases, VerbNet roles, the WordNet class hierarchy and the WordNet-VerbNet linkage]

    WSD prior, WN prior

  • 46

    [Figure as on slide 43: jointly leverage WSD for phrases, VerbNet roles, the WordNet class hierarchy and the WN-VN linkage]

    Sense, VN syntactic match score

  • 47

    [Figure as on slide 43: jointly leverage WSD for phrases, VerbNet roles, the WordNet class hierarchy and the WN-VN linkage]

    Sense, VN semantic match score

  • 48

    Joint WSD and SRL

    $x_{ij}$ = binary decision variable for word $i$ mapped to WN sense $j$

    Objective terms: WSD prior, WN prior, word-VN match score, selectional restriction score

    Constraints: one VN sense per verb; WN, VN sense consistency; selectional restriction constraints; binary decision
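
The joint decision can be illustrated with a tiny brute-force search instead of an ILP solver. Everything numeric here is made up: the prior scores, the animacy table and the function name `joint_wsd_srl` are assumptions for this sketch, and the real model also includes match scores and the WordNet-VerbNet linkage, which are omitted.

```python
from itertools import product

# Hypothetical prior scores per candidate sense (not from the talk).
candidates = {
    "man": {"man.1": 0.6, "man.2": 0.4},
    "shoot": {"shoot.1": 0.6, "shoot.4": 0.4},
    "video": {"video.1": 1.0},
}
# Assumed animacy of noun senses, standing in for the WordNet class hierarchy.
animate = {"man.1": True, "man.2": True, "video.1": False}
# Assumed selectional restrictions per verb sense: (agent animate?, patient animate?).
restrictions = {"shoot.1": (True, True), "shoot.4": (True, False)}

def joint_wsd_srl(agent_word, verb_word, patient_word):
    """Pick the best-scoring sense combination that satisfies the restrictions."""
    best, best_score = None, float("-inf")
    for a, v, p in product(
        candidates[agent_word], candidates[verb_word], candidates[patient_word]
    ):
        need_agent, need_patient = restrictions[v]
        if animate[a] != need_agent or animate[p] != need_patient:
            continue  # violates a selectional restriction constraint
        score = (
            candidates[agent_word][a]
            + candidates[verb_word][v]
            + candidates[patient_word][p]
        )
        if score > best_score:
            best, best_score = {"agent": a, "action": v, "patient": p}, score
    return best

result = joint_wsd_srl("man", "shoot", "video")
print(result)
```

Even though shoot.1 has the higher prior in this toy setup, the inanimate patient rules it out, so the constraint rather than the prior decides the verb sense.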

  • Joint WSD and SRL O/P

    Agent: man.1

    Action: shoot.4

    Patient: video.1

    [Figure as on slide 42: candidate senses for "the man began to shoot a video" and the VerbNet frames shoot.vn.1 and shoot.vn.3]

    Semantic parsing of scripts

    Graph construction

  • Climb up a mountain

    Participants: climber, rope

    Location: summit, forest

    Time: day

    Semantic parsing of scripts

    Graph construction

  • 51

    Construct a graph of activity frames with three edge types:

    Similar: S(a,b)

    Previous: P(a,b)

    TypeOf: T(a,b)

    Activity frames:

    Climb up a mountain. Participants: climber, rope. Location: summit, forest. Time: day.

    Hike up a hill. Participants: climber. Location: sea shore. Time: holiday.

    Go up an elevation ..

    Reach top ..

    Semantic parsing of scripts

    Graph construction

  • 52

    Similarity: S(climb up a mountain, hike up a hill)

    Attribute similarity + activity similarity

    Climb up a mountain. Participants: climber, rope. Location: forest. Time: day.

    Hike up a hill. Participants: climber. Location: woods. Time: holiday.

  • 53

    TypeOf: T(climb up a mountain, go up an elevation)

    Attribute hypernymy + activity hypernymy

    Climb up a mountain. Participants: climber, rope. Location: forest. Time: day.

    Go up an elevation. Participants: Person. Location: Exterior. Time: day.

  • 54

    Previous: P(reach the top, climb up a mountain)

    Climb up a mountain … → Reach the top …

    Allow gaps between activities within one scene. PMI-style counting to suppress generic activities.

    Scene: Carrie and Big start out early to head to the village. They climb up the beautiful mountain, which felt as if they were in a different world. After several hours they eventually reach the top.
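
A minimal sketch of PMI-style counting for the Previous edge is given below, assuming each scene is already reduced to an ordered list of activity phrases. The function name `previous_pmi`, the toy scenes and the exact PMI normalisation are assumptions; the talk only states that gaps are allowed and that generic activities are suppressed.

```python
import math
from collections import Counter
from itertools import combinations

def previous_pmi(scenes):
    """Score ordered pairs (earlier, later) of activities within a scene."""
    pair_counts, first_counts, second_counts = Counter(), Counter(), Counter()
    for scene in scenes:
        # combinations() keeps scene order and allows gaps between activities.
        for earlier, later in combinations(scene, 2):
            pair_counts[(earlier, later)] += 1
            first_counts[earlier] += 1
            second_counts[later] += 1
    total = sum(pair_counts.values())
    return {
        (a, b): math.log(c * total / (first_counts[a] * second_counts[b]))
        for (a, b), c in pair_counts.items()
    }

# Toy scenes; "walk" plays the role of a generic activity.
scenes = [
    ["get to village", "climb up a mountain", "reach the top"],
    ["walk", "climb up a mountain", "reach the top"],
    ["walk", "eat"],
    ["walk", "sleep"],
    ["walk", "talk"],
]
pmi = previous_pmi(scenes)
print(pmi[("climb up a mountain", "reach the top")], pmi[("walk", "reach the top")])
```

Keys are (earlier, later), so a high score for ("climb up a mountain", "reach the top") supports the edge P(reach the top, climb up a mountain). The generic "walk" precedes many different activities and therefore gets a low score with each of them.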

  • 55

    Climb up a mountain. Participants: climber, rope. Location: summit, forest. Time: day.

    Hike up a hill. Participants: climber. Location: sea shore. Time: holiday.

    Edge: similar (climb up a mountain, hike up a hill)

    Go up an elevation ..

    Reach top ..

    Semantic parsing of scripts

    Graph construction

  • 56

    Semantic parsing of scripts

    Graph construction

  • 57

    Resulting KB: Knowlywood

    Knowlywood Statistics

    Scenes: 1,708,782

    Activity synsets: 505,788

    Accuracy: 0.85 ± 0.01

    #Images from scenes: 30,000

  • Summary of activity commonsense

    Knowlywood: First organized commonsense activity KB with activity attributes and disambiguated values containing nearly 1 million activities with visuals.

    Take away message: Jointly leveraging different annotated resources helps overcome sparsity of training data.

  • The overall KB: WebChild KB

    > 3M concepts, > 18M triples, >1000 relations

  • Conclusions and take home messages: Knowledge to make machines smarter can be acquired with robust techniques that jointly leverage global information

    • RQ1, Properties (WSDM’14): range, domain, assertions of fine-grained relations

    • RQ2, Comparatives, part-whole (AAAI’14, AAAI’16): fine-grained comparative, part-whole relations

    • RQ3, Activities (WWW’15, CIKM’15): activity frames with semantic attributes

    WebChild KB applications (CVPR’15, ACL’15, ISWC’16..)

    ML + NLP community: limited training data can be overcome by jointly leveraging multiple cues

    Computer Vision community: commonsense helps computer vision; vision helps commonsense acquisition

    AI community: semantically organized knowledge is a step towards filling the human-machine gap