On WordNet, Text Mining, and Knowledge Bases of the Future


Page 1: On WordNet, Text Mining, and Knowledge Bases of the Future

On WordNet, Text Mining, and Knowledge Bases of the Future

Peter Clark, March 2006

Knowledge Systems, Boeing Phantom Works

Page 2: On WordNet, Text Mining, and Knowledge Bases of the Future

Introduction

• Interested in text understanding & question-answering

– use of world knowledge to go beyond text

• Used WordNet as (part of) the knowledge repository

– got some leverage

– can we get more?

– what would a WordNet KB look like?

Page 3: On WordNet, Text Mining, and Knowledge Bases of the Future

Outline

• Machine understanding and question-answering

• An initial attempt

• From WordNet to a Knowledge Base

– Representation

– Reasoning

– Text Mining for Possibilistic Knowledge

– A Knowledge Base of the Future?

Page 4: On WordNet, Text Mining, and Knowledge Bases of the Future

Outline

• Machine understanding and question-answering

• An initial attempt

• From WordNet to a Knowledge Base

– Representation

– Reasoning

– Text Mining for Possibilistic Knowledge

– A Knowledge Base of the Future?

Page 5: On WordNet, Text Mining, and Knowledge Bases of the Future

On Machine Understanding

• Consider:

“China launched a meteorological satellite into orbit Wednesday, the first of five weather guardians to be sent into the skies before 2008.”

• Suggests:
– there was a rocket launch
– China owns the satellite
– the satellite is for monitoring weather
– the orbit is around the Earth
– etc.

None of these are explicitly stated in the text.

Page 6: On WordNet, Text Mining, and Knowledge Bases of the Future

On Machine Understanding

• Understanding = creating a situation-specific model (SSM), coherent with data & background knowledge
– Data suggests background knowledge which may be appropriate
– Background knowledge suggests ways of interpreting data

[Diagram: fragmentary, ambiguous inputs → a coherent, situation-specific model]

Page 7: On WordNet, Text Mining, and Knowledge Bases of the Future

On Machine Understanding

[Diagram: fragmentary, ambiguous inputs → a coherent, situation-specific model, via assembly of pieces, assessment of coherence, and inference over world knowledge]

Page 8: On WordNet, Text Mining, and Knowledge Bases of the Future

On Machine Understanding

• Conjectures about the nature of the beast (world knowledge):
– “Small” number of core theories
  • space, time, movement, …
  • can encode directly
– Large amount of “mundane” facts
  • a dictionary contains many of these facts

Page 9: On WordNet, Text Mining, and Knowledge Bases of the Future

Outline

• Machine understanding and question-answering

• An initial attempt

• From WordNet to a Knowledge Base

– Representation

– Reasoning

– Text Mining for Possibilistic Knowledge

– A Knowledge Base of the Future?

Page 10: On WordNet, Text Mining, and Knowledge Bases of the Future

Caption-Based Video Retrieval

[Diagram: English captions describing a video segment (partial, ambiguous) → coherent representation of the scene (elaborated, disambiguated) → question-answering, search, etc., drawing on world knowledge]

Page 11: On WordNet, Text Mining, and Knowledge Bases of the Future

Illustration: Caption-Based Video Retrieval

[Diagram: video segments are manually captioned in English (“A man opens an airplane door”, “A lever is rotated to the unarmed position”, …). Each caption text is interpreted into a semantic graph (e.g., Open with agent Man and object Door, where Door is-part-of Airplane), then elaborated by inference and scene-building using world knowledge. A query such as Touch(Person, Door) is matched against the elaborated scene representations.]

Page 12: On WordNet, Text Mining, and Knowledge Bases of the Future

Semantic Retrieval

• Query: “A person walking”
→ Result: “A man carries a box across a room”

• “Someone injured”
→ “An employee was drilling a hole in a piece of wood. The drill bit of the drill broke. The drill twisted out of the employee's right hand. The drill injured the employee's right thumb.”

• “An object was damaged”
→ the above caption (×2)
→ “Someone broke the side mirrors of a Boeing truck.”

(The taxonomic part of this matching is sketched below.)
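The taxonomic part of this matching (recognizing, e.g., that a “man” counts as a “person”) comes directly from WordNet's hypernym links. A minimal sketch using NLTK's WordNet interface, assuming the nltk package and its wordnet corpus are installed (the sense choices man.n.01 / person.n.01 are illustrative):

from nltk.corpus import wordnet as wn

def subsumed_by(specific, general):
    # True if `general` is `specific` itself or one of its transitive hypernyms
    return general == specific or general in specific.closure(lambda s: s.hypernyms())

man = wn.synset('man.n.01')        # an adult male person
person = wn.synset('person.n.01')  # a human being
print(subsumed_by(man, person))    # True, so "a man carries..." can answer a "person ..." query

The non-taxonomic part of the match (that carrying implies walking) needs world-knowledge rules of the kind shown on the following slides; hypernymy alone will not give it.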

Page 13: On WordNet, Text Mining, and Knowledge Bases of the Future

The Knowledge Base

• Representation:
– Horn-clause rules
  • plus add/delete lists for “before” and “after” rules
– Authored in simplified English
  • NLP system interactively translates to logic
– WordNet + UT Austin relations as the ontology
– ~1000 rules authored
  • just a drop in the ocean!

• Reasoning:
– depth-limited forward chaining
– precondition/effects just asserted (no sitcalc simulation)

Page 14: On WordNet, Text Mining, and Knowledge Bases of the Future

Some of the Rules in the KB:

IF a person is carrying an entity that is inside a room THEN (almost) always the person is in the room.

IF a person is picking an object up THEN (almost) always the person is holding the object.

IF an entity is near a 2nd entity AND the 2nd entity contains a 3rd entity THEN usually the 1st entity is near the 3rd entity.

ABOUT boxes: usually a box has a lid.

BEFORE a person gives an object, (almost) always the person possesses the object.

AFTER a person closes a barrier, (almost) always the barrier is shut.

…1000 more…

Page 15: On WordNet, Text Mining, and Knowledge Bases of the Future

Some of the Rules in the KB:

IF a person is carrying an entity that is inside a room THEN (almost) always the person is in the room.

isa(_Person1, person_n1), isa(_Carry1, carry_v1),
isa(_Entity1, entity_n1), isa(_Room1, room_n1),
agent(_Carry1, _Person1), object(_Carry1, _Entity1),
is-inside(_Entity1, _Room1)
==== (almost) always ===>
is-inside(_Person1, _Room1).
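To make the reasoning step concrete, here is a minimal Python sketch of depth-limited forward chaining over one rule of this shape. The rule encoding and the toy facts are illustrative; this is not the system described in the talk.

# Facts and rule conclusions are (predicate, arg1, arg2) triples.
FACTS = {
    ("isa", "person1", "person_n1"), ("isa", "carry1", "carry_v1"),
    ("isa", "box1", "entity_n1"),    ("isa", "room1", "room_n1"),
    ("agent", "carry1", "person1"),  ("object", "carry1", "box1"),
    ("is-inside", "box1", "room1"),
}

# The slide's rule; variables are strings starting with "?".
CARRY_RULE = (
    [("isa", "?p", "person_n1"), ("isa", "?c", "carry_v1"),
     ("isa", "?e", "entity_n1"), ("isa", "?r", "room_n1"),
     ("agent", "?c", "?p"), ("object", "?c", "?e"), ("is-inside", "?e", "?r")],
    ("is-inside", "?p", "?r"),
)

def match(pattern, fact, bindings):
    """Extend bindings so that pattern unifies with fact, or return None."""
    b = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if b.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return b

def fire(rule, facts):
    """All conclusions from one exhaustive application of the rule."""
    conditions, conclusion = rule
    envs = [{}]
    for cond in conditions:
        envs = [b2 for b in envs for fact in facts
                if (b2 := match(cond, fact, b)) is not None]
    return {tuple(b.get(t, t) for t in conclusion) for b in envs}

def forward_chain(facts, rules, depth=3):
    """Depth-limited forward chaining: fire all rules for at most `depth` rounds."""
    for _ in range(depth):
        new = set().union(*(fire(r, facts) for r in rules)) - facts
        if not new:
            break
        facts = facts | new
    return facts

derived = forward_chain(FACTS, [CARRY_RULE])
print(("is-inside", "person1", "room1") in derived)   # True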

Page 16: On WordNet, Text Mining, and Knowledge Bases of the Future

Critique: 2 Big Questions Hanging

• Representation: The Knowledge Base
– Unscalable to build the KB from scratch
– WordNet helped a lot
– Could it be extended to help more?
– What would that WordNet KB look like?
– How could it be built?

• Reasoning:
– Deductive inference is insufficient
– How does reasoning look with large, noisy, uncertain knowledge?

Page 17: On WordNet, Text Mining, and Knowledge Bases of the Future

Outline

• Machine understanding and question-answering

• An initial attempt

• From WordNet to a Knowledge Base

– Representation

– Reasoning

– Text Mining for Possibilistic Knowledge

– A Knowledge Base of the Future?

Page 18: On WordNet, Text Mining, and Knowledge Bases of the Future

What Knowledge Do We Need?

"A dawn bomb attack devasted a major Shiite shrine in Iraq..."

We'd like the system to infer:
– The bomb exploded
– The explosion caused the devastation
– The shrine was damaged
– …

The system needs to know:
– Bombs can explode
– Explosions can destroy things
– Destruction ≈ devastation
– Attacks are usually done by people
– …

Page 19: On WordNet, Text Mining, and Knowledge Bases of the Future

What Knowledge Do We Need? "Israeli troops were engaged in a fierce gun battle with militants in a West Bank town. An Israeli soldier was killed.

Like system to infer that:There was a fight.The soldier died.The soldier was shot.The soldier was a member of the Israeli troops.…

System needs to know:A battle involves a fight.Soldiers use guns.Guns can kill.If you are killed you are dead.Soldiers belong to troops…

Page 20: On WordNet, Text Mining, and Knowledge Bases of the Future

WordNet (Princeton Univ)
– Is not a word net; is a concept net
– 117,000 lexically motivated concepts (synsets)
– organized into a taxonomy (hypernymy)
– massively used in AI (~7000 downloads/month)

201378060: "shuffle", "ruffle", "mix": (mix so as to make a random order or arrangement; "shuffle the cards")

[Diagram: synset 201378060 → 201174946 → 201173984, each link labelled superclass / genls / supertype]

Page 21: On WordNet, Text Mining, and Knowledge Bases of the Future

WordNet (Princeton Univ)
– Is not a word net; is a concept net
– 117,000 lexically motivated concepts (synsets)
– organized into a taxonomy (hypernymy)
– massively used in AI (~7000 downloads/month)

mix_v6: "shuffle", "ruffle", "mix": (mix so as to make a random order or arrangement; "shuffle the cards")

[Diagram: mix_v6 → manipulate_v2 → handle_v4, each link labelled superclass / genls / supertype]
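The same taxonomy can be inspected programmatically; a small sketch with NLTK's WordNet interface, assuming the wordnet corpus is installed (sense numbering differs across WordNet versions, so the printed names may not match mix_v6 exactly):

from nltk.corpus import wordnet as wn

# List the verb senses of "shuffle" and walk up the hypernym chain of one of them.
for syn in wn.synsets('shuffle', pos=wn.VERB):
    print(syn.name(), '--', syn.definition())

syn = wn.synsets('shuffle', pos=wn.VERB)[0]            # pick one sense to illustrate
chain = [syn] + list(syn.closure(lambda s: s.hypernyms()))
print(' -> '.join(s.name() for s in chain))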

Page 22: On WordNet, Text Mining, and Knowledge Bases of the Future

The Evolution of WordNet

• v1.0 (1986)
– synsets (concepts) + hypernym (isa) links

• v1.7 (2001)
– add in additional relationships (see the NLTK sketch below)
  • has-part
  • causes
  • member-of
  • entails-doing (“subevent”)

• v2.0 (2003)
– introduce the instance/class distinction
  • Paris isa Capital-City is-type-of City
– add in some derivational links
  • explode related-to explosion

• …

• v10.0 (2010?)
– ?????

[Diagram: lexical resource → knowledge base?]
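A quick way to see these extra link types in a current release, again with NLTK (the particular synsets chosen, car.n.01 and the first verb sense of “explode”, are just convenient illustrations; some of the lists may be empty for a given sense):

from nltk.corpus import wordnet as wn

car = wn.synset('car.n.01')
print(car.part_meronyms()[:5])                 # has-part links (parts of a car)

explode = wn.synsets('explode', pos=wn.VERB)[0]
print(explode.causes())                        # "causes" links
print(explode.entailments())                   # entails-doing ("subevent") links

lemma = explode.lemmas()[0]
print(lemma.derivationally_related_forms())    # derivational links, e.g. explode <-> explosion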

Page 23: On WordNet, Text Mining, and Knowledge Bases of the Future

WordNet as a Knowledge Base

Got: just “isa” and “part-of” knowledge

But still need:

I. Axioms about each concept!
– From definitions and examples (?)
– shallow extraction has been done (LCC and ISI)
– getting close to useful logic

II. Relational vocabulary (case roles, semantic relns)
– could take from: FrameNet, Cyc, UT Austin

III. Relations between word senses:
• bank (building) vs. bank (institution) vs. bank (staff)
• cut (separate) vs. cut (sweeping motion)

Page 24: On WordNet, Text Mining, and Knowledge Bases of the Future

• Ide & Veronis: – dictionaries have no broad contextual/world knowledge– e.g., no connection between “lawn” and “house”

• Not true!

garden -- (a yard or lawn adjoining a house)

1 sense of lawn Sense 1 lawn#1 -- (a field of cultivated and mowed grass) -> field#1 -- (a piece of land cleared of trees and usually enclosed) => yard#2, grounds#2 -- (the land around a house or other building; "it was a small house with almost no yard at all")

WN1.7.1

WN1.6

I. Knowledge in the word sense definitions: How much knowledge is in WordNet?

Page 25: On WordNet, Text Mining, and Knowledge Bases of the Future

I. Knowledge in the word sense definitions: How much knowledge is in WordNet?

"lawn". WordNet seems to "know", among other things, that lawns– need watering– can have games played on them– can be flattened, mowed– can have chairs on them and other furniture– can be cut/mowed– things grow on them– have grass ("lawn grass" is a common compound)– leaves can get on them– can be seeded

Page 26: On WordNet, Text Mining, and Knowledge Bases of the Future

"accident" (ignoring derivatives like "accidentally")– accidents can block traffic– you can be prone to accidents– accidents happen– result from violent impact; passengers can be killed– involve vehicles, e.g., trains– results in physical damage or hurt, or death– there are victims– you can be blamed for accidents

I. Knowledge in the word sense definitions: How much knowledge is in WordNet?

Page 27: On WordNet, Text Mining, and Knowledge Bases of the Future

Let’s take a look…

Page 28: On WordNet, Text Mining, and Knowledge Bases of the Future

I. Knowledge in the word sense definitions: Generating Logic from Glosses

• Definitions appear deceptively simple
– really, huge representational challenges underneath

hammer_n2: (a hand tool with a heavy rigid head and a handle; used to deliver an impulsive force by striking)
launch_v3: (launch for the first time; "launch a ship")
cut_n1: (the act of reducing the amount or number)
love_v1: (have a great affection or liking for)
theater_n5: (a building where theatrical performances can be held)

• Want logic to be faithful but also simple (usable)
• Claim: We can get away with a “shallow” encoding
– all knowledge as Horn clauses
– some loss of fidelity
– gain in syntactic simplicity and reusability

Page 29: On WordNet, Text Mining, and Knowledge Bases of the Future

I. Knowledge in the word sense definitions: Simplifying

1. “Timeless” representations
– No tagging of facts with situations
– Representation doesn’t handle change

break_v4: (render inoperable or ineffective; "You broke the alarm clock when you took it apart!")

∀x,y  isa(x, Break_v4) & isa(y, Thing) & object(x, y) → property(y, Inoperable)

[Diagram: Break --object--> Thing, Thing --property--> Inoperable]

Page 30: On WordNet, Text Mining, and Knowledge Bases of the Future

I. Knowledge in the word sense definitions: Simplifying

2. For statements about types, use instances instead:

“hammer_n2: (… used to deliver an impulsive force by striking)”

∀x  isa(x, Hammer_n2) → ∃d,f,s,y,z  … & isa(d, Deliver_v9) & isa(s, Hit_v2) & isa(f, Force_n3) & purpose(x, d) & object(d, f) & subevent(d, s).

[Diagram: Hammer has-part Handle and Head (the Head with properties Rigid, Heavy); Hammer purpose Deliver, Deliver object Force, Deliver subevent Strike]

Strictly, this should be purpose(x, Deliver-Impulsive-Force)

Page 31: On WordNet, Text Mining, and Knowledge Bases of the Future

II: Relational Vocabulary

• Is this enough?
• No, also need relational vocabulary

• Which relational vocabulary to use?
– agent, patient, has-part, contains, destination, …

• Possible sources:
– UT Austin’s Slot Dictionary (~100 relations)
– Cyc (~1000 relations)
– FrameNet (??)

Page 32: On WordNet, Text Mining, and Knowledge Bases of the Future

III. Relations between word senses: Nouns

• Nouns often have multiple, related senses

School_n1: an institution
School_n2: a building
School_n3: the process of being educated
School_n4: staff and students
School_n5: a time period of instruction

• Reasoner needs to know these are related

“The school declared that the teacher’s strike was over.”
“Students should arrive at 9:15am tomorrow morning.”
(senses in play: School_n1 institution, School_n4 staff and students, School_n2 building)
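For reference, the senses can be listed directly with NLTK (the numbering in WordNet 3.0 may differ slightly from the slide):

from nltk.corpus import wordnet as wn

for i, syn in enumerate(wn.synsets('school', pos=wn.NOUN), start=1):
    print(f"school_n{i}: {syn.definition()}")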

Page 33: On WordNet, Text Mining, and Knowledge Bases of the Future

III. Relations between word senses: Nouns

• Can hand-code these relationships (slow)

[Diagram: the five senses of “school” linked by relations such as participants, constituent, location, and during — e.g., the educational process (School_n3) has participants (School_n4, staff and students), a location (School_n2, the building), happens during a time period (School_n5), and is tied by constituent links to the institution (School_n1)]

Page 34: On WordNet, Text Mining, and Knowledge Bases of the Future

III. Relations between word senses: Nouns

• Can hand-code these relationships (slow)
• BUT: The patterns repeat (Buitelaar)

[Diagram: the same relational structure stated over generic sense types — members, process, institution, building, time period — with the same participants / constituent / location / during links, shown alongside the “school” instantiation]

Page 35: On WordNet, Text Mining, and Knowledge Bases of the Future

III. Relations between word senses: Nouns

• Can hand-code these relationships (slow)
• BUT: The patterns repeat (Buitelaar)
– can encode and reuse the patterns

[Diagram: as on the previous slide — the generic pattern (members, process, institution, building, time period) and its instantiation for the senses of “school”]

Page 36: On WordNet, Text Mining, and Knowledge Bases of the Future

III. Relations between word senses: Verbs

• WordNet’s verb senses:
– 41 senses of “cut”
– linguistically not representationally motivated

• “cut grass” (cut_v18) ≠ “cut timber” (cut_v31) ≠ “cut grain” (cut_v28) (“mow”, “chop”, “harvest”)
• cut_v1 (separate) ≠ cut_v3 (slicing movement)
• fails to capture commonality

• Better:
– Organize verbs into a “mini taxonomy”
– “Supersenses”, to group same meanings
– Identify facets of verbs, use multiple inheritance
  • result of action
  • style of action
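The proliferation of senses is easy to verify with NLTK (the exact count depends on the WordNet version):

from nltk.corpus import wordnet as wn

cut_senses = wn.synsets('cut', pos=wn.VERB)
print(len(cut_senses))                        # ~40 verb senses in WordNet 3.0
for syn in cut_senses[:5]:
    print(syn.name(), '--', syn.definition())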

Page 37: On WordNet, Text Mining, and Knowledge Bases of the Future

Outline

• Machine understanding and question-answering

• An initial attempt

• From WordNet to a Knowledge Base

– Representation

– Reasoning

– Text Mining for Possibilistic Knowledge

– A Knowledge Base of the Future?

Page 38: On WordNet, Text Mining, and Knowledge Bases of the Future

The Myth of Common-Sense: All you need is knowledge…

“We don’t believe that there’s any shortcut to being intelligent; the ‘secret’ is to have lots of knowledge.” — Lenat & Guha ’86

“Knowledge is the primary source of the intellectual power of intelligent agents, both human and computer.” — Feigenbaum ’96

Page 39: On WordNet, Text Mining, and Knowledge Bases of the Future

The Myth of Common-Sense

• Common, implicit assumption (belief?) in AI:
– Knowledge is the key to intelligence
– Acquisition of knowledge is the bottleneck

• Spawned from:
– ’70s experience with Expert Systems
  • Feigenbaum’s “knowledge acquisition bottleneck”
– Introspection

Page 40: On WordNet, Text Mining, and Knowledge Bases of the Future

Thought Experiment…

• Suppose we had
– good logical translations of the WordNet definitions
– good relational vocabulary
– rich relationships between related word senses

• How would these be used?
• Would they be enough?
• What else would be needed?

Page 41: On WordNet, Text Mining, and Knowledge Bases of the Future

Initial Scenario Sentence

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

[Diagram: Attack with instrument Bomb and time Dawn; the Attack causes a Devastate event whose object is the Shrine]

Page 42: On WordNet, Text Mining, and Knowledge Bases of the Future

One Elaboration Step (knowledge of “bomb”)

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

WordNet: “bomb: an explosive device fused to detonate”

[Diagram: the scenario graph (Attack, instrument Bomb, time Dawn, causes Devastate, object Shrine) alongside the WordNet-derived fragment: Bomb is a Device, contains Explosive, and has purpose Detonate]

Page 43: On WordNet, Text Mining, and Knowledge Bases of the Future

One Elaboration Step (knowledge of “bomb”)

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

WordNet: “bomb: an explosive device fused to detonate”

[Diagram: the bomb fragment (Device containing Explosive, with purpose Detonate) is attached to the Bomb node of the scenario graph, elaborating the original representation]

Page 44: On WordNet, Text Mining, and Knowledge Bases of the Future

Additional, Relevant Knowledge in WordNet

“bomb: an explosive device fused to detonate”
→ Bomb is a Device, contains Explosive, has purpose Detonate

“bombing: the use of bombs for sabotage; a tactic frequently used by terrorists”
→ Bombing has agent Terrorist and instrument Bomb

“plastic explosive: an explosive material … intended to destroy”
→ Explosive has purpose Destroy

“explode: destroy by exploding”
→ Explode causes Destroy

“destroy: damage irreparably”
→ Destroy causes Damage

Page 45: On WordNet, Text Mining, and Knowledge Bases of the Future

Multiple Elaboration Steps

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

WordNet: “bomb: an explosive device fused to detonate”

[Diagram: the scenario graph elaborated with the bomb gloss — the Bomb contains Explosive and has purpose {Detonate, Explode}; Devastate is matched with Destroy, giving {Devastate, Destroy}]

Page 46: On WordNet, Text Mining, and Knowledge Bases of the Future

Multiple Elaboration Steps

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

WordNet: “bombing: the use of bombs for sabotage; a tactic frequently used by terrorists”

[Diagram: as before, plus an agent Terrorist attached to the Attack]

Page 47: On WordNet, Text Mining, and Knowledge Bases of the Future

Multiple Elaboration Steps

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

WordNet: “plastic explosive: an explosive material … intended to destroy”

[Diagram: as before, plus the Explosive has purpose {Devastate, Destroy}]

Page 48: On WordNet, Text Mining, and Knowledge Bases of the Future

Multiple Elaboration Steps

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

WordNet: “destroy: damage irreparably”

[Diagram: as before, plus {Devastate, Destroy} causes Damage]

Page 49: On WordNet, Text Mining, and Knowledge Bases of the Future

Multiple Elaboration Steps

"A dawn bomb attack devastated a major Shiite shrine in Iraq..."

WordNet glosses used: “bomb: an explosive device fused to detonate”; “bombing: the use of bombs for sabotage; a tactic frequently used by terrorists”; “plastic explosive: an explosive material … intended to destroy”; “explode: destroy by exploding”; “destroy: damage irreparably”

[Diagram: the fully elaborated scene — the Attack (agent Terrorist, instrument Bomb, time Dawn) causes {Devastate, Destroy} of the Shrine; the Bomb contains Explosive and has purpose {Detonate, Explode}, which causes {Devastate, Destroy}, which in turn causes Damage]

Page 50: On WordNet, Text Mining, and Knowledge Bases of the Future

How this really works…

• Pieces may not “fit together” so neatly
– multiple ways of saying the same thing
– uncertainty at all stages of the process
  • definitions are often only typical facts
  • errors in both the English and the translations

• The process is not a chain of deductions; rather it
– is a search over possible elaborations
– looking for the most “coherent” elaboration

• More “crystallization” than “deduction”

Page 51: On WordNet, Text Mining, and Knowledge Bases of the Future

1. Reasoning as a Search for Coherence

“A bomb attack devastated a shrine…”

[Diagram: the input sentence as the root of a search tree of candidate interpretations]

Page 52: On WordNet, Text Mining, and Knowledge Bases of the Future

1. Reasoning as a Search for Coherence

“A bomb attack devastated a shrine…”

[Diagram: two candidate expansions of “bomb” — bomb = explosive which detonates? vs. bomb = calorimeter, measures heat?]

Page 53: On WordNet, Text Mining, and Knowledge Bases of the Future

1. Reasoning as a Search for Coherence

“A bomb attack devastated a shrine…”

[Diagram: the search tree grows — under “bomb = explosive which detonates”: detonate = explode, and explode causes destroy? vs. detonate = explode, where explode = increase in population? The “bomb = calorimeter” branch remains open]

Page 54: On WordNet, Text Mining, and Knowledge Bases of the Future

1. Reasoning as a Search for Coherence

“A bomb attack devastated a shrine…”

[Diagram: the calorimeter branch is also expanded (measure = assess quantity?), alongside the detonate/explode alternatives]

Page 55: On WordNet, Text Mining, and Knowledge Bases of the Future

Reasoning as a Search for Coherence

“A bomb attack devastated a shrine…”

[Diagram: the same search tree, with the coherent path (bomb = explosive which detonates; detonate = explode; explode causes destroy) preferred over the incoherent alternatives]

Page 56: On WordNet, Text Mining, and Knowledge Bases of the Future

Matching pieces of the representation

• Problem:
– There are additional, implied facts
– Need to compute and match against these also

• For example (a small sketch follows):
– (X in state S) & (X part-of Y) ~→ (Y in state S)
  • S = broken, injured, valuable, …
– (X causes Y) & (Y causes Z) ~→ (X causes Z)
– (X does Y) & (Y causes Z) ~→ (X does Z)
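A minimal sketch of computing such implied facts over a triple store (plain Python; the predicate names mirror the slide, and the individual names are illustrative placeholders):

# Facts as (subject, relation, object) triples -- the names are illustrative.
facts = {
    ("man1", "does", "throw1"),
    ("throw1", "causes", "destroy1"),
    ("destroy1", "causes", "damage1"),
    ("window1", "in-state", "broken"),
    ("window1", "part-of", "house1"),
}

def implied(facts):
    """Close the fact set under the slide's three ~-> propagation rules."""
    facts = set(facts)
    while True:
        new = set()
        for (a, r1, b) in facts:
            for (c, r2, d) in facts:
                if b == c and r1 == "causes" and r2 == "causes":
                    new.add((a, "causes", d))     # (X causes Y) & (Y causes Z) ~-> (X causes Z)
                if b == c and r1 == "does" and r2 == "causes":
                    new.add((a, "does", d))       # (X does Y) & (Y causes Z) ~-> (X does Z)
        for (x, r1, s) in facts:
            for (x2, r2, y) in facts:
                if r1 == "in-state" and r2 == "part-of" and x == x2:
                    new.add((y, "in-state", s))   # (X in state S) & (X part-of Y) ~-> (Y in state S)
        if new <= facts:
            return facts
        facts |= new

closed = implied(facts)
print(("man1", "does", "destroy1") in closed)      # True
print(("house1", "in-state", "broken") in closed)  # True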

[Diagram: Man is the agent of a Throw whose object is a Bomb; the Throw causes a Destroy of the Shrine. Applying the propagation rules adds the implied edge that the Man (indirectly) causes the Destroy, so the elaborated graph can match a query about who destroyed the shrine]

Page 57: On WordNet, Text Mining, and Knowledge Bases of the Future

Assessing Coherence

• Does a representation seem “sensible”?
– Minsky: We proactively ask certain questions

• Coherence criteria (examples):
– No contradictions
– Agents perform actions in pursuit of their goals
– Agents have resources for their actions
– Events have a cause (including randomness)
– Artifacts are used for their purpose
– Structures are physically possible
– Observation not unusual (“sanctioned” by experience)

Page 58: On WordNet, Text Mining, and Knowledge Bases of the Future

Assessing Coherence: “Sanctioning” and Possibilistic Knowledge

• We know from experience what is “usual”
– Cats can drink milk
– Rockets can be launched
– Helicopters can land
– etc.

• If we see these, we are comfortable
– These statements sanction our tentative conclusions
– Logically, these are strange beasts

• How to accumulate this “database of possibilities”?

Page 59: On WordNet, Text Mining, and Knowledge Bases of the Future

Outline

• Machine understanding and question-answering

• An initial attempt

• From WordNet to a Knowledge Base

– Representation

– Reasoning

– Text Mining for Possibilistic Knowledge

– A Knowledge Base of the Future?

Page 60: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Mining: Acquiring Possibilistic Knowledge

Schubert’s Conjecture: There is a largely untapped source of general knowledge in texts, lying at a level beneath the explicit assertional content, and which can be harnessed.

“The camouflaged helicopter landed near the embassy.”
→ helicopters can land
→ helicopters can be camouflaged

Our attempt: “lightweight” LFs generated from Reuters
LF forms: (S subject verb object (prep noun) (prep noun) …), (NN noun … noun), (AN adj noun)
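An illustrative sketch of turning such lightweight LF tuples into possibilistic statements (plain Python over already-extracted tuples; the tuple format, the naive pluralization, and the example lemmas are assumptions, not the actual extraction pipeline):

# Lightweight logical forms, as tuples already extracted upstream, e.g. from
# "The camouflaged helicopter landed near the embassy."
lfs = [
    ("S", "helicopter", "land", None, ("near", "embassy")),   # subject-verb(-object)(-PP)
    ("AN", "camouflaged", "helicopter"),                       # adjective-noun
]

def possibilities(lfs):
    """Generalize each lightweight LF into a possibilistic statement."""
    out = []
    for lf in lfs:
        if lf[0] == "S":
            subj, verb, obj = lf[1], lf[2], lf[3]
            out.append(f"{subj}s can {verb}" + (f" {obj}s" if obj else ""))
        elif lf[0] == "AN":
            _, adj, noun = lf
            out.append(f"{noun}s can be {adj}")
    return out

for p in possibilities(lfs):
    print(p)   # helicopters can land / helicopters can be camouflaged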

Page 61: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Mining: Acquiring Possibilistic Knowledge

Newswire Article:

HUTCHINSON SEES HIGHER PAYOUT. HONG KONG. Mar 2. Li said Hong Kong’s property market remains strong while its economy is performing better than forecast. Hong Kong Electric reorganized and will spin off its non-electricity related activities. Hongkong Electric shareholders will receive one share in the new subsidiary for every owned share in the sold company. Li said the decision to spin off …

Implicit, tacit knowledge:
– Shareholders may receive shares.
– Companies may be sold.
– Shares may be owned.

Page 62: On WordNet, Text Mining, and Knowledge Bases of the Future
Page 63: On WordNet, Text Mining, and Knowledge Bases of the Future
Page 64: On WordNet, Text Mining, and Knowledge Bases of the Future

Outline

• Machine understanding and question-answering

• An initial attempt

• From WordNet to a Knowledge Base

– Representation

– Reasoning

– Text Mining for Possibilistic Knowledge

– A Knowledge Base of the Future?

Page 65: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Bases of the Future

0. Core WordNet

1. Gloss Axioms

2. Relations

3. Related senses

4. Core Rules

5. Possibilities

Page 66: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Bases of the Future
1. Machine-sensible glosses and examples

;;; "Bomb: An explosive device fused to detonate"
isa(_Bomb1, bomb_n1)
------->
isa(_Bomb1, device_n1)
isa(_Explosive1, explosive_n2)
isa(_Fuse1, fuse_v1)
isa(_Detonate1, detonate_v1)
contains(_Bomb1, _Explosive1)
purpose(_Bomb1, _Detonate1)
object(_Fuse1, _Bomb1)

;;; "The bomb exploded"
isa(_Bomb1, device_n1)
isa(_Explode1, explode_v1)
explode(_Bomb1, _Explode1)

1. File wordnet3.0/glosses.wn

Page 67: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Bases of the Future
2. Extra relational tables

2a. File wordnet3.0/purpose.wn
purpose(bomb_n1, detonate_v1).
purpose(knife_n1, cut_v4).
purpose(car_n1, transport_v5).
...

2b. File wordnet3.0/contains.wn
contains(bomb_n1, explosive_n4).
contains(river_n1, water_n2).
contains(body_n1, blood_n2).
...

2c. File wordnet3.0/instrument.wn
...
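A small sketch of how such tables might be loaded for lookup (plain Python with a regular expression; the file names and fact syntax follow the slide, but the loader itself is illustrative):

import re
from collections import defaultdict

FACT = re.compile(r"(\w[\w-]*)\(\s*([\w.]+)\s*,\s*([\w.]+)\s*\)\.")

def load_facts(path, index=None):
    """Parse lines like 'purpose(bomb_n1,detonate_v1).' into relation -> {arg1: [arg2, ...]}."""
    index = index if index is not None else defaultdict(lambda: defaultdict(list))
    with open(path) as f:
        for line in f:
            m = FACT.match(line.strip())
            if m:
                rel, a1, a2 = m.groups()
                index[rel][a1].append(a2)
    return index

# Example usage (assuming the files exist as named on the slide):
# kb = load_facts("wordnet3.0/purpose.wn")
# kb = load_facts("wordnet3.0/contains.wn", kb)
# print(kb["purpose"]["bomb_n1"])   # -> ['detonate_v1']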

Page 68: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Bases of the Future
3. Related Senses

3a. File wordnet3.0/polysemy-patterns.wn

pattern(p1, _Process1, _Building1, _Members1, _TimePeriod1)
  location(_Process1, _Building1)
  participants(_Process1, _Members1)
  during(_Process1, _TimePeriod1)

pattern(p1, school_n1, school_n4, school_n2, school_n6).
pattern(p1, university_n1, university_n2, university_n6, university_n4).
pattern(p1, government_n1, government_n4, government_n2, government_n3).
...

3b. File wordnet3.0/verb-facets.wn
hypernym(cut_v3, cut_v5).
hypernym(cut_v9, cut_v5).
hypernym(cut_v11, cut_v5).
...

Page 69: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Bases of the Future
4. General Rules

4. File wordnet3.0/rules.wn

has-part(_X,_Y), has-part(_Y,_Z)
-------> has-part(_X,_Z)

does(_X,_A), causes(_A,_B)
-------> does(_X,_B)

...

Page 70: On WordNet, Text Mining, and Knowledge Bases of the Future

Knowledge Bases of the Future
5. Possibilistic Statements

5. File wordnet3.0/possibilities.wn

can(cat_n1, sit_v1).
can(cat_n1, drink_v1).
can(airplane_n1, fly_v2).
can(airplane_n1, land_v4).
...
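A minimal sketch of how such a table could “sanction” a tentative conclusion, generalizing along hypernym links (plain Python; the facts and the tiny hand-coded isa table are illustrative placeholders, not the proposed file contents):

# Possibilistic table: (noun sense, verb sense) pairs known to be "usual".
CAN = {
    ("cat_n1", "sit_v1"), ("cat_n1", "drink_v1"),
    ("airplane_n1", "fly_v2"), ("airplane_n1", "land_v4"),
}

# A tiny, hand-coded slice of the hypernym (isa) hierarchy -- illustrative only.
ISA = {"jet_n1": "airplane_n1", "airplane_n1": "aircraft_n1", "tabby_n1": "cat_n1"}

def hypernyms(sense):
    """The sense itself plus its chain of hypernyms."""
    chain = [sense]
    while sense in ISA:
        sense = ISA[sense]
        chain.append(sense)
    return chain

def sanctioned(noun, verb):
    """Is 'noun can verb' supported by the table, directly or via a hypernym of the noun?"""
    return any((n, verb) in CAN for n in hypernyms(noun))

print(sanctioned("jet_n1", "land_v4"))   # True: a jet is an airplane, and airplanes can land
print(sanctioned("jet_n1", "drink_v1"))  # False: not sanctioned by experience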

Page 71: On WordNet, Text Mining, and Knowledge Bases of the Future

Summary

• WordNet is on the path to being a knowledge base
– Needs logical definitions of its word senses
– Relational vocabulary
– More relationships between word senses

• Notions of reasoning have to change too
– Search for coherence (a crystallization process)

• Also need:
– Core rules
– Possibilistic knowledge

• Is this doable?
– Yes! Work on the glosses is getting close
– But the result will never be perfect