707.009 Foundations of Knowledge Management „Broad...
Knowledge Management Institute
707.009 Foundations of Knowledge Management
"Broad Knowledge Bases"
Markus Strohmaier
Univ. Ass. / Assistant Professor, Knowledge Management Institute
Graz University of Technology, Austria
e-mail: [email protected], web: http://www.kmi.tugraz.at/staff/markus
Markus Strohmaier 2011
Review
Knowledge organization systems:
• Index: keywords (list, catalog, lexicon)
• Taxonomy: hierarchy ("belongs to"), classification
• Thesaurus: hierarchy, equivalence, association
• Ontology: concepts, properties, relations, rules
Review
• Homonyms: ambiguous terms (e.g. German Bank: bench or bank)
• Homophones: terms that sound the same (e.g. Mohr, Moor)
• Homographs: terms spelled the same (e.g. Wach(-)s(-)tube: Wachs-tube "wax tube" vs. Wach-stube "guard room")
• Synonyms: several terms stand for the same concept (e.g. Auto, PKW: "car")
• Antonyms: opposites (e.g. hart - weich: "hard - soft")
• Hyper-/hyponyms: more abstract / more specific concepts (e.g. Fahrzeug / PKW: "vehicle / car")
Semiotic triangle:
• Object ("real world")
• Word / expression / symbol (language)
• Concept (knowledge)
Formal concept systems often aim to leave little room for interpretation!
– Homonym qualifiers (e.g. "Ring <jewelry>", "Ring <mathematics>")
– The correct mapping between concepts and terms can often only be interpreted from the context!
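To illustrate homonym qualifiers and the role of context, here is a minimal Python sketch. It assumes a hypothetical mini-lexicon in which the term "ring" maps to two qualified concepts, each annotated with a few assumed context words. The overlap-counting rule is a simplified Lesk-style heuristic swapped in for illustration; it is not a method from the lecture.

```python
from typing import Optional

# Hypothetical mini-lexicon: each term maps to qualified concepts, and each
# qualified concept to a few context words assumed to signal it.
LEXICON = {
    "ring": {
        "Ring <jewelry>": {"gold", "finger", "wedding", "diamond"},
        "Ring <mathematics>": {"algebra", "addition", "multiplication", "field"},
    },
}

def disambiguate(term: str, sentence: str) -> Optional[str]:
    """Pick the qualified concept whose context words overlap most with the sentence."""
    senses = LEXICON.get(term.lower())
    if not senses:
        return None  # term not in the lexicon
    context = set(sentence.lower().split())
    best = max(senses, key=lambda qualified: len(senses[qualified] & context))
    if not (senses[best] & context):
        return None  # no contextual evidence: the term stays ambiguous
    return best

print(disambiguate("Ring", "she wore a gold ring on her finger"))  # Ring <jewelry>
print(disambiguate("ring", "a ring is closed under addition and multiplication"))  # Ring <mathematics>
```

Without any overlapping context word the function returns None, mirroring the point that the term alone does not fix the concept.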
Overview
• Knowledge Organization (last lecture)
• Broad Knowledge Bases (systems perspective)
  – Ontologies
  – WordNet
  – ConceptNet
  – And more
• Knowledge Acquisition (next lecture)
Based in part on slides prepared by D. Reisinger
Reading the Web
NELL: Never Ending Language Learning
http://rtw.ml.cmu.edu/rtw/
http://techcrunch.com/2010/10/09/nell-computer-language-carnegie-tctv/
https://www.nytimes.com/2010/10/05/science/05compute.html?_r=1
Conceptual Graph and Semantic Network
An ordered collection of concepts and terms whose interrelationships are defined by arbitrary relations.
• Graphical concept networks with defined semantics
• Both concepts and relations are typed, and there is a grammar governing their use
• To turn information into applicable knowledge, "related-to" relations are no longer sufficient -> step from the thesaurus to the semantic network
• Introduced by linguists to represent the meaning of words according to their usage
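The difference between a thesaurus-style "related-to" link and a typed semantic network can be sketched in Python. This is a minimal sketch, assuming a hypothetical set of concept types and a hypothetical "grammar" that states which concept types each relation type may connect; the names (Alice, works_at, located_in, ...) are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical grammar: relation type -> (required source type, required target type).
GRAMMAR = {
    "works_at": ("Person", "Organization"),
    "located_in": ("Organization", "City"),
    "born_in": ("Person", "City"),
}

@dataclass
class SemanticNetwork:
    node_types: dict = field(default_factory=dict)  # concept -> concept type
    edges: list = field(default_factory=list)       # (source, relation, target)

    def add_concept(self, name, concept_type):
        self.node_types[name] = concept_type

    def relate(self, source, relation, target):
        """Add a typed edge, rejecting it if it violates the grammar."""
        if relation not in GRAMMAR:
            raise ValueError(f"unknown relation type: {relation}")
        source_type, target_type = GRAMMAR[relation]
        actual = (self.node_types.get(source), self.node_types.get(target))
        if actual != (source_type, target_type):
            raise ValueError(f"{source} -{relation}-> {target} violates the grammar")
        self.edges.append((source, relation, target))

net = SemanticNetwork()
net.add_concept("Alice", "Person")
net.add_concept("TU Graz", "Organization")
net.add_concept("Graz", "City")
net.relate("Alice", "works_at", "TU Graz")
net.relate("TU Graz", "located_in", "Graz")
```

An untyped "related_to" edge, or an edge such as Graz -works_at-> Alice, is rejected because the grammar does not license it.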
Ontology: A Definition
"An ontology is a formal, explicit specification of a shared conceptualization of a domain of interest. ... For AI systems, what 'exists' is that which can be represented." (Gruber)
An ontology is a formal description of concepts and relations: an abstract, simplified view of the world.
• Explicit: fixed, defined
• Formal: formalized structure, hence machine-readable
• Shared: agreement within a community
• Domain of interest: area of knowledge
• Conceptualisation: establishing the concepts
Terminology
• Ontology engineering: development, use and maintenance of ontologies
• Meta-ontology: an ontology that underlies another ontology, i.e. an abstracted description of ontologies, which enables linking the knowledge of different domains
• Open-world assumption: ontologies should potentially be usable by, or embeddable in, other ontologies
• Ontology mapping: mapping ontologies onto one another
• Ontology merge: consolidating and combining ontologies
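Ontology mapping and ontology merge can be sketched on a very reduced model. This minimal Python sketch assumes each ontology is only a hypothetical class-to-superclass dictionary and that the mapping (correspondences between class names) is given by hand; real mapping tools derive such correspondences and handle far richer axioms.

```python
# Two hypothetical mini-ontologies, reduced to class -> direct superclass (None = root).
onto_a = {"Thing": None, "Vehicle": "Thing", "Car": "Vehicle"}
onto_b = {"Entity": None, "Conveyance": "Entity", "Automobile": "Conveyance",
          "Truck": "Conveyance"}

# Ontology mapping: hand-made correspondences from B's class names to A's (assumed).
mapping_b_to_a = {"Entity": "Thing", "Conveyance": "Vehicle", "Automobile": "Car"}

def merge(a, b, mapping):
    """Ontology merge: translate B via the mapping, then consolidate with A.

    Returns the merged ontology and a list of (class, superclass in A, superclass in B)
    conflicts; on a conflict, A's superclass is kept.
    """
    merged, conflicts = dict(a), []
    for cls, parent in b.items():
        cls, parent = mapping.get(cls, cls), mapping.get(parent, parent)
        if cls in merged and merged[cls] != parent:
            conflicts.append((cls, merged[cls], parent))
        else:
            merged[cls] = parent
    return merged, conflicts

merged, conflicts = merge(onto_a, onto_b, mapping_b_to_a)
print(merged)     # Truck is now placed under A's Vehicle
print(conflicts)  # []
```

The mapping step only translates names; the merge step is where disagreements between the two ontologies surface as conflicts.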
Benefits
• "To share common understanding of the structure of information among people and software agents
• To enable reuse of domain knowledge
• To make domain assumptions explicit
• To separate domain knowledge from the operational knowledge
• To analyze domain knowledge" (Noy, McGuinness)
• Achieving interoperability in heterogeneous landscapes
• Improving the quality of information and interaction
• Saving time, reducing costs
Application Areas
A supporting technology of knowledge management:
• Knowledge engineering and representation
• Information retrieval, extraction and visualization
• Information modeling and integration
• Artificial intelligence, decision support
• Integration of application systems (EAI), open systems
• and much more
Semantic Web - Definition
"The Semantic Web is an extension of the current web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation." (Berners-Lee, Hendler, Lassila)
"The Semantic Web is a vision: the idea of having data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications." (W3C)
-> Bringing formal language closer to natural language
RDF / RDF Schema
[Figure: Subject --Predicate--> Object]
Goal: "say anything about anything"
Requirement: definition of arbitrary classes and properties, and their reuse
RDF = "Resource Description Framework"
• The RDF model is a formally grounded graph model (a directed graph)
• Three elements: subject (node), predicate (edge), object (node), together a "triple"
  – Subject: the resource about which a statement is made
  – Predicate: the kind of relation between subject and object
  – Object: the "value" of the relation
• Vocabularies can be referenced by other RDF graphs (via URIs)
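The triple model above can be made concrete with a minimal sketch in plain Python. It assumes triples are stored as (subject, predicate, object) tuples of strings and that the URIs under http://example.org/ are hypothetical; a real system would use an RDF library and a query language such as SPARQL instead of this hand-written pattern matcher.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/"  # hypothetical namespace for the example resources

# Each statement is one (subject, predicate, object) triple; the set is the graph.
graph = {
    (EX + "Graz", EX + "locatedIn", EX + "Austria"),
    (EX + "TUGraz", EX + "locatedIn", EX + "Graz"),
    (EX + "TUGraz", EX + "name", "Graz University of Technology"),  # literal value
}

def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )

# Which subjects are located in something, and where?
for subject, _, obj in match(p=EX + "locatedIn"):
    print(subject, "->", obj)
```

Because subjects and predicates are URIs, a second graph can refer to the same resources simply by reusing these identifiers.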
A Simplified "Napoleon Ontology"
On a website: "Napoleon is 1.50 tall and contributed to ancient history."
[Figure: RDF graph with a class level and an instance level]
• Class level: Person is an rdf:subclass of "Thing" and has the properties (rdf:property) Name, Height and Date of birth; a Person "contributes to" a field of science (Wissensch.)
• Instance level: the resource with the address http://x/Napoleon is of rdf:type Person, has the Name "Napoleon" and the Height "150", and "contributes to" Ancient History (Alte Geschichte), which is of rdf:type field of science
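The class and instance levels can be expressed as triples, and the subclass link lets a program infer types that were never stated. The slide labels this link rdf:subclass; in the RDF Schema vocabulary it is named rdfs:subClassOf. This is a minimal sketch: the ex: property and class names and the http://x/AncientHistory address are hypothetical, and only the subclass rules (RDFS entailment rules rdfs9 and rdfs11) are implemented.

```python
# Triples for the simplified Napoleon example (ex: terms are hypothetical names).
triples = [
    # class level
    ("ex:Person", "rdfs:subClassOf", "ex:Thing"),
    ("ex:FieldOfScience", "rdfs:subClassOf", "ex:Thing"),
    # instance level
    ("http://x/Napoleon", "rdf:type", "ex:Person"),
    ("http://x/Napoleon", "ex:name", "Napoleon"),
    ("http://x/Napoleon", "ex:height", "150"),
    ("http://x/Napoleon", "ex:contributesTo", "http://x/AncientHistory"),
    ("http://x/AncientHistory", "rdf:type", "ex:FieldOfScience"),
]

def superclasses(cls):
    """All superclasses of cls, following rdfs:subClassOf transitively (rule rdfs11)."""
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in result:
                result.add(o)
                stack.append(o)
    return result

def types_of(resource):
    """Asserted rdf:type values plus types inferred from the class hierarchy (rule rdfs9)."""
    asserted = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls)
    return inferred

print(sorted(types_of("http://x/Napoleon")))  # ['ex:Person', 'ex:Thing']
```

Only "Napoleon is a Person" is stated; "Napoleon is a Thing" follows from the class level.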
Ontology Editor
Example: Protégé
http://protege.stanford.edu/
Freebase
MOVIE DEMO: http://mqlx.com/~david/parallax/
Two current research efforts focusing on the construction of broad knowledge bases
WordNet
ConceptNet
WordNet
http://wordnet.princeton.edu/
WordNet is a project (started in 1985) at the Cognitive Science Laboratory of Princeton University.
Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. One purpose of the dataset is to support natural language processing.
RDF models available at: http://www.semanticweb.org/library/
Statistics: http://wordnet.princeton.edu/man2.1/wnstats.7WN#toc2
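The synset idea (words grouped by shared meaning, with each word-in-a-synset being one sense) can be sketched with a toy data structure. This is a minimal sketch: the three synsets, their abbreviated glosses and their identifiers are hypothetical; the identifiers merely imitate the lemma.pos.nn naming used by NLTK's WordNet interface, while the WordNet database itself identifies synsets by byte offsets.

```python
from collections import defaultdict

# Hypothetical toy synsets: id -> (part of speech, member words, abbreviated gloss)
SYNSETS = {
    "car.n.01": ("noun", ["car", "auto", "automobile", "autocar"],
                 "a motor vehicle with four wheels"),
    "strike.n.01": ("noun", ["strike", "work stoppage"],
                    "a group's refusal to work in protest"),
    "strike.n.02": ("noun", ["strike"],
                    "(baseball) a pitch that the batter swings at and misses"),
}

# Invert the synsets: each (word, synset) pair is one sense of that word.
SENSES = defaultdict(list)
for synset_id, (_pos, words, _gloss) in SYNSETS.items():
    for word in words:
        SENSES[word].append(synset_id)

def synonyms(word, synset_id):
    """Words interchangeable with `word` in the sense given by synset_id."""
    return [w for w in SYNSETS[synset_id][1] if w != word]

print(SENSES["strike"])              # ['strike.n.01', 'strike.n.02']
print(synonyms("car", "car.n.01"))   # ['auto', 'automobile', 'autocar']
```

"strike" has two senses because it belongs to two synsets, while "auto" has one.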
WordNet Glossary [Excerpt] [http://wordnet.princeton.edu/gloss]
sense
– A meaning of a word in WordNet. Each sense of a word is in a different synset.
Example:
• strike, work stoppage -- (a group's refusal to work in protest against low pay or bad work conditions; "the strike lasted more than a month before it was settled")
• strike -- ((baseball) a pitch that the batter swings at and misses, or that the batter hits into foul territory, or that the batter does not swing at but the umpire judges to be in the area over home plate and between the batter's knees and shoulders; "this pitcher throws more strikes than balls")
synset
– A synonym set; a set of words that are interchangeable in some context (sharing the same word sense). Example: car, auto, automobile, autocar
WordNet Glossary [Excerpt] [http://wordnet.princeton.edu/gloss]
hypernym
– The generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y.
Illustration: vehicle
hyponym
– The specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y.
Illustration: automotive vehicle
holonym
– The name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.
Illustration: car
meronym
– The name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y.
Illustration: engine
sister
– Matching strings that are both the immediate hyponyms of the same superordinate (or hypernym).
Illustration: automotive vehicle, motor vehicle
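The glossary relations can be read as queries over two kinds of links: "is a kind of" (hypernymy) and "is a part of" (meronymy). This minimal Python sketch uses a small hypothetical hierarchy, not WordNet's actual data, to show how hypernyms, hyponyms, holonyms and sisters are derived from those links.

```python
# Hypothetical mini-hierarchy (not WordNet's data): X -> direct hypernym Y ("X is a kind of Y").
HYPERNYM = {
    "car": "motor vehicle",
    "truck": "motor vehicle",
    "motor vehicle": "vehicle",
    "bicycle": "vehicle",
}
# Meronymy: whole -> parts ("X is a part of Y" makes X a meronym of Y).
MERONYMS = {"car": ["engine", "wheel"], "bicycle": ["wheel", "pedal"]}

def hypernym_chain(word):
    """Follow hypernym links upward, e.g. car -> motor vehicle -> vehicle."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def hyponyms(word):
    """Immediate hyponyms: all X whose direct hypernym is word."""
    return sorted(x for x, y in HYPERNYM.items() if y == word)

def holonyms(part):
    """Wholes that list `part` among their meronyms."""
    return sorted(whole for whole, parts in MERONYMS.items() if part in parts)

def sisters(word):
    """Other immediate hyponyms of the same hypernym."""
    parent = HYPERNYM.get(word)
    return [] if parent is None else [x for x in hyponyms(parent) if x != word]

print(hypernym_chain("car"))  # ['motor vehicle', 'vehicle']
print(sisters("car"))         # ['truck']
```

Hypernym/hyponym and holonym/meronym are each one link read in two directions, which is why a single table per relation suffices here.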
WordNet Glossary [Excerpt] [http://wordnet.princeton.edu/gloss]
base form
– The base form of a word or collocation is the form to which inflections are added.
Illustration: base form of playing, played, plays: play
part of speech
– WordNet defines "part of speech" as either noun, verb, adjective, or adverb. Same as syntactic category.
Illustration: {buy\VERB fast\ADJECTIVE skis\NOUN}
collocation
– A collocation in WordNet is a string of two or more words, connected by spaces or hyphens. Examples are: man-eating shark, blue-collar, depend on, line of products.
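Finding the base form of an inflected word can be sketched with suffix-detachment rules checked against a lexicon, plus an exception list for irregular forms. This minimal sketch assumes a tiny hypothetical verb lexicon and exception list; the rule list is modeled on the verb rules of WordNet's morphological processor (morphy), but treat its exact contents as an assumption.

```python
# Hypothetical verb lexicon; detachment rules (suffix, replacement) modeled on
# WordNet's morphy verb rules (treat the exact list as an assumption).
VERB_LEXICON = {"play", "make", "hope", "try", "go"}
VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]
# Irregular forms come from an exception list (tiny hypothetical excerpt).
VERB_EXCEPTIONS = {"went": "go", "made": "make"}

def base_forms(word):
    """Return the base forms of a verb form that exist in the lexicon."""
    word = word.lower()
    if word in VERB_EXCEPTIONS:
        return [VERB_EXCEPTIONS[word]]
    candidates = [word] if word in VERB_LEXICON else []
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)] + replacement
            # Keep a candidate only if the lexicon knows it.
            if stem in VERB_LEXICON and stem not in candidates:
                candidates.append(stem)
    return candidates

for form in ["playing", "played", "plays", "play"]:
    print(form, "->", base_forms(form))  # each maps to ['play']
```

Checking candidates against the lexicon is what stops "hoping" from yielding "hop" here: only "hope" is a known verb in this toy lexicon.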
WordNet
http://wordnet.princeton.edu/doc
DEMO: WordNet Browser / Babylon
Each sense matching the selected search is displayed as follows:
Sense n
[{synset_offset}] [<lex_filename>] word1[#sense_number][, word2...]
synset_offset is the byte offset of the synset in the data.pos file corresponding to the syntactic category, lex_filename is the name of the lexicographer file that the synset comes from, word1 is the first word in the synset (note that this is not necessarily the search word) and sense_number is the WordNet sense number assigned to the preceding word. synset_offset, lex_filename, and sense_number are generated only if the appropriate options are specified.
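The display format described above can be sketched as a small formatting function in which each optional field is printed only when its option is set. This is a minimal sketch: the function name, its parameters and the offset value used in the example are hypothetical, and the exact spacing of the real browser output may differ. Offsets are zero-padded to eight digits, as in WordNet's data files.

```python
from typing import List, Optional, Tuple

def format_sense(n: int, words: List[Tuple[str, int]],
                 synset_offset: Optional[int] = None,
                 lex_filename: Optional[str] = None,
                 show_sense_numbers: bool = False) -> str:
    """Render one sense as 'Sense n' plus the synset line; optional fields only if given."""
    parts = []
    if synset_offset is not None:
        parts.append("{%08d}" % synset_offset)  # byte offset, zero-padded to 8 digits
    if lex_filename is not None:
        parts.append(f"<{lex_filename}>")
    rendered = [f"{w}#{k}" if show_sense_numbers else w for w, k in words]
    parts.append(", ".join(rendered))
    return f"Sense {n}\n" + " ".join(parts)

# Hypothetical offset; noun.artifact is one of WordNet's lexicographer files.
print(format_sense(1, [("car", 1), ("auto", 1)], synset_offset=1234567,
                   lex_filename="noun.artifact", show_sense_numbers=True))
```

Without options, only the words are printed, which matches the note that the three fields are generated only on request.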
ConceptNet
http://web.media.mit.edu/~hugo/conceptnet/
http://conceptnet.media.mit.edu/
The ConceptNet knowledge base is a semantic network consisting of concepts and relations between concepts.
Commonsense knowledge in ConceptNet encompasses the spatial, physical, social, temporal, and psychological aspects of everyday life.
WordNet vs. ConceptNet
Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.
In ConceptNet,
1. nodes can be compound elements representing higher-order compound concepts; ConceptNet does not distinguish between word senses
2. some of WordNet's relationships (synonym, is-a, part-of) are extended to more than twenty semantic relations, including, for example, CapableOf, EffectOf, SubeventOf, PropertyOf, MotivationOf, etc.
3. knowledge is more informal, defeasible and practically oriented: it contains knowledge that is defeasible (often true, but not always, e.g. EffectOf("fall off bicycle", "get hurt"))
ConceptNet
http://web.media.mit.edu/~hugo/conceptnet/
http://conceptnet.media.mit.edu/
Overall objective:
Represent commonsense knowledge, which is knowledge that every person is assumed to possess. Commonsense knowledge is typically omitted from social communications.
ConceptNet was designed to make practical context-based inferences over real-world texts.
ConceptNet
Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.
ConceptNet
Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.
In version 2, ConceptNet contained 1.6 million assertions interrelating 300,000 nodes.
• f counts the number of times a fact is uttered in the OMCS (Open Mind Common Sense) corpus.
• i counts how many times an assertion was inferred during the relaxation phase.
ConceptNet
http://web.media.mit.edu/~hugo/conceptnet/
http://conceptnet.media.mit.edu/
ConceptNet supports textual-reasoning tasks over real-world documents, including, for example:
• topic-gisting (e.g. a news article containing the concepts "gun", "convenience store", "demand money" and "make getaway" might suggest the topics "robbery" and "crime")
• affect-sensing (e.g. this email is sad and angry)
• analogy-making (e.g. "scissors", "razor", "nail clipper" and "sword" are perhaps like a "knife" because they are all "sharp" and can be used to "cut something")
• text summarization
• and others
ConceptNet's Relations
ConceptNet Features
Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.
Contextual neighbourhoods
• Provided by the API method GetContext()
  – Performs spreading activation radiating outward from a source node
  – Considers the number and strengths of all paths that connect two nodes
Topic generation
• Also uses GetContext(); example: query expansion
  – Entering "restaurant" would return related queries such as "order food", "waiter" and "menu"
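Spreading activation, the mechanism behind the contextual neighbourhood, can be sketched in a few lines. This is a minimal sketch of the general technique, not ConceptNet's actual GetContext() implementation: the toy graph, the edge strengths, the decay factor and the depth limit are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy graph: concept -> list of (neighbour, edge strength in [0, 1]).
GRAPH = {
    "restaurant": [("order food", 0.9), ("waiter", 0.8), ("menu", 0.8), ("eat", 0.6)],
    "order food": [("menu", 0.7), ("eat", 0.5)],
    "waiter": [("tip", 0.6)],
    "eat": [("food", 0.9)],
}

def get_context(source, decay=0.5, max_depth=3):
    """Spread activation outward from source; each hop passes on strength * decay."""
    activation = defaultdict(float)
    frontier = [(source, 1.0)]
    for _ in range(max_depth):
        next_frontier = []
        for node, energy in frontier:
            for neighbour, strength in GRAPH.get(node, []):
                if neighbour == source:
                    continue
                passed = energy * strength * decay
                activation[neighbour] += passed  # several paths reinforce a node
                next_frontier.append((neighbour, passed))
        frontier = next_frontier
    return sorted(activation.items(), key=lambda item: -item[1])

for concept, score in get_context("restaurant"):
    print(f"{concept}: {score:.3f}")
```

"menu" ends up ranked above "order food" even though both are direct neighbours, because it is also reached through a second path; this is the "number and strengths of all paths" idea from the slide.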
ConceptNet Features
Liu, H. & Singh, P. (2004) ConceptNet: A Practical Commonsense Reasoning Toolkit. BT Technology Journal, Volume 22, Kluwer Academic Publishers.
Analogy making
• Analogy
  – Two ConceptNet nodes are analogous if their sets of back-edges (incoming edges) overlap
• ConceptNet's GetAnalogousConcepts() supports analogy making
Projection
• Projection is graph traversal from an origin node, following a single transitive relation type (transitivity: if A->B and B->C, then A->C)
Affect sensing
• Uses ConceptNet's method GuessMood()
• Leverages edges between concepts and specified affect categories
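Both analogy making (overlap of back-edges) and projection (traversal along one transitive relation) can be sketched over a small edge list. This is a minimal sketch of the two ideas, not ConceptNet's GetAnalogousConcepts() implementation: the toy edges, the relation names and the choice of edge direction (features pointing into concepts) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edges (source, relation, target); relation names and edge
# direction are assumptions chosen so that features point into concepts.
EDGES = [
    ("sharp", "PropertyOf", "knife"), ("sharp", "PropertyOf", "scissors"),
    ("sharp", "PropertyOf", "razor"), ("sharp", "PropertyOf", "sword"),
    ("cut something", "UsedFor", "knife"), ("cut something", "UsedFor", "scissors"),
    ("cut something", "UsedFor", "razor"), ("red", "PropertyOf", "apple"),
    ("knife", "LocatedIn", "kitchen"), ("kitchen", "LocatedIn", "house"),
    ("house", "LocatedIn", "city"),
]

incoming = defaultdict(set)   # node -> {(relation, source)}: its back-edges
outgoing = defaultdict(list)  # (node, relation) -> targets
for src, rel, dst in EDGES:
    incoming[dst].add((rel, src))
    outgoing[(src, rel)].append(dst)

def analogous_concepts(node):
    """Rank other nodes by the number of back-edges they share with node."""
    mine = incoming.get(node, set())
    scores = [(other, len(mine & edges)) for other, edges in incoming.items() if other != node]
    return sorted([(o, s) for o, s in scores if s > 0], key=lambda x: (-x[1], x[0]))

def project(origin, relation):
    """Follow a single transitive relation type: A->B and B->C also reaches C."""
    reached, stack = [], [origin]
    while stack:
        for target in outgoing.get((stack.pop(), relation), []):
            if target != origin and target not in reached:
                reached.append(target)
                stack.append(target)
    return reached

print(analogous_concepts("knife"))    # [('razor', 2), ('scissors', 2), ('sword', 1)]
print(project("knife", "LocatedIn"))  # ['kitchen', 'house', 'city']
```

Ties in the analogy ranking are broken alphabetically only to make the output deterministic.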
ConceptNet
DEMO
ConceptNet – Application Example
Summarize text from Wikipedia [1]:
"A car accident or car crash is an incident in which an automobile collides with anything that causes damage to the automobile, including other automobiles, telephone poles, buildings or trees, or in which the driver loses control of the vehicle and damages it in some other way, such as driving into a ditch or rolling over. Sometimes a car accident may also refer to an automobile striking a human or animal. Car crashes, also called road traffic accidents (RTAs), traffic collisions, auto accidents, road accidents, personal injury collisions, motor vehicle accidents (MVAs), kill an estimated 1.2 million people worldwide each year, and injure about forty times this number."
[1] http://en.wikipedia.org/wiki/Car_accident
Text summarization provided by ConceptNet:
"Car accident or car crash was incident in which. Automobile collided with anything that cause damaged to automobile include other automobile telephone pole building or tree. Driver lost control of vehicle and damages. Drove into ditch. Rolled . Car accident referred to automobile. Car crash called road traffic accident. Killed estimate 1. Injured about forty timed number."
FrameNet – A Brief Overview
http://framenet.icsi.berkeley.edu/
The Berkeley FrameNet project is creating an on-line lexical resource for English, based on frame semantics and supported by corpus evidence.
The aim is to document the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses, through computer-assisted annotation of example sentences and automatic tabulation and display of the annotation results.
The major product of this work, the FrameNet lexical database, currently contains more than 10,000 lexical units (defined below), more than 6,100 of which are fully annotated, in more than 825 semantic frames, exemplified in more than 135,000 annotated sentences.
FrameNet – A Brief Overview
http://framenet.icsi.berkeley.edu/
Semantic frames are schematic representations of situation types (eating, spying, removing, classifying, etc.) together with lists of the kinds of participants, props, and other conceptual roles that are seen as components of such situations.
Example: Cause_change_of_position_on_a_scale
FrameNet
Source: Excerpt of FrameNet, http://framenet.icsi.berkeley.edu/, accessed on July 2nd, 2007
The verb "increase" is related to the frame; the excerpt's annotations ask:
• Who causes the increase?
• What causes the increase?
• What is increased?
• Increase of what numbers?
FrameNet – A Brief Overview
http://framenet.icsi.berkeley.edu/
Lexical units invoke frames. Example: LUs for Cause_change_of_position_on_a_scale
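The relation "lexical units invoke frames" can be sketched as a lookup from a lemma-plus-part-of-speech string to a frame, whose frame elements then structure the annotation of a sentence. This is a minimal sketch: the frame-element names (Agent, Cause, Item), the list of lexical units and the function annotate() are assumptions for illustration; consult the FrameNet database for the actual entries.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    name: str
    elements: dict  # frame element -> question it answers

@dataclass
class LexicalUnit:
    lemma: str      # lemma plus part of speech, e.g. "increase.v"
    frame: Frame    # the frame this lexical unit invokes

# Frame-element names and lexical units below are assumed for illustration.
CAUSE_CHANGE = Frame("Cause_change_of_position_on_a_scale", {
    "Agent": "Who causes the increase?",
    "Cause": "What causes the increase?",
    "Item": "What is increased?",
})
LEXICAL_UNITS = {lemma: LexicalUnit(lemma, CAUSE_CHANGE)
                 for lemma in ["increase.v", "raise.v", "lower.v", "reduce.v"]}

def annotate(lemma, fillers):
    """Attach role fillers of a sentence to the frame invoked by the lexical unit."""
    frame = LEXICAL_UNITS[lemma].frame
    unknown = set(fillers) - set(frame.elements)
    if unknown:
        raise ValueError(f"not frame elements of {frame.name}: {sorted(unknown)}")
    return {"frame": frame.name, **fillers}

print(annotate("raise.v", {"Agent": "The central bank", "Item": "interest rates"}))
```

Different lexical units ("raise", "lower", ...) share one frame, so their annotations use the same set of roles.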
Cyc – A Brief Overview
www.cyc.com
• Began as a research project in 1984 (originally at MCC)
• Conducted by Cycorp Inc.
• Project founder and CEO Doug Lenat:
  – Watch his Google video of the year 2006! "Computers Versus Common Sense": http://video.google.com/videoplay?docid=-7704388615049492068&q=engedu
• Initially a "hand-crafted" knowledge base -> now based on several strategies
• "Once you have a truly massive amount of information integrated as knowledge, then the human-software system will be superhuman, in the same sense that mankind with writing is superhuman compared to mankind before writing." ~ Doug Lenat, June 21, 2001
• Open source version available
  • "OpenCyc is the open source version of the Cyc technology, the world's largest and most complete general knowledge base and commonsense reasoning engine." (http://www.cyc.com/cyc/opencyc/overview)
Cyc
www.cyc.com
Cyc's Objective
• Cycorp's goal is to break the "software brittleness bottleneck" once and for all by constructing a foundation of basic "common sense" knowledge (a semantic substratum of terms, rules, and relations) that will enable a variety of knowledge-intensive products and services.
What is Cyc?
• The Cyc knowledge base (KB) is a formalized representation of a vast quantity of fundamental human knowledge: facts, rules of thumb, and heuristics for reasoning about the objects and events of everyday life. The medium of representation is the formal language CycL. The KB consists of terms, which constitute the vocabulary of CycL, and assertions which relate those terms.
What does Cyc know?
http://www.cyc.com/cyc/technology/whatiscyc_dir/whatdoescycknow
Next week…
we will talk about how to construct such knowledge bases, incl. games with a purpose and other participative forms of knowledge acquisition
http://www.peekaboom.org/
Any further questions?
See you Wednesday!