Knowledge Acquisition on the Web
![Page 1: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/1.jpg)
Knowledge Enabled Information and Services Science
Knowledge Acquisition on the Web
Growing the amount of available knowledge from within
Christopher Thomas
![Page 2: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/2.jpg)
Overview
• Knowledge Representation
  – GlycO – Complex Carbohydrates domain ontology
• Information Extraction
  – Taxonomy creation (Doozer/Taxonom.com)
  – Fact Extraction (Doozer++)
• Validation
![Page 3: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/3.jpg)
Circle of knowledge on the Web
![Page 4: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/4.jpg)
Goal: Harness the Wisdom of the Crowds to automatically model a domain, verify the model, and give the verified knowledge back to the community
![Page 5: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/5.jpg)
Circle of knowledge on the Web
What is knowledge?
How do we turn propositions/beliefs into knowledge?
How do we acquire knowledge?
![Page 6: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/6.jpg)
Background Knowledge
[15] Christopher Thomas and Amit Sheth, “On the Expressiveness of the Languages for the Semantic Web – Making a Case for ‘A Little More,’” in Fuzzy Logic and the Semantic Web, Eli Sanchez (Ed.), Elsevier, 2006.
[11] Amit Sheth, Cartic Ramakrishnan, and Christopher Thomas, “Semantics for The Semantic Web: the Implicit, the Formal and the Powerful,” International Journal on Semantic Web & Information Systems, 1 (no. 1), 2005, pp. 1–18.
![Page 7: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/7.jpg)
Different Angles
• Social construction
  – Large scale creation of knowledge
  vs.
  – Small communities define their domains
• Normative vs. Descriptive = Top-Down vs. Bottom-Up
• Formal vs. Informal = Machine-readable vs. human-readable
![Page 8: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/8.jpg)
Community-created knowledge
• Descriptive
• Bottom-up
• Formally less rigid
• May contain false information
• If a statement in the world is in conflict with the Ontology, both may be wrong or both may be right
• Good for broad, shallow domains
• Good for human processing and IR tasks
![Page 9: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/9.jpg)
Wikipedia and Linked Open Data
• Created by large communities
• Constantly growing
• Domains within the linked data are not always easily discernible
• Contain few axioms and restrictions
  – Little value to evaluation using logics
![Page 10: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/10.jpg)
Formal - Modeling deep domains
• Prescriptive / Normative
• Top-down
• Contains “true knowledge”
• If a statement in the world is in conflict with the Ontology, the statement is false
• Good for scientific domains
• Good for computational reasoning/inference
• Usually created by small communities of experts
• Usually static, little change is expected
![Page 11: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/11.jpg)
Example: GlycO
• Created in collaboration with the Complex Carbohydrate Research Center at the University of Georgia on an NCRR grant.
• Deep modeling of glycan structures and metabolic pathways
[6] Christopher Thomas, Amit P. Sheth, and William S. York, “Modular Ontology Design Using Canonical Building Blocks in the Biochemistry Domain,” in Formal Ontology in Information Systems (FOIS 2006).
[5] Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, William York, and Samir Tartir, “Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies,” 15th International World Wide Web Conference (WWW2006).
![Page 12: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/12.jpg)
GlycO
![Page 13: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/13.jpg)
N-Glycosylation metabolic pathway
GNT-I attaches GlcNAc at position 2
UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2
GNT-V attaches GlcNAc at position 6
UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
N-acetyl-glucosaminyl_transferase_V
N-glycan_beta_GlcNAc_9
N-glycan_alpha_man_4
![Page 14: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/14.jpg)
Glycan Structures for the ontology
b-D-Manp-(1-6)+
               |
               b-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc
               |
b-D-Manp-(1-3)+
N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15, 2003: 235-251
• Import structures from heterogeneous databases
• Possible connections modeled in the form of GlycoTree
• Match structures to archetypes
![Page 15: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/15.jpg)
Interplay of extraction and evaluation
• Errors in the source databases are propagated through various new databases, so comparing multiple sources fails as a means of error correction
• Even less than 2% incorrect information makes a database useless for automatic validation of hypotheses
• The ontology contains rules on how carbohydrate structures are known to be composed
• By mapping information in databases to the ontology and analyzing how successful the mapping was, we can identify possible errors.
![Page 16: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/16.jpg)
Database Verification using GlycO
N. Takahashi and K. Kato, Trends in Glycosciences and Glycotechnology, 15, 2003: 235-251
b-D-Manp-(1-6)+
               |
               a-D-Manp-(1-4)-b-D-GlcpNAc-(1-4)-D-GlcNAc
               |
b-D-Manp-(1-3)+
a-D-Manp-(1-4) is not part of the identified canonical structure for N-Glycans, hence it is likely that the database entry is incorrect
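
The check described on this slide can be sketched as a lookup of each residue of a database entry against the residues allowed by a canonical structure. This is only a minimal illustration: the names `CANONICAL_N_GLYCAN_CORE` and `find_suspect_residues` are hypothetical, and the allowed set is simply copied from the Takahashi and Kato structure shown earlier in the slides, not from GlycO's actual rule set.

```python
# Hypothetical stand-in for the canonical N-glycan core. GlycO's real
# rules are richer than a flat set of residue-linkage strings.
CANONICAL_N_GLYCAN_CORE = {
    "b-D-Manp-(1-6)",
    "b-D-Manp-(1-3)",
    "b-D-Manp-(1-4)",
    "b-D-GlcpNAc-(1-4)",
    "D-GlcNAc",
}


def find_suspect_residues(residues, canonical=CANONICAL_N_GLYCAN_CORE):
    """Return the residues that cannot be mapped onto the canonical structure."""
    return [r for r in residues if r not in canonical]


# The database entry from this slide, split into its residues.
database_entry = [
    "b-D-Manp-(1-6)",
    "b-D-Manp-(1-3)",
    "a-D-Manp-(1-4)",
    "b-D-GlcpNAc-(1-4)",
    "D-GlcNAc",
]
suspects = find_suspect_residues(database_entry)
print(suspects)
```

A non-empty result marks the entry as a candidate for manual review; it does not prove the entry wrong.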
![Page 17: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/17.jpg)
Pathway Steps - Reaction
Evidence for this reaction from three experiments
Pathway visualization tool by M. Eavenson and M. Janik, LSDIS Lab, Univ. of Georgia
![Page 18: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/18.jpg)
![Page 19: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/19.jpg)
Summary - GlycO
• The amount of accuracy and detail that can be found in ontologies such as GlycO could most likely not be acquired automatically
• Only a small community of experts has the depth of knowledge to model such scientific ontologies
![Page 20: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/20.jpg)
Summary - GlycO
• However, the automatic population shows that a highly restrictive, expert-created rule set allows for automation or involvement of larger communities.
• Frame-based population of knowledge
• The formal knowledge encoded in the ontology serves to acquire new knowledge
• The circle is completed
![Page 21: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/21.jpg)
Summary Background Knowledge
• Large amounts of information and knowledge are available
• Some machine readable by default
• Others need specific algorithms to extract information
• The more available information we can use, the better the extraction of new information will be.
![Page 22: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/22.jpg)
Circle of knowledge on the Web
What is knowledge?
How do we turn propositions into knowledge?
Part 2
How do we acquire knowledge?
![Page 23: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/23.jpg)
Knowledge Acquisition through Model Creation
[3] Christopher Thomas, Pankaj Mehra, Roger Brooks and Amit Sheth, Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction, Web Intelligence 2008, pp. 496-502.
[2] Christopher Thomas, Wenbo Wang, Delroy Cameron, Pablo Mendes, Pankaj Mehra and Amit Sheth, What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation, WebScience 2010.
[1] Christopher Thomas, Pankaj Mehra, Wenbo Wang, Amit Sheth, Gerhard Weikum and Victor Chan, Automatic Domain Model Creation Using Pattern-Based Fact Extraction, Knoesis Center technical report.
![Page 24: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/24.jpg)
First create a domain hierarchy
Example: a hierarchy for the domain of Human Performance and Cognition
![Page 25: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/25.jpg)
Connect with learned facts
![Page 26: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/26.jpg)
Example: strongly connected component
![Page 27: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/27.jpg)
Excerpt: strongly connected component
![Page 28: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/28.jpg)
Expert evaluation of facts in the ontology
1-2: Information that is overall incorrect
3-4: Information that is somewhat correct
5-6: Correct general information
7-9: Correct information not commonly known
![Page 29: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/29.jpg)
Technical Details
![Page 30: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/30.jpg)
Step 1: Domain hierarchy creation
• Input: terms, e.g. related to Human Performance and Cognition
• Hierarchy is automatically carved from articles and categories on Wikipedia
![Page 31: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/31.jpg)
Overview - conceptual
• Expand and Reduce approach
  – Start with “high recall” methods
    • Exploration - Full text search
    • Exploitation – Node Similarity Method
    • Category growth
  – End with “high precision” methods
    • Apply restrictions on the concepts found
    • Remove unwanted terms and categories
![Page 32: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/32.jpg)
Graph-based expansion
Expand - conceptually
Full text search on Article texts
Delete results with low confidence score
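
The expand step above (full text search, then dropping low-confidence results) might look roughly like the following toy sketch. The slides do not say how the confidence score is computed, so the score used here (the fraction of seed terms found in an article) is purely an assumption, as are the function name `expand`, the `min_score` parameter, and the two sample articles.

```python
def expand(seed_terms, articles, min_score=0.5):
    """Score each article by the fraction of seed terms found in its text
    and keep only the articles scoring at least min_score."""
    hits = {}
    for title, text in articles.items():
        lowered = text.lower()
        matched = sum(1 for term in seed_terms if term.lower() in lowered)
        score = matched / len(seed_terms)
        if score >= min_score:
            hits[title] = score
    return hits


# Made-up article texts, for illustration only.
articles = {
    "Working memory": "Working memory is a system tied to attention and cognition.",
    "Canberra": "Canberra is the capital city of Australia.",
}
kept = expand(["cognition", "attention"], articles)
print(kept)
```

A real system would query a search index over the Wikipedia article texts rather than scan strings, but the threshold step is the same idea.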
![Page 33: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/33.jpg)
Collecting Instances
![Page 34: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/34.jpg)
Creating a Hierarchy
![Page 35: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/35.jpg)
Step 2: Pattern-Based Relationship Extraction
Extracting meaningful relationships by macro-reading free text
![Page 36: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/36.jpg)
Extracting from Plain text or hypertext
• Informal, human-readable presentation of information
• Vast amounts of information available
  – Web
  – Scientific publications
  – Encyclopediae
• Need sophisticated algorithms to extract information
![Page 37: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/37.jpg)
Pattern-based Fact Extraction
• Learn textual patterns that express known relationship types
• Search the text corpus for occurrences of known entities (e.g. from domain hierarchy)
• Semi-open
  – Types are known and limited
  – Types are automatically expanded when LOD grows
• Vector-Space Model
• Probabilistic representation
![Page 38: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/38.jpg)
Training
• Relationship data in the UMLS Metathesaurus or the Wikipedia Infobox data provide a large set of facts in RDF triple format
  – Limited set of relationships that can be arranged in a schema
  – Semi-open
    • Types are known and limited
    • Types are automatically expanded when LOD grows
![Page 39: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/39.jpg)
Training procedure
• Iterate through all facts (S->P->O triples)
• Find evidence for the fact in a corpus
  – Wikipedia, WWW, PubMed or any other collection
  – If triple subject and triple object occur in close proximity in text, add the pattern in-between to the learned patterns
• Combined evidence from many different patterns increases the certainty of a relationship between the entities
![Page 40: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/40.jpg)
Overview – initial computations
[Diagram labels: Fact Collection, Text Corpus, Matrix Computations (CP2P, CP2Pmod, R2P, R2Pmod), Modifications (*): Entropy, SVD/LSI, Pertinence]
![Page 41: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/41.jpg)
Training procedure cont’d
Fact: Canberra::Australia
Occurrences in text:
• Canberra, the Australian capital city
• Canberra, capital of the Commonwealth of Australia
• Canberra, the Australian capital
Learned patterns (count):
• <Subject>, the <Object> capital city (1)
• <Subject>, capital of the Commonwealth of <Object> (1)
• <Subject>, the <Object> capital (1)
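
The Canberra example can be reproduced with a small sketch that replaces the subject and an object mention with placeholders and counts the resulting patterns. The function name `learn_patterns` and the `object_forms` argument (surface variants of the object, such as "Australian" for "Australia") are assumptions made for this illustration; the slides do not describe how lexical variants are matched.

```python
import re
from collections import Counter


def learn_patterns(subject, object_forms, sentences):
    """Count the patterns obtained by replacing the subject and an object
    mention with placeholders in every sentence that contains both."""
    patterns = Counter()
    for sentence in sentences:
        if subject not in sentence:
            continue
        # Try longer surface forms first so "Australian" wins over "Australia".
        for form in sorted(object_forms, key=len, reverse=True):
            form_re = r"\b" + re.escape(form) + r"\b"
            if re.search(form_re, sentence):
                pattern = sentence.replace(subject, "<Subject>")
                pattern = re.sub(form_re, "<Object>", pattern)
                patterns[pattern] += 1
                break
    return patterns


sentences = [
    "Canberra, the Australian capital city",
    "Canberra, capital of the Commonwealth of Australia",
    "Canberra, the Australian capital",
]
patterns = learn_patterns("Canberra", ["Australia", "Australian"], sentences)
for pattern, count in patterns.items():
    print(count, pattern)
```

Unlike the "pattern in-between" wording of the training procedure, this sketch keeps the text after the object as well, because the patterns on this slide (for example "<Subject>, the <Object> capital city") include it.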
![Page 42: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/42.jpg)
Relationship Patterns
| | X, the Y capital city | X, capital of the Commonwealth of Y | X, the Y capital |
|---|---|---|---|
| Capital_of | 1 | 1 | 1 |

Extracted Synonyms:

| | X, the Y capital city | X, capital of Y | X, the Y capital |
|---|---|---|---|
| Capital_of | 1 | 1 | 1 |

Generalize:

| | X, the Y capital * | X, capital of Y |
|---|---|---|
| Capital_of | 2 | 1 |
![Page 43: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/43.jpg)
Relationship Patterns
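Counts:

| | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y |
|---|---|---|---|---|
| Capital_of | 2 | 2 | 2 | 0 |
| predecessor | 0 | 0 | 2 | 2 |

Normalized per pattern:

| | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y |
|---|---|---|---|---|
| Capital_of | 1.0 | 1.0 | 0.5 | 0 |
| predecessor | 0 | 0 | 0.5 | 1.0 |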
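
The second set of values on this slide is what one gets by dividing each count by its column total. The slides do not name the normalization, so treating it as plain column normalization is an assumption that happens to reproduce the numbers; `normalize_columns` is a hypothetical helper name.

```python
def normalize_columns(counts):
    """Divide every count by its column total, so each pattern column sums to 1."""
    n_cols = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cols)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_cols)]
        for row in counts
    ]


counts = [
    [2, 2, 2, 0],  # Capital_of
    [0, 0, 2, 2],  # predecessor
]
weights = normalize_columns(counts)
print(weights)
```

The generic pattern "X, * * Y" ends up split evenly between the two relationship types, which is why it contributes little evidence for either.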
![Page 44: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/44.jpg)
Resolve Relationships
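| | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y |
|---|---|---|---|---|
| Capital_of | 1.0 | 1.0 | 0.5 | 0 |
| predecessor | 0 | 0 | 0.5 | 1.0 |

x (pattern vector)

| Pattern | Value |
|---|---|
| X, the Y capital * | 0.5 |
| X, capital of Y | 0.25 |
| X, * * Y | 0.25 |
| X, predecessor of Y | 0 |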
![Page 45: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/45.jpg)
Resolve Relationships
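| | X, the Y capital * | X, capital of Y | X, * * Y | X, predecessor of Y |
|---|---|---|---|---|
| Capital_of | 1.0 | 1.0 | 0.5 | 0 |
| predecessor | 0 | 0 | 0.5 | 1.0 |

x (pattern vector)

| Pattern | Value |
|---|---|
| X, the Y capital * | 0.5 |
| X, capital of Y | 0.25 |
| X, * * Y | 0.25 |
| X, predecessor of Y | 0 |

=

| Capital_of | predecessor |
|---|---|
| 0.875 | 0.125 |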
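
The result on this slide is a matrix-vector product: each relationship's score is the dot product of its weight row with the pattern vector. A minimal sketch with the slide's numbers follows; the function name `resolve` and the dictionary layout are choices made for this illustration only.

```python
def resolve(weights, pattern_vector):
    """Score each relationship as the dot product of its weight row
    with the pattern vector."""
    return {
        relation: sum(w * p for w, p in zip(row, pattern_vector))
        for relation, row in weights.items()
    }


# Columns: "X, the Y capital *", "X, capital of Y", "X, * * Y", "X, predecessor of Y"
weights = {
    "Capital_of": [1.0, 1.0, 0.5, 0.0],
    "predecessor": [0.0, 0.0, 0.5, 1.0],
}
pattern_vector = [0.5, 0.25, 0.25, 0.0]
scores = resolve(weights, pattern_vector)
print(scores)
```

For Capital_of this gives 0.5 + 0.25 + 0.125 + 0 = 0.875, and for predecessor only the generic pattern contributes, 0.25 x 0.5 = 0.125.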
![Page 46: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/46.jpg)
Advanced Computations
[Diagram labels: Fact Collection, Text Corpus, Matrix Computations (CP2P, CP2Pmod, R2P, R2Pmod), Modifications (*): Entropy, SVD/LSI, Pertinence]
![Page 47: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/47.jpg)
Advanced Computations
• LSI to determine relationship similarities
  – Reduces sparsity in the matrix and makes relationship rows more comparable
  – Allows better use of pertinence computation
• Entropy
  – Increase weights for more unique patterns
• Pertinence
  – Smoothing of pattern occurrence frequencies
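
One common way to "increase weights for more unique patterns" is to weight each pattern by one minus its normalized entropy over the relationship types, so that a pattern seen with only one relationship gets weight 1 and a pattern spread evenly over all relationships gets weight close to 0. The slides do not give the formula actually used, so the sketch below is an assumption; `entropy_weights` is a hypothetical name.

```python
import math


def entropy_weights(weights):
    """Weight each pattern column by 1 - H / H_max, where H is the entropy
    of the column's distribution over relationship types."""
    n_rel = len(weights)
    result = []
    for j in range(len(weights[0])):
        column = [row[j] for row in weights]
        total = sum(column)
        if total == 0 or n_rel < 2:
            result.append(0.0)
            continue
        probs = [c / total for c in column if c > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        result.append(1.0 - entropy / math.log(n_rel))
    return result


# Rows: Capital_of, predecessor (normalized values from the earlier slide).
matrix = [
    [1.0, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 1.0],
]
pattern_weights = entropy_weights(matrix)
print(pattern_weights)
```

With these inputs the three specific patterns keep full weight, while the generic "X, * * Y" column, shared evenly by both relationships, is weighted down to roughly zero.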
![Page 48: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/48.jpg)
Example Output (DBPedia)
| Subject :: Object | Extracted Rank 1 (Rel; Confidence) | Rank 2 | Rank 3 |
|---|---|---|---|
| Howard Pawley :: Gary Filmon | successor; 0.799 | after; 0.768 | office; 0.686 |
| Species Deceases :: Midnight Oil | producer; 0.761 | artist; 0.719 | genre; 0.467 |
| The Crystal City :: Orson Scott Card | artist; 0.625 | author; 0.617 | writer; 0.583 |
| Horatio Allen :: William Maxwell | predecessor; 0.629 | before; 0.475 | |
| Basdeo Panday :: Trinidad & Tobago | deathPlace; 0.658 | birthplace; 0.658 | nationality; 0.330 |
| Beccles railway station :: Suffolk | district; 0.772 | borough; 0.770 | friend; 0.749 |
![Page 49: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/49.jpg)
Pertinence for Relations
• Looking at fact extraction as a classification of concept pairs into classes of relations
• Class boundaries are not clear cut
• E.g. has_physical_part and has_part
• Don’t penalize the occurrence of the same pattern with relationship types that are similar
![Page 50: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/50.jpg)
Relationship Patterns
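Counts:

| | X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y |
|---|---|---|---|---|
| Capital_of | 2 | 2 | 2 | 2 |
| Located_in | 0 | 0 | 2 | 4 |

Weights:

| | X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y |
|---|---|---|---|---|
| Capital_of | 1.0 | 1.0 | 0.2 | 0.5 |
| Located_in | 0 | 0 | 0.2 | 0.9 |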
![Page 51: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/51.jpg)
Resolve Relationships
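| | X, the Y capital * | X, capital of Y | X, * * Y | X, located in Y |
|---|---|---|---|---|
| Capital_of | 1.0 | 1.0 | 0.2 | 0.5 |
| Located_in | 0 | 0 | 0.2 | 0.9 |

x (pattern vector)

| Pattern | Value |
|---|---|
| X, the Y capital * | 0.4 |
| X, capital of Y | 0.1 |
| X, * * Y | 0.3 |
| X, located in Y | 0.2 |

=

| Capital_of | Located_in |
|---|---|
| 0.66 | 0.24 |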
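
The two scores on this slide follow from multiplying each weight row with the pattern vector, written out term by term:

```latex
\begin{aligned}
\text{Capital\_of} &= 0.4 \cdot 1.0 + 0.1 \cdot 1.0 + 0.3 \cdot 0.2 + 0.2 \cdot 0.5
                    = 0.40 + 0.10 + 0.06 + 0.10 = 0.66 \\
\text{Located\_in} &= 0.4 \cdot 0 + 0.1 \cdot 0 + 0.3 \cdot 0.2 + 0.2 \cdot 0.9
                    = 0 + 0 + 0.06 + 0.18 = 0.24
\end{aligned}
```

The scores sum to 0.90 rather than 1, which is consistent with the weight columns on this slide no longer summing to 1 (for example 0.5 + 0.9 for "X, located in Y").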
![Page 52: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/52.jpg)
Evaluation of the fact extraction - DBPedia
[Plot: Precision / Recall vs. Confidence Threshold]
Strict evaluation: Only the 1st-ranked extracted relation is compared to the gold standard. Averaged over relation types.
60% training set, 40% testing; DBPedia Infobox fact corpus, Wikipedia text corpus
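
The strict evaluation described here can be sketched as follows: for each concept pair only the top-ranked relation is considered, and it is counted only if its confidence reaches the threshold. The function name `precision_recall` and the toy data are made up for illustration, and the sketch computes a single overall figure instead of averaging over relation types as the slide describes.

```python
def precision_recall(predictions, gold, threshold):
    """Strict evaluation: only the top-ranked relation per pair counts,
    and only if its confidence reaches the threshold."""
    answered = correct = 0
    for pair, (relation, confidence) in predictions.items():
        if confidence < threshold:
            continue
        answered += 1
        if gold.get(pair) == relation:
            correct += 1
    precision = correct / answered if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data, not taken from the evaluation in the slides.
predictions = {
    ("A", "B"): ("capital_of", 0.9),
    ("C", "D"): ("located_in", 0.6),
    ("E", "F"): ("capital_of", 0.3),
}
gold = {
    ("A", "B"): "capital_of",
    ("C", "D"): "capital_of",
    ("E", "F"): "capital_of",
}
for threshold in (0.5, 0.8):
    print(threshold, precision_recall(predictions, gold, threshold))
```

Raising the threshold from 0.5 to 0.8 drops the one wrong answer, so precision rises while recall stays at one in three; sweeping the threshold traces out curves like the ones plotted.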
![Page 53: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/53.jpg)
Evaluation of the fact extraction - UMLS
[Plot: Precision / Recall vs. Confidence Threshold]
Strict evaluation: Only the 1st-ranked extracted relation is compared to the gold standard. Averaged over relation types.
60% training set, 40% testing; UMLS fact corpus, MedLine text corpus
![Page 54: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/54.jpg)
Manual Evaluation strategy (DBPedia)
| Score | Subject :: Object | Suggested relationship | Extracted Rank 1 (Rel; Confidence) | Rank 2 | Rank 3 |
|---|---|---|---|---|---|
| 1 | Howard Pawley :: Gary Filmon | after | successor; 0.799 | after; 0.768 | office; 0.686 |
| 0.5 | Mulan :: Tarzan | after | nextSingle; 0.603 | followedBy; 0.533 | after; 0.416 |
| 1 | Species Deceases :: Midnight Oil | artist | producer; 0.761 | artist; 0.719 | genre; 0.467 |
| 1 | The Crystal City :: Orson Scott Card | author | artist; 0.625 | author; 0.617 | writer; 0.583 |
| 1 | Horatio Allen :: William Maxwell | before | predecessor; 0.629 | before; 0.475 | |
| 1 | Basdeo Panday :: Trinidad & Tobago | birthplace | deathPlace; 0.658 | birthplace; 0.658 | nationality; 0.330 |
| 1 | Bob Nystrom :: Stockholm | birthplace | cityOfBirth; 0.677 | birthplace; 0.513 | |
| 1 | Beccles railway station :: Suffolk | borough | district; 0.772 | borough; 0.770 | friend; 0.749 |
![Page 55: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/55.jpg)
Manual Evaluation strategy (UMLS)
poisoning, fluoride :: teeth [finding_site_of] finding_site_of 1
polyneuritis, endemic :: vitamin b 1 [associated_with] has_form 0
polyp of cervix nos (disorder) :: 768 polyps [associated_with] associated_with 1
polyp of cervix nos (disorder) :: neck of uterus [location_of] finding_site_of 1
polyp of colon :: benign neoplasms [related_to] associated_with 0.5
brain :: brain contusion [has_location] associated_morphology_of 0.25
brain :: brain ischemia [has_finding_site] location_of 0.5
polyp of colon :: gastrointestinal tract, nos [is_primary_anatomic_site_of_disease] location_of 0.5
polyvesicular vitelline tumor :: gamete structure (cell structure) [is_normal_cell_origin_of_disease] is_normal_cell_origin_of_disease 1
proptosis :: apert syndrome [has_manifestation] has_manifestation 1
![Page 56: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/56.jpg)
Manually evaluated precision for different confidence values
![Page 57: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/57.jpg)
Manually evaluated precision, confidence > 0.5 (on UMLS – MedLine corpus)
[Chart: precision on an axis from 0 to 1; series: UMLS - Pert - Ent]
![Page 58: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/58.jpg)
Summary Model Creation
• Using background knowledge in the form of a fact corpus and a text corpus, we can suggest new facts/propositions
• Possible to try all combinations of known concepts (e.g. Read-the-Web project), but huge validation backlog
• Letting users drive the model creation focuses the creation on the parts that are of common interest
  – Users are more willing to help validate facts
![Page 59: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/59.jpg)
Circle of knowledge on the Web
What is knowledge?
How do we turn propositions/beliefs into knowledge?
How do we acquire knowledge?
Part 3
![Page 60: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/60.jpg)
Evaluation and Use
Christopher Thomas, Wenbo Wang, Delroy Cameron, Pablo Mendes, Pankaj Mehra and Amit Sheth, What Goes Around Comes Around - Improving Linked Open Data through On-Demand Model Creation, to appear in WebScience 2010
Current Work
![Page 61: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/61.jpg)
Explicit evaluation
• “Evaluate for evaluation’s sake”
  – Domain experts rank the value of a proposition
  – Committees of experts and/or laymen vote on the correctness of propositions
![Page 62: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/62.jpg)
Explicit evaluation in the Semantic Browser
• The user can vote on facts
• Some facts are presented randomly
• Most facts are presented after the user (by browsing) showed interest in
  – The full triple
  – Subject/Object of the triple
![Page 63: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/63.jpg)
Implicit evaluation
• Evaluation that does not explicitly involve a vote on the extracted information
• Use the Wisdom of the Crowds
• Users show support for a proposition by performing an action
• Every action taken on a piece of information is recorded and analyzed
• The cumulative behavior of the users gives an indication of which propositions are correct or interesting
![Page 64: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/64.jpg)
Implicit evaluation in the Semantic Browser
• The user simply searches and browses
• The search history and the click-stream provide information about whether a page transition using an extracted triple was successful
• Assumption: on average, a successful trail-browsing session includes valid triples
• Problem: requires extensive use
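
The aggregation behind implicit evaluation can be sketched as a per-triple tally over browsing sessions. How a session is judged successful is not specified in the slides, so the boolean flag below is an assumption, as are the function name `triple_support` and the sample sessions.

```python
from collections import defaultdict


def triple_support(sessions):
    """For each triple, return the fraction of the sessions using it
    that were judged successful."""
    used = defaultdict(int)
    succeeded = defaultdict(int)
    for triples, successful in sessions:
        for triple in set(triples):
            used[triple] += 1
            if successful:
                succeeded[triple] += 1
    return {t: succeeded[t] / used[t] for t in used}


t1 = ("Canberra", "capital_of", "Australia")
t2 = ("Canberra", "located_in", "Australia")
t3 = ("Canberra", "predecessor", "Australia")

# Each session: (triples followed while browsing, was the session successful?)
sessions = [
    ([t1, t2], True),
    ([t1], True),
    ([t2, t3], False),
]
support = triple_support(sessions)
for triple, value in support.items():
    print(triple, value)
```

With only three sessions the fractions mean little, which illustrates the "requires extensive use" problem noted above.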
![Page 65: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/65.jpg)
Implicit evaluation in the Semantic Browser
1st triple
2nd triple
Triples
![Page 66: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/66.jpg)
Conclusion
• Creating domain models gives a way of selectively adding knowledge to a system
• We showed that it is possible to automatically create such models with high accuracy
• The models immediately impact users, which raises their willingness to help evaluate
• Evaluation becomes an integral part of the knowledge lifecycle
![Page 67: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/67.jpg)
?
![Page 68: Knowledge Acquisition on the Web](https://reader035.fdocuments.net/reader035/viewer/2022062323/56816457550346895dd6252c/html5/thumbnails/68.jpg)
Thank you