SEDE: An Ontology For Scholarly Event Description
Senator Jeong
Biomedical Knowledge Engineering Lab.,
Seoul National University
Publications
Senator Jeong. “Toward Scholarly Event Digital Library Services”. Bulletin of IEEE Technical Committee on Digital Libraries. Fall 2008;4(2).
Senator Jeong, Hong-Gee Kim. “SEDE: An Ontology for Scholarly Event Description”. Journal of Information Science. [in press] DOI: 10.1177/0165551509358487.
Senator Jeong, Sungin Lee, Hong-Gee Kim. “Are You an Invited Speaker?: A Bibliometric Analysis of Elite Groups for Scholarly Events in Bioinformatics”. Journal of the American Society for Information Science and Technology. 2009;60(6). pp. 1118-1131.
Senator Jeong, Hong-Gee Kim. “Intellectual Structure of Biomedical Informatics Reflected in Scholarly Events”. Scientometrics. [in press].
Table of Contents
1 Introduction & Background
2 Generic Event Model
3 The SEDE Model & Implementation
4 Application Use Case Scenarios
5 Ontology Evaluation
6 Discussion & Conclusion
INTRODUCTION & BACKGROUND
Scholarly Events
• Conferences, workshops, seminars, symposia
• A sequentially and spatially organized collection of scholars’ interactions
• with the intention of
  – delivering and sharing knowledge,
  – exchanging research ideas, and
  – performing related activities.
Scholarly Events
Publish up-to-date scientific research results,
Get feedback from scientific communities
Exchange research interests and ideas with each other
Demonstrate current research trends
Information Needs wrt Scholarly Events
Simple information needs
• Event name, topics
• Event date, venue, organizer
• Due dates for calls for papers
A scientist does not get a full and exhaustive picture of the scholarly events held in the world
• due to the sheer volume of events held by various academic societies and organizations
• no single information channel has succeeded in keeping track of the ever-growing number of conferences and providing their information to scientists
Information Needs wrt Scholarly Events
Scientifically meaningful inference
• prominent scientists
• prominent events
• scientists best suited for consultation and collaboration
These needs might be met partially, at a minimal level
• since almost all event websites list leadership members such as general chairs, committee members, invited speakers and/or award winners
Users are not able to get the whole picture
• existing library services do not provide this kind of meaningful information in an integrated and collective manner
Research Goal
Satisfy scientists’ basic information needs
• by collecting, archiving and providing access to scholarly event information.
Satisfy users’ in-depth information needs
• by excavating scholarly meaningful information through reasoning about knowledge
To define a description base for scholarly events
• to enable software agents to crawl and extract event data, and
• to facilitate unified access to, and reasoning about, the collected data
Previous Work
• EventSeer, PapersInvited, Conference Alerts
  – focus on calls for papers
  – simple metadata about forthcoming events
  – proprietary description formats
• Semantic Web Conference ontology
  – best only for the ESWC conference
• Event Driven Model
  – ABC ontology, INDECS, OntologX, FRBR, CIDOC-CRM, Enterprise Architecture, Event Ontology
GENERIC EVENT MODEL
Goal: provide enough descriptive power and granularity to span multiple scientific disciplines and capture as many varied event types as possible
[Figure: the generic event model, illustrated with a presentation event]
• Agent (Who): “John Smith”
• Action (How): “Present”
• Entity (What): “Biomedical Modeling”
• Place (Where): “Washington, DC”
• Time (When): “2008-11-08”
Generic Event Model
Event ≡ (∃Agent) ∧ (∃Action) ∧ (∃Entity) ∧ (∃Place) ∧ (∃Time)
(∃Agent(John.Smith)) ∧ (∃Action(present)) ∧ (∃Entity(Biomedical Modelling)) ∧ (∃Place(Washington)) ∧ (∃Time(2008–11–08)).
The classes of the generic event model
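The definition above can be sketched in code. The following is only an illustration of the five-slot reading of the generic event model, not the SEDE implementation; the class name `Event`, its attribute names, and the `is_complete` helper are assumptions introduced here.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One event in the generic model (hypothetical class, one slot per model class)."""
    agent: Optional[str] = None    # Who
    action: Optional[str] = None   # How
    entity: Optional[str] = None   # What
    place: Optional[str] = None    # Where
    time: Optional[str] = None     # When

    def is_complete(self) -> bool:
        # Event ≡ (∃Agent) ∧ (∃Action) ∧ (∃Entity) ∧ (∃Place) ∧ (∃Time)
        return all(getattr(self, f.name) is not None for f in fields(self))


# The presentation event used as the running example on the slide.
presentation = Event(
    agent="John Smith",
    action="present",
    entity="Biomedical Modeling",
    place="Washington, DC",
    time="2008-11-08",
)

# A description that lacks Entity, Place and Time does not satisfy the definition.
partial = Event(agent="John Smith", action="present")
```

Treating every slot as required mirrors the conjunction in the definition; a real description base would probably need richer value types than plain strings.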
THE SEDE MODEL & IMPLEMENTATION
Ontology modelling principle
Scholarly event description structure
Key concepts in the SEDE ontology
n-ary relations and reification heuristics
Ontology improvement
Scholarly Event Description Structure
[Figure: a Scholarly Event is divided into Tracks, each Track into Sessions, and each Session into AtomEvents; a Scholarly Event can also contain Sessions directly, as well as nested Scholarly Events.]
[Figure: key concepts in the SEDE ontology]
Classes: Event, Event Series, AtomEvent, Track, Session, Program, Call, Committee, CommitteeRole, Role, Presentation, Paper, Proceedings, Artifact, VideoClip, Time, Place, Country, City, Venue
Reused classes: foaf:Agent, foaf:Person, foaf:Group, foaf:Document, skos:Concept, skos:ConceptScheme, geo:SpatialThing
Properties: hasChildEvent, isMemberEventOf, hasProgram, hasTrack, hasSession, hasAtomEvent, hasCall, hasCommittee, hasCommitteeRole, playedBy, hasSessionChair, hasPresenter, hasTopic, hasTheme, skos:inScheme, hasProceedings, hasArtifact, heldAt, startDate, endDate
RDFS/OWL
http://eventography.org/sede
UML representation of Scholarly Event.
The reified relationship between Committee and Agent via CommitteeRole
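To make the reification idea concrete, here is a minimal sketch using plain (subject, predicate, object) tuples. It assumes that a Committee points to a CommitteeRole node through hasCommitteeRole, that the node carries a label, and that playedBy points to the agent; the `ex:` identifiers, the `sede:` prefix binding, and both helper functions are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def reify_membership(committee: str, agent: str, role_label: str, node_id: str) -> List[Triple]:
    """Express the ternary fact (committee, agent, role) as binary triples
    hanging off one intermediate CommitteeRole node."""
    return [
        (committee, "sede:hasCommitteeRole", node_id),
        (node_id, "rdf:type", "sede:CommitteeRole"),
        (node_id, "rdfs:label", role_label),
        (node_id, "sede:playedBy", agent),
    ]


def agents_with_role(triples: List[Triple], committee: str, role_label: str) -> Set[str]:
    """Return every agent playing the given role in the given committee."""
    role_nodes = {o for s, p, o in triples
                  if s == committee and p == "sede:hasCommitteeRole"}
    matching = {s for s, p, o in triples
                if s in role_nodes and p == "rdfs:label" and o == role_label}
    return {o for s, p, o in triples
            if s in matching and p == "sede:playedBy"}


kb: List[Triple] = []
kb += reify_membership("ex:ProgramCommittee", "ex:JohnSmith", "co-chair", "ex:role1")
kb += reify_membership("ex:ProgramCommittee", "ex:JaneDoe", "member", "ex:role2")
```

Because the role is a labelled node rather than a fixed property, variant names such as co-chair or vice-chair need no schema change.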
APPLICATION USE CASE SCENARIOS
[Figure: system architecture built around the SEDE Ontology]
Components: Web, Event Data Crawler, Crawled Data, Event Data Extractor, Knowledge Base, SEDE Ontology, Ontology Editor, APIs
Applications: Semantic Search & Retrieval, Event Coupling, Domain KOS Generation, Knowledge Structure Analysis, Academic Prominence Evaluation, …
Ontology-based Information Extraction
• The limitations of fully automatic information extraction techniques
• The heterogeneous nature of event web pages
• Strategy
  – use a simpler data extraction approach that
  – utilizes manually defined patterns of text content and HTML formatting, based on general conventions for listing data in human-readable formats on the web.
Method: Rule based Pattern Matching
[Flowchart, from Start to End: HTML Document → Text string → Token Array → Tag Array → chains → Realm Data → Extracted Data]
• HTML Parser (Parse HTML)
  – Opening HTML tags: tr, p, div → newlines; td → tab; li → bullet
  – Closing HTML tags: p, table, li, h1-5, br → newlines
• Text Tokenizer (Tokenize text)
  – pre-tag
  – separate punctuation marks (\n, "", ,, !, (), :, ;, .)
  – append EOF tag
  – split text by spaces
  – return array of tokens
• Directory (Assign Tags)
  – the Directory class calls the ‘createTagIndex’ function
  – match tags using regular expression keyword matches and gazetteer lookup (matchLookup)
  – Tag form: /aBCD, where a is the tag category and BCD is the tag description
• Chainer (Grammar Parser)
  – list of rules for identifying similar patterns of tags (RuleLookup)
  – output: string + chain index + chain type
• Realmer (Add Realm, Modify Realm Data)
  – holds a hierarchy of realms
  – each realm corresponds to a different chain in the document
• Exporter (Data Extraction)
  – output: extracted data
• Extender
• Lookup resources: Regular Expression Keyword, Gazetteer
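The first three stages of this pipeline can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' extractor: the pre-tag step and the gazetteer are omitted, the `<EOF>` marker spelling is invented, `<br>` is mapped to a newline on its opening tag, and the readings of /pCOM as "comma" and /iYEA as "year" are guesses based on the tag names. All function and class names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Layout rules from the slide: tags are replaced by whitespace or a bullet.
OPENING = {"tr": "\n", "p": "\n", "div": "\n", "td": "\t", "li": "* "}
CLOSING_NEWLINE = {"p", "table", "li", "h1", "h2", "h3", "h4", "h5"}


class EventPageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":          # assumption: <br> has no separate closing tag
            self.parts.append("\n")
        elif tag in OPENING:
            self.parts.append(OPENING[tag])

    def handle_endtag(self, tag):
        if tag in CLOSING_NEWLINE:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = EventPageParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


PUNCT = re.compile(r'([\n",!():;.])')


def tokenize(text):
    """Separate punctuation, split on spaces and tabs, append an EOF marker."""
    spaced = PUNCT.sub(r" \1 ", text)
    tokens = [t for t in re.split(r"[ \t]+", spaced) if t]
    tokens.append("<EOF>")
    return tokens


# A few keyword rules in the /aBCD tag form (patterns shortened from the slides).
TAG_RULES = [
    ("/kUNI", re.compile(r"university|college|academy", re.IGNORECASE)),
    ("/kEVT", re.compile(r"conference|symposium|meeting|congress|seminar", re.IGNORECASE)),
    ("/gOF", re.compile(r"of", re.IGNORECASE)),
    ("/pCOM", re.compile(r",")),
    ("/iYEA", re.compile(r"(19|20)\d{2}")),
]


def assign_tags(tokens):
    """Pair each token with the first rule that matches it entirely, else None."""
    tagged = []
    for token in tokens:
        tag = next((name for name, rule in TAG_RULES if rule.fullmatch(token)), None)
        tagged.append((token, tag))
    return tagged


tokens = tokenize(html_to_text("<p>Symposium at the University of Texas, 2009</p>"))
tagged = assign_tags(tokens)
```

Without a pre-tag step this tokenizer would also split e-mail addresses and decimal numbers at the dots, which is presumably why the original pipeline tags such literals before separating punctuation.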
Method: Tag Classification
Tag categories (with one example tag each)
• Punctuation: /pCOM
• Literal: /lEML
• Data & numbers: /iYEA
• Name-related: /nTTL
• Keywords: /kUNI
• Additional: /xCAP
• Grammar-related: /gOF

Grammar category
Tag      Meaning
/gART    article, e.g. the|this|its|...
/gOF     of
/gFOR    for
/gON     on
/gAT     at
/gIN     in
/gABT    about
/gFRM    from
/gTO     to | through | until
/gCNJ    conjunction: and | or | &
Method: Tag Classification
Keyword category
Tag     Meaning                    Example
/UNI    university                 universtiy|college|academy|Universitat...
/CTR    center                     center|centre|institute|department|division
/ORG    organization               society|association|council|consortium
/EVT    event                      conference|conf|symposium|meeting|congress|roundtable|colloquium|seminar|summit|convention|forum|program
/QUA    qualifier                  annual|biannual|biennial|interdisciplinary|special|joint|asian|european|international|metropolitan|national|polytechnic|global|graduate|limited|ltd(\\.)?|incorporated|inc(\\.)?|int(\\.)|applied
/SBJ    subject                    (Aeronautics|aerospace|Agriculture|applications|Astronomy|Biology|Biotechnology|Biochemistry|bioinformatics|business|Chemistry|Cryptology|Ecology|economics|Electronics|Energy|Engineering|Environment|Forensics|Geography|health|informatics|information|Mathematics|Mechanical|medicine|Meteorology|Nanotechnology|Oceanography|Paleontology|Physics|Policy|Psychology|Research|science(s)?|security|securities|solution(s)?|Space|systems|technology|Vibrations|Wireless)
/OTH    other (webpage-related)    (Main|Media|Home|you|of|(Us)|((?i)(tutorial|proceeding(s?)|download|PDF|PostScript|HTML|MSWord|LaTex|Format|ASCII|collocated|copyright|see|contact)))
Realms: Example
HTML text: There were few surprises about the submission of the paper. It will take place at the University of Technology, Brahms, Canada.
Realms: TEXT_CHUNK, SUBMISSION_MARKER, UNIVERSITY_NAME, COUNTRY

HTML text: Submission due date: September 5th, 2009
Realms: DEADLINE_CONTAINER, SUBMISSION_MARKER, DATE

HTML text: Notification date: November 6th, 2009
Realms: DEADLINE_CONTAINER, NOTIFICATION_MARKER, DATE

HTML text: Program Committee: Dolldrum Flannery, University of Texas, USA
Realms: COMMITTEE_MARKER, AFFILIATION_GROUP, NAME, UNIVERSITY_NAME, COUNTRY
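The example suggests that a marker word alone is not enough: a deadline realm needs a marker and a date on the same line. A minimal sketch of one such rule, assuming line-level matching and a single English date format; the function `deadline_realm`, the returned dictionary layout, and both regular expressions are illustrative assumptions rather than the rules used in the original extractor.

```python
import re
from typing import Optional

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates written like "September 5th, 2009".
DATE = re.compile(rf"(?:{MONTHS})\s+\d{{1,2}}(?:st|nd|rd|th)?,\s+\d{{4}}")

MARKERS = {
    "SUBMISSION_MARKER": re.compile(r"\bsubmission\b", re.IGNORECASE),
    "NOTIFICATION_MARKER": re.compile(r"\bnotification\b", re.IGNORECASE),
}


def deadline_realm(line: str) -> Optional[dict]:
    """Return a DEADLINE_CONTAINER realm when a marker and a date co-occur."""
    date = DATE.search(line)
    if date is None:
        return None
    for marker, pattern in MARKERS.items():
        if pattern.search(line):
            return {
                "realm": "DEADLINE_CONTAINER",
                "children": [marker, "DATE"],
                "date": date.group(0),
            }
    return None


submission = deadline_realm("Submission due date: September 5th, 2009")
notification = deadline_realm("Notification date: November 6th, 2009")
prose = deadline_realm("There were few surprises about the submission of the paper")
```

The prose sentence contains the word "submission" but no date, so it yields no deadline realm, which matches the TEXT_CHUNK treatment in the example.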
Implementation: Workbench
Implementation: Export to RDF KB
Semantic S&R on Scholarly Events(1)
• Finding events with a specific call-for-paper topic, a submission deadline, and an event start date
SELECT DISTINCT ?Topic ?Event ?Deadline ?Event_Start
WHERE {
  ?x a sede:Event ;
     rdfs:label ?Event .
  ?x sede:hasCall ?y .
  ?y rdfs:label ?Call .
  ?y sede:hasTopic ?z .
  ?z skos:prefLabel ?Topic .
  ?y sede:submissionDeadline ?Deadline .
  ?x sede:startDate ?Event_Start .
  FILTER ( (regex(?Topic, "data mining") || regex(?Topic, "Data mining"))
        || (regex(?Topic, "Ontolog*") || regex(?Topic, "ontolog*")) )
}
ORDER BY ?Topic
Semantic S&R on Scholarly Events(2)
• Retrieving artifacts from an atom event
• Scenario: a user missed an invited talk session on the topic of “semantic search” at the ESWC2008 conference, so the user searches for the invited talk session covering that topic to obtain the URI of its video clip.
[Figure: an End User sends a SPARQL query to the RDF endpoint (http://eventography.org/query/). The query follows Event, Track, Session and AtomEvent (hasTrack, hasSession, hasAtomEvent), together with hasPresenter (foaf:Person), hasTopic (skos:Concept), hasAuthor and hasArtifact, to reach Artifacts such as VideoClip, Presentation and Paper (PresentationVersionOf) held in bibliographic, data, presentation and video clip repositories.]
SELECT ?Topic ?Presenter ?Video_Clip ?Event ?Session
WHERE {
  ?x a sede:Event .
  ?x skos:altLabel ?Event .
  ?x sede:hasSession ?y .
  ?y rdfs:label ?Session .
  ?y sede:hasAtomEvent ?z .
  ?z sede:hasPresenter ?p .
  ?p foaf:name ?Presenter .
  ?z rdfs:label ?AtomEvent .
  ?z sede:hasArtifact ?c .
  ?c dc:identifier ?Video_Clip .
  ?z sede:hasTopic ?t .
  ?t skos:prefLabel ?Topic .
  FILTER ( (regex(?Event, "ESWC*"))
        && (regex(?Session, "Invited Talk") || regex(?Session, "invited talk"))
        && (regex(?Topic, "Semantic Search") || regex(?Topic, "semantic search")) )
}
Semantic S&R on Scholarly Events(3)
• Finding domain experts
SELECT DISTINCT ?Domain ?Expert ?Affiliation
WHERE {
  ?x a sede:Session .
  ?x sede:hasTopic ?topic .
  ?topic skos:prefLabel ?Domain .
  ?x sede:hasSessionChair ?chair .
  ?chair foaf:name ?Expert .
  FILTER (regex(?Domain, "Decision") || regex(?Domain, "decision"))
  OPTIONAL { ?chair sede:hasAffiliation ?y . ?y foaf:name ?Affiliation . }
}
ORDER BY ?Domain
Coupling of Events and Scientists
$$\mathrm{sim}(E_i, E_j) = \frac{\sum_{t} w_{t,i}\, w_{t,j}}{\sqrt{\sum_{t} w_{t,i}^{2}}\ \sqrt{\sum_{t} w_{t,j}^{2}}}$$
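As a hedged illustration of the coupling measure, the sketch below computes a cosine similarity between two events represented as sparse weight vectors. The slide does not say what the index t ranges over, so the keys (scientist names here) and the function name `coupling_similarity` are assumptions.

```python
import math
from typing import Dict


def coupling_similarity(w_i: Dict[str, float], w_j: Dict[str, float]) -> float:
    """Cosine similarity between the weight vectors of events E_i and E_j."""
    shared = set(w_i) & set(w_j)
    numerator = sum(w_i[t] * w_j[t] for t in shared)
    norm_i = math.sqrt(sum(w * w for w in w_i.values()))
    norm_j = math.sqrt(sum(w * w for w in w_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # an event with no weights is coupled to nothing
    return numerator / (norm_i * norm_j)


# Hypothetical events described by the scientists they share.
event_a = {"Smith": 1.0, "Kim": 1.0}
event_b = {"Smith": 1.0}
event_c = {"Lee": 2.0}

same = coupling_similarity(event_a, event_a)
partial = coupling_similarity(event_a, event_b)
none = coupling_similarity(event_a, event_c)
```

Only items present in both events contribute to the numerator, so two events that share no item have a coupling strength of zero.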
Domain Knowledge Structure Analysis
[Figure: data mining and its usage context in Bioinformatics (cosine ≥ 0.1; k-nn 2; n = 69)]
*Co-word Analysis: Assumption
[Figure: when Topic A and Topic B appear in the same article, these two topics are likely to be related; Topic C appears in a different article.]
*Co-word Analysis
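$$\mathrm{Cosine}(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2} \sum_{i=1}^{n} y_i^{2}}} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^{2}} \times \sqrt{\sum_{i=1}^{n} y_i^{2}}}$$
[Figure: papers from events and event topics are represented as a document-by-term matrix (documents d1 to d3, terms t1 to t4) and a term-by-term matrix (terms t1 to t3), which is written to an SNA.dat file and drawn as a network of terms.]
$$W_{i,j} = TF \times IDF = \frac{f_{i,j}}{\sum_{k} n_{k,j}} \times \log\frac{N}{n_i}$$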
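The term weighting can be sketched as below. It assumes the usual reading of the symbols: f_{i,j} is the count of term i in document j, the sum over k is the total number of term occurrences in document j, N is the number of documents, and n_i is the number of documents containing term i. The function name `tf_idf` and the toy documents are hypothetical, and this is not the BTA implementation.

```python
import math
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} dictionary per document."""
    n_docs = len(docs)
    doc_freq: Dict[str, int] = {}
    for doc in docs:
        for term in set(doc):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    weights = []
    for doc in docs:
        total = len(doc)  # all term occurrences in this document
        doc_weights = {}
        for term in set(doc):
            tf = doc.count(term) / total
            idf = math.log(n_docs / doc_freq[term])
            doc_weights[term] = tf * idf
        weights.append(doc_weights)
    return weights


# Two toy documents, already tokenized into topic terms.
docs = [["data", "mining", "data"], ["data", "ontology"]]
weights = tf_idf(docs)
```

A term that occurs in every document, such as "data" here, receives weight zero, so it does not contribute to the cosine between documents.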
*Tool: BiKE Text Analyzer (BTA)
• Java Application
• Vocabulary Manager
• Synonym Manager
• Stopword Manager
• Stemming Manager
*Tool: BTA: Identify variables
*Tool: BTA: SNA data file
Generation of Domain KOS
<skos:Concept rdf:ID="BiomedicalInformaticsAndComputation">
  <skos:prefLabel>Biomedical informatics and computation</skos:prefLabel>
  <skos:inScheme rdf:resource="#BIBE2007Themes"/>
  <skos:narrower rdf:resource="#Bio-molecularAndPhylogeneticDatabases"/>
  <skos:narrower rdf:resource="#DataVisualization"/>
  <skos:narrower rdf:resource="#Interoperability"/>
  <skos:narrower rdf:resource="#BiomedicalImaging"/>
  <skos:narrower rdf:resource="#DrugDiscoveryGeneExpressionAnalysis"/>
  <skos:narrower rdf:resource="#MolecularEvolutionAndPhylogeny"/>
  <skos:narrower rdf:resource="#Bio-Ontology"/>
  <skos:narrower rdf:resource="#BioinformaticsEngineering"/>
  <skos:narrower rdf:resource="#ProteinStructurePredictionAndMolecularSimulation"/>
  <skos:narrower rdf:resource="#SystemBiology"/>
  <skos:narrower rdf:resource="#SignalingAndComputationBiomedicalDataEngineering"/>
  <skos:narrower rdf:resource="#ModelingAndSimulation"/>
  <skos:narrower rdf:resource="#QueryLanguages"/>
  <skos:narrower rdf:resource="#SequenceSearchAndAlignment"/>
  <skos:narrower rdf:resource="#Proteomics"/>
  <skos:narrower rdf:resource="#Telemedicine"/>
  <skos:narrower rdf:resource="#FunctionalGenomics"/>
  <skos:narrower rdf:resource="#IdentificationAndClassificationOfGenes"/>
  <skos:narrower rdf:resource="#Biolanguages"/>
</skos:Concept>
<skos:Concept rdf:ID="Semantic_Web">
  <skos:prefLabel>Semantic Web</skos:prefLabel>
  <skos:inScheme rdf:resource="#ICSD2009CfPTopics"/>
  <skos:topConceptOf rdf:resource="#ICSD2009CfPTopics"/>
  ……………..
  <skos:narrower rdf:resource="#Knowledge_Organization_and_Ontologies"/>
</skos:Concept>
<skos:Concept rdf:ID="Bio-Ontologies">
  <skos:prefLabel>Bio-Ontologies</skos:prefLabel>
  <skos:inScheme rdf:resource="#Bio-OntologiesBioLink2006Topics"/>
  <skos:narrower rdf:resource="#Current_Research_In_Ontology_Languages_and_its_implication_for_Bio-Ontologies"/>
  <skos:narrower rdf:resource="#Biological_Applications_of_Ontologies"/>
  <skos:narrower rdf:resource="#Reports_on_Newly_Developed_or_Existing_Bio-Ontologies"/>
  <skos:narrower rdf:resource="#Tools_for_Developing_Ontologies"/>
  <skos:narrower rdf:resource="#Use_of_Semantic_Web_technologies_in_Bioinformatics"/>
  <skos:narrower rdf:resource="#The_implications_of_Bio-Ontologies_or_the_Semantic_Web_for_the_drug_discovery_process"/>
</skos:Concept>
<skos:Concept rdf:ID="ComputingLearningOrBehaviour">
  <skos:prefLabel>Computing learning or behaviour</skos:prefLabel>
  <skos:topConceptOf rdf:resource="#BSBT2009Theme"/>
  <skos:inScheme rdf:resource="#BSBT2009Theme"/>
  <rdfs:label>Computing learning or behaviour</rdfs:label>
  <skos:narrower rdf:resource="#Ontologies"/>
  <skos:narrower rdf:resource="#MathematicalBiology"/>
  <skos:narrower rdf:resource="#ModellingLearningInLivingSystems"/>
  <skos:narrower rdf:resource="#TeachingHumanoidRobots"/>
</skos:Concept>
Relations drawn between concepts of different schemes: skos:related, skos:broader, owl:sameAs
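Concept descriptions of this shape could be generated from an event's theme list. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper names `concept_id` and `build_concept`, and the rule of forming identifiers by capitalising and joining the words of a label, are assumptions inferred from the identifiers above, not the tool actually used.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("skos", SKOS)
ET.register_namespace("rdf", RDF)


def concept_id(label: str) -> str:
    """Turn 'Data visualization' into 'DataVisualization' (assumed convention)."""
    return "".join(word[:1].upper() + word[1:] for word in label.split())


def build_concept(label: str, scheme_id: str, narrower_labels) -> ET.Element:
    """Build one skos:Concept element with its scheme and narrower links."""
    concept = ET.Element(f"{{{SKOS}}}Concept", {f"{{{RDF}}}ID": concept_id(label)})
    pref = ET.SubElement(concept, f"{{{SKOS}}}prefLabel")
    pref.text = label
    ET.SubElement(concept, f"{{{SKOS}}}inScheme",
                  {f"{{{RDF}}}resource": f"#{scheme_id}"})
    for narrower in narrower_labels:
        ET.SubElement(concept, f"{{{SKOS}}}narrower",
                      {f"{{{RDF}}}resource": f"#{concept_id(narrower)}"})
    return concept


concept = build_concept(
    "Biomedical informatics and computation",
    "BIBE2007Themes",
    ["Data visualization", "Interoperability"],
)
xml_text = ET.tostring(concept, encoding="unicode")
```

Cross-scheme links such as skos:related or owl:sameAs would need a separate matching step, which this sketch does not attempt.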
Academic Performance Evaluation
Scholar’s Prominence Evaluation
Definition (1)
$$P(S) = \frac{\sum_{t=1,\ t \in T}^{n} w_t\,(k_t \mid f)}{n\tau_f}$$
• P(S): prominence of scholar S
• t: elite group type
• w_t: weight
• k_t: number of elite group memberships
• f: field
• nτ_f: number of events in a specific field (normalizer)
Scholarly Event’s Prominence Evaluation Metrics
Definition (2)
$$P(E) = \frac{\sum_{s=1,\ s \in S}^{n} P(S)}{c\tau_f}$$
• P(E): event’s prominence
• P(S): scholar’s prominence (Def. 1)
• cτ_f: number of elite group members for an event belonging to a specific field
Event Series’ Prominence Evaluation
Definition (3)
$$P(\varepsilon) = \frac{\sum_{g=1,\ g \in G}^{n} P(E)}{z\tau_f}$$
• P(ε): event series prominence
• P(E): event prominence (Def. 2)
• zτ_f: number of event instances (e.g., AMIA2009) belonging to an event series (AMIA) in a given subject field (Medical Informatics)
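Definitions (1) to (3) build on each other, which a short sketch can make explicit. It assumes one reading of the slides: a scholar's score is the weighted count of elite group memberships in a field divided by the number of events in that field, an event's score is the mean score of its elite group members, and a series' score is the mean score of its event instances. The function names, the weights and the membership counts are all hypothetical.

```python
from typing import Dict, Iterable


def scholar_prominence(memberships: Dict[str, int],
                       weights: Dict[str, float],
                       events_in_field: int) -> float:
    """Definition (1): weighted elite group memberships, normalised by events in the field."""
    if events_in_field <= 0:
        raise ValueError("events_in_field must be positive")
    return sum(weights[t] * k for t, k in memberships.items()) / events_in_field


def event_prominence(scholar_scores: Iterable[float]) -> float:
    """Definition (2): scholar scores summed and divided by the number of elite members."""
    scores = list(scholar_scores)
    return sum(scores) / len(scores) if scores else 0.0


def series_prominence(event_scores: Iterable[float]) -> float:
    """Definition (3): event scores summed and divided by the number of event instances."""
    scores = list(event_scores)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical weights per elite group type and one scholar's membership counts.
weights = {"invited_speaker": 3.0, "committee_member": 1.0}
scholar = scholar_prominence({"invited_speaker": 2, "committee_member": 4},
                             weights, events_in_field=10)
event = event_prominence([scholar, 0.5])
series = series_prominence([event, 0.25])
```

Because each level is an average of the level below, a change in the elite group weights propagates to event and event series scores.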
ONTOLOGY EVALUATION
Ontology Evaluation
Competency question: Does it have a container for topics?
  SEDE: Yes. It uses SKOS to describe topics.
  SWC: No. It uses SWRC’s research topic, which has a limited number of topics.
Competency question: Does it have a container for committees?
  SEDE: Yes. It has the Committee class.
  SWC: No.
Competency question: Does it identify various roles in a committee?
  SEDE: Yes. It defines a generic class Role identifiable with a label.
  SWC: No. It enumerates Chair, Delegate, Presenter and Program Committee Member, so it has no mechanism to identify variant names such as co-chair, vice-chair, founder, etc.
Competency question: Does it support the representation of an event’s structure in a flexible way?
  SEDE: Yes. It is more flexible than SWC, in that it furnishes classes from the top level (Event) down to the leaf level (AtomEvent).
  SWC: Arguable. WorkshopEvent, TutorialEvent, ConferenceEvent and PanelEvent should be deprecated, since they can be described with top-level classes such as AcademicEvent, TrackEvent and SessionEvent.
Competency question: Does it have a container for Call?
  SEDE: Yes. It has the Call class.
  SWC: No. The Call class was deprecated, and it uses the CfP ontology [1].
[1] CfP Vocabulary Specification, http://sw.deri.org/2005/08/conf/cfp.html
DISCUSSION & CONCLUSION
Discussion & Conclusion
• The SEDE ontology provides a backbone to represent, collect, share and allow inference from scholarly event information in a logical way
• Basic information needs
  – semantic search and retrieval using the facts stored in the KB
• Scientifically meaningful information needs
  – unearth hidden knowledge for the academic community
• SEDE
  – helps to improve information accessibility through greater semantic interoperability of information
  – makes it possible to build a scholarly semantic web: isolated pieces of scholarly event data are integrated through relationships with other scientific data on the web, thus creating added information
SEDE: An Ontology for Scholarly Event Description
Senator Jeong
Biomedical Knowledge Engineering Lab.,
Seoul National University