1

Towards Scholarly Publishing on the Semantic Web

Simon Buckingham Shum, Senior Lecturer

Open University, Knowledge Media Institute

Gary Li, Victoria Uren, John Domingue, Enrico Motta

EPSRC DIMnet Workshop, Manchester, 7-8 Oct., 2002

2

In 2010, will scholarly work still be published solely in prose, or can we imagine a complementary infrastructure that is ‘native’ to the emerging semantic, collaborative web, enabling more effective dissemination and analysis of ideas?

3

Project facts and figures: Scholarly Ontologies (ScholOnto) Project

- 3-year project, started Feb. 2001
- PI: Simon Buckingham Shum
- Co-Investigators: John Domingue, Enrico Motta
- Research Fellows: Gary Li, Victoria Uren
- PhDs: 5 related projects
- Partner: Academic Press
- Synergy with other EPSRC projects at KMi:
  - Advanced Knowledge Technologies IRC
  - CoAKTinG: eScience Grid collaboration tools

4

Overview

- Problem: little computational support for interpreting and analysing research literatures
- Approach: literatures as networks of ‘claims’: connected concepts
- Theoretical basis: argumentation, coherence relations, KB-hypertext
- Infrastructure: ClaiMaker, a ‘claims server’ to construct and analyse scholarly claims
- Progress to date

5

Phenomena of interest to scholars

- “Who’s building on the ideas in this paper, and in what way?”
- “Who’s challenged this paper?”
- “Has anyone proposed a similar solution but from a different theoretical perspective?”
- “Are there groups building on theory T, but who contradict each other?”
- “Has anyone generalised method M from domain D to E?”
- “Is there any software which tackles problem P?”
- “What impact did Language L have?”
- “Are there distinctive theoretical perspectives on problem P?”

6

What students/researchers/information analysts want to know

- Authority
- Impact
- Schools of thought
- Intellectual lineage
- Consistency

7

resources: documents, datasets, etc.

metadata: generally uncontroversial; minimise inconsistency, ambiguity, controversy

domain ontologies: richer formalisation of consensus; minimise inconsistency, ambiguity, controversy

interpretations?

8

“The Scent of a Site: A System for Analyzing and Predicting Information Scent, Usage, and Usability of a Web Site”

Web User Flow by Information Scent (WUFIS)

“Information foraging”

Information foraging theory

Information scent models

People try to maximise their rate of gaining information

?

extends

From undifferentiated, inter-document citations…

…to inter-concept, semantic connections

9

?

Claims

Counterclaims

Emergent domain model grounded in perspectives

10

Claims

Counterclaims

Emergent domain model grounded in perspectives

11

ScholOnto in a nutshell…

- Literatures as networks of concepts…
- …which are grounded in documents
- Connections between nodes are claims
- Core set of connection types, which can be expressed in discipline-specific dialects
- Multiple claim structures from diverse perspectives
- A server mediates and helps manage the complexity of the claims network

12

Structure of a connective Claim

Claim

Link

Concept Type: optional classification of object(s) in the context of this link

- Label: summarising...
- Type
- Polarity
- Weight
- Direction
- Author
- Timestamp

Object

- concept
- data
- set/claim

13

A Set of Concepts, Claims, Objects

14

Structure of a connective Claim

Claim

Link

Concept Type: optional classification of object(s) in the context of this link

Object

- concept
- data
- set/claim

- Label: summarising...
- Type
- Polarity
- Weight
- Direction
- Author
- Timestamp

Set
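The claim structure on this slide can be read as a small data model. The following is a minimal sketch only, not ClaiMaker's actual schema: the class names, field types, the decision to hang all seven listed attributes on the link, and the example link values ("is evidence for", "supports") are assumptions made for illustration. The two concepts are taken from Example 1 in this deck; the relation drawn between them is invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Concept:
    """A succinct summary of a contribution, optionally typed."""
    text: str
    concept_type: Optional[str] = None  # e.g. "Theory", "Data" (optional)


@dataclass(frozen=True)
class ObjectSet:
    """A set of concepts, claims, or other objects, usable as one node."""
    members: Tuple["Node", ...]


@dataclass(frozen=True)
class Link:
    """The seven attributes listed on the slide (attachment to Link is assumed)."""
    label: str       # summarising phrase
    link_type: str   # hypothetical type name
    polarity: str    # "+" or "-" (encoding assumed)
    weight: int      # relative strength (scale assumed)
    direction: str   # e.g. "source->target" (encoding assumed)
    author: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))


@dataclass(frozen=True)
class Claim:
    """Two objects joined by a link; a claim can itself be an object."""
    source: "Node"
    link: Link
    target: "Node"


Node = Union[Concept, ObjectSet, Claim]

hypothesis = Concept(
    "Animations can supplant key cognitive processes in learning "
    "collision mechanics, impairing deep understanding",
    "Hypothesis")
data = Concept(
    "Animations explaining momentum in the tool XtremePhysics improve the "
    "performance of middle-high achieving 16 yr olds, but impair low achievers",
    "Data")

# Illustrative link only: the relation between these concepts is invented.
claim = Claim(
    source=data,
    link=Link(label="is evidence for", link_type="supports", polarity="+",
              weight=1, direction="source->target", author="a.reader"),
    target=hypothesis)

# A set groups several objects so that it can take part in a further claim.
bundle = ObjectSet((hypothesis, data, claim))
```

Frozen dataclasses keep every object hashable, so concepts, sets, and claims can all serve as nodes in a claims network.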

15

‘Concepts’

- Succinct summaries of a publication’s contribution to the literature (granularity chosen by the user)
- Optionally given a type

Example 1

- [Theory] Salomon (1987)
- [Hypothesis] Animations can supplant key cognitive processes in learning collision mechanics, impairing deep understanding
- [Data] Animations explaining momentum in the tool XtremePhysics improve the performance of middle-high achieving 16 yr olds, but impair low achievers

Example 2

- [Problem] How to reduce disorientation in non-linear narrative?
- [Theory] Cognitive Coherence Relations (Knott and Sanders, 1999)
- [Theory] Semiotics of Cinema
- [Framework] Cinematic Hypermedia

16

Relations: Discourse Ontology (v2)

18

Making connections in ClaiMaker

19

The need for visualizations…

20

Towards visual claim-making

21

Visual claim-making

22

Visual claim-making

23

Conceptual claim-making template for an Evaluation Report

24

Discovery Services

- The payback for modelling
- New forms of digital visibility for research

Graph-based services

- Dense cluster detection
- Scientometrics (e.g. co-citation at the semantic inter-concept level)

Ontology-based services

- Semantic structural search
- Show supporting documents
- Show challenging documents
- Show a concept’s lineage

Visualizations to support navigation and querying

25

Identifying potentially significant clusters

Simple linear SVM

Rules made with CHARADE outperform Naive Bayes and decision trees

Decision Forest classifier improves on C4.5 and kNN

Simple linear SVM is among the best reported text categorizers

CDM performs moderately better than Naive Bayes and decision trees

Optimised rules outperform Naive Bayes and decision trees

Decision trees and Naive Bayes perform well for text categorization

SVMs are well suited to text categorization

Support Vector Machines (SVM)

Naive Bayes underperforms other classifiers

Naive Bayes is the worst classifier

Nearest Neighbour is one of the best categorizers

SVM and kNN outperform other classifiers

Which classifier is best?

Rule learning

Instance based learning

Bayesian learning

Decision tree learning

Machine learning

A 3-core cluster extracted from a network of claims and argumentation links. From hundreds of nodes modelling literature on text categorization, only those which connect to at least 3 other nodes in the cluster are presented (with link labels switched off). A flavour of key issues in the field is given without overwhelming the viewer.
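A k-core is the largest subgraph in which every node has at least k neighbours inside that subgraph; it can be found by repeatedly pruning nodes that fall below the threshold. The sketch below shows that pruning for the 3-core case described in the caption. It treats the claims network as an undirected graph given as node pairs, which is an assumption, and the node names in the example are placeholders rather than nodes from the text categorization model.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def k_core(edges: Iterable[Tuple[Hashable, Hashable]], k: int) -> Set[Hashable]:
    """Return the nodes of the k-core of an undirected graph."""
    neighbours: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Start with every node that already has fewer than k neighbours.
    pending = [n for n, nbrs in neighbours.items() if len(nbrs) < k]
    removed: Set[Hashable] = set()
    while pending:
        node = pending.pop()
        if node in removed:
            continue
        removed.add(node)
        # Removing a node lowers the degree of each surviving neighbour,
        # which may push that neighbour below the threshold in turn.
        for other in neighbours[node]:
            if other in removed:
                continue
            neighbours[other].discard(node)
            if len(neighbours[other]) < k:
                pending.append(other)
    return set(neighbours) - removed


# Placeholder network: a, b, c, d are fully interconnected; e and f hang off a.
example_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("a", "e"), ("e", "f"),
]
core = k_core(example_edges, 3)  # the peripheral nodes e and f are pruned
```

Raising k gives a smaller, denser picture of the network, while lowering it lets more peripheral nodes back in.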

26

Visualizing the results of a structural search on specific relational types (TouchGraph applet)

27

Navigating the network by document: incoming and outgoing concepts

28

Visualizing the ‘lineage’ (intellectual history) of a concept

Zooming, rotation, focusing and filtering

29

What documents challenge this one?

1. Extract concepts for this document
2. Trace concepts on which they build
3. Trace concepts challenging this set
4. Show root documents
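The four steps above can be sketched as a traversal over a claims network. This is an illustrative sketch under stated assumptions, not ClaiMaker's query implementation: claims are simplified to (source concept, link type, target concept) triples, the link type names "builds-on" and "challenges" are hypothetical, step 2 is assumed to be transitive, and the queried document is excluded from its own results.

```python
from typing import Dict, Iterable, Set, Tuple

ClaimTriple = Tuple[str, str, str]  # (source concept, link type, target concept)


def challenging_documents(doc: str,
                          concept_doc: Dict[str, str],
                          claims: Iterable[ClaimTriple]) -> Set[str]:
    """Return documents whose concepts challenge `doc` or what it builds on."""
    claims = list(claims)

    # 1. Extract concepts for this document.
    own = {c for c, d in concept_doc.items() if d == doc}

    # 2. Trace concepts on which they build (followed transitively; assumed).
    foundation = set(own)
    frontier = list(own)
    while frontier:
        current = frontier.pop()
        for src, link, dst in claims:
            if src == current and link == "builds-on" and dst not in foundation:
                foundation.add(dst)
                frontier.append(dst)

    # 3. Trace concepts challenging this set.
    challengers = {src for src, link, dst in claims
                   if link == "challenges" and dst in foundation}

    # 4. Show root documents (the queried document itself is left out).
    return {concept_doc[c] for c in challengers if c in concept_doc} - {doc}


# Placeholder data: concept and document names are invented.
concept_doc = {"c0": "P0", "c1": "P1", "c2": "P2",
               "c3": "P3", "c4": "P4", "c9": "P9"}
claims = [
    ("c1", "builds-on", "c0"),   # P1 builds on P0
    ("c2", "challenges", "c0"),  # P2 challenges what P1 builds on
    ("c3", "challenges", "c1"),  # P3 challenges P1 directly
    ("c4", "challenges", "c9"),  # unrelated to P1
]
result = challenging_documents("P1", concept_doc, claims)
```

Because step 2 widens the set before step 3 runs, a document that never mentions P1 but challenges one of its foundations (P2 here) is still returned.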

30

Focusing on a concept from previous view

31

Next steps

ClaiMaker released

- wide interest from both researchers (academia/government/corporate) and publishers

Develop customisable software agents

- monitor the claims network for patterns of interest to users

Extend the discovery services

- tools to interrogate/navigate

Extend the visualization services

- making sense of the claims network

Foster user communities

- broad spectrum of science/arts/humanities to test generality

32

Visualizing Argumentation (2002, in press), Springer

www.VisualizingArgumentation.info

Argument mapping for scholarly publishing, scientific and public policy debates, education, teamwork, and organisational memory

33

Scholarly Ontologies Project

Tech. Details/Publications: kmi.open.ac.uk/projects/scholonto

ClaiMaker test area: claimaker.open.ac.uk/Sandpit