26/10/2008 SWESE'08 Enhanced Semantic Access to Software Artefacts Danica Damljanović and Kalina Bontcheva

Page 1:

Enhanced Semantic Access to Software Artefacts

Danica Damljanović and Kalina Bontcheva

Page 2:

University of Sheffield NLP

Outline

Motivation

The GATE case study

Semantic-based prototype:

Data collection

Automatic content augmentation

Storing implicit annotations

Querying using text-based queries

Example

Conclusion and Future work

Page 3:

Motivation

Large software frameworks:

hard to maintain: never enough documentation

hard to find specific information

significant learning curve for:

new developers working on software extensions

software engineers who integrate relevant parts into their applications

Page 4:

Can semantic technologies help?

[Figure: software documentation scattered across forum posts, websites, source code and papers]

Page 5:

The GATE case study

GATE (gate.ac.uk):

open-source General Architecture for Text Engineering

development team of over 15 people at present, and over 30 over the years

documentation about GATE software:

dispersed on the Web: not easy to find by new/existing developers/users

no unified interface: Google, gate.ac.uk, gmane mailing list search, etc.

Page 6:

The GATE case study: requirements

Automatic generation of reference pages from the ontology:

provide users with a single point of access to all knowledge, continuously kept up to date.

automatically generate a web page, shown on its own or alongside the ontology tree, where the searched concept is selected

Page 7:

Semantic-based prototype

[Diagram: software documentation >> learn domain ontology >> annotate content >> store >> semantic repository; text-based queries run against the semantic repository]

Page 8:

Data collection

Downloaded around 10,000 software artefacts about GATE:

source code, source documentation, GATE manual, forum posts, publications.

Page 9:

Annotate content

Page 10:

Export annotations

Merge document metadata and annotations into the OWL file using an information-extraction ontology:

PROTON KM (http://proton.semanticweb.org/2005/04/protonkm)

Page 11:

Information-extraction ontology

Document class:

resourceType property: refers to the type of the document,

informationResourceIdentifier property: refers to the URL of the annotated document.

Mention class:

occursIn Document

hasStartOffset and hasEndOffset: storing position of the annotation

(new) refersAnything: preserves the URI of the resource to which the mention refers
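The Document and Mention properties listed above can be pictured as a handful of triples per annotation. The following is a minimal Python sketch, not the authors' export code: the helper name `mention_triples`, the `example.org` URIs and the trailing `#` on the PROTON KM namespace are assumptions made for illustration, and it is also an assumption that the new refersAnything property lives in the same namespace as the other properties.

```python
# Illustrative sketch only. The "#" suffix on the PROTON KM namespace, the
# helper name and all example.org URIs are assumptions, not from the slides.
KM = "http://proton.semanticweb.org/2005/04/protonkm#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def mention_triples(doc_uri, doc_url, doc_type, mention_uri, start, end, target_uri):
    """Return (subject, predicate, object) triples for one annotated mention."""
    if not 0 <= start < end:
        raise ValueError("offsets must satisfy 0 <= start < end")
    return [
        # Document class: type of the document and URL of the annotated source.
        (doc_uri, RDF_TYPE, KM + "Document"),
        (doc_uri, KM + "resourceType", doc_type),
        (doc_uri, KM + "informationResourceIdentifier", doc_url),
        # Mention class: where the annotation sits and what it refers to.
        (mention_uri, RDF_TYPE, KM + "Mention"),
        (mention_uri, KM + "occursIn", doc_uri),
        (mention_uri, KM + "hasStartOffset", start),
        (mention_uri, KM + "hasEndOffset", end),
        (mention_uri, KM + "refersAnything", target_uri),
    ]


triples = mention_triples(
    doc_uri="http://example.org/doc/1",          # hypothetical document URI
    doc_url="http://example.org/forum/post-1",   # hypothetical source URL
    doc_type="forum post",
    mention_uri="http://example.org/mention/1",  # hypothetical mention URI
    start=10,
    end=15,
    target_uri="http://gate.ac.uk/ns/gate-ontology#annic",
)
for s, p, o in triples:
    print(s, p, o)
```

Keeping the offsets and the target URI on the mention, rather than on the document, lets one document carry many mentions that point to different ontology resources.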

Page 12:

Export annotations

Page 13:

Access knowledge using text-based queries

QuestIO (Question-based interface to ontologies):

keyword-based queries

full-blown questions

Page 14:

QuestIO: Text-based query >> SeRQL

select c0, "[inverseProperty]", p1, c2, "[inverseProperty]", p3, c4, "[inverseProperty]", p5, i6

from {c0} rdf:type {<http://gate.ac.uk/ns/gate-ontology#JavaClass>},
     {c2} p1 {c0},
     {c2} rdf:type {<http://gate.ac.uk/ns/gate-ontology#ResourceParameter>},
     {c4} p3 {c2},
     {c4} rdf:type {<http://gate.ac.uk/ns/gate-ontology#ProcessingResource>},
     {i6} p5 {c4},
     {i6} rdf:type {<http://gate.ac.uk/ns/gate-ontology#GATEPlugin>}

where p1=<http://gate.ac.uk/ns/gate-ontology#parameterHasType>
  and p3=<http://gate.ac.uk/ns/gate-ontology#hasRunTimeParameter>
  and p5=<http://gate.ac.uk/ns/gate-ontology#containsResource>
  and i6=<http://gate.ac.uk/ns/gate-ontology#annic>

“Java Class for parameters for processing resources in ANNIC?”
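The query on this slide follows a regular pattern: a chain of classes, each joined to the previous one by a property, ending in a concrete instance. The Python sketch below shows one way such a query string could be assembled from that chain. It is an illustration of the pattern only, not QuestIO's implementation; the function name `build_serql` and its parameters are hypothetical, and how QuestIO maps words in the question to classes and properties is not covered.

```python
GATE = "http://gate.ac.uk/ns/gate-ontology#"


def build_serql(classes, properties, instance):
    """Assemble a SeRQL-style query from a chain of classes joined by properties.

    classes    -- local names of the classes; the first one is the answer type
    properties -- local names of the properties linking consecutive classes
    instance   -- local name of the instance that anchors the end of the chain
    """
    if len(classes) != len(properties) + 1:
        raise ValueError("need exactly one property between consecutive classes")
    n = len(classes)
    # Variables are numbered by position in the chain: c0, p1, c2, p3, ..., i<last>.
    nodes = [f"c{2 * i}" for i in range(n - 1)] + [f"i{2 * (n - 1)}"]
    props = [f"p{2 * i + 1}" for i in range(n - 1)]

    select = [nodes[0]]
    paths = [f"{{{nodes[0]}}} rdf:type {{<{GATE}{classes[0]}>}}"]
    where = []
    for i, prop in enumerate(props):
        select += ['"[inverseProperty]"', prop, nodes[i + 1]]
        # Each property points from the later element back to the earlier one.
        paths.append(f"{{{nodes[i + 1]}}} {prop} {{{nodes[i]}}}")
        paths.append(f"{{{nodes[i + 1]}}} rdf:type {{<{GATE}{classes[i + 1]}>}}")
        where.append(f"{prop}=<{GATE}{properties[i]}>")
    where.append(f"{nodes[-1]}=<{GATE}{instance}>")
    return (
        "select " + ", ".join(select)
        + "\nfrom " + ",\n     ".join(paths)
        + "\nwhere " + "\n  and ".join(where)
    )


query = build_serql(
    classes=["JavaClass", "ResourceParameter", "ProcessingResource", "GATEPlugin"],
    properties=["parameterHasType", "hasRunTimeParameter", "containsResource"],
    instance="annic",
)
print(query)
```

Called with the classes, properties and instance from the slide, the sketch yields the same select, from and where clauses as the query shown for the ANNIC question.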

Page 15:

An example

Page 16:

Demo

http://gate.ac.uk/document-search

Page 17:

Future Work

optimise query execution time: migrate from SeRQL to SPARQL

include simple ontology-driven data in the interface

evaluation to follow: user-centric evaluation with GATE users

Page 18:

Thank you!

Questions?