SAnno: a unifying framework for semantic annotation

A talk presentation of SAnno at IDSIA, 2010/06/01.

Davide Eynard

IDSIA, 01/06/2010

Introduction

• S(emantic)Anno(tations)
  • … in Italian, “sanno” also means “they know”

• Basic principle: anyone should be able to say anything about anything else
  • Well, this should hold in general :-)
  • Actually, in our case it is “anything about any URI”
  • And we would like everyone to say that in a formal way

• But first, a little step back in time...

Participation and semantics

[Figure: diagram with the labels “Data” and “Structure”]

SAnno's grandfather: SpeakinAbout [1]

• Purpose: produce semantic annotations about named entities
  • When you read “Harry Potter”, is it the book or the movie?

• Plays with user gratifications
  • When users annotate a string as matching a specific concept, they are shown a list of services/search engines related to it

• Relies on user-provided data:
  • Freebase types
  • User-generated search templates, built inside a wiki system

SAnno's father: RDFMonkey [2]

• Purpose: augment the browsing experience by providing information/services related to the visited URL

• Relies on Freebase types
  • … as in SpeakinAbout, but without requiring user interaction
  • Types are found by searching backlinks in Freebase (i.e. which topics link to the visited page)

• Related services as widgets inside a browser extension
  • The app could load widgets at runtime (from Freebase itself or another collaborative system)
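
The backlink lookup described above can be illustrated with a toy in-memory index. This is only a sketch: the `TOPICS` structure, its topic ids, types, and example URLs are invented for illustration and do not reflect the Freebase API or RDFMonkey's actual code.

```python
# Toy stand-in for a topic database: each topic has types and outgoing links.
# All ids, types and URLs below are invented for illustration.
TOPICS = {
    "topic/miles_davis": {
        "types": ["Musical Artist", "Person"],
        "links": ["http://example.org/miles-davis"],
    },
    "topic/lugano": {
        "types": ["City"],
        "links": ["http://example.org/lugano"],
    },
    "topic/kind_of_blue": {
        "types": ["Album"],
        "links": ["http://example.org/miles-davis"],
    },
}


def types_for_url(url, topics):
    """Collect the types of every topic that links to the visited URL."""
    found = set()
    for topic in topics.values():
        if url in topic["links"]:
            found.update(topic["types"])
    return sorted(found)


print(types_for_url("http://example.org/miles-davis", TOPICS))
```

The types found this way could then decide which widgets to show for the page, without asking the user anything.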

[Figure: RDFMonkey examples for Musical Artists, Cities, and Books]

The problem

• We already have semantics on the annotation (e.g. Annotea), but how can we have semantics within the annotation?

• Good starting points:
  • Some participative systems already provide semi-structured information (e.g. infoboxes in Wikipedia)
  • Some communities of practice have already built their own bottom-up way to structure information (e.g. machine tags)
  • Some (relatively new) systems allow users, with some additional effort, to save information in a structured way almost without knowing it (e.g. semantic wikis)

• Challenges
  • Provide a shared way to describe annotations coming from heterogeneous systems
  • Aggregate this information to provide something new and useful

SAnno as a framework

• SAnno is made up of many different parts, which together provide something (we consider) new and useful

• An ontology to describe annotations (the “shells” that contain metadata about a resource)

• An ontology describing the types of properties we are already able to aggregate

• A set of conversion tools which are able to translate existing annotations from other systems into our notation

• A system to show the results of the aggregation of different annotations

• A system to manage provenance, authorship, and filters on incoming annotations

The annotations ontology

• Every annotation can be considered as a “Post-it”, a piece of paper where something is written about something else

• … you can say things about what is written there, but also about the Post-it itself

• The annotation is about a resource; it is created by someone on a specific date, comes from a particular annotation system, and might be connected to a specific community

• Main goal: do not reinvent everything from scratch
  • Reuse well-known ontologies such as DC, SIOC, etc.
  • Use named graphs as an alternative to reification

• Start in an easy way: restriction to URLs
  • Also a way to provide instant gratification to users: show annotations while they are browsing a website
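
The “Post-it” idea can be sketched as a named graph plus metadata about the graph itself. The following is a minimal illustration, not the actual SAnno ontology: the `SANNO` namespace and its `annotates`, `fromSystem`, `community`, and `tag` properties are hypothetical placeholders, while `creator` and `created` are taken from Dublin Core terms.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterator, List, Optional, Tuple

DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms namespace
SANNO = "http://example.org/sanno#"     # hypothetical namespace, illustration only

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]


@dataclass
class Annotation:
    graph: str                    # URI naming the graph, i.e. the Post-it itself
    target: str                   # URL the annotation is about
    creator: str
    created: date
    system: str                   # originating annotation system
    community: Optional[str] = None
    content: List[Triple] = field(default_factory=list)  # what is written on it

    def quads(self) -> Iterator[Quad]:
        """Yield (subject, predicate, object, graph); graph is None for metadata."""
        # Statements about the Post-it: the graph URI is the subject.
        yield (self.graph, SANNO + "annotates", self.target, None)
        yield (self.graph, DCTERMS + "creator", self.creator, None)
        yield (self.graph, DCTERMS + "created", self.created.isoformat(), None)
        yield (self.graph, SANNO + "fromSystem", self.system, None)
        if self.community is not None:
            yield (self.graph, SANNO + "community", self.community, None)
        # Statements written on the Post-it: they live inside the named graph.
        for s, p, o in self.content:
            yield (s, p, o, self.graph)


note = Annotation(
    graph="http://example.org/graphs/1",
    target="http://example.org/page",
    creator="http://example.org/users/alice",
    created=date(2010, 6, 1),
    system="http://example.org/systems/wiki",
    content=[("http://example.org/page", SANNO + "tag", "semantic-web")],
)
for quad in note.quads():
    print(quad)
```

Because the graph URI is an ordinary resource, provenance statements attach to it directly, with no need to reify each triple written inside the annotation.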

The aggregation ontology

• Aggregation deals with the contents of the annotation (i.e. the triples found in the named graph)

• Objectives
  • Avoid constraining users to a specific vocabulary for annotations
  • Find a way to collect different annotations and provide something new and interesting by aggregating them

• Our approach
  • Properties used inside annotations can be described as belonging to families we already know how to deal with
    • Examples: very specific (tags, ratings), more general (transitive relations)
  • Properties from an external vocabulary are mapped as subproperties of ours
    • … by whom? Experienced users who have incentives to do this (think about users building templates in Wikipedia...)
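
The family-based approach can be sketched as follows. This is an illustrative example under stated assumptions: all property names (`sanno:tag`, `delicious:tag`, `smw:category`, `reviews:stars`, and so on) are invented, the `SUBPROPERTY_OF` dictionary stands in for subproperty statements contributed by users, and ratings are assumed to share one scale.

```python
from collections import Counter
from statistics import mean

# Hypothetical property families; the names are placeholders.
TAG = "sanno:tag"
RATING = "sanno:rating"

# Mappings contributed by experienced users: external property -> family.
# This plays the role of subproperty statements; all names are invented.
SUBPROPERTY_OF = {
    "delicious:tag": TAG,
    "smw:category": TAG,
    "reviews:stars": RATING,
}


def family_of(prop):
    """Return the family a property belongs to, or None if it is unmapped."""
    if prop in (TAG, RATING):
        return prop
    return SUBPROPERTY_OF.get(prop)


def aggregate(triples):
    """Count tags and average ratings across annotations of one resource."""
    tags = Counter()
    ratings = []
    for _subject, prop, value in triples:
        family = family_of(prop)
        if family == TAG:
            tags[value] += 1
        elif family == RATING:
            ratings.append(float(value))
    return {"tags": dict(tags), "rating": mean(ratings) if ratings else None}


triples = [
    ("http://example.org/page", "delicious:tag", "jazz"),
    ("http://example.org/page", "smw:category", "jazz"),
    ("http://example.org/page", "reviews:stars", "4"),
    ("http://example.org/page", "sanno:rating", "5"),
    ("http://example.org/page", "foaf:maker", "someone"),  # unmapped, ignored
]
print(aggregate(triples))
```

Users stay free to annotate with any vocabulary; a property only starts contributing to the aggregate once someone maps it to a known family.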

Conversion tools

• Our worst enemy: the bootstrap
  • Who is going to annotate the first resources? I don't have time!

• Our best friends: already existing annotation systems
  • Why don't we convert existing data to our notation and show the advantages of our approach?

• Different families of conversion tools
  • Easy: already existing APIs with real-time search functionality (e.g. del.icio.us)
  • Medium: conversions from existing structured repositories such as SPARQL endpoints (advantage: the conversion is very clean, you just need one tool and different CONSTRUCTs)
  • A little harder: Web scraping when no other sources are available
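
As a rough idea of what an “easy” converter might do, the sketch below turns one bookmark record into annotation metadata plus content triples. The record shape (`user`, `url`, `date`, `tags`), the `sanno:tag` property, and the `http://example.org/...` URIs are assumptions made for this example; they are not the Delicious API or the SAnno notation.

```python
import hashlib

SANNO_TAG = "sanno:tag"  # hypothetical family property


def bookmark_to_annotation(bookmark, system):
    """Convert one bookmark record into annotation metadata plus content triples."""
    # Derive a stable graph URI so that converting the same bookmark twice
    # yields the same annotation instead of a duplicate.
    key = f"{system}|{bookmark['user']}|{bookmark['url']}"
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return {
        "graph": f"http://example.org/annotations/{digest}",
        "target": bookmark["url"],
        "creator": bookmark["user"],
        "created": bookmark["date"],
        "system": system,
        "content": [(bookmark["url"], SANNO_TAG, tag) for tag in bookmark["tags"]],
    }


bookmark = {
    "user": "alice",
    "url": "http://example.org/page",
    "date": "2010-06-01",
    "tags": ["semweb", "annotation"],
}
converted = bookmark_to_annotation(bookmark, system="http://example.org/bookmarks")
print(converted["graph"], converted["content"])
```

Each tag becomes one triple inside the annotation, while user, date, and originating system are kept as metadata about the annotation itself.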

Annotation client

• Actually, we have two possible clients in mind:
  • a browser extension which shows annotations while users are browsing the Web
  • an independent service which is able to aggregate heterogeneous information related to similar resources (e.g. URLs marked as being MP3 files)

• Filter annotations according to author, date, originating system, and community

• Users should be able to “subscribe” to some annotating communities and ignore others

• The system is designed to be distributed, as data can come from different, unrelated sources
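
The filtering and subscription idea could look roughly like the sketch below. The `AnnotationMeta` class, the `visible` function, and its parameters are hypothetical names chosen for this example; in this simplified version, annotations without a community are hidden unless `None` is among the subscriptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AnnotationMeta:
    author: str
    created: date
    system: str
    community: Optional[str]


def visible(annotations, subscribed, since=None, systems=None, blocked_authors=()):
    """Keep annotations from subscribed communities that pass the other filters."""
    result = []
    for a in annotations:
        if a.community not in subscribed:
            continue  # not one of the communities the user subscribed to
        if since is not None and a.created < since:
            continue  # too old
        if systems is not None and a.system not in systems:
            continue  # comes from a system the user filtered out
        if a.author in blocked_authors:
            continue
        result.append(a)
    return result


annotations = [
    AnnotationMeta("alice", date(2010, 5, 20), "wiki", "jazz-fans"),
    AnnotationMeta("bob", date(2009, 1, 10), "bookmarks", "jazz-fans"),
    AnnotationMeta("carol", date(2010, 5, 25), "bookmarks", "spammers"),
]
shown = visible(annotations, subscribed={"jazz-fans"}, since=date(2010, 1, 1))
print([a.author for a in shown])
```

Since the filters only read annotation metadata, they can be applied on the client regardless of which source the annotations came from.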

The prototype

• Early annotation ontology
• Property families: tag, rating, generically related URI
• Conversions from SMW, Delicious
• Visualization as a web service + Firefox extension
• No subscriptions yet

The end

Thank you! Questions?

References:

• [0] D. Laniado, D. Eynard and M. Colombetti. Using WordNet to turn a folksonomy into a hierarchy of concepts. Semantic Web Application and Perspectives, pp. 192–201, 2007.

• [1] D. Eynard and M. Colombetti. Exploiting User Gratification for Collaborative Semantic Annotation. Proceedings of SWUI 2008. April 2008.

• [2] D. Eynard. Using semantics and user participation to customize personalization. HP Labs Technical Report HPL-2008-197. September 2008.

• [3] L. Mazzola, D. Eynard and R. Mazza. GVIS: a framework for graphical mashups of heterogeneous sources to support data interpretation. HSI 2010. May 2010.

Contact Davide Eynard

eynard@elet.polimi.it

http://davide.eynard.it

Tel. 02 2399 4010

Fax 02 2399 3411

Project page @AIRLab: http://airwiki.elet.polimi.it