ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

137
Providing Linked Data Presented by: Barry Norton Maribel Acosta

Transcript of ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Page 1: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Providing  Linked  Data  

Presented  by:  Barry  Norton  

Maribel  Acosta  

Page 2: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Motivation:  Music!  

2  

Visualiza3on  Module  

Metadata  Streaming  providers  

Physical  Wrapper  

Downloads  

Data  acquisi3

on   R2R  Transf.  LD  Wrapper  

Musical  Content  

Applica3

on  

Analysis  &  Mining  Module  

LD  Data  set  

Access  

LD  Wrapper  

RDF/  XML  

Integrated  Dataset  

Interlinking   Cleansing  Vocabulary  Mapping  

SPARQL  Endpoint  

Publishing  

RDFa  

Other  content  

Page 3: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

LINKED  DATA  LIFECYCLE  

EUCLID  -­‐  Querying  Linked  Data   3  

Page 4: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Linked  Data  Principles  

1.  Use  URIs  as  names  for  things.  2.  Use  HTTP  URIs  so  that  users  can  look  up  

those  names.  3.  When  someone  looks  up  a  URI,  provide  

useful  informa9on,  using  the  standards  (RDF*,  SPARQL).  

4.  Include  links  to  other  URIs,  so  that  users  can  discover  more  things.  

EUCLID  -­‐  Providing  Linked  Data   4  

CH  1  

Page 5: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Linked  Data  Lifecycle  

Linked  Data  Lifecycle  

EUCLID  -­‐  Providing  Linked  Data   5  

Source:  Sören  Auer.  “The  Seman3c  Data  Web”  (slides)  Source:  José  M.  Alvarez.  “My  Linked  Data  Lifecycle”  

Source:  Michael  Hausenblas.  “Linked  Data  lifeyclcle”  

Page 6: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Core  Tasks  for  Providing            Linked  Data    

EUCLID  -­‐  Providing  Linked  Data   6  

Based  on  the  proposed  LD  lifecycles  and  the  LD  principles,  we  can  iden3fy  3  main  tasks  for  providing  LD:  

①  Crea9ng:  includes  data  extrac3on,  crea3on  of  HTTP  URIs,  and  vocabulary  selec3on.  (LD  principles  1  &  2)  

②  Interlinking:  involves  the  crea3on  of  (RDF)  links  to  external  data  sets.  (LD  principle  4)  

③  Publishing:  consists  of  crea3ng  the  metadata  and  making  the  data  set  accessible.  (LD  principle  3)  

 

Page 7: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Agenda  1.   Crea9ng  Linked  Data  

2.   Interlinking  Linked  Data  

3.   Publishing  Linked  Data  

4.   Linked  Data  publishing  checklist  

7  EUCLID  -­‐  Providing  Linked  Data  

Page 8: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

CREATING  LINKED  DATA  

EUCLID  -­‐  Querying  Linked  Data   8  

Page 9: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

•  The  data  of  interest  may  be  stored  in  a  wide  range  or  formats:  

 

•  Several  tools  support  the  process  of  mining  data  from  different  repositories,  for  example:  

Extracting  the  Data  

9  EUCLID  -­‐  Providing  Linked  Data  

Spreadsheets  or  tabular  data     Databases   Text  

R2RML  

Page 10: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Using  the  RDF  Data  Model  

EUCLID  -­‐  Providing  Linked  Data   10  

•  The  RDF  data  model  is  used  to  represent  the  extracted  informa3on  

•  The  nodes  represent  the  concepts/en33es  within  the  data.  A  node  corresponds  to  a  URI,  a  blank  node  or  a  literal  (only  in  predicates)  

•  The  rela3onships  between  the  concepts/en33es  are  modeled  as  arcs  

Subject   Object  Predicate  

Page 11: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Naming  Things:  URIs  •  All  the  things  or  dis3nct  en33es  within  the  data  must  be  named  

•  According  to  the  Linked  Data  principles,  the  standard  mechanism  to  name  en33es  is  the  URI  

•  Designing  Cool  URIs:  –  Leave  out  informa3on  about  the  data  regarding  to:  author,  technologies,  status,  access  mechanisms,  …  

–  Simplicity:  short,  mnemonic  URIs  –  Stability:  maintain  the  URIs  as  long  as  possible  –  Manageability:  issue  the  URIs  in  a  way  that  you  can  manage  

11  EUCLID  -­‐  Providing  Linked  Data  

Source:hjp://www.w3.org/TR/cooluris/  

Page 12: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Selecting  Vocabularies  •  Vocabularies    model  the  concepts  and  the  rela9onship  between  them  in  a  knowledge  domain  

•  Terms  from  well-­‐known  vocabularies  should  be  reused  wherever  possible  

•  New  terms  should  be  define  only  if  you  can  not  find  required  terms  in  exis3ng  vocabularies  

•  A  large  number  of  vocabularies  in  RDF  are  openly  available,  e.g.,  Linked  Open  Vocabularies  (LOV)  

12  EUCLID  -­‐  Providing  Linked  Data  

Page 13: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Selecting  Vocabularies  (2)  

EUCLID  -­‐  Providing  Linked  Data   13  

Linked  Open  Vocabularies  

322  vocabularies  classified  by  domain  

Source:hjp://lov.okfn.org/dataset/lov/  

Page 14: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Selecting  Vocabularies  (3)  

EUCLID  -­‐  Providing  Linked  Data   14  

Linked  Open  Vocabularies:  Analyzing  MusicOntology  

Source:hjp://lov.okfn.org/dataset/lov/details/vocabulary_mo.html  

Page 15: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Selecting  Vocabularies  (4)  

EUCLID  -­‐  Providing  Linked  Data   15  

Other  lists  of  well-­‐known  vocabularies  are  maintained  by:  

•  W3C  SWEO  Linking  Open  Data  community  project  hjp://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies  

•  Library  Linked  Data  Incubator  Group:  Vocabularies  in  the  library  domain  hjp://www.w3.org/2005/Incubator/lld/XGR-­‐lld-­‐vocabdataset-­‐20111025  

Page 16: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

INTERLINKING  LINKED  DATA  

EUCLID  -­‐  Providing  Linked  Data   16  

Page 17: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Interlinking  Data  Sets  •  It’s  one  of  the  Linked  Data  principles!  

•  Involves  the  crea3on  of  RDF  links  between  two  different  RDF  data  sets:  –  Links  at  instance  level  (rdfs:seeAlso,  owl:sameAs)  –  Links  at  schema  level  (RDFS  subclass/subproperty,  OWL  equivalent  class/property,  SKOS  mapping  proper9es)  

•  Appropriate  links  are  detected  via  link  discovery  

EUCLID  -­‐  Providing  Linked  Data   17  

4.  Include  links  to  other  URIs,  so  that  users  can  discover  more  things.  

Page 18: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Interlinking  Data  Sets  (2)  

Challenges  for  link  discovery  •  Linked  Data  sets  are  heterogeneous  in  terms  of  vocabularies,  formats  and  data  representa3on  

•  Large  range  of  knowledge  domains    

•  Scalability:  LD  is  composed  of  a  large  number  of  data  sets  and  RDF  triples,  hence  it  is  not  possible  to  compare  every  possible  en3ty  pair  

EUCLID  -­‐  Providing  Linked  Data   18  

Source:  Robert  Isele.  “LOD2  Webinar  Series:Silk”    

Page 19: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Interlinking  Data  Sets  (3)  

Challenges  for  link  discovery  •  It  corresponds  to  the  en9ty  resolu9on  problem:  

deciding  whether  two  en..es  correspond  to  same  object  in  the  real  world  

•  Name  ambigui9es:  typos,  misspellings,  different  languages,  homonyms    

•  Structural  ambigui9es:  same  concepts/en33es  with  different  structures.  Requires  the  applica3on  of  ontology  and  schema  matching  techniques  

EUCLID  -­‐  Providing  Linked  Data   19  

Page 20: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Interlinking  Data  Sets  (4)  

EUCLID  -­‐  Providing  Linked  Data   20  

RDF  data  sets    can  be  interlinked:  

Manually  •  Involves  the  manual  explora3on  of  

LD  data  sets  and  their  RDF  resources  to  iden3fy  linking  targets  

•  May  not  be  feasible  when  the  number  of  en33es  within  the  data  set  is  very  large  

 Automatically  •  Using  tools  that  perform  link  

discovery  based  on  linkage  rules,  for  example:  Silk,  Limes  and  xCurator  

Page 21: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

owl:sameAs  &  rdfs:seeAlso  •  owl:sameAs  

•  Creates  links  between  individuals    •  States  that  two  URIs  refer  to  the  same  individuals    

•  rdfs:seeAlso  •  States  that  a  resource  may  provide  addi3onal  informa3on  about  the  subject  resource  

•  Links  in  MusicBrainz:  –  owl:seeAlso  is  used  for  music  ar3sts  –  rdfs:seeAlso  is  used  for  albums  

EUCLID  -­‐  Providing  Linked  Data   21  

Page 22: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

SKOS  •  Simple  Knowledge  Organiza3on  System  

–  hjp://www.w3.org/TR/skos-­‐reference/    

•  Data  model  for  knowledge  organiza3on  systems  (thesauri,  classifica3on  scheme,  taxonomies)    

•  SKOS  data  is  expressed  as  RDF  triples  

•  Allows  the  crea3on  of  RDF  links  between  different  data  sets  with  the  usage  of  mapping  proper9es  

EUCLID  -­‐  Providing  Linked  Data   22  

Page 23: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

SKOS:  Mapping  Properties  These  proper3es  are  used  to  link  SKOS  concepts  (par3cularly  instances)  in  different  schemes:  

•  skos:closeMatch:  links  two  concepts  that  are  sufficiently  similar  (some3mes  can  be  used  interchangeably)  

•  skos:exactMatch:  indicates  that  the  two  concepts  can  be  used  interchangeably.    •  Axiom:  It  is  a  transi9ve  property  

•  skos:relatedMatch:  states  an  associa3ve  mapping  link  between  two  concepts  

EUCLID  -­‐  Providing  Linked  Data   23  

Page 24: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Example  of  SKOS  exact  match  

 

 

SKOS:  Mapping  Properties  (2)  

EUCLID  -­‐  Providing  Linked  Data   24  

mo:MusicArtist  skos:exactMatch  dbpedia-­‐ont:MusicalArtist.  

@prefix  skos:  <http://www.w3.org/2004/02/skos/core#>  @prefix  mo:  <http://purl.org/ontology/mo/>  @prefix  dbpedia-­‐ont:  <http://dbpedia.org/ontology/>  @prefix  schema:  <http://schema.org/>        

mo:MusicGroup  skos:exactMatch  schema:MusicGroup.  

mo:MusicGroup  skos:exactMatch  dbpedia-­‐ont:Band.  

Page 25: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Example  of  SKOS  close  match  

 

 

SKOS:  Mapping  Properties  (3)  

EUCLID  -­‐  Providing  Linked  Data   25  

mo:SignalGroup  skos:closeMatch  schema:MusicAlbum.  

@prefix  skos:  <http://www.w3.org/2004/02/skos/core#>  @prefix  mo:  <http://purl.org/ontology/mo/>  @prefix  dbpedia-­‐ont:  <http://dbpedia.org/ontology/>  @prefix  schema:  <http://schema.org/>      

mo:SignalGroup  skos:closeMatch  dbpedia-­‐ont:Album.  

Page 26: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Integrity  conditions  •  Guarantee  consistency  and  avoid  contradic3ons  in  the  rela3onships  between  SKOS  concepts  

SKOS:  Mapping  Properties  (4)  

EUCLID  -­‐  Providing  Linked  Data   26  

skos:Mapping  Relation  

skos:close  Match  

skos:exact  Match  

skos:related  Match  

Symmetric  &  Transi9ve  

Disjoint  with  

Par3al  Mapping  Rela3on  diagram  with  integrity  condi3ons  

Symmetric  

Page 27: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

PUBLISHING  LINKED  DATA  

EUCLID  -­‐  Providing  Linked  Data   27  

Page 28: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Publishing  Linked  Data  Once  the  RDF  data  set  has  been  created  and  interlinked,  the  publishing  process  involves  the  following  tasks:  

1.   Metadata  crea3on  for  describing  the  data  set    

2.  Making  the  data  set  accessible  

3.  Exposing  the  data  set  in  Linked  Data  repositories  

4.   Valida9ng  the  data  set  

EUCLID  -­‐  Providing  Linked  Data   28  

Page 29: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

•  Consists  of  providing  (machine-­‐readable)  metadata  of  RDF  data  sets  which  can  be  processed  by  engines  

•  This  informa3on  allows  for:  

–  Efficient  and  effec3ve  search  of  data  sets  

–  Selec3on  of  appropriate  data  sets  (for  consump3on  or  interlinking)  

–  Get  general  sta3s3cs  of  the  data  sets    

EUCLID  -­‐  Providing  Linked  Data   29  

Describing  RDF  Data  Sets  

Page 30: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Describing  RDF  Data  Sets  (2)  •  The  common  language  for  describing  RDF  data  sets  is  VoID  (Vocabulary  of  Interlinked  Data  sets)    

•  Defines  an  RDF  data  set  with  the  predicate  void:Dataset  

 

•  Covers  4  types  of  metadata:  

EUCLID  -­‐  Providing  Linked  Data   30  

•  General  metadata  

•  Structural  metadata  

•  Descrip3ons  of  linksets  •  Access  metadata  

Page 31: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  General  Metadata  •  General  metadata  is  used  by  users  to  iden3fy  appropriate  data  sets.  

•  Specifies  informa3on  about  descrip3on  of  the  data  set,  contact  person/organiza3on,  the  license  of  the  data  set,  data  subject  and  some  technical  features.  

•  VoID  (re)uses  predicates  from  the  Dublin  Core  Metadata1  and  FOAF2  vocabularies.  

EUCLID  -­‐  Providing  Linked  Data   31  

1  hjp://dublincore.org/documents/2010/10/11/dcmi-­‐terms/  2  hjp://xmlns.com/foaf/spec/  

Page 32: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  General  Metadata  (2)  

Predicate   Range   Descrip9on  dcterms:title   Literal   Name  of  the  data  set.  

dcterms:description   Literal   Descrip3on  of  the  data  set.  

dcterms:source   RDF  resource   Source  from  which  the  data  set  was  derived.  

dcterms:creator   RDF  resource   Primarily  responsible  of  crea3ng  the  data  set.  

dcterms:date   xsd:date   Time  associated  with  an  event  in  the  life-­‐cycle  of  the  resource.  

dcterms:created   xsd:date   Date  of  crea3on  of  the  data  set.  

dcterms:issued   xsd:date   Date  of  publica3on  of  the  data  set.  

dcterms:modified   xsd:date   Date  on  which  the  data  set  was  changed.  

foaf:homepage   Literal   Name  of  the  data  set.  

dcterms:publisher   RDF  resource   En3ty  responsible  for  making  the  data  set  available.  

dcterms:contributor   RDF  resource   En3ty  responsible  for  making  contribu3ons  to  the  data  set.  

EUCLID  -­‐  Providing  Linked  Data   32  

Source:    hjp://www.w3.org/TR/void/#metadata    General  Information  Contains  informa3on  about  the  crea3on  of  the  data  set    

Page 33: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  General  Metadata  (3)  

Other  Information  •  License  of  the  data  set:  specifies  the  usage  condi3ons  of  

the  data.  The  license  can  be  pointed  with  the  property  dcterms:license  

•  Category  of  the  data  set:  to  specify  the  topics  or  domains  covered  by  the  data  set,  the  property  dcterms:subject  can  be  used  

•  Technical  features:  the  property  void:feature  can  be  used  to  express  technical  proper3es  of  the  data  (e.g.  RDF  serializa3on  formats)    

EUCLID  -­‐  Providing  Linked  Data   33  

Page 34: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  Structural  Metadata    

EUCLID  -­‐  Providing  Linked  Data   34  

•  Provides  high-­‐level  informa3on  about  the  internal  structure  of  the  data  set  

•  This  metadata  is  useful  when  exploring  or  querying  the  data  set  

•  Includes  informa3on  about  resources,  vocabularies  used  in  the  data  set,  sta3s3cs  and  examples  of  resources  in  the  data  set  

Page 35: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  Structural  Metadata  (2)    

EUCLID  -­‐  Providing  Linked  Data   35  

Information  about  resources  •  Example  resources:  allow  users  to  get  an  impression  of  the  

kind  of  resources  included  in  the  data  set.  Examples  can  be  shown  with  the  property  void:exampleResource  

•  Pajern  for  resource  URIs:  the  void:uriSpace  property  can  be  used  to  state  that  all  the  en3ty  URIs  in  a  data  set  start  with  a  given  string    

:MusicBrainz  a  void:Dataset;        void:exampleResource            <http://musicbrainz.org/artist/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d>  .  

:MusicBrainz  a  void:Dataset;        void:uriSpace  "http://musicbrainz.org/"  .  

Page 36: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  Structural  Metadata  (3)    

EUCLID  -­‐  Providing  Linked  Data   36  

Vocabularies  used  in  the  data  set  •  The  void:vocabulary  property  iden3fies  the  vocabulary  or  

ontology  that  is  used  in  a  data  set  

•  Typically,  only  the  most  relevant  vocabularies  are  listed  

•  This  property  can  only  be  used  for  en3re  vocabularies.  It  cannot  be  used  to  express  that  a  subset  of  the  vocabulary  occurs  in  the  data  set.    

:MusicBrainz  a  void:Dataset;        void:vocabulary  <http://purl.org/ontology/mo/>  .  

Page 37: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  Structural  Metadata  (4)  

EUCLID  -­‐  Providing  Linked  Data   37  

Source:    hjp://www.w3.org/TR/void/#metadata    

Statistics  about  a  data  set  Express  numeric  sta3s3cs  about  a  data  set:        

Predicate   Range   Descrip9on  

void:triples   Number   Total  number  of  triples  contained  in  the  data  set.    

void:entities   Number   Total  number  of  en33es  that  are  described  in  the  data  set.  An  en3ty  must  have  a  URI,  and  match  the  void:uriRegexPajern    

void:classes   Number   Total  number  of  dis3nct  classes  in  the  data  set.  

void:properties   Number   Total  number  of  dis3nct  proper3es  in  the  data  set.  

void:distinctSubjects   Number   Total  number  of  dis3nct  subjects  in  the  data  set.  

void:distinctObjects   Number   Total  number  of  dis3nct  objects  in  the  data  set.  

void:documents   Number   Total  number  of  documents,  in  case  that  the  data  set  is  published  as  a  set  of  individual  documents.  

Page 38: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  Structural  Metadata  (5)    

EUCLID  -­‐  Providing  Linked  Data   38  

Partitioned  data  sets  •  The  void:subset  property  provides  descrip3on  of  parts  of  a  

data  set  

 •  Data  sets  can  be  par33oned  based  on  classes  or  proper9es:  

•  void:classPartition  contains  only  instances  of  a  par3cular  class  •  void:propertyPartition  contains  only  triples  with  a  par3cular  predicate  

:MusicBrainz  a  void:Dataset;      void:subset  :MusicBrainzArtists  .  

:MusicBrainz  a  void:Dataset;      void:classPartition  [  void:class  mo:Release  .]  ;      void:propertyParition  [  void:property  mo:member  .]  .  

Page 39: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  Describing  Linksets      

EUCLID  -­‐  Providing  Linked  Data   39  

•  Linkset:  collec3on  of  RDF  links  between  two  RDF  data  sets  

:DS1   :DS2  

:LS1   :LS2  

Image  based  on  hjp://seman3cweb.org/wiki/File:Void-­‐linkset-­‐conceptual.png  

owl:sameAs  

                       

@PREFIX  void:<http://rdfs.org/ns/void#>    @PREFIX  owl:<http://www.w3.org/2002/07/owl#>    :DS1  a  void:Dataset  .  :DS2  a  void:Dataset  .  :DS1  void:subset  :LS1  .  :LS1  a  void:Linkset;            void:linkPredicate                  owl:sameAs;              void:target  :DS1,  :DS2  .  

Page 40: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

VoID:  Describing  Linksets  (2)      

EUCLID  -­‐  Providing  Linked  Data   40  

Example  

                       

@PREFIX  void:<http://rdfs.org/ns/void#>    @PREFIX  skos:<http://www.w3.org/2002/07/owl#>    :MusicBrainz  a  void:Dataset  .  :DBpedia  a  void:Dataset  .    :MusicBrainz  void:classPartition  :MBArtists  .  :MBArtists  void:class  mo:MusicArtist  .    :MBArtists  a  void:Linkset;            void:linkPredicate                  skos:exactMatch;              void:target  :MusicBrainz,  :DBpedia  .  

Page 41: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

The  access  metadata  describes  the  methods  of  accessing  the  actual  RDF  data  set                  

 *  This  assumes  that  the  default  graph  of  the  SPARQL  endpoint  contains  the  data  set.  VoID  cannot  express  that  a  data  set  is  contained  a  specific  named  graph.  This  can  be  specified  with  SPARQL  1.1.  Service  Descrip3on      

VoID:  Access  Metadata    

EUCLID  -­‐  Providing  Linked  Data   41  

Method   Predicate    

Descrip9on  

URI  look  up  endpoint   void:uriLookupEndpoint   Specifies  the  URI  of  a  service  for  accessing  the  data  set  (different  from  the  SPARQL  protocol)  

Root  resource   void:rootResource   URI  of  the  top  concepts  (only  for  data  sets  structured  as  trees)  

SPARQL  endpoint   void:sparqlEndpoint   Provides  access  to  the  data  set  via  the  SPARQL  protocol.*    

RDF  data  dumps   void:dataDump   Specifies  the  loca3on  of  the  dump  file.  If  the  data  set  is  split  into  mul3ple  files,  then  several  values  of  this  property  are  provided.    

CH  5  

Page 42: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Providing  Access  to  the        Data  Set    The  data  set  can  be  accessed  via  different  mechanisms:        

EUCLID  -­‐  Providing  Linked  Data   42  

RDFa   RDF  dump  

SPARQL  endpoint  

Dereferencing  HTTP  URIs  

Page 43: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Dereferencing  HTTP  URIs  •  Allows  for  easily  exploring  certain  resources  contained  in  the  data  set  

•   What  to  return  for  a  URI?  •  Immediate  descrip9on:  triples  where  the  URI  is  the  subject.  

•  Backlinks:  triples  where  the  URI  is  the  object.  •  Related  descrip9ons:  informa3on  of  interest  in  typical  usage  scenarios.  

•  Metadata:  informa3on  as  author  and  licensing  informa3on.  

•  Syntax:  RDF  descrip3ons  as  RDF/XML  and  human-­‐readable  formats.  

•  Applica3ons  (e.g.  LD  browsers)  render  the  retrieved  informa3on  so  it  can  be  perceived  by  a  user.  

EUCLID  -­‐  Providing  Linked  Data   43  

Source:    How  to  Publish  Linked  Data  on  The  Web  -­‐  Chris  Bizer,  Richard  Cyganiak,  Tom  Heath.    

CH  1  

Page 44: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Dereferencing  HTTP  URIs  (2)  Example:  Dereferencing  

EUCLID  -­‐  Providing  Linked  Data   44  

Page 45: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa  •  RDFa  =  “RDF  in  ajributes”  

•  Extension  to  HTML5  for  embedding  RDF  within  HTML  pages:  –  The  HTML  is  processed  by  the  browser,  the  (human)  consumer  don’t  see  the  RDF  data    

–  The  RDF  triples  within  the  page  are  consumed  by  APIs  to  extract  the  (semi-­‐)structured  data  

 

•  It  is  considered  as  the  bridge  between  the  Web  of  Data  and  the  Web  of  Documents  

•  It  is  a  complete  serializa9on  of  RDF  

EUCLID  -­‐  Providing  Linked  Data     45  

Page 46: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa:  Attributes  A]ribute  role   A]ribute   Descrip9on  

Syntax  prefix   List  of  prefix-­‐name  IRIs  pairs  

vocab   IRI  that  specifies  the  vocabulary  where  the  concept  is  defined  

Subject   about   Specifies  the  subject  of  the  rela3onship  

Predicate  

property   Express  the  rela3onship  between  the  subject  and  the  value  

rel   Defines  a  rela3on  between  the  subject  and  a  URL    

rev   Express  reverse  rela3onships  between  two  resources  

Resource  

href   Specifies  an  object  URI  for  the  rel  and  rev  ajributes  

resource   Same  as  href  (used  when  href  is  not  present)  

src   Specifies  the  subject  of  a  rela3onship  

Literal  

datatype   Express  the  datatype  of  the  object  of  the  property  ajribute  

content   Supply  machine-­‐readable  content  for  a  literal  

xml:lang,  lang   Specifies  the  language  of  the  literal  

Macro   typeof   Indicate  the  RDF  type(s)  to  associate  with  a  subject  

inlist   An  object  is  added  to  the  list  of  a  predicate.    EUCLID  -­‐  Providing  Linked  Data  

  46  

Page 47: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa:  Example    Extracting  RDF  from  HTML  

EUCLID  -­‐  Providing  Linked  Data     47  

<div  class="ar3stheader"        about="hjp://musicbrainz.org/ar3st/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d#_"          typeof="hjp://purl.org./ontology/mo/MusicGroup">        …  </div>  

<hjp://musicbrainz.org/ar3st/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d#_>          

HTML  (+RDFa):  

RDF:  

Page 48: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa:  Example    Extracting  RDF  from  HTML  

EUCLID  -­‐  Providing  Linked  Data     48  

<div  class="ar3stheader"        about="hjp://musicbrainz.org/ar3st/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d#_"          typeof="hjp://purl.org./ontology/mo/MusicGroup">        …  </div>  

<hjp://musicbrainz.org/ar3st/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d#_>      <hjp://www.w3.org/1999/02/22-­‐rdf-­‐syntax-­‐ns#type>        

HTML  (+RDFa):  

RDF:  

Page 49: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa:  Example    Extracting  RDF  from  HTML  

EUCLID  -­‐  Providing  Linked  Data     49  

<div  class="ar3stheader"        about="hjp://musicbrainz.org/ar3st/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d#_"          typeof="hjp://purl.org./ontology/mo/MusicGroup">        …  </div>  

<hjp://musicbrainz.org/ar3st/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d#_>      <hjp://www.w3.org/1999/02/22-­‐rdf-­‐syntax-­‐ns#type>          <hjp://purl.org./ontology/mo/MusicGroup>.      

HTML  (+RDFa):  

RDF:  

Page 50: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa:  Example  (2)  

Extracting  RDF  from  MusicBrainz.org  

EUCLID  -­‐  Providing  Linked  Data     50  

hjp://musicbrainz.org/ar3st/b10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d  

Page 51: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa:  Example  (2)  

Extracting  RDF  from  MusicBrainz.org  

EUCLID  -­‐  Providing  Linked  Data     51  

Source:  hjp://www.w3.org/2007/08/pyRdfa/  

Page 52: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa:  Example  (2)  

Extracting  RDF  from  MusicBrainz.org  

EUCLID  -­‐  Providing  Linked  Data     52  

hjp://www.w3.org/2007/08/pyRdfa/extract?uri=hjp%3A%2F%2Fmusicbrainz.org%2Far3st%2Fb10bbbfc-­‐cf9e-­‐42e0-­‐be17-­‐e2c3e1d2600d&format=nt  

Watch  the  EUCLID  screencast:  http://vimeo.com/euclidproject  

Page 53: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDF  Dump  •  An  RDF  dump  refers  to  a  file  which  contains  (part  of)  a  data  set  specified  in  an  RDF  format  (RDF/XML,  N-­‐Triples,  N-­‐Quads)  

•  The  data  set  can  be  split  into  several  RDF  dumps    

•  A  list  of  available  data  sets  available  as  RDF  dumps  can  be  found  at:  –  hjp://www.w3.org/wiki/DataSetRDFDumps    

EUCLID  -­‐  Providing  Linked  Data   53  

Page 54: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

SPARQL  Endpoint  •  The  SPARQL  endpoint  refers  to  the  URI  of  the  listener  of  the  SPARQL  protocol  service,  which  handles  requests  for  SPARQL  protocol  opera3ons  

 

•  The  user  submits  SPARQL  queries  to  the  SPARQL  endpoint  in  order  to  retrieve  only  a  desired  subset  of  the  RDF  data  set  

 

•  List  of  available  SPARQL  endpoints:  •  hjp://www.w3.org/wiki/SparqlEndpoints  •  hjp://labs.mondeca.com/sparqlEndpointsStatus/    

EUCLID  -­‐  Providing  Linked  Data   54  

CH  2  

Page 55: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Using  Linked  Data  Catalogs  •  Data  catalogs,  markets  or  repositories  are  pla{orms  dedicated  to  provide  access  to  a  wide  range  of  data  sets  from  different  domains    

•  Allow  data  consumers  to  easily  find  and  use  the  data  

•  Usually  the  catalogs  offer  relevant  metadata  about  the  crea3on  of  the  data  set  

EUCLID  -­‐  Providing  Linked  Data   55  

Page 56: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Using  Linked  Data  Catalogs  (2)  How  to  publish  an  RDF  data  set  into  a  catalog?  

EUCLID  -­‐  Providing  Linked  Data   56  

Create  your  own  data  catalog  

Recommended  for  big  organiza3ons/ins3tu3ons  aiming  at  providing  a  large  number  of  data  sets  

Use  a  data  management  system,  for  example:  

Upload  your  data  set  into  an  exis3ng  catalog  

Allows  data  consumers  to  easily  find  new  data  sets  

Common  LD  catalogs  are:    -­‐    -­‐  The  Linking  Open  Data  Cloud  

Page 57: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Validating  Data  Sets  There  are  different  ways  to  validate  the  published  RDF  data  set:  

EUCLID  -­‐  Providing  Linked  Data   57  

General  validators  

Parsing  &  Syntax  

•  Vapour  -­‐  Performs  two  types  of  tests:  without  content  nego3a3on  and  reques3ng  RDF/XML  content  

               hjp://validator.linkeddata.org/vapour  

•  URI  Debugger  -­‐  Retreieves  the  HTTP  responses  of  accessing  a  URI                                                                                                                                                                          hjp://linkeddata.informa3k.hu-­‐berlin.de/uridbg/    

•  RDF  Triple-­‐Checker  –  Dereferences  namespaces  associated  with  the  resources  used  in  the  document                                                                                                                                                  

               hjp://graphite.ecs.soton.ac.uk/checker/  

•  W3C  RDF/XML  Valida9on  Service  –  Evaluates  the  syntax  of  RDF/XML  documents  and  displays  the  RDF  triples  in  it                                                                      hjp://validator.linkeddata.org/vapour      

•  W3C  Markup  Valida9on  Service  –  Checks  syntac3c  correctness  for  web  documents  with  RDFa  markup                                                                                                hjp://validator.w3.org/  

•  RDF:ALERTS  –  Validates  syntax,  undefined  resources,  datatype  and  other  types  of  errors                                                                                                                                                                                  hjp://swse.deri.org/RDFAlerts/  

Accessibility  

Page 58: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Validating  Data  Sets  (2)  Example:  Validating  URIs  with  Vapour  

EUCLID  -­‐  Providing  Linked  Data   58  

Source:  hjp://idi.fundacionc3c.org/vapour    

Page 59: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Validating  Data  Sets  (3)  Example:  Validating  URIs  with  Vapour  

EUCLID  -­‐  Providing  Linked  Data   59  

Source:  hjp://idi.fundacionc3c.org/vapour    

Page 60: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Validating  Data  Sets  (4)  

Example:  Validating  URIs  with  Vapour  

EUCLID  -­‐  Providing  Linked  Data   60  

Source:  hjp://idi.fundacionc3c.org/vapour    

Example:  Validating  URIs  with  Vapour  hjp://dbpedia.org/page/The_Beatles  

hjp://dbpedia.org/data/The_Beatles.xml  

HTML  conten

t  

RDF  do

cumen

t  

Page 61: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

PROVIDING  LINKED  DATA:  CHECKLIST  

EUCLID  -­‐  Providing  Linked  Data   61  

Page 62: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Providing  Linked  Data:  Checklist  (1)  Creating  Linked  Data  o All  the  relevant  en33es/concepts  were  effec3vely  extracted  from  the  raw  data  ?  

o Are  all  the  created  URIs  dereferenceable?  o Are  you  reusing  terms  from  widely  accepted    vocabularies?  

EUCLID  -­‐  Providing  Linked  Data   62  

Page 63: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Providing  Linked  Data:  Checklist  (2)  Interlinking  Linked  Data  o  Is  the  data  set  linked  to  other  RDF  data  sets?  o Are  the  created  vocabulary  terms  linked  to  other  vocabularies?  

EUCLID  -­‐  Providing  Linked  Data   63  

Page 64: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Providing  Linked  Data:  Checklist  (3)  Publishing  Linked  Data  o Do  you  provide  data  set  metadata?  o Do  you  provide  informa3on  about  licensing?  o Do  you  provide  addi3onal  access  methods?  o  Is  the  data  set  available  in  LD  catalogs?  o Did  the  data  set  pass  the  valida3on  tests?  

EUCLID  -­‐  Providing  Linked  Data   64  

Page 65: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Summary  

EUCLID  -­‐  Providing  Linked  Data   65  

•  The  Linked  Data  lifecycle:  •  3  core  tasks:  crea3ng,  interlinking  and  publishing  

•  Crea3on  of  Linked  Data:  •  Extrac3ng  relevant  data,  using  URIs  to  name  en33es  and  selec3ng  vocabularies  and  expressing  the  data  using  the  RDF  data  model  

•  Interlinking  Linked  Data:  •  Challenges  of  link  discovery,  using  Silk  to  create  links  between  two  data  sets  and  using  SKOS  links    

•  Publishing  Linked  Data:  •  Crea3on  of  data  set  metadata;  publishing  the  data  set  via  RDF  dumps,  SPARQL  endpoints  or  RDFa;  using  RDFa  and  schema.org  to  enrich  search  results,  and  uploading  the  data  set  to  a  LD  catalog  

In  this  chapter  we  studied:  

Page 66: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

The  Web  &  Linked  Data  

•  Linked  Data  catalogs  •  Applica9ons  

Page 67: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

CKAN  •  CKAN  is  an  open  source  pla{orm  for  developing  data  set  catalogs  

•  Implement  useful  tools  for  data  publishers  to  support:  •  Data  harves3ng  •  Crea3on  of  metadata  •  Access  mechanisms  to  the  data  set  •  Upda3ng  the  data  set  •  Monitoring  the  access  to  the  data  set  

EUCLID  -­‐  Providing  Linked  Data   67  

Page 68: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

CKAN  (2)  

EUCLID  -­‐  Providing  Linked  Data   68  Source:  hjp://ckan.org  

Page 69: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

CKAN  (3)  

EUCLID  -­‐  Providing  Linked  Data   69  Source:  hjp://ckan.org  

Page 70: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

•  The  Data  Hub  is  a  community-­‐run  data  catalog  which  contains  more  than  5,000  data  sets1  

•  “(…)  is  an  openly  editable  open  data  catalogue,  in  the  style  of  Wikipedia”.2  

 

•  It  is  implemented  on  top  of  the  CKAN  pla{orm    

•  Allows  the  crea3on  of  groups:  –  The  Linking  Open  Data  Cloud  group  exclusively  contains  Linked  Data  sets    

EUCLID  -­‐  Providing  Linked  Data   70  

1  According  to  the  informa3on  presented  in  the  portal  on  March  2013  2  Source:  hjp://datahub.io/about    

The  Data  Hub    

Page 71: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

EUCLID  -­‐  Providing  Linked  Data   71  

Source:  hjp://datahub.io/  

The  Data  Hub  (2)  

Page 72: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

The  Data  Hub  (3)  

EUCLID  -­‐  Providing  Linked  Data   72  

Source:  hjp://datahub.io/  

Page 73: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

The  Linking  Open  Data  Cloud  

EUCLID  -­‐  Providing  Linked  Data   73  

September  2011  

Source:  Linking  Open  Data  cloud  diagram,  by  Richard  Cyganiak  and  Anja  Jentzsch  

Page 74: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

The  Linking  Open  Data  Cloud  

How  to  publish  an  RDF  data  set  in  this  cloud?  1.  The  data  set  must  follow  the  Linked  Data  principles  2.  The  data  set  must  contain  at  least  1,000  RDF  triples  3.  The  data  set  must  contain  at  least  50  RDF  links  to  a  

data  set  that  is  already  in  the  diagram  4.  Access  to  the  data  set  must  be  provided  

Once  these  criteria  are  met,  the  data  publisher  must  add  the  data  set  to  the  Data  Hub  catalog,  and  contact  the  administrators  of  the  Linking  Open  Data  Cloud  group  

 

EUCLID  -­‐  Providing  Linked  Data   74  

Source:  hjp://lod-­‐cloud.net/  

Page 75: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Linked  Data  &  Search  Engines  

EUCLID  -­‐  Providing  Linked  Data   75  

•  Search  engines  collect  informa3on  about  web  resources  in  order  to  produce  richer  search  results  by  improving  the  display  of  the  results  

•  This  is  only  possible  if  the  search  engines  are  able  to  understand  the  content  within  the  web  pages    

•  The  HTML  pages  must  be  annotated  with  machine-­‐readable  content  to  describe  their  content:  

Mark  up  format   Vocabulary  

Page 76: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDFa  for  marking  up  data  

EUCLID  -­‐  Providing  Linked  Data   76  

•  RDFa  is  used  to  provide  (semi-­‐)structured  Linked  Data  embedded  in  web  content  

•  Examples:  – Some  search  engines  use  RDFa,  e.g.,  Google,  Yahoo!  and  Bing  

– Facebook’s  Open  Graph  is  based  on  RDFa  

Page 77: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Google  Rich  Snippets  

EUCLID  -­‐  Providing  Linked  Data   77  

•  Embedding  seman3cs  via  RDFa  (or  microformats/microdata)  enhances  search  results:  

Page 78: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Google  Rich  Snippets  (2)  

EUCLID  -­‐  Providing  Linked  Data   78  

Page 79: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Schema.org  

EUCLID  -­‐  Providing  Linked  Data   79  

•  Collec3on  of  schemas/vocabularies  to  markup  the  HTML  pages  

•  It  is  recognized  by  Bing,  Google,  Yahoo!  and  Yandex  

•  Covers  a  wide  range  of  knowledge  domains    

•  It  also  offers  an  extension  mechanism  in  case  the  publisher  is  interested  in  adding  new  concepts  to  the  vocabularies  

Page 80: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Schema.org  (2)  

EUCLID  -­‐  Providing  Linked  Data   80  

The  vocabularies  cover  the  following  topics:  

Source:  hjp://schema.org/docs/schemas.html  

“The  world  is  too  rich,  complex  and  interes.ng  for  a  single  schema  to  describe  fully  on  its  own.  With  schema.org  we  aim  to  find  a  balance,  by  providing  a  core  schema  that  covers  lots  of  situa.ons,  alongside  extension  mechanisms  for  extra  detail.”  (Dan  Brickley,  schema.org)  

Page 81: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

EUCLID  -­‐  Providing  Linked  Data   81  

Integrates(/aligns)  exis3ng  vocabularies  where  appropriate,  e.g.  rNews  

Source:  hjp://schema.org/Ar3cle  

Schema.org  (3)  

Page 82: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Google  Knowledge  Graph  

EUCLID  -­‐  Providing  Linked  Data   82  

•  The  user  is  able  to  find  answer  to  their  queries  without  browsing  pages  

•  Provides  detailed  informa3on  

 

Page 83: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Google  Knowledge  Graph  (2)  

EUCLID  -­‐  Providing  Linked  Data   83  

•  Google  Search  results    include  structured  data  from  Freebase  

 •  Might  disambiguate  

search  terms  

Page 84: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

             Freebase  

EUCLID  -­‐  Providing  Linked  Data   84  

                 •  Knowledge  base  of  

structured  data  

•  Data  is  stored  as  a  graph  

 •  Describes  data  from  

different  domains  

Page 85: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Bing  Snapshot  

EUCLID  -­‐  Providing  Linked  Data   85  

•  Provides  structured  data  related  to  the  search  term  

•  Includes  a  significant  number  of  en33es  from  more  domains  

•  Connects  data  from  LinkedIn  

•  Is  is  powered  by  the  graph  engine  Trinity.RDF  

Page 86: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Bing  Snapshot  (2)  

EUCLID  -­‐  Providing  Linked  Data   86  

Page 87: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

             Open  Graph  Protocol  

EUCLID  -­‐  Providing  Linked  Data   87  

•  It  was  originally  created  by  Facebook  

•  Allows  describing  web  content  as  graph  objects,  establishing  connec3ons  between  people  and  objects  

•  The  descrip3ons  are  embedded  in  the  web  page  as  RDFa  data  

•  Supports  descrip3on  of  several  domains:  basic  metadata,  music,  video,  ar3cles,  books,  websites  and  user  profiles  

Source:  hjp://ogp.me/  

Page 88: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

             Open  Graph  Protocol  (2)  

EUCLID  -­‐  Providing  Linked  Data   88  

Source:  hjp://ogp.me/  

Who  is  using  Open  Graph  protocol?  

Source:  hjp://ogp.me/  

Facebook  

Google  Mixi  

Consumers   Publishers  

IMDb  

Microso�  NHL  

Posterous  

Rojen  Tomatoes  TIME  

Page 89: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

             Open  Graph  Protocol  (3)  

EUCLID  -­‐  Providing  Linked  Data   89  

•  Facebook  expands  vocabulary  of  rela3onships  beyond  “friendship”  and  “like”                    more  ac9ons!    

Source:  hjps://developers.facebook.com/docs/opengraph/  

Page 90: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

             Open  Graph  Protocol  &                                                                                              Facebook  

EUCLID  -­‐  Providing  Linked  Data   90  

List  of  domains  and  ac9ons  

Source:  hjps://developers.facebook.com/docs/opengraph/  

•  Listen  •  Create  a  playlist  

•  Watch  •  Rate  •  Wants  to  watch  

•  Rate  •  Read  •  Quote  •  Wants  to  read  

•  Achieve  •  High  score  

•  Bike  •  Run  •  Walk  

•  Like  •  Recommend  •  Follow  

General  

Music  

Movies    &  TV  

Games  

Fitness  

Book  

How  can  we  exploit  these  links  and  rela3onships?  

Page 91: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

           Facebook  Graph  Search  

EUCLID  -­‐  Providing  Linked  Data   91  

•  focuse  on  people  and  their  interests,  exploi3ng  how  everything  is  related  to  each  other  

•  Queries  are  specified  using  natural  language  

•  Takes  advantage  of  context  and  suggest  possible  queries    

•  Allows  for  building  more  complex  (expressive)  queries  that  are  not  possible  with  normal  search:  –  For  example,  “music  liked  by  me  and  friends  who  live  in  my  city”  

Page 92: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

           Facebook  Graph  Search  (2)  

EUCLID  -­‐  Providing  Linked  Data   92  

 

 Context  (informa3on  from  profile):  

Graph  search  sugges9ons:  

Page 93: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

           Facebook  Graph  Search  (3)  

EUCLID  -­‐  Providing  Linked  Data   93  

Results  

Page 94: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

           Facebook  Graph  Search  (4)  

EUCLID  -­‐  Providing  Linked  Data   94  

Observations    •  Allows  for  conjunc3ve  queries  (applying  filter  over  intermediate  

results  =  “apply  operator”)  

•  Disjunc9ve  queries  are  not  supported:  –  For  example:  “My  friends  who  like  Seman3cWeb.com  OR  ReadWrite”  

 

•  Post  search  is  not  supported  –  It  is  not  possible  to  search  in  post  content  submijed  to  the  3meline  

•  User  privacy  segngs  affect  the  results  

Page 95: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Tools  for  providing  Linked  Data  

•  Extrac9ng  data  from  spreadsheets:  OpenRefine  •  Extrac9ng  data  from  RDBMS:  R2RML  •  Extrac9ng  data  from  text:  Zemanta,  OpenCalais,  GATE  •  Interlinking  data  sets:  Silk  

Page 96: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

EXTRACTING  DATA  FROM  SPREADSHEETS  WITH  OPENREFINE  

EUCLID  -­‐  Providing  Linked  Data   96  

Page 97: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Integrate  Chart  Data  •  Task:  Integrate  latest  chart  

informa3on  into  your  RDF  database.  

•  Data  may  be  available  in  non-­‐RDF  formats:  –  Plain  text  –  CSV,  TSV,  separator-­‐based  

files  –  HTML  tables  –  Spreadsheets  

(OpenDocument,  Excel,  …)  –  XML  –  JSON  –  …  

97  

LD  Data  set  

Access  

Integrated  Data  Set  

Interlinking   Cleansing  Vocabulary  Mapping  

SPARQL  Endpoint  

Publishing  

CSV/  TSV   HTML   Spreadsheets   JSON  

Data  acquisi3

on  

EUCLID  -­‐  Providing  Linked  Data  

Page 98: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Example  Data  

The Beatles, 250 million Elvis Presley, 203.3 million Michael Jackson, 157.4 million Madonna, 160.1 million Led Zeppelin, 135.5 million

Queen, 90.5 million    

98  

hjp://en.wikipedia.org/wiki/  List_of_best-­‐selling_music_ar3sts  

Ar3st   Country  of  origin  

Period  ac3ve  

Release-­‐year  of  first  charted  record  

Total  cer3fied  units  (from  available  markets)[Notes]  

The  Beatles   United  Kingdom  

1960–1970[4]   1962[4]   Total  available  cer9fied  units:    

250  million[show]  

Elvis  Presley   United  States  

1954–1977[28]   1954[28]   Total  available  cer9fied  units:  

203.3  million[show]  

Michael  Jackson[Note  2]  

United  States  

1964–2009[32]   1971[32]   Total  available  cer9fied  units:  

157.4  million[show]  

Madonna   United  States  

1979–present[44]   1982[44]   Total  available  cer9fied  units:  

160.1  million[show]  

Led  Zeppelin   United  Kingdom  

1968–1980[50]   1969[50]   Total  available  cer9fied  units:  

135.5  million[show]  

Queen   United  Kingdom  

1971–present[53]   1973[53]   Total  available  cer9fied  units:    

90.5  million[show]  

{

"artist": { "class": "artist", "name": "The Beatles"

}, "rank": 1,

"value": 250 million }, …

CSV  

JSON  

HTML  tables  

EUCLID  -­‐  Providing  Linked  Data  

Page 99: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

                                     OpenRefine    •  transforms  and  cleans  messy  

input  data  sets.    

•  is  an  open-­‐source  successor  of  Google  Refine.    

•  allows  for  en3ty  reconcilia3on  against  SPARQL  endpoints  or  RDF  data.    

•  is  extended  with  plugins  that  enhance  its  func3onality,  e.g.  for  RDF  support.  

99  EUCLID  -­‐  Providing  Linked  Data  

Quick  Facts  

Page 100: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Use  of  OpenRefine  

100  

1.  Messy  input  data  is  imported,    transformed  into  a  table  represen-­‐ta3on  and  cleaned.  

3.  Define  the  structure  of  the  RDF  output.    

4.  The  data  is  exported  into  some  RDF  syntax.  

2.  En3ty  reconcilia3on  is  applied  to  allow  for  interlinking  with  exis3ng  data  sets.  

The Beatles, 250 million Elvis Presley, 203.3 million Michael Jackson, 157.4 million Madonna, 160.1 million Led Zeppelin, 135.5 million

Queen, 90.5 million    

CSV  

musicbrainz:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d :totalSales "25000000000"^^xsd:int . musicbrainz:01809552-4f87-45b0-afff-2c6f0730a3be :totalSales "2.033E10"^^xsd:int . musicbrainz:f27ec8db-af05-4f36-916e-3d57f91ecf5e :totalSales "1.574E10"^^xsd:int . musicbrainz:79239441-bfd5-4981-a70c-55c3f15c1287 :totalSales "1.601E10"^^xsd:int . musicbrainz:678d88b2-87b0-403b-b63d-5da7465aecc3 :totalSales "1.355E10"^^xsd:int . musicbrainz:0383dadf-2a4e-4d10-a46a-e9e041da8eb3 :totalSales "9.05E9"^^xsd:int .

RDF  

EUCLID  -­‐  Providing  Linked  Data  

Page 101: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Typical  steps:  •  Group  and  explore  data  

items  •  Dele3ng  columns  or  rows  

based  on  filter  condi3on  •  Split  columns  into  several  

columns  based  on  condi3on  

•  Modify  messy  data  items  with  GREL,  a  powerful  expression  language  

•  Replay  steps  from  a  previous  Refine  project  

101  EUCLID  -­‐  Providing  Linked  Data  

Data  Transformation  

Page 102: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

How  to  Generate  RDF?  

•  Addi3onal  problem:  data  needs  to  be  interlinked  with  exis3ng  MusicBrainz  data  

•  This  is  the  point  where  plugins  come  into  play:  –  RDF  Refine:  developed  by  DERI  – An  extension  of  OpenRefine  to  support  RDF  

102  

?  RDF  

EUCLID  -­‐  Providing  Linked  Data  

Page 103: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Core  Capabilities  

•  Interlinking  of  data  by  en3ty  reconcilia3on  – Against  SPARQL  endpoints,  RDF  dumps  – Discovery  of  relevant  RDF  data  sets    

•  RDF  export  with  the  help  of  RDF  skeletons  – Define  the  vocabulary  and  graph  structure  of  the  RDF  serializa3on  

–  In  Turtle,  RDF/XML  

103  EUCLID  -­‐  Providing  Linked  Data  

Page 104: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Typical  steps:  •  Define  a  reconcilia3on  service  •  Select  specific  types  to  reconcile  against  •  Start  reconciling  a  column  against  the  

service  

104  EUCLID  -­‐  Providing  Linked  Data  

Entity  Reconciliation  

Page 105: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Define  RDF  Skeletons  

•  An  RDF  skeleton  defines  the  structure  of  the  RDF  triples  that  are  exported  

EUCLID  -­‐  Providing  Linked  Data   105  

Page 106: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

RDF  Skeletons  

03.09.13   106  106  EUCLID  -­‐  Providing  Linked  Data  

Page 107: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

EXTRACTING  DATA  FROM  RDBMS  WITH  R2RML  

EUCLID  -­‐  Providing  Linked  Data   107  

Page 108: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

W3C  RDB2RDF  

•  Task:  Integrate  data  from  rela3onal  DBMS  with  Linked  Data  

•  Approach:  map  from  rela3onal  schema  to  seman3c  vocabulary  with  R2RML  

•  Publishing:  two  alterna3ves  –  –  Translate  SPARQL  into  

SQL  on  the  fly  –  Batch  transform  data  into  

RDF,  index  and  provide  SPARQL  access  in  a  triplestore  

108  

LD  Data  set  

Access  

Integrated  Data  in  

Triplestore  

Interlinking   Cleansing  Vocabulary  Mapping  

SPARQL  Endpoint  

Publishing  

Data  acquisi3

on  

EUCLID  -­‐  Providing  Linked  Data  

R2RML  Engine  

Rela3onal  DBMS  

Page 109: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

W3C  RDB2RDF  •  The  W3C  made,  last  year,  two  recommenda3ons  for  mapping  between  rela3onal  databases  and  RDF:  –  Direct  mapping  directly  exposes  data  as  RDF  

•  Not  allowance  for  vocabulary    mapping  •  No  allowance  for  interlinking  (unless  URIs  used  in  rela3onal  data)  •  Not  appropriate  for  this  topic  

– R2RML,  the  RDB  to  RDF  mapping  language  •  Allows  vocabulary  mapping  (subject,  predicate  and  object  maps  with  class  op3ons)  

•  Allows  interlinking  –  URIs  can  be  constructed  •  Means  to  provide  MusicBrainz  RDF/SPARQL  itself  

EUCLID  -­‐  Providing  Linked  Data   109  

hjp://www.w3.org/2001/sw/rdb2rdf/  

Page 110: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

MusicBrainz  Next  Gen  Schema  

EUCLID  -­‐  Providing  Linked  Data   110  

•  Ar9st    As  pre-­‐NGS,  but      

       further  ajributes  

•  Ar9st  Credit    Allows  joint  credit  

•  Release  Group    Cf.  ‘album’    

       versus:  

•  Release  •  Medium          

•  Track  •  Track  List  

•  Work  •  Recording  

Source:  hjps://wiki.musicbrainz.org/Next_Genera3on_Schema  

Page 111: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Music  Ontology  

•  OWL  ontology  with  following  core  concepts  (classes)  and  rela3onships  (proper3es):  

EUCLID  -­‐  Providing  Linked  Data   111  

Source:  hjp://musicontology.com  

Page 112: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

R2RML  Class  Mapping  •  Mapping  tables  to  classes  is  ‘easy’:  

lb:Artist  a  rr:TriplesMap  ;      rr:logicalTable  [rr:tableName  "artist"]  ;      rr:subjectMap            [rr:class  mo:MusicArtist  ;            rr:template                        "http://musicbrainz.org/artist/{gid}#_"]  ;      rr:predicateObjectMap            [rr:predicate  mo:musicbrainz_guid  ;            rr:objectMap  [rr:column  "gid"  ;                                          rr:datatype  xsd:string]]  .  

EUCLID  -­‐  Providing  Linked  Data   112  

Page 113: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

R2RML  Property  Mapping  •  Mapping  columns  to  proper3es  can  be  easy:  

lb:artist_name  a  rr:TriplesMap  ;      rr:logicalTable  [rr:sqlQuery            """SELECT  artist.gid,  artist_name.name                    FROM  artist                    INNER  JOIN  artist_name  ON  artist.name  =  

artist_name.id"""]  ;      rr:subjectMap  lb:sm_artist  ;      rr:predicateObjectMap            [rr:predicate  foaf:name  ;            rr:objectMap  [rr:column  "name"]]  .  

EUCLID  -­‐  Providing  Linked  Data   113  

Page 114: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

NGS  Advanced  Relations  

EUCLID  -­‐  Providing  Linked  Data   114  

•  Major  en33es  (Ar3st,  Release  Group,  Track,  etc.)  plus  URL  are  paired    (l_ar3st_ar3st)  

•  Each  pairing    of  instances    refers  to  a  Link  

•  Links  have  types      (cf.  RDF  proper3es)    and  ajributes  

     

Source:  hjp://wiki.musicbrainz.org/Advanced_Rela3onship  

Page 115: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

R2RML  Advanced  Mapping  •  Mapping  advanced  rela3onships  (SQL  joins):  lb:artist_member  a  rr:TriplesMap  ;      rr:logicalTable  [rr:sqlQuery          """SELECT  a1.gid,  a2.gid  AS  band                FROM  artist  a1                    INNER  JOIN  l_artist_artist  ON  a1.id  =  

l_artist_artist.entity0                      INNER  JOIN  link  ON  l_artist_artist.link  =  link.id                      INNER  JOIN  link_type  ON  link_type  =  link_type.id                      INNER  JOIN  artist  a2  on  l_artist_artist.entity1  =  a2.id                  WHERE  link_type.gid='5be4c609-­‐9afa-­‐4ea0-­‐910b-­‐12ffb71e3821'                    AND  link.ended=FALSE"""]  ;      rr:subjectMap  lb:sm_artist  ;      rr:predicateObjectMap            [rr:predicate  mo:member_of  ;            rr:objectMap  [rr:template  "http://musicbrainz.org/artist/

{band}#_"  ;                                        rr:termType  rr:IRI]]  .  

EUCLID  -­‐  Providing  Linked  Data   115  

Page 116: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

EXTRACTING  DATA  FROM  TEXT  

EUCLID  -­‐  Providing  Linked  Data   116  

Page 117: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

OpenCalais  

   •  Not  easily  customised/extended  •  Domain-­‐specific  coverage  varies  

EUCLID  -­‐  Providing  Linked  Data   117  

Source:  hjp://viewer.opencalais.com/  

Page 118: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

DBpedia  Spotlight    

   •  Not  easily  customised/extended  •  Is  currently  only  available  for  English  

EUCLID  -­‐  Providing  Linked  Data   118  

Source:  hjp://dbpedia-­‐spotlight.github.com/demo/  

hjp://dbpedia.org/page/Slowcore  

hjp://dbpedia.org/page/Dorothy_Parker  

Page 119: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Zemanta  

EUCLID  -­‐  Providing  Linked  Data   119  

Source:  hjp://www.zemanta.com/demo/  

•  Common  problem  with  general  purpose,  open-­‐domain        seman3c      annota3on      tools  

•  Best  results      require      bespoke      customisa3on  

Page 120: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

•  General  Architecture  for  Text  Engineering  •  Free  open-­‐source  (LGPL)    framework  and  development  environment  

•  Started  1996,  large  developer  community  •  Used  worldwide  by  many  organisa3ons  to  build  bespoke  solu3ons;  e.g.  Press  Associa3on  and  the  Na3onal  Archive  

•  Informa3on  Extrac3on  in  many  languages  

GATE  

EUCLID  -­‐  Providing  Linked  Data   120  

hjp://www.gate.ac.uk/  

Page 121: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

•  Increases  recall  over  DBpedia  by  deriving  new  lexicalisa3ons  for  URIs  from  link  anchor  texts,  disambigua3on  pages,  and  redirect  pages  

GATE  Example  -­‐  LODIE  

EUCLID  -­‐  Providing  Linked  Data   121  

Page 122: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Precision  and  Recall  •  Generic  services  typically  very  low  recall  •  Combina3on  is  one  solu3on  

•  Other  solu3on  is  custom  extrac3on  

122  

PER LOC ORG TOTAL DB  Spotlight 0.97  /  0.40 0.82  /  0.46 0.86  /  0.31 0.85  /  0.39 Zemanta 0.96  /  0.84 0.89  /  0.62 0.82  /  0.57 0.90  /  0.68 LODIE 0.81  /  0.82 0.73  /  0.76 0.56  /  0.59 0.71  /  0.74

Zemanta  ∩  LODIE 1.00  /  0.74 0.95  /  0.45 0.97  /  0.42 0.97  /  0.54

Zemanta  U  LODIE 0.94  /  0.93 0.77  /  0.76 0.72  /  0.71 0.82  /  0.81

EUCLID  -­‐  Providing  Linked  Data  

Page 123: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Custom  GATE  Gazetteer  •  Retrieve  MusicBrainz    en3ty/label/class      with  SPARQL  query  

123  EUCLID  -­‐  Providing  Linked  Data  

Page 124: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

GATECloud  •  Custom  (e.g.  based  around  custom  gazejeer)  GATE  pipelines  can  be  executed  on  the  cloud:  

124  EUCLID  -­‐  Providing  Linked  Data  

Page 125: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

INTERLINKING  DATA  SETS  WITH  SILK  

EUCLID  -­‐  Providing  Linked  Data   125  

Page 126: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Interlinking  with  Silk  •  Task:  Create  links  between  

the  data  set  and  external  Linked  Data  sources.  

•  Approach:  Crea3on  of  specified  links  by  querying  the  target  data  sets  

•  Alterna9ves:  –  Manual  crea3on  of  

linkage  rules  by  the  user    –  Automa3c  learning  

linkage  rules  by  submi�ng  predefined  SPARQL  queries  

126  

LD  Data  set  

Access  

Integrated  Data  Set  

Interlinking   Cleansing  Vocabulary  Mapping  

SPARQL  Endpoint  

Publishing  

CSV/  TSV   HTML   Spreadsheets   JSON  

Data  acquisi3

on  

EUCLID  -­‐  Providing  Linked  Data  

Page 127: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Link  Discovery  with  Silk  •  Open  source  tool  for  discovering  RDF  links  between  data  items  within  different  Linked  Data  sources  

•  It  is  based  on  the  Silk  Link  Specifica3on  Language  (Silk-­‐LSL)  for  expressing  linkage  rules  

•  It  accesses  the  target  RDF  data  sets  via  SPARQL  endpoints  to  generate  RDF  links  

EUCLID  -­‐  Providing  Linked  Data   127  

Source:  Robert  Isele.  “LOD2  Webinar  Series:Silk”    

Page 128: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Silk  Variants  •  Silk  Single  Machine  

•  Generates  RDF  links  on  a  single  machine  •  Data  sets  can  reside  either  locally  or  in  remote  machines  •  Provides  mul3threading  and  caching  

•  Silk  MapReduce  •  Uses  a  cluster  composed  of  mul3ple  machines  •  Based  on  Hadoop  and  designed  to  scale  to  big  data  sets    

•  Silk  Server  •  Used  within  applica3ons  that  consume  Linked  Data  from  the  Web  

while  keeping  track  of  known  en33es    •  Provides  an  HTTP  API  for  matching  en33es  from  an  incoming  

stream  

EUCLID  -­‐  Providing  Linked  Data   128  Source:  hjp://wifo5-­‐03.informa3k.uni-­‐mannheim.de/bizer/silk/  

Page 129: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Source:  Silk  workflow  is  par3ally  based  on  “LOD2  Webinar  Series:  Silk  -­‐(Simplified)  Linking  Workflow”  by  Rober  Isele.          

Silk  Workflow  

EUCLID  -­‐  Providing  Linked  Data   129  

Select  LD  data  sets  

•  Iden3fy  suitable  data  sets  in  LD  catalogs*  

•  Select  the  two  data  sets  to  link  

Specify  LD  data  sets  

•  Specify  the  access  method  to  the  data  set  (RDF  dump,  SPARQL  endpoint)*  

•  Specify  the  en3ty  types  to  be  linked  

Write  linkage  rule  

•  Specifies  how  to  compare  the  resources  

•  Use  Silk-­‐LSL  •  The  rules  can  also  be  learnt    

Generate  RDF  links  

•  Output  links  can  be  stored  in  a  file  or  a  triple  store  

•  Can  discover  SKOS  links  

Silk  framework  

*  See  sec3on  “Publishing  Linked  Data”    

Page 130: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Linkage  Rule  Components  •  Linkage  rules  define  the  condi3ons  to  create  the  links  between  the  data  sets.  These  rules  are  composed  of:  

EUCLID  -­‐  Providing  Linked  Data   130  Source:  hjp://wifo5-­‐03.informa3k.uni-­‐mannheim.de/bizer/silk/  

RDF  Paths  •  Describe  the  elements  to  be  

compared  •  Example:  ?a/rdfs:label    

Transforma9ons  •  Apply  transforma3ons  to  the  

result  set  of  an  RDF  path  •  Examples:  LowerCase,  

Concatenate,  Replace,  …  

Comparators  •  Compute  the  similarity  of  two  

inputs  •  Examples:  String  similarity  

metrics,  Date  similarity,  …    

Aggrega9ons  •  Compute  an  aggregated  value  

from  mul3ple  comparators  •  Examples:  Min,  Max,  Avg,  various  

means,  Euclidian  distance  …  

1   2  

3   4  

Page 131: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Silk  Workbench  •  Web  applica3on  built  on  top  of  Silk,  which  allows  the  crea3on  of  projects  to  manage  the  crea3on  of  links  between  RDF  data  sets  

•  The  data  sets  can  be  stored  locally  or  accessed  remotely  by  specifying  the  SPARQL  endpoint  

•  The  user  is  able  to  create  customized  linking  tasks:  –  The  tool  offers  a  graphical  editor  to  create  linkage  rules  by  combining  the  linkage  rules  components  via  drag  &  drop  elements  

–  Includes  support  for  (automa3c)  learning  linkage  rules  EUCLID  -­‐  Providing  Linked  Data   131  

Page 132: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Project  configuration            

Silk  Workbench  (2)  

EUCLID  -­‐  Providing  Linked  Data   132  

1  2  

3  

4  

1.   Project:  name  and  components  (data  sources,    linking  tasks  and  output  tasks)  

2.   Data  sources:  specifica3on  of  the  data  sets  to  be  interlinked  

3.   Linking  task:  specifica3on  of  the  linkage  rules  and  type  of  links  to  be  created  

4.   Output  task:  mechanism  to  store  the  results  from  the  lnking  process  

2  

Page 133: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Editing  a  linking  task                

Silk  Workbench  (3)  

EUCLID  -­‐  Providing  Linked  Data   133  

1  

4  

2  

3  

1.   Linkage  rule  components    2.   Graphical  editor:  the  items  from  (1)  are  dragged  &  

dropped  in  this  area,  and  connected  to  compose  the  linkage  rules    

3.   Generate  links:  based  on  the  defined  linkage  rules  in  (2),  the  data  sets  are  accessed  to  discover  possible  links  

4.   Learn:  automa3c  learning  of  linkage  rules  

Page 134: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Adding  a  linkage  rule                

Silk  Workbench  (4)  

EUCLID  -­‐  Providing  Linked  Data   134  

The  previous  linkage  rule  states:  1.  Retrieve  the  foaf:name  values  from  MusicBrainz  and  

the  rdfs:label  from  DBpedia  2.  Apply  lower  case  transforma3on  to  the  output  of  (1)  3.  Compare  the  output  from  (2)  using  the  metric  

“Levenshtein  distance”.  If  this  distance  is  greater  than  0.90,  then  create  a  link.  

1   2  

3  

Page 135: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Generate  Links            

Silk  Workbench  (5)  

EUCLID  -­‐  Providing  Linked  Data   135  

Page 136: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

Learn  Rules              

Silk  Workbench  (6)  

EUCLID  -­‐  Providing  Linked  Data   136  

Page 137: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

For  exercises,  quiz  and  further  material  visit  our  website:    

EUCLID  -­‐  Providing  Linked  Data   137  

@euclid_project   euclidproject   euclidproject  

http://www.euclid-­‐project.eu  

Other  channels:  

eBook   Course