Techniques used in RDF Data Publishing at Nature Publishing Group

Techniques used in RDF Data Publishing at Nature Publishing Group. Tony Hammond, Data Architect, NPG. March 5, 2013

Lotico London Semweb Meetup - March 2013

Transcript of Techniques used in RDF Data Publishing at Nature Publishing Group

Page 1: Techniques used in RDF Data Publishing at Nature Publishing Group

Techniques used in RDF Data Publishing at Nature Publishing Group

Tony Hammond
Data Architect, NPG

March 5, 2013

Page 2

Nature Publishing Group

● NPG is a division of Macmillan (a privately owned company)
● Publishes ~120 titles in all
    ● 34 Nature-branded titles
    ● 53 academic and society journals
    ● 16 magazines (incl. Scientific American)
● ~1000 employees, 17 offices (5 continents)
● ~30 society partners
● Databases, conferences/events, multimedia

Page 3

Semantic Publishing at NPG

• Prior Work
    • RSS 1.0 webfeeds
    • HTML metadata
    • PDF metadata (XMP)
    • Urchin: RSS aggregator
    • OAI-PMH, OpenSearch (SRU), OpenURL
• Linked Data Apps
    • Public Data: test viability of data publishing
    • Hub: application of technology internally

Page 4

Public Data

Page 5

NPG by Numbers

Page 6

NPG Ontology

Page 7

Cloud Hosting

• TSO OpenUp® SaaS platform
• Offers 5store as a triplestore
• Scale-out architecture (C/C++)
• Supports up to a trillion triples
• Load speed of 150,000 triples per second (tps)
• SPARQL 1.0, with some 1.1 features (aggregates, etc.)
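
The aggregate support mentioned above can be illustrated with a small Python sketch. It builds a SPARQL 1.1 aggregate query and encodes it as a GET URL in the style of the SPARQL protocol. The endpoint address is taken from the data.nature.com/query slide; whether it accepts requests in exactly this form is an assumption, and the query uses only rdf:type so that it presumes nothing about the NPG vocabulary.

```python
from urllib.parse import urlencode

# Assumption: the path comes from the "data.nature.com/query" slide; that it
# accepts SPARQL-protocol GET requests is not confirmed by the deck.
ENDPOINT = "http://data.nature.com/query"

# A SPARQL 1.1 aggregate: count typed resources per class.
COUNT_BY_CLASS = """\
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Encode a query as a GET URL using the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, COUNT_BY_CLASS))
```

The sketch only constructs the URL and makes no network call, so it can be adapted to any endpoint before being sent with an HTTP client.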

Page 8

data.nature.com

Page 9

data.nature.com/query

Page 10

Hub

Page 11

Hub: Problem

Page 12

Hub: Solution

Page 13

Hub: Method

Page 14

XMP

Page 15

Building the Graph

Page 16

Local Hosting

• Apache Jena TDB
• Single-node architecture (Java)
• Supports up to ~1.5 billion triples (tested)
• SPARQL 1.1

Page 17

Data Publishing

Page 18

Hub Finder

Page 19

Hub Finder: Results

Page 20

Techniques

Page 21

Naming Architecture

Page 22

Naming Policy

Object           Example        Usage
Graph            npgg:gadgets   gadgets:33 ex:title "Title" npgg:gadgets .
Class            npg:Gadget     gadgets:33 a npg:Gadget npgg:gadgets .
Object Property  npg:hasGadget  _:12 npg:hasGadget gadgets:33 npgg:_ .
Data Property    ex:title       gadgets:33 ex:title "Title" npgg:gadgets .
Instance         gadgets:33     gadgets:33 ex:title "Title" npgg:gadgets .

Prefixes:
npg:   http://ns.nature.com/terms/
npgg:  http://ns.nature.com/graphs/
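
To make the naming policy concrete, the following Python sketch expands the prefixed quads from the Usage column into N-Quads lines. The npg: and npgg: namespaces are the ones given on the slide; the slide does not expand gadgets: or ex:, so the two example.org IRIs below are hypothetical placeholders, and the helper names are illustrative only.

```python
# npg: and npgg: are from the slide; gadgets: and ex: are not expanded
# there, so the two example.org IRIs are hypothetical placeholders.
PREFIXES = {
    "npg": "http://ns.nature.com/terms/",
    "npgg": "http://ns.nature.com/graphs/",
    "gadgets": "http://example.org/gadgets/",  # hypothetical
    "ex": "http://example.org/terms/",         # hypothetical
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand(term: str) -> str:
    """Turn one term of a prefixed quad into its N-Quads form."""
    if term == "a":  # Turtle shorthand for rdf:type
        return f"<{RDF_TYPE}>"
    if term.startswith('"') or term.startswith("_:"):
        return term  # literals and blank nodes pass through unchanged
    prefix, local = term.split(":", 1)
    return f"<{PREFIXES[prefix]}{local}>"


def to_nquad(s: str, p: str, o: str, g: str) -> str:
    """Join subject, predicate, object and graph into one N-Quads line."""
    return " ".join(expand(t) for t in (s, p, o, g)) + " ."


if __name__ == "__main__":
    print(to_nquad("gadgets:33", "a", "npg:Gadget", "npgg:gadgets"))
    print(to_nquad("gadgets:33", "ex:title", '"Title"', "npgg:gadgets"))
    print(to_nquad("_:12", "npg:hasGadget", "gadgets:33", "npgg:_"))
```

The fourth term of each quad is the named graph, which is how the policy ties every statement to a graph under the npgg: namespace.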

Page 23

Publishing

Page 24

Monitoring

Page 25

ETL Process

Page 26

Datastore: Imports

Page 27

Datastore: Exports

Page 28

Contracts

npgg:affiliations a npg:Graph, void:Dataset ;
    dcterms:description "Graph of npg:Affiliation objects" ;
    dcterms:issued "2013-02-15"^^xsd:date ;
    dcterms:modified "2013-02-15"^^xsd:date ;
    dcterms:publisher [
        a foaf:Organization ;
        foaf:mbox <mailto:[email protected]> ;
        foaf:name "Nature Publishing Group"
    ] ;
    dcterms:source "extractor-xml" ;
    dcterms:title "npgg:affiliations" ;
    rdfs:label "npgg:affiliations" ;
    void:classPartition [
        void:class npg:Affiliation ;
        void:entities "973208"^^xsd:int
    ] ;
    void:propertyPartition [
        void:property vcard:url ;
        void:triples "326"^^xsd:int
    ], [
        void:property vcard:street-address ;
        void:triples "82638"^^xsd:int
    ], [
        void:property vcard:region ;
        void:triples "183483"^^xsd:int
    ], [
        void:property vcard:organisation-name ;
        void:triples "694290"^^xsd:int
    ], [
        void:property vcard:locality ;
        void:triples "412042"^^xsd:int
    ], [
        void:property vcard:email ;
        void:triples "21650"^^xsd:int
    ], [
        void:property vcard:country-name ;
        void:triples 0
    ], [
        void:property rdfs:label ;
        void:triples "973208"^^xsd:int
    ], [
        void:property rdf:type ;
        void:triples "973208"^^xsd:int
    ] ;
    void:triples "3340845"^^xsd:int ;
    void:vocabulary npg:, rdf:, rdfs:, void: .
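
One way such a contract could be checked is sketched below in Python. The counts are copied from the npgg:affiliations description above; the check itself (per-property counts must add up to the graph total, and empty partitions are flagged) is an illustrative assumption, not a rule stated in the deck. For this graph the nine property partitions do sum to the stated 3340845 triples.

```python
# Counts copied from the npgg:affiliations description.
PROPERTY_TRIPLES = {
    "vcard:url": 326,
    "vcard:street-address": 82638,
    "vcard:region": 183483,
    "vcard:organisation-name": 694290,
    "vcard:locality": 412042,
    "vcard:email": 21650,
    "vcard:country-name": 0,
    "rdfs:label": 973208,
    "rdf:type": 973208,
}
TOTAL_TRIPLES = 3340845


def check_contract(partitions: dict, total: int) -> list:
    """Return a list of problems; an empty list means nothing was flagged.

    Hypothetical rules: partition counts must sum to the graph total,
    and a property with zero triples is reported for review.
    """
    problems = []
    counted = sum(partitions.values())
    if counted != total:
        problems.append(f"property partitions sum to {counted}, expected {total}")
    for prop, n in partitions.items():
        if n == 0:
            problems.append(f"{prop} has no triples")
    return problems


if __name__ == "__main__":
    for problem in check_contract(PROPERTY_TRIPLES, TOTAL_TRIPLES):
        print(problem)
```

Run against the figures above, the sketch reports only the empty vcard:country-name partition, the kind of signal a published contract makes easy to spot after an extraction run.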

Page 29

Linked Data API

• ./api/articles [.json, .rdf, .xml]
• ./api/articles?hasProduct.pcode=ng
• ./api/contributors?familyName=Smith
• ./api/products.json?pcode=ng&_page=2
• ./api/products?_view=none&_properties=pcode
• ./api/search?title=black+hole
• ./api/tree/subjects/children.xml?_sort=title
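
The request patterns above can be assembled programmatically. The Python sketch below rebuilds several of them from a resource name, an optional format extension and a parameter dictionary. The relative root ./api is kept exactly as written on the slide because the host is not given there, and the api_url helper is a hypothetical name introduced for illustration.

```python
from urllib.parse import urlencode

API_ROOT = "./api"  # relative root as written on the slide; host not given


def api_url(resource: str, params: dict = None, fmt: str = None) -> str:
    """Build a Linked Data API style URL.

    resource: path below the API root, e.g. "articles"
    params:   filters and underscore-prefixed controls, e.g. {"_page": 2}
    fmt:      optional extension such as "json", "rdf" or "xml"
    """
    path = f"{API_ROOT}/{resource}"
    if fmt:
        path += f".{fmt}"
    if params:
        path += "?" + urlencode(params)
    return path


if __name__ == "__main__":
    print(api_url("articles", {"hasProduct.pcode": "ng"}))
    print(api_url("products", {"pcode": "ng", "_page": 2}, fmt="json"))
    print(api_url("search", {"title": "black hole"}))
    print(api_url("tree/subjects/children", {"_sort": "title"}, fmt="xml"))
```

Parameters are passed as a dictionary rather than keyword arguments because property paths such as hasProduct.pcode are not valid Python identifiers.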

Page 30

Closing

Page 31

Positions Available

goo.gl/bYIt8
www.linkedin.com/jobs?jobId=4890057&viewJob