NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations February 19, 2014 Speakers: Ramanathan V. Guha, Ralph Swick, Kevin Ford, Henry Story, Pierre-Paul Lemyre, Bob DuCharme http://www.niso.org/news/events/2014/virtual/semant

description

Feb 19, 2014: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations. Deck includes presentations from: Ramanathan V. Guha, Google Fellow, Founder of Schema.org; Pierre-Paul Lemyre, Director of Business Development, Lexum; Bob DuCharme, Director of Digital Media Solutions, TopQuadrant

Transcript of NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Page 1: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

February 19, 2014

Speakers: Ramanathan V. Guha, Ralph Swick, Kevin Ford, Henry Story, Pierre-Paul Lemyre, Bob DuCharme

http://www.niso.org/news/events/2014/virtual/semantic/

Page 2: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Agenda

11:00 a.m. – 11:10 a.m. Introduction
Todd Carpenter, Executive Director, NISO

11:10 a.m. – 12:00 p.m. Keynote Address
Ramanathan V. Guha, Google Fellow; Founder of Schema.org

12:00 p.m. – 12:45 p.m. The W3C Semantic Web Initiative
Ralph Swick, Domain Lead of the Information and Knowledge Domain, W3C

12:45 p.m. – 1:30 p.m. Lunch Break

1:30 p.m. – 2:15 p.m. Semantic Web Applications in Libraries: The Road to BIBFRAME
Kevin Ford, Network Development & MARC Standards Office, Library of Congress

2:15 p.m. – 3:00 p.m. The Social Data Graph: The Friend of a Friend (FOAF) Project
Henry Story, Chief Technical Officer & Co-founder at Stample

3:00 p.m. – 3:15 p.m. Afternoon Break

3:15 p.m. – 3:45 p.m. Sharing Information on the Semantic Web: The Need for a Global License Repository
Pierre-Paul Lemyre, Director of Business Development, Lexum

3:45 p.m. – 4:15 p.m. Semantic Web Applications in Publishing
Bob DuCharme, Director of Digital Media Solutions, TopQuadrant

4:15 p.m. – 5:00 p.m. Conference Roundtable: Services that Build on Others Semantic Web Data: Semantic Search Beyond RDF
Moderated by: Todd Carpenter, Executive Director, NISO

Page 3: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Towards a web of Data

R.V. Guha
Google

Page 4: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Outline of talk

• How did we end up here … a personal perspective
  – The tortuous path through standards and products

• Schema.org
  – Why schema.org, principles, how does it work
  – Status: schemas, adoption, partners, applications

• Reports from Google, Bing, Yahoo! & Yandex

• Looking forward: Schemas in the pipeline, Research problems

Page 5: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

One day, 16 years ago, …

• Ralph Swick, Ora Lassila, Tim Bray, Eric Miller and myself

• Trying to make sense of a flurry of activity
  – XML, MCF, CDF, Sitemaps, …

• There were a number of problems
  – PICS, Meta data, sitemaps, …

• But one unifying idea

Page 6: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Context: The Web for humans

   

Structured Data

Web server

HTML

Page 7: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Goal: Web for Machines & Humans

   

Web server

Structured    Data

Apps

Page 8: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

What does that mean?

[Diagram: a small graph describing one entity]
Chuck Norris --type--> Actor
Chuck Norris --birthdate--> March 10th 1940
Chuck Norris --birthplace--> Ryan, Oklahoma
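The graph on this slide can be written down as (subject, property, value) triples. A minimal Python sketch, assuming plain strings stand in for real identifiers; the names `triples` and `values_of` are illustrative and not from the talk:

```python
# Each edge of the slide's graph as a (subject, property, value) triple.
# Plain strings stand in for real identifiers; that is an assumption
# made for illustration only.
triples = [
    ("Chuck Norris", "type", "Actor"),
    ("Chuck Norris", "birthdate", "March 10th 1940"),
    ("Chuck Norris", "birthplace", "Ryan, Oklahoma"),
]


def values_of(subject, prop):
    """Return every value attached to `subject` by the edge labelled `prop`."""
    return [o for s, p, o in triples if s == subject and p == prop]


print(values_of("Chuck Norris", "birthplace"))
```

An application that receives the graph can answer simple questions by following labelled edges, which is the sense in which the data becomes usable by machines as well as humans.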

Page 9: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

How do we get there?

• How does the author give us the graph
  – Data Model
  – Syntax
  – Vocabulary
  – Identifiers for objects

• Why should the author give us the graph?

Page 10: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Going depth first

• Many heated battles
  – Lot of proposals, standards, companies, …

• Data model
  – Trees vs DLGs vs Vertical specific vs who needs one?

• Syntax
  – XML vs RDF vs JSON vs …

• Model theory anyone
  – We need one vs who cares vs what’s that?

Page 11: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Timeline of ‘standards’

• ’96: Meta Content Framework (MCF) (Apple)
• ’97: MCF using XML (Netscape) → RDF, CDF
• ’99 -- : RDF, RDFS
• ’01 -- : DAML, OWL, OWL EL, OWL QL, OWL RL
• ’03: Microformats
• And many many many more … SPARQL, Turtle, N3, GRDDL, R2RML, FOAF, SIOC, SKOS, …

• Lots of bells & whistles: model theory, inferencing, type systems, …

Page 12: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

But something was missing …

• Fewer than 10k sites were using these standards

• Something was clearly missing and it wasn’t more language features

• We had forgotten the ‘Why’ part of the problem

• The RSS story

Page 13: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

’07 -- : Rise of the consumers

• Yahoo! Search Monkey, Google Rich Snippets, Facebook Open Graph

• Offer webmasters a simple value proposition

• Search engines to webmasters:
  – You give us data … we make your results nicer

• Usage begins to take off
  – 1000x increase in marked-up pages in 3 years

Page 14: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Yahoo Search Monkey

• Give websites control over snippet presentation
• Moderate adoption
  – Targeted at high end developers
  – Too many choices

Page 15: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Yandex Enhanced Snippets & Services

• Snippets for Web Search

• Data for Vertical Search Services

• Various syntax and vocabularies

• Yandex specific vocabs for unexamined use cases

Page 16: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Google Rich Snippets: Reviews

    

Page 17: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Google Rich Snippets: People

 

Page 18: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Google Rich Snippets: Events

 

Page 19: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Google Rich Snippets: Recipes

 

Page 20: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Google Rich Snippets: Recipe View

 

Page 21: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Google Rich Snippets

• Multi-syntax
• Ad hoc vocabulary for each vertical
• Very clear carrot
• Lots of experimentation on UI
• Moderately successful
• Scaling issues with vocabulary

Page 22: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Situation in 2010

• Too many choices/decisions for webmasters
  – Divergence in vocabularies

• Too much fragmentation
  – N versions of person, address, …

• A lot of bad/wrong markup
  – ~25% for microformats, ~40% with RDFa
  – Some spam, mostly unintended mistakes

• Absolute adoption numbers still rather low
  – Less than 100k sites

Page 23: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

In the meantime … 

• The Web has grown
  – >25 million independent active sites
  – Trillions of web pages
  – ~5 billion pages change every day
  – Spam, malware, …

• In other words: scaling problems along every dimension

Page 24: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org

• Work started in August 2010
  – Google, Yahoo!, Microsoft & then Yandex

• Goals:
  – One vocabulary understood by all the search engines
  – Make it very easy for the webmaster

• It is A vocabulary. Not The vocabulary.
  – Webmasters can use it together with other vocabs
  – We might not understand the other vocabs. Others might

Page 25: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org: report card

• Over 15% of all sites/pages now have schema.org
• Over 5 million sites, over 25 billion entity references

• In other words
  – Same order of magnitude as the web
  – Not just ‘promising new stuff’ anymore

[Chart: “% urls” over time, with date ticks running from 1.06.11 to 1.10.13]

Page 26: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org: Major sites

• News: Nytimes, guardian.com, bbc.co.uk
• Movies: imdb, rottentomatoes, movies.com
• Jobs / careers: careerjet.com, monster.com, indeed.com
• People: linkedin.com
• Products: ebay.com, alibaba.com, sears.com, cafepress.com, sulit.com, fotolia.com
• Videos: youtube, dailymotion, frequency.com, vinebox.com
• Medical: cvs.com, drugs.com
• Local: yelp.com, allmenus.com, urbanspoon.com
• Events: wherevent.com, meetup.com, zillow.com, eventful
• Music: last.fm, myspace.com, soundcloud.com

Page 27: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org: categories

• Most used categories by occurrence
  – Person, Offer, Product, PostalAddress, VideoObject, ImageObject, BlogPosting, WebPage, Article, AggregateRating, LocalBusiness, Place, Organization, MusicRecording, JobPosting, Recipe, Book, Movie, Blog, Photograph, ImageGallery

• Most used categories by domains
  – ImageObject, WebPage, PostalAddress, BlogPosting, Product, Person, Offer, Article, LocalBusiness, Organization, Blog, AggregateRating, Review, VideoObject, Place, Event, Rating, AudioObject, MusicRecording, Store

Page 28: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org: properties

• Top properties by occurrence
  – name, url, image, description, offers, author, price, thumbnailUrl, datePublished, addressLocality, address, itemOffered, duration, streetAddress, isFamilyFriendly, priceCurrency, playerType, paid, regionsAllowed, postalCode, hiringOrganization, jobLocation

• Top properties by domain
  – name, description, url, image, contentURL, address, author, telephone, price, postalCode, offers, ratingValue, priceCurrency, datePublished, addressRegion, availability, email, bestRating, creator, review, location, startDate

Page 29: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org principles: Simplicity

• Simple things should be simple
  – For webmasters, not necessarily for consumers of markup
  – Webmasters shouldn’t have to deal with N namespaces

• Complex things should be possible
  – Advanced webmasters should be able to mix and match vocabularies

• Syntax
  – Microdata, usability studies
  – RDFa, JSON-LD, …
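To make the syntax bullet concrete, here is a minimal sketch of schema.org markup in one of the listed syntaxes, JSON-LD, generated with Python's standard `json` module. `Person`, `name`, `birthDate` and `birthPlace` are schema.org terms; the helper name `to_script_tag` and the choice of example are assumptions for illustration, not something shown in the talk:

```python
import json

# A schema.org description of a person, expressed as JSON-LD.
# The values reuse the Chuck Norris example from earlier in the talk.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Chuck Norris",
    "birthDate": "1940-03-10",
    "birthPlace": {"@type": "Place", "name": "Ryan, Oklahoma"},
}


def to_script_tag(data):
    """Wrap JSON-LD in the <script> element that is embedded in an HTML page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(to_script_tag(person))
```

Only one vocabulary is referenced, which matches the point that webmasters should not have to deal with N namespaces for the simple case.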

Page 30: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org principles: Simplicity

• Can’t expect webmasters to understand Knowledge Representation, Semantic Web Query Languages, etc.

• It has to fit in with existing workflows

• Avoid KR system driven artifacts
  – domainIncludes/rangeIncludes
  – No classes like ‘Agent’
  – Categories and attributes should be concrete

Page 31: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org principles: Simplicity

• Copy and edit as the default mode for authors
  – It is not a linear spec, but a tree of examples

• Vocabularies
  – Authors only need to have local view
  – But schema.org tries to have a single global coherent vocabulary

Page 32: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org principles: Incremental

• Started simple
  – ~100 categories at launch

• Applies to every area
  – Add complexity after adoption
  – now ~1200 vocab items
  – Go back and fill in the blanks

• Move fast, accept mistakes, iterate fast

Page 33: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org Principles: URIs

• ~1000s of terms like Actor, birthdate
  – ~10s for most sites
  – Common across sites

• ~10ks of terms like USA
  – External enumerations

• ~10m-100m terms like Chuck Norris and Ryan, Oklahoma
  – Cannot expect agreement on these
  – Consumers can reconcile entity references

[Diagram: the entity graph, extended with citizenship]
Chuck Norris --type--> Actor
Chuck Norris --birthdate--> March 10th 1940
Chuck Norris --birthplace--> Ryan, Oklahoma
Chuck Norris --citizenOf--> USA
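The last bullet leaves reconciliation of entity references to the consumer. A deliberately naive sketch of that idea, assuming two references denote the same entity when their normalized name and birthdate agree; the site names, records and matching rule are all hypothetical, and reconciliation at web scale is far harder than this:

```python
from collections import defaultdict

# Entity references as different sites might publish them.
# Site names and records are made up for illustration.
references = [
    {"site": "site-a.example", "name": "Chuck Norris", "birthdate": "1940-03-10"},
    {"site": "site-b.example", "name": "chuck  norris", "birthdate": "1940-03-10"},
    {"site": "site-c.example", "name": "Chuck Norris", "birthdate": "1975-01-01"},
]


def reconciliation_key(ref):
    """Normalize case and whitespace in the name and pair it with the birthdate."""
    name = " ".join(ref["name"].lower().split())
    return (name, ref["birthdate"])


def reconcile(refs):
    """Group references that share a key; each group is treated as one entity."""
    groups = defaultdict(list)
    for ref in refs:
        groups[reconciliation_key(ref)].append(ref["site"])
    return dict(groups)


entities = reconcile(references)
print(len(entities))
```

No site had to agree on an identifier for the person; the consumer decides which references belong together.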

Page 34: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org Principles: Collaborations

• Most discussions on public W3C lists

• Work closely with interest communities

• Work with others to incorporate their vocabularies
  – We give them attribution on schema.org
  – Webmasters should not have to worry about where each piece of the vocabulary came from
  – Webmasters can mix and match vocabs

Page 35: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org Principles: Collaborations

• IPTC / NYTimes / Getty with rNews
• Martin Hepp with Good Relations
• US Veterans, Whitehouse, Indeed.com with Job Posting
• Creative Commons with LRMI
• NIH National Library of Medicine for Medical vocab.
• Bibextend, Highwire Press for Bibliographic vocabulary
• Benetech for Accessibility
• BBC, European Broadcasting Union for TV & Radio schema
• Stackexchange, SKOS group for message board
• Lots and lots and lots of individuals

Page 36: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org Principles: Partners

• Partner with Authoring platforms
  – Drupal, Wordpress, Blogger, YouTube

• Drupal 8
  – Schema.org markup for many types
    • News articles, comments, users, events, …
  – More schema.org types can be created by site author
  – Markup in HTML5 & RDFa Lite
  – Coming out early 2014

Page 37: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org & W3C

• Web Schemas – vocabulary discussion
  – Part of W3C Semantic Web Interest Group

• W3C Community Group(s)
  – e.g. http://w3.org/community/schemabibex/

• Wiki, Mercurial, email lists, …
  – Informal collaboration rather than classic standardization
  – Ideas for extending schema.org welcomed
  – Also room this week Thu/Fri, see blog.schema.org

Page 38: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

W3C Data Activity

• Subsumes & expands on Sem Web & eGov

• Success for Sem Web in several communities who publish self-describing, re-usable structured data

• schema.org makes it easier for developers to use common vocabulary

• W3C pleased with the success of schema.org and continues to encourage data formats that support mixing of vocabs

Page 39: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.Org Usage @ MS

Today
• Used primarily to support Bing’s Rich Captions
• Also recommended for Windows 8 App Contracts

Tomorrow
• Extended usage in Rich Captions and Bing Search
• Development support via new Bing Platform Dev Center
• Innovative experiences in other Microsoft products

Page 40: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Bing Rich Captions

• Ensure your Schema.Org annotations represent the primary content on the page -> ensures correct caption data is shown

• Ensure your Schema.Org annotations are appropriate and relevant to the content of the page -> improves your chances of being shown

• Rich Captions do not support RDFa 1.1 or JSON-LD -> stick with RDFa 1.0 or Microdata for now

Page 41: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Yahoo! Search

• Rich results
  – Blanco et al. Enhanced Results for Web Search, SIGIR 2011
  – Mika & Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012

• Related entities
  – Blanco et al. Entity recommendations in Web Search
  – Wed 15:15 Search track

Page 42: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Yahoo! Media

• Schema.org across the Yahoo! Network
  – Article, Photo and Video markup
  – Q&A and Sports under development

• Personalization
  – Based on past reading behavior, Facebook profile, implicit feedback
  – Concepts from a news taxonomy and entity-graph
  – Nicolas Torzec. The Y! Knowledge Base: Making Knowledge Reusable at Yahoo!, SemTech 2013

Page 43: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Applications

• Applications drive adoption

• First generation of applications
  – Rich presentation of search results

• Many new applications are coming up
  – On search page and beyond

Page 44: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Newer Applications: Knowledge Graph

 

Page 45: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Newer Applications: Knowledge Graph

 

Page 46: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Non web search Applications

• Searching for Veteran friendly jobs

Page 47: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Non search applications: Google Now

Page 48: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Pinterest: Schema.org for Rich Pins

Page 49: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Non search Applications

• Open Table website → confirmation email → Android Reminder

Page 50: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Future application

• Clinical trials
• 4000+ clinical trials at any time in the US alone
• Almost all the data ‘thrown away’
• All that gets published is a textual ‘abstract’

• Over half the trials are redundant
• Earlier trials have the data
• Assumptions, etc. cannot be re-examined
• Longitudinal studies extremely hard, but super

Page 51: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Vocabularies in the pipeline

• Actions (Potential Actions), Events
• Accessibility
• Commerce: Orders, Reservations, …
• Communication: Fleshing out TV, Radio, Email, Q&A, …
• Media: Scholarly works, Comics, Serials
• Sports
• and many many more …

Page 52: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Big initiatives underway

• Representing time
  – Lot of triples with associated time interval
  – Hard to force into the triple model
  – Looking at named graphs

• Tabular / CSV data
  – Census data, Scientific data, etc.
  – Need mechanisms for external specification of the meaning of these tables
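One way to picture "external specification of the meaning" of a table is a mapping that lives outside the CSV file and says which property each column stands for. A minimal sketch under that assumption; the table contents, the mapping format and the property names are invented for illustration and do not follow any particular standard:

```python
import csv
import io

# A small table whose columns carry no meaning by themselves.
# The rows are placeholder data.
table = io.StringIO("id,label,pop\nr1,Region One,1000\nr2,Region Two,2500\n")

# External specification: which column identifies the row, and which
# property each remaining column stands for (hypothetical property names).
mapping = {
    "subject_column": "id",
    "properties": {"label": "name", "pop": "population"},
}


def table_to_triples(fileobj, spec):
    """Turn each cell into a (subject, property, value) triple using the mapping."""
    triples = []
    for row in csv.DictReader(fileobj):
        subject = row[spec["subject_column"]]
        for column, prop in spec["properties"].items():
            triples.append((subject, prop, row[column]))
    return triples


triples = table_to_triples(table, mapping)
for triple in triples:
    print(triple)
```

The CSV file itself is untouched; only the separate mapping has to be published for a consumer to interpret the table.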

Page 53: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Research Problems

• Propositional Attitudes
  – Fictional characters
  – Possible actions

• Mapping Entities across sites
  – Billions of entities
  – But hundreds of billions of entity references
  – Very large scale cross site entity mapping

Page 54: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Schema.org: concluding

• Provide webmasters
  – A good reason for adding structured data markup
  – Allow more advanced authors to add much richer markup from many different vocabularies
  – An easy way to do it

• Many different applications emerging 

• Many interesting research problems left

• Light at the end of the tunnel

Page 55: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Questions?

    

Page 56: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

MCF

• Introduced the Directed Labeled Graph (DLG) model

• Used to describe site structure
• Provided simple visualization plugin
• ~3m downloads, few thousand sites (not too bad for ’96)
• Documentation was about a page
• Led to MCF using XML (4 pages including diagrams)

Page 57: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

97-99: RDF / RDFS

• MCF (4 pages) got renamed to RDF
• Data model becomes a religion
• Lot of new things … reification, sequences, chains, model theory, …
• Final version done only by 2002
• RDF Primer is 100 pages long!
• But … used by very few websites

Page 58: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

‘99: RSS 0.91

• Netscape needs customizable home page à la my.yahoo
• No $s to do deals
• Create format and open it up to everyone
  – 2 year target was 5k feeds
  – 14 years later → 100m+ feeds

Page 59: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

2002: OWL

• Tim BL et al. ‘Semantic Web’ paper
• Answer to lack of adoption: we need more features
• Lots of DARPA and EU funding
• Very powerful solutions … looking for problems
• ~10 years later … < 100 sites use it

Page 60: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

‘04-07: Microformats

• Reaction to the complexity of RDF & OWL
• Retrofitted vCard, etc. into HTML → hCard
• Adoption by Google & Yahoo! made it wildly successful

• Evolutionary dead-end: no new vocabulary in over 5 yrs.

Page 61: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Lessons

• The RSS story
• Make sure you have your carrot!
  – Carrots work much better than sticks!
• Find the right initial level of generality
• Start simple and iterate fast
• Optimize for flexibility

Page 62: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Sharing Information on the Semantic Web: The Need for a Global License Repository

Page 63: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Distributed Production of Information

• Originates in research (not commercial venture)

• Access is its driving force (not property)

• Has defined the Internet as we know it
  – Open standards & Open Source Software (OSS)
  – Web 2.0
  – And now the semantic web

Page 64: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Distributed Production of Information

Why does it work?

Page 65: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

OSS

• Motives to produce OSS
  – Ethical / policy reasons
  – Non-monetary incentives
  – Increase the speed of market adoption
  – Co-create and appropriate value
    • By building on previous works
    • By integrating external contributions

Page 66: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

OSS

• Each contributor is adding to the pool of knowledge available to all

• This knowledge is more valuable than what any contributor can achieve individually

• Not a new phenomenon (science, music, education, ...)

Page 67: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

OSS - Legal Framework

• Collaborative software development existed before
  – Under informal agreements
  – Under bilateral contractual schemes

• OSS licenses created a favourable legal environment
  – By favouring reciprocity (BSD)
  – By securing openness (GPL)

Page 68: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

OSS

• What the success of OSS makes us see clearly
  – In a networked world, centralized corporate production can sometimes be surpassed by distributed production
  – Mass adoption requires a proper legal framework

Page 69: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

OSS - Legal Framework

Is this conclusion applicable only to software?

Page 73: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Web 2.0

• Extensive use of Web services facilitating mass collaboration
  – Rich Internet applications
  – Web forums
  – Blogs
  – Wikis
  – Folksonomies (Social tagging)

Page 74: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Web 2.0

• Web-as-participation-platform
  – Architecture of participation
  – Users become producers
  – Collective intelligence

Page 75: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Web 2.0 - Legal Framework

• Adaptation of OSS licenses
  – GNU Free Documentation License
  – OpenContent License
  – Creative Commons (CC)

Page 76: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Web 2.0 - Legal Framework

• Extended the favourable environment to all types of freely accessible information
  – By specifying the applicable reuse conditions (no permission required)
  – By clarifying the spectrum of potential rights
  – By standardizing the licensing process and making automated retrieval possible (CC)

Page 77: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Web 2.0 - Legal Framework

The question of online rights management should be solved by now?

Page 80: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Semantic Web

• Collaborative initiatives developed independently from each other
  – Vertical information flow (Information silos)

• Accessibility & reusability of information call for reciprocity between them and other sources of information
  – Horizontal information flow (seamless interoperability)

Page 81: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Semantic Web

• Using technology for data sharing and reuse
  – Data modeling (XML, RDF)
  – Syndication technologies (RSS)
  – Ontologies (OWL)
  – Heuristic (text-recognition)

Page 82: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Semantic Web

• Web understandable by computers
  – Data get a meaning
  – Dynamic discovery, composition and execution
  – Layers of services

Page 83: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Semantic Web

• Current state of implementation
  – Information sources are pre-qualified (limited dynamic discovery)
    • Public domain sources
    • Use of APIs under bilateral agreements
  – Reproduction of information is often limited (links are the norm, mashups still the exception)
  – Management of personal information is a nightmare

Page 84: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Semantic Web - Legal Framework

Here again successful mass adoption is not just about technology

Page 85: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Legal Issue: Fragmentation of Rights

• Rights are fragmented
  – By reuse conditions
  – By jurisdictions
  – By formats / domains

• Scalable to the smallest element (website > webpage > data)

Page 86: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Legal Issue: Fragmentation of Rights

• Not a new problem
  – Requests to limit the proliferation of OSS licenses (FSF)
  – Initiatives to standardize licensing (CC)

• Not fundamental as long as humans are in charge of the reuse of information

Page 87: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Legal Issue: Fragmentation of Rights

• Seamless interoperability through semantic technologies requires computers to automatically
  – Retrieve applicable licenses
  – Resolve their respective terms
  – Select information with adequate conditions for the anticipated reuse
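The three steps above (retrieve, resolve, select) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the license table is a simplified stand-in for machine-readable license terms, the item URLs are placeholders, and the function names are hypothetical rather than part of any existing service:

```python
# Simplified machine-readable terms keyed by license identifier.
# The two flags are a coarse reading of three Creative Commons licenses.
LICENSE_TERMS = {
    "cc-by": {"commercial_use": True, "derivatives": True},
    "cc-by-nc": {"commercial_use": False, "derivatives": True},
    "cc-by-nd": {"commercial_use": True, "derivatives": False},
}

# Pieces of information with an embedded license tag (placeholder URLs).
items = [
    {"url": "http://example.org/a", "license": "cc-by"},
    {"url": "http://example.org/b", "license": "cc-by-nc"},
    {"url": "http://example.org/c", "license": "site-terms-of-use"},
]


def retrieve(item):
    """Step 1: read the license identifier attached to the item."""
    return item.get("license")


def resolve(license_id):
    """Step 2: look up the terms; None means the license cannot be resolved."""
    return LICENSE_TERMS.get(license_id)


def select(candidates, needed_rights):
    """Step 3: keep items whose resolved terms grant every needed right."""
    chosen = []
    for item in candidates:
        terms = resolve(retrieve(item))
        if terms is not None and all(terms.get(r, False) for r in needed_rights):
            chosen.append(item["url"])
    return chosen


usable = select(items, ["commercial_use", "derivatives"])
print(usable)
```

The third item is skipped because its terms cannot be resolved, which is the situation the talk describes for content governed by ad hoc conditions.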

Page 88: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Legal Issue: Fragmentation of Rights

• Standardization under CC is helping
  – Embedded license information eases retrieval
  – Computer-readable versions ease resolution
  – Standardized terminology and a low number of licenses ease selection

Page 89: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Legal Issue: Fragmentation of Rights

• But it is a partial solution
  – Most content is not licensed under CC (and often cannot be)
  – Copyright holders have the right to attach alternative conditions to their content
  – CC does not generally accept alternative licenses

• Example of difficulties
  – Website Terms of Use vs Google “Usage rights” feature

Page 90: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Legal Issue: Fragmentation of Rights

A higher level resolution mechanism is required

Page 91: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Solution: A Global License Repository?

• A database of varying copyright licenses and their conditions

• A standardized approach to license resolution and selection (expand the CC model?)

• A Web service that can be queried by users and computers

Page 92: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Solution: A Global License Repository?

• Issues that need to be addressed
  – Unlimited number of reuse conditions
  – Format and domain specific restrictions
  – Internationalization
  – Versioning of licenses
  – Compatibility between licenses (relicensing)

Page 93: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Solution: A Global License Repository?

• Possible solutions
  – Organizing conditions and restrictions into groups or categories?
  – Managing licenses at the lowest possible level and associating related ones?
  – Limiting the designation of compatibility to the most common licenses?

Page 94: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Solution: A Global License Repository?

• A successful implementation requires
  – Promotion and large-scale adoption of a standard tagging model for information
  – Involvement of copyright holders in feeding and updating the database
  – Transparency and quality control procedures generating trust in the system

Page 95: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Solution: A Global License Repository?

• A successful implementation requires
  – Scalability ensuring efficient interactions at every level of development
  – Provision of outputs under standardized formats
  – Provision of simple APIs facilitating interactions with the repository

Page 96: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

The Solution: A Global License Repository?

• Architecture
  – Use of open standards and OSS to display transparency
  – Use of collaborative technologies to distribute updates
  – Use of aggregative technologies to promote exploitation and reuse

Page 97: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Conclusion

• Facilitating the sharing of information is the core function of the Internet

– Semantic web is a no-brainer in principle

• But mass adoption requires a legal framework

– Providing a lawful means to co-create and appropriate value

– Securing the existing rights of copyright holders

Page 98: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Conclusion

Technology | Legal Framework
OSS | Copyleft
Web 2.0 | Creative Commons
Semantic Web | ???

Page 99: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Contact

Let me know what you think!

Pierre-Paul Lemyre <[email protected]>

Page 100: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

© Copyright 2014 TopQuadrant Inc. 100

Semantic Web Applications in Publishing

Bob DuCharme

February 19, 2014

Page 101: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Outline

http://snee.com/rdf/niso2014/

RDF

Taxonomies, semantics, and content

Metadata management

Content creation

Page 102: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

RDF

● Resource Description Framework

● W3C Standard

● Syntaxes exist, but ultimately a data model

● Very easy to aggregate

● Tools exist to treat relational data and spreadsheets as RDF

Page 103: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

An RDF “statement”: the triple

● (Subject, predicate, object)

● “This resource, for this property, has this value.”

● “John Smith has a hire date of 2012-10-11.”

● “Chair 523 is located in room 47.”

● “index.html has the title ‘My Home Page’.”

Page 104: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Using URIs

A triple?

Subject:     index.html

Predicate:  title

Object:       “My Home Page”

Page 105: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Using URIs*

A triple?

Subject:     index.html

Predicate:  title

Object:       “My Home Page”

A triple!

Subject:     <http://www.myco.com/members/index.html>

Predicate:  <http://purl.org/dc/elements/1.1/title>

Object:       “My Home Page”

*Uniform Resource Identifier, not Uniform Resource Locator—an identifier, not an address.
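
The slide's point can be sketched in a few lines of Python: a triple is just three values, and using full URIs for the subject and predicate makes each one unambiguous outside its original context. The `Triple` type and `to_ntriples` helper are illustrative names chosen for this sketch, not part of any RDF library; the printed line follows the N-Triples syntax.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # literal value (a plain string in this sketch)


def to_ntriples(t: Triple) -> str:
    """Serialize a triple with a string-literal object as one N-Triples line."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


home_page_title = Triple(
    "http://www.myco.com/members/index.html",
    "http://purl.org/dc/elements/1.1/title",
    "My Home Page",
)

print(to_ntriples(home_page_title))
```
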

Page 106: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

URIs as triple objects

<urn:isbn:9780062515872> <http://purl.org/dc/elements/1.1/creator> “Tim Berners-Lee” .

or…

<urn:isbn:9780062515872> <http://purl.org/dc/elements/1.1/creator> <http://topbraid.org/ids/TimBernersLee> .

or…

<urn:isbn:9780062515872> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Berners-Lee/card#i> .

Page 107: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

URIs as triple objects

<urn:isbn:9780062515872> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Berners-Lee/card#i> .

<http://www.w3.org/People/Berners-Lee/card#i> <http://www.w3.org/2000/01/rdf-schema#label> “Tim Berners-Lee” .

Page 108: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

URIs as triple objects

<urn:isbn:9780062515872> <http://purl.org/dc/elements/1.1/creator> <http://www.w3.org/People/Berners-Lee/card#i> .

<http://www.w3.org/People/Berners-Lee/card#i> <http://www.w3.org/2000/01/rdf-schema#label> “Tim Berners-Lee” .

<http://www.w3.org/People/Berners-Lee/card#i> <http://xmlns.com/foaf/0.1/mbox> “[email protected]” .

<http://www.w3.org/People/Berners-Lee/card#i> <http://dbpedia.org/property/almaMater> <http://dbpedia.org/resource/The_Queen's_College,_Oxford> .

<http://dbpedia.org/resource/The_Queen's_College,_Oxford> <http://dbpedia.org/property/established> 1341 .
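
As a rough illustration of why shared URIs make RDF "very easy to aggregate", the sketch below treats each data source as a plain Python set of (subject, predicate, object) tuples and merges them with set union. The split into three sources and the `objects` helper are assumptions made for the example; the mailbox triple is left out because the address is not legible in this transcript.

```python
DC = "http://purl.org/dc/elements/1.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DBP = "http://dbpedia.org/property/"
BOOK = "urn:isbn:9780062515872"
TIMBL = "http://www.w3.org/People/Berners-Lee/card#i"
QUEENS = "http://dbpedia.org/resource/The_Queen's_College,_Oxford"

# Three hypothetical sources, each describing a different resource.
book_data = {(BOOK, DC + "creator", TIMBL)}
person_data = {
    (TIMBL, RDFS + "label", "Tim Berners-Lee"),
    (TIMBL, DBP + "almaMater", QUEENS),
}
college_data = {(QUEENS, DBP + "established", 1341)}

# Aggregation is set union: no schema mapping or join keys are needed,
# because the same URI means the same thing in every source.
graph = book_data | person_data | college_data


def objects(triples, subject, predicate):
    """Return every object of the triples matching a subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# Follow the shared URIs from the book to its author's college.
author = objects(graph, BOOK, DC + "creator")[0]
college = objects(graph, author, DBP + "almaMater")[0]
print(objects(graph, college, DBP + "established"))
```
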

Page 109: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

A “graph”

[Figure: the triples from the previous slide drawn as a graph. Nodes: urn:isbn:9780062515872, http://www.w3.org/People/Berners-Lee/card#i, http://dbpedia.org/resource/The_Queen's_College,_Oxford, and the literal values “Tim Berners-Lee”, “[email protected]”, and 1341. Edge labels: http://purl.org/dc/elements/1.1/creator, http://xmlns.com/foaf/0.1/mbox, http://www.w3.org/2000/01/rdf-schema#label, http://dbpedia.org/property/almaMater, http://dbpedia.org/property/established.]

Page 110: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

SPARQL: look for patterns within the graph

[Figure: the same graph as on the previous slide, with the same node and edge labels.]
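
To give a feel for what "looking for patterns" means, here is a toy matcher in Python. It is not SPARQL and not how a real query engine works; it only mimics the idea of a basic graph pattern, where terms starting with `?` are variables that must bind consistently across all patterns. The `match` function and the variable names are invented for this sketch.

```python
CREATOR = "http://purl.org/dc/elements/1.1/creator"
ALMA_MATER = "http://dbpedia.org/property/almaMater"
ESTABLISHED = "http://dbpedia.org/property/established"
BOOK = "urn:isbn:9780062515872"
TIMBL = "http://www.w3.org/People/Berners-Lee/card#i"
QUEENS = "http://dbpedia.org/resource/The_Queen's_College,_Oxford"

graph = [
    (BOOK, CREATOR, TIMBL),
    (TIMBL, ALMA_MATER, QUEENS),
    (QUEENS, ESTABLISHED, 1341),
]


def match(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every (s, p, o) pattern.

    Pattern terms starting with '?' are variables; others must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break  # variable is already bound to something else
            elif term != value:
                break  # constant term does not match this triple
        else:
            yield from match(triples, rest, candidate)


# "For each book, when was its creator's alma mater established?"
query = [
    ("?book", CREATOR, "?person"),
    ("?person", ALMA_MATER, "?school"),
    ("?school", ESTABLISHED, "?year"),
]
results = list(match(graph, query))
for row in results:
    print(row["?book"], row["?year"])
```
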

Page 111: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Defining structure

● Optional

● Nice option in “schema vs. schemaless” debate

● RDFS (RDF Schema Language)

    ● Enumerate classes, properties, and relationships between them

# Prefixes stand in for base URIs, like with XML
foaf:Person rdf:type owl:Class .
viaf:19831453 rdf:type foaf:Person .

● OWL (Web Ontology Language)

    ● Further describe classes and properties, enable fancier inferencing

    ● For example: Jack has a spouse value of Jill; "spouse" is an owl:SymmetricProperty; software can then infer that Jill has a spouse value of Jack. (Semantics!)

– Both are W3C standards, both expressed with triples (and therefore easy to aggregate)
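
The Jack-and-Jill inference can be sketched as a single rule over a set of triples. This is only an illustration of the symmetric-property rule, not an OWL reasoner; the `http://example.org/` URIs and the `infer_symmetric` function are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SYMMETRIC = "http://www.w3.org/2002/07/owl#SymmetricProperty"

# Hypothetical URIs for the slide's example.
SPOUSE = "http://example.org/spouse"
JACK = "http://example.org/Jack"
JILL = "http://example.org/Jill"


def infer_symmetric(triples):
    """Return the triples plus the reverse of every symmetric-property triple."""
    symmetric = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_SYMMETRIC}
    inferred = {(o, p, s) for s, p, o in triples if p in symmetric}
    return triples | inferred


asserted = {
    (SPOUSE, RDF_TYPE, OWL_SYMMETRIC),  # the schema statement is itself a triple
    (JACK, SPOUSE, JILL),
}
closed = infer_symmetric(asserted)
print((JILL, SPOUSE, JACK) in closed)
```

Because the declaration that "spouse" is symmetric is stored as an ordinary triple, the schema travels with the data and aggregates the same way.
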

Page 112: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Simple Knowledge Organization System

Controlled vocabulary

Taxonomy

Thesaurus

Ontology

Page 113: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Taxonomies

Mammal
    Dog
        Bulldog
        Collie
    Horse
    Cat

Above: subset-of relationship. Alternatives: part-of, instance-of.

metadata!
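
One reason a taxonomy is useful as metadata: content tagged with a narrow term can be found by searching on a broader one. The sketch below assumes the hierarchy reads Mammal > Dog > Bulldog/Collie, with Horse and Cat directly under Mammal; the article file names and the `search` helper are made up for the example.

```python
# Each term maps to its immediate broader term (subset-of).
BROADER = {
    "Dog": "Mammal",
    "Horse": "Mammal",
    "Cat": "Mammal",
    "Bulldog": "Dog",
    "Collie": "Dog",
}

# Hypothetical content items, each tagged with one taxonomy term.
TAGS = {
    "collie-care.html": "Collie",
    "horse-feed.html": "Horse",
    "bulldog-faq.html": "Bulldog",
}


def ancestors(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def search(term):
    """Find items tagged with the term or with any narrower term."""
    return sorted(
        item for item, tag in TAGS.items()
        if tag == term or term in ancestors(tag)
    )


print(search("Dog"))
```
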

Page 114: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Thesaurus

Mammal

Building

Dog

Bulldog Collie

Horse Cat

House

Residential Commercial

Doghouse

(use for: mutt, cur)

Related term

Page 116: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Simple Knowledge Organization System

Controlled vocabulary

Taxonomy

Thesaurus

Ontology

SKOS: the W3C’s OWL ontology for creating thesauri, taxonomies, and controlled vocabularies.

Page 117: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

http://myCompany.com/animals/c43209101

preferred label (English): "dog"

preferred label (Spanish): "perro"

preferred label (French): "chien"

alternative label (English): "mutt"

alternative label (Spanish): "chucho"

history note: "Edited by Jack on 5/4/11"

related term: http://myCompany.com/shelters/c3048293

SKOS: standardized properties

Page 118: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

http://myCompany.com/animals/c43209101

preferred label (English): "dog"

preferred label (Spanish): "perro"

preferred label (French): "chien"

alternative label (English): "mutt"

alternative label (Spanish): "chucho"

history note: "Edited by Jack on 5/4/11"

related term: http://myCompany.com/shelters/c3048293

product: http://myCompany.com/vaccinations/c2197503

foo code: “5L-MN1-003”

SKOS: custom properties
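
The concept above could be written down as triples roughly as follows. The SKOS property names (`prefLabel`, `altLabel`, `historyNote`, `related`) come from the SKOS vocabulary; the `MYCO` namespace for the two custom properties, the (text, language) tuple used for literals, and the `labels` helper are assumptions made for this sketch.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
MYCO = "http://myCompany.com/properties/"  # hypothetical namespace for custom properties
DOG = "http://myCompany.com/animals/c43209101"

# Literals are (text, language tag) pairs; plain strings are URIs.
triples = [
    (DOG, SKOS + "prefLabel", ("dog", "en")),
    (DOG, SKOS + "prefLabel", ("perro", "es")),
    (DOG, SKOS + "prefLabel", ("chien", "fr")),
    (DOG, SKOS + "altLabel", ("mutt", "en")),
    (DOG, SKOS + "altLabel", ("chucho", "es")),
    (DOG, SKOS + "historyNote", ("Edited by Jack on 5/4/11", None)),
    (DOG, SKOS + "related", "http://myCompany.com/shelters/c3048293"),
    # Custom properties sit alongside the standard ones as ordinary triples.
    (DOG, MYCO + "product", "http://myCompany.com/vaccinations/c2197503"),
    (DOG, MYCO + "fooCode", ("5L-MN1-003", None)),
]


def labels(data, concept, lang):
    """Return the preferred and alternative labels of a concept in one language."""
    wanted = {SKOS + "prefLabel", SKOS + "altLabel"}
    return [
        o[0] for s, p, o in data
        if s == concept and p in wanted and isinstance(o, tuple) and o[1] == lang
    ]


print(labels(triples, DOG, "es"))
```

Because the concept is identified by a URI rather than by any one label, a search in Spanish and a search in English resolve to the same concept.
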

Page 119: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Who is using SKOS?

AGROVOC

New York Times: People, Organizations, Locations, Subject Descriptors

Library of Congress Subject Headings

AGFA drug admin. forms

NASA: many categories

Page 120: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

TopQuadrant’s TopBraid EVN

Page 121: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Metadata management

Content metadata is more than just keywords

3 Vs of Big Data: Volume, Velocity and Variety

Gartner October 2013 poll on which is most difficult:

Velocity: 16%

Volume: 35%

Variety: 49%

Page 122: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Metadata in images from 3 suppliers

Supplier 1 Supplier 2 Supplier 3

file name file name file name

file size file size file size

last modified last modified last modified

shutter speed shutter speed

GPS latitude GPS latitude

GPS longitude GPS longitude

creator name

X resolution X resolution Horz. resolution

Y resolution Y resolution Vertical resolution

copyright notice re-use rights

keywords subject

aperture setting F stop

(etc.)

Page 123: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Accommodating overlapping metadata

Store it all; each is just a triple

Identify relationships for easier searching

“aperture setting” = “F stop”

“copyright notice” a subproperty of “re-use rights”

“Y resolution” a subproperty of “vertical resolution”

Image data a simple case. Consider audio, video, management of digital rights…
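
A small sketch of how those declared relationships make searching easier: every supplier's metadata is stored as-is, and a search on one property also picks up its equivalents and subproperties. The image names and values are invented, and the two lookup tables stand in for what would be property-equivalence and subproperty triples in a real RDF store.

```python
# Relationships from the slide, kept as simple lookup tables.
SUB_PROPERTY_OF = {
    "copyright notice": "re-use rights",
    "Y resolution": "vertical resolution",
}
EQUIVALENT = {
    "aperture setting": "F stop",
    "F stop": "aperture setting",
}

# Hypothetical metadata, stored exactly as each supplier delivered it.
triples = [
    ("img-a.jpg", "copyright notice", "(c) 2014 Example Photo Agency"),
    ("img-b.jpg", "re-use rights", "editorial use only"),
    ("img-a.jpg", "aperture setting", "f/2.8"),
    ("img-c.jpg", "F stop", "f/8"),
]


def matching_properties(prop):
    """Return prop plus its equivalents and all of its subproperties."""
    found = {prop}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUB_PROPERTY_OF.items():
            if sup in found and sub not in found:
                found.add(sub)
                changed = True
        for a, b in EQUIVALENT.items():
            if a in found and b not in found:
                found.add(b)
                changed = True
    return found


def search(data, prop):
    """Return (image, value) pairs for prop and everything it subsumes."""
    props = matching_properties(prop)
    return sorted((s, o) for s, p, o in data if p in props)


print(search(triples, "re-use rights"))
print(search(triples, "F stop"))
```

Note the direction of the subproperty rule: searching "re-use rights" finds copyright notices, but searching "copyright notice" does not return the broader re-use statements.
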

Page 124: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Metadata and…

Prosopography: the study of careers, especially of individuals linked by family, economic, social, or political relationships.

Page 125: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Content creation

From an article on Massive Open Online Courses in the February 8, 2014 Economist:

"...says Tyler Cowen, a co-founder of Marginal Revolution University, it is possible that textbook publishers are better equipped than universities to develop MOOCs profitably."

Page 126: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Available Data: The Linked Open Data Cloud

Page 127: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Wikipedia page on Andrew Johnson

Page 128: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

DBpedia page for Andrew Johnson

(further down the page…)

Page 129: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Thank you!

[email protected]

http://snee.com/rdf/niso2014/

Page 130: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

NISO Virtual Conference

The Semantic Web Coming of Age: Technologies and Implementations

NISO Virtual Conference • February 19, 2014

Questions?

All questions will be posted with presenter answers on the NISO website following the webinar:

http://www.niso.org/news/events/2014/virtual/semantic/

Page 131: NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations

Thank you for joining us today. Please take a moment to fill out the brief online survey.

We look forward to hearing from you!

THANK YOU