Introduction to the Semantic Web and Linked Data
Introduction to the Semantic Web and Linking Data
Eric Axel Franzon
Vice President
Semantic Universe/Wilshire Conferences
About Me
• Professional
• Wilshire Conferences
• Semantic Universe
• W3C
• Guidewire Group
• Coach / Consultant / Trainer
• Geek
Today we will talk about:
• Semantic Technologies
• Semantic Web & Web 3.0
• Linked Data
– Linked Open Data
– Linked Enterprise Data
• Use cases
• That harmonica on the first slide
Semantic Technologies
Semantic Web
Web Technologies
World Wide Web
Semantic Web = Web 3.0
Semantic Web = Web of Data
www.geekandpoke.com
What is the Web of Data Not?
• A software package
• Something that will ever “be complete”
• A replacement for the current Web
• A pipe dream
• A silver bullet
It’s also not…
• HAL 9000
• Skynet
What is the Web of Data?
• A Web-scale architecture
• A metadata technology
• A layer of meaning on the existing Web
• In use TODAY!
Web of Data
Q: What does Linked Data have to do with the Semantic Web?
Web 1.0 – Linking Documents
Web 1.0
“I see: characters + formatting + images”--my Computer
Web 1.0 – Linking Documents
Web 2.0 – Linking People
Web 2.0
“I see: characters + formatting + images”--my Computer
Web 1.0 – Linking Documents
Web 2.0 – Linking People
Web 3.0 – Linking Data
Web 3.0 – Linking Data
Title Publisher
Price
Format
Cover
Author
“I see: things + relationships. This information is about a book.”
Semantic Technologies
Semantic Web
Linked Open Data
Linking Open Data Project
May 2007
March 2009
Data from these trusted sources is available for you
to use in your applications TODAY.
Data you can LINK to.
And not just data…
Semantic Data that is not only machine READABLE.
It is machine UNDERSTANDABLE!
Disambiguation
mole, n.
But…
Metadata
Doctorow’s Criticisms | LOD/LED Response
“People lie” | Allow users to choose a social trust model
“People are lazy” | Automate where possible and encourage authoring where needed
“People are stupid” | Automate where possible, check where possible
“Mission Impossible: know thyself” | Allow multiple sources of metadata
“Schemas aren’t neutral” | Allow multiple schemas
“Metrics influence results” | Allow multiple metrics
“There’s more than one way to describe something” | Allow multiple descriptions
LOD/LED is flexible
How does LOD/LED work?
1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS
3. By using TRIPLES
So, what’s a THING?
How does LOD/LED work?
1. By uniquely identifying THINGS
A THING is anything that can be uniquely identified by a URI or a literal (string)
THING | URI or literal
Me | http://twitter.com/ericaxel
My postal code | http://www.city-data.com/zips/90043.html
The White House | Lat: 38.89859 Long: -77.035971
L.A. County’s sales tax rate | 9.750 %
http://ericfranzon.com/operator.jpg
This is a collection of THINGS:
t_people
Name | City | State | Post code
David | Fredericksburg | VA | 22408
Eric | Culver City | CA | 90230
Trees and Tables
t_people
Name | City | State | Post code
David | Fredericksburg | VA | 22408
Eric | Culver City | CA | 90230
people
  David
    City: Fredericksburg
    State: VA
    Postcode: 22408
  Eric
    City: Culver City
    State: CA
    Postcode: 90230
Trees and Tables – Problem 1
t_people
Name | City | State | Post code | flag
David | Fredericksburg | VA | 22408 | 1
Eric | Culver City | CA | 90230 |
people
  David
    City: Fredericksburg
    State: VA
    Postcode: 22408
    flag: 1
  Eric
    City: Culver City
    State: CA
    Postcode: 90230
Adding partial data to tables leads to sparseness
Trees and Tables – Problem 2
t_people
Name | City | State | Post code
David | Culver City | CA | 90230
Eric | Culver City | CA | 90230
people
  David
    City: Culver City
    State: CA
    Postcode: 90230
  Eric
    City: Culver City
    State: CA
    Postcode: 90230
Common data leads to (lots of!) duplication
Graphs
people
  David
  Eric
David and Eric both link to the same shared nodes:
  City: Culver City
  State: CA
  Postcode: 90230
David also links to flag: 1
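A minimal sketch of why the graph avoids both problems, assuming the table rows are held as Python dictionaries (the row data mirrors the slides; the function name rows_to_edges is made up for illustration). Each filled-in cell becomes one edge, so a missing flag simply produces no edge, and a value shared by two people becomes a single node.

```python
# Rows mirroring the t_people table; Eric has no "flag" cell at all.
rows = [
    {"Name": "David", "City": "Culver City", "State": "CA", "Postcode": "90230", "flag": "1"},
    {"Name": "Eric", "City": "Culver City", "State": "CA", "Postcode": "90230"},
]


def rows_to_edges(rows, key="Name"):
    """Turn every non-key cell into a (node, label, node) edge."""
    edges = set()
    for row in rows:
        for column, value in row.items():
            if column != key:
                edges.add((row[key], column, value))
    return edges


edges = rows_to_edges(rows)

# Every distinct value is one node, no matter how many edges point at it.
nodes = {edge[0] for edge in edges} | {edge[2] for edge in edges}

for edge in sorted(edges):
    print(edge)
print(len(edges), "edges,", len(nodes), "nodes")
```

No empty cell is stored for Eric, and "Culver City", "CA" and "90230" each exist once even though two people point at them.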
Who’s your daddy?
How does LOD/LED work?
1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS
Is Father of
<owl:ObjectProperty rdf:ID="isFather">
  <rdfs:domain rdf:resource="#Person"/>
  <rdfs:range rdf:resource="#Person"/>
</owl:ObjectProperty>
mailto:[email protected]
1. By uniquely identifying THINGS
2. By uniquely identifying RELATIONSHIPS
3. By using TRIPLES
What’s a triple?
Predicate
Triples? It’s Elementary! (School)
book has title.
Relationship
That is a Triple!
“This book has a title.”
“Eric wrote this Web page.”
“This article is about moles.”
“I like blues.”
“I like B.L.U.E.S.”
“This image can be used non-commercially.”
“My email address is [email protected].”
Triples? It’s Elementary!
Subject | Predicate | Object
Book | Has Title | “Title”
Eric | Created | Webpage
Image | Has License | CC Non-Commercial
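The three statements above can be sketched as plain Python tuples. This is only an illustration: real Linked Data would use URIs rather than bare words, and the helper name match is an assumption, not part of any RDF library.

```python
# Each triple is (subject, predicate, object), taken from the slide above.
TRIPLES = [
    ("Book", "Has Title", "Title"),
    ("Eric", "Created", "Webpage"),
    ("Image", "Has License", "CC Non-Commercial"),
]


def match(triples, s=None, p=None, o=None):
    """Yield the triples that agree with every part that is not None."""
    for subject, predicate, obj in triples:
        if s is not None and subject != s:
            continue
        if p is not None and predicate != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subject, predicate, obj)


# "What did Eric do?" leaves predicate and object open.
print(list(match(TRIPLES, s="Eric")))
```

Leaving a position as None works like a blank in a sentence: the function fills it with whatever the data says.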
Triples
Subjects
Objects
Predicates
Book: Author, Title, Publisher, ISBN
The Trouble with Triples
Cytoscape.org
Review of the Review
Our Data are Multiplying.
Trends in data growth
• Vast amounts of digital data being produced daily.
–Wal-Mart produces 1 million transactions every hour. DBs estimated at > 2.5 petabytes
• US National Archives creating > 10 million digital assets annually
Data Inflation
• Megabyte (MB) = 2^20 bytes
• Gigabyte (GB) = 2^30 bytes
• Terabyte (TB) = 2^40 bytes
• Petabyte (PB) = 2^50 bytes, or 1,024 TB
• Exabyte (EB) = 2^60 bytes, or 1,024 PB
• Zettabyte (ZB) = 2^70 bytes, or 1,024 EB
• Yottabyte (YB) = 2^80 bytes, or 1,024 ZB
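A quick way to check the arithmetic behind these units, assuming the binary definitions (powers of two) used on this slide. The helper name binary_size is made up for this sketch.

```python
UNITS = ["MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def binary_size(unit):
    """Bytes in one unit: MB is 2**20 and each later unit is 2**10 times larger."""
    return 2 ** (20 + 10 * UNITS.index(unit))


for unit in UNITS:
    print(unit, "=", binary_size(unit), "bytes")

# Each step up is a factor of 1,024 in binary units (often rounded to 1,000).
tb_per_pb = binary_size("PB") // binary_size("TB")
print("TB per PB:", tb_per_pb)
```

The rounded figure of 1,000 per step is fine for conversation; the exact binary factor is 1,024.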
Acceleration
–Decoding human genome involves analyzing 3 billion base pairs
• what took 10 years to process in 2003, takes a week today
A brand new professional has emerged ....
The data scientist, who combines the skills of
software programmer, statistician and storyteller/artist to extract the
nuggets of gold hidden under mountains of data.
- The Economist, “Data, data everywhere”, Feb 27th 2010
When we come back…
S – T – R – E – T – C – H
Break!
Linked Data is like a harmonica
• It’s easy to play
Facebook
• Unique Visitors*: 540,000,000
• Page Views: 570,000,000,000
* Per month
Source: Google - The 1000 most-visited sites on the web
FOAF: Friend-Of-A-Friend
http://www.foaf-project.org/
FOAF-a-Matic
http://www.ldodds.com/foaf/foaf-a-matic
semantictweet.com
Can create four FOAF files:
• Friends (who I follow)
• Followers
• All
• Just Me
Linked Data is like a harmonica
• It’s easy to play
• It’s a “real” instrument
The Technologies of RDBMS
• Data
• Schemas
• Query Language
RDBMS Data
t_people
Name | City | State | Post code
David | Fredericksburg | VA | 22408
Eric | Culver City | CA | 90230
RDBMS Schema
RDBMS Query Language: SQL
SELECT isbn, title, price, price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;
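To try the query above, here is a small sketch using Python's built-in sqlite3 module. The Book table layout and the three rows are invented for illustration; only the query text comes from the slide.

```python
import sqlite3

# In-memory database with a hypothetical Book table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (isbn TEXT PRIMARY KEY, title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?)",
    [
        ("111", "Zebra Atlas", 150.00),
        ("222", "Cheap Paperback", 9.99),
        ("333", "Art Folio", 200.00),
    ],
)

QUERY = """
SELECT isbn, title, price, price * 0.06 AS sales_tax
FROM Book
WHERE price > 100.00
ORDER BY title;
"""

rows = conn.execute(QUERY).fetchall()
for isbn, title, price, sales_tax in rows:
    print(isbn, title, price, round(sales_tax, 2))
```

Only the two books priced above 100.00 come back, sorted by title, each with a computed sales_tax column that is not stored anywhere in the table.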
The Technologies of LOD/LED
• Data
• Schemas
• Query Language
The Data Language
Resource Description Framework
RDF Triples
Subject | Predicate | Object
http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel | foaf:knows | “Brian Sletten”
RDF Triple Components
Subject | Predicate | Object
http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel | foaf:knows | “Brian Sletten”
URI | URI | URI or String Literal
http://twitter.com/bsletten
“RDF is good for distributing data across the Web and pretending it’s in one place.”
-Dean Allemang, TopQuadrant
Just so you know…
There are many ways of representing RDF:
• RDF/XML
• N3
• JSON
• N-Triples
• Turtle
• RDFa
Each serialization has pros and cons, but they all are used to connect THINGS and RELATIONSHIPS into TRIPLES
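As one example of a serialization, here is a rough sketch that writes triples in N-Triples form (full URIs in angle brackets, string literals in quotes, one triple per line ending in a dot). The helper names are made up, the URI test is a simplification, and the dc: and foaf: prefixes are expanded to the standard Dublin Core elements and FOAF namespaces.

```python
# Namespaces behind the abbreviated predicates used on the slides.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(term):
    """Expand a prefixed name such as 'foaf:knows' into a full URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def is_uri(term):
    """Simplification: anything starting with http:// or https:// is a URI."""
    return term.startswith(("http://", "https://"))


def to_ntriples_line(subject, predicate, obj):
    """Render one triple: URIs in <>, anything else as a quoted literal."""
    if is_uri(obj):
        obj_text = "<" + obj + ">"
    else:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = '"' + escaped + '"'
    return "<" + expand(subject) + "> <" + expand(predicate) + "> " + obj_text + " ."


TRIPLES = [
    ("http://plushbeautybar.com", "dc:creator", "http://www.ericaxel.com/foaf.rdf"),
    ("http://twitter.com/ericaxel", "foaf:knows", "Brian Sletten"),
    ("http://twitter.com/ericaxel", "foaf:knows", "http://twitter.com/bsletten"),
]

lines = [to_ntriples_line(*triple) for triple in TRIPLES]
print("\n".join(lines))
```

The last two lines show the difference between a string literal object and a URI object: only the URI version is something another dataset can link to.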
The Schemata
Linked Data schemas consist of:
Your RDF relationships (predicates)
+
Relationship descriptions
LOD/LED Schemata
Schema: id | First Name | Last Name
Data: 1 | Tony | Shaw
Initial Schema
hasID: 1
hasFirstName: Tony
hasLastName: Shaw
Relationship description
hasLastName owl:sameAs hasSurname
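A sketch of what such a relationship description buys you, assuming two sources that name the same relationship differently. The second record and the helper names are invented for illustration. (The slide uses owl:sameAs; OWL also offers owl:equivalentProperty for stating that two properties mean the same thing.)

```python
# Triples from two hypothetical sources with different predicate names.
TRIPLES = [
    ("person/1", "hasFirstName", "Tony"),
    ("person/1", "hasLastName", "Shaw"),
    ("person/2", "hasSurname", "Franzon"),
]

# Relationship description: hasSurname means the same as hasLastName.
SAME_AS = {"hasSurname": "hasLastName"}


def canonical(predicate):
    """Map a predicate to the name chosen to represent its equivalents."""
    return SAME_AS.get(predicate, predicate)


def values_for(triples, predicate):
    """Find (subject, object) pairs for a predicate or any of its equivalents."""
    wanted = canonical(predicate)
    return [(s, o) for (s, p, o) in triples if canonical(p) == wanted]


print(values_for(TRIPLES, "hasLastName"))
```

Asking for hasLastName or hasSurname gives the same answer, without rewriting either source's data.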
Choosing Relationships
• Reuse popular vocabularies
–FOAF (Friend-of-a-friend)
–Dublin Core (library/publisher metadata)
–SIOC (Semantically-Interlinked Online Communities)
• ...or make up your own!
RDF Triples
Subject | Predicate | Object
http://plushbeautybar.com | dc:creator | http://www.ericaxel.com/foaf.rdf
http://www.geonames.org/maps/google_34.021_-118.396.html | dc:location | N 34° 1' 16'' W 118° 23' 47''
http://twitter.com/ericaxel | foaf:knows | “David Wood”
Relationship Descriptions
1. Resource Description Framework Schema (RDFS): Simple, hierarchical classes
2. Simple Knowledge Organization System (SKOS): Port taxonomies to the Semantic Web
3. Web Ontology Language (OWL): Complex logical relationships
Combine vocabularies and descriptions
LOD/LED Schemata
• Put as much work into creating your LED schema as you put into creating your relational schemas
• ... maybe even a bit more (due to links between your data and others’).
New York Times - SKOS
SKOS STUFF
The query language
SPARQL Protocol And RDF Query Language
SPARQL
SPARQL Example #1
FOAF (some people that Eric Franzon knows)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
FROM <http://ericaxel.com/eric.rdf>
WHERE {
  ?knower foaf:knows ?known .
  ?known foaf:name ?name .
}
Example #1 - Results
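To see what the two patterns in Example #1 are doing, here is a sketch that performs the same join over a handful of in-memory triples. The ex: identifiers and the data are invented; a real SPARQL engine would fetch and parse the FOAF file instead.

```python
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical contents of a small FOAF file.
TRIPLES = [
    ("ex:eric", KNOWS, "ex:brian"),
    ("ex:eric", KNOWS, "ex:david"),
    ("ex:brian", NAME, "Brian Sletten"),
    ("ex:david", NAME, "David Wood"),
    ("ex:eric", NAME, "Eric Franzon"),
]


def names_of_known(triples):
    """Join '?knower foaf:knows ?known' with '?known foaf:name ?name'."""
    return [
        name
        for (knower, p1, known) in triples
        if p1 == KNOWS
        for (subject, p2, name) in triples
        if p2 == NAME and subject == known
    ]


names = names_of_known(TRIPLES)
print(names)
```

The shared variable ?known is what links the two patterns: a name is returned only when its subject is also the object of some foaf:knows triple, so Eric's own name is left out.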
SPARQL Example #2
Querying two FOAF Profiles
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?name
FROM NAMED <http://ericaxel.com/eric.rdf>
FROM NAMED <http://zepheira.com/team/dave/dave.rdf>
WHERE {
  GRAPH <http://ericaxel.com/eric.rdf> {
    ?x rdf:type foaf:Person .
    ?x foaf:name ?name .
  } .
  GRAPH <http://zepheira.com/team/dave/dave.rdf> {
    ?y rdf:type foaf:Person .
    ?y foaf:name ?name .
  } .
}
Example #2 - Results
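Example #2 asks for names that appear as a foaf:Person in both profiles. A rough sketch of that idea follows, with invented graph contents and Python sets standing in for the query engine (so duplicate names collapse, which SPARQL itself would not do without DISTINCT).

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

ERIC_GRAPH = "http://ericaxel.com/eric.rdf"
DAVE_GRAPH = "http://zepheira.com/team/dave/dave.rdf"

# Hypothetical contents of the two named graphs.
GRAPHS = {
    ERIC_GRAPH: [
        ("_:e1", RDF_TYPE, FOAF_PERSON),
        ("_:e1", FOAF_NAME, "Eric Franzon"),
        ("_:e2", RDF_TYPE, FOAF_PERSON),
        ("_:e2", FOAF_NAME, "Brian Sletten"),
    ],
    DAVE_GRAPH: [
        ("_:d1", RDF_TYPE, FOAF_PERSON),
        ("_:d1", FOAF_NAME, "David Wood"),
        ("_:d2", RDF_TYPE, FOAF_PERSON),
        ("_:d2", FOAF_NAME, "Brian Sletten"),
    ],
}


def person_names(graph):
    """Names of every subject typed as foaf:Person within one graph."""
    persons = {s for (s, p, o) in graph if p == RDF_TYPE and o == FOAF_PERSON}
    return {o for (s, p, o) in graph if p == FOAF_NAME and s in persons}


shared_names = sorted(person_names(GRAPHS[ERIC_GRAPH]) & person_names(GRAPHS[DAVE_GRAPH]))
print(shared_names)
```

Because ?name is used inside both GRAPH blocks, only a name present in both files survives, which is what the set intersection expresses here.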
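SPARQL Example #3
Bart Simpson's chalkboard gags (DBPedia)
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia2: <http://dbpedia.org/property/>
SELECT ?episode ?chalkboard_gag
WHERE {
  ?episode skos:subject ?season .
  ?season rdfs:label ?season_title .
  ?episode dbpedia2:blackboard ?chalkboard_gag .
  FILTER (regex(?season_title, "The Simpsons episodes, season")) .
}
ORDER BY ?season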
Example #3 - Results
http://www.milinkito.com/swf/bart.php
Are *real* companies using Linked Data?
Easy to play; takes work to master.
…and many more!
E-Commerce
A vocabulary to describe products, services, and other e-commerce terms.
Who is using GoodRelations?
1100+ Best Buy stores
Phase 2
~640,000 “next-gen” product detail pages
21 Open Box Products listed at this store!
Who is using GoodRelations?
With RDFa + GoodRelations, but no additional SEO work, PlushBeautyBar.com was indexed by Google within one week.
Semantic (Web) Technologies
Semantic Web
Linked Enterprise Data
RDBMS
CRM
Calendars
Linked Open Data
MIXING private and public data?
Absolutely! And it’s really useful to do so!
Example:
iConcertCal
Public + Private Data: iConcertCal
Example:
Siri
Siri.com
Siri is a Virtual Assistant.
I ask it to do things for me.
It does, by mixing data, by disambiguating, and by reasoning.
Example: BBC
• Largest broadcasting corp. in the world
• 8 national TV channels
• 10 national radio stations
• 40 local radio stations
• An extensive website, bbc.co.uk
• Broadcasts 1,000-1,500 programs per day.
• Publishes information in several formats: audio, video, textual.
• Needed to relate information across media for both users and third-party developers
• Approach: Create a Web presence for each
– Broadcast
– Artist
– Species (and other biological ranks), habitat and adaptation
that the BBC has an interest in.
"Creating web identifiers for every item the BBC has an interest in, and considering those as aggregations of BBC content about that item, allows us to enable very rich cross-domain user journeys."-- Yves Raimond
• BBC Music is underpinned by the Musicbrainz music database and Wikipedia.
• “BBC Music takes the approach that the Web itself is its content management system. [BBC] editors directly contribute to Musicbrainz and Wikipedia.”
BBC
• Wildlife Finder links existing LOD data with BBC content to make pages about each species, habitat and adaptation:
• Wildlife programmes (clips and episodes) are identified by tagging the clip or episode with the appropriate dbpedia URI.
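A sketch of the tagging idea, assuming each clip or episode carries one or more DBpedia URIs. The content identifiers are invented and this is not the BBC's actual implementation; it only shows how tagging with a shared URI lets a page for that URI aggregate content.

```python
from collections import defaultdict

# Hypothetical (content id, DBpedia URI) tags.
TAGS = [
    ("clip-001", "http://dbpedia.org/resource/Lion"),
    ("episode-042", "http://dbpedia.org/resource/Lion"),
    ("clip-007", "http://dbpedia.org/resource/Savanna"),
]


def aggregate_by_uri(tags):
    """Group content ids under the URI of the thing they are about."""
    pages = defaultdict(list)
    for content_id, uri in tags:
        pages[uri].append(content_id)
    return dict(pages)


pages = aggregate_by_uri(TAGS)
for uri, content_ids in pages.items():
    print(uri, "->", content_ids)
```

Because the URI is shared with DBpedia, anything else published about that URI can be joined to the same page without a separate lookup table.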
"The RDF representations of these web identifiers allow developers to use our data to build applications."-- Yves Raimond
A few final thoughts
A little bit can be very powerful!
Web 3.0 = Semantic Web
triple, OWL, RDF
SPARQL
Linked Data
RDFS
SKOS
RDFa
Web 3.0 = Semantic Web
Dublin Core
triple, OWL, RDF, RDFa, PURLs
ontology
NLP
OWL-DL, OWL-Full, RDFS
entity extraction
OWL2, OWL-Lite
subject, object, predicate
folksonomy
microformats, GRDDL
URI
triplestore
SPARQL, Artificial Intelligence, cloud computing, open world reasoning
reasoning engine
Linked Data
taxonomy
data portability
LOD, LED
REST
vocabulary
SKOS
microdata
Further Reading
…and more to come!
Semantic Universe
Free Informational Resource
www.SemanticUniverse.com
Semantic Technology Conference
www.Semantic-Conference.com
June 21-25, 2010
Resources
http://geekandpoke.typepad.com/
http://richard.cyganiak.de/2007/10/lod/
http://iconcertcal.com
http://siri.com
http://data.nytimes.com
http://freedigitalphotos.com
http://aldobucchi.com
http://www.milinkito.com/swf/bart.php
http://www.flickr.com/photos/kellyhogaboom/4369774518/
http://www.flickr.com/photos/zenera/56677048/
http://www.flickr.com/photos/97964364@N00/59780745/
http://www.flickr.com/photos/starwarsblog/793008715/
http://www.flickr.com/photos/peterpearson/871254091/
http://www.flickr.com/photos/birdfarm/60946474/
http://www.flickr.com/photos/entropy1138/173847148/
http://www.flickr.com/photos/wainwright/351684037/
http://data.nytimes.com/50891932523096258603.rdf