Web 3 Mark Greaves
-
date post
17-Oct-2014 -
Category
Business
-
view
3.365 -
download
2
description
Transcript of Web 3 Mark Greaves
Dr. Mark GreavesVulcan Inc.
[email protected] © 2010 Vulcan Inc.
The Evolving Semantic Web:From Military Technology to Venture Capital
Defining TermsDefining Terms
Web 2.0 – “The Read/Write Web”– Web as platform – Social networks, collective intelligence, and crowdsourcing– Long Tail Markets– 1st gen web data exploitation: mashups, tag clouds, RSS
2
Defining TermsDefining Terms
Web 2.0 – “The Read/Write Web”– Web as platform – Social networks, collective intelligence, and crowdsourcing– Long Tail Markets– 1st gen web data exploitation: mashups, tag clouds, RSS
Web 3.0 – “The World Wide Database”– Overall: The use of Semantic technology in Web applications– My talk: The use of Semantic Web technology in Web applications
3
Defining TermsDefining Terms
Web 2.0 – “The Read/Write Web”– Web as platform – Social networks, collective intelligence, and crowdsourcing– Long Tail Markets– 1st gen web data exploitation: mashups, tag clouds, RSS
Web 3.0 – “The World Wide Database”– Overall: The use of Semantic technology in Web applications– My talk: The use of Semantic Web technology in Web applications
Imagine the biggest database in the world…– Is it the Social Security database on steroids?– Or is it like the web?
• Universal publication, web-scale pivot, multiple points of view• Always changing, always evolving, no central control• The Semantic Web is both geeky and transformative
4
5
What is the Semantic Web? 4 AnswersWhat is the Semantic Web? 4 Answers
The set of W3C standards for simple logic languages on the web– RDF, RDF/S, OWL, OWL2, SPARQL– Plus all the pages and web data sources that use these standards
6
What is the Semantic Web? 4 AnswersWhat is the Semantic Web? 4 Answers
The set of W3C standards for simple logic languages on the web– RDF, RDF/S, OWL, OWL2, SPARQL– Plus all the pages and web data sources that use these standards
The Web of Data– A fully distributed web-based system to publish logical assertions – A way to link to someone else’s data and query it across the web– Democratic, crowd-based, scalable knowledge engineering– The hottest area of web architecture right now
7
What is the Semantic Web? 4 AnswersWhat is the Semantic Web? 4 Answers
The set of W3C standards for simple logic languages on the web– RDF, RDF/S, OWL, OWL2, SPARQL– Plus all the pages and web data sources that use these standards
The Web of Data– A fully distributed web-based system to publish logical assertions – A way to link to someone else’s data and query it across the web– Democratic, crowd-based, scalable knowledge engineering– The hottest area of web architecture right now
The largest formal knowledge base on Earth– Also the messiest formal knowledge base on Earth
8
What is the Semantic Web? 4 AnswersWhat is the Semantic Web? 4 Answers
The set of W3C standards for simple logic languages on the web– RDF, RDF/S, OWL, OWL2, SPARQL– Plus all the pages and web data sources that use these standards
The Web of Data– A fully distributed web-based system to publish logical assertions – A way to link to someone else’s data and query it across the web– Democratic, crowd-based, scalable knowledge engineering– The hottest area of web architecture right now
The largest formal knowledge base on Earth– Also the messiest formal knowledge base on Earth
A revolution in the way we think of data, crowds, and schemas– Web techniques (social networks, collaboration, mashup, search) applied to data– Massive, partial, participatory, logically weak, dynamic, schema-last– A way to democratize and socially curate databases
9
Talk Outline: The Evolving Semantic WebTalk Outline: The Evolving Semantic Web
The Origins of the Semantic Web – DARPA’s DAML Program– RDF and OWL
Semantic Web Evolution to 2010
Semantic Web Transformation in Three Application Areas– Markets and Companies
The Fourth Generation– Linked Data and the Scaling Revolution
10
Talk Outline: The Evolving Semantic WebTalk Outline: The Evolving Semantic Web
The Origins of the Semantic Web – DARPA’s DAML Program– RDF and OWL
Semantic Web Evolution to 2010
Semantic Web Transformation in Three Application Areas– Markets and Companies
The Fourth Generation– Linked Data and the Scaling Revolution
11
The Roots of the Semantic WebThe Roots of the Semantic Web
Semantic technology has been a distinct research field for decades– Symbolic Logic (from Russell and Frege)– Knowledge Representation Systems in AI
• Semantic Networks (Bill Woods, 1975)• US and EU R&D programs in information integration (80s and 90s)• Development of simple tractable “description logics” for classification• Conceptual Graphs and Formal Concept Analysis
– Relational Algebras and Schemas in Database Systems
Library Science (classifications, thesauri, taxonomies)
So, What Sparked the Semantic Web?
What’s new was the Web!– The material needed to answer almost any question is somewhere on the web– A massive infrastructure of data servers, protocols, authentication systems,
presentation languages, and thin clients that can be leveraged– A way around needing the “big data warehouse”
12
The Beginnings of the US Semantic Web: DARPA’s DAML ProgramThe Beginnings of the US Semantic Web: DARPA’s DAML Program
Solution:Augment the web to link machine-readable knowledge to web pages
Extend RDF with Description Logic Extensibility via frame-based language design Create the first fully distributed web-scale
knowledge base out of networks of hyperlinked facts and data
Approach:Design a family of new web languages
Basic knowledge representation (OWL)Reasoning (SWRL, OWL/P, OWL/T)Process representation (OWL/S)
Build definition and markup tools Link new knowledge to existing web page elements
Test design approach with operational pilots in US Government
Partner with parallel EU efforts to standardize the new web languages
People use implicit knowledge to reason with web pages
People use implicit knowledge to reason with web pages
Computers require explicit knowledge to reason with web pages
Computers require explicit knowledge to reason with web pages
Existing Web
(HTML/XML over HTTP)
Semantic Web
(OWL over HTTP)
Links via URLs
Problem: Computers cannot process most of the information stored on web pages
13
What is RDF?What is RDF?
RDF is a dialect of XML, designed for dynamic web data– RDFa is a standard way to encode RDF in ordinary XHTML web pages
RDF and RDF Schema defines data entities and basic relations– Assertions are triples of (resource, property, resource) or (resource, property, value)– All elements in triples are either primitive values or URIs– Relations are represented directly, unlike most databases– Assertions are precise enough to be interpreted by machines
typeOf
isa
VIN1234VIN1234
hasManufacturer
isa
isa
isa
isaisa
VehicleVehicle
TruckTruck
199919991999 Corolla1999 Corolla
ToyotaToyota
CorollaCorolla
CarCar
ResourcePropertyProperty Value
VehicleVehicleManufacturerManufacturer
YearYear
Mark’s CorollaMark’s Corolla
manufacturesmanufac
turedIn
manufacturedIn
hasVIN
hasManufacturer
Graphic from Mary Pulvermacher, MITRE
14
What Does OWL Add to RDF?What Does OWL Add to RDF?
OWL extends RDF with more expressive power– Simple relations between classes: Equivalent class (e.g., US_President and
PrincipalResidentofWhiteHouse), Disjoint class (e.g., Male and Female), subClassOf– Create new classes: intersectionOf, unionOf, complementOf)– Property characteristics (inverseOf, transitive, symmetric, subPropertyOf, etc.)– Range and Cardinality constraints (e.g., birthMother is exactly one person)
OWL allows its users to make new inferences– The extended expressive power allows OWL assertions to be combined according to
the rules of OWL’s logic (SHIQ )
OWL enables its users to surround raw data statements with a context (an ontology)
If... And...
spouseOf
Can conclude...
SymmetricPropertySymmetricProperty JimJimJimJim
MaryMaryMaryMaryspouseOfspouseOfspouseOfspouseOf
rdf:type spouseOf
MaryMaryMaryMary
JimJimJimJim
Graphic from Mary Pulvermacher, MITRE
15
Talk Outline: The Evolving Semantic WebTalk Outline: The Evolving Semantic Web
The Origins of the Semantic Web – DARPA’s DAML Program– RDF and OWL
Semantic Web Evolution to 2010
Semantic Web Transformation in Three Application Areas– Markets and Companies
The Fourth Generation– Linked Data and the Scaling Revolution
16
The Semantic Web in 2010The Semantic Web in 2010
CuttingEdge
Mature
StillResearch
“The Famous Semantic Web Technology Stack”
17
The Semantic Web in 2010The Semantic Web in 2010
CommercialCuttingEdge
Mature
Active Researchand StandardsActivity
“The Famous Semantic Web Technology Stack”
18
CommercialCuttingEdge
The Semantic Web in 2010The Semantic Web in 2010
Mature
Active Researchand StandardsActivity
“The Famous Semantic Web Technology Stack”
19
The Semantic Web in 2010The Semantic Web in 2010
Mature
Active Researchand StandardsActivity
CommercialCuttingEdge
“The Famous Semantic Web Technology Stack”
20
Active Researchand StandardsActivity
Completing the Semantic Web PictureCompleting the Semantic Web Picture
Mature
Other Technologies Impact the Semantic Web
More OntologiesTag SystemsMicroformats
Social Authorship
Combined RDF/OWL and
Database systems
Scalable Reasoning Systems
A Huge Base of RDF data
CommercialCuttingEdge
Description LP, SWRL, SILK...
21
Beyond RDF and OWL: 2010 Semantic Web InfrastructureBeyond RDF and OWL: 2010 Semantic Web Infrastructure
Data Markup Languages– HTML-friendly semantic markup dialects:
Microformats and RDFa– OWL 2 improves the original OWL spec
Triplestores and SPARQL Servers– Stores for 1B+ triples now available, – Commercial: AllegroGraph, Virtuoso,
BigOWLIM, Oracle 11g Semantic Technologies, Talis Platform…
– Open Source: 4Store, Sesame, Redland...– Next step is parallel web delivery
architectures
Entity Name Service (Okkam, DBpedia)
Semantic Web Reasoners– Commercial: Oracle 11g RDFS/OWL
engine, Ontobroker, Ontotext, RacerPro– Open Source: Pellet, FaCT++...– BRMS integration
Vocabularies and Design Tools– Ontologies: Dublin Core, FOAF, SIOC...– OpenSource: Protégé, SWOOP...– Commercial: TopBraid Composer,
Knoodl…
Semweb Data Generation– RDF / RDBMS front-ends – Entity extractors and taggers (OpenCalais)– Zemanta-type author’s assistants– Semantic wikis
Semweb Data Exploitation– Semweb search engines (Sindice, Watson,
Falcon...)– Yahoo SearchMonkey / Google Rich
Snippets– Browser extensions and facet systems
Visualization Tools– Simile Project (http://simile.mit.edu/) – Several Commercial Companies
User-layer ToolsServer Infrastructure
22
Evolving Conceptions for the Semantic WebEvolving Conceptions for the Semantic Web
Initial Semantic Web Conception*
* By most people but not Tim Berners-Lee
Semantic markup would be tightly associated with individual web pages
– “Translate the Web for machines”– Markup lives in associated RDF/XML
Core problem is labeling free-text web pages with a (pre-defined) markup vocabulary
– Entity extraction and ontology-aware NLP– Machine learning technologies– Manual annotation tools
Need an all-encompassing ontology or set of logically compatible ontologies
– Agreement was always elusive
Need to achieve scale and consistency in semantic page markup
– Manual annotation has incentive, quality/ consistency, and cost/benefit issues
Hypertext W
eb
(HTML/X
ML over H
TTP)
Seman
tic W
eb
(RDF/O
WL o
ver H
TTP)
Links via URLs
23
Evolving Conceptions for the Semantic WebEvolving Conceptions for the Semantic Web
The Web is a publishing platform for formal knowledge as well as pages
– RDFa allows mixed HTML and RDF– RDF/OWL documents can be published by
web-connected databases– Huge numbers of knowledge publishers– Linkage via owl:sameAs,
owl:equivalentClass, owl:equivalentProperty, skos:exactMatch, etc.
Core problem is maintaining a set of evolving and partial agreements on semantic models and labels
– Semantic Web rule languages (SWRL, SILK, etc.) can be used to specify mappings and symbolic relations
– Overall semantic cohesion is increased as more data is mapped together
– Hard problem is cost-effectively maintaining semantic models
Supplemental semantics is carried in the free-text web
The Semantic Web in 2010
Hypertext W
eb
(HTML/X
ML over H
TTP)
Seman
tic W
eb
(RDF/O
WL o
ver H
TTP)
Linksvia URLs
...
Semantic Data Publishers(publication of RDF and SPARQL endpoints)
RDFaRDFa
24
Recapitulating the Semantic WebRecapitulating the Semantic Web
Semantic Web = “The World Wide Database”– Anyone can read, anyone can write, with very loose version control– “Schema-last” data engineering
• The semweb is a database with a constantly-changing Entity/Relation diagram• High data flexibility at the cost of non-optimized queries
– Most current Semantic Web data originates in traditional databases
Semantic Web encourages the collective publication of data ontologies and data links– Data contexts/ontologies can be created, published, and linked to others– Data items can be extended, linked, grouped, and analyzed within contexts– Easy to add value by linking data with relationships– Trust relations work in the same way as the text web– Network effects are beginning
The Semantic Web is developing into a web-scale social platform for data integration and fusion
25
2 Upfront Challenges in the Semantic Web2 Upfront Challenges in the Semantic Web
The Semantic Web is not consistent– Semantic Web data is published by many people and there is no central control
• The Semantic Web is a mix of formal and informal usages• Consistent usage is only found in islands
– How do you select a data context (ontology/vocabulary)?• Sindice catalogs >36000 uses of the term “publication” in RDF, tags, RDFa,
microformats); how do you choose?• How do you harmonize with other data publishers so that your data can be found and
used?• An opportunity to dust off all the old classification standards!
RDF/OWL is too simple for many kinds of data– Numeric relations can be awkward to express
• Example: “Jane has a 50% chance of havingthe H1N1 flu.”
– RDF/OWL core semantics have no good support for units, probabilities, uncertainty, etc.
– Statements about statements are complex• Example: “Amazon says Sony Bravia-series
LCD TVs are Energy Star compliant.”
26
Talk Outline: The Evolving Semantic WebTalk Outline: The Evolving Semantic Web
The Origins of the Semantic Web – DARPA’s DAML Program– RDF and OWL
Semantic Web Evolution to 2010
Semantic Web Transformation in Three Application Areas– Markets and Companies
The Fourth Generation– Linked Data and the Scaling Revolution
27
First Generation Semantic Web ApplicationsFirst Generation Semantic Web Applications
Semantically-Boosted Search and Presentation
Semantic Web in Search and PresentationSemantic Web in Search and Presentation
Yahoo! SearchMonkey (see also Google’s Rich Snippets, Bing Captions) use semantic data to enhance snippet presentation
– Yahoo’s crawler indexes and interprets user RDFa and microformats– GreaseMonkey-style presentation reformatting for search snippets– Display URL as an enhanced result, with standard or custom presentations
Nick Cox (Yahoo!) reported up to 15% greater CTR for rich presentation– SearchMonkey code seems not to be working currently– Incentives: “Structured data is the new SEO” (Dries Buytaert, Drupal)
28
Alex Moskalyuk | FacebookAlex Moskalyuk is on Facebook. Join Facebook to connect with Alex Moskalyuk and others you may know. Facebook gives people the power to share and makes the ...www.facebook.com/.../Alex-Moskalyuk/100000220291330?... - Cached - Similar
Why Should a Business Use Semantic Web Markup?Why Should a Business Use Semantic Web Markup?
29
30
Second Generation Semantic Web ApplicationsSecond Generation Semantic Web Applications
An only slightly newer problem type– Business exploitation of structured enterprise data (RDBMS, Spreadsheet, ERP data)
• Backwards to Data Management to reduce cost of managing, migrating, integrating• Forwards to Business Process Management
– Support for unified query, analytics, and application access– An original government use case for the semantic web
Markets Segments and Players– Gartner estimates that EII software and services alone is $14B/year, with 40% growth
over 5 years (pre-recession numbers, though)– Very complex market space includes EAI, Entity Analytics, MDM, BI, BPM, CPM...– Huge entrenched players (IBM, SAP, Oracle...) and major consulting shops
Some lessons with applying semantic web technology in this space– Fundamental problem is understanding the semantics from legacy systems, not in KR– Pure Semantic technology companies tend to be unsophisticated about the customer – RDF/OWL is typically too weak and must be augmented by rules, quantities, etc.– Raw performance is typically inferior to a well-designed RDBMS– Tends to be an IT sale (not Line-of-Business sale), with attendant cost pressures
More Agile Business Intelligence
31
Layers of Semantic Data in Strategic Enterprise ITLayers of Semantic Data in Strategic Enterprise IT
Semantic data modeling
Automatic data management
EII - Information integration leadership
BI -- Better business intelligence
Financial modeling & intelligence
Semantic process modeling
BAM and CPM w/ predictive analytics
MDM – Master Data Management
Increasing Value of Semantic Data Models
Sales Pitch for Semantic Data Models– Promote flexibility and improvisation in the face of dynamism– Expose business processes as rules, for governance and compliance– Can be driven all the way through the architecture, from SOA to CPM dashboards
This vision has not (to my knowledge) been proven in a large enterprise
32
Semantic Web in Strategic Enterprise ITSemantic Web in Strategic Enterprise IT
A few companies are using semantic web technology in Enterprise IT contexts
– Majority of revenue is usually from consulting engagements, not software products or services
– Carefully-curated external ontologies with logic engines are used for flexible high-end data integration
– RDF/OWL has been useful as a data language– OWL inferences complement SQL joins
Data.gov and data.gov.uk– Provide the data for agile “citizen analytics”
Lower-cost Data Pooling– Enterprises can use the Semantic Web as a tool to
share the costs of administering foundational information– Reference information can be voluminous and expensive to maintain– Example: Linking Open Drug Data project
• Every drug company has to maintain databases of proteins, pathways, genes, diseases, etc.
• Several major pharma companies have experimented with allowing some of this data to be shared and maintained in the semantic web
• Currently includes 8M RDF triples and 370K links
33
Third Generation Semantic Web ApplicationsThird Generation Semantic Web Applications
A new problem type– “Semantic Web should allow people to have a better online experience” – Alex Iskold,
AdaptiveBlue– Enhance the human activities of content creation, publishing, linking my data to other
data, socializing, forming community, purchasing satisfying things, browsing, etc.– Improve the effectiveness of advertising
Market Segments and Players– Data outsourcing model (Evri, Freebase, ...)– Semantic enhancements to blogs and wikis (Zemanta, Faviki, Ontoprise, Radar, ...)– Semantics for Social Networking (MySpace RDF service and microformats, Facebook
RDF models, etc.)
Some lessons with applying semantic web technology in this space– If we don’t have semantic convergence, then semantics isn’t a differentiator– No one really knows the design principles that allow some Web 2.0/3.0 sites to be
successful and others to never get traction
Web 2.0 and the Socio-Semantic Web
34
Data Outsourcing on the Semantic WebData Outsourcing on the Semantic Web
Do for general data what Bloomberg and Capital-IQ do for financial data– Collect data, clean it, fuse it with quality control– Sell subscriptions, custom datasets and feeds– Knowledge required for targeted advertising
Example: Evri– Use semantic technology to filter and classify the stream– Create semantic contexts and personalized points of view– Follow topic spaces, high-level abstractions, entity clusters– Used by: Washington Post, LetMeKnow, Yahoo
Example: Freebase– A huge collection of structured data harvested from many sources– Contributions from individuals in a wiki-style format– Building a global resource of “almanac”-style data, with widgets to supply
autocomplete suggestions, better tags, related items, topic hubs, etc.– Used by Bing, Wall Street Journal, Google, etc.
35
Large Web Knowledge Bases: FreebaseLarge Web Knowledge Bases: Freebase
~1GB of socially-curated RDF almanac data plus data from partners (Creative Commons licensed)
36
Semantic Blogger Support: ZemantaSemantic Blogger Support: Zemanta
Automatic link, image, keyword, tag suggestions for blogs and email– Average semi-professional blogger spends ~20 mins adding “decorative” content
Key technology is fast language processing Markup accuracy is guaranteed because users explicitly add the suggestions
(feedback)– Includes user specified friends/feeds/photos/etc as well as standard ones
37
Semantic WikisSemantic Wikis
38
Example: HL7 Healthcare Vocabulary ManagementExample: HL7 Healthcare Vocabulary Management
Large-scale terminology management at www.biomedgt.org.
39
Semantic Data in HL7 (www.BioMedGT.org)Semantic Data in HL7 (www.BioMedGT.org)
Subject-matter experts give simple, authorable statements Semantic data provides pivotable dimensions
40
Talk Outline: The Evolving Semantic WebTalk Outline: The Evolving Semantic Web
The Origins of the Semantic Web – DARPA’s DAML Program– RDF and OWL
Semantic Web Evolution to 2010
Semantic Web Transformation in Three Application Areas– Markets and Companies
The Fourth Generation– Linked Data and the Scaling Revolution
41
The Background of Knowledge: DBpediaThe Background of Knowledge: DBpedia
Extract Wikipedia assertions– ~23M Factbox assertions– Category assertions, link structures, tables
DBpedia 3.4 dataset (Nov 09 Wikipedia)– ~2.9M things, ~479M triples
• 282K persons, 339K places, 88K music albums, 44K films, 870K links to images, 3.8M links to relevant external web pages, 4.9M links into RDF datasets
– Classifications via Wikipedia and YAGO categories
– One of the largest broad knowledge bases in the world
Simple queries over extracted data– “Things near the Eiffel Tower”– “The official websites of companies with
more than 50000 employees” – “Soccer players from team with stadium with
>40000 seats, who were born in a country with more than 10M inhabitants”
42
DBpedia for UsersDBpedia for Users
Query Wikipedia like a database
DBpedia Mobile
Putting it all Together: Linked Open DataPutting it all Together: Linked Open Data
43
Goals– Create a single, simple set
of rules for publishing and linking RDF data
– Build a data commons by making open data sources available as RDF
– Set RDF links between data items from different data sources
May 2009 LOD dataset– ~4.7B triples, and ~100M
RDF interlinks, growing faster than I can track
– Estimate 13.5B triples now with ~140M interlinks
– Database linkage means that LOD will soon be impossible to count except via order of magnitude
44
May 2007September 2007
March 2009September 2008
The Growing Web of Linked DataThe Growing Web of Linked Data
45
Topic Distribution in the March 2009 Linked DatasetsTopic Distribution in the March 2009 Linked Datasets
Life Sciences
Publications
Online ActivitiesMusic
Geography
Cross-Domain
4,500,000,000 triples
180,000,000 data links
46
Summing up: Building the Largest Database in the WorldSumming up: Building the Largest Database in the World
In mid-2004...– RDF and OWL had just been standardized– Advances were made via traditional corporate/public R&D programs– The first wave of semantic web startups (many of which have since failed)– Implementations were technically very sophisticated, but fully custom and had
no web involvement– A few early conferences (ISWC, SemTech)
47
Summing up: Building the Largest Database in the WorldSumming up: Building the Largest Database in the World
In mid-2004...– RDF and OWL had just been standardized– Advances were made via traditional corporate/public R&D programs– The first wave of semantic web startups (many of which have since failed)– Implementations were technically very sophisticated, but fully custom and had no web
involvement– A few early conferences (ISWC, SemTech)
Now in 2010...– The Semantic Web is the most exciting thing happening on the web– RDF assertions scaling into the billions, with little to no programmatic control– Search engines are starting to develop snippet products– Early adopters are publishing RDFa and seeing benefits– Network effects are starting
48
Summing up: The Largest Knowledge Base in the WorldSumming up: The Largest Knowledge Base in the World
In mid-2004...– RDF and OWL had just been standardized– Advances were made via traditional corporate/public R&D programs– The first wave of semantic web startups (many of which have since
failed)– Implementations were technically very sophisticated, but fully
custom and had no web involvement– A few early conferences (ISWC, SemTech) and session tracks
Now in 2009...– The Semantic Web is the most exciting thing happening on the web– RDF assertions scaling into the billions, with little to no
programmatic control– Search engines are starting to develop products– Bestbuy is publishing products, store descriptions, hours in RDFa
“Linked Data Now!!” -- Sir Tim Berners-Lee
Put your data on the Semantic WebLink it to Other Data
Be part of the Revolution!
49
Thank You
Disclaimer: The preceding slides represent the views of the author only. All brands, logos and products are trademarks or registered trademarks of their respective companies.