Making Linked Data Diachronic
Vassilis Christophides
University of Crete & FORTH-ICS
Heraklion, Crete
Data as an asset!
• One of the most significant changes of the past decade has been the widespread recognition of data as an asset
– Data is the new “raw material of business” (The Economist)
Data Products
Emerging Data Ecosystem
Big Data has blurred the distinction between public and private
Public
Volunteered Data
Curated Data
Observed Data
Emerging Data Subjects
data marketers
data brokers
data aggregators
http://www.ftc.gov/bcp/workshops/privacyroundtables/personalDataEcosystem.pdf
A series of data stewards, custodians, and curators are producing, consuming and brokering data products, forming a far more complex value-making chain than in traditional enterprise or scientific contexts
Any data, any size, anywhere
What to Do with this Data?
• Search:
– Find structured data when it’s relevant to search queries
• Visualize, enhance, communicate to relevant audiences
– Support Communities [bio-diversity, climate, water, …]
• Relate data across sources
• Fuse data from multiple sources
– Data integration!
Immersive insight, wherever you are
Connecting with the world’s data
Microsoft’s Approach to Big Data
Emerging Data Life-cycle
http://www.ipsr.ku.edu/naddi/about.shtml
Data as a Service (DaaS)
Data as a Service
Software as a Service
Platform as a Service
Infrastructure as a Service
© www.emc.com/collateral/software/white-papers/h10839-big-data-as-a-service-perspt.pdf
• DaaS promises that data products can be provided on demand to the user regardless of geographic or organizational separation of provider & consumer
• DaaS brings the notion that data-related services can happen in a centralized place: aggregating, quality-checking, cleansing and enriching data, and offering it to different systems, applications or mobile users, wherever they are
– Virtualized
– On-demand
– Self-service
– Scalable
– Pay as you go
Data Marketplaces
• Services that make it easy to find data from a range of secondary data sources, then consume the data in a usable and unified format
– Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers (DataMarket.com)
Data Aggregation and Curation Layer
Data Connection Layer
Data Visualization and Analysis Layer
Data Hosted by Third Party
Data Hosted by Data Provider
Data Hosted in
Marketplace
Data as a Service
Preservation Service
9 Vertical Data Markets
François Bancilhon, Data Publica, “de data rerum”, WOD Tutorials 2013, Paris
Vertical                        Example             Size (M€)
Financial                       Reuters             300
Press                           Press Index         250
Legal                           Francis Lefebvre    240
Solvability                     Altarès             160
Scientific Technical Medical    Meteo France        160
Image                           Sipa                60
Economy                         Société.com         55
Marketing                       Acxiom              55
Patents                         Reuters             25
Only a Small Portion of Big Data!
idgknowledgehub.com/idc-releases-first-worldwide-big-data-technology-and-services-market-forecast-shows-big-data-as-the-next-essential-capability-and-a-foundation-for-the-intelligent-economy/2012/05/07/
Data Hub for Market Intelligence
Source: Hjalmar Gislason, DataMarket, Inc., “Emerging DaaS business models: A case study”, European Data Forum (EDF), Dublin 2013
hortonworks.com/blog/7-key-drivers-for-the-big-data-market
Potential Benefits of Linked Data for Data Marketplaces
• Abstraction layer for virtualized data access across sources
– Basis for automating dataset discovery, linking & fusion
• Flexible data representation model (RDF) and global identifiers for all objects (URI)
– Eases incremental data integration, interactive exploration and ad hoc analysis of data
• Interlinked datasets
– Newly added data can be integrated with existing ones in the marketplace
– Network effects
• Data marketplace interoperability
– Data from different marketplaces can be easily federated
• Derived knowledge / facts
– RDF inference of additional implicit facts
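As a rough illustration of the last two benefits, the sketch below treats RDF triples as plain Python tuples: datasets from two marketplaces that share URIs are integrated by a simple set union, and one RDFS-style rule (instances of a subclass are also instances of its superclass) derives implicit facts. All `ex:` identifiers and both function names are hypothetical; a real deployment would use an RDF store and a reasoner rather than this toy loop.

```python
# Triples are (subject, predicate, object) tuples; every ex: URI is made up.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def merge(*datasets):
    """Integrate datasets by set union; shared URIs line up automatically."""
    merged = set()
    for dataset in datasets:
        merged |= set(dataset)
    return merged


def infer_types(triples):
    """Apply the subclass rule repeatedly until no new rdf:type fact appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for s, p, o in facts if p == SUBCLASS_OF}
        typed = {(s, o) for s, p, o in facts if p == RDF_TYPE}
        for instance, cls in typed:
            for sub, sup in subclass:
                if cls == sub and (instance, RDF_TYPE, sup) not in facts:
                    facts.add((instance, RDF_TYPE, sup))
                    changed = True
    return facts


market_a = {
    ("ex:acme", RDF_TYPE, "ex:Supplier"),
    ("ex:Supplier", SUBCLASS_OF, "ex:Company"),
}
market_b = {
    ("ex:acme", "ex:locatedIn", "ex:Heraklion"),
    ("ex:Company", SUBCLASS_OF, "ex:Organization"),
}

combined = infer_types(merge(market_a, market_b))
```

Neither marketplace states that `ex:acme` is an `ex:Organization`; the fact only becomes derivable once the two sources are merged.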
Web Data of Increasing Standardization
Not all linked data is open and not all open data is linked!
★ Available on the web (whatever format) but with an open license, to be Open Data
★★ Available as machine-readable structured data (e.g. Excel vs. image scan of a table)
★★★ as (2) plus non-proprietary format (e.g. CSV instead of Excel)
★★★★ as (3), plus using open standards from W3C (RDF and SPARQL) to identify things through dereferenceable HTTP URIs, to ensure effective access
★★★★★ as all the above plus establishing links between data of different sources
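The cumulative nature of the star levels can be sketched as a small function: a dataset earns a star only if it also satisfies every level below it. The function name and its boolean parameters are assumptions made for illustration; they are not part of any official rating tool.

```python
def star_rating(open_license, machine_readable=False, non_proprietary=False,
                w3c_standards=False, linked=False):
    """Return 0-5 stars; each level counts only if all lower levels hold."""
    levels = [open_license, machine_readable, non_proprietary,
              w3c_standards, linked]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


# An openly licensed image scan of a table.
scanned_table = star_rating(open_license=True)
# An openly licensed CSV file.
csv_file = star_rating(True, machine_readable=True, non_proprietary=True)
# RDF with dereferenceable URIs and links to other sources.
linked_rdf = star_rating(True, True, True, True, True)
```

Without an open license the rating stays at zero regardless of format, which mirrors the remark that not all linked data is open.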
File format    Recommendation (on a scale of 0-5)
csv ★★★
xls ★
pdf ★
doc ★
xml ★★★★
rdf ★★★★★
shp ★★★
ods ★★
tiff ★
jpeg ★
json ★★★
txt ★
html ★★
Key Players Offers Classification
Data Cube
+
DIACHRON Objectives & Approach
Appraising
Integrating
Archiving
Producing
Publishing
Cleaning
• Preserve (semi-)structured, interrelated, evolving data by keeping them constantly accessible & reusable from an open framework such as the Data Web
• Calls for effective & efficient techniques to manage the lifecycle of web data involving data producers, curators, brokers and consumers
– Pay-as-you-go data preservation spreading costs among key players in a community of interest
• Diachronic Data: Enhance data with temporal and provenance annotations as data products are re-used through complex value-making chains
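One way to picture such annotations is a triple that carries its own validity interval and source, as in the minimal sketch below. The class, its field names, the `as_of` helper and the sample data are all hypothetical; they are not the DIACHRON data model, only an illustration of the idea.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AnnotatedTriple:
    subject: str
    predicate: str
    obj: str
    valid_from: date            # temporal annotation: first day the fact holds
    valid_to: Optional[date]    # first day it no longer holds; None = still valid
    source: str                 # provenance annotation: who asserted the fact


def as_of(triples, day):
    """Return the triples whose validity interval [valid_from, valid_to) covers day."""
    return [t for t in triples
            if t.valid_from <= day and (t.valid_to is None or day < t.valid_to)]


facts = [
    AnnotatedTriple("ex:gene42", "ex:label", "ABC1",
                    date(2010, 1, 1), date(2012, 6, 1), "ex:curatorA"),
    AnnotatedTriple("ex:gene42", "ex:label", "ABC1-alpha",
                    date(2012, 6, 1), None, "ex:curatorB"),
]

snapshot_2011 = as_of(facts, date(2011, 3, 15))
snapshot_2013 = as_of(facts, date(2013, 3, 15))
```

Because the annotations travel with the data, a consumer further down the chain can still tell when a value was current and who supplied it.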
DIACHRON Research Agenda
• How can we assess the quality of harvested datasets in order to decide which (the data quality dimensions problem) and how many versions of them deserve to be preserved for future use (the appraisal problem)?
• How can we understand dependencies of datasets (the provenance problem) and how can metadata (temporal, spatial, thematic) be smoothly represented alongside the data (the annotation problem)?
• How can we monitor changes of third-party datasets (the evolution tracking problem) and how can local/remote data imperfections (e.g., due to change propagation) be repaired (the curation problem)?
• How do we cite particular versions of a dataset (the citation problem), and how will we be able to retrieve them when looking up a reference (the long term accessibility problem)?
• How do we maintain the consistency of multiple versions of dependent datasets (the archiving problem) and how will we access the datasets along their evolution history (the longitudinal querying problem)?
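Three of these questions (evolution tracking, citation, longitudinal querying) can be made concrete with a toy multiversion archive. The sketch below is only an assumption-laden illustration: the `VersionArchive` class, its method names and the `ex:` data are invented here, and storing full snapshots is the simplest possible strategy, not necessarily the one DIACHRON adopts.

```python
class VersionArchive:
    """Keeps every committed version of a dataset as an immutable set of triples."""

    def __init__(self):
        self._versions = []  # list of (label, frozenset of triples), oldest first

    def commit(self, label, triples):
        """Archive a new version under a citable label."""
        self._versions.append((label, frozenset(triples)))

    def get(self, label):
        """Citation lookup: return exactly the triples a label referred to."""
        for name, triples in self._versions:
            if name == label:
                return triples
        raise KeyError(label)

    def delta(self, old, new):
        """Evolution tracking: (added, deleted) triples between two versions."""
        before, after = self.get(old), self.get(new)
        return after - before, before - after

    def history(self, subject, predicate):
        """Longitudinal query: values of one property in every version."""
        return [(name, sorted(o for s, p, o in triples
                              if s == subject and p == predicate))
                for name, triples in self._versions]


archive = VersionArchive()
archive.commit("v1", {("ex:city1", "ex:population", "140000")})
archive.commit("v2", {("ex:city1", "ex:population", "150000"),
                      ("ex:city1", "ex:mayor", "ex:person7")})

added, deleted = archive.delta("v1", "v2")
population_history = archive.history("ex:city1", "ex:population")
```

Full snapshots make citation lookup trivial but duplicate unchanged triples; storing only deltas would trade storage for reconstruction cost.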
Knowledge Bases
Datasets
Linked Open Data cloud
Annotation Services (WP2)
Diachronic Citations
Evolution Services (WP3)
Archiving Services (WP4)
Longitudinal Query Processing
Temporal and Provenance Annotations
Cleaning and Repairing
Change Recognition and Propagation
Acquisition Services (WP5)
Multiversion Archiving
Quality-driven Adaptive Crawling
Ranking and Appraisal
distribute
fetch
apply
fetch
annotate
fetch
share
Open Data Applications (WP7)
Enterprise Data Intranets (WP8)
Scientific Linked Data (WP9)
The DIACHRON Platform (WP6)
DIACHRON Data Services & Work Plan
Diachronic Data Services Lifecycle
Data Repurposing
Data Archiving
Data Evolution
Data Appraisal
Data Citation
Concluding Remarks
• The integrated DIACHRON platform and services aim to support long-term usability of open and/or linked data published on the Web and within Enterprise Intranets
• The concept of diachronic data intends to foster self-preserving data embedding an understanding of their evolving semantics, use contexts, and interpretations
• DIACHRON is expected to:
– Improve our understanding of how linked/open data evolves
– Reduce the maintenance costs when integrating linked/open data
– Foster data accountability and transparency in open dynamic data spaces
– Address sustainability issues for preserving Big Data
Data Custodians’ Effort
Data Consumer’s Effort
Data Publisher’s Effort
Fix Overall Data Preservation Effort
Business Models for Linked Data Publishers
http://chiefmartec.com/2010/03/business-models-for-linked-data-and-web-30
Business Webs as Types of Value Creation
• Agora: Open electronic marketplaces with regard to pricing and offered products (e.g. Android marketplace)
• Aggregation: Closed, controlled electronic marketplaces (e.g. Apple App Store)
• Distributed Network: Value Network
• Value Chain: ICT-enabled Value Chains
• Alliance: Loosely cooperating market players (e.g. Open Source projects)
Data-Driven Business Models
Source: Michalis Vafopoulos