European Data Market SMART 2013/0063 D3.12 Technical Barriers to Data Sharing in Europe
6th January 2017
Author(s) David Osimo (Open Evidence);
Giorgio Micheletti, Gabriella Cattaneo (IDC)
Deliverable D 3.12, Quarterly Stories – Story 12
Date of delivery 6th January 2017
Version 2.0
Addressee officer Katalin IMREI
Policy Officer
European Commission - DG CONNECT
Unit G1 – Data Policy and Innovation
EUFO 1/178, L-2557 Luxembourg/Gasperich
Contract ref. N. 30-CE-0599839/00-39
TABLE OF CONTENTS
EXECUTIVE SUMMARY................................................................................................ 4
1 INTRODUCTION ............................................................................................. 6
1.1 BACKGROUND AND OBJECTIVES ............................................................................ 6
1.2 METHOD .......................................................................................................... 8
1.3 THE STRUCTURE OF THIS DOCUMENT ..................................................................... 8
2 DATA SHARING IN PRACTICE: SELECT CASE STUDIES ................................... 9
2.1 INTRODUCTION .................................................................................................. 9
2.2 CASE STUDY 1: N&W GLOBAL VENDING ............................................................... 10
2.3 CASE STUDY 2: DATARY.IO AND OTHER DATA MARKETPLACES .................................... 13
2.4 CASE STUDY 3: LARGE MANUFACTURING PLANTS .................................................... 16
2.5 CASE STUDY 4: YUKON DIGITAL .......................................................................... 18
3 FINAL CONSIDERATIONS ............................................................................. 20
3.1 CROSS ANALYSIS OF THE CASES .......................................................................... 20
3.2 POLICY IMPLICATIONS ....................................................................................... 21
Table of Tables
Table 1 Cases and key features ...............................................................................................8
Executive Summary
While data sharing within sectors and across sectors is gaining momentum in Europe and elsewhere,
several types of barriers may hinder the effectiveness of data transfer and reduce its value-add potential.
The sharing of datasets from different sources facilitates use and re-use of data by multiple actors and
enables the extraction of value from data, leading to the creation of new services, business models, and
the improvement of processes, an evolution called digital transformation. The diffusion of open platforms
for data sharing and the availability of interoperable datasets is one of the key success factors which
may help to drive the European Data Market towards a High Growth scenario by 2020, and push the
contribution of the data economy to the EU GDP up to 4% (compared to 2.5% in the Baseline scenario,
where data sharing is expected to be more limited).1 However, European industries are far from
exploiting the full value of data sharing. According to IDC’s Digital Transformation Maturityscape
Benchmark, the majority of European enterprises (62%) are at the “digital explorer” or “digital
player” maturity stages, characterized by an opportunistic approach to digital transformation not yet fully
integrated across the organization, while 20% are “digital resisters” refusing to engage in this digital
innovation. IDC therefore estimates that the majority of European enterprises have not yet fully
implemented data sharing but are interested in doing so and are dealing, or starting to deal, with data
sharing issues.
There are three main typologies of data sharing barriers emerging from the analysis of the data market:
cultural/organizational (lack of awareness of potential benefits, lack of trust and fear of competition);
legal/regulatory (restrictions on data location and on the free flow of data, uncertainty about data
ownership and data access); and technical/operational barriers due to the lack of interoperability
between different datasets, lack of standards, and the high costs of curating data to adapt it for sharing. Data
curation costs, in particular, are estimated to represent more than 50% of the time of data scientists in
data sharing projects.
Based on a set of real-life case studies from the manufacturing industry, the ICT sector and the chemical
industry, this paper focuses on interoperability as the key technical barrier to data sharing and
investigates the issue across different value chains’ players: machinery & equipment providers,
manufacturers (industrial plants), manufacturers’ clients, and ICT companies providing ancillary services
(big data start-ups providing data analytics and data marketplaces matchmaking between data supply
and data demand). Our analysis, backed by ad-hoc secondary research and in-depth expert interviews,
reveals that interoperability and technical issues are perceived by stakeholders not as a barrier
preventing data sharing, but simply as an additional cost and a problem with various possible solutions
depending on the specific situation, based on some combination of technology, standardization and
human work. The evolution of the market is currently providing these solutions. Narrow industry
standards, for example, or high level architecture standards can make data easily accessible and
transferrable between operators in the same segment or at industry level. The increasing adoption of
ad-hoc technologies such as APIs and SDKs may ensure secure third-party access to data and enable
easier data sharing. What is more, machine learning technologies are expected to enhance data
curation activities, thus facilitating the integration of data from different sources and organizations and
reducing the time needed by data scientists for this thankless task. Finally,
interviewees agree that there is a general trend towards greater openness and data sharing across
sectors. Market forces matter: data are becoming more valuable, the cost of data gathering decreases,
and customers (especially business) expect greater accessibility of the data. We are also witnessing the
emergence of a new category of specialized intermediaries focused on data transformation and curation
to enable data sharing, with several start-ups identified by the Big Data Landscape 2016 of the European
Data Market Monitoring Tool.
1 As reported in the EDM Final Study Report, December 2016, by IDC and Open Evidence
While the evolution towards greater data sharing is unanimously recognised, according to our
interviewees more needs to be done to accelerate this trend. Also, data interoperability problems which
are currently manageable are likely to become more complex and difficult to solve as the data-driven
ecosystem grows in depth and sophistication, for example through the emergence of Industry 4.0
ecosystems and the diffusion of Internet of Things (IoT) networks of sensors.
European policies and specifically the DSM strategy already address the development of the main
framework conditions for data sharing, particularly through the forthcoming Free-flow of data initiative
and the ICT standardization priorities action plan Communication2, which singles out standards for better
data interoperability as a priority domain.
The open problems emerging from our analysis are the following:
The inherent tension between standardization and innovation/personalization. Enabling data
sharing does not mean imposing standards on all datasets, which would result in excessive
rigidity and prevent innovation, but developing a flexible combination of general and specific
standards, interfaces and common formats, while preventing user lock-in caused by
proprietary technologies.
o Here there is a role to play for policy makers, in order to promote this process, ensure a
level playing field among stakeholders and accelerate as much as possible the
emergence of open standards and common formats, especially in the interest of SMEs,
who do not have the market power to drive this process.
Data curation (preparing data for interoperability and sharing) is a time-intensive activity and an
additional cost, but it is not acting as a barrier: there is no evidence that companies have given
up on data sharing projects because of excessive data curation costs. Fortunately, this task is
becoming more automated and less expensive.
o The policy role here could be to make sure that sufficient R&D investments are made
into the automation of data curation (leveraging machine learning for example), and
also to promote the development of data scientists’ skills in this field. Policy makers
could also raise awareness about the pros and cons of data curation versus
standardization. Enterprises developing a business case for data sharing projects
should make sure that data curation tasks are properly assessed and accounted for,
developing realistic cost-benefits estimates.
Finally, the creation of neutral B2B data exchanges solving interoperability problems is
potentially welcomed by interviewees, but it is unclear how it could be implemented in practice.
This could be an option to be investigated, to understand whether such platforms could be a way to
accelerate the resolution of technical data sharing issues.
2 https://ec.europa.eu/digital-single-market/en/news/communication-ict-standardisation-priorities-digital-single-market
1 Introduction
1.1 Background and Objectives
The main objective of this report is to investigate the technical barriers to data sharing in Europe in the
context of the European Data Market and its evolution towards an ever-growing data-driven economy.
The report is one of a series of in-depth analyses focusing on the development of the data-driven
economy in Europe based on specific case studies by sector and/or by technology. It constitutes the
deliverable D3.12 of the study “European Data Market”, SMART 2013/0063 entrusted to IDC and Open
Evidence by the European Commission, DG Connect, Unit G1 – Data Policy and Innovation.
Data sharing is a key enabling factor for the development of the European digital economy. The sharing
of datasets from different sources facilitates use and re-use of data by multiple actors and enables the
extraction of value from data, leading to the creation of new services, business models, and the
improvement of processes, an evolution called digital transformation. The diffusion of open platforms for
data sharing and the availability of interoperable datasets is one of the key success factors which may
help to drive the European Data Market towards a High Growth scenario by 2020, and push the
contribution of the data economy to the EU GDP up to 4% (compared to 2.5% in the Baseline scenario,
where data sharing is expected to be more limited).3 Conversely, relevant barriers to data sharing are
one of the factors leading to the slower growth Challenge scenario. However, data sharing across
companies is still limited and far from its full potential.
Data sharing is a natural consequence of the new wave of technology innovation combining the Internet
of Things (IoT), 5G, cloud computing, cognitive systems and robotics, and of course advanced data
analytics, which is transforming global value chains. Recent studies4 estimate that digitisation of
products and services could add more than 110 B€ of revenue for industry per year in Europe in the
next 5 years, as underlined in the EC Communication “Digitising European Industry”5. According to IDC
estimates, already in 2017 one third of the Global 2000 companies will achieve revenue growth from
information-based products and services at twice the rate of the rest of their offering
portfolio.6
It is increasingly clear that digital transformation is the new competitive must for businesses. IDC defines
it as a continuous process by which enterprises adapt to or drive disruptive changes in their customers
and markets (external ecosystem) by leveraging digital competencies to innovate new business
models, products, and services that seamlessly blend digital and physical experiences. However,
according to IDC’s Digital Transformation Maturityscape Benchmark, based on a yearly survey of
worldwide medium-large enterprises, the European industry is only at the start of this transformative
journey. Last year, approximately 20% of European surveyed enterprises were “Digital resisters”, not
yet engaging in digital transformation, while only 5% were “Digital Disrupters”, already exploiting it fully.
The majority of European enterprises (62%) are at the “digital explorer” or “digital player” maturity
stages, characterized by an opportunistic approach to digital transformation not yet fully integrated
across the organization. Based on this IDC benchmark, we estimate that at least 60-70% of European
enterprises are interested in the exploitation of multiple datasets within their digital transformation
processes and are already dealing with data sharing issues. The increasing demand for Big Data
3 As reported in the European Data Market Study’s Final Study Report, December 2016, by IDC and Open Evidence (SMART 2013/0063)
4 PwC, Opportunities and Challenges of the Industrial Internet (2015), and Boston Consulting Group, The Future of Productivity and Growth in Manufacturing Industries (2015)
5 Brussels, 19.4.2016 COM(2016) 180 final
6 IDC FutureScape: Worldwide Analytics, Cognitive/AI, and Big Data 2017 Predictions. IDC, November 2016
software and services (forecast by IDC to grow at a Compound Annual Growth Rate of 21.9% from 2015
to 2018, doubling the value of this market by 2018) confirms this trend.
Within this context, it is relevant to understand to what extent barriers to data sharing may play a role in
constraining the European industry journey towards digital transformation. There are three main
typologies of data sharing barriers emerging from the analysis of the data market:
The first is cultural/organisational: it includes lack of awareness of the potential business
benefits of data sharing, lack of trust and therefore unwillingness to share a company’s own
data for fear of losing a competitive advantage (sometimes even between departments of the
same company), the difficulty of assessing the value of data assets, and the lack of an internal
data supply chain making data available for sharing and exploitation.
The second type of barriers concerns legal/regulatory factors such as uncertainty on data
ownership and access to data, undue restrictions on data location and the free-flow of data,
particularly across international borders.
The third type concerns technical/operational barriers due to lack of interoperability between
different datasets and information systems, lack of standards or incompatible standards, and/or
high costs of data curation of different typologies of data which may undermine the business
case for data sharing.
Cultural/organizational barriers are particularly relevant for SMEs and are usually the first which need
to be overcome before enterprises engage in digital transformation and plan data sharing activities.
Legal/regulatory barriers have been analysed in-depth and their removal represents a key goal of EU
policies. The Digital Single Market (DSM) strategy has recognized the relevance of this issue in its action
for a European ‘Free flow of data’ initiative, which will tackle unjustified restrictions on data location and
the free movement of data (for reasons other than the protection of personal data within the EU), but
will also address “the emerging issues of ownership, interoperability, usability and access to data in
situations such as business-to-business, business to consumer, machine generated and machine-to-
machine data”7.
Technical barriers, which are the focus of this report, tend to be overlooked until enterprises engage in
concrete data exploitation activities and/or develop a proper data value strategy. The situation is in rapid
evolution. The need to solve technical issues is likely to become more relevant as the number of
enterprises engaging in digital transformation increases, as indicated above. Technology itself is
opening new perspectives; for example, the emergence of new distributed architectures or machine
learning technologies may offer alternative and more efficient ways to meet the challenges of
interoperability but also of trust and data ownership/usage. From the point of view of standardization,
on the one hand the existence of incompatible industry standards represents an issue; on the other
hand, the lack of standards is offering new business opportunities for new intermediaries such as data
companies and data marketplaces offering data curation services (to make datasets interoperable).
This report examines the relevance and potential solutions of technical barriers to data sharing, with the
objective to answer the following questions:
To what extent is lack of data interoperability an important barrier to data sharing?
Is it a problem within a sector, or rather between sectors?
Are standards the solution? Are there differences between sectors in terms of standards?
What are the consequences of the problem, in terms of lack of innovation etc. (if any)?
This report is based on desk research and a selected number of real-life use cases, with the objective
of providing evidence-based insights on the relevance of technical barriers, forthcoming developments,
potential solutions and policy implications.
7 http://ec.europa.eu/smart-regulation/roadmaps/docs/2016_cnect_001_free_flow_data_en.pdf
1.2 Method
The method includes both primary and secondary research. In the initial phase, desk research was
carried out in order to frame the overarching question underpinning this research and provide a general
background to the problem, including a review of the evidence from previous stories produced by this
study. Primary research in the form of one-to-one, in-depth interviews was also carried out and was
followed by desk research in order to enrich and complete the information obtained through the
interviews.
Four real-life cases were investigated, covering different typologies of data sharing:
Data sharing between several players in the same value chain;
Data sharing across multiple sectors or within a specific sector;
Data sharing with and without standards;
Data sharing applied to different types of data (sensor generated and IT generated).
Each case included an interview with a senior referent from the company that was featured in the real-
life case (founder, CEO or senior manager).
Table 1 Cases and key features
Cases                      Sector                    Open standards   Type of data      Geographical scope
Large scale manufacturing  Manufacturing             N                Sensor            Europe
N&W Global Vending         Manufacturing             Y                Sensor            Italy-Denmark
Yukon Digital              Analytics for Chemicals   N                Different types   Germany
Datary.io                  Data marketplaces         Y                Different types   Spain-UK
1.3 The Structure of this Document
The document is structured along three main sections:
The first section introduces the main purpose and methodology of this story;
The second introduces briefly the concept of data interoperability and presents the case studies;
The third integrates the research results to provide a summary overview, as well as to draw the final
considerations and conclusions of the story.
2 Data Sharing in Practice: Select Case Studies
2.1 Introduction
Digital transformation has enhanced the awareness of a new wave of opportunities around data, driving
organizations of all sizes and shapes to extract more value from data and unlock additional revenues
from them. For this to happen, data in all formats need to be accessed, retained, disposed of and
exchanged. IDC predicts that by 2020, 66% of all medium-large enterprises worldwide will implement
some sort of advanced classification solution to automate access, retention, and disposition of
unstructured content, so as to manage the information generated within the company, but also between
organizations and across sectors throughout the different segments of an industry value chain.
The case studies illustrated below have been selected to cover a wide range of possible situations. In
particular, they cover different segments of the value chain. Taking industrial plants as a central point,
we can envisage different technical barriers to data sharing:
- Between the producer of the machinery (e.g. Siemens) and the manufacturing company running
the industrial plant (e.g. BASF, N&W Global Vending);
- Between the manufacturing company and its clients (e.g. the operators of N&W vending
machines);
- Between the manufacturing company and the big data start-ups (e.g. Yukon Digital);
- Between the manufacturing company and the data marketplace (e.g. Datary.io).
While the cases are not directly related to each other, they cover different areas and relations in the
value chain.
Figure 1: Identification of technical barriers alongside the value chain
2.2 Case Study 1: N&W Global Vending
Background Information
N&W Global Vending is the global market leader in manufacturing vending machines. The sector is
more important than one might expect: there are today approximately 3.8 million vending machines in
Europe, the industry directly employs 85,000 people across 10,000 companies, and the annual turnover
is around €14.6 billion.
The value chain mainly includes the following types of players:
The machine’s manufacturer (such as N&W);
The machine’s operators. The operators purchase the machine and operate it. They take care
of technical maintenance and filling the machines with products;
The final client, i.e. owner of the office which contracts the operator;
The services’ payment operators (such as Nayax). They manage payments and offer the
possibility of multiple methods.
Additional players include third party service providers of Vending Machine Management solutions for
operators, and also app developers.
Figure 2: the value chain of vending machines
How Data are managed
A modern vending machine is equipped with sensors to measure its functioning, such as proximity,
temperature and humidity. It includes firmware to manage its key functions, and produces lots of data
related to both its functioning and the products it sells.
Data are necessary to operators in order to:
- Detect possible malfunctioning in the machine;
- Manage products and refilling.
The operator typically visits the machine periodically and downloads data onto a mobile or USB device.
This is therefore a time-intensive service.
(Figure 2 depicts the chain: Manufacturer, e.g. N&W → Operator → Client, supported by service providers, e.g. Digiso, and payment operators, e.g. Nayax.)
Figure 3: data flows in vending machines
Alternatively, telemetry services enable remote monitoring and management through specialized
software solutions. The operator can regularly monitor both the technical functioning and the product
consumption. It can also in some cases change remotely some parameters, such as prices. Today’s
vending machines include touchscreen that display useful information to the public that can be modified
remotely.
The operator can access the data directly from the vending machine, through a USB port, or via
telemetry. The machine is equipped with its own proprietary firmware, but the data are made freely
available using the standard data protocol from the industry, EVA Data Transfer Standard, based on
two protocols DDCMP and DEX/UCS. The operator has full access to the data, while the manufacturer
typically does not access them unless they are provided via telemetry.
The accessibility of the data is considered a fundamental service for the operator, who is in charge
of the maintenance of the machine. There is a competitive market of apps and software to manage and
analyse these data. N&W also offers its own app, BlueRed, to control the machines (see screenshot
below), but this is just one of many possibilities.
Figure 4. Example of app to manage vending machines.
The data produced by the machine are generated by many different internal sensors, measuring, for
instance, proximity, temperature and humidity. The data are extracted and processed directly by the
firmware and exposed using industry standards. The firmware therefore is programmed in order to
extract the data from the chosen sensors directly on the hardware.
According to the interviewee, the standard, however, covers only a minor part of the total data produced
by a machine. The rest of the data are managed through proprietary standards.
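The data flow just described — firmware packaging sensor readings into asterisk-delimited EVA DTS (DEX/UCS-style) audit records that operator software then parses — can be sketched as follows. This is a simplified illustration: the record tags and field layout are reduced to a minimum and do not reproduce the full EVA DTS specification.

```python
# Hedged sketch of parsing a simplified DEX/UCS-style audit dump.
# Record tags (PA1, PA2) and field positions are illustrative only.

def parse_dex(dump: str) -> dict:
    """Collect per-product price and vend counts from asterisk-delimited records."""
    products = {}
    last_id = None
    for line in dump.strip().splitlines():
        fields = line.strip().split("*")
        tag = fields[0]
        if tag == "PA1":                    # product setup: id, price in cents
            last_id = fields[1]
            products[last_id] = {"price": int(fields[2]), "vends": 0}
        elif tag == "PA2" and last_id:      # product sales: vend count, value
            products[last_id]["vends"] = int(fields[1])
    return products

dump = """DXS*WGV0001*VA*V1/1*1
PA1*001*120
PA2*57*6840
PA1*002*150
PA2*31*4650"""

print(parse_dex(dump)["001"]["vends"])  # 57
```

In a real deployment the dump would arrive over the machine’s USB port or a telemetry link, and the parser would follow the complete EVA DTS record catalogue rather than this simplified layout.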
Current Situation and Future Perspectives
Today, there is little problem or discussion about the interoperability of data. Data are provided using
the standard format, and the operator can easily access and manage them, generally by plugging into
a USB port on the vending machine. The data are easily managed through a wealth of software and
service providers. The standards are defined at the industry level, and there is little interaction with
players outside the industry.
However, the future looks more unstable and challenging for two reasons. The first is the increasing
reliance on telemetry, which allows remote monitoring and in some cases also modification of the
machine (e.g. changing the prices). Moving from “read” to “write” will require significant adjustment of
processes and could pave the way to greater interaction with players outside the industry (e.g.
advertising agencies interested in using the displays to provide personalized advertising).
Secondly, predictive maintenance is expected to have an impact. Currently, maintenance is provided
directly by the operator who has access to data. But in the future, as we are seeing in other sectors,
manufacturers might consider data-driven predictive maintenance as a business opportunity, in order to
adapt their business models and provide a service measured in products sold by the machines (“vending
machine by the product sold”) just as, for example, Rolls Royce provides “engines by flight hours”
services. This could potentially generate data interoperability issues, since only part of the data is
covered by the industry standard EVA DTS: for instance, EVA DTS does not cover data about speed of
heating (derived from sensors) which could show calcification and hence be used for predictive
maintenance. More generally, any form of predictive maintenance requires the widest range of datasets
from the highest number of sensors in machines, and because of the increasing number of those
sensors, there will always be a gap between the data produced by the machine and the data
requirements included in the standards. So while standards help resolve the problems, they cannot
provide the full solution, especially in the medium term if predictive maintenance of vending machines
becomes a competitive market. Therefore, data interoperability is not a problem now but could
become one for innovative data-driven services in the industry.
2.3 Case Study 2: Datary.io and other data marketplaces
Background Information
Data marketplaces, as defined by IDC within the EDM taxonomy, are “a third party, cloud-based
software platform providing Internet access to a disparate set of external data sources for use in IT
systems by business, government or non-profit organizations. The marketplace operator will manage
payment mechanisms to reimburse each dataset owner/provider for data use, as necessary. Optionally,
the marketplace provider may provide access to analysis tools that can operate on the data.”8
The initial hypothesis is that data marketplaces could potentially play a big role in removing technical
barriers, by providing data aggregation and curation services that manage to ensure interoperability.
Datary is one of these data marketplaces, fostering matchmaking between data supply and demand.
According to its presentation, Datary “connects data providers with data consumers (consultants,
analysts ...) through a marketplace that enables integration with business analytics tools. In addition, it
offers version control in the cloud and the ability to synchronize data in Datary with any work tool (either
Excel or more advanced tools such as Pentaho or CartoDB).”
How Data are managed
Currently Datary provides both a pure data matchmaking service, enabling easy and intuitive data
sharing, and curated data services. Because of the fragmented regulation on personal data protection,
the current activity is focused on the latter, aggregating data from open public sources and providing
them through advanced analytics.
For instance, one of the main products is devoted to managing intellectual property information. Datary
aggregates information from open government data, and provides analytical tools that add value to the
information for the final user.
The data are freely available on the government website in open standard format (PDF, XML, HTML).
Figure 5: screenshot of open data on IPR
8 See also the Story report on Data Marketplaces: http://www.datalandscape.eu/data-driven-stories/europe%E2%80%99s-data-marketplaces-%E2%80%93-current-status-and-future-perspectives
Figure 6: screenshot of datary.io output
Datary aggregates these data and provides customized dashboards and analytics. As such, Datary does
not intervene in removing technical barriers related to interoperability, but in providing customers with
more usable and powerful access to the data. The data (in this case, IP data) are already published in
standard, open and machine-readable formats. Datary aggregates data from different public sources in
order to ensure common data structures, and provides advanced analytical tools to make sense of these
data. In other words, it curates and provides added value on top of the data but does not remove
technical barriers as such.
This focus on curating open government data has been mentioned recurrently by other marketplaces
presented in the previous story.
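The aggregation step described above — mapping the field names of heterogeneous public sources onto one common structure before analytics are applied — can be sketched as follows. The source names and field names are invented for illustration (the real sources publish IP data in formats such as PDF, XML and HTML):

```python
# Hedged sketch: harmonising records from two hypothetical open-data
# sources into a common schema, tagging each row with its provenance.

MAPPINGS = {
    "registry_a": {"NumSolicitud": "application_id", "Titular": "holder"},
    "registry_b": {"app_no": "application_id", "applicant": "holder"},
}

def harmonise(source: str, record: dict) -> dict:
    """Rename a record's native fields to the common schema."""
    mapping = MAPPINGS[source]
    out = {common: record[native] for native, common in mapping.items()}
    out["source"] = source          # keep provenance for traceability
    return out

rows = [
    harmonise("registry_a", {"NumSolicitud": "ES-001", "Titular": "ACME"}),
    harmonise("registry_b", {"app_no": "UK-042", "applicant": "Foo Ltd"}),
]
print(rows[1]["application_id"])  # UK-042
```

Once every source is expressed in the common schema, dashboards and analytics can be built once and reused across all sources — which is precisely the added value layered on top of the open data.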
Current Situation and Future Perspectives
As of today, the vision of Datary would be to provide matchmaking services between data providers and
users, without actually owning and accessing the data but simply facilitating the direct transactions
between users – the classic case of a two-sided market. However, this vision is not yet a reality because
of the complexity related to the fragmentation of data protection legislation. What matters for this
analysis is that in this vision, Datary would have no role in manipulating data, hence it would not address
technical barriers to data sharing. Similar positions have been expressed by other marketplaces
interviewed, such as Dawex.
Since data marketplaces aim to be neutral intermediaries for data transactions, providing only
fundamental support services such as secure hosting and payment, it seems unlikely that they could
play an important role in removing the technical barriers to data sharing. One of the reasons could be
that data curation is indeed highly effort-intensive but, according to interviewees, is not recognized by
the market as a valuable service at this stage: data curation is a necessary “back-office” operation but
does not deliver visible business value, and for this reason it is typically “bundled” with more advanced
(and more marketable) analytics services. As such, data curation is not typically offered by data
marketplaces to final users. Instead, there are many dedicated “data wrangling” services being created,
mainly aiming at helping data scientists reduce the costs of data curation. The most recent version of
the “Big Data Landscape” includes a new category of “data transformation” which features 10 dedicated
startups. In conclusion, data interoperability problems are being addressed by a small new category of
specialised intermediaries providing data transformation services directly to final users.
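The core of such a data transformation service is mapping provider-specific records onto a shared schema. The following minimal sketch illustrates the idea; the provider names, field names and values are invented for the example and do not correspond to any actual marketplace.

```python
# Illustrative sketch: normalising heterogeneous records from two hypothetical
# data providers into one common schema, the kind of "data transformation"
# service described above.

def normalise(record: dict, source: str) -> dict:
    """Map a provider-specific record onto a shared schema."""
    if source == "provider_a":   # e.g. {"ts": "2016-05-01", "val": "12.3"}
        return {"date": record["ts"], "value": float(record["val"])}
    if source == "provider_b":   # e.g. {"Date": "2016/05/01", "Reading": 12.3}
        return {"date": record["Date"].replace("/", "-"),
                "value": float(record["Reading"])}
    raise ValueError(f"unknown source: {source}")

raw = [({"ts": "2016-05-01", "val": "12.3"}, "provider_a"),
       ({"Date": "2016/05/02", "Reading": 14.0}, "provider_b")]
clean = [normalise(r, s) for r, s in raw]
# Every record now shares the same keys, so downstream analytics tools
# can consume the whole collection uniformly.
print(clean)
```

In practice this mapping logic is exactly the “back-office” effort discussed above: it must be written and maintained for every new data source, which is why dedicated intermediaries are emerging to provide it at scale.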
Figure 7: Snapshot of the “Big Data Landscape 2016”, infrastructure section.
2.4 Case Study 3: Large manufacturing plants
Background Information
Large manufacturing plants are composed of heavy machinery, nowadays equipped with thousands of
sensors – a combination called cyber-physical systems (CPS). As illustrated in the story on industrial
data spaces9, these machines gather and provide huge quantities of data that are instrumental to
optimizing processes, improving quality and reducing errors, for instance through predictive maintenance,
in the emerging Industry 4.0 ecosystem (see figure 7).
Figure 7: overview of the data flows in industrial plants
The typical value chain is composed of the large companies managing multiple plants (e.g. FIAT), large
and midcap companies manufacturing the machines (e.g. Siemens), and in some cases third-party
data analytics SMEs involved to help make sense of the data (e.g. Worldsensing).
Each machine typically produces data using its own proprietary standards in heterogeneous formats.
Because of the complexity and cross-sector nature of the value chain, there are not yet consolidated
industry standards. While interoperability is important, the interviewee stresses that proprietary
standards are not always just ways to protect competition but can have technological justifications:
specific types of data are best expressed in a specific language. Full interoperability would require, on
the one hand, losing some precious information that can only be expressed in a specific way; on the
other, many more data gathering requirements. A standard able to meet the different data needs of very
different machines delivering different services would require either a simplification of the data gathered
(hence a loss of information) or a multiplication of the data provided by the machine (hence becoming
more resource-intensive in terms of processing power, storage, transport, energy consumption, etc.).
9 http://www.datalandscape.eu/data-driven-stories/facilitating-industry-40-whats-role-industrial-data-platforms
Moreover, one important factor is that large machinery requires multi-million investments and has long
upgrade cycles (typically 10 to 15 years), thus slowing down the introduction of innovative updates.
How Data are managed
Today, the head company typically gathers data from the individual machines through purpose-built
wrappers that ensure syntactic interoperability. In other words, each different machine
requires ad-hoc development in order to extract and integrate the data – in some cases even requiring
ad-hoc hardware adaptors.
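A minimal sketch of this wrapper approach is shown below: each machine type gets its own adapter that translates a proprietary payload into a common record. The machine names and payload formats here are invented purely for illustration; real industrial protocols are far more complex.

```python
# Sketch of the ad-hoc "wrapper" pattern described above: one adapter class
# per machine type, all emitting the same common record schema.

class MachineWrapper:
    def read(self, payload):
        raise NotImplementedError

class PressWrapper(MachineWrapper):
    # hypothetical proprietary format: "TEMP=85;RPM=1200"
    def read(self, payload: str) -> dict:
        fields = dict(item.split("=") for item in payload.split(";"))
        return {"temperature_c": float(fields["TEMP"]),
                "speed_rpm": float(fields["RPM"])}

class LatheWrapper(MachineWrapper):
    # hypothetical proprietary format: comma-separated "85.0,1200"
    def read(self, payload: str) -> dict:
        temp, rpm = payload.split(",")
        return {"temperature_c": float(temp), "speed_rpm": float(rpm)}

wrappers = {"press": PressWrapper(), "lathe": LatheWrapper()}
record = wrappers["press"].read("TEMP=85;RPM=1200")
print(record)  # same schema regardless of the machine's native format
```

The cost structure described in the report follows directly from this pattern: every new machine type means writing, testing and maintaining another wrapper, which a large plant operator can absorb internally but a third-party SME often cannot.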
This does not represent an insurmountable problem, since the industrial plant company is typically large
and can devote internal resources to data integration. In other words, the lack of standards is an additional
cost rather than a barrier per se.
However, this can represent a significant barrier for third-party SMEs, such as a data analytics company
aiming to enter the value chain to provide, for instance, predictive maintenance. Furthermore, ad-hoc
development practices, wrappers and adaptors can represent a barrier for large companies in the case of
major mergers, which require massive investment to integrate the different IT systems.
Future Perspectives
Over time, in response to market dynamics, machinery producers are increasing the openness of their
systems. They are gradually moving towards open standards and using APIs to ensure third party
secure access to data.
Providers such as Siemens are developing their own data platforms (MindSphere) that are supposed to
integrate and manage data from different machine manufacturers through connectors (MindConnect
Nano). Software Development Kits are being developed to allow third-party machine manufacturers to
directly “plug in” their data streams to the cloud platform.
There are also on-going efforts towards establishing industry standards, in particular through the Open
Platform Communications Unified Architecture (OPC UA). As the name indicates, this is a high-level
standard for information architecture. As illustrated in the figure below, OPC UA works on a layer below
vertical industry standards and models, and it is designed to be future-proof thanks to its multi-layered
architecture.
Figure 8: the OPC UA standard
Manufacturing operates more and more through complex value chains, interlocking different sectors
with their own industry standards: consider for instance providers of furniture for boats, of textiles for
cars, of engines for airplanes, or of ICT for all sectors. But these different industries have different
requirements and data management needs: for example, the automotive sector is highly standardized,
the aerospace sector has far higher security requirements than others, and so on. It is probably
impossible to achieve full interoperability between all these sectors but it is also not desirable, because
this would entail loss of detail or increased data requirements. According to the interviewee, there is a
clear trade-off between generic and specific standards, due to the different amount of data required. An
increase in openness and scope of the standard drives a corresponding increase in the amount of data
required, while specific standards require only limited, specific data collection. However, to facilitate
cooperation between different industries, as is necessary in today’s complex value chains, there will be
increasing demand for high-level, open, cross-sector standards providing open collaboration
platforms, without precluding specific solutions for specific processes.
Last but not least, it is clear that progress towards these new developments in manufacturing will take
longer than in other industries, because of the high capital investment and slow upgrading cycle of the
machinery. Despite this relatively slow pace of change, the trend towards greater interoperability and
openness is clear.
2.5 Case Study 4: Yukon Digital
Background Information
The chemical industry is not one of the leading users of big data in Europe, but it has recently been
making considerable progress.
Yukon Digital is a German big data start-up providing analytics services to chemical manufacturing
industries. It was founded in 2014 by two partners, one of whom is a former Vice President at BASF.
Yukon Digital’s client base includes large chemicals and oil and gas companies worldwide. The company
covers different phases of the value chain, from supply chain management to operations, marketing
and sales. The main datasets it uses are generated by sensors on the machinery in industrial plants
and by internal IT systems.
How Data are managed
The client company gathers heterogeneous datasets from different IT systems and provides them to
Yukon. The datasets may include data from sensors in the plant, from internal IT management systems
and from laboratories. Data are neither standardized nor interoperable; they are gathered by the client
company and transmitted to Yukon via traditional means, such as e-mail and cloud storage, and
sometimes analysed on site.
Yukon carries out an intensive effort to harmonise the data and make them usable. This work used to
absorb 90% of project resources; today, thanks to the increasing openness of the systems, it only
requires 50%. Hence, more resources can be devoted to actually analysing the data and providing
added value, rather than curating them.
The effort for data harmonization and cleaning is part of the services provided by the company, but it
certainly raises the costs of big data analytics without producing visible benefits for the client. In a recent
interview, Yukon founder Sean Jones puts it clearly: “connecting different data sources together is still
one of the biggest challenges – whether it is for internal or external processes. Additionally, the data is
oftentimes not just stored in an operational storage but also in laboratory information systems. So
bringing the different data with different formats together and then merging it into a data set that is
adequate for predictive work is one of the biggest challenges.”
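The merging challenge Jones describes can be illustrated with a toy example: joining plant sensor readings with laboratory results that use different identifiers and units. All field names, units and values below are invented for illustration and are not drawn from Yukon’s actual systems.

```python
# Hedged sketch of data harmonisation: sensor data (in Fahrenheit, keyed by
# "batch") and lab data (keyed by "BatchID") are merged into one
# analysis-ready dataset with consistent keys and units.

sensor_rows = [{"batch": "B-001", "temp_f": 212.0},    # plant sensor data
               {"batch": "B-002", "temp_f": 203.0}]
lab_rows = [{"BatchID": "B-001", "purity_pct": 99.1},  # lab information system
            {"BatchID": "B-002", "purity_pct": 98.4}]

def to_celsius(f: float) -> float:
    """Convert Fahrenheit to Celsius, rounded to one decimal."""
    return round((f - 32) * 5 / 9, 1)

# Index the lab rows by batch identifier, then join on the sensor side.
lab_by_batch = {row["BatchID"]: row for row in lab_rows}
merged = [{"batch": s["batch"],
           "temp_c": to_celsius(s["temp_f"]),
           "purity_pct": lab_by_batch[s["batch"]]["purity_pct"]}
          for s in sensor_rows]
print(merged)
```

Multiplied across dozens of source systems, key-name mismatches and unit conversions of this kind are precisely what used to consume 90% of project resources.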
Current Situation and Future Perspectives
There is a clear trend towards greater interoperability of the data provided by the different IT systems
and sensors. Moreover, automatic data curation (powered by machine learning) is expected to
continuously and significantly reduce the costs of data harmonization.
Looking ahead, the benefits of data sharing are becoming more visible to data holders, and the costs of
data curation are going down. This could pave the way for a strong acceleration of innovation, in
particular when it comes to applying machine learning techniques and artificial intelligence to industrial
data.
3 Final Considerations
This report has investigated the technical barriers to data sharing, based on a set of real-life case
studies. The starting point was the need to understand the nature and extent of the problem, and the
possible solutions. The results of this empirical research lead to the following considerations.
3.1 Cross analysis of the cases
The cases show a good variety of solutions to address the technical barriers. In some cases, industry
standards make data sharing seamless (vending machines); in others, sharing requires extensive
additional work (large manufacturing plants).
Overall, we can identify three main categories of solutions: standards, technology and human work. Each
individual project to overcome technical barriers relies on a different combination of these three
dimensions, as illustrated in the radar chart below.
Figure 9: Three ways to overcome technical barriers
In terms of standards, we identified narrow industry standards (Vending Machines) and high-level
architecture standards (OPC UA). These standards extend to the physical connection, which can be
standard (USB stick) or not (as in the case of manufacturing).
With regard to technology, there is increasing use of APIs and SDKs to enable easier data sharing.
Moreover, machine learning is expected to facilitate data curation and reduce its costs over the next few
years.10 Companies such as Trifacta and Paxata provide tools that simplify the work of data scientists:
machine learning is used to automate data preparation in order to find, clean, blend and suggest data.
They make data ready to be used by data scientists through their analysis or visualisation tools.
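One building block of automated data preparation is inferring a likely type for each column so that cleaning steps can be suggested. The toy heuristic below is only an illustration of the general idea; it is not the actual method used by any of the vendors named above.

```python
# Toy sketch of automated data preparation: guess a column's type from its
# string values, so a tool could suggest conversions or flag anomalies.

def infer_type(values):
    """Return 'numeric', 'date' (ISO-like YYYY-MM-DD) or 'text' for a column."""
    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    non_empty = [v for v in values if v != ""]  # blanks are handled separately
    if all(is_number(v) for v in non_empty):
        return "numeric"
    if all(len(v) == 10 and v[4] == "-" and v[7] == "-" for v in non_empty):
        return "date"
    return "text"

column = ["12.5", "7", "", "3.2"]
print(infer_type(column))  # blank cell would then be flagged for cleaning
```

Production tools replace heuristics like this with models trained on many datasets, which is why machine learning is expected to keep driving down the cost of this step.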
Finally, data curation and wrangling can be solved simply by investing human effort. Data marketplaces
and data analytics companies offer this service as part of their work. According to interviewees and
reports, data curation still usually takes between 50% and 80% of data scientists’ work.11
According to a recent survey, data preparation accounts for 79% of data scientists’ time, and they
consider it the least enjoyable part of the work.12 Moreover, it appears that this service is not
10 https://www.oreilly.com/ideas/lessons-from-next-generation-data-wrangling-tools
11 https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=5
12 http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#76f31cf47f75
monetisable as a stand-alone service, as its value is not appreciated by the client, and it needs to be
bundled with the overall service provided. There are also voluntary initiatives to curate data of public
interest, such as the OKFN Open Knowledge Labs, which aims to “Collect and maintain important and
commonly-used (“core”) datasets in high-quality, reliable and easy-to-use form (as Data Packages)”.13
In the future, the solution will most probably come from the interaction of all three dimensions, since
both standards and technology will always require some degree of human intervention. But interviewees
agree that there is a general trend towards greater openness, standard adoption and data sharing
across sectors. Market forces matter: data are becoming more valuable, the cost of data gathering is
decreasing, and customers (especially businesses) expect greater accessibility of the data. Combined
with the technological evolution, this points to a clear reduction of the human effort required for data
curation, freeing up data scientists’ precious resources for analytics.
3.2 Policy Implications
Based on the evidence collected from the case studies and the desk research we can now answer the
key questions initially posed about the technical barriers to data sharing.
First of all, technical barriers are perceived when enterprises start dealing with data sharing in practice
in the context of digital transformation or other digitisation projects, most often requiring collaboration
between different enterprises in a value chain, be it mono-sector or multi-sector. The main issue is the
lack of interoperability between different datasets, preventing the integration and extraction of value
from the data. This issue is becoming more frequent because of the increasing pervasiveness of IoT
sensors generating heterogeneous data flows.
However, industry stakeholders currently perceive lack of interoperability not as a barrier preventing
data sharing, but simply as a problem with various possible solutions depending on the specific situation,
based on some combination of technology, standardization and human work. The evolution of the
market is currently providing these solutions. Specific industry standards are evolving to enable data
sharing, or high level architecture standards are being developed to make data easily accessible and
transferable between operators in the same segment or at industry level. The increasing adoption of
ad-hoc technologies such as APIs and SDKs is ensuring secure third-party access to data and easier
data sharing. What is more, machine learning technologies are expected to enhance data curation
activities, thus facilitating the integration of data from different sources and organizations and reducing
the costs of data sharing. We are also witnessing the emergence of a new category of specialized
intermediaries focused on data transformation to enable data sharing, with several start-ups identified
by the Big Data Landscape 2016 of the European Data Market Monitoring Tool.
Data interoperability problems are easier to solve in single sector value chains (as shown by the N&W
Global Vending case), building on existing industry standards and formats, and more complex in multi-
sector value chains which require brand new approaches. However, even single sector value chains will
likely evolve towards deeper interaction with actors from other sectors, to provide more value added
services (as could be the case with advertising and predictive maintenance for Vending Machines). This
means that new data interoperability issues will emerge and require new solutions.
Overall, interviewees recognise that there is a general trend towards greater openness, standard
adoption and data interoperability across all sectors, but they question the speed of progress, which
remains quite slow. Also, data interoperability problems that are currently manageable are likely to
become more complex and difficult to solve as the data-driven ecosystem grows in depth and
sophistication.
European policies and specifically the DSM strategy already address the development of the main
framework conditions for data sharing, particularly through the forthcoming Free-flow of data initiative
13 http://okfnlabs.org/blog/2015/01/03/data-curators-wanted-for-core-datasets.html
and the ICT standardization priorities action plan Communication14, which singles out standards for
better data interoperability as a priority domain.
The open problems emerging from our analysis are the following:
• The inherent tension between standardization and innovation/personalization. Enabling data
sharing does not mean imposing standards on all datasets, which would result in excessive
rigidity and prevent innovation, but developing a flexible combination of general and specific
standards, interfaces and common formats, while preventing user lock-in caused by
proprietary technologies. Different languages, protocols and models have their advantages in
delivering context-specific data, and variety and differentiation are positive values which should
be preserved. Therefore, ensuring technical data interoperability will be a process accompanying
the evolution of the different sectors and their gradual maturity in digital transformation.
o Here there is a role for policy makers to play, in order to promote this process, ensure a
level playing field between stakeholders and accelerate as much as possible the
emergence of open standards and common formats, especially in the interest of SMEs,
who do not have the market power to drive this process.
• Data curation (preparing data for interoperability and sharing) is a time-intensive activity which
takes more than 50% of data scientists’ time in data sharing projects. It is a necessary
but thankless job and a considerable additional cost, whose value is not sufficiently recognized
by the market. Apparently, this cost is not acting as a barrier: there is no evidence that
companies have given up on data sharing projects because of excessive data curation costs.
However, it tends to be underestimated in the planning phase. Fortunately, this task is becoming
more automated and less expensive. Data curation will always be needed to fill the gap between
different standard environments, and new intermediaries are emerging to provide these services.
o The policy role here could be to make sure that sufficient R&D investments are made
in the automation of data curation (leveraging machine learning, for example), and
also to promote the development of data scientists’ skills in this field. Policy makers
could also raise awareness about the pros and cons of data curation versus
standardization. Enterprises developing a business case for data sharing projects
should make sure that data curation tasks are properly assessed and accounted for,
developing realistic cost-benefit estimates.
• Finally, the creation of neutral B2B data exchanges solving interoperability problems is
potentially welcomed by interviewees, but it is unclear how it could be implemented in practice,
who the main actors could be and what the scope should be in terms of business sectors. This
option could be investigated further, to understand whether such platforms could accelerate
the solution of technical data sharing issues.
14 https://ec.europa.eu/digital-single-market/en/news/communication-ict-standardisation-priorities-digital-single-market