European Data Market SMART 2013/0063 D3.12 Technical Barriers to Data Sharing in Europe
6th January 2017
Author(s) David Osimo (Open Evidence);
Giorgio Micheletti, Gabriella Cattaneo (IDC)
Deliverable D 3.12, Quarterly Stories – Story 12
Date of delivery 6th January 2017
Version 2.0
Addressee officer Katalin IMREI
Policy Officer
European Commission - DG CONNECT
Unit G1 – Data Policy and Innovation
EUFO 1/178, L-2557 Luxembourg/Gasperich
Contract ref. N. 30-CE-0599839/00-39
TABLE OF CONTENTS
EXECUTIVE SUMMARY................................................................................................ 4
1 INTRODUCTION ............................................................................................. 6
1.1 BACKGROUND AND OBJECTIVES ............................................................................ 6
1.2 METHOD .......................................................................................................... 8
1.3 THE STRUCTURE OF THIS DOCUMENT ..................................................................... 8
2 DATA SHARING IN PRACTICE: SELECT CASE STUDIES ................................... 9
2.1 INTRODUCTION .................................................................................................. 9
2.2 CASE STUDY 1: N&W GLOBAL VENDING ............................................................... 10
2.3 CASE STUDY 2: DATARY.IO AND OTHER DATA MARKETPLACES .................................... 13
2.4 CASE STUDY 3: LARGE MANUFACTURING PLANTS .................................................... 16
2.5 CASE STUDY 4: YUKON DIGITAL .......................................................................... 18
3 FINAL CONSIDERATIONS ............................................................................. 20
3.1 CROSS ANALYSIS OF THE CASES .......................................................................... 20
3.2 POLICY IMPLICATIONS ....................................................................................... 21
Table of Tables
Table 1 Cases and key features ...............................................................................................8
Executive Summary
While data sharing within sectors and across sectors is gaining momentum in Europe and elsewhere,
several types of barriers may hinder the effectiveness of data transfer and reduce its value-add potential.
The sharing of datasets from different sources facilitates use and re-use of data by multiple actors and
enables the extraction of value from data, leading to the creation of new services, business models, and
the improvement of processes, an evolution called digital transformation. The diffusion of open platforms
for data sharing and the availability of interoperable datasets is one of the key success factors which
may help to drive the European Data Market towards a High Growth scenario by 2020, and push the
contribution of the data economy to the EU GDP up to 4% (compared to 2.5% in the Baseline scenario,
where data sharing is expected to be more limited).1 However, European industries are far from
exploiting the full value of data sharing. According to IDC’s Digital Transformation Maturityscape
Benchmark, the majority of European enterprises (62%) are at the “digital explorer” or “digital
player” maturity stages, characterized by an opportunistic approach to digital transformation not yet fully
integrated across the organization, while 20% are “digital resisters” refusing to engage in this digital
innovation. IDC therefore estimates that the majority of European enterprises have not yet fully
implemented data sharing but are interested in doing so and are dealing, or starting to deal, with data
sharing issues.
There are three main typologies of data sharing barriers emerging from the analysis of the data market:
cultural/organizational (lack of awareness of potential benefits, lack of trust and fear of competition);
legal/regulatory (restrictions on data location and on the free flow of data, uncertainty about data
ownership and data access); and technical/operational barriers due to the lack of interoperability
between different datasets, lack of standards, and the high costs of curating data to adapt it for sharing. Data
curation costs, in particular, are estimated to represent more than 50% of the time of data scientists in
data sharing projects.
Based on a set of real-life case studies from the manufacturing industry, the ICT sector and the chemical
industry, this paper focuses on interoperability as the key technical barrier to data sharing and
investigates the issue across different value chains’ players: machinery & equipment providers,
manufacturers (industrial plants), manufacturers’ clients, and ICT companies providing ancillary services
(big data start-ups providing data analytics and data marketplaces matchmaking between data supply
and data demand). Our analysis, backed by ad-hoc secondary research and in-depth expert interviews,
reveals that interoperability and technical issues are perceived by stakeholders not as a barrier
preventing data sharing, but simply as an additional cost and a problem with various possible solutions
depending on the specific situation, based on some combination of technology, standardization and
human work. The evolution of the market is currently providing these solutions. Narrow industry
standards, for example, or high level architecture standards can make data easily accessible and
transferrable between operators in the same segment or at industry level. The increasing adoption of
ad-hoc technologies such as APIs and SDKs may ensure secure third-party access to data and enable
easier data sharing. What is more, machine learning technologies are expected to enhance data
curation activities, thus facilitating the integration of data from different sources and organizations and
reducing the time needed by data scientists for this thankless task. Finally,
interviewees agree that there is a general trend towards greater openness and data sharing across
sectors. Market forces matter: data are becoming more valuable, the cost of data gathering decreases,
and customers (especially business) expect greater accessibility of the data. We are also witnessing the
emergence of a new category of specialized intermediaries focused on data transformation and curation
to enable data sharing, with several start-ups identified by the Big Data Landscape 2016 of the European
Data Market Monitoring Tool.
1 As reported in the EDM Final Study Report, December 2016, by IDC and Open Evidence
While the evolution towards greater data sharing is unanimously recognised, according to our
interviewees more needs to be done to accelerate this trend. Also, data interoperability problems which
are currently manageable are likely to become more complex and difficult to solve as the data-driven
ecosystem grows in depth and sophistication, for example through the emergence of Industry 4.0
ecosystems and the diffusion of Internet of Things (IoT) networks of sensors.
European policies and specifically the DSM strategy already address the development of the main
framework conditions for data sharing, particularly through the forthcoming Free-flow of data initiative
and the ICT standardization priorities action plan Communication2, which singles out standards for better
data interoperability as a priority domain.
The open problems emerging from our analysis are the following:
The inherent tension between standardization and innovation/personalization. Enabling data
sharing does not mean imposing standards on all datasets, which would result in excessive
rigidity and prevent innovation, but developing a flexible combination of general and specific
standards, interfaces and common formats, while preventing user lock-in caused by
proprietary technologies.
o Here there is a role to play for policy makers, in order to promote this process, ensure a
level playing field among stakeholders and accelerate as much as possible the
emergence of open standards and common formats, especially in the interest of SMEs,
who do not have the market power to drive this process.
Data curation (preparing data for interoperability and sharing) is a time-intensive activity and an
additional cost, but it is not acting as a barrier: there is no evidence that companies have given
up on data sharing projects because of excessive data curation costs. Fortunately, this task is
becoming more automated and less expensive.
o The policy role here could be to make sure that sufficient R&D investments are made
into the automation of data curation (leveraging machine learning for example), and
also to promote the development of data scientists’ skills in this field. Policy makers
could also raise awareness about the pros and cons of data curation versus
standardization. Enterprises developing a business case for data sharing projects
should make sure that data curation tasks are properly assessed and accounted for,
developing realistic cost-benefits estimates.
Finally, the creation of neutral B2B data exchanges solving interoperability problems is
potentially welcomed by interviewees, but it is unclear how it could be implemented in practice.
This could be an option to be investigated, to understand whether such platforms could be a way to
accelerate the resolution of technical data sharing issues.
2 https://ec.europa.eu/digital-single-market/en/news/communication-ict-standardisation-priorities-digital-single-market
1 Introduction
1.1 Background and Objectives
The main objective of this report is to investigate the technical barriers to data sharing in Europe in the
context of the European Data Market and its evolution towards an ever-growing data-driven economy.
The report is one of a series of in-depth analyses focusing on the development of the data-driven
economy in Europe based on specific case studies by sector and/or by technology. It constitutes the
deliverable D3.12 of the study “European Data Market”, SMART 2013/0063 entrusted to IDC and Open
Evidence by the European Commission, DG Connect, Unit G1 – Data Policy and Innovation.
Data sharing is a key enabling factor for the development of the European digital economy. The sharing
of datasets from different sources facilitates use and re-use of data by multiple actors and enables the
extraction of value from data, leading to the creation of new services, business models, and the
improvement of processes, an evolution called digital transformation. The diffusion of open platforms for
data sharing and the availability of interoperable datasets is one of the key success factors which may
help to drive the European Data Market towards a High Growth scenario by 2020, and push the
contribution of the data economy to the EU GDP up to 4% (compared to 2.5% in the Baseline scenario,
where data sharing is expected to be more limited).3 Conversely, relevant barriers to data sharing are
one of the factors leading to the slower growth Challenge scenario. However, data sharing across
companies is still limited and far from its full potential.
Data sharing is a natural consequence of the new wave of technology innovation combining the Internet
of Things (IoT), 5G, cloud computing, cognitive systems and robotics, and of course advanced data
analytics, which is transforming global value chains. Recent studies4 estimate that digitisation of
products and services could add more than 110 B€ of revenue for industry per year in Europe in the
next 5 years, as underlined in the EC Communication “Digitising European Industry”5. According to IDC
estimates, already in 2017 one third of the Global 2000 companies will achieve revenue growth from
information-based products and services at twice the rate of the rest of their offering
portfolio.6
It is increasingly clear that digital transformation is the new competitive must for businesses. IDC defines
it as a continuous process by which enterprises adapt to or drive disruptive changes in their customers
and markets (external ecosystem) by leveraging digital competencies to innovate new business
models, products, and services that seamlessly blend digital and physical experiences. However,
according to IDC’s Digital Transformation Maturityscape Benchmark, based on a yearly survey of
worldwide medium-large enterprises, the European industry is only at the start of this transformative
journey. Last year, approximately 20% of European surveyed enterprises were “Digital resisters”, not
yet engaging in digital transformation, while only 5% were “Digital Disrupters”, already exploiting it fully.
The majority of European enterprises (62%) are at the “digital explorer” or “digital player” maturity
stages, characterized by an opportunistic approach to digital transformation not yet fully integrated
across the organization. Based on this IDC benchmark, we estimate that at least 60-70% of European
enterprises are interested in the exploitation of multiple datasets within their digital transformation
processes and are already dealing with data sharing issues. The increasing demand for Big Data
3 As reported in the European Data Market Study’s Final Study Report, December 2016, by IDC and Open Evidence (SMART 2013/0063)
4 PwC, Opportunities and Challenges of the Industrial Internet (2015), and Boston Consulting Group, The Future of Productivity and Growth in Manufacturing Industries (2015)
5 Brussels, 19.4.2016 COM(2016) 180 final
6 IDC FutureScape: Worldwide Analytics, Cognitive/AI, and Big Data 2017 Predictions. IDC, November 2016
software and services (forecast by IDC to grow at a Compound Annual Growth Rate of 21.9% from 2015
to 2018, doubling the value of this market by 2018) confirms this trend.
Within this context, it is relevant to understand to what extent barriers to data sharing may play a role in
constraining the European industry journey towards digital transformation. There are three main
typologies of data sharing barriers emerging from the analysis of the data market:
The first is cultural/organisational: it includes lack of awareness of the potential business
benefits of data sharing, lack of trust and therefore unwillingness to share a company’s own
data for fear of losing a competitive advantage (sometimes even between departments of the
same company), the difficulty of assessing the value of data assets, and the lack of an internal
data supply chain making data available for sharing and exploitation.
The second type of barriers concerns legal/regulatory factors such as uncertainty on data
ownership and access to data, undue restrictions on data location and the free-flow of data,
particularly across international borders.
The third type concerns technical/operational barriers due to lack of interoperability between
different datasets and information systems, lack of standards or incompatible standards, and/or
high costs of data curation of different typologies of data which may undermine the business
case for data sharing.
Cultural/organizational barriers are particularly relevant for SMEs and are usually the first which need
to be overcome before enterprises engage in digital transformation and plan data sharing activities.
Legal/regulatory barriers have been analysed in-depth and their removal represents a key goal of EU
policies. The Digital Single Market (DSM) strategy has recognized the relevance of this issue in its action
for a European ‘Free flow of data’ initiative, which will tackle unjustified restrictions on data location and
the free movement of data (for reasons other than the protection of personal data within the EU), but
will also address “the emerging issues of ownership, interoperability, usability and access to data in
situations such as business-to-business, business to consumer, machine generated and machine-to-
machine data”7.
Technical barriers, which are the focus of this report, tend to be overlooked until enterprises engage in
concrete data exploitation activities and/or develop a proper data value strategy. The situation is in rapid
evolution. The need to solve technical issues is likely to become more relevant as the number of
enterprises engaging in digital transformation increases, as indicated above. Technology itself is
opening new perspectives; for example, the emergence of new distributed architectures or machine
learning technologies may offer alternative and more efficient ways to meet the challenges of
interoperability but also of trust and data ownership/usage. From the point of view of standardization,
on the one hand the existence of incompatible industry standards represents an issue; on the other
hand, the lack of standards is offering new business opportunities for new intermediaries such as data
companies and data marketplaces offering data curation services (to make datasets interoperable).
This report examines the relevance and potential solutions of technical barriers to data sharing, with the
objective to answer the following questions:
To what extent is lack of data interoperability an important barrier to data sharing?
Is it a problem within a sector, or rather between sectors?
Are standards the solution? Are there differences between sectors in terms of standards?
What are the consequences of the problem, in terms of lack of innovation etc. (if any)?
This report is based on desk research and a selected number of real-life use cases, with the objective
of providing evidence-based insights on the relevance of technical barriers, forthcoming developments,
potential solutions and policy implications.
7 http://ec.europa.eu/smart-regulation/roadmaps/docs/2016_cnect_001_free_flow_data_en.pdf
1.2 Method
The method includes both primary and secondary research. In the initial phase, desk research was
carried out in order to frame the overarching question underpinning this research and provide a general
background to the problem, including a review of the evidence from previous stories produced by this
study. Primary research in the form of one-to-one, in-depth interviews was also carried out and was
followed by desk research in order to enrich and complete the information obtained through the
interviews.
Four real-life cases were investigated, covering different typologies of data sharing:
Data sharing between several players in the same value chain;
Data sharing across multiple sectors or within a specific sector;
Data sharing with and without standards;
Data sharing applied to different types of data (sensor generated and IT generated).
Each case included an interview with a senior referent from the company that was featured in the real-
life case (founder, CEO or senior manager).
Table 1 Cases and key features
Cases                      Sector                    Open standards   Type of data      Geographical scope
Large scale manufacturing  Manufacturing             N                Sensor            Europe
N&W Global Vending         Manufacturing             Y                Sensor            Italy-Denmark
Yukon Digital              Analytics for Chemicals   N                Different types   Germany
Datary.io                  Data marketplaces         Y                Different types   Spain-UK
1.3 The Structure of this Document
The document is structured along three main sections:
The first section introduces the main purpose and methodology of this story;
The second introduces briefly the concept of data interoperability and presents the case studies;
The third integrates the research results to provide a summary overview, as well as to draw the final
considerations and conclusions of the story.
2 Data Sharing in Practice: Select Case Studies
2.1 Introduction
Digital transformation has enhanced the awareness of a new wave of opportunities around data, driving
organizations of all sizes and shapes to extract more value from data and unlock additional revenues
from them. For this to happen, data in all formats need to be accessed, retained, disposed of and
exchanged. IDC predicts that by 2020, 66% of all medium-large enterprises worldwide will implement
some sort of advanced classification solution to automate access, retention, and disposition of
unstructured content, so as to manage the information generated within the company, but also between
organizations and across sectors throughout the different segments of an industry value chain.
The case studies illustrated below have been selected to cover a wide range of possible situations. In
particular, they cover different segments of the value chain. Taking industrial plants as a central point,
we can envisage different technical barriers to data sharing:
- Between the producer of the machinery (e.g. Siemens) and the manufacturing company running
the industrial plant (e.g. BASF, N&W Global Vending);
- Between the manufacturing company and its clients (e.g. the operators of N&W vending
machines);
- Between the manufacturing company and the big data start-ups (e.g. Yukon Digital);
- Between the manufacturing company and the data marketplace (e.g. Datary.io).
While the cases are not directly related to each other, they cover different areas and relations in the
value chain.
Figure 1: Identification of technical barriers alongside the value chain
2.2 Case Study 1: N&W Global Vending
Background Information
N&W Global Vending is the global market leader in manufacturing vending machines. The sector is
more important than one might expect: there are today approximately 3.8 million vending machines in
Europe, the industry directly employs 85,000 people across 10,000 companies, and the annual turnover
is around €14.6 billion.
The value chain mainly includes the following types of players:
The machine’s manufacturer (such as N&W);
The machine’s operators. The operators purchase the machine and operate it. They take care
of technical maintenance and filling the machines with products;
The final client, i.e. owner of the office which contracts the operator;
The services’ payment operators (such as Nayax). They manage payments and offer the
possibility of multiple methods.
Additional players include third party service providers of Vending Machine Management solutions for
operators, and also app developers.
Figure 2: the value chain of vending machines
How Data are managed
A modern vending machine is equipped with sensors to measure its functioning, such as proximity,
temperature and humidity. It includes firmware to manage its key functions, and produces lots of data
related to both its functioning and the products it sells.
Data are necessary to operators in order to:
- Detect possible malfunctioning in the machine;
- Manage products and refilling.
The operator typically visits the machine periodically and downloads data onto a mobile or USB device.
This is therefore a time-intensive service.
(Figure 2 depicts the chain: Manufacturer, e.g. N&W → Operator → Client, supported by service providers, e.g. Digiso, and payment operators, e.g. Nayax.)
Figure 3: data flows in vending machines
Alternatively, telemetry services enable remote monitoring and management through specialized
software solutions. The operator can regularly monitor both the technical functioning and the product
consumption. It can also in some cases change remotely some parameters, such as prices. Today’s
vending machines include touchscreen that display useful information to the public that can be modified
remotely.
The operator can access the data directly from the vending machine, through a USB port, or via
telemetry. The machine is equipped with its own proprietary firmware, but the data are made freely
available using the standard data protocol from the industry, EVA Data Transfer Standard, based on
two protocols DDCMP and DEX/UCS. The operator has full access to the data, while the manufacturer
typically does not access them unless they are provided via telemetry.
The accessibility of the data is considered a fundamental service for the operator, who is in charge
of the maintenance of the machine. There is a competitive market of apps and software to manage and
analyse these data. N&W also offers its own app, BlueRed, to control the machines (see screenshot
below), but this is just one of many possibilities.
Figure 4. Example of app to manage vending machines.
The data produced by the machine are generated by many different internal sensors, measuring, for
instance, proximity, temperature and humidity. The data are extracted and processed directly by the
firmware and exposed using industry standards. The firmware therefore is programmed in order to
extract the data from the chosen sensors directly on the hardware.
According to the interviewee, the standard, however, covers only a minor part of the total data produced
by a machine. The rest of the data are managed through proprietary standards.
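The data flow just described — firmware packaging sensor readings into asterisk-delimited EVA DTS (DEX/UCS-style) audit records that operator software then parses — can be sketched as follows. This is a simplified illustration: the record tags and field layout are reduced to a minimum and do not reproduce the full EVA DTS specification.

```python
# Hedged sketch of parsing a simplified DEX/UCS-style audit dump.
# Record tags (PA1, PA2) and field positions are illustrative only.

def parse_dex(dump: str) -> dict:
    """Collect per-product price and vend counts from asterisk-delimited records."""
    products = {}
    last_id = None
    for line in dump.strip().splitlines():
        fields = line.strip().split("*")
        tag = fields[0]
        if tag == "PA1":                    # product setup: id, price in cents
            last_id = fields[1]
            products[last_id] = {"price": int(fields[2]), "vends": 0}
        elif tag == "PA2" and last_id:      # product sales: vend count, value
            products[last_id]["vends"] = int(fields[1])
    return products

dump = """DXS*WGV0001*VA*V1/1*1
PA1*001*120
PA2*57*6840
PA1*002*150
PA2*31*4650"""

print(parse_dex(dump)["001"]["vends"])  # 57
```

In a real deployment the dump would arrive over the machine’s USB port or a telemetry link, and the parser would follow the complete EVA DTS record catalogue rather than this simplified layout.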
Current Situation and Future Perspectives
Today, there is little problem or discussion about the interoperability of data. Data are provided using
the standard format, and the operator can easily access and manage them, generally by plugging into
a USB port on the vending machine. The data are easily managed through a wealth of software and
service providers. The standards are defined at the industry level, and there is little interaction with
players outside the industry.
However, the future looks more unstable and challenging for two reasons. The first is the increasing
reliance on telemetry, which allows remote monitoring and in some cases also modification of the
machine (e.g. changing the prices). Moving from “read” to “write” will require significant adjustment of
processes and could pave the way to greater interaction with players outside the industry (e.g.
advertising agencies interested in using the displays to provide personalized advertising).
Secondly, predictive maintenance is expected to have an impact. Currently, maintenance is provided
directly by the operator who has access to data. But in the future, as we are seeing in other sectors,
manufacturers might consider data-driven predictive maintenance as a business opportunity, in order to
adapt their business models and provide a service measured in products sold by the machines (“vending
machine by the product sold”) just as, for example, Rolls Royce provides “engines by flight hours”
services. This could potentially generate data interoperability issues, since only part of the data is
covered by the industry standard EVA DTS: for instance, EVA DTS does not cover data about speed of
heating (derived from sensors) which could show calcification and hence be used for predictive
maintenance. More generally, any form of predictive maintenance requires the widest range of datasets
from the highest number of sensors in machines, and because of the increasing number of those
sensors, there will always be a gap between the data produced by the machine and the data
requirements included in the standards. So while standards help resolve the problems, they cannot
provide the full solution, especially in the medium term if predictive maintenance of vending machines
becomes a competitive market. Therefore, data interoperability is not a problem now but could
become one for innovative data-driven services in the industry.
2.3 Case Study 2: Datary.io and other data marketplaces
Background Information
Data marketplaces, as defined by IDC within the EDM taxonomy, are “a third party, cloud-based
software platform providing Internet access to a disparate set of external data sources for use in IT
systems by business, government or non-profit organizations. The marketplace operator will manage
payment mechanisms to reimburse each dataset owner/provider for data use, as necessary. Optionally,
the marketplace provider may provide access to analysis tools that can operate on the data.”8
The initial hypothesis is that data marketplaces could potentially play a big role in removing technical
barriers, by providing data aggregation and curation services that manage to ensure interoperability.
Datary is one of these data marketplaces, fostering matchmaking between data supply and demand.
According to its presentation, Datary “connects data providers with data consumers (consultants,
analysts ...) through a marketplace that enables integration with business analytics tools. In addition, it
offers version control in the cloud and the ability to synchronize data in Datary with any work tool (either
Excel or more advanced tools such as Pentaho or CartoDB).”
How Data are managed
Currently Datary provides both a pure data matchmaking service, enabling easy and intuitive data
sharing, and curated data services. Because of the fragmented regulation on personal data protection,
the current activity is focused on the latter, aggregating data from open public sources and providing
them through advanced analytics.
For instance, one of the main products is devoted to managing intellectual property information. Datary
aggregates information from open government data, and provides analytical tools that add value to the
information for the final user.
The data are freely available on the government website in open standard format (PDF, XML, HTML).
Figure 5: screenshot of open data on IPR
8 See also the Story report on Data Marketplaces: http://www.datalandscape.eu/data-driven-stories/europe%E2%80%99s-data-marketplaces-%E2%80%93-current-status-and-future-perspectives
Figure 6: screenshot of datary.io output
Datary aggregates these data and provides customized dashboards and analytics. As such, Datary does
not intervene in removing technical barriers related to interoperability, but in providing customers with
more usable and powerful access to the data. The data (in this case, IP data) are already published in
standard, open and machine-readable formats. Datary aggregates data from different public sources in
order to ensure common data structures, and provides advanced analytical tools to make sense of these
data. In other words, it curates and provides added value on top of the data but does not remove
technical barriers as such.
This focus on curating open government data has been mentioned recurrently by other marketplaces
presented in the previous story.
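The aggregation step described above — mapping the field names of heterogeneous public sources onto one common structure before analytics are applied — can be sketched as follows. The source names and field names are invented for illustration (the real sources publish IP data in formats such as PDF, XML and HTML):

```python
# Hedged sketch: harmonising records from two hypothetical open-data
# sources into a common schema, tagging each row with its provenance.

MAPPINGS = {
    "registry_a": {"NumSolicitud": "application_id", "Titular": "holder"},
    "registry_b": {"app_no": "application_id", "applicant": "holder"},
}

def harmonise(source: str, record: dict) -> dict:
    """Rename a record's native fields to the common schema."""
    mapping = MAPPINGS[source]
    out = {common: record[native] for native, common in mapping.items()}
    out["source"] = source          # keep provenance for traceability
    return out

rows = [
    harmonise("registry_a", {"NumSolicitud": "ES-001", "Titular": "ACME"}),
    harmonise("registry_b", {"app_no": "UK-042", "applicant": "Foo Ltd"}),
]
print(rows[1]["application_id"])  # UK-042
```

Once every source is expressed in the common schema, dashboards and analytics can be built once and reused across all sources — which is precisely the added value layered on top of the open data.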
Current Situation and Future Perspectives
As of today, the vision of Datary would be to provide matchmaking services between data providers and
users, without actually owning and accessing the data but simply facilitating the direct transactions
between users – the classic case of a two-sided market. However, this vision is not yet a reality because
of the complexity related to the fragmentation of data protection legislation. What matters for this
analysis is that in this vision, Datary would have no role in manipulating data, hence it would not address
technical barriers to data sharing. Similar positions have been expressed by other marketplaces
interviewed, such as Dawex.
Since data marketplaces aim to be neutral intermediaries for data transactions, providing only
fundamental support services such as secure hosting and payment, it seems unlikely that they could
play an important role in removing the technical barriers to data sharing. One of the reasons could be
that data curation is indeed highly effort-intensive but, according to interviewees, is not recognized by
the market as a valuable service at this stage: data curation is a necessary “back-office” operation but
does not deliver visible business value, and for this reason it is typically “bundled” with more advanced
(and more marketable) analytics services. As such, data curation is not typically offered by data
marketplaces to final users. Instead, there are many dedicated “data wrangling” services being created,
mainly aiming at helping data scientists reduce the costs of data curation. The most recent version of
the “Big Data Landscape” includes a new category of “data transformation” which features 10 dedicated
startups. In conclusion, data interoperability problems are being addressed by a small new category of
specialised intermediaries providing data transformation services directly to final users.
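The core of such a data transformation service is mapping provider-specific records onto a shared schema. The following minimal sketch illustrates the idea; the provider names, field names and values are invented for the example and do not correspond to any actual marketplace.

```python
# Illustrative sketch: normalising heterogeneous records from two hypothetical
# data providers into one common schema, the kind of "data transformation"
# service described above.

def normalise(record: dict, source: str) -> dict:
    """Map a provider-specific record onto a shared schema."""
    if source == "provider_a":   # e.g. {"ts": "2016-05-01", "val": "12.3"}
        return {"date": record["ts"], "value": float(record["val"])}
    if source == "provider_b":   # e.g. {"Date": "2016/05/01", "Reading": 12.3}
        return {"date": record["Date"].replace("/", "-"),
                "value": float(record["Reading"])}
    raise ValueError(f"unknown source: {source}")

raw = [({"ts": "2016-05-01", "val": "12.3"}, "provider_a"),
       ({"Date": "2016/05/02", "Reading": 14.0}, "provider_b")]
clean = [normalise(r, s) for r, s in raw]
# Every record now shares the same keys, so downstream analytics tools
# can consume the whole collection uniformly.
print(clean)
```

In practice this mapping logic is exactly the “back-office” effort discussed above: it must be written and maintained for every new data source, which is why dedicated intermediaries are emerging to provide it at scale.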
Figure 7: Snapshot of the “Big Data Landscape 2016”, infrastructure section.
2.4 Case Study 3: Large manufacturing plants
Background Information
Large manufacturing plants are composed of heavy machinery, nowadays equipped with thousands of
sensors – a combination called cyber-physical systems (CPS). As illustrated in the story on industrial
data spaces9, these machines gather and provide huge quantities of data that are instrumental to
optimizing processes, improving quality and reducing errors, for instance through predictive maintenance,
in the emerging Industry 4.0 ecosystem (see figure 7).
Figure 7: overview of the data flows in industrial plants
The typical value chain is composed of the large companies managing multiple plants (e.g. FIAT), large
and midcap companies manufacturing the machines (e.g. Siemens), and in some cases third-party
data analytics SMEs involved to help make sense of the data (e.g. Worldsensing).
Each machine typically produces data using its own proprietary standards in heterogeneous formats.
Because of the complexity and cross-sector nature of the value chain, there are not yet consolidated
industry standards. While interoperability is important, the interviewee stresses that proprietary
standards are not always just ways to protect competition but can have technological justifications:
specific types of data are best expressed in a specific language. Full interoperability would require, on
the one hand, losing some precious information that can only be expressed in a specific way; on the
other, many more data gathering requirements. A standard able to meet the different data needs of very
different machines delivering different services would require either a simplification of the data gathered
(hence a loss of information) or a multiplication of the data provided by the machine (hence becoming
more resource-intensive in terms of processing power, storage, transport, energy consumption, etc.).
9 http://www.datalandscape.eu/data-driven-stories/facilitating-industry-40-whats-role-industrial-data-platforms
Moreover, one important factor is that large machinery requires multi-million investments and has long
upgrade cycles (typically 10 to 15 years), thus slowing down the introduction of innovative updates.
How Data are managed
Today, the head company typically gathers data from the individual machines through purpose-built
wrappers that ensure syntactic interoperability. In other words, each different machine
requires ad-hoc development in order to extract and integrate the data – in some cases even requiring
ad-hoc hardware adaptors.
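A minimal sketch of this wrapper approach is shown below: each machine type gets its own adapter that translates a proprietary payload into a common record. The machine names and payload formats here are invented purely for illustration; real industrial protocols are far more complex.

```python
# Sketch of the ad-hoc "wrapper" pattern described above: one adapter class
# per machine type, all emitting the same common record schema.

class MachineWrapper:
    def read(self, payload):
        raise NotImplementedError

class PressWrapper(MachineWrapper):
    # hypothetical proprietary format: "TEMP=85;RPM=1200"
    def read(self, payload: str) -> dict:
        fields = dict(item.split("=") for item in payload.split(";"))
        return {"temperature_c": float(fields["TEMP"]),
                "speed_rpm": float(fields["RPM"])}

class LatheWrapper(MachineWrapper):
    # hypothetical proprietary format: comma-separated "85.0,1200"
    def read(self, payload: str) -> dict:
        temp, rpm = payload.split(",")
        return {"temperature_c": float(temp), "speed_rpm": float(rpm)}

wrappers = {"press": PressWrapper(), "lathe": LatheWrapper()}
record = wrappers["press"].read("TEMP=85;RPM=1200")
print(record)  # same schema regardless of the machine's native format
```

The cost structure described in the report follows directly from this pattern: every new machine type means writing, testing and maintaining another wrapper, which a large plant operator can absorb internally but a third-party SME often cannot.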
This does not represent an insurmountable problem, since the industrial plant company is typically large
and can devote internal resources to data integration. In other words, the lack of standards is an additional
cost rather than a barrier per se.
However, this can represent a significant barrier for third-party SMEs, such as a data analytics company
aiming to enter the value chain to provide, for instance, predictive maintenance. Furthermore, ad-hoc
development practices, wrappers and adaptors can represent a barrier for large companies in the case of
major mergers, which require massive investment to integrate the different IT systems.
Future Perspectives
Over time, in response to market dynamics, machinery producers are increasing the openness of their
systems. They are gradually moving towards open standards and using APIs to ensure third party
secure access to data.
Providers such as Siemens are developing their own data platforms (MindSphere) that are supposed to
integrate and manage data from different machine manufacturers through connectors (MindConnect
Nano). Software Development Kits are being developed to allow third-party machine manufacturers to
directly “plug in” their data streams to the cloud platform.
There are also on-going efforts towards establishing industry standards, in particular through the Open
Platform Communications Unified Architecture (OPC UA). As the name indicates, this is a high-level
standard for information architecture. As illustrated in the figure below, OPC UA works on a layer below
vertical industry standards and models, and it is designed to be future-proof thanks to its multi-layered
architecture.
Figure 8: the OPC UA standard
Manufacturing operates more and more through complex value chains, interlocking different sectors
with their own industry standards: consider for instance providers of furniture for boats, of textiles for
cars, of engines for airplanes, or of ICT for all sectors. But these different industries have different
requirements and data management needs: for example, the automotive sector is highly standardized,
the aerospace sector has far higher security requirements than others, and so on. It is probably
impossible to achieve full interoperability between all these sectors but it is also not desirable, because
this would entail loss of detail or increased data requirements. According to the interviewee, there is a
clear trade-off between generic and specific standards, due to the different amount of data required. An
increase in openness and scope of the standard drives a corresponding increase in the amount of data
required, while specific standards require only limited, specific data collection. However, to facilitate
cooperation between different industries, as is necessary in today’s complex value chains, there will be
increasing demand for high-level, open, cross-sector standards providing open collaboration
platforms, without precluding specific solutions for specific processes.
Last but not least, it is clear that progress towards these new developments in manufacturing will take
longer than in other industries, because of the high capital investment and slow upgrading cycle of the
machinery. Despite this relatively slow pace of change, the trend towards greater interoperability and
openness is clear.
2.5 Case Study 4: Yukon Digital
Background Information
The chemical industry is not one of the leading users of big data in Europe, but it has recently been
making considerable progress.
Yukon Digital is a German big data start-up providing analytics services to chemical manufacturing
industries. It was founded in 2014 by two partners, one of whom is a former Vice President at BASF.
Yukon Digital’s client base includes large chemicals and oil and gas companies worldwide. The company
covers different phases of the value chain, from supply chain management to operations, marketing
and sales. The main datasets it uses are generated by sensors on the machinery in industrial plants
and by internal IT systems.
How Data are managed
The client company gathers heterogeneous datasets from different IT systems and provides them to
Yukon. The datasets may include data from sensors in the plant, from internal IT management systems
and from laboratories. Data are neither standardized nor interoperable; they are gathered by the client
company and transmitted to Yukon via traditional means, such as e-mail and cloud storage, and
sometimes analysed on site.
Yukon carries out an intensive effort to harmonise the data and make them usable. This work used to
absorb 90% of project resources; today, thanks to the increasing openness of the systems, it only
requires 50%. Hence, more resources can be devoted to actually analysing the data and providing
added value, rather than curating them.
The effort for data harmonization and cleaning is part of the services provided by the company, but it
certainly raises the costs of big data analytics without producing visible benefits for the client. In a recent
interview, Yukon founder Sean Jones puts it clearly: “connecting different data sources together is still
one of the biggest challenges – whether it is for internal or external processes. Additionally, the data is
oftentimes not just stored in an operational storage but also in laboratory information systems. So
bringing the different data with different formats together and then merging it into a data set that is
adequate for predictive work is one of the biggest challenges.”
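The merging challenge Jones describes can be illustrated with a toy example: joining plant sensor readings with laboratory results that use different identifiers and units. All field names, units and values below are invented for illustration and are not drawn from Yukon’s actual systems.

```python
# Hedged sketch of data harmonisation: sensor data (in Fahrenheit, keyed by
# "batch") and lab data (keyed by "BatchID") are merged into one
# analysis-ready dataset with consistent keys and units.

sensor_rows = [{"batch": "B-001", "temp_f": 212.0},    # plant sensor data
               {"batch": "B-002", "temp_f": 203.0}]
lab_rows = [{"BatchID": "B-001", "purity_pct": 99.1},  # lab information system
            {"BatchID": "B-002", "purity_pct": 98.4}]

def to_celsius(f: float) -> float:
    """Convert Fahrenheit to Celsius, rounded to one decimal."""
    return round((f - 32) * 5 / 9, 1)

# Index the lab rows by batch identifier, then join on the sensor side.
lab_by_batch = {row["BatchID"]: row for row in lab_rows}
merged = [{"batch": s["batch"],
           "temp_c": to_celsius(s["temp_f"]),
           "purity_pct": lab_by_batch[s["batch"]]["purity_pct"]}
          for s in sensor_rows]
print(merged)
```

Multiplied across dozens of source systems, key-name mismatches and unit conversions of this kind are precisely what used to consume 90% of project resources.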
Current Situation and Future Perspectives
There is a clear trend towards greater interoperability of the data provided by the different IT systems
and sensors. Moreover, automatic data curation (powered by machine learning) is expected to
continuously and significantly reduce the costs of data harmonization.
Looking ahead, the benefits of data sharing are becoming more visible to data holders, and the costs of
data curation are going down. This could pave the way for a strong acceleration of innovation, in
particular when it comes to applying machine learning techniques and artificial intelligence to industrial
data.
3 Final Considerations
This report has investigated the technical barriers to data sharing, based on a set of real-life case
studies. The starting point was the need to understand the nature and extent of the problem, and the
possible solutions. The results of this empirical research lead to the following considerations.
3.1 Cross analysis of the cases
The cases show a good variety of solutions to address the technical barriers. In some cases, industry
standards make data sharing seamless (vending machines); in others, sharing requires extensive
additional work (large manufacturing plants).
Overall, we can identify three main categories of solutions: standards, technology and human work. Each
individual project to overcome technical barriers relies on a different combination of these three
dimensions, as illustrated in the radar chart below.
Figure 9: Three ways to overcome technical barriers
In terms of standards, we identified narrow industry standards (Vending Machines) and high-level
architecture standards (OPC UA). These standards extend to the physical connection, which can be
standard (USB stick) or not (as in the case of manufacturing).
With regard to technology, there is increasing use of APIs and SDKs to enable easier data sharing.
Moreover, machine learning is expected to facilitate data curation and reduce its costs over the next few
years.10 Companies such as Trifacta and Paxata provide tools that simplify the work of data scientists:
machine learning is used to automate data preparation in order to find, clean, blend and suggest data.
They make data ready to be used by data scientists through their analysis or visualisation tools.
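One building block of automated data preparation is inferring a likely type for each column so that cleaning steps can be suggested. The toy heuristic below is only an illustration of the general idea; it is not the actual method used by any of the vendors named above.

```python
# Toy sketch of automated data preparation: guess a column's type from its
# string values, so a tool could suggest conversions or flag anomalies.

def infer_type(values):
    """Return 'numeric', 'date' (ISO-like YYYY-MM-DD) or 'text' for a column."""
    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    non_empty = [v for v in values if v != ""]  # blanks are handled separately
    if all(is_number(v) for v in non_empty):
        return "numeric"
    if all(len(v) == 10 and v[4] == "-" and v[7] == "-" for v in non_empty):
        return "date"
    return "text"

column = ["12.5", "7", "", "3.2"]
print(infer_type(column))  # blank cell would then be flagged for cleaning
```

Production tools replace heuristics like this with models trained on many datasets, which is why machine learning is expected to keep driving down the cost of this step.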
Finally, data curation and wrangling can be solved simply by investing human effort. Data marketplaces
and data analytics companies offer this service as part of their work. According to interviewees and
reports, data curation still usually takes between 50% and 80% of data scientists’ work.11
According to a recent survey, data preparation accounts for 79% of data scientists’ time, and they
consider it the least enjoyable part of the work.12 Moreover, it appears that this service is not
10 https://www.oreilly.com/ideas/lessons-from-next-generation-data-wrangling-tools
11 https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html?_r=5
12 http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#76f31cf47f75
monetisable as a stand-alone service, as its value is not appreciated by the client, and it needs to be
bundled with the overall service provided. There are also voluntary initiatives to curate data of public
interest, such as the OKFN Open Knowledge Labs, which aims to “Collect and maintain important and
commonly-used (“core”) datasets in high-quality, reliable and easy-to-use form (as Data Packages)”.13
In the future, the solution will most probably come from the interaction of all three dimensions, since
both standards and technology will always require some degree of human intervention. But interviewees
agree that there is a general trend towards greater openness, standard adoption and data sharing
across sectors. Market forces matter: data are becoming more valuable, the cost of data gathering is
decreasing, and customers (especially businesses) expect greater accessibility of the data. Combined
with the technological evolution, this points to a clear reduction of the human effort required for data
curation, freeing up data scientists’ precious resources for analytics.
3.2 Policy Implications
Based on the evidence collected from the case studies and the desk research we can now answer the
key questions initially posed about the technical barriers to data sharing.
First of all, technical barriers are perceived when enterprises start dealing with data sharing in practice
in the context of digital transformation or other digitisation projects, most often requiring collaboration
between different enterprises in a value chain, be it mono-sector or multi-sector. The main issue is the
lack of interoperability between different datasets, preventing the integration and extraction of value
from the data. This issue is becoming more frequent because of the increasing pervasiveness of IoT
sensors generating heterogeneous data flows.
However, industry stakeholders currently perceive lack of interoperability not as a barrier preventing
data sharing, but simply as a problem with various possible solutions depending on the specific situation,
based on some combination of technology, standardization and human work. The evolution of the
market is currently providing these solutions. Specific industry standards are evolving to enable data
sharing, or high level architecture standards are being developed to make data easily accessible and
transferable between operators in the same segment or at industry level. The increasing adoption of
ad-hoc technologies such as APIs and SDKs is ensuring secure third-party access to data and easier
data sharing. What is more, machine learning technologies are expected to enhance data curation
activities, thus facilitating the integration of data from different sources and organizations and reducing
the costs of data sharing. We are also witnessing the emergence of a new category of specialized
intermediaries focused on data transformation to enable data sharing, with several start-ups identified
by the Big Data Landscape 2016 of the European Data Market Monitoring Tool.
Data interoperability problems are easier to solve in single sector value chains (as shown by the N&W
Global Vending case), building on existing industry standards and formats, and more complex in multi-
sector value chains which require brand new approaches. However, even single sector value chains will
likely evolve towards deeper interaction with actors from other sectors, to provide more value added
services (as could be the case with advertising and predictive maintenance for Vending Machines). This
means that new data interoperability issues will emerge and require new solutions.
Overall, interviewees recognise that there is a general trend towards greater openness, standard
adoption and data interoperability across all sectors, but they question the speed of progress, which
remains quite slow. Also, data interoperability problems that are currently manageable are likely to
become more complex and difficult to solve as the data-driven ecosystem grows in depth and
sophistication.
European policies and specifically the DSM strategy already address the development of the main
framework conditions for data sharing, particularly through the forthcoming Free-flow of data initiative
13 http://okfnlabs.org/blog/2015/01/03/data-curators-wanted-for-core-datasets.html
and the ICT standardization priorities action plan Communication14, which singles out standards for
better data interoperability as a priority domain.
The open problems emerging from our analysis are the following:
• The inherent tension between standardization and innovation/personalization. Enabling data
sharing does not mean imposing standards on all datasets, which would result in excessive
rigidity and prevent innovation, but developing a flexible combination of general and specific
standards, interfaces and common formats, while preventing user lock-in caused by
proprietary technologies. Different languages, protocols and models have their advantages in
delivering context-specific data, and variety and differentiation are positive values which should
be preserved. Therefore, ensuring technical data interoperability will be a process accompanying
the evolution of the different sectors and their gradual maturity in digital transformation.
o Here there is a role for policy makers to play, in order to promote this process, ensure a
level playing field between stakeholders and accelerate as much as possible the
emergence of open standards and common formats, especially in the interest of SMEs,
who do not have the market power to drive this process.
• Data curation (preparing data for interoperability and sharing) is a time-intensive activity which
takes more than 50% of data scientists’ time in data sharing projects. It is a necessary
but thankless job and a considerable additional cost, whose value is not sufficiently recognized
by the market. Apparently, this cost is not acting as a barrier: there is no evidence that
companies have given up on data sharing projects because of excessive data curation costs.
However, it tends to be underestimated in the planning phase. Fortunately, this task is becoming
more automated and less expensive. Data curation will always be needed to fill the gap between
different standard environments, and new intermediaries are emerging to provide these services.
o The policy role here could be to make sure that sufficient R&D investments are made
in the automation of data curation (leveraging machine learning, for example), and
also to promote the development of data scientists’ skills in this field. Policy makers
could also raise awareness about the pros and cons of data curation versus
standardization. Enterprises developing a business case for data sharing projects
should make sure that data curation tasks are properly assessed and accounted for,
developing realistic cost-benefit estimates.
• Finally, the creation of neutral B2B data exchanges solving interoperability problems is
potentially welcomed by interviewees, but it is unclear how it could be implemented in practice,
who the main actors could be and what the scope should be in terms of business sectors. This
option could be investigated further, to understand whether such platforms could accelerate
the solution of technical data sharing issues.
14 https://ec.europa.eu/digital-single-market/en/news/communication-ict-standardisation-priorities-digital-single-market