Big Data: economic opportunities for Italy
edited by
Scuola Normale Superiore, Consiglio Nazionale delle Ricerche, Regione Emilia Romagna for Aspen Institute Italia
National Interest, October 2017
Piazza Navona, 114 00186 - Roma
Tel: +39 06 45.46.891 Fax: +39 06 67.96.377
Via Vincenzo Monti, 12 20123 - Milano
Tel: +39 02 99.96.131 Fax: +39 06 99.96.13.50
www.aspeninstitute.it
Report structure
• Hallmarks of big data
• Data science
• Towards a data science agenda
• Economic and growth outlook
• Big data: structural change and renewed competitiveness
HALLMARKS OF BIG DATA
VOLUME
VARIETY + VELOCITY
Figure: data sources arranged by structure (structured to unstructured) and by velocity (static to real-time): weather, sensors, POS payments, corporate data warehouse, social media, video-surveillance, text documents, medical data, scientific research, financial markets.
VOLUME, VELOCITY, VARIETY, VALUE
• Big data analytics enables the identification of attributes, trends, and patterns on which to base choices and build entirely data-driven policies, even in the absence of benchmark models and theories within the context of use
• However, the potential value of big data can only be tapped through the development of ad-hoc analytics algorithms (data science)
• Data amassed for distinct purposes can contribute to formulating highly-innovative scenarios and methods even in different contexts
SOURCING BIG DATA
It is essential to:
• ensure access for Italian firms and researchers to this wealth of information
• encourage data sharing
• safeguard the privacy rights of individuals
• formulate policies endorsed at a supranational level
DATA SCIENCE
WHAT IS DATA SCIENCE?
Data science combines data availability, sophisticated analytics techniques, and scalable infrastructure.
Data science includes data extraction, data preparation, data exploration, data transformation, storage and retrieval, computing infrastructure, various types of data mining, machine and statistical learning, optimization, presentation of explanations and predictions, and the exploitation of results taking into account ethical, social, legal, and business considerations.
THE DATA
Data may be structured or unstructured, big or small, and static or real-time.
THE ANALYTICS
• Data-mining algorithms for automated pattern discovery highlight the structure hidden in massive datasets
• Machine learning and “deep learning” methods exploit large “training” datasets of examples to learn general rules and models that classify data and predict outcomes
• Network science shifts the focus from the statistics of populations to the statistics of interlinked entities, connected by the ties of their mutual interactions
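As a minimal sketch of what "learning general rules from training examples" can look like, the following Python snippet fits a nearest-centroid classifier. This is only one simple illustrative technique, not a method named in the slides, and the feature names, values, and segment labels are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(examples):
    """Learn one centroid (mean point) per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in totals)
        for label, totals in sums.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training examples:
# (hours of use per day, purchases per month) -> customer segment
training = [
    ((1.0, 0.0), "occasional"),
    ((2.0, 1.0), "occasional"),
    ((6.0, 5.0), "frequent"),
    ((7.0, 6.0), "frequent"),
]
centroids = fit_centroids(training)
label = predict(centroids, (5.5, 4.0))
```

The learned "rule" here is just the pair of centroids; a new observation is classified by whichever centroid it lies closest to. Real applications would typically use far larger training sets and richer models.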
FROM DATA TO KNOWLEDGE
Figure: from data (demographic data, geographic data, movement data, transport data) to models (T-Clustering, T-Patterns) to forecasts, with validation.
DATA SCIENCE FOR SOCIETY
Data science can improve society and boost social progress by:
• supporting policymaking
• yielding novel ways of producing high-quality and high-precision statistical information
• empowering citizens with self-awareness tools, and
• promoting ethical uses of big data
Areas of application: the “city of citizens” and people, societal debate, better governance, official statistics and demography, sustainable development, and developing countries.
DATA SCIENCE FOR SCIENCE
A new data-dominated science is emerging, a data-centric way of thinking, organizing, and carrying out research activities that can lead to the solution of problems hitherto considered extremely difficult or even impossible to tackle, as well as resulting in serendipitous discoveries. Computational social science, medicine, meteorology, environmental science, ecology, agriculture, geology, and seismology are scientific fields where the data deluge, analytical capacity, processing capability, and data sharing and curation infrastructure are providing a powerful boost to research.
DATA SCIENCE FOR INDUSTRY AND BUSINESS
Data science has the capacity to create an ecosystem of data-driven innovative business opportunities, facilitated by participatory platforms, that can help firms collaborate to bring to light new local, national, and global whitespace markets, and that can be leveraged for the collaborative, participatory creation and enrichment of big data. Relevant sectors include energy, environment, agri-food, mobility, transport and logistics, manufacturing and production, the public sector, healthcare, financial services, telecommunications services, retail, and tourism.
MEASURING HAPPINESS VIA TWITTER
Computational social science is now using digital tools to analyze people’s rich and interactive lives to answer questions that were previously impossible to investigate. (Mann. PNAS January 19, 2016, vol. 113 no. 3)
SOCIETAL DEBATE
By analyzing millions of datasets of public debates on social media and in newspaper articles, it is possible to gauge what the most discussed topics are, how they emerge and evolve over time and space, and how opinions polarize.
MOBILITY, DIVERSITY, AND WELLBEING
Big data can improve official statistics by providing cheaper information in a more timely manner, capturing small-scale phenomena, and enabling the measurement of phenomena that previously did not exist (digital assets of the population) or were nearly impossible to capture (happiness or mood).
FUNCTIONAL AREAS IN TUSCANY
Data science for the “city of citizens”: Cities are the ideal living labs in which to test and deploy data science applications that indirectly translate into benefits for the individual in the form of improved public transport, a safer and healthier living environment, sustainable development, etc.
The polycentric city revealed by citizens’ everyday movements
ESTIMATING THE PROPAGATION OF FINANCIAL DISTRESS
Financial services: Huge amounts of data are processed to detect fraud and risk, to analyze customer behavior, segmentation, trading, and credit risk. Network science allows the systemic risk of existing economic and financial networks to be measured, thereby helping to prevent shocks and disasters.
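To make the idea of measuring distress propagation on a financial network more concrete, here is a minimal Python sketch of a threshold cascade: an institution fails once its losses on failed counterparties reach its capital. This is an illustrative toy model under stated assumptions, not the method behind the slide's figure; the institution names, exposures, and capital values are hypothetical.

```python
def propagate_distress(exposures, capital, initially_failed):
    """Return the set of institutions that fail in a simple threshold cascade.

    exposures[a][b] is the amount a loses if b fails (hypothetical units).
    An institution fails once its accumulated losses reach its capital.
    """
    failed = set(initially_failed)
    frontier = set(initially_failed)
    losses = {name: 0.0 for name in capital}
    while frontier:
        next_frontier = set()
        for creditor, loans in exposures.items():
            if creditor in failed:
                continue
            # Add losses from counterparties that failed in the previous round.
            losses[creditor] += sum(
                amount for debtor, amount in loans.items() if debtor in frontier
            )
            if losses[creditor] >= capital[creditor]:
                next_frontier.add(creditor)
        failed |= next_frontier
        frontier = next_frontier
    return failed


# Hypothetical network of four institutions.
capital = {"A": 10, "B": 5, "C": 8, "D": 20}
exposures = {
    "A": {},
    "B": {"A": 6},
    "C": {"A": 3, "B": 5},
    "D": {"C": 4},
}
cascade = propagate_distress(exposures, capital, {"A"})
```

In this toy network the failure of A brings down B directly, and C only through the combined losses on A and B, which is the kind of indirect effect that population-level statistics would miss.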
SPORTS ANALYTICS
The proliferation of new sensing technologies that provide data streams extracted from every game is changing the way scientists, fans, and practitioners conceive of sports performance. By combining this (big) data with the powerful tools of data science and AI, it is now possible to reveal the great complexity underlying sports performance and carry out many challenging tasks: from automatic tactical analysis to data-driven performance ranking, game-outcome prediction, and injury forecasting.
TOWARDS A DATA SCIENCE AGENDA
MAIN SCIENTIFIC AND TECHNOLOGICAL CHALLENGES
• Semantic data integration and enrichment technology
• New foundations for big data analytics
• Engineering the management and curation of data
• Advanced visualization and user experience
• Scalable architectures for analytics
• Responsible access to data
NEW FOUNDATIONS FOR BIG DATA ANALYTICS
At the convergence of data mining, machine learning, statistical modeling, optimization, and complex systems science, capable of transparently monitoring the quality of data and the results of analytical processes
– Reconciling statistical inference and computing
– Explanation of machine-learning decision models
– Correlation versus causality
– Individual versus collective data analytics
– Embedding of privacy mechanisms
– Analytics as a service
ANALYTICS AS A SERVICE
From descriptive analytics (“What happens?”) to diagnostics (“Why did it happen?”) to prediction (“What will happen?”) to prescription (“How to make it happen?”)
Figure: man-machine collaboration between the data scientist and machine intelligence across descriptive analytics, diagnostic analytics, predictive analytics, and prescriptive analytics.
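The four stages of analytics can be sketched on a toy dataset. The Python snippet below is a minimal illustration, assuming hypothetical weekly figures for advertising spend and sales and using an ordinary least-squares line as the model; the variable and function names are invented for this example. Note that the fitted slope shows an association, not a proven cause.

```python
from statistics import mean

# Hypothetical weekly data: advertising spend (EUR thousand) and units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

# Descriptive: what happened?
average_sales = mean(sales)

# Diagnostic: why did it happen? Fit sales = intercept + slope * spend
# by ordinary least squares to see how sales move with spend.
mean_x, mean_y = mean(spend), mean(sales)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(spend, sales)) / sum(
    (x - mean_x) ** 2 for x in spend
)
intercept = mean_y - slope * mean_x


# Predictive: what will happen at a spend level not yet observed?
def predict_sales(x):
    return intercept + slope * x


# Prescriptive: how to make it happen? Pick the smallest candidate spend
# whose predicted sales reach the target, or None if no candidate does.
def spend_for_target(target, candidates):
    feasible = [c for c in candidates if predict_sales(c) >= target]
    return min(feasible) if feasible else None


forecast = predict_sales(6.0)
recommended = spend_for_target(25.0, [6.0, 7.0, 8.0, 9.0])
```

Each stage reuses the output of the previous one, which is why the progression is usually drawn as a ladder: the prescription is only as good as the prediction it relies on.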
NEW BOUNDARIES OF DATA USABILITY
• The GDPR will apply from 25 May 2018 and introduces new obligations for data processors in and outside the EU
• It defines rights for individuals regarding control of their own data and includes elements such as the adoption of privacy-by-design and privacy risk assessment, the right to erasure and to explanation, and accountability and transparency principles
BIG DATA, BIG RISKS
Big data is algorithmic, therefore it cannot be biased… yet
• All traditional evils of social discrimination, and many new ones, exhibit themselves in the big-data ecosystem
• Because of its tremendous power, massive data analysis must be used responsibly
• Technology alone will not suffice: policy, user involvement, and education efforts are needed
FOUR SKILLSETS OF THE DATA SCIENTIST
• Harvest and manage data with technical skills in collecting and integrating databases built from heterogeneous sources
• Make sense of data with technical skills in data mining, statistics, and machine learning to gain insight from large volumes of data
• Tell the story: skill in narrating the stories that data tells after analysis and modeling (e.g. using both visual and multimedia storytelling)
• Master ethical and legal aspects at every step of the discovery process
THE DATA SCIENCE PIPELINE
ECONOMIC AND GROWTH OUTLOOK
CURRENT SECTORS WHERE BIG DATA IS EMPLOYED
BIG DATA AND HEALTHCARE
The growth of sequencing capabilities and the sharing of medical data enable, from a big data perspective, the optimization of treatments and the development of personalized protocols without further experimentation (either on animals or on humans). This area raises particularly evident issues of ethics and privacy.
Increase in DNA sequencing capabilities
THE EU DATA MARKETPLACE
• EU data market (i.e. the marketplace where data-related products or services are exchanged)
– in 2016, estimated at almost EUR 60 billion
– by 2020, will amount to more than EUR 106 billion according to the high-growth scenario forecast
• Total number of data firms in the EU (i.e. organizations whose main activity is the production and delivery of data-related products or services)
– neared the threshold of 255,000 units in 2016
– will reach 360,000 units by 2020 according to the high-growth scenario forecast
• Data workers in the EU data market (i.e. those engaged in collecting, storing, managing, and analyzing data as their primary activity)
– numbered 6.1 million in 2016
– will number 10.4 million by 2020 according to the high-growth scenario forecast
• The data economy (representing the aggregate impact of the data market on the EU economy as a whole)
– accounted for almost 2% of EU GDP in 2016
– will have an impact of 4% on the total EU economy by 2020 according to the high-growth scenario forecast
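The 2016 and 2020 figures above imply an average yearly growth rate for each indicator. The Python sketch below works this out as a compound annual growth rate; the function name and dictionary keys are invented for this example, and because the slides quote rounded values ("almost", "more than", "neared"), the results are approximate.

```python
def implied_annual_growth(start_value, end_value, years):
    """Compound annual growth rate implied by a start and an end value."""
    return (end_value / start_value) ** (1 / years) - 1


# Rounded figures quoted in the slides:
# 2016 estimate and 2020 high-growth scenario forecast.
indicators = {
    "data market (EUR billion)": (60, 106),
    "data firms": (255_000, 360_000),
    "data workers (million)": (6.1, 10.4),
}

rates = {
    name: implied_annual_growth(value_2016, value_2020, years=4)
    for name, (value_2016, value_2020) in indicators.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} per year")
```

Under the high-growth scenario, this works out to roughly 15% a year for the data market, 9% for the number of data firms, and 14% for data workers.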
US DATA MARKET
Source: McKinsey Top 5 Game-changers 2013
Source: McKinsey 12 disruptive technologies 2017
Source: http://www3.weforum.org/docs/WEF_Future_of_Jobs.pdf
FUTURE OF JOBS
• A net 5.1 million jobs set to be lost in Western countries to disruptive labor-market changes over the period 2015–2020
• This net figure reflects a total loss of 7.1 million jobs, concentrated in routine white-collar office functions such as office and administrative roles
• offset by a gain of 2 million jobs in computer-, mathematical-, architectural-, and engineering-related fields
Source: World Economic Forum’s “Future of Jobs” Report (2016)
BIG DATA: STRUCTURAL CHANGE AND RENEWED COMPETITIVENESS
INDUSTRY 4.0 AND BIG DATA
Industry 4.0 should be viewed not solely from a technological standpoint but also in terms of the ability to coordinate science, technology, skills, and social context, so as to facilitate the convergence of distinct but complementary technologies that respond both to major global issues and to the individual demands of millions of users/clients.
What gives order to the new "industry" is hyper-connectivity and, hence, big data
Big data is not only a commodity but, above all, a new way of tackling and managing modern-day complexity
Global value chains move their various phases around according to the value added achievable in different local contexts
PRODUCT DEFINITION
Figure: production models positioned by volumes of production and product differentiation (each scaled low, medium, high): cottage industry, Fordist production, flexible production, Industry 4.0.
PROCESS ORGANIZATION
Figure: forms of process organization positioned by scale and scope (each scaled low, medium, high): rigid mass production, flexible mass production, customized individual production, customized mass production.
I 4.0
Enabling technologies:
• Additive manufacturing
• Digital manufacturing
• Virtual reality
• Second-generation robots
• Internet of things
• Big data
• Artificial intelligence
Skills and infrastructure for the convergence of complementary technologies
BIBLIOGRAPHY
• Data Science: a Game-changer for Science and Innovation, Report for the G7 Academy, 2017
• The Big Data Value Strategic Research Innovation Agenda, 2017 http://www.bdva.eu/
• Big Data Analytics: towards a European Research Agenda, ERCIM (European Research Consortium for Informatics and Mathematics) White Paper on Big Data Analytics, 2015 https://www.ercim.eu/news/387-ercim-white-paper-on-big-data-analytics
• The fourth paradigm: data-intensive scientific discovery, Tony Hey, Stewart Tansley and Kristin Tolle, Microsoft Research, 2009