Work the way you live
Deloitte Shared Services Conference 2019
Data science 101: demystifying data science
Christina Ablewhite and Valentin Cojocaru, Deloitte
© 2019 Deloitte MCS Limited. All rights reserved. Data Science 101: Demystifying Data Science
What do you want to get out of this lab?
Machine learning – a brief history
Machine learning evolution
In 1959, Arthur Samuel, an IBM researcher, developed a program that could play checkers/draughts as well as a “respectable amateur”.
In 1997, IBM’s state-of-the-art computer, Deep Blue, played chess against the best human player in the world, Garry Kasparov, and won.
In 2015, Google DeepMind trained a machine learning system to play Atari video games. After several hours, it was as good as most humans. After a few more hours, it was better than any human has ever been.
In 2016, AlphaGo, a program that used machine learning, defeated Go champion Lee Sedol (and, in 2017, world number one Ke Jie). The program’s creators then trained a new version from scratch, without studying any human games at all, and it mastered the ancient Chinese board game in three days.
In 2018, Amazon opened its first public Amazon Go store. It is changing the retail experience by using sophisticated image recognition software and AI to charge you for your goods automatically.
By 2030, it is predicted that AI will have reshaped the future of work. The capabilities required of humans will adapt as the influence of AI increases. Uniquely human capabilities will come to the fore as we find new ways of augmenting human and machine interactions.
What is data science?
“The world’s most valuable resource is no longer oil, but data”
- The Economist, May 2017
Definition
Data science is the discipline of extracting insight from all kinds of data, to improve decision making.
Data science is also about creating Artificial Intelligence (AI) products that perform tasks which ordinarily require human intelligence.
Data science helps us by:
• Identifying opportunities for improvement
• Automating tasks for improved productivity
• Recommending next best steps based on insights
• Improving decision-making
• Leading to better outcomes
[Venn diagram: Computer Science, Maths & Statistics and Specialist Knowledge, with Data Science at the intersection]
Data science lies at the intersection of (1) domain knowledge, (2) knowledge of mathematics and statistics, and (3) the ability to write code.
Why now?
Data Volume
We now have more information than humans can take in when making decisions, and enough of it for data science to produce accurate recommendations and predictions.
Computing Processing Power
We now have the computing and cloud infrastructure to support the volume and speed of analysis needed to meet demand at scale.
Smarter Algorithms
Data science capabilities and algorithms are now available globally, allowing machines to support or replace human involvement in everyday tasks and decision-making.
Common data science tools
The software and programs used by a data scientist fall into three broad areas. For any project, a data scientist would generally use at least three tools, one from each category:
• Database and data architecture
• Programming and modelling
• Front-end visualisation and application
Finding the right technique
If I have all of these challenges…
1. If I have medical data of 100 patients then can I predict who should be placed in an intensive-care unit?
2. If I have access to online forums then can I understand customer perceptions of a certain brand?
3. If I have 1,000 doctor notes then can I translate this information into data that can be analysed?
4. If I have brain scans of 100 patients then can I recognise tumour cells?
5. If I have customer data then can I predict what demographic I should target when releasing product X?
… which of these data science techniques are suited to testing them?
A. Classification
B. Sentiment Analysis
C. Image Recognition
D. Optical Character Recognition (OCR)
E. Clustering
Machine learning techniques
Machine learning introduction
[Diagram: two approaches compared]
Traditional programming/modelling: Data + Algorithm/Model → Computer → Output
Machine learning: Data + Output → Computer → Algorithm/Model
01 What is it?
Machine learning is a family of algorithms that consume large amounts of data, train themselves on that data by seeking and recognising patterns, and then output predictions based on those patterns.
02 How does it work?
• Supervised learning: algorithm predicts known outcomes from historical data
• Unsupervised learning: algorithm finds structure in data without known labels/outcomes
• Reinforcement learning: algorithm learns from the rewards earned by a sequence of actions
03 Why is it important?
• Faster/more efficient: seconds vs thousands of man hours
• Cost effective: eliminates manual, repetitive labour hours
• Scalable: works best with large data sets
• Sustainable: easy to monitor
“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed”
- Attributed to Arthur Samuel (1959), as quoted by Andrew Ng, Stanford
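The contrast between traditional programming and machine learning on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction amounts, the flags and the 1000 cut-off are all made up, and the "learning" is the simplest possible kind (choosing one threshold that makes the fewest mistakes on labelled examples).

```python
# Hypothetical labelled examples: (transaction amount, was it flagged?)
examples = [(120, False), (300, False), (450, False),
            (800, True), (950, True), (1500, True)]


def hand_written_rule(amount):
    """Traditional programming: a person chooses the rule (assumed cut-off)."""
    return amount > 1000


def learn_threshold(data):
    """Machine learning: pick the threshold with the fewest mistakes on the data."""
    amounts = sorted(amount for amount, _ in data)
    # Candidate thresholds sit halfway between neighbouring amounts.
    candidates = [(a + b) / 2 for a, b in zip(amounts, amounts[1:])]

    def mistakes(threshold):
        return sum((amount > threshold) != flagged for amount, flagged in data)

    return min(candidates, key=mistakes)


threshold = learn_threshold(examples)


def learned_rule(amount):
    """The rule produced from data plus known outputs."""
    return amount > threshold


print("learned threshold:", threshold)
print("hand-written rule flags 800:", hand_written_rule(800))
print("learned rule flags 800:", learned_rule(800))
```

In the first function the rule goes in and the output comes out; in the second, data and known outputs go in and the rule comes out.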
Machine learning techniques we are covering today
Machine learning techniques
• Supervised learning: regression, classification, deep learning, natural language processing (optical character recognition, sentiment analysis)
• Unsupervised learning: clustering
Supervised learning - regression
Supervised learning definition: An algorithm which learns from a ‘training dataset’, consisting of some input data in addition to the desired output values. The algorithm can then predict the output value based on new input data.
1. Get training data
2. Fit model with training data
3. Use model for prediction on a new data point
[Charts: house price against house size (m²), showing the training data, the fitted model and a new data point]
Training data:
House price (desired output)    House size (input variable)
£304,100                        161 m²
£287,546                        140 m²
£175,908                        102 m²
£222,674                        114 m²
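As a minimal sketch of fitting a model to training data and then using it for prediction, the Python below fits a straight line to the four house prices from the slide using ordinary least squares. The choice of a straight-line model and the 130 m² query are assumptions made for illustration; the slide does not say which model it has in mind.

```python
# Training data from the slide: (house size in m2, house price in GBP)
training_data = [(161, 304100), (140, 287546), (102, 175908), (114, 222674)]


def fit_line(points):
    """Ordinary least squares fit of price = slope * size + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fit the model with the training data.
slope, intercept = fit_line(training_data)


def predict_price(size_m2):
    """Use the fitted model on a new data point."""
    return slope * size_m2 + intercept


new_size = 130  # hypothetical new house, not from the slide
print(f"Predicted price for {new_size} m2: £{predict_price(new_size):,.0f}")
```

With only four training points the fitted line is very rough; a real model would need far more data and probably more input variables than size alone.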
Supervised learning - classification
Definition: Identifying which of a set of categories a new observation belongs to.
Example labelled dataset:
Historic data of when I read the newspaper in the morning on my way to work.
Can you predict if I will read the newspaper on 10 April?
Challenge 1: If I have medical data of 100 patients then I can predict who should be placed in an intensive-care unit.
Date          Day of week   Newspaper headline   My commute   Read newspaper?
14 January    Monday        Negative             Long         Yes
23 January    Wednesday     Negative             Short        No
31 January    Thursday      Neutral              Long         Yes
1 February    Friday        Positive             Long         Yes
12 February   Tuesday       Neutral              Short        Yes
21 February   Thursday      Neutral              Short        No
12 March      Tuesday       Positive             Long         Yes
20 March      Wednesday     Negative             Short        No
22 March      Friday        Negative             Short        No
25 March      Monday        Neutral              Long         Yes
29 March      Friday        Positive             Long         Yes
10 April      Wednesday     Positive             Short        ???
[Diagram: indicative decision tree structure, splitting on day of the week (Mon to Fri), newspaper headline (negative, neutral, positive) and my commute (short, long), with Yes/No leaves]
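The newspaper example can be tried out with a much simpler classifier than the indicative tree: a one-level decision tree (a "decision stump") that looks at a single column and predicts the most common answer for each value. The Python below is a sketch of that simplification, not the tree from the slide; it picks the column that classifies the eleven historic rows most accurately and applies it to 10 April.

```python
from collections import Counter, defaultdict

# Historic rows from the slide: (day of week, headline, commute, read newspaper?)
history = [
    ("Monday", "Negative", "Long", "Yes"),
    ("Wednesday", "Negative", "Short", "No"),
    ("Thursday", "Neutral", "Long", "Yes"),
    ("Friday", "Positive", "Long", "Yes"),
    ("Tuesday", "Neutral", "Short", "Yes"),
    ("Thursday", "Neutral", "Short", "No"),
    ("Tuesday", "Positive", "Long", "Yes"),
    ("Wednesday", "Negative", "Short", "No"),
    ("Friday", "Negative", "Short", "No"),
    ("Monday", "Neutral", "Long", "Yes"),
    ("Friday", "Positive", "Long", "Yes"),
]
FEATURES = ["day", "headline", "commute"]


def fit_stump(rows, feature_index):
    """Map each value of one feature to the most common label seen with it."""
    labels_by_value = defaultdict(Counter)
    for row in rows:
        labels_by_value[row[feature_index]][row[-1]] += 1
    return {value: counts.most_common(1)[0][0]
            for value, counts in labels_by_value.items()}


def accuracy(rows, feature_index, stump):
    """Share of rows the stump classifies correctly."""
    hits = sum(stump[row[feature_index]] == row[-1] for row in rows)
    return hits / len(rows)


stumps = [fit_stump(history, i) for i in range(len(FEATURES))]
scores = [accuracy(history, i, stumps[i]) for i in range(len(FEATURES))]
best = max(range(len(FEATURES)), key=lambda i: scores[i])

new_day = ("Wednesday", "Positive", "Short")  # 10 April
prediction = stumps[best].get(new_day[best], "unknown")

print("best single feature:", FEATURES[best])
print("prediction for 10 April:", prediction)
```

On these rows the commute column separates the answers best (10 of 11 correct), so the stump predicts "No" for the short commute on 10 April. A full decision tree would go on to split each branch on the remaining columns.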
Supervised learning – natural language processing (NLP)
Definition: The application of computational techniques to the analysis and synthesis of natural language and speech.
Brief history:
• Originated with computational linguistics in the U.S. in the 1950s
• Focused on machine translation from Russian to English
• Deemed to be an easy task
• Note: still not perfectly solved today
Example NLP techniques:
• Sentiment analysis
• Image and speech to text
• Automatic report generation
• Machine translation (e.g. Google Translate)
• Text prediction
NLP – why is it hard?!
What is a 2?
Complicated languages
• German: Donaudampfschiffahrtsgesellschaftskapitän (5 “words”)
• Chinese: 50,000 different characters (2,000-3,000 needed to read a newspaper)
• Japanese: 3 writing systems
• Thai: Ambiguous word boundaries and sentence concepts
• Slavic: Different word forms depending on gender, case, tense
Speech is cultural!
Honest or sarcastic?
Non-standard language
It’s tim3 t0 l34rn h0w t0 r34d 4g41n!
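To make the non-standard language point concrete, here is a small Python sketch that undoes the digit-for-letter spelling in the example sentence. The substitution table is an assumption chosen to fit this one sentence; it is not a general solution.

```python
# Hypothetical substitution table: digit -> the letter it stands in for.
LEET_TO_LETTER = str.maketrans({"3": "e", "0": "o", "4": "a", "1": "i"})


def normalise(text):
    """Replace each listed digit with its letter, leaving everything else alone."""
    return text.translate(LEET_TO_LETTER)


print(normalise("It's tim3 t0 l34rn h0w t0 r34d 4g41n!"))
print(normalise("Room 101"))  # genuine digits get rewritten too
```

The second call shows why this is hard: a fixed table cannot tell a digit that stands in for a letter from a digit that really is a number.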
NLP – sentiment analysis
Definition: The process of computationally identifying and categorising opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.
Challenge 2: If I have access to online forums then I can understand customer perceptions of a certain brand.
Helpful for learning but battery could be better
11 January 2019
Brilliant educational toy! I bought it as a birthday present for my daughter as I wanted to help her improve
her knowledge of numbers and basic maths. It’s compact and got a good-sized screen and the images are very
clear so she can use it even when we are in the car. What I don’t like is how often it needs to be charged,
the battery seems rather poor. All in all, it’s still a good investment into learning and my little girl
enjoys using it.
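One very simple way to score a review like this is to count words from a positive list and a negative list. The Python below is a sketch of that lexicon approach; the two word lists are hypothetical and deliberately tiny, and real sentiment analysis tools use much larger vocabularies or trained models.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration).
POSITIVE = {"brilliant", "good", "clear", "enjoys"}
NEGATIVE = {"poor"}

# Review text from the slide, typed with plain apostrophes.
review = (
    "Brilliant educational toy! I bought it as a birthday present for my "
    "daughter as I wanted to help her improve her knowledge of numbers and "
    "basic maths. It's compact and got a good-sized screen and the images "
    "are very clear so she can use it even when we are in the car. What I "
    "don't like is how often it needs to be charged, the battery seems "
    "rather poor. All in all, it's still a good investment into learning "
    "and my little girl enjoys using it."
)


def score(text):
    """Positive word count minus negative word count."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(value):
    """Turn a numeric score into positive, negative or neutral."""
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "neutral"


# Split after sentence-ending punctuation, then label each sentence.
sentences = re.split(r"(?<=[.!?])\s+", review)
sentence_labels = [label(score(s)) for s in sentences]
overall = score(review)

for sentence, sentence_label in zip(sentences, sentence_labels):
    print(f"{sentence_label:>8}: {sentence[:50]}")
print("overall:", label(overall))
```

Scoring sentence by sentence keeps the battery complaint visible as a negative even though the review as a whole comes out positive. Word counting of this kind cannot handle negation or sarcasm.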
NLP – optical character recognition (OCR)
Definition: OCR is the electronic conversion of images of typed, handwritten or printed text into machine-encoded text.
Challenge 3: If I have 1,000 doctor notes then I can translate this information into data that can be analysed.
Brief History:
• 1913: Optophone used to detect black print and convert it to an audible output
• 1960s and 1970s: Postal services start using OCR to scan for addresses using a limited number of fonts
Supervised learning – deep learning
Definition: Deep Learning is a class of ML algorithms that uses multiple layers for feature extraction and learns multiple levels of representation for different levels of abstraction.
Neural Networks approach modelling by breaking down one large complex problem into many small simple problems.
[Diagram: DATA to ANSWERS under traditional approaches, compared with the neural network approach, which passes data through an input layer, hidden layers and an output layer]
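The layered structure (input layer, hidden layers, output layer) can be sketched as a forward pass through a tiny network in plain Python. Every number below is a made-up assumption: the weights are hand-picked rather than learned, and the layer sizes (3 inputs, hidden layers of 4 and 2 units, 1 output) are chosen only to keep the example short.

```python
import math


def dense(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Simple non-linearity applied after each hidden layer."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash the final value into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-value))


# Hypothetical hand-picked weights; in practice these are learned from data.
W1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.6, 0.1, 0.9], [0.2, 0.2, 0.2]]
B1 = [0.0, 0.1, -0.1, 0.0]
W2 = [[0.7, -0.3, 0.2, 0.5], [-0.4, 0.6, 0.1, -0.2]]
B2 = [0.05, -0.05]
W3 = [[1.0, -1.0]]
B3 = [0.0]


def forward(inputs):
    """Input layer -> two hidden layers -> output layer."""
    hidden1 = relu(dense(inputs, W1, B1))
    hidden2 = relu(dense(hidden1, W2, B2))
    return sigmoid(dense(hidden2, W3, B3)[0])


output = forward([1.0, 0.5, -1.0])
print("network output:", output)
```

Each unit solves one small, simple problem (a weighted sum followed by a threshold-like function); stacking layers lets later units build on what earlier ones have extracted. Training, which adjusts the weights, is not shown here.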
Deep learning use case
Challenge 4: If I have brain scans of 100 patients then I can recognise tumour cells.
Image recognition using deep learning
Medical imaging generates data at unprecedented rates. Clinicians examine a large number of patients’ scans, while each scanned image contains GBs of data.
Traditionally, scanned images have had to be reviewed by humans. However, because there is such a large volume of “training data” (i.e. the number of scans), there is an opportunity to apply deep learning.
Trends and potential anomalies in the scans can therefore be identified faster and in many cases more accurately.
Unsupervised learning
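Definition: Learning from data that has no labelled outcomes, so the desired result is not known in advance and the algorithm has to find structure in the data itself.
Example: clustering algorithms
[Charts: scatter plots of y against x, showing the original data and the clustered data]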
Unsupervised learning - clustering
Definition: Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than those in other groups (i.e. pattern finding).
Challenge 5: If I have customer data then I can predict what demographic I should target when releasing product X.
[Chart: customers plotted by affluence against age, forming three clusters: low-moderate affluence, young; low-moderate affluence, middle aged; moderate-high affluence, middle aged]
Clustering algorithms find natural groupings in the data
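As a sketch of how a clustering algorithm finds such groupings, the Python below runs k-means on nine made-up customers described by age and an affluence score. The data, the choice of k-means and the choice of three clusters are all assumptions for illustration; the slide does not name a specific algorithm. The starting centroids are chosen deterministically (first point, then the farthest remaining point each time) so the run is repeatable.

```python
# Hypothetical customers: (age in years, affluence score from 0 to 100)
customers = [(22, 20), (25, 25), (24, 30),
             (45, 25), (48, 30), (50, 22),
             (47, 75), (52, 80), (55, 70)]


def sq_dist(p, q):
    """Squared straight-line distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def initial_centroids(points, k):
    """Deterministic start: first point, then the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def k_means(points, k, max_iterations=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


labels, centroids = k_means(customers, k=3)
for customer, cluster in zip(customers, labels):
    print(customer, "-> cluster", cluster)
```

No labels are supplied at any point: the groups emerge from the distances alone. Note that age and affluence are on similar scales here; with real data the columns would usually need rescaling first so that one does not dominate the distance.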
Machine learning techniques we have covered today
Machine learning techniques
• Supervised learning: regression, classification, deep learning, natural language processing (optical character recognition, sentiment analysis)
• Unsupervised learning: clustering
Machine Learning Quiz
• Predict the future share price, given data about the previous price and other economic factors
• Recognise a person in a photo given previous photos of the same person
• Given a dataset of customers’ demographic features, find typical types of customers
• Driverless cars: equip a car with many sensors and drive it around, so that the sensors record how you react to different roads and scenarios
• Assess customers’ ability to repay a loan and group them accordingly
• Given a photo of handwritten text, convert it to digital text
Machine Learning Quiz
• Predict the future share price, given data about the previous price and other economic factors - Regression
• Recognise a person in a photo given previous photos of the same person - Deep Learning (Facial Recognition)
• Given a dataset of customers’ demographic features, find typical types of customers - Clustering
• Assess customers’ ability to repay a loan and group them accordingly - Classification
• Given a photo of handwritten text, convert it to digital text - NLP (OCR)
• Driverless cars: equip a car with many sensors and drive it around, so that the sensors record how you react to different roads and scenarios - Reinforcement Learning or Classification
Machine Learning Quiz
• Predict whether a transaction is fraudulent or not
• Given a dataset of customer calls, group them by call reason
• Understand whether customers liked a new product based on their online reaction
• You’re planning a road trip and want to know how much money to allocate for gas based on previous road trip experiences
• Given a dataset of consumer buying habits at grocery retailers, find typical types of consumers
• Given a large sample of hip MRI data, find the most common medical issue
Machine Learning Quiz
• Predict whether a transaction is fraudulent or not - Classification
• Given a dataset of customer calls, group them by call reason - Classification or Clustering
• Understand whether customers liked a new product based on their online reaction - NLP (Sentiment Analysis)
• You’re planning a road trip and want to know how much money to allocate for gas based on previous road trip experiences - Regression
• Given a dataset of consumer buying habits at grocery retailers, find typical types of consumers - Clustering
• Given a large sample of hip MRI data, find the most common medical issue - Deep Learning (Image Recognition)
Q&A
Any questions?
This publication has been written in general terms and we recommend that you obtain professional advice before acting or refraining from action on any of the contents of this
publication. Deloitte MCS Limited accepts no liability for any loss occasioned to any person acting or refraining from action as a result of any material in this publication.
Deloitte MCS Limited is registered in England and Wales with registered number 03311052 and its registered office at Hill House, 1 Little New Street, London, EC4A 3TR, United
Kingdom.
Deloitte MCS Limited is a subsidiary of Deloitte LLP, which is the United Kingdom affiliate of Deloitte NWE LLP, a member firm of Deloitte Touche Tohmatsu Limited, a UK private
company limited by guarantee (“DTTL”). DTTL and each of its member firms are legally separate and independent entities. DTTL and Deloitte NWE LLP do not provide services to
clients. Please see www.deloitte.com/about to learn more about our global network of member firms.
© 2019 Deloitte MCS Limited. All rights reserved.