Work the way you live
Deloitte Shared Services Conference 2019
Data science 101: demystifying data science
Christina Ablewhite and Valentin Cojocaru, Deloitte


© 2019 Deloitte MCS Limited. All rights reserved. Data Science 101: Demystifying Data Science

What do you want to get out of this lab?


Machine learning – a brief history


Machine learning evolution

In 1959, IBM researcher Arthur Samuel developed a computer program that could play checkers/draughts as well as a “respectable amateur”.

In 1997, IBM’s state-of-the-art computer, Deep Blue, played chess against the reigning world champion, Garry Kasparov, and won the match.

In 2015, Google DeepMind reported a machine learning system that taught itself to play Atari video games from the screen pixels and game score alone. On many of the games it reached a level comparable to a professional human games tester.

In 2016, AlphaGo, a program built with machine learning, defeated Lee Sedol, one of the world’s top Go players, and in 2017 it beat world number one Ke Jie. Its creators then built a new version that learned from scratch, without studying any human games at all, and within three days of training it had mastered the ancient Chinese board game well enough to beat the version that defeated Lee Sedol.

Amazon opened its first public Amazon Go store in 2018. It promises to revolutionise the retail experience, using sophisticated image recognition software and AI to charge shoppers automatically for the goods they take.

By 2030, AI is predicted to reshape the future of work. The capabilities required of humans will adapt as the influence of AI increases, and uniquely human capabilities will come to the fore as we find new ways for humans and machines to augment each other.


What is data science?


“The world’s most valuable resource is no longer oil, but data”

- The Economist, May 2017

Definition

Data science is the discipline of extracting insight from all kinds of data, to improve decision making.

Data science is also about creating Artificial Intelligence (AI) products that perform tasks which ordinarily require human intelligence.

Data science helps us by:
• Identifying opportunities for improvement
• Automating tasks for improved productivity
• Recommending next best steps based on insights
• Improving decision-making
All of which leads to better outcomes.

[Figure: Venn diagram of Computer Science, Maths & Statistics and Specialist Knowledge, with Data Science at their intersection]

Data science lies at the intersection of (1) domain knowledge, (2) knowledge of mathematics and statistics, and (3) the ability to write code.


Why now?

Data Volume

We now have more information than humans can take in for accurate decision-making, and enough information for data science to create accurate recommendations and predictions.

Computing Power

We now have the computing and cloud infrastructure to support analytics at the volume and speed required to meet demand at scale.

Smarter Algorithms

Data science capabilities and algorithms are now widespread globally, allowing machines to support, or in some cases replace, humans in completing tasks and making decisions in our everyday lives.


Common data science tools

The software and programs used by a data scientist fall into three broad areas. For any project, a data scientist would generally use at least three tools, one from each category:
• Database and data architecture
• Programming and modelling
• Front-end visualisation and application


Finding the right technique

If I have all of these challenges…

1. If I have medical data on 100 patients, can I predict who should be placed in an intensive-care unit?
2. If I have access to online forums, can I understand customer perceptions of a certain brand?
3. If I have 1,000 doctor notes, can I translate this information into data that can be analysed?
4. If I have brain scans of 100 patients, can I recognise tumour cells?
5. If I have customer data, can I predict what demographic I should target when releasing product X?

… which of these data science techniques are suited to addressing them?

A. Classification
B. Sentiment Analysis
C. Image Recognition
D. Optical Character Recognition (OCR)
E. Clustering


Machine learning techniques


Machine learning introduction

[Figure: Traditional programming/modelling: data and an algorithm/model go into the computer, which produces an output. Machine learning: data and outputs go into the computer, which produces the algorithm/model.]

01 What is it?

Machine learning is a family of algorithms that consume large amounts of data, train themselves on that data by seeking and recognising patterns, and then output predictions based on those patterns.

02 How does it work?

• Supervised learning: the algorithm learns to predict known outcomes from historical data
• Unsupervised learning: the algorithm finds structure in data without known labels/outcomes
• Reinforcement learning: the algorithm learns from the rewards earned by a sequence of actions

03 Why is it important?

• Faster/more efficient: seconds vs thousands of man-hours
• Cost effective: eliminates manual, repetitive labour hours
• Scalable: works best with large data sets
• Sustainable: easy to monitor

“Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed”

- commonly attributed to Arthur Samuel (1959); quoted by Andrew Ng, Stanford
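To make the contrast between traditional programming and machine learning concrete, here is a minimal Python sketch, assuming a single numeric input and a yes/no output. In the traditional version a person writes the rule; in the machine learning version the rule (here, a threshold) is derived from example data plus known outputs. The function names and the data are hypothetical, chosen only for illustration.

```python
def hand_written_rule(x: float) -> bool:
    # Traditional programming: a person chooses the rule (the threshold 50 is arbitrary).
    return x > 50


def learn_threshold(xs, ys):
    # Machine learning, simplest case: derive the rule from data plus known outputs.
    # Try each candidate threshold and keep the one that gets the most examples right.
    candidates = [min(xs) - 1] + sorted(set(xs))
    best_t, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum((x > t) == y for x, y in zip(xs, ys))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


# Hypothetical training data: input values and the known yes/no outcomes.
xs = [10, 20, 30, 60, 70, 80]
ys = [False, False, False, True, True, True]

threshold = learn_threshold(xs, ys)
print(threshold)       # the learned rule is "x > threshold"
print(65 > threshold)  # apply the learned rule to a new data point
```

This one-split rule (a "decision stump") is about the smallest possible supervised learner; the techniques that follow learn far richer rules, but the shape of the process is the same.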


Machine learning techniques we are covering today

Machine learning
• Supervised learning
  • Regression
  • Classification
  • Deep learning
  • Natural language processing (NLP): optical character recognition, sentiment analysis
• Unsupervised learning
  • Clustering


Supervised learning - regression

Supervised learning definition: An algorithm which learns from a ‘training dataset’, consisting of some input data in addition to the desired output values. The algorithm can then predict the output value based on new input data.

[Figure: house price plotted against house size (m²) in three steps: get training data, fit model with training data, use model for prediction on a new data point]

Training data (desired output: house price; input variable: house size)

House price   House size
£304,100      161 m²
£287,546      140 m²
£175,908      102 m²
£222,674      114 m²
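The three steps above (get training data, fit a model, predict for a new data point) can be sketched in Python with ordinary least squares on the four houses in the table. This assumes a straight-line relationship between size and price, which the slide does not state; the 130 m² house is a hypothetical new data point.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one input variable: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Step 1: training data from the table (input: size in m², desired output: price in £).
sizes = [161, 140, 102, 114]
prices = [304_100, 287_546, 175_908, 222_674]

# Step 2: fit the model.
slope, intercept = fit_line(sizes, prices)


# Step 3: use the model for prediction on a new data point.
def predict(size_m2):
    return slope * size_m2 + intercept


print(f"Each extra m² adds about £{slope:,.0f}")
print(f"Predicted price for a hypothetical 130 m² house: £{predict(130):,.0f}")
```

With only four houses the fitted line is rough, and predictions far outside the 102 to 161 m² range of the training data should not be trusted.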


Supervised learning - classification

Definition: Identifying which of a set of categories a new observation belongs to.

Example labelled dataset:

Historic data of when I read the newspaper in the morning on my way to work.

Can you predict if I will read the newspaper on 10 April?

Challenge 1: If I have medical data on 100 patients, I can predict who should be placed in an intensive-care unit.

Date          Day of week   Newspaper headline   My commute   Read newspaper?
14 January    Monday        Negative             Long         Yes
23 January    Wednesday     Negative             Short        No
31 January    Thursday      Neutral              Long         Yes
1 February    Friday        Positive             Long         Yes
12 February   Tuesday       Neutral              Short        Yes
21 February   Thursday      Neutral              Short        No
12 March      Tuesday       Positive             Long         Yes
20 March      Wednesday     Negative             Short        No
22 March      Friday        Negative             Short        No
25 March      Monday        Neutral              Long         Yes
29 March      Friday        Positive             Long         Yes
10 April      Wednesday     Positive             Short        ???

[Figure: indicative decision tree structure, splitting on day of the week, newspaper headline and my commute, with Yes/No leaves]
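One way to grow such a tree is the ID3-style procedure sketched below in Python: at each node, split on the attribute that leaves the labels least mixed (lowest weighted entropy), and stop when a node is pure. This is an assumed method, not necessarily how the slide's indicative tree was built, so the resulting shape may differ from it. The keys `day`, `headline` and `commute` are hypothetical names for the table's columns.

```python
from collections import Counter
from math import log2

# Rows from the newspaper table: features -> read newspaper?
ROWS = [
    ({"day": "Monday", "headline": "Negative", "commute": "Long"}, "Yes"),
    ({"day": "Wednesday", "headline": "Negative", "commute": "Short"}, "No"),
    ({"day": "Thursday", "headline": "Neutral", "commute": "Long"}, "Yes"),
    ({"day": "Friday", "headline": "Positive", "commute": "Long"}, "Yes"),
    ({"day": "Tuesday", "headline": "Neutral", "commute": "Short"}, "Yes"),
    ({"day": "Thursday", "headline": "Neutral", "commute": "Short"}, "No"),
    ({"day": "Tuesday", "headline": "Positive", "commute": "Long"}, "Yes"),
    ({"day": "Wednesday", "headline": "Negative", "commute": "Short"}, "No"),
    ({"day": "Friday", "headline": "Negative", "commute": "Short"}, "No"),
    ({"day": "Monday", "headline": "Neutral", "commute": "Long"}, "Yes"),
    ({"day": "Friday", "headline": "Positive", "commute": "Long"}, "Yes"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split_entropy(rows, attr):
    """Weighted average entropy of the labels after splitting rows on attr."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attr], []).append(label)
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def build_tree(rows, attrs):
    """Grow a tree greedily, ID3-style: split on the lowest-entropy attribute."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: a pure node, or nothing left to split on
    best = min(attrs, key=lambda a: split_entropy(rows, a))
    branches = {}
    for value in {features[best] for features, _ in rows}:
        subset = [(f, lab) for f, lab in rows if f[best] == value]
        branches[value] = build_tree(subset, [a for a in attrs if a != best])
    # "default" covers attribute values never seen in training.
    return {"attr": best, "branches": branches, "default": majority}


def predict(tree, features):
    """Follow branches from the root until a Yes/No leaf is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(features[tree["attr"]], tree["default"])
    return tree


tree = build_tree(ROWS, ["day", "headline", "commute"])
print(tree["attr"])
print(predict(tree, {"day": "Wednesday", "headline": "Positive", "commute": "Short"}))
```

On this data the sketch splits on commute first (every long commute was a "Yes") and then on day of the week, so for 10 April (Wednesday, positive headline, short commute) it predicts "No". With only eleven rows, treat that as an illustration rather than a reliable forecast.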


Supervised learning – natural language processing (NLP)

Definition: The application of computational techniques to the analysis and synthesis of natural language and speech.

Brief history:

• Originated with computational linguistics in the U.S. in the 1950s
• Focused on machine translation from Russian to English
• Deemed to be an easy task
• Note: still not perfectly solved today

Example NLP techniques:

• Sentiment analysis
• Image and speech to text
• Automatic report generation
• Machine translation (e.g. Google Translate)
• Text prediction


NLP – why is it hard?!

Non-standard language

• What is a 2?
• It’s tim3 t0 l34rn h0w t0 r34d 4g41n!

Complicated languages

• German: Donaudampfschiffahrtsgesellschaftskapitän (5 “words”)
• Chinese: 50,000 different characters (2,000 to 3,000 needed to read a newspaper)
• Japanese: 3 writing systems
• Thai: ambiguous word and sentence boundaries
• Slavic languages: different word forms depending on gender, case and tense

Speech is cultural!

• Honest or sarcastic?
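Non-standard spellings like the leetspeak example are often handled with a normalisation step before any other NLP. The Python sketch below uses a hand-written digit-to-letter map, an assumption made only for this one example (3→e, 0→o, 4→a, 1→i). It also shows why this is hard: the same map wrongly turns genuine numbers into letters.

```python
# Hypothetical digit-to-letter map for this one example; real text needs context.
LEET_MAP = str.maketrans({"3": "e", "0": "o", "4": "a", "1": "i"})


def normalise(text: str) -> str:
    """Replace common 'leetspeak' digits with the letters they stand in for."""
    return text.translate(LEET_MAP)


print(normalise("It’s tim3 t0 l34rn h0w t0 r34d 4g41n!"))
# -> It’s time to learn how to read again!
print(normalise("Call 0141 at 3pm"))
# -> Call oiai at epm  (genuine numbers get mangled too)
```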


NLP – sentiment analysis

Definition: The process of computationally identifying and categorising opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.

Challenge 2: If I have access to online forums, I can understand customer perceptions of a certain brand.

Helpful for learning but battery could be better

11 January 2019

Brilliant educational toy! I bought it as a birthday present for my daughter as I wanted to help her improve her knowledge of numbers and basic maths. It’s compact and got a good-sized screen and the images are very clear so she can use it even when we are in the car. What I don’t like is how often it needs to be charged, the battery seems rather poor. All in all, it’s still a good investment into learning and my little girl enjoys using it.
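A very simple, lexicon-based way to score the review is sketched below in Python. The word list and its +1/-1 weights are hypothetical and tiny; production systems use trained models or much larger established lexicons. Scoring each sentence separately shows how one review can carry mixed sentiment: mostly positive, with a negative sentence about the battery.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"brilliant": 1, "helpful": 1, "good": 1, "clear": 1, "enjoys": 1, "poor": -1}


def score(text: str) -> int:
    """Sum the lexicon weights of the words in text (unknown words count 0)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(text: str) -> str:
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


def sentence_labels(text: str) -> list:
    """Label each sentence separately, splitting after '.', '!' or '?'."""
    return [label(s) for s in re.split(r"(?<=[.!?])\s+", text.strip())]


review = (
    "Brilliant educational toy! I bought it as a birthday present for my daughter "
    "as I wanted to help her improve her knowledge of numbers and basic maths. "
    "It’s compact and got a good-sized screen and the images are very clear so she "
    "can use it even when we are in the car. What I don’t like is how often it needs "
    "to be charged, the battery seems rather poor. All in all, it’s still a good "
    "investment into learning and my little girl enjoys using it."
)

print(label(review))            # overall label for the whole review
print(sentence_labels(review))  # one label per sentence
```

A word-counting approach like this cannot spot sarcasm or negation ("not good" would still score +1), which is part of why sentiment analysis is harder than it looks.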


NLP – optical character recognition (OCR)

Definition: OCR is the electronic conversion of images of typed, handwritten or printed text into machine-encoded text.

Challenge 3: If I have 1,000 doctor notes, I can translate this information into data that can be analysed.

Brief History:

1913 – Optophone used to detect black print and convert it into an audible output

1960s & 1970s – Postal services start using OCR to scan addresses, limited to a small number of fonts
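One of the earliest OCR approaches was template (matrix) matching: compare each character image with stored templates and pick the closest. The Python sketch below shows that idea with hypothetical 3×5 pixel glyphs for the digits 0, 1 and 7 and a pixel-by-pixel (Hamming) distance. Modern OCR, especially for handwriting such as doctor notes, uses trained models instead.

```python
# Hypothetical 3x5 glyph templates ('#' = ink, '.' = blank).
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def distance(a, b):
    """Number of pixels that differ between two same-sized bitmaps."""
    return sum(p != q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def recognise(bitmap):
    """Return the character whose template is closest to the bitmap."""
    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch], bitmap))


def read_line(bitmaps):
    """Convert a sequence of character images into machine-encoded text."""
    return "".join(recognise(b) for b in bitmaps)


# A slightly smudged '1' (one pixel missing from the bottom row) still matches '1'.
smudged_one = [".#.", "##.", ".#.", ".#.", "##."]
print(read_line([TEMPLATES["7"], smudged_one, TEMPLATES["0"]]))  # -> 710
```

Template matching works for a small, fixed set of fonts (much like early postal systems) but breaks down quickly on varied handwriting, which is where learned models take over.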


Supervised learning – deep learning

Definition: Deep learning is a class of machine learning (ML) algorithms that use multiple layers to extract features, learning several levels of representation that correspond to different levels of abstraction.

[Figure: traditional approaches go from data to answers directly; the neural network approach passes data through an input layer, hidden layers and an output layer to reach answers]

Neural networks approach modelling by breaking down one large, complex problem into many small, simple problems.
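The idea of breaking one problem into smaller ones can be seen in a tiny network that computes XOR (true when exactly one input is true), something a single layer of these threshold neurons cannot do. In the Python sketch below, the hidden layer solves two simpler problems (OR and NAND) and the output layer combines them with AND. The weights are set by hand purely for illustration; in deep learning they would be learned from data, typically by gradient descent with backpropagation, and real networks have many more layers and neurons.

```python
def step(z):
    """Threshold activation: the neuron 'fires' (1) when its input is positive."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: each neuron computes step(w · x + b)."""
    return [
        step(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(weights, biases)
    ]


# Hidden layer (hand-picked weights): neuron 1 acts as OR, neuron 2 as NAND.
HIDDEN_W = [[1, 1], [-1, -1]]
HIDDEN_B = [-0.5, 1.5]
# Output layer: AND of the two hidden neurons, which gives XOR overall.
OUT_W = [[1, 1]]
OUT_B = [-1.5]


def network(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)  # input layer -> hidden layer
    return layer(hidden, OUT_W, OUT_B)[0]         # hidden layer -> output layer


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, network(a, b))
```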


Deep learning use case

Challenge 4: If I have brain scans of 100 patients, I can recognise tumour cells.

Image recognition using deep learning

Medical imaging generates data at unprecedented rates. Clinicians examine scans from a large number of patients, and each scan can contain gigabytes of data.

Traditionally, scans need to be reviewed by humans. However, because there is such a large volume of “training data” (i.e. the scans themselves), there is an opportunity to apply deep learning.

Trends and potential anomalies in the scans can therefore be identified faster and in many cases more accurately.
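Image-recognition models of this kind are commonly convolutional neural networks, whose basic operation slides a small grid of weights (a kernel) across the image. The Python sketch below applies one hand-picked 2×2 kernel that responds to dark-to-bright edges running left to right, on a tiny hypothetical 4×4 image. In a trained network the kernel values are learned from labelled scans rather than chosen by hand. Like most deep learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 greyscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked kernel that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

for row in conv2d(image, vertical_edge):
    print(row)  # the large values mark the column where the edge is
```

Stacking many such learned kernels, layer upon layer, is what lets a network build up from edges to textures to shapes such as possible anomalies in a scan.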


Unsupervised learning

Definition: Learning without a labelled training dataset: the desired result is not known in advance, so the algorithm must find structure in the data itself.

Example: clustering algorithms

[Figure: two scatter plots of y against x: the original data, and the same data grouped into clusters]


Unsupervised learning - clustering

Definition: Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than those in other groups (i.e. pattern finding).

Challenge 5: If I have customer data, I can predict what demographic I should target when releasing product X.

[Figure: customers plotted by age and affluence, forming three clusters: low-moderate affluence, young; low-moderate affluence, middle-aged; moderate-high affluence, middle-aged]

Clustering algorithms find natural groupings in the data.
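k-means is one widely used clustering algorithm (the slide does not say which algorithm produced its clusters). The Python sketch below alternates between assigning each customer to the nearest cluster centre and moving each centre to the mean of its customers, until nothing changes. The customers, their affluence scores (0 to 100) and the three starting centres are all hypothetical.

```python
from math import dist


def kmeans(points, centroids, max_iter=100):
    """Basic k-means: returns (cluster label per point, final centroids)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        # (keep the old centroid if a cluster ends up empty).
        new = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new.append(centroids[k])
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return labels, centroids


# Hypothetical customers as (age, affluence score 0-100).
customers = [
    (22, 20), (25, 25), (28, 22),  # younger, lower affluence
    (45, 25), (50, 20), (48, 28),  # middle-aged, lower affluence
    (47, 80), (52, 85), (55, 78),  # middle-aged, higher affluence
]
labels, centres = kmeans(customers, [customers[0], customers[3], customers[6]])
print(labels)
print(centres)
```

In practice age and affluence would usually be rescaled to comparable ranges first, since k-means uses straight-line distance and a feature with a larger range would dominate.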


Machine learning techniques we have covered today: supervised learning (regression, classification, deep learning, and natural language processing, including optical character recognition and sentiment analysis) and unsupervised learning (clustering).


Recap


Machine Learning Quiz

• Predict the future share price, given data about the previous price and other economic factors → Regression
• Recognise a person in a photo, given previous photos of the same person → Deep Learning (Facial Recognition)
• Given a dataset of customers’ demographic features, find typical types of customers → Clustering
• Assess customers’ ability to repay a loan and group them accordingly → Classification
• Given a photo of handwritten text, convert it to digital text → NLP (OCR)
• Driverless cars: equip a car with many sensors and drive it around, so that the sensors record how you react to different roads and scenarios → Reinforcement Learning or Classification


Machine Learning Quiz

• Predict whether a transaction is fraudulent or not → Classification
• Given a dataset of customer calls, group them by call reason → Classification or Clustering
• Understand whether customers liked a new product based on their online reaction → NLP (Sentiment Analysis)
• You’re planning a road trip and want to know how much money to allocate for gas based on previous road trip experiences → Regression
• Given a dataset of consumer buying habits at grocery retailers, find typical types of consumers → Clustering
• Given a large sample of hip MRI data, find the most common medical issue → Deep Learning (Image Recognition)


Q&A

Any questions?

This publication has been written in general terms and we recommend that you obtain professional advice before acting or refraining from action on any of the contents of this publication. Deloitte MCS Limited accepts no liability for any loss occasioned to any person acting or refraining from action as a result of any material in this publication.

Deloitte MCS Limited is registered in England and Wales with registered number 03311052 and its registered office at Hill House, 1 Little New Street, London, EC4A 3TR, United Kingdom.

Deloitte MCS Limited is a subsidiary of Deloitte LLP, which is the United Kingdom affiliate of Deloitte NWE LLP, a member firm of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”). DTTL and each of its member firms are legally separate and independent entities. DTTL and Deloitte NWE LLP do not provide services to clients. Please see www.deloitte.com/about to learn more about our global network of member firms.

© 2019 Deloitte MCS Limited. All rights reserved.