Copyright © 2014, SAS Institute Inc. All rights reserved.
MACHINE LEARNING WITH SAS
SAS NORDIC FANS WEBINAR – 21 MARCH 2017
This webinar will be recorded.
Please engage, use the Questions function during the presentation!
Kaare Brandt Petersen
Education & Academic
Gert Nissen
Technical Client Manager
Georg Morsing
Senior Manager
INTRODUCTION GETTING STARTED
Agenda
• Introduction – What is Machine Learning?
• Advanced Models used in Machine Learning
• Unstructured data
Who-am-I
• Nordic Director, Education & Academic
• Ph.D. in Mathematical Modelling
What-about-you?
INTRODUCTION WHY IS MACHINE LEARNING HOT?
1. The game Go: machine beats the human world champion
The AlphaGo team developed an algorithm that beat the world champion Lee Sedol in spring 2016.
2. Speaking Chinese when you speak English
Former Kaggle president Jeremy Howard presented this example in his TED Talk: speech-to-text + translation + text-to-speech, modulated.
3. Looking at pictures and understanding what you see
ImageNet example from Stanford, 2014: text formed by the algorithm.
INTRODUCTION WHAT IS MACHINE LEARNING?
[Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed
Arthur Samuel (1901-1990), USA
Pioneer in computer games
First self-learning program playing checkers, 1959
INTRODUCTION THEORY VS DATA
Theory based model
• Theory of what happened
• Function derived from theory, fitted to data
Data driven modelling
• Data of what happened
• Function which can adapt to just about every data pattern
”In machine learning, data speaks louder than theory”
ADVANCED MODELS
A WAY TO DEAL WITH A COMPLEX REALITY
APPROACHES BE FLEXIBLE (ADAPTABLE TO MULTIPLE REALITIES)
ADVANCED MODELS OVERFITTING AND BALANCE BETWEEN FLEXIBILITY AND DATA POINTS
[Figure: grid of panels, each showing the underlying process, the fitted function and the data points. Axes: model complexity (flexibility), from simple to complex, and data amount, from small to large. Panel labels: Poor fit, Good fit, Overfitting. Region labels: Too simple models, Potentially good models.]
ADVANCED MODELS DATA PARTITIONING IS A WAY TO FIND THE BALANCE BETWEEN FLEXIBILITY AND DATA POINTS
Data set
• Training data (40%): find the parameter values (given the flexibility)
• Validation data (30%): find the right level of flexibility
• Test data (30%): estimate performance
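The 40/30/30 partitioning can be sketched in a few lines of plain Python. This is only an illustration, not how SAS Enterprise Miner implements it; the function name `partition`, the fixed seed and the use of row numbers as stand-ins for observations are assumptions made for the example.

```python
import random

def partition(rows, fractions=(0.4, 0.3, 0.3), seed=42):
    """Shuffle the rows and split them into training, validation and test parts."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # random order, reproducible via the seed
    n = len(shuffled)
    n_train = round(fractions[0] * n)
    n_valid = round(fractions[1] * n)
    train = shuffled[:n_train]                    # used to find the parameter values
    valid = shuffled[n_train:n_train + n_valid]   # used to find the level of flexibility
    test = shuffled[n_train + n_valid:]           # used only to estimate performance
    return train, valid, test

# Hypothetical data set: 1000 observations, represented by their row numbers.
train, valid, test = partition(range(1000))
print(len(train), len(valid), len(test))  # 400 300 300
```

The test part is set aside and not used for any tuning, which is what makes the performance estimate honest.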
SOME MODELS USED IN MACHINE LEARNING
K-Nearest Neighbours
Flexibility controlled by the number of
neighbours included, K.
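A minimal sketch of how K acts as the flexibility handle, assuming a plain majority-vote classifier with Euclidean distance. The toy points and the function name `knn_predict` are made up for the illustration. With K=1 the prediction follows a single (here mislabelled) neighbour; with K=5 the vote is smoothed over the cluster.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data: cluster "a" near (0, 0), cluster "b" near (1, 1),
# plus one noisy point labelled "b" inside cluster "a" (the last one).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (0.15, 0.15)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

query = (0.16, 0.16)
print(knn_predict(points, labels, query, k=1))  # b (follows the noisy neighbour)
print(knn_predict(points, labels, query, k=5))  # a (smoothed over five neighbours)
```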
Decision Trees
Flexibility controlled by the number of leaf
nodes (boxes), which in turn is controlled by a
number of options, such as performance on
the validation set, minimum number of
observations for splitting, etc.
Neural Networks
Flexibility typically controlled by early
stopping: training starts from small weights
(corresponding to a linear model), the weights
are then allowed to grow and change, and training
stops when the validation error starts to increase.
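The stopping rule itself can be sketched independently of any particular network. The example below assumes that a validation error is available after each training epoch; the error values, the function name `early_stopping_epoch` and the `patience` parameter are hypothetical and not taken from the slides.

```python
def early_stopping_epoch(validation_errors, patience=1):
    """Return the index of the epoch whose weights should be kept.

    Training stops once the validation error has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch = 0
    best_error = float("inf")
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error is increasing: stop training
    return best_epoch

# Hypothetical validation errors per epoch: falling at first, then rising.
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
print(early_stopping_epoch(errors, patience=2))  # 3
```

A patience larger than 1 is a common safeguard against stopping on a single noisy epoch.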
Support Vector Machines
Flexibility controlled by the so-called kernel
width, a parameter which determines a
typical length scale of the data shape.
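To make the kernel width concrete, here is a small sketch assuming a Gaussian (RBF) kernel, which is one common choice; the slides do not say which kernel is meant. The function name `gaussian_kernel` and the two points are illustrative only.

```python
import math

def gaussian_kernel(x, y, width):
    """Gaussian (RBF) kernel: similarity of x and y, decaying with distance relative to `width`."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * width ** 2))

x, y = (0.0, 0.0), (1.0, 1.0)
for width in (0.1, 1.0, 10.0):
    # Small width: the points look unrelated (very flexible model).
    # Large width: the points look nearly identical (very smooth model).
    print(width, gaussian_kernel(x, y, width))
```

A narrow kernel lets the decision boundary bend around individual points, while a wide kernel forces it to vary only over long distances.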
SOME MODELS USED IN MACHINE LEARNING
Ensemble Learning
Flexibility first and foremost controlled by the
flexibility handles of the individual models. The
ensemble approach itself (the bagging) acts as a
regularizer, however, so an overall flexibility
adjustment may be needed; in some cases this is
handled by the number of submodels.
Bagging example:
Random Forests
Flexibility controlled by the number of trees
and the individual flexibility of the trees (the
number of leaf-nodes of the trees).
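The bagging idea can be sketched with the simplest possible trees. Note that this is plain bagging of one-split trees (two leaf nodes) on one-dimensional data, not a full random forest, which would also pick random subsets of the input variables. All names and the toy data are assumptions for the illustration.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split tree (two leaf nodes) on 1-D inputs by minimising training errors."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def fit_bagged_stumps(xs, ys, n_trees, seed=0):
    """Bagging: fit each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagged_predict(models, x):
    """Majority vote over the submodels."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

# Toy data: class 0 for small x, class 1 for large x.
xs = [0.1 * i for i in range(20)]
ys = [0] * 10 + [1] * 10
models = fit_bagged_stumps(xs, ys, n_trees=25)
print(bagged_predict(models, -1.0), bagged_predict(models, 5.0))
```

The two handles named on the slide appear here as `n_trees` and the fixed size of each tree (two leaf nodes).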
Boosting example:
Adaptive Boosting
Flexibility controlled by the number of
boosting steps (T).
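A compact sketch of adaptive boosting with one-split trees as weak learners, assuming labels coded as -1 and +1 and one-dimensional inputs. The toy data and function names are made up; the point is only to show how the number of boosting steps T controls flexibility.

```python
import math

def fit_weighted_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error (labels are -1 or +1)."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (+1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, T):
    """Run T boosting steps; return a list of (alpha, threshold, polarity) weak learners."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        error, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * (polarity if x <= threshold else -polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def boosted_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the weak learners."""
    score = sum(a * (p if x <= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single split can fit: +1, then -1, then +1 again.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
for T in (1, 3):
    ensemble = adaboost(xs, ys, T)
    mistakes = sum(boosted_predict(ensemble, x) != y for x, y in zip(xs, ys))
    print(T, mistakes)
```

On this toy data a single step leaves three training mistakes, while three steps fit all ten points; more steps mean a more flexible model.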
HOW TO IN SAS MACHINE LEARNING METHODS IN SAS ENTERPRISE MINER
HOW TO IN SAS COURSE
Machine Learning with SAS
2 day course
Hands-on using SAS Enterprise Miner
Next:
• Copenhagen, April 25-26
• Stockholm, May 9-10
UNSTRUCTURED DATA AND DEEP LEARNING
SOUND SOME SOUND – WHAT CAN YOU HEAR?
This is what sound looks
like for an algorithm
44.1 kHz sampling
44,100 numbers per second
3 minutes equals
7,938,000 numbers
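The arithmetic behind these numbers, together with a tiny example of what a sound "looks like" as a list of numbers, can be sketched as follows. A single (mono) channel is assumed; the 440 Hz tone and the function name `tone` are illustrative choices, not from the slides.

```python
import math

SAMPLE_RATE_HZ = 44_100  # 44.1 kHz sampling

def tone(frequency_hz, seconds, sample_rate=SAMPLE_RATE_HZ):
    """Return a pure sine tone as a plain list of numbers, one per sample."""
    n_samples = round(seconds * sample_rate)
    return [math.sin(2 * math.pi * frequency_hz * i / sample_rate)
            for i in range(n_samples)]

samples_in_three_minutes = SAMPLE_RATE_HZ * 3 * 60
print(samples_in_three_minutes)  # 7938000

short_tone = tone(440.0, 0.01)   # 10 milliseconds of a 440 Hz tone
print(len(short_tone))           # 441
```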
IMAGES THE MNIST DATA SET
MNIST data set
• Handwritten digits
• Famous ML benchmark data set
• 70,000 images
• 28x28 grayscale = 784 values per image
Table
• 70,000 rows
• 785 columns in total (784 input + 1 target)
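How one image becomes one table row can be sketched as below. The blank image is a stand-in, not real MNIST data, and the function name `image_to_row` is hypothetical.

```python
def image_to_row(image, target):
    """Flatten a 28x28 grayscale image into 784 input values followed by 1 target value."""
    if len(image) != 28 or any(len(image_row) != 28 for image_row in image):
        raise ValueError("expected a 28x28 image")
    row = [pixel for image_row in image for pixel in image_row]  # row by row, left to right
    row.append(target)
    return row

# Stand-in image: all pixels 0 (black), labelled as the digit 7.
blank_image = [[0] * 28 for _ in range(28)]
row = image_to_row(blank_image, target=7)
print(len(row))  # 785
```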
IMAGES THESE ARE IMAGES OF HANDWRITTEN DIGITS
IMAGES TRADITIONAL APPROACH TO IMAGES
[Figure: feature extraction maps images 1, 2, ..., N to features 1, 2, ..., 10. Example: image no. 21355 has 28x28 = 784 values, which are reduced to 10 key values to represent the image content.]
DEEP LEARNING WHAT IS DEEP LEARNING?
[Deep] learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification
Geoffrey Hinton (born 1947),
Godfather of Deep Learning
Born in England, Lives in Canada
University of Toronto
DEEP LEARNING DEEP LEARNING OVER-SIMPLIFIED INTO ONE SLIDE
Input and output must match (as closely as possible). The middle
layer then acts as a compressed representation of the full image.
= ”Alive”
1. Unsupervised part for finding the optimal representation
2. Supervised learning on the optimal representation
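The two parts can be written as two optimisation problems. This is one common formalisation (an autoencoder followed by a classifier), offered as a sketch; the symbols are assumptions and not from the slides: x_i is image i, f the encoder into the middle layer, g the decoder back to the image, z_i the compressed representation, y_i the target, h the supervised model and \ell its loss function.

```latex
% Part 1 (unsupervised): input and output must match as closely as possible
\min_{f,\,g} \; \sum_{i=1}^{N} \bigl\lVert x_i - g\bigl(f(x_i)\bigr) \bigr\rVert^2,
\qquad z_i = f(x_i)

% Part 2 (supervised): learn the target from the compressed representation
\min_{h} \; \sum_{i=1}^{N} \ell\bigl(y_i,\, h(z_i)\bigr)
```

Because z_i has far fewer values than x_i, the encoder is forced to keep only what is needed to reconstruct the image.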
DEEP LEARNING THE CAT PROBLEM
Extracting image features of a cat, but cats have many forms
Gross list of 1,000,000,000 images
Amazon Mechanical Turk:
* 48,940 people categorizing and sorting
* 15,000,000 images in 22,000 categories
* 62,000 images of cats
Convolutional neural networks (Hinton et al.)
24 million nodes
140 million parameters
15,000 million connections
Source: Fei-Fei Li, Director of Stanford AI & Vision Lab, TED Talk 2015
CONCLUSIONS
HOW TO IN SAS MACHINE LEARNING IN SAS VIYA
… (AND MANY ADVANCED METHODS COMING UP IN 2017)
Source: http://video.sas.com/detail/videos/#category/videos/sas-viya-data-mining-and-machine-learning
More info: SAS User Forum
in the Nordics, May & June
SAS COMMUNITY
NORDIC
HTTP://COMMUNITIES.SAS.COM/NORDIC
• Get the presentation from today
and continue your learning
• Join the Nordic SAS Online
Community and receive regular
activity updates
NORDIC WEBINAR SERIES
SIGN UP AT WWW.SAS.COM/NORDIC-USERS
Date             Title                                                                Area
January   5.1.   News in SAS 9.4 M4                                                   All
February  2.2.   Efficient SAS programming                                            Programming
          7.2.   SAS Studio version 3.6                                               Programming
          28.2.  Calculating values and creating parameters in SAS Visual Analytics   Visual Analytics
March     17.3.  SAS Environment Manager                                              Administration, Data Management
          21.3.  Machine Learning with SAS                                            Analytics
April     20.4.  News from SAS Global Forum                                           All
          26.4.  Graph Builder and Maps with SAS Visual Analytics                     Visual Analytics
May       10.5.  New versions of SAS Visual Analytics                                 Visual Analytics
Note: Date and topics are preliminary. Changes can occur.