Copyright © 2014, SAS Institute Inc. All rights reserved.

MACHINE LEARNING WITH SAS

SAS NORDIC FANS WEBINAR – 21. MARCH 2017

This webinar will be recorded.

Please engage and use the Questions function during the presentation!

Kaare Brandt Petersen

Education & Academic

Gert Nissen

Technical Client Manager

Georg Morsing

Senior Manager


INTRODUCTION: GETTING STARTED

Agenda

• Introduction – What is Machine Learning?

• Advanced Models used in Machine Learning

• Unstructured data

Who-am-I

• Nordic Director, Education & Academic

• Ph.D. in Mathematical Modelling

What-about-you?


INTRODUCTION: WHY IS MACHINE LEARNING HOT?

1. The game Go: machine beats the human world champion

The AlphaGo team developed an algorithm that beat the world champion Lee Sedol in spring 2016.

2. Speaking Chinese when you speak English

Former Kaggle president Jeremy Howard presented this example in his TED Talk: speech-to-text + translation + modulated text-to-speech.

3. Looking at pictures and understanding what you see

ImageNet example from Stanford, 2014: the text is formed by the algorithm.


INTRODUCTION: WHAT IS MACHINE LEARNING?

[Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed

Arthur Samuel (1901-1990), USA

Pioneer in computer games

First self-learning program playing checkers, 1959


INTRODUCTION: THEORY VS DATA

• Theory-based model: a theory of what happened gives a function derived from the theory, which is then fitted to data.

• Data-driven modelling: data of what happened is fitted with a function which can adapt to just about every data pattern.


“In machine learning, data speaks louder than theory.”


ADVANCED MODELS

A WAY TO DEAL WITH A COMPLEX REALITY


APPROACHES: BE FLEXIBLE (ADAPTABLE TO MULTIPLE REALITIES)


ADVANCED MODELS: OVERFITTING AND BALANCE BETWEEN FLEXIBILITY AND DATA POINTS

[Figure: fitted functions for model complexity (flexibility), from simple to complex, against data amount, from small to large. Legend: underlying process, fitted function, data point. Too simple models give a poor fit whatever the amount of data; complex models overfit small data sets; potentially good models are those whose flexibility matches the amount of data.]
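To make the figure concrete, here is a small hypothetical experiment in standard-library Python (not SAS code, and not taken from the webinar). It assumes an underlying process sin(πx) with Gaussian noise, uses the polynomial degree (1, 3 or 7) as the flexibility handle, and fits by ordinary least squares; 8 evenly spaced points play the role of a small data set and 200 random points a large one.

```python
import math
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (adequate for low degrees)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        rest = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - rest) / a[row][row]
    return coeffs


def mse(coeffs, xs, ys):
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


rng = random.Random(0)


def noisy_process(xs):
    # Hypothetical underlying process: sin(pi * x) plus Gaussian noise (sd 0.2)
    return [math.sin(math.pi * x) + rng.gauss(0, 0.2) for x in xs]


test_x = [rng.uniform(-1, 1) for _ in range(1000)]
test_y = noisy_process(test_x)
small_x = [-1 + 2 * i / 7 for i in range(8)]        # small data: 8 evenly spaced points
large_x = [rng.uniform(-1, 1) for _ in range(200)]  # large data: 200 random points

results = {}
for name, xs in (("small", small_x), ("large", large_x)):
    ys = noisy_process(xs)
    for degree in (1, 3, 7):                         # simple -> complex
        c = fit_polynomial(xs, ys, degree)
        train_err, test_err = mse(c, xs, ys), mse(c, test_x, test_y)
        results[(name, degree)] = (train_err, test_err)
        print(f"{name} data, degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

On the small data set the degree-7 polynomial passes through all 8 points, so its training error is essentially zero while its test error is not: that is overfitting. The degree-1 line fits poorly at both sizes, and with 200 points the degree-7 model's test error should end up close to the noise variance (0.04 here).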


ADVANCED MODELS: DATA PARTITIONING IS A WAY TO FIND THE BALANCE BETWEEN FLEXIBILITY AND DATA POINTS

The data set is split into three parts:

• Training data (40%): find the parameter values (given the flexibility)

• Validation data (30%): find the right level of flexibility

• Test data (30%): estimate performance
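A 40/30/30 partition can be sketched as follows. `partition` is a hypothetical helper written for illustration (in SAS Enterprise Miner this step is done with the Data Partition node); the fractions match the slide and the seed is an arbitrary assumption.

```python
import random


def partition(rows, fractions=(0.4, 0.3, 0.3), seed=42):
    """Randomly split rows into training, validation and test sets (hypothetical helper)."""
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("need three fractions that sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)        # random assignment avoids ordering effects
    n_train = round(fractions[0] * len(shuffled))
    n_valid = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]                   # fit parameter values (given the flexibility)
    valid = shuffled[n_train:n_train + n_valid]  # compare flexibility levels, pick the best
    test = shuffled[n_train + n_valid:]          # estimate performance once, at the end
    return train, valid, test


train, valid, test = partition(range(1000))
print(len(train), len(valid), len(test))  # 400 300 300
```

The workflow is then: fit each candidate model on `train`, pick the flexibility that does best on `valid`, and report the chosen model's error on `test` only once, so that the estimate is not biased by the selection.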


SOME MODELS USED IN MACHINE LEARNING

K-Nearest Neighbours

Flexibility is controlled by the number of neighbours included, K: a small K gives a flexible model, a large K a smoother, less flexible one.
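A minimal k-nearest-neighbours classifier shows how K acts as the flexibility handle. It assumes Euclidean distance and a plain majority vote; the toy points are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (feature_tuple, label) pairs; Euclidean distance is assumed.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two clusters plus one noisy "B" point inside cluster "A"
train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((0.5, 0.5), "B")]

print(knn_predict(train, (0.6, 0.6), k=1))  # B: follows the single noisy neighbour
print(knn_predict(train, (0.6, 0.6), k=5))  # A: the vote outweighs the noisy point
```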

Decision Trees

Flexibility is controlled by the number of leaf nodes (boxes), which in turn is controlled by a number of options, such as performance on the validation set, the minimum number of observations for splitting, etc.
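The sketch below grows a small classification tree and shows how the minimum number of observations for splitting (`min_split`) limits the number of leaf nodes. Gini impurity as the split criterion, the toy data and all function names are assumptions for illustration, not the settings of any particular SAS node.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, min_split=2, depth=0, max_depth=10):
    """Grow a binary classification tree; splits minimise weighted Gini impurity."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: too few observations to split, maximum depth reached, or node already pure
    if len(rows) < min_split or depth >= max_depth or gini(labels) == 0.0:
        return {"leaf": majority}
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # all rows identical: cannot split
        return {"leaf": majority}
    _, f, t, left, right = best

    def child(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          min_split, depth + 1, max_depth)

    return {"feature": f, "threshold": t, "left": child(left), "right": child(right)}


def count_leaves(node):
    return 1 if "leaf" in node else count_leaves(node["left"]) + count_leaves(node["right"])


def predict(node, row):
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


# Hypothetical 1-D data: class 1 from x = 6 upwards, plus one noisy label at x = 3
rows = [(x,) for x in range(10)]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
for min_split in (2, 4, 7):
    tree = build_tree(rows, labels, min_split=min_split)
    print(f"min_split={min_split}: {count_leaves(tree)} leaves, x=3 -> {predict(tree, (3,))}")
```

On this toy data, `min_split=2` grows 4 leaves and isolates the noisy label at x = 3, while 4 and 7 give 3 and 2 leaves and a coarser rule that classifies x = 3 as 0.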

Neural Networks

Flexibility is typically controlled by early stopping: training starts from small weights (corresponding to a nearly linear model), lets them grow and change, and stops when the validation error starts increasing.
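Early stopping can be sketched with a tiny one-hidden-layer network. Everything here is an assumption for illustration: tanh hidden units, plain stochastic gradient descent, a sin-shaped toy target, and a `patience` parameter (stop after that many epochs without a new best validation error), which tolerates small fluctuations instead of stopping at the very first increase. The weights from the best validation epoch are returned.

```python
import math
import random


def train_with_early_stopping(train, valid, hidden=8, lr=0.02, max_epochs=300,
                              patience=30, seed=0):
    """Fit y ~ f(x) with one tanh hidden layer; SGD with early stopping on `valid`."""
    rng = random.Random(seed)
    # Small initial weights: tanh(z) is close to z near 0, so training starts
    # from an almost linear model
    w1 = [rng.gauss(0, 0.1) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.gauss(0, 0.1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, b2 + sum(w2[j] * h[j] for j in range(hidden))

    def val_error():
        return sum((forward(x)[1] - y) ** 2 for x, y in valid) / len(valid)

    best_err, best_epoch = val_error(), 0
    best_weights = (w1[:], b1[:], w2[:], b2)
    history = [best_err]                      # history[i] = validation MSE after epoch i
    for epoch in range(1, max_epochs + 1):
        for x, y in train:                    # one epoch of stochastic gradient descent
            h, out = forward(x)
            g = 2 * (out - y)                 # derivative of the squared error w.r.t. output
            for j in range(hidden):
                g_hidden = g * w2[j] * (1 - h[j] ** 2)
                w2[j] -= lr * g * h[j]
                w1[j] -= lr * g_hidden * x
                b1[j] -= lr * g_hidden
            b2 -= lr * g
        err = val_error()
        history.append(err)
        if err < best_err:                    # still improving: remember these weights
            best_err, best_epoch = err, epoch
            best_weights = (w1[:], b1[:], w2[:], b2)
        elif epoch - best_epoch >= patience:  # no improvement for `patience` epochs: stop
            break
    return best_weights, best_epoch, history


data_rng = random.Random(1)


def sample(n):
    # Hypothetical toy data: y = sin(3x) plus noise
    points = []
    for _ in range(n):
        x = data_rng.uniform(-1, 1)
        points.append((x, math.sin(3 * x) + data_rng.gauss(0, 0.3)))
    return points


train_data, valid_data = sample(30), sample(30)
weights, best_epoch, history = train_with_early_stopping(train_data, valid_data)
print(f"ran {len(history) - 1} epochs, best validation MSE {min(history):.3f} at epoch {best_epoch}")
```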

Support Vector Machines

Flexibility is controlled by the so-called kernel width, a parameter which determines a typical length scale of the data shape.
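The slide does not say which kernel is meant; a common choice is the Gaussian (RBF) kernel k(x, z) = exp(-||x - z||² / (2σ²)), where the width σ is exactly such a length scale. A minimal sketch of the kernel itself (not a full SVM), with made-up points:

```python
import math


def rbf_kernel(x, z, width):
    """Gaussian (RBF) kernel: k(x, z) = exp(-||x - z||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2 * width ** 2))


# Similarity of two points one unit apart, for a narrow, medium and wide kernel
for width in (0.1, 1.0, 10.0):
    print(f"width {width:>4}: k = {rbf_kernel((0.0, 0.0), (1.0, 0.0), width):.4f}")
# width  0.1: k = 0.0000
# width  1.0: k = 0.6065
# width 10.0: k = 0.9950
```

A narrow kernel makes points one unit apart look unrelated, so each support vector only influences its immediate neighbourhood and the decision boundary can be very flexible; a wide kernel makes them look almost identical, giving a smoother, less flexible boundary.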


SOME MODELS USED IN MACHINE LEARNING

Ensemble Learning

Flexibility is first and foremost controlled by the handles of the individual models, but the ensemble approach itself (the bagging) acts as a regularizer, so an overall flexibility adjustment may in fact be needed; in some cases this is handled by the number of submodels.

Bagging example:

Random Forests

Flexibility is controlled by the number of trees and the individual flexibility of the trees (their number of leaf nodes).
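Bagging itself is short to sketch: train each submodel on a bootstrap sample (n draws with replacement) and combine them by majority vote. This hypothetical version uses one-split trees (decision stumps) on 1-D toy data as submodels; a random forest additionally grows full trees and samples a subset of the inputs at each split, which is not shown here.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """One-split tree on 1-D data: predict `left` for x <= t, the other class otherwise."""
    best = None
    for t in sorted(set(xs)):
        for left in (0, 1):
            errors = sum((left if x <= t else 1 - left) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left)
    _, t, left = best
    return lambda x: left if x <= t else 1 - left


def bagging(xs, ys, n_models, seed=0):
    """Bootstrap aggregating: n_models stumps, each fitted on a bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        # Majority vote of the submodels (an odd n_models avoids ties for two classes)
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict, models


# Hypothetical 1-D data: class 0 for x < 10, class 1 for x >= 10
xs = list(range(20))
ys = [0] * 10 + [1] * 10
predict, models = bagging(xs, ys, n_models=25)
print(len(models), predict(0), predict(19))
```

Here `n_models` is the number of submodels. Each stump's threshold depends on which points its bootstrap sample happened to contain, and the vote averages over that variation.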

Boosting example:

Adaptive Boosting

Flexibility is controlled by the number of boosting steps (T).
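A compact sketch of discrete AdaBoost with decision stumps, assuming labels ±1 and 1-D inputs; the data (class +1 only for x = 3 to 6) is made up so that no single stump can fit it. Each of the T steps fits a stump to the current observation weights, gives it a vote alpha = 0.5 ln((1 - err) / err) based on its weighted error err, and up-weights the observations it got wrong.

```python
import math


def stump_output(x, t, s):
    """Decision stump: output s for x <= t and -s otherwise (s is +1 or -1)."""
    return s if x <= t else -s


def fit_weighted_stump(xs, ys, weights):
    """Return (weighted error, threshold, sign) of the best stump."""
    best = None
    for t in sorted(set(xs)):
        for s in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump_output(x, t, s) != y)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def adaboost(xs, ys, T):
    """Discrete AdaBoost with stumps; ys must be +1/-1. Returns [(alpha, t, s), ...]."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        err, t, s = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this stump
        ensemble.append((alpha, t, s))
        # Up-weight misclassified points (y * h = -1), down-weight the rest
        weights = [w * math.exp(-alpha * y * stump_output(x, t, s))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump_output(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: class +1 only in the middle (x = 3..6), so no single stump fits it
xs = list(range(10))
ys = [-1, -1, -1, 1, 1, 1, 1, -1, -1, -1]
for T in (1, 2, 3):
    model = adaboost(xs, ys, T)
    mistakes = sum(predict(model, x) != y for x, y in zip(xs, ys))
    print(f"T = {T}: {mistakes} of {len(xs)} training points misclassified")
```

On this toy data one step leaves 3 of the 10 points misclassified, while three steps classify all of them correctly. On noisy data, too many steps can eventually fit the noise as well, which is why T serves as the flexibility handle.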


HOW TO IN SAS: MACHINE LEARNING METHODS IN SAS ENTERPRISE MINER


HOW TO IN SAS: COURSE

Machine Learning with SAS

2-day course

Hands-on using SAS Enterprise Miner

Next:

• Copenhagen, April 25-26

• Stockholm, May 9-10


UNSTRUCTURED DATA AND DEEP LEARNING


SOUND: SOME SOUND – WHAT CAN YOU HEAR?

This is what sound looks like for an algorithm:

• 44.1 kHz sampling = 44,100 numbers per second

• 3 minutes equals 7,938,000 numbers
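The count on the slide is simple arithmetic, 44,100 × 180 seconds = 7,938,000, assuming a single (mono) channel; a stereo recording would hold twice as many numbers. The 440 Hz tone below is only a hypothetical example of what such a sequence of numbers looks like.

```python
import math

SAMPLE_RATE_HZ = 44_100   # 44.1 kHz: numbers per second
DURATION_S = 3 * 60       # 3 minutes
CHANNELS = 1              # assumption: one channel (mono); stereo would double the count

n_values = SAMPLE_RATE_HZ * DURATION_S * CHANNELS
print(f"{n_values:,}")    # 7,938,000

# Hypothetical signal: the first few samples of a pure 440 Hz tone, as the algorithm "sees" it
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HZ) for n in range(5)]
print([round(v, 3) for v in tone])
```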


IMAGES: THE MNIST DATA SET

MNIST data set

• Handwritten digits

• Famous ML benchmark data set

• 70,000 images

• 28x28 grayscale = 784 values per image

Table

• 70,000 rows

• 785 columns in total (784 inputs + 1 target)
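Turning an image into a table row is a matter of flattening the 28x28 grid row by row and appending the target. A minimal sketch; the row-major pixel order, the function name and the example image are assumptions:

```python
def image_to_row(pixels, label):
    """Flatten a 28x28 grayscale image (a list of 28 rows) into one table row.

    Pixel (r, c) ends up in column r * 28 + c; the target label is the last column.
    """
    if len(pixels) != 28 or any(len(row) != 28 for row in pixels):
        raise ValueError("expected a 28x28 image")
    return [value for row in pixels for value in row] + [label]


# Hypothetical image: all black except one bright pixel at row 1, column 2
image = [[0] * 28 for _ in range(28)]
image[1][2] = 255
row = image_to_row(image, label=7)
print(len(row), row[1 * 28 + 2], row[-1])  # 785 255 7
```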


IMAGES: THESE ARE IMAGES OF HANDWRITTEN DIGITS


IMAGES: TRADITIONAL APPROACH TO IMAGES

[Figure: feature extraction turns a table of images 1 to N (e.g. image no. 21355, 28x28 = 784 values) into a table with 10 feature columns, Features 1 to 10.]

10 key values to represent the image content
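As an illustration of traditional feature extraction, the hypothetical function below reduces the 784 pixel values to 10 hand-crafted numbers: the share of the total "ink" in five horizontal and five vertical strips. These particular features are an assumption; the slide does not say which 10 key values were used.

```python
def extract_features(pixels, n_strips=5):
    """Reduce a square grayscale image to 2 * n_strips numbers (hypothetical feature set):
    the share of the total ink in each horizontal strip and in each vertical strip."""
    size = len(pixels)
    total = sum(map(sum, pixels)) or 1  # avoid division by zero for a blank image
    bounds = [(i * size // n_strips, (i + 1) * size // n_strips) for i in range(n_strips)]
    row_share = [sum(sum(pixels[r]) for r in range(lo, hi)) / total for lo, hi in bounds]
    col_share = [sum(pixels[r][c] for r in range(size) for c in range(lo, hi)) / total
                 for lo, hi in bounds]
    return row_share + col_share


# Hypothetical digit "1": a vertical stroke in column 14, rows 4 to 23
image = [[0] * 28 for _ in range(28)]
for r in range(4, 24):
    image[r][14] = 255

features = extract_features(image)
print(len(features), [round(f, 2) for f in features])
```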


DEEP LEARNING: WHAT IS DEEP LEARNING?

[Deep] learning is a set of methods that allows a machine to be fed with raw data and to automatically discover the representations needed for detection or classification

Geoffrey Hinton (born 1947),

Godfather of Deep Learning

Born in England, lives in Canada

University of Toronto


DEEP LEARNING: DEEP LEARNING OVER-SIMPLIFIED INTO ONE SLIDE

Input and output must match (as closely as possible). The middle layer then acts as a compressed representation of the full image.

1. Unsupervised part for finding the optimal representation

2. Supervised learning on the optimal representation (figure example output: “Alive”)
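The unsupervised step can be illustrated with the simplest possible autoencoder: a linear one, trained by gradient descent so that the output reproduces the input through a 2-unit middle layer. This is a hypothetical stand-in, assuming toy 4-value inputs that really only carry 2 numbers; real deep learning stacks several nonlinear layers and uses a dedicated library.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_autoencoder(data, code_size, lr=0.02, steps=1000, seed=0):
    """Linear autoencoder: input -> code (middle layer) -> reconstruction of the input."""
    rng = random.Random(seed)
    dim = len(data[0])
    enc = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(code_size)]
    dec = [[rng.gauss(0, 0.1) for _ in range(code_size)] for _ in range(dim)]

    def loss():
        # Mean squared difference between output and input ("input and output must match")
        return sum(sum((r - x) ** 2 for r, x in zip(matvec(dec, matvec(enc, v)), v))
                   for v in data) / len(data)

    before = loss()
    for _ in range(steps):                    # full-batch gradient descent
        g_enc = [[0.0] * dim for _ in range(code_size)]
        g_dec = [[0.0] * code_size for _ in range(dim)]
        for v in data:
            code = matvec(enc, v)             # compressed representation
            out = matvec(dec, code)           # reconstruction
            err = [2 * (o - x) / len(data) for o, x in zip(out, v)]
            back = [sum(err[i] * dec[i][k] for i in range(dim)) for k in range(code_size)]
            for i in range(dim):
                for k in range(code_size):
                    g_dec[i][k] += err[i] * code[k]
            for k in range(code_size):
                for j in range(dim):
                    g_enc[k][j] += back[k] * v[j]
        for i in range(dim):
            for k in range(code_size):
                dec[i][k] -= lr * g_dec[i][k]
        for k in range(code_size):
            for j in range(dim):
                enc[k][j] -= lr * g_enc[k][j]
    return enc, dec, before, loss()


# Hypothetical toy inputs: 4 values per example that really only carry 2 numbers (a, b)
data_rng = random.Random(1)
data = []
for _ in range(20):
    a, b = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

enc, dec, before, after = train_autoencoder(data, code_size=2)
print(f"reconstruction MSE: {before:.3f} before, {after:.6f} after training")
```

After training, `matvec(enc, v)` gives the 2-number compressed representation of an input `v`; that representation is what the supervised step would use as its input.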


DEEP LEARNING: THE CAT PROBLEM

Extracting image features of a cat – but cats have many forms

Raw list of 1,000,000,000 images

Amazon Mechanical Turk:

• 48,940 people categorizing and sorting

• 15,000,000 images in 22,000 categories

• 62,000 images of cats

Convolutional neural networks (Hinton et al.)

24 million nodes

140 million parameters

15,000 million connections

Source: Fei-Fei Li, Director of the Stanford AI & Vision Lab, TED Talk 2015


CONCLUSIONS


HOW TO IN SAS: MACHINE LEARNING IN SAS VIYA

… (AND MANY ADVANCED METHODS COMING UP IN 2017)

Source: http://video.sas.com/detail/videos/#category/videos/sas-viya-data-mining-and-machine-learning

More info: SAS User Forum in the Nordics, May & June


SAS COMMUNITY NORDIC

HTTP://COMMUNITIES.SAS.COM/NORDIC

• Get the presentation from today and continue your learning

• Join the Nordic SAS Online Community and receive regular activity updates


NORDIC WEBINAR SERIES: SIGN UP AT WWW.SAS.COM/NORDIC-USERS

Date: Title (Area)

• 5 January: News in SAS 9.4 M4 (All)

• 2 February: Efficient SAS programming (Programming)

• 7 February: SAS Studio version 3.6 (Programming)

• 28 February: Calculating values and creating parameters in SAS Visual Analytics (Visual Analytics)

• 17 March: SAS Environment Manager (Administration, Data Management)

• 21 March: Machine Learning with SAS (Analytics)

• 20 April: News from SAS Global Forum (All)

• 26 April: Graph Builder and Maps with SAS Visual Analytics (Visual Analytics)

• 10 May: New versions of SAS Visual Analytics (Visual Analytics)

Note: Dates and topics are preliminary and may change.