CSTalks - On machine learning - 2 Mar

On Machine Learning at CSTalks by Vlad Hosu

Transcript of CSTalks - On machine learning - 2 Mar

Page 1: CSTalks - On machine learning - 2 Mar

On Machine Learning

at CSTalks

by Vlad Hosu

Page 2: CSTalks - On machine learning - 2 Mar

Introduction

Page 3: CSTalks - On machine learning - 2 Mar

Fundamental Questions

•What are the fundamental laws that govern all learning processes?

•How can we build computer systems that automatically improve with experience?

Page 4: CSTalks - On machine learning - 2 Mar

Learning: Method

•a process of adaptation

•by which a parametric model is automatically adjusted

•so that some fitness criterion is more readily met

Page 5: CSTalks - On machine learning - 2 Mar

Before Learning

I’m learning, hence I need to adapt!

Page 6: CSTalks - On machine learning - 2 Mar

After

Result: Liony adjusts his diet.

Page 7: CSTalks - On machine learning - 2 Mar

Biological Learning

•Model: nervous system (neuron connectivity, chemical changes, etc.)

•Fitness: improved behavior (skills, memory, knowledge)

Page 8: CSTalks - On machine learning - 2 Mar

Machine Learning

•a mathematical model

•with adjustable parameters

•optimizing some fitness function

Page 9: CSTalks - On machine learning - 2 Mar

Motivation

Page 10: CSTalks - On machine learning - 2 Mar

Why?

•some things are hard to code

•too much data

•automatic learning works better

•is easier to customize/personalize

Page 11: CSTalks - On machine learning - 2 Mar

Learning: Purpose

•estimation

•function - stock market

•class - recognition

•structure - grouping

Page 12: CSTalks - On machine learning - 2 Mar

Requirements

•good learning ability

•scalability to large problems

•simple and easy algorithm implementation

Page 13: CSTalks - On machine learning - 2 Mar

Things Ahead

• Problems

• Clustering

• Classification

• Regression

• Learning issues

• importance of domain knowledge

• learning/generalization ability

• model complexity issues

• Optimization

Page 14: CSTalks - On machine learning - 2 Mar

Important Problems

Page 15: CSTalks - On machine learning - 2 Mar

Clustering

Page 16: CSTalks - On machine learning - 2 Mar

Classification

x1

x2

Page 17: CSTalks - On machine learning - 2 Mar

Classification

•Types

•discriminative

•generative

x1

x2

Page 18: CSTalks - On machine learning - 2 Mar

Classification

•Types

•discriminative

•generative

x1

x2

1 0

Page 19: CSTalks - On machine learning - 2 Mar

Classification

Page 20: CSTalks - On machine learning - 2 Mar

Regression

Page 21: CSTalks - On machine learning - 2 Mar

Making Connections

•discrete value regression => generative classification

•regression on boundary space => discriminative classification

•clustering + labels => classification

Page 22: CSTalks - On machine learning - 2 Mar

Learning Issues

Page 23: CSTalks - On machine learning - 2 Mar

Domain Knowledge

•exploitation of problem structure

•human abstractions are better

•important for picking the right model

Page 24: CSTalks - On machine learning - 2 Mar

Grouping in Images

•groups together similar parts of an image

•select objects

•find patterns

•features = pixel values (function of)
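A rough sketch of grouping by pixel value: plain k-means on a handful of made-up RGB pixels. The slides do not name a specific algorithm at this point, so k-means, the `kmeans` helper, the sample pixels, and the choice of the first k points as starting centers are all assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    # Assumption: start from the first k points (real code would randomize).
    centers = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        for i, group in enumerate(groups):
            if group:  # keep the old center if no point was assigned to it
                centers[i] = tuple(sum(channel) / len(group) for channel in zip(*group))
    return centers

# Hypothetical pixels: three reddish, three bluish (R, G, B).
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 20),
          (10, 10, 250), (20, 15, 240), (5, 20, 245)]
centers = kmeans(pixels, k=2)
labels = [min(range(2), key=lambda i: squared_distance(p, centers[i])) for p in pixels]
print(labels)
```

Here the feature of each pixel is simply its color; a function of the pixel values (for example a different color space) could be swapped in without changing the clustering code.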

Page 25: CSTalks - On machine learning - 2 Mar

Segmentation

Page 26: CSTalks - On machine learning - 2 Mar

Color Space

RGB space

Page 27: CSTalks - On machine learning - 2 Mar

Color Space (cont)

Page 28: CSTalks - On machine learning - 2 Mar

Suitable Clustering

Page 29: CSTalks - On machine learning - 2 Mar

Generalization Ability

•a model fitted to training data should generalize to new data

•important for classification accuracy

Page 30: CSTalks - On machine learning - 2 Mar

Support Vector Machines (SVM)

•linear classifier on distorted space
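To illustrate the "distorted space" idea only (this is not an SVM, and no margin is optimized), the sketch below lifts 1-D points with a hypothetical feature map `lift(x) = (x, x^2)`. No single threshold on x separates the two classes, but a hand-picked linear rule does once the points are lifted. The data, weights, and bias are assumptions chosen for the example.

```python
def lift(x):
    """Hypothetical feature map: send a 1-D input to the 2-D point (x, x^2)."""
    return (x, x * x)

def linear_classify(features, weights, bias):
    """Plain linear decision rule: sign of the weighted sum plus bias."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else -1

# Outer points are class +1, inner points are class -1.
xs = [-3, -2, -1, 0, 1, 2, 3]
labels = [1, 1, -1, -1, -1, 1, 1]

# Hand-picked separator in the lifted space: x^2 - 2.5 > 0.
weights, bias = (0.0, 1.0), -2.5
predictions = [linear_classify(lift(x), weights, bias) for x in xs]
print(predictions == labels)
```

A real SVM would learn the separator (and typically use a kernel rather than an explicit map), but the geometric picture is the same.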

Page 31: CSTalks - On machine learning - 2 Mar

Learning Ability

overfitting

Page 32: CSTalks - On machine learning - 2 Mar

Problems with Over-fitting

Page 33: CSTalks - On machine learning - 2 Mar

SVM vs Decision Trees

Page 34: CSTalks - On machine learning - 2 Mar

Complexity Issues

•models should be

•as simple as possible

•but representative of the training data
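One way to see the trade-off, as a sketch under made-up data: compare a straight line with a model that simply memorizes the training set (nearest training input). The memorizer is perfect on the training data but worse on new data. The helper names, the noise pattern, and the test points are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Maximally flexible model: return the target of the nearest training input."""
    return lambda x: min(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[1]

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]        # y = 2x with alternating +1 / -1 noise
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]    # noise-free targets from the same process

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)
for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "train:", mean_squared_error(model, train_x, train_y),
          "test:", mean_squared_error(model, test_x, test_y))
```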

Page 35: CSTalks - On machine learning - 2 Mar

Neural Networks

•model: weights

•fitness: output error

•general function ∑
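A minimal sketch of the three bullets above, assuming a single neuron with a linear output: the model is its weights (plus a bias), the function is a weighted sum, and the fitness is the mean squared output error, reduced by batch gradient descent. The training samples, learning rate, and epoch count are assumptions; a real network would stack many such units with non-linear activations.

```python
def neuron(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias (linear output, no activation)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def mean_squared_error(weights, bias, samples):
    return sum((neuron(weights, bias, x) - t) ** 2 for x, t in samples) / len(samples)

def train(samples, learning_rate=0.1, epochs=500):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        # Accumulate the gradient of the mean squared error over all samples.
        for inputs, target in samples:
            error = neuron(weights, bias, inputs) - target
            for i, x in enumerate(inputs):
                grad_w[i] += 2 * error * x / len(samples)
            grad_b += 2 * error / len(samples)
        # Adjust the parameters against the gradient.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Hypothetical targets generated by 2*x1 - x2 + 0.5.
samples = [((0, 0), 0.5), ((0, 1), -0.5), ((1, 0), 2.5), ((1, 1), 1.5)]
weights, bias = train(samples)
print(weights, bias, mean_squared_error(weights, bias, samples))
```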

Page 36: CSTalks - On machine learning - 2 Mar

Training a Network

Page 37: CSTalks - On machine learning - 2 Mar

Non-trivial Functions

Page 38: CSTalks - On machine learning - 2 Mar

Optimization

Page 39: CSTalks - On machine learning - 2 Mar

Optimizing Fitness

•find extrema

•strategies

•gradient descent

•convex optimization
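Gradient descent in its simplest form can be sketched as repeatedly stepping against the gradient. The function being minimized, the learning rate, and the step count below are assumptions picked so the example is easy to follow.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Follow the negative gradient from `start` for a fixed number of steps."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

# f(x) = (x - 3)^2 is convex; its gradient is 2 * (x - 3) and its minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

Because this example function is convex, every starting point leads to the same minimum.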

Page 40: CSTalks - On machine learning - 2 Mar

Optimization

•finding extrema

•local/global

Page 41: CSTalks - On machine learning - 2 Mar

Gradient Descent

Page 42: CSTalks - On machine learning - 2 Mar

Problem: Local Extrema
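A small illustration of the problem, using a made-up non-convex function: the same descent procedure ends in different minima depending on where it starts, and only one of them is the global minimum. The function `f`, the learning rate, and the two starting points are assumptions.

```python
def f(x):
    """Hypothetical non-convex function with two minima."""
    return x ** 4 - 3 * x ** 2 + x

def f_prime(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(start, learning_rate=0.01, steps=1000):
    x = start
    for _ in range(steps):
        x -= learning_rate * f_prime(x)
    return x

left = descend(-2.0)    # ends in the minimum on the negative side
right = descend(2.0)    # ends in the minimum on the positive side
print(left, f(left))
print(right, f(right))
```

Starting from 2.0 the procedure settles in the shallower minimum and never discovers the deeper one on the other side.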

Page 43: CSTalks - On machine learning - 2 Mar

Problem: Speed

Page 44: CSTalks - On machine learning - 2 Mar

Linear Programming

x1

x2

lines define a convex function

planes in 3D etc

Page 45: CSTalks - On machine learning - 2 Mar

Considerations

•scaling to large feature spaces

•feature selection

•dimensionality reduction

Page 46: CSTalks - On machine learning - 2 Mar

Open Problems

Page 47: CSTalks - On machine learning - 2 Mar

Open Problems

•unlabeled data for regression

•exploiting sparsity in high dimensional spaces for non-parametric learning

•transferring learnt information from one task to simplify learning another

Page 48: CSTalks - On machine learning - 2 Mar

Open Problems (cont)

•algorithms for learning control strategies from delayed rewards and other inputs

•best “active learning” strategies for different learning problems

•degree one can preserve data privacy while obtaining the benefits of data mining

Page 49: CSTalks - On machine learning - 2 Mar

The end

Questions?

Page 50: CSTalks - On machine learning - 2 Mar

Types of Regression

•parametric

•non-parametric

Page 51: CSTalks - On machine learning - 2 Mar

Linear vs Non-linear

• linear

• smooth

• under-fitting

• good enough for some processes (e.g. business)

• non-linear

• complex

• over-fitting

• works on most data-sets

Page 52: CSTalks - On machine learning - 2 Mar

Naive Bayes

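[Diagram: the words "write", "people" and "free" are scored against the classes good and spam; each class score is a product (Π) over the words, multiplied (*) by No.Good or No.Spam]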
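A hedged sketch of a Naive Bayes spam filter in that spirit: each class score is the number of training messages in the class times a product of per-word likelihoods. The training messages, the helper names, and the add-one smoothing are assumptions and are not taken from the slide.

```python
from collections import Counter

def train(messages):
    """messages: list of (words, label). Returns the counts the classifier needs."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for words, label in messages:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def score(words, label, class_counts, word_counts, vocabulary):
    """Class count times the product of smoothed word likelihoods."""
    total = sum(word_counts[label].values())
    result = class_counts[label]
    for w in words:
        # Add-one smoothing (an assumption) so unseen words do not zero the product.
        result *= (word_counts[label][w] + 1) / (total + len(vocabulary))
    return result

def classify(words, model):
    class_counts = model[0]
    return max(class_counts, key=lambda label: score(words, label, *model))

# Hypothetical training messages.
model = train([
    (["free", "money"], "spam"),
    (["free", "offer"], "spam"),
    (["write", "people"], "good"),
    (["people", "meeting"], "good"),
])
print(classify(["free"], model))
print(classify(["write", "people"], model))
```

For long messages the product underflows quickly, so practical implementations usually sum log-probabilities instead.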

Page 53: CSTalks - On machine learning - 2 Mar

Graph Clustering

Page 54: CSTalks - On machine learning - 2 Mar

Mean Shift

Page 55: CSTalks - On machine learning - 2 Mar

Problems in CV

•What are the physical and geometric processes that govern (digital) imaging?

•What are the “informative” areas of an image and how do we detect them?

•What portions of an image pertain to one another and to relevant physical phenomena?

•From one (or more) images, how can we determine the geometry of the scene?

Page 56: CSTalks - On machine learning - 2 Mar

Linear Regression

•model: straight line

•2 adjustable parameters

•fitness function: root mean squared error
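The three bullets map onto a few lines of code: the two adjustable parameters are the slope and the y-shift (intercept), and the fitness is the root mean squared error. The closed-form least-squares fit below minimizes the sum of squared errors, which also minimizes the RMSE. The data points and helper names are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Closed-form least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(slope, intercept, xs, ys):
    """Root mean squared error of the line on the given points."""
    return math.sqrt(sum((slope * x + intercept - y) ** 2
                         for x, y in zip(xs, ys)) / len(xs))

# Hypothetical measurements scattered around y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]
slope, intercept = fit_line(xs, ys)
print(slope, intercept, rmse(slope, intercept, xs, ys))
```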

Page 57: CSTalks - On machine learning - 2 Mar

Solution Stability

y-shift

slope

Page 58: CSTalks - On machine learning - 2 Mar

Some Issues with Model Selection

normal

outliers

wrong model

Page 59: CSTalks - On machine learning - 2 Mar

Real Photo in Color Space

EM KMeans

Page 60: CSTalks - On machine learning - 2 Mar

Conjugate Gradient

Page 61: CSTalks - On machine learning - 2 Mar

Newton’s Method
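As a sketch of Newton's method used for optimization in one dimension: each step divides the first derivative by the second, so curvature sets the step size instead of a fixed learning rate. The example function f(x) = x^2 + e^x, the starting point, and the step count are assumptions.

```python
import math

def newton_minimize(first_derivative, second_derivative, start, steps=20):
    """Newton iteration on f'(x) = 0; assumes f''(x) > 0 along the way."""
    x = start
    for _ in range(steps):
        x -= first_derivative(x) / second_derivative(x)
    return x

# f(x) = x^2 + e^x, so f'(x) = 2x + e^x and f''(x) = 2 + e^x (always positive).
x_min = newton_minimize(lambda x: 2 * x + math.exp(x),
                        lambda x: 2 + math.exp(x),
                        start=0.0)
print(x_min)
```

On a function like this the iteration converges in a handful of steps, at the cost of needing second derivatives.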
