Transcript of "Introduction to Machine Learning" by Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011 (32 pages)

Page 1:

Introduction to Machine Learning
Amit Sethi, EEE, IIT G @ Cepstrum, Oct 16, 2011

Page 2:

A high-level view of Machine Learning

Objectives:

• Understand what machine learning is
• Motivate why it has become so important
• Identify types of learning and salient frameworks, algorithms, and their utility
• Take a sneak peek at the next set of problems

Page 3:

Organization

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers

Page 4:

Learning is improving task performance based on experience

Example: Learning to ride a bicycle

T: Task of learning to ride a bicycle

P: Performance of balancing while moving

E: Experience of riding in many situations

Is it wise to memorize all situations and appropriate responses by observing an expert?

Page 5:

More examples

Improve on task, T, with respect to performance metric, P, based on experience, E.

T: Playing checkers
P: Percentage of games won against an arbitrary opponent
E: Playing practice games against itself

T: Recognizing hand-written words
P: Percentage of words correctly classified
E: Database of human-labeled images of handwritten words

T: Driving on four-lane highways using vision sensors
P: Average distance traveled before a human-judged error
E: A sequence of images and steering commands recorded while observing a human driver

T: Categorize email messages as spam or legitimate
P: Percentage of email messages correctly classified
E: Database of emails, some with human-given labels

Source: Introduction to Machine Learning by Raymond J. Mooney

Page 6:

Mathematically speaking…

Determine f such that yn = f(xn) and a loss g(y, f(x)) is minimized over unseen (x, y) pairs.

The form of f is fixed, but some parameters can be tuned: y = fθ(x), where x is observed and y needs to be inferred. For example, y = 1 if mx > c and 0 otherwise, so θ = (m, c).

Machine Learning is concerned with designing algorithms that learn “better” values of θ given “more” x (and y) for a given problem
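To make the notation concrete, here is a minimal sketch (not from the slides) of the 1-D thresholding example, where θ = (m, c) is tuned by a simple perceptron-style update; the toy data and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy 1-D data: the "true" rule is y = 1 when 2.0 * x > 1.0
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = (2.0 * x > 1.0).astype(int)

# Model: y_hat = f_theta(x) = 1 if m*x > c else 0, with theta = (m, c)
m, c = 0.0, 0.0   # initial guess for theta
lr = 0.1          # learning rate (illustrative)

for epoch in range(20):
    for xi, yi in zip(x, y):
        y_hat = int(m * xi > c)
        err = yi - y_hat      # 0 when correct, +1/-1 when wrong
        m += lr * err * xi    # nudge theta toward fewer mistakes
        c -= lr * err

acc = np.mean((m * x > c).astype(int) == y)
print(f"learned theta = ({m:.2f}, {c:.2f}), training accuracy = {acc:.2f}")
```

Each pass gives the algorithm "more" x (and y), and the updates move θ toward "better" values, exactly in the sense of the statement above.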

Page 7:

Some pertinent questions to ask

• Scope: What is the scope of the task? How will performance be measured? How should learning be approached?
• Scalability: How fast can we learn? How many resources are needed to learn?
• Generalization: How will it perform in unseen situations?
• Online learning: Can it learn and improve while performing the task?

Page 8:

Related Disciplines

• Artificial Intelligence
• Data Mining
• Probability and Statistics
• Information Theory
• Numerical Optimization
• Adaptive Control Theory
• Neurobiology
• Psychology (cognitive, perceptual, developmental)
• Linguistics

Page 9:

Organization

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers

Page 10:

Solve problems better and more efficiently

Develop systems that are too difficult/expensive to construct manually because they require specific detailed skills or knowledge tuned to a specific task (knowledge engineering bottleneck).

Develop systems that can automatically adapt and customize themselves to individual users.
• Personalized news or mail filters
• Personalized tutoring

Discover new knowledge from large databases (data mining).
• Market basket analysis (e.g., diapers and beer)
• Medical text mining (e.g., linking migraines to calcium channel blockers to magnesium)

Source: Introduction to Machine Learning by Raymond J. Mooney

Page 11:

Understand how we learn and our limitations

Computational studies of learning may help us understand learning in humans and other biological organisms.
• Hebbian neural learning: "Neurons that fire together, wire together."

• Power law of practice: log(performance time) decreases linearly with log(# training trials)

[Figure: log-log plot of performance time vs. number of training trials]

Source: Introduction to Machine Learning by Raymond J. Mooney

Page 12:

The Time is Ripe

Many basic, effective, and efficient algorithms available

Large amounts of data available

Large amounts of computational resources available

Source: Introduction to Machine Learning by Raymond J. Mooney

Page 13:

Some recent success stories

Automatic vehicle navigation
• Road recognition
• Automatic navigation

Speech recognition
• Speech to text
• Automated services over the phone

Face detection
• Facebook face-tagging suggestions
• Camera autofocus for portraits

Page 14:

Organization

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers

Page 15:

Framing the problem

Remember, y=fθ(x)?

• y can be continuous or categorical
• y may be known for some x or none at all
• f can be simple (e.g., linear) or complex
• f can incorporate some knowledge of how x was generated or be blind to the generation
• etc.

Page 16:

Based on availability of desired output

Supervised learning: For y = fθ(x), a set of (xi, yi) pairs (yi usually class labels) is known. Now predict yj for new xj.

Examples:
• Two classes of proteins with given amino acid sequences
• Labeled male and female face images
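As a concrete illustration (scikit-learn and synthetic two-blob data are my assumptions, not from the slides), here is the supervised round trip: fit θ on known (xi, yi), then predict yj for new xj.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic labeled data: two Gaussian blobs standing in for two classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Hold out some pairs to estimate performance on unseen (x, y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)        # learn theta from (xi, yi)
print("held-out accuracy:", clf.score(X_test, y_test))  # predict yj for new xj
```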

Page 17:

Neural Networks - MLP

In a nutshell:
• Input is non-linearly transformed by hidden layers, usually a "fuzzy" linearly classified combination
• Output is a linear combination of the hidden layer

Use when:
• You want to model a non-linear function
• Labeled data is available
• You don't want to write new software

Variations:
• Competitive learning for classification
• Many more…
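A minimal sketch of the idea; scikit-learn's MLPClassifier and the two-moons toy data are my assumptions (the slide names no library).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A non-linearly separable toy problem the hidden layer can untangle
X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of 16 units; the output is a (logistic) combination
# of the hidden-layer activations
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```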

Page 18:

Support Vector Machines

In a nutshell:
• Learns the optimal boundary between two classes (the red line in the original figure)

Use when:
• Labeled class data is available
• You want to minimize the chance of error on test cases

Variations:
• Non-linear mapping of the input vectors using "kernels"
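A hedged sketch with scikit-learn's SVC (my choice of library and toy data): the RBF kernel supplies the non-linear mapping mentioned under Variations.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable, so a kernel is needed
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps inputs to a space where a maximum-margin
# linear boundary separates the two classes
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```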

Page 19:

Based on availability of desired output

Unsupervised learning: For y = fθ(x), only a set of xi is known. Predict y such that y is simpler than x but retains its essence.

Examples:
• Clustering (when y is a class label)
• Dimensionality reduction (when y is continuous)

Page 20:

Clustering

In a nutshell:
• Grouping similar objects based on a definition of similarity
• That is, intra- vs. inter-cluster similarity, e.g., distance from the center of the cluster

Use when:
• Class labels are not available, but you have a desired number of clusters in mind

Variations:
• Different similarity measures
• Automatic detection of the number of clusters
• Online clustering
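For instance, a minimal k-means sketch (k-means is the classic instance of "distance from the center of the cluster"; the library and toy data are my assumptions).

```python
import numpy as np
from sklearn.cluster import KMeans

# Three unlabeled Gaussian blobs; we only observe x, never y
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, (100, 2)) for c in (0, 3, 6)])

# k-means: similarity = distance to the cluster center, k chosen in advance
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
print("centers:\n", km.cluster_centers_)
```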

Page 21:

Principal Component Analysis

In a nutshell:
• High-dimensional data where not all dimensions are independent, e.g., (x1, x2, x3) where x3 = a·x1 + b·x2 + c

Use when:
• You want to perform linear dimensionality reduction

Variations:
• ICA
• Online PCA
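A short sketch of the slide's own example (the library choice and the constants a = 2, b = -1, c = 0.5 are my assumptions): the third coordinate is a linear function of the first two, so two principal components capture everything.

```python
import numpy as np
from sklearn.decomposition import PCA

# 3-D data whose third coordinate is x3 = a*x1 + b*x2 + c
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 500))
X = np.column_stack([x1, x2, 2 * x1 - 1 * x2 + 0.5])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)   # 2-D representation that retains the essence
# The two ratios sum to ~1.0: two components explain all the variance
print("explained variance ratio:", pca.explained_variance_ratio_)
```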

Page 22:

Manifold Learning

In a nutshell:
• Learning a lower-dimensional manifold (e.g., a surface) close to which the data lies

Use when:
• You want to perform non-linear dimensionality reduction

Variations:
• SOM
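The slide lists SOM; as a stand-in illustration here is scikit-learn's Isomap, another standard manifold learner, unrolling the classic Swiss-roll surface (library and data are my choices).

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# 3-D points lying near a rolled-up 2-D surface (a manifold)
X, _ = make_swiss_roll(n_samples=1000, noise=0.1, random_state=0)

# Isomap preserves geodesic (along-the-surface) distances while flattening
Z = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(Z.shape)  # (1000, 2): the unrolled coordinates
```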

Page 23:

Based on use of knowledge about the process

Generative models: For y = fθ(x), we have some idea of how x was generated given y and θ.

Examples:
• HMMs: Given phonemes and {age, gender}, we know how the speech can be generated
• Bayesian networks: Given {gender, age, race}, we have some idea of what a face will look like for different emotions
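HMMs and Bayesian networks are too large for a slide-sized example; as a deliberately simple stand-in, Gaussian naive Bayes below is generative in the same sense: it models p(x | y) per class and classifies by Bayes' rule (the library and toy data are my choices).

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Labeled data from two classes with different generating distributions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(2, 1.5, (150, 2))])
y = np.array([0] * 150 + [1] * 150)

# Fits per-class Gaussians for p(x | y), then applies Bayes' rule:
# p(y | x) is proportional to p(x | y) * p(y)
gnb = GaussianNB().fit(X, y)
print("class priors p(y):", gnb.class_prior_)
print("posterior for a new x:", gnb.predict_proba([[1.0, 1.0]]))
```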

Page 24:

Based on use of knowledge about the process

Discriminative models: Do not care about how the data was generated.

• Finding the right features is of prime importance
• Followed by finding the right classifier

Examples:
• SVM
• MLP

Source: “Automatic Recognition of Facial Actions in Spontaneous Expressions” by Bartlett et al in Journal of Multimedia, Sep 2006

Page 25:

Organization

What is learning?

Why learn?

Types of learning and salient frameworks

Frontiers

Page 26:

History of Machine Learning (1/2)

1980s:
• Advanced decision tree and rule learning
• Explanation-based learning (EBL)
• Learning, planning, and problem solving
• Utility problem
• Analogy
• Cognitive architectures
• Resurgence of neural networks (connectionism, backpropagation)
• Valiant's PAC learning theory
• Focus on experimental methodology

1990s:
• Data mining
• Adaptive software agents and web applications
• Text learning
• Reinforcement learning (RL)
• Inductive logic programming (ILP)
• Ensembles: bagging, boosting, and stacking
• Bayes net learning

Source: Introduction to Machine Learning by Raymond J. Mooney

Page 27:

History of Machine Learning (2/2)

2000s:
• Support vector machines
• Kernel methods
• Graphical models
• Statistical relational learning
• Transfer learning
• Sequence labeling
• Collective classification and structured outputs
• Computer systems applications: compilers, debugging, graphics, security (intrusion, virus, and worm detection)
• Email management
• Personalized assistants that learn
• Learning in robotics and vision

Source: Introduction to Machine Learning by Raymond J. Mooney

Page 28:

Application Frontiers (1/2)

Bioinformatics
• Gene expression prediction (just scratched the surface)
• Automated drug discovery

Speech recognition
• Context recognition, e.g., for digital personal assistants (Siri?)
• Better than Google Translate; imagine visiting Brazil

Image and video processing
• Automatic event detection in video
• "Seeing" software for the blind

Page 29:

Application Frontiers (2/2)

Robotics
• Where is my iRobot?
• Would you raise a "robot" child and make it learn?

Advanced scientific calculations
• Weather modeling through prediction
• Vector field or FEM calculation through prediction

Who knows…
• Always in search of new problems

Page 30:

Theoretical Frontiers

• Learning the structure of classifiers
• Automatic feature discovery and active learning
• Discovering the limits of learning: information-theoretic bounds?
• Learning that never ends
• Explaining human learning
• Computer languages with ML primitives

Adapted from: "The Discipline of Machine Learning" by Tom Mitchell, 2006

Page 31:

Questions?

Thank you!

Page 32:

Appendix: Some definitions

• Inference: Using a system to get the output variable for a given input variable
• Learning: Changing parameters according to an algorithm to improve performance
• Training: Using a machine learning algorithm to learn function parameters based on an input and (optionally) output dataset known as the "training set"
• Validation and testing: Using inference (without training) to test the performance of the learned system on data
• Offline learning: When all training happens prior to testing, and no learning takes place during testing
• Online learning: When learning and testing happen on the same data
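To tie these definitions together, a hedged sketch of the offline workflow (scikit-learn and its iris dataset are my assumptions, not from the slides): training tunes the parameters on the training set, and everything after fit() is pure inference.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Training set for learning; held-out set for validation/testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Training: the algorithm tunes the parameters theta on the training set
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Offline learning: parameters are now frozen; this is pure inference
print("test accuracy:", model.score(X_test, y_test))
```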