Page 1

580.691 Learning Theory

Reza Shadmehr

Introduction

Page 2

Administrative Stuff

• Course staff

• Lecturer: Reza Shadmehr ([email protected])

• Teaching assistant: Andy Long ([email protected])

• Course meeting times:

• Mondays and Wednesdays 3:00-4:15PM.

• Grading:

• Homework: (40%) Homework for each lecture is listed on the course web page: shadmehrlab.org/Courses/learningtheory.html. The homework is due on Mondays at the start of class.

• Midterm 1: (30%) (March 11)

• Midterm 2: (30%) (2pm, May 14)

• Textbook: Biological Learning and Control, MIT Press, 2012

Page 3

Teaching Assistant Info and Policies

• Name: Andy Long

• E-mail: [email protected]

• E-mail will be checked periodically during the day, but not after 8 pm.

• Homework questions must be received by 8 pm the day before the homework is due in order to receive a response.

• Office Hours:

• Location and time to be announced.

NOTE: Please schedule office-hour meetings at least one day in advance by e-mail.

• Homework:

• Hard copies are due in class by 3:00 pm, and code is due by e-mail, on Monday of each week.

• Grading scale: 0-10

Must show work to receive credit. Code submitted with programming assignments must be commented.

Page 4

Homework policy

Much of what you will learn in this course will be through doing your homework. I expect you to do your homework without copying from other students.

You are encouraged to form groups to study the material and to work on homework problems. However, any work that you hand in must be your own. This means that even if you figured out how to do the problem with the help of others, you must then write up your homework on your own. If you worked with others, acknowledge at the top of your paper all the people with whom you worked.

Homework solutions: Solutions to the homework are generally posted the morning after the day the homework is due.

Turning in the homework: Make a copy of your homework before you turn it in. If the homework involves programming, you will email the TA your code before class time as well as turn in printed results in class.

Late homework: Receives a zero. However, I will drop the two lowest homework grades.

Page 5

Academic Integrity for Programming Assignments

Students naturally want to work together, and it is clear they learn a great deal by doing so. Getting help is often the best way to interpret error messages and find bugs, even for experienced programmers. In response to this, the following rules will be in force for programming assignments:

• Students are allowed to work together in designing algorithms, in interpreting error messages, and in discussing strategies for finding bugs, but NOT in writing code.

• Students may not share code, may not copy code, and may not discuss code in detail (line-by-line or loop-by-loop) while it is being written or afterwards.

• Similarly, students may not receive detailed help on their code from individuals outside the course. This restriction includes tutors.

• Students may not show their code to other students as a means of helping them. Sometimes good students who feel sorry for struggling students are tempted to provide them with “just a peek” at their code. Such “peeks” often turn into extensive copying, despite prior claims of good intentions.

I will check assignments carefully and make a judgment about which students violated the rules of academic integrity on programming assignments. When I believe an incident of academic dishonesty has occurred, I contact the students involved. Students caught cheating on programming assignments will be punished. The standard punishment for the first offense is a 0 on the assignment and a 5 percentage point penalty on the semester average. Students whose violations are more flagrant will receive a higher penalty. For example, a student who outright steals another student’s code will receive an F in the course immediately. Students caught a second time will receive an immediate F, regardless of circumstances. Each incident will be reported to the Dean of Students’ office.

Page 6

Grading policy for undergraduate students

The material in this class is of an advanced nature appropriate for graduate students and perhaps senior undergraduates. Therefore, undergraduates might feel intimidated competing for grades with graduate students. I view all undergraduates who complete this course as gifted and talented. Accordingly, I will add the following percentage points to their final score for the course.

Undergraduates in their 4th year: +5% points.
Undergraduates in their 3rd year: +10% points.
Undergraduates in their 2nd year: +15% points.

Exams

There are two exams in this course, each covering half the material in the course. You are allowed to bring notes to the exam. The notes are limited to a single sheet of paper (both sides may be used).

If you miss the exam, you must have a documented health or family emergency. The makeup exam, if granted, will be an oral exam with the professor.

Page 7

Supervised Learning

[Figure: block diagram of a supervised learning system. An input pattern $(x_1, x_2, \ldots, x_n)$ is weighted by the parameters $(w_1, w_2, \ldots, w_n)$, summed, and passed through $\mathrm{sign}[\cdot]$ to produce the actual output; the training information is the desired (target) output, and the difference between the desired and actual outputs drives learning.]

Page 8

Training sample

[Figure: example binary images from the training set, each labeled “yes”.]

Representation:

$\mathbf{x} = [0\,1\,1\,1\,0\,0\,0\,0\,0\,0\,0\,1\,0\,0\,\ldots]^T$

Test sample

[Figure: a new binary image to be classified.]

Page 9

Representation: examples are binary vectors of length d = 49. You need to classify each example according to whether or not it represents the number “3”.

$\mathbf{x} = [0\,1\,1\,0\,\ldots\,1\,0\,0\,0]^T$ [Figure: the corresponding 7×7 binary image], and labels $y \in \{-1, +1\}$.

The mapping from examples to labels is a “linear classifier”:

$\hat{y} = \mathrm{sign}(\mathbf{w}^T \mathbf{x}) = \mathrm{sign}(w_1 x_1 + \cdots + w_d x_d)$

where $\mathbf{w}$ is a vector of parameters that we have to learn from examples.
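As a concrete illustration, here is a minimal Python sketch of such a classifier applied to one binary input vector. The random weights and random input are placeholders invented for illustration; in the course, $\mathbf{w}$ would be learned from the labeled training examples.

```python
import numpy as np

d = 49                                          # 7x7 binary image, flattened into a vector
rng = np.random.default_rng(0)

w = rng.normal(size=d)                          # hypothetical parameter vector (would be learned)
x = rng.integers(0, 2, size=d).astype(float)    # hypothetical binary input vector

# Linear classifier: y_hat = sign(w . x), with the convention that sign(0) = +1
y_hat = 1 if w @ x >= 0 else -1
print(y_hat)                                    # +1: "this is a 3"; -1: "this is not a 3"
```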

Page 10

Linear classifier

• The linear classifier is a way of combining expert opinion. In this case, each opinion is made by a binary “expert”.

[Figure: each input $x_1, x_2, \ldots, x_d$ is the binary “vote” of one expert (e.g., 0 or 1); the votes are weighted by $w_1, w_2, \ldots, w_d$ and combined by a majority rule.]

$\hat{y} = \mathrm{sign}(\mathbf{w}^T \mathbf{x}) = \mathrm{sign}(w_1 x_1 + \cdots + w_d x_d)$

Page 11

Supervised learning of a classifier

• We are given a set of {input, output} pairs: $\{\mathbf{x}^{(i)}, y^{(i)}\}$, $i = 1 \ldots n$.

• Our task is to learn a mapping or function $f: X \to Y$, with $\mathbf{x} \in X$ and $y \in Y$, such that $y^{(i)} = f(\mathbf{x}^{(i)})$ for $i = 1 \ldots n$.

n is the number of training samples; i is the trial number.

Page 12

Estimation

• To learn the task, we need to adjust the parameters whenever we make a mistake.

x (binary input vector)            y
00000110011100100000001000000     +1
00000111000000110000011100000     +1
10000110000001100000110001111     -1

$\hat{y} = \mathrm{sign}(\mathbf{w}^T \mathbf{x})$

$y - \hat{y} = 0$ if $\hat{y} = y$, and $y - \hat{y} = 2y$ if $\hat{y} \neq y$

$\mathbf{w} \leftarrow \mathbf{w} + (y - \hat{y})\,\mathbf{x}$

• Whenever the guess was wrong, move the voting parameter toward the correct answer.

• If the guess was right, leave the “voting” alone.

$\mathbf{w}^T \mathbf{x} \;\rightarrow\; (\mathbf{w} + (y - \hat{y})\,\mathbf{x})^T \mathbf{x} = \mathbf{w}^T \mathbf{x} + (y - \hat{y})\,\mathbf{x}^T \mathbf{x}$
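Below is a short Python sketch of this mistake-driven rule: when the guess is wrong, the voting parameters are moved toward the correct answer by $(y - \hat{y})\,\mathbf{x}$; when it is right, $(y - \hat{y}) = 0$ and nothing changes. The function name, the number of passes, and the looping structure are my own choices, not prescribed by the slides.

```python
import numpy as np

def train_classifier(X, y, n_passes=10):
    """Mistake-driven learning of the voting parameters w.

    X: (n, d) array of binary input vectors.
    y: (n,) array of labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_passes):
        for i in range(n):
            y_hat = 1 if w @ X[i] >= 0 else -1
            # (y - y_hat) is 0 when the guess is right and 2*y when it is wrong,
            # so the update only moves w when a mistake is made.
            w = w + (y[i] - y_hat) * X[i]
    return w
```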

Page 13

Evaluation

Evaluate the performance of your algorithm by measuring the average classification error (over the entire data set) as a function of the number of examples seen so far.
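One plausible way to compute this curve, reusing the mistake-driven update from the previous page: after each new example is seen, the current $\mathbf{w}$ classifies the entire data set and the fraction of misclassified examples is recorded. The online, single-pass setup and the function name are assumptions for illustration.

```python
import numpy as np

def error_curve(X, y):
    """Average classification error over the whole data set,
    recomputed after each training example has been seen."""
    n, d = X.shape
    w = np.zeros(d)
    errors = []
    for i in range(n):
        y_hat = 1 if w @ X[i] >= 0 else -1
        w = w + (y[i] - y_hat) * X[i]          # same mistake-driven update as above
        preds = np.where(X @ w >= 0, 1, -1)    # classify every example with current w
        errors.append(np.mean(preds != y))     # fraction misclassified
    return np.array(errors)
```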

Page 14

Model selection

A simple linear classifier cannot solve all problems. Example: the XOR problem.

[Figure: the XOR problem plotted in the $(x_1, x_2)$ plane. The points (0,1) and (1,0) are labeled +1; the points (0,0) and (1,1) are labeled -1. No single line separates the two classes.]

$f: X \to Y$, with $\mathbf{x}^{(i)} = (x_1^{(i)}, x_2^{(i)}) \in X$ and $y^{(i)} \in Y$, such that $y^{(i)} = f(\mathbf{x}^{(i)})$ for $i = 1 \ldots n$.

[Figure: the input space $X$, containing the examples $\mathbf{x}^{(i)}$, is mapped by $f$ to the output space $Y$, containing the labels $y^{(i)}$.]

Maybe we can add “polynomial experts”:

$\hat{y} = \mathrm{sign}(w_1 x_1 + \cdots + w_d x_d + w_{12}\, x_1 x_2)$
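The sketch below illustrates this in Python with the same mistake-driven update as before (a constant bias feature is added, which is my choice): with the raw inputs $(x_1, x_2)$ the XOR labels cannot all be classified correctly, but adding the single polynomial expert $x_1 x_2$ as an extra input makes all four points classifiable.

```python
import numpy as np

# XOR data, as in the figure above: (0,1) and (1,0) -> +1, (0,0) and (1,1) -> -1.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])

def train(X, y, n_passes=20):
    """Mistake-driven training with a constant bias feature appended to each input."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_passes):
        for i in range(len(y)):
            y_hat = 1 if w @ Xb[i] >= 0 else -1
            w = w + (y[i] - y_hat) * Xb[i]
    return np.where(Xb @ w >= 0, 1, -1)

# Raw inputs: XOR is not linearly separable, so some points are always misclassified.
print(train(X, y) == y)                          # not all True

# Add the "polynomial expert" x1*x2 as an extra feature: now all four points separate.
X_poly = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])
print(train(X_poly, y) == y)                     # all True
```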

Page 15

Regression and supervised learning

• Loss function as mean squared error; batch learning and the normal equation; sources of error (approximation, structural, noise).

• Newton-Raphson, LMS and steepest descent with Newton-Raphson, weighted least-squares, regression with basis functions, estimating the loss function in human learning.

• Sensitivity to error, generalization function, examples from psychophysics, estimation of the generalization function from a sequence of errors.

• Maximum likelihood estimation; likelihood of data given a distribution; ML estimate of model weights and model noise.

Page 16

State Estimation

State estimation and sensorimotor integration

• Optimal state estimation, state uncertainty, state noise and measurement noise, adjusting learning rates to minimize model uncertainty. Derivation of the Kalman filter algorithm.

• Estimation with multiple sensors, estimation with signal-dependent noise.

Bayesian integration

• A generative model of sensorimotor adaptation experiments; accounting for sensory illusions during adaptation; effect of statistics of prior actions on patterns of learning.

• Modulating sensitivity to error through manipulation of state and measurement noises; modulating forgetting rates.

• Multiple timescales of memory and the problem of savings.

Page 17

Structural Learning

System Identification

• Introduction to subspace analysis; projection of row vectors of matrices, singular value decomposition, system identification of deterministic systems using subspace methods.

• Identification of the learner; Expectation-Maximization as an algorithm for system identification.

Page 18

Optimal Control

• Motor costs and rewards. Movement vigor and encoding of reward. Muscle tuning functions as a signature of motor costs.

• Open-loop optimal control with a cost of time. Temporal discounting of reward. Optimizing movement duration with motor and accuracy costs. Control of saccades as an example of a movement in which the cost of time appears to be hyperbolic.

• Introduction to optimal feedback control; Bellman’s equation.

• Optimal feedback control of linear dynamical systems with additive noise.

• Optimal feedback control of linear dynamical systems with signal-dependent noise.

Page 19

Classification via Bayesian Estimation

Bayesian Classification

• Fisher linear discriminant; classification using posterior probabilities with explicit models of densities: equal-variance Gaussian densities (linear discriminant analysis), unequal-variance Gaussian densities (quadratic discriminant analysis), kernel estimates of density.

• Logistic regression as a method to model the posterior probability of class membership as a function of state variables; batch algorithm: Iterative Re-weighted Least Squares; on-line algorithm.

Page 20

Expectation Maximization

• Unsupervised classification. Mixture models, K-means algorithm, and Expectation-Maximization (EM).

• EM and conditional mixtures. EM as maximizing the expected complete log-likelihood; method of Lagrange multipliers; selecting the number of mixture components; mixture of experts.