Cogs 118A: Natural Computation I
Angela Yu (ajyu@ucsd)
TA: He (Crane) Huang
January 5, 2009
Course Website
www.cogsci.ucsd.edu/~ajyu/Teaching/Cogs118A_wi10/cogs118A.html
• Policy on academic integrity
• Scribe schedule
118A Curriculum Overview
• Regression (3)
• Classification (4)
• Neural networks (5)
• Kernel methods & SVM (6, 7)
• Applications to cognitive science
• Assignments, midterm, final project due; no final exam!

118B Curriculum (Tentative) Overview
• Graphical models (8)
• Mixture models & EM (9)
• Approximate inference & sampling (10, 11)
• Continuous latent variables (12)
• Sequential data (13)
• Information theory, reinforcement learning
• Final project
Machine Learning: 3 Types
Imagine an agent observes a sequence of inputs: x1, x2, x3, …
Supervised learning
The agent also observes a sequence of labels or outputs y1, y2, …, and the goal is to learn the mapping f: x → y.
Unsupervised learning
The agent’s goal is to learn a statistical model for the inputs, p(x), to be used for prediction, compact representation, decisions, …
Reinforcement learning
The agent can perform actions a1, a2, a3, …, which affect the state of the world, and receives rewards r1, r2, r3, …. Its goal is to learn a policy π: {x1, x2, …, xt} → at that maximizes ⟨reward⟩.
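To make the first two settings concrete, here is a minimal sketch (not from the lecture; the data values and the linear choice of f are illustrative assumptions): a supervised learner sees (x, y) pairs and fits a mapping, while an unsupervised learner sees only the inputs and fits a model p(x).

```python
import numpy as np

# Supervised: inputs come paired with outputs; learn a mapping f: x -> y.
X = np.array([1.0, 2.0, 3.0, 4.0])         # inputs x1, x2, ...
y = np.array([2.1, 3.9, 6.2, 8.1])         # paired outputs y1, y2, ...

# One simple choice of f: a least-squares linear map f(x) = w0 + w1*x.
A = np.column_stack([np.ones_like(X), X])  # design matrix [1, x]
w0, w1 = np.linalg.lstsq(A, y, rcond=None)[0]
f = lambda x: w0 + w1 * x                  # the learned mapping

# Unsupervised: only the inputs are observed; learn a model p(x),
# here a one-dimensional Gaussian fit by its sample mean and variance.
mu, var = X.mean(), X.var()
```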
Supervised Learning
Classification: the outputs y1, y2, … are discrete labels
[Figure: example images labeled “Dog” vs. “Cat”]
Regression: the outputs y1, y2, … are continuously valued
[Figure: scatter plot of y against x]
Applications: Cognitive Science
[Figure panels: Object Categorization; Speech Recognition (“WET” vs. “VET”); Face Recognition; Motor Learning]
Challenges
• Noisy sensory inputs
• Incomplete information
• Excess inputs (irrelevant)
• Prior knowledge/bias
• Changing environment
• No (one) right answer; inductive inference (open-ended)
• Stochasticity (uncertainty), not deterministic
• …
An Example: Curve-Fitting
[Figure: noisy data points, y plotted against x]
Polynomial Curve-Fitting

$$y(x;\mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_n x^n = \sum_{j=0}^{n} w_j x^j$$

Linear model (linear in the weights w)

Error function (root-mean-square), over the N data points:

$$E(\mathbf{w}) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \big( y(x_j;\mathbf{w}) - y_j \big)^2}$$
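A minimal numerical sketch of this fit (not from the lecture: the data values are made up, and NumPy’s polyfit stands in for the least-squares solution):

```python
import numpy as np

# Toy data set of N points (values are illustrative, not from the slides).
x = np.array([0.0, 0.1, 0.25, 0.4, 0.55, 0.7, 0.85, 1.0])
y = np.array([0.2, 0.9, 1.1, 0.9, 0.3, -0.4, -0.8, -0.1])

n = 2                            # polynomial order
w = np.polyfit(x, y, n)          # least-squares coefficients, highest power first

def rms_error(w, x, y):
    """Root-mean-square error E(w) between y(x; w) and the targets."""
    pred = np.polyval(w, x)      # evaluates sum_j w_j x^j
    return np.sqrt(np.mean((pred - y) ** 2))

print(rms_error(w, x, y))
```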
Linear Fit?
[Figure: 1st-order polynomial fit to the data, y against x]

Quadratic Fit?
[Figure: 2nd-order polynomial fit]

5th-Order Polynomial Fit?
[Figure: 5th-order polynomial fit]

10th-Order Polynomial Fit?
[Figure: 10th-order polynomial fit]

And the Answer is… Quadratic
[Figure: the true underlying quadratic curve]
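The slides do not give the data, but since the answer is quadratic, the experiment can be recreated along these lines (the ground-truth coefficients and noise level below are made up): fit each candidate order and look at the training error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 11)
y_true = 1.0 - 3.0 * x + 2.5 * x**2           # made-up quadratic ground truth
y = y_true + 0.1 * rng.standard_normal(x.size)

for order in (1, 2, 5, 10):
    w = np.polyfit(x, y, order)
    train_rms = np.sqrt(np.mean((np.polyval(w, x) - y) ** 2))
    print(f"order {order:2d}: training RMS = {train_rms:.4f}")
# Training error keeps shrinking as the order grows, even though the
# quadratic is the right model -- which motivates the next slide.
```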
Training Error vs. Test (Generalization) Error
[Figure: fits of 1st-, 2nd-, 5th-, and 10th-order polynomials, y against x]
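Continuing the same assumed setup, a sketch of how one might measure test (generalization) error on held-out data:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n_points):
    """Noisy samples from the same hypothetical quadratic as above."""
    x = rng.uniform(0, 1, n_points)
    y = 1.0 - 3.0 * x + 2.5 * x**2 + 0.1 * rng.standard_normal(n_points)
    return x, y

x_train, y_train = make_data(11)
x_test, y_test = make_data(100)   # held-out data measures generalization

for order in (1, 2, 5, 10):
    w = np.polyfit(x_train, y_train, order)
    train = np.sqrt(np.mean((np.polyval(w, x_train) - y_train) ** 2))
    test = np.sqrt(np.mean((np.polyval(w, x_test) - y_test) ** 2))
    print(f"order {order:2d}: train RMS {train:.3f}, test RMS {test:.3f}")
# Test error typically bottoms out near the true order (2) and climbs
# again for the over-flexible 5th- and 10th-order fits.
```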
What Did We Learn?
• Model complexity is important
• Minimizing training error → overtraining
• A better error metric: test (generalization) error
• But test error costs extra data (precious)
• Fix 1: regularization (3.1); see the sketch below
• Fix 2: Bayesian model comparison (3.3)
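As a pointer to Fix 1, here is a minimal sketch of one standard form of regularization, ridge (L2) penalized least squares; the function name and penalty strength lam are illustrative assumptions, not necessarily the exact scheme of section 3.1.

```python
import numpy as np

def ridge_polyfit(x, y, order, lam=1e-3):
    """Polynomial least squares with an L2 penalty lam*||w||^2 on the weights.

    Larger lam shrinks the coefficients, taming high-order fits; lam and
    the function name are illustrative choices, not from the lecture.
    """
    Phi = np.vander(x, order + 1)         # columns x^order ... x^0
    A = Phi.T @ Phi + lam * np.eye(order + 1)
    w = np.linalg.solve(A, Phi.T @ y)     # (Phi^T Phi + lam I) w = Phi^T y
    return w                              # same coefficient layout as np.polyfit
```

With lam = 0 this reduces to ordinary least squares; even a small penalty keeps a 10th-order fit from chasing the noise.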
Math Review