MIT 6.899 Learning and Inference in Vision

• Prof. Bill Freeman, wtf@mit.edu

• MW 2:30 – 4:00

• Room: 34-301

• Course web page: http://www.ai.mit.edu/courses/6.899/

Reading class

• We’ll cover about 1 paper each class.

• Seminal or topical research papers in the intersection of machine learning and vision.

• One student will present each paper. Then we’ll discuss the paper as a class.

• One student will write a computer example illustrating the paper’s main idea.

Learning and Inference

• “Learning”: learn the parameter values or structure of a probabilistic model.
  – Look at many examples of people walking, and build up a probabilistic model relating video images to 3-d motions.

• “Inference”: infer hidden variables, given observations.
  – E.g., given a particular video of someone walking, infer their motions in 3-d.
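
A minimal sketch of the two terms, using a scalar stand-in for the walking example: a hidden value x (think of one joint angle) and an observed value y (an image measurement), related by y = a·x + noise. The linear-Gaussian model, the function names `learn` and `infer`, and all numeric settings are illustrative assumptions, not taken from the course.

```python
import random


def learn(pairs):
    """Learning: fit y = a*x + noise from (x, y) training pairs by least squares."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = sxy / sxx
    noise_var = sum((y - a * x) ** 2 for x, y in pairs) / len(pairs)
    return a, noise_var


def infer(y, a, noise_var, prior_var=1.0):
    """Inference: Gaussian posterior over hidden x given y, with prior x ~ N(0, prior_var)."""
    post_var = 1.0 / (1.0 / prior_var + a * a / noise_var)
    post_mean = post_var * a * y / noise_var
    return post_mean, post_var


# Synthetic training set (hypothetical ground truth: a = 2.0, noise std = 0.5).
rng = random.Random(0)
train = []
for _ in range(2000):
    x = rng.gauss(0.0, 1.0)
    train.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))

a_hat, var_hat = learn(train)                         # learning: estimate the model
post_mean, post_var = infer(3.0, a_hat, var_hat)      # inference: hidden x for a new y
print(a_hat, var_hat, post_mean, post_var)
```

Learning uses many (x, y) examples once; inference then reuses the fitted model for each new observation.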

Statistical dependencies between variables

Learning and Inference

Observed variables: y1, y2

Unobserved variables: x1, x2

“Learning”: learn this model, and the form of the statistical dependencies.

“Inference”: given this model and the observations y1 and y2, infer x1 and x2, or their conditional distribution.
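
A small sketch of the inference step for a model of this shape. The slide's figure is not preserved here, so the structure is an assumption: x1 and x2 are binary and linked to each other, y1 depends only on x1, and y2 only on x2. The probability tables are made-up numbers, and `posterior` is a hypothetical helper name.

```python
from itertools import product

# Assumed model: p(x1, x2, y1, y2) = p(x1) p(x2 | x1) p(y1 | x1) p(y2 | x2)
prior_x1 = {0: 0.5, 1: 0.5}
trans = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # p(x2 | x1): neighbors tend to agree
emit = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}   # p(y | x): noisy copy of the hidden state


def posterior(y1, y2):
    """Return p(x1, x2 | y1, y2) by enumerating all hidden configurations."""
    joint = {}
    for x1, x2 in product((0, 1), repeat=2):
        joint[(x1, x2)] = (prior_x1[x1] * trans[x1][x2]
                           * emit[x1][y1] * emit[x2][y2])
    z = sum(joint.values())  # p(y1, y2), the normalizer
    return {state: p / z for state, p in joint.items()}


post = posterior(1, 1)
best = max(post, key=post.get)                 # most probable (x1, x2)
p_x1_is_1 = post[(1, 0)] + post[(1, 1)]        # marginal p(x1 = 1 | y1, y2)
print(post, best, p_x1_is_1)
```

Enumeration is only practical for a handful of variables; with many hidden variables, methods such as belief propagation exploit the graph structure instead.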

Cartoon history of speech recognition research

• 1960’s, 1970’s, 1980’s: lots of different approaches; “hey, let’s try this”.

• 1980’s: Hidden Markov Models (HMMs); the statistical approach took off.

• 1990’s and beyond: HMM’s now the dominant approach. “The person with the best training set wins”.

Same story for document understanding

• The person with the best training set wins.

Computer vision is ready to make that transition

• Machine learning approaches are becoming dominant.

• We get to make and watch the transition to a principled, statistical approach happen.

• It’s not trivial: issues of representation, robustness, generalization, speed, …

Categories of the papers

1. Learning image representations

2. Learning manifolds

3. Linear and bilinear models

4. Learning low-level vision

5. Graphical models, belief propagation

6. Particle filters and tracking

7. Face and object recognition

8. Learning models of object appearance

1 Learning image representations

Example training image

From http://www.amsci.org/amsci/articles/00articles/olshausencap1.html

1 Learning image representations

From: http://www.cns.nyu.edu/pub/eero/simoncelli01-reprint.pdf

2 Learning manifolds

From: http://www.sciencemag.org/cgi/content/full/290/5500/2319

Joshua B. Tenenbaum, Vin de Silva, John C. Langford

3 Linear and bilinear models

From: http://www-psych.stanford.edu/~jbt/NC120601.pdf

4 Learning low-level vision

From Y. Weiss, http://www.cs.berkeley.edu/~yweiss/iccv01.ps.gz

[Figure: images under different lighting, with panels labeled “reflectance” and “illumination”]

5 Graphical models, belief propagation

From: http://www.cs.berkeley.edu/~yweiss/nips96.pdf

6 Particle filters and tracking

From: http://www.robots.ox.ac.uk/~ab/abstracts/eccv96.isard.html

7 Face and object recognition

From Viola and Jones, http://www.ai.mit.edu/people/viola/research/publications/ICCV01-Viola-Jones.ps.gz

7 Face and object recognition

From: Pinar Duygulu, Kobus Barnard, Nando de Freitas, and David Forsyth,

8 Learning models of object appearance

Weber, Welling, and Perona, http://www.gatsby.ucl.ac.uk/~welling/papers/ECCV00_fin.ps.gz

Images containing the object

Images not containing the object

8 Learning models of object appearance

Test images

Weber, Welling, and Perona, http://www.gatsby.ucl.ac.uk/~welling/papers/ECCV00_fin.ps.gz

Contains the object?

Contains the object?

Guest lecturers/discussants

• Andrew Blake (Condensation, Oxford/Microsoft)

• Baback Moghaddam (Bayesian face recognition, MERL)

• Paul Viola (Fast face recognition, MERL)

Class requirements

1. Read each paper. Think about it. Discuss in class.

2. Present one paper to the class.

3. Present one computer example to the class.

4. Final project: write a conference paper related to vision and learning.

1. Read the papers, discuss them

• Write down 3 insights about the paper that you might want to share with the class in discussion.

• Turn them in on a sheet of paper.

2. Presentations about a paper

• About 15 minutes long. Set the stage for discussions.

• Review the paper. Summarize its contributions. Give relevant background. Discuss how it relates to other papers we’ve read.

• Meet with me two days before to go over your presentation about the paper.

3. Programming example

• Present a computer implementation of a toy example that illustrates the main idea of the paper.

• Show trade-offs in parameter settings, or in training sets.

• Goal: help us build up intuition about these techniques.

• Ok to use on-line code. Then focus on creating informative toy training sets.

Toy problems

• Simple summaries of the main idea.

• Identify an informative idea from the paper.

• Make a simple example using it.

• Play with it.

Toy problem

“If you can make a system to solve this, I’ll give you a PhD”

by Ted Adelson

Particle filter for inferring human motion in 3-d

From: Hedvig Sidenbladh’s thesis, http://www.nada.kth.se/~hedvig/publications/thesis.pdf

Particle filter toy example

From: Hedvig Sidenbladh’s thesis, http://www.nada.kth.se/~hedvig/publications/thesis.pdf
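
A toy of this kind can be sketched in a few lines. The version below is a generic bootstrap particle filter in 1-d, not the model from the thesis: the random-walk motion model, the Gaussian observation noise, the function name `particle_filter`, and every parameter value are assumptions chosen for illustration.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_std=0.5,
                    obs_std=1.0, seed=0):
    """Track a 1-d hidden state; return the posterior-mean estimate at each step."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]  # broad initial guess
    estimates = []
    for y in observations:
        # Predict: push each particle through the (assumed) random-walk dynamics.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight: Gaussian likelihood of the observation under each particle.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # every weight underflowed; fall back to uniform weights
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw a new particle set in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Synthetic test: a slowly drifting state observed with unit-variance noise.
rng = random.Random(1)
truth = [0.1 * t for t in range(60)]
observations = [x + rng.gauss(0.0, 1.0) for x in truth]
estimates = particle_filter(observations)
mean_abs_error = sum(abs(e - x) for e, x in zip(estimates, truth)) / len(truth)
print(mean_abs_error)
```

Varying `n_particles`, `process_std`, and `obs_std` is one way to explore the parameter trade-offs that the programming examples are meant to expose.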

What we’ll have at the end of the class

Code examples:

• Non-negative matrix factorization example

• 1-d particle filtering example

• Boosting for face recognition

• Example of belief propagation for scene understanding

• Manifold learning comparisons

4. Final project: write a conference paper

• Submitting papers to conferences, you get just one shot, so it’s important to learn how to make good submissions.

• We’ll discuss many papers, and what’s good and bad about them, during the class.

• I’ll give a lecture on “how to write a good conference paper”.

• Subject of the paper can be:
  – A project from your own research.
  – A project you undertake for the class.
    • Your idea
    • One I suggest to you

Feedback options

• At the end of the course: “it would have been better if we had done this…”
  – Somewhat helpful

• During the course: “I find this useful; I don’t find that useful…”
  – Very helpful

What background do you need?

• Be able to read and understand the papers
  – Linear algebra
  – Familiarity with estimation theory
  – Image filtering

• Background in machine learning and computer vision.

Auditing versus credit

• If you’re a student and want to take the class, sign up for credit.
  – You’ll stay more engaged.
  – Makes it more probable that I can offer the class again.

• But if you do audit:
  – Please don’t come to class if you haven’t read the paper.
  – I may ask you to present to the class, anyway.

First paper

• Monday, Feb. 11.

• Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Olshausen BA, Field DJ (1996) Nature, 381: 607-609

• Presenter: Bill Freeman

• Computational demonstration: need volunteer (software is available: http://redwood.ucdavis.edu/bruno/sparsenet.html)

Second paper

• Wednesday, Feb. 13.

• Learning the parts of objects by non-negative matrix factorization, D. D. Lee and H. S. Seung, Nature 401, 788-791 (1999), and commentary by Mel.

• Presenter: need volunteer

• Computational demonstration: need volunteer
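
As a possible starting point for the computational demonstration of the non-negative matrix factorization paper, here is a minimal sketch that factors a non-negative matrix V into W·H using multiplicative updates for squared error. This is a commonly used variant, and the paper's own objective may differ. The helper names, the toy matrix, and the iteration count are all assumptions for illustration.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sq_error(v, w, h):
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor v (n x m, non-negative) as w (n x rank) times h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [sq_error(v, w, h)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise; keeps H non-negative.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise; keeps W non-negative.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(sq_error(v, w, h))
    return w, h, errors


# Toy data: 4 "pixels" x 6 "images", built from two non-overlapping parts.
parts = [[1, 0], [1, 0], [0, 1], [0, 1]]
mix = [[1, 0, 2, 1, 0, 3], [0, 1, 1, 2, 3, 0]]
v = matmul(parts, mix)

w, h, errors = nmf(v, rank=2)
print(errors[0], errors[-1])
```

Changing `rank` or building V from overlapping parts is a simple way to see when the learned columns of W do or do not look like parts.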