Learning Continuous Models for Estimating

Intrinsic Component Images

by

Marshall Friend Tappen

B.S. Computer Science, Brigham Young University, 2000

S.M. Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2002

Submitted to the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2006

© Massachusetts Institute of Technology 2006. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 26, 2006

Certified by: Edward H. Adelson, Professor of Vision Science, Thesis Supervisor

Certified by: William T. Freeman, Professor of Computer Science and Engineering, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Learning Continuous Models for Estimating Intrinsic

Component Images

by

Marshall Friend Tappen

Submitted to the Department of Electrical Engineering and Computer Science on May 26, 2006, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science

Abstract

The goal of computer vision is to use an image to recover the characteristics of a scene, such as its shape or illumination. This is difficult because an image is the mixture of multiple characteristics. For example, an edge in an image could be caused by either an edge on a surface or a change in the surface's color. Distinguishing the effects of different scene characteristics is an important step towards high-level analysis of an image.

This thesis describes how to use machine learning to build a system that recovers different characteristics of the scene from a single, gray-scale image of the scene. The goal of the system is to use the observed image to recover images, referred to as Intrinsic Component Images, that represent the scene's characteristics. The development of the system is focused on estimating two important characteristics of a scene, its shading and reflectance, from a single image. From the observed image, the system estimates a shading image, which captures the interaction of the illumination and shape of the scene pictured, and an albedo image, which represents how the surfaces in the image reflect light. Measured both qualitatively and quantitatively, this system produces state-of-the-art estimates of shading and albedo images. This system is also flexible enough to be used for the separate problem of removing noise from an image.
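As a rough illustration of why separating shading from albedo is hard, the sketch below assumes the multiplicative image model that is common in intrinsic image work: each observed pixel is the product of a shading value and an albedo value. The abstract itself does not state this model, and the function names and toy pixel values are hypothetical, chosen only for the example.

```python
import math

def compose(shading, albedo):
    """Per-pixel product of a shading image and an albedo image (assumed model)."""
    return [[s * a for s, a in zip(s_row, a_row)]
            for s_row, a_row in zip(shading, albedo)]

def log_image(image):
    """Element-wise natural logarithm of an image with positive values."""
    return [[math.log(v) for v in row] for row in image]

# Toy 2x2 images; the values are made up for illustration.
shading = [[0.5, 1.0], [1.0, 0.5]]
albedo = [[0.2, 0.2], [0.8, 0.8]]
observed = compose(shading, albedo)

# In the log domain the product becomes a sum of two component images.
log_sum = [[ls + la for ls, la in zip(s_row, a_row)]
           for s_row, a_row in zip(log_image(shading), log_image(albedo))]

# The decomposition is ambiguous: swapping the roles of the two components
# yields a different (shading, albedo) pair that explains the same image.
alternative = compose(albedo, shading)
```

Because many component pairs reproduce the same observed image, an estimator needs additional information, such as the learned models described in this thesis, to prefer one decomposition over another.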

Building this system requires algorithms for continuous regression and learning the parameters of a Conditionally Gaussian Markov Random Field. Unlike previous work, this system is trained using real-world surfaces with ground-truth shading and albedo images. The learning algorithms are designed to accommodate the large amount of data in this training set.

Thesis Supervisor: Edward H. Adelson
Title: Professor of Vision Science

Thesis Supervisor: William T. Freeman
Title: Professor of Computer Science and Engineering

Acknowledgments

I have been fortunate to be advised by Bill Freeman and Ted Adelson. They have taught me much about both computer vision and how to do research itself. I am grateful for the time that they have spent discussing ideas, revising papers, commenting on talks, and helping me get out of ruts in my research.

I appreciate the time that Michael Collins, the third member of my thesis committee, spent reading this thesis carefully and making suggestions for its improvement.

Many thanks go to the members of the Vision and Learning Group and Perceptual Sciences group: Bryan, Antonio, Erik, Ce, Ron, Barun, Yuanzhen, Lavanya, and Kevin. They have patiently listened to and commented on my various hare-brained ideas over the years.

My parents, John and Melanie Tappen, and my mother-in-law, Grace Chan, have been tremendous supports, both emotionally and, occasionally, financially.

My greatest thanks go to my family. My sons, John and Jeffrey, were both born at MIT. Whether my days at school went well or poorly, I could always count on an enthusiastic reception at home. It is a humbling experience to hear a five-year-old pray for the success of his father at school. In the end, they have shaped my graduate experience more than anything else.

My wife, Joy, has continually loved and supported me. During my studies at MIT, she has:

Spent six years in small, student apartments, with two active boys.

Made my small student stipend stretch amazingly far.

Had to move every time that she has been expecting a child.

Her constant love and support have made life and graduate school a wonderful experience. I am blessed to have such an amazing partner for life and the eternities.

This work was supported by an NDSEG Fellowship from the Department of Defense and grants from Shell Oil, the National Geospatial-Intelligence Agency, NGA-NEGI, the National Science Foundation, the NIH, and the Nippon Telegraph and Telephone Corporation as part of the NTT/MIT Collaboration Agreement.

Contents

1 Introduction
1.1 Background
1.1.1 Intrinsic Images
1.1.2 Shading and Albedo Images
1.1.3 Scene and Noise Images
1.2 Prior Work
1.2.1 Estimating Shading and Albedo
1.2.2 Discriminative Approaches
1.2.3 Generative Approaches
1.2.4 Other approaches
1.2.5 Broader Links
1.2.6 Author's Previous Work
1.2.7 Limitations of Previous Work
1.3 Basic Strategy for Estimating Intrinsic Component Images
1.4 Contributions of this Thesis
1.5 Roadmap
1.6 Notation

2 Learning a Mixture of Experts Estimator
2.1 Basic Problem Statement
2.2 Issues in Choosing the Type of Estimator
2.3 Possible Types of Estimators
2.3.1 Linear Estimators
2.3.2 Nearest Neighbor Estimator
2.3.3 The Nadaraya-Watson Estimator and Kernel Density Regression
2.3.4 Probabilistic Interpretation of the Nadaraya-Watson Estimator
2.4 Overcoming Complexity issues using Experts
2.4.1 Completing the Estimator by Defining p(i|o)
2.5 Stagewise Training of the Mixture of Experts Estimator
2.5.1 Stagewise Versus Iterative Training
2.5.2 Fitting the Mixture of Experts Estimator
2.5.3 Beyond Squared-Error
2.5.4 Classification using ExpertBoost
2.5.5 Related Work on Stagewise Methods
2.6 Visualizing the ExpertBoost Algorithm
2.6.1 The Training Process
2.6.2 The Importance of Local Fitting
2.6.3 Resiliency to Noise
2.6.4 Benefits of Randomly Chosen Patches
2.7 Comparison with Support Vector Regression
2.8 Conclusion

3 Estimating Intrinsic Component Images
3.1 Training Data
3.1.1 Creating Denoising Data
3.1.2 Creating Shading and Albedo Data
3.2 Recovering Intrinsic Component Images from Constraint Estimates
3.3 Evaluating the Mixture of Experts Estimators
3.3.1 The Training and Test Sets

3.4 Evaluating Im