Geometry and Photometry in 3D Visual shashua/papers/AITR-1401.pdf · PDF fileGeometry and...

Click here to load reader

  • date post

    30-Jul-2018
  • Category

    Documents

  • view

    213
  • download

    0

Embed Size (px)

Transcript of Geometry and Photometry in 3D Visual shashua/papers/AITR-1401.pdf · PDF fileGeometry and...

  • Geometry and Photometry in 3D Visual Recognition

    by

    Amnon Shashua

    B.Sc., Tel-Aviv University, Israel (1986)

    M.Sc., Weizmann Institute of Science, Israel (1989)

    Submitted to the Department of Brain and Cognitive Sciences

    in Partial Fulllment of the Requirements for the Degree of

    Doctor of Philosophy

    at the

    Massachusetts Institute Of Technology

    November, 1992

    Articial Intelligence Laboratory, AI-TR 1401

    Copyright c Massachusetts Institute of Technology, 1992

    This report describes research done at the Articial Intelligence Laboratory of the Massachusetts Institute of

    Technology. Support for the laboratory's articial intelligence research is provided in part by the Advanced

    Research Projects Agency of the Department of Defense under Oce of Naval Research contract N00014-

    85-K-0124. A. Shashua was also supported by NSF-IRI8900267.

  • ii

  • Abstract

    This thesis addresses the problem of visual recognition under two sources of variability:

    geometric and photometric. The geometric deals with the relation between 3D objects

    and their views under parallel, perspective, and central projection. The photometric deals

    with the relation between 3D matte objects and their images under changing illumination

    conditions. Taken together, an alignment-based method is presented for recognizing ob-

    jects viewed from arbitrary viewing positions and illuminated by arbitrary settings of light

    sources.

    In the rst part of the thesis we show that a relative non-metric structure invariant

    that holds under both parallel and central projection models can be dened relative to

    four points in space and, moreover, can be uniquely recovered from two views regardless

    of whether one or the other was created by means of parallel or central projection. As a

    result, we propose a method that is useful for purposes of recognition (via alignment) and

    structure from motion, and that has the following properties: (i) the transition between

    projection models is natural and transparent, (ii) camera calibration is not required, and

    (iii) structure is dened relative to the object and does not involve the center of projection.

    The second part of this thesis addresses the photometric aspect of recognition under

    changing illumination. First, we argue that image properties alone do not appear to be

    generally sucient for dealing with the eects of changing illumination; we propose a model-

    based approach instead. Second, we observe that the process responsible for factoring out

    the illumination during the recognition process appears to require more than just contour

    information, but just slightly more. Taken together, we introduce a model-based alignment

    method that compensates for the eects of changing illumination by linearly combining

    model images of the object. The model images, each taken from a dierent illumination

    condition, can be converted onto novel images of the object regardless of whether the image

    is represented by grey-values, sign-bits, or other forms of reduced representations.

    The third part of this thesis addresses the problem of achieving full correspondence

    between model views and puts together the geometric and photometric components into a

    single recognition system. The method for achieving correspondence is based on combining

    ane or projective geometry and optical ow techniques into a single working framework.

    Thesis Advisor: Professor Shimon Ullman

    iii

  • iv

  • Acknowledgments

    First and foremost, thanks go to Shimon Ullman. I was fortunate to have Shimon as

    my advisor for the past six years, both at Weizmann and at MIT. Shimon allowed me to

    benet from his knowledge, experience, nancial support, and many of his original views

    while not constraining my creative eorts. I also gratefully acknowledge Shimon's inuence

    on shaping my views in the elds of computer and human vision.

    Thanks also to other members of the Brain and Cognitive Sciences department and the

    Articial Intelligence Laboratory: to Tomaso Poggio for his generosity, continuous support

    and interest in my work, for countless illuminating discussions, and much more | but

    mostly for always treating me as a colleague and friend. To Whitman Richards for his

    never ending generosity and patience and for the many interesting \wednesday lunches"

    we had together. To Eric Grimson for the welg-group seminars and for patiently reading

    through my memos. To Berthold Horn for interesting discussions on issues related to

    photometry and geometry.

    Thanks to the members of my thesis committee: to Tomaso Poggio who kindly agreed

    being the chairman, and to Ted Adelson, Ellen Hildreth, and Alan Yuille for making their

    time available to me. Special thanks to Tomaso Poggio and Ted Adelson for being my

    co-sponsors for the post-doctoral McDonnell-Pew fellowship.

    Thanks to current and former members of the Center for Biological Information Pro-

    cessing: to Shimon Edelman who together with his wife Esti helped to make our rst year

    in the U.S. very pleasant; to Daphna Weinshall for putting a kind word whenever possible

    and for being my Tennis partner; to Heinrich Bultho for many discussions on \Mooney"

    images, and to Norberto Grzywacz for always keeping spirits high.

    For stimulating my interest in visual motion and structure from motion and for provid-

    ing an exciting and inspiring summer, thanks to Peter Burt, Padmanabhan Anandan, Jim

    Bergen, Keith Hanna, Neil Okamoto and Rick Wildes at David Sarno Research Center.

    Thanks to the vision group at IBM T.J. Watson Research Center: to Ruud Bolle,

    Andrea Califano and Rakesh Mohan for being generous enough to give me all the freedom

    in pursuing directions of research during my stay there at the summer of 1990; to Gabriel

    Taubin for many months of inspiring collaboration on combinatorial algorithms.

    Many thanks to my fellow lab-mates at the Articial Intelligence Laboratory: to Tan-

    veer Syeda and Ivan Bachelder my oce-mates, to David Jacobs for many inspiring chats

    v

  • on projective geometry and for his active role in the welg-group seminars, to David Clemens

    for making our life with Lisp Machines fairly enjoyable, to Anita Flynn for initiating many

    of the events that make this lab a fun place, to Pawan Sinha for always being helpful,

    to Brian Subirana for keeping spirits and \Crick" dinners up, to Davi Geiger for endless

    energy and enthusiasm; and to Ronen Basri, David Beymer, Thomas Breuel, Todd Cass,

    Ron Chaney, Frederico Girosi, John Harris, Ian Horswill, David Michael, Pam Lipson, Jose

    Robles, Kah Key Sung, Ali Taalebi, Sandy Wells, and Paul Viola who have made my stay

    here both enjoyable and inspiring.

    Thanks to my fellow students at the department of Brain and Cognitive Sciences,

    especially to Sherif Botros, Diana Smetters, Eric Loeb and Randy Smith. Thanks to my

    fellow students at the Media Laboratory: Bill Freeman, Mathew Turk, Trevor Darrell and

    Eero Simoncelli.

    Many thanks to the Articial Intelligence Laboratory and to the department of Brain

    and Cognitive Sciences for providing a stimulating environment, especially to Janice Ellert-

    sen who runs the BCS department, and to Liz Highleyman, Sally Richter and Jeanne

    Speckman at the AI lab.

    To my parents, many thanks for their moral and nancial support during what seems

    as the uncountable years of higher education.

    Finally, to my wife without whom this thesis would not be possible, endless gratitude.

    vi

  • To Anat, with Love

    vii

  • viii

  • Contents

    1 Introduction 1

    1.1 Sources of Variability : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 2

    1.2 Scope of Recognition in this Work : : : : : : : : : : : : : : : : : : : : : : : 4

    1.3 Existing Approaches : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 8

    1.4 Relationship to Human Vision : : : : : : : : : : : : : : : : : : : : : : : : : : 11

    1.4.1 Recognition and the Problem of Varying Context : : : : : : : : : : : 12

    1.4.2 Geometry Related Issues in Human Vision : : : : : : : : : : : : : : : 12

    1.4.3 Issues of Photometry in Human Vision : : : : : : : : : : : : : : : : : 15

    1.5 Overview of Thesis Content and Technical Contributions : : : : : : : : : : : 17

    1.5.1 Part I: Geometry, Recognition and SFM : : : : : : : : : : : : : : : : 17

    1.5.2 Part II: The Photometric Problem : : : : : : : : : : : : : : : : : : : 19

    1.5.3 Part III: Combining Geometric and Photometric Sources of Information 20

    1.6 Projection Models, Camera Models and General Notations : : : : : : : : : : 20

    1.6.1 Camera Models : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 22

    1.6.2 General Notations : : : : : : : : : : : : : : : : : : : : : : : : : : : : 23

    I Geometry: Recognition and Structure from Motion 25

    2 Non-metric Structure and Alignment from two Parallel Projected Views 27

    2.1 Overview : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 28

    ix

  • 2.2 Ane Coordinates from two Views : : : : : : : : : : : : : : : : : : : : : : : 29

    2.3 Ane Structure: Koenderink and Van Doorn's Version : : : : : : : : : : : : 30

    2.4 Epipolar Geometry and Recognition : : : : : : : : : : : : : : : : : : : : : : 32

    2.5 The Linear Combination of Views and Ane Structure : : : : : : : : : : : : 33

    2.6 Discussion : : : : : : : : :