Geometry and Photometry in 3D Visual Recognition
by
Amnon Shashua
B.Sc., Tel-Aviv University, Israel (1986)
M.Sc., Weizmann Institute of Science, Israel (1989)
Submitted to the Department of Brain and Cognitive Sciences
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
at the
Massachusetts Institute Of Technology
November, 1992
Artificial Intelligence Laboratory, AI-TR 1401
Copyright © Massachusetts Institute of Technology, 1992
This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of
Technology. Support for the laboratory's artificial intelligence research is provided in part by the Advanced
Research Projects Agency of the Department of Defense under Office of Naval Research contract
N00014-85-K-0124. A. Shashua was also supported by NSF-IRI8900267.
Abstract
This thesis addresses the problem of visual recognition under two sources of variability:
geometric and photometric. The geometric deals with the relation between 3D objects
and their views under parallel, perspective, and central projection. The photometric deals
with the relation between 3D matte objects and their images under changing illumination
conditions. Taken together, an alignment-based method is presented for recognizing ob-
jects viewed from arbitrary viewing positions and illuminated by arbitrary settings of light
sources.
In the first part of the thesis we show that a relative non-metric structure invariant
that holds under both parallel and central projection models can be defined relative to
four points in space and, moreover, can be uniquely recovered from two views regardless
of whether one or the other was created by means of parallel or central projection. As a
result, we propose a method that is useful for purposes of recognition (via alignment) and
structure from motion, and that has the following properties: (i) the transition between
projection models is natural and transparent, (ii) camera calibration is not required, and
(iii) structure is defined relative to the object and does not involve the center of projection.
The second part of this thesis addresses the photometric aspect of recognition under
changing illumination. First, we argue that image properties alone do not appear to be
generally sufficient for dealing with the effects of changing illumination; we propose a
model-based approach instead. Second, we observe that the process responsible for factoring out
the illumination during the recognition process appears to require more than just contour
information, but just slightly more. Taken together, we introduce a model-based alignment
method that compensates for the effects of changing illumination by linearly combining
model images of the object. The model images, each taken under a different illumination
condition, can be converted into novel images of the object regardless of whether the image
is represented by grey-values, sign-bits, or other forms of reduced representations.
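The linear-combination idea can be illustrated with a small numerical sketch. It is not the implementation developed in this thesis; it assumes a matte (Lambertian) surface seen from a fixed viewpoint, no shadows, grey-value images only, and three model images taken under linearly independent light directions. The synthetic surface and all names (render_matte, fit_linear_combination, make_synthetic_surface, MODEL_LIGHTS, NOVEL_LIGHT) are hypothetical and introduced only for illustration.

```python
import numpy as np


def render_matte(normals, albedo, light):
    """Shadow-free Lambertian image: albedo * (normal . light) at every pixel.

    Assumption: intensities are not clamped at zero, i.e. shadows are ignored.
    """
    return albedo * (normals @ light)


def fit_linear_combination(model_images, novel_image):
    """Least-squares coefficients expressing novel_image via the model images."""
    # One column per model image, one row per pixel.
    A = np.stack([m.ravel() for m in model_images], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, novel_image.ravel(), rcond=None)
    synthesized = (A @ coeffs).reshape(novel_image.shape)
    return coeffs, synthesized


def make_synthetic_surface(height=8, width=8, seed=0):
    """Hypothetical test surface: random unit normals and random albedo."""
    rng = np.random.default_rng(seed)
    normals = rng.normal(size=(height, width, 3))
    normals[..., 2] = np.abs(normals[..., 2]) + 1.0  # bias normals toward +z
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    albedo = rng.uniform(0.2, 1.0, size=(height, width))
    return normals, albedo


# Three linearly independent model light directions and one novel direction.
MODEL_LIGHTS = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
NOVEL_LIGHT = np.array([0.3, -0.2, 1.0])

normals, albedo = make_synthetic_surface()
model_images = [render_matte(normals, albedo, s) for s in MODEL_LIGHTS]
novel_image = render_matte(normals, albedo, NOVEL_LIGHT)

coeffs, synthesized = fit_linear_combination(model_images, novel_image)
print("coefficients:", np.round(coeffs, 6))
print("max reconstruction error:", np.abs(synthesized - novel_image).max())
```

Under these assumptions the image is linear in the light direction, so the recovered coefficients are the ones that express the novel light direction in terms of the three model light directions, and the synthesized image matches the novel image up to numerical error. Real images with shadows, specularities, or reduced representations such as sign-bits fall outside this simplified sketch.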
The third part of this thesis addresses the problem of achieving full correspondence
between model views and puts together the geometric and photometric components into a
single recognition system. The method for achieving correspondence is based on combining
affine or projective geometry and optical flow techniques into a single working framework.
Thesis Advisor: Professor Shimon Ullman
Acknowledgments
First and foremost, thanks go to Shimon Ullman. I was fortunate to have Shimon as
my advisor for the past six years, both at Weizmann and at MIT. Shimon allowed me to
benefit from his knowledge, experience, financial support, and many of his original views
while not constraining my creative efforts. I also gratefully acknowledge Shimon's influence
on shaping my views in the fields of computer and human vision.
Thanks also to other members of the Brain and Cognitive Sciences department and the
Artificial Intelligence Laboratory: to Tomaso Poggio for his generosity, continuous support
and interest in my work, for countless illuminating discussions, and much more, but
mostly for always treating me as a colleague and friend. To Whitman Richards for his
never-ending generosity and patience and for the many interesting "Wednesday lunches"
we had together. To Eric Grimson for the welg-group seminars and for patiently reading
through my memos. To Berthold Horn for interesting discussions on issues related to
photometry and geometry.
Thanks to the members of my thesis committee: to Tomaso Poggio who kindly agreed
to be the chairman, and to Ted Adelson, Ellen Hildreth, and Alan Yuille for making their
time available to me. Special thanks to Tomaso Poggio and Ted Adelson for being my
co-sponsors for the post-doctoral McDonnell-Pew fellowship.
Thanks to current and former members of the Center for Biological Information
Processing: to Shimon Edelman who together with his wife Esti helped to make our first year
in the U.S. very pleasant; to Daphna Weinshall for putting in a kind word whenever possible
and for being my tennis partner; to Heinrich Bülthoff for many discussions on "Mooney"
images, and to Norberto Grzywacz for always keeping spirits high.
For stimulating my interest in visual motion and structure from motion and for provid-
ing an exciting and inspiring summer, thanks to Peter Burt, Padmanabhan Anandan, Jim
Bergen, Keith Hanna, Neil Okamoto and Rick Wildes at David Sarnoff Research Center.
Thanks to the vision group at IBM T.J. Watson Research Center: to Ruud Bolle,
Andrea Califano and Rakesh Mohan for being generous enough to give me all the freedom
in pursuing directions of research during my stay there in the summer of 1990; to Gabriel
Taubin for many months of inspiring collaboration on combinatorial algorithms.
Many thanks to my fellow lab-mates at the Artificial Intelligence Laboratory: to
Tanveer Syeda and Ivan Bachelder my office-mates, to David Jacobs for many inspiring chats
on projective geometry and for his active role in the welg-group seminars, to David Clemens
for making our life with Lisp Machines fairly enjoyable, to Anita Flynn for initiating many
of the events that make this lab a fun place, to Pawan Sinha for always being helpful,
to Brian Subirana for keeping spirits and "Crick" dinners up, to Davi Geiger for endless
energy and enthusiasm; and to Ronen Basri, David Beymer, Thomas Breuel, Todd Cass,
Ron Chaney, Frederico Girosi, John Harris, Ian Horswill, David Michael, Pam Lipson, Jose
Robles, Kah Key Sung, Ali Taalebi, Sandy Wells, and Paul Viola who have made my stay
here both enjoyable and inspiring.
Thanks to my fellow students at the department of Brain and Cognitive Sciences,
especially to Sherif Botros, Diana Smetters, Eric Loeb and Randy Smith. Thanks to my
fellow students at the Media Laboratory: Bill Freeman, Mathew Turk, Trevor Darrell and
Eero Simoncelli.
Many thanks to the Artificial Intelligence Laboratory and to the department of Brain
and Cognitive Sciences for providing a stimulating environment, especially to Janice Ellert-
sen who runs the BCS department, and to Liz Highleyman, Sally Richter and Jeanne
Speckman at the AI lab.
To my parents, many thanks for their moral and financial support during what seem
like uncountable years of higher education.
Finally, to my wife without whom this thesis would not be possible, endless gratitude.
To Anat, with Love
Contents
1 Introduction
1.1 Sources of Variability
1.2 Scope of Recognition in this Work
1.3 Existing Approaches
1.4 Relationship to Human Vision
1.4.1 Recognition and the Problem of Varying Context
1.4.2 Geometry Related Issues in Human Vision
1.4.3 Issues of Photometry in Human Vision
1.5 Overview of Thesis Content and Technical Contributions
1.5.1 Part I: Geometry, Recognition and SFM
1.5.2 Part II: The Photometric Problem
1.5.3 Part III: Combining Geometric and Photometric Sources of Information
1.6 Projection Models, Camera Models and General Notations
1.6.1 Camera Models
1.6.2 General Notations
I Geometry: Recognition and Structure from Motion
2 Non-metric Structure and Alignment from two Parallel Projected Views
2.1 Overview
2.2 Affine Coordinates from two Views
2.3 Affine Structure: Koenderink and Van Doorn's Version
2.4 Epipolar Geometry and Recognition
2.5 The Linear Combination of Views and Affine Structure
2.6 Discussion