Computer Vision Dan Witzner Hansen Course web page: Email: [email protected].
What is Vision?
Today
• Introduction to the course
• Crash course in 2D and 3D geometry (brush-up from high school)
• (Solving linear equations and least-squares solutions to linear equations)
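As a small illustration of the least-squares topic listed above, here is a minimal sketch that fits a line y = a*x + b to noisy points by solving the 2x2 normal equations. The function name `fit_line` and the sample data are made up for this example; the course itself uses Matlab, and this is only one simple way to set up the problem.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) pairs via the normal equations."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Determinant of the 2x2 normal-equation matrix [[sxx, sx], [sx, n]].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct x values")
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


# Hypothetical noisy measurements that lie roughly on y = 2x + 1.
samples = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
slope, intercept = fit_line(samples)
print(slope, intercept)
```

There are more equations (four points) than unknowns (two), so no line passes through every point; the fit minimizes the sum of squared vertical errors instead.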
The Vision Problem
How to infer salient properties of the 3-D world from time-varying 2-D image projections
• What is salient?
• How to deal with loss of information going from 3-D to 2-D?
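The loss of information can be made concrete with a minimal sketch of an ideal pinhole projection. The function `project`, the unit focal length, and the two sample points are assumptions made for illustration; a real camera also has distortion and pixel sampling, which are ignored here.

```python
def project(point, focal_length=1.0):
    """Ideal pinhole projection of a 3-D point (X, Y, Z) onto the image plane."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


# Two different 3-D points on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)
print(project(near), project(far))
```

Both points land on the same image position, so depth cannot be recovered from a single projected point without additional information.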
Computer Vision: Stages
• Image formation
• Low-level
  – Single image processing
  – Multiple views
• Mid-level
  – Estimation, segmentation (main topic of Image Analysis and Foundations of Image Analysis and will only be covered briefly here)
• High-level
  – Recognition
  – Classification
Image Formation
• 3-D geometry
• Physics of light
• Camera properties
  – Focal length
  – Distortion
• Sampling issues
  – Spatial
  – Temporal
Low-level: Single Image Processing
• Filtering
  – Edge
  – Color
  – Local pattern similarity
• Texture
  – Appearance characterization from the statistics of applying multiple filters
• 3-D structure estimation from…
  – Shading
  – Texture
Low-level: Multiple Views
• Stereo
  – Structure from two views
• Structure from motion
  – What can we learn in general from many views, whether they were taken simultaneously or sequentially?
Mid-Level: Estimation, Segmentation
• Estimation: Fitting parameters to data
  – Static (e.g., shape)
  – Dynamic (e.g., tracking)
• Segmentation/clustering
  – Breaking an image or image sequence into a few meaningful pieces with internal similarity
High-level: Recognition, Classification
• Recognition: Finding and parametrizing a known object
• Classification
  – Assignment to known categories using statistics/probability to make best choice
Course Overview
• Image formation and cameras
  – Projective geometry
  – Relating points between images
• Motion
  – Motion analysis
  – Object tracking
• Shape and recognition
  – Shape analysis
  – Object recognition
APPLICATIONS
Applications: Factory Inspection
Cognex’s “CapInspect” system:
Low-level image analysis: Identify edges, regions
Mid-level: Distinguish “cap” from “no cap”
Estimation: What are the orientation of the cap and the height of the liquid?
Applications: Face Detection
courtesy of H. Rowley
How is this like the bottle problem on the previous slide?
Applications: Text Detection & Recognition
from J. Zhang et al.
Similar to face finding: Where is the text and what does it say?
Viewing at an angle complicates things...
Detection and Recognition: How?
• Build models of the appearance characteristics (color, texture, etc.) of all objects of interest
• Detection: Look for areas of image with sufficiently similar appearance to a particular object
• Recognition: Decide which of several objects is most similar to what we see
• Segmentation: “Recognize” every pixel
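The four bullets above can be sketched with a deliberately tiny appearance model: one mean RGB color per object. The model table, the function names, and the distance threshold are all hypothetical; real systems use much richer descriptions (texture, shape, learned classifiers) than a single color.

```python
# Hypothetical appearance models: one mean RGB color per object of interest.
MODELS = {
    "grass": (40, 160, 50),
    "sky": (90, 150, 230),
    "skin": (220, 170, 140),
}


def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return sum((ca - cb) ** 2 for ca, cb in zip(a, b)) ** 0.5


def detect(pixel, name, max_distance=30.0):
    """Detection: is this pixel sufficiently similar to one particular object?"""
    return color_distance(pixel, MODELS[name]) <= max_distance


def recognize(pixel):
    """Recognition: which of the known objects is most similar to the pixel?"""
    return min(MODELS, key=lambda name: color_distance(pixel, MODELS[name]))


def segment(image):
    """Segmentation: 'recognize' every pixel of an image given as rows of RGB triples."""
    return [[recognize(pixel) for pixel in row] for row in image]


image = [[(50, 150, 60), (100, 140, 220)]]
print(segment(image))
```

Detection answers a yes/no question about one model, recognition picks the best of several models, and segmentation is recognition applied at every pixel.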
Applications: Virtual Advertising
courtesy of Princeton Video Image
First-Down Line, Virtual Advertising: How?
• Where should message go?
  – Sensors that measure pan, tilt, zoom and focus are attached to calibrated cameras at surveyed positions
  – Knowledge of the 3-D position of the line, advertising rectangle, etc. can be directly translated into where in the image it should appear for a given camera
• What pixels get painted?
  – Occluding image objects like the ball, players, etc. where the graphic is to be put must be segmented out. These are recognized by being a sufficiently different color from the background at that point. This allows pixel-by-pixel compositing.
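The "what pixels get painted" step can be sketched as follows. This is a minimal illustration, not the broadcast system's actual method: the function `composite`, the per-pixel expected background image, the color threshold of 60, and the sample colors are all assumptions.

```python
def color_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return sum((ca - cb) ** 2 for ca, cb in zip(a, b)) ** 0.5


def composite(frame, background, graphic, region, threshold=60.0):
    """Paint `graphic` into `frame` only where the frame still shows the background.

    frame, background, graphic: images as rows of RGB triples, all the same size.
    region: same-size rows of booleans marking where the graphic should appear.
    """
    out = []
    for r, row in enumerate(frame):
        out_row = []
        for c, pixel in enumerate(row):
            looks_like_background = color_distance(pixel, background[r][c]) < threshold
            if region[r][c] and looks_like_background:
                out_row.append(graphic[r][c])  # unoccluded: draw the graphic
            else:
                out_row.append(pixel)  # occluder (ball, player): keep the frame
        out.append(out_row)
    return out


GRASS = (40, 160, 50)
YELLOW = (255, 255, 0)
frame = [[GRASS, (250, 250, 250), (42, 158, 52)]]  # middle pixel is a white object
background = [[GRASS, GRASS, GRASS]]
graphic = [[YELLOW, YELLOW, YELLOW]]
region = [[True, True, True]]
result = composite(frame, background, graphic, region)
print(result)
```

The white pixel differs strongly from the expected grass color, so it is treated as an occluder and left untouched, while the two grass-colored pixels receive the graphic.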
Applications: Inserting Computer Graphics with a Moving Camera
CG Insertion with a Moving Camera: How?
• This technique is often called matchmove
• Once again, we need camera calibration, but also information on how the camera is moving (its egomotion). This allows the CG object to correctly move with the real scene, even if we don’t know the 3-D parameters of that scene.
• Estimating camera motion:
  – Much simpler if we know the camera is moving sideways because then the problem is only 2-D
  – For general motions: By identifying and following scene features over the entire length of the shot, we can solve retrospectively for what 3-D camera motion would be consistent with their 2-D image tracks. We must also make sure to ignore independently moving objects like cars and people.
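For the simple sideways case, a minimal sketch of the idea is shown below. It assumes tracked feature positions in two consecutive frames and a scene at roughly one depth, so that all static features shift by about the same amount. The median is used here as a simple robust estimator chosen for this example, not as the method any matchmove product uses; the function name and the tolerance are likewise hypothetical.

```python
from statistics import median


def estimate_sideways_shift(prev_points, next_points, tolerance=2.0):
    """Estimate the horizontal image shift between two frames from feature tracks.

    Returns the median x-displacement and the indices of features whose
    displacement disagrees with it (likely independently moving objects).
    """
    shifts = [x1 - x0 for (x0, _), (x1, _) in zip(prev_points, next_points)]
    camera_shift = median(shifts)
    independent = [i for i, s in enumerate(shifts) if abs(s - camera_shift) > tolerance]
    return camera_shift, independent


# Four static scene features shift by 3 pixels; the last one belongs to a moving car.
prev_points = [(10.0, 5.0), (20.0, 8.0), (30.0, 2.0), (40.0, 9.0), (50.0, 4.0)]
next_points = [(13.0, 5.0), (23.0, 8.0), (33.0, 2.0), (43.0, 9.0), (62.0, 4.0)]
shift, movers = estimate_sideways_shift(prev_points, next_points)
print(shift, movers)
```

Because the median ignores a minority of outlying tracks, the car's large displacement does not bias the estimated camera shift. General 3-D camera motion needs the multi-view geometry developed later in the course.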
Applications: Motion Capture
Vicon software: 12 cameras, 41 markers for body capture; 6 zoom cameras, 30 markers for face
Applications: Motion Capture without Markers
courtesy of C. Bregler
What’s the difference between these two problems?
Motion Capture: How?
• Similar to matchmove in that we follow features and estimate underlying motion that explains their tracks
• Difference is that the motion is not of the camera but rather of the subject (though camera could be moving, too)
  – Face/arm/person has more degrees of freedom than camera flying through space, but still constrained
• Special markers make feature identification and tracking considerably easier
• Multiple cameras gather more information
Applications: Image-Based Modeling
courtesy of P. Debevec
Façade project: UC Berkeley Campanile
Image-Based Modeling: How?
• 3-D model constructed from manually-selected line correspondences in images from multiple calibrated cameras
• Novel views generated by texture-mapping selected images onto model
A Movie
Movie
Applications: Robotics
Autonomous driving: Lane & vehicle tracking (with radar)
Human Computer Interaction
What is the relationship between many of these applications?
• Knowledge of
  – Cameras
  – Motion and tracking
  – Shapes and object recognition
  – Mathematics and statistics
Course Prerequisites
• Background in/comfort with:
  – Linear algebra
  – Multi-variable calculus
  – Statistics, probability
• Homeworks will use Matlab but you are also welcome to use C/C++ (harder though)
  – An ability to program in C/C++, Java, or equivalent should be sufficient preparation, but knowing Matlab is better (no introduction given, but you can come see me if needed)
Grading
• 100% on mandatory assignments
• Submission ON TIME
More specifically…
Single View Examples
Mosaicing
Stereo
Stereo reconstruction
Tracking, Shape and HCI
After the course
• Understand, choose between, and apply various computer vision algorithms.
• Understand the relations between objects in the 3D world and their images obtained from cameras.
• Understand the principles of how to make 3D models (reconstruction) from images.
• Write programs, in either Matlab or C++, which are able to follow objects in pre-recorded movies or live images obtained from cameras.
• Understand principles for making computer vision systems that aim towards enabling humans to interact with a computer through cameras.
Reading Material
• Textbooks:
  – “Multiple View Geometry”, Hartley and Zisserman
  – “Introductory Techniques for 3D Computer Vision” (less important)
• Supplemental readings will be available online as PDF files and a few as photocopies from books.
• Complete assigned reading before corresponding lecture and re-read difficult parts after the lecture.
• This is NOT an easy course, so expect at least 15 hrs of WORK each week.
• Show up for ALL lectures.
Details
• Homework
  – Submission to me by the end of the exercises.
  – Expect to have it ready before the exercises, though!
  – NO lateness policy: add-ons will be expected if late
• Exam
  – Submission of mandatory assignments by the end of the semester.
More Details
• Instructor
  – E-mail: [email protected]
  – Office hours (by appointment):
    • Friday, 10:00-12:00 pm
Remember that semester projects in connection with the course are possible.
Your First Assignment
• Try to get Matlab running
• Take a look at a Matlab primer
• Unfortunately most of the tools (mathematics) have to be developed in the beginning of the course and it may therefore seem quite mathematical.
• DON’T LET THAT DISCOURAGE YOU
More questions?
First try the web page: www.itu.dk/courses/MCV
Feel free to e-mail me at any time
What is needed here?