Computer Vision CS 223b Jana Kosecka [email protected],[email protected] CA: Sanaz Mohatari...

51
Computer Vision CS 223b Jana Kosecka http://cs223b.stanford.edu/ [email protected],[email protected] CA: Sanaz Mohatari [email protected] Han-I Su [email protected]

Transcript of Computer Vision CS 223b Jana Kosecka [email protected],[email protected] CA: Sanaz Mohatari...

Page 1: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Computer Vision CS 223b

Jana Kosecka http://cs223b.stanford.edu/

[email protected],[email protected]

CA: Sanaz Mohatari [email protected] Su [email protected]

Page 2: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Logistics

• Grading: Homeworks (about every 2 weeks) 30% Midterm: 30% Final project: 40%

• Prerequisites: basic statistical concepts, geometry, linear algebra, calculus

• Recommended text:  • Introductory Techniques for 3D Computer Vision (E. Trucco, A. Verri, Prentice Hall, 1998)• From Images to Geometric Models: Y. Ma, S. Soatto,

J.Kosecka and S. Sastry, Springer Verlag 2003• Computer Vision: Algorithms and Applications R. Szeliski

(preprint)

• Computer Vision a Modern Approach (D. Forsyth, J. Ponce, Prentice Hall 2002)

• Required Software MATLAB (with Image Processing toolbox)• Open CV library

Page 3: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

The Texts

Page 4: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Project Deadlines

• Check Web site for proposals, or develop your own

• Team up!• Dates

– Feb 13: Project proposals due– Mar 12: Final report due– Mar 15: Mini-workshop on projects

Page 5: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

To define your own project…

• Find a mentor • Generate project description for the Class Web

site• Gather data, process data• Write suitable project proposal

Examples: • Learn to find sports videos on youtube.com• Match images of same location at flickr.com• Fly autonomous helicopter with camera

• <your idea here>

Page 6: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Today’s Goals

• Learn about CS223b• Get Excited about Computer Vision• Learn about image formation (Part 1)

Page 7: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

What is vision?

• From the 3-D world to 2-D images: image formation (physics). – Domain of artistic reproduction (synthesis): painting, graphics.

• From 2-D images to the 3-D world: image analysis (mathematical modeling, inference).– Domain of vision: biological (eye+brain), computational

Page 8: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

IMAGE SYNTHESIS: simulation of the image-formation process

• Pinhole (perspective) imaging in most ancient civilizations.• Euclid, perspective projection, 4th century B.C., Alexandria (Egypt)• Pompeii frescos, 1st century A.D. (ubiquitous).• Geometry understood very early on, then forgotten.

Image courtesy of C. Taylor

Page 9: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

PERSPECTIVE IMAGING (geometry)

Image courtesy of C. Taylor

• Re-discovered and formalized in the Renaissance:• Fillippo Brunelleschi, first Renaissance artist to paint with correct perspective,1413 • “Della Pictura”, Leon Battista Alberti, 1435, first treatise• Leonardo Da Vinci, stereopsis, shading, color, 1500s• Raphael, 1518

Page 10: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Biological motivations

Understanding visual sensing modality and its role

- We have no difficulties to navigate, manipulate objectsrecognize familiar places and faces

- How can we successfully carry out all these everyday tasks ?- How does the visual perception mediates these activities ? - Overall systemSensation and perception – integrate and interpret sensory readingsBehavior – control muscles and glands to mediate some behaviorMemory – long term memory (declarative – faces, places, events) short term memory (procedural – associations, skills)Higher level functions – internal models and representations of the environments

Page 11: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Goals of Computer Vision

• Build machines and develop algorithms which can automatically replicate some functionalities of

biological visual system

- Systems which navigate in cluttered environments- Systems which can recognize objects, activities- Systems which can interact with humans/world

Synergies with other disciplines and various applications

Artificial Intelligence - robotics, natural language understandingVision as a sensor – medical imaging, Geospatial Imaging, robotics,

visual surveillance, inspection

Page 12: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Computer Vision Computer Vision

Visual Sensing Visual Sensing

Images I(x,y) – brightness patternsImages I(x,y) – brightness patterns

- image appearance depends on structure of the scene- material and reflectance properties of the objects- position and strength of light sources

Page 13: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

[Felleman & Van Essen, 1991][Felleman & Van Essen, 1991]

This is the part of your brain that processes visual information

VISUAL INFORMATION PROCESSING

Page 14: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

This is how a computer represents it

Page 15: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

And so is this …And so is this …

Page 16: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

And so are these!

We need to extract some “invariant”, i.e. what is common to all these images (they are all images of my office)

Page 17: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

BUMMER! THIS IS IMPOSSIBLE!

– THM: [Weiss, 1991]: There exists NO generic viewpoint invariant!

– THM: [Chen et al., 2003]: There exists NO photometric invariant!!

• So, how do we (primates) solve the problem?

Page 18: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

EXAMPLE OF A (VERY COMMON) SENSOR NETWORK

Retina performs distributed computation:– Contrast adaptation (lateral inhibition)

– Enhancement/edgels (e.g. Mach bands)

– Motion detection (leap frog)

Does the human visualsystem really solve theproblem? Not really!

Page 19: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Optical Illusion

Page 20: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Optical Illusion

http://web.mit.edu/persci/adelson/checkershadow_illusion.html

Checker A and B are of the same gray-level value

Page 21: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

SIDE EFFECTS OF LATERAL INHIBITION

• THESE ARE NOT SPIRALSTHESE ARE NOT SPIRALS

• AND THEY ARE NOT MOVING!AND THEY ARE NOT MOVING!

Cause Peripheral DriftCause Peripheral Drift

http://www.psy.rittsumei.ac.jp/~akitaoka/rotsnakee.html

Page 22: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Challenges/Issues Challenges/Issues

• About 40% of our brain is devoted to vision • We see immediately and can form and understand images instantly

• Several strategies for forming a representation of the scenes• Task dependent attentional mechanisms• Detailed representations are often not necessary• Different approaches in the past Animate Vision (biologically inspired), Purposive Vision

(depending on the task/purpose) - have many shortcomings

Page 23: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

• Recovery of the properties of the environmentRecovery of the properties of the environment from single or multiple views using various visual cuesfrom single or multiple views using various visual cues

Vision problemsVision problems

• Segmentation Segmentation • Recognition Recognition • Reconstruction Reconstruction • Vision Based Control - ActionVision Based Control - Action

• stereostereo• motionmotion• shadingshading• texturetexture• brightness, colorbrightness, color

Visual CuesVisual Cues

Page 24: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 1:Stereo

See http://schwehr.org/photoRealVR/example.html

Page 25: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 2: Structure From Motion

http://www.cs.unc.edu/Research/urbanscape

Page 26: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 2: Structure From Motion

http://www.cs.unc.edu/Research/urbanscape

Page 27: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 2: Structure From Motion

http://www.cs.unc.edu/Research/urbanscape

Page 28: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 3: 3D Modeling

http://www.photogrammetry.ethz.ch/research/cause/3dreconstruction3.html

Page 29: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 5: Segmentation

http://elib.cs.berkeley.edu/photos/classify/

Page 30: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 6: Classification

Page 31: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 6: Classification

Page 32: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Example 9: Human Vision

Page 33: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Segmentation Segmentation – partition image into separate objects– partition image into separate objects

• Clustering and search algorithms in the space of visual cuesClustering and search algorithms in the space of visual cues

Page 34: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.
Page 35: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Figure from, “A general framework for object detection,” by C. Papageorgiou, M. Oren and T. Poggio, Proc. ICCV, copyright 1998, IEEE

Recognition by finding patterns

Figure from A Statistical Method for 3D Object Detection Applied to Faces and Cars, H. Schneiderman and T. Kanade, CVPR, 2000.

Face detectionHuman detection

Page 36: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Samples from motorbikes appearance model

(Fergus’s et. al results)

Recognition by finding patterns

Page 37: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Focus of this course

The focus of the course : 1. Geometry of Single and Multiple Views

Shape and Motion Recovery, Matching, Alignment ProblemsReconstruction (from 2D to 3D)Computation of Pictorial cues – shading, texture, blur, contour, stereo, motion cues

2. Object Detection and Recognition

Object representation, detection in cluttered scenesRecognition of object categories

Page 38: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

How to reliably recover and represent the geometricmodel from single image or video and camera motion/pose

Representation issues depends on the task/applications

• Image-based rendering, Computer Graphics• Virtual and Augmented Reality• Vision based control, surveillance, target tracking• Human computer interaction

• Medical imaging (alignment, monitoring of change)• Video Analysis

1. Geometry of Single and Multiple Views, Video

Page 39: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Vision and Computer Graphics- image based rendering techniques- 3D reconstruction from multiple views or video- single view modeling - view morphing (static and dynamic case)

Page 40: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Modeling with Multiple Images

University High School, Urbana, IllinoisThree of Twelve Images, courtesy Paul Debevec

Page 41: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

QuickTime™ and aVideo decompressor

are needed to see this picture.

Final Model

Page 42: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Visual surveillance

wide area surveillance, traffic monitoring Interpretation of different activities

Page 43: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Virtual and Augmented Reality, Human computer Interaction

Virtual object insertionvarious gesture based interfacesInterpretation of human activitiesEnabling technologies of intelligent homes, smart spaces

Page 44: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Topics

• Image Formation, Representation of Camera Motion, Camera Calibration

• Image Features – filtering, edge detection, point feature detection

• Image alignment, 3D structure and motion recovery, stereo

• Analysis of dynamic scenes, detection, tracking

• Object detection and object recognition

Page 45: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Surveillance

QuickTime™ and aYUV420 codec decompressor

are needed to see this picture.

Page 46: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Surveillance

QuickTime™ and aYUV420 codec decompressor

are needed to see this picture.

Page 47: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

Image Morphing, Mosaicing, Alignment Images of CSL, UIUC

Page 48: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

3D RECONSTRUCTION FROM MULTIPLE VIEWS:3D RECONSTRUCTION FROM MULTIPLE VIEWS:GEOMETRY AND PHOTOMETRYGEOMETRY AND PHOTOMETRY

jin-soatto-yezzi; image courtesy: j-y bouguet - intel

Page 49: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

estimated shape

laser-scanned,manually polished

jin-soatto-yezzi

Page 50: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

with h. jin

Page 51: Computer Vision CS 223b Jana Kosecka  kosecka@cs.gmu.edu,kosecka@ai.stanford.edu CA: Sanaz Mohatari sanazm@stanford.edusanazm@stanford.edu.

APPLICATIONS – Unmanned Aerial Vehicles (UAVs)

Berkeley Aerial Robot (BEAR) Project

Rate: 10HzAccuracy: 5cm, 4o