Computer Vision: Extracting Data from the Visual World

A Brief Example. Steven Mitchell, Ph.D., Componica, LLC

An introduction to computer vision technology: exploring the science behind "making computers see stuff" and the future of facial tracking, optical recognition, and surveillance.

Transcript of Computer Vision: Extracting Data from the Visual World

Page 1: Computer Vision: Extracting Data from the Visual World

Computer Vision: Extracting Data from the Visual World

A Brief Example...

Steven Mitchell, Ph.D. Componica, LLC

Page 2: Computer Vision: Extracting Data from the Visual World

About us

Componica, LLC (http://www.componica.com/)

Strong Background in Computer Vision

Copyright 2011 - Componica, LLC (http://www.componica.com/)

Page 3: Computer Vision: Extracting Data from the Visual World

About us

Componica combines machine learning, computer vision, and mobile development, applying the latest vision technology to mobile media.

Words for Spanish / Russian / French

Page 4: Computer Vision: Extracting Data from the Visual World

Why is computer vision relevant?

How do these things work?

Should I be concerned?

Page 5: Computer Vision: Extracting Data from the Visual World

In this slideshow:

Facial Detection - Find me a face.

Facial Recognition - Whose face is it?

Image Registration - Aligning pictures together.

...which leads to augmented reality.

QR Codes - They’re everywhere.

Optical Character Recognition - Reading Stuff.

Page 6: Computer Vision: Extracting Data from the Visual World

Face Detection

This is NOT facial recognition.

Developed by Viola and Jones in 2001. A major breakthrough in image recognition: real-time face detection was not practical before this.

How much does a cow weigh?

An army of simple face detectors.
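To make the "army of simple detectors" idea concrete, here is a minimal sketch in plain Python. It shows the two ingredients the Viola and Jones detector is known for: an integral image that makes any rectangle sum cost four lookups, and a weighted vote over many weak, single-feature detectors. The window positions, thresholds, and weights below are hypothetical hand-picked values; in the real method they are learned by boosting, and the detectors are arranged in a cascade that is not shown here.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in a w-by-h rectangle with top-left corner (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Haar-like feature: top half of the window minus the bottom half."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


def weak_vote(value, threshold, polarity):
    """One simple detector: votes +1 or -1 from a single thresholded feature."""
    return 1 if polarity * value < polarity * threshold else -1


def strong_classify(ii, weak_detectors):
    """Weighted vote of many weak detectors (weights here are made up)."""
    total = 0.0
    for x, y, w, h, threshold, polarity, alpha in weak_detectors:
        value = two_rect_feature(ii, x, y, w, h)
        total += alpha * weak_vote(value, threshold, polarity)
    return total >= 0


# Toy 4x4 "image": dark band on top, bright band below.
image = [[0, 0, 0, 0], [0, 0, 0, 0], [9, 9, 9, 9], [9, 9, 9, 9]]
table = integral_image(image)
is_match = strong_classify(table, [(0, 0, 4, 4, 0, 1, 1.0)])
```

Each weak detector is barely better than guessing on its own, much like one person guessing a cow's weight; the weighted vote is what makes the crowd accurate.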

"Robust Real-time Object Detection", Paul Viola and Michael Jones

Page 7: Computer Vision: Extracting Data from the Visual World

BTW, It’s how the Kinect sees people.

Page 8: Computer Vision: Extracting Data from the Visual World

BTW, It’s how the Kinect sees people.

"Real-Time Human Pose Recognition in Parts from Single Depth Images", Shotton, Fitzgibbon, Cook, Sharp, Finocchio, Moore, Kipman, Blake

Microsoft Research Cambridge & Xbox Incubation

Page 9: Computer Vision: Extracting Data from the Visual World

Facial Recognition

Remove effects caused by lighting and perspective.

After you find a face, reduce it to numbers.
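A rough sketch of those two steps, assuming a face patch is already cropped and flattened to a list of grey values. Subtracting the mean and dividing by the remaining length cancels global brightness and contrast changes; projecting onto a few basis vectors then reduces the patch to a short list of numbers. The basis here is a hypothetical stand-in: in a statistical appearance model it would be learned from training faces (for example with PCA), and perspective correction is not shown at all.

```python
import math


def normalize_lighting(pixels):
    """Remove global brightness (offset) and contrast (gain) from a patch."""
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        return [0.0] * len(pixels)  # flat patch: nothing left to describe
    return [c / norm for c in centered]


def project(pixels, basis):
    """Reduce a normalized patch to one number per basis vector."""
    return [sum(p * b for p, b in zip(pixels, vec)) for vec in basis]


# Hypothetical 4-pixel "face" and a made-up 2-vector basis.
face = [10, 20, 30, 40]
brighter_face = [2 * p + 50 for p in face]  # same face, different lighting
basis = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]

code_a = project(normalize_lighting(face), basis)
code_b = project(normalize_lighting(brighter_face), basis)
```

Both versions of the patch end up with the same numbers, which is the point: the description should depend on the face, not on the lamp.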

"Statistical Models of Appearance for Computer Vision", T.F. Cootes and C.J. Taylor

Page 10: Computer Vision: Extracting Data from the Visual World

Facial Recognition

Let’s mix some paint...

Comparing numbers in hyperspace

k-Nearest Neighbor, Wikipedia
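Once every face is a point in a many-dimensional space, "who is it?" can be answered by looking at the closest known points. The sketch below is a plain k-nearest-neighbour vote, assuming each known face has already been reduced to a list of numbers. The gallery, names, and two-dimensional vectors are invented for illustration; real face codes have many more dimensions.

```python
import math
from collections import Counter


def knn_classify(query, labelled_faces, k=3):
    """Return the most common label among the k nearest known faces."""
    by_distance = sorted(
        labelled_faces, key=lambda item: math.dist(query, item[0])
    )
    votes = Counter(name for _, name in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical gallery of (face code, name) pairs.
gallery = [
    ([0.10, 0.20], "alice"),
    ([0.15, 0.25], "alice"),
    ([0.20, 0.10], "alice"),
    ([0.90, 0.80], "bob"),
    ([0.85, 0.90], "bob"),
]

who = knn_classify([0.12, 0.22], gallery, k=3)
```

Like mixing paint, two faces that share most of their "ingredients" land close together, so the nearest neighbours are the likeliest identities.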

Page 11: Computer Vision: Extracting Data from the Visual World

Image Registration - Interesting Points

The most common way to register images:

Find the most interesting points on the two images.

Compare all the interesting points from one image to the other, forming matching pairs of points between images.
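The pairing step can be sketched as follows, assuming each interesting point has already been summarised by a small descriptor vector (how the points are detected and described is not shown, and the descriptors below are invented). A pair is kept only when each point is the other's nearest neighbour, which is one common way to discard doubtful matches; it is not necessarily the check used in the slides.

```python
import math


def nearest(descriptor, candidates):
    """Index of the candidate descriptor closest to the given one."""
    return min(
        range(len(candidates)),
        key=lambda j: math.dist(descriptor, candidates[j]),
    )


def match_points(desc_a, desc_b):
    """Pair up points whose descriptors are mutual nearest neighbours."""
    pairs = []
    for i, descriptor in enumerate(desc_a):
        j = nearest(descriptor, desc_b)
        if nearest(desc_b[j], desc_a) == i:
            pairs.append((i, j))
    return pairs


# Hypothetical descriptors for three points in each image.
image_a = [[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]]
image_b = [[5.1, 4.9], [0.2, -0.1], [20.0, 20.0]]

pairs = match_points(image_a, image_b)
```

The third point in the first image has no convincing partner, so it is left unmatched rather than forced into a bad pair.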

Page 12: Computer Vision: Extracting Data from the Visual World

Augmented Reality

Stage                            Time
FAST interest point detection    0.55ms
Building query bit masks         0.12ms
Matching into database           0.35ms
Robust pose estimation           0.1ms
Total frame time                 1.12ms

Table 1. Timings for the stages of our approach on a dataset with images taken from within the range of trained viewpoints.

Figure 4. The bit error count provides a reasonable way to determine good matches. Left: matches from viewpoints contained in training set. Right: matches on viewpoints from outside training set.

Figure 5. Increasing the range of viewpoint bins in the training set allows more viewpoint invariance to be added in a straightforward manner.

robustness to different imaging devices.

Matching on the first test sequence was very good, correctly localising the target in all 754 frames of the test sequence. There was little blur in the sequence so the full frame provided enough matches in all but 7 frames of the sequence, when the half-sampled image fallback was used to obtain enough matches for a confident pose estimate. The average total frame time on the sequence was 1.12ms on a 2.4GHz processor. The time attributed to each stage of the process is shown in Table 1.

Somewhat surprisingly our method also performed reasonably well on the second sequence, even though it was known the frames were taken from views that were not covered by our training set. On this sequence the target was localised in 635 frames of the 675 in the sequence (94%). As expected the pose estimate using only the full-frame image was generally less confident so the fallbacks to sub-sampled images were used more often: 377 frames used the half-image and 63 also used the quarter-scale image. Because of this additional workload the per-frame average time increased to 1.52ms.

The matching performance on these test sequences suggests that the bit count dissimilarity score provides a reasonable way of scoring matches. To confirm this we computed the average number of inlier and outlier matches over all of the frames in the two sequences, and plotted these against the dissimilarity score obtained for the match in Figure 4. For the sequence on the left where the viewpoints are included in the training set many good matches are found in each frame, with on average 9.7 zero-error inliers obtained. The inlier percentage for matches with low dissimilarity scores is also good at over 82% in the zero error case. The result that both the number of inliers and the inlier fraction drop off with increasing dissimilarity score demonstrates that the simple bit error count is a reasonable measure of the quality of a match. The figure provides strong support for a PROSAC-like robust estimation procedure once the matches have been sorted by dissimilarity score as the low error matches are very likely to be correct.

Even when the viewpoint of the query image is outside the range for which features have been trained, as in the data on the right of Figure 4, the dissimilarity score still provides a reasonable way to sort the matches, as the inlier fraction can be seen to drop off with increasing dissimilarity. The inlier rate of the first matches when sorted by dissimilarity score is still sufficient in most frames to obtain a pose with a robust estimation stage such as PROSAC.

4.2. Controllable Viewpoint Invariance

As our framework uses independent features for different viewpoint bins it is possible to trade-off between robustness to viewpoint variations and computation required for localisation by simply adding or removing more bins.

For applications where viewpoints are restricted (for example if the camera has a roughly constant orientation) the number of database features can be drastically reduced leading to even higher performance. Alternatively if more computational power is available it is possible to increase the

Once you have correspondence, you can compute 3D geometry.
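As a simplified illustration of getting geometry out of correspondences, the sketch below fits a 2D similarity transform (rotation, uniform scale, and shift) to matched point pairs by least squares. This is a deliberately reduced stand-in for the full 3D pose computation mentioned on the slide, and it assumes every pair is correct; a real pipeline would wrap it in a robust estimator such as the PROSAC stage named in the excerpt. Points are treated as complex numbers so that rotation and scale become one multiplication. The function names and sample points are invented for this example.

```python
def fit_similarity(src, dst):
    """Least-squares a, b such that dst is approximately a * src + b."""
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    numerator = sum(
        (qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q)
    )
    denominator = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = numerator / denominator  # rotation and scale as one complex number
    b = q_mean - a * p_mean      # translation
    return a, b


def apply_similarity(a, b, point):
    """Map a point from the first image into the second."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical matched pairs: second image is the first rotated by 90
# degrees, doubled in size, and shifted by (3, 4).
src_points = [(0, 0), (1, 0), (0, 1)]
dst_points = [(3, 4), (3, 6), (1, 4)]

a, b = fit_similarity(src_points, dst_points)
scale = abs(a)
```

With the transform in hand, anything drawn in the coordinates of the first image can be placed on the second, which is the core trick behind overlaying graphics in augmented reality.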

http://mi.eng.cam.ac.uk/~er258/work/fast.html

http://nghiaho.com
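The paper excerpt on this slide scores each candidate match by a bit error count and then trusts the lowest-error matches first. A minimal sketch of that idea, assuming each feature is stored as an integer bit string and using plain Hamming distance as a stand-in for the paper's dissimilarity score (the actual bit-mask construction is not reproduced here, and the sample values are invented):

```python
def bit_errors(a, b):
    """Number of bit positions where two descriptors disagree."""
    return bin(a ^ b).count("1")


def rank_matches(query_descs, database_descs):
    """Best database match per query feature, sorted by bit error count."""
    matches = []
    for qi, q in enumerate(query_descs):
        di = min(
            range(len(database_descs)),
            key=lambda k: bit_errors(q, database_descs[k]),
        )
        matches.append((bit_errors(q, database_descs[di]), qi, di))
    matches.sort()  # lowest-error matches first, for a PROSAC-like stage
    return matches


# Hypothetical 4-bit descriptors.
query = [0b1010, 0b1111, 0b0001]
database = [0b1010, 0b0111, 0b1000]

ranked = rank_matches(query, database)
```

Sorting by error count matters because a robust estimator that tries the most trustworthy matches first usually finds a good pose after very few attempts.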

Page 13: Computer Vision: Extracting Data from the Visual World

QR Codes

http://en.wikipedia.org/wiki/QR_Code

"Quick Response code" invented by Toyota subsidiary Denso Wave in 1994.

Open License

Up to about 2.9K of binary data (2,953 bytes in the largest version at the lowest error correction level)

Error Correction

Easy to read and generate:

ZXing library
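To give a feel for the error correction, here is a small sketch of one protected field inside every QR code: the 15-bit format information, which carries the error correction level and mask pattern. Its 5 data bits are extended with 10 check bits from a BCH(15,5) code and then XORed with a fixed mask. The sketch is plain Python written for illustration and is not ZXing's API (ZXing is a Java library); the function names are invented. The main payload uses Reed-Solomon codes instead, which are not shown.

```python
GENERATOR = 0b10100110111        # BCH(15,5) generator polynomial
FORMAT_MASK = 0b101010000010010  # fixed XOR mask for the format bits


def encode_format(ec_bits, mask_pattern):
    """Build the 15-bit format word from a 2-bit level and 3-bit mask id."""
    data = (ec_bits << 3) | mask_pattern
    remainder = data << 10
    # Polynomial division over GF(2): cancel the high bits one at a time.
    for shift in range(4, -1, -1):
        if remainder & (1 << (shift + 10)):
            remainder ^= GENERATOR << shift
    return ((data << 10) | remainder) ^ FORMAT_MASK


def decode_format(received):
    """Pick the valid format word closest to what the camera saw."""
    best = min(
        range(32),
        key=lambda d: bin(encode_format(d >> 3, d & 7) ^ received).count("1"),
    )
    return best >> 3, best & 7


# Level L is encoded as 0b01; mask pattern 0.
clean = encode_format(0b01, 0)
damaged = clean ^ 0b100000100000001  # three bits read incorrectly
recovered = decode_format(damaged)
```

Any two valid format words differ in at least seven bits, so up to three misread bits still decode to the right answer.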

Page 14: Computer Vision: Extracting Data from the Visual World

Optical Character Recognition

iPhone 4th Gen

iPod Touch 4th Gen

Page 15: Computer Vision: Extracting Data from the Visual World

Optical Character Recognition

Page 16: Computer Vision: Extracting Data from the Visual World

Commentary

Ubiquitous surveillance: extreme dislike.

Birthday Paradox: the probability that, in a set of n randomly chosen people, some pair of them will have the same birthday is surprisingly high. It passes 50% once n reaches 23.
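The birthday probability is easy to compute directly, as in the sketch below, which assumes 365 equally likely birthdays and ignores leap years. The same arithmetic presumably underlies the surveillance worry: if `days` is read as the number of faces a system can reliably tell apart, accidental matches become likely long before the crowd is that large. That reading is an interpretation of the slide, not something it states.

```python
def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday."""
    if n > days:
        return 1.0  # more people than days: a shared birthday is certain
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


# Smallest group where a shared birthday is more likely than not.
group_size = next(
    n for n in range(1, 366) if shared_birthday_probability(n) > 0.5
)
```

Note how the probability grows with the number of pairs of people, not the number of people, which is why it climbs so quickly.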

Page 17: Computer Vision: Extracting Data from the Visual World

Commentary

Video cameras may meet the criteria for being legally blind.

Page 18: Computer Vision: Extracting Data from the Visual World

Computer vision technology and society: opportunities and possibilities:

Smartphones that ID diseases, plants, insects.

Robotic lawnmowers that don’t run over the neighbor’s cat.

Computers that judge emotions by reading your face.

Keyless entry based on face, iris.

Automated inspection of manufactured parts.

Conclusion