Intelligent Video Surveillance in Crowded Scenes.pptxgwang/tutorials/talk_ustc.pdf ·...
Transcript of Intelligent Video Surveillance in Crowded Scenes.pptxgwang/tutorials/talk_ustc.pdf ·...
Intelligent Video Surveillance Intelligent Video Surveillance in Crowded Scenes
Xiaogang Wang
Department of Electronic EngineeringDepartment of Electronic Engineering
The Chinese University of Hong Kong
Intelligent Video Surveillance
The number of surveillance cameras is fast increasingThe number of surveillance cameras is fast increasingThe Heathrow airport in London has 5,000 surveillance cameras
By 2009, China has installed more than 3,000,000 surveillance cameras. This number will increase more than 40% per year in the next five years.
ApplicationsHomeland securityHomeland security
Anti-crime
Traffic control
M it hild ld l d ti t t hMonitor children, elderly and patients at home
…
FunctionsLow-level: detect, track and recognize objects of interest
High-level: understand activities of objects and detect abnormalities
Crowded scenes…
Sparse scenes
Single camera view Multiple camera viewsg p
“The requirements for the next generation of video surveillancesystems are robustness, reliability, scalability and self-y y y fadaptability for crowded, large and complex scenes. ”(Remagnino et al. Machine Vision and Applications, 2007)
Detection and track based
Conventional Approaches for Activity Analysis
Detection and track basedDetection and tracking are unreliable in crowded because of occlusions
f bObject detection and tracking Trajectories of objects
Activity analysis
…
Abnormal activitiesTypical activity categories
Motion based
Conventional Approaches for Activity Analysis
Motion basedCannot separate co-occurring activities
Walk…
…Wave
…
……
Motion feature vectorVideo sequence
Run
Activity analysis
Divide into short video clips
Features of our Approach
Detection and tracking are not requiredDetection and tracking are not required
Separate co-occurring activities
Work robustly in crowded scenesWork robustly in crowded scenes
Unsupervised (no need to manually label training data)
Simultaneously model simple activities more complicated Simultaneously model simple activities, more complicated interactions among objects and global behaviors in the scene
X. Wang, X. Ma and E. Grimson, “Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models,” IEEE Trans. on Pattern Analysis and Machine Intelligence(PAMI), Vol. 31, pp. 539-555, 2009.
High level picture of our approach
Motion Features(a)( )
Atomic activities modeled as
distributions over the feature codebook
(b)( )
0
0.05
0.1
0.15
0.2
0.25
0.3
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Global behaviors
… …
modeled as distributions over atomic activities
(c)
Parametric hierarchical Bayesian model
ημ0
0.05
0.1
0.15
0.2
0.25
0.3
2β0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
1β
jc1 2
Global behavior models (L = 2)
jπ0β cβ∞
Gl b l b h i φ φ φ φjiz
Global behaviormodels 1 2 3 4
1φ 2φ 3φ 4φ
Atomic activity models (K=4)
jixNj
M
H kφ∞
Atomic activity… … MAtomic activity
models
Video clip j (j = 1…M)
jπ
Observed feature values of moving pixels
Temporal co-occurrence of moving pixels
Moving pixels in a short video clip Spatial distribution of an atomic activity
Experimental Results
The input is a 90 minutes long traffic video sequenceThe input is a 90 minutes long traffic video sequence
Learned atomic activities from a traffic scene
(1) (2) (3) (4) (5) (6)(1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11) (12)
(13) (14) (15) (16) (17) (18)
(19) (20) (21) (22) (23) (24)
(25) (26) (27) (28) (29)
Global behavior I: green light for south/north traffic
prior
index of atomic activities
vehicles southboundvehicles northbound vehicles northbound
vehicles incoming northbound vehicles outgoing eastboundvehicles incoming southbound
Global behavior II: green light for east/west traffic
prior
index of atomic activities
vehicles incoming westbound vehicles outgoing westbound vehicles outgoing southbound
vehicles outgoing eastboundvehicles incoming eastbound pedestrians westbound
Global behavior III: left turn signal for east/west traffic
prior
index of atomic activities
vehicles turning left eastbound vehicles outgoing northbound vehicles outgoing northbound
vehicles incoming eastbound vehicles outgoing eastbound vehicles stopping southbound
Global behavior IV: walk sign
prior
index of atomic activities
pedestrians outgoing eastboundpedestrians incoming eastbound pedestrians westbound
vehicles stopping vehicles stoppingpedestrians westbound
Global behavior V: northbound right turns
prior
index of atomic activities
vehicles incoming northbound vehicles outgoing eastbound
Temporal video segmentation
green light for east/west traffic walk sign
left turn signal for east/west traffic
green light for south/north traffic northbound right turns
Confusion matrix of video segmentation
Clustering result
Manual label
The average accuracy is 85.74% using our approach.
The average accuracy is 65.6% when modeling atomic activities and global behaviorsThe average accuracy is 65.6% when modeling atomic activities and global behaviorsin two separate steps.
The approaches, such as Zhong et al CVPR’04, of using a motion feature vector t t id li f l thi d tto represent a video clip perform poorly on this data.
Abnormality detection results
Interaction query
vehicles approaching
0.3
0.35
0.4
0.45
0.5
pedestrians crossing the street0.05
0.1
0.15
0.2
0.25
01 3 5 7 9 11 13 15 17 19 21 23 25 27 29
Query distribution
Top four retrieved jay-walking examples
Pedestrians/Vehicles Detection Based on Motions
Atomic activities related to vehicles
Pedestrians/Vehicles Detection Based on Motions
Atomic activities related to pedestrians
Classify Motions into Vehicles (Red) and Pedestrians (Green)
ConclusionConclusionPropose an unsupervised approach for robust activity analysis in crowded scenes
Co-occurring activities are separated without detecting and tracking objectstracking objects
Only using moving pixels as features, this approach is able to detect activities detect activities analyze interactions among objects temporally segment video sequences into global behaviorsdetect abnormalitiesclassify motions into pedestrians and vehicles
Face Sketch Face Sketch Synthesis and Recognition
Xiaogang Wang
Department of Electronic EngineeringDepartment of Electronic Engineering
The Chinese University of Hong Kong
OutlineOutlineApplications
CUHK face sketch database
Face sketch synthesis using a global linear model
Patch-based face sketch synthesis using multi-scale Markov random fields
F k t h itiFace sketch recognition
ApplicationsApplicationsLaw enforcement
Film industry
Entertainment
Query sketch drawn by the artist
Face photos in the police mug-shot databasesSketches synthesized by computer
CUHK Face Sketch Database (CUFS)CUHK Face Sketch Database (CUFS)Publicly available: http://mmlab.ie.cuhk.edu.hk/facesketch.html
188 people from the CUHK student data set
CUHK Face Sketch Database (CUFS)CUHK Face Sketch Database (CUFS)123 people from AR database
CUHK Face Sketch Database (CUFS)CUHK Face Sketch Database (CUFS)295 people from XM2VTS database
L i g B d F Sk t h S th iLearning Based Face Sketch SynthesisGenerate a sketch from an input face photo based on a set of training face photo-sketch pairs
Sketches of different styles can be synthesized by choosing i i f diff ltraining sets of different styles
Face sketch synthesis
Input face photo Synthesize sketch
Training set
Face Sketch Synthesis Using Gl b l Li M d la Global Linear Model
1c 1c1p 1s
transform + + ?Synthesize sketch
… …
ncnc
Input photo
Synthesize sketch
np nsTraining photo sketch pairs
X. Tang and X. Wang, “Face Sketch Recognition,” IEEE Trans. on Circuits and Systems for Video Technology (CSVT), Vol. 14, No. 1, pp. 50-57, 2004.
ResultsResults
Photo Sketch drawn by the artist
Synthesize sketch
Photo Sketch drawn by the artist
Synthesize sketch
Separate Shape and TextureSeparate Shape and Texture
Shape Transformation
Photo shape Sketch shape
Photo Graph matching
Texture Transformation
Synthesized sketch
Photo texture Sketch texture
X. Tang and X. Wang, “Face Sketch Synthesis and Recognition,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2003.
ResultsResults
Without separationseparation
Separate shape & texture
Photo Sketch drawn by the artist
Synthesize sketch
Photo Sketch drawn by the artist
Synthesize sketch
Photo Sketch drawn by the artist
Synthesize sketch
Photo Sketch drawn by the artist
Synthesize sketch
Photo Sketch drawn by the artist
Synthesized sketch by the artistsketch
Patch-Based Face Sketch Synthesis Using M lti S l M k R d Fi ldMulti-Scale Markov Random Fields
X. Wang and X. Tang, “Face Photo‐Sketch Synthesis and Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 31, pp. 1955-1967, 2009.
ResultsResults
Photo Sketch drawn by the artist
Synthesized sketches after different number of iterations of belief propagation on Markov random fields
After 0 iteration After 5 iteration After 40 iterations
Photo Sketch drawn by the artist
Global linear transform Patch based
Photo Synthesized Drawn by the artist Photo Synthesized Drawn by the artist
Photo Synthesized Drawn by the artist Photo Synthesized Drawn by the artist
Sk t h S th i ith Li hti V i tiSketch Synthesis with Lighting Variations
Sk t h S th i ith P V i tiSketch Synthesis with Pose Variations
Synthesize Photos from SketchesSynthesize Photos from Sketches
By artist Synthesized photo Photo By artist Synthesized photo Photo
Face Sketch RecognitionFace Sketch Recognition306 people for training and 300 people for testing
Table 1 Rank 1 – 10 recognition accuracy using different face sketch synthesis methods (%)
Methods 1 2 3 4 5 6 7 8 9 10
Table 1. Rank 1 10 recognition accuracy using different face sketch synthesis methods (%)
Direct match 6.3 8.0 9.0 9.3 11.3 13.3 14.0 14.0 14.3 16.0
Global linear transform 90.0 94.0 96.7 97.3 97.7 97.7 98.3 98.3 99.0 99.0
Patch based 96.3 97.7 98.0 98.3 98.7 98.7 99.3 99.3 99.7 99.796.3 97.7 98.0 98.3 98.7 98.7 99.3 99.3 99.7 99.7
ConclusionConclusionPropose two face sketch synthesis approaches based on global linear transform and Markov random fields
Face sketch synthesis by linear transform can be improved by separating shape and textureseparating shape and texture
Face sketch recognition can be significantly improved by first transforming face photos into sketchesg p
Image and Video Processing Lab, Department of Electronic Engineering the Chinese University of Hong KongEngineering, the Chinese University of Hong Kong
Video surveillanceMedical imagingg gMachine Learning
Multimedia Lab, Department of Information Engineering, the Chinese University of Hong
Face analysisI hImage searchVideo editing3D reconstruction…
多媒体集成实验室,中科院深圳先进技术研究院