Intelligent Video Surveillance in Crowded Scenes.pptxgwang/tutorials/talk_ustc.pdf ·...

Intelligent Video Surveillance Intelligent Video Surveillance in Crowded Scenes

Xiaogang Wang

Department of Electronic EngineeringDepartment of Electronic Engineering

The Chinese University of Hong Kong

Intelligent Video Surveillance

The number of surveillance cameras is fast increasingThe number of surveillance cameras is fast increasingThe Heathrow airport in London has 5,000 surveillance cameras

By 2009, China has installed more than 3,000,000 surveillance cameras. This number will increase more than 40% per year in the next five years.

ApplicationsHomeland securityHomeland security

Anti-crime

Traffic control

M it hild ld l d ti t t hMonitor children, elderly and patients at home

…

FunctionsLow-level: detect, track and recognize objects of interest

High-level: understand activities of objects and detect abnormalities

Crowded scenes…

Sparse scenes

Single camera view Multiple camera viewsg p

“The requirements for the next generation of video surveillancesystems are robustness, reliability, scalability and self-y y y fadaptability for crowded, large and complex scenes. ”(Remagnino et al. Machine Vision and Applications, 2007)

Detection and track based

Conventional Approaches for Activity Analysis

Detection and track basedDetection and tracking are unreliable in crowded because of occlusions

f bObject detection and tracking Trajectories of objects

Activity analysis

…

Abnormal activitiesTypical activity categories

Motion based

Conventional Approaches for Activity Analysis

Motion basedCannot separate co-occurring activities

Walk…

…Wave

…

……

Motion feature vectorVideo sequence

Run

Activity analysis

Divide into short video clips

Features of our Approach

Detection and tracking are not requiredDetection and tracking are not required

Separate co-occurring activities

Work robustly in crowded scenesWork robustly in crowded scenes

Unsupervised (no need to manually label training data)

Simultaneously model simple activities more complicated Simultaneously model simple activities, more complicated interactions among objects and global behaviors in the scene

X. Wang, X. Ma and E. Grimson, “Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models,” IEEE Trans. on Pattern Analysis and Machine Intelligence(PAMI), Vol. 31, pp. 539-555, 2009.

High level picture of our approach

Motion Features(a)( )

Atomic activities modeled as

distributions over the feature codebook

(b)( )

0

0.05

0.1

0.15

0.2

0.25

0.3

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Global behaviors

… …

modeled as distributions over atomic activities

(c)

Parametric hierarchical Bayesian model

ημ0

0.05

0.1

0.15

0.2

0.25

0.3

2β0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

1β

jc1 2

Global behavior models (L = 2)

jπ0β cβ∞

Gl b l b h i φ φ φ φjiz

Global behaviormodels 1 2 3 4

1φ 2φ 3φ 4φ

Atomic activity models (K=4)

jixNj

M

H kφ∞

Atomic activity… … MAtomic activity

models

Video clip j (j = 1…M)

jπ

Observed feature values of moving pixels

Temporal co-occurrence of moving pixels

Moving pixels in a short video clip Spatial distribution of an atomic activity

Experimental Results

The input is a 90 minutes long traffic video sequenceThe input is a 90 minutes long traffic video sequence

Learned atomic activities from a traffic scene

(1) (2) (3) (4) (5) (6)(1) (2) (3) (4) (5) (6)

(7) (8) (9) (10) (11) (12)

(13) (14) (15) (16) (17) (18)

(19) (20) (21) (22) (23) (24)

(25) (26) (27) (28) (29)

Global behavior I: green light for south/north traffic

prior

index of atomic activities

vehicles southboundvehicles northbound vehicles northbound

vehicles incoming northbound vehicles outgoing eastboundvehicles incoming southbound

Global behavior II: green light for east/west traffic

prior


vehicles incoming westbound vehicles outgoing westbound vehicles outgoing southbound

vehicles outgoing eastboundvehicles incoming eastbound pedestrians westbound

Global behavior III: left turn signal for east/west traffic

prior


vehicles turning left eastbound vehicles outgoing northbound vehicles outgoing northbound

vehicles incoming eastbound vehicles outgoing eastbound vehicles stopping southbound

Global behavior IV: walk sign

prior


pedestrians outgoing eastboundpedestrians incoming eastbound pedestrians westbound

vehicles stopping vehicles stoppingpedestrians westbound

Global behavior V: northbound right turns

prior


vehicles incoming northbound vehicles outgoing eastbound

Temporal video segmentation

green light for east/west traffic walk sign

left turn signal for east/west traffic

green light for south/north traffic northbound right turns

Confusion matrix of video segmentation

Clustering result

Manual label

The average accuracy is 85.74% using our approach.

The average accuracy is 65.6% when modeling atomic activities and global behaviorsThe average accuracy is 65.6% when modeling atomic activities and global behaviorsin two separate steps.

The approaches, such as Zhong et al CVPR’04, of using a motion feature vector t t id li f l thi d tto represent a video clip perform poorly on this data.

Abnormality detection results

Interaction query

vehicles approaching

0.3

0.35

0.4

0.45

0.5

pedestrians crossing the street0.05

0.1

0.15

0.2

0.25

01 3 5 7 9 11 13 15 17 19 21 23 25 27 29

Query distribution

Top four retrieved jay-walking examples

Pedestrians/Vehicles Detection Based on Motions

Atomic activities related to vehicles

Pedestrians/Vehicles Detection Based on Motions

Atomic activities related to pedestrians

Classify Motions into Vehicles (Red) and Pedestrians (Green)

ConclusionConclusionPropose an unsupervised approach for robust activity analysis in crowded scenes

Co-occurring activities are separated without detecting and tracking objectstracking objects

Only using moving pixels as features, this approach is able to detect activities detect activities analyze interactions among objects temporally segment video sequences into global behaviorsdetect abnormalitiesclassify motions into pedestrians and vehicles

Face Sketch Face Sketch Synthesis and Recognition

Xiaogang Wang

Department of Electronic EngineeringDepartment of Electronic Engineering

The Chinese University of Hong Kong

OutlineOutlineApplications

CUHK face sketch database

Face sketch synthesis using a global linear model

Patch-based face sketch synthesis using multi-scale Markov random fields

F k t h itiFace sketch recognition

ApplicationsApplicationsLaw enforcement

Film industry

Entertainment

Query sketch drawn by the artist

Face photos in the police mug-shot databasesSketches synthesized by computer

CUHK Face Sketch Database (CUFS)CUHK Face Sketch Database (CUFS)Publicly available: http://mmlab.ie.cuhk.edu.hk/facesketch.html

188 people from the CUHK student data set

CUHK Face Sketch Database (CUFS)CUHK Face Sketch Database (CUFS)123 people from AR database

CUHK Face Sketch Database (CUFS)CUHK Face Sketch Database (CUFS)295 people from XM2VTS database

L i g B d F Sk t h S th iLearning Based Face Sketch SynthesisGenerate a sketch from an input face photo based on a set of training face photo-sketch pairs

Sketches of different styles can be synthesized by choosing i i f diff ltraining sets of different styles

Face sketch synthesis

Input face photo Synthesize sketch

Training set

Face Sketch Synthesis Using Gl b l Li M d la Global Linear Model

1c 1c1p 1s

transform + + ?Synthesize sketch

… …

ncnc

Input photo

Synthesize sketch

np nsTraining photo sketch pairs

X. Tang and X. Wang, “Face Sketch Recognition,” IEEE Trans. on Circuits and Systems for Video Technology (CSVT), Vol. 14, No. 1, pp. 50-57, 2004.

ResultsResults

Photo Sketch drawn by the artist

Synthesize sketch


Synthesize sketch

Separate Shape and TextureSeparate Shape and Texture

Shape Transformation

Photo shape Sketch shape

Photo Graph matching

Texture Transformation

Synthesized sketch

Photo texture Sketch texture

X. Tang and X. Wang, “Face Sketch Synthesis and Recognition,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2003.

ResultsResults

Without separationseparation

Separate shape & texture


Synthesize sketch


Synthesize sketch


Synthesize sketch


Synthesize sketch


Synthesized sketch by the artistsketch

Patch-Based Face Sketch Synthesis Using M lti S l M k R d Fi ldMulti-Scale Markov Random Fields

X. Wang and X. Tang, “Face Photo‐Sketch Synthesis and Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 31, pp. 1955-1967, 2009.

ResultsResults


Synthesized sketches after different number of iterations of belief propagation on Markov random fields

After 0 iteration After 5 iteration After 40 iterations


Global linear transform Patch based

Photo Synthesized Drawn by the artist Photo Synthesized Drawn by the artist

Sk t h S th i ith Li hti V i tiSketch Synthesis with Lighting Variations

Sk t h S th i ith P V i tiSketch Synthesis with Pose Variations

Synthesize Photos from SketchesSynthesize Photos from Sketches

By artist Synthesized photo Photo By artist Synthesized photo Photo

Face Sketch RecognitionFace Sketch Recognition306 people for training and 300 people for testing

Table 1 Rank 1 – 10 recognition accuracy using different face sketch synthesis methods (%)

Methods 1 2 3 4 5 6 7 8 9 10

Table 1. Rank 1 10 recognition accuracy using different face sketch synthesis methods (%)

Direct match 6.3 8.0 9.0 9.3 11.3 13.3 14.0 14.0 14.3 16.0

Global linear transform 90.0 94.0 96.7 97.3 97.7 97.7 98.3 98.3 99.0 99.0

Patch based 96.3 97.7 98.0 98.3 98.7 98.7 99.3 99.3 99.7 99.796.3 97.7 98.0 98.3 98.7 98.7 99.3 99.3 99.7 99.7

ConclusionConclusionPropose two face sketch synthesis approaches based on global linear transform and Markov random fields

Face sketch synthesis by linear transform can be improved by separating shape and textureseparating shape and texture

Face sketch recognition can be significantly improved by first transforming face photos into sketchesg p

Image and Video Processing Lab, Department of Electronic Engineering the Chinese University of Hong KongEngineering, the Chinese University of Hong Kong

Video surveillanceMedical imagingg gMachine Learning

Multimedia Lab, Department of Information Engineering, the Chinese University of Hong

Face analysisI hImage searchVideo editing3D reconstruction…

多媒体集成实验室，中科院深圳先进技术研究院

Intelligent Video Surveillance in Crowded Scenes.pptxgwang/tutorials/talk_ustc.pdf ·...

Documents

Transcript of Intelligent Video Surveillance in Crowded Scenes.pptxgwang/tutorials/talk_ustc.pdf ·...