【ISVC2015】Evaluation of Vision-based Human Activity Recognition in Dense Trajectory Framework
Evaluation of Vision-based Human Activity Recognition in Dense Trajectory Framework
Hirokatsu Kataoka, Yoshimitsu Aoki†, Kenji Iwata, Yutaka Satoh
National Institute of Advanced Industrial Science and Technology (AIST) † Keio University
http://www.hirokatsukataoka.net/
Background
Computer vision for human sensing
- Detection, Tracking, Trajectory Analysis
- Posture Estimation, Activity Recognition
- Action recognition is able to extend human sensing applications
[Figure labels: Mental state, Body situation, Attention, Activity analysis (shaking hands), Look at people, Detection, Gaze estimation, Action recognition, Posture estimation, Face recognition, Trajectory extraction, Tracking]
Activity Recognition
“Activity” is a low-level primitive with semantic meaning, e.g. walking, running, sitting
- Activity recognition: the classification (location is given), e.g. “This image contains a man walking”
- Activity detection: the classification and localization, e.g. a region labeled “Walking”
Dense Trajectories (DT) [Wang+, IJCV2013]
• State-of-the-art space-time recognition approach
  – State-of-the-art: DT + Deep Learning [THUMOS2015]
  – Usable motion analyzer
  – Simply, (i) flow tracker (ii) feature vectorization
[Figure label: Large amount of optical flows]
[THUMOS2015] http://www.thumos.info/results.html
History of keypoint/trajectory-based approaches
• From space-time interest points (STIP) to DT
  – STIP: Space-time interest points [Laptev et al., IJCV2005]
  – Dense Trajectories [Wang et al., CVPR2011]
  – HOG + HOF on STIP [Laptev et al., CVPR2008]
  – Feature Mining for Activity Recognition [Gilbert et al., PAMI2011]
  – Cuboid Features [Dollar et al., PETS2005]
  – STR: Spatio-Temporal Relationship Match [Ryoo et al., ICCV2009]
  – Tracklet Descriptors [Raptis et al., ECCV2010]
STIP & DT: Sampling
• From space-time interest points (STIP) to DT
  – STIP: Space-time interest points [Laptev et al., IJCV2005]
  – Dense Trajectories [Wang et al., CVPR2011]
  – Action Bank [Sadanand et al., CVPR2012]
  – HOG + HOF on STIP [Laptev et al., CVPR2008]
  – Feature Mining for Activity Recognition [Gilbert et al., PAMI2011]
  – Cuboid Features [Dollar et al., PETS2005]
  – STR: Spatio-Temporal Relationship Match [Ryoo et al., ICCV2009]
  – Tracklet Descriptors [Raptis et al., ECCV2010]
Co-occurrence features in DT
• Extended co-occurrence feature (ECoHOG)
  – Feature
    • CoHOG [Watanabe, PSIVT2009] (pair count), ECoHOG (edge-magnitude accumulation)
    • PCA for codeword
    • DT + co-occurrence features (62.4%) > DT (59.2%) on MPII cooking
[Figure labels: CoHOG, ECoHOG]
H. Kataoka+, “Extended Co-occurrence HOG with Dense Trajectories for Fine-grained Activity Recognition”, in ACCV2014.
Need for more features!
Pose-based approach
Holistic approach
Proposal
• Feature evaluation for better performance
  – Evaluation of 13 features under fair settings
  – 5 categories
    • Trajectory: trajectory feature (originally in DT)
    • Shape: HOG, SIFT
    • Motion: HOF, MBHx, MBHy, MIP
    • Texture: HLAC, LBP, iLBP, LTP
    • Co-occurrence: CoHOG, ECoHOG
  – 4 different datasets
    • NTSEL (traffic)
    • INRIA surgery (surgery)
    • MSR daily activity 3D (daily living)
    • UCF50 (sports)
Simple algorithm
• (i) Flow tracking
  – Pyramidal images & sampling
  – Farneback optical flow & flow tracking
• (ii) Feature vectorization
  – HOG, HOF, MBH, Trajectory, SIFT, LBP, ...
  – Bag-of-words (BoW) representation
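The bag-of-words step in (ii) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `bow_histogram`, the squared Euclidean distance, and the L1 normalization are all assumptions, and the codebook is taken as given (the slides do not say how it is learned).

```python
def bow_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and count votes.

    descriptors: list of equal-length numeric sequences (one per trajectory).
    codebook:    list of codewords with the same length (assumed precomputed).
    Returns an L1-normalized histogram over codewords (assumption).
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        # Nearest codeword by squared Euclidean distance.
        nearest = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

One such histogram per feature type (HOG, HOF, ...) then serves as the video-level vector passed to a classifier.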
Pyramidal images & sampling
• Scaling and dense sampling
  – Pyramidal images
    • Scale is multiplied by 1/√2 at each level
  – Sampling at each scale
    • Grid: 5×5 pixels (experimentally decided)
    • Corner detection (T: threshold, λ: eigenvalue)
[Figure labels: Scale invariant, Detailed description]
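The scaling and grid sampling above can be sketched in a few lines. The function names, the `n_scales` parameter, and the half-step grid offset are assumptions made for illustration; the corner-detection test is left out because its formula does not appear in the transcript.

```python
import math


def pyramid_sizes(width, height, n_scales, factor=1 / math.sqrt(2)):
    """Image size at each pyramid level; each level shrinks by 1/sqrt(2)."""
    sizes = []
    for level in range(n_scales):
        s = factor ** level
        sizes.append((max(1, round(width * s)), max(1, round(height * s))))
    return sizes


def grid_points(width, height, step=5):
    """Dense sampling points on a regular grid with `step`-pixel spacing."""
    offset = step // 2  # start half a cell in from the border (assumption)
    return [
        (x, y)
        for y in range(offset, height, step)
        for x in range(offset, width, step)
    ]
```

In a full pipeline, `grid_points` would be called once per level returned by `pyramid_sizes`, and weakly textured points would then be discarded by the corner criterion.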
Farneback Optical Flow
• Dense optical flow + ST-patch
  – Farneback optical flow is included in OpenCV
  – Comparison with KLT tracker and SIFT
  – Local space-time patch around tracked sampling points
[Figure labels: Noise, Tracking error]
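Flow tracking can be illustrated independently of how the dense flow is computed. The sketch below assumes the flow fields are already available as nested lists (in practice they could come from OpenCV's `cv2.calcOpticalFlowFarneback`); the function name `track_point`, the nearest-pixel lookup, and the stop-at-border rule are illustrative assumptions.

```python
def track_point(start, flows):
    """Follow one sampled point through a sequence of dense flow fields.

    flows[t][y][x] = (dx, dy) is the displacement from frame t to frame t+1.
    Returns the list of positions visited, starting with `start`.
    """
    x, y = start
    trajectory = [(x, y)]
    for flow in flows:
        h, w = len(flow), len(flow[0])
        xi, yi = int(round(x)), int(round(y))  # nearest-pixel lookup (assumption)
        if not (0 <= xi < w and 0 <= yi < h):
            break  # the point left the frame, so the trajectory ends here
        dx, dy = flow[yi][xi]
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory
```

The positions returned here are the P_t used by the trajectory-shape feature, and the space-time patch is cut around them.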
Trajectory-based feature
• Trajectory shape [Wang+, IJCV2013]
  – Calculating flow between frames
  – Scale normalization
$\Delta P_t = (P_{t+1} - P_t) = (x_{t+1} - x_t,\; y_{t+1} - y_t)$
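A small sketch of the trajectory-shape descriptor: frame-to-frame displacements followed by scale normalization. Dividing by the summed displacement magnitude is how this sketch reads "scale normalization"; the function name and the flattened output layout are assumptions.

```python
import math


def trajectory_shape(points):
    """Displacement vectors of a trajectory, normalized by total motion.

    points: list of (x, y) positions P_0 ... P_L along one trajectory.
    Returns [dx_0, dy_0, dx_1, dy_1, ...] divided by sum of |dP_t|.
    """
    disps = [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    total = sum(math.hypot(dx, dy) for dx, dy in disps)
    if total == 0:
        return [0.0] * (2 * len(disps))  # static trajectory: no shape information
    return [v / total for dx, dy in disps for v in (dx, dy)]
```

Because of the normalization, a trajectory and a uniformly scaled copy of it yield the same descriptor.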
Shape-based feature
• HOG, SIFT
  – HOG [Dalal+, CVPR2005]: edge orientation and magnitude from a block representation with overlapping and normalization
  – SIFT [Lowe, IJCV2004]: simply divided 4x4 blocks
[Figure label: Edge shape from background]
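The core of HOG, a magnitude-weighted histogram of edge orientations, can be sketched for a single cell as below. The 9 unsigned orientation bins and the central-difference gradients are assumptions of this sketch, and block overlap and normalization are omitted.

```python
import math


def hog_cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell, weighted by gradient magnitude.

    cell: 2D list of gray values. Border pixels are skipped because the
    central-difference gradient needs both neighbors.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins  # unsigned orientation in [0, 180)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            fx = cell[y][x + 1] - cell[y][x - 1]
            fy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(fx, fy)
            theta = math.degrees(math.atan2(fy, fx)) % 180.0
            b = min(int(theta / bin_width), n_bins - 1)
            hist[b] += magnitude  # accumulate edge magnitude per orientation
    return hist
```

A full descriptor would concatenate such histograms over the cells of the patch and normalize them per block.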
Motion
• HOF, MBHx, MBHy, MIP
  – HOF [Laptev+, CVPR2008]: block optical flow extraction, quantization
  – MBH [Dalal+, ECCV2006]: motion boundary with dense optical flow
  – MIP [Kliper-Gross+, ECCV2012]: trinary (-1, 0, +1) from block flow direction
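To make the "motion boundary" idea concrete, the sketch below histograms the spatial gradient of a single flow component (the horizontal component for MBHx, the vertical one for MBHy). The 8 signed orientation bins and the function name are assumptions of this sketch.

```python
import math


def motion_boundary_histogram(component, n_bins=8):
    """Orientation histogram of the spatial gradient of one flow component.

    component: 2D list holding either the x- or the y-flow at each pixel.
    Uniform motion has zero gradient and therefore contributes nothing.
    """
    h, w = len(component), len(component[0])
    hist = [0.0] * n_bins
    bin_width = 360.0 / n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = component[y][x + 1] - component[y][x - 1]
            gy = component[y + 1][x] - component[y - 1][x]
            magnitude = math.hypot(gx, gy)
            theta = math.degrees(math.atan2(gy, gx)) % 360.0
            b = min(int(theta / bin_width), n_bins - 1)
            hist[b] += magnitude
    return hist
```

Since only changes in flow are counted, a constant flow over the whole patch (for example from a camera pan) leaves the histogram empty, while a moving object's outline produces a response.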
Texture
• HLAC, LBP, iLBP, LTP
  – HLAC [Otsu+, IAIP1988] [Kobayashi+, ICPR2004]: higher-order local auto-correlation, 0th-, 1st-, 2nd-order patterns
  – LBP [Ojala+, TPAMI2002]: texture binarization in a 3x3 patch
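Texture binarization in a 3x3 patch can be written out directly. The clockwise neighbor order starting at the top-left pixel and the `>=` comparison are conventions chosen for this sketch; other orderings give a permuted but equivalent code.

```python
def lbp_code(patch):
    """8-bit local binary pattern of a 3x3 patch.

    Each neighbor is compared with the center pixel; neighbors that are
    greater than or equal to the center set one bit of the code.
    """
    center = patch[1][1]
    # Clockwise from top-left (ordering is an assumption of this sketch).
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code
```

A histogram of these codes over the patch then serves as the texture descriptor.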
Co-occurrence
• Extended co-occurrence feature (ECoHOG)
  – Feature
    • CoHOG [Watanabe, PSIVT2009] (pair count), ECoHOG (edge-magnitude accumulation)
    • PCA for codeword
    • DT + co-occurrence features (62.4%) > DT (59.2%) on MPII cooking
[Figure labels: CoHOG, ECoHOG]
H. Kataoka+, “Extended Co-occurrence HOG with Dense Trajectories for Fine-grained Activity Recognition”, in ACCV2014.
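The difference between pair counting and edge-magnitude accumulation can be shown with one co-occurrence matrix for a single pixel offset. This is a sketch under assumptions: orientations are taken as already quantized, and the weighted variant adds the sum of the two pixels' magnitudes, which is one plausible reading of "edge-magnitude accumulation" rather than a formula stated on the slides.

```python
def cooccurrence(orient, mag, offset, n_bins, weighted):
    """Co-occurrence matrix of quantized orientations for one pixel offset.

    orient: 2D list of orientation bin indices in [0, n_bins).
    mag:    2D list of gradient magnitudes (same shape).
    weighted=False counts pairs (CoHOG-style); weighted=True accumulates
    the two magnitudes of each pair (ECoHOG-style, as assumed here).
    """
    dx, dy = offset
    h, w = len(orient), len(orient[0])
    matrix = [[0.0] * n_bins for _ in range(n_bins)]
    for y in range(h):
        for x in range(w):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < w and 0 <= y2 < h:
                i, j = orient[y][x], orient[y2][x2]
                if weighted:
                    matrix[i][j] += mag[y][x] + mag[y2][x2]
                else:
                    matrix[i][j] += 1.0
    return matrix
```

With the weighted variant, strong edges dominate the matrix, whereas pair counting treats weak and strong gradients alike.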
Experiments
• Evaluation of 13 features in the dense trajectory framework
  – 4 different datasets
    • Traffic scene (NTSEL dataset): 4 classes
    • Surgery (INRIA surgery): 4 classes
    • Daily living (MSR daily activity 3D): 12 classes
    • Sports (UCF50): 50 classes
Results on the 4 datasets
• High-performance features
  – Top three features on each dataset
  – 4 different scenes
Results on the 4 datasets
• High-performance features
  – CoHOG, SIFT, MBH
  – CoHOG gives stable accuracy across all datasets
Detailed performance rate
• The best features depend on the recognition task!
  – We need to experimentally concatenate several features
  – Feature concatenation on the NTSEL and INRIA surgery datasets
Rate of feature concatenation
• Baseline, 5 categories and concatenated vector
  – Baseline: DT + BoW model
  – Motion and co-occurrence feature
  – No need to apply all features
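Concatenating only the best-ranked features, rather than all 13, can be sketched as below. The function name `concat_top_n` and the use of a per-feature validation score for ranking are assumptions; the slides do not specify how the ranking is obtained.

```python
def concat_top_n(histograms, scores, n):
    """Concatenate the BoW histograms of the n best-scoring features.

    histograms: dict mapping feature name -> list of floats for one video.
    scores:     dict mapping feature name -> accuracy measured on held-out
                data for the task at hand (hypothetical input).
    Returns (selected feature names in rank order, concatenated vector).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)[:n]
    vector = []
    for name in ranked:
        vector.extend(histograms[name])
    return ranked, vector
```

Because the ranking is computed per task, the selected features can differ between, say, the traffic and the surgery datasets.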
Conclusion
• We evaluated 13 features in the framework of DT
  – For more effective activity recognition
  – 4 different scenes, one per dataset
  – Detailed evaluation and concatenated vectors
  – Top-N ranked concatenation is needed for activity recognition
Feature extraction around trajectories
  – Extraction of 13 features in the ST-patch
  – 2 (x dir.) x 2 (y dir.) x 3 (t dir.) regions
  – Calculating features with bag-of-words (BoW)
[Figure labels: ST-patch and xyt block extraction, 13-feature extraction]
Trajectory feature
• Trajectory shape
  – Compute the flow between frames
  – Normalize by the overall flow magnitude
$\Delta P_t = (P_{t+1} - P_t) = (x_{t+1} - x_t,\; y_{t+1} - y_t)$
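HOG feature
• Histograms of Oriented Gradients (HOG)
  – Can represent the rough shape of an object
  – Divides the local region into blocks and extracts features per block
  – Builds a quantized histogram from the edge gradient orientation (g(x, y) in the slide's equation)
  – Accumulates the edge magnitude (m(x, y) in the slide's equation) for each orientation
[Figure labels: Shape obtained from a pedestrian image, Shape obtained from the background]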