Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs
![Page 1: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/1.jpg)
Temporal Action Localization in Untrimmed Videos via Multi-Stage CNNs
Slides by Alberto Montes
Computer Vision Group Reading Group, June 13th, 2016
[arXiv] [code]
Zheng Shou, Dongang Wang and Shih-Fu Chang
![Page 2: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/2.jpg)
Introduction
![Page 3: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/3.jpg)
Previous Work
Improved Dense Trajectory (iDT)
Fisher Vector
2D Convolution
![Page 4: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/4.jpg)
Segment-CNN
![Page 5: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/5.jpg)
Segment-CNN
![Page 6: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/6.jpg)
Segment-CNN
![Page 7: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/7.jpg)
Segment-CNN
![Page 8: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/8.jpg)
Problem Definition
Video:
frame # frames
Annotations:
Candidates:
action category
action category
start and ending frame
![Page 9: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/9.jpg)
Multi-Scale Segment Generation
◉ Each frame resized to 171x128 pixels
◉ Temporal sliding windows:
○ 16, 32, 64, 128, 256, 512 frames
○ 75% overlap
◉ Construct segment s by uniformly sampling 16 frames
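The segment generation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `sliding_windows` and `sample_frames` are hypothetical, and the exact rule for picking the 16 frame indices inside a window is an assumption (evenly spaced, starting at the window's first frame).

```python
def sliding_windows(num_frames, lengths=(16, 32, 64, 128, 256, 512), overlap=0.75):
    """Yield (start, end) frame ranges, end exclusive, for every window length."""
    for length in lengths:
        # 75% overlap between consecutive windows means a stride of length / 4.
        stride = max(1, round(length * (1 - overlap)))
        for start in range(0, num_frames - length + 1, stride):
            yield start, start + length


def sample_frames(start, end, num_samples=16):
    """Pick num_samples evenly spaced frame indices inside [start, end).

    Assumption: sampling starts at the first frame of the window.
    """
    step = (end - start) / num_samples
    return [start + int(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    # A 64-frame video only fits the 16-, 32- and 64-frame windows.
    for start, end in sliding_windows(64):
        print(start, end, sample_frames(start, end))
```

Every window, whatever its length, ends up as a 16-frame segment, so the same network input size serves all temporal scales.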
![Page 10: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/10.jpg)
Network Architecture
C3D Network
![Page 11: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/11.jpg)
Training Proposal and Classification Network
◉ lr=0.0001 except fc8 lr=0.01, momentum=0.9, weight decay factor=0.0005
◉ Drop lr by factor of 2 every 10K iterations
Proposal Network:
● fc8: 2 nodes
Classification Network:
● fc8: K+1 nodes
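As a rough illustration of the training settings on this slide, the sketch below computes the learning rate at a given iteration and the fc8 width of each network. The helper names are hypothetical. It assumes that the halving schedule applies to fc8 as well as to the other layers, and that K is the number of action categories with one extra node for background; the slide does not state either point explicitly.

```python
def learning_rate(iteration, layer="conv", base_lr=1e-4, fc8_lr=1e-2, drop_every=10_000):
    """Learning rate after halving once every `drop_every` iterations.

    Assumption: fc8 follows the same schedule, only from a higher starting value.
    """
    start = fc8_lr if layer == "fc8" else base_lr
    return start * 0.5 ** (iteration // drop_every)


def fc8_outputs(network, num_classes):
    """Number of fc8 nodes: 2 for the proposal network, K+1 for classification.

    Assumption: the extra node in K+1 stands for the background class.
    """
    return 2 if network == "proposal" else num_classes + 1


if __name__ == "__main__":
    for it in (0, 10_000, 25_000):
        print(it, learning_rate(it), learning_rate(it, layer="fc8"))
    print(fc8_outputs("proposal", 20), fc8_outputs("classification", 20))
```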
![Page 12: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/12.jpg)
Localization Network
Add Custom Loss function
![Page 13: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/13.jpg)
Localization Network
true class label
overlap sensitivity
Try to boost segments with high overlap
Works best with: λ = 1, α = 0.25
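The loss formula itself did not survive in this transcript. The sketch below assumes the form given in the Segment-CNN paper, as best recalled: a softmax loss plus λ times an overlap term that, for non-background segments only, averages ½·(P² / v^α − 1), where P is the predicted probability of the true class and v is the segment's overlap with the ground truth. Treat the formula, the function name, and the convention that index 0 is background as assumptions.

```python
import math


def localization_loss(probs, labels, overlaps, lam=1.0, alpha=0.25):
    """Softmax loss plus lam * overlap loss over a batch of segments.

    probs:    per-segment softmax outputs (K+1 probabilities, index 0 = background)
    labels:   true class index per segment
    overlaps: overlap v_n of each segment with its ground-truth instance
    """
    n = len(labels)
    softmax_term = -sum(math.log(p[k]) for p, k in zip(probs, labels)) / n
    overlap_term = sum(
        0.5 * (p[k] ** 2 / v ** alpha - 1.0)
        for p, k, v in zip(probs, labels, overlaps)
        if k > 0  # background segments do not contribute to the overlap term
    ) / n
    return softmax_term + lam * overlap_term


if __name__ == "__main__":
    probs = [[0.2, 0.8]]
    # Same prediction, different overlap with the ground truth.
    print(localization_loss(probs, [1], [1.0]))
    print(localization_loss(probs, [1], [0.5]))
```

Under this assumed form, the same confidence costs more on a low-overlap segment than on a high-overlap one, which matches the slide's remark about boosting segments with high overlap.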
![Page 14: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/14.jpg)
Localization Network
Learning target:
![Page 15: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/15.jpg)
Localization Network
![Page 16: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/16.jpg)
Prediction and Post-processing
◉ Keep segments with P_pro > 0.7
◉ Remove background segments
◉ Multiply P_loc by the class-specific frequency of occurrence of each window length in the training data, to leverage window length distribution patterns
◉ NMS based on P_loc to remove redundancy (overlap threshold: θ - 0.1)
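The post-processing steps above can be sketched as a small pipeline. This is an illustration under stated assumptions rather than the authors' implementation: the dictionary layout, the `length_freq` lookup table, and the function names are hypothetical, label 0 is taken to mean background, and NMS is assumed to run per class with greedy suppression at temporal IoU above θ - 0.1.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(segments, length_freq, theta=0.5, p_pro_min=0.7):
    """segments: dicts with start, end, p_pro, label (0 = background), p_loc.
    length_freq: {(label, window_length): frequency in the training data}.
    """
    kept = []
    for s in segments:
        # Drop low-confidence proposals and segments predicted as background.
        if s["p_pro"] <= p_pro_min or s["label"] == 0:
            continue
        length = s["end"] - s["start"]
        # Re-weight P_loc by how common this window length is for the class.
        score = s["p_loc"] * length_freq.get((s["label"], length), 0.0)
        kept.append({**s, "score": score})

    # Greedy NMS: visit segments by descending score, keep one only if it does
    # not overlap an already kept segment of the same class by more than theta - 0.1.
    kept.sort(key=lambda s: s["score"], reverse=True)
    final = []
    for s in kept:
        if all(
            f["label"] != s["label"]
            or temporal_iou((s["start"], s["end"]), (f["start"], f["end"])) <= theta - 0.1
            for f in final
        ):
            final.append(s)
    return final


if __name__ == "__main__":
    demo = [
        {"start": 0, "end": 64, "p_pro": 0.9, "label": 1, "p_loc": 0.8},
        {"start": 16, "end": 80, "p_pro": 0.9, "label": 1, "p_loc": 0.6},
    ]
    print(postprocess(demo, {(1, 64): 0.5}))
```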
![Page 17: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/17.jpg)
Experiments
![Page 18: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/18.jpg)
Datasets
MEXaction2
“Bull Charge Cape” and “Horse Riding” videos
77 hours of videos
Training set: 1336 instances
Validation set: 310 instances
Test set: 329 instances
THUMOS 2014
Temporal Action Detection Task
20 categories
Training set: 2755 videos
Validation set: 1010 videos and 3007 instances
Test set: 1574 videos and 3358 instances
![Page 19: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/19.jpg)
Results MEXaction2
DTF: Dense Trajectory Features + SVM
![Page 20: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/20.jpg)
Results MEXaction2
![Page 21: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/21.jpg)
Results MEXaction2
![Page 22: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/22.jpg)
Evaluation
![Page 23: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/23.jpg)
Evaluation
![Page 24: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/24.jpg)
Evaluation
Impact of individual networks:
![Page 25: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/25.jpg)
Conclusions
Proposes a multi-stage framework, Segment-CNN, to address temporal action localization
![Page 26: Temporal Action Localization in Untrimmed Videos via Multi Stage CNNs](https://reader030.fdocuments.net/reader030/viewer/2022020314/587756581a28ab84388b7559/html5/thumbnails/26.jpg)
Thank you! Questions?