Detecting Activities of Daily Living in First-person Camera - People
Transcript of Detecting Activities of Daily Living in First-person Camera - People
![Page 1: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/1.jpg)
1
Detecting Activities of Daily Living in First-person Camera Views
Hamed Pirsiavash, Deva Ramanan
Computer Science Department, UC Irvine
![Page 2: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/2.jpg)
2
Motivation A sample video of Activities of Daily Living
![Page 3: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/3.jpg)
3
Applications Tele-rehabilitation
• Kopp et al,, Arch. of Physical Medicine and Rehabilitation. 1997.
• Catz et al, Spinal Cord 1997.
Long-term at-home monitoring
![Page 4: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/4.jpg)
Applications Life-logging
• Gemmell et al, “MyLifeBits: a personal database for everything.” Communications of the ACM 2006.
• Hodges et al, “SenseCam: A retrospective memory aid”, UbiComp, 2006.
So far, mostly “write-only” memory!
This is the right time for computer vision community to get involved.
4
![Page 5: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/5.jpg)
There are quite a few video benchmarks for action recognition.
Collecting interesting but natural video is surprisingly hard. It is difficult to define action categories outside “sports” domain
Related work: action recognition
5
UCF sports, CVPR’08
UCF Youtube, CVPR’08
KTH, ICPR’04
Hollywood, CVPR’09
Olympics sport, BMVC’10
VIRAT, CVPR’11
5
![Page 6: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/6.jpg)
6
Wearable ADL detection
It is easy to collect natural data
6
![Page 7: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/7.jpg)
7
Wearable ADL detection
ADL actions derived from medical literature on patient rehabilitation
It is easy to collect natural data
7
![Page 8: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/8.jpg)
• Challenges – What features to use? – Appearance model – Temporal model
• Our model – “Active” vs “passive” objects – Temporal pyramid
• Dataset
• Experiments 8
Outline
8
![Page 9: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/9.jpg)
Challenges What features to use?
Low level features
(Weak semantics)
High level features
(Strong semantics)
Human pose
Difficulties of pose: • Detectors are not accurate enough • Not useful in first person camera views
Space-time interest points
Laptev, IJCV’05
9
![Page 10: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/10.jpg)
Challenges What features to use?
Low level features
(Weak semantics)
High level features
(Strong semantics)
Human pose Object-centric features Space-time interest points
Laptev, IJCV’05 Difficulties of pose: • Detectors are not accurate enough • Not useful in first person camera views
10
![Page 11: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/11.jpg)
Challenges Occlusion / Functional state
“Classic” data
![Page 12: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/12.jpg)
Challenges Occlusion / Functional state
Wearable data
“Classic” data
12
![Page 13: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/13.jpg)
Challenges long-scale temporal structure
“Classic” data: boxing
![Page 14: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/14.jpg)
Challenges long-scale temporal structure
time Start boiling
water Do other things (while waiting)
Pour in cup Drink tea
Difficult for HMMs to capture long-term temporal dependencies
Wearable data: making tea
“Classic” data: boxing
![Page 15: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/15.jpg)
• Challenges – What features to use? – Appearance model – Temporal model
• Our model – “Active” vs “passive” objects – Temporal pyramid
• Dataset
• Experiments 15
Outline
15
![Page 16: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/16.jpg)
“Passive” vs “active” objects
Passive Active
16
![Page 17: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/17.jpg)
“Passive” vs “active” objects
Passive Active 17
![Page 18: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/18.jpg)
“Passive” vs “active” objects
Passive Active
Better object detection (visual phrases CVPR’11) Better features for action classification (active vs passive)
18
![Page 19: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/19.jpg)
Appearance feature: bag of objects
Bag of detected objects
fridge TV stove
fridge TV stove
SVM classifier
Video clip
19
![Page 20: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/20.jpg)
Appearance feature: bag of objects
Bag of detected objects
SVM classifier
Video clip
Active fridge
Active stove
Passive fridge
Active fridge
Active stove
Passive fridge
20
![Page 21: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/21.jpg)
Inspired by “Spatial Pyramid” CVPR’06 and “Pyramid Match Kernels” ICCV’05
Temporal pyramid Coarse to fine correspondence matching with a multi-layer pyramid
Temporal pyramid descriptor
Video clip
SVM classifier
time 21
![Page 22: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/22.jpg)
• Challenges – What features to use? – Appearance model – Temporal model
• Our model – “Active” vs “passive” objects – Temporal pyramid
• Dataset
• Experiments 22
Outline
22
![Page 23: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/23.jpg)
Sample video with annotations
23
![Page 24: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/24.jpg)
24
Wearable ADL data collection
• 20 persons • 20 different apartments • 10 hours of HD video • 170 degrees of viewing angle • Annotated
– Actions – Object bounding boxes – Active-passive objects – Object IDs
24
Prior work: • Lee et al, CVPR’12 • Fathi et al, CVPR’11, CVPR’12 • Kitani et al, CVPR’11 • Ren et al, CVPR’10
![Page 25: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/25.jpg)
Average object locations
Active Passive Active Passive
Active Passive
25
Active objects tend to appear on the right hand side and closer – Right-handed people are dominant – We cannot mirror-flip images in training
![Page 26: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/26.jpg)
• Challenges – What features to use? – Appearance model – Temporal model
• Our model – “Active” vs “passive” objects – Temporal pyramid
• Dataset
• Experiments 26
Outline
26
![Page 27: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/27.jpg)
Experiments
Low level features High level features
Our model Object-centric features
24 object categories
Baseline Space-time interest points
(STIP) Laptev et al, BMVC’09
27
![Page 28: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/28.jpg)
28
Accuracy on 18 action categories • Our model: 40.6% • STIP baseline: 22.8%
![Page 29: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/29.jpg)
29
Accuracy on 18 action categories • Our model: 40.6% • STIP baseline: 22.8%
![Page 30: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/30.jpg)
30
Classification accuracy
• Temporal model helps
30
![Page 31: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/31.jpg)
31
• Temporal model helps • Our object-centric features outperform STIP
Classification accuracy
31
![Page 32: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/32.jpg)
32
• Temporal model helps • Our object-centric features outperform STIP • Visual phrases improves accuracy
Classification accuracy
32
![Page 33: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/33.jpg)
33
• Temporal model helps • Our object-centric features outperform STIP • Visual phrases improves accuracy • Ideal object detectors double the performance
Classification accuracy
33
![Page 34: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/34.jpg)
34
• Temporal model helps • Our object-centric features outperform STIP • Visual phrases improves accuracy • Ideal object detectors double the performance
Results on temporally continuous video and taxonomy loss are included in the paper
Classification accuracy
34
![Page 35: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/35.jpg)
35
Summary
Data and code will be available soon!
![Page 36: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/36.jpg)
36
Summary
Data and code will be available soon!
![Page 37: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/37.jpg)
37
Summary
Data and code will be available soon!
![Page 38: Detecting Activities of Daily Living in First-person Camera - People](https://reader036.fdocuments.net/reader036/viewer/2022070222/613c150a22e01a42d40e863c/html5/thumbnails/38.jpg)
38
Thanks!