Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf ·...
![Page 1: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/1.jpg)
Detecting activities of daily living in first person camera views
Hamed Pirsiavash, Deva Ramanan
Presented by Dinesh Jayaraman
![Page 2: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/2.jpg)
Wearable ADL detection
Slides from authors (link)
![Page 3: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/3.jpg)
Method - Choice of features
Slides from authors (link)
![Page 4: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/4.jpg)
Method - Choice of features
Slides from authors (link)
![Page 6: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/6.jpg)
Method - Active/Passive objects
Slides from authors (link)
![Page 7: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/7.jpg)
Method - Active/Passive objects
Slides from authors (link)
![Page 8: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/8.jpg)
Method - Temporal pyramid
Slides from authors (link)
![Page 9: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/9.jpg)
Method - Temporal pyramid
Slides from authors (link)
![Page 10: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/10.jpg)
Data
● 40 GB of video data
● Annotations
○ Object annotations
○ 30-frame intervals
○ Present/absent
○ Bounding boxes
○ Active/passive
● Action annotations
○ Start time, end time
● Pre-computed:
○ DPM object detection outputs
○ Active/passive models
![Page 11: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/11.jpg)
Examples
![Page 12: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/12.jpg)
Implementation differences
Temporal pyramid is not really implemented as a pyramid - linear SVM in place of kernel SVM
Locations are not used
![Page 13: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/13.jpg)
Recap - Key ideas
● Bag-of-objects representation (instead of low-level STIP-type approach)
● Separate models for active/passive objects
● Temporal pyramid
We will now study the impact of each of these
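To make the first and third key ideas concrete, here is a minimal sketch of a bag-of-objects histogram extended with a two-level temporal pyramid. It illustrates the general idea only: the function names, the toy clip and the object vocabulary are hypothetical, and the slides note that the presenter's implementation is not a true pyramid, so the actual feature layout may differ.

```python
from collections import Counter

def bag_of_objects(frames, vocabulary):
    """Fraction of frames in which each vocabulary object is present."""
    counts = Counter(obj for objs in frames for obj in set(objs))
    n = max(len(frames), 1)  # guard against empty segments
    return [counts[obj] / n for obj in vocabulary]

def temporal_pyramid(frames, vocabulary, levels=2):
    """Concatenate histograms over 1, 2, 4, ... equal-length segments."""
    feature = []
    for level in range(levels):
        cells = 2 ** level
        for c in range(cells):
            start = c * len(frames) // cells
            end = (c + 1) * len(frames) // cells
            feature.extend(bag_of_objects(frames[start:end], vocabulary))
    return feature

# Hypothetical clip: objects present in each annotated frame.
clip = [["kettle"], ["kettle", "mug"], ["mug"], ["mug", "tap"]]
vocab = ["kettle", "mug", "tap"]

# Whole-clip histogram, then first half, then second half.
print(temporal_pyramid(clip, vocab, levels=2))
# [0.5, 0.75, 0.25, 1.0, 0.5, 0.0, 0.0, 1.0, 0.5]
```

The first three values ignore order entirely; the remaining six show that the kettle appears early and the tap late, which is the ordering cue the pyramid adds.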
![Page 14: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/14.jpg)
Accuracy - 37%
![Page 15: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/15.jpg)
Taxonomic loss function
![Page 16: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/16.jpg)
![Page 17: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/17.jpg)
Understanding data - 32 ADL actions, 18 selected
● 'combing hair'
● 'make up'
● 'brushing teeth'
● 'dental floss'
● 'washing hands/face'
● 'drying hands/face'
● 'enter/leave room'
● 'adjusting thermostat'
● 'laundry'
● 'washing dishes'
● 'moving dishes'
● 'making tea'
● 'making coffee'
● 'drinking water/bottle'
● 'drinking water/tap'
● 'making hot food'
● 'making cold food/snack'
● 'eating food/snack'
● 'mopping in kitchen'
● 'vacuuming'
● 'taking pills'
● 'watching tv'
● 'using computer'
● 'using cell'
● 'making bed'
● 'cleaning house'
● 'reading book'
● 'using_mouth_wash'
● 'writing'
● 'putting on shoes/sucks'
● 'drinking coffee/tea'
● 'grabbing water from tap'
![Page 19: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/19.jpg)
Data available for actions
Not a data issue
Number of instances in data
![Page 20: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/20.jpg)
![Page 21: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/21.jpg)
Results

| Method | Accuracy |
|---|---|
| DPM, active + passive, 2 temporal levels | 19.98% |
![Page 22: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/22.jpg)
What does each stage contribute?
● Bag-of-objects
● Bag-of-active/passive objects
● Bag-of-active/passive objects with temporal ordering
![Page 23: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/23.jpg)
Object occurrence
![Page 24: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/24.jpg)
Object presence
![Page 25: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/25.jpg)
![Page 26: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/26.jpg)
Results

| Method | Accuracy |
|---|---|
| DPM, active + passive, 2 temporal levels | 19.98% |
| Ideal, no activity info, no ordering | 29.61% |
![Page 27: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/27.jpg)
Thresholded bag-of-objects
● Object presence duration is an important cue, but
○ has large variance
○ assumes objects with large presence duration are also important for discrimination
● Binary approach counters these shortcomings but
○ loses object presence duration cues
○ susceptible to noise without ground truth data. Even one false positive will have large impact.
![Page 28: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/28.jpg)
Thresholded bag-of-objects
● Thresholded bag-of-objects features compromise
○ less noisy
○ retains information about which objects are more and less important
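The three variants discussed on these two slides can be compared side by side in a short sketch. The slides do not state the exact thresholding rule, so the one used here (drop objects seen in fewer than `min_frames` frames, clip durations at `max_frames`) is an assumption chosen for illustration, as are the function names and the toy clip.

```python
def duration_features(presence, vocabulary):
    """Number of annotated frames in which each object appears."""
    return [sum(obj in frame for frame in presence) for obj in vocabulary]

def binary_features(presence, vocabulary):
    """1 if the object appears in any frame, else 0."""
    return [1 if d > 0 else 0 for d in duration_features(presence, vocabulary)]

def thresholded_features(presence, vocabulary, min_frames=2, max_frames=3):
    """Assumed rule: zero out short-lived objects, clip long durations."""
    return [0 if d < min_frames else min(d, max_frames)
            for d in duration_features(presence, vocabulary)]

# Hypothetical clip; the single "mug" frame stands in for a false positive.
clip = [{"tv"}, {"tv", "remote"}, {"tv"}, {"tv"}, {"tv", "mug"}, {"tv", "remote"}]
vocab = ["tv", "remote", "mug"]

print(duration_features(clip, vocab))     # [6, 2, 1]
print(binary_features(clip, vocab))       # [1, 1, 1]
print(thresholded_features(clip, vocab))  # [3, 2, 0]
```

In the binary features the one-frame mug counts as much as the TV, while the raw durations let the TV dominate. Under the assumed rule, the thresholded version suppresses the mug and caps the TV but still ranks the TV above the remote.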
![Page 29: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/29.jpg)
Bag-of-objects
Captures some notion of the scene.
Action classes that are typically performed in similar settings tend to get confused.
Can action recognition really just be reduced to object detection?
![Page 30: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/30.jpg)
Active and passive objects
![Page 31: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/31.jpg)
Active and passive objects
Significant performance improvements
![Page 32: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/32.jpg)
Results

| Method | Accuracy |
|---|---|
| DPM, active + passive, 2 temporal levels | 19.98% |
| Ideal, no activity info, no ordering | 29.61% |
| Ideal, active + passive, no ordering | 46.12% |
![Page 33: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/33.jpg)
Data ambiguity
Again, a large quantity of the data actually collected is not used in the paper, or in the implementation.
Only 21 of 49 passive objects and 5 of 49 active objects are used in the implementation.
This might be a constraint forced by object detection performance.
![Page 34: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/34.jpg)
Active and passive objects
Information about which objects are being used - crucial cue for action recognition.
Captures important information about person's interaction with objects, rather than just looking at objects.
Helps disambiguate previously confused action classes performed in similar settings.
Large performance boost (from 33.5% to 40% and 29.5% to 46% respectively)
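A small sketch can show why separate active and passive bins help with actions performed in the same setting. The two clips below are hypothetical, as are the function names and the frame format (a dict from object name to "active" or "passive"); the point is only that a plain bag-of-objects cannot tell the clips apart, while the split representation can.

```python
def plain_features(frames, vocabulary):
    """Fraction of frames containing each object, ignoring interaction."""
    n = max(len(frames), 1)
    return [sum(o in f for f in frames) / n for o in vocabulary]

def active_passive_features(frames, vocabulary):
    """Separate bins per object: passive fractions, then active fractions."""
    n = max(len(frames), 1)
    passive = [sum(f.get(o) == "passive" for f in frames) / n for o in vocabulary]
    active = [sum(f.get(o) == "active" for f in frames) / n for o in vocabulary]
    return passive + active

vocab = ["kettle", "tap"]
# Hypothetical kitchen clips: same objects in view, different object in use.
making_tea = [{"kettle": "active", "tap": "passive"}] * 2
washing_dishes = [{"kettle": "passive", "tap": "active"}] * 2

print(plain_features(making_tea, vocab))               # [1.0, 1.0]
print(plain_features(washing_dishes, vocab))           # [1.0, 1.0]
print(active_passive_features(making_tea, vocab))      # [0.0, 1.0, 1.0, 0.0]
print(active_passive_features(washing_dishes, vocab))  # [1.0, 0.0, 0.0, 1.0]
```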
![Page 35: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/35.jpg)
Temporal ordering
![Page 36: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/36.jpg)
Temporal ordering
![Page 37: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/37.jpg)
Results

| Method | Accuracy |
|---|---|
| DPM, active + passive, 2 temporal levels | 19.98% |
| Ideal, no activity info, no ordering | 29.61% |
| Ideal, active + passive, no ordering | 46.12% |
| Ideal, active + passive, 2 temporal levels | 47.33% |
![Page 38: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/38.jpg)
Temporal ordering
Marginal improvement in performance
Does more temporal ordering improve performance?
![Page 39: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/39.jpg)
Three temporal levels
Accuracy - 45.67% (drop from two levels)
![Page 40: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/40.jpg)
Temporal ordering
Contributes little to classification when ground truth annotations for active and passive objects are known for this dataset
Without active/passive objects, temporal ordering (2 levels) boosts performance from 29.6% to 36.2%
![Page 41: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/41.jpg)
Results

| Method | Accuracy |
|---|---|
| DPM, active + passive, 2 temporal levels | 19.98% |
| Ideal, no activity info, no ordering | 29.61% |
| Ideal, no activity info, 2 temporal levels | 36.20% |
| Ideal, active + passive, no ordering | 46.12% |
| Ideal, active + passive, 2 temporal levels | 47.33% |
| Ideal, active + passive, 3 temporal levels | 45.67% |
![Page 42: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/42.jpg)
Temporal ordering
Why is temporal ordering more important when using less data or "non-ideal" detectors?
![Page 43: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/43.jpg)
Can we do better?
What we have learnt:
● Activity information contributes most
● Temporal ordering makes insignificant difference when activity information is available
● Training data is limited => smaller feature space is preferable
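The feature-space argument can be made concrete with a rough dimension count. The object counts (21 passive, 5 active) come from the "Data ambiguity" slide. The pyramid layout (1, 2, 4, ... cells per level, one bin per object per cell) is an assumption, and since the slides say the implementation is not a true pyramid, the real dimensions may differ.

```python
def feature_dim(n_objects, levels):
    """Bins for a pyramid with 2**l cells at level l, one bin per object per cell."""
    cells = sum(2 ** l for l in range(levels))  # 1, 3, 7, ...
    return n_objects * cells

N_PASSIVE, N_ACTIVE = 21, 5  # object counts used in the implementation

for name, n in [("active only", N_ACTIVE),
                ("passive only", N_PASSIVE),
                ("active + passive", N_ACTIVE + N_PASSIVE)]:
    dims = [feature_dim(n, levels) for levels in (1, 2, 3)]
    print(f"{name}: {dims}")
# active only: [5, 15, 35]
# passive only: [21, 63, 147]
# active + passive: [26, 78, 182]
```

Under these assumptions, an active-only feature without temporal ordering has 5 dimensions, against 182 for active and passive objects with three temporal levels.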
![Page 44: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/44.jpg)
ONLY active objects
![Page 45: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/45.jpg)
ONLY active objects
![Page 46: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/46.jpg)
ONLY passive objects
![Page 47: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/47.jpg)
ONLY passive objects
![Page 48: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/48.jpg)
Active objects
● Deteriorates to 51.63% with two temporal levels - insufficient training data
● We have side-stepped object detection by using ground truth annotations
● Near-ideal active object detection performance may be very hard to achieve - occlusions etc., so other cues are important for robust performance.
![Page 49: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/49.jpg)
Results

| Method | Accuracy |
|---|---|
| DPM, active + passive, 2 temporal levels | 19.98% |
| Ideal, no activity info, no ordering | 29.61% |
| Ideal, no activity info, 2 temporal levels | 36.20% |
| Ideal, passive, 2 temporal levels | 25.04% |
| Ideal, active, no ordering | 56.50% |
| Ideal, active, 2 temporal levels | 51.63% |
| Ideal, active + passive, no ordering | 46.12% |
| Ideal, active + passive, 2 temporal levels | 47.33% |
| Ideal, active + passive, 3 temporal levels | 45.67% |
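As a quick check on what each stage contributes, the differences between rows of the table can be computed directly. The accuracies are copied from the table; the dictionary keys and the `gain` helper are just illustrative names, and the differences are in percentage points.

```python
# Accuracies (%) for the "Ideal" (ground truth annotation) rows of the table.
accuracy = {
    "no activity info, no ordering": 29.61,
    "no activity info, 2 levels": 36.20,
    "active + passive, no ordering": 46.12,
    "active + passive, 2 levels": 47.33,
    "active + passive, 3 levels": 45.67,
    "active, no ordering": 56.50,
    "active, 2 levels": 51.63,
}

def gain(after, before):
    """Change in accuracy in percentage points, rounded to 2 decimals."""
    return round(accuracy[after] - accuracy[before], 2)

# Adding active/passive information (no temporal ordering).
print(gain("active + passive, no ordering", "no activity info, no ordering"))  # 16.51
# Adding a second temporal level, without and with activity information.
print(gain("no activity info, 2 levels", "no activity info, no ordering"))     # 6.59
print(gain("active + passive, 2 levels", "active + passive, no ordering"))     # 1.21
# Adding a third temporal level.
print(gain("active + passive, 3 levels", "active + passive, 2 levels"))        # -1.66
# Dropping passive objects, then adding a second temporal level.
print(gain("active, no ordering", "active + passive, no ordering"))            # 10.38
print(gain("active, 2 levels", "active, no ordering"))                         # -4.87
```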
![Page 50: Detecting activities of daily living in first person ...cv-fall2012/slides/dinesh-expt.pdf · Detecting activities of daily living in first person camera views Hamed Pirsiavash, Deva](https://reader036.fdocuments.net/reader036/viewer/2022062916/5ecbc539aab05a781359c052/html5/thumbnails/50.jpg)
● Hamed Pirsiavash and Deva Ramanan, "Detecting activities of daily living in first-person camera views", CVPR 2012
● Examples, dataset and code at http://deepthought.ics.uci.edu/ADLdataset/adl.html