Big data - University of Pittsburghpeople.cs.pitt.edu/~kovashka/cs3710_sp15/bigdata_phuong.pdfBased...
Transcript of Big data - University of Pittsburghpeople.cs.pitt.edu/~kovashka/cs3710_sp15/bigdata_phuong.pdfBased...
Big dataPhuong Pham
April 9, 15
Learning Everything about Anything:Webly-Supervised Visual Concept
LearningBased on Santosh Divvala’s slide
Motivation
3
Rearing horse Rolling horse Reining horse Eye horse
What are these?
“horse”
A system that can
4
Without human supervision• Biased, non-comprehensive• Concept-specific expertise• Scalability
Approach
5
Overall questions? Or we will go into detail of each component
Retrieve concept’s variations
• Approximately 5000 n-grams per concept
• Several visually non-salient n-grams e.g. “last horse”, “particular horse”
• Classifier-based pruning (5000 -> 1000)• Train/test thumbnail images, if A.P. < threshold, discard
n-gram
• Find good quality n-gram
6
Good quality n-grams
7
quality coverage
Not for finding super-ngrams!
Merge into S if i,j are different?
Super-ngrams
8
Merging similar “good quality” ngrams
Training models
• Train separate DPM per super-ngram
• Pruning noisy components
• Merging similar appearance clustering (~ 50%)
9
Pruning noisy components
10
Noisy components (pruned based on A.P. threshold and frequency)
Merging similar components (as super-ngram)11
Object detection
12
• Their system is comparable to WS_vid_CVPR12 (better in avg)• WS_Img_ICCV11 is the best model at the cost of fully supervised learning (limitation of annotation)
Action detection
13The webly learning give reasonable good result?
Future work
14
Summary
• Advantages• Cut annotation cost
• Combine linguistic and visual semantic may lead to more interesting research topics
• ?
• Disadvantages• Still not defeat state of the art even with training set at
large scale
• ?
15
Questions
16
Scene Completion using Millions of Photographs
Based on James Hays’ slides
Motivation
18
The algorithm
19
Input image Scene Descriptor Image Collection
200 matches20 completionsContext matching+ blending
…
…
Semantic scene matching
20
→Gist scene descriptor from 6 orientation and 5 scales
Find its L2 nearest neighbors in 2.3 million images
21
→
… 200 total
22Graph cut + Poisson blending Hays and Efros, SIGGRAPH 2007
Context matching
Graph cut seam finding
23
For missing regions of the existing image
For regions of the image not covered be the scene match
For other pixels
Result ranking
24
We assign each of the 200 results a score which is the sum of:
The scene matching distance
The context matching distance (color + texture)
The graph cut cost
Qualitative
25
Qualitative - failures
26
Quantitive
• Our: 37%
• Baseline: 10%
27
Human evaluation
28
Discussions
29