Words & Pictures Clustering and Bag of Words Representations Many slides adapted from Svetlana...

  • Slide 1
  • Words & Pictures Clustering and Bag of Words Representations Many slides adapted from Svetlana Lazebnik, Fei-Fei Li, Rob Fergus, and Antonio Torralba
  • Slide 2
  • Announcements: HW1 due Thurs, Sept 27 @ 12pm, by email to [email protected]. No need to include shopping image [email protected]. Write-up can be a webpage or PDF.
  • Slide 3
  • Document Vectors: represent a document as a bag of words.
  • Slide 4
  • Origin: Bag-of-words models. Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)
  • Slide 5
  • Origin: Bag-of-words models. US Presidential Speeches Tag Cloud (http://chir.ag/phernalia/preztags/). Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)
  • Slide 6
  • Origin: Bag-of-words models. US Presidential Speeches Tag Cloud (http://chir.ag/phernalia/preztags/). Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)
  • Slide 7
  • Origin: Bag-of-words models. US Presidential Speeches Tag Cloud (http://chir.ag/phernalia/preztags/). Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)
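
The orderless representation described on these slides can be sketched in a few lines. This is only an illustration: the dictionary, the sample sentence, and the function name `bag_of_words` are assumptions, not material from the lecture.

```python
import re
from collections import Counter

def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs; word order is discarded."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Words outside the dictionary are ignored; absent words count as 0.
    return [counts[word] for word in dictionary]

dictionary = ["freedom", "people", "war", "peace"]  # hypothetical dictionary
doc = "Peace for the people; the people want peace, not war."
vector = bag_of_words(doc, dictionary)
```

Because only counts are kept, any reordering of the same words produces the same vector.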
  • Slide 8
  • Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba
  • Slide 9
  • Bags of features for image classification: 1. Extract features
  • Slide 10
  • Bags of features for image classification: 2. Learn visual vocabulary
  • Slide 11
  • Bags of features for image classification: 1. Extract features; 2. Learn visual vocabulary; 3. Quantize features using visual vocabulary
  • Slide 12
  • Bags of features for image classification: 1. Extract features; 2. Learn visual vocabulary; 3. Quantize features using visual vocabulary; 4. Represent images by frequencies of visual words
  • Slide 13
  • 1. Feature extraction. Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005)
  • Slide 14
  • 1. Feature extraction. Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005). Interest point detector (Csurka et al. 2004; Fei-Fei & Perona, 2005; Sivic et al. 2005)
  • Slide 15
  • 1. Feature extraction. Regular grid (Vogel & Schiele, 2003; Fei-Fei & Perona, 2005). Interest point detector (Csurka et al. 2004; Fei-Fei & Perona, 2005; Sivic et al. 2005). Other methods: random sampling (Vidal-Naquet & Ullman, 2002); segmentation-based patches (Barnard et al. 2003)
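
Two of the sampling strategies listed here, the regular grid and random sampling, can be sketched as follows. The image is a synthetic list of rows, raw pixel values stand in for a real descriptor, and the function names, patch size, and stride are illustrative assumptions rather than choices made in the cited papers.

```python
import random

def crop(image, row, col, size):
    """Flatten the size x size patch whose top-left corner is (row, col)."""
    return [image[row + dr][col + dc] for dr in range(size) for dc in range(size)]

def grid_patches(image, patch_size=8, stride=8):
    """Sample patches on a regular grid (non-overlapping when stride == patch_size)."""
    height, width = len(image), len(image[0])
    return [crop(image, r, c, patch_size)
            for r in range(0, height - patch_size + 1, stride)
            for c in range(0, width - patch_size + 1, stride)]

def random_patches(image, n_patches, patch_size=8, seed=0):
    """Sample patches at uniformly random positions that fit inside the image."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    return [crop(image,
                 rng.randint(0, height - patch_size),  # randint is inclusive
                 rng.randint(0, width - patch_size),
                 patch_size)
            for _ in range(n_patches)]

# Synthetic 32 x 48 "image" used only to exercise the functions.
image = [[r * 48 + c for c in range(48)] for r in range(32)]
grid = grid_patches(image)
sampled = random_patches(image, n_patches=10)
```

An interest point detector would replace the position-selection step; the cropping step stays the same.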
  • Slide 16
  • 1. Feature extraction. Detect patches [Mikolajczyk and Schmid 02] [Matas, Chum, Urban & Pajdla, 02] [Sivic & Zisserman, 03]. Normalize patch. Compute SIFT descriptor [Lowe 99]. Slide credit: Josef Sivic
  • Slide 17
  • Slide 18
  • 2. Learning the visual vocabulary
  • Slide 19
  • Clustering Slide credit: Josef Sivic
  • Slide 20
  • 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic Visual vocabulary
  • Slide 21
  • Clustering: the assignment of objects into groups (called clusters) so that objects from the same cluster are more similar to each other than to objects from different clusters. Similarity is often assessed according to a distance measure. Clustering is a common technique for statistical data analysis and is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.
  • Slide 22
  • Slide 23
  • Slide 24
  • Any of the similarity metrics we talked about before (SSD, angle between vectors)
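
For reference, the two measures named on this slide can be written as below. This is a minimal sketch with illustrative function names; note that SSD is a distance (smaller means more similar), and the angle is undefined for zero-length vectors.

```python
import math

def ssd(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def angle_between(a, b):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)
```

SSD is sensitive to vector magnitude, while the angle ignores it, so the two can rank the same pair of features differently.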
  • Slide 25
  • Feature Clustering: the process of grouping a set of features into clusters of similar features. Features within a cluster should be similar. Features from different clusters should be dissimilar.
  • Slide 26
  • source: Dan Klein
  • Slide 27
  • K-means clustering: want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k. Source: Svetlana Lazebnik
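
The formula itself did not survive in this transcript. Written out from the sentence above, the objective would take roughly the following form, where the name D, the number of clusters K, and the cluster sets C_k are notation assumed here:

```latex
D(X, M) = \sum_{k=1}^{K} \; \sum_{i \in C_k} \lVert x_i - m_k \rVert^2
```

Here C_k is the set of points whose nearest center is m_k.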
  • Slide 28
  • K-means clustering: want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k. Source: Svetlana Lazebnik
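
A common way to reduce this objective is the alternating procedure usually called Lloyd's algorithm: assign each point to its nearest center, then move each center to the mean of its points. The sketch below is a minimal version under assumptions of its own (random initialization from the data, a fixed iteration cap, empty clusters keep their old center); it finds a local minimum, not necessarily the global one.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def kmeans(points, k, n_iters=100, seed=0):
    """Return (centers, assignments) after alternating assign/update steps."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # init from k data points
    assignments = None
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every point.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, assignments

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, assignments = kmeans(points, k=2)
```

Each step can only lower or keep the sum of squared distances, which is why the loop terminates.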
  • Slide 29
  • Slide 30
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • source: Dan Klein
  • Slide 41
  • Slide 42
  • Source: Hinrich Schutze
  • Slide 43
  • Slide 44
  • Hierarchical clustering strategies. Agglomerative clustering: start with each point in a separate cluster; at each iteration, merge the two closest clusters. Divisive clustering: start with all points grouped into a single cluster; at each iteration, split the largest cluster. Source: Svetlana Lazebnik
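
The agglomerative strategy can be sketched as below. The slide does not say how the distance between two clusters is measured, so the distance between cluster centroids is used here as an assumption; single, complete, or average linkage are equally valid choices. The brute-force search over pairs is for clarity, not efficiency.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    n = len(cluster)
    return [sum(coords) / n for coords in zip(*cluster)]

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = squared_distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

points = [[0, 0], [0, 1], [5, 5], [5, 6], [20, 20]]
clusters = agglomerative(points, n_clusters=3)
```

Recording the sequence of merges instead of stopping at a fixed count would give the full hierarchy.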
  • Slide 45
  • source: Dan Klein
  • Slide 46
  • Slide 47
  • Divisive Clustering: top-down (instead of bottom-up as in agglomerative clustering). Start with all docs in one big cluster, then recursively split clusters. Eventually each node forms a cluster on its own. Source: Hinrich Schutze
  • Slide 48
  • Flat or hierarchical clustering? For high efficiency, use flat clustering (e.g. k-means). For deterministic results, use hierarchical clustering. When a hierarchical structure is desired, use a hierarchical algorithm. Hierarchical clustering can also be applied if K cannot be predetermined (it can start without knowing K). Source: Hinrich Schutze
  • Slide 49
  • 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic
  • Slide 50
  • 2. Learning the visual vocabulary Clustering Slide credit: Josef Sivic Visual vocabulary
  • Slide 51
  • From clustering to vector quantization. Clustering is a common method for learning a visual vocabulary or codebook. It is an unsupervised learning process: each cluster center produced by k-means becomes a codebook entry. The codebook can be learned on a separate training set; provided the training set is sufficiently representative, the codebook will be universal. The codebook is used for quantizing features: a vector quantizer takes a feature vector and maps it to the index of the nearest entry in the codebook. Codebook = visual vocabulary; codebook entry = visual word.
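
The vector quantizer described here amounts to a nearest-neighbor lookup. The sketch below assumes squared Euclidean distance and uses a small hand-written codebook in place of real cluster centers; the name `quantize` is illustrative.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(feature, codebook):
    """Return the index of the nearest codebook entry, i.e. the visual word id."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(feature, codebook[k]))

# Hypothetical codebook: in practice these would be k-means cluster centers.
codebook = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
word = quantize([9.0, 8.5], codebook)
```

A linear scan like this costs one distance computation per codebook entry for every feature, which is what motivates tree-structured vocabularies for large codebooks.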
  • Slide 52
  • Example visual vocabulary Fei-Fei et al. 2005
  • Slide 53
  • Image patch examples of visual words Sivic et al. 2005
  • Slide 54
  • Visual vocabularies: issues. How to choose vocabulary size? Too small: visual words not representative of all patches. Too large: quantization artifacts, overfitting. Computational efficiency: vocabulary trees (Nister & Stewenius, 2006).
  • Slide 55
  • 3. Image representation: histogram of codeword frequencies (figure axes: codewords, frequency)
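
Given the visual word id of every feature in an image, the representation is a histogram over the vocabulary. The sketch below normalizes counts to relative frequencies, which is an assumption; raw counts are also commonly used. The word ids are made up for illustration.

```python
from collections import Counter

def bag_of_features(word_ids, vocabulary_size):
    """Relative frequency of each visual word among an image's features."""
    counts = Counter(word_ids)
    total = len(word_ids)
    if total == 0:
        return [0.0] * vocabulary_size  # image with no features
    return [counts[k] / total for k in range(vocabulary_size)]

word_ids = [0, 2, 2, 1, 2, 0, 2, 2]  # hypothetical quantized features of one image
histogram = bag_of_features(word_ids, vocabulary_size=4)
```

Every image maps to a vector of the same length regardless of how many features it contains, which is what lets a standard classifier consume it.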
  • Slide 56
  • Image classification (next) Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
  • Slide 57
  • Clustering in Action
  • Slide 58
  • President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters. Names and Faces; Who's in the Picture? T.L. Berg, A.C. Berg, J. Edwards, D.A. Forsyth
  • Slide 59
  • Intuition George Bush
  • Slide 60
  • 500k News Corpora Producer and director Bruce Paltrow has died at the age of 58 in Rome, Italy, the U.S. Consulate said on October 3, 2002. Paltrow had suffered from throat cancer for several years, but the cause of his death was not immediately known. He is seen with his daughter actress Gwyneth Paltrow after the Academy Awards in Los Angles in March 21, 1999 file photo. (Fred Prouser/Reuters) Actress Winona Ryder (news) reacts to remarks by prosecutor Ann Rundle during the sentencing hearing in her felony shoplifting case Friday, Dec. 6, 2002 at the Beverly Hills, Calif., courthouse. At right is Ryder's attorney Mark Geragos. Ryder was sentenced to three years of probation and was ordered to perform 480 hours of community service. (AP Photo/Steve Grayson, POOL)
  • Slide 61
  • President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters Name & Face Extraction Detected Faces
  • Slide 62
  • President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters Name & Face Extraction Detected Names: President George W. Bush, Defense Donald Rumsfeld, Saddam Hussein. Detected Faces
  • Slide 63
  • Goal: each name in the dataset is a potential cluster. Want to simultaneously (1) learn an image model for each person and (2) learn a depiction model across names. Achieve both by considering one big assignment (clustering) problem.
  • Slide 64
  • Assignment Problem
  • Slide 65
  • Language indicates depiction. President George W. Bush makes a statement in the Rose Garden while Secretary of Defense Donald Rumsfeld looks on, July 23, 2003. Rumsfeld said the United States would release graphic photographs of the dead sons of Saddam Hussein to prove they were killed by American troops. Photo by Larry Downing/Reuters. Cues: POS tags before and after the name, location in caption, distance to closest of: ( ) (L) (C) (R) left right center shown pictured above. P(Depicted | Context): yes/no, from multiple independent cues.
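
One way to read "multiple independent cues" is a naive Bayes-style combination, in which each context cue contributes an independent factor. The form below is an assumption offered for illustration, with c_1, ..., c_n standing for the individual cues; the slide does not state the exact model.

```latex
P(\text{depicted} \mid c_1, \dots, c_n) \;\propto\; P(\text{depicted}) \prod_{i=1}^{n} P(c_i \mid \text{depicted})
```

Normalizing over the two outcomes (depicted, not depicted) turns this into the yes/no probability named on the slide.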
  • Slide 66
  • Method: 1) Update assignments. 2) Update the appearance model for each person and the language model of depiction across names. Iterate 1-2.
  • Slide 67
  • Results British director Sam Mendes and his partner actress Kate Winslet arrive at the London premiere of 'The Road to Perdition', September 18, 2002. The films stars Tom Hanks as a Chicago hit man who has a separate family life and co- stars Paul Newman and Jude Law. REUTERS/Dan Chung World number one Lleyton Hewitt of Australia hits a return to Nicolas Massu of Chile at the Japan Open tennis championships in Tokyo October 3, 2002. REUTERS/Eriko Sugita
  • Slide 68
  • US President George W. Bush (L) makes remarks while Secretary of State Colin Powell (R) listens before signing the US Leadership Against HIV /AIDS, Tuberculosis and Malaria Act of 2003 at the Department of State in Washington, DC. The five-year plan is designed to help prevent and treat AIDS, especially in more than a dozen African and Caribbean nations(AFP/Luke Frazza) German supermodel Claudia Schiffer gave birth to a baby boy by Caesarian section January 30, 2003, her spokeswoman said. The baby is the first child for both Schiffer, 32, and her husband, British film producer Matthew Vaughn, who was at her side for the birth. Schiffer is seen on the German television show 'Bet It...?!' ('Wetten Dass...?!') in Braunschweig, on January 26, 2002. (Alexandra Winkler/Reuters) Results
  • Slide 69
  • Example labels. Without: CEO Summit; with: Martha Stewart. Without: James Bond; with: Pierce Brosnan. Without: Dick Cheney; with: George W. Bush. Accuracy of labeling by model: vision model, no language model: 67%; vision model + language model: 78%.
  • Slide 70
  • Face Dictionary http://tamaraberg.com/faces/faceDict/NIPSdict/index.html
  • Slide 71
  • Results: depiction. Percent correct by classifier: baseline (all pictured): 67%; learned language model: 86%. IN = pictured, OUT = not pictured.