Visual Model Building


Transcript of Visual Model Building

  • 8/6/2019 Visual Model Building

    1/48


    Faculty of Electrical Engineering, Department of Computer Engineering and Informatics, University of Belgrade, Serbia

    Autonomous Visual Model Building based on Image Crawling through Internet Search Engines

    Miloš Savić (Savic.LosMi@gmail.com); mentor: prof. dr Veljko

    Content

    The future of search

    Introduction

    Generalized Multiple Instance Learning (GMIL)

    Diverse Density (DD)

    The Bag K-Means Algorithm

    Uncertain Labelling Density

    The Bag Fuzzy K-Means Algorithm

    Cross-Modality Automatic Training

    Experimental Results

    Future Work


    The future of search

    Modes: internet capabilities deployed in more devices;
    different ways of entering and expressing your queries, by voice, natural language, picture, song...

    It's clear that while keyword-based searching is incredibly powerful, it's also incredibly limiting.

    Media: videos, images, news, books, maps, audio, ...

    The 10 blue links offered as results for an Internet search can be amazing and even life-changing, but when you are trying to remember the steps to the Charleston, a textual web page isn't going to be nearly as helpful as a video.


    The future of search

    Personalization: location, social context (social graph), ...

    Example: I have a friend who works at a store called 'LF' in Los Angeles.

    On the first page of Google search results, 'LF' could refer to my friend's trendy fashion store, but it could also refer to: low frequency, large format, or a future concept car design from Lexus.

    Algorithmic analysis of the user's social graph to further refine or disambiguate a query could prove very useful in the future.

    Language

    If the answer exists online anywhere in any language, the search engine will go get it for you, translate it, and bring it back in your native tongue.


    Introduction

    As the amount of image data increases, content-based image indexing and retrieval is becoming increasingly important!

    Semantic model-based indexing has been proposed as an efficient method.

    Supervised learning has been used as a successful method to build generic semantic models.

    However, in this approach, tedious manual labeling is needed to build tens or hundreds of models for various visual concepts.

    This manual annotating process is time- and cost-consuming, and thus makes the system hard to scale.

    Even with this enormous labeling effort, any new instances not previously labeled still could not be dealt with.


    Introduction

    Semi-supervised learning or partial annotation was proposed to reduce the involved manual effort.

    Once the database is partially annotated, traditional pattern classification methods are often used to derive the semantics of the objects not yet annotated.

    However, it is not clear how much annotation is sufficient for a specific database, or what the best subset of objects to annotate is.

    It is desirable to have an automatic learning algorithm which does not need the costly manual labeling process at all.


    Introduction

    Google's Image Search is the most comprehensive image search engine on the Web.

    Google gathers a large collection of images for its search engine by analyzing the text on the page adjacent to the image, the image caption, and dozens of other factors to determine the image content.

    Google also uses sophisticated algorithms to remove duplicates, and to ensure that the most relevant images are presented first in the results.

    Traditionally, the relevance feedback technique is involved in image retrieval based on these imperfect data.

    Relevance feedback moves the query point towards the relevant objects, or selectively weighs the features in the low-level feature space, based on user feedback.


    Introduction

    However, relevance feedback still needs human involvement.

    Thus, it is very difficult, if not impossible, to build a large number of models based on relevance feedback.

    Here it is shown that it is possible to automatically build up the models, without any human intervention, for various concepts for future search and retrieval tasks.


    Introduction

    Figure 1. The framework for autonomous concept learning based on image crawling through Internet search engines.

    First of all, images are gathered by image crawling from the Google search results.

    Then, using the GMIL solved by ULD, the most informative examples are learned and the model of the named concept is built.

    This learned model can be used for concept indexing in other test sets.


    GENERALIZED MULTIPLE INSTANCE LEARNING (GMIL)

    The whole scheme is based on the Multiple Instance Learning(MIL) approach.

    In this learning scheme, instead of giving the learner labels for individual examples, the trainer only labels collections of examples, which are called bags.

    A bag is labeled negative if all the examples in it are negative.

    It is labeled positive if there is at least one positive example in it.

    The key challenge in MIL is to cope with the ambiguity of not knowing which instances in a positive bag are actually positive and which are not.

    Based on that, the learner attempts to find the desired concept.


    GENERALIZED MULTIPLE INSTANCE LEARNING (GMIL)

    Multiple-Instance Learning (MIL)

    Given a set of instances x1, x2, ..., xn, the task in a typical machine learning problem is to learn a function

        y = f(x1, x2, ..., xn)

    so that the function can be used to classify the data.

    In traditional supervised learning, training data are given as pairs (yi, xi) to learn the function for classifying the data outside the training set.

    In MIL, the training data are grouped into bags X1, X2, ..., Xm, with Xi = {xi1, ..., xil} and xij ∈ {x1, x2, ..., xn}.

    Instead of giving the labels yi for each instance, we have the label Yi for each bag.


    GENERALIZED MULTIPLE INSTANCE LEARNING (GMIL)

    Multiple-Instance Learning (MIL)

    A bag is labeled negative (Y = -1) if all the instances in it are negative.
    A bag is positive (Y = 1) if at least one instance in it is positive.
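    This bag-labeling rule can be sketched in a few lines of Python (a toy illustration; the per-instance labels are shown here only to compute the bag label, while a real MIL learner never observes them):

```python
def bag_label(instance_labels):
    """Y = 1 if at least one instance in the bag is positive;
    Y = -1 only if every instance in the bag is negative."""
    return 1 if any(y == 1 for y in instance_labels) else -1

# The learner only ever sees these bag-level labels:
print(bag_label([-1, 1, -1]))   # one positive instance -> 1
print(bag_label([-1, -1, -1]))  # all negative -> -1
```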

    The MIL model was first formalized by Dietterich et al. to deal with thedrug activity prediction problem.

    Following that, an algorithm called Diverse Density (DD) was developed to provide a solution to MIL, which performs well on a variety of problems such as drug activity prediction, stock selection, and image retrieval.

    Later, the method was extended to deal with real-valued labels instead of binary labels.


    GENERALIZED MULTIPLE INSTANCE LEARNING (GMIL)

    Multiple-Instance Learning (MIL)

    Many other algorithms, such as k-NN, Support Vector Machines (SVM), and EM combined with DD, have been proposed to solve MIL.

    However, most of these algorithms are sensitive to the distribution of the instances in the positive bags, and cannot work without negative bags.

    In the MIL framework, users still have to label the bags.

    To prevent the tedious manual labeling work, we need to generate the positive bags and negative bags automatically.


    GENERALIZED MULTIPLE INSTANCE LEARNING (GMIL)

    However, in practical applications, it is very difficult, if not impossible, to generate the positive and negative bags reliably.

    Without reliable positive and negative bags, DD may not give reliable solutions.

    To solve the problem, we generalize the concept of positive bags to Quasi-Positive bags, and propose Uncertain Labeling Density (ULD) to solve this generalized MIL problem.


    GENERALIZED MULTIPLE INSTANCE LEARNING (GMIL)

    Quasi-Positive Bag

    In our scenario, although there is a relatively high probability that the concept of interest (e.g. a person's face) will appear in the crawled images, there are many cases where no such association exists.

    If these image instances are used as the positive bags, we may have false-positive bags that do not contain the concept of interest.

    In this case, DD may not be able to give correct results.

    To overcome this problem, we extend the concept of positive bags to Quasi-Positive bags.

    A Quasi-Positive bag has a high probability of containing a positive instance, but is not guaranteed to contain one.


    GENERALIZED MULTIPLE INSTANCE LEARNING (GMIL)

    Definition: Generalized Multiple Instance Learning (GMIL)

    In the generalized MIL, a bag is labeled negative (Y = -1) if all the instances in it are negative.
    A bag is Quasi-Positive (Y = 1) if, with high probability, at least one instance in it is positive.


    Diverse Density (DD)

    One way to solve MIL problems is to examine the distribution of the instance vectors, and look for a feature vector that is close to the instances in different positive bags and far from all the instances in the negative bags.

    Such a vector represents the concept we are trying to learn.

    Diverse Density is a measure of the intersection of the positive bags minus the union of the negative bags.

    By maximizing Diverse Density,

    we can find the point of intersection (the desired concept).


    Diverse Density (DD)

    Assume the intersection of all positive bags minus the union of all negative bags is a single point t. We can find this point by:

        t* = arg max_t  Π_i Pr(t | B+_i) · Π_i Pr(t | B-_i)

    where B+_i is the ith positive bag and B-_i is the ith negative bag.

    Pr(t | B_i) is estimated by the most-likely-cause estimator, in which only the instance in the bag which is most likely to be in the concept C_t is considered:

        Pr(t | B+_i) = max_j Pr(t | B+_ij),    Pr(t | B-_i) = 1 - max_j Pr(t | B-_ij)

    where B_ij is the jth instance in the ith bag.
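    A minimal Python sketch of this computation, assuming the Gaussian-like instance estimator Pr(t | B_ij) = exp(-||B_ij - t||^2) defined on the next slide (the bags below are hypothetical toy data):

```python
import math

def pr_instance(t, x):
    # Gaussian-like estimator: Pr(t | B_ij) = exp(-||B_ij - t||^2)
    return math.exp(-sum((xk - tk) ** 2 for xk, tk in zip(x, t)))

def pr_bag(t, bag):
    # Most-likely-cause: only the instance most likely in the concept counts
    return max(pr_instance(t, x) for x in bag)

def diverse_density(t, pos_bags, neg_bags):
    dd = 1.0
    for bag in pos_bags:
        dd *= pr_bag(t, bag)            # intersection of the positive bags
    for bag in neg_bags:
        dd *= 1.0 - pr_bag(t, bag)      # minus the union of the negative bags
    return dd

# Toy data: (9, 9) lies in both positive bags, (5, 5) is a negative instance
pos = [[(0.0, 0.0), (9.0, 9.0)], [(9.0, 9.0), (3.0, 4.0)]]
neg = [[(5.0, 5.0)]]
```

    Evaluating diverse_density at (9, 9) gives a much larger value than at (5, 5), since (9, 9) appears in every positive bag and is far from the negative instance.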


    Diverse Density (DD)

    The distribution is estimated as a Gaussian-like distribution:

        Pr(t | B_ij) = exp(-||B_ij - t||^2)

    For the convenience of discussion, we define the Bag Distance as:

        d_t,i = min_j ||B_ij - t||^2
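    The Bag Distance is straightforward to compute; a small sketch (the bag below is hypothetical):

```python
def bag_distance(t, bag):
    # d_t,i = min_j ||B_ij - t||^2: squared distance from the concept
    # point t to the closest instance in bag i.
    return min(sum((xk - tk) ** 2 for xk, tk in zip(x, t)) for x in bag)

print(bag_distance((0, 0), [(3, 4), (1, 1)]))  # closest instance is (1, 1) -> 2
```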


    The Bag K-Means Algorithm for Diverse Density in the Absence of Negative Bags

    The Bag K-Means algorithm serves to efficiently find the maximum of DD, instead of using the time-consuming gradient descent algorithm.

    It has a cost function similar to the K-Means algorithm, but with a different definition of distance, which we call the bag distance (defined on the previous slide).

    In our special application, where negative bags are not provided, the cost function can be simplified as:

        J(t) = Σ_i d_t,i = Σ_i min_j ||B_ij - t||^2

    so that maximizing DD is equivalent to minimizing J(t).


    The Bag K-Means Algorithm

    It has exactly the same form of cost function as K-Means, but with a different definition of d.

    Basically, when there is no negative bag, the DD algorithm is trying to find the centroid of the cluster by K-Means with K = 1.

    According to this conclusion, an efficient algorithm to find the maximum DD by the Bag K-Means algorithm is:

    (1) Choose an initial seed t.
    (2) Choose a convergence threshold ε.
    (3) For each bag i, choose the one example s_i which is closest to the seed t, and calculate the distance d_t,i.
    (4) Calculate t_new = (1/N) Σ_i s_i, where N is the total number of bags.
    (5) If ||t - t_new|| < ε, stop; otherwise set t = t_new and go to (3).
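    Steps (1)-(5) can be sketched as a small Python loop (a toy implementation; the convergence threshold and the bag data in the usage note are illustrative, not from the paper):

```python
def dist2(a, b):
    # squared Euclidean distance between two points
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def bag_kmeans(bags, t0, eps=1e-9, max_iter=100):
    """Bag K-Means: K-Means with K = 1 over the closest instance per bag."""
    t = t0
    for _ in range(max_iter):
        # Step (3): for each bag, pick the instance closest to the current seed
        closest = [min(bag, key=lambda x: dist2(x, t)) for bag in bags]
        # Step (4): the new centroid is the mean of the selected instances
        n = len(bags)
        t_new = tuple(sum(s[k] for s in closest) / n for k in range(len(t)))
        # Step (5): stop once the centroid stops moving
        if dist2(t, t_new) < eps:
            return t_new
        t = t_new
    return t
```

    Seeding near a dense intersection of bags makes the iteration settle on the instance the bags share.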


    The Bag K-Means Algorithm

    The algorithm starts with an initial guess of the target point t, obtained by trying instances from the Quasi-Positive bags; then an iterative searching algorithm is performed to update the position of this target point t until convergence is achieved.

    Next we provide the proof of convergence of Bag K-Means.


    The Bag K-Means Algorithm

    Theorem: The Bag K-Means algorithm converges.

    Proof: Assume t_i is the centroid found in iteration i, and s_ij is the sample obtained in step (3) for bag j. By step (4), we get a new centroid t_{i+1}.

    By the property of the traditional K-Means algorithm, we have:

        Σ_j ||s_ij - t_{i+1}||^2 ≤ Σ_j ||s_ij - t_i||^2

    Because of the criterion for choosing the new s_{i+1,j}, we have:

        Σ_j ||s_{i+1,j} - t_{i+1}||^2 ≤ Σ_j ||s_ij - t_{i+1}||^2

    Combining these two formulas, we get:

        Σ_j ||s_{i+1,j} - t_{i+1}||^2 ≤ Σ_j ||s_ij - t_i||^2

    which means the algorithm decreases the cost function each time. Therefore, this process will converge.


    Uncertain Labeling Density

    In our generalized MIL, what we have are Quasi-Positive bags, i.e., some false-positive bags do not include positive instances at all.

    In a false-positive bag, by the original DD definition, Pr(t | B+_i) will be very small or even zero.

    These outliers will influence the DD significantly, due to the multiplication of the probabilities.

    Many algorithms have been proposed to handle this outlier problem in K-Means. Among them, the fuzzy K-Means algorithm is the most well known.

    The intuition of the algorithm is to give different measurements (weights) to the relationship of each example belonging to each cluster. The weights indicate the possibility that a given example belongs to a given cluster.


    Uncertain Labeling Density

    By assigning low weight values to outliers, the effect of noisy data on the clustering process is reduced.

    Here, based on this similar idea from fuzzy K-Means, we propose an Uncertain Labeling Density (ULD) algorithm to handle the Quasi-Positive bag problem for GMIL.

    Definition: Uncertain Labeling Density (ULD)


    Uncertain Labeling Density

    u_t,i represents the weight of bag i belonging to concept t; b (b > 1) is the fuzzy exponent.

    It determines the degree of fuzziness of the final solution. Usually, b = 2.

    Similarly, we get the conclusion that the maximum of ULD can be obtained by Fuzzy K-Means with the definition of Bag Distance, by minimizing the cost function:

        J(t) = Σ_i (u_t,i)^b · d_t,i


    The Bag Fuzzy K-Means Algorithm for Uncertain Labeling Density

    The Bag Fuzzy K-Means algorithm is proposed as follows:

    (1) Choose an initial seed t among the Quasi-Positive bags.
    (2) Choose a convergence threshold ε.
    (3) For each bag i, choose the one example s_i which is closest to this seed t, and calculate the Bag Distance d_t,i.
    (4) Calculate the weight u_t,i for each bag and the weighted mean t_new = (Σ_i (u_t,i)^b s_i) / (Σ_i (u_t,i)^b).


    (5) If ||t - t_new|| < ε, stop; otherwise go to (3).


    And the updated weighted mean is set as the current centroid.

    The convergence of this Bag Fuzzy K-Means algorithm follows from the previous proof for the Bag K-Means algorithm and the convergence of the original Fuzzy K-Means algorithm.
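    The full loop can be sketched in Python. The exact weight update is not given on the slides, so the inverse-distance fuzzy weights below (a standard fuzzy K-Means choice) are an assumption, and the bag data, eps, and b = 2 are illustrative:

```python
def dist2(a, b):
    # squared Euclidean distance between two points
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def bag_fuzzy_kmeans(bags, t0, b=2.0, eps=1e-9, max_iter=100):
    """Bag Fuzzy K-Means: like Bag K-Means, but each bag contributes to the
    centroid with a fuzzy weight, so outlier (false-positive) bags are
    down-weighted instead of pulling the centroid away."""
    t = t0
    for _ in range(max_iter):
        # Closest instance per bag and its bag distance
        closest = [min(bag, key=lambda x: dist2(x, t)) for bag in bags]
        d = [dist2(s, t) + 1e-12 for s in closest]
        # Assumed inverse-distance fuzzy weights: larger distance, smaller weight
        w = [(1.0 / di) ** (1.0 / (b - 1.0)) for di in d]
        z = sum(wi ** b for wi in w)
        t_new = tuple(
            sum((w[i] ** b) * closest[i][k] for i in range(len(bags))) / z
            for k in range(len(t))
        )
        if dist2(t, t_new) < eps:
            return t_new
        t = t_new
    return t
```

    With a false-positive bag whose only instance sits at (5, 5), the weighted centroid stays near the true intersection at (9, 9) instead of drifting toward the outlier.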

    Example: Comparison of MIL using the Diverse Density and Uncertain Labeling Density algorithms in the case of Quasi-Positive bags.

    Figure 2(a) shows Quasi-Positive bags, without negative bags. Different symbols represent different Quasi-Positive bags. There are two false-positive bags, illustrated by the inverse triangles and circles.



    The true intersection point is the instance with value (9, 9), with intersections from four different positive bags.

    Just by finding the maximum of the original Diverse Density, the algorithm will converge to (5, 5) (labeled with a '+' symbol) because of the influence of the false-positive bags.

    Figure 2(b) illustrates the corresponding Diverse Density values.



    By using the ULD method, it is easy to obtain the correct intersection point, as shown in Figure 2(c).


    CROSS-MODALITY AUTOMATIC TRAINING

    How do we automatically generate the Quasi-Positive bags in our scheme in practice?

    Here we only show the procedure of the cross-modality training on face models.

    For generic visual models, the system can use a region segmentation, feature extraction and supervised learning framework.


    Feature Generation

    Face detection: skin color detection, skin region determination (Gaussian blurring, thresholding, mathematical morphological operations, ...)

    Eigenface generation

    Quasi-Positive bag generation


    Experimental Examples

    An example of building the face model of Bill Clinton.





    Experimental Examples

    Illustration of Google Image Search results for Newt Gingrich


    Experimental Examples

    Illustration of the results by our algorithm


    Experimental Examples

    Illustration of Google Image Search results for Hillary Clinton


    Experimental Examples

    Illustration of the results by our algorithm


    Comparing to Google Image Search


    Future work

    Future work includes applying this algorithm to learn more general concepts, e.g. outdoor and sports,

    as well as using these learned models for concept detection and search tasks in generic image/video databases.


    References

    [1] X. Song, C.-Y. Lin, and M.-T. Sun, "Autonomous Visual Model Building based on Image Crawling through Internet Search Engines," New York, USA, October 15-16, 2004.

    [2] X. Song, C.-Y. Lin, and M.-T. Sun, "Cross-modality automatic face model training from large video databases," The First IEEE CVPR Workshop on Face Processing in Video (FPIV'04), Washington DC, June 28, 2004.

    [3] O. Maron, "Learning from ambiguity," PhD dissertation, Department of Electrical Engineering and Computer Science, MIT, Jun. 1998.

    [4] O. Maron and T. Lozano-Perez, "A Framework for Multiple-Instance Learning," Proc. of Neural Information Processing Systems 10, 1998.

    [5] O. Maron and A. L. Ratan, "Multiple-Instance Learning for Natural Scene Classification," Proc. of ICML 1998, 341-349.

    [6] R. A. Amar, D. R. Dooly, S. A. Goldman, and Q. Zhang, "Multiple-instance learning of real-valued data," Proc. of ICML, Williamstown, MA, 2001, 3-10.


    Questions?


    THANK YOU FOR YOUR ATTENTION!