
IMAGE TAG REFINEMENT ALONG THE 'WHAT' DIMENSION USING TAG CATEGORIZATION AND NEIGHBOR VOTING

    Sihyoung Lee, Wesley De Neve, Yong Man Ro

Image and Video Systems Lab, Korea Advanced Institute of Science and Technology (KAIST), Yuseong-gu, Daejeon, Republic of Korea

    Email: {ijiat, wesley.deneve, ymro}@kaist.ac.kr

    ABSTRACT

Online sharing of images is becoming increasingly popular, resulting in the availability of vast collections of user-contributed images that have been annotated with user-supplied tags. However, user-supplied tags are often not related to the actual image content, affecting the performance of multimedia applications that rely on tag-based retrieval of user-contributed images. This paper proposes a modular approach towards tag refinement, taking into account the nature of tags. First, tags are automatically categorized into five categories using WordNet: 'where', 'when', 'who', 'what', and 'how'. Next, as a start towards a full implementation of our modular tag refinement approach, we use neighbor voting to learn the relevance of tags along the 'what' dimension. Our experimental results show that the proposed tag refinement technique is able to successfully differentiate correct tags from noisy tags along the 'what' dimension. In addition, we demonstrate that the proposed tag refinement technique is able to improve the effectiveness of image tag recommendation for non-tagged images.

Keywords: collective knowledge, folksonomy, tag categorization, tag recommendation, tag refinement

    1. INTRODUCTION

Thanks to the popularity of easy-to-use multimedia devices, as well as the availability of cheap storage and bandwidth, the number of user-generated images is increasing rapidly [1]. These images are frequently shared on social media applications such as Flickr [2] and Facebook [3]. For example, as of October 2009, Flickr is known to host 4 billion images [4]. Similarly, more than 2.5 billion photos are uploaded to Facebook each month as of January 2010 [5]. As the number of images contributed by users to social media applications is increasing at a high rate, the problem of organizing and finding relevant images becomes more apparent for end-users.

Current techniques for organizing and retrieving user-contributed images are strongly based on freely chosen textual descriptors, so-called user-defined labels or tags. These tags provide context for user-generated images: 'where', 'when', 'who', 'what', and 'how'. To date, online services for image sharing typically support manual image tagging. For example, Flickr allows its users to assign tags to images at upload time. The result of personal free tagging of images for the purpose of indexing and retrieval is known as an image folksonomy [6].

As pointed out in [7] and [8], an image folksonomy suffers from two problems when targeting effective retrieval of user-contributed images by means of user-supplied tags: weakly annotated images and noisy tags. The presence of weakly annotated images on social media applications can be mainly attributed to the use of manual tagging. Indeed, users often experience manual tagging as cumbersome and time-consuming. As explained in [6] and [9], noisy tags are tags that are irrelevant to the image content. Specifically, noisy tags include meaningless tags (e.g., asdf and grrr), tags that contain typographical errors, and tags that are not visually related to the image content. The presence of noisy tags in an image folksonomy can be attributed to several reasons. For instance, people tend to interpret an image by using their personal experience, knowledge, and imagination [8]. Further, people tend to use the same tags to annotate different images that have been captured during the same event (e.g., batch annotation when uploading images).

To address the problem of weakly annotated images, tag recommendation is a promising technique. Given an image, tag recommendation engines suggest candidate tags that are related to the image content or to the user according to a particular criterion (e.g., visual similarity or tagging history). Users can then select relevant tags from the list of suggested tags with a minimum of effort. The authors of [8] propose a tag recommendation method that consists of two complementary strategies, each making use of content analysis techniques (i.e., feature extraction). The first strategy takes advantage of an offline and supervised classification technique to annotate images with a controlled concept vocabulary, while the second strategy relies on the propagation of user-contributed tags along visually similar images. The authors of [10] analyze a representative snapshot of Flickr, focusing on how users tag images and what information is contained in the set of user-supplied tags. The authors subsequently evaluate a number of tag recommendation strategies, focusing on the use of tag co-occurrence statistics.


The authors of [11] propose a tag expansion method that incorporates both visual and textual information in a folksonomy. Given an image that is to be annotated, a user first has to choose a number of initial tags. Next, images are retrieved that have already been annotated with the tags initially chosen by the user. According to the visual similarity between the image to be annotated and the retrieved images, tags associated with the images found are then weighted for the purpose of tag recommendation. A method for inferring semantic concepts from user-contributed images and tags is proposed in [12], adopting a sparse graph-based semi-supervised learning approach.

Little attention has thus far been paid to techniques that allow differentiating noisy tags from reliable tags [9]. An important contribution to the field of image tag refinement has been made by the authors of [13], using neighbor voting to estimate the relevance of tag assignments. Neighbor voting assumes that tags are likely to reflect objective aspects of an image when different persons have labeled visually similar images using the same tags. Therefore, neighbor voting calculates the relevance of a tag with respect to the content of an input image by accumulating votes for the tag from visual neighbors of the input image.

The authors of [13] use neighbor voting to estimate the relevance of tags, independent of the nature of the tags. However, as observed in [10], tags describe different aspects of an image, such as visual concepts, location, time, people, feelings, and opinions. Therefore, we argue that it is better to adopt different tag refinement approaches according to the nature of tags, rather than applying a single tag refinement strategy to all kinds of tags. For example, the relevance of tags describing location information can be verified with GPS information, whereas the relevance of tags describing people can be estimated using face recognition techniques. Further, the relevance of tags describing time-related events (e.g., sunset) can be computed using time information that is inherently embedded in digital images, whereas the relevance of tags describing emotions can be determined using affective content analysis [14] or even a brain-machine interface [15].

In this paper, we present a modular approach towards image tag refinement. Before learning the relevance of tags, we propose to divide tags into five categories: 'what', 'when', 'who', 'where', and 'how' [10]. To automatically categorize tags into the aforementioned five categories, we make use of WordNet [16]. After tag categorization, and as a first step towards a complete implementation of our modular tag refinement approach, we apply neighbor voting to learn the relevance of tags that have been assigned along the 'what' dimension, making it possible to differentiate noisy tags from correct tags.

Our experimental results show that the proposed tag refinement technique is able to successfully differentiate correct tags from noisy tags along the 'what' dimension. In addition, we demonstrate that the proposed tag refinement technique allows increasing the effectiveness of image tag recommendation for non-tagged images. To the best of our knowledge, this paper describes the first attempt to make use of tag categorization as part of a broader approach that aims at mitigating tag noise in an image folksonomy.

The remainder of this paper is organized as follows. The proposed tag refinement method is discussed in Section 2, while experimental results are provided in Section 3. Finally, our paper is concluded in Section 4.

    2. IMAGE TAG REFINEMENT

In this section, we describe our modular tag refinement approach. We start with a description of the proposed system architecture.

    2.1. System architecture

Fig. 1 visualizes the overall architecture of the proposed tag refinement system. Let us assume that an image i is annotated with a set of tags T_i. The set T_i may contain several noisy tags. As shown in Fig. 1, we first make use of WordNet to automatically categorize the tags t in T_i along the 'what', 'where', 'when', 'who', and 'how' dimensions. In this paper, we subsequently estimate the relevance of tags along the 'what' dimension by making use of neighbor voting. Tags with a relevance value lower than a particular threshold value are considered to be noisy, and these tags are removed from the image folksonomy. This process can be described using the following equation:

T_i^{refined, what} = { t | t ∈ T_i ∧ t ∈ T_what ∧ relevance(t, i) > threshold },   (1)

where T_i^{refined, what} is the refined set of tags assigned to i along the 'what' dimension, T_what represents the set of tags that belong to the 'what' category, relevance(t, i) represents the relevance of tag t with respect to the image content of i (see Eq. (2)), ∧ represents the logical and operator, and threshold is a value that determines whether a tag is noisy or correct.
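To make the refinement rule concrete, the following minimal sketch applies Eq. (1) to a tag set. The helper functions categorize_tag and relevance are hypothetical stand-ins for the WordNet-based categorization of Section 2.2.1 and the neighbor-voting score of Eq. (2); they are not part of the paper's published material.

```python
# Minimal sketch of the refinement rule in Eq. (1).
# `categorize_tag` and `relevance` are hypothetical helpers standing in for the
# WordNet-based categorization and the neighbor-voting relevance score.

def refine_what_tags(tags, image, categorize_tag, relevance, threshold):
    """Keep only 'what'-category tags whose relevance exceeds the threshold."""
    return {
        t for t in tags
        if categorize_tag(t) == "what" and relevance(t, image) > threshold
    }
```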

Fig. 1. The architecture of the proposed tag refinement system. The image folksonomy is fed into a WordNet-based tag categorization module, which routes tags to dimension-specific refinement modules: tag relevance learning via neighbor voting ('what'), GPS-based refinement ('where'), time-based refinement ('when'), face recognition-based refinement ('who'), and model-based refinement ('how'). The scope of this paper covers tag categorization and refinement along the 'what' dimension; in the example shown, the tags {building, cambridge, car, cat, fall, fire, flower, happy, joy, lamp, leaf, loss, oil, sad, shells, water} are reduced to {leaf, water}.


    2.2. Tag refinement

    2.2.1. Image tag categorization

Similar to [10], we adopt WordNet to categorize the tags in the publicly available MIRFLICKR-25000 database [18]. The authors of [10] map tags onto WordNet noun semantic categories such as act, animal, artifact, food, group, location, object, person, plant, substance, and time. When a tag can be mapped onto multiple noun semantic categories, we map the tag to the category with the highest ranking (as also done by the authors of [10]).

Fig. 2 shows the distribution of the tags in the MIRFLICKR-25000 database, taking into account the frequency of the tags. We observed that 46% of the tags can be classified, either describing location information (8%), artifacts or objects (10%), people or groups (6%), actions or events (3%), time information (2%), or other information (17%), while 54% of the image tags cannot be classified (e.g., tags containing typographical errors and tags representing neologisms).

Fig. 2. Tag distribution in the MIRFLICKR-25000 database (categories: unclassified, locations, artifacts or objects, people or groups, actions or events, time, other).

Table I shows how we mapped the WordNet noun semantic categories onto the following five categories: 'what', 'when', 'who', 'where', and 'how'. Looking at the set of tags that can be classified, we find that tags are most frequently assigned along the 'what' dimension (46%), followed by 'how' (19%), 'where' (17%), 'who' (12%), and 'when' (6%). This illustrates that the 'what' dimension is the most important from a quantitative point of view.

Table I. Mapping of WordNet categories onto our categories

Our categories | WordNet noun semantic categories
what  | animal, artifact, attribute, body, food, object, phenomenon, plant, shape, something, substance
where | location, space
who   | person, group
when  | time, event
how   | act, cognition, communication, feeling, motive, possession, process, quantity, relation, state
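The categorization step can be sketched with the NLTK WordNet interface, mapping the lexicographer file (lexname) of a tag's highest-ranked noun synset onto the five categories of Table I. This is only an illustration under our own assumptions: the paper does not publish code, and the Table I entries 'something' and 'space' have no direct WordNet lexname, so they are omitted here.

```python
# Sketch of the WordNet-based tag categorization of Section 2.2.1 using NLTK.
# Picking synsets[0] approximates the "highest-ranking category" rule.
from nltk.corpus import wordnet as wn

LEXNAME_TO_CATEGORY = {
    "what":  {"animal", "artifact", "attribute", "body", "food", "object",
              "phenomenon", "plant", "shape", "substance"},
    "where": {"location"},
    "who":   {"person", "group"},
    "when":  {"time", "event"},
    "how":   {"act", "cognition", "communication", "feeling", "motive",
              "possession", "process", "quantity", "relation", "state"},
}

def categorize_tag(tag):
    """Map a tag onto 'what'/'where'/'who'/'when'/'how', or None if unclassified."""
    synsets = wn.synsets(tag, pos=wn.NOUN)
    if not synsets:
        return None                       # e.g. typos, neologisms, camera models
    noun_category = synsets[0].lexname().split(".")[-1]   # e.g. 'noun.artifact' -> 'artifact'
    for category, noun_categories in LEXNAME_TO_CATEGORY.items():
        if noun_category in noun_categories:
            return category
    return None
```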

2.2.2. Tag relevance learning

After tag categorization, the relevance of tags along the 'what' dimension is estimated using the neighbor voting technique discussed in [13]. Neighbor voting consists of two consecutive steps: retrieval of visual neighbors of a given input image and tag relevance learning. As described in [13], we make use of content-based k-NN search in order to find visual neighbors, taking into account the unique user constraint.

The relevance of a tag with respect to the content of the input image is determined by the number of votes received for the tag from visual neighbors. Let V_i be the set of visual neighbors of the input image i; then the relevance of t with respect to the content of i can be formulated as follows:

relevance(t, i) = Σ_{j ∈ V_i} vote(j, t) − Prior(t, V_i),   (2)

where j is an image in V_i and where vote(j, t) represents a voting function, returning one when j has been annotated with t and returning zero otherwise. Prior(t, V_i) represents the prior frequency of t, and this prior frequency is approximated as:

Prior(t, V_i) = |V_i| · |I_t| / |I|,   (3)

where I is the image folksonomy and where I_t represents the set of images annotated with t. Further, |·| denotes the number of elements in a set.
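A compact sketch of Eqs. (2) and (3) follows, under the assumption that the tag sets of the visual neighbors and of the whole folksonomy are available as simple Python collections; these data structures are ours, not the authors'.

```python
# Sketch of the neighbor-voting relevance score of Eqs. (2) and (3), following [13].
# `neighbor_tags`: tag sets of the visual neighbors V_i of the input image.
# `folksonomy_tags`: tag sets of all images I in the folksonomy.

def prior(tag, num_neighbors, folksonomy_tags):
    """Eq. (3): expected number of neighbors tagged with `tag` by chance."""
    tagged = sum(1 for tags in folksonomy_tags if tag in tags)    # |I_t|
    return num_neighbors * tagged / len(folksonomy_tags)          # |V_i| * |I_t| / |I|

def relevance(tag, neighbor_tags, folksonomy_tags):
    """Eq. (2): votes from visual neighbors minus the prior frequency."""
    votes = sum(1 for tags in neighbor_tags if tag in tags)       # sum of vote(j, t)
    return votes - prior(tag, len(neighbor_tags), folksonomy_tags)
```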

    3. EXPERIMENTS

This section discusses a number of experiments that analyze the effectiveness of the proposed tag refinement approach. Specifically, we investigate the ratio of noisy tags in the image folksonomy before and after executing tag refinement. In addition, we quantify the effect of tag refinement on the effectiveness of image tag recommendation. We start by describing the image folksonomy used in our experiments.

    3.1. Image folksonomy

The image set used in our experiments needs to contain information about the three fundamental constituents of an image folksonomy [17]: images, tags, and users. As such, we have used the publicly available MIRFLICKR-25000 database in our experiments [18]. This database consists of 25,000 Flickr images that have been annotated with 223,537 tags assigned by 9,862 users (the average number of tags per image is 8.94). To evaluate the effectiveness of the proposed tag refinement method, we created a test set that consists of 200 strongly annotated images with at least five tags along the 'what' dimension.


The 200 selected images were annotated with 4,862 tags along the 'what' dimension (the average number of tags per image is 24.3). All tags associated with the 200 images were manually classified as either correct or noisy: 1,247 tags were identified as being noisy, whereas 3,615 tags were found to be correct.

To find visual neighbors, each image was represented by the 256-dimensional Scalable Color Descriptor (SCD), as defined in the MPEG-7 standard [19]. The visual distance between images was measured using the L1 metric, as recommended by the MPEG-7 standard.
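The neighbor retrieval step can be sketched as follows, assuming the SCD feature vectors are already available as NumPy arrays. The Image structure and the unique-user filtering shown here are our own illustrative choices following [13], not the authors' implementation; MPEG-7 descriptor extraction itself is not shown.

```python
# Sketch of content-based k-NN retrieval of visual neighbors with the L1 metric
# over SCD feature vectors, keeping at most one neighbor per user
# (the unique user constraint of [13]).
from typing import List, NamedTuple
import numpy as np

class Image(NamedTuple):
    image_id: str
    user_id: str
    scd: np.ndarray          # 256-dimensional Scalable Color Descriptor

def visual_neighbors(query: Image, collection: List[Image], k: int) -> List[Image]:
    """Return the k nearest images under the L1 metric, at most one image per user."""
    ranked = sorted(collection, key=lambda img: np.abs(img.scd - query.scd).sum())
    neighbors, seen_users = [], {query.user_id}
    for img in ranked:
        if img.user_id in seen_users:
            continue
        neighbors.append(img)
        seen_users.add(img.user_id)
        if len(neighbors) == k:
            break
    return neighbors
```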

    3.2. Evaluation criteria

In our experiments, we adopted the noise level (NL) metric and the precision at rank k (P@k) for evaluating the effectiveness of tag refinement and tag recommendation, respectively. NL represents the proportion of noisy tags in the set of all user-supplied tags. If an image folksonomy contains a high number of noisy tags, then NL is close to one. Likewise, when NL is close to zero, the number of noisy tags in an image folksonomy is low. P@k represents the proportion of recommended tags that are relevant to the image content. The two metrics can be expressed as follows:

NL = ( Σ_{i ∈ I} |T_i^{noisy}| ) / ( Σ_{i ∈ I} |T_i| ),   (4)

P@k = ( Σ_{i ∈ I} |T_i^{relevant}| ) / ( k · |I| ),   (5)

where I is the set of test images and where i is a test image in I. Further, T_i is the set of tags assigned to i, and T_i^{noisy} is the set of noisy tags in T_i. The parameter k represents the number of recommended tags for i, and T_i^{relevant} is the set of tags that are relevant to i (among all recommended tags).
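For clarity, the two metrics can be computed as in the following sketch; the per-image tag sets and relevance counts are assumed to be given, and this is not the authors' evaluation code.

```python
# Sketch of the evaluation metrics in Eqs. (4) and (5).

def noise_level(all_tags_per_image, noisy_tags_per_image):
    """Eq. (4): fraction of noisy tags among all user-supplied tags in the test set."""
    total_noisy = sum(len(tags) for tags in noisy_tags_per_image)   # sum of |T_i^noisy|
    total = sum(len(tags) for tags in all_tags_per_image)           # sum of |T_i|
    return total_noisy / total

def precision_at_k(relevant_counts_per_image, k):
    """Eq. (5): average fraction of relevant tags among the k recommended tags."""
    return sum(relevant_counts_per_image) / (k * len(relevant_counts_per_image))
```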

    3.3. Experimental results

    3.3.1. Effectiveness of tag refinement

Table II summarizes the objective performance of our tag refinement technique (i.e., tag categorization, followed by neighbor voting along the 'what' dimension), compared to the performance of tag refinement only making use of neighbor voting [13]. Before tag refinement, a substantial number of noisy tags is present. The proposed technique is able to remove 3,395 noisy tags, while tag refinement using neighbor voting is able to remove 3,037 noisy tags (with both refinement techniques also removing about 600 correct tags). To summarize, the proposed refinement technique is able to remove more noisy tag assignments than tag refinement only making use of neighbor voting.

Table II. Objective performance of tag refinement

Metric | Before tag refinement | After tag refinement (neighbor voting [13]) | After tag refinement (proposed method)
NL     | 0.774                 | 0.466                                       | 0.249

Fig. 3 visualizes the results shown in Table II, showing the distribution of noisy tags and correct tags for each of the 200 test images. As shown in Fig. 3, tag refinement is able to considerably reduce the number of noisy tags. Moreover, the proposed tag refinement method is able to eliminate more noisy tags than the tag refinement approach that only uses neighbor voting [13]. The image indexes in Fig. 3 have been ranked according to a decreasing number of tags after having applied the proposed tag refinement method (that is, the order of the image indexes in Fig. 3(c) has been used to rank the image indexes in Fig. 3(a) and Fig. 3(b)).

Fig. 3. Distribution of noisy and correct tags per test image (x-axis: image index, 1-200; y-axis: number of tags): (a) before tag refinement, (b) after tag refinement with neighbor voting [13], (c) after tag refinement with the proposed method.

Fig. 4 contains a number of example images that illustrate the subjective performance of the different tag refinement techniques. In Fig. 4, correct tags have been underlined. In addition, tags have been ranked according to decreasing relevance values. For the four images shown in Fig. 4, tag refinement using the proposed method performs better than tag refinement only making use of neighbor voting. The proposed method preserves more correct tags and removes more noisy tags for all of the example images shown in Fig. 4.


Fig. 4. Subjective performance of tag refinement: for each of four example images, the tags before refinement, the tags kept after refinement with neighbor voting [13], and the tags kept after refinement with the proposed method.

The difference between the proposed tag refinement approach and tag refinement using neighbor voting becomes apparent when investigating why several noisy tags have been removed or kept. As described in Eq. (2), neighbor voting strongly depends on the popularity or frequency of tags that have been assigned to the visual neighbors of a given input image. As such, low relevance values are assigned to tags with a low frequency (which are typically noisy). For example, neighbor voting successfully removes the tags 1224, 12mm, and 24mm. However, neighbor voting does not remove the tag nikon, which is one of the most frequent tags in the image set used. The proposed tag refinement method, however, is able to remove the tag nikon thanks to the use of tag categorization (which is unable to classify the tag nikon into one of the WordNet noun semantic categories).

    3.3.2. Influence of tag refinement on tag recommendation

To quantify the influence of tag refinement on the performance of tag recommendation, we investigated the effectiveness of folksonomy-based image tag recommendation before and after image tag refinement. We used the folksonomy-based tag recommendation approach described in [20]. The authors of [20] describe a tag recommendation technique that makes use of visual similarity and tag frequency. In addition, to construct a refined folksonomy (i.e., a folksonomy that has been the subject of tag refinement), we remove tags with a relevance lower than 1.1, using the same experimental settings for neighbor voting as described in [13].

Table III shows the tag recommendation performance when using the two different tag refinement approaches. Similar to [6], we observe that the presence of noisy tag assignments decreases the effectiveness of tag recommendation. Further, the effectiveness of image tag recommendation in terms of P@5 increases by approximately 5% (from 0.315 to 0.330) for tag refinement using neighbor voting, while the proposed image tag refinement method increases the effectiveness of image tag recommendation by about 29% (from 0.315 to 0.404).

Table III. Influence of tag refinement on tag recommendation

Metric | Original folksonomy | Refined folksonomy (neighbor voting [13]) | Refined folksonomy (proposed method)
P@1    | 0.285               | 0.505                                     | 0.680
P@5    | 0.315               | 0.330                                     | 0.405

Fig. 5 contains a number of example images that show the effect of tag refinement on the effectiveness of tag recommendation. Among the recommended tags, reliable tags related to the image content have been underlined. In addition, tags have been ranked according to decreasing relevance values. It is clear that tag recommendation after tag refinement suggests more relevant tags than tag recommendation before tag refinement. Before tag refinement, on average only five tags are relevant to the example images shown in Fig. 5. After tag refinement using neighbor voting and after tag refinement using tag categorization followed by neighbor voting along the 'what' dimension, on average six and seven tags, respectively, are relevant to the example images.

Fig. 5. Example images illustrating the influence of tag refinement on tag recommendation for non-tagged images: for each example image, the top ten recommended tags using the original folksonomy, the folksonomy refined with neighbor voting [13], and the folksonomy refined with the proposed method.


Comparing the impact of the two refined folksonomies on image tag recommendation, the tags recommended using the folksonomy refined with the proposed method can be considered more reliable than the tags recommended using the folksonomy refined with neighbor voting. In addition, most of the reliable tags also have a higher ranking. This can be attributed to the observation that the proposed tag refinement technique removes more noisy tags than the tag refinement technique using neighbor voting (as the proposed tag refinement technique can be considered more specialized and restrictive).

    4. CONCLUSIONS

Current image search technology strongly relies on the presence of user-supplied tags for retrieval in vast collections of user-contributed images. However, noisy tags exist that are not related to the image content, thus affecting the usefulness of user-defined tags for retrieving user-contributed images. This paper proposes a modular approach towards tag refinement, taking into account the nature of tags. First, tags are automatically categorized into five classes using WordNet: 'what', 'when', 'who', 'where', and 'how'. Next, we use neighbor voting to learn the relevance of tags along the 'what' dimension, making it possible to differentiate noisy tags from correct tags.

Experiments were designed to investigate the effectiveness of the proposed tag refinement method, comparing its effectiveness to tag refinement using neighbor voting. Our experimental results indicate that the proposed tag refinement technique is able to successfully differentiate correct tags from noisy tags along the 'what' dimension. Specifically, our tag refinement approach reduces the noise level of our test set from 0.774 to 0.249, whereas tag refinement using neighbor voting is only able to decrease the noise level from 0.774 to 0.466. In addition, we investigated the influence of tag refinement on the effectiveness of image tag recommendation for non-tagged images, showing that the proposed tag refinement technique is able to improve the effectiveness of image tag recommendation by about 29% when using the P@5 metric, while tag refinement using neighbor voting is only able to achieve an improvement of 5% (compared to baseline tag recommendation using a folksonomy that has not been the subject of tag refinement).

In this paper, we only considered tag refinement along the 'what' dimension. Future research will focus on the design and analysis of specialized tag refinement techniques that are able to operate along other dimensions. More advanced tag categorization approaches will be studied as well (see for instance [21], a further evolution of [10]).

5. REFERENCES

    [1] OECD, OECD Study on the Participative Web: User Generated Content, October 2007.

[2] Flickr. http://flickr.com/.
[3] Facebook. http://facebook.com/.
[4] Flickr blog, 4,000,000,000, Oct. 2009. Available on http://blog.flickr.net/en/2009/10/12/4000000000/.
[5] Facebook statistics, January 2010. Available on http://www.facebook.com/press/info.php?statistics.
[6] L. Wu, L. Yang, N. Yu, and X. Hua, Learning to Tag, Proc. ACM WWW, pp. 361-370, April 2009.
[7] D. Liu, M. Wang, L. Yang, X. Hua, and H. J. Zhang, Tag Quality Improvement for Social Images, Proc. ICME 2009, pp. 250-353, June 2009.

[8] S. Lindstaedt, R. Mörzinger, R. Sorschag, V. Pammer, and G. Thallinger, Automatic Image Annotation using Visual Content and Folksonomies, Multimedia Tools and Applications, vol. 42, no. 1, March 2009.

[9] T. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng, NUS-WIDE: A Real-World Web Image Database from National University of Singapore, Proc. ACM CIVR, 2009.
[10] B. Sigurbjörnsson and R. van Zwol, Flickr Tag Recommendation based on Collective Knowledge, International WWW Conference, pp. 327-336, April 2008.

[11] O. Kucuktunc, S. G. Sevil, A. B. Tosun, H. Zitouni, P. Duygulu, and F. Can, Tag Suggestr: Automatic Photo Tag Expansion using Visual Information for Photo Sharing Websites, International Conference on Semantic and Digital Media Technologies, pp. 61-73, Nov. 2008.

[12] J. Tang, S. Yan, R. Hong, G.-J. Qi, and T.-S. Chua, Inferring Semantic Concepts from Community-Contributed Images and Noisy Tags, Proc. ACM Multimedia, pp. 223-232, Oct. 2009.
[13] X. Li, C. G. M. Snoek, and M. Worring, Learning Social Tag Relevance by Neighbor Voting, IEEE Trans. Multimedia, vol. 11, no. 7, pp. 1310-1322, Aug. 2009.
[14] A. Hanjalic, Extracting Moods from Pictures and Sounds: Towards Truly Personalized TV, IEEE Signal Processing Magazine, vol. 23, no. 2, pp. 90-100, March 2006.
[15] A. Yazdani, J.-S. Lee, and T. Ebrahimi, Implicit Emotional Tagging of Multimedia Using EEG Signals and Brain Computer Interface, Proc. of the First SIGMM Workshop on Social Media, pp. 81-88, Oct. 2009.

    [16] C. Fellbaum, WordNet: An Electronic Lexical Database,The MIT Press, 1998.

[17] R. Jäschke, L. Marinho, A. Hotho, L. Schmidt-Thieme, and G. Stumme, Tag Recommendations in Folksonomies, Proc. 11th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 506-514, Sept. 2007.

[18] M. J. Huiskes and M. S. Lew, The MIR Flickr Retrieval Evaluation, Proc. ACM International Conference on Multimedia Information Retrieval, pp. 39-43, Oct. 2008.

[19] B. S. Manjunath et al., Introduction to MPEG-7: Multimedia Content Description Interface, John Wiley and Sons, 2002.

[20] O. Kucuktunc, S. G. Sevil, A. B. Tosun, H. Zitouni, P. Duygulu, and F. Can, Tag Suggestr: Automatic Photo Tag Expansion using Visual Information for Photo Sharing Websites, Proceedings of SAMT, pp. 61-73, Dec. 2008.

[21] S. Overell, B. Sigurbjörnsson, and R. van Zwol, Classifying Tags using Open Content Resources, Proc. of the Second ACM International Conference on Web Search and Data Mining, pp. 64-73, Feb. 2009.
