Folksonomy-based personalized search by hybrid user...


Neurocomputing 204 (2016) 142–152

Contents lists available at ScienceDirect

Neurocomputing

journal homepage: www.elsevier.com/locate/neucom

Folksonomy-based personalized search by hybrid user profiles in multiple levels☆

Qing Du a, Haoran Xie b, Yi Cai a,*, Ho-fung Leung c, Qing Li d, Huaqing Min a, Fu Lee Wang e

a School of Software Engineering, South China University of Technology, Guangzhou, China
b Department of Mathematics and Information Technology, The Hong Kong Institute of Education, N.T., Hong Kong Special Administrative Region
c Department of Computer Science and Engineering, Chinese University of Hong Kong, N.T., Hong Kong Special Administrative Region
d Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region
e Caritas Institute of Higher Education, N.T., Hong Kong Special Administrative Region

a r t i c l e i n f o

Article history:
Received 10 March 2015
Received in revised form 20 October 2015
Accepted 22 October 2015
Available online 8 April 2016

Keywords:
Folksonomy
Social tagging
Web 2.0
User profiling
Personalized search

☆ A preliminary version of this manuscript has been published in [1].
* Corresponding author. Tel.: +86 20 39380218.
E-mail address: [email protected] (Y. Cai).
¹ http://delicious.com
² http://www.flickr.com

http://dx.doi.org/10.1016/j.neucom.2015.10.135
0925-2312/© 2016 Elsevier B.V. All rights reserved.

a b s t r a c t

Recently, some systems have allowed users to rate and annotate resources, e.g., MovieLens, and we consider that this provides a way to identify favorite and non-favorite tags of a user by integrating his or her ratings and tags. In this paper, we review and elaborate on the limitations of the current research on user profiling for personalized search in collaborative tagging systems. We then propose a new multi-level user profiling model by integrating tags and ratings to achieve personalized search, which can reflect not only a user's likes but also his or her dislikes. To the best of our knowledge, this is the first effort to integrate ratings and tags to model multi-level user profiles for personalized search.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Currently, collaborative tagging systems have become increasingly popular, and many social resource sites support a tagging mechanism. For example, bookmarks on Del.icio.us¹ may be tagged in terms of topics of interest by users, and in Flickr,² users can upload and annotate their own photos. The resources and tags posted by Web users on these systems are supposed to be highly dependent on their interests, and the tags given by users provide rich information for building more accurate and specific user profiles [2]. Further, the tags given by different users to a resource are useful for describing it. This provides a collaborative form of resource description, and such a description is considered to be more meaningful and acceptable from users' perspectives.

Given the characteristics of collaborative tagging systems, researchers consider that constructing user and resource profiles from collaborative tags is instrumental for personalized resource search. Some studies, such as [3–5], have been conducted to construct user and resource profiles from tags in collaborative tagging systems for personalized search. However, there are several limitations in the current user profiling methods, which include the following:

• All current work assumes that all of a user's tags are his or her favorite features. However, users may also use some tags to reflect their dislikes. For example, Alice may like science fiction movies but not dinosaurs. Thus, she may use the tag ‘dinosaur’ to annotate all the movies about dinosaurs to remind herself that these movies include dinosaurs. She also might give a very low rating to these movies to indicate her dislike. In other words, this assumption of the current research is not reasonable, because a user's tags include not only his or her like tags but also some dislike tags. Current work only models a user's positive preferences in user profiling, and ignores his or her negative preferences.

• Current work adopts a single vector to model a user profile. However, a user profile should include not only a user's most-favorite features but also least-favorite features as well as neutral features. Using only a single vector, it is difficult to reflect both the most- and least-favorite features of a user simultaneously. In other words, current work lacks the building blocks to model a user's dislikes.

Recently, some systems have allowed users to both rate and annotate resources, e.g., MovieLens. Some researchers have integrated ratings and tags into recommender systems to calculate a feature's co-occurrence effectiveness [6]. In these systems, some tags of resources that are rated by a user with high or low ratings directly indicate a user's likes and dislikes. Thus, we consider that integrating a user's ratings and tags provides a way to identify his or her like and dislike tags. In this paper, we review and elaborate on the limitations of the current work on user profiling for personalized search in collaborative tagging systems. To address these limitations, we propose a new multi-level user profiling (MUP) model for constructing user profiles to achieve personalized search. The contributions of our work are as follows.

• We clarify the limitations of current tag-based user profiling work.

• We hence propose a new three-level model for constructing a user profile by integrating tags and ratings that reflects not only the user's likes but also his or her dislikes. Furthermore, we expand the three-level model to an n-level model and explore the relationship between the number of levels and personalized search performance.

• We employ the proposed multi-level user profile to enhance personalized search in collaborative tagging systems.

• We conduct experiments with a real data set from MovieLens.³ The results show that our method outperforms other state-of-the-art methods in personalized resource search.

To the best of our knowledge, this is the first effort to integrate ratings and tags to model multi-level user profiles for personalized search.

[Fig. 1 diagram: User, Items, Rating (5, 3, 1), Tags (Pepper, Fish, Sichuan Dish, Pork, Mushroom).]

2. Related work and background

In this section, we first survey some existing work on semantic links, collaborative tagging, and personalized search. We then examine and discuss the limitations of this work in terms of user profiling.

2.1. Semantic links

Luo et al. [7] proposed a discovery algorithm of associated resources to build the original Association Link Network (ALN) for organizing loose Web resources. Liu et al. [8] utilized tags and the surrounding texts of multimedia resources, and integrated the Semantic Link Network and multimedia resources to organize multimedia resources by their semantics. In [9], Liu et al. identified three major problems that hinder efficient and reliable communications and proposed a novel in-middle recovery scheme that was achieved by designing and implementing a proliferation routing to address them. To construct more energy-efficient topologies, Liu et al. [10] proposed a novel opportunity-based topology control and designed a fully distributed algorithm called CONREAP based on reliability theory to realize it. Further, Liu et al. [11] considered that it is not necessary to handle every temporal violation in a scientific workflow system and proposed a novel adaptive temporal violation handling point selection strategy to avoid unnecessary temporal violation handling. In [12], Wang et al. presented the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters. Xu et al. [13] proposed a general approach to generate the temporal semantic annotation of semantic relations between entities by constructing connection entities, lexical syntactic patterns, context sentences, context graph, and context communities.

³ http://www.grouplens.org/node/73

2.2. Personalized search in collaborative tagging systems

There are some existing studies that utilize resource and user profiles to facilitate personalized search in a folksonomy (also known as a collaborative tagging system). Noll and Meinel [4] proposed term frequency (TF) profiles to discover related tags for users and resources, to provide personalized ranking. Later studies follow the term frequency-inverse document frequency (TF-IDF), Best Matching 25 (BM25) [5], and hybrid [14] paradigms. In [15], TF-IDF was combined with user and resource profiles along with the positions of tags by considering two kinds of sources. In our earlier work [2], we proposed a normalized term frequency (NTF) to model user and resource profiles and compared it with previous methods. In [16], Gemmell et al. proposed a method to personalize a user's experience within a folksonomy using clustering. Kim et al. [17] proposed a new model of tag-based personalized searches to enhance not only retrieval accuracy but also retrieval coverage. Han et al. [18] collected user tags from folksonomies and mapped them onto an existing domain ontology. By leveraging social tagging as a preference indicator, they built two latent tag models: a preference model and an annotation model. Bouadjenek et al. [19] proposed a new approach to enhancing document representation using social annotations. Jin et al. [20] presented verbal context in folksonomy to capture a user's intention and addressed the irrelevant contextual factors for a verbal context model. In [21], Hsu proposed an approach to build a tag-based resource profile semantically. The primitive features extracted from images, such as color histograms [22], centerlines [23], and texture patterns [24], can be very useful and powerful for various search applications.

Although there are several works that handle personalized search with tag-based user and item profiles, they have some limitations. In the following subsection, we examine and discuss these limitations.

2.3. Limitations of current user profiling methods

All current work uses a single vector of tags to represent user preferences and treats all resources tagged by a user as favorite features, even mistaking disliked features as favorite features. When constructing a user profile, a single vector is insufficient for reflecting the degree of like or dislike, as explained in Example 1 (shown in Fig. 1).

Fig. 1. Alice's ratings and tags for dishes.

Example 1. Consider a user Alice who has eaten three dishes and given ratings of 5, 3, and 1 to them, respectively. Alice uses three tags, “Pepper,” “Fish,” and “Sichuan,” to annotate the dish rated 5, and these three tags indicate her favorite flavors. In addition, for the dish rated 1, she uses two tags, “Mushroom” and “Fish,” to remind herself that this dish does not have tastes that she likes. Hence, the user profile of Alice would be constructed as follows if we only use a single vector, as in previous methods.

$$\vec{U}_{Alice} = (\text{Pepper}: 1.0,\; \ldots,\; \text{Pork}: 1.0,\; \text{Mushroom}: 1.0)$$

In the above user profile of Alice, the tag “Mushroom” is mistakenly classified in the set of favorite tags for Alice. Furthermore, this tag is given a high value if we use TF-IDF to calculate the weight.

Example 1 implies that the traditional user profiling methods do not model a user profile well when processing tags associated with a low rating, and the results will be even worse if the proportion of these tags is large. In order to obtain a better understanding of this problem, we make the following observation.

Observation 1. In the MovieLens data set, about 29.5% of tags are related to resources that have ratings lower than 3 (disliked resources). In particular, about 11% of tags are related to resources rated 1 or less (extremely disliked resources). Another interesting characteristic is that not all tags correspond to a user's favorite features, e.g., some tags, like “horrible,” “boring,” and “pointless,” may indicate a user's least-favorite features.

According to Observation 1, there are a number of tags associated with low ratings. Current user profiling methods are based on only one single vector, and they have two limitations. One is that a single vector of all of a user's tags is not very accurate when indicating a user's preferences, because it may include some dislike tags. The other is that they only reflect a user's preferences, but cannot model a user's dislikes. This will result in the loss of some important information in the user's profile.
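The first limitation can be made concrete with a small sketch. The code below builds a rating-blind, single-vector profile for Alice from Example 1. The weighting (tag counts normalized by the largest count) is only a stand-in for TF-style weights, not the weighting of any cited method, and the tag of the dish rated 3 is an assumption read off Fig. 1.

```python
from collections import Counter

# Alice's tagged dishes as (rating, tags). The tags of the dishes rated 5 and 1
# follow Example 1; "Pork" for the dish rated 3 is an assumption based on Fig. 1.
alice_dishes = [
    (5.0, ["Pepper", "Fish", "Sichuan"]),
    (3.0, ["Pork"]),
    (1.0, ["Mushroom", "Fish"]),
]


def single_vector_profile(dishes):
    """Rating-blind profile: tag counts normalized by the largest count.

    This is a hypothetical stand-in for TF-style weighting; ratings are
    deliberately ignored, which is the limitation under discussion.
    """
    counts = Counter(tag for _, tags in dishes for tag in tags)
    top = max(counts.values())
    return {tag: count / top for tag, count in counts.items()}


profile = single_vector_profile(alice_dishes)
# "Mushroom" (used only on the dish rated 1) receives the same weight as
# "Pepper" (used only on the dish rated 5).
print(profile["Mushroom"], profile["Pepper"])
```

Because the ratings never enter the computation, a disliked tag is indistinguishable from a liked one in such a profile.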

3. Three-level hybrid user profiling

3.1. Relationship between tags and ratings

Some collaborative tagging systems allow users to rate and annotate resources. In these systems, some tags of the resources that are rated by a user with high or low ratings directly indicate a user's positive and negative preferences. Other tags that are associated with items with both high and low ratings clearly cannot reflect a user's preferences. Given this consideration, we divide a user's tags into three kinds. Before we classify tags, we first define resource types.

Definition 1. A favorite resource x of user i is a resource that is given a high rating by user i, i.e., $r_{i,x} \geq \theta$, where $r_{i,x}$ is the rating given by user i for resource x.

Definition 2. A non-favorite resource x of user i is a resource that is given a low rating by user i, i.e., $r_{i,x} \leq \delta$.

Definition 3. An ordinary resource x of user i is a resource that is given a medium rating by user i, i.e., $\delta < r_{i,x} < \theta$.

Based on the definitions of resources, tags can be divided into the following three kinds: favorite tags, which exist on favorite resources only, non-favorite tags, which exist on non-favorite resources only, and fused tags, which exist on ordinary resources only or on two or three types of resources.
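Definitions 1–3 and the three kinds of tags can be sketched as follows. The threshold values `THETA = 4.0` and `DELTA = 2.0` are hypothetical choices for illustration, and the function names are not from the paper. The data reuse Alice's dishes, with the tag of the dish rated 3 assumed from Fig. 1.

```python
from collections import defaultdict

THETA = 4.0  # assumed threshold: r >= THETA means a favorite resource
DELTA = 2.0  # assumed threshold: r <= DELTA means a non-favorite resource


def resource_type(rating, theta=THETA, delta=DELTA):
    """Classify one rated resource following Definitions 1-3."""
    if rating >= theta:
        return "favorite"
    if rating <= delta:
        return "non-favorite"
    return "ordinary"


def classify_tags(tagged_resources, theta=THETA, delta=DELTA):
    """Map each tag of one user to 'favorite', 'non-favorite', or 'fused'.

    tagged_resources: iterable of (rating, tags) pairs for a single user.
    """
    seen = defaultdict(set)  # tag -> set of resource types it appears on
    for rating, tags in tagged_resources:
        for tag in tags:
            seen[tag].add(resource_type(rating, theta, delta))

    kinds = {}
    for tag, types in seen.items():
        if types == {"favorite"}:
            kinds[tag] = "favorite"
        elif types == {"non-favorite"}:
            kinds[tag] = "non-favorite"
        else:
            # Ordinary resources only, or two or three resource types.
            kinds[tag] = "fused"
    return kinds


alice = [
    (5.0, ["Pepper", "Fish", "Sichuan"]),
    (3.0, ["Pork"]),  # assumed from Fig. 1
    (1.0, ["Mushroom", "Fish"]),
]
kinds = classify_tags(alice)
print(kinds)
```

Under these assumed thresholds, “Fish” becomes a fused tag because it appears on both a favorite and a non-favorite dish, while “Mushroom” is a non-favorite tag.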

3.2. Hybrid user profiling on three levels

Integrating ratings and tags can help identify a user's most- and least-favorite features. Based on the three kinds of tags mentioned above, we propose a three-level user profiling (TUP) method (in contrast to single-level user profiling methods) as follows:

Definition 4. A user profile of user i, denoted by $\vec{U}_i$, is a set of three vectors, i.e.,

$$\vec{U}_i = (\vec{U}^f_i,\; \vec{U}^a_i,\; \vec{U}^u_i)$$

where $\vec{U}^f_i$ is the favorite tag vector of user i, $\vec{U}^a_i$ is the non-favorite tag vector of user i, and $\vec{U}^u_i$ is the fused tag vector of user i.

The favorite tag vector reflects a user's favorite features and consists of favorite tags given by the user. It is defined as follows:

Definition 5. A favorite tag vector of user i, denoted by $\vec{U}^f_i$, is a vector of tag:value pairs, i.e.,

$$\vec{U}^f_i = (t^f_{i,1}: v^f_{i,1},\; t^f_{i,2}: v^f_{i,2},\; \ldots,\; t^f_{i,n}: v^f_{i,n})$$

where $t^f_{i,x}$ is a favorite tag of user i, and $v^f_{i,x}$ is the degree to which it is a favorite, which, intuitively, can be obtained as follows:

$$v^f_{i,x} = \frac{\sum_{y=1}^{n} \frac{r_{i,y}}{Maxrating}}{N^f_i} \tag{1}$$

where $N^f_i$ is the number of all favorite resources tagged by user i, n is the number of favorite resources tagged by user i using tag $t^f_{i,x}$, and Maxrating is the highest rating a user can give, e.g., in the MovieLens data set, Maxrating is 5.0.

The non-favorite tag vector reflects a user's least-favorite features with high confidence and consists of non-favorite tags given by the user. It is defined as follows:

Definition 6. A non-favorite tag vector of user i, denoted by $\vec{U}^a_i$, is a vector of tag:value pairs, i.e.,

$$\vec{U}^a_i = (t^a_{i,1}: v^a_{i,1},\; t^a_{i,2}: v^a_{i,2},\; \ldots,\; t^a_{i,n}: v^a_{i,n})$$

where $t^a_{i,x}$ is a non-favorite tag given by user i, and $v^a_{i,x}$ is the degree to which it is not a favorite, which, intuitively, can be obtained as follows:

$$v^a_{i,x} = \frac{\sum_{y=1}^{n} \frac{Minrating}{r_{i,y}}}{N^a_i} \tag{2}$$

where $N^a_i$ is the number of all non-favorite resources tagged by user i, n is the number of non-favorite resources tagged by user i using tag $t^a_{i,x}$, and Minrating is the lowest rating available, e.g., in the MovieLens data set, Minrating is 0.5.
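As a minimal sketch of Eqs. (1) and (2), the two weights can be computed as below. The function names and the sample user are hypothetical; only the MovieLens bounds 5.0 and 0.5 come from the definitions above.

```python
MAX_RATING = 5.0  # MovieLens maximum rating (Definition 5)
MIN_RATING = 0.5  # MovieLens minimum rating (Definition 6)


def favorite_weight(ratings_with_tag, n_favorite_resources, max_rating=MAX_RATING):
    """Eq. (1): sum of r / Maxrating over the favorite resources carrying the
    tag, divided by the number of all favorite resources tagged by the user."""
    return sum(r / max_rating for r in ratings_with_tag) / n_favorite_resources


def non_favorite_weight(ratings_with_tag, n_non_favorite_resources, min_rating=MIN_RATING):
    """Eq. (2): sum of Minrating / r over the non-favorite resources carrying
    the tag, divided by the number of all non-favorite resources tagged by the user."""
    return sum(min_rating / r for r in ratings_with_tag) / n_non_favorite_resources


# Hypothetical user: four favorite resources, two of them tagged "space"
# (rated 5.0 and 4.0); two non-favorite resources, one tagged "boring" (rated 0.5).
v_f = favorite_weight([5.0, 4.0], 4)   # (1.0 + 0.8) / 4
v_a = non_favorite_weight([0.5], 2)    # (0.5 / 0.5) / 2
print(v_f, v_a)
```

Both weights grow with how often and how strongly the tag co-occurs with extreme ratings: a higher rating raises the favorite weight, and a lower rating raises the non-favorite weight.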

The above two vectors (favorite and non-favorite tag vectors) can reflect a user's most- or least-favorite features explicitly. Fused tags cannot reflect a user's preferences as explicitly and reliably as favorite or non-favorite tags. However, this kind of tag also plays an important role in constructing a user profile and is defined as follows.

Definition 7. A fused tag vector of user i, denoted by $\vec{U}^u_i$, is a vector of tag:value pairs, i.e.,

$$\vec{U}^u_i = (t^u_{i,1}: v^u_{i,1},\; t^u_{i,2}: v^u_{i,2},\; \ldots,\; t^u_{i,n}: v^u_{i,n})$$

where $t^u_{i,x}$ is a fused tag given by user i to a resource, and $v^u_{i,x}$ is the degree to which the tag is ambiguous and can be obtained as follows:

$$v^u_{i,x} = \frac{\sum_{h=1}^{N} \frac{r_{i,h}}{Maxrating} - \sum_{j=1}^{M} \frac{Minrating}{r_{i,j}} + \sum_{k=1}^{L} \left(\frac{r_{i,k}}{Maxrating}\right)^2}{S} \tag{3}$$

where N, M, L, and S are the numbers of favorite, non-favorite, ordinary, and total resources, respectively, annotated by user i with fused tag $t^u_{i,x}$, and $r_{i,x}$ is the real rating given by user i.

3.3. Tag transference

We make the following important observation on TUP user profiling.

Observation 2. As the rating records increase, a user's favorite (non-favorite) tags may be used to annotate a non-favorite (favorite) resource or ordinary resource. These tags will then be transferred from the favorite (non-favorite) tag vector to the fused tag vector.

We use Fig. 2 as an example to illustrate Observation 2. In Fig. 2(a), tags A and B are originally associated with high and low ratings, respectively. When the user adopts A for annotations with a low rating (i.e., 2), A is removed from the favorite tag vector $\vec{U}^f_i$ of the user and is transferred to the fused tag vector $\vec{U}^u_i$. Even though tag A may be used to annotate resources with a low rating only once, it will still be transferred into $\vec{U}^u_i$. Similarly, B will also be transferred to $\vec{U}^u_i$.

user favorite vector or user non-favorite vector. Furthermore, tag

transference from U!f

i and U!a

i to U!u

i can help us reserve the “highpurity” tags and remove the “low purity” tags. Here “purity”means the explicit degree of a user's attitude to a tag. For example,Bob use the tag “delicious” to annotate five resources that are allfavorite resources (e.g., ratingsZ4:0), and we can hence consider“delicious” to high purity. Another tag given by Bob is “not bad,”which is used to annotate three resources (two are favoriteresources, one is a non-favorite resource for Bob); thus “not bad”has a relatively low purity. We consider that tag transference inuser profile can help us find which tags are the tags user reallylikes or dislikes with higher confidence.

Fig. 2. Example of tag transference. [Diagram: resources with ratings, tags A and B, and the favorite, non-favorite, and fused vectors; legend: tag, resource, rating record.]

We assume that if we restrain tag transference, the accuracy of personalized search will be reduced. In order to verify this assumption, we use a threshold ρ with different values to control tag transference as follows.

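$$\text{tag } x \in \begin{cases} \vec{U}^f_i, & \text{if } \dfrac{N^f_{i,x}}{N_{i,x}} \geq \rho \\[2ex] \vec{U}^a_i, & \text{if } \dfrac{N^a_{i,x}}{N_{i,x}} \geq \rho \\[2ex] \vec{U}^u_i, & \text{if } \dfrac{N^f_{i,x}}{N_{i,x}} < \rho \text{ and } \dfrac{N^a_{i,x}}{N_{i,x}} < \rho \end{cases} \tag{4}$$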

We use the following example to illustrate Eq. (4) and tag transference.

Example 2. When we set $\rho = 0.9$, for a particular user i, we assume that if a tag has been used to tag a favorite resource eight times, it should be put in $\vec{U}^f_i$. If this tag is then used to tag a non-favorite resource twice, then it is transferred into $\vec{U}^u_i$ because $\frac{8}{2+8} < \rho$.
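The threshold rule of Eq. (4) can be sketched as a small function. The name `assign_vector` and the string labels are hypothetical. Checking the favorite condition first is an assumption that only matters when ρ ≤ 0.5, where both ratios could reach the threshold at once.

```python
def assign_vector(n_fav, n_non_fav, n_total, rho):
    """Decide which vector a tag belongs to under Eq. (4).

    n_fav:     times the tag was used on favorite resources (N^f_{i,x})
    n_non_fav: times the tag was used on non-favorite resources (N^a_{i,x})
    n_total:   times the tag was used on any resource (N_{i,x})
    rho:       transference threshold
    """
    if n_total <= 0:
        raise ValueError("the tag must have been used at least once")
    if n_fav / n_total >= rho:
        return "favorite"
    if n_non_fav / n_total >= rho:
        return "non-favorite"
    return "fused"


# Example 2 with rho = 0.9: eight favorite uses keep the tag in the favorite
# vector; two further non-favorite uses give 8 / (2 + 8) = 0.8 < 0.9.
before = assign_vector(8, 0, 8, 0.9)
after = assign_vector(8, 2, 10, 0.9)
print(before, after)
```

With a lower threshold such as ρ = 0.8, the same tag would stay in the favorite vector, since 8/10 reaches the threshold.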

We refer to the multi-level user profiling method with a threshold for tag transference as MUP@ρ. Higher values of the threshold ρ lead to less restraint on tag transference (i.e., tags are easier to transfer to the fused vector according to Eq. (4)), and more tags are transferred into the fused vector. In addition, note that when the threshold is equal to 1.0, there is no restraint on tag transference. In other words, tags in the favorite and non-favorite vectors are very easily transferred to the fused vector and the tag transference is maximized. In our experiments, we compare the performances of TUP and TUP@ρ to illustrate the importance of tag transference.

4. General user profile model

Intuitively, a more specialized user profile leads to better personalized search results. As we illustrated above, adopting a set of three vectors instead of a single vector to profile a user can reflect not only a user's positive preferences, but also his or her negative preferences. However, whether an extremely specialized user profile results in the best personalized search performance still remains unproven. To explore the relationship between the specialization of a user profile and the personalized search performance, we propose a multi-level user profiling method called MUP, which generalizes TUP and expands it from three to n levels.

4.1. Relationship between tags and ratings

When we expand our model to n levels, it is not reasonable to carry on classifying the resources into only three categories, favorite, non-favorite, and ordinary resources, as defined in Definitions 1–3. Hence, the tags should be categorized into n types according to what kind of resources they tag. Therefore, in MUP, we divide the resources into n kinds based on their ratings and classify tags into n kinds according to the average ratings of the resources that they tag.

Example 3. Given n = 5, the resources in the MovieLens data set (rated from 0.5 to 5) can be divided into five sentiment levels: 4.5–5.0 (favorite), 3.5–4.0 (like), 2.5–3.0 (ordinary), 1.5–2.0 (dislike), and 0.5–1.0 (hate). Suppose a user Bob watches two movies (“Star Trek” and “X-Men”), and rates them as 5 and 3, respectively. He also uses “Science Fiction” to annotate both “Star Trek” and “X-Men”. Therefore, “Star Trek” is one of Bob's first-level (favorite) resources, “X-Men” is a third-level (ordinary) resource, and “Science Fiction”, with its average rating of 4, is one of Bob's “like” tags and is a second-level tag.
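The level assignment in Example 3 can be sketched as follows, assuming the MovieLens scale of ten rating values from 0.5 to 5.0 in steps of 0.5, split into equal groups. The function names are hypothetical. Rounding an average rating that falls between two grid values up to the next grid value is also an assumption, since the text does not specify this case.

```python
import math

MIN_RATING, MAX_RATING, STEP = 0.5, 5.0, 0.5  # MovieLens rating scale


def rating_to_level(rating, n_levels=5):
    """Map a rating to a sentiment level; level 1 holds the highest ratings."""
    n_values = round((MAX_RATING - MIN_RATING) / STEP) + 1  # 10 rating values
    if n_values % n_levels:
        raise ValueError("n_levels must divide the number of rating values")
    per_level = n_values // n_levels
    # Index on the rating grid: 0 for 0.5, ..., 9 for 5.0. Off-grid averages
    # are rounded up to the next grid value (an assumption of this sketch).
    index = math.ceil(rating / STEP) - 1
    return n_levels - index // per_level


def tag_level(ratings, n_levels=5):
    """Level of a tag: the level of the average rating of the resources it tags."""
    return rating_to_level(sum(ratings) / len(ratings), n_levels)


star_trek = rating_to_level(5.0)        # first level (favorite)
x_men = rating_to_level(3.0)            # third level (ordinary)
science_fiction = tag_level([5.0, 3.0]) # average 4.0, second level (like)
print(star_trek, x_men, science_fiction)
```

The same functions reproduce the fourth-level tag of Example 4: a tag used on resources rated 2 and 1 has an average rating of 1.5, which falls in the 1.5–2.0 (dislike) band.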

4.2. Multi-level user profiling

Definition 8. A user profile of user i, denoted by $\vec{U}_i$, is a list of vectors, i.e.,

$$\vec{U}_i = (\vec{U}^1_i,\; \vec{U}^2_i,\; \ldots,\; \vec{U}^k_i,\; \ldots,\; \vec{U}^n_i)$$

where n denotes the specialization degree of a user profile and $\vec{U}^k_i$ reflects its sentiment features at the kth level. Higher values of n lead to a more specialized user profile, which means that the user profile can reflect more degrees of a user's sentiment, including likes and dislikes. Moreover, as k increases, tag vectors at the kth level consist of tags that the user prefers less. In other words, $\vec{U}^1_i$ contains a user's most-favorite tags and $\vec{U}^n_i$ includes the user's least-favorite tags. The definition of the kth level tag vector $\vec{U}^k_i$ is as follows:

Definition 9. A kth level tag vector of user i, denoted by $\vec{U}^k_i$, is a vector of tag:value pairs, i.e.,

$$\vec{U}^k_i = (t^k_{i,1}: v^k_{i,1},\; t^k_{i,2}: v^k_{i,2},\; \ldots,\; t^k_{i,n}: v^k_{i,n})$$

where $t^k_{i,x}$ is a tag given by user i to the resources at the kth sentiment level, and $v^k_{i,x}$ is the degree to which it is a kth level tag, which can be obtained by one of the following three equations. The first option is as follows:

$$1 \leq k < \alpha: \quad v^k_{i,x} = \frac{\sum_{y=1}^{n} \frac{r_{i,y}}{Maxrating}}{N^k_i} \tag{5}$$

where α is a customized parameter, $N^k_i$ is the number of all the resources at the kth sentiment level annotated by user i, n is the number of resources at the kth sentiment level tagged by user i using tag $t^k_{i,x}$, and Maxrating is the maximum rating of the kth level, e.g., at a level in which the average rating ranges from 3.5 to 4.0, Maxrating = 4.0. A higher value of α indicates that the user considers a larger range of ratings to be ratings that show his or her preference. Thus, more resources with lower ratings are perceived as his or her favorite resources and more of the n tag vectors can be treated as tag vectors that reflect his or her likes. The second option is:

$$\beta < k \leq n: \quad v^k_{i,x} = \frac{\sum_{y=1}^{n} \frac{Minrating}{r_{i,y}}}{N^k_i} \tag{6}$$

where β is a customized parameter, $N^k_i$ is the number of all the resources at the kth sentiment level annotated by user i, n is the number of resources at the kth sentiment level tagged by user i using tag $t^k_{i,x}$, and Minrating is the minimum rating of the kth level, e.g., at the level in which the average rating ranges from 1.0 to 1.5, Minrating = 1.0. A higher value of β leads to a lower value of n − β and demonstrates that the user considers a smaller range of ratings as ratings that show his or her dislikes. Thus, fewer resources with higher ratings are deemed to be his or her non-favorite resources and fewer of the n tag vectors are treated as tag vectors that reflect his or her dislikes. The final option is

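$$\alpha \leq k \leq \beta: \quad v^k_{i,x} = \frac{\sum_{h=1}^{N} \frac{r_{i,h}}{Maxrating} - \sum_{j=1}^{M} \frac{Minrating}{r_{i,j}} + \sum_{l=1}^{L} \left(\frac{r_{i,l}}{Maxrating}\right)^2}{S} \tag{7}$$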

where α and β are the customized parameters illustrated in Eqs. (5) and (6), N is the number of all the user's favored resources (between level 1 and level α) annotated by user i with tag $t^k_{i,x}$, M is the number of disfavored resources (between level β and level n) annotated by user i using tag $t^k_{i,x}$, L is the number of ordinary resources (between level α and level β) annotated by user i using tag $t^k_{i,x}$, S is the number of all the resources annotated by user i using tags at the kth level, and $r_{i,x}$ is the real rating given by user i.

We further illustrate the construction of a user profile with Example 4.

Example 4. Assume that $\alpha = 3$, $\beta = 3$, and Bob watches two additional movies, “The Twilight”, rated as 2, and “Snow White and the Huntsman”, rated as 1. In addition, he uses “Kristen Stewart” to tag both “The Twilight” and “Snow White and the Huntsman”. Thus, the tag “Science Fiction” is a second-level tag and “Kristen Stewart” is a fourth-level tag with an average rating of 1.5. Hence, there is no tag in the first-, third-, and fifth-level tag vectors. Using Eq. (5), the second-level tag vector is $\vec{U}^2_{Bob} = (\text{“Science Fiction”}: 0)$, and the fourth-level tag vector is hence $\vec{U}^4_{Bob} = \left(\text{“Kristen Stewart”}: \frac{1.5/2}{1}\right)$ by applying Eq. (6).
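The two values in Example 4 can be reproduced with a short sketch of Eqs. (5) and (6). The function names are hypothetical, and returning 0 when a level contains no resources is an assumption chosen to match the value 0 given for “Science Fiction”.

```python
def like_level_weight(ratings_with_tag, n_level_resources, level_max_rating):
    """Eq. (5): weight of a tag at a 'like' level k (1 <= k < alpha).

    ratings_with_tag:  ratings of the level-k resources carrying the tag
    n_level_resources: number of all level-k resources annotated by the user
    level_max_rating:  maximum rating of level k
    """
    if n_level_resources == 0:
        return 0.0  # assumption: an empty level contributes weight 0
    return sum(r / level_max_rating for r in ratings_with_tag) / n_level_resources


def dislike_level_weight(ratings_with_tag, n_level_resources, level_min_rating):
    """Eq. (6): weight of a tag at a 'dislike' level k (beta < k <= n)."""
    if n_level_resources == 0:
        return 0.0  # assumption: an empty level contributes weight 0
    return sum(level_min_rating / r for r in ratings_with_tag) / n_level_resources


# "Science Fiction" is a second-level tag (3.5-4.0), but Bob has no resource
# rated in that band: "Star Trek" is rated 5 and "X-Men" is rated 3.
science_fiction = like_level_weight([], 0, 4.0)

# "Kristen Stewart" is a fourth-level tag (1.5-2.0); only "The Twilight"
# (rated 2) lies in that band, so the weight is (1.5 / 2) / 1.
kristen_stewart = dislike_level_weight([2.0], 1, 1.5)
print(science_fiction, kristen_stewart)
```

Note that “Snow White and the Huntsman” (rated 1) does not contribute to the fourth-level weight, because it belongs to the fifth level.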

5. Resource profiling

We adopt the widely used method of vector space modeling to represent resource profiles and select NTF [2] to measure the weight of each tag as follows:

Definition 10. A resource profile for a resource j denoted by Rj!

is avector of tag:value pairs:

R!

j ¼ tj;1 : wj;1; tj;2 : wj;2;…; tj;j R!

j j: w

j;j R!

j j

!

where tj;n is a tag that is used to describe resource j, n is the number oftags used to describe resource j, wi;x is the degree to which resource cpossesses the tag (feature) tj;n, and wj;x is obtained as wi;x ¼ Sj;x

Sj, where

Sj;x denotes the number of users who use tag x to annotate resource j,and Sj denotes the number of users who use any tags to annotateresource j.
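The weighting in Definition 10 can be illustrated with a minimal sketch. The input format (a list of user, resource, tag triples) and the function name `build_resource_profiles` are assumptions made for illustration only; they are not taken from the original implementation.

```python
from collections import defaultdict


def build_resource_profiles(annotations):
    """Build {resource: {tag: w}} with w = S_jx / S_j as in Definition 10.

    `annotations` is an iterable of (user, resource, tag) triples
    (a hypothetical input format chosen for this sketch).
    """
    users_per_tag = defaultdict(set)       # (resource, tag) -> users who applied that tag
    users_per_resource = defaultdict(set)  # resource -> users who applied any tag
    for user, resource, tag in annotations:
        users_per_tag[(resource, tag)].add(user)
        users_per_resource[resource].add(user)

    profiles = defaultdict(dict)
    for (resource, tag), users in users_per_tag.items():
        # S_jx: users who used this tag on the resource; S_j: users who tagged it at all.
        profiles[resource][tag] = len(users) / len(users_per_resource[resource])
    return dict(profiles)


sample = [
    ("u1", "m1", "sci-fi"),
    ("u2", "m1", "sci-fi"),
    ("u2", "m1", "dinosaur"),
    ("u3", "m2", "drama"),
]
profiles = build_resource_profiles(sample)
print(profiles["m1"])  # "sci-fi" is used by both taggers of m1, "dinosaur" by one of them
```

Counting distinct users rather than raw tag occurrences keeps every weight in [0, 1], which the relevance functions of Section 6 rely on.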


6. Personalized search

In a personalized search system, users have different information needs (usually represented in the form of user input queries) and different personal interests. Because there are many resources that can satisfy a user's query, the search results should also match the user's personal interests, letting the user quickly find what he or she actually wants (e.g., by putting the expected result at the top of the search results). Thus, personalized search determines the information that not only satisfies a user's basic query needs but also best matches his or her personal interests. This process can be split into two sub-processes. One is query relevance measurement, which determines to what extent resources satisfy a user's basic query needs, and the other is user interest relevance measurement.

6.1. Query relevance measurement

In our work, we assume a user query is usually in the form of a vector of terms.

Definition 11. A query issued by user i, denoted by $\vec{q}_i$, is a vector of terms as follows:

$$\vec{q}_i = \left(t^{q}_{i,1}, t^{q}_{i,2}, \ldots, t^{q}_{i,m}\right)$$

where $t^{q}_{i,x}$ is a term, and m is the total number of terms in the query. For example, a user may issue a query consisting of the term “chicken” if he or she wants to find a chicken dish, or “spicy fish” to search for dishes made of fish with spicy flavors.

The objective of traditional search is to rank resources based on their relevance to a given query. This is achieved by matching the query with resource profiles. The relevance score of a resource for a query can be formally measured by a query relevance function as follows:

$$\gamma : Q \times R \to [0, 1]$$

where R is the set of resources and Q is the set of queries. Function γ returns the relevance score of a resource to a query: the higher the relevance score, the more relevant the resource is to the query. Cai et al. [2] proposed a query relevance function that regards a user query as fuzzy requirements of the user on resources' content, where each term in the query is a fuzzy requirement for the relevant resources. We apply that function in our work.

$$\gamma(\vec{q}_i, \vec{R}_j) = \frac{\sum w_{j,x}}{m} \times \left(\frac{k}{m}\right)^{\alpha}, \quad t_{j,x} \in \vec{q}_i \qquad (8)$$

where k is the number of the terms satisfied by resource j in query $\vec{q}_i$, m is the total number of terms in the query and α is a parameter used to adjust the effect of the number of relevant tags in a resource profile for a query.
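As an illustration of Eq. (8), the following sketch computes the query relevance of one resource. It assumes that the query terms are distinct, that a term counts as "satisfied" when the resource profile holds a positive weight for it, and that α defaults to 1.0; the function name and these defaults are illustrative assumptions rather than the authors' code.

```python
def query_relevance(query_terms, resource_profile, alpha=1.0):
    """Eq. (8): (sum of matched tag weights / m) * (k / m) ** alpha.

    query_terms: list of distinct query terms (assumption of this sketch).
    resource_profile: {tag: weight in [0, 1]} as in Definition 10.
    """
    m = len(query_terms)
    if m == 0:
        return 0.0
    # Weights of the query terms that the resource actually possesses.
    matched = [resource_profile[t] for t in query_terms
               if resource_profile.get(t, 0.0) > 0.0]
    k = len(matched)
    return (sum(matched) / m) * (k / m) ** alpha


full_match = query_relevance(["spicy", "fish"], {"spicy": 0.5, "fish": 1.0})
half_match = query_relevance(["spicy", "fish"], {"fish": 1.0})
print(full_match, half_match)  # 0.75 0.25
```

The factor (k/m)^α penalizes resources that cover only part of the query: in the example, the resource matching one of two terms scores 0.25 although its single matching tag has the maximum weight.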

6.2. User interest relevance measurement

Resources with high content-relevance to a query may not always be accepted by the querying user; hence, the final goal of personalized search is to obtain resources that match both the query requirements and the user's personal interests. The relevance of the users' interests and resources can be formally defined by a function as follows:

$$\theta : U \times R \to [0, 1]$$

where R is the set of resources and U is the set of users. Function θ returns a score for the relevance of a resource to a user's personal interests. Higher values of θ indicate that the resource is more relevant to the user's personal interests. Each resource j has a resource profile $\vec{R}_j$. The relevance between $\vec{U}_i$ and $\vec{R}_j$ can be considered as the aggregation of the relevance degrees between each of the three vectors (i.e., $\vec{U}^{f}_i$, $\vec{U}^{u}_i$, and $\vec{U}^{a}_i$) of $\vec{U}_i$ and $\vec{R}_j$. We use the weighted user interest relevance function proposed in [2] to calculate the degree of relevance between two vectors.

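$$\theta(\vec{U}^{\eta}_i, \vec{R}_j) = \frac{\sum l^{\eta}_{x} \cdot v^{\eta}_{i,x}}{m} \qquad (9)$$

where m is the total number of terms in the query, $\eta \in [1, n]$, and

$$l^{\eta}_{x} = \begin{cases} w_{j,x} + (1 - v^{\eta}_{i,x})(1 - w_{j,x}) & 1 > w_{j,x} > 0 \\ 1 & w_{j,x} = 1 \\ 0 & w_{j,x} = 0 \end{cases} \qquad (10)$$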
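Eqs. (9) and (10) can be sketched as follows for a single level vector. The sketch assumes that the sum in Eq. (9) runs over the tags of the user's level-η vector and that m is passed in by the caller; both the function names and this reading of the summation range are assumptions, since the text does not spell them out.

```python
def tag_match(v, w):
    """l in Eq. (10) for a user weight v and a resource weight w, both in [0, 1]."""
    if w <= 0.0:
        return 0.0
    if w >= 1.0:
        return 1.0
    return w + (1.0 - v) * (1.0 - w)


def level_relevance(user_vector, resource_profile, m):
    """Eq. (9) for one level vector.

    user_vector: {tag: v} for one sentiment level of the user profile.
    resource_profile: {tag: w} as in Definition 10.
    m: normalizer, described in the text as the number of terms in the query.
    """
    if m <= 0:
        return 0.0
    total = sum(tag_match(v, resource_profile.get(tag, 0.0)) * v
                for tag, v in user_vector.items())
    return total / m


score = level_relevance({"sci-fi": 0.8}, {"sci-fi": 0.5}, m=1)
print(round(score, 2))  # 0.48
```

In this example l = 0.5 + (1 − 0.8)(1 − 0.5) = 0.6, so the level relevance is 0.6 × 0.8 = 0.48; a tag that the resource does not possess contributes nothing.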

The relevance between resource profile $\vec{R}_j$ and user profile $\vec{U}_i$ can then be calculated as follows:

$$\theta(\vec{U}_i, \vec{R}_j) = f\left(\theta(\vec{U}^{1}_i, \vec{R}_j), \ldots, \theta(\vec{U}^{n}_i, \vec{R}_j)\right) \qquad (11)$$

where the function f is an aggregation over $\vec{U}^{1}_i, \vec{U}^{2}_i, \ldots, \vec{U}^{n}_i$. In a personalized search system, we should consider not only a user's most-favorite features, but also the user's least-favorite features. Hence, if a resource possesses some least-favorite features of a user, we should reduce the user-interest relevance degree between the user and the resource. Thus, $\theta(\vec{U}^{k}_i, \vec{R}_j)$ $(1 < k < \alpha)$ has a positive correlation with $\theta(\vec{U}_i, \vec{R}_j)$, and $\theta(\vec{U}^{k}_i, \vec{R}_j)$ $(\beta < k < n)$ has a negative correlation with $\theta(\vec{U}_i, \vec{R}_j)$. Tag vectors without explicit sentiment orientation also play an important role when the data is sparse, so we set a positive correlation between $\theta(\vec{U}^{k}_i, \vec{R}_j)$ $(\alpha \le k \le \beta)$ and $\theta(\vec{U}_i, \vec{R}_j)$. The following is a possible function that satisfies the above criteria:

$$\theta(\vec{U}_i, \vec{R}_j) = \kappa_1 \cdot \theta(\vec{U}^{1}_i, \vec{R}_j) + \ldots + \kappa_n \cdot \theta(\vec{U}^{n}_i, \vec{R}_j)$$

where

$$\begin{cases} |\kappa_1| + |\kappa_2| + \ldots + |\kappa_n| = 1 \\ \kappa_1 > \kappa_2 > \ldots > \kappa_\beta > 0 \\ 0 > \kappa_{\beta+1} > \kappa_{\beta+2} > \ldots > \kappa_n \end{cases} \qquad (12)$$
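A minimal sketch of the linear aggregation in Eq. (12) is given below. The function name and the input checks are illustrative assumptions; the example weights reuse the magnitudes 0.5, 0.4 and 0.1 of Setting 1 in Table 1, with a negative sign on the non-favorite level as Eq. (12) requires.

```python
def aggregate_relevance(level_scores, kappas, tol=1e-9):
    """Eq. (12): weighted sum of per-level relevance scores.

    level_scores: [theta(U^1, R), ..., theta(U^n, R)].
    kappas: signed weights; positive for favorite and ordinary levels,
            negative for non-favorite levels, with sum(|kappa|) == 1.
    """
    if len(level_scores) != len(kappas):
        raise ValueError("one weight per level is required")
    if abs(sum(abs(k) for k in kappas) - 1.0) > tol:
        raise ValueError("absolute weights must sum to 1")
    return sum(k * s for k, s in zip(kappas, level_scores))


# Three-level profile: favorite, fused, non-favorite.
theta = aggregate_relevance([0.6, 0.2, 0.9], [0.5, 0.4, -0.1])
print(round(theta, 2))  # 0.29
```

Because the non-favorite weight is negative, a resource that strongly matches the user's disliked tags lowers the overall score. Note that this plain sum can fall below zero when dislikes dominate; the sketch does not clip the result to [0, 1].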

6.3. Personalized ranking

A personalized search should match both the query requirements and the user's personal interests. Based on the query relevance value obtained by function γ and the user interest relevance value obtained by function θ, we aggregate both into a final personalized relevance score so as to obtain the final resource ranking for a particular user query. We use parameter δ to adjust the query relevance and user interest relevance values. The aggregated function is as follows:

$$RScore(\vec{q}_i, \vec{U}_i, \vec{R}_j) = \delta \cdot \gamma(\vec{q}_i, \vec{R}_j) + (1 - \delta) \cdot \theta(\vec{U}_i, \vec{R}_j) \qquad (13)$$

where $\vec{q}_i$ is a query, $\vec{R}_j$ is a resource profile, and $\vec{U}_i$ is a user profile, all of which are in the form of term vectors.

Higher values of $RScore(\vec{q}_i, \vec{U}_i, \vec{R}_j)$ indicate that resource j should be in a higher position in the result list of query $\vec{q}_i$ issued by user i. This is because resource j is highly relevant to not only the query, but also the user's interests. We do not adopt multiplication aggregation here because there could be some resources that are relevant to the query (i.e., $\gamma(\vec{q}_i, \vec{R}_j) > 0$) but not relevant to the user's interest at all (i.e., $\theta(\vec{U}_i, \vec{R}_j) = 0$). If we use multiplication to aggregate $\gamma(\vec{q}_i, \vec{R}_j)$ and $\theta(\vec{U}_i, \vec{R}_j)$, then $RScore(\vec{q}_i, \vec{U}_i, \vec{R}_j)$ is zero. Thus, those resources that are relevant to a query to some extent but not relevant to a user's interest at all will have the same ranking score, i.e., zero. This is not intuitive.


7. Experiments

In this section, we report the results of four main experiments. First, we compare our approach (TUP) with three current personalized search methods in collaborative tagging systems. Second, we conduct experiments using the MUP method with different numbers of levels. We then make an internal comparison with different TUP@ρ settings to verify the importance of tag transference. Finally, we perform experiments with different δ values to verify the significance of the user's personal interests.

7.1. Data set

We conducted experiments on the MovieLens and Epinions datasets, in which resources have both ratings and tags. The MovieLens dataset has 44,805 user-item-tag-rating tuples, annotated by 2,025 users on 4,796 movies. Each user has rated various numbers of movies, and the ratings follow a 0.5 (bad) to 5 (excellent) numerical scale. Each tag is typically a single word or a short phrase. The meaning and purpose of a particular tag are determined by each user. There are 519 users, 1,153 items, 66,487 tags, and 9,053 ratings in the Epinions dataset.⁴

We randomly split the data sets into two parts: 80% of the tags were used as the training set and 20% of the tags as the test set. We used the data in the training set to construct user profiles and resource profiles. Based on the constructed profiles, we used the data in the test set as input queries to test the efficiency of the personalized search methods.

7.2. Evaluation metrics

We employed three metrics to evaluate the efficiency of our method.

The first metric is the mean reciprocal rank (MRR), which is a statistic for evaluating a ranking for a query. The reciprocal rank of a query result is the multiplicative inverse of the rank of the first correct answer. The MRR is the average of the reciprocal ranks of the results over a set of queries. MRR is defined as follows:

$$MRR = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{rank_i} \qquad (14)$$

where m is the number of queries and $rank_i$ is the position of the correct answer (relevant resource) in the result ranking for query i. MRR emphasizes the importance of placing the correct answer near the top of the result list. The larger the average MRR is, the faster and easier it is for the user to determine the resources he or she wants.
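Eq. (14) can be computed with a few lines. In this sketch, a query whose target resource is not returned at all is marked with `None` and contributes a reciprocal rank of zero; that convention is an assumption, as the text does not state how such queries are handled.

```python
def mean_reciprocal_rank(ranks):
    """Eq. (14): average of 1 / rank_i over all queries.

    ranks: 1-based rank of the target resource for each query,
           or None when the target is missing (counted as 0; an assumption).
    """
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Targets found at positions 1, 2 and 4: (1 + 1/2 + 1/4) / 3
print(mean_reciprocal_rank([1, 2, 4]))
```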

The second metric is the top-N hitrate (HR) [25], which is used to measure how often resources interesting to the user are in the recommendation or personalized search result list. It is defined as:

$$HR = \frac{n}{m} \qquad (15)$$

where m is the number of queries and n is the number of queries that have answers in the first N positions. The larger the average hitrate is, the more precise the personalized search model is.
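The top-N hitrate of Eq. (15) can be sketched in the same style. The representation of the input (one 1-based rank per query, `None` when the target resource is not returned) is an assumption of this example.

```python
def hit_rate(ranks, top_n):
    """Eq. (15): share of queries whose target appears in the first top_n results.

    ranks: 1-based rank of the target resource per query, or None if missing.
    """
    if not ranks:
        return 0.0
    hits = sum(1 for r in ranks if r is not None and r <= top_n)
    return hits / len(ranks)


# Four queries; two targets fall inside the top 45 positions.
print(hit_rate([1, 7, 50, None], top_n=45))  # 0.5
```

Unlike MRR, this metric ignores where inside the top-N list the target appears, so the two metrics are complementary.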

The third metric is imp [26]. This is a common evaluation metric to measure how a personalization strategy improves the ranking of the target resources of a user in the result list by comparing it to baseline methods. It is defined as:

$$imp(q_i) = \frac{1}{r_p} - \frac{1}{r_b} \qquad (16)$$

where $q_i$ is an issued query, $r_b$ is the rank of the target resource returned by a baseline search approach, and $r_p$ is the rank of the same resource returned by our personalized search. The overall ranking improvement is calculated as the average query imp for all queries in the test data, as follows:

$$imp = \frac{\sum_{i=1}^{m} imp(q_i)}{m} \qquad (17)$$

where m is the number of queries. A larger value of imp indicates a greater improvement in the ranking for target resources by the proposed approach.

⁴ As the experimental results on Epinions are quite similar to those on MovieLens, we first report the results on MovieLens in detail, and then present the results on Epinions briefly.
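The imp metric of Eqs. (16) and (17) can be sketched as follows. The pair-based input format and the function names are illustrative assumptions.

```python
def imp(rank_personalized, rank_baseline):
    """Eq. (16): gain in reciprocal rank of the personalized result over the baseline."""
    return 1.0 / rank_personalized - 1.0 / rank_baseline


def mean_imp(rank_pairs):
    """Eq. (17): average imp over queries.

    rank_pairs: list of (r_p, r_b) tuples with 1-based ranks.
    """
    if not rank_pairs:
        return 0.0
    return sum(imp(rp, rb) for rp, rb in rank_pairs) / len(rank_pairs)


# Target moved from rank 10 to rank 2 for one query and stayed at rank 5 for another.
print(round(mean_imp([(2, 10), (5, 5)]), 2))  # 0.2
```

A positive value means that personalization moved the target resource towards the top of the list, zero means no change, and a negative value means that the baseline ranked the target higher.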

7.3. Baseline methods

To evaluate the effectiveness of our method, we compared our approach with three state-of-the-art personalized search methods for collaborative tagging systems. The first one (denoted by SIGIR'08) is the method presented in [5], where the weights of the tags in user and resource profiles are based on TF-IDF values. The second method (denoted by ECIR'10) is a personalized search method from [14], in which the weights of the tags in the user and resource profiles are an aggregation of BM25 values and TF-IDF values. The third method (denoted by CIKM'10) is proposed in [2], in which a new method is proposed to model user and resource profiles. A novel search method using these profiles is also proposed. These three methods represent the current mainstream techniques for handling personalized search in collaborative tagging systems, but they each use different methods to model the user and resource profiles.

7.4. Experimental results on MovieLens

We first compare our method with the baseline methods on the MovieLens data set. Fig. 3 compares the MRR of our method with that of the other methods. According to Fig. 3, TUP obtains the highest MRR value at 0.122, while ECIR'10, SIGIR'08 and CIKM'10 obtain 0.067, 0.072, and 0.102, respectively. Our method thus outperforms the strongest baseline, CIKM'10, by about 20%.

Fig. 4 compares the hitrate of our method and the baseline methods for different n (i.e., sizes of the result set), where hitrate@n means that the size of the returned result set is n. From Fig. 4, we can see that our method outperforms all three compared methods for all values of n (i.e., from 1 to 45). The higher the value of n, the higher the hitrate values all methods obtain. When n = 45 (i.e., 45 resources are returned in the result list), our approach achieves a hitrate of 0.387, which means that 38.7% of users can find the resources they want.

Fig. 5 compares the proposed method (TUP) and the other three methods on the imp metric. Our method outperforms the method of CIKM'10 by 29.03%, the method of ECIR'10 by 30.54%, and SIGIR'08 by 29.21%.

Given Figs. 3–5, we conclude that our method outperforms all the baseline methods for all adopted metrics on the MovieLens data set. We consider that there are two reasons. One is that our method not only models a user's interests but also his or her dislikes. The other is that we use ratings to identify which tags are a user's favorite tags and which tags are his or her non-favorite tags. The experimental results demonstrate the advantage and effectiveness of our method.

Fig. 3. Comparison of the MRR of the proposed and baseline methods (MovieLens).

Fig. 4. Comparison of the hitrate@n of the proposed and baseline methods (MovieLens).

Fig. 5. Comparison of the imp of the proposed and baseline methods (MovieLens).

Fig. 6. Comparison of different numbers of levels for MRR (MovieLens).

Fig. 7. Comparison of different numbers of levels for Hitrate@45 (MovieLens).

Next, to explore the relationship between the number of levels n and personalized search performance, we compared different numbers of levels using the MUP methods on the MovieLens data set. We set n = 3, n = 5 (α = β = 3) and n = 10 (α = 5 and β = 6) to determine the optimal number of levels. According to Fig. 6, the MRR value of MUP@n=5 is 0.118, the MRR value of MUP@n=10 is 0.112, and the highest MRR value, that of TUP, is 0.124, which outperforms both MUP@n=5 and MUP@n=10. Similar to the results for the MRR metric, TUP also obtains a higher value than MUP@n=5 and MUP@n=10 for the hitrate@45 metric, as shown in Fig. 7. Therefore, we conclude that the personalized search performance does not have a positive correlation with the number of levels n, and that TUP is the best of the three methods TUP, MUP@n=5, and MUP@n=10.

The main reason why TUP performs better than MUP@n=5 and MUP@n=10 is that TUP is more consistent with the way humans judge their preferences. A user can easily distinguish between the resources he or she likes and dislikes. However, it is not always easy for him or her to distinguish minor discrepancies between different degrees of preference. For example, a user may rate a resource with a higher value when he or she is in a good mood and may rate the same resource with a lower value when in a bad mood. The degree of like or dislike can be affected by many other factors. However, under any circumstances, a user is able to distinguish most- and least-favorite resources. Furthermore, because the data set we used is very sparse, many tag vectors in the MUP@n=5 and MUP@n=10 user profiles are empty, especially for the MUP@n=10 user profiles.

We then conducted experiments to observe the effect of different weight settings. From Table 1 and Fig. 8, it is easy to tell that the method performs best when the weights of favorite tags (κ1), fused tags (κ2), and non-favorite tags (κ3) are 0.5, 0.4, and 0.1, respectively. When favorite tags and fused tags share the same weight value, it performs a little worse. A similar result is obtained when the weights of favorite tags and non-favorite tags are the same. The performance degrades when all tag types have the same weight of 0.33. Therefore, we draw the conclusion that the highest weight should be given to favorite tags, less to fused tags, and the least to non-favorite tags.

In addition, in order to verify the importance of tag transference, we compared different transference thresholds. Fig. 9 compares the MRR of TUP and TUP@ρ, where ρ = 0.5, 0.6, 0.7, 0.8, and 0.9. The smaller ρ is, the fewer tags will be transferred. From Fig. 9, we can see that TUP outperforms TUP@ρ for all values of ρ (i.e., from ρ = 0.5 to 0.9).

Based on the above results comparing the MRR of TUP and TUP@ρ, we conclude that tag transference can improve the accuracy of user profiling for personalized search. More tag transfers maintain a higher purity of the user's favorite and non-favorite vectors, and thus can more accurately model the user. If we restrain this process, the accuracy of user profiling for personalized search is reduced.

Lastly, we compared different δ values ranging from 0.1 to 1.0. Fig. 10 shows that TUP performs better as δ increases up to 0.9, and that TUP obtains the best result when δ is equal to 0.9. In contrast, when δ = 1.0, which means that we just use the query relevance value and ignore the user interest relevance value, TUP obtains a worse result. From this result, we conclude that the user's personal interest also plays an important role in a personalized search system.


Table 1
Experimental results with different weight settings.

Setting     Favorite tags (|κ1|)   Fused tags (|κ2|)   Non-favorite tags (|κ3|)   MRR       Hitrate@45
Setting 1   0.50                   0.40                0.10                       0.12159   0.38675
Setting 2   0.33                   0.33                0.33                       0.11942   0.37928
Setting 3   0.40                   0.40                0.20                       0.11981   0.38034
Setting 4   0.40                   0.20                0.40                       0.11969   0.38018
Setting 5   0.45                   0.35                0.20                       0.12087   0.38269

Fig. 10. Effect of different δ values on MRR using the MovieLens data set.

Fig. 11. Comparison of the MRR of the proposed and baseline methods (Epinions). [Bar chart; x-axis: SIGIR08, ECIR10, CIKM10, MUP; y-axis: MRR.]

Fig. 8. Effect of different weight settings on MRR. [Chart; x-axis: Setting 1 to Setting 5; y-axis: MRR.]

Fig. 9. MRR of TUP and TUP@ρ on the MovieLens data set. [Chart; x-axis: five TUP@ρ settings and TUP; y-axis: MRR.]


7.5. Experimental results on Epinions

In this subsection, we briefly present the experimental results obtained from Epinions. The performance of the various methods in terms of MRR and hitrate@N is shown in Figs. 11 and 12, respectively. Clearly, the performance of the methods is quite similar to the results we obtained for MovieLens. Specifically, the proposed method (TUP) has the best MRR (0.170) of all four methods. Note that the relationship of performance in MovieLens holds for Epinions (CIKM'10 > ECIR'10 > SIGIR'08), as shown in Fig. 11. Similarly, we can clearly identify the same trends in terms of hitrate@N, as illustrated in Fig. 12. The experimental results on Epinions further verify the effectiveness of the proposed method.

8. Conclusion and future work

In this paper, we focused on personalized resource search by exploiting tag-based user profiles and resource profiles. Additionally, the limitations of recent approaches to profiling users for personalized search in collaborative tagging systems were discussed. Furthermore, we proposed the novel TUP model by integrating tags and ratings. In addition, the model was generalized and extended to n levels (n > 3) for generic cases. The main distinction between existing methods and ours is that we further discriminate the like and dislike tags annotated by users. We conducted experiments using two real-life datasets to validate the effectiveness of the proposed method. We discovered a phenomenon called tag transference in the three-level user profile. By setting different weights to the parameters, we reveal that tag transference is important in user profiling for personalized search. The highest weights should be given to favorite tags, followed by fused tags, and non-favorite tags should be weighted the least. There are several directions for future research. One prominent direction is to apply the proposed user model in commercial recommender systems by further taking domain-specific characteristics (e.g., low-level features of multimedia objects [27–29]). Another plan is to adopt sentiment dictionaries [30,31] and some recent classification techniques [32–34] to increase the accuracy of tag classification.

Fig. 12. Comparison of the hitrate@n of the proposed and baseline methods (Epinions). [Line chart; x-axis: n = 1, 5, 10, 15, 20, 25, 30, 35, 40, 45; series: CIKM10, ECIR10, SIGIR08, MUP.]

Acknowledgments

The work described in this paper was fully supported by National Natural Science Foundation of China (project no. 61300137), the Fundamental Research Funds for the Central Universities, SCUT (No. 2015zm136), and a grant from Research Grants Council of Hong Kong Special Administrative Region, China (UGC/FDS11/E06/14).

References

[1] Y. Cai, H. Han, J. Chen, Y. Shao, H.-F. Leung, H. Min, Integrating tags and ratings into user profiling for personalized search in collaborative tagging systems, in: Proceedings of IEEE 2012 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology (WI-IAT), vol. 1, 2012, pp. 716–723.

[2] Y. Cai, Q. Li, Personalized search by tag-based user profile and resource profile in collaborative tagging systems, in: Proceedings of CIKM'10, CIKM '10, ACM, New York, NY, USA, 2010, pp. 969–978.

[3] H. Xie, Q. Li, Y. Cai, Community-aware resource profiling for personalized search in folksonomy, J. Comput. Sci. Technol. 27 (3) (2012) 599–610.

[4] M.G. Noll, C. Meinel, Web search personalization via social bookmarking and tagging, in: ISWC'07/ASWC'07 Proceedings of the 6th International The Semantic Web and 2nd Asian Conference on Asian Semantic Web Conference, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 367–380.

[5] S. Xu, S. Bao, B. Fei, Z. Su, Y. Yu, Exploring folksonomy for personalized search, in: SIGIR'08 Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, New York, NY, USA, 2008, pp. 155–162.

[6] H. Han, Y. Cai, Y. Shao, Q. Li, Improving recommendation based on features' co-occurrence effects in collaborative tagging systems, in: Q.Z. Sheng, G. Wang, C.S. Jensen, G. Xu (Eds.), APWeb, Lecture Notes in Computer Science, vol. 7235, Springer-Verlag, Berlin, Heidelberg, 2012, pp. 652–659.

[7] X. Luo, Z. Xu, J. Yu, X. Chen, Building association link network for semantic link on web resources, IEEE Trans. Autom. Sci. Eng. 8 (3) (2011) 482–494.

[8] Y. Liu, L. Chen, X. Luo, L. Mei, C. Hu, Z. Xu, Semantic link network based model for organizing multimedia big data, IEEE Trans. Emerg. Top. Comput. (2014) 1.

[9] Y. Liu, Y. Zhu, L.M. Ni, G. Xue, A reliability-oriented transmission service in wireless sensor networks, IEEE Trans. Parallel Distrib. Syst. 22 (12) (2011) 2100–2107.

[10] Y. Liu, Q. Zhang, L.M. Ni, Opportunity-based topology control in wireless sensor networks, IEEE Trans. Parallel Distrib. Syst. 21 (3) (2010) 405–416.

[11] X. Liu, Y. Yang, D. Yuan, J. Chen, Do we need to handle every temporal violation in scientific workflow systems, ACM Trans. Softw. Eng. Methodol. 23 (1) (2014), No. 5 (1–34).

[12] L. Wang, J. Tao, R. Ranjan, H. Marten, A. Streit, J. Chen, D. Chen, G-hadoop: mapreduce across distributed data centers for data-intensive computing, Future Gener. Comput. Syst. 29 (3) (2013) 739–750.

[13] Z. Xu, X. Luo, S. Zhang, X. Wei, L. Mei, C. Hu, Mining temporal explicit and implicit semantic relations between entities using web search engines, Future Generation Computer Systems.

[14] D. Vallet, I. Cantador, J.M. Jose, Personalizing web search with folksonomy-based user and document profiles, in: Proceedings of the 32nd European Conference on Advances in Information Retrieval, ECIR'2010, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 420–431.

[15] Y. Cai, Q. Li, H. Xie, L. Yu, Personalized resource search by tag-based user profile and resource profile, in: L. Chen, P. Triantafillou, T. Suel (Eds.), WISE, Lecture Notes in Computer Science, vol. 6488, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 510–523.

[16] J. Gemmell, A. Shepitsen, B. Mobasher, R. Burke, Personalization in folksonomies based on tag clustering, in: Proceedings of the 6th Workshop on Intelligent Techniques for Web Personalization and Recommender Systems, 2008.

[17] H.-N. Kim, M. Rawashdeh, A. Alghamdi, A. El Saddik, Folksonomy-based personalized search and ranking in social media services, Inf. Syst. 37 (1) (2012) 61–76.

[18] X. Han, Z. Shen, C. Miao, X. Luo, Folksonomy-based ontological user interest profile modeling and its application in personalized search, in: Aijun An, Pawan Lingras, Sheila Petty, Runhe Huang (Eds.), Active Media Technology, Springer, 2010, pp. 34–46.

[19] M.R. Bouadjenek, H. Hacid, M. Bouzeghoub, A. Vakali, Using social annotations to enhance document representation for personalized search, in: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '13, ACM, New York, NY, USA, 2013, pp. 1049–1052.

[20] T. Jin, H. Xie, J. Lei, Q. Li, X. Li, X. Mao, Y. Rao, Finding dominating set from verbal contextual graph for personalized search in folksonomy, in: Proceedings of IEEE 2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), vol. 1, 2013, pp. 367–372.

[21] I. Hsu, et al., Integrating ontology technology with folksonomies for personalized social tag recommendation, Appl. Soft Comput. 13 (8) (2013) 3745–3750.

[22] H. Xie, Q. Li, X. Mao, X. Li, Y. Cai, Q. Zheng, Mining latent user community for tag-based and content-based search in social media, Comput. J. 57 (9) (2014) 1415–1430.

[23] X. You, B. Fang, Y.Y. Tang, Z. He, J. Huang, Locating vessel centerlines in retinal images using wavelet transform: a multilevel approach, in: De-Shuang Huang, Xiao-Ping Zhang, Guang-Bin Huang (Eds.), Advances in Intelligent Computing, Springer, 2005, pp. 688–696.

[24] Z. Zhu, X. You, C.P. Chen, D. Tao, W. Ou, X. Jiang, J. Zou, An adaptive hybrid pattern for noise-robust texture analysis, Pattern Recognit. 48 (8) (2015) 2592–2608.

[25] H.-N. Kim, I. Ha, J.-G. Jung, G.-S. Jo, User preference modeling from positive contents for personalized recommendation, in: Proceedings of the 10th International Conference on Discovery Science, DS'07, Springer-Verlag, Berlin, Heidelberg, 2007, pp. 116–126.

[26] A. Shepitsen, J. Gemmell, B. Mobasher, R. Burke, Personalized recommendation in social tagging systems using hierarchical clustering, in: Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys '08, ACM, New York, NY, USA, 2008, pp. 259–266.

[27] Z.Q. Pan, Y. Zhang, S. Kwong, Efficient motion and disparity estimation optimization for low complexity multiview video coding, IEEE Trans. Broadcast. 61 (2) (2015) 166–176.

[28] Y.H. Zheng, B. Jeon, D.H. Xu, Q.M. Jonathan Wu, H. Zhang, Image segmentation by generalized hierarchical fuzzy C-means algorithm, J. Intell. Fuzzy Syst. 28 (2) (2015) 961–973.

[29] Z.H. Xia, X.H. Wang, X.M. Sun, Q.S. Liu, N.X. Xiong, Steganalysis of LSB matching using differences between nonadjacent pixels, Multimed. Tools Appl. 75 (4) (2016) 1947–1962.

[30] X. Li, H. Xie, L. Chen, J. Wang, X. Deng, News impact on stock price return via sentiment analysis, Knowledge Based Syst. 69 (2014) 14–23.

[31] X. Li, H. Xie, Y. Song, Q. Li, S. Zhu, F. Wang, Does summarization help stock prediction? News impact analysis via summarization, IEEE Intell. Syst. 30 (3) (2015) 26–34.

[32] B. Gu, V.S. Sheng, K.Y. Tay, W. Romano, S. Li, Incremental support vector learning for ordinal regression, IEEE Trans. Neural Netw. Learn. Syst. 26 (7) (2015) 1403–1416.

[33] B. Gu, V.S. Sheng, Z.J. Wang, D. Ho, S. Osman, S. Li, Incremental learning for ν-support vector regression, Neural Netw. 67 (2015) 140–150.

[34] X.Z. Wen, L. Shao, Y. Xue, W. Fang, A rapid learning algorithm for vehicle classification, Inf. Sci. 295 (1) (2015) 395–406.

Qing Du received the Ph.D. degree in computer science from South China University of Technology. She is currently a lecturer in the School of Software Engineering at the South China University of Technology, Guangzhou, China. Her research interests are recommendation systems, user modeling and data mining.


Haoran Xie is an assistant professor at the Hong Kong Institute of Education. He received his PhD and MSc from the Department of Computer Science, City University of Hong Kong, and BEng from the School of Software Engineering, Beijing University of Technology. His research interests include social media, big data, data mining, recommender systems, human computer interaction and e-learning systems.

Yi Cai received the Ph.D. degree in computer science from The Chinese University of Hong Kong. He is currently a professor in the School of Software Engineering at the South China University of Technology, Guangzhou, China. His research interests are recommendation systems, personalized search, semantic web and data mining.

Qing Li is a professor at the City University of Hong Kong. His research interests include object modeling, multimedia databases, social media and recommender systems. He is a Fellow of IET, a senior member of IEEE, a member of ACM SIGMOD and IEEE Technical Committee on Data Engineering. He is the chairperson of the Hong Kong Web Society, and is a steering committee member of DASFAA, ICWL, and WISE Society.

Ho-fung Leung is a professor at the Chinese University of Hong Kong. His research interests include ontologies, intelligent agents, self-organizing systems, complex systems, social computing, service-oriented architectures, and cloud computing. He is a Fellow of BCS, HKIE, a senior member of ACM and IEEE, a full member of HKCS. He was the chairperson of ACM Hong Kong Chapter, and is a Chartered Engineer registered by the Engineering Council.

Huaqing Min is a Professor and the dean of the School of Software Engineering, South China University of Technology, China. His research interests include artificial intelligence, machine learning, databases, data mining and robotics.

Fu Lee Wang is a professor and Vice-President (Research and Advancement) in Caritas Institute of Higher Education. He received B.Eng. and M.Phil. from The University of Hong Kong, MSc from The Hong Kong University of Science and Technology, MBA from Imperial College London, and PhD from The Chinese University of Hong Kong. His research interests include electronic business, information retrieval, information systems and e-learning. Before joining Caritas, he was a faculty member at the City University of Hong Kong. He is Past Chair of ACM Hong Kong Chapter and Chair of IEEE Hong Kong Section Computer Society Chapter. He served as programme/conference chair of a number of international conferences.