Social Tagging Uichin Lee KSE652 Social Computing Systems Design and Analysis.


Survey on Social Tagging Techniques

Manish Gupta, Rui Li, Zhijun Yin, Jiawei Han, SIGKDD Explorations, 2010

What is “tag”?

A tag assignment links a user, a tag, and a resource.

Tag assignments:

• y1: user u1, tag "jazzmusic", resource r1
• y2: user u1, tag "trumpet", resource r3
• y3: user u2, tag "trumpet", resource r3

http://wis.ewi.tudelft.nl/icwe2011/tutorial/tutorial-slides.pptx


Tag photos on Flickr

Tag URLs on Delicious

http://bierdoctor.com/papers/cscw08/ejrader-rwash-tagging-cscw.pdf

Bloggers: WordPress, LiveJournal

Hashtags in Twitter

CiteULike

Tag Design Space

• Tag sharing
• Tag selection/suggestion: how to select/display a set of tags?
• Item ownership:
  – Apply tags to items users created (e.g., photos in Flickr)
  – Apply tags to items others created (e.g., product pages in Amazon)
• Tag scope
  – Broad: <user, item, tag> (personal tag to an item; Delicious)
  – Narrow: <item, tag> (single shared tags to an item; Flickr)
• Other dimensions: tag delimiter (one or multiple words), how to normalize tags across factors like letter case, white space, etc.
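The difference between broad and narrow tag scope, and the effect of normalization, can be sketched with two small data structures. This is only an illustration: the `normalize` policy (lower-casing and collapsing white space) and the helper names are assumptions, not how Delicious or Flickr store tags. The user and resource ids reuse the u1/r1 style of the tag assignment example.

```python
from collections import defaultdict

def normalize(tag):
    """One possible normalization policy: lower-case, collapse white space."""
    return " ".join(tag.lower().split())

# Broad scope: remember who applied each tag -> set of (user, item, tag) triples.
broad = set()
# Narrow scope: a single shared tag set per item -> item maps to a set of tags.
narrow = defaultdict(set)

def add_tag(user, item, tag):
    """Record one tag assignment under both scopes (hypothetical helper)."""
    t = normalize(tag)
    broad.add((user, item, t))
    narrow[item].add(t)

add_tag("u1", "r1", "JazzMusic")
add_tag("u1", "r3", "Trumpet")
add_tag("u2", "r3", " trumpet ")  # same tag as above after normalization
```

Under broad scope the two "trumpet" assignments on r3 remain distinct because they come from different users; under narrow scope they collapse into one shared tag.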

tagging, communities, vocabulary, evolution, Sen et al., CSCW 2006

Applications

• Indexing: faster/deeper indexing (e.g., delicious)
• Search: social and semantic expansions for web search; personalized search; enterprise search; searching library catalogues
• Enhanced browsing: tag clouds; popularity driven browsing, filtering
• Taxonomy generation (e.g., folksonomy)
• Clustering/classification: clustering/classifying web objects (or blog entries) [tag + text if any]
• Social interest discovery: user interest profiling, discovering current popular places/events (e.g., Flickr)
• Recommendation/personalization

Contents

• Taxonomy? Folksonomy?
• Tagging Motivations
• Tag Types
• Linguistic Classification
• Tag Generation Models
• Tag Distributions
• Tag Semantics
• Tag Visualization

Taxonomy? Folksonomy?

• Problems with metadata generation and fixed taxonomies
  – Manual, expensive, different vocabulary
  – Fixed static taxonomies are rigid, conservative, and centralized
  – “Post activation analysis paralysis” (Sinha 2005)
    • A state of fear that you will make the wrong decision. And the item will be lost forever - it will land in some deep well, some hard to access branch of the tree and disappear from your view and attention.
• Folksonomies as a solution
  – Folksonomy: folk (people) + taxis (classification) + nomos (management)
  – Emergent and iterative system

Tagging Motivations

• (easing) Future Retrieval (e.g., toread)
• Contribution and Sharing
• Attract Attention (if popular)
• Play and Competition (e.g., ESP games)
• Self Referential Tags (mystuff, myLaptop)
• Opinion Expression
• Task Organization (e.g., gtd, jobsearch)
• Social Signaling (contextual info about an object)
• Money (e.g., tagging tasks in M-Turk)
• Technological Ease (e.g., Phonetags)

Tagging Motivations in Flickr

Why we tag: motivations for annotation in mobile and online media, M. Ames, and M. Naaman, CHI 2007

[Figure panels: ZoneTag, Flickr]




What Drives Content Tagging: The Case of Photos on Flickr, Oded Nov, Mor Naaman, Chen Ye, CHI 2008



[Figure: model of the number of tags (R2 = .571), with variables from the survey and from usage data (Flickr API)]

Tag Types

• Content-Based tags (autos, Honda, batman, Lucene)
• Context-Based tags (location, time)
• Attribute tags (Jeremy’s Blog) / qualities or characteristics
• Ownership tag; identifying who owns the resource
• Subjective tags (opinion, emotion)
• Organizational tags (mywork, mypaper)
• Purpose tags (related to info seeking, e.g., “learn_LATEX”)
• Factual tags (people, place, concepts)
• Personal tags
• Self-referential tags
• Tag bundles (tagging tags)

Linguistic Classification

• Functional (describing functions; e.g., weapon)
• Functional collocation (function + place/time; e.g., furniture, tableware)
• Origin collocation (why things are together; e.g., dirty dishes)
• Function or origin (e.g., “Michelangelo”, “medieval”)
• Taxonomic (classifying objects)
• Adjective (e.g., red, great, funny, beautiful)
• Verb (action; e.g., “explore”, “todo”, “jumping”)
• Proper name (e.g., “New Zealand”)

Tag Generation Models

• Factors
  – Users’ background knowledge
  – Previous tags suggested by others
  – Content of the resources
  – Community influences
  – Tag selection algorithm
  – And others

Tag Generation Models

• Basic Polya Urn Model
  – Captures popularity of assigned tags but does not consider new tags
• Yule Simon Model
  – With probability p a new word is added; with probability 1-p an existing word is reused, chosen with probability proportional to its frequency (leading to a power-law distribution)
• Information value based model
  – Previous tag assignments vs. information value of a tag
• More parameters
  – User background knowledge, number of previous tags the user has accessed, most popular tags
• Language model
  – Content affects tag generation (tagging ~ language model)
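The Yule Simon process can be simulated in a few lines. The sketch below is a minimal illustration under stated assumptions: tags are represented by integer ids, and the function name, the value p = 0.05, and the seed are illustrative choices rather than values from the survey. Picking a uniformly random earlier tag application is what makes an existing tag's reuse probability proportional to its frequency.

```python
import random
from collections import Counter

def yule_simon(n_steps, p, seed=0):
    """Simulate n_steps tag applications; returns the stream of tag ids."""
    rng = random.Random(seed)
    stream = [0]      # the first application introduces tag id 0
    next_id = 1
    for _ in range(n_steps - 1):
        if rng.random() < p:
            stream.append(next_id)             # brand-new tag
            next_id += 1
        else:
            # Copying a random past application selects each existing tag
            # with probability proportional to how often it was used.
            stream.append(rng.choice(stream))
    return stream

stream = yule_simon(10_000, p=0.05)
counts = Counter(stream)
ranked = sorted(counts.values(), reverse=True)  # rank-frequency data
```

Plotting `ranked` on log-log axes is one way to inspect the heavy-tailed rank-frequency shape the model is meant to produce.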

Tags: rank – frequency plot

Tag Distributions

• Vocabulary growth over time follows a power law (at both the system and the resource level)
  – N(t) ~ t^r, with r < 1
  – dN(t)/dt ~ t^(r-1); new tags appear less and less frequently as time passes
• A user’s set of distinct tags grows linearly as new resources are added, but user vocabulary growth tends to decline over time
• The vocabulary rank-frequency distribution follows a power law
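If N(t) ~ t^r, then log N(t) is linear in log t with slope r, so the exponent can be estimated with an ordinary least-squares fit in log-log space. The sketch below uses synthetic data generated with r = 0.7 purely to check the fit; the data, the helper name, and the value 0.7 are assumptions for illustration, not measurements from Cattuto et al.

```python
import math

def fit_exponent(ts, ns):
    """Least-squares slope of log N(t) against log t."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(n) for n in ns]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

ts = [10, 100, 1_000, 10_000]
ns = [t ** 0.7 for t in ts]            # synthetic vocabulary sizes, r = 0.7
r = fit_exponent(ts, ns)

# Growth rate dN/dt = r * t^(r-1): shrinks over time whenever r < 1.
rates = [r * t ** (r - 1) for t in ts]
```

With r < 1 the values in `rates` decrease as t grows, which is the "new tags appear less and less frequently" statement in numeric form.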

Temporal evolution: total # of distinct tags

Vocabulary growth in collaborative tagging systems, Cattuto et al., 2007

Tag Semantics

• Analysis of pairwise relation between tags (inter-tag relation graphs)
• Semantic tag classification (ClassTag)
  – Mapping tags onto WordNet semantic categories
  – Additionally using Wikipedia articles

ClassTag: Classifying Tags using Open Content Resources, Overell et al., WSDM 2009

Tag Semantics

• Tags vs. keywords
  – Most important words (e.g., tf or tf*idf) of the document are generally covered by the tags
  – Missing keywords are often misspelled

Tag-based Social Interest Discovery, Li et al., WWW 2008

An example of the tf and tf×idf keywords and user-generated tags of a user-saved URL

(all tags attached to this URL by all users)


Tag coverage for tf keywords

Tag coverage for tf×idf keywords
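Tag coverage of keywords can be made concrete as the fraction of a document's top-k keywords that also appear among its tags. The sketch below ranks keywords by raw term frequency (tf) only; a tf×idf ranking would additionally need document frequencies from a corpus. The tokenization, the function names, and the toy document are assumptions for illustration, not the procedure or data of Li et al.

```python
from collections import Counter

def top_tf_keywords(text, k):
    """Top-k words by raw term frequency (white-space tokenization)."""
    words = text.lower().split()
    return [w for w, _ in Counter(words).most_common(k)]

def coverage(keywords, tags):
    """Fraction of the keywords that also occur as tags."""
    tagset = {t.lower() for t in tags}
    return sum(1 for w in keywords if w in tagset) / len(keywords)

doc = "python python python tutorial tutorial code"   # toy document
tags = ["Python", "tutorial", "programming"]           # toy user tags
keywords = top_tf_keywords(doc, 3)
cov = coverage(keywords, tags)   # 2 of the 3 keywords are covered
```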

Tag Visualization

• Tag clouds for browsing/searching
  – Useful for broad search (less cognitive load); but less useful for specific search
  – Disadvantages: skewness towards popular items; multiple clicks; low recall
• Tag selection for tag clouds
  – Due to limited screen space, select tags with higher resource coverage (representativeness, volume)
  – When displaying tags, we can cluster them based on semantic relationship
• Tag evolution visualization
  – Temporal evolution of tags; merging data from multiple time intervals (e.g., tagline)
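One way to pick tags with high resource coverage for a space-limited tag cloud is a greedy maximum-coverage heuristic: repeatedly take the tag that covers the most resources not yet covered. This is a swapped-in, generic technique used here as a sketch; the slides do not say which selection algorithm is meant, and the function name and toy data are assumptions.

```python
def select_tags(tag_to_resources, k):
    """Greedily choose up to k tags that maximize the number of covered resources."""
    covered = set()
    chosen = []
    remaining = dict(tag_to_resources)
    while remaining and len(chosen) < k:
        # Ties are broken alphabetically because max() keeps the first maximum.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            break                      # no remaining tag adds new resources
        covered |= remaining.pop(best)
        chosen.append(best)
    return chosen, covered

tag_to_resources = {                   # toy data: tag -> ids of tagged resources
    "music": {1, 2, 3, 4},
    "jazz": {1, 2},
    "trumpet": {5, 6},
    "live": {4, 5},
}
chosen, covered = select_tags(tag_to_resources, k=2)
```

Here "jazz" is skipped even though it is fairly frequent, because every resource it tags is already reachable through "music".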

tagging, communities, vocabulary, evolution

Shilad Sen, Shyong K. (Tony) Lam, Al Mamunur Rashid, Dan Cosley, Dan Frankowski, Jeremy Osterhouse, F. Maxwell Harper, John Riedl

CSCW 2006

Relationship between community influence and user tendency (preference, knowledge)

Tagging in MovieLens

MovieLens movie list with tags

“Movie details page” tag display

Adding tags with auto-complete

Research Questions

• How strongly do investment and habit affect personal tagging behavior?

• How does the tagging community influence personal vocabulary?

• How does the tag selection algorithm affect users’ satisfaction with the system?

Experiment Setup

• Randomly assigned users who logged in to MovieLens during the experiment to one of four experimental groups
  – Unshared
  – Shared (randomly selected tags)
  – Shared-pop (most popular tags)
  – Shared-rec (recommend tags most commonly applied to both the target movie and to the most similar movies to the target movie)
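The three shared conditions differ only in how the displayed tags are selected, which can be sketched as three small functions. This is a rough reading of the group descriptions, not the MovieLens implementation: pooling tag counts over the target and its similar movies is an assumed interpretation of shared-rec, the list of similar movies is taken as given, and all names and data are hypothetical.

```python
import random
from collections import Counter

def shared_random(applications, k, rng):
    """Shared: k randomly selected distinct tags."""
    distinct = sorted(set(applications))
    return rng.sample(distinct, min(k, len(distinct)))

def shared_pop(applications, k):
    """Shared-pop: the k most frequently applied tags."""
    return [t for t, _ in Counter(applications).most_common(k)]

def shared_rec(tags_by_movie, target, similar_movies, k):
    """Shared-rec: most common tags over the target and its similar movies."""
    pooled = list(tags_by_movie.get(target, []))
    for movie in similar_movies:
        pooled += tags_by_movie.get(movie, [])
    return shared_pop(pooled, k)

tags_by_movie = {                       # toy tag applications per movie
    "m1": ["classic", "funny", "classic"],
    "m2": ["funny", "funny", "pixar"],
}
pop = shared_pop(tags_by_movie["m1"], 1)
rec = shared_rec(tags_by_movie, "m1", ["m2"], 1)
rnd = shared_random(tags_by_movie["m1"], 2, random.Random(0))
```

In the toy data the popular tag for m1 is "classic", while pooling with the similar movie m2 makes "funny" the recommended tag.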

Overall Tag Usage

• Overall tag usage statistics by experimental group

The tags column overall total is smaller than the sum of the groups, because two groups might independently use the same tag

Tag Classification

• Factual tags identify “facts” about a movie such as people, places, or concepts. Help to describe movies and find related movies
• Subjective tags express user opinions related to a movie.
• Personal tags are most often used to organize a user’s movies (item ownership, self-reference, task organization)

63% factual, 29% subjective, 3% personal

How strongly do investment and habit affect personal tagging behavior?

• Similarity of the tag class of the nth tag applied by a user to
  – the tag class distribution of other tags applied by the user before the nth tag (applied)
  – the tag class distribution of tags viewed by the user (viewed)
  – the uniform tag class distribution (uniform)
• Example (cosine similarity): x(nth) = [0, 1, 0] (fact, sub, per)
  – y(1~n-1, applied or viewed) = [0.62, 0.25, 0.13] => sim(x, y) = 0.37
  – y(uniform) = [1/3, 1/3, 1/3] => sim(x, y) = 0.58
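The two example values can be reproduced by reading the similarity as cosine similarity between the one-hot class vector of the nth tag and a class distribution. The sketch assumes the history distribution is [0.62, 0.25, 0.13], which sums to 1 and yields the stated 0.37; the function name is illustrative.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x = [0, 1, 0]                    # nth tag is subjective: (fact, sub, per)
history = [0.62, 0.25, 0.13]     # assumed applied/viewed class distribution
uniform = [1 / 3, 1 / 3, 1 / 3]  # baseline

sim_history = cosine(x, history)
sim_uniform = cosine(x, uniform)
```

A similarity to the history that exceeds the uniform baseline would indicate that the new tag's class follows what the user applied or viewed before; in this example it falls below the baseline.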


• Both habit/investment and tags viewed appear to influence the class of applied tags.


• Probability that a user’s nth applied tag is a new tag decreases over time

How does the tag selection algorithm affect users’ satisfaction with the system?

• Final tag application class distribution by experimental group

The dominant tag class for each group is bolded. (Each row sums to 100%.)

How does the tag selection algorithm affect users’ satisfaction with the system?

[Figure: four panels (Unshared, Shared, Shared Popular, Shared Recommendation) showing the fraction of tag applications against the group tag application number for factual, subjective, and personal tags]

Tag Suggestion: User Satisfaction

• Participants didn’t like intrusive tag suggestions (e.g., a popup after rating a movie)
• Participants didn’t like the inference algorithm either
  – Wrong inferences confuse users
    • e.g., the suggested tag “small town” for the movie “Swiss Family Robinson”: “I’m confused – I thought it was about people on a deserted island??”
• Yet the suggestion algorithm worked well in terms of displaying a higher number of tags
  – Pervasiveness may lead users to tag more in general

Influences on Tag Choices in del.icio.us

Emilee Rader and Rick Wash
School of Information, University of Michigan

CSCW 2008

http://bierdoctor.com/papers/cscw08/



Understanding Tagging Process

• Social Hypothesis: Users’ tag choices are influenced by the tag choices of others

• Organizing Hypothesis: Users’ tag choices are personal and idiosyncratic, not influenced by others’ tag choices

Wash and Rader (2007)

• Respondents generally used one or more heuristics for choosing tags:
  – Reuse tags they have applied before to other web pages
  – Create and adhere to mental rules or definitions for specific tags
  – Choose terms they imagine using to re-find bookmarks in the future

Research Question

• Look for a connection between the small scale (individual tag choices) and the large scale (aggregate patterns) for tags on del.icio.us.
• Hypotheses
  – Imitation: Users imitate tags that previous users have applied to a web page
  – Organizing: Users re-use tags that they have applied to other web pages
  – Recommended: Users choose tags that are suggested via the del.icio.us posting interface

Regression

• Logistic regression model: tag chosen = f(used.onSite, used.byUser, interaction, tag dummies, random effect(user))
  – used.onSite: imitation
  – used.byUser: organizing
  – interaction: onSite = 1 and byUser = 1 (i.e., recommendation)
• Data set:
  – Randomly chose 30 web pages from the sample that had been bookmarked by at least 100 users
  – In June 2007, downloaded the complete public bookmark and tag histories of all of the approximately 12,000 users who had ever bookmarked any of these 30 web pages
  – Result: complete tag histories for the 30 web pages bookmarked in del.icio.us, plus tag histories for all users who had ever bookmarked any of those pages as of June 2007
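The three predictors of the regression can be derived from two histories: the tags already on the page and the tags the user has applied elsewhere. The sketch below shows only this feature construction, with hypothetical names and toy histories; the tag dummies, the per-user random effect, and the model fitting itself are left out.

```python
def tag_features(tag, page_tags_before, user_tags_before):
    """Binary predictors for one candidate tag at the time of a bookmark."""
    on_site = int(tag in page_tags_before)   # imitation: already on the page
    by_user = int(tag in user_tags_before)   # organizing: user applied it before
    return {
        "used_on_site": on_site,
        "used_by_user": by_user,
        "interaction": on_site * by_user,    # both hold at once
    }

page_tags_before = {"python", "tutorial"}    # toy page history
user_tags_before = {"python", "toread"}      # toy user history

rows = {t: tag_features(t, page_tags_before, user_tags_before)
        for t in ["python", "tutorial", "toread", "misc"]}
```

Each row would then be paired with a 0/1 outcome recording whether the user actually chose that tag for the bookmark.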

Regression Results

• The organizing hypothesis is strongly supported; the social (imitation) and recommendation mechanisms have less influence in Delicious

Summary

• Applications; Motivations
• Tag Types; Linguistic Classification
• Tag Generation Models; Tag Distributions
• Tag Semantics; Tag Visualization; Tag Design Space
• Relationship between community influence and personal tendency
  – Influenced by personal tagging behavior and tag selection algorithm (community input)
  – Tag class distribution differs widely across different groups
  – Quality of tag recommendation matters
• The tagging process is mainly driven by information organizing behavior (i.e., personal tendency) on the Delicious web site.
