Social Tagging Uichin Lee KSE652 Social Computing Systems Design and Analysis.
Social Tagging
Uichin Lee, KSE652 Social Computing Systems
Design and Analysis
Survey on Social Tagging Techniques
Manish Gupta, Rui Li, Zhijun Yin, Jiawei Han, SIGKDD Explorations, 2010
What is “tag”?
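• A tag assignment relates a user, a tag, and a resource:
– y1: user u1 applies tag “jazzmusic” to resource r1
– y2: user u1 applies tag “trumpet” to resource r3
– y3: user u2 applies tag “trumpet” to resource r3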
http://wis.ewi.tudelft.nl/icwe2011/tutorial/tutorial-slides.pptx
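Because every tag assignment is a (user, tag, resource) triple, a tagging system can be stored as a plain set of triples and indexed in whichever direction is needed. The minimal sketch below encodes the slide's three assignments y1 to y3; the class name `TagAssignment` and the two helper functions are hypothetical names chosen for illustration, not part of any cited system.

```python
from collections import defaultdict
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One tag assignment y = (user, tag, resource)."""
    user: str
    tag: str
    resource: str


# The three example assignments from the slide.
Y = [
    TagAssignment("u1", "jazzmusic", "r1"),  # y1
    TagAssignment("u1", "trumpet", "r3"),    # y2
    TagAssignment("u2", "trumpet", "r3"),    # y3
]


def tags_per_resource(assignments):
    """Map each resource to the set of tags any user assigned to it."""
    index = defaultdict(set)
    for y in assignments:
        index[y.resource].add(y.tag)
    return dict(index)


def users_per_tag(assignments):
    """Map each tag to the set of users who applied it."""
    index = defaultdict(set)
    for y in assignments:
        index[y.tag].add(y.user)
    return dict(index)
```

For this data, `tags_per_resource(Y)` maps r3 to the single tag "trumpet" even though two users applied it; the per-user detail stays in `Y` itself.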
Tag photos on Flickr
Tag URLs on Delicious
http://bierdoctor.com/papers/cscw08/ejrader-rwash-tagging-cscw.pdf
Bloggers: Wordpress, LiveJournal
Hash tags in Twitter
Citeulike
Tag Design Space
• Tag sharing
• Tag selection/suggestion: how to select/display a set of tags?
• Item ownership:
– Apply tags to items users created (e.g., photos in Flickr)
– Apply tags to items others created (e.g., product pages in Amazon)
• Tag scope
– Broad: <user, item, tag> (each user's personal tags on an item; Delicious)
– Narrow: <item, tag> (a single shared set of tags per item; Flickr)
• Other dimensions: tag delimiter (one or multiple words per tag), and how to normalize tags with respect to letter case, white space, etc.
tagging, communities, vocabulary, evolution, Sen et al., CSCW 2006
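The scope and normalization dimensions above can be made concrete in a few lines. The sketch below assumes one illustrative normalization rule (case-fold, trim, join internal whitespace with "_"); real systems choose different rules. It shows how the delimiter choice decides whether "live music" is one tag or two, and how a broad folksonomy keeps per-user triples while a narrow one collapses them to shared <item, tag> pairs. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Optional


def normalize_tag(raw: str) -> str:
    """Case-fold, trim, and join internal whitespace with '_' (illustrative rules)."""
    return re.sub(r"\s+", "_", raw.strip().casefold())


def split_tags(field: str, delimiter: Optional[str] = None) -> list:
    """Split a tag-entry field into normalized tags.

    delimiter=None splits on whitespace (one word per tag);
    delimiter="," allows multi-word tags such as "live music".
    """
    return [normalize_tag(part) for part in field.split(delimiter) if part.strip()]


def broad(posts):
    """Broad folksonomy: keep every <user, item, tag> triple."""
    return {(user, item, normalize_tag(tag)) for user, item, tag in posts}


def narrow(posts):
    """Narrow folksonomy: one shared <item, tag> pair, whoever added it."""
    return {(item, normalize_tag(tag)) for _, item, tag in posts}


def tag_popularity(triples, item):
    """Per-item tag counts; only meaningful when user identity is kept (broad)."""
    return Counter(tag for _, i, tag in triples if i == item)
```

With posts from two users who both tag the same page "Jazz" and " jazz ", the broad set keeps two triples (so popularity is observable), while the narrow set keeps one pair.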
Applications
• Indexing: faster/deeper indexing (e.g., Delicious)
• Search: social and semantic expansions for web search; personalized search; enterprise search; searching library catalogues
• Enhanced browsing: tag clouds; popularity-driven browsing, filtering
• Taxonomy generation (e.g., folksonomy)
• Clustering/classification: clustering/classifying web objects (or blog entries) [tag + text if any]
• Social interest discovery: user interest profiling, discovering current popular places/events (e.g., Flickr)
• Recommendation/personalization
Contents
• Taxonomy? Folksonomy?
• Tagging Motivations
• Tag Types
• Linguistic Classification
• Tag Generation Models
• Tag Distributions
• Tag Semantics
• Tag Visualization
Taxonomy? Folksonomy?
• Problems with metadata generation and fixed taxonomies
– Manual, expensive, different vocabulary
– Fixed static taxonomies are rigid, conservative, and centralized
– “Post activation analysis paralysis” (Sinha 2005)
• A state of fear that you will make the wrong decision. And the item will be lost forever - it will land in some deep well, some hard to access branch of the tree and disappear from your view and attention.
• Folksonomies as a solution
– Folksonomy: folk (people) + taxis (classification) + nomos (management)
– Emergent and iterative system
Tagging Motivations
• (Easing) Future Retrieval (e.g., toread)
• Contribution and Sharing
• Attract Attention (if popular)
• Play and Competition (e.g., ESP games)
• Self-Referential Tags (mystuff, myLaptop)
• Opinion Expression
• Task Organization (e.g., gtd, jobsearch)
• Social Signaling (contextual info about an object)
• Money (e.g., tagging tasks in M-Turk)
• Technological Ease (e.g., Phonetags)
Tagging Motivations in Flickr
Why we tag: motivations for annotation in mobile and online media, M. Ames, and M. Naaman, CHI 2007
[Figure: ZoneTag and Flickr]
Tagging Motivations in Flickr
What Drives Content Tagging: The Case of Photos on Flickr, Oded Nov, Mor Naaman, Chen Ye, CHI 2008
[Figure: number of tags (R² = .571); measures from the survey and from usage data (Flickr API)]
Tag Types
• Content-based tags (autos, Honda, batman, Lucene)
• Context-based tags (location, time)
• Attribute tags: qualities or characteristics (e.g., Jeremy’s Blog)
• Ownership tags: identify who owns the resource
• Subjective tags (opinion, emotion)
• Organizational tags (mywork, mypaper)
• Purpose tags (related to information seeking, e.g., “learn_LATEX”)
• Factual tags (people, places, concepts)
• Personal tags
• Self-referential tags
• Tag bundles (tagging tags)
Linguistic Classification
• Functional (describing functions; e.g., weapon)
• Functional collocation (function + place/time; e.g., furniture, tableware)
• Origin collocation (why things are together; e.g., dirty dishes)
• Function or origin (e.g., “Michelangelo”, “medieval”)
• Taxonomic (classifying objects)
• Adjective (e.g., red, great, funny, beautiful)
• Verb (action; e.g., “explore”, “todo”, “jumping”)
• Proper name (e.g., “New Zealand”)
Tag Generation Models
• Factors
– Users’ background knowledge
– Previous tags suggested by others
– Content of the resources
– Community influences
– Tag selection algorithm
– And others
Tag Generation Models
• Basic Polya urn model
– Captures the popularity of assigned tags but does not account for new tags
• Yule–Simon model
– With probability p a new tag is introduced; with probability 1 − p an existing tag is reused, chosen in proportion to its frequency so far (leading to a power-law distribution)
• Information-value-based model
– Previous tag assignments vs. the information value of a tag
• Models with more parameters
– User background knowledge, number of previous tags the user has accessed, most popular tags
• Language model
– Content affects tag generation (tagging ~ language model)
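The Yule–Simon process described above is short enough to simulate directly. In this sketch (function and parameter names are hypothetical), reusing an existing tag "in proportion to its frequency" is implemented by picking uniformly from the full history of past tag applications, since a tag used k times appears k times in that list. Setting p = 0 leaves only rich-get-richer reinforcement, which is the basic Polya urn behavior of never adding new tags.

```python
import random
from collections import Counter


def yule_simon_stream(n_steps, p, initial=("tag0",), seed=0):
    """Simulate a tag stream under the Yule-Simon process.

    Starts from the tags in `initial` (must be non-empty when p < 1), then
    performs n_steps draws. At each draw, with probability p a brand-new tag
    is created; otherwise an existing tag is reused with probability
    proportional to its current frequency (uniform choice over the history).
    With p = 0 no new tag ever appears, as in the basic Polya urn.
    """
    rng = random.Random(seed)
    history = list(initial)
    new_id = 0
    for _ in range(n_steps):
        if rng.random() < p:
            new_id += 1
            history.append(f"new{new_id}")
        else:
            history.append(rng.choice(history))
    return history


# Example run: a small innovation rate yields a few very frequent tags
# and a long tail of rare ones.
stream = yule_simon_stream(20_000, p=0.05)
top = Counter(stream).most_common(5)
```

Plotting `Counter(stream)` counts against rank on log-log axes is a quick way to see the heavy-tailed shape the model predicts.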
Tags: rank – frequency plot
Tag Distributions
• Vocabulary growth over time follows a power law (at both the system and the resource level)
– $N(t) \sim t^{r}$, with $r < 1$
– $dN(t)/dt \sim t^{r-1}$: new tags appear less and less frequently as time passes
• A user’s set of distinct tags grows linearly as new resources are added, but a user’s vocabulary growth tends to decline over time
• Vocabulary rank–frequency follows a power law
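Both claims above can be checked on any tag stream: compute N(t), the number of distinct tags after t assignments, and estimate r as the slope of log N(t) against log t. The sketch below uses an ordinary least-squares fit on log-log values, which is a quick diagnostic under that assumption, not a rigorous power-law test; the function names are hypothetical.

```python
import math
from collections import Counter


def vocabulary_growth(stream):
    """N(t): number of distinct tags after the first t tag assignments."""
    seen = set()
    growth = []
    for tag in stream:
        seen.add(tag)
        growth.append(len(seen))
    return growth


def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log(y) on log(x); estimates r if y ~ x^r.

    Needs at least two distinct, positive x values and positive y values.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def rank_frequency(stream):
    """Tag frequencies sorted from most to least frequent (rank 1 first)."""
    return [count for _, count in Counter(stream).most_common()]
```

For a recorded stream, `fit_power_law_exponent(range(1, len(g) + 1), g)` with `g = vocabulary_growth(stream)` gives an estimate of r; a value below 1 corresponds to the sublinear growth on the slide.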
Temporal evolution: total # of distinct tags
Vocabulary growth in collaborative tagging systems, Cattuto et al., 2007
Tag Semantics
• Analysis of pairwise relations between tags (inter-tag relation graphs)
• Semantic tag classification (ClassTag)
– Mapping tags onto WordNet semantic categories
– Additionally using Wikipedia articles
ClassTag: Classifying Tags using Open Content Resources, Overell et al., WSDM 2009
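One common way to build an inter-tag relation graph is from co-occurrence: two tags are linked when they appear in the same post, and the edge is weighted by a normalized co-occurrence score. The sketch below assumes posts are given as tag collections and uses cosine similarity over binary post vectors; it is an illustration of the pairwise-relation idea, not the ClassTag method (which maps tags to WordNet and Wikipedia and is not shown here). Names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(posts):
    """Count tag frequencies and unordered tag-pair co-occurrences.

    posts: iterable of tag collections, one per (user, resource) post.
    Returns (tag_freq, pair_freq); pair keys are alphabetically sorted tuples.
    """
    tag_freq = Counter()
    pair_freq = Counter()
    for tags in posts:
        unique = sorted(set(tags))
        tag_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return tag_freq, pair_freq


def cosine(a, b, tag_freq, pair_freq):
    """Cosine of two distinct tags' binary post vectors: co(a,b) / sqrt(f(a) f(b))."""
    if not tag_freq[a] or not tag_freq[b]:
        return 0.0
    key = tuple(sorted((a, b)))
    return pair_freq[key] / math.sqrt(tag_freq[a] * tag_freq[b])
```

Edges whose cosine exceeds a chosen threshold then form the relation graph; the threshold is a tuning choice.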
Tag Semantics
• Tags vs. keywords
– The most important words (e.g., by tf or tf×idf) of a document are generally covered by its tags
– Missing keywords are often misspelled
Tag-based Social Interest Discovery, Li et al., WWW 2008
An example of the tf and tf×idf keywords and user-generated tags of a user-saved URL
(all tags attached to this URL by all users)
Tag coverage for tf keywords
Tag coverage for tf×idf keywords
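The coverage measurement can be sketched as: extract a document's top-k words by tf or by tf×idf, then report the fraction of those keywords that also occur among the URL's tags. This sketch assumes a naive lowercase alphanumeric tokenizer with no stop-word removal and idf = log(N / df); the actual preprocessing in Li et al. may differ. Function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase alphanumeric runs (no stop-word removal)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_tf(doc, k):
    """Top-k words of doc by raw term frequency."""
    return [w for w, _ in Counter(tokenize(doc)).most_common(k)]


def top_tfidf(doc, corpus, k):
    """Top-k words of doc by tf * log(N / df); corpus must contain doc."""
    n = len(corpus)
    df = Counter()
    for d in corpus:
        df.update(set(tokenize(d)))
    tf = Counter(tokenize(doc))
    scores = {w: c * math.log(n / df[w]) for w, c in tf.items()}
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def tag_coverage(keywords, tags):
    """Fraction of keywords that also appear among the URL's tags."""
    if not keywords:
        return 0.0
    tagset = {t.lower() for t in tags}
    return sum(w in tagset for w in keywords) / len(keywords)
```

In a three-document corpus where "jazz" occurs in two documents and "trumpet" in one, tf ranks "jazz" first for a document that repeats it, while tf×idf can prefer the rarer "trumpet".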
Tag Visualization
• Tag clouds for browsing/searching
– Useful for broad search (less cognitive load), but less useful for specific search
– Disadvantages: skew towards popular items; multiple clicks; low recall
• Tag selection for tag clouds
– Due to limited screen space, select tags with higher resource coverage (representativeness, volume)
– When displaying tags, we can cluster them based on semantic relationships
• Tag evolution visualization
– Temporal evolution of tags; merging data from multiple time intervals (e.g., tagline)
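Selecting cloud tags by resource coverage can be framed as a maximum-coverage problem. The sketch below uses the standard greedy heuristic (repeatedly pick the tag that covers the most not-yet-covered resources) and then maps frequencies to font sizes on a log scale. Both the greedy choice and the log scaling with a 10 to 32 pt range are assumptions for illustration, not methods stated on the slide; names are hypothetical.

```python
import math


def select_tags_by_coverage(tag_to_resources, k):
    """Greedily pick up to k tags covering the most not-yet-covered resources.

    Ties are broken alphabetically; stops early when no tag adds coverage.
    """
    covered = set()
    chosen = []
    candidates = dict(tag_to_resources)
    while candidates and len(chosen) < k:
        # max() returns the first maximal item, so sorting gives alphabetical ties.
        best = max(sorted(candidates), key=lambda t: len(candidates[t] - covered))
        if not candidates[best] - covered:
            break
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen


def font_sizes(freq, min_pt=10, max_pt=32):
    """Map tag frequency (>= 1) to a font size on a log scale.

    freq must be non-empty; equal frequencies all get min_pt.
    """
    logs = {t: math.log(f) for t, f in freq.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0
    return {t: round(min_pt + (v - lo) / span * (max_pt - min_pt)) for t, v in logs.items()}
```

The greedy step favors representativeness: a popular tag whose resources are already covered (here, one that only overlaps an earlier pick) is skipped in favor of a less popular tag that reaches new resources, which partly counters the skew towards popular items.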
tagging, communities, vocabulary, evolution
Shilad Sen, Shyong K. (Tony) Lam, Al Mamunur Rashid, Dan Cosley, Dan Frankowski, Jeremy Osterhouse, F. Maxwell Harper, John Riedl
CSCW 2006
Relationship between community influence and user tendency
(preference, knowledge)
Tagging in MovieLens
MovieLens movie list with tags
“Movie details page” tag display
Adding tags with auto-complete
Research Questions
• How strongly do investment and habit affect personal tagging behavior?
• How does the tagging community influence personal vocabulary?
• How does the tag selection algorithm affect users’ satisfaction with the system?
Experiment Setup
• Users who logged in to MovieLens during the experiment were randomly assigned to one of four experimental groups
– Unshared
– Shared (randomly selected tags)
– Shared-pop (most popular tags)
– Shared-rec (recommended tags: those most commonly applied to the target movie and to the movies most similar to it)
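The three shared conditions differ only in which existing tags are displayed. The sketch below is one hedged reading of the setup: it assumes "random" and "popular" draw from the tags already applied to the target movie, and that shared-rec pools tag counts from the target movie and a supplied list of similar movies. How similar movies are found is out of scope here, so `similar` is a hypothetical input, as are all function names.

```python
import random
from collections import Counter


def shared_random(movie_tags, movie, n, rng):
    """Shared: n randomly selected tags among those applied to the movie."""
    tags = sorted(set(movie_tags.get(movie, [])))
    return rng.sample(tags, min(n, len(tags)))


def shared_pop(movie_tags, movie, n):
    """Shared-pop: the n tags applied most often to the movie."""
    return [t for t, _ in Counter(movie_tags.get(movie, [])).most_common(n)]


def shared_rec(movie_tags, movie, similar, n):
    """Shared-rec (assumed form): pool tag counts from the movie and similar movies."""
    counts = Counter(movie_tags.get(movie, []))
    for other in similar:
        counts.update(movie_tags.get(other, []))
    return [t for t, _ in counts.most_common(n)]


# movie_tags maps a movie id to the list of tag applications it received.
movie_tags = {
    "m1": ["pixar", "funny", "pixar"],
    "m2": ["funny", "animation", "funny"],
    "m3": ["animation"],
}
```

Under these assumptions, a tag that is modest on the target movie (here "funny" on m1) can still top the shared-rec list if the similar movies use it heavily.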
Overall Tag Usage
• Overall tag usage statistics by experimental group
The overall total in the tags column is smaller than the sum over the groups, because two groups may independently use the same tag.
Tag Classification
• Factual tags identify “facts” about a movie, such as people, places, or concepts. They help to describe movies and find related movies.
• Subjective tags express user opinions related to a movie.
• Personal tags are most often used to organize a user’s movies (item ownership, self-reference, task organization).
63% factual, 29% subjective, 3% personal
How strongly do investment and habit affect personal tagging behavior?
• Cosine similarity between the tag class of the nth tag applied by a user and:
– the tag class distribution of the other tags the user applied before the nth tag (applied)
– the tag class distribution of the tags viewed by the user (viewed)
– the uniform tag class distribution (uniform)
• Example: x (nth tag) = [0, 1, 0] (factual, subjective, personal)
– y (tags 1 to n−1, applied or viewed) = [0.62, 0.25, 0.13] => cos(x, y) = 0.37
– y (uniform) = [1/3, 1/3, 1/3] => cos(x, y) = 0.58
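The example numbers can be reproduced if x·y is read as cosine similarity. Note that a class distribution must sum to 1, which with 0.62 and 0.13 leaves 0.25 for the subjective class; with that value the "applied/viewed" case comes out to 0.37, and the uniform case to 0.58 (a plain dot product would give 1/3 there instead). The function name is hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm


nth_tag = [0, 1, 0]           # nth tag is subjective: (factual, subjective, personal)
prior = [0.62, 0.25, 0.13]    # class distribution of earlier applied/viewed tags
uniform = [1 / 3, 1 / 3, 1 / 3]

print(round(cosine(nth_tag, prior), 2))    # 0.37
print(round(cosine(nth_tag, uniform), 2))  # 0.58
```

Comparing the applied/viewed score against the uniform baseline is what lets the analysis say whether habit or viewed tags pull a user's next tag toward a particular class.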
How strongly do investment and habit affect personal tagging behavior?
• Both habit/investment and tags viewed appear to influence the class of applied tags.
How strongly do investment and habit affect personal tagging behavior?
• Probability that a user’s nth applied tag is a new tag decreases over time
How does the tag selection algorithm affect users’ satisfaction with the system?
• Final tag application class distribution by experimental group
The dominant tag class for each group is bolded. (Each row sums to 100%.)
How does the tag selection algorithm affect users’ satisfaction with the system?
[Figure: fraction of tag applications by tag class (factual, subjective, personal) versus group tag application number, one panel per group: Unshared, Shared, Shared Popular, Shared Recommendation]
Tag Suggestion: User Satisfaction
• Participants disliked intrusive tag suggestions (e.g., a popup after rating a movie)
• Participants also disliked the inference algorithm
– Wrong inferences confuse users
• e.g., the suggested tag “small town” for the movie “Swiss Family Robinson”: “I’m confused – I thought it was about people on a deserted island??”
• Yet the suggestion algorithm worked well in terms of displaying a larger number of tags
– Pervasiveness may lead users to tag more in general
Influences on Tag Choices in del.icio.us
Emilee Rader and Rick Wash, School of Information, University of Michigan
CSCW 2008
http://bierdoctor.com/papers/cscw08/
Understanding Tagging Process
• Social Hypothesis: Users’ tag choices are influenced by the tag choices of others
• Organizing Hypothesis: Users’ tag choices are personal and idiosyncratic, not influenced by others’ tag choices
Wash and Rader (2007)
• Respondents generally used one or more heuristics for choosing tags:
– Reuse tags they have applied before to other web pages
– Create and adhere to mental rules or definitions for specific tags
– Choose terms they imagine using to re-find bookmarks in the future
Research Question
• Look for a connection between the small scale (individual tag choices) and the large scale (aggregate patterns) for tags on del.icio.us.
• Hypotheses
– Imitation: users imitate tags that previous users have applied to a web page
– Organizing: users reuse tags that they have applied to other web pages
– Recommended: users choose tags that are suggested via the del.icio.us posting interface
Regression
• Logistic regression model: tag chosen = f(used.onSite, used.byUser, interaction, tag dummies, random effect(user))
– used.onSite: imitation
– used.byUser: organizing
– Interaction: onSite = 1 and byUser = 1 (i.e., recommendation)
• Data set
– 30 web pages were randomly chosen from a sample of pages that had been bookmarked by at least 100 users.
– In June 2007, the complete public bookmark and tag histories of all of the approximately 12,000 users who had ever bookmarked any of these 30 web pages were downloaded.
– Result: complete tag histories for the 30 web pages, plus tag histories for all users who had ever bookmarked any of those pages, as of June 2007.
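Each observation in such a model is a candidate tag for one bookmark, described by binary predictors. The sketch below shows how those predictors could be derived from histories and turned into a probability via the logistic function; the tag dummies are omitted, the user random effect is passed in as a single number, and the coefficients are hypothetical placeholders, not the paper's estimates. All names are hypothetical.

```python
import math


def tag_features(candidate, site_tags_before, user_tags_before):
    """Binary predictors for one candidate tag on one bookmark.

    used_onSite: others already applied it to this page (imitation).
    used_byUser: this user already applied it elsewhere (organizing).
    interaction: both hold, i.e. the tag would appear among del.icio.us
    recommendations.
    """
    on_site = int(candidate in site_tags_before)
    by_user = int(candidate in user_tags_before)
    return {"used_onSite": on_site, "used_byUser": by_user, "interaction": on_site * by_user}


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_chosen(features, coef, intercept, user_effect=0.0):
    """P(tag chosen) = logistic(intercept + user effect + sum of coef * feature)."""
    z = intercept + user_effect + sum(coef[name] * value for name, value in features.items())
    return logistic(z)


# Hypothetical coefficients, for illustration only.
coef = {"used_onSite": 0.5, "used_byUser": 2.0, "interaction": 0.3}
f = tag_features("jazz", {"jazz", "music"}, {"jazz"})
p = p_chosen(f, coef, intercept=-3.0)
```

Under this framing, the organizing hypothesis predicts a large coefficient on used_byUser relative to used_onSite and the interaction, which is the pattern the regression results slide reports.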
Regression Results
• The organizing hypothesis is strongly supported (social and recommendation mechanisms have less influence on Delicious)
Summary
• Applications; Motivations
• Tag Types; Linguistic Classification
• Tag Generation Models; Tag Distributions
• Tag Semantics; Tag Visualization; Tag Design Space
• Relationship between community influence and personal tendency
– Tag choices are influenced by personal tagging behavior and by the tag selection algorithm (community input)
– Tag class distributions differ widely across experimental groups
– The quality of tag recommendations matters
• On the Delicious web site, the tagging process is driven mainly by information-organizing behavior (i.e., personal tendency).