Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags:...
![Page 1: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/1.jpg)
Social Tagging
Kristina Lerman USC Information Sciences Institute
Thanks to Anon Plangprasopchok for providing material for this lecture.
![Page 2: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/2.jpg)
essembly
delicious
Bugzilla
Social Web
![Page 3: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/3.jpg)
essembly
delicious
Bugzilla
Social Web is a platform for people to create, organize, and share information
![Page 4: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/4.jpg)
Create Information
• People create content (resources)
  • Text posts: blogs, Twitter, …
  • Images: Flickr, Picasa, …
  • Videos: YouTube, Vimeo, …
  • News stories: Digg, Reddit, Slashdot, …
  • Bookmarks: Delicious, CiteULike, Bibsonomy, …
  • Personal profiles: Facebook, MySpace, …
  • Maps: OpenStreetMap, …
  • Locations: FourSquare, …
![Page 5: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/5.jpg)
Organize Information
• People organize resources
  • Annotate with metadata
    • tags: descriptive labels
    • geotags: geographic coordinates
  • Add to folders: organize content within personal hierarchies
    • E.g., sets and collections on Flickr
  • Other types of metadata may include
    • Discussions, comments, reviews
    • Ratings, votes, …
• Social tagging is the most popular form of annotation
![Page 6: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/6.jpg)
Social Tagging: Delicious
Content (webpage)
User Tags
![Page 7: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/7.jpg)
Rainbow bee-eater Merops ornatus Australia Queensland Mackay Gardens
Mackay May 2008 (Set) Birds (Set) Birds (Pool) Canberra (Pool) Field Guide: Birds of the World (Pool) Birds, Birds, Birds (Pool) BIRDPIX (3/day) (Pool) Australian Birds (Pool) Birds – Kingfishers, Pittas, and Bee-eaters (Pool) Birds of Queensland (Pool)
tags
submitter
public groups
discussion
private albums
Social Tagging: Flickr
![Page 8: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/8.jpg)
Share Information
• People share resources
  • Social networks: broadcast to social connections
    • Friends on Facebook, …
    • Fans/Followers on Twitter, Digg, …
    • Group affiliations
  • Hotlists: emerge from collective activity
    • E.g., Digg front page, Flickr Explore, Flickr Trends, …
![Page 9: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/9.jpg)
Social Networks: Facebook
![Page 10: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/10.jpg)
Social Networks: Flickr
![Page 11: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/11.jpg)
Harvesting Knowledge from Social Tagging
Users Tags
Resources
Resource (web page) User Tags
User Resource (photo)
Tags
RR graph (resource-resource links): PageRank
UU graph (user-user links): Social network analysis
RUT hypergraph (resource-user-tag triples): Harvesting knowledge from social tagging
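A minimal sketch of how the resource-user-tag structure could be held in memory: each tagging event is one (resource, user, tag) triple, and two-dimensional views are obtained by projecting the triples. The URLs, user names, and the `project` helper are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical tagging events as (resource, user, tag) triples; all values are made up.
triples = [
    ("http://example.org/maps", "alice", "maps"),
    ("http://example.org/maps", "bob", "maps"),
    ("http://example.org/maps", "bob", "travel"),
    ("http://example.org/flights", "alice", "travel"),
]

def project(triples, key_index, value_index):
    """Collapse the triples onto two of their three dimensions."""
    table = defaultdict(set)
    for triple in triples:
        table[triple[key_index]].add(triple[value_index])
    return dict(table)

# Which tags describe each resource, and which users applied each tag.
tags_by_resource = project(triples, 0, 2)
users_by_tag = project(triples, 2, 1)
```

Projections like these discard one dimension, which is why the full triple list is kept as the primary representation in this sketch.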
![Page 12: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/12.jpg)
Overview
Harvesting knowledge from social tagging
• “Structure of Collaborative Tagging Systems”
  • Statistics of tagging activity
  • Consensus about the meaning of a document quickly emerges from the opinions of many users
• “Exploiting Social Annotation for Automatic Resource Discovery”
  • Learn hidden topics in a collection of tagged documents
  • Use hidden topics to find relevant documents
![Page 13: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/13.jpg)
Social Tagging
• Tags are labels attached to content
  • Chosen from an uncontrolled personal vocabulary
  • Help users to more efficiently
    • Browse
    • Filter
    • Search information
• Collaborative/social tagging
  • Anyone can attach labels to resources (not only experts or producers of content)
  • Collectively, tags represent a semantic annotation of a resource (alternative to Semantic Web)
![Page 14: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/14.jpg)
Tagging and Taxonomies
• Taxonomy – hierarchical, exclusive organization of objects • Linnaean classification
Felidae → Panthera → tiger; Felidae → Felis → cat
• File system: articles about cats in Africa
c:\articles\cats c:\articles\africa c:\articles\africa\cats c:\articles\cats\africa
Search multiple folders to find all relevant content
• Tagging – non-hierarchical, inclusive organization of objects • Articles tagged ‘cats’, ‘africa’
But a search on these tags will not find articles tagged only with ‘cheetah’
‘africa’ ‘cats’
‘cats’ AND ‘africa’
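The contrast with folders can be sketched in a few lines: with tags, a query is a set intersection, so the order of ‘cats’ and ‘africa’ does not matter, but an article tagged only ‘cheetah’ is still missed. The article names and the `find` helper are made up for illustration.

```python
# Hypothetical articles and their tags (made-up data).
articles = {
    "lions-of-the-serengeti": {"cats", "africa"},
    "house-cat-care": {"cats"},
    "nile-travel-notes": {"africa"},
    "cheetah-sprint-study": {"cheetah", "africa"},
}

def find(articles, *required_tags):
    """Return the names of articles that carry every required tag (logical AND)."""
    wanted = set(required_tags)
    return sorted(name for name, tags in articles.items() if wanted <= tags)

both = find(articles, "cats", "africa")   # same result as find(articles, "africa", "cats")
cats_only = find(articles, "cats")        # the cheetah article is not in this list
```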
![Page 15: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/15.jpg)
Kinds of Tags
• What content is about (topic) identify who or what document is about: ‘cat’, ‘africa’
• What it is what kind of thing it is: ‘article’, ‘blog’, ‘book’
• Who owns it who owns/created content: ‘nikographer’
• Refining categories refine or qualify categories, especially numbers
• Identify qualities or characteristics express opinion: ‘funny’, ‘interesting’
• Self-reference ‘mystuff’
• Task organizing ‘toread’, ‘jobsearch’
![Page 16: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/16.jpg)
Social Tagging Dimensions
• Tagging rights: who can tag?
  • Self-tagging – only the resource owner (blog posts, Flickr by convention)
  • Free-for-all – anyone can tag a resource (Delicious)
• Consolidation: assisted tag generation?
  • Blind tagging – user enters tags independently of other users
  • Suggestive tagging – system suggests tags based on annotations of other users
• Resource type
  • Text – Web pages, blog posts, bibliographic material, …
  • Multimedia – images, videos, …
• Source of content
  • User-owned – e.g., images on Flickr
  • Scavenged from the Web – e.g., Delicious
• Connectivity: links between users
  • Reciprocity – undirected links (Facebook) vs. directed links (Flickr, Delicious)
  • Link type – friend relationship vs. contact (on Flickr) shows degree of trust
![Page 17: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/17.jpg)
User Motivations
What are users’ motivations to tag? • Organizational
• Mark items for future personal retrieval
• Social • Mark items for others to find, e.g., concert photos on Flickr
• Can result in spamming • Express opinion, e.g., “funny” tag on video
Collective value emerges from tagging decisions of individual users
• How can users be incentivized to contribute high quality annotations?
![Page 18: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/18.jpg)
Social Tagging on del.icio.us
• Social bookmarking site del.icio.us • Users can tag any Web page (URL)
• Delicious suggests tags based on existing tags for the URL • Delicious aggregates popular tags
• Anyone can see bookmarks of others • Users can create social links
• Value of social tagging • Users bookmark for their own benefit
• Organization • Retrieval
• Useful public good emerges • Tag suggestions • List of popular URLs and tags (hotlists)
![Page 19: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/19.jpg)
Tagging on del.icio.us
Content (webpage)
User Tags
![Page 20: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/20.jpg)
Dynamics of del.icio.us
• Delicious dynamics [Golder & Huberman] • User activity • Tag vocabulary growth • Datasets
• Bookmarks collected over 4 days in June 2005 • Sample of users who posted bookmarks in this period
![Page 21: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/21.jpg)
Dynamics of User Interests
• Tags reflect how a user’s interests and knowledge change over time
  • Tag1 and Tag2 are consistent interests of the user
  • Tag3 is a new interest
    • Or a new way to differentiate between concepts/interests
[Figure: times each tag has been used (y-axis) vs. bookmark (x-axis), with curves for tag1, tag2, and tag3]
![Page 22: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/22.jpg)
Stable Patterns in Tagging
• Consider a single URL
  • As it is tagged by more users, each tag’s proportion represents the combined description of the URL by many users
  • After ~100 bookmarks, the relative frequency of each tag is nearly fixed
[Figure: tag proportion (with respect to all tags) vs. number of bookmarks for URL]
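The quantity plotted on this slide can be computed as a running share: after each new bookmark, divide each tag's count by the total number of tag uses so far. The sketch below uses a handful of made-up bookmarks; the function name `tag_proportions` is an assumption, and real data would need on the order of the ~100 bookmarks mentioned above before the shares settle.

```python
from collections import Counter

def tag_proportions(bookmarks):
    """Yield, after each bookmark, every tag's share of all tag uses so far."""
    counts = Counter()
    for tags in bookmarks:
        counts.update(tags)
        total = sum(counts.values())
        yield {tag: n / total for tag, n in counts.items()}

# Hypothetical bookmarks of one URL, in posting order (made-up data).
bookmarks = [
    ["maps"],
    ["maps", "travel"],
    ["maps", "geography"],
    ["travel", "maps"],
]

history = list(tag_proportions(bookmarks))
for step, shares in enumerate(history, start=1):
    print(step, {tag: round(share, 2) for tag, share in sorted(shares.items())})
```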
![Page 23: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/23.jpg)
Findings
• Consensus about a URL’s topics • Emerges quickly, after ~100 users bookmark it
• URLs do not have to become popular for tags to be useful • Minority opinions can stably coexist with popular ones • Can be used to categorize/organize URLs
• Reasons for consensus • Imitation – users imitate tag selection of others
• But, stable patterns also exist for less common tags (not shown to users)
• Shared knowledge • Can we learn it?
![Page 24: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/24.jpg)
Learning from Social Tagging/Annotation
Goal: Learn concepts from social annotations created by many users
• Annotations by an individual user may be inaccurate and incomplete…
• Annotations from many different users may complement each other, making them meaningful in aggregate
![Page 25: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/25.jpg)
“Jaguar”
Animal Car
= ?
By A lion Rohrs By sparky2000
Learning Concepts from Tags
![Page 26: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/26.jpg)
Goal of Learning Algorithm
Tags
“Animal” “Car”
“Flower”
?
Group semantically related tags and resources
Resources
A group ~ A concept
![Page 27: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/27.jpg)
Challenges in Learning from Annotations
• Sparse data: 4-7 tags per bookmark; 3.74 tags per photo [Rattenbury07+]
• Ambiguity: ‘jaguar’ as car vs. animal
• Polysemy: ‘window’ as hole in a wall vs. the glass pane that resides in it
• Synonymy: ‘kid’ vs. ‘child’
• Disagreement: cats\africa vs. africa\cats
• Different levels of specificity: ‘dog’ vs. ‘beagle’
• Multiple facets: a bird tagged by appearance, location, scientific/colloquial name
![Page 28: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/28.jpg)
Document Modeling Approaches
• ‘Bag-of-words’ – tf-idf
  • Document as a vector of word frequencies
  • Small reduction in document description length
  • Does not handle synonymy and polysemy
• Latent semantic indexing – LSI
  • Identifies subspace of tf-idf that captures most of the variance in a corpus
  • Reduction in document description length (# principal components)
  • Handles polysemy and synonymy
• Topic modeling – pLSI, LDA
  • Documents as random mixtures over (hidden) topics, where each topic is a distribution over words
  • Large reduction in description length (# topics)
  • Inference
    • Given a document corpus, estimate parameters of the model
    • Compute distribution of hidden topics given the document
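To make the bag-of-words limitation concrete, here is a small tf-idf sketch. It assumes one common weighting (raw term frequency times log(N / document frequency)); other variants exist, and the toy documents are made up. Note that ‘kid’ and ‘child’ end up as unrelated dimensions, which is the synonymy problem listed above.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} vector per document.

    Weight = raw term frequency * log(N / document frequency),
    which is only one of several common tf-idf weightings.
    """
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return vectors

# Hypothetical tag "documents" (made-up data).
docs = [
    ["kid", "toys", "games"],
    ["child", "toys", "games"],
    ["flights", "airline"],
]
vectors = tf_idf(docs)

# The first two documents overlap only on the words they literally share.
shared = set(vectors[0]) & set(vectors[1])
```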
![Page 29: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/29.jpg)
Document (r)
Topics (z)
Possible Words
Possible Topics
Generated words (t)
pLSI (Hofmann99); LDA (Blei03+)
A Stochastic Process of Word Generation
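The word-generation process on this slide can be sketched as two nested draws: for each word slot, sample a topic z from the document's topic mixture, then sample a word t from that topic's word distribution. All probabilities below are made-up numbers, and the helper names are assumptions. LDA additionally places a Dirichlet prior over the per-document mixture, which this sketch leaves out.

```python
import random

# Hypothetical model parameters (made-up numbers for illustration).
topic_words = {
    "travel": {"flights": 0.5, "airline": 0.3, "guide": 0.2},
    "maps": {"map": 0.6, "latitude": 0.2, "atlas": 0.2},
}
doc_topics = {"travel": 0.7, "maps": 0.3}  # topic mixture of one document r

def draw(distribution, rng):
    """Sample one key of an {outcome: probability} dictionary."""
    outcomes = list(distribution)
    weights = [distribution[outcome] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def generate_words(doc_topics, topic_words, n_words, rng):
    """For each word slot, pick a topic z for the document, then a word t from z."""
    words = []
    for _ in range(n_words):
        z = draw(doc_topics, rng)
        words.append(draw(topic_words[z], rng))
    return words

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
print(generate_words(doc_topics, topic_words, 8, rng))
```

Inference runs this process in reverse: given only the observed words, it estimates the hidden mixtures and topic-word distributions.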
![Page 30: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/30.jpg)
Possible Words
Possible Topics
Learned Topics
High probability words in each topic:
• travel, flights, airline, flight, airlines, guide, aviation, …
• map, maps, world, earth, latitude, longitude, directions, address, geography, distance, zip, usa, gmaps, atlas, …
• video, download, bittorrent, p2p, youtube, media, torrent, torrents, movies, …
![Page 31: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/31.jpg)
Apply LDA to Tagging
Tags (words)
“Animal” “Car”
“Flower”
LDA
Resource (document)
![Page 32: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/32.jpg)
Application to Resource Discovery
• Resource discovery
  • Given a seed source, find other data sources that provide the same functionality
  • e.g., find geocoders like http://geocoder.us, which returns geographic coordinates of a specified US address
• Benefits
  • Increase robustness of information integration (II) applications
    • If http://geocoder.us fails, substitute with another source
  • Increase coverage of II applications
    • http://geocoder.ca geocodes US and Canadian addresses
![Page 33: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/33.jpg)
Pipeline stages: discovery → invocation & extraction → semantic typing → source modeling
Background knowledge • Seed URL
anotherWS unisys
unisys
• sample input values
http://wunderground.com
“90254”
• patterns • domain types
unisys(Zip,Temp,Humidity,…)
• definition of known sources • sample values
unisys(Zip,Temp,…) :-weather(Zip,…,Temp,Hi,Lo)
Source Discovery and Modeling [Ambite et al, 2009]
![Page 34: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/34.jpg)
Exploiting Social Annotation for Resource Discovery
Approach: Use topic modeling of social annotation obtained from Delicious to find sources similar to a given seed URL
Seed URL
Candidates Users Tags
URLs Probabilistic Learning Model
Compute URL Similarity
URL’s distribution over concepts
Rank by Similarity To seed
e.g., LDA, to learn concepts
Obtain Annotation corpus from Delicious
![Page 35: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/35.jpg)
• Crawling strategy
  • For each seed, retrieve its 20 most popular tags
  • For each tag, retrieve sources annotated with the same tag
  • For each source, retrieve all tags
Corpus of Annotated Resources
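The three crawling steps can be sketched as nested loops. Everything here is a stand-in: `SITE` is a made-up in-memory table rather than a real Delicious interface, and `popular_tags`, `sources_with_tag`, and `build_corpus` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical stand-in for the bookmarking site: source URL -> tags applied to it,
# one list entry per tag assignment (made-up data).
SITE = {
    "http://seed.example": ["geocode", "maps", "geocode", "address"],
    "http://a.example": ["maps", "geocode"],
    "http://b.example": ["travel", "maps"],
    "http://c.example": ["video"],
}

def popular_tags(url, limit):
    """Most frequently applied tags for one source."""
    return [tag for tag, _ in Counter(SITE[url]).most_common(limit)]

def sources_with_tag(tag):
    """All sources annotated with the given tag."""
    return [url for url, tags in SITE.items() if tag in tags]

def build_corpus(seed, n_tags=20):
    """Seed -> popular tags -> sources sharing a tag -> all tags of each source."""
    corpus = {}
    for tag in popular_tags(seed, n_tags):
        for url in sources_with_tag(tag):
            corpus[url] = list(SITE[url])
    return corpus

corpus = build_corpus("http://seed.example")
```

In this toy table the source tagged only ‘video’ shares no tag with the seed, so it never enters the corpus.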
![Page 36: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/36.jpg)
• Use LDA to learn 80 topics in each corpus • Each URL’s distribution over topics is used to compute the similarity of a target URL to the seed
Topic Modeling of Social Annotations
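The slides do not say which similarity measure compares topic distributions. As one plausible choice, the sketch below ranks candidates by Jensen-Shannon divergence to the seed (smaller means more similar). The measure, the three-topic vectors, and the function names are all assumptions made for illustration.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence, skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and finite even when p or q has zeros."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_by_similarity(seed, candidates):
    """Order candidate URLs from most to least similar to the seed distribution."""
    return sorted(candidates, key=lambda url: js_divergence(seed, candidates[url]))

# Hypothetical topic distributions over 3 topics (the slides use 80).
seed = [0.8, 0.1, 0.1]
candidates = {
    "http://near.example": [0.7, 0.2, 0.1],
    "http://far.example": [0.1, 0.1, 0.8],
}
ranking = rank_by_similarity(seed, candidates)
```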
![Page 37: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/37.jpg)
• Manually label the 100 URLs ranked most similar to the seed URL • Compare to Google’s “find similar URLs” functionality
Source Discovery Results
![Page 38: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/38.jpg)
Source Discovery Results
![Page 39: Social Tagging - Information Sciences Institute · • tags: descriptive labels • geotags: geographic coordinates • Add to folders: organize content within personal hierarchies](https://reader033.fdocuments.net/reader033/viewer/2022051916/6007aaf3d1c60d4dac35ca16/html5/thumbnails/39.jpg)
Discussion
• Users express their knowledge through the tags they create while annotating content
• Apply document modeling techniques to social annotations data
• Infer hidden topics in annotated data • Use topics for source discovery task
• Outperforms standard Web search
• Next – Extract more complex types of knowledge from social annotations • Sentiment • Folksonomies