
Discovering Overlapping Groups in Social Media

Xufei Wang, Lei Tang, Huiji Gao, and Huan Liu

xufei.wang@asu.edu

Arizona State University

Social Media

• Facebook
  – 500 million active users
  – 50% of users log on to Facebook every day

• Twitter
  – 100 million users
  – 300,000 new users every day
  – 55 million tweets every day

• Flickr
  – 12 million members
  – 5 billion photos

Activities in Social Media

• Connect with others to form “Friends”

• Interact with others (comment, discussion, messaging)

• Bookmark websites/URLs (StumbleUpon, Delicious)

• Join groups where they explicitly exist (Flickr, YouTube)

• Write blogs (WordPress, Myspace)

• Update status (Twitter, Facebook)

• Share content (Flickr, YouTube, Delicious)

Community Structure

• Studying behavior
  – Individual level? Too many users
  – Site level? Loses too much detail
  – Community level? Yes: it provides information at varying granularity

Overlapping Communities

[Figure: overlapping communities labeled Colleagues, Family, and Neighbors]

Related Work

• Disjoint community detection
  – Modularity maximization
  – Based on link structure (how to understand the groups?)

• Overlapping community detection
  – Soft clustering (clustering is dense)
  – CFinder (efficiency and scalability)

• Co-clustering
  – Disjoint
  – Understanding groups by words (tags)

Problem Statement

• Given a user-tag subscription matrix M and the number of clusters k, find k overlapping communities that consist of both users and tags.

Our Contributions

• Extracting overlapping communities that better reflect reality

• Clustering on a user-tag graph. Tags are informative in identifying user interests

• Understanding groups by looking at tags within each group

Edge-centric View

• Cluster edges instead of nodes into disjoint groups
  – One node can belong to multiple groups
  – One edge belongs to one group

Edge-centric View

• In an edge-centric view

edge  u1  u2  u3  u4  u5  t1  t2  t3  t4
e1     1   0   0   0   0   1   0   0   0
e2     1   0   0   0   0   0   1   0   0
e3     0   1   0   0   0   1   0   0   0
e4     0   1   0   0   0   0   1   0   0
e5     0   0   1   0   0   0   1   0   0
e6     0   0   1   0   0   0   0   1   0
e7     0   0   0   1   0   0   0   1   0
e8     0   0   0   1   0   0   0   0   1
e9     0   0   0   0   1   0   0   1   0
e10    0   0   0   0   1   0   0   0   1
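As a minimal sketch of how the edge-centric table above can be built, the snippet below turns a list of (user, tag) pairs into one indicator row per edge. The helper name `edge_vector` and the variable names are illustrative assumptions, not part of the slides.

```python
# Toy user-tag graph from the table: each (user, tag) pair is one edge e1..e10.
users = ["u1", "u2", "u3", "u4", "u5"]
tags = ["t1", "t2", "t3", "t4"]
edges = [
    ("u1", "t1"), ("u1", "t2"),
    ("u2", "t1"), ("u2", "t2"),
    ("u3", "t2"), ("u3", "t3"),
    ("u4", "t3"), ("u4", "t4"),
    ("u5", "t3"), ("u5", "t4"),
]


def edge_vector(edge, users, tags):
    """Indicator row for one edge: 1 in its user column and 1 in its tag column."""
    user, tag = edge
    columns = users + tags
    return [1 if column in (user, tag) else 0 for column in columns]


edge_matrix = [edge_vector(edge, users, tags) for edge in edges]

for index, row in enumerate(edge_matrix, start=1):
    print(f"e{index}", row)
```

Every row has exactly two non-zero entries, so the representation stays sparse no matter how many users and tags there are.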

Clustering Edges

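• We can use any clustering algorithm (e.g., k-means) to group similar edges together

• Different similarity schemes

$$\arg\max_{C} \frac{1}{k} \sum_{i=1}^{k} \sum_{x_j \in C_i} S_c(x_j, c_i)$$

where C = {C_1, ..., C_k} are the edge clusters, x_j is an edge, and c_i is the centroid of cluster C_i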
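The clustering step can be sketched as a small k-means variant over edge rows, after which each cluster's nodes form one (possibly overlapping) community. This is only an illustration under assumptions: cosine similarity stands in for the edge-to-centroid similarity, the seeds are fixed by hand, and the function names are hypothetical.

```python
import math

edges = [
    ("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
    ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4"),
]
nodes = sorted({node for edge in edges for node in edge})


def to_row(edge):
    """Indicator vector of an edge over all user and tag nodes."""
    return [1.0 if node in edge else 0.0 for node in nodes]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_edges(rows, seeds, max_iter=20):
    """Assign every edge to its most similar centroid until assignments stop changing."""
    centroids = [list(rows[s]) for s in seeds]  # assumption: hand-picked seed edges
    assign = None
    for _ in range(max_iter):
        new_assign = [
            max(range(len(centroids)), key=lambda c: cosine(row, centroids[c]))
            for row in rows
        ]
        if new_assign == assign:
            break
        assign = new_assign
        for c in range(len(centroids)):
            members = [row for row, a in zip(rows, assign) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


rows = [to_row(edge) for edge in edges]
assign = cluster_edges(rows, seeds=[0, 9])

# Each edge sits in exactly one cluster; a node joins every cluster its edges fall in.
communities = [set(), set()]
for (user, tag), cluster in zip(edges, assign):
    communities[cluster].update((user, tag))
overlap = communities[0] & communities[1]

print(assign)
print([sorted(c) for c in communities])
print(sorted(overlap))
```

Because the partition is over edges, a node whose edges land in different clusters appears in several communities, which is how the overlap arises without any soft-assignment machinery.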

Defining Edge Similarity

• Similarity between two edges e = (u_i, t_p) and e' = (u_j, t_q) can be defined by, but is not limited to,

$$S_e(e, e') = \alpha\, S_u(u_i, u_j) + (1 - \alpha)\, S_t(t_p, t_q)$$

• α is set to 0.5, which gives users and tags equal importance

• Define user-user and tag-tag similarity

Independent Learning

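• Assume users are independent of each other, and tags are independent of each other

$$S_e(e, e') = \frac{1}{2}\left(\delta(u_i, u_j) + \delta(t_p, t_q)\right)$$

where δ(a, b) = 1 if a = b and 0 otherwise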
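A small sketch of the weighted edge similarity under the independence assumption: two users (or two tags) are similar only if they are identical, so the user-user and tag-tag similarities reduce to an equality indicator. The function names `delta` and `edge_similarity` are illustrative assumptions.

```python
def delta(a, b):
    """Equality indicator: 1.0 if the two nodes are the same, else 0.0."""
    return 1.0 if a == b else 0.0


def edge_similarity(e1, e2, user_sim=delta, tag_sim=delta, alpha=0.5):
    """Weighted sum of user-user and tag-tag similarity for two (user, tag) edges."""
    (u_i, t_p), (u_j, t_q) = e1, e2
    return alpha * user_sim(u_i, u_j) + (1 - alpha) * tag_sim(t_p, t_q)


same_user = edge_similarity(("u1", "t1"), ("u1", "t2"))   # share the user only
same_tag = edge_similarity(("u1", "t1"), ("u2", "t1"))    # share the tag only
disjoint = edge_similarity(("u1", "t1"), ("u2", "t2"))    # share nothing
identical = edge_similarity(("u1", "t1"), ("u1", "t1"))   # same edge

print(same_user, same_tag, disjoint, identical)
```

Passing other functions as `user_sim` and `tag_sim` lets the same skeleton express the other similarity schemes, with `alpha` controlling the balance between users and tags.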

Normalized Learning

• Differentiate nodes with varying degrees by normalizing each node with its nodal degree

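$$e(u_i, t_p) = \left(0, \ldots, 0, \frac{1}{d_{u_i}}, 0, \ldots, 0, \frac{1}{d_{t_p}}, 0, \ldots, 0\right)$$

$$S_e(e, e') = \frac{d_{t_p} d_{t_q}\, \delta(u_i, u_j) + d_{u_i} d_{u_j}\, \delta(t_p, t_q)}{\sqrt{\left(d_{u_i}^2 + d_{t_p}^2\right)\left(d_{u_j}^2 + d_{t_q}^2\right)}}$$

where δ is the equality indicator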
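The degree normalization can be illustrated with a short sketch, assuming the edge similarity is the cosine between edge vectors whose two non-zero entries are the inverse degrees of the edge's user and tag. The toy edge list and the helper names are assumptions for illustration.

```python
import math
from collections import Counter

edges = [
    ("u1", "t1"), ("u1", "t2"), ("u2", "t1"), ("u2", "t2"), ("u3", "t2"),
    ("u3", "t3"), ("u4", "t3"), ("u4", "t4"), ("u5", "t3"), ("u5", "t4"),
]

# Nodal degree = number of edges touching the user or the tag.
user_degree = Counter(user for user, _ in edges)
tag_degree = Counter(tag for _, tag in edges)


def normalized_edge_vector(edge):
    """Sparse edge vector with 1/degree at the edge's user and tag positions."""
    user, tag = edge
    return {user: 1.0 / user_degree[user], tag: 1.0 / tag_degree[tag]}


def cosine(a, b):
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def normalized_similarity(e1, e2):
    return cosine(normalized_edge_vector(e1), normalized_edge_vector(e2))


# Two edges that share user u1 but use tags of different popularity.
shared_user = normalized_similarity(("u1", "t1"), ("u1", "t2"))
print(round(shared_user, 4))
```

Compared with the unnormalized indicator vectors, sharing a high-degree node contributes less to the similarity than sharing a low-degree one.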

Correlational Learning

• Tags are semantically close
  – The tags cars, automobile, autos, and car reviews are all used to describe a blog written by sid0722 on BlogCatalog

[Figure: user-tag matrix (u × t) reduced to a latent representation (u × k)]

• Compute user-user and tag-tag cosine similarity in the latent space

Spectral Clustering Perspective

• Graph partitioning can be solved as a generalized eigenvalue problem

Spectral Clustering Perspective

• Plugging in L, W, and Z, we obtain

• U and V are the left and right singular vectors corresponding to the k largest singular values of the user-tag matrix M

Synthetic Data Sets

• Synthetic data sets
  – Number of clusters, users, and tags
  – Inner-cluster density and inter-cluster density (1% of total user-tag links)
  – Normalized mutual information (NMI)
    • Between 0 and 1
    • The higher, the better
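To make the evaluation measure concrete, here is a minimal sketch of normalized mutual information for two disjoint labelings, normalized by the geometric mean of the entropies. The slides do not say which NMI variant was used, and overlapping memberships generally need an extended definition, so treat this as an assumption-laden illustration rather than the authors' exact metric.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((count / n) * math.log(count / n) for count in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized mutual information between two disjoint labelings of the same items."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 or h_b == 0:
        # Degenerate case: at least one labeling puts everything in one cluster.
        return 1.0 if h_a == h_b else 0.0
    return mutual_info / math.sqrt(h_a * h_b)


same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])   # identical up to label names
unrelated = nmi([0, 0, 1, 1], [0, 1, 0, 1])        # statistically independent

print(same_partition, unrelated)
```

The score depends only on how items are grouped, not on the label names, which is why a relabeled copy of the ground truth still scores as a perfect match.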

Synthetic Performance

• We fix the number of users, the number of tags, and the density, but vary the number of clusters

Synthetic Performance

• We fix the number of users, tags, and clusters, but vary the inner-cluster density

Social Media Data Sets

• BlogCatalog
  – Tags describing each blog
  – Category predefined by BlogCatalog for each blog

• Delicious
  – Tags describing each bookmark
  – Select the top 10 most frequently used tags for each person

Inferring Personal Interests

• Category information reveals personal interests. We treat group affiliations as features and infer personal interests, evaluated via cross-validation

Connectivity Study

• We examine the correlation between the number of co-occurrences of two users in different affiliations and their connectivity in the real network

• The more often two users co-occur, the more likely they are to be connected
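The co-occurrence count behind this study can be sketched as follows: for every pair of users, count how many communities contain both. The community memberships below are made-up toy data, and `co_occurrence` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(communities):
    """Count, for each unordered user pair, the number of communities containing both."""
    counts = Counter()
    for members in communities:
        for a, b in combinations(sorted(members), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical overlapping affiliations (users only).
communities = [
    {"u1", "u2", "u3"},
    {"u3", "u4", "u5"},
    {"u1", "u3"},
]

counts = co_occurrence(communities)
for pair, value in sorted(counts.items()):
    print(pair, value)
```

Given a friendship list, the pairs could then be bucketed by co-occurrence count to compare the fraction of connected pairs in each bucket.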

Understanding Groups via Tag Cloud

• Tag cloud for Category Health

Understanding Groups via Tag Cloud

• Tag cloud for Cluster Health

Understanding Groups via Tag Cloud

• Tag cloud for Cluster Nutrition

Conclusions and Future Work

• Overlapping communities on a user-tag graph

• Propose an edge-centric view and define edge similarity
  – Independent Learning
  – Normalized Learning
  – Correlational Learning

• Evaluate results on synthetic and real data sets

• Many applications (e.g., link prediction); scalability

References

• I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in KDD ’01, NY, USA.

• L. Tang and H. Liu, “Scalable learning of collective behavior based on sparse social dimensions,” in CIKM ’09, NY, USA.

• L. Tang and H. Liu, “Community Detection and Mining in Social Media,” Morgan & Claypool Publishers, Synthesis Lectures on Data Mining and Knowledge Discovery, 2010.

• G. Palla, I. Derényi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, p. 814, 2005.

• K. Yu, S. Yu, and V. Tresp, “Soft clustering on graphs,” in NIPS, 2005.

• U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, 2007.

• M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 026113, Feb 2004.

• S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.

Contact the Authors

• Xufei Wang
  – xufei.wang@asu.edu
  – Arizona State University

• Lei Tang
  – ltang@yahoo-inc.com
  – Yahoo! Labs
