Analyzing spammers' social networks for fun and profit

WMMKS Lab 郭至軒 Analyzing Spammers' Social Networks for Fun and Profit WWW'12 A Case Study of Cyber Criminal Ecosystem on Twitter Chao Yang, Texas A&M University Robert Harkreader, Texas A&M University Jialong Zhang, Texas A&M University

description

20121219 Lab Paper Presentation.

Transcript of [] analyzing spammers' social networks for fun and profit

Page 1: [] analyzing spammers' social networks for fun and profit

WMMKS Lab 郭至軒

Analyzing Spammers' Social Networks for Fun and Profit

WWW'12

A Case Study of Cyber Criminal Ecosystem on Twitter

Chao Yang, Texas A&M University
Robert Harkreader, Texas A&M University
Jialong Zhang, Texas A&M University

Page 2: [] analyzing spammers' social networks for fun and profit

Criminal Account

malicious behavior

Page 3: [] analyzing spammers' social networks for fun and profit

Twitter Rule

A Twitter account can be considered to be spamming, and thus be suspended by Twitter.

Page 4: [] analyzing spammers' social networks for fun and profit

Twitter Rule

If it has a small number of followers compared to the number of accounts that it follows.

[Figure: an account ("self") that follows 1,000 accounts but is followed by only 10]

Page 5: [] analyzing spammers' social networks for fun and profit

Who follows criminal accounts?

Their followers let the criminal accounts continue to exist.

Page 6: [] analyzing spammers' social networks for fun and profit

Cyber Criminal Ecosystem

criminal account

criminal supporter

legitimate account

inner outer

victim

Page 7: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

inner

Page 8: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

G = (V,E)

V: all criminal accounts (2,060 nodes)
E: all follow relationships, as directed edges (9,868 edges)

Page 9: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Relationship Graph Connected Components

8 weakly connected components (at least 3 nodes)

521 isolated nodes

Page 10: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Finding 1: Criminal accounts tend to be socially connected, forming a small-world network.

Page 11: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Graph Density

                           Accounts       Follow Relationships   Density
Criminal Space in Sample   2,060          9,868                  2.33 × 10^-3
Entire Twitter Space       41.7 × 10^6    1.47 × 10^9            8.45 × 10^-7
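The density figures can be checked with a short calculation. This is a minimal sketch that assumes the standard density of a directed graph without self-loops, |E| / (|V| × (|V| − 1)); the slides do not state the formula, and the function name `graph_density` is only illustrative.

```python
def graph_density(num_nodes: int, num_edges: int) -> float:
    """Density of a directed graph without self-loops (assumed definition)."""
    return num_edges / (num_nodes * (num_nodes - 1))


# Node and edge counts taken from the table.
criminal_density = graph_density(2_060, 9_868)
twitter_density = graph_density(41_700_000, 1_470_000_000)

# How many times denser the criminal graph is than the whole Twitter graph.
density_ratio = criminal_density / twitter_density

print(f"criminal: {criminal_density:.2e}")
print(f"twitter:  {twitter_density:.2e}")
print(f"ratio:    {density_ratio:.0f}")
```

Under this assumed definition the two densities agree with the table to the precision shown, and the ratio comes out a little below 3,000.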

Page 12: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Graph Density

The criminal space in the sample is almost 3,000 times denser than the entire Twitter space (2.33 × 10^-3 versus 8.45 × 10^-7).

Page 13: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Reciprocity: Number of Bidirectional Links

Reciprocity is higher than 0.2 for 95% of criminal accounts.

Reciprocity is higher than 0.2 for 55% of normal accounts.

Reciprocity is nearly 1.0 for around 20% of criminal accounts.
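One way to turn "number of bidirectional links" into a value between 0 and 1 is sketched below. It assumes that an account's reciprocity is the fraction of the accounts it follows that follow it back; the slides do not give the exact normalization, so treat this as one reading. The graph is a hypothetical dict mapping each account to the set of accounts it follows.

```python
def reciprocity(following: dict[str, set[str]], account: str) -> float:
    """Fraction of an account's followings that follow it back (assumed definition)."""
    followed = following.get(account, set())
    if not followed:
        return 0.0
    mutual = sum(1 for other in followed if account in following.get(other, set()))
    return mutual / len(followed)


# Toy graph: a follows b and c, b follows a back, c follows nobody.
toy_following = {"a": {"b", "c"}, "b": {"a"}, "c": set()}

rec_a = reciprocity(toy_following, "a")  # one of two followings is mutual
rec_b = reciprocity(toy_following, "b")  # the only following is mutual
rec_c = reciprocity(toy_following, "c")  # no followings at all
```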

Page 14: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Average Shortest Path Length (ASPL): average number of steps along the shortest paths for all possible pairs of graph nodes.

                      ASPL
Criminal Accounts     2.60
Legitimate Accounts   4.12
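The ASPL definition can be sketched with a breadth-first search from every node. This sketch assumes a directed follow graph and averages only over ordered pairs that are actually reachable, since unreachable pairs have no finite distance; how the paper treats such pairs is not stated on the slide.

```python
from collections import deque


def average_shortest_path_length(following: dict[str, set[str]]) -> float:
    """Mean BFS distance over all reachable ordered pairs of nodes."""
    total_steps = 0
    reachable_pairs = 0
    for source in following:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in following.get(node, ()):
                if nxt not in distance:
                    distance[nxt] = distance[node] + 1
                    queue.append(nxt)
        total_steps += sum(distance.values())
        reachable_pairs += len(distance) - 1  # exclude the source itself
    return total_steps / reachable_pairs if reachable_pairs else 0.0


# Toy chain a -> b -> c: distances are a-b = 1, a-c = 2, b-c = 1.
chain = {"a": {"b"}, "b": {"c"}, "c": set()}
chain_aspl = average_shortest_path_length(chain)
```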

Page 15: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Criminal accounts have strong social connections with each other.

Page 16: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

What are the main factors leading to that structure?

Page 17: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Tend to follow many accounts without considering those accounts' quality much.

Following Quality (FQ):

the average number of followers of all the accounts that an account follows
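The Following Quality definition translates directly into code. In this minimal sketch, `following` and `follower_count` are hypothetical inputs (a follow map and a per-account follower tally), and the choice to return 0.0 for an account that follows nobody is an assumption.

```python
def following_quality(
    following: dict[str, set[str]],
    follower_count: dict[str, int],
    account: str,
) -> float:
    """Average follower count of the accounts that `account` follows."""
    followed = following.get(account, set())
    if not followed:
        return 0.0
    return sum(follower_count[name] for name in followed) / len(followed)


# Toy data: the account follows two accounts with 100 and 300 followers.
toy_following = {"spammer": {"x", "y"}}
toy_follower_count = {"x": 100, "y": 300}
fq = following_quality(toy_following, toy_follower_count, "spammer")
```

A low value means the account mostly follows accounts that themselves have few followers.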

Page 18: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Tend to follow many accounts without considering those accounts' quality much.

FQ is lower than 20,000 for 85% of criminal accounts.

FQ is lower than 20,000 for 45% of normal accounts.

Page 19: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Criminal accounts, belonging to the same criminal organizations.

Page 20: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Criminal accounts, belonging to the same criminal organizations.

Group criminal accounts into different criminal campaigns by malicious URL.

17 campaigns

8,667 of the 9,868 edges (87.8%) connect accounts within the same campaign; the graph contains 2,060 accounts.
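The campaign numbers can be read as a share of intra-campaign edges. The sketch below assumes that reading (8,667 of 9,868 edges link accounts in the same campaign) and uses a hypothetical `campaign_of` map from account to campaign label, for example the malicious URL the account posts.

```python
def intra_campaign_share(
    edges: list[tuple[str, str]],
    campaign_of: dict[str, str],
) -> float:
    """Fraction of follow edges whose two endpoints share a campaign label."""
    if not edges:
        return 0.0
    same = sum(
        1
        for src, dst in edges
        if src in campaign_of and campaign_of.get(src) == campaign_of.get(dst)
    )
    return same / len(edges)


# Toy example: two campaigns, one edge crossing between them.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
toy_campaigns = {"a": "url1", "b": "url1", "c": "url2", "d": "url2"}
toy_share = intra_campaign_share(toy_edges, toy_campaigns)

# The slide's own counts, as a percentage rounded to one decimal place.
slide_percentage = round(8_667 / 9_868 * 100, 1)
```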

Page 21: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Provide followers to criminal accounts

1. Break the Following Limits Policy

2. Evade spam detection

Page 22: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Finding 2: Compared with criminal leaves, criminal hubs are more inclined to follow criminal accounts.

Page 23: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Relationship Graph

HITS algorithm to calculate hub score

k-means algorithm to cluster them

criminal hubs: 90
criminal leaves: 1,970
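The hub-score step can be sketched with a plain power iteration of HITS. This is a minimal illustration and not the authors' implementation: it assumes edges point from follower to followed account, uses a fixed number of iterations, and omits the k-means step that separates hubs from leaves.

```python
def hits_hub_scores(following: dict[str, set[str]], iterations: int = 50) -> dict[str, float]:
    """Hub scores from HITS; an account is a good hub if it follows good authorities."""
    nodes = set(following) | {dst for targets in following.values() for dst in targets}
    hub = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the accounts that follow the node.
        authority = {node: 0.0 for node in nodes}
        for src, targets in following.items():
            for dst in targets:
                authority[dst] += hub[src]
        norm = sum(value * value for value in authority.values()) ** 0.5 or 1.0
        authority = {node: value / norm for node, value in authority.items()}
        # Hub score: sum of the authority scores of the accounts the node follows.
        hub = {node: sum(authority[dst] for dst in following.get(node, ())) for node in nodes}
        norm = sum(value * value for value in hub.values()) ** 0.5 or 1.0
        hub = {node: value / norm for node, value in hub.items()}
    return hub


# Toy graph: "h" follows three accounts, "a" follows one, "b" and "c" follow nobody.
toy_following = {"h": {"a", "b", "c"}, "a": {"b"}, "b": set(), "c": set()}
hub_scores = hits_hub_scores(toy_following)
top_hub = max(hub_scores, key=hub_scores.get)
```

In the slides, accounts are then clustered on these scores; any one-dimensional clustering could stand in for that step in an experiment.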

Page 24: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Criminal Following Ratio (CFR):

ratio of the number of an account’s criminal-followings to its total following number
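The CFR definition can be written as a small function. The follow map and the set of known criminal accounts are hypothetical inputs, and returning 0.0 for an account with no followings is an assumption.

```python
def criminal_following_ratio(
    following: dict[str, set[str]],
    criminals: set[str],
    account: str,
) -> float:
    """Share of an account's followings that are known criminal accounts."""
    followed = following.get(account, set())
    if not followed:
        return 0.0
    return len(followed & criminals) / len(followed)


# Toy data: the account follows two criminal accounts and two other accounts.
toy_following = {"hub": {"c1", "c2", "u1", "u2"}}
toy_criminals = {"hub", "c1", "c2"}
cfr = criminal_following_ratio(toy_following, toy_criminals, "hub")
```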

Page 25: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

CFR is higher than 0.1 for 80% of criminal hubs.

CFR is lower than 0.05 for 60% of criminal leaves.

CFR is higher than 0.1 for 20% of criminal leaves.

Page 26: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Why?

Page 27: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Criminal hubs tend to obtain followers more effectively by following other criminal accounts.

Shared Following Ratio (SFR):

the percentage of an account's followers who also follow at least one of this account's criminal-followings

Page 28: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

Criminal hubs tend to obtain followers more effectively by following other criminal accounts.

SFR is higher than 0.4 for 80% of criminal hubs.

SFR is higher than 0.4 for 5% of criminal leaves.

Page 29: [] analyzing spammers' social networks for fun and profit

Inner Social Relationship

criminal hubs: following leaves and acquiring their followers’ information

criminal leaves: randomly following other accounts, expecting them to follow back

Page 30: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Page 31: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Criminal supporters: accounts outside the criminal community that have close "follow relationships" with criminal accounts

Page 32: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

MR score:

measures how closely an account follows criminal accounts

Page 33: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

1. the more criminal accounts followed, the higher the score

2. the further away from a criminal account, the lower the score

3. the closer the support relationship between a Twitter account and a criminal account, the higher the score

Page 34: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

Malicious Relevance Graph, G = (V,E,W)

V: all accounts
E: all follow relationships, as directed edges
W: weight for each edge, the closeness of the relationship

Page 35: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

MR Score Initialization:

M_i = 1, if V_i is a criminal account
M_i = 0, if V_i is not a criminal account

Page 36: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

MR Score Aggregation:

an account’s score should sum up all the scores inherited from the accounts it follows

Example: account A follows accounts C1 and C2.

MR(C1) = M1

MR(C2) = M2

MR(A) = M1 + M2

Page 37: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

MR Score Dampening:

the amount of MR score that an account inherits from other accounts should be multiplied by a dampening factor of α according to their social distances, where 0 < α < 1

Example: A2 follows A1, and A1 follows C.

MR(C) = M

MR(A1) = α × M

MR(A2) = α^2 × M

Page 38: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

MR Score Splitting:

the amount of MR score that an account inherits from the accounts it follows should be multiplied by a relationship-closeness factor

Example: C is followed by A1 and A2.

MR(C) = M

MR(A1) = 0.5 × M

MR(A2) = 0.5 × M

Page 39: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

n: total number of nodes
I_ij ∈ {0, 1}: I_ij = 1 if (i,j) ∈ E; otherwise I_ij = 0

Page 40: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Malicious Relevance Score Propagation Algorithm (Mr.SPA)

I: the column-vector normalized adjacency matrix of nodes
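The initialization, aggregation, dampening, and splitting rules can be combined into one iterative sketch. This is an illustration under stated assumptions and not the formula from the paper, which did not survive in this transcript: each account keeps its initial score and adds α times the scores of the accounts it follows, with each followed account's score split equally among its followers. The names `mr_spa`, `alpha`, and `iterations` are hypothetical.

```python
def mr_spa(
    following: dict[str, set[str]],
    criminals: set[str],
    alpha: float = 0.5,
    iterations: int = 20,
) -> dict[str, float]:
    """Propagate malicious relevance scores from criminal accounts to their followers."""
    nodes = set(following) | {dst for targets in following.values() for dst in targets}

    # Splitting: a followed account's score is shared equally among its followers.
    follower_count = {node: 0 for node in nodes}
    for targets in following.values():
        for dst in targets:
            follower_count[dst] += 1

    # Initialization: 1 for criminal accounts, 0 for everyone else.
    initial = {node: 1.0 if node in criminals else 0.0 for node in nodes}
    score = dict(initial)

    for _ in range(iterations):
        # Aggregation (the sum) and dampening (the factor alpha per hop).
        score = {
            node: initial[node]
            + alpha * sum(score[dst] / follower_count[dst] for dst in following.get(node, ()))
            for node in nodes
        }
    return score


# Dampening example: a2 follows a1, and a1 follows criminal account c.
chain_scores = mr_spa({"a2": {"a1"}, "a1": {"c"}}, {"c"}, alpha=0.5)

# Splitting example: a1 and a2 both follow criminal account c.
split_scores = mr_spa({"a1": {"c"}, "a2": {"c"}}, {"c"}, alpha=0.5)
```

With α = 0.5 the chain gives scores of 1, 0.5, and 0.25 for c, a1, and a2, matching the α and α^2 pattern of the dampening rule. In the split example each follower receives half of the dampened score.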

Page 41: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

After Mr. SPA...

use x-means algorithm to cluster accounts based on their MR scores

most accounts have relatively small scores and are grouped into one single cluster

most accounts do not have very close follow relationships with criminal accounts

5,924 criminal supporters

Page 42: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Social Butterflies: accounts that have extraordinarily large numbers of followers and followings.

use 2,000 followings as a threshold

3,818 social butterflies

Page 43: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Social Butterflies: social butterflies tend to have close friendships with criminals mainly because most of them follow back the users who follow them without careful examination.

Page 44: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Social Promoters: accounts that have large following-follower ratios, large following numbers, and relatively high URL ratios.

accounts whose URL ratios are higher than 0.1 and whose following numbers and following-follower ratios are both in the top 10%

508 social promoters

Page 45: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Social Promoters: social promoters tend to have close friendships with criminal accounts probably because most of them promote themselves or their business by actively following other accounts without considering those accounts’ quality.

Page 46: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Dummies: accounts that post few tweets but have many followers.

accounts that post fewer than 5 tweets and whose follower numbers are in the top 10%

81 dummies

Page 47: [] analyzing spammers' social networks for fun and profit

Outer Social Relationship

Dummies: dummies tend to have close friendships with criminals mainly because most of them are controlled or utilized by cyber criminals.
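The three supporter categories can be sketched as threshold rules. The fixed thresholds (2,000 followings, URL ratio 0.1, fewer than 5 tweets) come from the slides. The `Account` record, the way the top-10% cutoff is computed, and the decision to let one account carry several labels are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    followers: int
    followings: int
    tweets: int
    url_ratio: float  # fraction of tweets that contain a URL


def top_decile_cutoff(values: list[float]) -> float:
    """Smallest value that still counts as top 10% (assumed percentile rule)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(0.9 * len(ordered)))]


def ff_ratio(account: Account) -> float:
    """Following-follower ratio, guarding against accounts with zero followers."""
    return account.followings / max(account.followers, 1)


def label_supporters(accounts: list[Account]) -> dict[str, set[str]]:
    following_cut = top_decile_cutoff([a.followings for a in accounts])
    follower_cut = top_decile_cutoff([a.followers for a in accounts])
    ratio_cut = top_decile_cutoff([ff_ratio(a) for a in accounts])
    labels: dict[str, set[str]] = {}
    for a in accounts:
        tags: set[str] = set()
        if a.followings > 2000:
            tags.add("social butterfly")
        if a.url_ratio > 0.1 and a.followings >= following_cut and ff_ratio(a) >= ratio_cut:
            tags.add("social promoter")
        if a.tweets < 5 and a.followers >= follower_cut:
            tags.add("dummy")
        labels[a.name] = tags
    return labels


# Toy population: eight ordinary accounts, one promoter-like, one dummy-like.
toy_accounts = [Account(f"user{i}", 100, 100, 50, 0.0) for i in range(8)]
toy_accounts.append(Account("promo", 50, 5000, 200, 0.5))
toy_accounts.append(Account("quiet", 90000, 10, 2, 0.0))
toy_labels = label_supporters(toy_accounts)
```

In this toy population "promo" exceeds the 2,000-followings threshold as well, so it is tagged both as a social butterfly and as a social promoter.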

Page 48: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

The number of Twitter accounts is HUGE!

Page 49: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

start from a seed set

Criminal account Inference Algorithm (CIA)

Page 50: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Criminal account Inference Algorithm (CIA)

1. criminal accounts tend to be socially connected

2. criminal accounts usually share similar topics, and thus have strong semantic coordination among them

Page 51: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Criminal account Inference Algorithm (CIA)

Malicious Relevance Graph, G = (V,E,W)

V: all accounts
E: all follow relationships, as directed edges
W: weight for each edge, WS(i,j)

P.S. SS: Semantic Similarity Score

Page 52: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

n: total number of nodes
I_ij ∈ {0, 1}: I_ij = 1 if (i,j) ∈ E; otherwise I_ij = 0

Criminal account Inference Algorithm (CIA)

Page 53: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Evaluation of CIA

Dataset I: around half a million accounts from the authors' previous study [35]

Dataset II: a newly crawled set of 30,000 accounts, collected by starting from 10 newly identified criminal accounts and crawling with a BFS strategy
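The BFS collection strategy for Dataset II can be sketched as a bounded breadth-first crawl. This is only an illustration: `neighbors_of` is a hypothetical callable standing in for whatever API lookup the authors used, and the slides do not say whether the crawl expands followers, followings, or both.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_crawl(
    seeds: Iterable[str],
    neighbors_of: Callable[[str], Iterable[str]],
    limit: int,
) -> list[str]:
    """Collect up to `limit` accounts in breadth-first order, starting from the seeds."""
    collected = list(dict.fromkeys(seeds))[:limit]  # de-duplicate, keep order
    visited = set(collected)
    queue = deque(collected)
    while queue and len(collected) < limit:
        account = queue.popleft()
        for nxt in neighbors_of(account):
            if nxt not in visited:
                visited.add(nxt)
                collected.append(nxt)
                queue.append(nxt)
                if len(collected) >= limit:
                    break
    return collected


# Toy graph standing in for the real network.
toy_graph = {"s": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
crawled = bfs_crawl(["s"], lambda name: toy_graph.get(name, []), limit=4)
```

For the setting on the slide, the seeds would be the 10 newly identified criminal accounts and the limit would be 30,000.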

Page 54: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Evaluation of CIA

Selection Strategies

CA: Criminal Account
MA: Malicious Affected Account

100 seeds, select 4,000 accounts

Dataset I

Page 55: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Evaluation of CIA

Selection Sizes

CA: Criminal Account
MA: Malicious Affected Account

100 seeds

Dataset I

Page 56: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Evaluation of CIA

Seed Sizes

CA: Criminal Account
MA: Malicious Affected Account

select 4,000 accounts

Dataset I

Page 57: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Evaluation of CIA

Seed Type

CA: Criminal Account
MA: Malicious Affected Account

100 seeds, select 4,000 accounts

Dataset I

Page 58: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Evaluation of CIA

Recursive Inference

CA: Criminal Account
MA: Malicious Affected Account

50 seeds, select 4,000 accounts

Dataset I

Page 59: [] analyzing spammers' social networks for fun and profit

Inferring Criminal Accounts

Evaluation of CIA

Seed Type

CA: Criminal Account
MA: Malicious Affected Account

10 seeds, select 4,000 accounts

Dataset II

Page 60: [] analyzing spammers' social networks for fun and profit

Conclusion

S (Strengths): Provides a macro-scale view of criminal accounts.

W (Weaknesses): Focuses on analysis and relies on heuristic methods. The details of the semantic similarity score are omitted.

O (Opportunities): Many malicious accounts are found; analyzing them may improve the accuracy of CIA in detecting criminal accounts.

T (Threats): The training dataset is biased, so CIA brings only a small improvement when detecting criminal accounts.

Page 61: [] analyzing spammers' social networks for fun and profit

Thank you for listening!