User Behavior Modeling on Financial Message Boards

User Behavior Modeling on Financial Message Boards Pritha D.N Sahaj Biyani December 9, 2015

Transcript of User Behavior Modeling on Financial Message Boards

Page 1: User Behavior Modeling on Financial Message Boards

User Behavior Modeling on Financial Message Boards

Pritha D.N

Sahaj Biyani

December 9, 2015

Page 2: User Behavior Modeling on Financial Message Boards

Introduction

Page 3: User Behavior Modeling on Financial Message Boards

Investors Hub

Page 4: User Behavior Modeling on Financial Message Boards

Objective

• To identify the roles users assume in these message board forums.

• To validate the “90-9-1 Rule for Participation Inequality” in the message board community.

Page 5: User Behavior Modeling on Financial Message Boards

Dataset

• Free message boards for US-listed stocks

• Time Period: January, 2001 - June, 2015

• Total Message Boards: 6,278

• Total Users: 52,558

• Total Posts: 5,624,024

Page 6: User Behavior Modeling on Financial Message Boards

Dataset Analysis

• Percentage of initiated posts: 30%

• 19% of users did not initiate any post.

• 80% of users initiated fewer than 20 posts.

Page 7: User Behavior Modeling on Financial Message Boards

Dataset Analysis

• Number of boards user participated in:

• 56% of users are active only on 1 board.

• 90% of users are active on fewer than 20 boards.

Page 8: User Behavior Modeling on Financial Message Boards

Dataset Analysis

• Average response time of replies a user makes:

Page 9: User Behavior Modeling on Financial Message Boards

Dataset Analysis

• Number of posts across boards:

• 80% of posts were made on fewer than 200 boards.

• 1,000 of the 6,278 boards account for 90% of posts made.

Page 10: User Behavior Modeling on Financial Message Boards

Dataset Analysis

• Percentage of initiated posts: 30%

• From the graph we infer:

• 19% of users did not initiate any post.

• 80% of users initiated fewer than 20 posts.

Page 11: User Behavior Modeling on Financial Message Boards

Features

1. Number of threads a user initiated over time
2. Number of replies a user made over time
3. Number of users a user replies to
4. Number of users who reply to a user
5. Number of boards a user is active on
6. Number of followers
7. Replier share: AVG[proportion of replies a user gets on a board]
8. Reply share: AVG[proportion of replies a user makes on a board]
9. Average response time
10. Volume of content the user posted
11. Number of links the user has posted

Feature groups: Activity of User, User Network Structure, Content Related
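As an illustration of how a feature such as reply share (feature 8) might be computed, here is a minimal sketch. The input format (a list of `(board, replier)` pairs) and the choice to average only over boards where the user has replied are assumptions; the slides do not specify either.

```python
from collections import defaultdict

def reply_share(replies):
    """Average, over boards, of the fraction of a board's replies made by each user.

    `replies` is an iterable of (board, replier) pairs (hypothetical input format).
    The average is taken only over boards where the user replied (an assumption).
    """
    per_board = defaultdict(lambda: defaultdict(int))
    for board, user in replies:
        per_board[board][user] += 1

    shares = defaultdict(list)
    for counts in per_board.values():
        total = sum(counts.values())
        for user, n in counts.items():
            shares[user].append(n / total)

    return {user: sum(vals) / len(vals) for user, vals in shares.items()}

# Toy data: u1 makes 2 of 3 replies on board A and the only reply on board B.
example = [("A", "u1"), ("A", "u1"), ("A", "u2"), ("B", "u1")]
result = reply_share(example)
```

Replier share (feature 7) could be sketched the same way by pairing each reply with the user being replied to instead of the replier.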

Page 12: User Behavior Modeling on Financial Message Boards

Methodology

• Data Preprocessing

• Feature Selection/Extraction

• Clustering

• Role Inference

Page 13: User Behavior Modeling on Financial Message Boards

Data Preprocessing

• We use Min-Max Normalization

• Normalize data to the range [0, 1]
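Min-max normalization maps each feature value x to (x - min) / (max - min). A minimal sketch for one feature column follows; returning zeros for a constant column is an assumption made here to avoid division by zero, not something stated on the slide.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1] using min-max normalization."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: no spread to scale, so map everything to 0.0 (assumption).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

scaled = min_max_normalize([2, 4, 6])
```

Scaling matters here because K-means relies on distances, so a feature with a large raw range (for example, volume of content) would otherwise dominate the clustering.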

Page 14: User Behavior Modeling on Financial Message Boards

Feature Selection

• Step 1 – Feature Extraction

  • Do Principal Component Analysis

  • Do K-means on the projected data and use the cluster assignments as labels

• Step 2 – Feature importance using Random Forest classifier

Page 15: User Behavior Modeling on Financial Message Boards

Feature Extraction using PCA

Principal Component   % Variance   Cumulative % Variance
1                     62.16        62.16
2                     15.07        77.23
3                     7.95         85.18
4                     5.74         90.92
5                     3.57         94.49
6                     1.67         96.16
7                     1.48         97.64
8                     0.68         98.32
9                     0.59         98.91
10                    0.55         99.46
11                    0.54         100.00

Scree Plot
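The cumulative column is the running sum of the per-component variance percentages. The sketch below recomputes it from the slide's values and finds how many components are needed to reach a given threshold. The thresholds used are hypothetical; the slides do not say how many components were retained.

```python
from itertools import accumulate

# Per-component % variance, as listed on the slide.
variance_pct = [62.16, 15.07, 7.95, 5.74, 3.57, 1.67, 1.48, 0.68, 0.59, 0.55, 0.54]

# Running sum, rounded to two decimals to match the slide.
cumulative = [round(c, 2) for c in accumulate(variance_pct)]

def components_needed(threshold_pct):
    """Smallest number of leading components whose cumulative variance reaches the threshold."""
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold_pct:
            return k
    return len(cumulative)
```

For example, `components_needed(90)` returns 4, since the first four components explain 90.92% of the variance.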

Page 16: User Behavior Modeling on Financial Message Boards

Choosing the number of clusters

Page 17: User Behavior Modeling on Financial Message Boards

Elbow Plot

• Plot the within-group sum of squares (WSS) versus K, and look for the “elbow point” in the plot.

• The first clusters will add much information (explain a lot of variance), but at some point the marginal gain will drop, giving an angle in the graph.

• Choose the number after the last big drop.

• This "elbow" cannot always be unambiguously identified.
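One way to make "the number after the last big drop" concrete is sketched below. The rule used (a drop counts as "big" if it is at least 10% of the first drop) and the WSS values are both hypothetical, chosen only for illustration; they are not taken from the slides.

```python
def elbow_k(wss, min_rel_drop=0.1):
    """Pick K after the last 'big' drop in within-group sum of squares.

    wss[i] is the WSS for K = i + 1. A drop is 'big' if it is at least
    min_rel_drop times the first drop (a hypothetical threshold).
    """
    drops = [wss[i - 1] - wss[i] for i in range(1, len(wss))]
    if not drops or drops[0] <= 0:
        return 1
    k = 1
    for i, d in enumerate(drops):
        if d >= min_rel_drop * drops[0]:
            k = i + 2  # drops[i] is the gain from K = i + 1 to K = i + 2
    return k

# Hypothetical WSS values for K = 1..7.
example_wss = [1000, 520, 300, 150, 130, 118, 110]
chosen_k = elbow_k(example_wss)
```

Because the result depends on the threshold, a rule like this is best used together with a second criterion such as the silhouette coefficient.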

Page 18: User Behavior Modeling on Financial Message Boards

Silhouette Coefficient

a(i) is the average dissimilarity of point i to all other data within the same cluster.

b(i) is the lowest average dissimilarity of point i to any other cluster, of which i is not a member.

The silhouette coefficient of i is s(i) = (b(i) - a(i)) / max{a(i), b(i)}.
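The silhouette coefficient combines the two quantities as s(i) = (b(i) - a(i)) / max{a(i), b(i)}, which lies in [-1, 1]. A minimal sketch follows, assuming Euclidean distance as the dissimilarity and a score of 0 for points in singleton clusters (both are conventions chosen here, not stated on the slide).

```python
import math

def silhouette(points, labels):
    """Return s(i) for every point. Points are tuples; labels are cluster ids."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        # Group distances from point i to every other point by cluster.
        dists = {}
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lj, []).append(math.dist(p, q))
        own = dists.pop(li, [])
        if not own or not dists:
            scores.append(0.0)  # singleton cluster or only one cluster
            continue
        a = sum(own) / len(own)                             # a(i)
        b = min(sum(d) / len(d) for d in dists.values())    # b(i)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
s = silhouette(pts, [0, 0, 1, 1])
```

Averaging s(i) over all points for each candidate K gives a single score per K; higher averages indicate better-separated clusters.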

Page 19: User Behavior Modeling on Financial Message Boards

Feature Selection

• Train a Random Forest classifier using all the features and labels assigned by K-means.

• Feature importance is defined as the total decrease in node impurity (weighted by the probability of reaching that node, which is approximated by the proportion of samples reaching that node), averaged over all trees of the ensemble.
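To make the definition concrete, the sketch below computes the weighted impurity decrease for a single split. Gini impurity is assumed as the impurity measure; the slides do not name the criterion. A feature's importance in one tree would be the sum of this quantity over all nodes that split on that feature, then averaged across trees.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity_decrease(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the share of samples reaching the node."""
    p_node = len(parent) / n_total  # approximates the probability of reaching the node
    children = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
    return p_node * (gini(parent) - children)

# A node reached by 4 of 8 samples, split perfectly into two pure children.
decrease = weighted_impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1], n_total=8)
```

Here the parent impurity is 0.5, both children are pure, and half of the samples reach the node, so the contribution is 0.5 × 0.5 = 0.25.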

Page 20: User Behavior Modeling on Financial Message Boards

Clustering Users

• Applied K-Means clustering with K=4.

• Run 10 times with different seeds.

• 300 iterations in a single run.

Cluster     User Count   % of Users
Cluster 1   47295        91.7
Cluster 2   360          0.73
Cluster 3   3322         6.44
Cluster 4   581          1.13
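The clustering setup (several seeded runs, each capped at 300 iterations) can be sketched in plain Python as below. Keeping the run with the lowest within-group sum of squares is an assumption about how the 10 runs were combined, and the toy data is hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=300, seed=0):
    """One K-means run (Lloyd's algorithm) from a seeded random initialization."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    wss = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
    return wss, labels, centers

def best_of_runs(points, k, n_runs=10, max_iter=300):
    """Repeat K-means with different seeds and keep the lowest-WSS run (assumption)."""
    return min((kmeans(points, k, max_iter, seed) for seed in range(n_runs)),
               key=lambda run: run[0])

toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
wss, labels, centers = best_of_runs(toy, k=2)
```

Multiple seeded runs guard against a poor random initialization, since K-means only converges to a local optimum.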

Page 21: User Behavior Modeling on Financial Message Boards

Cluster Analysis

Initiation of posts by users of each cluster

Cluster     Post Initiation Share   Initiation Per User
Cluster 1   30%                     10.9
Cluster 2   22%                     1066.6
Cluster 3   44%                     228.2
Cluster 4   3%                      98.3

Page 22: User Behavior Modeling on Financial Message Boards

Cluster Analysis

Replies by users of each cluster

Cluster     Reply Share   Reply Per User
Cluster 1   22%           17.5
Cluster 2   28%           2946.3
Cluster 3   47%           534.9
Cluster 4   4%            255.9
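The share and per-user figures are linked through the cluster sizes: a cluster's share is its user count times its per-user rate, divided by the total over all clusters. The sketch below checks this using the user counts from the clustering slide and the per-user values from the charts, assuming the chart labels are listed in cluster order.

```python
# Values as read from the slides, in order Cluster 1..4.
user_counts = [47295, 360, 3322, 581]
initiations_per_user = [10.9, 1066.6, 228.2, 98.3]
replies_per_user = [17.5, 2946.3, 534.9, 255.9]

def cluster_shares(counts, per_user):
    """Percentage share of each cluster, rounded to the nearest whole percent."""
    totals = [n * rate for n, rate in zip(counts, per_user)]
    grand_total = sum(totals)
    return [round(100 * t / grand_total) for t in totals]

initiation_shares = cluster_shares(user_counts, initiations_per_user)
reply_shares = cluster_shares(user_counts, replies_per_user)
```

Under that assumption the recomputed shares are 30, 22, 44, 3 for initiations and 22, 28, 47, 4 for replies, which agrees with the pie-chart percentages.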

Page 23: User Behavior Modeling on Financial Message Boards

Cluster Analysis

[Figure: Inter-Cluster reply %, one bar per cluster (Cluster 1 to Cluster 4). Data labels in extraction order: 34, 22, 24, 18, 20, 27, 11, 25, 41, 49, 22, 44, 4, 2, 42, 2.]

Page 24: User Behavior Modeling on Financial Message Boards

Cluster Analysis

Feature 3: Number of users a user replies to

Page 25: User Behavior Modeling on Financial Message Boards

Cluster Analysis

Feature 4: Number of users who reply to a user

Page 26: User Behavior Modeling on Financial Message Boards

Role Inference

• Cluster 1: Lurkers

  • The posts initiated per user and replies made per user are both very low.

• Cluster 2: Super Users

  • Very active. Contribute most to the boards. Engage with a lot of users.

• Cluster 3: Contributors

  • Account for 45% of total post initiations and 46% of total replies made. Have a short response time, meaning they respond very fast. Backbone of the forum.

• Cluster 4: Taciturns

  • Keep to themselves. Initiate very few posts but reply often, mostly to users in their own cluster.

Page 27: User Behavior Modeling on Financial Message Boards

Participation Inequality

Role           % of Users   Content Contributed (%)
Lurkers        91.73        24
Super-Users    0.73         26
Contributors   6.44         46
Taciturns      1.13         4
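A quick way to read the chart against the 90-9-1 rule is to compare each role's share of users with its share of content. The sketch below uses the values from the chart; the over-representation ratio (content share divided by user share) is a measure introduced here for illustration, not one used on the slides.

```python
# (% of users, % of content contributed) per role, as read from the chart.
roles = {
    "Lurkers": (91.73, 24),
    "Super-Users": (0.73, 26),
    "Contributors": (6.44, 46),
    "Taciturns": (1.13, 4),
}

# Combined share of users and content for everyone except the lurkers.
active_users = sum(u for role, (u, _) in roles.items() if role != "Lurkers")
active_content = sum(c for role, (_, c) in roles.items() if role != "Lurkers")

# How over- or under-represented each role is in the content (illustrative measure).
over_representation = {role: c / u for role, (u, c) in roles.items()}
```

On these figures roughly 8.3% of users account for 76% of the content, and Super-Users contribute about 36 times their share of the user base.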

Page 28: User Behavior Modeling on Financial Message Boards

Conclusion

• Users take on different roles in online communities, and clusters of users can be identified by their behavioral patterns.

• Participation inequality exists on financial message boards as well.

Page 30: User Behavior Modeling on Financial Message Boards

Thank You!