
Personalized Broadcast Message Prioritization

by

Beidou Wang

B.Sc., Zhejiang University, 2011

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

in the
School of Computing Science
Faculty of Applied Sciences

© Beidou Wang 2018
SIMON FRASER UNIVERSITY

Summer 2018

Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.


Approval

Name: Beidou Wang

Degree: Doctor of Philosophy (Computing Science)

Title: Personalized Broadcast Message Prioritization

Examining Committee: Chair: Oliver Schulte, Professor

Martin Ester, Senior Supervisor, Professor

Jian Pei, Supervisor, Professor

Ke Wang, Internal Examiner, Professor

Qiaozhu Mei, External Examiner, Associate Professor, Department of Electrical Engineering and Computer Science, University of Michigan

Date Defended: Jun 27, 2018


Abstract

A broadcast message is defined as a message that can be sent to a group of subscribers at once. Popular types of broadcast messages include broadcast emails, tweets, broadcast short messages and so on. With billions of broadcast messages sent and received every day, they are fundamentally changing the way people live, work and communicate. However, with broadcast messages comes a curse: the broadcast message overload problem. Broadcast message overload refers to the situation in which the majority of broadcast messages are unimportant or irrelevant and people have to waste a large amount of time handling them, causing trillion-dollar-scale losses in productivity. This serious situation has led to a thriving research field, personalized broadcast message prioritization, which aims to predict importance labels for broadcast messages and ease users' burden of handling unimportant broadcast messages.

In this thesis, we work on three research questions related to broadcast message prioritization, focusing on two popular types of broadcast messages: broadcast emails and tweets.

First, we work on the mention recommendation problem in micro-blogging systems, which is highly related to the tweet prioritization task. Mention recommendation tries to recommend the optimal set of users to be mentioned in a tweet in order to expand its diffusion. Considering tweet prioritization and user influence at the same time, we propose the first framework to handle the mention recommendation problem by designing a new learning to rank model.

In our second research question, we focus directly on the personalized broadcast email prioritization task. We propose the first broadcast email prioritization framework that adopts the paradigm of collaborative filtering. To overcome the complete cold-start item challenge, a novel active learning framework is proposed, addressing unique challenges like one-class implicit feedback and time-sensitive feedback.

In the third research question, we continue to work on the broadcast email prioritization problem while considering the fact that there exist large numbers of mailing lists in a real email system. A cross-domain recommendation framework is proposed to transfer extra knowledge from other similar mailing lists, which helps to provide better predictions for newly enrolled users and new mailing lists.


Keywords: Broadcast Message Prioritization; Mention Recommendation; Active Learning; Cross Domain Recommendation


Table of Contents

Approval
Abstract
Table of Contents
List of Tables
List of Figures

1 Introduction
  1.1 Background
  1.2 Research Questions
    1.2.1 Broadcast Message Prioritization Problem
    1.2.2 Expand Tweet Diffusion Considering Tweet Prioritization (Mention Recommendation)
    1.2.3 Incorporating Collaborative Filtering into Broadcast Email Prioritization with Active Learning
    1.2.4 Broadcast Email Prioritization Considering Large Numbers of Mailing Lists with Cross-domain Recommendation
    1.2.5 An Overview of our Approaches to Broadcast Message Prioritization
  1.3 Contributions
    1.3.1 Mention Recommendation Considering Tweet Prioritization
    1.3.2 Broadcast Email Prioritization with Active Learning
    1.3.3 Broadcast Email Prioritization with Cross Domain Recommendation
  1.4 Thesis Organization

2 Related Works
  2.1 Tweet Prioritization and Mention Recommendation
    2.1.1 Tweet Prioritization
    2.1.2 Mention Recommendation
  2.2 Email Overload and Email Prioritization
    2.2.1 Email Categorization
    2.2.2 Email Prioritization
  2.3 Active Learning in Collaborative Filtering
    2.3.1 Non-personalized Active Learning Strategies
    2.3.2 Personalized Active Learning Strategies
  2.4 Cross-domain Recommendation
    2.4.1 Domain Definition
    2.4.2 Domain Selection
    2.4.3 Knowledge Transfer Across Domains

3 Mention Recommendation Considering Tweet Prioritization
  3.1 Background and Overview
  3.2 Problem Definition
  3.3 Recommendation Framework
    3.3.1 Ranking Features
    3.3.2 Relevance
    3.3.3 Ranking Function
    3.3.4 Recommendation Overload Problem
  3.4 Experiment Setting
    3.4.1 Data Collection
    3.4.2 Evaluation Metrics
    3.4.3 Comparison Algorithms
  3.5 Results and Analysis
    3.5.1 Algorithm Performance Evaluation
    3.5.2 Feature Importance Evaluation
    3.5.3 Recommendation Overload Evaluation
    3.5.4 Result Discussion
  3.6 Conclusion

4 Broadcast Email Prioritization with Active Learning
  4.1 Background and Overview
  4.2 Problem Definition
  4.3 The Framework
    4.3.1 Sampling Users for Feedback
    4.3.2 Prediction for the Remaining Users
  4.4 Experiments
    4.4.1 Dataset
    4.4.2 Data Pre-processing and Analysis
    4.4.3 Baselines
    4.4.4 Evaluation Metrics
    4.4.5 Results and Analysis
  4.5 Conclusion

5 Broadcast Email Prioritization with Cross Domain Recommendation
  5.1 Background and Overview
  5.2 Problem Definition
  5.3 The CBEP Framework
    5.3.1 User Feedback Sampling
    5.3.2 Source Domain Set Selection
    5.3.3 Priority Prediction
  5.4 Source Domain Selection
    5.4.1 Problem Definition
    5.4.2 Solutions
    5.4.3 Efficiency Improvement
  5.5 Experimental Evaluation
    5.5.1 Dataset
    5.5.2 Evaluation Metrics
    5.5.3 Baselines
    5.5.4 Results and Analysis
  5.6 Conclusion

6 Conclusion
  6.1 Summary
  6.2 Future Directions
    6.2.1 Combining Different Methods of Broadcast Message Prioritization
    6.2.2 Active Learning for Personalized News Feed Ranking Problem
    6.2.3 Multi-heuristic Active Learning for Broadcast Email Prioritization
    6.2.4 Consider Additional Features to Better Capture Semantic Similarity of Domains
    6.2.5 Broadcast Message Prioritization with Deep Neural Network

Bibliography


List of Tables

Table 2.1 Comparison of Different Works on Email Prioritization
Table 2.2 Domain Definition and Selection Strategy of Previous Works Part 1
Table 2.3 Domain Definition and Selection Strategy of Previous Works Part 2

Table 3.1 Statistical Indicators on Modeling User Influence
Table 3.2 Result Comparison of WTM and Baseline Algorithms
Table 3.3 Comparison on How Different Features Affect the Performance of WTM

Table 4.1 Email Features
Table 4.2 Weighting Schemes
Table 4.3 Sampling Fairness Comparison

Table 5.1 Weighting Schemes


List of Figures

Figure 1.1 Gmail Uses a Yellow Icon to Mark the Important Emails
Figure 1.2 By adding "@username" in a tweet, one can mention other users.
Figure 1.3 The Proposed Prioritization Approaches
Figure 1.4 Thesis Organization Overview

Figure 2.1 Email Categorization Example: Gmail categorizes emails into primary, social and promotions.
Figure 2.2 Active Learning Example from Netflix: In the user registration process, a user is asked to rate at least three TV shows or movies.
Figure 2.3 Illustration of the Four-node Sub-graph
Figure 2.4 An Example of User Partitioning Based Methods: First three layers of a generated decision tree.
Figure 2.5 Categorization of Active Learning Strategies

Figure 3.1 After getting mentioned, a user can receive numerous types of notifications from Twitter, including in-app notifications, push notifications, SMS notifications and email notifications.
Figure 3.2 Performance Comparison of WTM and Baseline Algorithms
Figure 3.3 Recommendation Density Comparison Between WTM and CCFR (200 most recommended users)

Figure 4.1 Examples of Broadcast Emails: Broadcast emails are widely used for group notification and email marketing. Usually sent from the mailing list admin and with limited types of interaction (e.g. do not support reply).
Figure 4.2 Broadcast Email Prioritization with Active Learning: In this setting, given a mailing list, we know users' ratings on previous emails. The method aims to predict the importance label for a new email F.
Figure 4.3 Activity Time Probability of Users from Korea and United Kingdom
Figure 4.4 Performance Comparison of Various Algorithms
Figure 4.5 Factor Comparison for Positive Feedback
Figure 4.6 PAL Performance with Different Sampling Percentage

Figure 5.1 Domains and Mailing Lists: There are large numbers of mailing lists in an email system, focusing on various topics ranging from politics to promotions. Each mailing list can be treated as a domain in the cross-domain recommendation problem.
Figure 5.2 Broadcast Email Prioritization with Cross Domain Recommendation: When predicting the importance label for a new email F, we not only consider the ratings from the target domain but also the rating information from related source domains which share overlapping users.
Figure 5.3 Baseline Comparison at Mailing List Level
Figure 5.4 Baseline Comparison at Mail Level
Figure 5.5 Optimization Criteria Analysis
Figure 5.6 Number of Selected Source Domains and Prediction Performance vs. λpen
Figure 5.7 All Domains vs. Hot Domains


Chapter 1

Introduction

1.1 Background

A broadcast message is a message that can be sent to a group of subscribers at once [4]. There exist different types of broadcast messages. For instance, a broadcast email is an email message sent to a group of receivers (i.e., a mailing list) [129]. A tweet, essentially a short post, is another type of broadcast message and can be sent to the followers of the author [131]. A broadcast short message is a text message which can be sent to the cell phones of a group of customers and is widely used in marketing campaigns [94].

Among the different types of broadcast messages, broadcast emails and tweets are the most popular ones [131, 129], and broadcast messages like emails and tweets are fundamentally changing the way people live, work and communicate every day. For instance, despite 23 years of history, email remains one of the most important communication tools, with 2.6 billion users worldwide and over 100 billion broadcast emails sent or received every day [97]. Take Twitter as another example: with more than 319 million monthly active users, over 500 million tweets are generated on Twitter every day, spreading breaking news, personal updates and spontaneous ideas [131].

However, together with the benefits comes a curse: broadcast message overload. Broadcast message overload is defined as the situation in which users are left with the burden of having to process a large volume of broadcast messages of different importance, wasting time and energy handling low-importance broadcast messages [139]. Take broadcast emails as an example: 58% of emails are unimportant or irrelevant, with a large portion belonging to broadcast emails [19]. A person on average spends nearly 380 hours every year handling those emails, which causes a huge economic loss in productivity [19, 60], described by the New York Times as "a $650 Billion Drag on the Economy" [60]. Similar overload problems also occur on Twitter. As pointed out by previous researchers [45], 30 tweets per hour is the most people can handle before their mental processing slows. With huge volumes of tweets being published every second, users can end up feeling foggy, irritable, unproductive or angry after consuming large volumes of tweets [45].


Figure 1.1: Gmail Uses a Yellow Icon to Mark the Important Emails

The serious situation of broadcast message overload has led to a thriving research field, personalized broadcast message prioritization, which aims to predict importance labels of broadcast messages and alleviate the broadcast message overload problem. Many efforts have been made in both industry and academia to solve this problem [139, 118, 126]. For instance, Google has proposed an email importance prediction algorithm for Gmail [3], and it has been used in the Gmail Priority Inbox, in which every important email is marked with a yellow icon next to the sender's name (Figure 1.1). Twitter has also developed a personalized feed ranking method in which tweets are prioritized based on users' personalized preferences.

1.2 Research Questions

In this thesis, our research focuses on two popular types of broadcast messages: broadcast emails and tweets. Three prioritization related research questions are proposed, with one working on expanding tweet diffusion considering tweet prioritization, and two working directly on the prioritization of broadcast emails. In this section, we first give the definition of the broadcast message prioritization problem and the common characteristics of its different applications. Then, we introduce the three research questions, along with the proposed approaches. To balance the thesis structure, we only provide brief introductions in this section; the details can be found in the following chapters. Last but not least, the relations among the three research questions are further analyzed by discussing the similarities and differences of tweets and broadcast emails and how the proposed prioritization methods relate to each other.

1.2.1 Broadcast Message Prioritization Problem

The broadcast message prioritization problem refers to the problem of predicting the personalized importance label of a broadcast message for a user. That is to say, broadcast message prioritization tries to predict how important a user will perceive a broadcast message to be. Broadcast message prioritization has many applications in real life, given that there exist many different types of broadcast messages. For instance, tweet prioritization aims to predict the importance of tweets and can be used to re-rank a user's tweet timeline by placing the important tweets at the top. Broadcast email prioritization tries to predict the importance of a broadcast email to a specific user and can be used to solve the email overload problem.

Even though broadcast message prioritization has many different applications, these applications share many common characteristics (a minimal interface sketch follows the list below).

1. Since, by definition, a broadcast message is a message sent to a group of users, broadcast message prioritization is naturally highly correlated with collaborative filtering: we can rely on the responses of users who share similar preferences to predict the importance of a broadcast message.

2. A broadcast message usually comes with text content (e.g., the content of a tweet, or the title and body of a broadcast email), which means we can always adopt a content-based classification method for the prioritization task.

3. The broadcast message prioritization problem is a cold-start problem, because at the time of prioritization the broadcast message has not been sent out, which means it comes with no user interaction information.

4. Many types of broadcast messages, like tweets and broadcast emails, are time-sensitive, which requires the broadcast message prioritization method to work efficiently.
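To make the shared problem structure concrete, the following is a minimal interface sketch in Python; all names are illustrative and are not part of any framework in this thesis.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class BroadcastMessage:
    text: str    # tweet content, or email title + body
    sender: str  # author or mailing-list identifier
    # Note: no interaction fields -- at prioritization time the
    # message has not been sent, so it is completely cold.

class Prioritizer(Protocol):
    def score(self, user_id: str, message: BroadcastMessage) -> float:
        """Predicted importance of `message` for `user_id`."""
        ...
```

The three approaches in this thesis can be viewed as different implementations of such a scoring interface, differing in the evidence they use: content features, sampled feedback, or related source domains.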

1.2.2 Expand Tweet Diffusion Considering Tweet Prioritization (Mention Recommendation)

To expand the diffusion of a tweet, how can we recommend proper users to be mentioned, considering tweet prioritization and user influence?

Instead of working directly on tweet prioritization, we propose the highly correlated mention recommendation problem as our first research question.

Twitter, despite being the most influential medium for spreading breaking news, personal updates and spontaneous ideas, suffers from a lack of diffusion for tweets from ordinary users. Only 0.05 percent of Twitter users attract almost 50 percent of all attention within Twitter, and the average retweet rate is merely 0.11. Ordinary users easily feel unheard, which leads to a negative user experience [131].

Fortunately, mention, the special mechanism enabled by adding "@username" in a tweet, can be used to boost information diffusion. When the mention mechanism is used, the tweet is still sent to all the followers of the author, but the mentioned users receive additional notifications from Twitter, and their possible retweets may help expand the diffusion of the tweet. Thus, recommending proper users to be mentioned in a tweet (i.e., mention recommendation) becomes an important question, one that had not been touched by any previous work before the publication of our paper [131].

Figure 1.2: By adding "@username" in a tweet, one can mention other users.

Tweet prioritization plays an important part in mention recommendation because it can help to identify users who consider a tweet important (i.e., prioritize it), and those users have a higher probability of retweeting the tweet. Another important aspect to be considered in mention recommendation is user influence, because retweets from influential users can better expand the diffusion of the original tweet. Taking the two aspects into consideration, we formulate the mention recommendation problem as a learning to rank problem and recommend the top-ranked users to be mentioned in the tweet. Our proposed approach essentially belongs to content-based recommendation. A support vector regression based ranking model is proposed and a series of new features are included. Details of the model are discussed in Chapter 3.
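To illustrate the flavor of such a pointwise learning to rank setup, here is a minimal sketch using scikit-learn's SVR; the feature names, toy values and top-k selection are illustrative stand-ins, not the actual model of Chapter 3.

```python
import numpy as np
from sklearn.svm import SVR

# One row per (tweet, candidate user) pair. The three columns are
# illustrative stand-ins for features of the kind used in Chapter 3:
# [interest match, content-dependent social tie, user influence].
X_train = np.array([
    [0.9, 0.7, 0.2],
    [0.1, 0.2, 0.9],
    [0.5, 0.4, 0.5],
])
# Target: a diffusion-based relevance score for each pair.
y_train = np.array([0.8, 0.3, 0.5])

ranker = SVR(kernel="rbf", C=1.0)
ranker.fit(X_train, y_train)

# Score candidate users for a new tweet; mention the top-ranked ones.
X_candidates = np.array([[0.8, 0.6, 0.4], [0.2, 0.1, 0.7]])
scores = ranker.predict(X_candidates)
top_users = np.argsort(-scores)[:1]  # indices of users to mention
```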

The study of this research question results in the first in-depth study of the mention feature in microblogs, resolving the essential problem of whom to mention, and there are already over 70 follow-up works citing our work [131]. Our method is thoroughly evaluated on a real-life dataset. The proposed algorithm proves highly effective compared against multiple baselines, and we also analyze how different new features affect the recommendation performance. New challenges like the recommendation overload problem are also carefully evaluated and discussed.

1.2.3 Incorporating Collaborative Filtering into Broadcast Email Prioritization with Active Learning

How can we incorporate collaborative filtering into personalized broadcast email prioritization using active learning?

Email is one of the most important communication tools today, but email overload resulting from large numbers of unimportant or irrelevant emails is causing trillion-dollar-scale economic losses every year. Thus personalized email prioritization algorithms are urgently needed. Despite much previous effort on this topic, broadcast email, an important type of email, has been overlooked in the literature. Broadcast emails are significantly different from normal emails, introducing both new challenges and opportunities. On the one hand, the lack of real senders and limited user interactions invalidate the key features exploited by traditional email prioritization algorithms; on the other hand, thousands of receivers for one broadcast email bring us the opportunity to predict importance through collaborative filtering. In other words, we could generate priority predictions by collaborative filtering: for a user, if other users with similar interests have considered the email important (i.e., viewed it), he should be very likely to also consider it an important email.

Based on these challenges and opportunities, we propose the second research question, which aims to design the first personalized email prioritization framework for broadcast emails and to exploit collaborative filtering features by considering other users' responses to broadcast emails.

However, there exists one key challenge: each email waiting for priority prediction is completely cold. That is to say, no view-email action has been observed, since the email has not yet been sent to any users, which makes it impossible to exploit collaborative filtering features directly. We propose a novel active learning framework to solve this cold-start problem. The intuition is simple. For a new email, we first send it to a small portion of the users in the mailing list (e.g., 5%) without priority labels and wait for a short period of time (e.g., half an hour) to collect their feedback (whether the user has read the email). Then, based on these users' feedback, we predict the priority for the remaining majority of users. Our personalized broadcast email prioritization problem can thus be naturally divided into two subproblems. First, how to sample the small portion of users whose feedback can help us the most in determining the email priority for the remaining users. Second, once user feedback is gathered, how to use it to accurately predict the personalized priority for the remaining users. This problem can be considered a type of active learning problem for recommendation systems.

To cope with the above-mentioned challenges, we propose a novel active learning framework in which we sample a small set of informative users, considering both the preference of a user and his tendency to give responsive feedback, by exploiting features including the text and attributes of emails and users' email-view interactions. After gathering the feedback, we use a weighted regularized matrix factorization method specifically designed for implicit feedback to learn preference scores for the remaining users, and we use these scores as the key feature to predict the final priority of the email.
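To make the prediction step concrete, the following is a compact, unoptimized sketch of weighted regularized matrix factorization for one-class implicit feedback, solved by alternating least squares in the spirit of Hu et al.'s classic formulation; the exact weighting schemes of our framework (Table 4.2) are omitted, and all names are illustrative.

```python
import numpy as np

def wrmf(R, factors=8, alpha=40.0, lam=0.1, iters=15, seed=0):
    """Weighted regularized MF for one-class implicit feedback.
    R[u, i] = 1 if user u viewed email i, else 0 (unobserved)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = 0.1 * rng.standard_normal((n_users, factors))  # user factors
    Y = 0.1 * rng.standard_normal((n_items, factors))  # item factors
    P = (R > 0).astype(float)   # binary preference
    C = 1.0 + alpha * R         # confidence: observed views weigh more
    reg = lam * np.eye(factors)
    for _ in range(iters):      # alternating least squares
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg, X.T @ Ci @ P[:, i])
    return X @ Y.T              # predicted preference scores

# Toy view matrix; the last column is the new email, filled only for
# the sampled users. scores[:, -1] then serves as the key feature when
# predicting priority for the remaining (unsampled) users.
R = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]], dtype=float)
scores = wrmf(R)
```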

Our method is thoroughly evaluated on a large-scale industrial dataset and is demonstrated to be highly effective compared against numerous baselines.

1.2.4 Broadcast Email Prioritization Considering Large Numbers of Mailing Lists with Cross-domain Recommendation

In an email system with large numbers of mailing lists, how can we perform personalized broadcast email prioritization with cross-domain recommendation?

The third research question is a natural extension of the second one, and we still adopt the paradigm of collaborative filtering for broadcast email prioritization. In the second research question, an active learning framework was proposed to solve the cold-start problem of incorporating collaborative filtering into broadcast email prioritization. However, the user feedback obtained from active learning is still too limited to fully address the prioritization problem. For example, it cannot handle well the new users of a mailing list or new mailing lists, which are very common in real systems and have limited historical data for collaborative filtering. In the model proposed for the second research question [129], only one mailing list was considered, i.e., each mailing list was modeled independently without considering the existence of other mailing lists.

In a real-life mailing system like Gmail, there exist up to millions of mailing lists, covering topics varying from political campaigns to e-commerce promotions. The size of the mailing lists is typically large, commonly containing thousands or even millions of receivers. A user may be a member of dozens of mailing lists, and mailing lists can have large numbers of shared users.

The viewing information accumulated in similar mailing lists can be very useful for enriching the collaborative filtering evidence of a target mailing list. By resorting to these similar mailing lists, the aforementioned new user and new mailing list issues can be significantly alleviated.

To handle the challenges mentioned above and better exploit the extra useful information from large numbers of mailing lists, a new cross-domain recommendation framework is proposed to solve the problem of broadcast email prioritization for many mailing lists. Cross-domain recommendation systems adopt different techniques to transfer knowledge from source domains (e.g., books) to target domains (e.g., movies) in order to alleviate the sparsity problem and improve the accuracy of recommendations. The intuition of our approach is that each broadcast mailing list can be regarded as a domain in a cross-domain recommendation system. The problem of predicting the priority of emails with the help of extra information from other related mailing lists can thus be formulated as the problem of improving the quality of recommendations in the target domain by incorporating information accumulated in source domains, as in cross-domain recommendation.
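The following sketch only illustrates the intuition of choosing source mailing lists for a target list by user overlap; it is a toy heuristic, not the source domain selection objective developed in Chapter 5, and all names are illustrative.

```python
def rank_source_domains(target, candidates, min_overlap=10):
    """Rank candidate mailing lists as source domains for `target`.
    Each domain maps user_id -> set of viewed email ids."""
    scored = []
    for name, domain in candidates.items():
        shared = target.keys() & domain.keys()
        if len(shared) < min_overlap:
            continue  # too few overlapping users to transfer from
        # Favor domains whose overlapping users are also active there.
        avg_views = sum(len(domain[u]) for u in shared) / len(shared)
        scored.append((len(shared) * avg_views, name))
    return [name for _, name in sorted(scored, reverse=True)]
```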

In the study of this research question, we present the first in-depth discussion of personalized prioritization for broadcast emails considering large numbers of mailing lists and propose the first cross-domain recommendation framework that can select the set of source domains from large numbers of domains. Our method is thoroughly evaluated on a real-life dataset and is demonstrated to be highly effective compared to baseline methods.

1.2.5 An Overview of our Approaches to Broadcast Message Prioritization

In the previous sections, we proposed three research questions related to broadcast message prioritization on tweets and broadcast emails and briefly discussed the proposed approaches to solve them. In this section, we provide an overview of our proposed prioritization approaches by first comparing tweets and broadcast emails and then discussing the relations among the prioritization approaches proposed in the previous sections.


Broadcast Messages

As two popular types of broadcast messages, tweets and broadcast emails share many common characteristics when it comes to prioritization.

1. Both tweets and broadcast emails are broadcast messages sent to a group of users. A tweet is analogous to a broadcast email, and the followers of a user are analogous to one mailing list.

2. Both tweets and broadcast emails suffer from the complete cold-start challenge in prioritization. That is to say, every email or tweet has not been sent out at the time of prioritization and thus is completely cold with zero user response.

3. Both tweets and broadcast emails are time-sensitive information. Take tweets as an example: 25% of replies to a tweet happen within 67 seconds, 75% within 17 minutes, and 75% of message flows last less than an hour [131].

Due to these similarities, the prioritization methods proposed in our research can be applied to both tweets and broadcast emails. Despite the similarities, tweets and broadcast emails also have some differences.

1. In micro-blogging systems like Twitter, there exists an explicit social network constructed by the follow relation and an implicit social network constructed by users' social actions like retweet and reply [131]. In an email system, the social network is implicit and can be constructed from user behaviors like sending, receiving, replying to, and forwarding emails. These features are widely used in email prioritization [139, 65].

2. Besides viewing, tweets usually attract multiple types of behaviors like comment, retweet and like. In comparison, the types of interaction with broadcast emails are limited. For instance, we usually will not reply to, forward or cc a broadcast email; we would not manually label an irrelevant mail from a broadcast mailing list as spam either, since we still wish to receive other emails from the mailing list. The common user action on broadcast emails is viewing.

Proposed Prioritization Methods

In this thesis, we propose three broadcast message prioritization methods. As illustrated in Figure 1.3, each of them comes with its own advantages. All of them can work on any type of broadcast message, and they can be further combined to achieve better performance.

Figure 1.3: The Proposed Prioritization Approaches

The first method models tweet prioritization by content-based features, like how much the user interest profile matches the content of a tweet and the content-dependent social ties between the user and the recommendation candidates. Content-based approaches are the most basic and widely used methods for broadcast message prioritization [126, 137, 3]. One of the key benefits of the content-based approach is that it can well handle the complete cold-start status of a broadcast message.

As pointed out by previous research [58] in recommendation systems, collaborative filtering based methods usually outperform content-based methods, because the latent factors learned by collaborative filtering can capture subtle hidden preferences which cannot be explicitly represented by content features. So a natural idea is to incorporate collaborative filtering into the prioritization of broadcast messages to improve the performance of the prioritization, which leads to our second proposed approach. As discussed in the previous sections, broadcast message prioritization suffers from the cold-start challenge, which prevents the direct employment of collaborative filtering, and thus an active learning framework is proposed to handle it.

The third approach is a natural expansion of the second, and we still adopt the paradigm of collaborative filtering for broadcast email prioritization. In our second research question, we only consider a single mailing list, but in a real email system there can be large numbers of mailing lists, and it is beneficial to learn useful extra information from similar mailing lists. The extra information can not only further improve the prioritization precision but also help to alleviate the cold-start problem of new users and new mailing lists. Thus broadcast email prioritization considering large numbers of mailing lists can be cast as a cross-domain recommendation problem, in which each mailing list is a domain. Because of the large number of domains (mailing lists) in an email system, we propose the first cross-domain recommendation framework that can automatically pick the set of source domains from a large number of domains.

As discussed above, our proposed broadcast message prioritization methods are highly related to each other. It is worth noting that the methods proposed above can even be combined to achieve better performance. For instance, the content-based approach can be combined with the active-learning based collaborative filtering approach. The simplest way is to use an ensemble learning method to generate a final prediction score by combining the prediction scores from these two approaches (a minimal sketch follows below). More sophisticated combination methods are possible future directions.
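As a minimal illustration of such an ensemble, the sketch below blends the two scores with a convex combination; the weight is a hypothetical parameter that would in practice be tuned on validation data.

```python
def ensemble_priority(content_score: float, cf_score: float,
                      w_content: float = 0.5) -> float:
    """Blend content-based and collaborative filtering predictions.
    A simple convex combination; more sophisticated combiners are
    left as future work in this thesis."""
    return w_content * content_score + (1.0 - w_content) * cf_score

# Example: ensemble_priority(0.7, 0.9, w_content=0.4) == 0.82
```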

1.3 Contributions

In this thesis, we focus on tweets and broadcast emails, two popular types of broadcast messages. We study three research questions related to broadcast message prioritization: how to boost tweet diffusion by considering tweet prioritization (mention recommendation), how to incorporate collaborative filtering into broadcast email prioritization with active learning, and how to conduct broadcast email prioritization considering large numbers of mailing lists with cross-domain recommendation.

1.3.1 Mention Recommendation Considering Tweet Prioritization

We propose the first in-depth study of the mention feature in microblogs and the first approach to solve the mention recommendation problem [131]. Over 70 related works have cited our research, with many of them working on follow-ups of our proposed mention recommendation problem.

We formulate the mention recommendation problem as a learning to rank problem which takes important aspects like tweet prioritization and user influence into consideration at the same time. New ranking features and a new diffusion-based relevance definition are introduced. We also propose the first content-dependent user social tie model [131] to represent user relations in social networks, which proves to be highly effective in our comparison experiments.

1.3.2 Broadcast Email Prioritization with Active Learning

We present the first personalized email prioritization framework tailored for broadcast emails [129]. It is also the first email prioritization framework that adopts the paradigm of collaborative filtering [129].

A novel active learning framework is proposed to handle the complete cold-start problem. It is tailored for one-class implicit feedback data by preferring to sample users with a high probability of providing positive feedback. It is also the first active learning framework that takes users' response time into consideration, favoring users who can provide feedback in time [129].


Figure 1.4: Thesis Organization Overview

1.3.3 Broadcast Email Prioritization with Cross Domain Recommendation

We formulate broadcast email prioritization considering large numbers of mailing lists as a cross-domain recommendation problem and still adopt the paradigm of collaborative filtering for broadcast email prioritization. During email prioritization, extra information can be learned from other relevant domains through the cross-domain recommendation model. The extra information not only helps with the cold-start problem of new users and new mailing lists but also improves the overall prediction precision due to the additional knowledge introduced.

The proposed cross-domain recommendation model is the first that can automatically select the set of source domains from large numbers of candidate domains [130]. It can also dynamically determine the number of source domains to be included based on the nature of the target domain.

1.4 Thesis Organization

As displayed in Figure 1.4, this thesis is organized as follows.

In Chapter 1, we introduce the background of broadcast message prioritization, including the serious broadcast message overload problem and the importance of using prioritization to ease the problem. We then propose three research questions related to broadcast message prioritization, focusing on two popular types of broadcast messages, tweets and broadcast emails. Finally, the major contributions of this thesis are summarized at the end of the chapter.

In Chapter 2, we present the background knowledge and related works in the research fields related to this thesis, including tweet prioritization, mention recommendation, email prioritization, active learning and cross-domain recommendation.

In Chapter 3, we introduce the first mention recommendation framework considering both tweet prioritization and user influence. The mention recommendation problem is formulated as a learning to rank problem, and extensive experiments are conducted on a real-life dataset.

In Chapter 4, we study the broadcast email prioritization problem and propose a novel active learning framework to incorporate collaborative filtering. The effectiveness of our model is evaluated on a real-life email dataset.

In Chapter 5, we study broadcast email prioritization considering large numbers of mailing lists and propose a new cross-domain recommendation framework that can automatically select the set of source domains from a large number of candidate domains.

In Chapter 6, we summarize our contributions and discuss possible future research directions.


Chapter 2

Related Works

In this chapter, we provide a detailed review of the research fields related to the three methods proposed in this thesis, namely tweet prioritization, mention recommendation for micro-blogging systems, email prioritization, active learning for collaborative filtering systems, and cross-domain recommendation. This chapter is organized as follows. In Section 2.1, we survey the literature related to tweet prioritization and mention recommendation. In Section 2.2, we introduce two categories of research works trying to solve the email overload problem, i.e., email categorization and email prioritization, which are related to our second and third research works. In Section 2.3, we provide a thorough survey of active learning methods for collaborative filtering systems, which is related to our research on broadcast email prioritization with a single mailing list. In Section 2.4, we review the cross-domain recommendation literature, surveying both the domain selection and knowledge transfer aspects, which are related to our third research work.

2.1 Tweet Prioritization and Mention Recommendation

Since its launch in 2007, Twitter has become not only a highly popular web service but also a thriving research area. The research in this thesis is related to tweet prioritization and mention recommendation in micro-blogging systems.

2.1.1 Tweet Prioritization

Due to the large volume of tweets published every day, users can easily suffer from the tweet overload problem. Thus, much previous research focuses on tweet prioritization [126, 31, 140, 114] to solve the problem. By figuring out the importance of tweets to users, the proposed algorithms are widely used in applications like tweet ranking and tweet recommendation.

Many previous tweet prioritization related works, like the ones working on tweet recommendation or tweet ranking, are content-based, taking features like the topic of tweets, user social relations, user influence, etc. into consideration. The learning to rank framework is widely used in these works. In [126], the authors propose a personalized tweet ranking method leveraging retweet behavior. A decision tree based method is used to predict users' likelihood of retweeting a tweet. Duan et al. propose a new tweet ranking strategy which uses not only the content relevance of a tweet but also the account authority and tweet-specific features such as whether a URL link is included in the tweet [31]. In [17], the authors propose a tweet ranking model considering three major elements on Twitter: tweet topic level factors, user social relation factors, and explicit features such as the authority of the publisher and the quality of the tweet.

Graph-based tweet ranking approaches are also widely used in tweet prioritization. In [137], the authors model the structure of Twitter as a graph consisting of users and posts as nodes and retweet relations between the nodes as edges. A variant of the HITS algorithm is proposed to produce a static ranking of tweets. Yan et al. propose a graph-based ranking model for tweet recommendation [136]. It exploits a co-ranking model to rank tweets and their authors simultaneously using several networks: the social network connecting the users, the network connecting the tweets, and a third network that ties the two together. In [140], the authors extend the session-based temporal graph (STG) approach to perform tweet recommendation, considering three types of features in Twitter: the textual information, the time factor, and the users' behavior.
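As a rough illustration of this family of methods, the sketch below runs plain HITS (via networkx) on a toy user-post retweet graph and ranks posts by authority score; note that [137] proposes a variant of HITS, so this shows only the unmodified baseline algorithm on fabricated data.

```python
import networkx as nx

# Toy graph in the spirit of [137]: users and posts as nodes,
# retweet relations as directed edges (user -> retweeted post).
G = nx.DiGraph()
G.add_edges_from([
    ("user_a", "post_1"), ("user_b", "post_1"),
    ("user_b", "post_2"), ("user_c", "post_2"),
    ("user_c", "post_3"),
])

hubs, authorities = nx.hits(G, normalized=True)

# Posts with high authority scores form a static ranking of tweets.
ranking = sorted((n for n in G if n.startswith("post_")),
                 key=lambda n: authorities[n], reverse=True)
```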

Collaborative filtering is rarely used for tweet prioritization, partly because when a new tweet is published, it is completely cold with no user interaction attached. However, collaborative filtering based methods are used for user recommendation [114], URL recommendation [16], and hashtag recommendation [75] in Twitter, in which the recommended item is not completely cold-start.

2.1.2 Mention Recommendation

Mention is an innovative interaction mechanism first proposed by Twitter [131]. By using "@username" in the tweet content, one can mention a user in the tweet and raise the awareness of the mentioned user through the various types of notifications triggered by the mention behavior.

With the new mention mechanism comes a natural question: whom should we mention in a tweet? Our work [131] is the first to propose a mention recommendation framework using a learning to rank approach, considering features like user interest match, content-dependent social relations, and user influence.

Our work has resulted in a series of follow-up works by other researchers on the mention recommendation problem. Tang et al. formulate the problem of locating mention targets when posting promotion-oriented tweets as a ranking based recommendation task and present a context-aware recommendation framework as a solution [120]. In [46], the authors propose a mention suggestion system that takes into consideration not only the content of the tweet but also the histories of candidate users; a novel method that extends the translation-based model is proposed to better capture the text features. In [146], the authors propose a personalized ranking model that considers multi-dimensional relations among users and mention tweets to generate the personalized mention list. Work [95] proposes a novel tweet propagation model, SIR_MF, based on a multiplex network framework, and the effect of mentioning is evaluated based on the final retweet count. In [80], the authors cast the mention recommendation problem as a probabilistic problem and propose a method named Personalized Mention Probabilistic Ranking to find the candidate user with the maximal capability and possibility to help tweet diffusion, utilizing a probabilistic factor graph model in the heterogeneous social network.

Figure 2.1: Email Categorization Example: Gmail categorizes emails into primary, social and promotions.

2.2 Email Overload and Email Prioritization

Email overload is formally defined as the situation in which users are left with the burden of having to process a large volume of email messages of different importance, causing significant negative effects on both personal and organizational performance [139]. Since email overload is causing severe economic losses in productivity [19, 60, 97], solving the overload problem has been a popular research area for years. Efforts in the previous literature can mainly be classified into two categories: email categorization [142, 47, 28, 73] and email prioritization [26, 30, 27, 3, 139, 108]. In email categorization methods, emails are categorized into different folders so that users can handle emails more efficiently, while email prioritization methods try to directly predict the importance level of an email and save users the time of handling unimportant emails. Both are introduced in detail in the following sections.

2.2.1 Email Categorization

Email categorization stands for the task that classifies a user's emails into different categories (folders) to increase the efficiency of handling emails [47]. Nowadays, almost all major email service providers support automatic email categorization features. AOL was one of the first mail services to provide high-level stacks such as Daily Deals, Social Notifications or Photos, and Attachments [142]. Yahoo Mail offers Smart Views, which provide search facets for messages, such as People, Social, Travel, Shopping, and Finance, as detailed in [47]. Gmail has also been offering various ways for users to scan their inboxes with its Smart Labels and Inbox Tabs [3].

Email categorization is a classic classification problem. Early works like [28] implement classifiers like the naive Bayes classifier and decision trees to predict category labels for emails. Both text-based features (e.g., content of the title and email body) and social network based features (e.g., sender and recipients of the emails) are used in the classifier. More recent works like [142] also take temporal features into consideration: for instance, a shipping notification email usually comes after an order confirmation. Neural network models like multi-layer perceptrons and long short-term memory models are used to capture this type of temporal feature.
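For a sense of what such an early content-based categorizer looks like, here is a minimal sketch with scikit-learn; the training snippets and Gmail-style labels are fabricated for illustration, and a real system would add the social and temporal features discussed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus: subject + body text with predefined category labels.
emails = [
    "Your order has shipped, track your package online",
    "Alice commented on your photo",
    "50% off everything this weekend only",
]
labels = ["primary", "social", "promotions"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(emails, labels)

# With realistic training data, this should lean towards 'promotions'.
print(model.predict(["everything 50% off this weekend"]))
```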

For the classification labels, earlier works attempt to mimic each individual user's personal classification habits by using folders built by individual users as labels for the classifier [71]. This highly personalized approach exhibits two main weaknesses: it relies on small datasets, as each individual inbox is analyzed independently, and it requires users to have defined meaningful folders in the first place [47]. Most web services nowadays have changed their approach to email classification by offering pre-defined classes (e.g., social network, promotion) that are common to all users; the pre-defined email categorization classes for Gmail are displayed in Figure 2.1 as an example. On the other hand, researchers like Koren et al. [73] have analyzed popular folders in order to identify not just a few but several thousand possible common class labels.

2.2.2 Email Prioritization

Email prioritization focuses on making a personalized prediction of the importance level of non-spam emails [139, 65, 56]. Previous literature on email prioritization can be divided into two major categories based on the end goal of the algorithms, i.e., action prediction and priority label prediction, both of which essentially belong to classification tasks. We compare the input features, predicted labels, classifiers, and datasets of the most important works in the email prioritization field in Table 2.1 and introduce more works in the following sections.

It is worth noting that email prioritization can easily be confused with spam detection, another important email-related task. Spam detection is commonly defined as the identification of unsolicited bulk emails using data mining techniques [118]. Based on the features used, spam detection methods can be divided into two categories: content-based detection and sender-based detection. Content-based detection identifies spamming emails according to the email content [110, 101, 116], while sender-based detection identifies spamming emails using the email sender information [76, 44, 121]. In contrast, email prioritization works on non-spam emails, which usually come from acquainted senders or users' self-enrolled mailing lists. Moreover, email prioritization focuses on personalization. That is to say, different users may hold different importance perceptions of the same email, and we predict users' personal judgments of the importance levels of non-spam emails.

Action Prediction

The action prediction type of email prioritization methods tries to predict the action a user will take on an email, like read, reply, delete or forward. Different types of actions indicate different levels of priority or importance. For instance, emails that users reply to or forward usually have a high importance level.

Dabbish et al.[26] perform user studies to understand how and why users take different actions towards incoming emails and conclude that people's ratings of email importance and the actions they take on specific emails are based on email message characteristics (e.g. whether the email is work related) and characteristics of receivers and senders. The analysis is based on a survey with 1,100 participants.

Dredze et al. propose a series of works [30] to reduce email overload by predicting whether sent and received emails necessitate a reply. General features like the text of email messages and header features like senders and receivers are used along with some specifically tailored features like the dates of emails, salutations (e.g. Dear John) and the indication of questions. A logistic regression based model is built and evaluated on a very small dataset collected from 4 users.

Dotan et al. [27] propose the first large-scale study of users' actions in a major webmail service and identify that 85% of users' emails are never read, which makes identifying the messages that users will act on highly valuable. The authors design a classification framework using both local (i.e., individual user inbox-specific) and global (i.e., mail population level) features to train and predict actions for all types of users.

Priority Label Prediction

Priority label prediction methods for email prioritization aim to predict the importance label of emails directly. Some works formulate this problem as a binary classification problem (i.e., predicting whether an email is important or not), while other works use more fine-grained importance labels (e.g. the 5-level importance label in [56]).

There are a limited number of research works focusing on the email prioritization field because of the lack of public datasets, mainly due to privacy issues. Existing works[65, 89, 56] are either works from big companies with their own email services like Google and Yahoo, or works from researchers who collect a small dataset by themselves. All of these works formulate label prediction as a classification problem using classic models like logistic regression and SVM.


Table 2.1: Comparison of Different Works on Email Prioritization

Paper | Input Features | Predicted Labels | Classifier | Dataset
[3] | Social features, content features, thread features and label features | Binary importance label | Logistic regression | Gmail dataset
[139] | Header features (receivers, senders, cc etc.) and text features (title and email body) | 5-level importance label | SVM | Self-collected small dataset
[65] | Header features (receivers, senders, cc, timestamp etc.) | Can be used for binary classification | NA | Email dataset from the university
[27] | Individual features from user's inbox behavior and global features at the mail population level | User actions on emails | Logistic regression with self-defined regularizers | Yahoo Email dataset
[56] | Header features (receivers, senders, cc etc.), text features (title and body text, the past/present tense of text etc.) and email tailored features (presence of attachments etc.) | Binary label | SVM | Self-collected data with 1,500 emails

Douglas et al. [3] propose a simple linear logistic regression model for prioritization in Gmail, in which the final prediction is the sum of the log odds of a global model and a user model. Four categories of features are considered in the model, including social features, content features, thread features and label features.

In [139], the authors use personal social networks to capture user groups and to obtain rich features that represent social roles from the viewpoint of a particular user. They also develop a semi-supervised (transductive) learning algorithm that propagates importance labels from training examples to test examples through the message and user nodes in a personal email network.

A social clustering approach is proposed in [65] to predict email priority based on the relations between a message's sender and induced social clusters. Communities of interest (COI), which stand for groups of users that have a common bond, are identified for the prioritization task. Neustaedter et al. [89] define metrics for measuring the social importance of users based on the email elements from, to and cc, and the user actions of replying and reading, which can potentially be used for measuring email priority. Horvitz et al. [56] regard the email prioritization prediction task as a classification problem. They use Support Vector Machines to predict whether the utility of newly arrived emails is high or low. In [109], the authors build a naive Bayes classifier for the email prioritization task considering both textual and non-textual features from emails.


Figure 2.2: Active Learning Example from Netflix: In the user registration process, a user is asked to rate at least three TV shows or movies.

However, none of the previous works targets broadcast emails by exploiting the collaborative filtering features that come with broadcast emails.

2.3 Active Learning in Collaborative Filtering

A collaborative filtering based recommendation system uses either users' explicit ratings (e.g. 5-star ratings for reviews) or implicit ratings (e.g. view or purchase history) for items to recommend items that the target user has not yet considered but will likely enjoy[72, 36]. Besides the recommendation model itself, the number, the distribution, and the quality of the ratings known by the system can influence the recommender's performance. In general, the more informative the available ratings are about the user's preferences, the higher the recommendation accuracy[36]. Thus it is important to gather rating points containing extra useful information to obtain the best performance from the recommendation system.

The problem of how to obtain high-quality data that better represents the user's preferences and improves the recommendation quality is called active learning. It is worth noting that active learning is a very popular subfield of machine learning and is widely researched in many other fields where data may be insufficient or expensive to acquire, like classification, clustering, information extraction and speech recognition[111, 96]. However, in this section, we will only focus on active learning methods proposed for collaborative filtering based recommendation systems, as they are closest to the thesis topic.

As a thriving research field, there is a large number of previous works on active learning in collaborative filtering, and there are numerous ways to classify those methods. In this section, we borrow the categorization from [36] and classify the methods into two big categories, non-personalized and personalized, each of which is further partitioned into two subcategories, single-heuristic and multi-heuristic, which will be discussed in the following sections.

2.3.1 Non-personalized Active Learning Strategies

Non-personalized active learning strategies are a series of methods that do not consider personalization at the user level; the same set of items is presented to all users to label.

Uncertainty Reduction Non-personalized Strategies

The intuition behind this type of methods is that controversial or uncertain items can bring in more information. For instance, if a movie is globally acclaimed or criticized, it usually brings in little information about a user's personalized preference, because the user's opinion is usually in accordance with the majority of users. However, if the opinions about a movie diverge strongly, acquiring the user's personal opinion about it usually brings in lots of extra information. Statistical features like variance and entropy are used by these methods.

In variance based methods[87, 122], the items with the highest rating variance are selected, with the intuition that high variance indicates high uncertainty about the item's rating. Variance is defined as:

Variance(i) = \frac{1}{|U_i|} \sum_{u \in U_i} (r_{ui} - \bar{r}_i)^2        (2.1)

in which U_i stands for the set of users who have rated item i and \bar{r}_i stands for the average rating of i.
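As an illustration, below is a minimal sketch of variance-based item selection per Eq. 2.1, assuming a user-by-item rating matrix with np.nan marking unrated entries; all names are illustrative.

```python
import numpy as np

def item_variance(ratings: np.ndarray) -> np.ndarray:
    # Unrated items get -inf so they are never selected.
    variances = np.full(ratings.shape[1], -np.inf)
    for i in range(ratings.shape[1]):
        observed = ratings[:, i][~np.isnan(ratings[:, i])]  # ratings by U_i
        if observed.size > 0:
            variances[i] = np.mean((observed - observed.mean()) ** 2)
    return variances

ratings = np.array([[5, 1, np.nan], [1, 1, 4], [5, 1, 2]], dtype=float)
k = 2
selected = np.argsort(-item_variance(ratings))[:k]  # top-k most controversial items
print(selected)
```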

In entropy based methods[87, 98], with entropy indicating the dispersion (uncertainty) of the ratings of an item, items with high rating entropy are sampled. For discrete ratings, entropy is defined as follows:

Entropy(i) = -\sum_{r} p(r_i = r) \log p(r_i = r)        (2.2)

in which r is a rating value, ranging from 1 to 5 in a five-star rating schema or 0 to 1 in a binary rating schema, and p(r_i = r) stands for the probability of a user rating item i with rating r.

It is worth noting that entropy-based methods may tend to select unpopular items with few but imbalanced ratings, which jeopardizes performance. To solve this, the Entropy0 method[99] is proposed, which assigns rating 0 to all unrated items. In this way, unpopular items have many ratings of 0 and thus a low Entropy0 score.
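The following minimal sketch, assuming np.nan marks unrated entries in one item's rating column, shows both the plain entropy of Eq. 2.2 and the Entropy0 variant; names are illustrative.

```python
import numpy as np

def entropy(column: np.ndarray, treat_missing_as_zero: bool = False) -> float:
    # Entropy0: unrated entries are counted as rating 0 instead of dropped.
    values = np.where(np.isnan(column), 0.0, column) if treat_missing_as_zero \
        else column[~np.isnan(column)]
    if values.size == 0:
        return 0.0
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

col = np.array([5, 5, np.nan, np.nan, 1], dtype=float)
print(entropy(col))                              # plain entropy over observed ratings
print(entropy(col, treat_missing_as_zero=True))  # Entropy0 score
```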

Error Reduction Non-personalized Strategies

Even though choosing uncertain items intuitively makes sense, there is no guarantee that it will help predict the precise ratings of the items, which is the end goal of recommendation.


The idea of reducing the error directly leads to a series of error reduction based strategies, whose aim is to choose items that can reduce the overall error rate of the system.

A greedy strategy is proposed by [41] to choose items whose ratings, once gathered, will result in the lowest RMSE. The item set is selected greedily: for each candidate item, the strategy computes the reduction in RMSE before and after adding the rating of the candidate item to the training set, where the RMSE over the training set can be computed by adopting a leave-one-out methodology. The item with the largest RMSE reduction is added to the item set, and the process is repeated to generate the whole item set.
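A minimal sketch of this greedy loop is given below. The train_and_rmse callback (the leave-one-out RMSE of a recommender retrained on the augmented training set) is an assumed helper, not part of the cited work or any library.

```python
def greedy_select(candidates, training_set, train_and_rmse, budget):
    """Greedily pick `budget` items whose ratings most reduce the RMSE."""
    selected = []
    for _ in range(budget):
        base = train_and_rmse(training_set)
        # Pick the candidate whose rating, once added, reduces RMSE the most.
        best = max(candidates, key=lambda item: base - train_and_rmse(training_set | {item}))
        selected.append(best)
        training_set = training_set | {best}
        candidates = [c for c in candidates if c != best]
    return selected

# Toy stand-in: pretend RMSE shrinks as more training points are added.
toy_rmse = lambda ts: 1.0 / (1 + len(ts))
print(greedy_select([("u1", "i1"), ("u1", "i2")], frozenset(), toy_rmse, budget=1))
```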

In [81], for a given item, the authors try to actively select a set of users that represents the whole population well but has little taste overlap.

Attention-based Non-personalized Strategies

These algorithms try to select popular items for users to rate. Such items tend to be famous, so most users know them and can provide rating feedback. They are usually used as an initial method to solve cold start or as baselines for comparison[36].

Works [98, 122] propose active learning strategies that select the most popular items; since users are likely to be familiar with those items, more ratings can be gathered. But the information gathered by these strategies may be limited, and the extra ratings may introduce a bias towards popular items[98].

Another attention-based method [41] handles the problem by using the intuition that collaborative filtering relies on items being co-rated with other items; the strategy thus tends to select items that are highly co-rated together with other items.

Non-personalized Multi-heuristic Strategies

Intuitively, several heuristics might be useful for choosing the proper items for users to label. There exist works trying to combine different heuristics and identify items using multiple heuristics.

[98] combines the random selection strategy with the popularity-based selection strategy, which improves the number of gathered ratings compared to a purely random strategy. [98] also provides another strategy that samples items based on log(popularity) × entropy. As mentioned in the previous section, the pure entropy-based method tends to sample unpopular items; multiplying by log(popularity) helps to ease this problem. Similarly, [99] proposes a strategy called HELF using the harmonic mean of the entropy and the log of the rating frequency, which also tries to sample items that are both informative and popular.
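The following minimal sketch shows the HELF score from [99] as the harmonic mean of normalized log rating frequency and normalized rating entropy; the input arrays are illustrative and assumed to have nonzero maxima.

```python
import numpy as np

def helf(log_popularity: np.ndarray, entropy: np.ndarray) -> np.ndarray:
    # Normalize both signals to [0, 1], then take their harmonic mean.
    lp = log_popularity / log_popularity.max()
    h = entropy / entropy.max()
    denom = lp + h
    return np.where(denom > 0, 2 * lp * h / denom, 0.0)

print(helf(np.log(np.array([100, 10, 1000])), np.array([1.2, 1.5, 0.3])))
```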


2.3.2 Personalized Active Learning Strategies

Increase the Acquisition Probability

This type of strategies tends to increase the number of gathered ratings by predicting the user's familiarity with the items. Intuitively, presenting users items which they are not familiar with is a bad idea, because they will not be able to provide feedback.

[41] tries to select items that are most similar to the items that a user has rated. As in item-based collaborative filtering, the Pearson correlation is used to calculate the similarity. For items i and j, their Pearson correlation is defined as:

sim(i, j) = \frac{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in U_{ij}} (r_{ui} - \bar{r}_i)^2} \sqrt{\sum_{u \in U_{ij}} (r_{uj} - \bar{r}_j)^2}}        (2.3)

in which U_{ij} stands for the set of users who have rated both i and j and \bar{r}_i stands for the average rating of i.
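A minimal sketch of Eq. 2.3 follows, assuming a user-by-item rating matrix with np.nan for unrated entries; for simplicity the item means are computed over the co-rating users U_ij only.

```python
import numpy as np

def pearson_item_sim(ratings: np.ndarray, i: int, j: int) -> float:
    # Restrict to the users who rated both items (U_ij).
    both = ~np.isnan(ratings[:, i]) & ~np.isnan(ratings[:, j])
    if both.sum() < 2:
        return 0.0
    ri, rj = ratings[both, i], ratings[both, j]
    di, dj = ri - ri.mean(), rj - rj.mean()
    denom = np.sqrt((di ** 2).sum()) * np.sqrt((dj ** 2).sum())
    return float((di * dj).sum() / denom) if denom > 0 else 0.0

ratings = np.array([[5, 4, np.nan], [3, 2, 5], [1, 2, 4]], dtype=float)
print(pearson_item_sim(ratings, 0, 1))
```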

A binary prediction method is proposed by [34], in which the model first transforms the rating matrix into a binary rating matrix and then performs prediction for all the items. A binary rating can be interpreted as whether the user has experienced the item, and the unrated items with the highest prediction scores are picked.

In [32], the authors propose an active learning strategy considering not only binary ratings but also users' demographic attributes like gender and age group. The selection strategy uses a hybrid collaborative filtering method to predict the acquisition probability.

Impact-based Strategies

Similar to the uncertainty reduction based methods, impact based strategies try to pick items that minimize the rating prediction uncertainty for all the items.

[103] chooses the items whose ratings can impact the rating predictions of other items the most. For each user and item pair, the algorithm first predicts the rating r_{ui} and generates r'_{ui} = r_{ui} − 1. Two prediction models are generated, one adding r_{ui} and the other adding r'_{ui} to the training set, and the absolute differences between their predictions for the ratings of all items other than i are computed. The influence of i is estimated by summing up all these differences, and the items with the highest influence are selected for active learning[103, 36].

[85] proposes a method to choose items whose ratings have the highest impact on the prediction of other ratings. In a collaborative filtering based method, information is transferred by co-rating the same item. As displayed in the left part of Figure 2.3, we define users and items as nodes in a graph and a rating as an edge between a user and an item. To perform collaborative filtering recommendation, at least a four-node sub-graph should be created. In this sub-graph with users u1, u2 and items i1, i2, u1 has rated i1 and i2 and u2 has rated i1, so that we can predict u2's rating for i2. As shown in the right part of Figure 2.3, an example of the aforementioned four-node sub-graph is the sub-graph made up of nodes u1, u2, p1 and p2. The more sub-graphs are created, the better the recommendation result will be. [85] aims to select the item set that can help generate the largest number of new four-node sub-graphs.

Figure 2.3: Illustration of the Four-node Sub-graph

Prediction-based Strategies

In these methods, the active learning model first predicts the user's preference towards the items and then chooses items based on the predicted ratings.

In [52, 63], the methods predict the ratings for items and pick the items with high predicted ratings. The authors use the Aspect Model, a probabilistic latent semantic model in which users are considered to be a mixture of multiple interests or aspects. For user u and item i, the predicted rating is calculated as:

P(r|u, i) = \sum_{z \in Z} p(r|i, z) P(z|u)        (2.4)

in which each user u has a probabilistic membership in each of the aspects z ∈ Z. Furthermore, the model assumes that users in the same interest group have the same movie-rating patterns; thus, users and items are independent of each other given the latent class variable z. Additionally, [52] also implements the flexible mixture model (FMM), a modified version of the aspect model with two layers of latent aspects, one grouping users with similar interests and the other grouping items with similar patrons.

In [36, 35] the authors propose two completely contrary item selection strategies. In the first strategy, they propose to select the items with the highest predicted scores. The intuition is that a user should be familiar with, and happy about, the items presented for labeling, because they are highly likely to be liked by the user. In the second strategy, the authors use the opposite heuristic and choose the items with the lowest predicted scores, arguing that when a new user is registered, since the system has no or few ratings of that user, the model parameters computed for this user can be inaccurate and considerably different from the optimal ones. Therefore, it could be better to choose the items whose ratings are not correctly predicted by the recommender[67].

Figure 2.4: An Example of User Partitioning Based Methods: the first three layers of a generated decision tree.

Also in [67], the authors propose a strategy that first performs matrix factorization on the rating matrix to get the latent factors of the items and then chooses the items whose representative vectors have the minimum Euclidean norm. The intuition of this strategy is that after active learning has acquired a number of ratings and the prediction accuracy has reached an acceptable level, it could be better to stabilize the prediction model and to avoid large changes in the latent factors. To do so, the changes in the latent factors (the gradient) should be minimized, which may result in a more stable prediction model[36].

User Partitioning Based Method

In this type of methods, users are first clustered, and the item selection strategy aims to choose items that help to better reveal which cluster a target user belongs to.

In [99], users are first clustered based on the Pearson correlation of their rating profiles. Then a decision tree is built with the clusters as the leaf nodes and items as the internal nodes. At each internal node, a user provides feedback by rating the item. Hence, the most informative items are those whose ratings enable the system to better classify the user into his representative cluster. Information gain, a classic criterion in building decision trees, is used to choose the optimal set of items for the user to label.

In [43], the authors also propose a decision tree based method, in which all the nodes represent groups of users. When building the tree, each internal node is divided into three groups of users based on the ratings they give to the item. Then, for each of these groups, the rating predictions of their unrated items are estimated and the RMSE is computed based on the predicted ratings and the ground truth ratings. The item which generates the smallest total RMSE is chosen, and the process is then repeated to choose the next item for labeling. An example with the first three layers of a generated decision tree is provided in Figure 2.4.
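A minimal sketch of one split step in the spirit of [43] follows: users are partitioned into lovers, haters, and unknowns by their rating of a candidate item, each group's ratings are predicted by the group's per-item mean, and the item with the smallest total RMSE is kept. The threshold and all names are illustrative.

```python
import numpy as np

def group_rmse(sub: np.ndarray) -> float:
    # RMSE of predicting each observed rating by its group/item mean.
    total, count = 0.0, 0
    for j in range(sub.shape[1]):
        col = sub[:, j][~np.isnan(sub[:, j])]
        if col.size:
            total += float(((col - col.mean()) ** 2).sum())
            count += col.size
    return (total / count) ** 0.5 if count else 0.0

def best_split_item(ratings: np.ndarray, candidates):
    def total_rmse(item: int) -> float:
        col = ratings[:, item]
        groups = [col >= 4, col < 4, np.isnan(col)]   # lovers / haters / unknowns
        return sum(group_rmse(ratings[g]) for g in groups if g.any())
    return min(candidates, key=total_rmse)

ratings = np.array([[5, 1, np.nan], [4, 2, 3], [1, 5, 4], [np.nan, 4, 5]], dtype=float)
print(best_split_item(ratings, [0, 1, 2]))
```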

Personalized Multi-heuristic Strategies

Similar to the non-personalized multi-heuristic strategies, the following personalized active learning strategies combine multiple heuristics to exploit all their advantages.

In [103], the authors combine the influence based method with the uncertainty based method to choose items that both have large uncertainty in their ratings and a big influence on the ratings of other items. The objective function is:

\arg\max_i \; Variance(i) \cdot Influence(i)        (2.5)
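As an illustration of Eq. 2.5, the following minimal sketch ranks items by the product of their rating variance and their estimated influence; the two score tables are assumed to come from the strategies described above and are illustrative.

```python
def combined_selection(items, variance, influence, k):
    # Rank by Variance(i) * Influence(i) and keep the top k.
    return sorted(items, key=lambda i: variance[i] * influence[i], reverse=True)[:k]

variance = {"i1": 1.8, "i2": 0.4, "i3": 1.1}
influence = {"i1": 0.2, "i2": 2.5, "i3": 1.0}
print(combined_selection(variance.keys(), variance, influence, k=2))
```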

In [67], the authors try to combine the intuitions of picking the items with the lowest predicted ratings and the items with the minimum Euclidean norm. In the beginning, the items are selected mainly by minimum predicted rating, which aims to reduce the overall error. When enough ratings are gathered, the minimum Euclidean norm becomes more important in order to keep the recommendation system stable.

In [147], the authors build user profiles adaptively according to the user's responses in the course of the interview process, and the user profile u_i is tied to the user's responses in the form of a function.

[34] proposes a strategy combining the random selection strategy with the rating prediction strategy. When the user does not have enough ratings to generate predictions, the random selection strategy can be used to generate the item set for the active learning method.

Another way to combine multiple active learning strategies is to use a voting schema in which the score of an item is defined as the number of votes given by a committee of n strategies[33]. Though rarely applied to recommendation problems, batch mode active learning has been an emerging area for classification tasks. In [15], the authors propose a novel batch-mode active learning approach for classification problems that selects a batch of query instances such that the distribution represented by the selected query set and the available labeled data is closest to the distribution represented by the unlabeled data, with the help of Maximum Mean Discrepancy (MMD). Hoi et al. [54] apply the Fisher information matrix to select a set of informative instances. Guo [49] chooses a batch of query samples based on maximum mutual information with the unlabeled set of data. Yu et al. [141] select a set of instances closest to the basis vectors that approximate the set of unlabeled data.

Figure 2.5: Categorization of Active Learning Strategies

As summarized in Figure 2.5, we have introduced different types of active learning strategies. However, most of the above-mentioned methods are designed for the explicit rating prediction problem. The challenges of one-class implicit feedback in the email prioritization task (e.g. the uncertainty in negative feedback and the informativeness difference between positive and negative feedback) have not been discussed and cannot be effectively handled by previous methods.

2.4 Cross-domain Recommendation

Nowadays, users provide feedback for different types of items like books, movies and music, and express their opinions on different social media and service providers like Facebook, Twitter and Amazon. An intuitive idea is to leverage all the available personal data provided in different domains (e.g. book, music or Amazon shopping reviews) to generate better recommendations[39, 100].

Cross-domain recommendation systems adopt different techniques to transfer learning from source domains (e.g. books) to target domains (e.g. movies) in order to alleviate the sparsity problem and improve the accuracy of recommendations[130]. Different from the large body of previous literature[39, 100, 119, 91, 130, 78, 12, 10, 77, 133] focusing on how to transfer knowledge between different domains, this section will also focus on reviewing an important but often overlooked field: domain definition and domain selection.

Table 2.2: Domain Definition and Selection Strategy of Previous Works Part 1

Domain Notion | Domains Used | Source Domain Selection Strategy | Reference
Item Attribute | Book categories | Choose the top k most popular categories as source domains. | Cao et al.[12]
Item Attribute | Movie genres | Domains are defined as different categories of movies. All domains except the one used as target domain are used as source domains, and a similarity weight between each source domain and the target domain is calculated based on both rating and content features. | Berkovsky et al.[10]
Item Attribute | Movie genres | Similar domains like action and thriller are first combined into a single representative domain based on expert knowledge. All representative domains are used as source domains. | Lee et al.[77]
Item Attribute | Review ratings from different categories of service in Yelp | Choose the best target-source domain pair by using a Canonical Correlation Analysis approach. | Sahebi et al.[106]
Item | Books and movies (BookCrossing and MovieLens) | Manually assigned. | Li et al.[79]
Item | Books and movies (LibraryThing and MovieLens) | Manually assigned. | Zhang et al.[143]
Item | Books and movies (Douban books and Netflix movies) | Manually assigned. | Zhao et al.[144]
Item | Books, music, and videos from Amazon | Domains are manually picked and multiple domains are used as source domains. A tensor factorization model is proposed to make the recommendation domain-aware. | Hu et al.[57]
Item | Books, movies, music, and TV shows from Facebook likes | The 4 domains are the 4 most popular types of entities in Facebook like data. All the other domains except the one used as target domain are used as source domains. | Shapira et al.[125]
Item | Books, movies, music, and TV shows from Facebook likes | The authors try to figure out, for a given target domain, which source domain is the best one by trying out all the possible target-source domain pairs in an offline evaluation setting. The approach is feasible only because the total number of domains used is extremely small. | Tiroshi and Kuflik[124]
Item | Music and tourism-based knowledge graph data | Manually assigned. | Fernández-Tobías et al.[38], Kaminskas et al.[66]

Table 2.3: Domain Definition and Selection Strategy of Previous Works Part 2

Domain Notion | Domain Category | Domain Selection Strategy | Reference
System | Books, movies, music, and TV show information from 6 social networks | Information from all the social networks is used together in the recommendation phase. Comparison studies are also done to evaluate which social network contributes most to the recommendation. | Tiroshi et al.[124]
System | Movies (Netflix) | The authors split the Netflix dataset based on users and treat the split datasets as data from two systems to perform cross-domain recommendation. | Cremonesi et al.[25]
System | Music (tags) from Blogger.com and Last.fm | Manually assigned. | Stewart et al.[115]
System | Tags from Delicious, Flickr, StumbleUpon, and Twitter | Manually assigned. Different source domain and target domain pairs are tried out. | Abel et al.[2]


2.4.1 Domain Definition

Distinct notions of domain have been considered in previous literature on cross-domain recommendation[39, 119]. For instance, some have treated items like movies and books as belonging to different domains, while others have considered items such as action movies and comedy movies as belonging to different domains. In [78], the authors classify domains into three categories: system domains, data domains, and temporal domains. System domains are the different datasets upon which the recommender systems are built, and between which some kind of transfer learning is performed. Data domains are the different representations of user preferences, which can be implicit (e.g. clicks, purchases) or explicit (e.g. ratings). Finally, temporal domains are subsets into which a dataset is split based on timestamps[39, 78].

However, in this thesis, we use the domain classification standard proposed in [100], in which domains are classified based on the attributes and types of items.

Item Attribute Level In this type of domain setting, items are of the same type and are considered to be in different domains if they have different attribute values. For instance, science fiction movies and romantic movies may belong to different domains in this setting because they have different genre attributes.


Item Level In this domain setting, items of different domains are of different types, though they may share some attributes. For instance, books and movies are from different domains in this setting while sharing attributes like genre and language.

System Level In this setting, items are of the same type but come from different systems, for instance, movies rated on IMDB and movies watched on Netflix.

According to [100], most papers consider domains at the item level (55%) and the system level (24%).

2.4.2 Domain Selection

In this section, we summarize the domain definitions and domain selection strategies of previous works in Table 2.2 and Table 2.3. As displayed in the tables and also based on previous research[100], the most frequently addressed domains are movies (75%), books (57%), music (39%) and TV (18%). Pairs of domains are frequently studied; movies are the most frequently used domain and are often paired with books (33%), music (19%), and TV (7%).

To the best of our knowledge, how and why to select those domains is rarely discussed. Domains are usually selected manually based on human intuition or expert knowledge. Questions like how we can be sure books are the right source domain for movies, and how many source domains we should use for a target domain, are rarely touched. There exists only a very limited number of previous works discussing domain selection[124, 106]. In [124], the authors try to predict the source domain by trying out all the possible target-source domain pairs in an offline recommendation evaluation setting, which obviously cannot work when there are large numbers of domains. In [8], the authors aim to find the most similar source domains by exploiting the similarity of text features based on tf-idf, which does not generalize to collaborative filtering where the data is in rating format. In [106], the authors propose the first work to predict the target-source domain pair considering a large number of domain pairs in cross-domain recommendation systems, using a Canonical Correlation Analysis approach. However, none of the previous works addresses the problem of selecting a set of source domains for a target domain.

2.4.3 Knowledge Transfer Across Domains

One important aspect of cross-domain recommendation is to transfer knowledge from the source domains to the target domain to provide extra information for the generation of better recommendations. There are three major categories of approaches to transfer knowledge between domains, namely linking domains, sharing latent features and transferring rating patterns[25]. Though this is not the major concern or technical contribution of this thesis, they will be introduced in detail in the following sections.


Linking Domains

In this type of cross-domain recommendation frameworks, common attributes and knowledge between domains are used to establish connections, e.g., shared users and items, item attributes, semantic networks, association rules, etc.[25]

The simplest case is that different domains share the same set of users and the same type of preference representation, for instance, a cross-domain system in which each movie genre is defined as a domain and the same set of users have ratings across the different domains. In [9, 10], Berkovsky et al. design a cross-domain recommendation strategy that combines the rating matrices from the source domains and the target domain into a unified rating matrix and apply collaborative filtering to the new rating matrix. The accuracy in the target domain is improved after aggregating the additional ratings from the source domains.

In a more realistic case, the cross-domain strategy may rely on shared users or items. The major assumptions are that there is an overlap of users or items across domains and that user or item similarity spans across domains. Shapira et al. [112] produce a set of candidate users in the source domain based on k-nearest neighbors and use them in the target domain to provide recommendations for cold start users. Tiroshi et al.[125] use random walks to find nearest neighbor sets from source domains and use them to generate recommendations in the target domain. Similar techniques can also be used in the shared-item scenario: in [115], the authors provide tag recommendation by leveraging the tags shared between two systems.

However, in the case where there are no overlapping users or items across domains, extra information can be introduced to make cross-domain recommendation work. In [20], the algorithm recommends items in the target domain that have the same attributes as items that the user likes in the source domain.

Another popular source of extra information is knowledge bases like Wikipedia and DBpedia. In [83], Loizou uses Wikipedia to relate user preferences across multiple domains. A graph is built in which each node is a Wikipedia concept (page) that represents items liked by users, and hyperlinks between Wikipedia pages as well as users' ratings (likes) of items are represented as edges. A Markov chain model is used to produce the recommendation results. Kaminskas et al.[38, 66] use the knowledge graph/knowledge base DBpedia to build links between items in the source domain and the target domain. A constrained spreading algorithm is used to rank and filter items based on the designed graph.

Additional methods include using association rules: in [11], the authors build association rules to connect user preferences to user personality types and transfer knowledge between domains based on the personality types.


Sharing Latent Features

In collaborative filtering methods, users and items are typically embedded as latent factors, which provide a denser representation. In sharing latent feature methods, latent factors shared between domains are exploited to support cross-domain recommendations[25].

Pan et al.[93] propose a method with the assumption that the latent factors in the source domains and the target domain should be similar. In their approach, user and item latent factors are first generated based on SVD in the source domains. In the target domain, the user and item latent factors from the source domains are added as a regularization constraint to transfer the knowledge from the source domains to the target domain.

Pan et al.[92] also propose a method in which the latent factors of the source domains and the target domain are learned simultaneously. The authors propose a tri-factorization method that factorizes the rating matrix into three parts: a user-specific latent feature matrix, an item-specific latent feature matrix, and a data-dependent core matrix. The rating matrices of the source domain and the target domain are factorized at the same time, and user and item latent factors are shared across domains for information transfer. However, this method requires the users and items of the source and target domains to be identical, which can be infeasible in many real-world cases.

In [57], Hu et al. argue that the traditional user and item matrix cannot fully capture the heterogeneity of items, so the authors extend the original user-item matrix factorization framework into a user-item-domain tensor factorization method. The rating matrices from several domains are simultaneously decomposed into shared user, item, and domain latent factors, and a genetic algorithm automatically estimates the optimal weights of the domains.

Transferring Rating Patterns

This type of methods builds on the intuition that even if the source and target domains do not share overlapping users and items, latent correlations may exist between the preferences of groups of users for groups of items, which are named rating patterns[25]. Those latent rating patterns build connections between domains and transfer knowledge. For instance, across the book and movie domains, there will always be a group of users interested in science fiction.

In [79], Li et al. propose a method that simultaneously co-clusters users and items in the source domain to extract rating patterns. The intuition is that users with similar tastes or items with similar attributes usually behave very similarly. If users and items can be well clustered, a much more compact user-item rating matrix, which only comprises the representatives of all the user/item clusters, is able to summarize and compress the original redundant rating matrix. As a result, we only need the cluster indicators of the users and items to recover the original rating matrix; this is why the transferred knowledge is called a codebook. Clustering is performed using a tri-factorization of the source rating matrix.


In the target domain, the cluster memberships are calculated by solving a quadratic integer programming problem that minimizes the quadratic loss.
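The following minimal sketch illustrates codebook-style rating-pattern transfer in the spirit of [79]; cluster memberships are fixed by hand here, whereas the real method learns them via tri-factorization on the source side and quadratic integer programming on the target side.

```python
import numpy as np

source = np.random.default_rng(0).integers(1, 6, size=(20, 12)).astype(float)
user_cl = np.arange(20) % 3          # toy user-cluster memberships (3 clusters)
item_cl = np.arange(12) % 4          # toy item-cluster memberships (4 clusters)

# Codebook B: the average rating of each (user cluster, item cluster) block.
B = np.array([[source[np.ix_(user_cl == cu, item_cl == ci)].mean()
               for ci in range(4)] for cu in range(3)])

# Transfer: predict a target user's ratings from the codebook once the user's
# cluster (here: 1) and the target items' cluster memberships are known.
target_item_cl = np.array([0, 2, 3, 1])
print(B[1, target_item_cl])
```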

In [88], Moreno et al. extend the codebook idea to a multi-source domain scenario, in which predictions are generated based on weighted linear combinations of the codebooks from multiple source domains.

One of the major motivations of the codebook (rating pattern) transferring methods is that they can transfer knowledge even if two domains do not have any overlapping users or items. However, Cremonesi et al.[23] have partially disproved this by showing that this type of methods does not transfer knowledge when the source and target domains do not overlap, and that the measured increase in accuracy is due to a pitfall in the evaluation procedure: it compares a user-based kNN algorithm (before the injection) with a matrix factorization algorithm (after the injection), and matrix factorization is known to outperform kNN. This casts doubts on this type of methods.


Chapter 3

Mention Recommendation Considering Tweet Prioritization

In this chapter, we focus on the mention recommendation task, which is highly related to tweet prioritization. Mention recommendation aims to recommend the optimal set of users to be mentioned in a tweet to maximize its diffusion. Tweet prioritization is an important aspect to be considered because users who think a tweet important have a higher probability of retweeting it. Additional aspects like user influence also need to be modeled at the same time. A learning to rank model is proposed to take all these aspects into consideration. The background of the mention recommendation problem, the design of the learning to rank model, and the experimental evaluation results will be introduced in detail in the remaining sections of this chapter.

3.1 Background and Overview

With more than 140 million active users and over 500 million messages posted per day, Twitter has become one of the most influential media for spreading and sharing breaking news, personal updates and spontaneous ideas. In micro-blogging systems like Twitter, users tweet about any topic within the 140-character limit and follow others to receive their tweets. Furthermore, with retweeting (forwarding a tweet), information can be effectively relayed beyond adjacent neighbors, virtually giving every user the power to spread information broadly.

However, recent studies [5][134][145] show that the diffusion power of tweets from different users varies significantly: 0.05 percent of Twitter users attract almost 50 percent of all attention within Twitter, and the spread of a tweet from an ordinary user is rather limited, with an average retweet rate of 0.11. This suggests a very limited diffusion for most tweets.

Figure 3.1: After getting mentioned, a user can receive numerous types of notifications from Twitter, including in-app notifications, push notifications, SMS notifications and email notifications.

Fortunately, as a new feature in micro-blogging systems, Mention can help ordinary users to improve the visibility of their tweets and go beyond their immediate reach in social interactions. Mention is enabled in a tweet by adding "@username". By using the mention mechanism, the tweet is still sent to all the followers of the author, but all the users mentioned in the tweet will additionally receive a mention notification (e.g., by push notification or by e-mail) and are able to retrieve the tweet from their personal mention tab. By using Mention, one can draw attention from a specific user, or highlight a place or organization, at any time. Properly using mention can quickly help an ordinary user spread his tweets:

1. By mentioning a non-follower of the tweet author, the non-follower may retweet it to his followers and spread the tweet to a new group of users, which usually leads to further cascade diffusion.

2. By mentioning a follower of the author, the mention serves as a useful notification, especially when the follower follows a large number of other users and a tweet can easily be swamped by the enormous number of tweets. It is also critical for a tweet to be viewed promptly, as 25% of replies to a tweet happen within 67 seconds, 75% within 17 minutes, and 75% of message flows last less than an hour [138]. So, without proper notification, a tweet may easily be neglected, as one's followers fail to read it in time.

Despite the significance of the mention feature, to the best of our knowledge, mention recommendation is seldom studied in previous works. To better help ordinary users spread their thoughts in micro-blogging systems, we propose a novel mention recommendation algorithm named whom-to-mention, in which we help a tweet reach more people by recommending proper users to be mentioned before publishing it.


The recommendation can be formulated as a ranking problem. Traditionally, one could rank users based on the similarity between a tweet and a user's profile (e.g. the aggregation of all the tweets posted by the user) and recommend the top-ranked users to be mentioned. However, there are several challenges which make the traditional recommendation methods fail:

Information Diffusion Based Relevance: In classic information retrieval tasks (e.g. TREC ad hoc retrieval tasks), relevance is usually interpreted as topical relevance, which stands for to what extent the topic of a result matches the topic of the query. However, the goal of mention recommendation is to find candidates who can help spread a tweet. Instead of topical relevance, the power of information diffusion should be considered in the definition of relevance.

Content-dependent User Relationship Model: In traditional social network recommendation, the user relationship is usually modeled as a weighted graph with edges indicating the bonds between two users based on their explicit social relationship. The interactive functions (e.g. retweet, reply, mention) in micro-blogging systems allow us to adopt the implicit network derived from users' interactive behaviors to achieve more precise user relationship predictions. Moreover, this brings in new features for modeling the user relationship, as users' interactions are usually content (topic) related, which makes the user relationship model content-dependent. For instance, a user may selectively retweet sports news from another user while ignoring other content such as movie comments from the same user. Modeling the content-dependent user relationship based on the implicit network of user interactions thus remains a challenge.

Recommendation Length Restriction: Due to the strict length restriction of a tweet, only a small number of users can be mentioned in a tweet. Moreover, a tweet mentioning a lot of users is likely to be treated as a spam tweet, which will decrease others' interest in retweeting it. Thus, to accomplish the mention recommendation task, the algorithm needs to be optimized for mentioning only a small number of users.

Recommendation Overload Problem: Traditional recommendation systems such as those used in Amazon may recommend one item to large numbers of users, which results in popular products. However, in the mention recommendation system, a user who is recommended too many times will suffer from a severe mention overload problem. Tons of mention notifications will not only interrupt the user's daily use of microblogs but also result in frustration and decrease the user's interest in retweeting.

To cope with all the above-mentioned challenges, whom-to-mention is proposed. We use a machine learning approach to train a ranking model which consists of three parts: ranking features, relevance, and a ranking function [29].


As mentioned earlier, multiple aspects like tweet prioritization (i.e., whether a user will consider the tweet interesting or important) and user influence need to be considered. Intuitively, a user who thinks a tweet important will be more likely to retweet it, and a retweet from a more influential user will result in a larger diffusion of the tweet. To model the aforementioned aspects, we adopt a series of new features in our proposed learning to rank framework, including: the match between the given tweet and the interest profiles of the recommended users, the user relationship between the recommended users and the author of the tweet, and the influence of the recommended users. Furthermore, we manage to model the user relationship based on the implicit network derived from user retweet interactions, which we name the user social tie model. We take advantage of the content-dependent nature of user social ties and make use of the content features of the tweets one user has retweeted from another in a user social tie.

Instead of the classic topical relevance model, the relevance in whom-to-mention is redefined as the potential diffusion a user may bring to a tweet, estimated by the expected user coverage, which will be further explained in section 3.3.2.

A Support Vector Regression (SVR) based ranking function is then trained to calculate the relevance of a candidate user to a tweet and ranks the most relevant candidates at the top of the recommendation list. Constraints are carefully designed in the ranking process to avoid the recommendation overload problem.
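As an illustration of this step, below is a minimal sketch of training an SVR ranking function with scikit-learn. The three feature columns stand for the interest-match, social-tie, and influence scores, and the training target is a diffusion-based relevance value; all numbers and candidate names are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

# Each row: ranking features of one (tweet, candidate) pair.
X_train = np.array([[0.9, 0.7, 0.2],
                    [0.1, 0.0, 0.9],
                    [0.5, 0.4, 0.5]])
y_train = np.array([120.0, 15.0, 60.0])   # e.g. observed user coverage

ranker = SVR(kernel="rbf").fit(X_train, y_train)

candidates = {"alice": [0.8, 0.6, 0.3], "bob": [0.2, 0.1, 0.8]}
scores = {u: float(ranker.predict([f])[0]) for u, f in candidates.items()}
top = sorted(scores, key=scores.get, reverse=True)   # mention the top-ranked users
print(top, scores)
```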

It is worthwhile to highlight the following three aspects of our whom-to-mention recommendation scheme.

1. We present the first in-depth study of the mention feature in microblogs by resolving the essential problem of whom to mention. Instead of passively waiting to be retweeted by others, our novel recommendation scheme allows users to improve the diffusion of their tweets by reaching out to the right person with the help of mention recommendation.

2. We formulate mention recommendation as a ranking problem, and to find the most proper users to be mentioned, a ranking function is learned with a novel information diffusion based relevance, incorporating new features including user interest match, user social ties, and user influence. We model the user relationship based on the implicit network derived from users' retweet interactions and fully exploit its content-dependent features.

3. Our method is thoroughly evaluated on a real-life dataset. The whom-to-mention algorithm proves highly effective compared against a large number of baselines. We analyze how different features affect the recommendation performance with carefully designed comparison experiments. New issues like the recommendation overload problem are evaluated and discussed.


3.2 Problem Definition

We formulate whom-to-mention as a learning to rank task consisting of a set of users, U, each of whom maintains a user interest profile and a user influence profile. For a user u ∈ U, a user interest profile r_u consists of a set of descriptive attributes and tf-idf features extracted by a modified bag-of-words model from u's recent tweets. A user influence profile s_u is made up of attributes related to the user's influence on Twitter. For users u, v ∈ U, there exists a social tie tie_{u,v} based on the retweeting interactions between u and v, which includes a scalar attribute indicating the strength of the bond between u and v and tf-idf features extracted from the tweets u retweets from v. A query q consists of tf-idf features extracted from a specific tweet.

For each tweet q from user u, we would like to rank all the other users v ∈ U \ {u} based on features including user interest match, user social ties, and user influence, so that the relevant candidates rank above non-relevant candidates.

3.3 Recommendation Framework

The key of whom-to-mention is to rank the candidate users given a specific tweet, and we use a machine learning approach to train a ranking model for our recommendation task, which is made up of three parts: ranking features, relevance, and a ranking function[29]. Ranking features include all the attributes useful for modeling the important aspects of the mention recommendation task, like tweet prioritization and user influence. In our recommendation task, relevance refers to the potential diffusion a user could bring to a specific tweet. The ranking function is a machine learning model which predicts the relevance given the observable ranking features. We will discuss the details of the three parts in this section.

3.3.1 Ranking Features

User Interest Match

The match between a tweet and a candidate's interest is an intuitively important feature for modeling the tweet prioritization aspect of the mention recommendation task. When mentioned in a tweet, a candidate interested in it is more likely to retweet it.

To calculate the match, the largest challenge is to generate the user interest model in micro-blogging systems, which differs from traditional user interest models because users' tweets are limited to only 140 characters in length, cover a wide variety of topics, and are often written with shorthands and special formats such as hashtags. Moreover, the nature of our recommendation task requires capturing more detailed aspects of interest. For instance, a football fan may be assumed to be interested in sports based on topic modeling techniques like LDA. However, it is not a good interest match if we mention the football fan in a tweet talking about a basketball match, even though both football and basketball can be categorized as sports.

Based on previous studies [55], topic modeling techniques like LDA may not fit the short, ambiguous, noisy text on Twitter. Consequently, we use a modified bag-of-words model to generate the user interest model.

To begin with, we aggregate a user's recent tweets. For a candidate user u, we define d_u as the set of recent tweets of u; in this work, we will assume that d_u is u's 1000 most recent tweets. We also extract the words from the hashtag topics, which we name h_u; they are important because they are usually used to identify a topic or an event. Besides the tweets, we also consider all the attributes from the user profile page, including the user's full name, the location, a short biography, and tags. For a user u, we choose the short biography feature f_u and the tag feature tag_u for the interest modeling. A user interest profile r_u is then defined as r_u = {d_u, h_u, f_u, tag_u} and R = {r_u | u ∈ U}. In this way, R provides us the basis for user interest modeling.

To cope with the short noisy text, we first analyze around 50,000 hot short queries (popular words or phrases) based on the latest search engine query log, which covers a lot of new words and words in shorthand format; we denote these words as Dict. In this way, a popular phrase like "Big Bang Theory" is considered as one word in Dict. We filter the text in R, eliminating all the stop words and only keeping a word if it is either identified as a noun or is a word from Dict.

The named entity recognition for tweets is conducted with the help of ICTCLAS 1 (a toolkit used for word segmentation and named entity recognition), and the query log is provided by Sogou 2 (a leading search engine company in China).

Given a query (tweet) q_u from user u, we apply the same word parsing strategy as mentioned above and represent q_u and R as tf-idf based term vectors, which are further used to estimate the user interest match. Because there are large numbers of candidate users with millions of tweets, to calculate the match score efficiently we use Lucene, a proven, fast, robust and scalable indexing and retrieval tool widely used for information retrieval tasks. The match score between a tweet q_u and a user interest profile r_v in Lucene is defined as:

iscore(q_u, r_v) = coord(q_u, r_v) \cdot queryNorm(q_u) \cdot \sum_{t \in q_u} \left( tf(t \in r_v) \cdot idf(t)^2 \cdot norm(t, r_v) \right)        (3.1)

1http://ictclas.org/

2http://www.sogou.com/labs/dl/w.html


The tf(t ∈ r_v) factor correlates with the term's frequency:

tf(t \in r_v) = \sqrt{n_{t, r_v}}        (3.2)

where n_{t, r_v} is the frequency of term t in r_v; normalization by document length is handled in norm(t, r_v) for efficiency considerations.

idf(t) stands for the inverse document frequency, defined as:

idf(t) = 1 + \log\left(\frac{|R|}{|\{r : t \in r\}|}\right)        (3.3)

norm(t, rv) is a normalization factor defined as:

norm(t, r_v) = lengthNorm(r_v) \cdot boost(t)        (3.4)

where lengthNorm is a length normalization factor which ensures that shorter documents contribute more to the score. We also use boost factors so that terms from different sources carry different weights (e.g. a term from tag_u is more important than one from d_u). Based on evaluation on the training data, we set the boost boost(t) as:

boost(t) = \begin{cases} 2 & \text{if } t \in h_u \cup f_u \cup tag_u \\ 1 & \text{otherwise} \end{cases}        (3.5)

coord(q_u, r_v) is a score factor based on how many query terms are found in the document r_v, and queryNorm(q_u) is a normalization factor used to make scores comparable between queries. They are implemented using Lucene's built-in functions, whose details can be found in the documentation 3.
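To make the scoring concrete, below is a minimal, self-contained re-implementation sketch of Eqs. 3.1-3.5 (not Lucene itself). Profiles are token lists, queryNorm is omitted since it does not change the ranking for a fixed query, and all example terms are illustrative.

```python
import math
from collections import Counter

def iscore(query, profile, all_profiles, boosted_terms=frozenset()):
    tf = Counter(profile)
    matched = [t for t in set(query) if tf[t] > 0]
    coord = len(matched) / len(set(query))            # fraction of query terms found
    length_norm = 1.0 / math.sqrt(len(profile))       # shorter profiles score higher
    score = 0.0
    for t in matched:
        df = sum(1 for p in all_profiles if t in p)
        idf = 1 + math.log(len(all_profiles) / df)    # Eq. 3.3
        boost = 2.0 if t in boosted_terms else 1.0    # Eq. 3.5
        score += math.sqrt(tf[t]) * idf ** 2 * length_norm * boost  # Eqs. 3.2, 3.4
    return coord * score                              # Eq. 3.1 without queryNorm

profiles = [["nba", "finals", "warriors"], ["movie", "review", "oscars"]]
print(iscore(["nba", "finals"], profiles[0], profiles, boosted_terms={"nba"}))
```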

User Social Tie Modeling

User relationship also plays an important role in the tweet prioritization aspect of the mention recommendation task, because social network users usually consider tweets from an acquainted user more important, and an acquaintance is usually more likely to retweet compared with a total stranger. Previous studies [74][62] mainly study explicit social connections based on the follow relationship of Twitter. However, according to a study of Facebook [1], people only communicate with a few of their explicitly declared friends. So modeling user relationships based on implicit networks can be a better indicator of the actual social relationships between users.

3http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/api/core/org/apache/lucene/search/Similarity.html


In our work, the user relationship model is based on implicit connections derived from users' retweet activities in micro-blogging systems, which we call the user social tie model. Though a lot of work on retweet behavior has been done, it usually takes an information diffusion perspective rather than modeling user relationships [5][6][13].

We make two assumptions in modeling user social ties. First, user social ties can be derived from the retweet interactions between two users, and the frequency of interaction can be used to quantify the strength of a social tie. Second, the social tie between two users is content-dependent. Thus, in our model, a user social tie consists of three parts: the nodes, i.e. the two users of a tie; a strength score indicating how strongly the two users are bonded; and a content vector indicating the topics the user is interested in retweeting. The details are explained as follows.

For users u, v ∈ U , we define tweet set rtu,v as:

rt_{u,v} = \{ tw \mid tw \text{ is a tweet } u \text{ retweets from } v \}    (3.6)

We define the social tie strength str_{u,v} as:

str_{u,v} = |rt_{u,v}|    (3.7)

We pre-process the text features in rt_{u,v} using the same method as for users' interest profiles and define the user social tie between users u and v as:

tie_{u,v} = \{ rt_{u,v}, str_{u,v} \}    (3.8)

It is important to notice that tie_{u,v} ≠ tie_{v,u}. Given a tweet q_u from user u, we can calculate the user relationship score by multiplying the strength of the social tie with the similarity between q_u and rt_{u,v}:

tscore(q_u, rt_{u,v}) = str_{u,v} \cdot coord(q_u, rt_{u,v}) \cdot queryNorm(q_u) \cdot \sum_{t \in q_u} \big( tf(t \in rt_{u,v}) \cdot idf(t)^2 \cdot norm(t, rt_{u,v}) \big)    (3.9)

All the factors in Formula 3.9 are defined as in Formula 3.1.
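A minimal sketch of how the social tie model can be assembled from a retweet log is shown below; the (u, v, text) log format is an assumption of ours, and tscore would reuse the match-score routine sketched in the previous subsection, multiplied by the tie strength.

```python
from collections import defaultdict

def build_social_ties(retweet_log):
    """retweet_log: iterable of (u, v, text) records meaning u retweeted `text` from v.
    Returns ties[u][v] = {'rt': [texts], 'str': count}, i.e. Eqs. 3.6-3.8.
    Note the asymmetry: ties[u][v] and ties[v][u] are built independently."""
    ties = defaultdict(dict)
    for u, v, text in retweet_log:
        tie = ties[u].setdefault(v, {'rt': [], 'str': 0})
        tie['rt'].append(text)   # content source for rt_{u,v}
        tie['str'] += 1          # str_{u,v} = |rt_{u,v}|  (Eq. 3.7)
    return ties
```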

User Influence Modeling

Intuitively, user influence is also important to the performance of our recommendation task. If two users are both likely to retweet the tweet, the more influential one could help it reach more people by initiating a larger cascade of retweets. Given a user u, we summarize a series of statistical indicators of his influence in Table 3.1.

We can define u’s influence profile su as:


Table 3.1: Statistical Indicators on Modeling User Influence

Denotation        Explanation
Follower(u)       The number of followers of user u, one of the most popular metrics for estimating a user's influence.
Avg_retweet(u)    The average number of retweets for each tweet from u.
Avg_reply(u)      The average number of replies for each tweet from u.
Avg_coverage(u)   The average number of users a tweet from u can reach. The coverage of a tweet is defined in detail in Section 3.3.2.

s_u = \{ Follower(u), Avg\_retweet(u), Avg\_reply(u), Avg\_coverage(u) \}    (3.10)

3.3.2 Relevance

In traditional text retrieval tasks (e.g. search engine retrieval), relevance usually refers to the topical match between the query and the document [29]. When interpreted in this way, we can always rely on editors to manually assess relevance based on their experience and expertise. However, for our mention recommendation task, editors would have to compare a tweet (query) with user profiles made up of thousands of tweets and analyze hundreds of content-based user relationship bonds, which makes the process time-consuming and the results inaccurate.

Instead, we can calculate the relevance based on user behavioral information. Our recommendation scheme aims to spread a tweet to more people by mentioning proper users in it, so we can define the relevance between a tweet and a user as the diffusion the user brings to it. Intuitively, the diffusion could be estimated by how many retweets a user initiates after retweeting the tweet. However, this ignores audience size: given a tweet, if one user results in 3 retweets, each by a user with 100 followers, and another user brings it 2 retweets, each by a user with 1000 followers, the latter user obviously helps it reach more people (2000 vs. 300). Thus, it is more accurate to estimate the relevance based on the number of users a candidate helps the tweet reach, which we call coverage. We denote the relevance of query q and user v as rel(q, v) and define it as:

rel(q, v) = \sum \{ Follower(u) \mid u \in \text{the retweet cascades initiated by } v \}    (3.11)
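Computationally, the coverage-based relevance is just a sum of follower counts over the cascade a candidate initiates; the short sketch below (with hypothetical input structures) makes the "2000 vs. 300" example above explicit.

```python
def coverage_relevance(cascade_users, follower_counts):
    """Eq. 3.11: sum the follower counts of all users in the retweet
    cascades initiated by the candidate's retweet."""
    return sum(follower_counts[u] for u in cascade_users)

# 2 retweeters with 1000 followers each beat 3 retweeters with 100 each:
assert coverage_relevance(['a', 'b'], {'a': 1000, 'b': 1000}) == 2000
assert coverage_relevance(['c', 'd', 'e'], {'c': 100, 'd': 100, 'e': 100}) == 300
```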

3.3.3 Ranking Function

Many machine learning models can be used as a ranking function for our whom-to-mention recommendation task. We adopt a machine-learned ranking function based on support vector regression (SVR), because it is a proven regression algorithm which is adaptive to complex systems, robust in dealing with corrupted data, and has good generalization ability [135].

Given a query q_u from user u and a candidate match v, we use SVR to compute a score that serves as the relevance rel(q_u, v). We define x_{q_u,v} as the feature vector corresponding to the pair (q_u, v):

x_{q_u,v} = \{ iscore(q_u, r_v), tscore(q_u, rt_{u,v}), s_v \}    (3.12)

The training data set is \{(x_1, y_1), \ldots, (x_n, y_n)\}, where x_i \in R^m is the feature vector for a query-candidate pair, m is the number of feature dimensions, and y_i \in R is the corresponding relevance value.

A generic SVR estimation function has the form:

f(x) = (w · φ(x)) + b (3.13)

where w \in R^m, b \in R, and \phi is a nonlinear transformation from R^m to a high-dimensional space. The core goal of SVR is to learn the values of w and b that minimize the regression risk:

Risk(f) = C \sum_{i=0}^{n} L(f(x_i) - y_i) + \frac{1}{2} \|w\|^2    (3.14)

L(\cdot) is a loss function and C is a constant that determines the penalty for estimation errors; it is determined with grid search and cross-validation. We experimented with different kernel functions and chose the one with the best performance (the RBF kernel). Details of SVR can be found in [113].
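A minimal scikit-learn sketch of this ranking function is shown below, assuming the feature matrix X stacks the components of Eq. 3.12 (iscore, tscore and the influence indicators of Table 3.1) and y holds the coverage-based relevance of Eq. 3.11; the grid values are placeholders.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def train_ranker(X, y):
    """RBF-kernel SVR with C and gamma chosen by grid search and cross-validation."""
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1]}
    search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)
    return search.best_estimator_

# ranker = train_ranker(X_train, y_train)
# Candidates are then ranked by ranker.predict(X_candidates) in descending order.
```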

3.3.4 Recommendation Overload Problem

One new issue of our whom-to-mention task is that the recommendation results may concentrate on a few popular users, causing mention overload (users get too many mention notifications because of the mention recommendation). Moreover, different users may respond differently to the overload. For instance, some users may not want to receive any mention notifications from the recommender at all, while others may feel okay even if mentioned 100 times a day.

In our recommendation framework, we cope with this problem by allowing users to freely set an upper limit on how many times they can be recommended per day. After the ranking phase, all candidates who have reached their recommendation limit are eliminated, and the top k of the remaining candidates are recommended. In the real application, within a day, our recommendation scheme follows a first-publish, first-to-choose policy and recommends the next best candidate once a user's recommendation limit is reached, as sketched below.
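A sketch of this filtering step is given below; the per-user counter store is hypothetical, but the walk over the ranked list mirrors the first-publish, first-to-choose policy.

```python
def recommend_with_cap(ranked_candidates, k, mention_counts, daily_cap):
    """Walk the ranked candidate list, skip users whose daily cap is reached,
    and stop after k picks; skipped slots fall through to the next best user."""
    picks = []
    for user in ranked_candidates:
        if mention_counts.get(user, 0) >= daily_cap:
            continue                                   # cap reached for today
        picks.append(user)
        mention_counts[user] = mention_counts.get(user, 0) + 1
        if len(picks) == k:
            break
    return picks
```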

In our evaluation, since our test tweets are published over a period of time, we set the upper limit for mentioning at 25, which is in fact a quite strict constraint.


3.4 Experiment Setting

We design the experiments with three goals: (1) to evaluate how our proposed algorithm performs compared with baseline algorithms; (2) to test how the different features we consider affect the recommendation performance; (3) to examine how new challenges, like recommendation overload, affect the performance of our algorithm.

The key challenge of the experiment design lies in evaluating the information diffusion (coverage of users) resulting from mentioning a user in a tweet. Since we cannot actually mention users and observe the outcome, we instead make an approximate estimation using the user's retweet behavior. For example, if user A retweeted a tweet t and helped t reach 500 people, it is reasonable to assume that A would retweet it if we mentioned A in t; so in our evaluation, by mentioning A in t, the user coverage A brings to t is 500. If user B has never retweeted t, we assume B will not retweet t when mentioned in it, and the user coverage B brings to t is considered to be 0.

3.4.1 Data Collection

We collected data from Sina Weibo, a Twitter-like micro-blogging system in China with more than 400 million registered users and over 100 million messages posted per day. Different from Twitter's API, which is restricted in retrieving mention and retweet timelines, Weibo's API allows us to get all the tweets from a user's different timelines. Moreover, we obtained authorizations from over 5000 real Weibo users, who granted us full access to all the authentication-protected user data, including user profiles, tweets, the retweet timeline, the reply and mention timelines, and accurate reply and retweet numbers for each tweet. We parsed 48,000 tweets published by the authorized users, keeping only tweets retweeted more than 5 times, which leaves us 132,796 retweet records and 7800 tweets to serve as the training and testing tweets.

Based on the retweet records, 52,468 users participate in retweeting and are considered as our recommendation candidates. We collect the most recent 1000 tweets from these users (around 46 million in total) and record their personal information, including the full name, location, user biography etc. The average retweet rate and reply rate for each user are calculated based on the most recent 200 tweets (around 11 million in total). 97,164 user social ties are established based on retweet interactions. In our experiment, we split the parsed tweets into training and testing data sets with an 80/20 proportion, and cross-validation is used.

3.4.2 Evaluation Metrics

We evaluate the results using both standard information retrieval metrics [51][18] and metrics measuring information propagation [138]. In particular, we use the following metrics: precision (P), average precision at K (AP@K), retweet times (RT), user coverage (Cov) and normalized user coverage (Cov_N), which are defined as:

P = \frac{N_{hit}}{m}    (3.15)

AP@K = \frac{\sum_{i=1}^{K} P(i)}{N_{hit}}    (3.16)

RT = \sum_{u \in R} |T_{t,u}|    (3.17)

Cov = \sum_{u \in R} \sum_{v \in U_{t,u}} Follower(v)    (3.18)

Cov\_N = \arctan\Big( \sum_{u \in R} \sum_{v \in U_{t,u}} Follower(v) \Big)    (3.19)

where m is the size of the recommendation list, N_{hit} is the number of users in the recommendation list belonging to the top m relevant matches, and P(i) is the precision at cut-off i in the recommendation list. For a user u, a tweet t and the recommendation list R, we define T_{t,u} as all the retweets from the retweet cascades initiated by u retweeting t, and U_{t,u} as all the users from those cascades.

Retweet times stands for the number of hops in a tweet's propagation; each hop increases the chance for the tweet to reach more users. User coverage is a more intuitive metric: the cumulative number of users a tweet has reached due to the mention recommendation. For the normalized user coverage, we normalize the coverage with an arctan() function to make the coverage numbers from different algorithms more comparable.

Due to the length restriction of a tweet, only a limited number of users can be mentioned, so we set the length of the recommendation list to 5 in our evaluations.
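For reference, the AP@K and normalized coverage metrics can be computed as in the sketch below (our own helper functions, following Eqs. 3.16 and 3.19).

```python
import math

def average_precision_at_k(recommended, relevant, k=5):
    """AP@K (Eq. 3.16): average the precision at each hit over the number of hits."""
    hits, precision_sum = 0, 0.0
    for i, user in enumerate(recommended[:k], start=1):
        if user in relevant:
            hits += 1
            precision_sum += hits / i     # P(i): precision at cut-off i
    return precision_sum / hits if hits else 0.0

def normalized_coverage(total_followers_reached):
    """Cov_N (Eq. 3.19): squash raw coverage with arctan for comparability."""
    return math.atan(total_followers_reached)
```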

3.4.3 Comparison Algorithms

To the best of our knowledge, no previous studies have addressed the whom-to-mention task. Though the task comes with many new challenges, we adapt several classic recommendation algorithms to this new problem to serve as baselines.

• Content-based Recommendation (CR). A content-based recommendation algorithm similar to [50] is carefully designed. User profiles are based on the content of tweets and attributes from the user profile page. A specific tweet is considered an item, represented by its content. Both the user profile and the item are modeled as tf.idf-based vectors, and we recommend users by ranking the cosine similarity scores between user profiles and the item.

• Content-boosted Collaborative Filtering Recommendation (CCFR). For our task, the recommendation is conducted before a tweet is published; there thus exist no user interaction behaviors like retweets and replies to serve as ratings, so the recommender is always in a cold start state. We choose Content-boosted Collaborative Filtering Recommendation [86], which copes with the cold start problem of traditional Collaborative Filtering. A tweet is viewed as an item and a candidate is regarded as a user. When a new item (tweet) needs a recommendation, we find the 5 most similar items from the training data based on content similarity and recommend users by combining the recommendation results from the similar items.

• Bond-based Recommendation (BR). In BR, we recommend candidates for a tweet based on the social bonds between the candidates and the tweet author: the more closely a candidate is linked to the author, the more likely he will be recommended. The social bond is modeled based on users' retweet interactions.

• Influence-based Recommendation (INFR). In INFR, we recommend candidates based on their influence, computed as a linear combination of the influence features summarized in Table 3.1. We try to recommend the most influential users to mention for a given tweet.

• Random Recommendation (RR). In RR, 5 users are randomly chosen from the candidates to generate the recommendation list.

• Twitter and Weibo. Based on statistics from previous studies [5][145], we obtain the average retweet rate and coverage of a tweet on Twitter. With the help of the data we collected for user influence modeling (11 million tweets from Weibo), we calculate the average retweet rate and coverage of a tweet on Weibo. These numbers show the general average diffusion of a tweet in a micro-blogging system.

Table 3.2: Result Comparison of WTM and Baseline Algorithms

                      WTM      CR       CCFR     BR       INFR     RR        Twitter  Weibo
Precision             0.1343   0.0309   0.0790   0.0492   0.0279   1.47E-04  -        -
AP@5                  0.1005   0.0207   0.0515   0.0416   0.0178   4.91E-05  -        -
Retweet Times         3.1026   0.9395   1.8058   1.1990   1.0147   0.0015    0.110    0.798
Normalized Coverage   0.8525   0.2649   0.5640   0.2349   0.1969   0.0023    -        -

3.5 Results and Analysis

3.5.1 Algorithm Performance Evaluation

As shown in Table 3.2 and Figure 3.2, our whom-to-mention approach (WTM) significantly improves the diffusion of a tweet on all metrics. We draw the following conclusions from these results.

First, Random Recommendation (RR) barely shows any effect, which makes it clear that simply mentioning some users has little effect on improving the diffusion of a tweet.


Table 3.3: Comparison on How Different Features Affect the Performance of WTM

                ALL      No_Interest  No_Inf   No_Ties  No_Content  Twitter  Weibo
Precision       0.1342   0.1328       0.1319   0.0658   0.1171      -        -
AP@5            0.1005   0.0985       0.1129   0.0410   0.0869      -        -
Retweet Times   3.1026   3.0559       3.0359   1.7540   2.6770      0.110    0.798
Coverage        3716     3643         3592     2185     3239        1100     711

Figure 3.2: Performance Comparison of WTM and Baseline Algorithms (bar charts of Precision and AP@K, and of Retweet Times and Normalized Coverage, for WTM, CCFR, BR, CR, INFR and RR)

Second, the poor performance of Influence-based Recommendation (INFR) arises because influential users may be neither interested in the tweet nor share any social ties with the author. Moreover, mention notifications may easily be neglected by influential users, since they usually receive thousands of mention notifications per day. Third, Content-based Recommendation (CR), although effective, is not as good as the approaches based on user relationships like BR and CCFR; this is partly attributable to the noise and ambiguity in the tweet-based user profiles and item profiles. Fourth, the performance of Bond-based Recommendation (BR) shows that users who share strong social ties with the author are more likely to retweet his tweet, which is in accordance with our daily experience. Furthermore, Content-boosted Collaborative Filtering Recommendation (CCFR) shows the best performance among all our comparison algorithms, owing both to its adoption of a sophisticated CF recommendation algorithm based on the implicit retweeting interaction network and to the incorporation of content-based features during the recommendation.

Finally, our SVR-based whom-to-mention recommendation (WTM) outperforms all the comparison algorithms. Even compared with CCFR, it shows a 70% increase in precision, a 94% increase in AP@5, a 72% increase in retweet rate and a 51% increase in normalized user coverage. Our algorithm benefits from the exploitation of all the new features, a careful design of the relevance model, and a machine-learned ranking function.


3.5.2 Feature Importance Evaluation

To analyze how the features used in our proposed algorithm contribute to the learned model, we design a contrast experiment: we eliminate one feature at a time and observe how the performance of our model changes. Furthermore, since we assume user social ties in micro-blogging systems are content-dependent, we design a contrast algorithm by eliminating all the content information from our user social ties, leaving only the number of interactions to indicate the strength of social ties. All the results are listed in Table 3.3.

We note that when eliminating the user interest match score (No_Interest), AP@5 suffers a 2.0% decline and the coverage of users suffers a 2.0% decline. Similarly, the coverage of users decreases 3.4% after excluding the user influence features (No_Inf) from our model. When we eliminate the user social ties feature (No_Ties), the model suffers a 60% decline in AP@5 and a 41% decline in coverage. This result is in accordance with the results in Section 3.5.1: although user interest match and user influence help to improve the recommendation result, content-dependent user social ties play a much more significant role. It is worth noting that AP@5 is actually best after eliminating the influence features, indicating that not all influential users are interested in the tweet, and many pay little attention to mentions since they may receive hundreds or even thousands per day. However, the influence features do help to expand the retweet rate and user coverage, because the extra reach brought by influential users outweighs the precision loss.

Furthermore, after removing all the content features from the social ties (No_Content), a 14% decline in AP@5 and a 13% decline in coverage confirm that the content features in social ties play an important part in the recommendation and that user social ties are content-dependent.

3.5.3 Recommendation Overload Evaluation

If everyone uses the whom-to-mention system, recommendation overload may occur: a popular user may receive tons of mention notifications from the recommendation system, resulting in severe interruption. Figure 3.3 shows how many times each user is recommended in our evaluation, in descending order. The recommendation distribution of WTM is smoother than that of our best comparison algorithm, CCFR. It is also worth noting that in CCFR there exist users recommended hundreds of times, which may lead to potential mention overload, while our algorithm avoids the overload problem by setting constraints based on users' free will.

Figure 3.3: Recommendation Density Comparison Between WTM and CCFR (200 most recommended users)

3.5.4 Result Discussion

The experiment results may seem a bit low, which is in accordance with our expectation. On one hand, we ascribe it to performing an off-line evaluation: we use users' retweet logs to estimate the possible information cascade, so a perfect recommendation match in the real world may be regarded as a miss in the evaluation due to the lack of a retweet log for the given tweet. However, by comparing our algorithm with a set of carefully designed comparison algorithms, we believe our algorithm performs well based on the remarkable improvement on all metrics. On the other hand, attracting others to retweet is not an easy job; compared with the average retweet rate of 0.11 on Twitter (0.78 on Weibo), our average retweet rate of 3.1 shows a notable improvement.

Based on our comparison evaluation, the content-dependent user social tie feature plays a much more important role than user interest match and user influence. We propose three reasons for this phenomenon. First, even with careful pre-processing, the ambiguity and noise in tweets still decrease the accuracy of the user interest match. In fact, even though both are content features derived from users' tweets to model their interests, the content feature from users' social ties is more effective than the content feature from users' interest model, because the former contains less noise (users usually prefer to retweet a well-written tweet with a clear topic). Second, though influential users can intuitively lead to a larger diffusion of the tweet, they are usually mentioned by large numbers of people every day, which makes them more likely to ignore mention notifications. Third, the content-dependent retweet social tie is a strong indicator: a user retweeting another user's tweet usually indicates a close relationship, and people who are close are more likely to retweet each other's tweets. Moreover, a retweet shows a strong interest in the topic of the tweet, so the user will very likely retweet a tweet on the same topic again in the future.


3.6 Conclusion

In this chapter, we offer the first in-depth study of the Mention Recommendation problem and propose a new recommendation model that expands the diffusion of tweets by recommending proper users to mention. Important aspects like tweet prioritization and user influence are taken into consideration by our model. We formulate this new problem as a learning to rank problem and propose new features, a new relevance definition, and a machine-learned ranking function to solve it.

We find that the best performance is achieved when all the new features, including user interest match, user social ties, and user influence, are used. A relevance defined by the coverage of users and an SVR-based ranking function also help to improve the performance. Based on our comparison experiments, we also find that user relationship based features play a more important role than the other features. Furthermore, we confirm that the content-dependent feature in user relationships is highly effective in improving the precision of mention recommendation.


Chapter 4

Broadcast Email Prioritization with Active Learning

In this chapter, we take one popular type of broadcast message, broadcast emails, as an example and work directly on the broadcast email prioritization task. Different from previous studies in the field, we propose the first broadcast email prioritization framework that adopts the paradigm of collaborative filtering. However, broadcast email prioritization suffers from a complete cold start challenge, because the broadcast email has not been sent out at the time of prioritization and thus has no user interactions. We propose a novel active learning framework to handle this problem by first selecting a small set of users to collect their feedback and then predicting the preference of the remaining majority of users based on the collected feedback. Experiments are conducted on a real-life email dataset from Samsung, and the results show the improvement achieved by our proposed approach.

4.1 Background and Overview

With over 100 billion emails sent and received per day, email is undoubtedly one of the most prevalent personal and business communication tools [97]. However, along with the great benefits come significant drawbacks. According to previous research, 58% of emails are unimportant or irrelevant, and a person on average spends nearly 380 hours every year handling those emails, which causes a trillion-level economic loss in productivity [19][60]. This phenomenon is called email overload, and there is an urgent need for a system that can automatically learn the personal priorities of emails to mitigate the problem.

Different from spam filtering, which has been widely explored in the literature [110][101], personalized email prioritization aims at making a personalized prediction of the importance label of non-spam emails. Much effort has been devoted in both industry and academia to solving this problem [139][118]. For instance, Google has proposed an email importance prediction algorithm for Gmail [3], which is used in the Gmail Priority Inbox, where every important email is marked with a yellow icon next to the sender's name.


Figure 4.1: Examples of Broadcast Emails: broadcast emails are widely used for group notifications and email marketing. They are usually sent from the mailing list admin and support only limited types of interaction (e.g. they do not support replies).

However, broadcast email, one important type of email with interesting and challenging characteristics, has been overlooked by previous personalized email prioritization systems. A broadcast email is an email message sent to a group of receivers, usually by organizations, companies and web services. The size of the group is typically large, commonly containing thousands or even millions of receivers. A typical email user may be a member of dozens of broadcast mailing lists. For instance, as students, we are in the broadcast mailing list of graduate students; as employees, we are in the company's developer department list; as consumers, we are in the promo lists of various products. Even though there are important and relevant broadcast emails, a large portion of them are irrelevant and unimportant, and one can easily get swamped by them. Moreover, this makes us more likely to miss the really important broadcast emails, since we are used to neglecting them. Thus, personalized email prioritization is even more important for broadcast emails. However, broadcast emails are significantly different from normal emails. The following challenging characteristics of broadcast emails can fail traditional personalized email prioritization methods:

The Same Sender One of the most indicative features in previous works [3][139] for personalized email prioritization is the social feature based on the interactions between the sender and the receiver. For instance, if a high percentage of a sender's emails were read by the receiver, we can deem the sender important and predict that his following emails will also be important to the receiver. However, for a broadcast mailing list, there is usually only one sender (e.g. the mailing list admin), and a receiver may get hundreds of different emails from the same broadcast sender.


Limited Types of Interaction Traditional methods often exploit a user's interactions with emails for importance prediction. Compared to normal emails, however, the types of interaction with broadcast emails are limited. For instance, we usually will not reply to, forward or cc a broadcast email; we would not manually label an irrelevant email from a broadcast mailing list as spam either, since we still wish to receive other emails from the mailing list. The common user action on broadcast emails is viewing.

Hence, many key features of traditional methods cannot be extracted for broadcast emails, which significantly deteriorates their performance. Despite these new challenges, broadcast emails also bring new opportunities. The most prominent one is that each broadcast email is sent to thousands of users, so other users' responses (view or not) can be very helpful in predicting a target user's preference. In other words, we can generate priority predictions by collaborative filtering: for a user, if other users with similar interests have considered the email important (i.e. viewed it), he should be very likely to also consider it important. In this chapter, we propose the first personalized email prioritization framework for broadcast emails, in which we exploit collaborative filtering features by considering other users' responses to broadcast emails. However, there exists one key challenge: each email waiting for priority prediction is completely cold. That is to say, no view-email action has been observed, since the email has not yet been sent to any users, which makes it impossible to exploit the collaborative filtering features directly.

We propose a novel active learning framework to solve the cold-start problem. The intuition is simple. For a new email, we first send it to a small portion of the users in the mailing list (e.g. 5%) without priority labels and wait a short period of time (e.g. half an hour) to collect their feedback (whether the user has read the email). Then, based on these users' feedback, we predict the priority for the remaining majority of users. Our personalized email prioritization problem can thus be naturally divided into two subproblems. First, how to sample the small portion of users whose feedback can help us the most in determining the email priority for the remaining users. Second, once user feedback is gathered, how to use it to accurately predict the personalized priority for the remaining users.

Our problem can be considered an active learning problem for recommendation. However, due to the unique characteristics of the broadcast email prioritization task, there exist several new challenges which make traditional active learning recommendation methods insufficient for our task.

Implicit Feedback To the best of our knowledge, the literature on active learning in recommender systems focuses on explicit feedback [70][68][117][69][59], like the MovieLens and Netflix datasets which deal with 1-5 star user ratings. The underlying assumption is that once we query a user about an item and get his feedback, we know his preference for the item based on his rating. In our task, however, we only consider users' view-email actions, which are one-class implicit feedback. If the user viewed the email (positive feedback), we infer the email is important. However, if the user did not view the email (negative feedback), there is no way we can infer the importance of the email, since the user not reading an email could be attributed either to a lack of interest or to a lack of awareness of the email.

Timely Response In the active learning recommendation literature, there exists an underlying assumption that users queried for feedback always provide feedback in time. Therefore, traditional active learning algorithms can afford to take multiple rounds to sample users, choosing the next round of users based on the feedback from the previous round. However, this is not the case in our task. When we send the email to a small portion of the subscribers for feedback, we can only wait a short period of time for the responses, due to the real-time nature of email. Hence, our problem requires us to sample users who can provide responsive feedback in time and to perform batch-mode user feedback sampling in a single round.

Completely Cold Item Previous works on active learning for recommendation [53, 64, 102] require that even cold start items have a small number of initial ratings. However, in our problem, every email waiting for prioritization is completely cold, with zero user responses.

Fairness in User Querying For every email requiring prioritization, we need to query some informative users for feedback, and the choice of informative users needs to be fair. That is to say, we cannot always pick the same set of users for feedback querying, because they would lose the chance to benefit from the email prioritization service, which may even generate user frustration.

To cope with the above-mentioned challenges, we propose a novel active learning framework in which we sample a small set of informative users, considering both the preference of a user and his tendency to give responsive feedback, by exploiting features including the text and attributes of emails and users' email-view interactions. After gathering the feedback, we use a weight regularized matrix factorization method specifically designed for implicit feedback to learn the preference scores of the remaining users, and use the scores as the key feature to predict the final priority of the email.

Since there is no publicly available dataset that contains personal importance judgments by real users for broadcast emails, due to privacy concerns [139], we collect an industrial dataset from Samsung Electronics. The dataset contains thousands of broadcast emails from one of Samsung Electronics's company broadcast mailing lists, with thousands of Samsung employees as subscribers. These employees are from all over the world and have diverse demographic features. We collect both the text features (e.g. email title, content) and attribute features (e.g. sender, receiver, timestamp) of the broadcast emails. We also collect users' view logs of these emails in a 9-month time window. We conduct extensive experiments and demonstrate that our method outperforms all the baseline algorithms in terms of prediction accuracy.

Figure 4.2: Broadcast Email Prioritization with Active Learning: in this setting, given a mailing list, we know users' ratings on previous emails. The method aims to predict the importance label of a new email F.

Our main contributions are as follows:

1. We present the first in-depth discussion of personalized prioritization for broadcast emails and propose an active learning based framework to solve the problem. In particular, we exploit collaborative filtering features in email prioritization.

2. We propose a novel active learning model that can handle one-class implicit feedback and considers users' time-sensitive responsiveness for active learning based recommendation.

3. Our method is thoroughly evaluated on a large scale real industrial dataset and is demonstrated to be highly effective compared with a large number of baselines.

4.2 Problem Definition

The task is personalized prioritization for broadcast emails. That is to say, we want to predict whether a broadcast email is important or not for a given user. The problem can be divided into two subproblems. First, sample a small portion of users whose feedback can best help us predict the email priority for the remaining users. Second, predict the priority for the remaining users based on the feedback collected from the sampled users. We define the problem formally as follows.


Table 4.1: Email Features

Feature Category   Features
Email Body         Email title, content of email body
Email Header       Sender ID (only one sender, email-admin), receiver ID, timestamp of the email
User Attribute     Receiver ID, receiver country, receiver timezone

For a user set U and email set E, we define a binary email importance label set I based on users' email-view interactions. That is to say, for user u \in U and email e \in E,

I_{u,e} = \begin{cases} 1 & \text{if } u \text{ has viewed } e \\ 0 & \text{if } u \text{ has not viewed } e \end{cases}    (4.1)

For each email e, we record its text and attribute-based features, e.g. the title, content, sender and receiver of the email. For each user u, we record user attribute features, e.g. country and timezone. Details of the features are provided in Table 4.1. Given a new email e_{new}, we define the two subproblems mentioned above as:

Sampling Users for Feedback Given U, E, I, e_{new} and the time interval T_{feedback} in which we collect users' feedback, select the subset S of k users from U whose feedback in T_{feedback} maximizes the prediction accuracy of I_{U-S, e_{new}}.

Prediction for Remaining Users Given U, E, I, e_{new} and I_{S, e_{new}}, predict the importance label set I_{U-S, e_{new}}.

4.3 The Framework

We propose an active learning framework for the broadcast email prioritization problem. The framework has two parts, sampling informative users for feedback and making priority predictions for the remaining users based on the collected feedback, which are described in detail in the following subsections.

4.3.1 Sampling Users for Feedback

To the best of our knowledge, none of the previous works using active learning for recommendation focuses on handling one-class implicit feedback data, and most of them, especially the personalized active learning for recommendation methods, require at least a small portion of initial ratings for each cold start user/item [3][139][118]. None of the previous works considers the time cost of obtaining users' feedback, and none considers the fairness issue when sampling users for feedback. However, all of the above-mentioned challenges are very important in sampling users for feedback in our task and are carefully handled in our framework. Next, we first introduce the important criteria for sampling users considered in our framework and then discuss our sampling strategy in detail.


Sampling Positive Feedback

To make our method general, we only consider the email-view interaction for our broadcast email prioritization task. There are two kinds of feedback we can receive. Positive feedback, which means the user has viewed the email, is relatively clear evidence that the email is important. Negative feedback, which means the user failed to view the email in time, is a mixture of cases where the user is unaware of the email and cases where the user thinks the email is unimportant. Our proposed sampling method aims to sample more positive feedback from the users for the following reasons.

More Informative Positive feedback gives us clearer and more confident information about users' preference for the email and decreases the overall uncertainty of the predicted email importance.

Data Sparsity The collected feedback will be used in a matrix factorization framework to predict the email priority for the other users. Each new email waiting for prioritization is completely cold, and we can only sample a small portion of users for feedback; moreover, only a small portion of them will give positive feedback in time. Thus, the positive feedback data for the new email can be very sparse if we do not employ a sampling strategy favoring positive feedback.

It is worth noting that collecting too much positive feedback may introduce bias into the system, which would have a negative impact on priority prediction. We discuss how to cope with this issue in Section 4.3.2. Two factors are considered in our work in order to sample more positive feedback, users' preference and users' responsiveness, which are explained in detail as follows.

Predicting Users’ Preference

The first way to get more positive feedback for an email is to predict users' preference for the email and sample users who are predicted to be interested in it. However, since the email waiting for prioritization is completely cold, with zero email-view interactions, we have to rely on additional information. Fortunately, the text features of emails provide a natural way to link the new email to old emails. We use a hybrid recommendation algorithm to predict a user's preference for the new email by combining the ideas of content-based recommendation and item-based collaborative filtering.

The text features of an email e are represented as < e^t, e^b >, where e^t and e^b correspond to the term vectors of the title and body of e, and each dimension of a term vector is the tf-idf value of a term. Similar to [132], stop-word filtering, stemming and part of speech tagging are performed on the text features, and only nouns are kept in the vocabulary.


We define the similarity between two emails e_i and e_j as a weighted combination of the cosine similarities of their titles and bodies:

sim(e_i, e_j) = \cos(e_i^t, e_j^t) + \alpha \cos(e_i^b, e_j^b)    (4.2)

α is a constant weight learned by cross-validation. For a new email e_{new} requiring prioritization, we first find the top k similar emails E_{sim} based on the similarity defined above. Then we predict user u_j's preference for e_{new} in the style of item-based collaborative filtering, by aggregating u_j's responses to E_{sim}:

Score_{e_{new}, u_j} = \frac{ \sum_{e_i \in E_{sim}} I_{u_j, e_i} \cdot sim(e_{new}, e_i) }{ \sum_{e_i \in E_{sim}} sim(e_{new}, e_i) }    (4.3)

Score_{e_{new}, u_j} is used later in the user sampling phase. It is worth noting that other methods, like that of [40], could also be used in our framework, as long as they are adapted to estimate users' preference for completely cold items with the help of the additional text features. We stick to the approach described above for simplicity and efficiency.
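The following sketch shows the whole hybrid prediction in a few lines; `sim` is the Eq. 4.2 similarity and `viewed(u, e)` returns the label I_{u,e}, both passed in as callables since their concrete implementations (tf-idf vectors, view logs) live elsewhere.

```python
def preference_score(new_email, user, past_emails, viewed, sim, k=5):
    """Hybrid preference prediction (Eq. 4.3): similarity-weighted average of
    the user's view labels over the k past emails most similar to the new one."""
    neighbors = sorted(past_emails, key=lambda e: sim(new_email, e), reverse=True)[:k]
    num = sum(viewed(user, e) * sim(new_email, e) for e in neighbors)
    den = sum(sim(new_email, e) for e in neighbors)
    return num / den if den > 0 else 0.0
```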

Predicting Users’ Responsiveness

Different from previous works in active learning, the time cost of waiting for users' feedback cannot be ignored in our work, since we can only afford to wait a short period of time for a user to respond. Thus, another crucial way to increase the positive feedback rate is to sample responsive users who return their feedback in time. Two factors need to be considered to predict whether a user is responsive.

Users' Activity Time Users are only active (available to check emails) in certain time windows. For instance, a user usually only checks broadcast emails from his company during working hours, and some users may start working early and leave early while others prefer the opposite. Moreover, users are located in different countries and time zones with various local work routines. It is important to generate a personal activity time probability model and only query users who are active.

Email Checking Frequency Different users have different email checking habits. Some users may only check their inbox twice a day, while others respond to email in real time with the help of push notifications. To sample more responsive users, we always prefer users who check their inbox frequently.

We estimate the activity time and email checking frequency based on the timestamps of a user's previous view-email behavior. All timestamps are converted to the corresponding local time according to the timezone feature from the user attributes. We define user u's temporal active profile as D(u) = < v_{t_1}(u), v_{t_2}(u), ..., v_{t_{24}}(u) >, where v_t(u) is the number of days on which u was active in time interval t. Each day is divided into 24 time intervals; for example, t_1 is the interval from 0:00 to 0:59. We regard u as active in interval t on a day if at least one of u's view-email interactions is observed in interval t on that day.

Since the view-email interactions of each user can be very sparse, when generating the personal activity time probability model we also rely on the view-email interactions of other users from the same country. We define the user set from country j as U_j. For a user u_i coming from country j, we define u_i's probability of being active in time interval t as

P_t(u_i) = \frac{ \sum_{e \in E} I_{u_i,e} }{ 1 + \gamma \sum_{e \in E} I_{u_i,e} } \cdot \frac{ v_t(u_i) }{ ob(u_i) } + \frac{ 1 }{ 1 + \gamma \sum_{e \in E} I_{u_i,e} } \cdot \frac{1}{|U_j|} \sum_{u \in U_j} \frac{ v_t(u) }{ ob(u) }    (4.4)

where ob(u_i) is the total number of observation days for user u_i, with ob(u_i) = min(number of days u_i has been registered, number of days of the experiment observation). γ is a constant weight which can be learned by cross-validation. The number of days of the experiment observation in our work is 270 (9 months). \sum_{e \in E} I_{u_i,e} is the total number of u_i's view-email interactions. The intuition behind Equation (4.4) is that if u_i has few view-email interactions, the estimate of the active probability for time interval t relies more on the average active probability in t of other users from the same country. As the number of view-email interactions increases, u_i's own interaction data gradually becomes dominant in the probability estimation. P_t(u_i) is used later in the user sampling.
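The smoothing of Equation 4.4 can be written compactly as below; the input structures are our own simplification (hourly activity counts per user plus the per-country profiles), and the code assumes a non-empty country group.

```python
def activity_probability(t, v_user, ob_user, n_views, country_profiles, gamma=1.0):
    """Eq. 4.4: blend the user's own hourly activity rate with the average rate
    of users from the same country. v_user[t] = days active in hour t;
    country_profiles = list of (v, ob) pairs for the compatriot users.
    As n_views grows, the user's own data dominates the estimate."""
    denom = 1.0 + gamma * n_views
    own = (n_views / denom) * (v_user[t] / ob_user)
    peer_avg = sum(v[t] / ob for v, ob in country_profiles) / len(country_profiles)
    return own + peer_avg / denom
```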

As mentioned earlier, a user's email checking frequency also matters for giving responsive feedback. We define the one-hour time window after a view-email interaction of a user as an email checking session; all following view-email interactions within that one-hour window belong to the same session. Denoting by {session(u)} the set of all email checking sessions of user u, we define the email checking frequency of u as:

frequency(u) = \frac{ |\{session(u)\}| }{ ob(u) + \zeta }    (4.5)

in which ζ is a constant used for smoothing and can be determined by cross-validation.
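Session counting follows directly from the definition; a small sketch (timestamps in seconds, names ours):

```python
def checking_frequency(view_timestamps, ob_days, zeta=1.0):
    """Eq. 4.5: count one-hour email checking sessions, smoothed by zeta.
    A view opens a session; later views within one hour join the same session."""
    sessions, session_end = 0, None
    for ts in sorted(view_timestamps):
        if session_end is None or ts > session_end:
            sessions += 1
            session_end = ts + 3600    # one-hour window
    return sessions / (ob_days + zeta)
```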

Sampling Strategy

As discussed above, for a user u and a new email e_{new} sent at his local time t, the probability of u providing positive feedback on e_{new} within the time limit is related to u's preference for e_{new} (score_{e_{new},u}), u's probability of being active at t (P_t(u)) and u's email checking frequency (frequency(u)). We define the probability of u giving positive feedback on e_{new} at t within the time limit as:

P(u, e_{new}, t) = \frac{1}{ 1 + e^{ -(\beta_0 + \beta_1 \cdot score_{e_{new},u} + \beta_2 \cdot P_t(u) + \beta_3 \cdot frequency(u)) } }    (4.6)


This is a logistic regression model over the three factors mentioned above. β = {β_0, ..., β_3} is the model parameter, trained on the validation set.

We sort all users in descending order of P(u, e_{new}, t) and sample the top k users to form the sampled user set S.
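Putting the three signals together, the sampling step reduces to scoring every user with the logistic model of Eq. 4.6 and keeping the top k; the callables below stand in for the preference, activity and frequency estimators defined earlier.

```python
import math

def sample_users(users, beta, k, preference, p_active, frequency):
    """Eq. 4.6: logistic combination of preference, activity probability and
    checking frequency; return the k users most likely to give timely
    positive feedback. beta = (b0, b1, b2, b3)."""
    def positive_prob(u):
        z = (beta[0] + beta[1] * preference(u) + beta[2] * p_active(u)
             + beta[3] * frequency(u))
        return 1.0 / (1.0 + math.exp(-z))
    return sorted(users, key=positive_prob, reverse=True)[:k]
```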

4.3.2 Prediction for the Remaining Users

Since we tend to sample responsive positive feedback, bias could be introduced when we make predictions for the remaining users. In this section, we first discuss how we use the weighted low-rank approximation technique to eliminate the bias and then propose the classification model for the final priority label prediction.

Weighted Low-rank Approximation

After receiving the feedback from the sampled users, we use a matrix factorization based method to predict the preference of the remaining users. Since the feedback is one-class implicit and our sampling method introduces a bias towards positive feedback, a weighted low-rank approximation method is developed to handle the implicit data and correct the bias.

The intuition behind the proposed method is to punish (add weight to) unexpected sampled feedback. That is to say, during the training phase, for all the feedback gathered by querying sampled users, we punish negative feedback that we predicted to have a high positive probability in the sampling phase, and positive feedback that we predicted to have a low positive probability, by adding additional weights to the corresponding training data points. For all other training points, we assign high weights to positive feedback and lower weights to negative feedback, as described in [90][58].

Given the expanded importance label set I' = \{I, I_{S,e_{new}}\}, our objective is to minimize the loss function

L(P, Q) = \sum_{ij} W_{ij} \big( I'_{ij} - P_{i\cdot} Q_{j\cdot}^T \big)^2 + \lambda \big( \|P\|_F^2 + \|Q\|_F^2 \big)    (4.7)

in which P \in R^{m \times d} and Q \in R^{(n+1) \times d} are the latent factor matrices for U and \{E, e_{new}\}, and W_{ij} is a non-negative weight for u_i and e_j. Different from all previous active learning works, we exploit the responsive positive feedback probabilities predicted in the user sampling phase in the weighting scheme to eliminate the sampling bias. The weighting scheme of the non-negative weight matrix W is summarized in Table 4.2. In the experiment we set m to 1 and δ to 0.2 (m here denotes the margin constant in Table 4.2, not the number of users).

Alternating Least Squares (ALS) is used to solve our optimization problem by fixing P and Q alternately while optimizing the unfixed parameter.


Table 4.2: Weighting Schemes

Feedback Type               Weighting Scheme
Positive Sampled Feedback   1 + (m − Prp(u_i, e_{new}, t_j))
Negative Sampled Feedback   Prp(u_i, e_{new}, t_j)
Other Positive Feedback     1
Other Negative Feedback     δ

When fixing Q and solving \partial L(P,Q) / \partial P_{i\cdot} = 0:

P_{i\cdot} = I'_{i\cdot} \tilde{W}_{i\cdot} Q \big( Q^T \tilde{W}_{i\cdot} Q + \lambda (\textstyle\sum_j W_{ij}) I_D \big)^{-1}    (4.8)

where \tilde{W}_{i\cdot} \in R^{(n+1) \times (n+1)} is a diagonal matrix with the elements of W_{i\cdot} on the diagonal and I_D \in R^{d \times d} is an identity matrix.

Similarly, when fixing P and solving \partial L(P,Q) / \partial Q_{j\cdot} = 0:

Q_{j\cdot} = I'^{T}_{\cdot j} \tilde{W}_{\cdot j} P \big( P^T \tilde{W}_{\cdot j} P + \lambda (\textstyle\sum_i W_{ij}) I_D \big)^{-1}    (4.9)

where \tilde{W}_{\cdot j} \in R^{m \times m} is a diagonal matrix with the elements of W_{\cdot j} on the diagonal. The details of using ALS to solve matrix factorization problems are not the concern of this thesis and can be found in [90].

For each remaining user u_i \in (U - S), we can predict his preference for e_{new} as

y_{i, e_{new}} = P_{i\cdot} Q_{e_{new}}^T    (4.10)
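For concreteness, the sketch below implements the alternating closed-form updates of Eqs. 4.8-4.9 with NumPy. It is a didactic dense version (real data would use sparse structures), with I the expanded label matrix and W the weights of Table 4.2.

```python
import numpy as np

def weighted_als(I, W, d=10, lam=0.1, iters=15, seed=0):
    """Weighted low-rank approximation (Eqs. 4.7-4.9): alternately solve the
    closed-form row updates for P and Q under per-entry weights W."""
    rng = np.random.default_rng(seed)
    m, n = I.shape
    P = rng.normal(scale=0.1, size=(m, d))
    Q = rng.normal(scale=0.1, size=(n, d))
    for _ in range(iters):
        for i in range(m):                              # Eq. 4.8, row P_i
            Wi = np.diag(W[i])
            A = Q.T @ Wi @ Q + lam * W[i].sum() * np.eye(d)
            P[i] = np.linalg.solve(A, Q.T @ Wi @ I[i])
        for j in range(n):                              # Eq. 4.9, row Q_j
            Wj = np.diag(W[:, j])
            A = P.T @ Wj @ P + lam * W[:, j].sum() * np.eye(d)
            Q[j] = np.linalg.solve(A, P.T @ Wj @ I[:, j])
    return P, Q                                         # preferences: P @ Q.T (Eq. 4.10)
```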

Feedback-sensitive Classification

Once y_{i,e_{new}} is estimated for all the remaining users, we can combine it with any additional features proposed by previous methods (e.g. content features and label features) and feed them into a classification model (e.g. a logistic regression model, as proposed in [3]) to predict the email priority labels for the remaining users.

In this work, to keep things simple, we use y_{i,e_{new}} as the only feature in priority classification, and we devise a classification method that is feedback-sensitive. The intuition is that for each email a certain percentage of users will consider it important, but the percentage varies among emails, since they differ in topic, writing quality, etc. We can infer the percentage of users who consider the email important from the percentage of positive feedback in the sampling phase. We define the threshold for email e_{new} as

H(e_{new}) = \frac{ \theta \cdot pos(e_{new}) + pos(E) }{ \theta k + mn }    (4.11)


pos(e_{new}) is the total number of positive feedback responses from the k queried users. pos(E) is the total number of positive responses from all m users over all n previous emails. θ is a constant balancing the global percentage of positive responses with the e_{new}-specific percentage estimated from the sampled feedback and can be inferred by cross-validation. For the top H(e_{new}) percent of users according to y_{i,e_{new}}, we predict e_{new} as important; for the others, as unimportant.
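The final labeling step is then a short routine around Eq. 4.11; the sketch below (names ours) thresholds the remaining users by their predicted preference scores.

```python
def priority_labels(scores, pos_new, pos_all, k, m, n, theta=1.0):
    """Feedback-sensitive classification: compute H(e_new) per Eq. 4.11 and
    mark the top-H fraction of remaining users (by score) as 'important'."""
    H = (theta * pos_new + pos_all) / (theta * k + m * n)
    cutoff = max(1, round(H * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    important = set(ranked[:cutoff])
    return [i in important for i in range(len(scores))]
```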

4.4 Experiments

4.4.1 Dataset

We collected emails, view logs of emails and user information from a large business mailing list for employees within Samsung Electronics. Employees from all around the world receive emails on various topics, like win notices of deals, meeting agendas of customers, business objectives, news and technical issues.

The dataset contains 6291 broadcast emails sent to 2805 Samsung employees, generating 398,343 view records. For each email, we collected both text data, like titles and contents, and attributes, like the receiver, timestamp, timezone etc. All the emails have the same sender attribute (email-admin). We split the dataset into a training set (containing 5632 emails and their view logs) and a testing set (659 emails and their view logs) based on a certain time point. Since we only have users' implicit email viewing actions, we assume an email is important to a user if the user has viewed it.

It is worth noting that the dataset is relatively small, with only one sender, and contains only view-email behavior. Accuracy could be better if we could incorporate information like deletions of emails, flagging emails as important, and skipping an email. However, due to privacy concerns, there is no public dataset containing importance judgments by real users for broadcast emails [139], and this is the best dataset we could get.

All user data was analyzed and stored in accordance with Samsung's privacy policy. Only the view logs of the authorized broadcast emails were extracted, and all the users are Samsung employees. The dataset was completely anonymized by mapping user ids and email ids to integer indices before any analysis. All the features extracted from messages were deleted after training.

4.4.2 Data Pre-processing and Analysis

The users in our dataset are located in 67 countries with widely varying time zones. To handle the timezone variance, the timestamps in users' view logs were first converted to their local time. Then, as mentioned in the previous section, we calculated the active probabilities in all time intervals, both for each user and for all the users from the same country, under the assumption that users from the same country share similar work routines. Based on our dataset, we noticed that users from different countries have different activity time distributions due to working culture differences. For example, as shown in Figure 4.3, most users from the United Kingdom tend to view emails during work time, from 8:00 to 18:00, while in Korea users tend to work much longer hours, with many users checking their emails early in the morning or late at night.

Figure 4.3: Activity Time Probability of Users from Korea and United Kingdom

4.4.3 Baselines

Many previous active learning recommendation methods [53, 64, 102] require at least a small amount of initial ratings, which is not applicable to our problem, since emails requiring prioritization are completely cold items. Due to the restrictions of broadcast emails, i.e. the single sender and limited types of interaction, most email prioritization methods either cannot be directly applied to our task [65, 56, 89] or lack key features if applied [3, 139, 118].

In the experiment section, we refer to our own method as Positive-feedback-oriented Active Learning (PAL). We try our best to adapt the following methods from the previous literature for comparison. The first baseline is an adaptation of the email prioritization algorithm used for the Gmail inbox. The second baseline is adapted from a hybrid recommendation algorithm. The next three baselines are active learning based recommendation algorithms from different research works; we use them to replace the user sampling part of our own algorithm, and for all the active learning baselines, the classic weight regularized matrix factorization method designed for implicit feedback [90][58][48] is used for preference score prediction and a logistic regression model is used for label prediction. The last baseline is a variation of our own method that eliminates the weighting scheme in the weighted low-rank approximation process. For all active learning methods, including our own, the percentage of sampled users is set to 10% and the time period waiting for users' responses is set to 1 hour, unless otherwise specified.


Importance Ranking (IR)

We adapt the importance ranking algorithm used for Gmail Priority Inbox [3]. Four categories of features are considered in that model: social features, content features, thread features, and label features. However, due to the characteristics of broadcast emails, the social, thread, and label features are inapplicable. We generate the content-based features as follows. Each email waiting for prioritization is represented as a term vector in which each dimension is the tf-idf value of the corresponding term extracted from the email text. For each user, we generate a user interest profile by aggregating all the emails the user has read and generating the corresponding term vector. For an email requiring priority prediction, the cosine similarity between the user interest profile and the email term vector is calculated as the content feature, and a logistic regression model is trained for label prediction.
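The following is a minimal sketch of this adapted content-feature pipeline, assuming scikit-learn; the toy corpus and variable names are illustrative, not from the thesis:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical corpus: texts of past emails, plus which ones this user read.
emails = ["quarterly sales meeting agenda", "new security policy update",
          "team lunch on friday", "sales targets for next quarter"]
read_by_user = [0, 3]          # indices of emails the user has viewed
labels = [1, 0, 0, 1]          # ground-truth importance labels for training

vec = TfidfVectorizer()
E = vec.fit_transform(emails).toarray()          # one tf-idf term vector per email

# User interest profile: aggregate the tf-idf vectors of all emails the user read.
profile = E[read_by_user].sum(axis=0)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# The cosine similarity to the profile is the single content feature,
# fed into a logistic regression model for label prediction.
features = np.array([[cosine(profile, e)] for e in E])
clf = LogisticRegression().fit(features, labels)
print(clf.predict_proba(features)[:, 1])         # predicted importance probabilities
```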

Hybrid Based Prediction (HBP)

HBP is the method described in section 4.1.2, which we used to predict a user's preference towards a new email. It is a combination of content-based recommendation and item-based collaborative filtering. Once we obtain users' preference scores for the email, a logistic regression model is used for label prediction. Please refer to section 4.1.2 for more detail.

Popular Sampling Active Learning (PSAL)

Inspired by [43, 105], for a new email e_new, we sample the k users who have viewed the highest number of emails. This method is also equivalent to sampling the users with the largest variance or entropy of ratings [105]: in our task we only have one-class implicit feedback, and if we treat all items without feedback as negative feedback (rating 0), sampling a user with the largest entropy/variance means finding someone with an equal number of positive and negative ratings. Since most users have far more negative ratings than positive ratings, sampling users with large entropy/variance is thus equivalent to sampling popular users.

Coverage Sampling Active Learning (CSAL)

Inspired by [42, 105], for a new email e_new, we sample the k users who have most frequently co-viewed emails with other users. Here, \(Coverage(i) = \sum_j n_{ij}\), where \(n_{ij}\) is the number of emails viewed by both users i and j. The users with the highest Coverage values are then sampled. The heuristic behind this strategy is that users who co-view the same emails as many other users can better reflect those users' interests, and their viewing behavior is thus more helpful for predicting the viewing behavior of others.
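A minimal sketch of the Coverage computation on a toy binary view matrix (whether to count a user's overlap with themselves is not stated in the text; we exclude it here as an assumption):

```python
import numpy as np

# Hypothetical binary user-email view matrix: I[u, e] = 1 if user u viewed email e.
I = np.array([[1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 1, 1]])

co_view = I @ I.T                      # co_view[i, j] = n_ij, shared viewed emails
np.fill_diagonal(co_view, 0)           # assumption: exclude self-overlap
coverage = co_view.sum(axis=1)         # Coverage(i) = sum_j n_ij

k = 2
sampled = np.argsort(-coverage)[:k]    # sample the k users with highest coverage
print(coverage, sampled)
```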


Exploration Sampling Active Learning (ESAL)

Exploration is important for the completely cold emails in our task [123]. Inspired by [104, 14], we first construct the binary user-email viewing matrix (entries equal to 0 or 1). The matrix is then normalized, and each row is regarded as a user vector. Finally, we gradually sample users, ensuring that each newly sampled user is similar to the unsampled users and less similar to the already sampled users. The intuition is that the sampled users should be representative of the unsampled users while remaining diverse among themselves.
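The exact scoring in [104, 14] may differ; the following is a hedged sketch of one greedy instantiation of this representative-but-diverse sampling idea:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.random((50, 20))                          # toy user-email view matrix
U = U / np.linalg.norm(U, axis=1, keepdims=True)  # normalized rows = user vectors

def esal_sample(users, k):
    """Greedily pick users similar to the unsampled pool but dissimilar to
    already-sampled users (representativeness minus redundancy)."""
    sampled = []
    remaining = list(range(len(users)))
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in remaining:
            rep = np.mean(users[remaining] @ users[i])            # pool similarity
            div = max((users[j] @ users[i] for j in sampled), default=0.0)
            score = rep - div                                     # trade-off
            if score > best_score:
                best, best_score = i, score
        sampled.append(best)
        remaining.remove(best)
    return sampled

print(esal_sample(U, k=5))
```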

PAL Without Weight (PAL-SVD)

In order to test how the weighted low-rank approximation impacts the result, we propose another baseline called PAL-SVD, which is the same as our algorithm except that we eliminate the weighting scheme for matrix factorization and just use a basic SVD model for the prediction.

4.4.4 Evaluation Metrics

Since our task is a classification task, we use precision, recall, and F-score as the main evaluation metrics. Based on the predicted label from the algorithm and the ground-truth label from the dataset, a prediction is either a true positive (tp), true negative (tn), false positive (fp), or false negative (fn). The metrics are defined as

\[
\text{Precision} = \frac{tp}{tp + fp}, \qquad
\text{Recall} = \frac{tp}{tp + fn}, \qquad
\text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]
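For concreteness, a minimal helper implementing these definitions on hypothetical counts:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-score from classification counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f

print(prf(tp=30, fp=10, fn=20))  # (0.75, 0.6, 0.666...)
```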

4.4.5 Results and Analysis

Algorithm Comparison

In this section, we compare PAL with all baselines. Since different active learning algorithms query different sets of users for feedback, to make fair comparisons, the performance is evaluated on the same set of users, which is the intersection of the unsampled user sets of all the compared algorithms.

As shown in Figure 4.4, our method significantly outperforms all the baselines on all the evaluation metrics. Traditional email prioritization methods like IR [3] do not perform well in broadcast email prioritization, because many types of key features cannot be applied here due to the “one sender” and “limited types of interaction” challenges.


[Figure: precision, recall, and F-score (0 to 0.5) for IR, HBP, PSAL, CSAL, ESAL, PAL-SVD, and PAL.]

Figure 4.4: Performance Comparison of Various Algorithms

HBP can work on our task, but the noise hidden in the content features inevitably prevents the algorithm from generating more accurate results.

Active learning recommendation methods can improve the prediction accuracy because the collected feedback allows us to predict priorities based on other users' responses via collaborative filtering. However, the sampling strategy matters here, since in our task we can only collect implicit feedback from a small portion of users within a very short time period. Our method outperforms all the other active learning baselines by considering users' preferences, active time distributions, and email-viewing frequencies when sampling users. The improvement over the different active learning baselines further justifies our choice of sampling users who can provide positive feedback in time.

The proposed weighted low-rank approximation method uses the predicted positive feedback probabilities from the sampling phase as penalty weights to eliminate the sampling bias, and it works well in practice (PAL-SVD vs. PAL). It is worth noting that sampling methods like PSAL, CSAL, and ESAL are non-personalized: they cannot generate different user samples for different emails, which negatively impacts prediction accuracy. Methods like PSAL and CSAL tend to sample popular users who like (in our case, read) almost every item they see, and such users cannot provide enough information for the matrix factorization based prediction. Moreover, these methods do not comply with our proposed fairness criterion, meaning the same set of users may be sampled repeatedly. This issue is discussed in detail at the end of this section.

Factors for Positive Feedback Prediction

As mentioned in section 4.3.1, we consider three different factors to help sample more positive feedback, namely, users' preferences for the email, their activity time distributions, and their email checking frequencies.


[Figure: positive feedback percentage (%) for waiting time-windows of 0.5, 1, 4, and 24 hours, with all features and with the preference, activity time, or frequency feature removed.]

Figure 4.5: Factor Comparison for Positive Feedback

In this section, we remove these three factors one at a time and evaluate how each factor affects the percentage of positive feedback obtained during sampling. Moreover, the length of the time-window during which we wait for responses from sampled users also impacts the percentage of positive feedback we can collect in the sampling phase, so we also plot the percentage of positive feedback against different lengths of waiting time-windows. For the users' activity time probability feature, if the time-window length is larger than one hour, we use the average probability over all the time intervals that fall in the window.

From the results displayed in Figure 4.5, we can see that all three factors help boost the positive feedback rate in the sampling phase and thus further increase the prioritization precision. User preference and email checking frequency contribute more than activity time distribution. This makes sense because a user's active time is not always fixed; for instance, one may be on leave or traveling to another country on business. The noise and uncertainty in the activity time distribution factor limit its contribution.

The percentage of positive feedback is positively correlated with the length of the waiting time-window, which is in accordance with our expectation. Our sampling strategy works well even if we only give users a very short time period to respond, because we consider users' responsiveness-related factors (i.e., activity time distribution and email checking frequency) when sampling users. It is also worth noting that as the waiting time-window grows, the responsiveness-related factors become less important and the user preference factor becomes more important in raising the positive feedback rate.

Active Learning Cost

Active learning comes at a cost: the sampled users, to whom we send emails for feedback collection, cannot benefit from the email prioritization service. Figure 4.6 shows how the percentage of sampled users affects the final performance of the prioritization task.


[Figure: Precision@5, Recall@5, and Fscore@5 for sampled-user percentages from 0.01 to 0.50.]

Figure 4.6: PAL Performance with Different Sampling Percentage

Table 4.3: Sampling Fairness Comparison

Sampling Methods     Coverage    Average Sample Times
Random               2805        65.78
PAL                  898         205.48
PSAL, CSAL, ESAL     280         659

Note that even though increasing the percentage of sampled users helps to improve the prediction performance for the remaining users, sampling more users also increases the cost of active learning. Our method works well even if only a small portion of users (e.g., 5-10%) is sampled. This is because our sampling strategy tends to collect positive feedback, so even with only a small percentage of users sampled, we can still obtain enough positive feedback for model training and preference prediction.

Sampling Fairness

In our broadcast email prioritization framework, it is real users that we are sampling, so fairness is a very important criterion. If we keep sampling the same set of users again and again, they will quickly get annoyed, and it is also unfair since they cannot benefit from the prioritization service. We compare our method with the active learning baselines with regard to the fairness criterion. For better comparison, we also add a random sampling baseline, which, due to its inherent randomness, is undoubtedly the best strategy when fairness is the only consideration in user sampling. We use user coverage (the total number of users sampled during the test) and the average number of times a user is sampled as metrics. The results are shown in Table 4.3. From the results we can see that our method covers more than twice as many users as PSAL, CSAL, and ESAL. This makes sense since PSAL, CSAL, and ESAL are non-personalized active learning sampling strategies, which can easily cause serious fairness issues.

4.5 Conclusion

In this chapter, we present the first framework for personalized broadcast email prioritization and propose a novel active learning approach to solve the problem. This is also the first work to incorporate collaborative filtering into email prioritization.

To exploit the collaborative filtering features of broadcast emails, we devise an active learning strategy that aims to sample enough positive feedback within the limited time window by exploiting factors including users' preferences, their activity time probability distributions, and their email checking frequencies. A weighted low-rank approximation method is proposed to eliminate the possible bias from the sampling phase and generate accurate preference estimations for the remaining users. Finally, we develop a feedback-sensitive classification method for personalized priority prediction.

Comprehensive experiments are conducted on an industrial dataset from Samsung Electronics, and the results show that our method outperforms all the baselines and that the various factors considered indeed help collect more positive feedback even under strict restrictions. An evaluation of the active learning cost demonstrates that our method performs well even if only a small portion of users is sampled for feedback.


Chapter 5

Broadcast Email Prioritization with Cross Domain Recommendation

This chapter continues to focus on broadcast email prioritization and is a natural extension of the previous chapter. In the previous chapter, even though a novel active learning framework was proposed to incorporate collaborative filtering into email prioritization, newly enrolled users and newly created mailing lists still cannot be handled well because they lack interaction history. Moreover, the previous chapter focused on broadcast email prioritization within a single mailing list, but real-life email systems typically contain a large number of mailing lists. This actually provides an opportunity for newly enrolled users and new mailing lists, because extra knowledge can be transferred from similar mailing lists to help with the prioritization task in a target mailing list. Thus, a new cross-domain recommendation framework is proposed to handle broadcast email prioritization across large numbers of mailing lists. The proposed framework is also the first cross-domain recommendation framework that can automatically select the optimal set of source domains from large numbers of candidate domains. Our approach is thoroughly evaluated on a real-life dataset, and the comparison experiments demonstrate that it is highly effective.

5.1 Background and Overview

Despite its 23-year history, email remains one of the most important communication tools today, with 2.6 billion users worldwide and over 205 billion emails sent or received every day [97]. However, together with the blessing comes a curse. Email overload, described by the New York Times as “a $650 Billion Drag on the Economy” [82], is causing serious trouble for email users. Based on previous research, 58% of emails are irrelevant or unimportant, and a person on average wastes at least one hour per day handling them [19][60]. This serious situation has led to a thriving research field, personalized email prioritization, in which importance labels for non-spam emails are predicted; a variety of studies [139][118] have worked on it.

However, broadcast email, an important type of email, has been overlooked in the previous personalized email prioritization literature. A broadcast email is an email message that is sent to a group of receivers, usually by organizations, companies, and web services [129], and each group of receivers is called a mailing list. Every day huge numbers of broadcast emails are sent to millions of mailing lists, often for group notification (e.g. emails from a university graduate student list) and email marketing (e.g. promo emails from an e-commerce website list). Handling these broadcast emails can be both overwhelming and time-consuming, and the really important and interesting broadcast emails can easily get swamped.

In the last chapter, an active learning framework was proposed to incorporate collaborative filtering into broadcast email prioritization. However, the user feedback obtained from active learning is still too limited to fully address the prioritization problem. For example, it cannot handle new users of a mailing list or new mailing lists well, which are very common in real systems and have little historical data for collaborative filtering. In our previous model from [129], only one mailing list was considered, i.e., each mailing list was modeled independently without considering the existence of other mailing lists. In an email system like Gmail, there exist up to millions of mailing lists, covering topics ranging from political campaigns to e-commerce promotions. The mailing lists are typically large, commonly containing thousands or even millions of receivers. A user may be a member of dozens of mailing lists, and mailing lists can have large numbers of shared users. The viewing information accumulated in similar mailing lists can be very useful for enriching the collaborative filtering evidence of a target mailing list. By resorting to these similar mailing lists, the aforementioned new-user and new-mailing-list issues could be significantly alleviated.

In this chapter, we propose a new cross-domain recommendation framework to solve the problem of broadcast email prioritization with many mailing lists. Cross-domain recommendation systems adopt different techniques to perform transfer learning from source domains (e.g., books) to target domains (e.g., movies) in order to alleviate the sparsity problem and improve the accuracy of recommendations. The intuition of our approach is that each broadcast mailing list can be regarded as a domain in a cross-domain recommendation system. The problem of predicting the priority of emails with the help of extra information from other related mailing lists can thus be formulated as the problem of improving the quality of recommendations in the target domain by incorporating information accumulated from source domains, as in cross-domain recommendation. However, due to the unique characteristics of the broadcast email prioritization task, several challenges exist and make traditional cross-domain recommendation methods fail.

Million Domain Challenge Most previous cross-domain recommendation works focused on a relatively small set of domains, like two or three [107].


Figure 5.1: Domains and Mailing Lists: There are large numbers of mailing lists in an email system, focusing on various topics ranging from politics to promotions. Each mailing list can be treated as a domain in the cross-domain recommendation problem.

The selection of source domains is usually done manually based on expert intuition. For instance, books intuitively make a good source domain for movies because they share similar sets of genres. However, with millions of domains in an email system, there is no way to rely on intuition or experts to select the optimal set of source domains for each target domain. What to select, how many to select, and what makes a good set of source domains are challenging questions that can only be answered by a carefully designed algorithm.

Multi-criteria Source Domain Selection To select the set of source domains, multiple criteria need to be considered, and no previous cross-domain recommendation model has addressed this. Moreover, signals from these criteria come in different formats and can contradict each other, which makes it challenging to support them all at the same time.

A Dynamic Source Domain Set Size In the cross-domain recommendation literature, there exists an underlying assumption that the number of source domains used in a cross-domain recommendation is given or fixed. However, this is not the case in our task. For instance, if a mailing list happens to have many similar mailing lists with lots of shared users and similar user feedback patterns, more source mailing lists should be included so that more useful extra information can be exploited. The algorithm should be able to dynamically decide the number of source domains to select.


To address the above-mentioned challenges, we formulate the selection of the set of source domains as an optimization problem considering criteria including the overlap of users, feedback pattern similarity, and coverage of users. Two methods are then proposed to solve the optimization problem efficiently. A weighted regularized matrix factorization method is used to make predictions based on information from both the selected source domains and the target domain.

Our main contributions are as follows:

1. We present the first in-depth discussion of personalized prioritization for broadcast emails considering large numbers of mailing lists.

2. We propose the first cross-domain recommendation framework that can select the set of source domains from large numbers of candidate domains.

3. Our method is thoroughly evaluated on a real-life dataset and is demonstrated to be highly effective compared to baseline methods.

5.2 Problem Definition

The task is personalized prioritization for broadcast emails considering multiple mailing lists. That is to say, we want to predict whether a broadcast email is important or not for a given user. There are large numbers of mailing lists in an email system. For simplicity, we assume each broadcast email is only sent to one mailing list, while a user can enroll in multiple mailing lists. For a broadcast email waiting for prioritization prediction, we define the mailing list to which it is sent as the target mailing list (equivalent to the target domain in cross-domain recommendation) and all the remaining mailing lists as source mailing lists (equivalent to source domains in cross-domain recommendation). In this thesis, domain and mailing list are used as synonyms. The broadcast email prioritization problem can be divided into the following three subproblems.

1. Sample feedback from a small portion of users to solve the cold start problem, since each broadcast email waiting for prioritization is completely cold, with no user interaction.

2. Find the set of source mailing lists whose extra information can help the most with priority prediction.

3. Predict the priority of the broadcast email with the help of the feedback from the sampled users and the extra information from the source mailing lists.

For user set U and email set E, we define a binary email importance matrix I based on users' feedback on emails. I.e., for user u ∈ U and email e ∈ E,


Figure 5.2: Broadcast Email Prioritization with Cross Domain Recommendation: When predicting the importance label for a new email F, we not only consider the ratings from the target domain but also the rating information from related source domains which share overlapping users.

\[
I_{u,e} =
\begin{cases}
1 & \text{if } u \text{ has viewed } e \\
0 & \text{if } u \text{ has not viewed } e
\end{cases}
\tag{5.1}
\]

We define a mailing list as a set of users M_i ⊂ U and assume that there are n mailing lists in total, denoted by M = {M_1, ..., M_n}. We define the email set E_i as the set of emails sent to M_i.

For each email e, we record its sender and receiver. For each user u, we record user features like country and timezone. Given a new email e_new to be sent to mailing list M_t, we define the three subproblems mentioned above as:

Sampling Users for Feedback Given U, E, I, e_new, M_t, and the time interval T_feedback in which we collect users' feedback, select a subset S of users from M_t and collect their feedback, I_{S,e_new}, within the time window T_feedback.

Choose Source Mailing Lists Given U, E, I, M_t, and E_t, select the set of source mailing lists M' ⊆ M whose extra information maximizes the prediction accuracy of the email priority for e_new. The choice of mailing lists is independent of e_new.


Prediction for Remaining Users Given U, E, I, M, M_t, M', e_new, and I_{S,e_new}, predict the priority I_{M_t−S, e_new} for the remaining users.

5.3 The CBEP Framework

In this section, we introduce our Cross-domain Broadcast Email Prioritization (CBEP) framework to solve the three subproblems of broadcast email prioritization: user feedback sampling, source domain set selection, and priority prediction. Source domain set selection is the major contribution of the proposed method and will only be briefly described in this section, with details presented in the next section.

5.3.1 User Feedback Sampling

In a broadcast email prioritization task, each email waiting for priority prediction is completely cold. That is to say, no view-email action has been observed, since the email has not yet been sent to any users, which makes it impossible to perform collaborative filtering directly. Thus, we first need to sample feedback from a small portion of users to solve the cold start problem. Feedback in this chapter refers to a user's view-email action. There are two types of possible feedback: positive feedback, meaning the user has viewed the email, is relatively clear evidence that the email is important; negative feedback, meaning the user has not viewed the email, is a mixture of cases in which the user is unaware of the email or considers it unimportant.

We propose a simple strategy to collect the initial feedback. For a new email, we send it to all the users without priority labels and wait for a short period of time. We then predict the priority based on the initial feedback collected within this period. The only challenge of this strategy is to determine how long we should wait to achieve the best trade-off between gathering enough feedback for accurate priority prediction and making predictions as soon as possible. We employ cross-validation to determine the optimal length of the waiting time.

5.3.2 Source Domain Set Selection

Each mailing list can be regarded as a domain, and the viewing information accumulated in one domain can be used to improve the quality of recommendations in another domain, which is especially helpful if the user has few or no views (e.g., a new user to a mailing list) or the target mailing list has little information (e.g., a mailing list with a limited number of users or items). With up to millions of mailing lists in an email system, selecting the set of source domains that best improves the prediction accuracy creates a million-domain challenge.

To solve the million-domain challenge, multiple criteria need to be considered. Signals from these criteria come in different formats and can contradict each other.


Thus we propose to formulate this as an optimization problem. Basically, we will search for a binary assignment to each candidate source domain that optimizes the prediction accuracy. Details will be introduced in the next section.

5.3.3 Priority Prediction

The feedback from the target domain, the selected source domains, and the sampled users will be used for priority prediction. Formally, we define the feedback set used for priority prediction as I' = {I_{M_t,E_t}, I_{M',E_{M'}}, I_{S,e_new}}.

We use a weighted low-rank approximation method for the priority prediction. The intuition behind the proposed method is to assign different weights to the feedback based on the source of the information. Given the target domain M_t and a selected source domain M_i, we assign larger weights to feedback from the target domain (I_{M_t,E_t}), while the weight of feedback from the source domain (I_{M_i,E_i}) is determined by the similarity of the feedback patterns between the target and source domains, Sim_i(t) (defined in section 5.4.1). Among feedback from source domains, feedback from the users shared between M_i and M_t is given larger weights than feedback from non-shared users. Details of the weighting scheme are summarized in Table 5.1. W_pos and W_neg are the weights for positive and negative feedback, set to 5 and 1 respectively. δ is a smoothing constant and ι is a decay factor, set to 0.1 and 0.9 respectively. All the constant parameters are tuned by cross-validation.
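A minimal sketch of this weighting scheme as a lookup (the function name and argument layout are illustrative; the constants are those given in the text):

```python
def feedback_weight(is_positive, domain, shared, sim_it,
                    w_pos=5.0, w_neg=1.0, delta=0.1, iota=0.9):
    """Weight of one feedback entry per the scheme in Table 5.1.

    domain: 'target' or 'source'; shared: whether the source-domain user is
    also in the target mailing list; sim_it: feedback pattern similarity
    Sim_i(t) between source domain i and target domain t.
    """
    base = w_pos if is_positive else w_neg
    if domain == 'target':
        return base
    w = (2 * sim_it + delta) * base        # source domain, scaled by similarity
    if not shared:
        w *= iota                          # decay for non-shared users
    return w

print(feedback_weight(True, 'source', shared=False, sim_it=0.4))  # (0.9)*5*0.9 = 4.05
```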

Our objective is to minimize the following loss function:

\[
L(P, Q) = \sum_{ij} W_{ij}\,\big(I'_{ij} - P_{i.} Q_{j.}^T\big)^2 + \lambda\,\big(\|P\|_F^2 + \|Q\|_F^2\big) \tag{5.2}
\]

in which \(P \in \mathbb{R}^{|M' \cup M_t| \times d}\) and \(Q \in \mathbb{R}^{(|E_{M'} \cup E_t|+1) \times d}\) stand for the latent vectors of the users {M', M_t} and the items {E_{M'}, E_t, e_new}. W_ij is a non-negative weight for u_i and e_j, and the weighting scheme of the non-negative weight matrix W is summarized in Table 5.1.

Alternating Least Squares (ALS) is used to solve the optimization problem by alternately fixing one of P and Q while optimizing the other.

When fixing Q and solving \(\partial L(P, Q) / \partial P_{i.} = 0\), we obtain

\[
P_{i.} = I'_{i.} \tilde{W}_{i.} Q \left( Q^T \tilde{W}_{i.} Q + \lambda \Big( \sum_j W_{ij} \Big) I_D \right)^{-1} \tag{5.3}
\]

where \(\tilde{W}_{i.} \in \mathbb{R}^{(|E_{M'} \cup E_t|+1) \times (|E_{M'} \cup E_t|+1)}\) is a diagonal matrix with the elements of W_{i.} on the diagonal, and \(I_D \in \mathbb{R}^{d \times d}\) is an identity matrix.

Similarly, when fixing P and solving \(\partial L(P, Q) / \partial Q_{j.} = 0\),

\[
Q_{j.} = I'^{T}_{.j} \tilde{W}_{.j} P \left( P^T \tilde{W}_{.j} P + \lambda \Big( \sum_i W_{ij} \Big) I_D \right)^{-1} \tag{5.4}
\]


Table 5.1: Weighting Schemes

Source of Feedback           Type       Weight
I_{M_t, E_t}                 Positive   W_pos
I_{M_t, E_t}                 Negative   W_neg
I_{M_i ∩ M_t, E_i}           Positive   (2 · Sim_i(t) + δ) · W_pos
I_{M_i ∩ M_t, E_i}           Negative   (2 · Sim_i(t) + δ) · W_neg
I_{M_i − M_i ∩ M_t, E_i}     Positive   (2 · Sim_i(t) + δ) · W_pos · ι
I_{M_i − M_i ∩ M_t, E_i}     Negative   (2 · Sim_i(t) + δ) · W_neg · ι

where \(\tilde{W}_{.j} \in \mathbb{R}^{|M' \cup M_t| \times |M' \cup M_t|}\) is a diagonal matrix with the elements of W_{.j} on the diagonal. Details of using ALS to solve matrix factorization problems are discussed in [58].
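To illustrate, here is a minimal numpy sketch of the weighted ALS updates in equations 5.3 and 5.4 on random toy data (the dimensions, iteration count, and simplified weight choice are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d, lam = 30, 20, 5, 0.1
I = (rng.random((n_users, n_items)) < 0.2).astype(float)  # toy feedback matrix I'
W = np.where(I > 0, 5.0, 1.0)                              # simplified weights

P = rng.normal(scale=0.1, size=(n_users, d))
Q = rng.normal(scale=0.1, size=(n_items, d))

for _ in range(10):                      # alternating least squares
    for i in range(n_users):             # Eq. (5.3): update user vector P_i.
        Wi = np.diag(W[i])
        A = Q.T @ Wi @ Q + lam * W[i].sum() * np.eye(d)
        P[i] = np.linalg.solve(A, Q.T @ Wi @ I[i])
    for j in range(n_items):             # Eq. (5.4): update item vector Q_j.
        Wj = np.diag(W[:, j])
        A = P.T @ Wj @ P + lam * W[:, j].sum() * np.eye(d)
        Q[j] = np.linalg.solve(A, P.T @ Wj @ I[:, j])

pred = P @ Q.T                           # predicted preference scores
print(float(np.abs(pred - I)[I > 0].mean()))
```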

For each remaining user \(u_i \in (M_t - S)\), the priority of \(e_{new}\) is predicted as

\[
\hat{I}_{i, e_{new}} = P_{i.} Q_{e_{new}}^T \tag{5.5}
\]

After estimating \(\hat{I}_{i, e_{new}}\) for all the remaining users, it can be used as a feature in a classification model (e.g., a logistic regression model, as proposed in [3]), and additional features like content features can easily be added to the classification model. It is worth noting that the above-mentioned method is just one of many matrix factorization methods that can be applied to the priority prediction task. The model needs to be re-trained after each new email arrives, which may be infeasible in real-life scenarios. However, there already exist several methods [128][84] for fast incremental matrix factorization with positive-only feedback, which can easily be applied to the priority prediction task. Since this is not the major contribution of this thesis, readers can refer to [128][84] for further information.

For simplicity, the estimated importance feedback \(\hat{I}_{i, e_{new}}\) will be the only feature considered in the priority classification of this chapter. The intuition of our method for priority classification is that for each email, a certain percentage of users will consider it important, but this percentage varies among emails since they differ in topic, writing quality, etc. An important email will attract more views during the time we wait for user feedback. Thus we can infer the percentage of users who find an email important from the number of views observed in the sampling phase. We define the percentage of users considering email e_new important as:

\[
H(e_{new}) = \frac{pos(e_{new})}{pos_{avg}(M_t)} \cdot \frac{tr\big(I_{M_t}^T I_{M_t}\big)}{|M_t| \cdot |E_t|} \tag{5.6}
\]

pos(e_new) is the total number of view-email behaviors observed in the waiting time window for e_new, and pos_avg(M_t) is the average number of view-email behaviors observed in the waiting time window over all the emails from M_t. The second term of (5.6) stands for the average percentage of important emails for M_t.

For the top H(e_new) percent of the remaining users ranked by \(\hat{I}_{i, e_{new}}\), we predict e_new as important, and as unimportant for the others.
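A minimal sketch of this classification step, combining equation 5.6 with the top-H thresholding (toy inputs, illustrative names):

```python
import numpy as np

def predict_labels(scores, pos_new, pos_avg, I_target):
    """Label the top-H(e_new) percent of remaining users as important (Eq. 5.6).

    scores: predicted priorities for the remaining users;
    pos_new / pos_avg: views observed for e_new vs. the per-email average;
    I_target: binary view matrix of the target mailing list.
    """
    n_users, n_emails = I_target.shape
    # tr(I^T I) counts the ones in a binary matrix -> average positive rate.
    avg_pos_rate = np.trace(I_target.T @ I_target) / (n_users * n_emails)
    H = (pos_new / pos_avg) * avg_pos_rate            # fraction deemed important
    k = int(round(H * len(scores)))
    top = np.argsort(-scores)[:k]
    labels = np.zeros(len(scores), dtype=int)
    labels[top] = 1
    return labels

I_t = (np.random.default_rng(1).random((100, 40)) < 0.15).astype(float)
print(predict_labels(np.random.default_rng(2).random(80), 12, 10, I_t).sum())
```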

5.4 Source Domain Selection

To solve the source domain set selection problem, we formulate it as an optimization problem. In section 5.4.1, we first discuss our proposed method, including all the selection criteria. However, since the optimization problem is difficult to solve directly, we propose two solutions in section 5.4.2. In section 5.4.3, some additional measures are proposed to further improve the computational efficiency.

5.4.1 Problem Definition

Formally, given the target mailing list M_t, we define a binary vector α of size n where each entry α_i indicates whether the source mailing list M_i is selected or not. If a source mailing list M_i is selected, the corresponding entry α_i is 1, and 0 otherwise. Thus our source domain selection problem reduces to finding the α that maximizes the objective function introduced in the following sections.

Overlap of Users

A user can enroll in any number of mailing lists, and thus mailing lists can have shared users, i.e., users enrolled in multiple mailing lists. We prefer source mailing lists with a larger number of shared users with the target mailing list, for the following reasons:

1. Recent work [24] has confirmed that, without additional external knowledge, knowledge can only be transferred between two domains if they are linked by shared users or items. In our case, only shared users can be used to transfer information between domains, and the higher the percentage of shared users, the easier it is to transfer extra knowledge from the source domain.

2. Since similar mailing lists are more likely to attract the same set of users, the overlap of users is also an indirect indication of the semantic similarity of two mailing lists. Users from similar mailing lists are likely to have similar preferences for emails. For instance, the mailing lists of the e-commerce fashion brands A&F and Hollister may have a large number of shared users because they target similar groups of users with similar products, and users' preferences for promotions can be transferred from one mailing list to the other.


We define the overlap percentage between source mailing list M_i and target mailing list M_t as:

\[
overlap_i(t) = \frac{|M_i \cap M_t|}{|M_t|} \tag{5.7}
\]

The larger overlap_i(t) is, the larger the number of shared users between the source mailing list and the target mailing list.

Similar Feedback Pattern

Users' feedback patterns vary across domains. For example, users who give similar feedback in an e-commerce mailing list may have completely different preferences in a university mailing list. Only source domains with feedback patterns similar to those of the target domain will be helpful in cross-domain recommendation; otherwise, the source domain may introduce noise into the system and jeopardize the recommendation performance. In this chapter, we model the similarity of the feedback patterns between two mailing lists via the average difference in rating similarity over pairs of shared users between the two mailing lists. Intuitively, two mailing lists having similar feedback patterns means that, for users shared between the two mailing lists, two users with similar feedback patterns in one mailing list also have similar feedback patterns in the other, and vice versa.

Formally, for each mailing list M_i, user u can be represented as a binary vector v_{i,u} of size |E_i|, with each entry indicating whether u has read the corresponding email from E_i. We define the shared user set between two mailing lists i and j as C_{i,j} and their feedback pattern similarity as:

\[
sim(i, j) = 1 - \frac{1}{2|C_{i,j}|^2} \sum_{u, w \in C_{i,j}} \big| \cos(v_{i,u}, v_{i,w}) - \cos(v_{j,u}, v_{j,w}) \big| \tag{5.8}
\]

A larger sim(i, j) value indicates a larger feedback pattern similarity. Specifically, we denote the feedback pattern similarity of a target mailing list M_t and a source mailing list M_i as sim_i(t) = sim(i, t); sim_i(t) will be used in our objective function.
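A minimal sketch of equation 5.8 on toy binary view vectors (the dictionary layout is an assumption; the 1/(2|C|²) normalizer counts ordered pairs, so unordered pairs are enumerated here and weighted accordingly):

```python
import numpy as np
from itertools import combinations

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def feedback_pattern_sim(Vi, Vj, shared_users):
    """Eq. (5.8): 1 minus the average |cosine-similarity difference| over
    pairs of shared users, computed in each mailing list's view space."""
    diffs = [abs(cosine(Vi[u], Vi[w]) - cosine(Vj[u], Vj[w]))
             for u, w in combinations(shared_users, 2)]
    n = len(shared_users)
    # Ordered pairs double each unordered pair; identical pairs contribute 0.
    return 1.0 - sum(diffs) / (n * n)

rng = np.random.default_rng(0)
Vi = {u: (rng.random(15) < 0.3).astype(float) for u in range(6)}
Vj = {u: (rng.random(10) < 0.3).astype(float) for u in range(6)}
print(feedback_pattern_sim(Vi, Vj, list(range(6))))
```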

Coverage of Users

We aim to select a set of source mailing lists M', where each M_i ∈ M' has a set of users C_{i,t} shared with the target domain M_t. Intuitively, we want the number of shared users between M' and M_t to be as large as possible, so that we can cover and transfer extra information for more users in the target mailing list. That is to say, we want to choose a size-k mailing list set M' such that the number of shared users between the source mailing lists in M' and the target mailing list M_t is maximized:

\[
\max \Big| \bigcup_{M_i \in M'} C_{i,t} \Big| \tag{5.9}
\]


This is actually an unweighted maximum coverage problem, which is NP-hard. Instead of modeling this criterion directly, we propose a constraint related to it. We define the overlap percentage between source mailing lists M_i, M_j and target mailing list M_t as

\[
overlap_{i,j}(t) = \frac{|M_i \cap M_j \cap M_t|}{|M_t|} \tag{5.10}
\]

By introducing the triple-domain overlap overlap_{i,j}(t) as a constraint in the objective function, we expect the user sets shared with the target mailing list from different source mailing lists to be as diverse as possible and not to concentrate on the same set of users.

Objective Function

Combining all the criteria mentioned above, we propose the following objective function:

\[
\arg\max_{\alpha} \;
\left( \lambda_{overlap} \sum_{i=1}^{n} overlap_i(t)\,\alpha_i
+ \lambda_{sim} \sum_{i=1}^{n} sim_i(t)\,\alpha_i
- \frac{\lambda_{cov}}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} overlap_{i,j}(t)\,\alpha_i \alpha_j \right)
\frac{1}{\sum_{i=1}^{n} \alpha_i + \delta}
\tag{5.11}
\]

subject to: \(\alpha_i \in \{0, 1\}\) for \(i = 1, \ldots, n\)

The first term in equation 5.11 encourages the selected mailing lists to have a large percentage of overlapping users with the target mailing list.

The second term encourages the selected mailing lists to have feedback patterns similar to those of the target mailing list.

The third term favors pairs of mailing lists whose shared user sets with the target mailing list have little overlap with each other. Combining the third factor with the first, we prefer mailing lists that have a large overlap with the target mailing list but a small overlap with each other, which provides good coverage of users.

It is worth noting that we do not specify the number of source mailing lists to be selected, because we believe it should be a dynamic number related to the target domain and should be chosen automatically by the algorithm. We add the fourth term, \(\frac{1}{\sum_{i=1}^{n} \alpha_i + \delta}\), as a normalizer to the objective function to prevent it from selecting too many source mailing lists. Selecting too many source mailing lists will not only introduce noise but also increase the computational burden of the system. λ_overlap, λ_sim, and λ_cov are constant weights for the first three terms, which can be learned by cross-validation. δ is also a constant, which influences the number of source mailing lists selected: the larger δ is, the more source mailing lists will be picked.


5.4.2 Solutions

The optimization problem in Formula 5.11 is an integer programming problem with both a quadratic term and a fraction in its objective function, which makes it extremely difficult, if not impossible, to solve directly. So we propose two heuristic solutions. Both of them first reformulate the original optimization problem as a quadratic integer programming problem and then further relax the constraints to make it a continuous quadratic programming problem, which can be solved in polynomial time.

For the first solution, we transform the denominator of the objective function 5.11 into a term in the numerator.

\[
\arg\max_{\alpha} \;
\lambda_{overlap} \sum_{i=1}^{n} overlap_i(t)\,\alpha_i
+ \lambda_{sim} \sum_{i=1}^{n} sim_i(t)\,\alpha_i
- \frac{\lambda_{cov}}{n-1} \sum_{i=1}^{n} \sum_{j=1}^{n} overlap_{i,j}(t)\,\alpha_i \alpha_j
- \lambda_{pen} \sum_{i=1}^{n} \alpha_i
\tag{5.12}
\]

The newly added fourth term serves as a penalty to prevent the objective function from selecting too many source mailing lists. λ_pen is a constant parameter which can be learned by cross-validation.

The second solution is based on the intuition that, since selecting too many source domains will not only increase the computational burden of the system but also introduce noise, we do not want to choose too many source domains. That is to say, we can set an upper bound z_max on the number of source domains that can be used in a cross-domain recommendation. Our optimization problem can then be formulated as

\[
\begin{aligned}
\arg\max_{\alpha} \;
& \lambda_{overlap} \sum_{i=1}^{n} overlap_i(t)\,\alpha_i
+ \lambda_{sim} \sum_{i=1}^{n} sim_i(t)\,\alpha_i
- \frac{\lambda_{cov}}{n-1} \sum_{i=1}^{n} \sum_{j=1}^{n} overlap_{i,j}(t)\,\alpha_i \alpha_j \\
& \text{s.t.} \quad \alpha_i \in \{0, 1\}, \quad \sum_{i=1}^{n} \alpha_i = z_k
\end{aligned}
\tag{5.13}
\]

We solve the optimization problem in equation 5.13 z_max times, for z_k ∈ {1, 2, ..., z_max}, and for every z_k a set of source domains α_k is obtained. We then evaluate the original objective function 5.11 for each α_k and select the α_k with the highest value. The corresponding z_k is the number of source mailing lists to be selected.

Both solutions transform the original optimization problem into an integer quadratic programming problem, which is still NP-hard. We relax the constraints α_i ∈ {0, 1} to 0 ≤ α_i ≤ 1, which makes it a continuous optimization problem that can be solved in polynomial time. For the first solution, a threshold γ is learned by cross-validation, and the source domains with α_i > γ are selected. For the second solution, the number of source domains is determined automatically by the solution. We use the 'quadprog' function in MATLAB to solve this quadratic programming problem.
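As an illustration only: the relaxed first solution can also be handled by a generic bounded optimizer, e.g. scipy's minimize, in place of MATLAB's quadprog. All coefficients below are random stand-ins, not values from the thesis:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 40                                     # candidate source mailing lists
overlap = rng.random(n) * 0.3              # stand-in for overlap_i(t)
sim = rng.random(n)                        # stand-in for sim_i(t)
O = rng.random((n, n)) * 0.05              # stand-in for overlap_{i,j}(t)
O = (O + O.T) / 2
l_ov, l_sim, l_cov, l_pen = 1.0, 1.0, 1.0, 0.5

def neg_objective(a):                      # negated Eq. (5.12) for a minimizer
    lin = l_ov * overlap @ a + l_sim * sim @ a - l_pen * a.sum()
    quad = (l_cov / (n - 1)) * a @ O @ a
    return -(lin - quad)

res = minimize(neg_objective, x0=np.full(n, 0.5),
               bounds=[(0.0, 1.0)] * n)    # relaxed constraint 0 <= alpha_i <= 1
gamma = 0.5                                # threshold learned by cross-validation
selected = np.where(res.x > gamma)[0]
print(len(selected), selected[:10])
```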

5.4.3 Efficiency Improvement

Directly calculating the coefficients sim_i(t) and overlap_{i,j}(t) in objective function 5.11 may be time-consuming. Given target domain M_t, the time complexity of calculating sim_i(t) and overlap_{i,j}(t) is O(n · |C_{i,t}|²) and O(n² · |U|) respectively, which may be unacceptable in a real email system because the number of mailing lists and the number of shared users between mailing lists can be very large. The following measures are proposed to improve the efficiency.

1. Only consider source mailing lists with a certain number of shared users with the target mailing list. That is to say, we set a minimum overlap threshold κ and only consider M_i with |C_{i,t}| > κ. In this way, we dramatically reduce the number of candidate mailing lists.

2. Randomly sample user pairs for the feedback pattern similarity calculation. The number of pairs of shared users can be extremely large if there are many shared users between M_i and M_t. We only need a sample of the pairs to approximate sim_i(t), defined in equation 5.8, to a relatively accurate value (see the sketch after this list).

3. Perform source mailing list selection offline, since it does not depend on the email e_new waiting for prioritization but only on the mailing list M_t to which e_new is sent. Therefore, the set of source domains can be pre-calculated offline and updated periodically (e.g., weekly), which further relieves the computational burden of the system.
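To illustrate the second measure, a hedged sketch of a Monte-Carlo approximation of sim_i(t) from randomly sampled shared-user pairs (names and data are illustrative):

```python
import random
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def approx_sim(Vi, Vt, shared_users, n_pairs=1000, seed=0):
    """Estimate sim_i(t) of Eq. (5.8) from n_pairs random ordered user pairs
    instead of all |C|^2 pairs; unbiased since 1 - mean|diff|/2 matches the
    exact normalizer 1/(2|C|^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        u, w = rng.choice(shared_users), rng.choice(shared_users)
        total += abs(cosine(Vi[u], Vi[w]) - cosine(Vt[u], Vt[w]))
    return 1.0 - total / (2 * n_pairs)

rng = np.random.default_rng(0)
Vi = {u: (rng.random(20) < 0.3).astype(float) for u in range(100)}
Vt = {u: (rng.random(30) < 0.3).astype(float) for u in range(100)}
print(approx_sim(Vi, Vt, list(range(100))))
```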

5.5 Experimental Evaluation

5.5.1 Dataset

In the experiments, we used a real-life dataset from Samsung Electronics. We collected emails and their view logs from a large business mailing list within Samsung. The mailing list sends notifications related to a large internal forum, and employees from all around the world receive emails on various topics, such as win notices of deals, meeting agendas of customers, business objectives, news, and technical issues. When a new thread is posted, a notification is emailed to all the users. We split the mailing list into 490 sub mailing lists based on the sections of the forum. That is to say, a user belongs to a sub mailing list iff he has interacted with notification emails of threads published in that section. We treat each sub mailing list as a real mailing list in our experiments. The dataset contains 6506 broadcast emails sent to 2433 Samsung employees, generating 333,979 view records. We split the dataset into a training set (containing 5475 emails and their view logs) and a testing set (1031 emails and their view logs) based on a certain time point.

We only record users' email viewing data, and we assume an email is important to a user if the user has viewed it. It is worth noting that the dataset is relatively small and contains only view-email behavior. Accuracy could be improved if we could incorporate signals such as deleting an email, flagging an email as important, or skipping an email. However, due to privacy concerns, there is no public dataset containing importance judgments for broadcast emails [139].

All data was analyzed and stored in accordance with Samsung's privacy policy. Only the view logs of the broadcast emails were extracted, and all the users are Samsung employees. The dataset was completely anonymized by mapping user ids and email ids to integer indices before any analysis.

5.5.2 Evaluation Metrics

Since our task is a classification task, we use precision, recall, and F-score as the main evaluation metrics. Based on the predicted label from the algorithm and the ground-truth label from the dataset, a prediction is either a true positive (tp), true negative (tn), false positive (fp), or false negative (fn). The metrics are defined as

\[
\text{Precision} = \frac{tp}{tp + fp}, \qquad
\text{Recall} = \frac{tp}{tp + fn}, \qquad
\text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

In the experiments, we evaluate the precision, recall, and f-score at two levels:

Mail Level The mail-level precision, recall, and f-score of an algorithm are defined as the average precision, recall, and f-score over all the emails in the test set.

Mailing List Level The mailing-list-level precision, recall, and f-score of an algorithm are defined as the average precision, recall, and f-score over all the mailing lists in the test set. At the mailing list level, the evaluation metrics implicitly give more weight to cold mailing lists, since all mailing lists are treated equally regardless of how many emails or users they contain.

5.5.3 Baselines

It is worth noting that in our previous work [129], experiments already confirmed that, by using collaborative filtering, our method can outperform various existing email prioritization methods, including content-based methods [3][129]. So in this chapter, the experimental design focuses on testing how different source domain selection strategies affect the performance of email prioritization and on evaluating our major contribution, a cross-domain recommendation framework that selects a set of source domains from a large number of candidates.

Since we propose two solutions for CBEP, we refer to the first solution as CBEP-A1 (corresponding to objective function 5.12) and the second as CBEP-A2 (corresponding to objective function 5.13). We adapt the following methods for comparison. The first four baselines are four different source domain set selection strategies, which we use to replace the source domain selection part of CBEP; they all use the same user feedback sampling strategy and classification strategy as described in sections 5.3.1 and 5.3.3. The weighted matrix factorization method designed for implicit feedback [90][58][48] is used for these four baselines. The last baseline is a variation of CBEP that eliminates the weighting scheme in the weighted low-rank approximation process. We set the waiting time for gathering user feedback to 45 min for all algorithms and set z_max = 15 for CBEP-A2.

Single Mailing List (SML) SML only considers information from the target mailing list, disregarding information from other mailing lists. SML is similar to the email prioritization method proposed in [129].

All Mailing Lists (AML) AML considers all the source mailing lists and combines the information from every mailing list with the target mailing list.

Overlapping Mailing Lists (OML) OML selects the top k mailing lists with the largest percentage of overlap with the target domain as the source domains. Overlap is defined in section 5.4.1, and we choose k = 5 for the experiments.

Feedback Similar Mailing Lists (FSML) FSML selects the top k mailing lists with the highest feedback similarity with the target domain as the source domains. Feedback pattern similarity is defined in section 5.4.1, and we choose k = 5 for the experiments.

CBEP Without Weight (CBEP-SVD) In order to test how the weighted low-rank approximation impacts the result, we propose another baseline called CBEP-SVD, which is the same as our algorithm except that we eliminate the whole weighting scheme described in section 5.3.3 and just use a basic SVD model.

5.5.4 Results and Analysis

Comparison with Baselines

In this section, we compare the two solutions to all baselines in terms of precision, recall, and f-score at both the mail and mailing list levels.


[Figure: precision, recall, and f-score (0 to 0.45) for SML, AML, OML, FSML, CBEP-SVD, CBEP-A1, and CBEP-A2.]

Figure 5.3: Baseline Comparison at Mailing List Level

[Figure: precision, recall, and f-score (0 to 0.45) for SML, AML, OML, FSML, CBEP-SVD, CBEP-A1, and CBEP-A2.]

Figure 5.4: Baseline Comparison at Mail Level

As shown in figures 5.3 and 5.4, CBEP-A1 and CBEP-A2 significantly outperform all the baselines on all the evaluation metrics. SML performs worst, which makes sense since it only considers information from the target domain and disregards all the additional information from other mailing lists. Moreover, new users (users newly enrolled in a mailing list) and cold mailing lists (mailing lists with a limited number of emails) are common, and SML cannot handle these cold start problems. AML performs significantly better than SML since it includes the information from all the other mailing lists. However, considering all the mailing lists hurts the prioritization precision due to the noise introduced by unrelated mailing lists.

OML and FSML choose the set of source domains based on the overlap of users and the feedback pattern similarity, respectively. Both outperform SML by incorporating information from similar domains, and FSML performs better than OML. By selecting a limited number of source domains, FSML can already achieve performance similar to AML. The performance of CBEP-SVD is poor compared with CBEP-A1 and CBEP-A2, which further confirms that our weighting scheme is useful. Both of our methods, CBEP-A1 and CBEP-A2, perform significantly better than the best baseline, FSML.


[Figure: precision, recall, and f-score (0 to 0.4) for CBEP-All, CBEP-No-Overlap, CBEP-No-Feedback-Similarity, and CBEP-No-Coverage.]

Figure 5.5: Optimization Criteria Analysis

Compared with FSML, CBEP-A1 improves the f-score by 40% at the mailing list level and by 12% at the mail level. The improvement comes from three aspects. First, CBEP can choose a set of source domains based on multiple criteria. Second, it is able to dynamically determine the number of source domains to select. Last but not least, CBEP uses a weighted matrix factorization method and gives different weights to different source domains.

Criteria for Source Domain Selection

As mentioned in section 5.4, we consider three different criteria to select the source domains, namely, the overlap of users, the feedback pattern similarity, and the coverage of users. In this section, we remove the three criteria in our objective function 5.11 one at a time by eliminating the corresponding term and optimizing the objective function based on the remaining terms. In this way, we evaluate how each criterion individually affects the prediction precision. For brevity, we only show the mailing list level results for CBEP-A1; the other results show similar trends.

From the results displayed in Figure 5.5, we can see that all three criteria are useful in boosting the email prioritization performance. The feedback pattern similarity criterion is more important than the overlap of users criterion, which is in accordance with our observation of the results of OML and FSML in figure 5.3. The coverage of users criterion turns out to be the most important one, as it allows the set of source domains to cover more users in the target domain.

Number of Selected Source Domains

One of the advantages of the CBEP framework is its ability to dynamically determine the number of source domains to be selected. That is to say, for each target domain, we obtain the size of the source domain set automatically. In this section, we take our first approximation method, CBEP-A1, as an example to analyze how the parameter settings affect the number of selected source domains and the precision, recall, and f-score of the prioritization algorithm.


[Figure: precision, recall, f-score, and percentage of selected source domains (0 to 0.7) for λ_pen from 0 to 5.]

Figure 5.6: Number of Selected Source Domains and Prediction Performance vs. λ_pen

[Figure: precision, recall, and f-score (0 to 0.4) of CBEP-A1 and SML on all mailing lists and on the top 50 mailing lists.]

Figure 5.7: All Domains vs. Hot Domains

In CBEP-A1, the penalty term weight λ_pen affects the number of source domains to be selected: the larger λ_pen is, the fewer source domains are selected. In figure 5.6, we show how the average percentage of source domains selected (denoted as "percent" in figure 5.6), along with precision, recall, and f-score, varies for different settings of λ_pen. It is worth noting that choosing too many or too few source domains can both jeopardize the prediction performance.

Performance on Cold Domains

Cold mailing lists, with limited numbers of items, users, or user feedback, are common in real systems. One of the advantages of cross-domain recommendation is its ability to solve this kind of cold start problem by incorporating information from other domains. This can already be seen in the results in figures 5.3 and 5.4, in which CBEP-A1 and CBEP-A2 gain more improvement at the mailing list level than at the mail level, since the mailing list level implicitly gives more weight to cold mailing lists.

We further verify the good performance of CBEP on cold mailing lists by evaluating CBEP-A1 and SML on the 50 hottest mailing lists (in terms of the amount of user feedback) and comparing these results to the results on all the mailing lists; see Figure 5.7. We would expect precision, recall, and f-score to be higher on the top-50 dataset, since these mailing lists have more abundant training data. Surprisingly, CBEP-A1 performs even better on all mailing lists. We attribute this to two reasons. On one hand, CBEP indeed performs well on cold mailing lists by incorporating information from other source mailing lists. On the other hand, for some of the most popular mailing lists, there may already be abundant data, so the marginal utility of the extra information introduced by CBEP diminishes.

5.6 Conclusion

In this chapter, we introduce the problem of personalized broadcast email prioritization considering large numbers of mailing lists. We formulate it as a cross-domain recommendation problem and propose a novel cross-domain recommendation framework, CBEP, to solve it.

We propose the first cross-domain framework that can automatically select the set of source domains from a large number of candidate domains. The domain selection approach contains an optimization model that considers multiple selection criteria, including the overlap of users, feedback pattern similarity, and coverage of users. A weighted low-rank approximation method is proposed to make predictions based on information from both the target domain and the selected source domains, as sketched below.
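For intuition, here is a minimal weighted low-rank approximation sketch. It is not the exact CBEP formulation: it simply approximates a feedback matrix R by U V^T under a weight matrix W, which could, for example, up-weight observed target-domain cells and down-weight cells imported from selected source domains.

import numpy as np

def weighted_lowrank(R, W, rank=8, lr=0.01, epochs=200, reg=0.1, seed=0):
    """Gradient descent on 0.5 * || W^(1/2) * (U V^T - R) ||^2 + regularizer."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(epochs):
        E = W * (U @ V.T - R)            # weighted residual
        U -= lr * (E @ V + reg * U)      # gradient step on U
        V -= lr * (E.T @ U + reg * V)    # gradient step on V
    return U, V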

Comprehensive experiments are conducted on a real-life dataset from Samsung Electronics. The results show that our method CBEP outperforms all the baselines, and that the various domain selection criteria considered indeed all help to improve the prediction performance.


Chapter 6

Conclusion

6.1 Summary

This thesis investigates the task of broadcast message prioritization, which is of great significance given the severity of broadcast message overload. In particular, the thesis focuses on prioritization-related tasks for two popular types of broadcast messages, tweets and broadcast emails. Three related research questions are studied: mention recommendation considering tweet prioritization, broadcast email prioritization with active learning, and broadcast email prioritization with cross-domain recommendation. The studies of the three research questions are summarized as follows:

• In Chapter 3, we propose the first mention recommendation framework, which exploits the mention mechanism of micro-blogging systems to expand the diffusion of tweets; it has led to a series of follow-up works in this research area. Mention recommendation is formulated as a learning to rank problem, and multiple aspects, including tweet prioritization (the user's interest in tweets) and user influence, are taken into consideration. This is also the first work to propose a content-dependent user relationship model based on users' retweet interactions. The proposed model is evaluated on a real-life micro-blogging system dataset, and comprehensive experiments not only show the superiority of our proposed method but also demonstrate the effectiveness of all the new features considered in our model.

• In Chapter 4, we present the first in-depth discussion of broadcast email prioritization and propose the first email prioritization framework that takes collaborative filtering into consideration. To overcome the complete cold start challenge of broadcast emails, a novel active learning framework is proposed; it is the first collaborative filtering based active learning framework tailored for one-class implicit feedback and time-sensitive feedback. The proposed model is evaluated on a real-life email dataset and outperforms all the comparison baselines.


• In Chapter 5, we propose the first broadcast email prioritization framework that takes large numbers of mailing lists into consideration. The problem is formulated as a cross-domain recommendation problem, and we propose the first cross-domain framework that automatically selects the set of source domains from a large number of candidate domains. Evaluated on a real-life email dataset, our proposed approach outperforms the baselines with the help of our automatic source domain selection strategy.

6.2 Future Directions

There are many promising future directions in the related research areas, such as broadcast message prioritization, active learning, and cross-domain recommendation. In this section, we list some of them.

6.2.1 Combining Different Methods of Broadcast Message Prioritization

From Chapter 3 to Chapter 5, we introduced multiple types of broadcast message prioritization methods: a content-based method, collaborative filtering with active learning, and collaborative filtering with cross-domain recommendation. These methods can readily be combined. For instance, we can obtain two prioritization prediction scores from the methods of Chapter 3 and Chapter 4; one naive way to combine them is to build a logistic regression classifier that predicts the email priority label using the two prioritization scores as input features, as sketched below. More sophisticated methods, such as ones that consider both content-based features and collaborative filtering in a unified model, can also be studied and may be a promising future direction.
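A minimal sketch of this naive combiner follows. The two score vectors are random stand-ins for the outputs of the Chapter 3 and Chapter 4 models, and the synthetic labels exist only so the example runs end to end.

from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(0)
n = 1000
score_a = rng.random(n)                  # stand-in: content-based score
score_b = rng.random(n)                  # stand-in: collaborative-filtering score
X = np.column_stack([score_a, score_b])
# Synthetic priority labels, for demonstration only.
y = (0.6 * score_a + 0.4 * score_b + 0.1 * rng.standard_normal(n)) > 0.5

clf = LogisticRegression().fit(X, y)     # combiner over the two scores
print("learned weights:", clf.coef_)     # relative importance of each score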

6.2.2 Active Learning for Personalized News Feed Ranking Problem

News feeds are a popular application nowadays: statuses from Facebook, images from Instagram, and cards from Google Now are all news feed items that influence the lives of billions of users every day. The active learning strategies proposed for our broadcast message prioritization problem can also help the personalized news feed ranking task, by first sending a feed item to a small set of users to collect feedback and then predicting the preferences of the majority of users based on the collected feedback. News feeds are usually tied to social networks, which poses interesting new challenges for active learning. For instance, previous studies [61] show that social network friends may share similar preferences, so it may make sense to sample users who are not friends with each other to increase the diversity of the active learning sample; a sketch of this idea follows. Active learning for personalized news feed ranking considering social network information is a future research area on which we are currently working.
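The following is a hedged sketch of friendship-aware sampling; the utility scores and friendship graph are hypothetical stand-ins, and the greedy skip rule is one simple way to enforce the diversity idea above.

def diverse_sample(utilities, friends, budget):
    """utilities: {user: score}; friends: {user: set(users)}; budget: int."""
    sampled = []
    for user in sorted(utilities, key=utilities.get, reverse=True):
        # Skip users who are friends with anyone already sampled,
        # on the assumption that friends give redundant feedback.
        if any(user in friends.get(s, set()) for s in sampled):
            continue
        sampled.append(user)
        if len(sampled) == budget:
            break
    return sampled

# Toy usage: u2 is skipped because it is friends with the top user u1.
utilities = {"u1": 0.9, "u2": 0.8, "u3": 0.7, "u4": 0.6}
friends = {"u1": {"u2"}, "u2": {"u1"}}
print(diverse_sample(utilities, friends, budget=2))   # -> ['u1', 'u3']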


6.2.3 Multi-heuristic Active Learning for Broadcast Email Prioritization

In Chapter 4, the active learning framework for broadcast email prioritization considers only a single heuristic: sampling users who can provide positive feedback in time. However, multiple additional heuristics could further improve the performance of our proposed model. For instance, the active learning strategy should sample users who are representative of the remaining users, and it should also keep the sampled users as diverse as possible. How to design a multi-heuristic active learning strategy that considers all of these heuristics is a very interesting problem and another future research area on which we are currently working; one simple way to combine heuristics is sketched below.
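A minimal combination scheme, under the assumption that each heuristic independently scores every candidate user: normalize each heuristic's scores, take a weighted sum, and sample the top batch. The three score vectors are random stand-ins for the timeliness, representativeness, and diversity criteria discussed above.

import numpy as np

def multi_heuristic_sample(heuristic_scores, weights, batch_size):
    """heuristic_scores: list of (n_users,) arrays; weights: same length."""
    total = np.zeros_like(heuristic_scores[0], dtype=float)
    for scores, w in zip(heuristic_scores, weights):
        span = scores.max() - scores.min()
        normed = (scores - scores.min()) / span if span > 0 else scores * 0
        total += w * normed               # min-max normalize, then weight
    return np.argsort(total)[::-1][:batch_size]

rng = np.random.default_rng(1)
timeliness = rng.random(100)
representativeness = rng.random(100)
diversity = rng.random(100)
batch = multi_heuristic_sample(
    [timeliness, representativeness, diversity], [0.5, 0.3, 0.2], batch_size=10)
print("sampled user indices:", batch)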

6.2.4 Consider Additional Features to Better Capture Semantic Similarity of Domains

In Chapter 5, the proposed source domain selection strategy considers only the implicit feedback provided by the users. The semantic similarity between the selected source domains and the target domain can be inferred only indirectly, from criteria like overlapping users and rating pattern similarity. However, other features can be exploited to capture the semantic similarity of domains more directly, for instance, the text features associated with the broadcast emails; a sketch of one such text-based similarity is given below. How to incorporate these additional features into the current domain selection strategy is a potential future research topic.
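One plausible instantiation, sketched here with toy corpora: represent each mailing list (domain) by the concatenated text of its broadcast emails and compare domains by TF-IDF cosine similarity. The domain names and texts are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in corpora: one text blob per mailing list.
domain_text = {
    "kernel-dev":  "patch review scheduler memory kernel commit",
    "ml-research": "gradient model training dataset neural network",
    "hr-announce": "holiday policy benefits office announcement",
}

names = list(domain_text)
tfidf = TfidfVectorizer().fit_transform([domain_text[n] for n in names])
sim = cosine_similarity(tfidf)            # pairwise domain-domain similarity

for i, a in enumerate(names):
    for j, b in enumerate(names):
        if i < j:
            print(f"sim({a}, {b}) = {sim[i, j]:.2f}")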

6.2.5 Broadcast Message Prioritization with Deep Neural Networks

Deep neural network based models are becoming more and more popular in the recommendation field [22, 127, 7]. They not only achieve better performance on classic recommendation tasks [22, 127] but also enable breakthroughs in related fields like transfer learning and cross-domain recommendation [21, 37]. In Chapters 4 and 5, we use traditional matrix factorization models to predict the prioritization scores for broadcast emails, but deep neural network based models may help us achieve even better results; a minimal neural scorer is sketched below. Designing new deep neural network based models for our broadcast message prioritization problems is a promising direction.
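As one concrete possibility, here is a minimal PyTorch sketch of a neural alternative to the matrix factorization scorer. The architecture is hypothetical, not a method from this thesis: user and email embeddings are concatenated and passed through an MLP that outputs a prioritization score, in the spirit of neural collaborative filtering.

import torch
import torch.nn as nn

class NeuralPrioritizer(nn.Module):
    def __init__(self, n_users, n_emails, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.email_emb = nn.Embedding(n_emails, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, users, emails):
        # Concatenate the two embeddings and score the (user, email) pair.
        x = torch.cat([self.user_emb(users), self.email_emb(emails)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # score in (0, 1)

model = NeuralPrioritizer(n_users=1000, n_emails=5000)
scores = model(torch.tensor([0, 1]), torch.tensor([42, 7]))
print(scores)   # prioritization scores for two (user, email) pairs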


Bibliography

[1] Fabian Abel, Qi Gao, Geert-Jan Houben, and Ke Tao. Analyzing user modeling on twitter for personalized news recommendations. In International Conference on User Modeling, Adaptation, and Personalization, pages 1–12. Springer, 2011.

[2] Fabian Abel, Eelco Herder, Geert-Jan Houben, Nicola Henze, and Daniel Krause. Cross-system user modeling and personalization on the social web. User Modeling and User-Adapted Interaction, pages 1–41, 2013.

[3] Douglas Aberdeen, Ondrej Pacovsky, and Andrew Slater. The learning behind gmail priority inbox. In LCCC: NIPS 2010 Workshop, 2010.

[4] Aweber. What are broadcast messages? Aweber Knowledge Base, 2018.

[5] Eytan Bakshy, Jake M Hofman, Winter A Mason, and Duncan J Watts. Everyone's an influencer: quantifying influence on twitter. In Proceedings of the fourth ACM international conference on Web search and data mining, pages 65–74. ACM, 2011.

[6] Eytan Bakshy, Jake M Hofman, Winter A Mason, and Duncan J Watts. Identifying influencers on twitter. In Fourth ACM International Conference on Web Search and Data Mining (WSDM), 2011.

[7] Yoshua Bengio. Practical recommendations for gradient-based training of deep architectures. In Neural networks: Tricks of the trade, pages 437–478. Springer, 2012.

[8] Shlomo Berkovsky, Dan Goldwasser, Tsvi Kuflik, and Francesco Ricci. Identifying inter-domain similarities through content-based analysis of hierarchical web-directories. In ECAI, pages 789–790, 2006.

[9] Shlomo Berkovsky, Tsvi Kuflik, and Francesco Ricci. Cross-domain mediation in collaborative filtering. In International Conference on User Modeling, pages 355–359. Springer, 2007.

[10] Shlomo Berkovsky, Tsvi Kuflik, and Francesco Ricci. Distributed collaborative filtering with domain specialization. In Proceedings of the 2007 ACM conference on Recommender systems, pages 33–40. ACM, 2007.

[11] Iván Cantador, Ignacio Fernández-Tobías, and Alejandro Bellogín. Relating personality types with user preferences in multiple entertainment domains. In CEUR Workshop Proceedings. Shlomo Berkovsky, 2013.


[12] Bin Cao, Nathan N Liu, and Qiang Yang. Transfer learning for collective link prediction in multiple heterogeneous domains. In Proceedings of the 27th international conference on machine learning (ICML-10), pages 159–166, 2010.

[13] Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and P Krishna Gummadi. Measuring user influence in twitter: The million follower fallacy. ICWSM, 10(10-17):30, 2010.

[14] Rita Chattopadhyay, Zheng Wang, Wei Fan, Ian Davidson, Sethuraman Panchanathan, and Jieping Ye. Batch mode active sampling based on marginal probability distribution matching. In SIGKDD, pages 741–749. ACM, 2012.

[15] Rita Chattopadhyay, Zheng Wang, Wei Fan, Ian Davidson, Sethuraman Panchanathan, and Jieping Ye. Batch mode active sampling based on marginal probability distribution matching. ACM Transactions on Knowledge Discovery from Data (TKDD), 7(3):13, 2013.

[16] Jilin Chen, Rowan Nairn, Les Nelson, Michael Bernstein, and Ed Chi. Short and tweet: experiments on recommending content from information streams. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1185–1194. ACM, 2010.

[17] Kailong Chen, Tianqi Chen, Guoqing Zheng, Ou Jin, Enpeng Yao, and Yong Yu. Collaborative personalized tweet recommendation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 661–670. ACM, 2012.

[18] Ming-Syan Chen, Jiawei Han, and Philip S. Yu. Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6):866–883, 1996.

[19] M Chui, J Manyika, J Bughin, R Dobbs, C Roxburgh, H Sarrazin, G Sands, and M Westergren. The social economy: Unlocking value and productivity through social technologies. McKinsey Global Institute, (July):1–18, 2012.

[20] Ronald Chung, David Sundaram, and Ananth Srinivasan. Integrated personal recommender systems. In Proceedings of the ninth international conference on Electronic commerce, pages 65–74. ACM, 2007.

[21] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning, pages 160–167. ACM, 2008.

[22] Paul Covington, Jay Adams, and Emre Sargin. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, pages 191–198. ACM, 2016.

[23] Paolo Cremonesi and Massimo Quadrana. Cross-domain recommendations without overlapping data: myth or reality? In Proceedings of the 8th ACM Conference on Recommender systems, pages 297–300. ACM, 2014.


[24] Paolo Cremonesi and Massimo Quadrana. Cross-domain recommendations without overlapping data: myth or reality? In RecSys, pages 297–300. ACM, 2014.

[25] Paolo Cremonesi, Antonio Tripodi, and Roberto Turrin. Cross-domain recommender systems. In Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, pages 496–503. IEEE, 2011.

[26] Laura A. Dabbish, Robert E. Kraut, Susan Fussell, and Sara Kiesler. Understanding Email Use: Predicting Action on a Message. Proceedings of the 2005 Conference on Human Factors in Computing Systems (CHI), pages 691–700, 2005.

[27] Dotan Di Castro, Zohar Karnin, Liane Lewin-Eytan, and Yoelle Maarek. You've got mail, and here is what you could do with it!: Analyzing and predicting actions on email messages. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 307–316. ACM, 2016.

[28] Yanlei Diao, Hongjun Lu, and Dekai Wu. A comparative study of classification based personal e-mail filtering. Knowledge Discovery and Data Mining. Current issues and new applications, pages 408–419, 2000.

[29] Fernando Diaz, Donald Metzler, and Sihem Amer-Yahia. Relevance and ranking in online dating systems. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pages 66–73. ACM, 2010.

[30] Mark Dredze, Tova Brooks, Josh Carroll, Joshua Magarick, John Blitzer, and Fernando Pereira. Intelligent email: reply and attachment prediction. In Proceedings of the 13th international conference on Intelligent user interfaces, pages 321–324. ACM, 2008.

[31] Yajuan Duan, Long Jiang, Tao Qin, Ming Zhou, and Heung-Yeung Shum. An empirical study on learning to rank of tweets. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 295–303. Association for Computational Linguistics, 2010.

[32] Mehdi Elahi, Matthias Braunhofer, Francesco Ricci, and Marko Tkalcic. Personality-based active learning for collaborative filtering recommender systems. In Proceedings of the XIIIth International Conference on AI*IA 2013: Advances in Artificial Intelligence - Volume 8249, pages 360–371, New York, NY, USA, 2013. Springer-Verlag New York, Inc.

[33] Mehdi Elahi, Valdemaras Repsys, and Francesco Ricci. Rating elicitation strategies for collaborative filtering. In International Conference on Electronic Commerce and Web Technologies, pages 160–171. Springer, 2011.

[34] Mehdi Elahi, Francesco Ricci, and Neil Rubens. Active learning strategies for rating elicitation in collaborative filtering: A system-wide perspective. ACM Transactions on Intelligent Systems and Technology (TIST), 5(1):13, 2013.

[35] Mehdi Elahi, Francesco Ricci, and Neil Rubens. Active learning strategies for rating elicitation in collaborative filtering: A system-wide perspective. ACM Trans. Intell. Syst. Technol., 5(1):13:1–13:33, January 2014.


[36] Mehdi Elahi, Francesco Ricci, and Neil Rubens. A survey of active learning in collaborative filtering recommender systems. Computer Science Review, 20:29–50, 2016.

[37] Ali Mamdouh Elkahky, Yang Song, and Xiaodong He. A multi-view deep learning approach for cross domain user modeling in recommendation systems. In Proceedings of the 24th International Conference on World Wide Web, pages 278–288. International World Wide Web Conferences Steering Committee, 2015.

[38] Ignacio Fernández-Tobías, Iván Cantador, Marius Kaminskas, and Francesco Ricci. A generic semantic-based framework for cross-domain recommendation. In Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems, HetRec '11, pages 25–32, New York, NY, USA, 2011. ACM.

[39] Ignacio Fernández-Tobías, Iván Cantador, Marius Kaminskas, and Francesco Ricci. Cross-domain recommender systems: A survey of the state of the art. In Spanish Conference on Information Retrieval, 2012.

[40] Zeno Gantner, Lucas Drumond, Christoph Freudenthaler, Steffen Rendle, and Lars Schmidt-Thieme. Learning attribute-to-feature mappings for cold-start recommendations. ICDM, pages 176–185, 2010.

[41] Nadav Golbandi, Yehuda Koren, and Ronny Lempel. On bootstrapping recommender systems. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1805–1808. ACM, 2010.

[42] Nadav Golbandi, Yehuda Koren, and Ronny Lempel. On bootstrapping recommender systems. In CIKM, pages 1805–1808. ACM, 2010.

[43] Nadav Golbandi, Yehuda Koren, and Ronny Lempel. Adaptive bootstrapping of recommender systems using decision trees. In WSDM, pages 595–604. ACM, 2011.

[44] Jennifer Golbeck and James A Hendler. Reputation network analysis for email filtering. In CEAS, 2004.

[45] Manuel Gomez-Rodriguez, Krishna P Gummadi, and Bernhard Schoelkopf. Quantifying information overload in social media and its impact on social contagions. In ICWSM, pages 170–179, 2014.

[46] Yeyun Gong, Qi Zhang, Xuyang Sun, and Xuanjing Huang. Who will you@? In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 533–542. ACM, 2015.

[47] Mihajlo Grbovic, Guy Halawi, Zohar Karnin, and Yoelle Maarek. How many folders do you really need?: Classifying email into a handful of categories. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, pages 869–878. ACM, 2014.

[48] Guibing Guo, Jie Zhang, Zhu Sun, and Neil Yorke-Smith. Librec: A java library for recommender systems. In UMAP, 2015.

[49] Yuhong Guo. Active instance sampling via matrix partition. In Advances in Neural Information Processing Systems, pages 802–810, 2010.


[50] Ido Guy, Inbal Ronen, and Eric Wilcox. Do you know?: recommending people to invite into your social network. In Proceedings of the 14th international conference on Intelligent user interfaces, pages 77–86. ACM, 2009.

[51] David J Hand. Principles of data mining. Drug safety, 30(7):621–622, 2007.

[52] Abhay S. Harpale and Yiming Yang. Personalized active learning for collaborative filtering. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '08, pages 91–98, New York, NY, USA, 2008. ACM.

[53] Thomas Hofmann. Collaborative filtering via gaussian probabilistic latent semantic analysis. In SIGIR, pages 259–266. ACM, 2003.

[54] Steven CH Hoi, Rong Jin, Jianke Zhu, and Michael R Lyu. Batch mode active learning and its application to medical image classification. In Proceedings of the 23rd international conference on Machine learning, pages 417–424. ACM, 2006.

[55] Liangjie Hong and Brian D Davison. Empirical study of topic modeling in twitter. In Proceedings of the first workshop on social media analytics, pages 80–88. ACM, 2010.

[56] Eric Horvitz, Andy Jacobs, and David Hovel. Attention-sensitive alerting. In Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, pages 305–313. Morgan Kaufmann Publishers Inc., 1999.

[57] Liang Hu, Jian Cao, Guandong Xu, Longbing Cao, Zhiping Gu, and Can Zhu. Personalized recommendation via cross-domain triadic factorization. In Proceedings of the 22nd international conference on World Wide Web, pages 595–606. ACM, 2013.

[58] Yifan Hu, Yehuda Koren, and Chris Volinsky. Collaborative filtering for implicit feedback datasets. In ICDM, pages 263–272. IEEE, 2008.

[59] Z Huang. Selectively acquiring ratings for product recommendation. ICEC, pages 379–388, 2007.

[60] Thomas Jackson, Ray Dawson, and Darren Wilson. Case study: evaluating the effect of email interruptions within the workplace. EASE, Keele, UK, (April):3–7, 2002.

[61] Mohsen Jamali and Martin Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the fourth ACM conference on Recommender systems, pages 135–142. ACM, 2010.

[62] Akshay Java, Xiaodan Song, Tim Finin, and Belle Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, pages 56–65. ACM, 2007.

[63] Rong Jin and Luo Si. A bayesian approach toward active learning for collaborative filtering. In Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, UAI '04, pages 278–285, Arlington, Virginia, United States, 2004. AUAI Press.


[64] Rong Jin and Luo Si. A bayesian approach toward active learning for collaborative filtering. In Proceedings of the 20th conference on Uncertainty in artificial intelligence, pages 278–285. AUAI Press, 2004.

[65] Lisa Johansen, Michael Rowell, Kevin RB Butler, and Patrick Drew McDaniel. Email communities of interest. In CEAS, 2007.

[66] Marius Kaminskas, Ignacio Fernández-Tobías, Iván Cantador, and Francesco Ricci. Ontology-based identification of music for places. In Information and Communication Technologies in Tourism 2013, pages 436–447. Springer, 2013.

[67] R. Karimi, C. Freudenthaler, A. Nanopoulos, and L. Schmidt-Thieme. Non-myopic active learning for recommender systems based on matrix factorization. In 2011 IEEE International Conference on Information Reuse and Integration, pages 299–303, Aug 2011.

[68] Rasoul Karimi, Christoph Freudenthaler, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Non-myopic active learning for recommender systems based on matrix factorization. IEEE IRI, pages 299–303, 2011.

[69] Rasoul Karimi, Christoph Freudenthaler, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Towards optimal active learning for matrix factorization in recommender systems. ICTAI, pages 1069–1076, 2011.

[70] Rasoul Karimi, Christoph Freudenthaler, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Exploiting the characteristics of matrix factorization for active learning in recommender systems. RecSys, page 317, 2012.

[71] Bryan Klimt and Yiming Yang. The enron corpus: A new dataset for email classification research. In European Conference on Machine Learning, pages 217–226. Springer, 2004.

[72] Yehuda Koren and Robert Bell. Advances in collaborative filtering. In Recommender systems handbook, pages 77–118. Springer, 2015.

[73] Yehuda Koren, Edo Liberty, Yoelle Maarek, and Roman Sandler. Automatically tagging email by leveraging other users' folders. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11, pages 913–921, 2011.

[74] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, pages 591–600. ACM, 2010.

[75] Su Mon Kywe, Tuan-Anh Hoang, Ee-Peng Lim, and Feida Zhu. On recommending hashtags in twitter networks. In International Conference on Social Informatics, pages 337–350. Springer, 2012.

[76] Ho-Yu Lam and Dit-Yan Yeung. A learning approach to spam detection based on social networks. In 4th Conference on Email and Anti-Spam (CEAS), page 35, 2007.

[77] C-H Lee, Y-H Kim, and P-K Rhee. Web personalization expert with combining collaborative filtering and association rule mining technique. Expert Systems with Applications, 21(3):131–137, 2001.


[78] Bin Li. Cross-domain collaborative filtering: A brief survey. In Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference on, pages 1085–1086. IEEE, 2011.

[79] Bin Li, Qiang Yang, and Xiangyang Xue. Can movies and books collaborate? cross-domain collaborative filtering for sparsity reduction. In IJCAI, volume 9, pages 2052–2057, 2009.

[80] Quanle Li, Dandan Song, Lejian Liao, and Li Liu. Personalized mention probabilistic ranking–recommendation on mention behavior of heterogeneous social network. In International Conference on Web-Age Information Management, pages 41–52. Springer, 2015.

[81] Nathan N. Liu, Xiangrui Meng, Chao Liu, and Qiang Yang. Wisdom of the better few: Cold start recommendation via representative based rating elicitation. In Proceedings of the Fifth ACM Conference on Recommender Systems, RecSys '11, pages 37–44, New York, NY, USA, 2011. ACM.

[82] Steve Lohr. Is Information Overload a $650 Billion Drag on the Economy? New York Times, 2007.

[83] Antonis Loizou. How to recommend music to film buffs: enabling the provision of recommendations from multiple domains. PhD thesis, University of Southampton, 2009.

[84] Xin Luo, Yunni Xia, and Qingsheng Zhu. Incremental collaborative filtering recommender based on regularized matrix factorization. Knowledge-Based Systems, 27:271–280, 2012.

[85] Carlos Eduardo Mello, Marie-Aude Aufaure, and Geraldo Zimbrao. Active learning driven by rating impact analysis. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, pages 341–344, New York, NY, USA, 2010. ACM.

[86] Prem Melville, Raymond J Mooney, and Ramadass Nagarajan. Content-boosted collaborative filtering for improved recommendations. AAAI/IAAI, 23:187–192, 2002.

[87] Arnd Kohrs and Bernard Merialdo. Improving collaborative filtering for new-users by smart object selection. In Proceedings of International Conference on Media Features (ICMF), May, 2001.

[88] Orly Moreno, Bracha Shapira, Lior Rokach, and Guy Shani. Talmud: transfer learning for multiple domains. In Proceedings of the 21st ACM international conference on Information and knowledge management, pages 425–434. ACM, 2012.

[89] Carman Neustaedter, AJ Bernheim Brush, Marc A Smith, and Danyel Fisher. The social network and relationship finder: Social sorting for email triage. In CEAS, 2005.

[90] Rong Pan, Yunhong Zhou, Bin Cao, Nathan N Liu, Rajan Lukose, Martin Scholz, and Qiang Yang. One-class collaborative filtering. In ICDM, pages 502–511. IEEE, 2008.


[91] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.

[92] Weike Pan, Nathan N Liu, Evan W Xiang, and Qiang Yang. Transfer learning to predict missing ratings via heterogeneous user feedbacks. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 2318, 2011.

[93] Weike Pan, Evan Wei Xiang, Nathan Nan Liu, and Qiang Yang. Transfer learning in collaborative filtering for sparsity reduction. In AAAI, volume 10, pages 230–235, 2010.

[94] Gert Peersman, Srba Cvetkovic, Paul Griffiths, and Hugh Spear. The global system for mobile communications short message service. IEEE Personal Communications, 7(3):15–23, 2000.

[95] Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Sumanth Bandi, Anand Kumar, Jean-Loup Guillaume, and Bivas Mitra. On the role of mentions on tweet virality. In Data Science and Advanced Analytics (DSAA), 2016 IEEE International Conference on, pages 204–213. IEEE, 2016.

[96] Michael Prince. Does active learning work? a review of the research. Journal of engineering education, 93(3):223–231, 2004.

[97] Sara Radicati and Justin Levenstein. Email statistics report, 2013–2017. The Radicati Group, 2013.

[98] Al Mamunur Rashid, Istvan Albert, Dan Cosley, Shyong K. Lam, Sean M. McNee, Joseph A. Konstan, and John Riedl. Getting to know you: Learning new user preferences in recommender systems. In Proceedings of the 7th International Conference on Intelligent User Interfaces, IUI '02, pages 127–134, New York, NY, USA, 2002. ACM.

[99] Al Mamunur Rashid, George Karypis, and John Riedl. Learning preferences of new users in recommender systems: An information theoretic approach. SIGKDD Explor. Newsl., 10(2):90–100, December 2008.

[100] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to recommender systems handbook. Springer, 2011.

[101] Gordon Rios and Hongyuan Zha. Exploring support vector machines and random forests for spam detection. In CEAS, 2004.

[102] Nicholas Roy and Andrew McCallum. Toward optimal active learning through monte carlo estimation of error reduction. ICML, pages 441–448, 2001.

[103] Neil Rubens and Masashi Sugiyama. Influence-based collaborative active learning. In Proceedings of the 2007 ACM Conference on Recommender Systems, RecSys '07, pages 145–148, New York, NY, USA, 2007. ACM.

[104] Neil Rubens and Masashi Sugiyama. Influence-based collaborative active learning. In RecSys, pages 145–148. ACM, 2007.

[105] Neil Rubens, Ryota Tomioka, and Masashi Sugiyama. Output divergence criterion for active learning in collaborative settings. IPSJ, 2(3):87–96, 2009.


[106] Shaghayegh Sahebi and Peter Brusilovsky. It takes two to tango: An exploration of domain pairs for cross-domain collaborative filtering. In Proceedings of the 9th ACM Conference on Recommender Systems, RecSys '15, pages 131–138, New York, NY, USA, 2015. ACM.

[107] Shaghayegh Sahebi and Peter Brusilovsky. It takes two to tango: An exploration of domain pairs for cross-domain collaborative filtering. In RecSys, pages 131–138. ACM, 2015.

[108] M Sappelli, S Verberne, and W Kraaij. Combining textual and non-textual features for e-mail importance estimation. In Proceedings of the 25th Benelux Conference on Artificial Intelligence, pages 168–174, 2013.

[109] Maya Sappelli, Suzan Verberne, and Wessel Kraaij. Combining textual and non-textual features for e-mail importance estimation. In BNAIC 2013: Proceedings of the 25th Benelux Conference on Artificial Intelligence, Delft, The Netherlands, November 7-8, 2013. Delft University of Technology (TU Delft); under the auspices of the Benelux Association for Artificial Intelligence (BNVKI) and the Dutch Research School for Information and Knowledge Systems (SIKS), 2013.

[110] Minoru Sasaki and Hiroyuki Shinnou. Spam detection using text clustering. In 2005 International Conference on Cyberworlds (CW'05), 4 pp. IEEE, 2005.

[111] Burr Settles. Active learning literature survey. University of Wisconsin, Madison, 52(55-66):11, 2010.

[112] Bracha Shapira, Lior Rokach, and Shirley Freilikhman. Facebook single and cross domain data for recommendation systems. User Modeling and User-Adapted Interaction, pages 1–37, 2013.

[113] Alex J Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and computing, 14(3):199–222, 2004.

[114] Kate Starbird, Grace Muzny, and Leysia Palen. Learning from the crowd: collaborative filtering techniques for identifying on-the-ground twitterers during mass disruptions. In Proceedings of 9th International Conference on Information Systems for Crisis Response and Management, ISCRAM, 2012.

[115] Avaré Stewart, Ernesto Diaz-Aviles, Wolfgang Nejdl, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Cross-tagging for personalized open social networking. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, HT '09, pages 271–278, New York, NY, USA, 2009. ACM.

[116] Salvatore J Stolfo, Shlomo Hershkop, Ke Wang, Olivier Nimeskern, and Chia-Wei Hu. A behavior-based approach to securing email systems. In International Workshop on Mathematical Methods, Models, and Architectures for Computer Network Security, pages 57–81. Springer, 2003.

[117] Dougal J Sutherland, Barnabas Poczos, and Jeff Schneider. Active learning and search on low-rank matrices. SIGKDD, pages 212–220, 2013.


[118] Guanting Tang, Jian Pei, and Wo Shun Luk. Email mining: tasks, common techniques, and tools. Knowledge and Information Systems, pages 1–31, 2013.

[119] Jie Tang, Sen Wu, Jimeng Sun, and Hang Su. Cross-domain collaboration recommendation. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1285–1293. ACM, 2012.

[120] Liyang Tang, Zhiwei Ni, Hui Xiong, and Hengshu Zhu. Locating targets through mention in twitter. World Wide Web, 18(4):1019–1049, 2015.

[121] Bradley Taylor. Sender reputation in a large webmail service. In CEAS, 2006.

[122] Ivan R Teixeira, Francisco de AT de Carvalho, Geber L Ramalho, and Vincent Corruble. ActiveCP: A method for speeding up user preferences acquisition in collaborative filtering systems. In Brazilian Symposium on Artificial Intelligence, pages 237–247. Springer, 2002.

[123] Nava Tintarev and Judith Masthoff. Active Learning in Recommender Systems, volume 54. Springer US, 2011.

[124] Amit Tiroshi, Shlomo Berkovsky, Mohamed Ali Kaafar, Terence Chen, and Tsvi Kuflik. Cross social networks interests predictions based on graph features. In Proceedings of the 7th ACM Conference on Recommender Systems, RecSys '13, pages 319–322, New York, NY, USA, 2013. ACM.

[125] Amit Tiroshi and Tsvi Kuflik. Domain ranking for cross domain collaborative filtering. In International Conference on User Modeling, Adaptation, and Personalization, pages 328–333. Springer, 2012.

[126] Ibrahim Uysal and W Bruce Croft. User oriented tweet ranking: a filtering approach to microblogs. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 2261–2264. ACM, 2011.

[127] Aaron Van den Oord, Sander Dieleman, and Benjamin Schrauwen. Deep content-based music recommendation. In Advances in neural information processing systems, pages 2643–2651, 2013.

[128] João Vinagre, Alípio Mário Jorge, and João Gama. Fast incremental matrix factorization for recommendation with positive-only feedback. In User Modeling, Adaptation, and Personalization, pages 459–470. Springer, 2014.

[129] Beidou Wang, Martin Ester, Jiajun Bu, Yu Zhu, Ziyu Guan, and Deng Cai. Which to view: Personalized prioritization for broadcast emails. In WWW '16, pages 1181–1190, 2016.

[130] Beidou Wang, Martin Ester, Yikang Liao, Jiajun Bu, Yu Zhu, Ziyu Guan, and Deng Cai. The million domain challenge: Broadcast email prioritization by cross-domain recommendation. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1895–1904. ACM, 2016.


[131] Beidou Wang, Can Wang, Jiajun Bu, Chun Chen, Wei Vivian Zhang, Deng Cai, and Xiaofei He. Whom to mention: Expand the diffusion of tweets by @ recommendation on micro-blogging systems. In Proceedings of the 22nd International Conference on World Wide Web, WWW '13, pages 1331–1340, New York, NY, USA, 2013. ACM.

[132] Beidou Wang, Can Wang, Jiajun Bu, Chun Chen, Wei Vivian Zhang, Deng Cai, and Xiaofei He. Whom to mention: expand the diffusion of tweets by @ recommendation on micro-blogging systems. In WWW, pages 1331–1340, 2013.

[133] Pinata Winoto and Tiffany Tang. If you like the devil wears prada the book, will you also enjoy the devil wears prada the movie? a study of cross-domain recommendations. New Generation Computing, 26(3):209–225, 2008.

[134] Shaomei Wu, Jake M Hofman, Winter A Mason, and Duncan J Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference on World wide web, pages 705–714. ACM, 2011.

[135] Xindong Wu, Vipin Kumar, J Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J McLachlan, Angus Ng, Bing Liu, S Yu Philip, et al. Top 10 algorithms in data mining. Knowledge and information systems, 14(1):1–37, 2008.

[136] Rui Yan, Mirella Lapata, and Xiaoming Li. Tweet recommendation with graph co-ranking. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1, pages 516–525. Association for Computational Linguistics, 2012.

[137] Min-Chul Yang, Jung-Tae Lee, Seung-Wook Lee, and Hae-Chang Rim. Finding interesting posts in twitter based on retweet graph analysis. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, pages 1073–1074. ACM, 2012.

[138] Shaozhi Ye and S Felix Wu. Measuring message propagation and social influence on twitter.com. In International Conference on Social Informatics, pages 216–231. Springer, 2010.

[139] Shinjae Yoo, Yiming Yang, Frank Lin, and Il-Chul Moon. Mining social networks for personalized email prioritization. SIGKDD, page 967, 2009.

[140] Jianjun Yu, Yi Shen, and Zhenglu Yang. Topic-stg: Extending the session-based temporal graph approach for personalized tweet recommendation. In Proceedings of the 23rd International Conference on World Wide Web, pages 413–414. ACM, 2014.

[141] Kai Yu, Jinbo Bi, and Volker Tresp. Active learning via transductive experimental design. In Proceedings of the 23rd international conference on Machine learning, pages 1081–1088. ACM, 2006.

[142] Aston Zhang, Lluis Garcia-Pueyo, James B Wendt, Marc Najork, and Andrei Broder. Email category prediction. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 495–503. International World Wide Web Conferences Steering Committee, 2017.


[143] Yu Zhang, Bin Cao, and Dit-Yan Yeung. Multi-domain collaborative filtering. arXiv preprint arXiv:1203.3535, 2012.

[144] Lili Zhao, Sinno Jialin Pan, Evan Wei Xiang, Erheng Zhong, Zhongqi Lu, and Qiang Yang. Active transfer learning for cross-system recommendation. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, AAAI '13, pages 1205–1211. AAAI Press, 2013.

[145] Wayne Xin Zhao, Jing Jiang, Jianshu Weng, Jing He, Ee-Peng Lim, Hongfei Yan, and Xiaoming Li. Comparing twitter and traditional media using topic models. In European Conference on Information Retrieval, pages 338–349. Springer, 2011.

[146] Ge Zhou, Lu Yu, Chu-Xu Zhang, Chuang Liu, Zi-Ke Zhang, and Jianlin Zhang. A novel approach for generating personalized mention list on micro-blogging system. In Data Mining Workshop (ICDMW), 2015 IEEE International Conference on, pages 1368–1374. IEEE, 2015.

[147] Ke Zhou, Shuang-Hong Yang, and Hongyuan Zha. Functional matrix factorizations for cold-start recommendation. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '11, pages 315–324, New York, NY, USA, 2011. ACM.
