IEEE TRANSACTIONS ON MOBILE COMPUTING 1 ... A Semantic-based Friend Recommendation System for Social...

1536-1233 (c) 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Seehttp://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI10.1109/TMC.2014.2322373, IEEE Transactions on Mobile Computing

IEEE TRANSACTIONS ON MOBILE COMPUTING 1

Friendbook: A Semantic-based FriendRecommendation System for Social Networks

Zhibo Wang, Student Member, IEEE, Jilong Liao, Qing Cao, Member, IEEE,Hairong Qi,Senior Member, IEEE, and Zhi Wang, Member, IEEE,

Abstract—Existing social networking services recommend friends to users based on their social graphs, which may not be the mostappropriate to reflect a user’s preferences on friend selection in real life. In this paper, we present Friendbook, a novel semantic-basedfriend recommendation system for social networks, which recommends friends to users based on their life styles instead of socialgraphs. By taking advantage of sensor-rich smartphones, Friendbook discovers life styles of users from user-centric sensor data,measures the similarity of life styles between users, and recommends friends to users if their life styles have high similarity. Inspiredby text mining, we model a user’s daily life as life documents, from which his/her life styles are extracted by using the Latent DirichletAllocation algorithm. We further propose a similarity metric to measure the similarity of life styles between users, and calculate users’impact in terms of life styles with a friend-matching graph. Upon receiving a request, Friendbook returns a list of people with highestrecommendation scores to the query user. Finally, Friendbook integrates a feedback mechanism to further improve the recommendationaccuracy. We have implemented Friendbook on the Android-based smartphones, and evaluated its performance on both small-scaleexperiments and large-scale simulations. The results show that the recommendations accurately reflect the preferences of users inchoosing friends.

Index Terms—Friend recommendation, mobile sensing, social networks, life style

F

1 INTRODUCTION

Twenty years ago, people typically made friends withothers who live or work close to themselves, such asneighbors or colleagues. We call friends made throughthis traditional fashion as G-friends, which stands for ge-ographical location-based friends because they are influ-enced by the geographical distances between each other.With the rapid advances in social networks, servicessuch as Facebook, Twitter and Google+ have providedus revolutionary ways of making friends. According toFacebook statistics, a user has an average of 130 friends,perhaps larger than any other time in history [2].

One challenge with existing social networking servicesis how to recommend a good friend to a user. Mostof them rely on pre-existing user relationships to pickfriend candidates. For example, Facebook relies on asocial link analysis among those who already sharecommon friends and recommends symmetrical usersas potential friends. Unfortunately, this approach maynot be the most appropriate based on recent sociologyfindings [16], [27], [29], [30]. According to these studies,

• Zhibo Wang is with the Department of Electrical Engineering andComputer Science, University of Tennessee, Knoxville, USA, 37909, andthe State Key Laboratory of Industrial Control Technology, ZhejiangUniversity, Hangzhou, P.R.China, 310027.E-mail: [email protected]

• Jilong Liao, Qing Cao and Hairong Qi are with the Department ofElectrical Engineering and Computer Science, University of Tennessee,Knoxville, USA, 37909.E-mail: {jliao2, cao, hqi}@utk.edu

• Zhi Wang is with the State Key Laboratory of Industrial Control Technol-ogy, Zhejiang University, Hangzhou, P.R.China, 310027.E-mail: [email protected]

the rules to group people together include: 1) habitsor life style; 2) attitudes; 3) tastes; 4) moral standards;5) economic level; and 6) people they already know.Apparently, rule #3 and rule #6 are the mainstreamfactors considered by existing recommendation systems.Rule #1, although probably the most intuitive, is notwidely used because users’ life styles are difficult, if notimpossible, to capture through web actions. Rather, lifestyles are usually closely correlated with daily routinesand activities. Therefore, if we could gather informationon users’ daily routines and activities, we can exploitrule #1 and recommend friends to people based on theirsimilar life styles. This recommendation mechanism canbe deployed as a standalone app on smartphones oras an add-on to existing social network frameworks. Inboth cases, Friendbook can help mobile phone users findfriends either among strangers or within a certain groupas long as they share similar life styles.

In our everyday lives, we may have hundreds ofactivities, which form meaningful sequences that shapeour lives. In this paper, we use the word activity to specif-ically refer to the actions taken in the order of seconds,such as “sitting”, “walking”, or “typing”, while we usethe phrase life style to refer to higher-level abstractionsof daily lives, such as “office work” or “shopping”. Forinstance, the “shopping” life style mostly consists of the“walking” activity, but may also contain the “standing”or the “sitting” activities.

To model daily lives properly, we draw an analogybetween people’s daily lives and documents, as shown inFigure 1. Previous research on probabilistic topic modelsin text mining has treated documents as mixtures of




Document

Topics

Words

Life document (daily lives)

Life styles (e.g., office work)

Activities (e.g., walking)

Fig. 1: An analogy between word documents and people’s daily lives.

topics, and topics as mixtures of words [10]. Inspiredby this, similarly, we can treat our daily lives (or lifedocuments) as a mixture of life styles (or topics), andeach life style as a mixture of activities (or words).Observe here, essentially, we represent daily lives with“life documents”, whose semantic meanings are reflectedthrough their topics, which are life styles in our study.Just like words serve as the basis of documents, people’sactivities naturally serve as the primitive vocabulary ofthese life documents.

Our proposed solution is also motivated by the recentadvances in smartphones, which have become more andmore popular in people’s lives. These smartphones (e.g.,iPhone or Android-based smartphones) are equippedwith a rich set of embedded sensors, such as GPS, ac-celerometer, microphone, gyroscope, and camera. Thus,a smartphone is no longer simply a communicationdevice, but also a powerful and environmental realitysensing platform from which we can extract rich contextand content-aware information. From this perspective,smartphones serve as the ideal platform for sensingdaily routines from which people’s life styles could bediscovered.

In spite of the powerful sensing capabilities of smart-phones, there are still multiple challenges for extractingusers’ life styles and recommending potential friendsbased on their similarities. First, how to automaticallyand accurately discover life styles from noisy and het-erogeneous sensor data? Second, how to measure thesimilarity of users in terms of life styles? Third, whoshould be recommended to the user among all the friendcandidates? To address these challenges, in this paper,we present Friendbook, a semantic-based friend recom-mendation system based on sensor-rich smartphones.The contributions of this work are summarized as fol-lows:• To the best of our knowledge, Friendbook is the first

friend recommendation system exploiting a user’slife style information discovered from smartphonesensors.

• Inspired by achievements in the field of text mining,we model the daily lives of users as life documentsand use the probabilistic topic model to extract lifestyle information of users.

• We propose a unique similarity metric to character-ize the similarity of users in terms of life styles andthen construct a friend-matching graph to recom-mend friends to users based on their life styles.

• We integrate a linear feedback mechanism that ex-ploits the user’s feedback to improve recommenda-tion accuracy.

• We conduct both small-scale experiments and large-scale simulations to evaluate the performance ofour system. Experimental results demonstrate theeffectiveness of our system.

The rest of the paper is organized as follows. Section2 discusses related work. Section 3 provides the high-level overview of Friendbook. Section 4 presents activityrecognition and life style modeling and extraction. InSection 5, we describe the social graph construction anduser impact estimation. We elaborate on the user queryand friend recommendation in Section 6. We describe thefeedback mechanism in Section 7. In Section 8, we evalu-ate the performance of Friendbook intensively with bothsimulations and real experiments. Finally, we concludethe paper and present the future work in Section 9.

2 RELATED WORK

Recommendation systems that try to suggest items (e.g.,music, movie, and books) to users have become moreand more popular in recent years. For instance, Amazon[1] recommends items to a user based on items the userpreviously visited, and items that other users are look-ing at. Netflix [3] and Rotten Tomatoes [4] recommendmovies to a user based on the user’s previous ratingsand watching habits. Recently, with the advance ofsocial networking systems, friend recommendation hasreceived a lot of attention. Generally speaking, existingfriend recommendation in social networking systems,e.g., Facebook, LinkedIn and Twitter, recommend friendsto users if, according to their social relations, they sharecommon friends.

Meanwhile, other recommendation mechanisms havealso been proposed by researchers. For example, Bianand Holtzman [8] presented MatchMaker, a collabora-tive filtering friend recommendation system based onpersonality matching. Kwon and Kim [20] proposeda friend recommendation method using physical andsocial context. However, the authors did not explainwhat the physical and social context is and how to obtainthe information. Yu et al. [32] recommended geograph-ically related friends in social network by combiningGPS information and social network structure. Hsu etal. [18] studied the problem of link recommendation inweblogs and similar social networks, and proposed anapproach based on collaborative recommendation usingthe link structure of a social network and content-basedrecommendation using mutual declared interests. Gouet al. [17] proposed a visual system, SFViz, to supportusers to explore and find friends interactively underthe context of interest, and reported a case study usingthe system to explore the recommendation of friendsbased on people’s tagging behaviors in a music com-munity. These existing friend recommendation systems,however, are significantly different from our work, as we




exploit recent sociology findings to recommend friendsbased on their similar life styles instead of social rela-tions.

Activity recognition serves as the basis for extract-ing high-level daily routines (in close correlation withlife styles) from low-level sensor data, which has beenwidely studied using various types of wearable sensors.Zheng et al. [33] used GPS data to understand thetransportation mode of users. Lester et al. [21] useddata from wearable sensors to recognize activities basedon the Hidden Markov Model (HMM). Li et al. [22]recognized static postures and dynamic transitions byusing accelerometers and gyroscopes. The advance ofsmartphones enables activity recognition using the richset of sensors on the smartphones. Reddy et al. [26]used the built-in GPS and the accelerometer on thesmartphones to detect the transportation mode of anindividual. CenceMe [24] used multiple sensors on thesmartphone to capture user’s activities, state, habits andsurroundings. SoundSense [23] used the microphone onthe smartphone to recognize general sound types (e.g.,music, voice) and discover user specific sound events.EasyTracker [7] used GPS traces collected from smart-phones that are installed on transit vehicles to determineroutes served, locate stops, and infer schedules.

Although a lot of work has been done for activityrecognition using smartphones, there is relatively littlework on discovery of daily routines using smartphones.The MIT Reality Mining project [12] and Farrahi andGatica-Perez [14] tried to discover daily location-drivenroutines from large-scale location data. They could inferdaily routines such as leaving from home to office andeating at a restaurant. However, they could not discoverthe daily routines of people who are staying at thesame location. For instance, when one stays at home,his/her daily routines like “eating lunch” and “watchingmovie” could not be discovered if only using the locationinformation. In [13], Farrahi and Gatica-Perez took a stepfurther and overcame the short-coming of discoveringdaily routines of people staying in the same location byconsidering combined location and physical proximitysensed by the mobile phone. Another closely relatedwork was presented in [19], which used a topic modelto extract activity patterns from sensor data. However,they used two wearable sensors, but not smartphones,to discover the daily routines. In our work, we attemptto use the probabilistic topic model to discover life stylesusing the smartphone. We further utilize patterns discov-ered from activities as a basis for friend recommendationthat helps users find friends who have similar life styles.Note that the work in this paper is significantly differentfrom our preliminary demo work of Friendbook [31] thatrecommended friends to users based on the similarity ofpictures taken by users.

3 SYSTEM OVERVIEWIn this section, we give a high-level overview of theFriendbook system. Figure 2 shows the system architec-

ture of Friendbook which adopts a client-server modewhere each client is a smartphone carried by a user andthe servers are data centers or clouds.

Friend-matching Graph

Construction

User Impact Ranking

Life Style Indexing

Hadoop Data & Computation Infrastructure

Life Style Analysis

Data Collection User Query

LifeDocuments

Friend Query

Feedback Control

UserFeedback

Server

Fig. 2: System architecture of Friendbook.

On the client side, each smartphone can record data ofits user, perform real-time activity recognition and reportthe generated life documents to the servers. It is worthnoting that an offline data collection and training phaseis needed to build an appropriate activity classifier forreal-time activity recognition on smartphones. We spentthree months on collecting raw data of 8 volunteers forbuilding a large training data set. As each user typicallygenerates around 50MB of raw data each day, we chooseMySQL as our low level data storage platform andHadoop MapReduce as our computation infrastructure.After the activity classifier is built, it will be distributedto each user’s smartphone and then activity recognitioncan be performed in real-time manner. As a user con-tinually uses Friendbook, he/she will accumulate moreand more activities in his/her life documents, basedon which, we can discover his/her life styles usingprobabilistic topic model.

On the server side, seven modules are designed to ful-fill the task of friend recommendation. The data collectionmodule collects life documents from users’ smartphones.The life styles of users are extracted by the life styleanalysis module with the probabilistic topic model. Thenthe life style indexing module puts the life styles of usersinto the database in the format of (life-style, user) in-stead of (user, life-style). A friend-matching graph can beconstructed accordingly by the friend-matching graph con-struction module to represent the similarity relationshipbetween users’ life styles. The impacts of users are thencalculated based on the friend-matching graph by theuser impact ranking module. The user query module takesa user’s query and sends a ranked list of potential friendsto the user as response. The system also allows users togive feedback of the recommendation results which canbe processed by the feedback control module. With thismodule, the accuracy of friend recommendation can beimproved.

In the following sections, we will elaborate on all thecomponents of the system.




4 LIFE STYLE EXTRACTION USING TOPICMODEL

4.1 Life Style ModelingAs stated in Section 1, life styles and activities are reflec-tions of daily lives at two different levels where dailylives can be treated as a mixture of life styles and lifestyles as a mixture of activities. This is analogous to thetreatment of documents as ensemble of topics and topicsas ensemble of words. By taking advantage of recentdevelopments in the field of text mining, we model thedaily lives of users as life documents, the life styles astopics, and the activities as words.

Given “documents”, the probabilistic topic modelcould discover the probabilities of underlying “topics”.Therefore, we adopt the probabilistic topic model todiscover the probabilities of hidden “life styles” fromthe “life documents”. In probabilistic topic models, thefrequency of vocabulary is particularly important, asdifferent frequency of words denotes their informationentropy variances. Following this observation, we pro-pose the “bag-of-activity” model (Figure 3) to replace theoriginal sequences of activities recognized based on theraw data with their probability distributions. Thereafter,each user has a bag-of-activity representation of his/herlife document, which comprises a mixture of activitywords.

A1, B2, C1

E4, E4, A1, C1

...

Life Document

...

A1

B2

C1

E4

2

1

2

2

Bag of Activity

Activity Label Frequency

Fig. 3: Bag-of-Activity modeling for life document.

Let w = [w1, w2, ..., wW ] denote a set of activities,where wi is the ith activity and W is the total numberof activities. Let z = [z1, z2, ..., zZ ] denote a set of lifestyles, where zi is the ith life style and Z is the totalnumber of life styles. Let d = [d1, d2, ..., dn] denote a setof life documents, where di is the ith life document andn is the total number of users. Let p(wi|dk) denote theprobability of the activity wi in a certain life documentdk, p(wi|zj) denote the probability of how much theactivity wi contributes to the life style zj , and p(zj |dk)denote the probability of the life style zj embedded inthe life document dk. According to the probabilistic topicmodel, we have

p(wi|dk) =

Z∑j=1

p(wi|zj)p(zj |dk) (1)

Observe that p(wi|dk) can be easily calculated by usingthe “bag-of-activity” representation for the life documentdk.

p(wi|dk) =fk(wi)∑Wi=1 fk(wi)

(2)

where fk(wi) denotes the frequency of wi in dk.We represent the life styles of a user using the life style

vector, denoted by Lk = [p(z1|dk), p(z2|dk), ..., p(zZ |dk)].In this paper, our objective is to discover the life stylevector for each user given the life documents of all users.However, in Eq. 1, although p(wi|dk) can be calculatedeasily, p(wi|zj) and p(zj |dk) are difficult to solve becauseof the hidden feature of life styles.

In the following sections, we first present the detailsof activity recognition used to calculate p(wi|dk), thenshow how to use the Latent Dirichlet Allocation (LDA)decomposition algorithm to solve Eq. 1 so that we canobtain the life style vector of each user.

4.2 Activity RecognitionTo derive p(wi|dk), we need to first classify or recognizethe activities of users. Life styles are usually reflected asa mixture of motion activities with different occurrenceprobability. Therefore, two motion sensors, accelerome-ter and gyroscope, are used to infer users’ motion activ-ities. Generally speaking, there are two mainstream ap-proaches: supervised learning and unsupervised learn-ing. For both approaches, mature techniques have beendeveloped and tested. In practice, the number of activi-ties involved in the analysis is unpredictable and it is dif-ficult to collect a large set of ground truth data for eachactivity, which makes supervised learning algorithmsunsuitable for our system. Therefore, we use unsuper-vised learning approaches to recognize activities. Here,we adopt the popular K-means clustering algorithm [9]to group data into clusters, where each cluster representsan activity. Note that activity recognition is not the mainconcern of our paper. Other more complicated clusteringalgorithms can certainly be used. We choose K-means forits simplicity and effectiveness.

Raw data

Preprocessing

Feature extraction

Activity recognition

Preprocessed data

Feature vectors

Activities

Fig. 4: The flowchart of activity recognition

Figure 4 shows the flowchart of activity recognition.Since the raw data collected on the smartphones arenoisy, we first use a median filter [5] with sliding




windows to filter out the outliers of the noisy data.In order to further improve recognition accuracy, fea-tures are extracted to characterize the data after pre-processing. We have tested several features, such asmean, standard deviation, correlation, and the combi-nation of them, on the data and found that standarddeviation is the most representative feature for char-acterizing motion activities. Therefore, a feature vec-tor f = [tc, accx, accy, accz, gyrx, gyry, gyrz] is used tocharacterize user’s activities instead of the raw data,where tc is the mean of normalized time; accx, accyand accz represent the standard deviation of normalizeddata obtained from the accelerometer in the x, y andz directions, respectively; gyrx, gyry and gyrz representthe standard deviation of normalized data obtained fromthe gyroscope in the x, y and z directions, respectively.We then apply the K-means clustering algorithm onthe feature vectors to group them into different clus-ters as well as calculating the cluster centroids. Ourempirical study in Section 8.1.1 indicates that K = 15is a good compromise between classification accuracyand computational time. The cluster centroids are thendistributed to the smartphones. Then each smartphonecould independently recognize activities based on theminimum distance rule and upload the activity sequenceinstead of the raw data to the server.

4.3 Life Style Extraction using LDA

Given the life documents of all users, Eq. 1 can be furtherrepresented as a matrix decomposition problem.

p(w|d) = p(w|z)p(z|d) (3)

where p(w|d) = [p(w|d1), p(w|d2), ..., p(w|dn)]is the activity-document matrix as shownin Figure 5 containing the probability ofeach activity over each life document, andp(w|dk) = [p(w1|dk), p(w2|dk), ..., p(wW |dk)]T is the kthcolumn in the activity-document matrix representingthe probabilities of activities over the life document dkof user k; p(w|z) = [p(w|z1), p(w|z2), ..., p(w|zZ)] is theactivity-topic matrix as shown in Figure 5 representingthe probability of each activity over each life style (topic),and p(w|zk) = [p(w1|zk), p(w2|zk), ..., p(wW |zk)]T is thekth column in the activity-topic matrix representingthe probabilities of activities over the life style zk;p(z|d) = [p(z|d1), p(z|d2), ..., p(z|dn)] is the topic-document matrix as shown in Figure 5 containing theprobability of each topic over each life document, andp(z|dk) = [p(z1|dk), p(z2|dk), ..., p(zZ |dk)]T is the kthcolumn in the topic-document matrix representing theprobabilities of life styles over the life document dk ofuser k.

The above matrix decomposition problem is actuallythe Latent Dirichlet Allocation (LDA) model [10]. Weuse the Expectation-Maximization (EM) method to solvethe LDA decomposition, where the E-step is used toestimate the free variational Dirichlet parameter γ [15]

= xw

d z

z

Activity-document Activity-topic Topic-document

w

d

Fig. 5: Matrix decomposition for life styles analysis. (Redrawn from[19])

and multinomial parameter Φ in the standard LDAmodel [10] and the M-step is used to maximize the loglikelihood of the activities under these parameters. Afterthe EM algorithm converges, we are able to calculate thedecomposed activity-topic matrix. Readers are referredto [10] for more details of the LDA algorithm and alter-native decomposition approaches. It is worth noting thatthe matrix decomposition process can be implementedmore efficiently through incremental iteration. That is,when a user’s life document changes or a new user’slife document is uploaded to the system, Friendbook cancalculate the new life style vectors for each user basedon previously derived life style vectors and the newlife document. We did not implement the incrementaliteration in the current framework by simply replyingon the computing power of cloud computing. As partof our future work, we could add this implementationto the framework to make Friendbook scalable to large-scale systems.

It is also worth noting that since our system usesunsupervised learning algorithms to recognize activitiesand the topic model to discover life styles, the physicalmeanings of derived “activities” (or cluster centers fromthe K-means algorithm) or “topics” are unknown to us.As mentioned in [19], such meaning can be estimated viathe additional step of comparing the topic activations tothe actual structure of the subject’s day and then iden-tifying topics that correspond to possible daily routines.In Friendbook, since we are to only compare “similarity”in activities or topic patterns, there is no need to infer thephysical meaning of each cluster center or topic. On theother hand, not revealing the actual physical meaningof activities and topics also has advantages from theperspective of preserving privacy.

5 FRIEND-MATCHING GRAPH AND USER IM-PACTTo characterize relations among users, in this section, wepropose the friend-matching graph to represent the sim-ilarity between their life styles and how they influenceother people in the graph. In particular, we use the linkweight between two users to represent the similarity oftheir life styles. Based on the friend-matching graph, wecan obtain a user’s affinity reflecting how likely this userwill be chosen as another user’s friend in the network.

5.1 Similarity MetricWe define a new similarity metric to measurethe similarity between two life style vectors.




Let Li = [p(z1|di), p(z2|di), ..., p(zZ |di)] andLj = [p(z1|dj), p(z2|dj), ..., p(zZ |dj)] denote the lifestyle vectors of user i and user j, respectively.

We argue that the similarity is not only affected bytheir life style vectors as a whole, but also by themost important life styles, i.e., the elements within thevector with larger probability values, also known as thedominant life styles. We also argue that two users donot share much similarity if majority of their life stylesare totally different. Therefore, the similarity of life stylesbetween user i and user j, denoted by S(i, j), is definedas follows:

S(i, j) = Sc(i, j) · Sd(i, j) (4)

where Sc(i, j) is used to measure the similarity of thelife style vectors of users as a whole, Sd(i, j) is used toemphasize the similarity of users on their dominant lifestyles.

We adopt the commonly used cosine similarity metricfor Sc(i, j), that is,

Sc(i, j) = cos(Li,Lj) (5)

In order to calculate Sd(i, j), we first define the set ofdominant life styles of a user.

Definition 1. Dominant life styles: The set of dominant lifestyles of a user i, Di, is a subset of all the life styles satisfyingthe following requirements:

1) The total probability distribution of the set is larger thanor equal to λ which is a predefined threshold.

2) The probability distribution of any life style in the set islarger than or equal to that of any life style not in theset.

3) The set should have the minimum number of life styles.

These requirements guarantee that life styles withlarger probabilities are more probably to be in-cluded in the set. To find Di, we sort the lifestyle vector Li in the descending order with re-spect to the probabilities of life styles. Then we haveLi = [p(zi1|di), p(zi2|di), · · · , p(ziZ |di)] wherep(zis|di) ≥ p(zit|di) if s ≤ t. The size of dominant lifestyle set is calculated as

qi = arg minq

(

q∑k=1

p(zik|di) ≥ λ) (6)

Finally, we can obtain the dominant life style setDi = {zi1, · · · , ziqi}.

The similarity metric Sd(i, j) for measuring the simi-larity of the dominant life style sets of two users is thendefined as

Sd(i, j) =2|Di ∩Dj ||Di|+ |Dj |

(7)

The range of Sd(i, j) is [0, 1]. Observe that the higherthe percentage of the same life styles, the larger thesimilarity. When there is no overlap between Di andDj , the similarity is 0. When Di and Dj are the same,

the similarity is 1. Since both Sc(i, j) and Sd(i, j) varybetween 0 and 1, we conclude that the similarity metricS(i, j) varies between 0 and 1.

As an example to show the calculation of twousers’ life style similarity, we assume that there aretwo users 1 and 2 in the system, who have the lifestyle vectors L1 = [0.3, 0.1, 0.2, 0.3, 0.1] andL2 = [0.2, 0.1, 0.4, 0, 0.3], respectively. The numberof life style topics is 5. We first calculate Sc(1, 2) =cos(L1,L2) = 0.6708. Given λ = 0.8, we can calcu-late the dominant life style sets of these two users,D1 = {z1, z4, z3} and D2 = {z3, z5, z1}, respectively.Therefore, the dominant life style similarity is calculatedas Sd(1, 2) = 2×2

3 + 3 = 0.67. Finally, the similarity of user1 and 2 is S(1, 2) = Sc(1, 2) · Sd(1, 2) = 0.45.

5.2 Friend-matching Graph ConstructionBased on the similarity metric, we model the relationsbetween users in real life as a friend-matching graph.

Definition 2. Friend-matching graph: It is aweighted undirected graph G = (V,E,W ), whereV = {v1, v2, · · · , vn} is the set of users and n is thenumber of users, E = {e(i, j)} is the set of links betweenusers, and W : E → R is the set of weights of edges. Thereis an edge e(i, j) linking user i and user j if and only iftheir similarity S(i, j) ≥ Sthr, where Sthr is the predefinedsimilarity threshold. The weight of that edge is representedby the similarity, that is, ω(i, j) = S(i, j).

v1

v5

v4

v3

v6

v2

v7

v80.99

0.6

0.99

0.570.62

0.39

0.98

0.99

0.44

0.6

0.54

0.58

0.47

Fig. 6: An example of Friend-matching Graph for 8 users.

Figure 6 demonstrates a friend-matching graph basedon the life styles of 8 users and the similarity thresholdSthr is set to 0.3. An edge linking two users means theyhave similar life styles (e.g., e(1, 7)), and their similarityis quantified by the weight of the edge (e.g., ω(1, 7) =0.62). Some isolated vertices mean that they do not shareenough similar life styles with others (e.g., user 4). Weuse the following representation to convert the graphinto a matrix representation.

N = (Nij)n×n =

0 ω(1, 2) · · · ω(1, n)

ω(2, 1) 0 · · · ω(2, n)...

.... . .

...ω(n, 1) ω(n, 2) · · · 0

(8)




Note that the values on the diagonal are all 0 becausewe would not recommend himself/herself to a user.

5.3 User Impact Ranking

The friend-matching graph has been constructed to re-flect life style relations among users. However, we stilllack a measurement to identify the impact ranking ofa user quantitatively. Intuitively, the impact rankingmeans a user’s capability to establish friendships inthe network. In other words, the higher the ranking,the easier the user can be made friends with, becausehe/she shares broader life styles with others. Inspiredby PageRank [25] which is used in web page ranking,we form the idea that a user’s ranking is reflected by hisneighbors in the friend-matching graph and how muchhis neighbors endorse the user as a friend.

Once the ranking of a user is obtained, it providesguidelines to those who receive the recommendationlist on how to choose friends. The ranking itself, how-ever, should be independent from the query user. Inother words, the ranking depends only on the graphstructure of the friend-matching graph, which containstwo aspects: 1) how the edges are connected; 2) howmuch weight there is on every edge. Moreover, theranking should be used together with the similarityscores between the query user and the potential friendcandidates, so that the recommended friends are thosewho not only share sufficient similarity with the queryuser, and are also popular ones through whom the queryuser can increase their own impact rankings.

Let N (i) denote the set of neighbors of user i. Letr = [r(1), r(2), · · · , r(n)]T denote the impact rankingvector where r(i) is the impact ranking of user i in thefriend-matching graph, and n is the number of users inthe system. The calculation of r(i) is defined as follows:

r(i) =

∑j∈N (i) ω(i, j) · r(j)∑

j∈N (i) ω(i, j)(9)

As shown in Eq. 9, the impact ranking of a user isaffected by its neighbors from two aspects: first, thesimilarity between itself and a neighbor; second, theimpact ranking of its neighbors. Note that in friend-matching graph, ω(i, j) = 0 if j is not a neighbor of i.Also ω(i, i) = 0 because we would not recommend him-self/herself to a user. Therefore, Eq. 9 can be rewrittenas follows.

r(i) =

∑j ω(i, j) · r(j)∑

j ω(i, j)(10)

The calculation of r(i) is an iterative process becauseany change of its neighbors will change r(i) accordingly.Therefore, we use a matrix representation to clearly showthe iterative process of Eq. 10.

rTk+1 = rTk ·H (11)

where k and k+1 indicate two subsequent iteration steps,and H = (Hij)n×n is the transitional matrix representingthe similarity of neighbors. Combining Eq. 8 and Eq. 10,an element Hij in H is calculated as follows:

Hij =ω(i, j)∑j ω(i, j)

=Nij∑j Nij

(12)

In practice, it is possible that people choose friendsrandomly rather than based on their importance. There-fore, we revise the transitional matrix by introducing amatrix with equal value. Then the transitional matrix ismodified to Eq. 13.

H = ϕH + (1− ϕ)1

neeT (13)

where e is the n × 1 unit vector and ϕ is the dampingfactor used to emphasize the importance of the friend-matching graph.

Finally, the iterative process for calculating the impactrank of users is carried out as follows.

rTk+1 = rTk · H (14)

The process terminates when the impact ranking vec-tor converges to a stable value. In our experiment, theprocess terminates when

∑ni=1 |rk+1(i)−rk(i)| ≤ ε where

ε is an arbitrary small value larger than 0. The pseu-docode for the iterative process is shown in Algorithm1. After the impact vector r is calculated, we can rankthe users in a non-descending order so that a user withhigher impact ranking is always ahead of a user withlower impact ranking.

In general, by using the Hadoop MapReduce frame-work, the impact ranking process can converge quickly.However, this is becoming increasing infeasible whenthe size of the system is becoming very large. Fortu-nately, accordingly to the incremental computation ofPageRank [6], [11] and the distributed computation ofPageRank [28], the iterative matrix-vector multiplicationmethod in Eq. (14) can be implemented incrementally ordistributively for large-scale evolving graphs. In [28], theauthors presented a fast algorithm that takes O(

√log n/ε)

rounds in undirected graphs, where n is the networksize and ε is a fixed constant. Therefore, Friendbook isscalable to large-scale systems if we could implement theiterative matrix-vector multiplication method incremen-tally or distributively, which would be our future work.

6 QUERY AND FRIEND RECOMMENDATION

Before a user initiates a request, he/she should haveaccumulated enough activities in his/her life documentsfor efficient life styles analysis. The period for collectingdata usually takes at least one day. Longer time wouldbe expected if the user wants to get more satisfied friendrecommendation results. After receiving a user’s request(e.g., life documents), the server would extract the user’slife style vector, and based on which recommend friendsto the user.




Algorithm 1 Computing users’ impact ranking

Input: The friend-matching graph G.Output: Impact ranking vector r for all users.

1: for i = 1 to n do2: r0(i) = 1

n3: end for4: δ =∞5: ε = e−9

6: while δ > ε do7: for i = 1 to n do8: rk+1(i) =

∑j

1−ϕn rk(j) + ϕ

∑j ω(i,j)·rk(j)∑

j ω(i,j)

9: end for10: δ =

∑ni=1 |rk+1(i)− rk(i)|

11: end while12: return r

The recommendation results are highly dependent onusers’ preference. Some users may prefer the system torecommend users with high impact, while some usersmay want to know users with the most similar life styles.It is also possible that some users want the system torecommend users who have high impact and also similarlife styles to them. To better characterize this require-ment, we propose the following metric to facilitate therecommendation,

Ri(j) = βS(i, j) + (1− β)rjκ (15)

where Ri(j) is the recommendation score of user j forthe query user i, S(i, j) is the similarity between useri and user j, and rj is the impact of user j. β ∈ [0, 1]is the recommendation coefficient characterizing users’preference. κ is introduced to make S(i, j) and rj in thesame order of magnitude, which can be roughly set ton/10, where n is the number of users in the system.When β = 1, the recommendation is solely based on thesimilarity; when β = 0, the recommendation is solelybased on the impact ranking.

With the metric in Eq. 15, our recommendation mecha-nism for finding the most appropriate friends to a queryuser is described as follows. For a query user i, the servercalculates the recommendation scores for all the usersin the system and sorts them in the descending orderaccording to their recommendation scores. The top pusers will be returned to the query user i. The parameterp is an integer and can be defined by the querying user.The complexity of our recommendation mechanism isO(n) since it checks all users in the system, where n isthe overall number of users in the system.

As the number of users increases, the overhead ofquery and recommendation increases linearly. In reality,users may have totally different life styles and it is notnecessary to calculate their recommendation scores atall. Therefore, in order to speed up the query and rec-ommendation process, we adopt the reverse index tableusing (life-style, user) pair instead of (user, life-style) pairin the database. Figure 7 shows the difference. With the

User1

User2

User3

User4

...

Life style 23, life style 99, life style 4 ...

Life style 4 ...

Life style 2, life style 4 ...

Life style 7, life style 77 ...

Style 2

Style 4

Style 7

Style 23

...

User3

User1, user2, user3

User4

User1

User Life Style Index

Reverse Index

Fig. 7: Illustration of the reverse index table.

reverse index table, before calculating recommendationscore for each user, the server first picks up all the usershaving overlapping life styles with the query user andsets the similarities of rest users to the query user to 0.The server then checks all the users to calculate theirrecommendation scores. Although the complexity is stillO(n), we can observe that the reverse index table reducesthe computation overhead, the advantage of which isconsiderable when the system is in large-scale.

The pseudocode of the friend recommendation mech-anism is shown in Algorithm 2.

Algorithm 2 Friend recommendation

Input: The query user i, the recommendation coefficientβ and the required number of recommended friendsfrom the system p.

Output: Friend list Fi.1: Fi ← ∅, Q← ∅2: extracts i’s life style vector Li using the LDA algo-

rithm.3: for each life style zk the probability of which in Li

is not zero do4: put users in the entry of zk into Q5: end for6: for each user j /∈ Q do7: S(i, j)← 08: end for9: for each user j in the database do

10: Ri(j) = βS(i, j) + (1− β)rjκ11: end for12: sort all users in decreasing order according to Ri(j)13: put the top p users in the sorted list to Fi

Friendbook also uses GPS location information to helpusers find friends within some distance. In order toprotect the privacy of users, a region surrounding theaccurate location will be uploaded to the system. When auser uses Friendbook, he/she can specify the distance of




friends before recommendation. In this way, only friendshaving similarity with the user within the specifieddistance can be recommended as friends.

Privacy is very important especially for users whoare sensitive to information leakage. In our design ofFriendbook, we also considered the privacy issue andthe existing system can provide two levels of privacyprotection. First, Friendbook protects users’ privacy atthe data level. Instead of uploading raw data to theservers, Friendbook processes raw data and classifiesthem into activities in real-time. The recognized activ-ities are labeled by integers. In this way, even if thedocuments containing the integers are compromised,they cannot tell the physical meaning of the documents.Second, Friendbook protects users’ privacy at the lifepattern level. Instead of telling the similar life stylesof users, Friendbook only shows the recommendationscores of the recommended friends with the users. Withthe recommendation score, it is almost impossible toinfer the life styles of recommended friends.

7 FEEDBACK CONTROL

To support performance optimization at runtime, we alsointegrate a feedback control mechanism into Friendbook.After the server generates a reply in response to aquery, the feedback mechanism allows us to measure thesatisfaction of users, by providing a user interface thatallows the user to rate the friend list. Let r denote theimpact ranking vector calculated from the feedback ofusers. Here, r = [r(1), r(2), · · · , r(n)]T where n is thenumber of current users of the system. Let r(i, j) denotethe score that user j rates user i. Then we have:

r(i, j) =

{r(i, j) if user j rates user i,r(i) otherwise.

(16)

where the second equation in Eq. 16 means that thefeedback score is equal to the original score if user jdoes not rate user i. This may commonly occur whenthe user does not know the persons being recommendedespecially when our system becomes very large.

Based on Eq. 16, we define the impact ranking of useri influenced by the feedback of users as follows:

r(i) =∑j

r(i, j)/n. (17)

which takes the feedback of all users into consideration.Finally, the original impact ranking vector R calculated

from the friend-matching graph is updated as follows:

r = αr + (1− α)r (18)

where α is named as confidence factor and 0 ≤ α ≤ 1.The final impact ranking vector considers both the in-fluence of friend-matching graph and the feedback fromusers. When α > 0.5, the friend-matching graph domi-nates the impact ranking, however, when α < 0.5, users’feedback will significantly affect the impact ranking. In

this way, the system takes users’ feedback into considera-tion to improve the accuracy of future recommendations.

8 EVALUATION

In this section, we present the performance evaluationof Friendbook on both small-scale field experiments andlarge-scale simulations.

8.1 Evaluation using Real DataWe first evaluate the performance of Friendbook onsmall-scale experiments. Eight volunteers help con-tribute data and evaluate our system. Table 1 demon-strates the profession of these users. Most of them arestudents, while the rest include a businessman, an officeworker, and a waitress. Each volunteer carries a NexusS smartphone with Friendbook application installed inadvance.

They are required to start the application after theywake up and turn it off before they go to bed. Asidefrom this, we do not impose any additional requirementon the usage of the smartphone. For example, we do notrequire them to carry the smartphone all the time duringthe day or attach the smartphone to some special partsof the body.

It is worth noting that some of the eight users arealready friends before experiments but some of themare not. In fact, some strangers within the group becomefriends afterwards. However, strangers living far awayfrom each other do not become friends although theychoose each other as a friend at the friend recommenda-tion phase. This also motivates the usage of GPS infor-mation into the system to improve the recommendationaccuracy.

TABLE 1: Profession of users

User ID 1 2 3 4 5 6 7 8Student

√ √ √ √ √

Waitress√

Office Worker√

Businessman√

In the following, we first present the activity classifi-cation results using K-means clustering algorithm, andthen show the performance evaluation on real experi-ments.

8.1.1 Activity ClassificationFigure 8 shows the classification results using the K-means clustering algorithm on the data collected fromthe 8 users for a period of three months. Featurevectors instead of raw data are used for classifica-tion. Each feature vector consists of 7 attributes, f =[tc, accx, accy, accz, gyrx, gyry, gyrz] (see Section 4.2), andwe extract feature vectors every 60 seconds. As shownin Figure 8(a), the sum of squared error drops quicklywhen the cluster number K increases from 1 to 10, and




0 5 10 15 20 25 300

2000

4000

6000

8000

10000

12000

14000

Cluster Number

Sum

of S

quar

ed E

rror

(a) Classification results

1 2 3 4 5 6 7 8 9 10 11 12 13 14 150

0.02

0.04

0.06

0.08

0.1

0.12

0.14

Activities ID

Act

ivity

pro

babi

litie

s

(b) Activity distribution

Fig. 8: Classification performance using the K-means clustering.

then does not change too much after 15. This implies thatthe daily lives of the volunteers are roughly composedof 15 different types of activities. Although we can useadditional activities to characterize their daily lives ata finer scale, we find 15 an appropriate compromiseas the most important activities are already involved.Other activities occur rarely and cannot considerablyaffect the friend recommendation results. Therefore, weuse K = 15 as the number of activities in Friendbook.

Figure 8(b) demonstrates the distribution of 15 activ-ities. Most of activities have the probability of about6%, and three activities have the probability of largerthan 10%, and the remaining three activities have theprobability of less than 2%. Corresponding to each ac-tivity, a centroid feature vector is calculated by usingthe K-means clustering algorithm. These 15 centroids aredistributed to each smartphone so that it can performreal-time onboard activity recognition.

8.1.2 Friend Recommendation ResultsThere are four free parameters used to generate thefriend recommendation results, including the similaritythreshold for friend-matching graph Sthr (Definition 2),the threshold λ (Eq. 6) that controls the number of domi-nant life styles, the damping factor ϕ that emphasizes theimportance of the friend matching graph (Eq. 13), andthe number of life styles. In our practical experiments,

(a) (b) (c)

Fig. 9: User interfaces: (a) query-recommendation interface; (b) connec-tion interface; (c) rating interface.

User ID

Use

r ID

1 2 3 4 5 6 7 8

1

2

3

4

5

6

7

8 1

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

Fig. 10: The gray image representation of the eight users’ similarity.

we have used the following values as default throughempirical studies, i.e., the similarity threshold Sthr is setto 0.5, the threshold λ is set to 0.8, the damping factor ϕis set to 0.85, and the number of life styles is set to 10.

Figure 9 shows several user interfaces of Friendbook.Figure 9(a) shows a snapshot of the query and recom-mendation user interface. The IDs and recommendationscores of recommended friends are shown in the list.Note that Friendbook returns the ID of users insteadof their real names due to privacy concerns in ourexperiments. Figure 9(b) and 9(c) show the snapshotsof user feedback interfaces. Users can connect to peoplein the recommended friend list through our system andalso give a score on the recommended friends. Note thatwe intentionally anonymize the personal informationin Figure 9 to protect the privacy of subjects. In thereal system, when a user wants to use the system,he/she will be encouraged to complete his/her personalprofile, e.g., name and photo. Therefore, the name andphoto information as well as the similarity score of eachrecommended friend will be shown to the user.

Figure 10 illustrates the gray-scale image representa-tion of the similarity matrix for 8 users. The blocks inthe diagonal has the darkest color because users alwayshave the perfect match with themselves. As shown inFigure 10, user 1 has strong relationship with user 2and user 5, user 3 has strong relationship with user 7,




user 6 has relationship with the aforementioned usersbut not very strong, while user 4 and user 8 have norelationship with others at all. The result is consistentwith the ground truth of professions shown in Table 1because people have the same profession usually havethe same life styles.

TABLE 2: User impact ranking of 8 users.

Rank User ID Rank Score1 1 0.1332 7 0.1273 4 0.1254 8 0.1255 5 0.1246 2 0.1237 6 0.1238 3 0.118

The user impact rankings of the 8 users are shownin Table 2. The top ranks are users 1 and 7, followedby users 4 and 8 who seem to have high impact ranks.However, users 4 and 8 are not supposed to be higherthan others because they have no connections. Indeed,because of this, they should always maintain the initialscore. Since we only have 8 users in the system, eachof whom uses 1

8 = 0.125 as its initial random impact,as described in Algorithm 1, which results in that theirrankings are even higher than some of the connectedusers. If we have 1000 users, they should only have 0.001for their final score, thus they would not have so highrank. Although the independent users are not avoidable,we only recommend a small portion of top p friends.The chance that an independent user ranks in top p issmall when the total number of users is large. Therefore,the recommendation quality should not be affected muchfrom independent users.

8.2 Evaluation using Simulated DataWe perform simulations to further evaluate performanceof Friendbook when the scale of the system is large. Ourfriend recommendation method is based on life stylesextracted from sensors on users’ smartphones, whichis quite different from existing friend recommendationmethods. To the best of our knowledge, there is no realdata set that can be used for a large-scale performanceevaluation.

In our simulation, we randomly and independentlygenerate the life style vectors for 1,000 users, Lk =[p(z1|dk), p(z2|dk), · · · , p(z10|dk)], k = 1, · · · , 1000. Notethat since the life style vector contains the probabilityof each life style that sum to 1, each entry of the lifestyle vector is randomly generated between 0 and 1,and normalized to guarantee the sum of the values inthe vector is equal to 1. For each user, the similaritiesbetween itself and all the other users can be calculatedbased on the similarity metric in Eq. 4 and the 100 mostsimilar users are chosen as its true friends, denoted as

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

10

20

30

40

50

60

70

80

90

100

Similarity thereshold

Rec

omm

enda

tion

reca

ll (%

)

beta=0.1beta=0.3beta=0.6beta=0.9

Fig. 11: Evaluation on similarity threshold. The number of returnedrecommended friends is 100.

Gi for user i. After the life styles are uploaded to thesystem, a friend-matching graph can be constructed andeach user has an impact. We then let each user query thesystem and obtain its friend recommendation results. LetFi denote the set of recommended friends. The followingmeasurement metrics are used for performance evalua-tion.• Recommendation precision Rp: the average of the ratio

of the number of recommended friends in the set oftrue friends of the query user over the total numberof recommended friends.

Rp =

∑i |Fi ∩Gi|/|Fi|

1000

where | · | denotes the number of elements in a set·.The dominator is 1000 because Rp is the average of1000 users in one experiment.

• Recommendation recall Rr: the average of the ratio ofthe number of recommended friends in the set oftrue friends of the query user over the number ofthe set of true friends of the query user, which is100 in our experiments.

Rr =

∑i |Fi ∩Gi|/|Gi|

1000=

∑i |Fi ∩Gi|/100

1000

Using these performance metrics, we study the effectof a few parameters, such as the similarity threshold,Sthr, for friend-matching graph construction, and therecommendation coefficient, β, for balancing users’ pref-erence on impact and similarity. For all the simulationresults presented in this paper, each data point is anaverage of 100 experiments.

8.2.1 Effect of Similarity Threshold Sthr

We first evaluate the effect of the similarity thresholdSthr which is important for friend-matching graph con-struction. When Sthr is too small, almost all users inthe system are connected, which increases the overheadfor graph construction and maintenance as well as theranking process. While if Sthr is too large, the friend-matching graph cannot reflect the true relationshipsbetween users since those with high similarity may notbe connected with each other. Therefore, the value of




100 200 300 400 500 600 700 800 900 10000

10

20

30

40

50

60

70

80

90

100

# of recommended friends

Rec

omm

enda

tion

reca

ll (%

)

beta=0beta=0.3beta=0.6beta=1.0

(a) Recommendation recall

100 200 300 400 500 600 700 800 900 10000

10

20

30

40

50

60

70

80

90

100

# of recommended friends

Rec

omm

enda

tion

prec

isio

n (%

)

beta=0beta=0.3beta=0.6beta=1.0

(b) Recommendation Precision

Fig. 12: Evaluation on recommendation coefficient β.

the similarity threshold for the friend-matching graphconstruction should be carefully selected.

Figure 11 shows the effect of the similarity thresholdon the recommendation recall. We only consider thecase that 100 friends are recommended. In this case, therecommendation precision equals the recommendationrecall. We first observe that the recommendation recalldrops when Sthr increases. The reason is that links withsmall similarities are gradually removed from the graphas Sthr increases. We also notice that the influence of Sthr

is more obvious for smaller β. This is because removinglinks affects the impact rankings of users and meanwhilesmaller β relies more on the impact rankings. In ourfollowing experiments, we choose Sthr = 0.5 when weevaluate other parameters.

8.2.2 Effect of recommendation coefficient β

β is introduced to characterize users’ preference onimpact and similarity. Figure 12(a) and 12(b) shows theevaluation results of β on recommendation recall andprecision, respectively.

When β = 0, the recommendation recall is relativelylow which is just a little better than random recommen-dation. This is because the recommendation only relieson the impact rankings of users rather than similaritiesbetween users with the query user. The recommendationrecall increases as β increases because similarity is moreand more emphasized. When β reaches 1, all friends in

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 17.6

7.8

8

8.2

8.4

8.6

8.8

9

Recommendation coefficient β

Com

prom

ise

scor

e B

Fig. 13: The impact of β on B.

a query user’s group are recommended. Note that evenwhen β is not too large (e.g., β = 0.3), the recommenda-tion recall is still high and reaches 100% quickly as thenumber of recommended friends increases. As shown inFigure 12, the recommendation precision decreases whenthe number of recommendation friends increases andfinally reaches 10% when 1000 users are recommended.This is because we only choose 100 users as the set oftrue friends for each user. The recommendation precisionalso increases when β increases because similarity playsmore and more important roles as β increases.

It is obvious that our recommendation is much betterthan random recommendation which is usually 10% onaverage. However, note that we cannot conclude thatthe larger β the better, as shown in Figure 12. Becauseusers may prefer impact to similarity and our metricscannot reflect the recommendation accuracy on impact.To better characterize the recommendation results, wedefine a new metric, called compromise score, as follows.

B =

100∑j=1

S(i, ij)rij

where B is the compromise score and ij is the jth recom-mended friend for the query user i. In this experiment,100 friends are recommended for each query user. Figure13 shows the impact of β on the metric. As we mentionedbefore, each data point is an average of 100 experiments.We can see that the metric achieves its maximum when βis around 0.3. Although the recommended friends havehigh impact when β is small, the similarity betweenthem and the query user is low. In contrast, the similaritybetween the recommended friends and the query user ishigh, but their popularity is low. Therefore, in order towell balance the similarity and popularity, β should becarefully selected. Our system leaves the setting of β tousers, so they can find friends based on their preferences.

8.3 Resource ConsumptionFinally, we evaluate the energy consumption perfor-mance of the Friendbook client application. Usually, theuser will not have the incentive to use the applicationif the battery runs out in less than 10 hours, which is




0 500 1000 1500 20000

10

20

30

40

50

60

70

80

90

100

Time (minutes)

Ene

rgy

leve

l (%

)

Friendbook OffFriendbook On

Fig. 14: Energy consumption comparison.

the typical hour of usage for a day (the user can re-charge the battery during the night). Therefore, energyconsumption is another important metric that has tobe measured. We test the energy consumption of thesame smartphone under two modes: idle mode withFriendbook off and active mode with Friendbook on.Either mode is under a user’s normal use such asmaking phone calls, checking emails, sending SMS, etc.As shown in Figure 14, Friendbook drops the batteryto 15% in about 13 hours. The evaluation shows thatFriendbook achieves satisfactory results on the energyperformance.

TABLE 3: Resources requirements comparison.

Aspect Friendbook Android GoogleClient Service Maps

Size 76 KB − 12 MBCPU usage <1% 13% <1%Data usage <10KB 1.95MB 526KB

Runtime memory 4.2MB 33MB 6.9MB

In addition to energy, as shown in Table 3, we mon-itor the runtime resources that our client applicationconsumes comparing to other Android system servicesand applications. It is easily observed that the clientapplication is small in size while the runtime memoryusage is comparable to some popular applications, suchas Google Maps. In addition, the wireless data usage islow so that the user does not need to worry about thedata plan of their mobile phone. Moreover, even whenwe turn on GPS, accelerometer and gyroscope sensors allthe time, the total CPU usage of the client application isstill very low on average.

9 CONCLUSION AND FUTURE WORK

In this paper, we presented the design and implementa-tion of Friendbook, a semantic-based friend recommen-dation system for social networks. Different from thefriend recommendation mechanisms relying on socialgraphs in existing social networking services, Friend-book extracted life styles from user-centric data collectedfrom sensors on the smartphone and recommended po-tential friends to users if they share similar life styles. We

implemented Friendbook on the Android-based smart-phones, and evaluated its performance on both small-scale experiments and large-scale simulations. The re-sults showed that the recommendations accurately re-flect the preferences of users in choosing friends.

Beyond the current prototype, the future work can befour-fold. First, we would like to evaluate our systemon large-scale field experiments. Second, we intend toimplement the life style extraction using LDA and theiterative matrix-vector multiplication method in userimpact ranking incrementally, so that Friendbook wouldbe scalable to large-scale systems. Third, the similaritythreshold used for the friend-matching graph is fixed inour current prototype of Friendbook. It would be inter-esting to explore the adaption of the threshold for eachedge and see whether it can better represent the similar-ity relationship on the friend-matching graph. At last, weplan to incorporate more sensors on the mobile phonesinto the system and also utilize the information fromwearable equipments (e.g., Fitbit, iwatch, Google glass,Nike+, and Galaxy Gear) to discover more interestingand meaningful life styles. For example, we can incor-porate the sensor data source from Fitbit, which extractsthe user’s daily fitness infograph, and the user’s placeof interests from GPS traces to generate an infographof the user as a “document”. From the infograph, onecan easily visualize a user’s life style which will makemore sense on the recommendation. Actually, we expectto incorporate Friendbook into existing social services(e.g., Facebook, Twitter, LinkedIn) so that Friendbookcan utilize more information for life discovery, whichshould improve the recommendation experience in thefuture.

ACKNOWLEDGEMENTS

The authors would like to thank the anonymous review-ers whose insightful comments have helped improvethe presentation of this paper significantly. This workwas supported in part by NSF CNS-1017156 and CNS-0953238, and in part by NSFC 61273079 and NSFC11174316.

REFERENCES

[1] Amazon. http://www.amazon.com/.[2] Facebook statistics. http://www.digitalbuzzblog.com/

facebook-statistics-stats-facts-2011/.[3] Netfix. https://signup.netflix.com/.[4] Rotten tomatoes. http://www.rottentomatoes.com/.[5] G. R. Arce. Nonlinear Signal Processing: A Statistical Approach.

John Wiley & Sons, 2005.[6] B. Bahmani, A. Chowdhury, and A. Goel. Fast incremental and

personalized pagerank. Proc. of VLDB Endowment, volume 4, pages173-184, 2010.

[7] J. Biagioni, T. Gerlich, T. Merrifield, and J. Eriksson. EasyTracker:Automatic Transit Tracking, Mapping, and Arrival Time PredictionUsing Smartphones. Proc. of SenSys, pages 68-81, 2011.

[8] L. Bian and H. Holtzman. Online friend recommendation throughpersonality matching and collaborative filtering. Proc. of UBI-COMM, pages 230-235, 2011.

[9] C. M. Bishop. Pattern recognition and machine learning. SpringerNew York, 2006.




[10] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet Allocation.Journal of Machine Learning Research, 3:993-1022, 2003.

[11] P. Desikan, N. Pathak, J. Srivastava, and V. Kumar. Incrementalpage rank computation on evolving graphs. Proc. of WWW, pages1094-1095, 2005.

[12] N. Eagle and A. S. Pentland. Reality Mining: Sensing Complex Co-cial Systems. Personal Ubiquitous Computing, 10(4):255-268, March2006.

[13] K. Farrahi and D. Gatica-Perez. Probabilistic mining of socio-geographic routines from mobile phone data. Selected Topics inSignal Processing, IEEE Journal of, 4(4):746-755, 2010.

[14] K. Farrahi and D. Gatica-Perez. Discovering Routines fromLargescale Human Locations using Probabilistic Topic Models.ACM Transactions on Intelligent Systems and Technology (TIST), 2(1),2011.

[15] B. A. Frigyik, A. Kapila, and M. R. Gupta. Introduction to thedirichlet distribution and related processes. Department of ElectricalEngineering, University of Washignton, UWEETR-2010-0006, 2010.

[16] A. Giddens. Modernity and Self-identity: Self and Society in thelate Modern Age. Stanford Univ Pr, 1991.

[17] L. Gou, F. You, J. Guo, L. Wu, and X. L. Zhang. Sfviz: Interestbasedfriends exploration and recommendation in social networks. Proc.of VINCI, page 15, 2011.

[18] W. H. Hsu, A. King, M. Paradesi, T. Pydimarri, and T. Weninger.Collaborative and structural recommendation of friends usingweblog-based social network analysis. Proc. of AAAI Spring Sym-posium Series, 2006.

[19] T. Huynh, M. Fritz, and B. Schiel. Discovery of Activity Patternsusing Topic Models. Proc. of UbiComp, 2008.

[20] J. Kwon and S. Kim. Friend recommendation method usingphysical and social context. International Journal of Computer Scienceand Network Security, 10(11):116-120, 2010.

[21] J. Lester, T. Choudhury, N. Kern, G. Borriello, and B. Hannaford.A Hybrid Discriminative/Generative Approach for Modeling Hu-man Activities. Proc. of IJCAI, pages 766-772, 2005.

[22] Q. Li, J. A. Stankovic, M. A. Hanson, A. T. Barth, J. Lach, andG. Zhou. Accurate, Fast Fall Detection Using Gyroscopes andAccelerometer-Derived Posture Information. Proc. of BSN, pages138-143, 2009.

[23] E. Miluzzo, C. T. Cornelius, A. Ramaswamy, T. Choudhury, Z.Liu, and A. T. Campbell. Darwin Phones: the Evolution of Sensingand Inference on Mobile Phones. Proc. of MobiSys, pages 5-20, 2010.

[24] E. Miluzzo, N. D. Lane, S. B. Eisenman, and A. T. Campbell.Cenceme-Injecting Sensing Presence into Social Networking Ap-plications. Proc. of EuroSSC, pages 1-28, October 2007.

[25] L. Page, S. Brin, R. Motwani, and T. Winograd. The PagerankCitation Ranking: Bringing Order to the Web. Technical Report,Stanford InfoLab, 1999.

[26] S. Reddy, M. Mun, J. Burke, D. Estrin, M. Hansen, and M. Srivas-tava. Using Mobile Phones to Determine Transportation Modes.ACM Transactions on Sensor Networks (TOSN), 6(2):13, 2010.

[27] I. Ropke. The Dynamics of Willingness to Consume. EcologicalEconomics, 28(3):399-420, 1999.

[28] A. D. Sarma, A. R. Molla, G. Pandurangan, and E. Upfal. Fastdistributed pagerank computation. Springer Berlin Heidelberg, pages11-26, 2013.

[29] G. Spaargaren and B. Van Vliet. Lifestyles, Consumption andthe Environment: The Ecological Modernization of Domestic Con-sumption. Environmental Politics, 9(1):50-76, 2000.

[30] M. Tomlinson. Lifestyle and Social Class. European SociologicalReview, 19(1):97-111, 2003.

[31] Z. Wang, C. E. Taylor, Q. Cao, H. Qi, and Z. Wang. Demo:Friendbook: Privacy Preserving Friend Matching based on SharedInterests. Proc. of ACM SenSys, pages 397-398, 2011.

[32] X. Yu, A. Pan, L.-A. Tang, Z. Li, and J. Han. Geo-friends rec-ommendation in gps-based cyber-physical social network. Proc. ofASONAM, pages 361-368, 2011.

[33] Y. Zheng, Y. Chen, Q. Li, X. Xie, and W.-Y. Ma. UnderstandingTransportation Modes Based on GPS Data for Web Applications.ACM Transactions on the Web (TWEB), 4(1):1-36, 2010.

Zhibo Wang received the B.E. degree in Au-tomation from Zhejiang University, China, in2007. He is currently pursuing the Ph.D. de-gree in Electrical Engineering and ComputerScience at University of Tennessee, Knoxville.His research interests include wireless sensornetworks, mobile sensing systems and cyberphysical systems. He is a student member ofIEEE.

Jilong Liao received his B.E. degree from Uni-versity of Electronic Science and Technology,China, in 2010. He received his Master ofScience degree from University of Tennessee,Knoxville in 2013. His major research inter-ests included mobile system, data analysis, dis-tributed system and wireless sensor networks.He is currently working in Microsoft’s operatingsystem group (OSG) as a software developmentengineer.

Qing Cao received his Ph.D. degree from theUniversity of Illinois in 2008, his M.S. degreefrom the University of Virginia, and his B.S.degree from Fudan University, China. He is cur-rently an assistant professor in the Departmentof Electrical Engineering and Computer Scienceat the University of Tennessee. His researchinterests include wireless sensor networks, em-bedded systems, and distributed networks. Heis a member of ACM, IEEE, and the IEEE Com-puter Society.

Hairong Qi received the B.S. and M.S. degreesin computer science from Northern JiaoTongUniversity, Beijing, China in 1992 and 1995 re-spectively, and the Ph.D. degree in computer en-gineering from North Carolina State University,Raleigh, in 1999. She is currently a Professorwith the Department of Electrical Engineeringand Computer Science at the University of Ten-nessee, Knoxville. Her current research interestsare in advanced imaging and collaborative pro-cessing in resource-constrained distributed envi-

ronment, hyperspectral image analysis, and bioinformatics. Dr. Qi is therecipient of the NSF CAREER Award. She also received the Best PaperAward at the 18th International Conference on Pattern Recognitionand the 3rd ACM/IEEE International Conference on Distributed SmartCameras. She recently receives the Highest Impact Paper from the IEEEGeoscience and Remote Sensing Society.

Zhi Wang received the B.S. degree fromShenyang Jian Zhu University, China, in 1991,the M.S. degree from Southeast University,China, in 1997, and the Ph.D. degree fromShenyang Institute of Automation, the ChineseAcademy of Sciences, China, in 2000. During2000-2002as Post-doc, he has conducted re-search in Institut National Polytechique de Lor-raine, France and Zhejiang University, China re-spectively. During 2010-2011, he has conductedresearch as advanced scholar at UW-Madion,

USA. He is currently an Associate Professor in the Department ofControl Science and Engineering of Zhejiang University. His researchinterests include wireless sensor networks, visual sensor networks,industrial communication and systems and networked control systems.He is a member of IEEE and ACM.

IEEE TRANSACTIONS ON MOBILE COMPUTING 1 ... A Semantic-based Friend Recommendation System for Social...

Documents

Transcript of IEEE TRANSACTIONS ON MOBILE COMPUTING 1 ... A Semantic-based Friend Recommendation System for Social...