

A Survey on Privacy and Security in Online Social Networks

Imrul Kayes, University of South Florida
Adriana Iamnitchi, University of South Florida

Online Social Networks (OSNs) are a permanent presence in today’s personal and professional lives of a huge segment of the population, with direct consequences to offline activities. Built on a foundation of trust (users connect to other users with common interests or overlapping personal trajectories), online social networks and the associated applications extract an unprecedented volume of personal information. Unsurprisingly, serious privacy and security risks emerged, positioning themselves along two main types of attacks: attacks that exploit the implicit trust embedded in declared social relationships; and attacks that harvest users’ personal information for ill-intended use. This article provides an overview of the privacy and security issues that have emerged so far in OSNs. We introduce a taxonomy of privacy and security attacks in OSNs, we overview existing solutions to mitigate those attacks, and we outline challenges still to overcome.

Categories and Subject Descriptors: H.5.2 [Computer-Communication Networks] Distributed Systems, Distributed applications

General Terms: Privacy and Security

Additional Key Words and Phrases: Privacy, Security, Online social networks

1. INTRODUCTION

Online Social Networks (OSNs) have become a mainstream cultural phenomenon for millions of Internet users. Combining user-constructed profiles with communication mechanisms that enable users to be pseudo-permanently “in touch”, OSNs leverage users’ real-world social relationships and further blend our online and offline lives. As of 2014, Facebook had 1.32 billion monthly active users and was the second most visited site on the Internet [Alexa, 2014]. Twitter, a social micro-blogging platform, claims over 500 million users [Holt, 2013]. According to Nielsen’s 2012 survey, social networking was the fourth most popular online activity [Nielsen, 2012].

Perhaps more than previous types of online applications, OSNs are blending into real life: companies are mining trends on Facebook and Twitter to create viral content for shares and likes; employers are checking Facebook, LinkedIn and Twitter profiles of job candidates¹; law enforcement organizations are gleaning evidence from OSNs to solve crimes²; activities on online social platforms change political regimes [Lotan et al., 2011] and swing election results³.

Because users in OSNs are typically connected to friends, family, and acquaintances, a common perception is that OSNs provide a more secure, private and trusted Internet-mediated environment for online interaction [Cutillo et al., 2009b]. In reality, however, OSNs have raised the stakes for privacy protection because of the availability of an astonishing amount of personal user data that would not have been exposed otherwise. More importantly, OSNs now expose information from multiple social spheres (for example, personal information on Facebook and professional activity on LinkedIn) that, aggregated, leads to uncomfortably detailed profiles [Nissenbaum, 2011].

¹ http://goo.gl/kHJFI5
² http://www.cnn.com/2012/08/30/tech/social-media/fighting-crime-social-media/
³ http://goo.gl/9A6FR

ACM Computing Surveys Template.

arXiv:1504.03342v1 [cs.SI] 27 Jan 2015


Unwanted disclosure of user information, combined with the OSN-induced blur between the professional and personal aspects of user lives, allows for incidents with dire consequences. The news media covered some of these, such as the case of a teacher suspended for posting gun photos [Dam, 2009] or an employee fired for commenting on her salary compared with that of her boss [Mail, 2011], both on Facebook. On top of this, social networks themselves intentionally (e.g., the Facebook Beacon controversy [Dwyer, 2011]) or unintentionally (e.g., published anonymized social data used for de-anonymization and inference attacks [Narayanan et al., 2011]) contribute to breaches of user privacy. Moreover, the high volume of personal data, either disclosed by the technologically-challenged average user or exposed due to OSNs’ failure to provide sophisticated privacy tools, has attracted a variety of organizations (e.g., GNIP⁴, 80legs⁵) that aggregate and sell users’ social network data. In addition, the trusted nature of OSN relationships has become an effective mechanism for spreading spam, malware and phishing attacks. Malicious entities are launching a wide range of attacks by creating fake profiles, using stolen OSN account credentials sold in the underground market [Staff, 2010] or deploying automated social robots [Wagner et al., 2012].

This paper provides a comprehensive review of solutions to privacy and security issues in OSNs. While previous literature reviews on OSN privacy and security are focused on specific topics, such as privacy-preserving social data publishing techniques [Zheleva and Getoor, 2011], social graph-based techniques for mitigating Sybil attacks [Yu, 2011], or OSN design issues for security and privacy requirements [Zhang et al., 2010], we address a larger spectrum of security and privacy problems and solutions. First, we introduce a taxonomy of attacks based on OSNs’ stakeholders. We broadly categorize attacks as attacks on users and attacks on the OSN and then refine our taxonomy based on the entities that perform the attacks. These entities might be human (e.g., other users), computer programs (e.g., social applications) or organizations (e.g., crawling companies). Second, we present how various attacks are performed, what counter-measures are available, and what challenges are still to be overcome.

2. A TAXONOMY OF PRIVACY AND SECURITY ATTACKS IN ONLINE SOCIAL NETWORKS

We propose a taxonomy of privacy and security attacks in online social networks based on the stakeholders of the OSN and the forms of attack targeted at the stakeholders. We identify two stakeholders in online social networks: the OSN users and the OSN itself. On one hand, OSN users share an astonishing amount of information ranging from personal to professional; the misuse of this information can have significant consequences. On the other hand, OSN services handle users’ information and manage all users’ activities in the network, being responsible for the correct functioning of their services and for maintaining a profitable business model. Indirectly, this translates into ensuring that their users continue to happily use their services without becoming victims of malicious actions.

The distinction we make between OSN users and the OSN itself as stakeholders is defined by scale: isolated attacks on users may not affect the wellbeing of the OSN. However, a large attack on the user population may translate into reputation damage, service disruption, or other consequences with direct effect on the OSN.

We thus classify online social network privacy and security issues into the following attack categories (summarized in Table I).

(1) Attacks on Users: these attacks are isolated, targeting a small population of random or specific users. We identify various such attacks based on the attacker:

⁴ http://gnip.com/
⁵ http://80legs.com/



(a) Attacks from other users: Users might put themselves at risk by interacting with other users, especially when some of them are strangers or mere acquaintances. Moreover, some of these users may not even be human (e.g., social robots [Hwang et al., 2012]), or may be crowdsourcing workers strolling and interacting with users for mischievous purposes [Stringhini et al., 2013]. Therefore, the challenge is to protect users and their information from other users.

(b) Attacks from social applications: For enhanced functionality, users may interact with various third-party-provided social applications linked to their profiles. To facilitate the interaction between OSN users and these external applications, the OSN provides application developers an interface through which to access user information. Unfortunately, OSNs put users at risk by disclosing more information than necessary to these applications. Malicious applications can collect and use users’ private data for undesirable purposes [Felt and Evans, 2008].

(c) Attacks from the OSN: Users’ interactions with other users and social applications are facilitated by the OSN services, typically in exchange for full control over the users’ information published on the OSN. While this exchange is explicitly stated in Terms of Service documents that the user must agree with (and supposedly read first), in reality few users understand the extent of this exchange [Fiesler and Bruckman, 2014] and most users do not have a real choice if they do not agree with it. Consequently, the exploitation by the OSN of users’ personal information is seen as a breach of trust, and many solutions have been proposed to hide personal information from the very service that stores it.

(d) De-anonymization and inference attacks: OSN services publish social data for others (e.g., researchers, advertisers) to analyze and use for other purposes. Typically, this data is anonymized to protect user information. However, an attacker can de-anonymize social data and infer attributes that the user did not even mention in the OSN (such as sexual or political orientation inferred from the association with other users).

(2) Attacks on the OSN: these attacks are aimed at the service provider itself, by threatening its core business.

(a) Sybil Attacks: Sybil attacks are characterized by users assuming multiple identities to manipulate the outcome of a service [Douceur, 2002]. Not specific to OSNs, Sybil attacks were used, for example, to determine the outcome of electronic voting [Riley, 2007], to artificially boost the popularity of some media [Ratkiewicz et al., 2011], or to manipulate social search results [Jurek, 2011]. However, OSNs have also become vulnerable to Sybil attacks: by controlling many accounts, Sybil users are illegitimately increasing their influence and power in the OSNs [Yu et al., 2006].

(b) Crawling attacks: Large-scale distributed data crawlers from professional data aggregators exploit the OSN-provided APIs or scrape publicly viewable profile pages to build databases from user profiles and social links. Professional data aggregators sell such databases to insurance companies, background-check agencies, credit-ratings agencies, or others [Bonneau et al., 2009]. Crawling users’ data from multiple sites and multiple domains increases profiling accuracy. This profiling might lead to “public surveillance”, where an overly curious agency (e.g., government) could monitor individuals in public through a variety of media [Nissenbaum, 2004].

(c) Social Spam: Social spam is content or profiles that an OSN’s “legitimate” users do not wish to receive [Heymann et al., 2007]. Spam undermines resource sharing and hampers interactivity among users through phishing attacks, unwanted commercial messages, and website promotion. Social spam spreads rapidly via OSNs due to the embedded trust relationships among online friends, which motivate a user to read messages or even click on links shared by her friends.



Attacks on Users                          | Attacks on the OSN
------------------------------------------+----------------------------------------------
Attacks from other users                  | Sybil attacks
Attacks from social applications          | Crawling attacks
Attacks from the OSN                      | Social spam
De-anonymization and inference attacks    | Distributed Denial-of-service attacks (DDoS)
                                          | Malware attacks

Table I: Categories of attacks.

(d) Distributed Denial-of-service attacks (DDoS): DDoS attacks are a common form of attack in which a service is sent a large number of seemingly inoffensive requests that overload the service and deny access to it [Mirkovic et al., 2004]. Like many popular services, OSNs are also subjected to such coordinated, distributed attacks.

(e) Malware Attacks: Malware is the collective name for programs that gain access to a computer, disrupt its operation, gather sensitive information, or damage it without the knowledge of the owner. OSNs are being exploited for propagating malware [Facebook, 2012]. Like social spam, malware propagation is rapid due to the trust relationships in social networks.

The rest of the paper is organized as follows. Mitigating attacks on users (Sections 3 to 6) includes discussions of attacks from other users (Section 3), from social applications (Section 4), from the OSN itself (Section 5), and de-anonymization and inference attacks (Section 6). Mitigating attacks on the OSN (Sections 7 to 11) includes discussions of Sybil attacks (Section 7), crawling attacks (Section 8), social spam (Section 9), distributed denial-of-service attacks (Section 10) and malware (Section 11). Finally, we conclude the paper in Section 12.

3. MITIGATING ATTACKS FROM OTHER USERS

Users reveal an astonishing amount of personally identifiable information on OSNs, including physical, psychological, cultural and preferential attributes. For example, Gross and Acquisti’s study [Gross and Acquisti, 2005a] shows that 90.8% of Facebook profiles have an image, 87.8% of profiles have posted their birth date, 39.9% have revealed a phone number, and 50.8% of profiles show their current residence. The study also shows that the majority of users reveal their political views, dating preferences, current relationship status, and various interests (including music, books, and movies).

Due to the diversity and specificity of the personal information shared on OSNs, users put themselves at risk for a variety of cyber and physical attacks. Stalking, for example, is a common risk associated with unprotected location information⁶. Demographic re-identification was shown to be feasible: 87% of the US population can be uniquely identified by gender, ZIP code and full date of birth [Sweeney, 2000]. Moreover, the birth date, hometown, and current residence posted on a user’s profile are enough to estimate the user’s social security number and thus expose the user to identity theft [Gross and Acquisti, 2005a]. Unintended revealing of personal information brings other online risks, including scraping and harvesting [Lindamood et al., 2009; Strufe, 2010], social phishing [Jagatic et al., 2007], and automated social engineering [Bilge et al., 2009].
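The re-identification risk described above can be illustrated with a minimal sketch that measures how many records in a dataset are unique under a chosen set of quasi-identifiers. The function name `unique_fraction` and the toy records are assumptions made for illustration; they are not data or code from [Sweeney, 2000].

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Return the fraction of records whose quasi-identifier values are unique."""
    if not records:
        return 0.0
    # Project every record onto the chosen quasi-identifier attributes.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is re-identifiable if nobody else shares its combination.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical toy records (illustrative only, not real data).
records = [
    {"gender": "F", "zip": "33620", "dob": "1990-01-02"},
    {"gender": "F", "zip": "33620", "dob": "1990-01-02"},
    {"gender": "M", "zip": "33620", "dob": "1985-07-30"},
    {"gender": "F", "zip": "33613", "dob": "1990-01-02"},
]

all_three = unique_fraction(records, ["gender", "zip", "dob"])
gender_only = unique_fraction(records, ["gender"])
```

In this toy dataset, combining the three attributes singles out more records than gender alone, which is the effect that makes seemingly harmless profile fields risky when published together.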

Given the amount of sensitive information users expose on OSNs and the different types of relationships in their online social circles, the challenge OSNs face is to provide the correct tools for users to protect their own information from others while taking full advantage of the benefits of information sharing. This challenge translates into a need for fine-grained settings that allow flexibility within a type of relationship (as not all friends are equal [Banks and Wu, 2009; Cummings et al., 2002]) and flexibility with the diversity of personal data. However, this fine granularity in classifying bits of personal information and social relationships leads to an overwhelmingly complex cognitive task for the user. Such cognitive challenges worsen an already detrimental user tendency of ignoring settings altogether and blindly trusting the default privacy configurations that serve the OSN’s interests rather than the user’s.

⁶ http://www.theguardian.com/technology/2012/feb/01/social-media-smartphones-stalking

Solutions to these three challenges are reviewed in the remainder of this section. Section 3.1 surveys solutions that allow fine tuning in setting the protection of personal data. The complexity challenge is addressed in the literature on two planes: by providing a visual interface in support of the complex decision that the user has to make (Section 3.2) and by automating the privacy settings (Section 3.3). To address the problem of users not changing the platform’s default settings, researchers have proposed various solutions, presented in Section 3.4.

3.1 Fine-grained Privacy Settings

Fine-grained privacy advocates [Krishnamurthy and Wills, 2008; Simpson, 2008] argue that fine-grained privacy controls are crucial features for privacy management. Krishnamurthy and Wills [Krishnamurthy and Wills, 2008] introduce privacy “bits”, pieces of user information grouped together for setting privacy controls in OSNs. In particular, they categorize a user’s data into multiple pre-defined bits, namely thumbnail (e.g., user name and photo); greater profile (e.g., interests, relationships and others); list of friends; user-generated content (such as photos, videos, comments and links); and comments (e.g., status updates, comments, testimonials and tags about the user or user content). Users can share these bits with a wide range of pre-defined users, including friends, friends of friends, groups, and all. Current OSN services (e.g., Facebook and Google+) have implemented this idea by allowing users to create their own social circles and to define which pieces of information can be accessed by which circle.
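As a rough illustration of per-bit visibility, the sketch below maps each privacy bit to the widest audience allowed to see it. The `Audience` ordering, the setting values, and the `can_view` helper are assumptions for illustration; groups are left out because they do not fit a simple narrow-to-wide ordering.

```python
from enum import IntEnum

class Audience(IntEnum):
    # Ordered from the narrowest to the widest audience (an assumed ordering).
    FRIENDS = 1
    FRIENDS_OF_FRIENDS = 2
    ALL = 3

# Hypothetical settings for one user: the widest audience allowed per bit.
settings = {
    "thumbnail": Audience.ALL,
    "greater_profile": Audience.FRIENDS_OF_FRIENDS,
    "friend_list": Audience.FRIENDS,
    "user_generated_content": Audience.FRIENDS,
    "comments": Audience.FRIENDS,
}

def can_view(bit, viewer_audience, settings):
    """A viewer sees a bit if the viewer's class is within the bit's audience."""
    return viewer_audience <= settings[bit]

stranger_sees_thumbnail = can_view("thumbnail", Audience.ALL, settings)
fof_sees_friend_list = can_view("friend_list", Audience.FRIENDS_OF_FRIENDS, settings)
friend_sees_profile = can_view("greater_profile", Audience.FRIENDS, settings)
```

Keeping one setting per bit rather than per data item keeps the number of decisions small, at the cost of treating every item inside a bit identically.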

To help users navigate the amount of social information necessary for setting correct fine-grained privacy policies, researchers suggest various ways to model the social graph. One model is based on ontologies that exploit the inherent level of trust associated with a relationship definition to specify privacy settings. Kruk [Kruk, 2004] proposes Friend-of-a-Friend (FOAF)-Realm, an ontology-based access control mechanism that uses RDF to describe relations among users. The system uses a generic definition of relationships (“knows”) as a trust metric and generates rules that control a friend’s access to resources based on the degree of separation in the social network. Choi et al. [Choi et al., 2006] propose a more fine-grained approach, which considers named relationships (e.g., “worksWith”, “isFriendOf”, “knowsOf”) in modeling the social network and the access control. A more nuanced trust-related access control model is proposed by Carminati et al. [Carminati et al., 2006] based on relationship type, degree of separation, and a quantification of trust between users in the network.
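Access control based on the degree of separation can be sketched as a breadth-first search over a “knows” graph followed by a threshold check. This is a simplified illustration, not the FOAF-Realm implementation; the adjacency-list graph, the user names, and the `max_degree` parameter are assumptions.

```python
from collections import deque

def degree_of_separation(graph, source, target):
    """Return the number of hops from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

def may_access(graph, owner, requester, max_degree):
    """Grant access only to requesters within max_degree hops of the owner."""
    d = degree_of_separation(graph, owner, requester)
    return d is not None and d <= max_degree

# Hypothetical "knows" relationships stored as adjacency lists.
knows = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}

carol_allowed = may_access(knows, "alice", "carol", max_degree=2)
dave_allowed = may_access(knows, "alice", "dave", max_degree=2)
```

A model in the spirit of Carminati et al. would additionally attach a relationship type and a trust value to each edge and check those along the path; the sketch only covers the distance component.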

For more fine-grained ontology-based privacy settings, semantic rules have been used. Rule-based policies represent the social knowledge base in an ontology and define policies as Semantic Web Rule Language (SWRL) rules. SWRL⁷ is a language for the Semantic Web that can represent rules as well as logic. Researchers have used SWRL to express access control rules that are set by the users. Finally, authorization of access requests is provided by reasoning on the social knowledge base. Systems that leverage OWL and SWRL to provide a rule-based access control framework include [Elahi et al., 2008; Carminati et al., 2009; Masoumzadeh and Joshi, 2011]. Although conceptually similar, [Carminati et al., 2009] provides a richer OWL ontology and different types of policies: access control policy, admin policy and filtering policy. A more detailed semantic rule-based model is developed by Masoumzadeh and Joshi [Masoumzadeh and Joshi, 2011]. Rule-based privacy models have two challenges to overcome. First, authorization is provided by forward reasoning on the whole knowledge base, which challenges scalability as the knowledge base grows. Second, rule management is complex and requires a team of expert administrators [Engelmore, 1988].

⁷ http://www.w3.org/Submission/SWRL/

Role-based and Relationship-Based Access Control (ReBAC) models are other types of fine-grained privacy models that employ roles and relationships in modeling the social graph. The working principle of these models is two-fold: 1) track roles or relationships between the resource (e.g., photos) owner and the resource accessor; 2) enforce access control policies in terms of the roles or relationships. Fong [Fong, 2011b] proposes a ReBAC model based on the context-dependent nature of relationships in social networks. This model targets social networks that are poly-relational (e.g., teacher-student relationships are distinct from child-parent relationships) and directed (e.g., teacher-student relationships are distinct from student-teacher relationships), and it tracks multiple access contexts that are organized into a tree-shaped hierarchy. When access is requested in a context, the relationships from all the ancestor contexts are combined with the relationships in the target access context to construct a network on which authorization decisions are made. Giunchiglia et al. [Giunchiglia et al., 2008] propose RelBac, another relation-based access control model to support sharing of data among large groups of users. The model defines permissions as relations between users and data, thus separating them from roles. The entity-relationship model of RelBac enables the use of description logics as well as reasoning about access control policies.
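The context hierarchy described for Fong’s model can be sketched as follows: each context stores typed, directed relationships, and an authorization decision in a context is made on the union of its own relationships and those of its ancestors. This is a heavily simplified illustration under assumed names (`Context`, `authorize`) and an assumed single-edge policy; it is not the policy language of [Fong, 2011b].

```python
class Context:
    """An access context in a tree-shaped hierarchy (hypothetical structure)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.edges = set()  # directed, typed edges: (source, relation, target)

    def add(self, source, relation, target):
        self.edges.add((source, relation, target))

    def effective_edges(self):
        """Combine this context's edges with those of all ancestor contexts."""
        edges, ctx = set(), self
        while ctx is not None:
            edges |= ctx.edges
            ctx = ctx.parent
        return edges

def authorize(context, owner, accessor, required_relation):
    """Assumed policy: the owner must hold the required relation to the accessor."""
    return (owner, required_relation, accessor) in context.effective_edges()

# Hypothetical contexts and relationships.
university = Context("university")
university.add("dr_lee", "teacher-of", "sam")
course = Context("course", parent=university)
course.add("sam", "teammate-of", "kim")

inherited = authorize(course, "dr_lee", "sam", "teacher-of")
reversed_edge = authorize(course, "sam", "dr_lee", "teacher-of")
child_edge_in_parent = authorize(university, "sam", "kim", "teammate-of")
```

The three checks show the properties named in the text: relationships are inherited from ancestor contexts, they are directed, and relationships created in a child context do not leak upward.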

In practice, many online social networks (such as Facebook) have already implemented fine-grained controls. A study by Bonneau and Preibusch [Bonneau and Preibusch, 2010] of 29 general-purpose online social network sites shows that 13 of them offer a line-item setting where individual data items can be set with different visibility. These line-item settings are granular (one data item is one ‘bit’) and flexible (users can change social circles).

3.2 View-centric Privacy Settings

Lack of appropriate visual feedback has been identified as one of the reasons for confusing and time-consuming privacy settings [Strater and Lipford, 2008]. View-centric privacy solutions are built on the intuition that a better interface for setting privacy controls can improve users’ understanding of privacy settings and thus their success in correctly exercising privacy controls. These solutions visually inform the user of the setting choices and the consequences of those choices.

In [Lipford et al., 2008], the authors propose an alternative interface for Facebook privacy settings. This interface is a collection of tabbed pages, where each page shows a different view of the profile as seen by a particular audience (e.g., friends, friends of friends, etc.), along with controls for restricting the information shared with that group. While this solution gives a user visual feedback on how other users will see her profile, its management is tedious for users with many groups.

A simpler interface is proposed by C4PS (Colors for Privacy Settings) [Paul et al., 2012], which applies color coding for different privacy visibilities to minimize the cognitive overhead of the authorization task. This approach applies four colors for different groups of users: red, visible to nobody; blue, visible to selected friends; yellow, visible to all friends; and green, visible to everyone. A user can change the privacy setting for a specific data item by clicking the buttons on the edge of the attribute. The color of the buttons shows the visibility of the data. If the user clicks the “selected friends” (blue) button, a window opens in which friends or groups (a pre-defined set of friends) are granted access to the data item.

A similar approach is implemented in today’s most popular OSNs in different ways. For example, Facebook provides a dropdown of viewers (e.g., only me, friends, and public) with icons as visual feedback. In the custom setting, users can set more granular scales, e.g., share the data item with friends of friends or friends of those tagged, and restrict sharing with specific people or lists of people.


3.3 Automated Privacy Settings

Automated privacy settings methods employ machine learning to automatically configure a user’s privacy setting with minimal user effort.

Fang and LeFevre’s privacy wizard [Fang and LeFevre, 2010] iteratively asks a user about his privacy preferences (allow or deny) for specific (data item, friend) pairs. The wizard constructs a classifier from these preferences, which automatically assigns privileges to the rest of the user’s friends. The classifier considers two types of features: community structure (e.g., to which community a friend of the user belongs) and profile information (such as age, gender, relationship status, education, political and religious views, work history). The classifiers employed (NaiveBayes, NearestNeighbors and DecisionTree) use uncertainty sampling [Lewis and Gale, 1994], an active learning paradigm, acknowledging the fact that users may quit labeling friends at any time.
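The uncertainty sampling loop can be illustrated with a minimal sketch in which the wizard asks next about the friend whose predicted label is least certain. The sketch uses a small nearest-neighbor estimate as a stand-in classifier; the two-dimensional feature vectors, friend names, and the choice of k are assumptions, not details taken from [Fang and LeFevre, 2010].

```python
def predict_allow_probability(labeled, x, k=2):
    """Estimate P(allow) as the share of 'allow' labels among the k nearest friends."""
    nearest = sorted(
        labeled,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)

def most_uncertain(labeled, unlabeled, k=2):
    """Uncertainty sampling: choose the friend whose prediction is closest to 0.5."""
    return min(
        unlabeled,
        key=lambda name: abs(predict_allow_probability(labeled, unlabeled[name], k) - 0.5),
    )

# Hypothetical labeled friends: (feature vector, 1 = allow, 0 = deny).
labeled = [((0, 0), 1), ((0, 1), 1), ((5, 5), 0), ((5, 6), 0)]

# Hypothetical friends not yet labeled by the user.
unlabeled = {
    "near_family": (0, 0.5),
    "near_coworkers": (5, 5.5),
    "in_between": (2.5, 3),
}

next_question = most_uncertain(labeled, unlabeled)
```

Asking about the friend that sits between the two labeled clusters gives the classifier the most information per question, which matters because the user may stop answering at any time.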

Social Circles [Adu-Oppong et al., 2008] is an automated grouping technique that analyzes the user’s social graph to identify “social circles”, clusters of densely and closely connected friends. The authors posit social circles as uniform groups from the perspective of privacy settings. The assumption is that users will share the same information with all friends in a social circle. Hence, friends are automatically categorized into social circles for different circle-specific privacy policy settings. To find the social circles, they use the (α, β) clustering algorithm proposed in [Mishra et al., 2007]. While convenient, this approach limits users’ flexibility in changing the automated settings.

Danezis [Danezis, 2009] aims to infer the context within which user interactions happen, and enforces policies to prevent users that are outside that context from seeing the interaction. Conceptually similar to Social Circles, contexts are defined as cohesive groups of users, i.e., groups that have many links within the group and fewer links with non-members of the group. The author uses a greedy algorithm to extract the set of groups from a social graph.

An inherent tradeoff for this class of solutions is ease of use vs. flexibility: while the average user might be satisfied with an automatically generated privacy policy, the more savvy user will want more transparency and possibly more control. To this end, the privacy wizard [Fang and LeFevre, 2010] provides advanced users with a visualization of the decision tree model and tools to change it. Another challenge for some of these solutions is bootstrapping: a newcomer in the online social network has no history of interactions to inform such approaches.

3.4 Default Privacy Settings

Studies have shown that users on OSNs often do not take advantage of the privacy controls available. For example, more than 99% of Twitter users retained the default privacy setting where their name, list of followers, location, website, and biographical information are visible [Krishnamurthy et al., 2008]. Similarly, the majority of Facebook users keep the default settings [Gross and Acquisti, 2005b; Acquisti and Gross, 2006; Krishnamurthy and Wills, 2008]. Under-utilization of privacy options is mostly due to poor privacy setting interfaces [Lipford et al., 2008], intricate privacy settings [Madejski et al., 2011], and inherent trust in OSNs [Acquisti and Gross, 2006; Boyd, 2004]. The problem with not changing the default settings is that they almost always tend to be more open than the users would prefer [Liu et al., 2011]. To overcome this situation, approaches to automatically generate more appropriate default privacy settings have been proposed.

PriMa [Squicciarini et al., 2010] automatically generates privacy policies, acknowledging the fact that the average user will find the task of personalizing his access control policies overwhelming, due to the growing complexity of OSNs and the diversity of user content. The policies in PriMa are generated based on the average privacy preference of similar and related users, the accessibility of similar items from similar and related users, the closeness of owner and accessor (measured by the number of common friends), the popularity of the owner (i.e., popular users have sensitive profile items), etc. However, the large number of factors and their parametrized tuning contribute to longer policy generation and enforcement times. A related approach, PolicyMgr [Shehab et al., 2010], uses supervised learning on user-provided example policy settings and builds classifiers that are then used for automatically generating access control policies.

Aegis [Kayes and Iamnitchi, 2013a; Kayes and Iamnitchi, 2013b] is a privacy framework and implementation that leverages the ‘Privacy as Contextual Integrity’ theory proposed by Nissenbaum [Nissenbaum, 2004] for generating default privacy policies. Unlike the approaches presented above, this solution does not need user input or access history. Instead, it aggregates social data from different OSNs in an ontology-based data store and then applies the two norms of Nissenbaum’s theory to regulate the flow of information between social spheres and access to information within a social sphere.

4. MITIGATING ATTACKS FROM SOCIAL APPLICATIONS

Social applications, written by third-party developers and running on OSN platforms, provide enhanced functionality linked to a user profile. For example, Candy Crush Saga⁸ (a social game) and Horoscopes⁹ (users can check their horoscope) are two popular social applications on Facebook.

The social networking platform works as a proxy between users and applications and mediates the communication between them. To better understand this proxy, we show the data flow between a third-party social application and the Facebook platform in Figure 1. An application is hosted on a third-party server and runs on user data taken from the Facebook platform. When a user installs the application on Facebook, it requests permission from the user to use some of her profile information. Application developers write the pages of an application using Facebook Markup Language (FBML), a subset of HTML and CSS extended with proprietary Facebook tags.

When a user interacts with an application, for example by clicking an application icon on Facebook to generate horoscopes (step 1 in Figure 1), Facebook requests the page from the third-party server where the application is actually hosted (step 2). The application requests the user’s profile information using secret communication with Facebook (step 3). The application uses the information (e.g., the birth date may be used to create horoscopes) and returns an FBML page to Facebook (step 4). Facebook finally transforms the application page returned by the server by replacing the FBML with standard HTML and JavaScript (step 5), and transmits the output page to the end user (step 6).

OSN users are facing multiple risks while using social applications. First, an application might be malicious; it could collect a high volume of user data for unwanted usage. For example, to show this vulnerability, BBC News developed a malicious application that could collect large amounts of user data in only three hours [Kelly, 2008].

Second, application developers can violate developer policies to control user data. Application developers are supposed to abide by a set of rules set by the OSNs, called “developer policies”. Developer policies are intended to prohibit application developers from misusing personal information or forwarding it to other parties. However, reported incidents [Mills, 2008; Steel and Fowler, 2010] show that applications violate these developer policies. For example, a Facebook application, “Top Friends”, enabled everyone to view the birthday, gender and relationship status of all Top Friends users, even though those users had set that information to private [Mills, 2008], violating the developer policy that private information of friends must not be accessible. The Wall Street Journal found evidence that Facebook applications transmit identifying information to advertising and tracking companies [Steel and Fowler, 2010].

⁸ https://www.facebook.com/candycrushsaga
⁹ https://www.facebook.com/dailyhoroscopes

[Figure 1 shows three parties (User, Facebook, Application Server) and six steps: 1. Interact with the app; 2. Request for app URL; 3. Communication, query user data (session secret required); 4. Output page in Facebook markup (FBML); 5. Transform to HTML+JS; 6. Display output page.]

Fig. 1: Data flow in a Facebook application.

Finally, third-party social applications can query more data about a user from an OSN, regardless of whether it is needed for proper operation. A study by Felt and Evans [Felt and Evans, 2008] of 150 of the top applications on Facebook shows that most of the applications only needed the user name, friends, and their networks. However, 91% of the applications accessed data that they did not need for operation. This violates the principle of least privilege [Saltzer and Schroeder, 1975], which states that every user should only get the minimal set of access rights that enables him to complete his task.
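The least-privilege check implied by this study can be sketched as a set difference between the fields an application requests and the fields it needs. The application names and permission sets below are hypothetical and are not drawn from the data of [Felt and Evans, 2008].

```python
def excess_permissions(requested, needed):
    """Return the requested fields that are not needed for operation."""
    return set(requested) - set(needed)

def overprivileged_share(apps):
    """Return the fraction of apps that request at least one unneeded field."""
    if not apps:
        return 0.0
    flagged = sum(1 for a in apps.values() if excess_permissions(a["requested"], a["needed"]))
    return flagged / len(apps)

# Hypothetical applications with requested and needed profile fields.
apps = {
    "horoscope": {
        "requested": {"name", "birth_date", "friends", "photos"},
        "needed": {"name", "birth_date"},
    },
    "greeting_card": {
        "requested": {"name", "friends"},
        "needed": {"name", "friends"},
    },
}

horoscope_excess = excess_permissions(
    apps["horoscope"]["requested"], apps["horoscope"]["needed"]
)
share = overprivileged_share(apps)
```

A platform that applied this check at installation time could either reject the excess fields or ask the user to approve them explicitly.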

We identified three classes of solutions that attempt to minimize the privacy risks stated above: (i) by anonymizing social data made available to applications (Section 4.1); (ii) by defining and enforcing more granular privacy policies that the third-party applications have to respect (Section 4.2); and (iii) by providing third-party platforms for executing these applications and limiting the transfer of the social data from applications to other parties (Section 4.3).

4.1 Anonymizing Social Data For Third-party Applications

Privacy-by-proxy [Felt and Evans, 2008] uses special markup tags that abstract user data and handle user input. Third-party applications do not have access to users’ personal data; rather, they use users’ IDs and tags to display data to users. For example, to display a user’s hometown, an application would use a tag <hometown id="3125"/>. The social network server would then replace the tag with the real data value (e.g., “New York”) while rendering the corresponding page to the user. However, applications might rely on private data for their operation; for example, a horoscope application might require users’ gender information. A conditional tag handles this dependency (e.g., an <if-male> tag can choose the gender of an avatar). Privacy-by-proxy ensures privacy by limiting what applications can access, which might also limit the social value and usability of the applications. Data availability through a proxy also means that application developers have to expose the business logic to social network sites (in the form of JavaScript to end users). This might discourage third-party developers in the first place. Moreover, applications could still develop learning mechanisms to infer attributes of a user. For example, developers might include scripting code in the conditional execution blocks (if-else) that depend on personal data; this code could send information to an external server when the block executes.
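The tag substitution described above can be illustrated with a minimal Python sketch. It is not the implementation of Felt and Evans: the profile store `PROFILES`, the `render` function, and the exact syntax of the conditional tag (an `id` attribute and an `<else/>` separator) are assumptions made for illustration only. The point is that the application emits markup containing only user IDs, and the OSN fills in the real values at rendering time.

```python
import re

# Hypothetical profile store held by the OSN; it is never sent to the application.
PROFILES = {"3125": {"hometown": "New York", "gender": "male"}}

# Data tags such as <hometown id="3125"/> (attribute syntax is an assumption).
DATA_TAG = re.compile(r'<(hometown|name)\s+id="(\d+)"\s*/>')
# Conditional tag with an assumed <else/> separator.
IF_MALE = re.compile(r'<if-male\s+id="(\d+)">(.*?)<else/>(.*?)</if-male>', re.S)


def render(markup: str) -> str:
    """Replace privacy-by-proxy style tags with real values on the OSN side."""

    def conditional(match):
        uid, then_part, else_part = match.groups()
        is_male = PROFILES.get(uid, {}).get("gender") == "male"
        return then_part if is_male else else_part

    def substitute(match):
        field, uid = match.groups()
        return PROFILES.get(uid, {}).get(field, "")

    # Resolve conditionals first, then fill in the remaining data tags.
    return DATA_TAG.sub(substitute, IF_MALE.sub(conditional, markup))


app_markup = 'Lives in <hometown id="3125"/>, avatar: <if-male id="3125">boy.png<else/>girl.png</if-male>'
page = render(app_markup)
```

The application only ever handles `app_markup`; the string in `page` exists only on the path from the OSN to the user's browser.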

Similar to Privacy-by-proxy, PESAP [Reynaert et al., 2012] provides anonymized social data to applications. However, PESAP also secures the information flow inside the browser, so that applications cannot leak information through outgoing communications with other third parties. The anonymization is provided by encrypting the IDs of the entities of the social graph with an application-specific symmetric key. Applications use a REST API to get access to the anonymized social graph. PESAP provides a re-identification end-point in order to enable users to see the personal information of their friends in the context of social applications. Secure information flow techniques protect the private information in the browser of a user. This is done by a dynamic, secure multi-execution flow technique [Devriese and Piessens, 2010], which analyzes information flow inside a browser and ensures that the flow complies with certain policies. The multi-execution flow technique labels the inputs and the outputs of the system with security labels and runs a separate sub-execution of the program for each security label. The inputs have designated security labels and can be accessed by a sub-execution having the same or a higher security label. Figure 2 shows the data flow in a PESAP-aware browser.
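The labeling rule of secure multi-execution can be sketched in a few lines of Python. This is a simplified illustration with only two security levels, not PESAP's browser implementation; the function `secure_multi_execution`, the example application `app`, and the channel names are all hypothetical. The program runs once per level, each run sees only the inputs labeled at or below its level, and each run contributes only the outputs on channels of exactly its level.

```python
LOW, HIGH = 0, 1  # two security levels; a real lattice may have more


def secure_multi_execution(program, inputs, input_labels, output_labels, default=""):
    """Run `program` once per security level and merge the permitted outputs."""
    merged = {}
    for level in (LOW, HIGH):
        # A sub-execution sees only inputs labeled at or below its own level;
        # anything higher is replaced by a harmless default value.
        visible = {
            name: (value if input_labels[name] <= level else default)
            for name, value in inputs.items()
        }
        outputs = program(visible)
        # Only outputs on channels of exactly this level are kept.
        for channel, value in outputs.items():
            if output_labels[channel] == level:
                merged[channel] = value
    return merged


def app(inp):
    # Hypothetical application that tries to leak a private name to a tracker.
    return {
        "tracker": "seen:" + inp["friend_name"] + ":" + inp["pseudonym"],
        "screen": "Hello " + inp["friend_name"],
    }


result = secure_multi_execution(
    app,
    inputs={"friend_name": "Bob", "pseudonym": "a91f"},
    input_labels={"friend_name": HIGH, "pseudonym": LOW},
    output_labels={"tracker": LOW, "screen": HIGH},
)
```

In this sketch the user's screen (a high channel) shows the real name, while the outgoing tracker request (a low channel) is produced by the sub-execution that never saw it.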

Fig. 2: Data flow in a PESAP aware browser [Reynaert et al., 2012]. Elements shown: User, Browser (Application Code, Secure Multi-Execution), App Portal, REST API, Re-identification, App. Flows shown: anonymous data and private data.

4.2 Enforcing Additional Privacy Policies To Social Applications

Besmer et al. [Besmer et al., 2009] propose an access control framework for applications, which adds a new user-application policy layer on top of the user-user policy to restrict the information applications can access. Upon installing an application, a user can specify which profile information the application can access. However, the framework still uses the user-to-user policy to additionally govern an application’s access to friends’ information on behalf of the user (Alice’s installed applications will not get her friend Bob’s private data if Bob’s user-user policy denies Alice access). An additional friendship-based protection restricts the information the application can request about a user’s friends. For example, Alice installs an application that requests her friend Bob’s information, and Bob did not install the application. Suppose that Bob’s default privacy policy is very permissive, but Alice is privacy conscious and allows applications to access only the Birth Date attribute. Under friendship-based protection, when the application requests Bob’s information via Alice, it will only be able to get Bob’s birth date. So, friendship-based protection enables Alice’s privacy policies to extend to Bob. The model works well for privacy-savvy users who make informed decisions about an application’s data usage while installing an application. An additional functionality could be a set of restrictive default policies for average users.
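One way to read the combined effect of the user-user policy and the friendship-based protection is as a set intersection. The following Python sketch makes that reading explicit; the function name, the attribute names, and the representation of policies as plain sets are assumptions for illustration, not the data model of Besmer et al.

```python
def attributes_released(requested, owner_allows_friend, installer_allows_apps):
    """Attributes of a friend that an application may receive via the installing user.

    requested             -- attributes the application asks for
    owner_allows_friend   -- what the data owner (Bob) lets the installer (Alice) see
    installer_allows_apps -- what the installer (Alice) lets applications access
    """
    return set(requested) & set(owner_allows_friend) & set(installer_allows_apps)


# Bob is permissive towards Alice; Alice lets applications see only birth dates.
bob_allows_alice = {"birth_date", "hometown", "email"}
alice_allows_apps = {"birth_date"}

released = attributes_released(
    requested={"birth_date", "email"},
    owner_allows_friend=bob_allows_alice,
    installer_allows_apps=alice_allows_apps,
)
```

With these inputs only Bob's birth date is released, which matches the Alice and Bob example in the text.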

4.3 Third-party Platforms For Running Social Applications

Egele et al. [Egele et al., 2012] note that, since popular OSN services such as Facebook have not implemented user-defined access control mechanisms to date, pragmatic solutions should not rely on the help of OSNs. They introduce PoX, a browser extension for Facebook applications that runs on a client machine and works as a proxy to provide fine-grained access control. PoX works as a reference monitor that sits between applications and the Facebook server and controls an application’s access to users’ data stored on the server. To do so, an application requests users’ profile data from the proxy. Upon receiving the request, the proxy performs access control checks based on user-provided privacy settings. If the request is allowed, the proxy signs the access request with its key, sends the request to the OSN server, and finally relays the result from the server to the application. This application-to-server data flow is shown in Figure 3. An application developer needs to use the PoX server-side library instead of the Facebook server-side library. One potential challenge is to motivate application developers to write PoX-aware applications when existing mechanisms (e.g., the Facebook application environment) are already in place.
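The reference-monitor role of the proxy can be sketched as follows. This is a hedged illustration, not the PoX code: the classes `FakeOSN` and `ClientSideProxy`, the per-application access control list, and the use of an HMAC over a shared key as the "signature" are all assumptions chosen so that the example runs with the Python standard library.

```python
import hashlib
import hmac


def sign(key: bytes, user: str, field: str) -> str:
    """Keyed signature over a request (HMAC stands in for the proxy's key)."""
    return hmac.new(key, f"{user}:{field}".encode(), hashlib.sha256).hexdigest()


class FakeOSN:
    """Hypothetical stand-in for the OSN server; it only serves signed requests."""

    def __init__(self, key, profiles):
        self._key, self._profiles = key, profiles

    def fetch(self, user, field, signature):
        if not hmac.compare_digest(sign(self._key, user, field), signature):
            raise PermissionError("request was not signed by the proxy")
        return self._profiles[user][field]


class ClientSideProxy:
    """Checks the user's settings before forwarding a request to the OSN."""

    def __init__(self, key, osn, acl):
        self._key, self._osn, self._acl = key, osn, acl  # acl: app -> allowed fields

    def request(self, app, user, field):
        if field not in self._acl.get(app, set()):
            return None  # denied by the user-provided privacy settings
        result = self._osn.fetch(user, field, sign(self._key, user, field))
        return result  # relayed back to the application


key = b"demo-key"
osn = FakeOSN(key, {"alice": {"hometown": "Tampa", "email": "alice@example.org"}})
proxy = ClientSideProxy(key, osn, acl={"quiz-app": {"hometown"}})
```

An application that talks only to `proxy` can obtain Alice's hometown but not her e-mail address, and a request that bypasses the proxy fails because it carries no valid signature.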

Fig. 3: A data-flow between applications and server with PoX [Egele et al., 2012]. Participants: Client, Client-side proxy, Facebook Application, Facebook server. Steps: (1) open application without sending session secret; (2) request user profile data from proxy; (3) request data allowed by ACL; (4) transmit allowed data to proxy; (5) transmit allowed data to app server; (6) display application page.

xBook [Singh et al., 2009] is a restricted ADSafe-based JavaScript framework that provides a server-side container in which applications are hosted and a client-side environment to render the applications to users. xBook differs from PoX in that it not only controls third-party applications’ access to user data (which PoX also does), but also limits what applications do with the data. Applications are developed as a set of components; a component is the smallest building block of code monitored by xBook. A component also declares the information that it can access and the external entities with which it communicates. During the deployment of an application in xBook, the application developer is required to specify this information. From the specification, xBook generates a manifest for the application. A manifest is a set of statements that specifies what user data the application will use and with which external services it will share the data. At the time of installing the application, the manifest is presented to the user. In this way, a user is able to make a more informed decision before installing an application. Although xBook controls third-party applications’ access to user data and limits applications’ data usage, it has to deal with two challenges. First, the platform itself has to be trusted by users and by applications, as it is expected to protect users’ personal data and enable third-party applications to execute. Second, hosting and executing applications in xBook requires resources (storage, computation and maintenance) that may be difficult to provide in the absence of a business model.
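The step from component specifications to a user-facing manifest can be sketched in Python as below. The `Component` record, the `build_manifest` function, and the wording of the manifest statements are hypothetical; xBook's actual specification format is not given in this survey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A developer-declared component: what it reads and where it sends data."""
    name: str
    reads: frozenset      # user data fields the component can access
    sends_to: frozenset   # external entities the component communicates with


def build_manifest(components):
    """Derive the statements shown to the user at installation time."""
    statements = set()
    for component in components:
        for field in component.reads:
            statements.add(f"uses {field}")
            for destination in component.sends_to:
                statements.add(f"shares {field} with {destination}")
    return sorted(statements)


manifest = build_manifest([
    Component("horoscope", frozenset({"birthday"}), frozenset()),
    Component("ads", frozenset({"gender"}), frozenset({"ads.example.com"})),
])
```

Because the manifest is derived from the declarations rather than written by hand, a platform in the style of xBook can also refuse, at run time, any flow that the declarations do not cover.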



5. PROTECTING USER DATA FROM THE OSN

The “notice-and-consent” approach to online privacy is the status quo for practically all online services, OSNs included. This approach informs the user of the privacy practices of the service and provides the user a choice whether to engage in the service or not.

The limitations of this approach have long been acknowledged. First, the long and abstruse privacy policies offered for reading are virtually impossible to understand, even if the user is willing to invest the time to read them. For example, in August 2014, we found 4389 words in Facebook’s privacy policies and 3473 words in Twitter’s privacy policies. Second, such policies always leave room for future modifications; therefore, the user is expected to read them repeatedly in order to practice informed consent. And third, long as they are, these privacy policies tend to be incomplete¹⁰, as they often cannot include all the parties to which a user’s private information will be allowed to flow (such as advertisers). Consequently, people generally do not read the Terms of Service and, when they do, they do not understand them [Fiesler and Bruckman, 2014].

A second serious deterrent for users protecting their online privacy is the “take-it-or-leave-it” “choice” the users are offered. While it may seem a free choice, in reality the cost of not using the online service (whether email, browsing, shopping, etc.) is unacceptably high.

Cornered in this space of poorly informed consent and lack of choice, users may look for solutions that allow them to use the online service without paying the information cost associated with it. Researchers built on this intuition in two directions. The first direction hides the user information from the very service that stores it (Section 5.1). The second taps into business models other than the ones that make a living from users’ private information, and replaces the centralized service provider with a fully decentralized solution that is privacy-aware by design (Section 5.2).

5.1 Protection by Information Hiding

This line of work is empirically supported by Acquisti and Gross’s study [Acquisti and Gross, 2006], which shows that while 60% of users trust their friends completely with their private and personal information, only 18% of users trust Facebook to the same degree.

The general approach for hiding information from the OSN is based on the observation that OSNs can run on fake data. If the operations that OSNs perform on the fake data are mapped back to the original data, users can still use the OSNs without providing them real information. Fake data could be cipher-text (encrypted) or obtained by substituting the original data with pre-mapped data from a dictionary. Encrypted data can be stored on a user’s trusted device (including third-party servers or a friend’s computer). Access control is provided by allowing authorized users (e.g., friends) to get the original data from the fake data. Different implementations of this idea are presented next.

flyByNight [Lucas and Borisov, 2008] is a Facebook application that enables users to communicate on Facebook without storing a recorded trace of their communication in Facebook. The flyByNight Facebook application generates a public/private key pair and a password during configuration. The password is used as a key to encrypt the private key, and the encrypted private key is stored on the flyByNight server. When a user installs the application, it downloads client-side JavaScript from the flyByNight server. This JavaScript does key generation and cryptographic operations. The application knows which of a user’s friends have also installed the flyByNight application, and their public keys. To send a message to friends, a user enters the message into the application and selects the recipient friends. The client-side JavaScript encrypts the content of the message with the recipients’ public keys, tags the encrypted messages with the Facebook ID numbers of their recipients, and sends them to a flyByNight message database server. The encrypted messages reside on the flyByNight server. When a user reads a message, she provides the password to get the private key (stored in the flyByNight key database). The private key is used to decrypt the message. flyByNight operates under the regulation of Facebook, as it is a Facebook application. It is possible that the computation load on the Facebook servers due to encryption, as well as the suspicious lack of communication among users, might attract Facebook’s attention and lead to deactivating the application. In the worst case, users lose their ability to hide their communication, but previous messages remain hidden from the OSN.

10 http://goo.gl/yXqI1s

Persona [Baden et al., 2009] hides user data from the OSN by combining attribute-based encryption (ABE) and public key cryptography. The core functionalities of current OSNs, such as profiles, walls, notes, etc., are implemented in Persona as applications. Persona uses a “Storage” application to enable users to store personal information and share it with others through an API. The Persona application in Facebook is similar to any third-party Facebook application, where users log in by authenticating to the browser extension. The browser extension translates Persona’s special markup language. User information is stored in Persona storage services rather than on Facebook, and other Persona users can access the data provided that they have the necessary keys and access rights. Similar to flyByNight, Persona’s operation depends on the OSN, as its core functionalities are implemented as applications.

NOYB [Guha et al., 2008] distorts user information in an attempt to hide real identities from the OSN, allowing only trusted users (e.g., friends) to get access to the restored, correct information. To implement this idea, NOYB splits a user’s information into atoms. For example, Alice’s name, gender and age (Alice, F, 26) are split into two atoms: (Alice, F) and (26). Instead of encrypting the information, NOYB replaces a user’s atom with a pseudorandomly picked atom of another user. So, Alice’s first atom is substituted with, for example, the atom (Athena, F) from Athena’s profile, and the second atom with Bob’s atom from the same class (38). All atoms from the same class for all users are stored in a dictionary. NOYB uses the ciphered index of a user’s atom to pick the substitute atom from this dictionary. Only an authorized friend knows the encryption key and can reverse the encryption. A proof-of-concept implementation of NOYB as a Firefox browser plugin adds a button to ego’s Facebook profile that encrypts his information and another button on alter’s page that decrypts alter’s profile. The cleverness of NOYB is that it stores legitimate atoms of information in plain text, thus not raising the suspicions of the OSN. The challenge, however, is the scalability of the dictionaries: the dictionaries are public, contain atoms from both NOYB users and non-users, and are maintained by a third party with an unspecified business/incentive model.
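The dictionary substitution can be sketched in Python as follows. This is a simplified stand-in, not NOYB's actual cipher: the real index is replaced by a keyed, reversible shift of the atom's position in the public dictionary, and the functions `disguise` and `restore`, the per-user `nonce`, and the sample dictionary are hypothetical.

```python
import hashlib
import hmac


def _offset(key: bytes, nonce: bytes, size: int) -> int:
    """Keyed pseudorandom shift; only holders of `key` can recompute it."""
    digest = hmac.new(key, nonce, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % size


def disguise(atom, dictionary, key, nonce):
    """Return the plain-text atom of some other entry to publish on the OSN."""
    index = dictionary.index(atom)
    return dictionary[(index + _offset(key, nonce, len(dictionary))) % len(dictionary)]


def restore(shown, dictionary, key, nonce):
    """Invert `disguise`; requires the same key and nonce."""
    index = dictionary.index(shown)
    return dictionary[(index - _offset(key, nonce, len(dictionary))) % len(dictionary)]


# Public dictionary of (name, gender) atoms; entries are assumed to be unique.
NAME_ATOMS = [("Alice", "F"), ("Athena", "F"), ("Bob", "M"), ("Carol", "F"), ("Dan", "M")]

key, nonce = b"shared-with-friends", b"alice-profile"
published = disguise(("Alice", "F"), NAME_ATOMS, key, nonce)
recovered = restore(published, NAME_ATOMS, key, nonce)
```

Whatever atom ends up in `published`, it is a legitimate-looking entry of the dictionary, which is the property the text credits for not raising the OSN's suspicion.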

FaceCloak [Luo et al., 2009], implemented as a Firefox browser extension, protects user information by storing fake data in the OSN. Unlike NOYB, it does not replace a user’s information with another user’s information; rather, it uses dictionaries and random Wikipedia articles as replacements. A user, say Alice, can protect information from the OSN by using a special marker pre-defined by FaceCloak (@@ in their implementation). When Alice submits the form to the OSN, FaceCloak intercepts the submitted information, replaces the fields that start with the special marker with appropriate fake text, and stores the fake data in the OSN. It uses a dictionary (for profile information) and random Wikipedia articles (for walls and notes) to provide fake data. Then, using Alice’s master key and personal index key, FaceCloak encrypts the real data, computes MAC keys, computes the index, and sends them to a third-party server. Now consider one of Alice’s friends, Bob, who has installed FaceCloak in his browser and wants to see Alice’s information. After downloading Alice’s page (which includes fake data from the OSN), FaceCloak computes the indexes of the relevant fields using Alice’s master and personal index keys. Then it downloads the corresponding values from the third-party server. Upon receiving a value, FaceCloak checks the integrity of the received cipher-text, decrypts it, and substitutes the real data for the fake data. If the value is not found, the data is left unchanged. The architecture of FaceCloak is shown in Figure 4. FaceCloak depends on a “parallel” centralized infrastructure to store the encrypted data, which means that a third party has to maintain all users’ data, probably without getting any benefit from it. Moreover, users have to trust the reliability of the third-party server, which also represents a single point of failure.

Fig. 4: FaceCloak architecture [Luo et al., 2009]. Entities: Publisher, Viewer, Facebook, Third party. Labeled flows: 1. keys; 2. fake information; 3. encrypted real information; 2. encrypted real information; 3. fake information.

Virtual Private Social Networks (VPSN) [Conti et al., 2011], unlike flyByNight, FaceCloak, and NOYB, do not require third-party services to protect users’ information from an OSN. Instead, they leverage the computational and storage resources of the OSN users to store the real profile data of other users, while storing fake profile data on Facebook. FaceVPSN is a Firefox browser extension that implements VPSN for Facebook. In FaceVPSN, user Alice changes her profile information to some fake information, stores the fake information in Facebook, and emails her correct and fake profiles in a prespecified XML format to her friends. In order to access Alice’s real profile, her friends have to have FaceVPSN installed (as a regular Firefox extension) and use its GUI to add Alice’s XML file. When Alice’s friend Bob requests Alice’s Facebook page, Facebook sends an HTML response that has Alice’s fake data from Facebook. FaceVPSN’s JavaScript code is triggered when the “page load” event is fired. The JavaScript code of FaceVPSN searches for Alice’s profile information in Bob’s stored XML file and replaces the fake information with the real information.
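The replacement step performed on page load can be sketched as follows. The sketch is in Python rather than the extension's JavaScript, and the XML layout (`profile`, `field`, `fake`, `real` elements) is an assumption made for illustration; the prespecified FaceVPSN format is not reproduced in this survey.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML file that Alice emails to Bob.
ALICE_XML = """
<profile owner="alice">
  <field name="hometown"><fake>Springfield</fake><real>Tampa</real></field>
  <field name="employer"><fake>Acme Corp</fake><real>USF</real></field>
</profile>
"""


def load_mapping(xml_text):
    """Build a fake-to-real lookup table from a friend's XML file."""
    root = ET.fromstring(xml_text)
    return {field.findtext("fake"): field.findtext("real") for field in root.iter("field")}


def reveal(page_html, mapping):
    """Replace every fake value in the downloaded page with the real one."""
    for fake, real in mapping.items():
        page_html = page_html.replace(fake, real)
    return page_html


mapping = load_mapping(ALICE_XML)
shown = reveal("<span>Springfield</span> <span>Acme Corp</span>", mapping)
```

The OSN only ever stores and serves the fake values; the mapping lives on Bob's machine, which is why every profile change requires a new XML exchange.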

Unlike the other solutions presented above, FaceVPSN does not risk being suspended by the OSN (since it is not an application running with the OSN’s support). Like FaceCloak, however, FaceVPSN requires a user’s friends to install the FaceVPSN extension in order to see the user’s profile. Moreover, FaceVPSN demands a high degree of user interaction, which might affect usability. In particular, upon the addition of a new contact to the friend list, the user has to explicitly exchange profile information with the new friend and upload it into the FaceVPSN application. On top of that, every change of profile information has to be emailed as an XML file to all friends, and the friends are required to go through the XML update process in order to see the changes. This entire process affects usability, given the high number of friends a user might have in OSNs (e.g., half of Facebook users have more than 200 friends, and 15% have more than 500 friends¹¹).

While the various implementations of the idea of hiding personal information from the OSN have different tradeoffs, as discussed above, there are also risks associated with the approach itself. First, because the OSN operates on fake data (whether encrypted or randomized), it will not be able to provide some personalized services such as social search and recommendation. Second, users end up transferring their trust from the OSN to either a third-party server or friends’ computers for unclear benefits. The third-party server provides yet another service whose terms of use are probably presented in yet another incomprehensible Terms of Service document, with an opt-out “choice”. Friends’ computers require extra care for fault tolerance and protection against malicious attacks.

11 http://www.pewresearch.org/fact-tank/2014/02/03/6-new-facts-about-facebook/

5.2 Protection Via Decentralization

An alternative to obfuscating information from the OSN is to migrate to another service that is specifically designed for user privacy protection. Research in this area has explored the design space of decentralized (peer-to-peer) architectures for managing user information, thus avoiding a centralized service with a global view of the entire user population. The typical overlay used in most of these solutions is based on distributed hash tables, preferred over unstructured overlays for their performance guarantees. In addition, data is encrypted and only authorized users get access to the plain text. In this section, we discuss decentralized solutions for OSNs. Three dimensions differentiate the solutions: (1) how the distributed hash table is implemented (e.g., OpenDHT, FreePastry, Likir DHT); (2) where users’ content is stored (e.g., nodes run by the user, by the friends, or cloud infrastructures); and (3) how encryption keys for access control are managed (e.g., public-key infrastructure, out-of-band).

PeerSoN’s [Buchegger et al., 2009] architecture has two tiers. One tier, implemented using OpenDHT, serves as a look-up service to find a user. It stores users’ meta-data, for example the IP address, information about files, and notifications for users. A peer can connect to another peer by asking the look-up service directly for all required information. The second tier is formed by peers and contains users’ data, such as user profiles. Users can exchange information either through the DHT (e.g., a message is stored within the DHT if the receiver of the message is offline) or directly between their devices. The system assumes a public-key infrastructure (PKI) for privacy protection. A user encrypts data with the public keys of the intended audience, i.e., the friends of the user.

Safebook [Cutillo et al., 2009b; Cutillo et al., 2009a] is a decentralized OSN that uses a peer-to-peer architecture to get rid of a central, omniscient authority. Safebook has three main components: a trusted identification service for the certification of public keys and the assignment of pseudonyms; matryoshkas, a set of concentric shells around each user, which serve to replicate the profile data and anonymize traffic; and a peer-to-peer substrate (e.g., a DHT) for locating matryoshkas, which enables access to profile data and the exchange of messages.

LifeSocial.KOM [Graffi et al., 2011] is another P2P-based OSN. It implements common OSN functionalities using OSGi-based¹² software components called “plugins”. As a P2P overlay, it uses FreePastry for interconnecting the participating nodes and PAST for reliable, replicated data storage. The system uses cryptographic public keys as user IDs. To protect privacy, a user encrypts a private data object (e.g., profile information) with a symmetric cryptographic key. She then encrypts the symmetric key individually with the public keys of authorized users (e.g., her friends) and appends the results to the data object. The object and the list of encrypted symmetric keys are also signed by the user and stored in the P2P overlay. Other users in the system can authenticate the data object by using the public key of the author. But only authorized users (e.g., friends) can decrypt the symmetric key and thus the content of the object.

LotusNet [Aiello and Ruffo, 2012] is a framework for the implementation of a P2P-based OSN on a Likir DHT [Aiello et al., 2008]. It binds a user identity to both overlay nodes and published resources, for robustness of the overlay network and secure identity-based resource retrieval. Users’ information is encrypted and stored in the Likir DHT. Access control responsibility is assigned to overlay index nodes. Users issue signed grants to other users for accessing their data. The DHT returns the stored data to the requestor only if the requestor can provide a proper grant, signed by the data owner.

12 http://www.osgi.org/Specifications/HomePage

Vis-a-Vis [Shakimov et al., 2011] targets high content availability. Users store their personal data in Virtual Individual Servers (VISes), which are kept on the user’s computer. The server data are also replicated on a cloud infrastructure so that the data is available from the cloud when a user’s computer is offline. Users can share information with other users using peer-to-peer overlay networks that connect the VISes of the users. The cloud service needs to be rented (considering the high volume of data users store in OSNs), which ties the scheme to a monetary cost.

Prometheus [Kourtellis et al., 2010] is a peer-to-peer social data management system for socially-aware applications. It does not implement traditional OSN functionalities (e.g., profile creation, management, contacts, messaging, etc.); rather, it manages users’ social information from various sources and exposes APIs for social applications. Users’ social data are encrypted and stored in a group of trusted peers selected by users for high service availability. The Prometheus architecture is based on Pastry, a DHT-based overlay, and it uses Past to replicate social data. An inference on social data is subject to user-defined access control policies enforced by the trusted peers. Prometheus relies on a public-key infrastructure (PKI) for user authentication and message confidentiality.

The toughest challenge for decentralized OSNs is to convince traditional OSN users to migrate to their systems. Centralized social networks have large, established user bases and they are accessible from anywhere. Moreover, they already have a mature infrastructure, making good revenues from users’ data and maintaining excellent usability. However, decentralized OSNs are also becoming popular, especially among privacy-aware users. For example, Diaspora¹³ is a fully operational, open source, stable and decentralized OSN, which relies on user-contributed local servers to provide all the major centralized OSN functionalities.

6. MITIGATING DE-ANONYMIZATION AND INFERENCE ATTACKS

Analysis of social data has become immensely popular in a variety of domains, such as biological systems, organizational studies, information science, communication studies, economics, political science, social psychology, development studies, anthropology and sociolinguistics. Researchers and agencies collect or purchase social data to do the analysis. For example, Kwak et al. [Kwak et al., 2010] collected the entire Twitter network as of 2010: 41.7 million Twitter profiles, 1.47 billion follower-following relations, and 106 million tweets. In addition, some organizations and OSN service providers publish social data for others to analyze. For example, the Federal Energy Regulatory Commission published a repository of approximately 500,000 email messages of Enron Corporation, which has been frequently analyzed for research [Klimt and Yang, 2004; Shetty and Adibi, 2005].

However, publishing and allowing the collection of social network data involves privacy disclosure risks. For example, in 2006 AOL released an anonymized dataset of twenty million search keywords for over 650,000 users [Arrington, 2006]. The dataset was published for research purposes and novel findings emerged (e.g., [Nunes et al., 2008; Adar et al., 2007; Jansen and Booth, 2010]). However, despite the fact that the released data was anonymized, users’ privacy was compromised. To make the point, the New York Times identified an individual from this dataset by cross-referencing users with phonebook listings.

Privacy attacks on published or collected social network data fall into two categories: de-anonymization attacks and inference attacks. In the following, we discuss the types of de-anonymization attacks, how these attacks take place, and what solutions were proposed to combat such attacks. A more in-depth discussion of de-anonymization can be found in [Zheleva and Getoor, 2011]. In this work, we include the latest work on de-anonymization attacks.

13 https://joindiaspora.com/

6.1 De-anonymization Attacks

In de-anonymization attacks, an attacker uses external background knowledge and published social data to de-anonymize/identify users in the social graph, and thus learn sensitive user information.

We categorize privacy breaches due to de-anonymization attacks into four classes:

(1) Identity disclosure reveals the identity of a user and makes him vulnerable in the real world. For example, although a published dataset on a disease-infection network could advance research on how the disease transmits in communities, an adversary (e.g., an insurance company) that can identify an individual and his disease could exploit this information in unintended ways (for example, for denying insurance).

(2) Social attributes disclosure refers to the disclosure of sensitive data associated with a user. For example, disclosure of a user’s date of birth, gender and home address could allow the inference of the user’s social security number (SSN) and hence could lead to identity theft [Gross and Acquisti, 2005a].

(3) Relationship disclosure refers to the situation when the relationships of a user are exposed and this information is exploited. For example, two nodes (e.g., companies) in a transaction network are connected by an edge and a weight (e.g., transaction expense) if they are involved in a financial transaction. An adversary, for example a competitor company, can detect whether two target companies have done a financial transaction if it can infer whether an edge exists between the two companies in the network. The adversary can learn the transaction expense from the edge weight and can exploit that information to gain an advantage.

(4) Social graph property disclosure refers to the disclosure of various graph metrics, such as degree, betweenness centrality, closeness centrality, or clustering coefficient. An attacker can find out the most central users in the network and can make the network structurally vulnerable. For example, an attacker can identify and remove the nodes with the highest betweenness centrality to disrupt communications between other nodes in the network [Newman, 2010].

6.1.1 De-anonymization Attack Techniques. Anonymization is usually done by substituting personally identifying information associated with each user with a random ID [Wu et al., 2010]. However, this substitution is not sufficient to preserve users’ privacy. For example, consider the social network in Figure 5(a) that has been anonymized in Figure 5(b) by replacing user names with random IDs. Now, if an attacker knows that Alice, David and Asley are friends of Bob and that Alice-David and Asley-David are also friends, forming the subgraph shown in Figure 5(c), then the attacker can uniquely identify the subgraph in the anonymized network (shown in Figure 5(d)). So, the attacker will be able to re-identify Bob in the anonymized and published social network.

Researchers have shown different techniques to perform de-anonymization attacks. Backstrom et al. [Backstrom et al., 2007] present two types of attacks: active and passive. In active attacks, the attacker is assumed to be able to modify the network prior to the release of the social network data. The attacker chooses an arbitrary set of target individuals (whose privacy she wants to compromise), creates a small number of new user accounts, makes connections with the target individuals (thus forming edges), and establishes a highly distinguishable pattern of nodes and edges among the new accounts. The attacker can then efficiently find the subgraph in the released anonymized network, and thus can expose the identities of the target individuals.

In passive attacks, an attacker does not have to create new accounts or connections. The intuition is that most nodes in social networks form small, uniquely identifiable subgraphs. So, the attacker simply has to form a coalition with other users. The attacker recruits k − 1 of his neighbors and forms a coalition of size k. The users in the coalition know the names of their neighbors outside of the coalition. Finally, the attacker tries to identify the subgraph (formed by the coalition) in the published social network, and compromises the privacy of neighboring nodes.

Fig. 5: Anonymization and de-anonymization attacks. (a) The social network; (b) Anonymized social network; (c) The subgraph of Bob; (d) Bob is re-identified.

Narayanan and Shmatikov [Narayanan and Shmatikov, 2009] demonstrate the feasibility of a large-scale de-anonymization attack under the assumption that the attacker has background knowledge of a different network whose membership partially overlaps with the target network. Using a de-anonymization algorithm, the authors show that a third of the users who have accounts on both Twitter and Flickr can be identified in the anonymous Twitter social graph with a low (12%) error rate.

Both attacks [Backstrom et al., 2007; Narayanan and Shmatikov, 2009] presented above use a subgraph as background knowledge. However, the way the attacker obtains this knowledge differs. In [Backstrom et al., 2007], the attacker creates the background knowledge by adding nodes and edges to the social graph or by forming a coalition among nodes. In [Narayanan and Shmatikov, 2009], the authors propose to collect network data by crawling OSNs or by deploying malicious third-party applications.

6.1.2 Privacy Preserving Anonymization Methods. In order to combat de-anonymization attacks, a social network should be anonymized properly before publishing. We group privacy preserving social network data anonymization methods into two categories: (1) edge modification-based approaches and (2) clustering-based generalization.

Edge modification-based approaches: Edge modification-based approaches preserve privacy by modifying the social graph structure via addition, deletion or randomization of edges. Zhou and Pei [Zhou and Pei, 2011] consider a de-anonymization attack in which an attacker, equipped with background knowledge about the target’s 1-hop neighbors, attempts to re-identify the target in the anonymized dataset using neighborhood matching. Their anonymization method is inspired by the k-anonymity model. Although not targeted at social network data publishing, the k-anonymity model ensures that each user’s information in a released dataset cannot be distinguished from that of at least k − 1 other individuals in the dataset [Sweeney, 2002].

Zhou and Pei extend k-anonymization to social networks. The goal of the anonymization is to ensure that even knowing the neighborhood of a node, an attacker will not be able to re-identify the node in the anonymized dataset with confidence higher than $1/k$. Let $G = (V, E)$ be a social network and $G' = (V', E')$ the anonymized network, where there exists a bijection $f : V \to V'$ such that for each $(u, v) \in E$, $(f(u), f(v)) \in E'$. The authors assume that an attacker knows the neighborhood subgraph of a node $u \in V(G)$, denoted by $\mathrm{Neighbor}_G(u)$. The goal of the k-anonymization is to ensure that there exist at least $k - 1$ other nodes $v_1, v_2, \ldots, v_{k-1} \in V(G)$ such that $\mathrm{Neighbor}_{G'}(f(u)), \mathrm{Neighbor}_{G'}(f(v_1)), \ldots, \mathrm{Neighbor}_{G'}(f(v_{k-1}))$ are isomorphic. The anonymization method first extracts the neighborhoods of all nodes in the network. Then it greedily combines nodes into groups and anonymizes the neighborhoods of the nodes in each group, until the k-anonymity conditions are met. To anonymize the neighborhoods of two nodes, such as $\mathrm{Neighbor}_G(u)$ and $\mathrm{Neighbor}_G(v)$, the method first finds all perfect matches of neighborhood components in $\mathrm{Neighbor}_G(u)$ and $\mathrm{Neighbor}_G(v)$. For the unmatched components, it tries to pair similar components based on anonymization cost and anonymizes them (this might involve adding an edge between two nodes).

Liu and Terzi [Liu and Terzi, 2008] propose k-degree anonymity to combat de-anonymization attacks. They assume that an attacker has background knowledge of the degree of a target node. The attacker could search the degrees of the nodes in the published network and re-identify the target node. k-degree anonymity ensures that for every node u in the graph, there exist at least k − 1 other nodes having the same degree as u. As in the work of Zhou and Pei, even with the degree as background knowledge, an attacker will not be able to re-identify the node in the anonymized dataset with confidence higher than $1/k$. Their anonymization algorithm has two steps. In the first step, it starts from the degree sequence $d$ of the original network $G = (V, E)$ and constructs a new degree sequence $d'$ that is k-degree anonymous, such that the degree anonymization cost is minimized. In the second step, the algorithm constructs a graph $G' = (V', E')$ such that $d_{G'} = d'$, $V' = V$ and $E' \cap E = E$. To solve the first step, the authors use a dynamic programming method, while the second step is based on a set of graph construction algorithms that realize a given degree sequence under constraints. To construct the new degree sequence, the algorithm uses a randomized edge swap transformation strategy.
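The first step (degree-sequence anonymization) can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' implementation: degrees may only be increased, the cost is the total increase, the straightforward quadratic recurrence is used, and no check is made that the resulting sequence can be realized as a graph. The function names are illustrative.

```python
from collections import Counter


def is_k_degree_anonymous(degrees, k):
    """True if every degree value is shared by at least k nodes."""
    return all(count >= k for count in Counter(degrees).values())


def anonymize_degree_sequence(degrees, k):
    """Return (anonymized sequence in descending order, total degree increase).

    The sorted sequence is cut into consecutive groups of at least k
    entries; every entry in a group is raised to the group's largest
    degree. best[m] is the minimum cost for the first m entries.
    """
    d = sorted(degrees, reverse=True)
    n = len(d)
    if n < k:
        raise ValueError("need at least k nodes")
    inf = float("inf")
    best = [inf] * (n + 1)
    split = [0] * (n + 1)
    best[0] = 0
    for m in range(k, n + 1):
        for start in range(0, m - k + 1):
            if best[start] == inf:
                continue  # the first `start` entries cannot be grouped
            cost = sum(d[start] - x for x in d[start:m])
            if best[start] + cost < best[m]:
                best[m] = best[start] + cost
                split[m] = start
    # Walk the split points backwards to rebuild the groups.
    anonymized = [0] * n
    m = n
    while m > 0:
        start = split[m]
        for i in range(start, m):
            anonymized[i] = d[start]
        m = start
    return anonymized, best[n]


original = [5, 3, 3, 2, 2, 1]
anonymized, cost = anonymize_degree_sequence(original, k=2)
print(anonymized, cost)  # [5, 5, 3, 3, 2, 2] 4
```

The returned sequence is in sorted order; mapping it back to nodes and adding the edges that realize it corresponds to the second step described above.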

Clustering-based generalizations: Clustering-based generalizations cluster nodes and edges into groups and anonymize a subgraph into a super-node [Zhou et al., 2008]. As a result, details about individual users are hidden.

Hay et al. [Hay et al., 2008] propose a vertex clustering-based generalization approach to combat de-anonymization attacks. They model an attacker’s background knowledge as access to an entity that answers a restricted knowledge query about a target node in the network. They assume three types of queries: vertex refinement queries return the local structure of a node in an iteratively refined way (e.g., the degree of a node, the set of degrees of its neighbors); subgraph queries confirm the existence of a subgraph around a target node; hub fingerprint queries return, for a target node, the vector of distances between the node and a set of hubs (note that in social networks a hub is defined as a node with high betweenness and degree centrality [Newman, 2010]). The anonymity method is based on structural similarity. The intuition is that structurally similar nodes may be indistinguishable to an attacker. The method generalizes a social graph by grouping nodes into partitions and publishes the number of nodes in each partition together with the densities of edges across and within the partitions. The size of each partition is at least k (a positive integer), which is similar to k-anonymity in relational data. The method uses a simulated annealing algorithm for partitioning [Russell et al., 1995].
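The published summary of such a generalized graph can be sketched as follows. The sketch assumes the partition is already given (the search for a good partition, done with simulated annealing in the original work, is not shown) and reports edge counts, from which densities follow. All names and the toy data are illustrative.

```python
from collections import Counter


def generalize(edges, partition_of):
    """Summarize a graph by its partitions.

    Returns (sizes, edge_counts): the number of nodes per partition and
    the number of edges within each partition or between each pair.
    """
    sizes = Counter(partition_of.values())
    edge_counts = Counter()
    for u, v in edges:
        # Sort the pair so that (P1, P2) and (P2, P1) share one key.
        key = tuple(sorted((partition_of[u], partition_of[v])))
        edge_counts[key] += 1
    return dict(sizes), dict(edge_counts)


def satisfies_min_size(sizes, k):
    """Check that every partition contains at least k nodes."""
    return all(size >= k for size in sizes.values())


# Hypothetical graph and partition.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
partition_of = {"a": "P1", "b": "P1", "c": "P1", "d": "P2", "e": "P2", "f": "P2"}
sizes, edge_counts = generalize(edges, partition_of)
print(sizes)        # {'P1': 3, 'P2': 3}
print(edge_counts)  # {('P1', 'P1'): 3, ('P1', 'P2'): 1, ('P2', 'P2'): 2}
```

Only the two dictionaries would be released, so an attacker sees how many nodes and edges each super-node contains but not which node is which.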

Zheleva and Getoor [Zheleva and Getoor, 2008] consider a link re-identification attack, where nodes have multiple types of edges and an attacker attempts to re-identify sensitive edges. As background knowledge, they assume that an attacker can predict a sensitive edge based on other, non-sensitive edges. They describe five anonymization techniques: (i) remove all sensitive edges; (ii) remove some non-sensitive edges that significantly contribute to the prediction of a sensitive edge; (iii) collapse the anonymized nodes into a single node for each equivalence class (they assume that nodes are clustered into equivalence classes) and publish the count of each type of edge between two equivalence class nodes; (iv) a technique similar to (iii), but requiring the equivalence class nodes to satisfy the same constraints as any two nodes in the social network; (v) remove all edges.

Campan and Truta [Campan and Truta, 2009] propose an edge generalization technique that leverages the k-anonymity model. In their model, each node is similar to at least k − 1 other nodes with respect to attributes and associated structural information (e.g., the neighborhood structure of nodes). Nodes are partitioned into clusters and the nodes of a cluster are combined into one single node. Edges between two clusters are collapsed into a single edge, which is labeled with the number of edges between them. While this approach is similar to [Zheleva and Getoor, 2008], there are two major differences: (i) [Campan and Truta, 2009] considers all relationships to be of the same type, whereas [Zheleva and Getoor, 2008] allows different types of relations; (ii) [Campan and Truta, 2009] considers both generalization and structural information loss while clustering.

The main challenge for anonymization methods is providing sufficient anonymity while preserving (all) the relevant structural properties of the network. Without preserving enough of the structural properties of the original network, publishing anonymized social network datasets loses its value.

6.2 Mitigating Inference Attacks

The goal of an inference attack is to infer undisclosed private information about a user using other published details of that user. For example, a person might not want to state her political affiliation on Facebook because of privacy concerns. But if she is a member of a “ban the same sex marriage” group, then her political affiliation may be inferred from this group membership.

Zheleva and Getoor [Zheleva and Getoor, 2009] study four social networks (Facebook, Flickr, Dogster and BibSonomy) and show how an attacker can exploit public and private user profiles to learn private attributes such as user location and gender. They show that declared social relationships and inferred group memberships are enough to predict undisclosed private information. Using the classification model LINK-GROUP, a combination of link and group-based classification models, they were able to accurately discover the information of private-profile users.

Heatherly et al. [Heatherly et al., 2013] describe three sanitization techniques to prevent the inference of undisclosed private information from a released social network dataset. First, they build classification models to accurately predict private data from the available details (attributes) of a user. Then they apply the sanitization techniques to reduce the accuracy of the models. In brief, the techniques are as follows: (i) remove some details (e.g., attributes) to decrease the classification accuracy for sensitive attributes; (ii) alter the link structure of the social graph by adding and removing links; and (iii) generalize details. For example, if a user enters “Boston Celtics” as a favorite activity, the name is replaced by the more general term “Basketball”. Experimenting on a Facebook dataset, the authors conclude that removing attributes and friendship links together from the published data is the best way to reduce classifier accuracy.

Dey et al. [Dey et al., 2012] attempt to infer the age of over one million Facebook users in New York City. Exploiting the Facebook social graph, they design an iterative algorithm that estimates a user’s age based on her friends’ ages (e.g., from inferred high school graduation year), friends of friends’ ages, and so on. They find that for most users, including users who take maximal measures to prevent privacy leakage by hiding their friend lists, it is possible to estimate ages with an error of only a few years. As a solution, the authors recommend hiding high school graduation years and friend lists from users who are not friends.
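The idea of propagating age estimates through the friendship graph can be illustrated with a deliberately simplified sketch. It is not the algorithm of Dey et al.: it assumes that a few ages are known exactly and repeatedly sets each unknown age to the mean of the current estimates of that user's friends. The function name, the user IDs and the ages are all hypothetical.

```python
def estimate_ages(friends, known_ages, rounds=10):
    """Iteratively estimate unknown ages from friends' (estimated) ages.

    friends: {user: list of friends}; known_ages: {user: age}.
    Known ages are never overwritten.
    """
    estimates = dict(known_ages)
    for _ in range(rounds):
        updated = dict(estimates)
        for user, nbrs in friends.items():
            if user in known_ages:
                continue
            values = [estimates[f] for f in nbrs if f in estimates]
            if values:
                updated[user] = sum(values) / len(values)
        if updated == estimates:
            break  # nothing changed, so further rounds are pointless
        estimates = updated
    return estimates


# Hypothetical chain of friends: a - b - c - d, with the ages of a and d known.
friends = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ages = estimate_ages(friends, {"a": 20, "d": 30})
print(ages)
```

In this toy chain the estimates for b and c settle between the two known ages, with b closer to a and c closer to d. Even this crude averaging shows why hiding one's own age does not help when friends disclose theirs.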

7. MITIGATING SYBIL ATTACKS

The Sybil attack is a fundamental problem in distributed systems. The term Sybil was first introduced by Douceur [Douceur, 2002], inspired by a 1973 book of the same name about the treatment of Sybil Dorsett, a person who manifests sixteen personalities. In a Sybil attack, an attacker creates multiple identities and uses them to influence the working of the system.

OSNs including Digg, YouTube, Facebook and BitTorrent have become vulnerable to Sybil attacks. For example, Facebook estimates that up to 83 million of its accounts may be illegitimate [BBC, 2012], far more than its earlier estimate of 54 million [Cellan-Jones, 2012]. The high number of Sybils in OSNs is due to the fact that users can create accounts quickly and freely; typically an email address is all that is needed to open an account.

Sybil users affect the correct functioning of the system by contributing malicious content. By controlling many identities, Sybil users increase their influence and power in OSNs [Nazir et al., 2010; Ratkiewicz et al., 2011]. For example, Sybil users can outvote real users on YouTube and can promote clients’ content to the top position by casting more up votes [Riley, 2007]. On Facebook, users control Sybil identities to gain higher status in social games [Nazir et al., 2010]. On Twitter, paid organizations conduct political campaigns while disguising themselves as the general population [Ratkiewicz et al., 2011]. Researchers find that Sybils forward malware and spam on social media [Gao et al., 2010; Grier et al., 2010; Irani et al., 2010; Stringhini et al., 2010; Thomas et al., 2011]. Sybil identities are used to acquire users’ private contact lists [Bilge et al., 2009; Fong, 2011a]. Moreover, Sybils manipulate Google social search results [Jurek, 2011] and location crowdsourcing results [Marinando, 2010].

Malicious activities by Sybil users pose serious threats to OSN users, who trust the service and depend on it for online interactions. The threat is likely to grow as more people rely on OSNs for primary online communications [Lenhart et al., 2010; Murphy, 2010] and readily available news [Kwak et al., 2010]. Sybils also cost OSN providers money and time. OSN providers spend significant resources and time to detect, verify, and shut down Sybil identities. For example, Tuenti, the largest OSN in Spain, dedicates 14 full-time employees to manually verifying user-reported Sybil identities [Cao et al., 2012].

However, mitigating the Sybil attack is not easy. Some open systems employ CAPTCHAs [von Ahn et al., 2004], IP address filtering, and computational puzzles to mitigate Sybil attacks [Walsh and Sirer, 2006; Peterson and Sirer, 2009; Piatek et al., 2008]. Unfortunately, none of these solutions has worked well in real-world systems. Crowdsourced CAPTCHA-cracking businesses [Acohido, 2009] employ cheap labor from developing countries to break the codes. IP address filtering allows only one account/identity per IP address, which causes serious problems for users behind NATs, and yet fails against attackers who control a subnet [Yu et al., 2006]. Finally, computational puzzles require each prospective OSN registrant to solve a computational problem of controlled difficulty. Although solving a puzzle takes effort, an attacker equipped with more resources can still solve it. Moreover, none of these solutions can reduce the number of Sybil identities already in the system; at best they limit the rate at which Sybils enter it.

Two categories of solutions are available to defend against Sybils: Sybil detection and Sybil resistance. Sybil detection schemes [Yu et al., 2006; Cao et al., 2012; Wang et al., 2013; Yang et al., 2011] leverage the social graph structure to identify whether a given user is Sybil or non-Sybil (Section 7.1). On the other hand, Sybil resistance schemes do not explicitly label users as Sybils or non-Sybils; rather, they use application-specific knowledge to mitigate the influence of Sybils in the network [Post et al., 2011; Viswanath et al., 2012b] (Section 7.2). In a tutorial and survey, Haifeng Yu [Yu, 2011] compiles social graph-based Sybil detection techniques. In this paper, we report more recent work in that category, as well as Sybil resistance schemes.

7.1 Sybil Detection

Sybil detection techniques model an online social network (OSN) as an undirected graph G = (V,E), where a node v ∈ V is a user in the network and an edge e ∈ E between two nodes corresponds to a social connection between the users. This connection could be a friendship relationship on Facebook or a colleague relationship on LinkedIn, and is assumed to be trusted.

The social graph has n = |V| nodes and m = |E| edges. By definition, if all nodes correspond to different persons, then the system should have n users. But some persons have multiple identities. These users are Sybil users and all the identities created by a Sybil user are called Sybil identities. An edge between a Sybil user and a non-Sybil user may exist if the Sybil user is able to create a relationship (e.g., friend, colleague) with the non-Sybil user. These edges are called attack edges (see Figure 6).

Attackers can launch Sybil attacks by creating many Sybil identities and creating attack edges with non-Sybil users. Detection systems against Sybil attacks provide mechanisms to detect whether a user (node) v ∈ V is Sybil or non-Sybil. Those mechanisms assume either that an authority (e.g., the OSN provider) knows the topology of the whole network (a centralized solution) or that each node knows only its own social connections (a decentralized solution). Some common assumptions of Sybil detection schemes are listed below.

Assumption 1: Attackers can create a large number of Sybil identities in OSNs and can create connections among those Sybil identities, but they cannot create an arbitrary number of social relationships with non-Sybil users. Intuitively, a social relationship reflects trust and an out-of-band social interaction, so establishing such a relationship requires significant human effort. The limited number of attack edges separates the Sybil and non-Sybil regions in a social graph, as shown in Figure 6.

Assumption 2: The non-Sybil region of a social graph is fast-mixing. Mixing time determines how fast a random walk’s probability of landing at each node reaches the stationary distribution [Boyd et al., 2005; Flaxman, 2008]. The limited number of attack edges causes a sparse cut between the Sybil and non-Sybil regions. The non-Sybil region itself has no sparse cut, as non-Sybils are well connected. As such, the mixing time of the non-Sybil region should differ from that of the entire social graph.
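To make Assumption 2 concrete, the standard random-walk definitions can be written as follows. The notation is ours and is meant as a sketch of the usual textbook formulation, not as the exact definition used in any one of the cited schemes; $\pi^{(u,t)}$ denotes the distribution of a walk of length $t$ started at node $u$.

```latex
% Simple random walk on an undirected graph G = (V, E), with n = |V|, m = |E|.
P_{uv} =
  \begin{cases}
    1/\deg(u) & \text{if } (u, v) \in E \\
    0         & \text{otherwise}
  \end{cases}
\qquad
\pi(v) = \frac{\deg(v)}{2m}

% Mixing time: the walk length after which the walk is \epsilon-close to \pi
% (in total variation distance) from every starting node.
T(\epsilon) = \max_{u \in V} \; \min \bigl\{ t : \lVert \pi^{(u,t)} - \pi \rVert_{\mathrm{TV}} \le \epsilon \bigr\}
```

Under this formulation, a graph is commonly called fast-mixing when $T(\epsilon) = O(\log n)$ for $\epsilon = \Theta(1/n)$. A sparse cut, such as the one created by a small number of attack edges, slows the flow of probability mass across the cut and therefore increases $T(\epsilon)$ for the graph as a whole.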

Assumption 3: The defense mechanism knows at least one non-Sybil. This assumption is essential in the sense that, without this knowledge, the Sybil and non-Sybil regions are indistinguishable to the system.

Most Sybil detection techniques are based on social graphs. Social graph-based approaches leverage random walks [Yu et al., 2006; Danezis and Mittal, 2009; Cao et al., 2012; Wei et al., arch], social community [Viswanath et al., 2010], and network centrality [Xu et al., 2010a] to detect Sybils in the network. SybilGuard [Yu et al., 2006] is a decentralized Sybil detection scheme that relies on Assumption 1, Assumption 2 and Assumption 3. A social graph with a small quotient cut has a large mixing time, which implies that a random walk must be long in order to converge to the stationary distribution. So, the presence of too many Sybil nodes in the network disrupts the fast-mixing property, in the sense that they increase the mixing time of the social network by contributing small quotient cuts. Thus, a verifier, which is itself a non-Sybil node, can break this symmetry by examining the anomaly of the mixing time in the network. In order to detect Sybils, a non-Sybil node (say, a verifier) performs a random route of a certain length w starting from itself (w is a theoretically identifiable quantity, and the paper experimentally shows that w = 2000 works for a topology of one million nodes). A suspect (a node that is in question) is identified as non-Sybil if its random route intersects the verifier’s random route. Because the number of attack edges is assumed to be limited, the verifier’s route should remain within the non-Sybil region with high probability, given an appropriate choice of w.

Fig. 6: The system model for Sybil detection. The non-Sybil region and the Sybil region are connected by attack edges; the legend distinguishes non-Sybils, Sybils, and trusted users.
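The random-route test can be sketched in a few lines. This is a simplified illustration under stated assumptions rather than the SybilGuard protocol itself: each node fixes a random one-to-one mapping from incoming to outgoing edges, routes are started along every edge of the verifier and of the suspect, and the suspect is accepted if any route pair shares a node. The function names, the toy graph and the values of w and the seed are illustrative.

```python
import random


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def make_routing_tables(adj, seed=0):
    """Give each node a fixed random permutation: incoming -> outgoing neighbor."""
    rng = random.Random(seed)
    tables = {}
    for node in sorted(adj):
        nbrs = sorted(adj[node])
        shuffled = nbrs[:]
        rng.shuffle(shuffled)
        tables[node] = dict(zip(nbrs, shuffled))
    return tables


def random_route(tables, start, first_hop, w):
    """Follow a route of w hops that leaves `start` through `first_hop`."""
    route = [start]
    prev, cur = start, first_hop
    for _ in range(w):
        route.append(cur)
        # The edge used to enter `cur` determines the edge used to leave it.
        prev, cur = cur, tables[cur][prev]
    return route


def accepts(adj, tables, verifier, suspect, w):
    """Simplified rule: accept if any verifier route meets any suspect route."""
    verifier_nodes = set()
    for hop in adj[verifier]:
        verifier_nodes.update(random_route(tables, verifier, hop, w))
    return any(
        verifier_nodes & set(random_route(tables, suspect, hop, w))
        for hop in adj[suspect]
    )


# Hypothetical graph: honest nodes 0-3, Sybil nodes 4-6, one attack edge (3, 4).
graph = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                   (4, 5), (4, 6), (5, 6), (3, 4)])
tables = make_routing_tables(graph, seed=1)
print(accepts(graph, tables, verifier=0, suspect=2, w=2))
```

On a graph this small, routes from different regions may well cross the single attack edge; the separation argued for in the text only emerges when the route length is short relative to the size of the non-Sybil region.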

SybilInfer [Danezis and Mittal, 2009] also relies on Assumption 1, Assumption 2 and Assumption 3. Moreover, it assumes that a modified random walk over a social network, one that yields a uniform distribution over all nodes, is also fast mixing. The core of SybilInfer is a Bayesian inference that detects approximate cuts between the non-Sybil and Sybil regions in social networks. These identified cuts are used to infer the labels (Sybil or non-Sybil) of the nodes, with an associated probability.

SybilRank [Cao et al., 2012] is also a random walk-based Sybil detection scheme; it relies on all three assumptions and ranks users according to their perceived likelihood of being Sybils. Using early-terminated power iteration, SybilRank computes the landing probability of short random walks and ranks users accordingly, so that a substantial portion of the Sybil users receive low ranks. The design of SybilRank is influenced by an observation on early-terminated random walks in social graphs: if such a walk starts from a non-Sybil node, its degree-normalized probability of landing at a non-Sybil node is higher than that of landing at a Sybil node. SybilRank refers to the probability of a random walk landing on a node as the node’s trust, ranks nodes by their trust, and flags low-ranked nodes as potential Sybil users. Rather than keeping a large number of random walk traces, which is computationally intensive and is done in other graph-based Sybil defense schemes [Yu et al., 2006; Yu et al., 2010], it uses power iteration [Langville and Meyer, 2004] to calculate the landing probability of random walks.
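The ranking procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of Cao et al.: it assumes a connected graph with no isolated nodes, uses the base-2 logarithm for the O(log n) iteration count, and runs on a hypothetical toy graph. The function and variable names are illustrative.

```python
import math


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def sybilrank(adj, seeds, total_trust=1.0, iterations=None):
    """Rank nodes by degree-normalized trust after early-terminated power iteration.

    Returns (ranking, trust): nodes ordered from most to least trusted,
    and the raw trust value of every node.
    """
    if iterations is None:
        # Assumption: O(log n) is instantiated as ceil(log2 n).
        iterations = math.ceil(math.log2(len(adj)))
    trust = {v: 0.0 for v in adj}
    for s in seeds:
        trust[s] = total_trust / len(seeds)
    for _ in range(iterations):
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            share = trust[u] / len(nbrs)  # split trust evenly among neighbors
            for v in nbrs:
                nxt[v] += share
        trust = nxt
    normalized = {v: trust[v] / len(adj[v]) for v in adj}
    ranking = sorted(adj, key=lambda v: normalized[v], reverse=True)
    return ranking, trust


# Hypothetical graph: honest nodes 0-3, Sybil nodes 4-6, one attack edge (3, 4).
graph = build_adj([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
                   (4, 5), (4, 6), (5, 6), (3, 4)])
ranking, trust = sybilrank(graph, seeds=[0])
print(ranking)
```

With the trust seed placed at honest node 0, the three Sybil nodes end up at the bottom of the ranking, because trust can reach them only through the single attack edge before the iteration stops. The total amount of trust is unchanged by the propagation.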

Operations performed by SybilRank are shown in Figure 7.

Viswanath et al. [Viswanath et al., 2010] suggest using community detection algorithms for Sybil detection. They show that although other graph property-based Sybil defense schemes have different working principles, the core of those works revolves around detecting local communities around a trusted node. So, existing community detection algorithms could also be used to defend against Sybils. Although not explicitly mentioned, their approach is centralized, because community detection requires a central authority that knows the entire topology.

Xu et al. [Xu et al., 2010a] propose Sybil detection based on the betweenness rank of the edges. The betweenness of an edge is defined as the number of shortest paths in the social graph that pass through the edge [Brandes, 2001]. The scheme assumes that the number of attack edges is limited and that the Sybil and non-Sybil regions are separate clusters of nodes. So, intuitively, the betweenness scores of the attack edges should be high, as they connect the clusters. Their scheme exploits this social network property and uses a Sybil Resisting Network Clustering (SRNC) algorithm to detect Sybils. The algorithm computes the betweenness of each edge and identifies the edges with high betweenness as attack edges.

Fig. 7: Three steps performed by SybilRank in detecting Sybils: Stage I, propagating trust via O(log n) power iterations; Stage II, ranking nodes based on degree-normalized trust; Stage III, annotating the ranked list with the portion of fakes. Black users are Sybils ([Cao et al., 2012]).
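The intuition that attack edges carry high betweenness can be checked on a toy graph. The sketch below computes edge betweenness for an unweighted, undirected graph with a breadth-first variant of Brandes' accumulation; it illustrates only the betweenness computation and is not the SRNC algorithm. The graph and the names are hypothetical.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Return {frozenset({u, v}): betweenness} for an undirected graph.

    When a node pair has several shortest paths, each path contributes
    an equal fraction, so every pair contributes a total of 1.
    """
    bet = defaultdict(float)
    for s in adj:
        dist = {s: 0}
        sigma = defaultdict(float)  # number of shortest paths from s
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Each unordered pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in bet.items()}


# Hypothetical graph: two triangles joined by the single edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bet = edge_betweenness(adj)
top_edge = max(bet, key=bet.get)
print(sorted(top_edge), bet[top_edge])  # [2, 3] 9.0
```

All nine shortest paths between the two triangles cross the connecting edge, so it receives the highest score, which is the property a betweenness-based detector relies on.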

Social graph-based approaches still have some challenges to overcome. First, as graph-based Sybil detection schemes exploit trust relations, the success of the identification depends heavily on the trust-related assumptions. If an assumption does not hold in a network, social graph-based Sybil detection techniques might work poorly in that network. For example, the assumption that Sybils have problems creating social connections with legitimate users (non-Sybils) is not well established. Although one study [Motoyama et al., 2011] shows that most of a Sybil identity’s connections are also Sybil identities and that Sybils have fewer relationships with non-Sybil users, several other studies [Boshmaf et al., 2011; Irani et al., 2011; Bilge et al., 2009] show that users are not careful when accepting friendship requests and that Sybil identities can easily befriend them. Moreover, Sybil users are using advanced techniques to create more realistic Sybil identities, either by copying profile data from existing accounts or by assigning real users to customize them. Also, the assumption that a social network is fast-mixing may not hold for all social networks. One study [Mohaisen et al., 2010] shows that many social networks are not fast-mixing, especially those in which edges represent strong real-world trust (e.g., DBLP, Epinions, etc.).

Second, the performance of random walk-based Sybil detection techniques depends on various parameters of the random walks (e.g., the length of a random walk). Parameter values tuned for a fixed network size work for that size (as all the schemes have shown), but they have to be updated as the social network evolves.

7.2 Sybil Resistance

Sybil resistance schemes do not explicitly label users as Sybils or non-Sybils; rather, they attempt to mitigate the impact that a Sybil user can have on others. Sybil resistance schemes have been used effectively in applications from diverse domains, including content rating systems [Tran et al., 2009; Chiluka et al., 2012], spam protection [Mislove et al., 2008], online auctions [Post et al., 2011], reputation systems [DeFigueiredo and Barr, July], and collaborative mobile applications [Quercia and Hailes, 2010].


Note two assumptions of Sybil detection schemes: (1) the non-Sybil region is fast mixing, and (2) Sybils cannot create an arbitrary number of social relationships with non-Sybils. Sybil resistance schemes also assume that Sybils have a limited number of social connections to non-Sybils, but they do not rely on the fast-mixing nature of the non-Sybil region. However, Sybil resistance schemes take additional application-related information, such as users’ interactions, transactions, or votes. Using the underlying social network of the users and this system information, Sybil resistance schemes determine whether an action performed by a user should be allowed or denied.

Most Sybil resistance schemes [Mislove et al., 2008; Post et al., 2011; Viswanath et al., 2012b] share a common approach to resisting Sybils: they use a credit network built on top of the social network of users [Viswanath et al., 2012a]. Originally proposed in the electronic commerce community, credit networks [Dandekar et al., 2012; Ghosh et al., 2007] create mutual trust protocols for situations in which there is pairwise trust between two users and a centralized trusted party is unavailable. Nodes in a credit network trust each other by providing credits up to a certain limit. Nodes use these credits to pay for services (e.g., sending a message, purchasing items, casting a vote) that they receive from one another. These schemes assign credits to the network links and allow an action between two nodes if there is a path between them that has enough credit to satisfy the operation. As such, these schemes find a credit assignment strategy in the graph and apply the credit payment scheme so that only a limited number of illegitimate operations are possible in the system. A Sybil user has a limited number of edges to non-Sybils (and hence limited credit available), which prevents her from gaining additional advantages by creating multiple Sybil identities. This scenario, shown in Figure 8, is the core defense philosophy of some resistance schemes. In the following, we provide a brief overview of the Sybil resistance schemes.

[Figure 8 diagram: node A links the rest of the network to the Sybil identities X, X′, X′′ and X′′′; the edge labels shown are 52, 39, 106 and 22.]

Fig. 8: Credit network based Sybil resistance [Viswanath et al., 2012a]. The network contains four Sybil identities as nodes X, X′, X′′, X′′′ of a Sybil user. A directed edge (X,Y) represents how much credit is available to X from Y. If X wants to pay credits from the other three nodes, the credits must be deducted from X’s single legitimate link to A. So, a Sybil’s other identities do not provide any additional credits in the rest of the network.
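The bound illustrated in Fig. 8 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanism of any particular scheme: the function names (`find_path`, `pay_one`), the unit price per action, and the credit values are all hypothetical, and the sketch only deducts credit along the path (a full credit network would also adjust credit in the reverse direction).

```python
from collections import deque


def find_path(credit, src, dst):
    """Breadth-first search that only follows links with credit left."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, available in credit.get(node, {}).items():
            if available > 0 and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def pay_one(credit, src, dst):
    """Allow one action if some path has credit; charge one unit per link."""
    path = find_path(credit, src, dst)
    if path is None:
        return False
    for u, v in zip(path, path[1:]):
        credit[u][v] -= 1  # simplification: the reverse direction is not credited
    return True


def count_allowed(credit, senders, dst):
    """Count how many of the attempted actions are allowed."""
    return sum(pay_one(credit, s, dst) for s in senders)
```

With Sybil identities that have ample credit among themselves but a single link of 3 credits from X to A, ten attempted actions spread over the identities yield only three allowed actions, because every path to the rest of the network crosses the X to A link.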

Ostra [Mislove et al., 2008] leverages existing trust relationships among users to thwart unwanted communication (e.g., spam). It bounds the total number of unwanted communications a Sybil user can produce by assigning credit values to the trust links. If a user sends a message to another user, Ostra finds a path with enough credit from the sender to the receiver. If a path is available, credit is assigned along all the links in the path. This credit is refunded if the receiver does not consider the message unwanted; otherwise, the credit is paid. If no such path exists, Ostra blocks the communication. In this way, Ostra ensures that a user with multiple identities cannot send a large number of unwanted communications, unless she also has additional trust relationships.



Bazaar [Post et al., 2011] aims to strengthen users’ reputation in online marketplaces like eBay. The opportunity to create accounts freely leads Sybil users to create multiple accounts, which wastes time and causes significant monetary losses for defrauded users. To mitigate Sybil users, Bazaar creates a transaction network by linking users who have made a successful transaction. The weight of a link is the amount that has been successfully transferred in the transaction. Prior to a transaction, using a max-flow based technique, Bazaar computes the reputation of the users involved and compares it with the value of the new transaction. If it finds enough available flow, it removes the value of the transaction between the users as credits, and eventually adds it back if the transaction is not fraudulent. However, a new transaction is denied if the required flow is not found.
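The max-flow check described above can be sketched as follows. This is a minimal sketch, assuming an undirected transaction network and the textbook Edmonds-Karp algorithm; the names `max_flow` and `transaction_allowed`, the data layout, and the example amounts are hypothetical and are not taken from Bazaar itself.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph."""
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in list(residual[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def transaction_allowed(links, buyer, seller, value):
    """Allow a transaction only if the max flow covers its value."""
    capacity = defaultdict(dict)
    for (u, v), amount in links.items():
        capacity[u][v] = capacity[u].get(v, 0) + amount
        capacity[v][u] = capacity[v].get(u, 0) + amount
    return max_flow(capacity, buyer, seller) >= value
```

A freshly created account has no links in the transaction network, so its max flow to any other user is zero and no transaction of positive value passes the check.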

Canal [Viswanath et al., 2012b] complements the Ostra and Bazaar credit network-based Sybil resistance schemes by applying landmark routing-based techniques to calculate credit payments over a large network. One of the major problems of Ostra and Bazaar is that they require computing max-flow over a graph. However, the huge size of present-day networks (Facebook has over a billion nodes in its social graph) makes computing the max-flow between two nodes in the network very expensive. This poses a bottleneck to deploying those techniques in a real-world social network. Canal efficiently computes an approximate max-flow path (trading accuracy for speed) using existing landmark routing-based algorithms [Tsuchiya, 1988; Gubichev et al., 2010]. The main components of Canal are universe creator processes and path stitcher processes. Universe creator processes continuously select new landmarks, and path stitcher processes continuously process incoming credit payment requests. Using real-world network datasets, the authors show that Canal can perform payment calculations efficiently (within a few milliseconds), even if the network contains hundreds of millions of links.
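The general idea of landmark routing can be sketched as below: shortest-path trees are precomputed from a few landmarks, and a path between two nodes is approximated by stitching their tree paths together. This is a generic sketch under stated assumptions, not Canal's universe creator or path stitcher processes; the function names and the adjacency-list layout are hypothetical.

```python
from collections import deque


def bfs_tree(graph, root):
    """Parent pointers of a shortest-path tree rooted at a landmark."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def path_to_root(parent, node):
    """Walk parent pointers from node up to the landmark."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def stitch(trees, src, dst):
    """Approximate a src-dst path via the best precomputed landmark tree."""
    best = None
    for parent in trees.values():
        if src not in parent or dst not in parent:
            continue
        up = path_to_root(parent, src)    # src ... landmark
        down = path_to_root(parent, dst)  # dst ... landmark
        # Trim shared ancestors so the two halves meet at the lowest one.
        while len(up) > 1 and len(down) > 1 and up[-2] == down[-2]:
            up.pop()
            down.pop()
        candidate = up + down[-2::-1]
        if best is None or len(candidate) < len(best):
            best = candidate
    return best
```

The stitched path is always a valid path in the graph, but it may be longer than the true shortest path when no landmark lies near it, which is the accuracy-for-speed trade-off mentioned above.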

MobID [Quercia and Hailes, 2010] makes co-located mobile devices resilient to Sybil attackers. Portable devices in close proximity to each other could collaborate on various services (e.g., running localization algorithms to get a precise street map), which can be severely disrupted by Sybil users (Sybils could inject false information). MobID uses mobile networks to achieve Sybil resilience. More specifically, a device manages two small networks as it meets other devices: a network of friends contains non-Sybil devices, and a network of foes contains suspicious devices. Using these two networks, MobID determines whether an unknown device is attempting a Sybil attack. MobID ensures that, with high probability, a non-Sybil device accepts and is accepted by most other non-Sybil devices. So, a non-Sybil device can successfully trade services with other non-Sybil devices.

8. MITIGATING ATTACKS FROM LARGE-SCALE CRAWLERS

OSNs enhance the social browsing experience by allowing users to view the public profiles of others. This way a user meets others, gets a chance to know strangers and eventually befriends some of them. Unfortunately, attackers in the vast landscape of OSNs exploit this functionality. Users’ social data are invaluable to marketers. Professional data aggregators build databases using public views of profiles and social links and sell the databases to insurance companies, background-check agencies and credit-rating agencies [Bonneau et al., 2009]. For example, the crawling of 100 million public profiles from Facebook made news [Bradley, 2012]. Sometimes crawling is a violation of the terms of service. Facebook states that someone should not collect “...users’ content or information, or otherwise access Facebook, using automated means (such as harvesting bots, robots, spiders, or scrapers) without our prior permission” [Facebook, 2015].

One solution to the problem could be to remove the public profile view functionality. But doing so is against the business model of OSNs. Services like search and targeted advertisements bring new users and ultimately revenue to OSNs, but openly accessible content is necessary for their operation [Wilson et al., 2010]. Moreover, removing the public view functionality would undermine the user experience, because public views make it easy to connect, communicate and share with unknown people in the network.

OSN operators such as Facebook and Twitter attempt to defend against large-scale crawling by limiting the number of user profiles a user can see from an IP address in a time window [Stein et al., 2011]. However, tracking users with low-level network identifiers (e.g., IP addresses, TCP port numbers or SSL session IDs) is fundamentally flawed as a solution to this problem [Wilson et al., 2010]. Aggressive attackers may gather a large vector of those identifiers by creating a large number of fake user accounts, gaining access to compromised accounts, virtualizing in a cloud, employing botnets, and forwarding requests to proxies. Until now, researchers have leveraged an encryption-based technique [Wilson et al., 2010] and crawlers’ observable behavior [Mondal et al., 2012] to combat the problem.
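The per-IP limit described above can be sketched as a sliding-window counter. This is a minimal sketch with hypothetical names and parameters (`ProfileViewLimiter`, `max_views`, `window_seconds`); it does not reflect how Facebook or Twitter actually implement their limits.

```python
from collections import defaultdict, deque


class ProfileViewLimiter:
    """Allow at most max_views profile views per IP in a sliding window."""

    def __init__(self, max_views, window_seconds):
        self.max_views = max_views
        self.window = window_seconds
        self.history = defaultdict(deque)  # ip -> timestamps of allowed views

    def allow(self, ip, now):
        views = self.history[ip]
        # Forget views that have fallen out of the window.
        while views and now - views[0] >= self.window:
            views.popleft()
        if len(views) >= self.max_views:
            return False
        views.append(now)
        return True
```

Because the budget is tracked per identifier, a crawler that spreads its requests over many IP addresses receives a fresh budget for each one, which is the weakness of this approach noted above.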

OSNs’ anti-crawling techniques suffer from the fact that web clients can access a particular page using a common URL accessible to all clients [Wilson et al., 2010]. This can be exploited by a distributed crawler: for example, a crawling thread can download and parse a page for links using one session key and deliver those links to another crawling thread, which downloads and parses them using a different session key. So, if some crawlers get banned from the OSN for malicious activities, the links they have parsed are still valid and a fresh start is possible from those links. SpikeStrip [Wilson et al., 2010] overcomes the problem by creating unique, per-session “views” of the protected website that forcibly tie each client to its session key. SpikeStrip is a web server add-on that leverages a link encryption technique. It allows OSN administrators to moderate data access, and it defends against large-scale crawling by securely identifying and rate limiting individual sessions.

When a crawler visits a page, it receives a new session key and a copy of the page whose links are all encrypted. SpikeStrip appends each user’s session key to those links and then encrypts the result using a server-side, secret symmetric key. It also appends a salt to the link after encryption to make each link unique. As time passes, the crawler progressively covers more pages and collects links. However, at some point, the crawler needs to change its session key, either because the session expires or because the OSN bans it. As SpikeStrip couples all URLs to the browser’s session key, this switch of sessions invalidates all the links collected for future traversals. Thus, the crawler must reconstruct its collection from the beginning. The authors implemented mod_spikestrip, a SpikeStrip implementation for Apache 2.x, and showed that it imposes only a 7% performance penalty on Apache.
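The effect of tying links to session keys can be illustrated with the toy sketch below. It is an illustration only, built on several assumptions: the cipher is a home-made HMAC-based keystream (the Python standard library offers no symmetric cipher, and this construction is not vetted cryptography), the salt is used as a nonce rather than appended after encryption, and the names `protect_link` and `resolve_link` are hypothetical. None of this is taken from the mod_spikestrip implementation.

```python
import base64
import hashlib
import hmac
import secrets


def _subkey(secret, label):
    # Derive separate keys for encryption and authentication.
    return hmac.new(secret, label, hashlib.sha256).digest()


def _keystream(key, salt, length):
    out, counter = b"", 0
    while len(out) < length:
        block = salt + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))


def protect_link(secret, session_key, path):
    """Return an opaque token that encodes path for one session only."""
    salt = secrets.token_bytes(8)  # makes every emitted link unique
    plaintext = session_key.encode() + b"|" + path.encode()
    stream = _keystream(_subkey(secret, b"enc"), salt, len(plaintext))
    cipher = _xor(plaintext, stream)
    tag = hmac.new(_subkey(secret, b"mac"), salt + cipher, hashlib.sha256).digest()[:16]
    return base64.urlsafe_b64encode(salt + tag + cipher).decode()


def resolve_link(secret, session_key, token):
    """Return the path if the token belongs to this session, else None."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:
        return None
    salt, tag, cipher = raw[:8], raw[8:24], raw[24:]
    expected = hmac.new(_subkey(secret, b"mac"), salt + cipher, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(tag, expected):
        return None
    plaintext = _xor(cipher, _keystream(_subkey(secret, b"enc"), salt, len(cipher)))
    owner, _, path = plaintext.partition(b"|")
    if not hmac.compare_digest(owner, session_key.encode()):
        return None
    return path.decode()
```

A token handed to another crawling thread, or reused after a session change, resolves to nothing, so links harvested under one session cannot seed a crawl under another. The sketch assumes session keys never contain the `|` separator (for example, hexadecimal strings).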

Genie [Mondal et al., 2012] exploits the differences between the browsing patterns of honest users and crawlers, and thwarts large-scale crawls in OSNs using Credit Networks [Dandekar et al., 2012; Ghosh et al., 2007]. The system design is based on three observations from real-world datasets: (i) there is a balance between the number of profiles an honest user views and the number of views other users request of her profile, but crawlers view many more profiles than the number of times their profiles are viewed; (ii) an honest user views the profiles of socially close users; (iii) an honest user repeatedly views a small set of profiles in the network, but crawlers avoid viewing other users’ profiles repeatedly unless they are re-crawling. Genie leverages these observations and requires a viewer to make a “credit payment” in the credit network in order to view a profile. It allows a user (who might be a crawler) to view a profile if the max-flow between them has at least a threshold value. The required credit payment to view a profile depends on the shortest path length from viewer to viewee; a user has to pay more to view the profile of a distant user in the social graph. As a legitimate user usually views profiles that are one or two hops away, and other users also view her profile, her liquidity of credits remains almost the same. On the other hand, a crawler views a lot of distant profiles and gets fewer views. Eventually it lacks the credit liquidity to view the profiles of others. As such, the credit network imposes a strict rate limit on the profile views of crawlers.
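The liquidity argument above can be illustrated with a deliberately simplified model. The sketch below replaces Genie's per-link credit network and max-flow test with a single credit balance per user, and it prices a view at the hop distance between viewer and viewee; both choices, along with the names `hop_distance` and `try_view`, are assumptions made only for illustration.

```python
from collections import deque


def hop_distance(graph, src, dst):
    """Shortest-path length in hops, or None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def try_view(graph, balance, viewer, viewee):
    """Charge the viewer a price that grows with social distance."""
    price = hop_distance(graph, viewer, viewee)
    if price is None or balance[viewer] < price:
        return False
    balance[viewer] -= price  # viewing costs credit ...
    balance[viewee] += price  # ... and being viewed earns it back
    return True
```

On a chain of six users who each start with 4 credits, a crawler at one end that tries to view every other profile succeeds only for the two nearest ones, whereas two neighbors who keep viewing each other never run out of credit.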

Genie might flag a large number of honest users’ activities (profile views) due to the existence of outliers in a social network. This might limit the usability of social networks, because an outlier who cannot view a profile will not be able to befriend others. Genie might also require fast computation of shortest paths, as for each profile viewing request it computes all the shortest paths from viewer to viewee. Intuitively, this operation is too costly in a modern social network (more than one billion users), even considering state-of-the-art shortest path algorithms.

Both SpikeStrip and Genie limit crawlers’ ability to quickly aggregate a significant portion of an OSN’s user data. Unfortunately, equipped with a large number of user profiles (fake or compromised) and employing dedicated crawlers for a long time, attackers could still collect a huge amount of users’ social data.

9. MITIGATING SOCIAL SPAM

Spam is not new in web-based systems (e.g., [Xie et al., 2006; Ntoulas et al., 2006; Mehta et al., 2008]). However, OSNs have added a new flavor to it by acting as effective tools for spamming activities and propagation. Social spam (e.g., [Zinman and Donath, 2007], [Lin et al., 2007]) is unwanted content that is directed specifically at users of the OSN. The worst consequences of social spam include phishing attacks [Jagatic et al., 2007] and malware propagation [Boyd and Heer, 2006].

Spamming activity is pervasive in OSNs and spammers are successful. For example, about 0.13% of spam tweets on Twitter generate a page visit [Grier et al., 2010], whereas the corresponding rate for spam email is only 0.003%-0.006% [Kanich et al., 2008]. This high click-through rate is due to the fact that OSNs expose intrinsic trust relationships among online friends. As such, users read and click messages or links that are shared by their friends. One study [Bilge et al., 2009] shows that 45% of users on OSNs click on links posted by their friends’ accounts.

Defending against spam in OSNs can improve the user experience. OSN service providers also benefit, as this lessens the system workload spent on dealing with unwanted communications and content. Defense mechanisms against spam in OSNs can be classified into two categories: 1) spam content and profile detection, and 2) spam campaign detection. Spam content and profile-level detection involves checking individual accounts or content for evidence of spam (Section 9.1). On the other hand, a spam “campaign” is a collection of malicious content having a common goal, for example, selling backdoor products [Gao et al., 2010] (Section 9.2).

9.1 Spam Content and Profile Detection

Some early spam profile detection approaches [Spitzner, 2002; Webb et al., 2008; Lee et al., 2010] use social honeypots. A honeypot is a trap deployed to capture examples of nefarious activities in networked systems [Spitzner, 2002]. For years, researchers have used honeypots to characterize malicious hacker activities [Spitzner, 2003], to obtain footprints of email address crawlers [Prince et al., 2005], and to create intrusion detection signatures [Kreibich and Crowcroft, 2004]. Social honeypots are used to monitor spammers’ behaviors and store their information from the OSNs [Lee et al., 2010].

Webb et al. [Webb et al., 2008] take the first step to characterize spam in OSNs using social honeypots. They created 51 honeypot MySpace profiles in different geographic locations for harvesting deceptive spam profiles on MySpace. An automated program (commonly known as a bot) works on behalf of each honeypot profile and collects all of the traffic it receives (via friend requests). After four months of deployment and operation, the bots collected 1,570 friend requests (and the corresponding spam profiles). Through statistical analysis the authors show the following: (i) spam profiles follow distinct temporal patterns in spamming activity; (ii) 57.2% of the “About me” contents of the spam profiles are duplicated; (iii) spam profiles redirect users to predefined web pages.

In [Lee et al., 2010], the authors also collected spam profiles using social honeypots. This work differs from the previous one in that it not only collects and characterizes spam profiles, but also extracts features from the gathered spam profiles and builds classifiers to detect potential spam profiles. The authors consider four categories of features (demographics, content, activity and connections) from the spam profiles collected from MySpace and Twitter. The paper shows the performance results of ten classifiers using these features. For the MySpace dataset, each classifier’s accuracy is greater than 98.4%; for the Twitter dataset, each of the top 10 classifiers achieves an accuracy greater than 82.7%.

One of the limitations of these honeypot-based solutions [Webb et al., 2008; Lee et al., 2010] is that they assume all profiles that sent friend requests to honeypots are spam profiles. But in social networks, it is common to receive friend requests from unknown people, who might be legitimate users in the network. The solutions would be more rigorous if such legitimate users were excluded. Also, the methods are effective only when spammers become friends with the honeypots; otherwise the honeypots will be able to target only a small subset of the spammers. Moreover, in social networks, friendship is not always required for spamming. For example, on Twitter, a spammer can use a mention tag (e.g., @user) to send spam tweets to a user.

Stringhini et al.’s solution [Stringhini et al., 2010] overcomes some limitations of the previous two honeypot-based papers. The authors deployed honeypot accounts on Facebook, Twitter and MySpace (300 on each platform) for about one year and logged the traffic (e.g., friend requests, messages, and invitations). In total, the honeypots received 4,250 friend requests and 85,569 messages from the three platforms. They build classifiers from the following six features: (i) FF ratio: the ratio of the number of friend requests sent by a user to the number of friends she has; (ii) URL ratio: the ratio of the number of messages containing URLs to the total number of messages; (iii) Message Similarity: the similarity among the messages sent by a user; (iv) Friend Choice: the ratio of the total number of names among the profile’s friends to the number of distinct first names; (v) Messages Sent: the number of messages sent by a profile; and (vi) Friend Number: the number of friends a profile has.
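The six features listed above can be computed from a profile record roughly as follows. This is a sketch under stated assumptions: the record layout (`friend_requests_sent`, `friend_first_names`, `messages`) is hypothetical, message similarity is approximated with the average pairwise `difflib.SequenceMatcher` ratio (the measure used by the original authors is not specified here), and a profile with no friends is treated as having one friend to avoid division by zero.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def spam_features(profile):
    """Compute the six profile features from a plain dict record."""
    friends = profile["friend_first_names"]
    messages = profile["messages"]
    pairs = list(combinations(messages, 2))
    if pairs:
        similarity = sum(
            SequenceMatcher(None, a, b).ratio() for a, b in pairs
        ) / len(pairs)
    else:
        similarity = 0.0
    with_url = sum(1 for m in messages if URL_RE.search(m))
    return {
        "ff_ratio": profile["friend_requests_sent"] / (len(friends) or 1),
        "url_ratio": with_url / len(messages) if messages else 0.0,
        "message_similarity": similarity,
        "friend_choice": len(friends) / (len(set(friends)) or 1),
        "messages_sent": len(messages),
        "friend_number": len(friends),
    }
```

The resulting dictionary can be turned into a feature vector for any off-the-shelf classifier.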

The authors manually inspected and labeled profiles as spam and used the Random Forest algorithm for classification. A 10-fold cross validation on the Facebook training data set yielded an estimated false positive ratio of 2% and a false negative ratio of 1%. The Twitter dataset yielded an estimated false positive ratio of 2.5% and a false negative ratio of 3%. They detected 15,857 spam profiles on Twitter using the classifier, and the Twitter spam team eventually suspended those accounts.

Benevenuto et al. [Benevenuto et al., 2009b; Benevenuto et al., 2009a] detect video polluters such as spammers and promoters in the YouTube online video social network using machine learning techniques. The authors considered three attribute sets in classification: user attributes, video attributes, and social network (SN) attributes. Four volunteers manually analyzed the videos and built a test set from the dataset, labeling users as spammers, promoters or legitimate users. They proposed a flat classification approach, which correctly detected 96% of the promoters and 57% of the spammers, while wrongly classifying only 5% of the legitimate users. Interestingly, social network attributes performed the worst in classification: only one feature (UserRank) was within the top 30 features.

9.2 Spam Campaigns Detection

Chu et al. [Chu et al., 2012] detect social spam campaigns on Twitter using tweet URLs. They collected a dataset of 50 million tweets from 22 million users. They considered tweets having the same URL as a campaign and clustered the dataset into a number of campaigns. The ground truth was produced through manual inspection using Twitter’s spam rules and automated URL checking in five services. They obtained a variety of features ranging from the individual tweet/account level to the collective campaign level and built a classification model to detect spam campaigns. Using several classification algorithms, they were able to detect spam campaigns with a success rate of more than 80%. The focus of this solution is spam tweets with URLs. However, Twitter spammers can post tweets without any URL. Even obfuscated URLs (e.g., somethingDOTcom) will make the detection ineffective.
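The grouping step, and the limitation just mentioned, can be sketched as below. This is a minimal sketch that assumes tweets are plain dicts with a `text` field and that URLs are matched by a simple regular expression; it does not resolve shortened URLs, and the names are hypothetical rather than taken from the original study.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def group_by_url(tweets):
    """Group tweets into candidate campaigns keyed by the URL they share."""
    campaigns = defaultdict(list)
    for tweet in tweets:
        # Strip trailing punctuation so "http://a.example/x!" matches "http://a.example/x".
        urls = {u.rstrip(".,;!?)") for u in URL_RE.findall(tweet["text"])}
        for url in urls:
            campaigns[url].append(tweet)
    return dict(campaigns)
```

A tweet that spells its link as "spam DOT example" contains nothing the pattern can match, so it falls outside every candidate campaign, which illustrates why obfuscated URLs defeat this kind of grouping.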



Gao et al. [Gao et al., 2010] conduct a rigorous and extensive study on detecting spam campaigns in Facebook wall posts. They crawled 187 million Facebook wall posts from about 3.5 million users. Inspired by a study [Kreibich et al., 2009] which shows that spamming bot-nets create email spam messages using templates, they consider wall posts having similar texts as a spam campaign. In particular, they model the wall posts as a graph: a post is a node, and two nodes are connected by an edge if they have the same destination URL or their texts are very similar. As such, posts from the same spam campaign form connected subgraphs, or clusters. To detect which clusters are from spammers, they use the “distributed” coverage and “bursty” nature of spam campaigns. The “distributed” property is characterized by the number of user accounts posting in the cluster, under the intuition that spammers will use a significant number of registered accounts for a campaign. The intuition behind the “bursty” property is that most spam campaigns are the result of coordinated actions of many accounts within short periods of time [Xie et al., 2008]. Using threshold filters on these two properties, they found 297 clusters of wall posts and classified them as potentially malicious spam campaigns.
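The clustering and filtering pipeline described above can be sketched as follows. This is a small-scale sketch under stated assumptions: text similarity uses `difflib.SequenceMatcher` with a hypothetical 0.8 cutoff, all pairs of posts are compared (which would not scale to millions of posts), and the thresholds `min_users` and `max_median_gap` are placeholders rather than the values used in the study.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import median


def cluster_posts(posts, min_similarity=0.8):
    """Connected components of the post graph, found with union-find."""
    parent = list(range(len(posts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(posts)), 2):
        a, b = posts[i], posts[j]
        same_url = a["url"] is not None and a["url"] == b["url"]
        similar = SequenceMatcher(None, a["text"], b["text"]).ratio() >= min_similarity
        if same_url or similar:
            parent[find(i)] = find(j)

    clusters = {}
    for i, post in enumerate(posts):
        clusters.setdefault(find(i), []).append(post)
    return list(clusters.values())


def looks_like_campaign(cluster, min_users=3, max_median_gap=3600):
    """Apply the 'distributed' and 'bursty' threshold filters to a cluster."""
    users = {p["user"] for p in cluster}
    times = sorted(p["time"] for p in cluster)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    distributed = len(users) >= min_users
    bursty = bool(gaps) and median(gaps) <= max_median_gap
    return distributed and bursty
```

Clusters that pass both filters are candidates for further inspection; a single post, or many posts by one account, does not qualify under these assumptions.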

10. MITIGATING DISTRIBUTED DENIAL-OF-SERVICE (DDOS) ATTACKS

A denial-of-service (DoS) attack is characterized by an explicit attempt to monopolize a computer resource, so that an intended user cannot use the resource [Mirkovic et al., 2004]. A Distributed Denial-of-Service (DDoS) attack deploys multiple attacking entities to launch the attack simultaneously (we refer readers to [Mirkovic and Reiher, 2004] for a taxonomy of web-based DDoS attacks and defenses). DDoS attacks in social networks are also common. For example, on August 6, 2009, Twitter, Facebook, LiveJournal, Google’s Blogger, and YouTube were hit by a DDoS attack [McCarthy, 2009]. Twitter experienced interrupted service for several hours, and users complained of not being able to send their Tweets. Facebook users experienced longer delays in loading Facebook pages.

Several papers evaluated how a social network could be leveraged to launch a bot-net based DDoS attack on any target on the internet, including the social network itself. Athanasopoulos et al. [Athanasopoulos et al., 2008] introduce a bot-net, “FaceBot”, that uses a social network to carry out a DDoS attack against any host on the internet (including the social network itself). They created a real-world Facebook application, “Photo of the Day”, that presents a different photo from National Geographic to Facebook users every day. Every time a user clicks on the application, an image from National Geographic appears. However, they placed special code in the application’s source code. Every time a user views the photo, this code sends an HTTP request towards a victim host, which causes the victim to serve a request of 600 KBytes. They used a web server as a victim and observed that the server recorded 6 Mbit per second of traffic. They introduce defense mechanisms which include providing application developers with a strict API that gives access only to resources related to the system.

Ur and Ganapathy [Ur and Ganapathy, 2009] showed how malicious social network users can leverage their connections with hubs to launch DDoS attacks. They created MySpace profiles which befriended hubs in the network. Those profiles posted “hotlinks” to large media files hosted by a victim web server on the hubs’ pages. As hubs receive a large number of hits, a significant number of the visitors would click those hotlinks. As a consequence, a flash crowd sent requests to the victim web server, resulting in a denial of service. They proposed several mitigation techniques. One approach is to restrict some privileges of a user when he becomes a hub (e.g., friends of a hub might no longer be able to post comments containing HTML tags to the hub’s page). But this approach unfortunately restricts the user’s freedom on the OSN. So, they propose focused automated monitoring of a hub, or creating a hierarchy of a hub’s friends, so that only close friends will be able to post on a hub’s profile (the intuition is that close friends will not exploit the hub). Furthermore, they recommend a reputation-based system for social networks that scores user behavior. Only users with higher reputation scores are allowed to post on the hub’s profile.

However, bot-net based DDoS attacks are difficult to mitigate, because it is difficult to distinguish legitimate communications from those that are part of the attack. As social networks flourish, bot-net based DDoS attacks are becoming stronger, because more legitimate users are unwittingly becoming part of an attack.

11. MITIGATING MALWARE ATTACKS

Malicious software (malware) is a program that is specifically designed to gain access to a computer, disrupt its operation, gather sensitive information or damage it without the knowledge of the owner. Participatory internet technologies (e.g., AJAX) and applications (e.g., RSS) have expedited malware attacks, because they enable the participation of the users. OSNs (all of which use participatory technologies and applications) are becoming infrastructures for propagating malware. “Koobface” is probably the best example of malware propagation using social networks [Facebook, 2012]. It spread rapidly through the Facebook social network. The malware used Facebook credentials on a compromised computer and sent messages to the owner’s Facebook friends. The messages redirected the owner’s friends to a third-party website, where they were asked to download an update of the Adobe Flash player. If they downloaded and installed the file, Koobface would infect their system and repeat the same process.

In a survey, Gao et al. [Gao et al., 2011] discuss a number of methods by which malware propagates through social networks. For example, using cross-site request forgery (CSRF or XSRF), malware invites legitimate users to click on a link. If a user clicks, it opens an exploited page containing malicious scripts. Eventually, the malware submits a message with a URL for a wall post on the user’s profile and clicks on the “Share” button so that all of her friends can see this message as well as the link. URL obfuscation is also widely used for malware attacks. An attacker uses commonly known URL shorteners to obfuscate the true location of a link and lures other users to click it.

Unfortunately, malware propagation on social networks exhibits unique propagation vectors. As such, existing Internet worm detection techniques (e.g., [Ellis et al., 2004]) cannot be applied to it. In the context of OSNs, Xu et al. [Xu et al., 2010b] proposed an OSN malware detection system that leverages both the propagation characteristics of the malware and the topological properties of OSNs. They introduced a “maximum coverage algorithm” that picks a subset of legitimate OSN users to whom the defense system attaches “decoy friends” to monitor the entire social graph. When the decoy friends receive suspicious malware propagation evidence, the detection system performs local and network correlations to distinguish actual malware evidence from normal user communication. However, the challenge for this honeypot-based approach is to determine how many social honeypots (in this context, decoy friends) large-scale OSNs (e.g., billions of Facebook users) should deploy.
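One common way to approach a maximum coverage problem of this kind is the standard greedy heuristic, sketched below. Whether Xu et al. use this exact procedure is not stated here; the sketch also assumes that a decoy attached to a user observes that user and her direct friends, and the function name and data layout are hypothetical.

```python
def greedy_decoy_placement(friends, budget):
    """Pick up to `budget` users whose neighborhoods cover the most users."""
    covered, chosen = set(), []
    for _ in range(budget):
        best, best_gain = None, 0
        for user in sorted(friends):  # sorted for deterministic tie-breaking
            if user in chosen:
                continue
            gain = len((friends[user] | {user}) - covered)
            if gain > best_gain:
                best, best_gain = user, gain
        if best is None:  # nothing new can be covered
            break
        chosen.append(best)
        covered |= friends[best] | {best}
    return chosen, covered
```

Running the function with increasing budgets shows how coverage grows with the number of decoys, which is one way to explore the deployment question raised above on a sample graph.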

12. SUMMARY AND DISCUSSION

Millions of Internet users are using OSNs for communication and collaboration. Many companies rely on OSNs for promoting their products and influencing the market. It becomes harder and harder to imagine life without the use of OSN tools, whether for creating an image of oneself or one’s organization, for selectively following news as filtered by a group of friends, or for keeping in touch. However, the growing reliance on OSNs is impaired by an increasingly sophisticated range of attacks that undermine the very usefulness of the OSNs.

This paper reviews online social networks’ privacy and security issues. We have categorized various attacks on OSNs based on social network stakeholders and the forms of attack targeted at them. Specifically, we have categorized those attacks as attacks on users and attacks on the OSN. We have discussed how the attacks are launched, what defense techniques are available, and what challenges such defenses involve.

In online social networks, privacy and security issues are not separable. In some contexts privacy and security goals may be the same, but there are other contexts where they may be orthogonal, and there are also contexts where they are in conflict. For example, in an OSN, a user wants privacy when she is communicating with other users through the messaging service. She will expect that non-recipients of the message will not be able to read it. OSN services will ensure this by providing a secure communication channel. In this context, the goals of security and privacy are the same. Consider another context where there is a security goal of authenticating a user’s account. OSNs usually do this by sending an activation link as a message to the user’s e-mail address. This is not a privacy issue: OSNs are just securely verifying that malicious users are not using the legitimate user’s e-mail to register. In this context, security and privacy goals are orthogonal. However, anonymous views in OSNs (e.g., LinkedIn) present a context where security and privacy goals are in conflict. Users may want to have privacy (e.g., anonymization) while viewing other users’ profiles. However, the viewee might want to secure her profile from anonymous viewing.

The attacks discussed in this paper are often closely intertwined. User data collected through crawling attacks or via social applications may help an attacker to create background knowledge for launching de-anonymization attacks. An attacker might possess an unprecedented number of user accounts using malware and Sybil attacks and could use those accounts for social spam propagation and distributed denial-of-service attacks. Social spam can also be used to propagate malware. Some attacks might be a prerequisite for other attacks. For example, a de-anonymization attack can reveal the identity of an individual. That identity could be used to launch an inference attack and to learn unspecified attributes of an individual.

There are also several functionality-oriented attacks that we did not discuss in this paper. Functionality-oriented attacks attempt to exploit specific functionalities of a social network. For example, Location-based Services (LSP) such as Foursquare, Loopt and Facebook Places utilize geo-location information to publish users’ checked-in places. In some LSPs, users can accumulate “points” for “checking in” at certain venues or locations and can get real-world discounts or freebies in exchange for these points. There is a body of research that analyzes the technical feasibility of anonymous usage of location-based services so that users are not impacted by location sharing [Gruteser and Grunwald, 2003; Xiao and Tao, 2006]. Moreover, real-world rewards and discounts give users of LSPs incentives to cheat on their locations, and hence research [Zhu and Cao, 2011; He et al., 2011] has focused on how to prevent users from location cheating.

OSNs and social applications are here to stay, and while they mature, new security and privacy attacks will take shape. Technical advances in this area can only be of limited effect if not supported by legislative measures for protecting the user from other users and from the service providers [Nissenbaum, 2011].

REFERENCES

Acohido, B. (2009). Cybergangs use cheap labor to break codes on social sites. http://usatoday30.usatoday.com/tech/news/computersecurity/2009-04-22-captcha-code-breakers N.htm/.

Acquisti, A. and Gross, R. (2006). Imagined communities: Awareness, information sharing, and privacy on the facebook. In Privacy enhancing technologies, pages 36–58. Springer.

Adar, E., Weld, D. S., Bershad, B. N., and Gribble, S. S. (2007). Why we search: visualizing and predicting user behavior. In Proceedings of the 16th international conference on World Wide Web, WWW ’07, pages 161–170, New York, NY, USA. ACM.

Adu-Oppong, F., Gardiner, C. K., Kapadia, A., and Tsang, P. P. (2008). Social circles: Tackling privacy in social networks. In Symposium on Usable Privacy and Security (SOUPS).



Aiello, L., Milanesio, M., Ruffo, G., and Schifanella, R. (2008). Tempering kademlia with a robust identity based system. In Peer-to-Peer Computing, 2008. P2P ’08. Eighth International Conference on, pages 30–39.

Aiello, L. M. and Ruffo, G. (2012). Lotusnet: tunable privacy for distributed online social network services. Computer Communications, 35(1):75–88.

Alexa (2014). The top 500 sites on the web. http://www.alexa.com/topsites.

Arrington, M. (2006). Aol proudly releases massive amounts of private data. http://techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data/.

Athanasopoulos, E., Makridakis, A., Antonatos, S., Antoniades, D., Ioannidis, S., Anagnostakis, K. G., and Markatos, E. P. (2008). Antisocial networks: Turning a social network into a botnet. In 11th International Conference on Information Security, pages 146–160. Springer.

Backstrom, L., Dwork, C., and Kleinberg, J. (2007). Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th international conference on World Wide Web, WWW ’07, pages 181–190, New York, NY, USA. ACM.

Baden, R., Bender, A., Spring, N., Bhattacharjee, B., and Starin, D. (2009). Persona: an online social network with user-defined privacy. In Proceedings of the ACM SIGCOMM 2009 conference on Data communication, SIGCOMM ’09, pages 135–146, New York, NY, USA. ACM.

Banks, L. and Wu, S. (2009). All friends are not created equal: An interaction intensity based approach to privacy in online social networks. In Computational Science and Engineering, 2009. CSE ’09. International Conference on, volume 4, pages 970–974.

BBC (2012). Facebook has more than 83 million illegitimate accounts. http://www.bbc.co.uk/news/technology-19093078.

Benevenuto, F., Rodrigues, T., Almeida, J., Goncalves, M., and Almeida, V. (2009a). Detecting spammers and content promoters in online video social networks. In INFOCOM Workshops 2009, IEEE, pages 1–2.

Benevenuto, F., Rodrigues, T., Almeida, V., Almeida, J., and Goncalves, M. (2009b). Detecting spammers and content promoters in online video social networks. In Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’09, pages 620–627, New York, NY, USA. ACM.

Besmer, A., Lipford, H. R., Shehab, M., and Cheek, G. (2009). Social applications: exploring a more secure framework. In Proceedings of the 5th Symposium on Usable Privacy and Security, SOUPS ’09, pages 2:1–2:10, New York, NY, USA. ACM.

Bilge, L., Strufe, T., Balzarotti, D., and Kirda, E. (2009). All your contacts are belong to us: automated identity theft attacks on social networks. In Proceedings of the 18th international conference on World wide web, WWW ’09, pages 551–560, New York, NY, USA. ACM.

Bonneau, J., Anderson, J., and Danezis, G. (2009). Prying data out of a social network. In Social Network Analysis and Mining, 2009. ASONAM ’09. International Conference on Advances in, pages 249–254.

Bonneau, J. and Preibusch, S. (2010). The privacy jungle: On the market for data protection in social networks. In Economics of information security and privacy, pages 121–167. Springer.

Boshmaf, Y., Muslukhov, I., Beznosov, K., and Ripeanu, M. (2011). The socialbot network: when bots socialize for fame and money. In Proceedings of the 27th Annual Computer Security Applications Conference, ACSAC ’11, pages 93–102, New York, NY, USA. ACM.

Boyd, D. (2004). Friendster and publicly articulated social networking. In Extended Abstracts of the Conference on Human Factors and Computing Systems (CHI 2004), pages 1279–1282.

Boyd, D. and Heer, J. (2006). Profiles as conversation: Networked identity performance on friendster. In System Sciences, 2006. HICSS ’06. Proceedings of the 39th Annual Hawaii International Conference on, volume 3, pages 59c–59c.

Boyd, S., Ghosh, A., Prabhakar, B., and Shah, D. (2005). Gossip algorithms: design, analysis and applications. In INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, volume 3, pages 1653–1664.

Bradley, T. (2012). 45,000 facebook accounts compromised: What to know. http://bit.ly/TUY3i8.

Brandes, U. (2001). A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 40(2):163–177.

Buchegger, S., Schioberg, D., Vu, L.-H., and Datta, A. (2009). Peerson: P2p social networking: early experiences and insights. In Proceedings of the Second ACM EuroSys Workshop on Social Network Systems, SNS ’09, pages 46–52, New York, NY, USA. ACM.

Campan, A. and Truta, T. M. (2009). Data and structural k-anonymity in social networks. In Privacy, Security, and Trust in KDD, pages 33–54. Springer.

Cao, Q., Sirivianos, M., Yang, X., and Pregueiro, T. (2012). Aiding the detection of fake accounts in large scale social online services. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, NSDI’12, pages 15–15, Berkeley, CA, USA. USENIX Association.



Carminati, B., Ferrari, E., Heatherly, R., Kantarcioglu, M., and Thuraisingham, B. (2009). A semantic web based framework for social network access control. In Proceedings of the 14th ACM symposium on Access control models and technologies, SACMAT ’09.

Carminati, B., Ferrari, E., and Perego, A. (2006). Rule-based access control for social networks. In Proceedings of the 2006 international conference on On the Move to Meaningful Internet Systems, pages 1734–1744.

Cellan-Jones, R. (2012). Facebook ’likes’ and adverts’ value doubted. http://www.bbc.co.uk/news/technology-18813237.

Chiluka, N., Andrade, N., Pouwelse, J., and Sips, H. (2012). Leveraging trust and distrust for sybil-tolerant voting in online social media. In Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, PSOSM ’12, pages 1:1–1:8, New York, NY, USA. ACM.

Choi, H. C., Kruk, S. R., Grzonkowski, S., Stankiewicz, K., Davis, B., and Breslin, J. (2006). Trust models for community aware identity management. In Proceedings of the Identity, Reference and Web Workshop, in conjunction with WWW 2006, page 140154.

Chu, Z., Widjaja, I., and Wang, H. (2012). Detecting social spam campaigns on twitter. In Proceedings of Conference on Applied Cryptography and Network Security, Lecture Notes in Computer Science, pages 455–472. Springer Berlin Heidelberg.

Conti, M., Hasani, A., and Crispo, B. (2011). Virtual private social networks. In Proceedings of the first ACM conference on Data and application security and privacy, CODASPY ’11, pages 39–50, New York, NY, USA. ACM.

Cummings, J. N., Butler, B., and Kraut, R. (2002). The quality of online social relationships. Commun. ACM, 45(7):103–108.

Cutillo, L., Molva, R., and Strufe, T. (2009a). Privacy preserving social networking through decentralization. In Wireless On-Demand Network Systems and Services, 2009. WONS 2009. Sixth International Conference on, pages 145–152.

Cutillo, L., Molva, R., and Strufe, T. (2009b). Safebook: A privacy-preserving online social network leveraging on real-life trust. Communications Magazine, IEEE, 47(12):94–101.

Dam, W. B. (2009). School teacher suspended for facebook gun photo. http://www.foxnews.com/story/2009/02/05/schoolteacher-suspended-for-facebook-gun-photo/.

Dandekar, P., Goel, A., Wellman, M. P., and Wiedenbeck, B. (2012). Strategic formation of credit networks. In Proceedings of the 21st international conference on World Wide Web, WWW ’12, pages 559–568, New York, NY, USA. ACM.

Danezis, G. (2009). Inferring privacy policies for social networking services. In Proceedings of the 2nd ACM workshop on Security and artificial intelligence, AISec ’09, pages 5–10, New York, NY, USA. ACM.

Danezis, G. and Mittal, P. (2009). Sybilinfer: Detecting sybil nodes using social networks. In Network and Distributed System Security Symposium (NDSS).

DeFigueiredo, D. and Barr, E. (2005). Trustdavis: a non-exploitable online reputation system. In E-Commerce Technology, 2005. CEC 2005. Seventh IEEE International Conference on, pages 274–283.

Devriese, D. and Piessens, F. (2010). Noninterference through secure multi-execution. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, SP ’10, pages 109–124, Washington, DC, USA. IEEE Computer Society.

Dey, R., Tang, C., Ross, K., and Saxena, N. (2012). Estimating age privacy leakage in online social networks. In INFOCOM, 2012 Proceedings IEEE, pages 2836–2840.

Douceur, J. (2002). The sybil attack. In Druschel, P., Kaashoek, F., and Rowstron, A., editors, Peer-to-Peer Systems, volume 2429 of Lecture Notes in Computer Science, pages 251–260. Springer Berlin Heidelberg.

Dwyer, C. (2011). Privacy in the age of google and facebook. Technology and Society Magazine, IEEE, 30(3):58–63.

Egele, M., Moser, A., Kruegel, C., and Kirda, E. (2012). Pox: Protecting users from malicious facebook applications. Computer Communications, 35(12).

Elahi, N., Chowdhury, M., and Noll, J. (2008). Semantic access control in web based communities. In Proceedings of the 2008 The Third International Multi-Conference on Computing in the Global Information Technology, pages 131–136.

Ellis, D. R., Aiken, J. G., Attwood, K. S., and Tenaglia, S. D. (2004). A behavioral approach to worm detection. In Proceedings of the 2004 ACM workshop on Rapid malcode, WORM ’04, pages 43–53, New York, NY, USA. ACM.

Engelmore, R., editor (1988). Readings from the AI magazine. American Association for Artificial Intelligence, Menlo Park, CA, USA.

Facebook (2012). Facebook’s continued fight against koobface. http://on.fb.me/y5ibe1.

Facebook (2015). Statement of rights and responsibilities. https://www.facebook.com/legal/terms.

Fang, L. and LeFevre, K. (2010). Privacy wizards for social networking sites. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 351–360, New York, NY, USA. ACM.

Felt, A. and Evans, D. (2008). Privacy protection for social networking apis. In 2008 Web 2.0 Security and Privacy (W2SP08).

Fiesler, C. and Bruckman, A. (2014). Copyright terms in online creative communities. In CHI’14 Extended Abstracts on Human Factors in Computing Systems, pages 2551–2556. ACM.



Flaxman, A. (2008). Expansion and lack thereof in randomly perturbed graphs. Algorithms and Models for the Web-Graph, 4936:24–35.

Fong, P. (2011a). Preventing sybil attacks by privilege attenuation: A design principle for social network systems. In Security and Privacy (SP), 2011 IEEE Symposium on, pages 263–278.

Fong, P. W. (2011b). Relationship-based access control: protection model and policy language. In Proceedings of the first ACM conference on Data and application security and privacy, pages 191–202.

Gao, H., Hu, J., Huang, T., Wang, J., and Chen, Y. (2011). Security issues in online social networks. Internet Computing, IEEE, 15(4):56–63.

Gao, H., Hu, J., Wilson, C., Li, Z., Chen, Y., and Zhao, B. Y. (2010). Detecting and characterizing social spam campaigns. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, IMC ’10, pages 35–47, New York, NY, USA. ACM.

Ghosh, A., Mahdian, M., Reeves, D., Pennock, D., and Fugger, R. (2007). Mechanism design on trust networks. In Deng, X. and Graham, F., editors, Internet and Network Economics, volume 4858 of Lecture Notes in Computer Science, pages 257–268. Springer Berlin Heidelberg.

Giunchiglia, F., Zhang, R., and Crispo, B. (2008). Relbac: Relation based access control. In Fourth International Conference on Semantics, Knowledge and Grid, pages 3–11.

Graffi, K., Gross, C., Stingl, D., Hartung, D., Kovacevic, A., and Steinmetz, R. (2011). Lifesocial.kom: A secure and p2p-based solution for online social networks. In Consumer Communications and Networking Conference (CCNC), 2011 IEEE, pages 554–558.

Grier, C., Thomas, K., Paxson, V., and Zhang, M. (2010). @spam: the underground on 140 characters or less. In Proceedings of the 17th ACM conference on Computer and communications security, CCS ’10, pages 27–37, New York, NY, USA. ACM.

Gross, R. and Acquisti, A. (2005a). Information revelation and privacy in online social networks. In Proceedings of the 2005 ACM workshop on Privacy in the electronic society, WPES ’05, pages 71–80, New York, NY, USA. ACM.

Gross, R. and Acquisti, A. (2005b). Information revelation and privacy in online social networks. In Proceedings of the 2005 ACM workshop on Privacy in the electronic society, pages 71–80.

Gruteser, M. and Grunwald, D. (2003). Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st international conference on Mobile systems, applications and services, MobiSys ’03, pages 31–42, New York, NY, USA. ACM.

Gubichev, A., Bedathur, S., Seufert, S., and Weikum, G. (2010). Fast and accurate estimation of shortest paths in large graphs. In Proceedings of the 19th ACM international conference on Information and knowledge management, CIKM ’10, pages 499–508, New York, NY, USA. ACM.

Guha, S., Tang, K., and Francis, P. (2008). Noyb: privacy in online social networks. In Proceedings of the first workshop on Online social networks, WOSN ’08, pages 49–54, New York, NY, USA. ACM.

Hay, M., Miklau, G., Jensen, D., Towsley, D., and Weis, P. (2008). Resisting structural re-identification in anonymized social networks. Proc. VLDB Endow., 1(1):102–114.

He, W., Liu, X., and Ren, M. (2011). Location cheating: A security challenge to location-based social network services. In Distributed Computing Systems (ICDCS), 2011 31st International Conference on, pages 740–749. IEEE.

Heatherly, R., Kantarcioglu, M., and Thuraisingham, B. (2013). Preventing private information inference attacks on social networks. Knowledge and Data Engineering, IEEE Transactions on, 25(8):1849–1862.

Heymann, P., Koutrika, G., and Garcia-Molina, H. (2007). Fighting spam on social web sites: A survey of approaches and future challenges. IEEE Internet Computing, 11(6):36–45.

Holt, R. (2013). Twitter in numbers. http://www.telegraph.co.uk/technology/twitter/9945505/Twitter-in-numbers.html.

Hwang, T., Pearce, I., and Nanis, M. (2012). Socialbots: Voices from the fronts. interactions, 19(2):38–45.

Irani, D., Balduzzi, M., Balzarotti, D., Kirda, E., and Pu, C. (2011). Reverse social engineering attacks in online social networks. In Detection of Intrusions and Malware, and Vulnerability Assessment, pages 55–74. Springer.

Irani, D., Webb, S., and Pu, C. (2010). Study of static classification of social spam profiles in myspace. In Proceedings of the 4th International Conference on Weblogs and Social Media.

Jagatic, T. N., Johnson, N. A., Jakobsson, M., and Menczer, F. (2007). Social phishing. Commun. ACM, 50(10):94–100.

Jansen, B. J. and Booth, D. (2010). Classifying web queries by topic and user intent. In CHI ’10 Extended Abstracts on Human Factors in Computing Systems, CHI EA ’10, pages 4285–4290, New York, NY, USA. ACM.

Jurek, M. (2011). Google explores +1 button to influence search results. http://www.tekgoblin.com/2011/08/29/google-explores-1-button-to-influence-search-results/.



Kanich, C., Kreibich, C., Levchenko, K., Enright, B., Voelker, G. M., Paxson, V., and Savage, S. (2008). Spamalytics: an empirical analysis of spam marketing conversion. In Proceedings of the 15th ACM conference on Computer and communications security, CCS ’08, pages 3–14, New York, NY, USA. ACM.

Kayes, I. and Iamnitchi, A. (2013a). Aegis: A semantic implementation of privacy as contextual integrity in social ecosystems. In 11th International Conference on Privacy, Security and Trust (PST).

Kayes, I. and Iamnitchi, A. (2013b). Out of the wild: On generating default policies in social ecosystems. In IEEE ICC’13 - Workshop on Beyond Social Networks: Collective Awareness.

Kelly, S. (2008). Identity ‘at risk’ on facebook. http://news.bbc.co.uk/2/hi/programmes/click_online/7375772.stm.

Klimt, B. and Yang, Y. (2004). The enron corpus: A new dataset for email classification research. In 15th European Conference on Machine Learning, pages 217–226. Springer.

Kourtellis, N., Finnis, J., Anderson, P., Blackburn, J., Borcea, C., and Iamnitchi, A. (2010). Prometheus: User-controlled p2p social data management for socially-aware applications. In 11th International Middleware Conference.

Kreibich, C. and Crowcroft, J. (2004). Honeycomb: creating intrusion detection signatures using honeypots. SIGCOMM Comput. Commun. Rev., 34(1):51–56.

Kreibich, C., Kanich, C., Levchenko, K., Enright, B., Voelker, G. M., Paxson, V., and Savage, S. (2009). Spamcraft: An inside look at spam campaign orchestration. Proc. of 2nd USENIX LEET.

Krishnamurthy, B., Gill, P., and Arlitt, M. (2008). A few chirps about twitter. In Proceedings of the first workshop on Online social networks, pages 19–24.

Krishnamurthy, B. and Wills, C. E. (2008). Characterizing privacy in online social networks. In Proceedings of the first workshop on Online social networks, pages 37–42.

Kruk, S. (2004). Foaf-realm: control your friends access to the resource. In Proceedings of the 1st Workshop on Friend of a Friend.

Kwak, H., Lee, C., Park, H., and Moon, S. (2010). What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 591–600, New York, NY, USA. ACM.

Langville, A. N. and Meyer, C. D. (2004). Deeper inside pagerank. Internet Mathematics, 1:2004.

Lee, K., Caverlee, J., and Webb, S. (2010). Uncovering social spammers: social honeypots + machine learning. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’10, pages 435–442, New York, NY, USA. ACM.

Lenhart, A., Purcell, K., Smith, A., and Zickuhr, K. (2010). Social media and young adults. http://bit.ly/cQdgi3.

Lewis, D. D. and Gale, W. A. (1994). A sequential algorithm for training text classifiers. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’94, pages 3–12, New York, NY, USA. Springer-Verlag New York, Inc.

Lin, Y.-R., Sundaram, H., Chi, Y., Tatemura, J., and Tseng, B. L. (2007). Splog detection using self-similarity analysis on blog temporal dynamics. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, AIRWeb ’07, pages 1–8, New York, NY, USA. ACM.

Lindamood, J., Heatherly, R., Kantarcioglu, M., and Thuraisingham, B. (2009). Inferring private information using social network data. In Proceedings of the 18th international conference on World wide web, WWW ’09, pages 1145–1146, New York, NY, USA. ACM.

Lipford, H. R., Besmer, A., and Watson, J. (2008). Understanding privacy settings in facebook with an audience view. UPSEC, 8:1–8.

Liu, K. and Terzi, E. (2008). Towards identity anonymization on graphs. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, SIGMOD ’08, pages 93–106, New York, NY, USA. ACM.

Liu, Y., Gummadi, K. P., Krishnamurthy, B., and Mislove, A. (2011). Analyzing facebook privacy settings: user expectations vs. reality. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, pages 61–70.

Lotan, G., Graeff, E., Ananny, M., Gaffney, D., Pearce, I., et al. (2011). The arab spring— the revolutions were tweeted: Information flows during the 2011 tunisian and egyptian revolutions. International Journal of Communication, 5:31.

Lucas, M. M. and Borisov, N. (2008). Flybynight: mitigating the privacy risks of social networking. In Proceedings of the 7th ACM workshop on Privacy in the electronic society, WPES ’08, pages 1–8, New York, NY, USA. ACM.

Luo, W., Xie, Q., and Hengartner, U. (2009). Facecloak: An architecture for user privacy on social networking sites. In Computational Science and Engineering, 2009. CSE ’09. International Conference on, volume 3, pages 26–33.

Madejski, M., Johnson, M. L., and Bellovin, S. M. (2011). The failure of online social network privacy settings. Department of Computer Science, Columbia University.

Mail, D. (2011). Bank worker fired for facebook post comparing her £7-an-hour wage to lloyds boss’s £4,000-an-hour salary. http://dailym.ai/fjRTlC.



Marinando (2010). Tuenti, spain’s leading social network, switches on local for a location-based future. http://techcrunch.com/2010/03/25/tuenti-spains leading social network mixes local into social in a very big way/.

Masoumzadeh, A. and Joshi, J. (2011). Ontology-based access control for social network systems. IJIPSI, 1(1):59–78.

McCarthy, C. (2009). Twitter crippled by denial-of-service attack. http://news.cnet.com/8301-13577_3-10304633-36.html.

Mehta, B., Nangia, S., Gupta, M., and Nejdl, W. (2008). Detecting image spam using visual features and near duplicate detection. In Proceedings of the 17th international conference on World Wide Web, WWW ’08, pages 497–506, New York, NY, USA. ACM.

Mills, E. (2008). Facebook suspends app that permitted peephole. http://news.cnet.com/8301-10784_3-9977762-7.html.

Mirkovic, J., Dietrich, S., Dittrich, D., and Reiher, P. (2004). Internet Denial of Service: Attack and Defense Mechanisms (Radia Perlman Computer Networking and Security). Prentice Hall PTR, Upper Saddle River, NJ, USA.

Mirkovic, J. and Reiher, P. (2004). A taxonomy of ddos attack and ddos defense mechanisms. ACM SIGCOMM Computer Communication Review, 34(2):39–53.

Mishra, N., Schreiber, R., Stanton, I., and Tarjan, R. E. (2007). Clustering social networks. In Proceedings of the 5th international conference on Algorithms and models for the web-graph, WAW’07, pages 56–67, Berlin, Heidelberg. Springer-Verlag.

Mislove, A., Post, A., Druschel, P., and Gummadi, K. P. (2008). Ostra: leveraging trust to thwart unwanted communication. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, NSDI’08, pages 15–30, Berkeley, CA, USA. USENIX Association.

Mohaisen, A., Yun, A., and Kim, Y. (2010). Measuring the mixing time of social graphs. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement, IMC ’10, pages 383–389, New York, NY, USA. ACM.

Mondal, M., Viswanath, B., Clement, A., Druschel, P., Gummadi, K. P., Mislove, A., and Post, A. (2012). Defending against large-scale crawls in online social networks. In Proceedings of the 8th ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT’12), Nice, France.

Motoyama, M., McCoy, D., Levchenko, K., Savage, S., and Voelker, G. M. (2011). Dirty jobs: the role of freelance labor in web service abuse. In Proceedings of the 20th USENIX conference on Security, SEC’11, pages 14–14, Berkeley, CA, USA. USENIX Association.

Murphy, S. (2010). Teens ditch e-mail for texting and facebook. http://www.nbcnews.com/id/38585236/.

Narayanan, A., Shi, E., and Rubinstein, B. I. (2011). Link prediction by de-anonymization: How we won the kaggle social network challenge. In Neural Networks (IJCNN), The 2011 International Joint Conference on, pages 1825–1834. IEEE.

Narayanan, A. and Shmatikov, V. (2009). De-anonymizing social networks. In Security and Privacy, 2009 30th IEEE Symposium on, pages 173–187.

Nazir, A., Raza, S., Chuah, C.-N., and Schipper, B. (2010). Ghostbusting facebook: detecting and characterizing phantom profiles in online social gaming applications. In Proceedings of the 3rd conference on Online social networks, WOSN’10, pages 1–1, Berkeley, CA, USA. USENIX Association.

Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.

Nielsen (2012). Social networks & blogs now 4th most popular online activity, ahead of personal e-mail. http://www.nielsen.com/us/en/press-room/2009/social networks .html.

Nissenbaum, H. (2004). Privacy as contextual integrity. Washington Law Review, 79(1):119–158.

Nissenbaum, H. (2011). A contextual approach to privacy online. Daedalus, 140(4):32–48.

Ntoulas, A., Najork, M., Manasse, M., and Fetterly, D. (2006). Detecting spam web pages through content analysis. In Proceedings of the 15th international conference on World Wide Web, WWW ’06, pages 83–92, New York, NY, USA. ACM.

Nunes, S., Ribeiro, C., and David, G. (2008). Use of temporal expressions in web search. In Proceedings of the IR research, 30th European conference on Advances in information retrieval, ECIR’08, pages 580–584, Berlin, Heidelberg. Springer-Verlag.

Paul, T., Stopczynski, M., Puscher, D., Volkamer, M., and Strufe, T. (2012). C4ps - helping facebookers manage their privacy settings. In Social Informatics, pages 188–201.

Peterson, R. S. and Sirer, E. G. (2009). Antfarm: efficient content distribution with managed swarms. In Proceedings of the 6th USENIX symposium on Networked systems design and implementation, pages 107–122.

Piatek, M., Isdal, T., Krishnamurthy, A., and Anderson, T. (2008). One hop reputations for peer to peer file sharing workloads. In NSDI’08.

Post, A., Shah, V., and Mislove, A. (2011). Bazaar: strengthening user reputations in online marketplaces. In Proceedings of the 8th USENIX conference on Networked systems design and implementation, NSDI’11, pages 14–14, Berkeley, CA, USA. USENIX Association.

Prince, M., Dahl, B., Holloway, L., Keller, A., and Langheinrich, E. (2005). Understanding how spammers steal your e-mail address: An analysis of the first six months of data from project honey pot. In Second Conference on Email and Anti-Spam.



Quercia, D. and Hailes, S. (2010). Sybil attacks against mobile users: Friends and foes to the rescue. In INFOCOM, 2010 Proceedings IEEE, pages 1–5.

Ratkiewicz, J., Conover, M., Meiss, M., Goncalves, B., Flammini, A., and Menczer, F. (2011). Detecting and tracking political abuse in social media. In Proc. of ICWSM.

Reynaert, T., De Groef, W., Devriese, D., Desmet, L., and Piessens, F. (2012). Pesap: A privacy enhanced social application platform. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 827–833.

Riley, D. (2007). Stat gaming services come to youtube. http://www.bbc.co.uk/news/technology-18813237.

Russell, S. J., Norvig, P., Canny, J. F., Malik, J. M., and Edwards, D. D. (1995). Artificial intelligence: a modern approach, volume 74. Prentice hall Englewood Cliffs.

Saltzer, J. and Schroeder, M. (1975). The protection of information in computer systems. Proceedings of the IEEE, 63(9):1278–1308.

Shakimov, A., Lim, H., Caceres, R., Cox, L., Li, K., Liu, D., and Varshavsky, A. (2011). Vis-à-vis: Privacy-preserving online social networking via virtual individual servers. In Communication Systems and Networks (COMSNETS), 2011 Third International Conference on, pages 1–10.

Shehab, M., Cheek, G., Touati, H., Squicciarini, A., and Cheng, P.-C. (2010). User centric policy management in online social networks. In IEEE International Symposium on Policies for Distributed Systems and Networks (POLICY), pages 9–13.

Shetty, J. and Adibi, J. (2005). Discovering important nodes through graph entropy the case of enron email database. In Proceedings of the 3rd international workshop on Link discovery, LinkKDD ’05, pages 74–81, New York, NY, USA. ACM.

Simpson, A. (2008). On the need for user-defined fine-grained access control policies for social networking applications. In Proceedings of the workshop on Security in Opportunistic and SOCial networks, SOSOC ’08, pages 1:1–1:8, New York, NY, USA. ACM.

Singh, K., Bhola, S., and Lee, W. (2009). xbook: redesigning privacy control in social networking platforms. In Proceedings of the 18th conference on USENIX security symposium, SSYM’09, pages 249–266, Berkeley, CA, USA. USENIX Association.

Spitzner, L. (2002). Honeypots tracking hackers. Addison-Wesley, 1 edition.

Spitzner, L. (2003). The honeynet project: trapping the hackers. Security Privacy, IEEE, 1(2):15–23.

Squicciarini, A., Paci, F., and Sundareswaran, S. (2010). Prima: an effective privacy protection mechanism for social networks. In Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, pages 320–323.

Staff, E. (2010). Verisign: 1.5m facebook accounts for sale in web forum. http://www.pcmag.com/article2/0,2817,2363004,00.asp.

Steel, E. and Fowler, G. A. (2010). Facebook in privacy breach. http://online.wsj.com/article/SB10001424052702304772804575558484075236968.html.

Stein, T., Chen, E., and Mangla, K. (2011). Facebook immune system. In Proceedings of the 4th Workshop on Social Network Systems, SNS ’11, pages 8:1–8:8, New York, NY, USA. ACM.

Strater, K. and Lipford, H. R. (2008). Strategies and struggles with privacy in an online social networking community. In Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction - Volume 1, BCS-HCI ’08, pages 111–119, Swinton, UK, UK. British Computer Society.

Stringhini, G., Kruegel, C., and Vigna, G. (2010). Detecting spammers on social networks. In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, pages 1–9, New York, NY, USA. ACM.

Stringhini, G., Wang, G., Egele, M., Kruegel, C., Vigna, G., Zheng, H., and Zhao, B. Y. (2013). Follow the green: Growth and dynamics in twitter follower markets. In Proceedings of the 2013 Conference on Internet Measurement Conference, IMC ’13, pages 163–176, New York, NY, USA. ACM.

Strufe, T. (2010). Profile popularity in a business-oriented online social network. In Proceedings of the 3rd Workshop on Social Network Systems, SNS ’10, pages 2:1–2:6, New York, NY, USA. ACM.

Sweeney, L. (2000). Uniqueness of simple demographics in the us population. Carnegie Mellon University Laboratory for International Data Privacy.

Sweeney, L. (2002). k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(05):557–570.

Thomas, K., Grier, C., Song, D., and Paxson, V. (2011). Suspended accounts in retrospect: an analysis of twitter spam. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, IMC ’11, pages 243–258, New York, NY, USA. ACM.

Tran, N., Min, B., Li, J., and Subramanian, L. (2009). Sybil-resilient online content voting. In Proceedings of the 6th Symposium on Networked System Design and Implementation (NSDI).

Tsuchiya, P. F. (1988). The landmark hierarchy: a new hierarchy for routing in very large networks. SIGCOMM Comput. Commun. Rev., 18(4):35–42.



Ur, B. E. and Ganapathy, V. (2009). Evaluating attack amplification in online social networks. In Proceedings of the 2009 Web 2.0 Security and Privacy Workshop. Citeseer.

Viswanath, B., Mondal, M., Clement, A., Druschel, P., Gummadi, K., Mislove, A., and Post, A. (2012a). Exploring the design space of social network-based sybil defenses. In Communication Systems and Networks (COMSNETS), 2012 Fourth International Conference on, pages 1–8.

Viswanath, B., Mondal, M., Gummadi, K. P., Mislove, A., and Post, A. (2012b). Canal: scaling social network-based sybil tolerance schemes. In Proceedings of the 7th ACM european conference on Computer Systems, EuroSys ’12, pages 309–322, New York, NY, USA. ACM.

Viswanath, B., Post, A., Gummadi, K. P., and Mislove, A. (2010). An analysis of social network-based sybil defenses. In Proceedings of the ACM SIGCOMM 2010 conference, SIGCOMM ’10, pages 363–374, New York, NY, USA. ACM.

von Ahn, L., Blum, M., and Langford, J. (2004). Telling humans and computers apart automatically. Commun. ACM, 47(2):56–60.

Wagner, C., Mitter, S., Korner, C., and Strohmaier, M. (2012). When social bots attack: Modeling susceptibility of users in online social networks. In Proceedings of the WWW, volume 12.

Walsh, K. and Sirer, E. G. (2006). Experience with an object reputation system for peer-to-peer filesharing. In NSDI’06.

Wang, G., Mohanlal, M., Wilson, C., Xiao Wang, M. M., Zheng, H., and Zhao, B. Y. (2013). Social turing tests: Crowdsourcing sybil detection. In Proceedings of The 20th Annual Network & Distributed System Security Symposium (NDSS).

Webb, S., Caverlee, J., and Pu, C. (2008). Social honeypots: Making friends with a spammer near you. In Proceedings of the Fifth Conference on Email and Anti-Spam (CEAS 2008), Mountain View, CA.

Wei, W., Xu, F., Tan, C., and Li, Q. (2012). Sybildefender: Defend against sybil attacks in large social networks. In INFOCOM, 2012 Proceedings IEEE, pages 1951–1959.

Wilson, C., Sala, A., Bonneau, J., Zablit, R., and Zhao, B. Y. (2010). Don’t tread on me: moderating access to osn data with spikestrip. In Proceedings of the 3rd conference on Online social networks, WOSN’10, pages 5–5, Berkeley, CA, USA. USENIX Association.

Wu, X., Ying, X., Liu, K., and Chen, L. (2010). A survey of privacy-preservation of graphs and social networks, pages 421–453.Springer.

Xiao, X. and Tao, Y. (2006). Personalized privacy preservation. In Proceedings of the 2006 ACM SIGMOD international conferenceon Management of data, pages 229–240. ACM.

Xie, M., Yin, H., and Wang, H. (2006). An effective defense against email spam laundering. In Proceedings of the 13th ACM conference on Computer and communications security, CCS ’06, pages 179–190, New York, NY, USA. ACM.

Xie, Y., Yu, F., Achan, K., Panigrahy, R., Hulten, G., and Osipkov, I. (2008). Spamming botnets: signatures and characteristics. SIGCOMM Comput. Commun. Rev., 38(4):171–182.

Xu, L., Chainan, S., Takizawa, H., and Kobayashi, H. (2010a). Resisting sybil attack by social network and network clustering. In Proceedings of the 2010 10th IEEE/IPSJ International Symposium on Applications and the Internet, SAINT ’10, pages 15–21, Washington, DC, USA. IEEE Computer Society.

Xu, W., Zhang, F., and Zhu, S. (2010b). Toward worm detection in online social networks. In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, pages 11–20, New York, NY, USA. ACM.

Yang, Z., Wilson, C., Wang, X., Gao, T., Zhao, B. Y., and Dai, Y. (2011). Uncovering social network sybils in the wild. In Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference, IMC ’11, pages 259–268, New York, NY, USA. ACM.

Yu, H. (2011). Sybil defenses via social networks: A tutorial and survey. SIGACT News, 42(3):80–101.

Yu, H., Gibbons, P. B., Kaminsky, M., and Xiao, F. (2010). Sybillimit: a near-optimal social network defense against sybil attacks. IEEE/ACM Trans. Netw., 18(3):885–898.

Yu, H., Kaminsky, M., Gibbons, P. B., and Flaxman, A. (2006). Sybilguard: defending against sybil attacks via social networks. In Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’06, pages 267–278, New York, NY, USA. ACM.

Zhang, C., Sun, J., Zhu, X., and Fang, Y. (2010). Privacy and security for online social networks: challenges and opportunities. Network, IEEE, 24(4):13–18.

Zheleva, E. and Getoor, L. (2008). Preserving the privacy of sensitive relationships in graph data. In Privacy, security, and trust in KDD, pages 153–171. Springer.

Zheleva, E. and Getoor, L. (2009). To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th international conference on World wide web, WWW ’09, pages 531–540, New York, NY, USA. ACM.

Zheleva, E. and Getoor, L. (2011). Privacy in social networks: A survey, pages 277–306. Springer.



Zhou, B. and Pei, J. (2011). The k-anonymity and l-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowledge and Information Systems, 28(1):47–77.

Zhou, B., Pei, J., and Luk, W. (2008). A brief survey on anonymization techniques for privacy preserving publishing of social network data. SIGKDD Explor. Newsl., 10(2):12–22.

Zhu, Z. and Cao, G. (2011). Applaus: A privacy-preserving location proof updating system for location-based services. In INFOCOM, 2011 Proceedings IEEE, pages 1889–1897. IEEE.

Zinman, A. and Donath, J. (2007). Is britney spears spam. In Fourth Conference on Email and Anti-Spam, Mountain View, CA.

