464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured...

12
Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong Wang, Member, IEEE Abstract—Although anonymizing Peer-to-Peer (P2P) systems often incurs extra traffic costs, many systems try to mask the identities of their users for privacy considerations. Existing anonymity approaches are mainly path-based: peers have to pre-construct an anonymous path before transmission. The overhead of maintaining and updating such paths is significantly high. We propose Rumor Riding (RR), a lightweight and non-path-based mutual anonymity protocol for decentralized P2P systems. Employing a random walk mechanism, RR takes advantage of lower overhead by mainly using the symmetric cryptographic algorithm. We conduct comprehensive trace-driven simulations to evaluate the effectiveness and efficiency of this design, and compare it with previous approaches. We also introduce some early experiences on RR implementations. Index Terms—Mutual anonymity, non-path-based, random walk, peer-to-peer. Ç 1 INTRODUCTION P EER-TO-PEER (P2P) networks, such as Napster, Gnutella, and BitTorrent, have become essential media for in- formation dissemination and sharing over the Internet. Concerns about privacy, however, have grown with the rapid development of P2P systems. In distributed and decentralized P2P environments, the individual users cannot rely on a trusted and centralized authority, for example, a Certificate Authority (CA) center, for protecting their privacy. Without such trustworthy entities, the P2P users have to hide their identities and behaviors by themselves. Hence, the requirement for anonymity has become increas- ingly critical for both content requesters and providers. A number of methods [1], [2], [3], [4] have been proposed to provide anonymity. Most, if not all, of them achieve anonymous message delivery via nontraceable paths com- prised of multiple proxies or middle agent peers. Those approaches, also known as path-based approaches, require users to setup anonymous paths before transmission. In most cases, the path is a layer-encrypted data structure. Although path-based protocols provide strong anonymity, an anonymous path has to be preconstructed, which requires the initiator to collect a large number of IP addresses and public keys. Also, an initiator has to perform asymmetric key based cryptographic encryptions, for example RSA [5], when wrapping the layer-encrypted packets. Both the peer collection and content encryption introduce high costs. Practically, users often expect to establish a long anonymous path and update the path periodically to defend against the analysis from attackers [6]. In highly dynamic P2P systems, when a chosen peer leaves, the whole path fails. Unfortu- nately, such a failure is often difficult to be known by the initiator. Therefore, a “blindly-assigned” path is very unreliable, and users have to frequently probe the path and retransmit messages. To address the above issues, we propose a non-path-based anonymous P2P protocol called Rumor Riding (RR). In RR, we first let an initiator encrypt the query message with a symmetric key, and then send the key and the cipher text to different neighbors. The key and the cipher texts take random walks separately in the system, where each walk is called a rumor. Once a key rumor and a cipher rumor meet at some peer, the peer is able to recover the original query message and act as an agent to issue the query for the initiator. We call the agent peer as a sower in this paper. The similar idea is also employed during the query response, confirm, and file delivery processes. Thus, the rumors serve as the primitives of this protocol to achieve mutual anonymity and meet the design objectives. In RR, anonymous paths are automatically constructed via the rumors’ random walks. Neither the initiator nor the responder needs to be concerned with path construction and maintenance. Extending the scope of anonymous servents from a small clique of nodes to the entire P2P network, RR significantly increases the anonymity degree of a system. RR employs a symmetric cryptographic algorithm to achieve anonymity, which significantly reduces the crypto- graphic overhead for the initiator, the responder, and the middle nodes. In addition, as initiating peers have no requirement on extra information for constructing paths, the risk of information leakage, caused by links that are used for peers to request the IP addresses of anonymous proxies, is eliminated. The rest of this paper is structured as follows: In Section 2, we describe the related work on anonymous communica- tions. In Section 3, we present the design of the RR protocol. Section 4 discusses the key issues and the anonymity degree of RR. Section 5 presents simulation results and performance evaluations. We conclude the work in Section 6. 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 3, MARCH 2011 . Y. Liu is with the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, and also the Tsinghua National Lab for Information Science and Technology. E-mail: [email protected]. . J. Han is with the School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China. E-mail: [email protected]. . J. Wang is with the Network Research Center, Tsinghua University, Room 3-221, FIT Building, Beijing 100084, P.R. China. Manuscript received 8 Jan. 2009; revised 25 Aug. 2009; accepted 29 Sept. 2009; published online 30 Apr. 2010. Recommended for acceptance by A. Boukerche. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TPDS-2009-01-0007. Digital Object Identifier no. 10.1109/TPDS.2010.98. 1045-9219/11/$26.00 ß 2011 IEEE Published by the IEEE Computer Society

Transcript of 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured...

Page 1: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

Rumor Riding: Anonymizing UnstructuredPeer-to-Peer Systems

Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong Wang, Member, IEEE

Abstract—Although anonymizing Peer-to-Peer (P2P) systems often incurs extra traffic costs, many systems try to mask the identities

of their users for privacy considerations. Existing anonymity approaches are mainly path-based: peers have to pre-construct an

anonymous path before transmission. The overhead of maintaining and updating such paths is significantly high. We propose Rumor

Riding (RR), a lightweight and non-path-based mutual anonymity protocol for decentralized P2P systems. Employing a random walk

mechanism, RR takes advantage of lower overhead by mainly using the symmetric cryptographic algorithm. We conduct

comprehensive trace-driven simulations to evaluate the effectiveness and efficiency of this design, and compare it with previous

approaches. We also introduce some early experiences on RR implementations.

Index Terms—Mutual anonymity, non-path-based, random walk, peer-to-peer.

Ç

1 INTRODUCTION

PEER-TO-PEER (P2P) networks, such as Napster, Gnutella,and BitTorrent, have become essential media for in-

formation dissemination and sharing over the Internet.Concerns about privacy, however, have grown with therapid development of P2P systems. In distributed anddecentralized P2P environments, the individual users cannotrely on a trusted and centralized authority, for example, aCertificate Authority (CA) center, for protecting theirprivacy. Without such trustworthy entities, the P2P usershave to hide their identities and behaviors by themselves.Hence, the requirement for anonymity has become increas-ingly critical for both content requesters and providers.

A number of methods [1], [2], [3], [4] have been proposedto provide anonymity. Most, if not all, of them achieveanonymous message delivery via nontraceable paths com-prised of multiple proxies or middle agent peers. Thoseapproaches, also known as path-based approaches, requireusers to setup anonymous paths before transmission. Inmost cases, the path is a layer-encrypted data structure.Although path-based protocols provide strong anonymity,an anonymous path has to be preconstructed, which requiresthe initiator to collect a large number of IP addresses andpublic keys. Also, an initiator has to perform asymmetric keybased cryptographic encryptions, for example RSA [5], whenwrapping the layer-encrypted packets. Both the peercollection and content encryption introduce high costs.Practically, users often expect to establish a long anonymous

path and update the path periodically to defend against theanalysis from attackers [6]. In highly dynamic P2P systems,when a chosen peer leaves, the whole path fails. Unfortu-nately, such a failure is often difficult to be known by theinitiator. Therefore, a “blindly-assigned” path is veryunreliable, and users have to frequently probe the pathand retransmit messages.

To address the above issues, we propose a non-path-basedanonymous P2P protocol called Rumor Riding (RR). In RR,we first let an initiator encrypt the query message with asymmetric key, and then send the key and the cipher text todifferent neighbors. The key and the cipher texts takerandom walks separately in the system, where each walk iscalled a rumor. Once a key rumor and a cipher rumor meetat some peer, the peer is able to recover the original querymessage and act as an agent to issue the query for theinitiator. We call the agent peer as a sower in this paper. Thesimilar idea is also employed during the query response,confirm, and file delivery processes. Thus, the rumors serveas the primitives of this protocol to achieve mutualanonymity and meet the design objectives.

In RR, anonymous paths are automatically constructedvia the rumors’ random walks. Neither the initiator nor theresponder needs to be concerned with path construction andmaintenance. Extending the scope of anonymous serventsfrom a small clique of nodes to the entire P2P network, RRsignificantly increases the anonymity degree of a system.

RR employs a symmetric cryptographic algorithm toachieve anonymity, which significantly reduces the crypto-graphic overhead for the initiator, the responder, and themiddle nodes. In addition, as initiating peers have norequirement on extra information for constructing paths,the risk of information leakage, caused by links that areused for peers to request the IP addresses of anonymousproxies, is eliminated.

The rest of this paper is structured as follows: In Section 2,we describe the related work on anonymous communica-tions. In Section 3, we present the design of the RR protocol.Section 4 discusses the key issues and the anonymity degreeof RR. Section 5 presents simulation results and performanceevaluations. We conclude the work in Section 6.

464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 3, MARCH 2011

. Y. Liu is with the Department of Computer Science and Engineering, HongKong University of Science and Technology, and also the TsinghuaNational Lab for Information Science and Technology.E-mail: [email protected].

. J. Han is with the School of Electronic and Information Engineering, Xi’anJiaotong University, Xi’an, Shaanxi, China.E-mail: [email protected].

. J. Wang is with the Network Research Center, Tsinghua University, Room3-221, FIT Building, Beijing 100084, P.R. China.

Manuscript received 8 Jan. 2009; revised 25 Aug. 2009; accepted 29 Sept.2009; published online 30 Apr. 2010.Recommended for acceptance by A. Boukerche.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TPDS-2009-01-0007.Digital Object Identifier no. 10.1109/TPDS.2010.98.

1045-9219/11/$26.00 � 2011 IEEE Published by the IEEE Computer Society

Page 2: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

2 RELATED WORK

Since Chaum [4] pioneered the concept of anonymity, manyapproaches have been proposed to support anonymouscommunications. The approaches fall into two categories:path-based anonymous delivery and anonymous multicast.

Onion Routing [7], as well as its second generation, Tor[8], are the most popular path-based protocols providinginitiator anonymity based on a layered-encryption method.They mainly focus on the IP layer rather than theapplication level. APFS [9] adopts the initiator anonymousprotocols like Onion Routing to provide responder anon-ymity in P2P systems. Shortcut Protocol [2] provides P2Pmutual anonymity with a reduced response delay. Crowds[1] introduces a random forwarding mechanism to inter-mediate nodes. When receiving a packet, a peer has twooptions: forwarding the packet to a randomly chosen peeror directly sending it to the destination peer.

P5 [3] is based on the anonymous multicast. To make thebroadcasting scalable, P5 employs a virtual tree to constructanonymous broadcasting groups. P5 also utilizes a noisemechanism, which enables peers within a group to sendpackets at a fixed rate for concealing the initiator’s ID.Anonymous multicast-based approaches, however, are notsuitable for P2P systems as the initiator has to know thedestination node’s ID.

Since random walk is the building block of our protocoldesign, we provide a brief overview of random walkapproaches. Lv et al. [10] propose a multiple random walkbased query algorithm to replace the flooding method forreducing the network traffic in Gnutella-like systems. In[11], Adamic et al. propose an algorithm, which works wellin power-law graphs. The work attempts to make the searchscalable and reduce the network traffic. Gkantsidis et al.[12] study the properties of random walk in depth viastatistical methods and reveal some factors for improvingthe system performance. Bisnik et al. [13] provide amathematical model to analyze the performance of randomwalk, and develop an adaptive algorithm to reduce thesearch overhead. Bhattacharjee et al. present a randombased search protocol for unstructured P2P systems [14].Sybilguard [15] employs random routes in a social networktrust topology to defend against sybil attacks. All of theseprior studies strongly support the workability and effi-ciency of random walk in P2P systems.

3 RUMOR RIDING

Rumor Riding (RR) includes five major components: RumorGeneration and Recovery, Query Issuance, Query Response,Query Confirm, and File Delivery.

3.1 Rumor Generation and Recovery

RR employs the AES algorithm [16] to encrypt originalmessages. The key size is 128-bit. To determine whether apair of cipher and key rumors hit, we employ a CyclicRedundancy Check (CRC) function to attach a CRC value,CRC(M), to the message M. For received key rumors andcipher rumors, the sower S uses AES to recover a messageM’ and the checksum CRCðM 0Þ. It then performs the CRCfunction to the recovered M’ and compares the result with

CRCðM 0Þ. If they match, the sower S is aware that it hassuccessfully recovered a message M. The purpose of theCRC function is to avoid using a complex text under-standing technique to distinguish a meaningful M.

3.2 Query Issuance

When an initiator I wishes to issue an anonymous query,it first generates the query content q, and a public key KþI .Node I then uses an AES cryptographic algorithm toencrypt q into a cipher text C with a symmetric key K. Itorganizes the key K and the cipher text C into two queryrumors, qK and qC . In Gnutella, each packet is labeled witha Descriptor ID, a string that uniquely identifies thepacket. RR also uses the descriptors to identify rumors.Thus, two random number strings, IDqK and IDqC , areused to label the two rumors. After generation, I forwardsthe rumor messages to two randomly chosen neighbors, asillustrated by the dashed and dotted lines in Fig. 1. Thequery cipher rumor and the query key rumor then starttheir random walks.

The strategy of processing rumors is different from thatof processing normal queries. RR requires every node totemporarily keep a local cache to store the received rumors.When a node receives a query key rumor, it performs therumor recovery procedure to check all cached cipherrumors. If a decrypted rumor holds a plaintext matchingthe CRC value, q will be successfully recovered. Whateverthere is a match or not, this intermediate node reduces theTTL value of the received rumor by one, keeps a temporaryrecord containing the ID of the rumor in the local cache, andforwards it to a randomly chosen neighbor. The procedurecontinues until the TTL value of this rumor is reduced tozero. For the received query cipher rumor, the process issimilar. Therefore, if a pair of query rumors reach a certainnode, no matter what the sequence is, this node willeventually recover the original q. The key issue with thisprocedure is that the number of rumors and their initialTTL values need to be carefully selected so that at least onepair of rumors, including a key and a cipher, will meet. Onthe other hand, local adversaries may eavesdrop on thetraffic and launch traffic analysis attack to locate theinitiator’s identity. In the Gnutella protocol, the travelinglength of a query message is limited by two unsigned fields:TTL and Hops. When issuing plaintext query, an initiatorsets a positive number in TTL, and zero in Hops. The query

LIU ET AL.: RUMOR RIDING: ANONYMIZING UNSTRUCTURED PEER-TO-PEER SYSTEMS 465

Fig. 1. Query issuance.

Page 3: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

is passed from one servent to another, and each serventdecreases the TTL by one and increases Hops by one beforesending the message to the next servent. Thus, if anadversary receives a message with a Hops set as zero, itknows that the node sent the message is the initiator. Toavoid this, RR initializes a nonzero positive number wð1 <w < 127Þ in the Hops fields of rumors before sending themout. Normally, a small number between 7 and 10 would besufficient to confuse the adversaries. Suppose the expectedlength of rumors’ walking is L, the TTL value should beLþ w. We present the determination of L in Section 3.7 anddetailed discussion on the parameter settings in Section 4.

If one intermediate node that recovers q is willing to actas an agent peer, it conducts a search on behalf of theunknown I. We call this agent node a sower. When a peeridentifies itself as a sower, it proceeds as follows: First, itchecks the TTL values defined in the rumors. If they arenot zero, the sower forwards the rumors out, so that ifthere are attackers who can overhear some of the messagessent to the sower, it is still not trivial to determine whetheror not the peer is a sower. Second, the sower, Sa, asillustrated in Fig. 1, attaches the original query message qwith its IP address, and then issues the query marked witha label IDq in a plaintext (IDq is also used for Sa to locatethe correlated qK and qC). In this operation, we avoid ablind flooding. Instead, we employ a probability-based-flooding, in which the sower selects a subset of itsneighbors and issues the query. Note that the sower doesnot send the query to the nodes which sent or have beensent the two rumors of this query. Such a selective floodingis effective on defending against collaborating attacks, onwhich we have more discussions in Section 4.2. For theneighboring nodes that do not send or have not been sentthe two rumors, the sower sends the plaintext query toeach of those nodes with a probability p. The p is like athreshold. The sower can first compute a random valuebetween [0, 1], and then compare to the p. If the value isless than p, the sower sends the query, otherwise does notsend. Upon receiving the query, the neighboring nodesforward the query to each of its neighbors (expect thesower) with the probability p. Such a procedure continuesuntil the query packet exhausts its TTL. The selectiveflooding has a constrained flooding scope compared to thethe blind flooding, which can reduce the redundant trafficcaused by multiple sowers’ flooding. The probability p is asystematic parameter. We examine the proper setting of p

through our simulation. In later discussions, we use the lqKand lqC to denote rumor paths from I to Sa.

3.3 Query Response

When a receiving node the query has a copy of the desiredfile, it becomes a responder R. To respond to the query, Rencrypts the plain text of the response message r, using theinitiator’s public key KþI . It encrypts <ðrÞKþ

I; IDq; IPSa ;K

þR>

using AES, where KþR is the public key generated by R, andencloses the cipher text and the key into two responserumors, rK and rC . They are then assigned with IDrK andIDrC , respectively.

After being sent out from R, two rumors start theirrandom walks in the system. We illustrate the procedure inFig. 2. RR guarantees that at least one pair of rumors meet ata certain peer Sb. We use lrK and lrC to denote their pathsfrom R to Sb. Sb decrypts the cipher text in rC with the keyin rK , and recovers the IP address of sower Sa.

If Sb volunteers to forward the response for R, it contactsSa via a TCP connection, and forwards these two responserumors to Sa. Note that Sb also attaches its IP address, IDq,IDrK , and IDrC to the two rumors. When Sa receives theresponses rK and rC , it delivers them to the originating peersof qK and qC . Two response rumors are marked with IDqKand IDqC , to help them walk along the reversed paths of lqKand lqC . The successor nodes continue this procedure. Thus,two response rumors make use of lqK and lqC to reach I.

Having two response rumors, I recovers ðrÞKþI

from rKand rC , and then decrypts ðrÞKþ

Ito recover the original

response message r, using its private key K�I . A flooding

search procedure may raise multiple responses. To simplify

the demonstration, we assume that I only selects one

candidate as the provider. Without loss of generality, we

continue using R to denote the selected provider.

3.4 Query Confirm

In the query confirm phase, I uses the responder’s public

key to encrypt the confirm message c. It then encrypts

<ðcÞKþR; IDrK; IDrC; IPSb> and obtains two confirm rumors,

cK and cC , which take random walks in the system. Note

that two confirm rumors are marked with new descriptors:

IDcK and IDcC . We assume that cK and cC collide in a new

sower S0a. We denote their paths from I to S0a by lcK and lcC .

When S0a recovers the IP address of Sb from cK and cC , it

directly contacts Sb to forward cK and cC attached with

IDrK and IDrC via a TCP link, as shown in Fig. 3. The cK

466 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 3, MARCH 2011

Fig. 2. Query response. Fig. 3. Query confirm.

Page 4: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

and cC are then delivered along the reversed paths of lrKand lrC until they reach R.

3.5 File Delivery

After recovering the confirm message from (c) KþR

, using itsprivate key K�R , R employs a digital envelope technique toencrypt the file into cipher CF . Instead of including CF intothe rumor generation, R encrypts <IDcK; IDcC; IPS0a> togenerate the data cipher rumor and the data key rumor, andattaches the digital envelop payload to the data cipherrumor. The large data cipher rumor and the small data keyrumor first take random walks to meet each other at a sowerS0b, then traverse the path from S0b to S0a via a TCPconnection, and eventually reach I along the reversedpaths of lcK and lcC . Upon receiving the digital envelop, Irecovers the desired file using its private key. For large-sizefiles, responders can split them into multiple segments.

3.6 Multiple Rumor Riding

Previous designs employ multiple walkers, say k-walkers,to shorten the query delay. After L hops, k-walkers shouldcover approximately the same amount of peers as a one-walker covers after k� L hops, while the response time canbe significantly reduced. To accelerate the query cycle, inRR, an initiator can issue multiple rumors in the querycycle. We denote this scheme as (i, j)-RR, which issues icipher rumors and j key rumors. Another advantage of theuse of multiple rumors is that RR can be more reliable asmore sowers can serve the query.

3.7 Rumor TTL

The selection of rumor TTL, together with the number ofcipher and key rumors, determines 1) how many sowers aquery will have, and 2) how the sowers are distributed. Thetradeoff is that for each query, RR requires a number ofsowers randomly distributed in the entire system, but toomany sowers will lead to unacceptable overhead. In order toguarantee the diversity of the sowers, a simple andlightweight method is needed for estimating OðlognÞ, wheren is the size of the P2P overlay. Also, to reduce theunnecessary overhead, peers need to observe the diversityof sampling sowers to adjust the TTL value of rumors. Theadaptive TTL determination of RR comprises two phases: (a)setting initial TTL value, and (b) adaptively adjusting TTL.

In phase (a), during the P2P Bootstrapping process, eachfresh peer retrieves a list of neighbors from the boot-strapping server. The fresh node can set the mean value ofthose in the responses as its initial TTL value of rumors.

In phase (b), RR employs random walk based schemessimilar to the ones used in [15]. A peer can periodicallyinsert several pairs of “sampling” rumors into the network.Those rumors are set with the peer’s current TTL settingand its IP address. The initiator sets a timer for each rumor.Any sower of such a pair of sampling rumors records theretained TTL of two rumors and then directly sends theTTL information as well as a timestamp to the initiatorthrough a TCP connection. During the process, we do notneed to consider the anonymity. The initiator observes theID (IP address) of the responding sower and the distribu-tion of the reported TTL. Also, the mechanism facilitates tobalance the tradeoff between the length of anonymous

paths and the latencies of query, response, or file deliveryby adaptively tuning the setting of TTL. For example, theinitiator can leverage a long TTL to increase the probabilityof collision, while increasing the number of rumors forreducing the latency. Utilizing this mechanism, an initiatorcan also adaptively tune the settings of TTL value of rumorsL or the number of rumors k to increase the diversity ofanonymous paths. Increasing the diversity of sowerdistribution can increase the probability that RR generatesmore disjointed anonymous paths between the initiator andsowers, and hence creates more different paths between theinitiator and the responder.

3.8 Rumor Cache

In RR, each peer needs to cache a number of receivedrumors before the rumors are matched, which raises aconcern of the storage overhead. RR does not incur muchstorage overhead to individual peers. The storage overheadis highly related to the speed of query generation. In P2Psystems, the speed of query generation is relatively low,e.g., each peer normally issues no more than 0.3 queries perminute on average [17]. Therefore, the storage overhead foreach peer is relatively small in general, which is acceptableto current PCs. We will further discuss the storage overheadin Section 5, based on our simulation and preliminaryimplementation. In addition, we propose a FIFO basedrumor removal mechanism for handling the cache overflow.More details of this mechanism can be found in [18].

4 DISCUSSION

We now examine several key issues in the RR design. Wefirst focus on how to ensure that each query has at the leastone sower and that the sowers are evenly distributed overthe system. We then discuss the attack models and analyzethe anonymity degree of RR.

4.1 Sower Distribution and Collision Rate

In RR, we select the random walk as the rumor spreadingmethod. P2P systems mainly utilize three communicationpatterns to deliver messages: flooding [19], random walk,and end-to-end delivery. Some existing works, for exampleP5, employ the flooding pattern, which is not suitable forP2P systems due to the huge traffic overhead. The end-to-end delivery, which is used by the path-based approaches,however, may compromise the anonymity of the initiator orresponder, as the destinations of the delivered messageshave to be known in advance. As we discussed in theSection 1, path-based model also suffers from the unrelia-bility of fixed paths and high computation overhead causedby layered encrypted encapsulation. Based on theseobservations, we select random walk as the fundamentalanonymizing method. The distinct features of random walkmechanism are as follows: First, random walk mechanismintroduces randomness to the message delivery such thatthe difficulty for attackers to trace back to the initiator orresponder is increased. Second, this mechanism potentiallyinvolves all peers in the anonymizing process, so that theanonymous proxy set is extended from a small group inpath-based approaches to the entire P2P network.

Specifically, RR rumors are sent in random directions,and each peer forwards a rumor to one of its neighbors

LIU ET AL.: RUMOR RIDING: ANONYMIZING UNSTRUCTURED PEER-TO-PEER SYSTEMS 467

Page 5: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

without any bias. According to the observations in [13],random walk achieves statistical properties similar toindependent sampling for reasonable networks, for exam-ple the small world [20] or power law networks [13], [21].Studies in [14], [22] show that if the walk length issufficiently large, the final receivers of a random walkquery are randomly distributed. We define collision distance,Hcd, as the number of hops along the shortest path thatrumors walk from the initiator to the sower in a query.When taking the Hcd to be Oðlogn), the sowers are evenlydistributed, guaranteed by the observations in [14]. Oursimulation results in Section 5 show that carefully selectingparameters will lead to desired collision distances.

We assume that each peer accessed by a rumor is anindependent sample from a space of uniform distribution. Apeer becomes a sower if it receives a pair of rumors. For a(i; j)-RR scheme, there are i cipher rumors and j key rumors.Without loss of generality, we assume that each rumor has afixed TTL value of L. After rumor spreading, the popularityof cipher rumors is i� L, where the popularity means thetotal number of peers receiving the cipher rumor. We furtherassume that those nodes are distinct with each other and thedistribution is uniformly random. On the key rumor path,the probability of a peer only being visited by this key rumorand not having the cipher rumor is ð1� i� L=nÞ. Theprobability of a key rumor terminating its walk withouthitting a cipher rumor is given by ð1� i� L=nÞL. Thus, theprobability of a successful collision in a (i, j)-RR is given by:

ph ¼ 1� ð1� i� L=nÞj�L: ð1Þ

We also introduce a parameter � to indicate the acceptablecollision rate. The expected collision rate is formulatedsubject to the constraint: ph � 1� �; 0 < � � 1. Combiningthis with (1) we have

j� L� logð1� i� L=nÞ � logð�Þ: ð2Þ

To keep the collision rate ph at a high level, say 99 percent,peers can choose the proper i, j, and L according to (1) and(2). We calculate three typical distributions in (1, 1), (1, k),and (k, k)-RR schemes to examine the optimal setting of iand j. Based on (2), we can investigate the optimaltheoretical distribution of ph. Indeed, the (k, k)-RR schemeachieves a higher collision rate than others in most cases.Thus, we assert that the (k, k)-RR scheme is a proper choicefor a high collision rate. We simulate the ph of (k, k)-RR with

a 106 node network and plot the results in Fig. 4. We see thata larger TTL value of rumors corresponds to a highercollision rate, while increasing the number of rumors alsoleads to a higher collision rate. On the other hand, a highercollision rate often yields more overhead due to the larger kand L. We also investigate the lower bounds of k� Lsettings, which guarantee the different collision rates. Withenlarging the size of P2P networks, the lower boundsincrease as well. Fig. 5 plots lower bounds of k� L as thefunction of network size under different ph requirements.The results show that in a million-node network, if phclosely approaches 1, adopting a setting of k� L near 3,000yields a 99.99 percent probability of a pair of rumorscolliding with each other. If we slightly reduce the collisionrate from 99.99 percent to 90 percent, the least needed k� Ldramatically decreases below 1,500. It is worth noticing thatadopting appropriate ph, say 90 percent, for rumors’collision while needs a small k� L, would benefit the RRperformance from saving a significant amount of traffic.

As we mentioned, the collision distance is anotherimportant factor balancing the tradeoff between the useranonymity and the query delay. Initiators hope that thesower peers reside as far away as possible, since the sowersrecover the query messages and might help adversaries tolocate the initiator if they are compromised. Thus, thenumber of rumors needs to be limited as well. We show oursimulation results for the rumor settings in a practicalnetwork in Section 5.

4.2 Anonymity Analysis

In this section, we first discuss the degree of anonymity thatRR achieves, and then analyze the protocol effectivenessunder various attack scenarios.

4.2.1 Anonymity Model

There are two main categories of anonymity models fordefining the anonymity degree. The models in the firstcategory define the anonymity of a certain node as thenumber of peers that have an equiprobable chance of beingthe given node, which is termed as anonymity set. Thesecond category employs measurements based on informa-tion theory, for example the mutual information [23], toreflect the similarity between two entities, such as theinput/output links or real/suspected participants. Theanonymity set used by the first category is widely adopteddue to its capability of capturing the common features of

468 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 3, MARCH 2011

Fig. 4. Collision rate in ðk; kÞ-RR.Fig. 5. Collision rate under different k� L.

Page 6: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

anonymity. The second model focuses on the informationleakage in anonymous systems. The typical usage of themodel is to analyze the anonymity in so called covertchannels. We adopt the first anonymity model and definethe Anonymity Degree (AD) as the probability of making anincorrect guess to identify a participant. A higher degreeinfers that better anonymity has been achieved.

In RR, when an intermediate node receives a query orconfirm rumor, it forwards the rumor to a randomly chosenneighbor. From an observer’s perspective, any rumorsending node might be the actual initiator of the rumor.Similarly, when an intermediate node receives a response ordata rumor, any intermediate node delivering the rumorcould be a potential receiver. Therefore, an individualobserver (initiator, responder or an intermediate node)cannot distinguish the initiator and responder from theother peers. Thus, if the number of nodes in the P2P systemis n, the initiator’s or responder’s AD is ðn� 2Þ=ðn� 1Þfrom the viewpoint of a normal observer (the number ofpotential initiators/responders is n� 1).

4.2.2 Attacks

We assume that the number of adversary nodes is m, so theprobability of a peer being an adversary is m=n. In somecases, adversaries may merely observe the fact that a peer issending information, without any knowledge about thetransmitted data. We claim that the protocol achievesunlinkability to the initiator and responder, if they cannotbe identified when communicating with each other. In ourattack model, we assume that, based on the records, theadversary nodes are able to observe and store thecommunication traversing them and guess the identity ofnodes that initiated those transmissions. Adversaries alsohave the capability to perform active attacks which includedropping, hijacking, and forging packets, controlling flowsand connections of the network, etc. We categorize themajor attacks that threaten a P2P anonymity protocol, andbriefly discuss why RR is invulnerable, while leave a fulldiscussion in our technical report [18].

Collaborating attack. Neighboring adversaries maycollaborate to monitor the traffic passing through and sharethe information in order to identify the possible neighbor-ing initiators. When two adversaries neighboring theinitiator receive a pair of rumors of a message, one of themmay forward the key rumor to another. The latter willrecover the message, and guess that the node sending therumor is the initiator.

In RR, a sower selects a subset of its neighbors to sendthe plaintext query, and the two collaborating nodes willnot receive the query. Fig. 6 illustrates the selective floodingof RR. In this way, adversaries only bet that the monitorednode is an initiator or a responder. Thus, the AD of theinitiator or responder becomes 1� 1=ð1þ sÞ, where s is thenumber of sowers of this pair of rumors. Suppose aninitiator is neighboring c local collaborating nodes. If cexceeds 2, then the AD becomes 1� 1=ð1þ s� ð1� pÞðc�2ÞÞ,where p is the probability of a sower choosing a neighbor tosend the query. Hence, RR is not subject to the localcollaborating attack, if the adversaries cannot compromisemore than three neighbors of the monitored node. For the

nonlocal collaborating attack, we discuss the defensetogether with the traceback attack later.

Timing attack. In a timing attack [24], the adversarydeduces the correlation among the timings of packets, suchas the response time of a query, the time difference of a query,the time interval between two sequential packets, etc., tolocate a transmission. Timing attacks pose a serious threat topath-based approaches. RR is invulnerable in that 1) rumorsare delivered over the overlay network in a random walkmanner, and RTT measurements do not reveal the realdistance to the responder; 2) if adversaries want to trace therumor via the time difference to locate the responder, theyneed to trace one query rumor from the initiator to a sower,then trace the plaintext query message from the sower to theresponder, which is not trivial; and 3) a sower issues arequest only after it obtains a pair of query rumors, so theresponse time is mainly dependent on the random walks ofrumors, which are unpredictable. All of these factors make itdifficult to launch a timing attack.

Predecessor attack. In some anonymous systems, aninitiator repeatedly communicates to a specific responder inmany rounds. In [25], Wright et al. elaborate the prede-cessor attack, where the adversary can control a subset ofthe nodes in anonymous systems and passively log thepossible communications between the initiator and respon-der. In RR, rumors walk randomly and interact withrandom sowers. The sowers of a given initiator orresponder are unpredictable and randomly distributedover the system. Hence, adversaries are not able to performsuch an attack to identify the initiator or responder viasowers, and RR is not subject to this type of attack.

Traffic analysis attack. An adversary can extract trafficflow information such as packet count, message volume,and communication pattern, etc., and build correlationsbetween the initiator, responder, and their communication[26]. Similar to timing attacks, traffic analysis attacks cancompromise the initiator’s or responder’s anonymity, ifadversaries control a large fraction of the network. Forexample, based on traffic shaping [27], adversaries clogtraffic in the suspected nodes and observe the traffic changewhen they slightly mitigate the clogging traffic. Thus, thereal traffic can be deduced. Performing this attack con-sequentially along the reversed path of the traffic, adver-saries can easily determine the initiator. RR is much lessvulnerable to this attack since subsequent messages do notbelong to the same traffic, and there are not any continuouspaths in RR.

Traceback attack. Adversaries start from a known sowerto trace back to the initiator along the rumor paths. The

LIU ET AL.: RUMOR RIDING: ANONYMIZING UNSTRUCTURED PEER-TO-PEER SYSTEMS 469

Fig. 6. Collaborating attack versus RR. (a) Collaborating attack.(b) Selective flooding of sowers.

Page 7: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

adversary examines the stored routing state of the peers toidentify the paths between the initiator and responder. Weconsider the users’ anonymity in two attack scenarios:1) One-way back tracking: adversaries that are on the rumorpath back-track and collaborate with each other to detectthe source node of this rumor; 2) Multiple-ways backtracking: at least one adversary intercepts both the cipherrumor and the key rumor. When there are no globaladversaries or active tracebacks, RR achieves a high degreeof anonymity under traceback attacks.

5 EXPERIMENTS

Additional latency of data delivery, bandwidth consumedby anonymous traffic, and crypto processing, if they exist,are necessary in order to provide anonymity. In this section,we evaluate the RR design through trace-driven simulationsand some implementations.

5.1 Metrics

We use the following metrics for evaluating RR:Collision rate. To verify the theoretical results in Section 4,

we examine the distribution of collision rate with realtraces. Besides the verification, we also use the results toguide the selection of rumor parameters.

Collision distance. A longer collision distance often meansa higher anonymity level, but also increases the delay of aquery as well as the traffic overhead. On the other hand, thecollision distance must be sufficiently large to guaranteesower diversity, as we discussed in Section 4.

Sower diversity. The metric reflects the distribution ofsower locations in the P2P systems. Evenly randomdistribution of sower location leads to a higher anonymitydegree.

Number of sowers. Since each sower implements aselective flooding search for an initiator, too many sowerswill incur a large number of replicated query messages, andtoo few sowers will result in failure on providing enoughredundancy and reliability.

Traffic overhead. The amount of traffic overhead repre-sents the comprehensive latency in data delivery andbandwidth. We assume that a query cycle involves e edgesin the P2P overlay. For each edge in the P2P overlay, there isa unique path mapped into the physical internet layer withthe length l. For each message enrolled in one query cycle,we calculate the sum of the distances that this messagepasses through. Therefore, the traffic overhead of a querycycle is defined as C ¼M � L ¼

Pjmij � li; 1 � i � e,

where jmij is the size of the traversed messages. Specifi-cally, we are more interested in the extra traffic overheadcaused by anonymous components. We define the extratraffic overhead as the total traffic overhead of a query cyclein an anonymized P2P system minus that of a query cycle ina nonanonymized P2P system.

Response time. In P2P systems, it is defined as the timeelapsed from when a query is issued to when the firstresponse arrives. In our simulation, the response time isdefined as the time from the start of rumor spreading to thetime when the initiator receives the first response message.

Crypto latency. The overhead incurred by the maincryptographic algorithms. We examine the cryptographic

overhead incurred by RR compared with other anonymityprotocols. We use the processing overhead in one AESoperation as the basic unit to make conversions betweenRSA and AES. Thus we can investigate the cryptographicoverhead incurred by different algorithms.

5.2 Methodology

The P2P topologies come from two sources. One is based onthe DSS Clip2 trace, which collected log data from January2001 to June 2001. The other one is a more recent snapshotkit of Ion P2P [28], which logged data from September 2004to February 2005, including topologies with high degreenodes (i.e., maintaining more than 30 neighbors). Whenadopting Ion’s traces into the simulated topologies, we onlyuse the ultrapeers of its snapshots, which perform theflooding search in a hybrid Gnutella. To simulate thephysical internet layer below the P2P overlay, we usedBRITE [29] to generate 30,000 - 100,000 nodes in the internet-like topologies.

To perform the security algorithms used in RR protocol,we employ Crypto++ to perform the standard crypto-graphic functions. Our experiments for simulation andimplementation are both conducted on several desktop PCs,typically with Pentium M 3.2G CPU, 1GBytes memory, 40Ghard disk, and 10/100M Ethernet card. We also simulate thedynamic properties of the P2P overlay network by assign-ing a lifecycle to each peer. The lifetime is generatedaccording to the distribution observed in [30]. The mean ofthe distribution is chosen to be 600 seconds [31], [32]. Thevalue of each peer’s lifecycle is decreased by one with eachpassing second. When peers use up their lifetimes at theend of each second, they leave the system in the followingsecond, and other fresh peers selected from the physicalinternet layer join in as replacements. In our simulation, wemainly compare RR with a typical Onion Routing approach,Shortcut protocol [2].

We first consider the collision rate of a single rumorspreading. To verify the theoretical results discussed inSection 4.1, we simulate rumor spreading procedures in thetraces with a (k, k)-RR scheme. We experiment in thesample space of rumor numbers k 2 ½1::10� and path lengthL 2 ½1::256� (the default TTL value in Gnutella is 7). Theaverage results of the collision rates are presented in Fig. 8.It is observed that the collision rates are usually higher thanthey are in the theoretical results, which are shown in Fig. 7.Since the topology in Gnutella networks follows small-world properties, a random path in the P2P topology often

470 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 3, MARCH 2011

Fig. 7. Theoretical collision rate.

Page 8: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

traverses high-degree nodes, causing the collision rates tobe higher than they are in homogeneous networks, in whichthe node degree follows a uniform distribution. Forexample, when TTL ¼ 9, k ¼ 2, the theoretical result ofcollision rate is only 2.9 percent, but the simulation result is15.1 percent. This phenomenon is particularly obvious inthe dense topologies of the Ion’s traces. Combining theresults of theoretical and real experiments, we suggest thatthe selection of the number of rumors k and the TTL valueof each rumor L should follow k� L � 100.

As discussed in Section 4, the collision distance isimportant in balancing the tradeoff between user anonym-ity and the query delay in the RR design. In the results ofthe (k, k)-RR scheme plotted in Fig. 9, it is shown that if L islarger than 25ð6 � k � 10Þ, the average collision distance isno less than 5. On the other hand, a small k, say less than 6,guarantees that the most collision distance will be largerthan 5. While k exceeds 6, the collision distance tends to beconstrained within 2 � 5 hops. Considering the fact thatanonymity is more important than latency, we suggest thatthe number of rumors k should be kept to a maximum of 6.

Ideally, sowers should be uniformly distributed over theentire system so that for a given peer, distinct sowers aregenerated in different RR executions. Otherwise, since aninitiator would have fixed sowers no matter how manyrounds of RR it performs, attackers can easily perform thepredecessor attack to locate it. We use the distinct sower ratio(D) to evaluate the diversity. If a peer performs g rounds ofRR, and generates d distinct sowers ðd � gÞ, the distinctsower ratio of this peer is given by D ¼ ðd=gÞ � 100 percent.By repeatedly running RR for each node in its lifecycle withthe results obtained in the collision distance simulation, we

see that when L is larger than 30 and the k 2 ½1::6�, the ratioD is larger than 92 percent. The results in Fig. 10 show thatD increases rapidly when increasing the TTL of rumors.Therefore, selecting the constrains L > 30 and 1 � k � 6 caneffectively provide a safe collision distance and the randomdistribution of sowers.

After selecting proper values of k and L for diversifyingthe sower locations, RR needs to provide the diversity ofpaths along which rumor messages probe. In this simula-tion, we examine the diversity of the distance that rumorswalk from the initiator to the sower. The TTL value ofrumors is constrained by k� L ¼ 150 in our experiments.Fig. 11 plots the frequency that the rumors appear indifferent distance bins in the setting of different (k, k) RR.The distance of rumors travel before colliding is approxi-mately ranging in a randomly uniform distribution in RR.Combining the sower diversity result, we can conclude theoverall rumors achieve an approximate uniform distribu-tion when the rumor TTL is following the above constraints.

In the meantime, RR needs to balance the tradeoffbetween the reliability and the overhead. Obviously, thereliability is first guaranteed by having at least one sower toserve each query. If too many sowers involve in the queryissuance, the redundancy of sowers, however, may intro-duce large amount of overhead such that the P2P system isnot scalable. We thus use the number of sowers to evaluatethe reliability as well as the redundancy in order toguarantee the reliability while not incurring too manyquery messages. To this end, we investigate the distributionof the number of sowers under different RR settings. Asshown in Fig. 12, when we select k� L � 200, each (k, k)-RRscheme has no more than 10 sowers (1 � k � 6Þ. Therefore,

LIU ET AL.: RUMOR RIDING: ANONYMIZING UNSTRUCTURED PEER-TO-PEER SYSTEMS 471

Fig. 8. Collision rate of simulation.

Fig. 9. Collision distance.

Fig. 10. Sower diversity.

Fig. 11. Rumor diversity.

Page 9: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

k� L should be in the range [100..200], which provides arange of the number of sowers [2..10], in order to meet boththe reliability and the scalability requirements.

We then consider the extra traffic. We compare RR with atypical Onion Routing based work, Shortcut protocol [2].We insert 10,000 queries into the system, and Fig. 13 plotsthe cumulative distribution of the extra traffic overhead of(k, k)-RR schemes with k� L ¼ 150. Note that a larger Lmeans more traffic overhead. Traffic overhead is highlycorrelated with the length of packets. Using Onion Routingbased approaches, the length of packets depends on thelength of delivery path. In common, the onion packets arepadded with some random bits for confusing the the trafficanalysis attackers. A random bit string with a size of peeledoff layer should be appended to the end of the onion packetat each hop. In this way, the onion packets are maintainedwith the same length along the entire path, which leads tolarge traffic overhead compared to RR. In contrast, thecipher rumor of RR only contains an encrypted cipher thathas no need of wrapping or padding. Those features reducethe traffic cost of RR, especially for those cipher rumors,compared to the Shortcut. The key rumors, however, willincur extra traffic cost in P2P systems. In addition, the usageof multiple rumors also incurs traffic overhead. Therefore,RR does not perform better than Shortcut in terms of trafficcost in some cases. In Fig. 13, we observe that the averagetraffic overhead incurred by the Shortcut protocol is slightlyless than that of the (6, 6)-RR scheme, which is themaximum value of our suggested settings. Except for thiscase, the traffic overhead of RR is much smaller than that ofthe Shortcut protocol.

Users of current P2P systems often have rigid require-ments on the response time for requesting resources. We

show the cumulative distribution curves of response timesin different (k, k)-RR schemes in Fig. 14, comparing themwith those of the Shortcut protocol. The major factor thathelps RR to achieve a low latency is its small cryptographicprocessing overhead.

Shortcut employs Onion Routing to deliver the message.Using Onion Routing, the initiator (or the responder) has toencrypt the packet using the public keys of nodes on thepath. The number of those asymmetric key based encryp-tions depends on the number of nodes on the delivery pathof the packet. Each node on the path will decrypt the packet,using its private key. In contrast, RR utilizes the symmetrickey based algorithm to perform encryptions (or decryp-tions), which significantly reduces the cryptographic pro-cessing overhead. This feature results in a low latency forRR. On the other hand, RR can adopt multiple pairs ofrumors for accelerating each query cycle in P2P systems. Inone query cycle, multiple rumors may revoke multiplesowers to flood the queries simultaneously. On thecontrary, Shortcut merely has one anonymous agent toflood the query in one query cycle. The concurrent floodingof RR will effectively improve the search latency. Clearly,the average response latency is decreased when we increasethe number of sowers. However, more rumors incur moretraffic overhead and message replications.

We now examine the storage overhead caused by thecache mechanism of RR. Previous works [17] show thateach node generates 0.3 queries per minute on average. Wethereby set six query workloads as each peer issues 0.3, 1, 3,10, 30, and 60 queries per minute. For each case, we sort allpeers according to the amount of cache usage, and plot thecache usage of top 10,000 peers in Fig. 15. We also show the

472 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 3, MARCH 2011

Fig. 13. Cumulative distribution of traffic overhead.

Fig. 14. Cumulative distribution of response time.

Fig. 15. Storage overhead of cache in top 10,000 peers.

Fig. 12. Number of sowers.

Page 10: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

average cache usage in Fig. 16. We find that most of thepeers only consume no more than 170 Kbyte of their cachesunder the test case of each node issuing 0.3 queries perminute, while the average cache usage is 24.9 Kbyte in thiscase. When the workload is enlarged to an extreme case,where each node issues one query per second, the cacheusage is 1.5 Mbyte in average, while the maximum cacheusage is less than 10M. The results show that the storageoverhead of RR’s cache mechanism is acceptable for currentPC configurations.

We also examine the cryptographic overhead of the RRprotocol. Figs. 17 and 18 contrast the average cryptographicoverhead of RR and the Shortcut protocol in a query cycle.We can see that RR significantly reduces the cryptographicoverhead for the initiators, responders, and intermediatenodes. The total cryptographic overhead of the initiator orresponder is linearly proportional to the length of the Onionpath when using the Shortcut protocol. Fig. 17 reflects thistrend. Compared to RR, the Onion Routing technique leadsto an overtly high cryptographic overhead for Shortcutusers. Also, the computation overhead spent by the

intermediate nodes on the Onion paths is significantlyhigher than RR. We find the ratio of the cryptographicoverhead of RR to that of Onion Routing spent on the pathis below 0.01, as shown in Fig. 18.

We implement a RR prototype on the Window XPplatform. We use the Crypto++ Library to implement allbuilt-in cryptographic algorithms. We set the default andmaximum size of cache as 6MByte and 50MByte, similar tothe settings of a popular P2P file sharing software BitComet[33]. The time-duration for cipher rumors is 2 minutes, andthe time-duration is 10 minutes for key rumors. The size ofthe file fragment is 512 Kbytes. We examine the throughputand the latency of RR. Fig. 19 presents the averagethroughput of cryptographic operations, including AESkey generation, AES operation, and RSA operation. In RR,the throughput of an initiator query depends on the rumorgeneration speed, which is mainly determined by the AESkey generation and AES encryption. Fig. 20 summarizes theaverage packet throughput of each RR peer. The resultsshow that the capacity of initiating rumors and localsearching in caches are acceptable even in heavy loadedP2P systems.

6 CONCLUSION

We propose a lightweight and non-path-based mutualanonymity protocol for P2P systems, Rumor Riding (RR).Employing a random walk concept, RR issues key rumorsand cipher rumors separately, and expects that they meet insome random peers. The results of trace-driven simulationsand simple implementations show that RR provides a highdegree of anonymity and outperforms existing approachesin terms of reducing the traffic overhead and processing

LIU ET AL.: RUMOR RIDING: ANONYMIZING UNSTRUCTURED PEER-TO-PEER SYSTEMS 473

Fig. 17. Average cryptographic overhead of the initiator/responder.

Fig. 18. Average cryptographic overhead of intermediate nodes.

Fig. 19. Average throughput of cryptographic operations.

Fig. 20. Average packet throughput.

Fig. 16. Average cache usage.

Page 11: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

latency. We also discuss how RR can effectively defend

against various attacks.Future and ongoing work includes accelerating the query

speed, introducing mimic traffic to confuse attackers, and

optimizing the k and L combination to further reduce the

traffic overhead. We will also investigate other security

properties of RR, such as the unlinkability, information

leakage, and failure tolerance when facing different attacks.

It would also be interesting to explore the possibility of

implementing this lightweight protocol in other distributed

systems, such as grid systems and ad-hoc networks.

ACKNOWLEDGMENTS

This research was supported in part by the NSFC/RGC Joint

Research Scheme N_HKUST602/08 and N_HKUST614/07,

National Basic Research Program of China (973 Program)

under Grant 2010CB328000, the 985 Program of China (Next

generation Tsinghua Campus Network), the NSFC 60873262,

Hong Kong Innovation and Technology Fund ITP/037/

09LP, and the China Postdoctoral Science Foundation funded

project No.20090461298.

REFERENCES

[1] M.K. Reiter and A.D. Rubin, “Crowds: Anonymity for WebTransactions,” ACM Trans. Information and System Security, vol. 1,no. 1, pp. 66-92, Nov. 1998.

[2] L. Xiao, Z. Xu, and X. Zhang, “Low-Cost and Reliable MutualAnonymity Protocols in Peer-to-Peer Networks,” IEEE Trans.Parallel and Distributed Systems, vol. 14, no. 9, pp. 829-840, Sept.2003.

[3] R. Sherwood, B. Bhattacharjee, and A. Srinivasan, “P5: A Protocolfor Scalable Anonymous Communication,” Proc. IEEE Symp.Security and Privacy, pp. 58-70, 2002.

[4] D. Chaum, “Untraceable Electronic Mail Return Addresses, andDigital Pseudonyms,” Comm. ACM, vol. 24, no. 2, pp. 84-90, Feb.1981.

[5] R. Rivest, A. Shamir, and L. Adleman, “A Method for ObtainingDigital Signatures and Public-Key Cryptosystems,” Comm. ACM,vol. 21, no. 2, pp. 120-126, 1978.

[6] M.K. Wright, M. Adler, B.N. Levine, and C. Shields, “ThePredecessor Attack: An Analysis of a Threat to AnonymousCommunications Systems,” ACM Trans. Information and SystemSecurity, vol. 7, no. 4, pp. 489-522, Nov. 2004.

[7] D. Goldschlag, M. Reed, and P. Syverson, “Onion Routing,”Comm. ACM, vol. 42, no. 2, p. 39, 1999.

[8] R. Dingledine, N. Mathewson, and P. Syverson, “Tor: The Second-Generation Onion Router,” Proc. 13th USENIX Security Symp.,pp. 303-320, 2004.

[9] V. Scarlata, B.N. Levine, and C. Shields, “Responder Anonymityand Anonymous Peer-to-Peer File Sharing,” Proc. IEEE Int’l Conf.Network Protocols (ICNP), pp. 272-280, Nov. 2001.

[10] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, “Search andReplication in Unstructured Peer-to-Peer Networks,” Proc. 16thACM Int’l Conf. Supercomputing, pp. 84-95, 2002.

[11] L.A. Adamic, R.M. Lukose, A.R. Puniyani, and B.A. Huberman,“Search in Power-Law Networks,” Physical Rev. E., vol. 64,p. 046135, 2001.

[12] C. Gkantsidis, M. Mihail, and A. Saberi, “Random Walks in Peer-to-Peer Networks,” Proc. IEEE INFOCOM, 2004.

[13] N. Bisnik and A. Abouzeid, “Modeling and Analysis of RandomWalk Search Algorithms in P2P Networks,” Proc. Second Int’lWorkshop Hot Topics in Peer-to-Peer Systems, 2005.

[14] R. Morselli, B. Bhattacharjee, A. Srinivasan, and M.A. Marsh,“Efficient Lookup on Unstructured Topologies,” Proc. ACM Symp.Principles of Distributed Computing, 2005.

[15] H. Yu, M. Kaminsky, P.B. Gibbons, and A. Flaxman, “SybilGuard:Defending against Sybil Attacks via Social Networks,” IEEE/ACMTrans. Networking, vol. 16, no. 3, pp. 576-589, June 2008.

[16] “FIPS PUB 197: The Official AES Standard,” NIST, http://csrc.nist.gov/archive/aes/index.html, 2001.

[17] K. Sripanidkulchai, “The Popularity of Gnutella Queries and ItsImplications on Scalability,” http://www-2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html, 2009.

[18] J. Han, Y. Liu, and J. Wang, “Rumor Riding: AnonymizingUnstructured Peer-to-Peer Systems,” technical report, http://www.cse.ust.hk/~jasonhan/RR-TR.pdf, 2009.

[19] S. Jiang, L. Guo, X. Zhang, and H. Wang, “LightFlood: MinimizingRedundant Messages and Maximizing Scope of Peer-to-PeerSearch,” IEEE Trans. Parallel and Distributed Systems, vol. 19,no. 5, pp. 601-614, May 2008.

[20] M. Li, W.-C. Lee, A. Sivasubramaniam, and J. Zhao, “SSW: ASmall-World-Based Overlay for Peer-to-Peer Search,” IEEE Trans.Parallel and Distributed Systems, vol. 19, no. 6, pp. 735-749, June2008.

[21] C. Gkantsidis, M. Mihail, and A. Saberi, “Conductance andCongestion in Power Law Graphs,” Proc. ACM SIGMETRICS,2003.

[22] I. Abraham and D. Malkhi, “Probabilistic Quorums for DynamicSystems,” Proc. Int’l Symp. Distributed Computing, 2003.

[23] Y. Zhu, X. Fu, R. Bettati, and W. Zhao, “Analysis of Flow-Correlation Attacks in Anonymity Networks,” Int’l J. Security andNetworks, vol. 2, nos. 1/2, pp. 137-153, Mar. 2007.

[24] B.N. Levine, M.K. Reiter, C. Wang, and M. Wright, “TimingAttacks in Low-Latency Mix Systems,” Proc. Eighth Int’l Conf.Financial Cryptography, 2004.

[25] M.K. Wright, M. Adler, B.N. Levine, and C. Shields, “ThePredecessor Attack: An Analysis of a Threat to AnonymousCommunications Systems,” ACM Trans. Information and SystemSecurity, vol. 7, no. 4, pp. 489-522, 2004.

[26] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao, “Correlation-Based Traffic Analysis Attacks on Anonymity Networks,” IEEETrans. Parallel and Distributed Systems, vol. 21, no. 7, pp. 954-967,July 2009.

[27] S.J. Murdoch and G. Danezis, “Low-Cost Traffic Analysis of Tor,”Proc. IEEE Symp. Security and Privacy, 2005.

[28] D. Stutzbach, R. Rejaie, and S. Sen, “Characterizing UnstructuredOverlay Topologies in Modern P2P File-Sharing Systems,” IEEE/ACM Trans. Networking, vol. 16, no. 2, pp. 267-280, Apr. 2008.

[29] A. Medina, A. Lakhina, I. Matta, and J. Byers, “BRITE: AnApproach to Universal Topology Generation,” Proc. Int’l WorkshopModeling, Analysis and Simulation of Computer and Telecomm.Systems (MASCOTS), 2001.

[30] S. Saroiu, P. Gummadi, and S. Gribble, “A Measurement Study ofPeer-to-Peer File Sharing Systems,” Proc. Multimedia Computingand Networking (MMCN) Conf., 2002.

[31] S. Sen and J. Wang, “Analyzing Peer-to-Peer Traffic across LargeNetworks,” IEEE/ACM Trans. Networking, vol. 12, no. 2, pp. 219-232, Apr. 2004.

[32] Y. Liu, L. Xiao, X. Liu, L.M. Ni, and X. Zhang, “LocationAwareness in Unstructured Peer-to-Peer Systems,” IEEE Trans.Parallel and Distributed Systems, vol. 16, no. 2, pp. 163-174, Feb.2005.

[33] “BitComet Options,” http://wiki.bitcomet.com/BitComet_Options, 2009.

Yunhao Liu (SM ’06) received the BS degree inautomation from Tsinghua University, China, in1995, and the MS and PhD degrees in computerscience and engineering from Michigan StateUniversity in 2003 and 2004, respectively. Hisresearch interests include peer-to-peer comput-ing, pervasive computing, and sensor networks.He is a senior member of the IEEE and the IEEEComputer Society, and a member of the ACM.He is an ACM Distinguished Speaker. He serves

as an associate editor for IEEE Transactions on Parallel and DistributedSystems and IEEE Transactions on Mobile Computing.

474 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 22, NO. 3, MARCH 2011

Page 12: 464 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED …Rumor Riding: Anonymizing Unstructured Peer-to-Peer Systems Yunhao Liu, Senior Member, IEEE, Jinsong Han, Member, IEEE, and Jilong

Jinsong Han (M ’04) received the PhD degree incomputer science and engineering from the HongKong University of Science and Technology in2007. He is currently a postdoctoral fellow at theXi’an Jiaotong University. His research interestsinclude peer-to-peer computing, anonymity, per-vasive computing, network security, and high-speed networking. He is a member of the IEEE,the IEEE Computer Society, and the ACM.

Jilong Wang is an associate professor atTsinghua University. He is also the director ofNetwork Operation Center of several large-scalenetwork infrastructures, including TsinghuaCampus Network (TUNET), China Next Gen-eration Internet Backbone Network (CNGI-CER-NET2), and Trans-Eurasia Information Network(TEIN). His research interests include networkmanagement and network measurement. He isa member of the IEEE.

. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

LIU ET AL.: RUMOR RIDING: ANONYMIZING UNSTRUCTURED PEER-TO-PEER SYSTEMS 475