
A Scalable and Reliable Matching Service for Content-Based Publish/Subscribe Systems

Xingkong Ma, Student Member, IEEE, Yijie Wang, Member, IEEE, and Xiaoqiang Pei

Abstract—Characterized by an increasing arrival rate of live content, emergency applications pose a great challenge: how to disseminate large-scale live content to interested users in a scalable and reliable manner. The publish/subscribe (pub/sub) model is widely used for data dissemination because of its capacity to seamlessly expand the system to a massive size. However, most event matching services of existing pub/sub systems either lead to low matching throughput when matching a large number of skewed subscriptions, or interrupt dissemination when a large number of servers fail. Cloud computing provides great opportunities for the requirements of complex computing and reliable communication. In this paper, we propose SREM, a scalable and reliable event matching service for content-based pub/sub systems in cloud computing environments. To achieve low routing latency and reliable links among servers, we propose a distributed overlay, SkipCloud, to organize the servers of SREM. Through a hybrid space partitioning technique, HPartition, large-scale skewed subscriptions are mapped into multiple subspaces, which ensures high matching throughput and provides multiple candidate servers for each event. Moreover, a series of dynamics maintenance mechanisms are extensively studied. To evaluate the performance of SREM, 64 servers are deployed and millions of live content items are tested in a CloudStack testbed. Under various parameter settings, the experimental results demonstrate that the traffic overhead of routing events in SkipCloud is at least 60 percent smaller than in the Chord overlay, and the matching rate in SREM is at least 3.7 times and at most 40.4 times larger than that of the single-dimensional partitioning technique of BlueDove. Besides, SREM enables the event loss rate to drop back to 0 within tens of seconds even if a large number of servers fail simultaneously.

Index Terms—Publish/subscribe, event matching, overlay construction, content space partitioning, cloud computing


1 INTRODUCTION

BECAUSE of its importance in helping users make real-time decisions, data dissemination has become dramatically significant in many large-scale emergency applications, such as earthquake monitoring, disaster weather warning, and status updates in social networks. Recently, data dissemination in these emergency applications presents a number of fresh trends. One is the rapid growth of live content. For instance, Facebook users publish over 600,000 pieces of content and Twitter users send over 100,000 tweets on average per minute [1]. The other is the highly dynamic network environment. For instance, measurement studies indicate that most users' sessions in social networks only last several minutes [2]. In emergency scenarios, sudden disasters like earthquakes or bad weather may cause a large number of users to fail instantaneously.

These characteristics require the data dissemination system to be scalable and reliable. Firstly, the system must be scalable to support the large amount of live content. The key is to offer a scalable event matching service to filter out irrelevant users. Otherwise, the content may have to traverse a large number of uninterested users before it reaches interested users. Secondly, given the dynamic network environment, it is necessary to provide reliable schemes to keep a continuous data dissemination capacity. Otherwise, system interruption may cause live content to become obsolete.

Driven by these requirements, the publish/subscribe (pub/sub) pattern is widely used to disseminate data due to its flexibility, scalability, and efficient support of complex event processing. In pub/sub systems (pub/subs), a receiver (subscriber) registers its interest in the form of a subscription. Events are published by senders to the pub/sub system. The system matches events against subscriptions and disseminates them to interested subscribers.

In traditional data dissemination applications, live content is generated by publishers at a low speed, which makes many pub/subs adopt multi-hop routing techniques to disseminate events. A large body of broker-based pub/subs forward events and subscriptions by organizing nodes into diverse distributed overlays, such as tree-based designs [3], [4], [5], [6], cluster-based designs [7], [8] and DHT-based designs [9], [10], [11]. However, the multi-hop routing techniques in these broker-based systems lead to a low matching throughput, which is inadequate for the current high arrival rate of live content.

Recently, cloud computing provides great opportunities for applications requiring complex computing and high-speed communication [12], [13], where servers are connected by high-speed networks and have powerful computing and storage capacities. A number of pub/sub services based on the cloud computing environment have been proposed, such as Move [14], BlueDove [15] and SEMAS [16]. However, most of them cannot completely meet the requirements of

The authors are with the Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan 410073, P.R. China. E-mail: {maxingkong, wangyijie, xiaoqiangpei}@nudt.edu.cn.

Manuscript received 7 Mar. 2014; revised 22 May 2014; accepted 18 June 2014. Date of publication 28 July 2014; date of current version 11 Mar. 2015. Recommended for acceptance by I. Bojanova, R.C.H. Hua, O. Rana, and M. Parashar. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TCC.2014.2338327


2168-7161 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


both scalability and reliability when matching large-scale live content under highly dynamic environments. This mainly stems from the following facts: 1) Most of them are inappropriate for matching live content of high data dimensionality due to the limitation of their subscription space partitioning techniques, which bring either low matching throughput or high memory overhead. 2) These systems adopt the one-hop lookup technique [17] among servers to reduce routing latency. Despite its high efficiency, it requires each dispatching server to have the same view of the matching servers. Otherwise, subscriptions or events may be assigned to the wrong matching servers, which leads to availability problems when matching servers join or crash. A number of schemes can be used to keep a consistent view, such as periodically sending heartbeat messages to dispatching servers or exchanging messages among matching servers. However, these extra schemes may bring a large traffic overhead or interrupt the event matching service.

Motivated by these factors, we propose a scalable and reliable matching service for content-based pub/sub services in cloud computing environments, called SREM. Specifically, we mainly focus on two problems: one is how to organize servers in the cloud computing environment to achieve scalable and reliable routing; the other is how to manage subscriptions and events to achieve parallel matching among these servers. Generally speaking, we provide the following contributions:

- We propose a distributed overlay protocol, called SkipCloud, to organize servers in the cloud computing environment. SkipCloud enables subscriptions and events to be forwarded among brokers in a scalable and reliable manner. It is also easy to implement and maintain.

- To achieve scalable and reliable event matching among multiple servers, we propose a hybrid multi-dimensional space partitioning technique, called HPartition. It allows similar subscriptions to be gathered on the same server and provides multiple candidate matching servers for each event. Moreover, it adaptively alleviates hot spots and keeps the workload balanced among all servers.

- We conduct extensive experiments on a CloudStack testbed to verify the performance of SREM under various parameter settings.

The rest of this paper is organized as follows. Section 2 introduces the content-based data model and the essential framework of SREM. Section 3 presents the SkipCloud overlay in detail. Section 4 describes HPartition in detail. Section 5 discusses the dynamics maintenance mechanisms of SREM. We evaluate the performance of SREM in Section 6. In Section 7, we review the related work on the matching of existing content-based pub/subs. Finally, we conclude the paper and outline our future work in Section 8.

2 DESIGN OF SREM

2.1 Content-Based Data Model

SREM uses a multi-dimensional content-based data model. Consider a data model consisting of $k$ dimensions $A_1, A_2, \ldots, A_k$. Let $R_i$ be the ordered set of all possible values of $A_i$. Then $\Omega = R_1 \times R_2 \times \cdots \times R_k$ is the entire content space. A subscription is a conjunction of predicates over one or more dimensions. Each predicate $P_i$ specifies a continuous range for a dimension $A_i$, and it can be described by the tuple $(A_i, v_i, O_i)$, where $v_i \in R_i$ and $O_i$ represents a relational operator ($<, \le, \ne, \ge, >$, etc.). The general form of a subscription is $S = \wedge_{i=1}^{k} P_i$. An event is a point within the content space $\Omega$. It can be represented as $k$ dimension-value pairs, i.e., $e = \wedge_{j=1}^{k} (A_j, v_j)$. A pair $(A_j, v_j)$ satisfies a predicate $(A_i, v_i, O_i)$ if $A_j = A_i$ and $v_j \, O_i \, v_i$. By this definition, an event $e$ matches $S$ if each predicate of $S$ is satisfied by some pair of $e$.
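To make the data model concrete, the following minimal Java sketch checks whether an event matches a subscription under this definition. The class and method names are illustrative and not part of the SREM prototype; only closed-range predicates are shown.

public class MatchingModel {
    // A predicate (A_i, v_i, O_i); here the common case of a closed range [lo, hi] on dimension dim.
    record Predicate(int dim, double lo, double hi) {
        boolean satisfiedBy(double value) { return value >= lo && value <= hi; }
    }

    // A subscription is a conjunction of predicates; an event is one value per dimension.
    static boolean matches(Predicate[] subscription, double[] event) {
        for (Predicate p : subscription)
            if (!p.satisfiedBy(event[p.dim()])) return false;  // every predicate must hold
        return true;
    }

    public static void main(String[] args) {
        // Subscription (20 <= A1 <= 80) AND (80 <= A2 <= 100); event (A1,30),(A2,100),(A3,80),(A4,80).
        Predicate[] s = { new Predicate(0, 20, 80), new Predicate(1, 80, 100) };
        double[] e = { 30, 100, 80, 80 };
        System.out.println(matches(s, e));  // true
    }
}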

2.2 Overview of SREM

To support large-scale users, we consider a cloud computing environment with a set of geographically distributed datacenters connected through the Internet. Each datacenter contains a large number of servers (brokers), which are managed by a datacenter management service such as Amazon EC2 or OpenStack.

We illustrate a simple overview of SREM in Fig. 1. All brokers in SREM are exposed to the Internet as the front-end, and any subscriber or publisher can connect to them directly. To achieve reliable connectivity and low routing latency, these brokers are connected through a distributed overlay, called SkipCloud. The entire content space is partitioned into disjoint subspaces, each of which is managed by a number of brokers. Subscriptions and events are dispatched through SkipCloud to the subspaces that overlap with them. Thus, subscriptions and events falling into the same subspace are matched on the same broker. After the matching process completes, events are broadcast to the corresponding interested subscribers. As shown in Fig. 1, the subscriptions generated by subscribers S1 and S2 are dispatched to brokers B2 and B5, respectively. Upon receiving events from publishers, B2 and B5 will send matched events to S1 and S2, respectively.

One may argue that different datacenters could be responsible for some subset of the subscriptions according to geographical location, so that we do not really need much collaboration among the servers [3], [4]. In this case, since the pub/sub system needs to find all matched subscribers, it requires each event to be matched in all datacenters, which leads to a large traffic overhead as the number of datacenters and the arrival rate of live content increase. Besides, it is hard to achieve workload balance among the servers of

Fig. 1. System framework.



all datacenters due to the various skewed distributions of users' interests. Another question is why we need a distributed overlay like SkipCloud to ensure reliable logical connectivity in a datacenter environment where servers are more stable than the peers in P2P networks. This is because, as the number of servers in datacenters increases, node failure becomes the norm rather than a rare exception [18]. Node failure may lead to unreliable and inefficient routing among servers. To this end, we organize servers into SkipCloud to reduce the routing latency in a scalable and reliable manner.

Such a framework offers a number of advantages for real-time and reliable data dissemination. First, it allows the system to group similar subscriptions into the same broker in a timely fashion due to the high bandwidth among brokers in the cloud computing environment, so that the local searching time can be greatly reduced. This is critical for reaching a high matching throughput. Second, since each subspace is managed by multiple brokers, this framework is fault-tolerant even if a large number of brokers crash instantaneously. Third, because the datacenter management service provides scalable and elastic servers, the system can be easily expanded to Internet scale.

3 SKIPCLOUD

3.1 Topology Construction

Generally speaking, SkipCloud organizes all brokers into levels of clusters. As shown in Fig. 2, the clusters at each level of SkipCloud can be treated as a partition of the whole broker set. Table 1 shows the key notations used in this section.

At the top level, brokers are organized into multiple clusters whose topologies are complete graphs. Each cluster at this level is called a top cluster. It contains a leader broker which generates a unique $b$-ary identifier of length $m$ using a hash function (e.g., MD5). This identifier is called the ClusterID. Correspondingly, each broker's identifier is a unique string of length $m+1$ that shares a common prefix of length $m$ with its ClusterID. At this level, brokers in the same cluster are responsible for the same content subspaces, which provides multiple matching candidates for each event. Since brokers in the same top cluster communicate frequently among themselves, e.g., when updating subscriptions and dispatching events, they are organized into a complete graph so that they can reach each other in one hop.

After the top clusters have been organized, the clusters at the remaining levels can be generated level by level. Specifically, each broker decides which cluster to join at every level. The brokers whose identifiers share a common prefix of length $i$ join the same cluster at level $i$, and the common prefix is referred to as the ClusterID at level $i$. That is, the clusters at level $i+1$ can be regarded as a $b$-partition of the clusters at level $i$; the number of clusters thus shrinks by a factor of $b$ at each lower level. Let $\varepsilon$ be the empty identifier. All brokers at level 0 join one single cluster, called the global cluster. Therefore, there are $b^i$ clusters at level $i$. Fig. 2 shows an example of how SkipCloud organizes eight brokers into three levels of clusters by binary identifiers.

To organize the clusters of the non-top levels, we employ a lightweight peer sampling protocol based on Cyclon [19], which provides robust connectivity within each cluster. Suppose there are $m$ levels in SkipCloud. Specifically, each cluster runs Cyclon to keep reliable connectivity among the brokers in the cluster. Since each broker falls into one cluster at each level, it maintains $m$ neighbor lists. The neighbor list at the cluster of level $i$ samples an equal number of neighbors from the corresponding children clusters at level $i+1$. This ensures that routing from the bottom level can always find a broker pointing to a higher level. Because brokers maintain multiple levels of neighbor lists, they update the neighbor list of one level and the subsequent ones in a ring to reduce the traffic cost. The pseudo-code of the view maintenance algorithm is shown in Algorithm 1. The topology of the multi-level neighbor lists is similar to Tapestry [20]. Compared with Tapestry, SkipCloud uses multiple brokers in top clusters as targets to ensure reliable routing.

Algorithm 1. Neighbor List Maintenance

Input: views: the neighbor lists.
       m: the total number of levels in SkipCloud.
       cycle: current cycle.
1: j = cycle % (m + 1);
2: for each i in [0, m-1] do
3:   update views[i] by the peer sampling service based on Cyclon.
4: for each i in [0, m-1] do
5:   if views[i] contains empty slots then
6:     fill these empty slots with items from other levels that share a common prefix of length i with the ClusterID of views[i].

3.2 Prefix Routing

Prefix routing in SkipCloud is mainly used to efficiently route subscriptions and events to the top clusters. Note that the cluster identifiers at level $i+1$ are generated by appending one $b$-ary digit to the corresponding clusters at level $i$. This relation between cluster identifiers is the foundation of routing to target clusters. Briefly, when receiving a routing

TABLE 1
Notations in SkipCloud

Nb   the number of brokers in SkipCloud
m    the number of levels in SkipCloud
Dc   the average degree in each cluster of SkipCloud
Nc   the number of top clusters in SkipCloud

Fig. 2. An example of SkipCloud with eight brokers and three levels. Each broker is identified by a binary string.



request to a specific cluster, a broker examines its neighbor lists at all levels and chooses the neighbor which shares the longest common prefix with the target ClusterID as the next hop. The routing operation repeats until a broker cannot find a neighbor whose identifier is closer to the target than its own. Algorithm 2 describes the prefix routing algorithm in pseudo-code.

Algorithm 2. Prefix Routing

1: l = commonPrefixLength(self.ID, event.ClusterID);
2: if (l == m) then
3:   process(event);
4: else
5:   destB = the broker from self.views whose identifier matches event.ClusterID with the longest common prefix;
6:   lmax = commonPrefixLength(destB.identifier, event.ClusterID);
7:   if (lmax <= l) then
8:     destB = the broker from views[l] whose identifier is closest to event.ClusterID;
9:     if (destB and myself are in the same cluster of level l) then
10:      process(event);
11:  forwardTo(destB, event);

Since a neighbor list in a cluster at level $i$ is a uniform sample from the corresponding children clusters at level $i+1$, each broker can find a neighbor whose identifier matches at least one more digit of the common prefix with the target identifier before reaching the target cluster. Therefore, the prefix routing in Algorithm 2 guarantees that any top cluster will be reached in at most $\log_b N_c$ hops, where $N_c$ is the number of top clusters. Assume the average degree of brokers in each cluster is $D_c$. Then each broker only needs to keep $D_c \cdot \log_b N_c$ neighbors.
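As a quick sanity check on these bounds (hypothetical numbers, not the testbed configuration of Section 6):

\[
b = 2,\ N_c = 16 \;\Rightarrow\; \text{at most } \log_2 16 = 4 \text{ hops}, \qquad
D_c = 6 \;\Rightarrow\; D_c \cdot \log_b N_c = 6 \times 4 = 24 \text{ neighbors per broker.}
\]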

4 HPARTITION

In order to take advantage of multiple distributed brokers, SREM divides the entire content space among the top clusters of SkipCloud, so that each top cluster only handles a subset of the entire space and searches a small number of candidate subscriptions. SREM employs a hybrid multi-dimensional space partitioning technique, called HPartition, to achieve scalable and reliable event matching. Generally speaking, HPartition divides the entire content space into disjoint subspaces (Section 4.1). Subscriptions and events with overlapping subspaces are dispatched and matched on the same top cluster of SkipCloud (Sections 4.2 and 4.3). To keep the workload balanced among servers, HPartition divides hot spots into multiple cold spots in an adaptive manner (Section 4.4). Table 2 shows the key notations used in this section.

4.1 Logical Space Construction

Our idea of logical space construction is inspired by existing single-dimensional space partitioning (called SPartition) and all-dimensional space partitioning (called APartition). Let $k$ be the number of dimensions in the entire space $\Omega$ and $R_i$ be the ordered set of values of the dimension $A_i$. Each $R_i$ is split into $N_{seg}$ continuous and disjoint segments $\{R_i^j, j = 1, 2, \ldots, N_{seg}\}$, where $j$ is the segment identifier (called SegID) of the dimension $A_i$.

The basic idea of SPartition, as used by BlueDove [15], is to treat each dimension as a separate space, as shown in Fig. 3a. Specifically, the range of each dimension is divided into $N_{seg}$ segments, each of which is regarded as a separate subspace. Thus, the entire space $\Omega$ is divided into $k \cdot N_{seg}$ subspaces. Subscriptions and events falling into the same subspace are matched against each other. Due to the coarse-grained partitioning of SPartition, each subscription falls into a small number of subspaces, which brings multiple candidate brokers for each event and a low memory overhead. On the other hand, SPartition may easily form a hot spot if a large number of subscribers are interested in the same range of a dimension.

In contrast, the idea of APartition, as used by SEMAS [16], is to treat the combination of all dimensions as a single space, as shown in Fig. 3b. Formally, the whole space $\Omega$ is partitioned into $(N_{seg})^k$ subspaces. Compared with SPartition, APartition leads to a smaller number of hot spots, since a hot subspace is formed only if all of its segments are subscribed by a large number of subscriptions. On the other hand, each event in APartition has only one candidate broker. Compared with SPartition, each subscription in APartition falls into more subspaces, which leads to a higher memory cost.

Inspired by both SPartition and APartition, HPartition provides a flexible manner of constructing the logical space. Its basic idea is to divide all dimensions into a number of groups and treat each group as a separate space, as shown in Fig. 3c. Formally, the $k$ dimensions in HPartition are classified into $t$ groups, each of which contains $k_i$ dimensions,

TABLE 2
Notations in HPartition

Ω          the entire content space
k          the number of dimensions of Ω
Ai         the dimension i, i ∈ [1, k]
Ri         the range of Ai
Pi         the predicate on Ai
Nseg       the number of segments on Ri
N'seg      the number of segments of hot spots
α          the minimum size of the hot cluster
Nsub       the number of subscriptions
Gx1···xk   the subspace with SubspaceID x1···xk

Fig. 3. Comparison among three different logical space construction techniques.



where $\sum_{i=1}^{t} k_i = k$. That is, $\Omega$ is split into $t$ separated spaces $\{\Omega_i, i \in [1, t]\}$. HPartition adopts APartition to divide each $\Omega_i$. Thus, $\Omega$ is divided into $\sum_{i=1}^{t} (N_{seg})^{k_i}$ subspaces. For each space $\Omega_i$, let $G_{x_1 \cdots x_k}, x_i \in [0, N_{seg}]$ be its subspace, where $x_1 \cdots x_k$ is the concatenation of the SegIDs of the subspace, called the SubspaceID. For the dimensions that a subspace does not contain, the corresponding positions of the SubspaceID are set to 0.

Each subspace of HPartition is managed by a top cluster of SkipCloud. To dispatch each subspace $G_{x_1 \cdots x_k}$ to its corresponding top cluster, HPartition uses a hash function such as MurmurHash [21] to map each SubspaceID $x_1 \cdots x_k$ to a $b$-ary identifier of length $m$, represented by $H(x_1 \cdots x_k)$. According to the prefix routing in Algorithm 2, each subspace will be forwarded to the top cluster whose ClusterID is nearest to $H(x_1 \cdots x_k)$. Note that the brokers in the same top cluster are in charge of the same set of subspaces, which ensures the reliability of each subspace.
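A minimal sketch of this SubspaceID-to-ClusterID mapping (illustrative Java; a plain string hash stands in for the MurmurHash function mentioned above, and the helper names are hypothetical):

public class SubspaceMapper {
    // Map a SubspaceID such as "1200" to a b-ary ClusterID of length m.
    static String toClusterId(String subspaceId, int b, int m) {
        int h = subspaceId.hashCode();                    // placeholder for MurmurHash(subspaceId)
        StringBuilder id = new StringBuilder();
        for (int i = 0; i < m; i++) {
            id.append(Math.floorMod(h, b));               // take one b-ary digit
            h = Integer.rotateRight(h, 5) ^ 0x9E3779B9;   // mix bits before the next digit
        }
        return id.toString();
    }

    public static void main(String[] args) {
        // With b = 2 and m = 3, the subspace "1200" is routed to a 3-digit binary ClusterID.
        System.out.println(toClusterId("1200", 2, 3));
    }
}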

4.2 Subscription Installation

A user can specify a subscription by defining ranges over one or more dimensions. We treat $S = \wedge_{i=1}^{k} P_i$ as the general form of a subscription, where $P_i$ is a predicate on the dimension $A_i$. If $S$ does not contain a dimension $A_i$, the corresponding $P_i$ is the entire range of $A_i$. When receiving a subscription $S$, the broker first obtains all subspaces $G_{x_1\cdots x_k}$ which overlap with $S$. Based on the logical space construction of HPartition, all dimensions are classified into $t$ individual spaces $\Omega_j, j \in [1, t]$. Each $G_{x_1\cdots x_k}$ of $\Omega_i$ that overlaps with $S$ satisfies

$$x_i = \begin{cases} s, & \text{if } P_i \ne R_i \text{ and } R_i^s \cap P_i \ne \emptyset, \\ 0, & \text{if } P_i = R_i. \end{cases} \qquad (1)$$

As shown in Fig. 4, $\Omega$ consists of two groups, and each range is divided into two segments. For a subscription $S_1 = (20 \le A_1 \le 80) \wedge (80 \le A_2 \le 100) \wedge (105 \le A_3 \le 170) \wedge (92 \le A_4 \le 115)$, its matched subspaces are $G_{1200}$, $G_{1100}$ and $G_{0022}$.

For the subscription $S_1$, every matched subspace $G_{x_1\cdots x_k}$ is hashed to a $b$-ary identifier of length $m$, represented by $H(x_1\cdots x_k)$. Thus, $S_1$ is forwarded to the top cluster whose ClusterID is nearest to $H(x_1\cdots x_k)$. When a broker in the top cluster receives the subscription $S_1$, it broadcasts $S_1$ to the other brokers in the same cluster, so that the top cluster provides reliable event matching and balanced workloads among its brokers.
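The per-dimension part of Eq. (1) can be sketched as follows (illustrative Java; uniform segment boundaries and the helper names are assumptions, not taken from the prototype):

import java.util.ArrayList;
import java.util.List;

public class SubscriptionInstaller {
    // Segments matched by a predicate [lo, hi] over a dimension whose full range is [0, range],
    // following Eq. (1): the whole range maps to position 0, otherwise every overlapping segment.
    static List<Integer> overlappingSegments(double lo, double hi, double range, int nSeg) {
        List<Integer> segs = new ArrayList<>();
        if (lo <= 0 && hi >= range) { segs.add(0); return segs; }   // predicate covers the whole range
        double segLen = range / nSeg;
        for (int s = 1; s <= nSeg; s++) {
            double segLo = (s - 1) * segLen, segHi = s * segLen;
            if (hi >= segLo && lo <= segHi) segs.add(s);            // predicate overlaps segment s
        }
        return segs;
    }

    public static void main(String[] args) {
        // A predicate [20, 80] on a dimension with range [0, 100] and Nseg = 2 overlaps segments 1 and 2.
        System.out.println(overlappingSegments(20, 80, 100, 2));    // [1, 2]
    }
}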

4.3 Event Assignment and Forwarding Strategy

Upon receiving an event, the broker forwards this event to its corresponding subspaces. Specifically, consider an event $e = \wedge_{i=1}^{k}(A_i, v_i)$. Each subspace $G_{x_1\cdots x_k}$ that $e$ falls into should satisfy

$$x_i \in \{0, s\}, \quad v_i \in R_i^s. \qquad (2)$$

For instance, let $e$ be $(A_1, 30) \wedge (A_2, 100) \wedge (A_3, 80) \wedge (A_4, 80)$. Based on the logical partitioning in Fig. 4, the matched subspaces of $e$ are $G_{1200}$ and $G_{0011}$.

Recall that HPartition divides the whole space $\Omega$ into $t$ separated spaces $\{\Omega_i\}$, which means there are $t$ candidate subspaces $\{G_i, 1 \le i \le t\}$ for each event. Because of their different workloads, the strategy of selecting one candidate subspace for each event may greatly affect the matching rate. For an event $e$, suppose each candidate subspace $G_i$ contains $N_i$ subscriptions. The most intuitive approach is the least-first strategy, which chooses the subspace with the least number of candidate subscriptions. Correspondingly, its average searching subscription amount is $N_{least} = \min_{i \in [1,t]} N_i$. However, when a large number of skewed events fall into the same subspace, this approach may lead to unbalanced workloads among brokers. Another option is the random strategy, which selects each candidate subspace with equal probability. It ensures balanced workloads among brokers, but the average searching subscription amount increases to $N_{avg} = \frac{1}{t}\sum_{i=1}^{t} N_i$. In HPartition, we adopt a probability-based forwarding strategy to dispatch events. Formally, the probability of forwarding $e$ to the space $\Omega_i$ is $p_i = (1 - N_i / \sum_{j=1}^{t} N_j) / (t - 1)$. Thus, the average searching subscription amount is $N_{prob} = (N - \sum_{i=1}^{t} N_i^2 / N) / (t - 1)$, where $N = \sum_{i=1}^{t} N_i$. Note that when every subspace has the same size, $N_{prob}$ reaches its maximal value, i.e., $N_{prob} \le N_{avg}$. Therefore, this approach has better workload balance than the least-first strategy and less searching latency than the random strategy.
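A minimal sketch of the probability-based selection (illustrative Java; it assumes the per-subspace subscription counts $N_i$ are known locally and that there are at least two groups, $t \ge 2$):

import java.util.Random;

public class ProbForwarder {
    // Pick one of t candidate subspaces with probability p_i = (1 - N_i/N) / (t - 1).
    static int pickSubspace(int[] counts, Random rnd) {
        int t = counts.length;                 // assumes t >= 2
        long total = 0;
        for (int c : counts) total += c;
        double r = rnd.nextDouble(), cumulative = 0.0;
        for (int i = 0; i < t; i++) {
            double p = (1.0 - (double) counts[i] / total) / (t - 1);
            cumulative += p;
            if (r < cumulative) return i;
        }
        return t - 1;                          // guard against floating-point rounding
    }

    public static void main(String[] args) {
        // A subspace holding fewer subscriptions is chosen more often.
        int[] counts = {100, 400, 500};
        System.out.println(pickSubspace(counts, new Random(42)));
    }
}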

4.4 Hot Spots Alleviation

In real applications, a large number of subscriptions may fall into a small number of subspaces, which are called hot spots. Matching in hot spots leads to a high searching latency. To alleviate hot spots, an obvious approach is to add brokers to the top clusters containing hot spots. However, this approach raises the total price of the service and wastes the resources of clusters that only contain cold spots.

To utilize the distributed clusters, a better solution is to balance the workloads among clusters through partitioning and migrating hot spots. The gain of the partitioning technique is greatly affected by the distribution of the subscriptions of the hot spot. To this end, HPartition divides each hot spot into a number of cold spots through two partitioning techniques: hierarchical subspace partitioning and subscription set partitioning. The first aims to partition the hot

Fig. 4. An example of subscription assignment.



spots where the subscriptions are diffused over the whole space, and the second aims to partition the hot spots where the subscriptions fall into a narrow space.

4.4.1 Hierarchical Subspace Partitioning

To alleviate the hot spots whose subscriptions are diffused over their whole space, we propose a hierarchical subspace partitioning technique, called HSPartition. Its basic idea is to divide the space of a hot spot into a number of smaller subspaces and reassign its subscriptions to these newly generated subspaces.

Step one, each hot spot is divided along the ranges of its dimensions. Assume $\Omega_i$ contains $k_i$ dimensions, and one of its hot spots is $G_{x_1\cdots x_k}$. For each $x_i \ne 0$, the corresponding range $R_i^{x_i}$ is divided into $N'_{seg}$ segments, each of which is denoted by $R_i^{x_i y_i}$ $(1 \le x_i \le N_{seg}, 1 \le y_i \le N'_{seg})$. Therefore, $G_{x_1\cdots x_k}$ is divided into $(N'_{seg})^{k_i}$ subspaces, each of which is represented by $G_{x_1\cdots x_k, y_1\cdots y_k}$.

Step two, the subscriptions in $G_{x_1\cdots x_k}$ are reassigned to these new subspaces according to the subscription installation in Section 4.2. Because each new subspace occupies a small piece of the space of $G_{x_1\cdots x_k}$, a subscription $S$ of $G_{x_1\cdots x_k}$ may fall into only a part of these subspaces, which reduces the number of subscriptions that each event is matched against on the hot spot. HSPartition enables hot spots to be divided into much smaller subspaces iteratively. As shown in the left top corner of Fig. 5, the hot spot $G_{1200}$ is divided into four smaller subspaces by HSPartition, and the maximum number of subscriptions in this space decreases from 9 to 4.

To avoid concurrent partitioning of one hot spot by multiple brokers in the same top cluster, the leader broker of each top cluster is responsible for periodically checking and partitioning hot spots. When a broker receives a subscription, it hierarchically divides the space of the subscription until each subspace cannot be divided further. The leader broker in charge of dividing a hot spot $G_{x_1\cdots x_k}$ disseminates an update message consisting of a three-tuple {"Subspace Partition", $x_1\cdots x_k$, $N'_{seg}$} to all the other brokers. Because the global cluster at level 0 of SkipCloud organizes all brokers into a random graph, we can utilize a gossip-based multicast method [22] to disseminate the update message to the other brokers in $O(\log N_b)$ hops, where $N_b$ is the total number of brokers.

4.4.2 Subscription Set Partitioning

To alleviate the hot spots whose subscriptions fall into a narrow space, we propose a subscription set partitioning technique, called SSPartition. Its basic idea is to divide the subscription set of a hot spot into a number of subsets and scatter these subsets to multiple top clusters. Assume $G_{x_1\cdots x_k}$ is a hot spot and $N_0$ is the size of each subscription subset.

Step one, divide the subscription set of $G_{x_1\cdots x_k}$ into $n$ subsets, where $n = |G_{x_1\cdots x_k}| / N_0$. Each subset is assigned a new SubspaceID $x_1\cdots x_k\text{-}i$ $(1 \le i \le n)$, and a subscription with identifier SubID is dispatched to the subset $G_{x_1\cdots x_k\text{-}i}$ if $H(SubID) \,\%\, n = i$.

Step two, each subset $G_{x_1\cdots x_k\text{-}i}$ is forwarded to its corresponding top cluster with ClusterID $H(x_1\cdots x_k\text{-}i)$. Since these non-overlapping subsets are scattered to multiple top clusters, events falling into the hot spot can be matched in parallel on multiple brokers, which brings a much lower matching latency. Similar to the process of HSPartition, the leader broker responsible for dividing a hot spot $G_{x_1\cdots x_k}$ disseminates a three-tuple {"Subscription Partition", $x_1\cdots x_k$, $n$} to all the other brokers to support further partitioning operations. As shown in the right bottom corner of Fig. 5, the hot spot $G_{2100}$ is divided into three smaller subsets $G_{2100\text{-}1}$, $G_{2100\text{-}2}$ and $G_{2100\text{-}3}$ by SSPartition. Correspondingly, the maximal number of subscriptions of the subspace decreases from 11 to 4.
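A minimal sketch of the subset assignment of SSPartition (illustrative Java; a plain string hash stands in for $H(\cdot)$, and the "-i" suffix notation is only for readability):

public class SSPartitionSketch {
    // Assign a subscription to one of n subsets of a hot subspace by hashing its SubID.
    static String subsetId(String subspaceId, String subId, int n) {
        int i = Math.floorMod(subId.hashCode(), n);   // stand-in for H(SubID) % n
        return subspaceId + "-" + i;                  // new SubspaceID, e.g. "2100-2"
    }

    public static void main(String[] args) {
        // Subscriptions of hot spot "2100" are scattered over n = 3 subsets.
        System.out.println(subsetId("2100", "subscriber-42", 3));
    }
}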

4.4.3 Adaptive Selection Algorithm

Because of the diverse distributions of subscriptions, neither HSPartition nor SSPartition can replace the other. On one hand, HSPartition is attractive for dividing hot spots whose subscriptions are scattered uniformly over the whole space. However, it is inappropriate for dividing hot spots whose subscriptions all appear at the same exact point. On the other hand, SSPartition can divide any kind of hot spot into multiple subsets, even if all subscriptions fall into the same single point. However, compared with HSPartition, it has to dispatch an event to multiple subspaces, which brings a higher traffic overhead.

To achieve balanced workloads among brokers, we propose an adaptive selection algorithm that selects either HSPartition or SSPartition to alleviate a hot spot. The selection is based on the similarity of the subscriptions in the same hot spot. Specifically, assume $G_{x_1\cdots x_k, y_1\cdots y_k}$ is the subspace with the maximal number of subscriptions after HSPartition, and let $\alpha$ be a threshold value which represents the similarity degree of the subscriptions' spaces in a hot spot. We choose HSPartition as the partitioning algorithm if $|G_{x_1\cdots x_k, y_1\cdots y_k}| / |G_{x_1\cdots x_k}| < \alpha$. Otherwise, we choose SSPartition. By combining both partitioning techniques, this selection algorithm can alleviate hot spots in an adaptive manner.
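A compact sketch of this choice (illustrative Java; the subscription counts and the threshold $\alpha$ are assumed to be known by the leader broker):

public class AdaptiveSelection {
    // Returns true if HSPartition should be used for this hot spot, false for SSPartition.
    static boolean useHSPartition(int maxChildSize, int hotSpotSize, double alpha) {
        return (double) maxChildSize / hotSpotSize < alpha;
    }

    public static void main(String[] args) {
        // A hot spot of 1,200 subscriptions whose largest child subspace would still keep
        // 1,100 of them (ratio 0.92 >= 0.7) is split by SSPartition instead.
        System.out.println(useHSPartition(1100, 1200, 0.7) ? "HSPartition" : "SSPartition");
    }
}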

4.5 Performance Analysis

4.5.1 The Average Searching Size

Since each event is matched in one of its candidate subspaces, the average number of subscriptions over all subspaces, called the average searching size, is critical for reducing the matching latency. We give the formal analysis as follows.

Theorem 1. Suppose the fraction of each dimension's range covered by each predicate of a subscription is $\sigma$. Then the average searching size $N_{prob}$ in HPartition is not more than $\frac{N_{sub}}{t}\sum_{i=1}^{t}\sigma^{k_i}$ when $N_{seg}\to\infty$, where $t$ is the number of groups in the entire space, $k_i$ is the number of

Fig. 5. An example of dividing hot spots. Two hot spots, $G_{1200}$ and $G_{2100}$, are divided by HSPartition and SSPartition, respectively.



dimensions in each group, and $N_{sub}$ is the number of subscriptions.

Proof. For each dimension $A_i, i \in [1, k]$, the range of $A_i$ is $R_i$, and the length of the corresponding predicate $P_i$ is $\sigma\|R_i\|$. Since each dimension is divided into $N_{seg}$ segments, the length of each segment is $\|R_i\|/N_{seg}$. Thus, the expected number of segments that $P_i$ falls into is $\lceil\sigma N_{seg}\rceil$. A space $\Omega_i$ contains $(N_{seg})^{k_i}$ subspaces, so the average searching size in $\Omega_i$ is $N_i = \frac{\lceil\sigma N_{seg}\rceil^{k_i} N_{sub}}{(N_{seg})^{k_i}}$. When $N_{seg}\to\infty$, we have $\frac{\lceil\sigma N_{seg}\rceil}{N_{seg}} = \sigma$, and thus $N_i = \sigma^{k_i} N_{sub}$. According to the probability-based forwarding strategy in Section 4.3, the average searching size is $N_{prob} \le \frac{1}{t}\sum_{i=1}^{t} N_i$. That is, $N_{prob} \le \frac{N_{sub}}{t}\sum_{i=1}^{t}\sigma^{k_i}$. □

According to the result of Theorem 1, the average searching size $N_{prob}$ decreases with the reduction of $\sigma$. That is, a smaller $\sigma$ brings less matching time in HPartition. Fortunately, the subscription distribution in real-world applications is often skewed, and most predicates occupy small ranges, which guarantees a small average searching size. Besides, note that $\frac{N_{sub}}{t}\sum_{i=1}^{t}\sigma^{k_i}$ reaches its minimal value $N_{sub}\,\sigma^{k/t}$ if $k_i = k_j$ for all $i \in [1, t]$ and $j \in [1, t]$. This indicates that the upper bound of $N_{prob}$ can further decrease to $N_{sub}\,\sigma^{k/t}$ if each group has the same number of dimensions.
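For intuition, a small numeric instance of this bound (hypothetical values, not taken from the experiments): with $N_{sub} = 40{,}000$, $\sigma = 0.1$, $k = 8$ dimensions and $t = 2$ groups of $k_i = 4$ dimensions,

\[
N_{prob} \le \frac{N_{sub}}{t}\sum_{i=1}^{t}\sigma^{k_i}
        = \frac{40{,}000}{2}\bigl(0.1^{4} + 0.1^{4}\bigr)
        = 40{,}000 \times 10^{-4} = 4,
\]

whereas grouping each dimension alone ($t = 8$, $k_i = 1$) gives $\frac{40{,}000}{8} \times 8 \times 0.1 = 4{,}000$ candidate subscriptions on average.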

4.5.2 Event Matching Reliability

To achieve reliable event matching, each event should have multiple candidate brokers in the system. In this section, we give a formal analysis of the event matching availability as follows.

Theorem 2. SREM promises event matching availability in the face of concurrent crash failures of up to $d\,N_{top}(1 - e^{-t/N_{top}}) - 1$ brokers, where $d$ is the number of brokers in each top cluster of SkipCloud, $N_{top}$ is the number of top clusters, and $t$ is the number of groups in HPartition.

Proof. Based on HPartition, each event has $t$ candidate subspaces which are diffused over $N_{top}$ top clusters. When $m$ balls are distributed at random over $n$ boxes, the expected number of empty boxes is $n e^{-m/n}$. Similarly, when $t$ candidate subspaces are distributed at random over $N_{top}$ top clusters, the expected number of non-empty top clusters is $N_{top}(1 - e^{-t/N_{top}})$. Since each top cluster contains $d$ brokers which manage the same set of subspaces, the expected number of non-empty brokers for each event is $d\,N_{top}(1 - e^{-t/N_{top}})$. Thus, SREM ensures available event matching in the face of concurrent crash failures of up to $d\,N_{top}(1 - e^{-t/N_{top}}) - 1$ brokers. □

According to Theorem 2, the event matching availability of SREM is affected by $d$ and $t$. That is, both SkipCloud and HPartition provide flexible schemes to ensure reliable event matching.
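As a rough illustration of Theorem 2 (hypothetical values): with $d = 2$ brokers per top cluster, $N_{top} = 64$ top clusters, and $t = 2$ groups,

\[
d\,N_{top}\bigl(1 - e^{-t/N_{top}}\bigr) - 1
  = 2 \times 64 \times \bigl(1 - e^{-2/64}\bigr) - 1 \approx 2.9,
\]

so each event has about three live candidate brokers in expectation; increasing $t$ or $d$ raises this failure tolerance.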

5 PEER MANAGEMENT

In SREM, there are mainly three roles: clients, brokers, and clusters. Brokers are responsible for managing all of them. Since the joining or leaving of these roles may lead to inefficient and unreliable data dissemination, we discuss in this section the dynamics maintenance mechanisms used by brokers.

5.1 Subscriber Dynamics

To detect the status of subscribers, each subscriber establishes an affinity with a broker (called its home broker), and periodically sends its subscription as a heartbeat message to its home broker. The home broker maintains a timer for each of its buffered subscriptions. If the broker has not received a heartbeat message from a subscriber for more than $T_{out}$ time, the subscriber is supposed to be offline. The home broker then removes this subscription from its buffer and notifies the brokers holding the failed subscription to remove it.

5.2 Broker Dynamics

Broker dynamics may lead to new clusters joining or old clusters leaving. In this section, we mainly consider brokers joining or leaving existing clusters, rather than changes of the cluster set itself.

When a new broker is generated by its datacenter management service, it first sends a "Broker Join" message to the leader broker in its top cluster. The leader broker returns its top cluster identifier, its neighbor lists at all levels of SkipCloud, and all subspaces including the corresponding subscriptions. The new broker generates its own identifier by appending a $b$-ary digit to its top cluster identifier and takes the received items of each level as its initial neighbors.

There is no particular mechanism to handle broker departure from a cluster. In the top cluster, the leader broker can easily monitor the status of the other brokers. For the clusters of the remaining levels, the sampling service guarantees that the older items of each neighbor list are preferentially replaced by fresh ones during the view shuffling operation, which removes failed brokers from the system quickly. From the perspective of event matching, all brokers in the same top cluster hold the same subspaces of subscriptions, which means that broker failures will not interrupt the event matching operation as long as at least one broker is alive in each cluster.

5.3 Cluster Dynamics

Broker dynamics may also lead to new clusters joining or old clusters leaving. Since each subspace is managed by the top cluster whose identifier is closest to that of the subspace, it is necessary to adaptively migrate a number of subspaces from old clusters to newly joining clusters. Specifically, the leader broker of a new cluster delivers its top ClusterID in a "Cluster Join" message to the other clusters. The leader brokers of all other clusters find the subspaces whose identifiers are closer to the new ClusterID than to their own cluster identifiers, and migrate them to the new cluster.

Since each subspace is stored in only one cluster, a cluster departure incurs subscription loss. The peer sampling service of SkipCloud can be used to detect failed clusters. To recover lost subscriptions, a simple method is to redirect the lost subscriptions through their owners' heartbeat messages. Due to the unreliable links between subscribers and brokers, this approach may lead to a long repair latency. To this end, we



store all subscriptions on a number of well-known servers in the datacenters. When these servers learn of failed clusters, they dispatch the subscriptions of these failed clusters to the corresponding live clusters.

Besides, the number of levels $m$ in SREM is adjusted adaptively with the change of the broker size $N_b$ to ensure $m = \lceil\log_b(N_b)\rceil$, where $b$ is the number of children clusters of each non-top cluster. Each leader broker runs a gossip-based aggregation service [23] in the global cluster to estimate the total broker size. When the estimated broker size $N_b^e$ changes enough to alter $m$, each leader broker notifies the other brokers to update their neighbor lists through a gossip-based multicast with a probability of $1/N_b^e$. If a number of leader brokers disseminate "Update Neighbor Lists" messages simultaneously, an earlier message is stopped when it collides with a later one, which ensures all leader brokers have the same view of $m$. When $m$ decreases, each broker removes its neighbor list of the top level directly. When $m$ increases, a new top level is initialized by filling in appropriate neighbors from the neighbor lists at the other levels. Due to the logarithmic relation between $m$ and $N_b$, only a significant change of $N_b$ can change $m$, so the level dynamics brings quite a low traffic cost.
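For example (hypothetical broker counts), with $b = 2$ children clusters per non-top cluster:

\[
N_b = 64 \Rightarrow m = \lceil\log_2 64\rceil = 6, \qquad
N_b = 80 \Rightarrow m = \lceil\log_2 80\rceil = 7,
\]

so only a substantial change of the broker population adds or removes a level.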

6 EXPERIMENT

6.1 Implementation

To take advantage of the reliable links and high bandwidth among servers in the cloud computing environment, we choose the CloudStack [24] testbed to design and implement our prototype. To make the prototype modular and portable, we use ICE [25] as the fundamental communication platform. ICE provides a communication solution that is easy to program with and allows developers to focus only on their application logic. On top of ICE, we add about 11,000 lines of Java code.

To evaluate the performance of SkipCloud, we implement both SkipCloud and Chord to forward subscriptions and messages. To evaluate the performance of HPartition, the prototype supports different space partitioning policies. Moreover, the prototype provides three different message forwarding strategies, i.e., least subscription amount forwarding, random forwarding, and the probability-based forwarding mentioned in Section 4.3.

6.2 Parameters and Metrics

We use a group of virtual machines (VMs) in the CloudStack testbed to evaluate the performance of SREM. Each VM runs on an exclusive physical machine, and we use 64 VMs as brokers. Each VM is equipped with four processor cores and 8 GB memory, and is connected to Gigabit Ethernet switches.

In SkipCloud, the number of brokers in each top cluster $d$ is set to 2. To ensure reliable connectivity of clusters, the average degree of brokers in each cluster $D_c$ is 6. In HPartition, the entire subscription space consists of eight dimensions, each of which has a range from 0 to 500. The range of each dimension is cut into 10 segments by default. For each hot spot, the range of each of its dimensions is cut into two segments iteratively.

In most experiments, 40,000 subscriptions are generated and dispatched to their corresponding brokers. The ranges of the predicates of subscriptions follow a normal distribution with a standard deviation of 50, represented by $\sigma_{sub}$. Besides, two million events are generated by all brokers. For the events, the value of each pair follows a uniform distribution along the entire range of the corresponding dimension. A subspace is labeled as a hot spot if its subscription amount is over 1,000. Besides, the size of the basic subscription subset $N_0$ in Section 4.4.2 is set to 500, and the threshold value $\alpha$ in Section 4.4.3 is set to 0.7. The default parameters are shown in Table 3.

Recall that HPartition divides all dimensions into $t$ individual spaces $\Omega_i$, each of which contains $k_i$ dimensions. We set $k_i$ to 1, 2, and 4, respectively. These three partitioning policies are called HPartition-1, HPartition-2, and HPartition-4, respectively. Note that HPartition-1 represents the single-dimensional partitioning (mentioned in Section 4.1), which is adopted by BlueDove [15]. The details of the implemented methods are shown in Table 4, where SREM-$k_i$ and Chord-$k_i$ represent HPartition-$k_i$ over SkipCloud and Chord, respectively. In the following experiments, we evaluate the performance of SkipCloud by comparing SREM-$k_i$ with Chord-$k_i$, and evaluate the performance of HPartition by comparing HPartition-4 and HPartition-2 with the single-dimensional partitioning of BlueDove, i.e., HPartition-1. Besides, we do not use APartition (mentioned in Section 4.1) in the experiments, mainly because its fine-grained partitioning technique leads to an extremely high memory cost.

We evaluate the performance of SREM through a number of metrics.

- Subscription searching size: the number of subscriptions that need to be searched on each broker for matching a message.

- Matching rate: the number of matched events per second. Suppose the first event is matched at moment $T_1$ and the last one at moment $T_2$; the matching rate is then $N_e / (T_2 - T_1)$.

- Event loss rate: the percentage of lost events in a specified time period.

TABLE 4
List of Implemented Methods

Name      Overlay     Partitioning Technique
SREM-4    SkipCloud   HPartition-4
SREM-2    SkipCloud   HPartition-2
SREM-1    SkipCloud   HPartition-1 (SPartition, BlueDove)
Chord-4   Chord       HPartition-4
Chord-2   Chord       HPartition-2
Chord-1   Chord       HPartition-1 (SPartition, BlueDove)

TABLE 3
Default Parameters in SREM

Parameter   Nb     d     Dc     k      ki         Nseg
Value       64     2     6      8      {1, 2, 4}  10

Parameter   N'seg  Nsub  σsub   Nhot   N0         α
Value       2      40K   50     1,000  500        0.7



6.3 Space Partitioning Policy

The space partitioning policy determines the number of subscriptions searched for matching a message. It further affects the matching rate and the workload allocation among brokers. In this section, we test three different space partitioning policies: HPartition-1, HPartition-2, and HPartition-4.

Fig. 6a shows the cumulative distribution function (CDF) of the subscription searching sizes on all brokers. Most subscription searching sizes of HPartition-1 are smaller than the threshold value $N_{hot}$. In Fig. 6a, the percentage of "cold" subspaces in HPartition-4, HPartition-2, and HPartition-1 is 99.7, 95.1 and 83.9 percent, respectively. As HPartition-4 splits the entire space into more fine-grained subspaces, subscriptions fall into the same subspace only if they are interested in the same range of four dimensions. In contrast, since HPartition-1 and HPartition-2 split the entire space into less fine-grained subspaces, subscriptions are dispatched to the same subspace with a higher probability. Besides, the CDF of each approach shows a sharp increase at the subscription searching size of 500, which is caused by SSPartition (Section 4.4.2).

Fig. 6b shows the average subscription searching size of each broker. The maximal average subscription searching sizes of HPartition-4, HPartition-2, and HPartition-1 are 437, 666 and 980, respectively. The corresponding normalized standard deviations (standard deviation divided by the average) of the average subscription searching sizes are 0.026, 0.057 and 0.093, respectively. This indicates that brokers under HPartition-4 have a smaller subscription searching size and better workload balance, which mainly lies in its more fine-grained space partitioning.

In conclusion, as the partitioning granularity increases, HPartition shows better workload balance among brokers at the cost of a higher memory overhead.

6.4 Forwarding Policy and Workload Balance

6.4.1 Impact of Message Forwarding Policy

The forwarding policy determines which broker each message is forwarded to, which significantly affects the workload balance among brokers. Fig. 7 shows the matching rates of three policies: probability-based forwarding, least subscription amount forwarding, and random forwarding.

As shown in Fig. 7, the probability-based forwarding policy has the highest matching rate in all scenarios. This mainly lies in its better trade-off between workload balance and searching latency. For instance, when we use HPartition-4 over SkipCloud, the matching rate of the probability-based policy is 1.27 times and 1.51 times that of the random policy and the least subscription amount forwarding policy, respectively. When we use HPartition-4 over Chord, the corresponding gains of the probability-based policy become 1.26 times and 1.57 times, respectively.

6.4.2 Workload Balance of Brokers

We compare the workload balance of brokers under different partitioning strategies and overlays. In this experiment, we use the probability-based forwarding policy to dispatch events. One million events are forwarded and matched against 40,000 subscriptions. We use the index $\beta(N_b)$ [26] to evaluate the load balance among brokers, where $\beta(N_b) = \left(\sum_{i=1}^{N_b} Load_i\right)^2 / \left(N_b \sum_{i=1}^{N_b} Load_i^2\right)$ and $Load_i$ is the workload of broker $i$. The value of $\beta(N_b)$ lies between 0 and 1, and a higher value means better workload balance among brokers.
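A minimal sketch of computing this index, which has the form of Jain's fairness index (illustrative Java, not part of the SREM prototype):

public class LoadBalanceIndex {
    // beta(Nb) = (sum of Load_i)^2 / (Nb * sum of Load_i^2); 1.0 means perfectly balanced load.
    static double beta(double[] loads) {
        double sum = 0, sumSq = 0;
        for (double l : loads) { sum += l; sumSq += l * l; }
        return (sum * sum) / (loads.length * sumSq);
    }

    public static void main(String[] args) {
        // Nearly equal per-broker loads give a value close to 1.
        System.out.println(beta(new double[]{100, 105, 95, 102}));
    }
}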

Fig. 9a shows the distributions of the number of forwarded events among brokers. The number of forwarded events in SREM-$k_i$ is less than that in Chord-$k_i$, because SkipCloud needs fewer routing hops than Chord. The corresponding values of $\beta(N_b)$ for SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 are 0.99, 0.96, 0.98, 0.99, 0.99 and 0.98, respectively. These values indicate quite balanced workloads of forwarding events among brokers. Besides, the number of forwarded events in SREM is at least 60 percent smaller than in Chord. Fig. 9b shows the distributions of matching rates among brokers. The corresponding values of $\beta(N_b)$ are 0.98, 0.99, 0.99, 0.99, 0.97 and 0.89, respectively. The balanced matching rates among brokers are mainly due to the fine-grained partitioning of HPartition and the probability-based forwarding policy.

6.5 Scalability

In this section, we evaluate the scalability of all approachesthrough measuring the changing of matching rate with dif-ferent values ofNb,Nsub, k andNseg. In each experiment, onlyone parameter of Table 3 is changed to validate its impact.

We first change the number of brokers Nb. As shown inFig. 10a, the matching rate of each approach increases line-arly with the growth of Nb. As Nb increases from 4 to 64, thegain of the matching rate of SREM-4, SREM-2, SREM-1,Chord-4, Chord-2 and Chord-1 is 12:8�, 11:2�, 9:6�, 10:0�,8:4� and 5:8�, respectively. Compared with Chord-ki, thehigher increasing rate of SREM-ki is mainly caused by thesmaller routing hops of SkipCloud. Besides, SREM-4presents the highest matching rate with various values ofNb due to the fine-grained partitioning technique of HParti-tion and the lower forwarding hops of SkipCloud.

Fig. 6. Distribution of subscription searching sizes.
Fig. 7. Forwarding policy.

We change the number of subscriptions Nsub in Fig. 10b. As Nsub increases from 40K to 80K, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 61.2, 58.7, 39.3, 51.5, 51.0 and 38.6 percent, respectively. Compared with SREM-1, each subscription in SREM-4 may fall into more subspaces due to its fine-grained partitioning, which leads to a higher increasing rate of the average subscription searching size in SREM-4. That is why the decreasing percentages of the matching rates in SREM-ki and Chord-ki increase with the growth of ki. In spite of the higher decreasing percentage, the matching rates of SREM-4 and Chord-4 are 27.4× and 33.7× that of SREM-1 and Chord-1, respectively, when Nsub equals 80,000.

We increase the number of dimensions k from 8 to 20 in Fig. 10c. The matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 83.4, 31.3, 25.7, 81.2, 44.0 and 15.0 percent, respectively. Similar to the phenomenon in Fig. 10b, the decreasing percentages of the matching rates in SREM-ki and Chord-ki increase with the growth of ki. This is because the fine-grained partitioning of HPartition-4 leads to faster growth of the average subscription searching size. When k equals 20, the matching rates of SREM-4 and Chord-4 are 8.7× and 9.4× that of SREM-1 and Chord-1, respectively.

We evaluate the impact of different numbers of segments Nseg in Fig. 10d. As Nseg increases from 4 to 10, the corresponding matching rates of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 increase by 82.7, 179.1, 44.7, 98.3, 286.1 and 48.4 percent, respectively. These numbers indicate that a bigger Nseg brings more fine-grained partitioning and a smaller average subscription searching size.

In conclusion, the matching rate of SREM increases linearly with the growing Nb, and SREM-4 presents the highest matching rate in various scenarios due to its more fine-grained partitioning technique and the fewer routing hops of SkipCloud.

6.6 Reliability

In this section, we evaluate the reliability of SREM-4 by testing its ability to recover from server failures. During the experiment, 64 brokers in total generate 120 million messages and dispatch them to their corresponding subspaces. After 10 seconds, a subset of the brokers is shut down simultaneously.

Fig. 11a shows how the event loss rate of SREM changes when four, eight, 16 and 32 brokers fail. From the moment the brokers fail, the corresponding event loss rates increase to 3, 8, 18 and 37 percent respectively within 10 seconds, and drop back to 0 within 20 seconds. This recovery ability mainly lies in the quick failure detection of the peer sampling service mentioned in Section 3.1 and the quick reassignment of lost subscriptions by the well-known servers mentioned in Section 5.3. Note that the maximal event loss rate in each case is less than the percentage of lost brokers. This is because each top cluster of SkipCloud initially has two brokers, and an event will not be dropped if it is dispatched to an alive broker of its top cluster.

Fig. 11b shows how the matching rate changes when a number of brokers fail. When brokers fail at the 10-second mark, there is an apparent drop of the matching rate in the following tens of seconds. Note that the dropping interval increases with the number of failed brokers, because the failure of more brokers leads to a higher latency of detecting these brokers and recovering the lost subscriptions. After the failed brokers are handled, the matching rate in each case rises to a higher level. This indicates that SREM can function normally even if a large number of brokers fail simultaneously.

In conclusion, the peer sampling service of SkipCloud ensures continuous matching service even if a large number of brokers fail simultaneously. Through buffering subscriptions in the well-known servers, lost subscriptions can be dispatched to the corresponding alive brokers within tens of seconds.

Fig. 9. Workload balance.
Fig. 10. Scalability.

Fig. 8. The impact of workload characteristics.



6.7 Workload Characteristics

In this section, we evaluate how workload characteristics affect the matching rate of each approach.

First, we evaluate the influence of different subscription distributions by changing the standard deviation σ_sub from 50 to 200. Fig. 8a shows that the matching rate of each approach decreases with the increase of σ_sub. As σ_sub increases, the predicates of each subscription occupy larger ranges of the corresponding dimensions, so each subscription is dispatched to more subspaces.

We then evaluate the performance of each approach under different event distributions. In this experiment, the standard deviation of the event distribution σ_e changes from 50 to 200. Note that skewed events lead to two scenarios. In one, the events are skewed in the same way as the subscriptions, so the hot events coincide with the hot spots, which severely hurts the matching rate. In the other, the events are skewed in the opposite way to the subscriptions, which benefits the matching rate. Fig. 8b shows how the adverse skewed events affect the performance of each approach. As σ_e decreases from 200 to 50, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 79.4, 63.2, 47.0, 77.2, 59.1 and 52.6 percent, respectively. Although the matching rate of SREM-4 decreases greatly as σ_e decreases, it is still higher than that of the other approaches under different values of σ_e.

We evaluate the performance of each approach under different combinations of dimensions (called subscription patterns) in Fig. 8c. Until now, all the above experiments generated subscriptions from a single subscription pattern, where each predicate of a dimension follows the normal distribution. Here we uniformly select a group of patterns to generate subscriptions; a small generation sketch is given below. For the predicates that a subscription does not contain, their ranges are the whole ranges of the corresponding dimensions. Fig. 8c shows that the matching rate decreases with the number of subscription patterns. As the number of subscription patterns grows from 1 to 16, the matching rate of SREM-4, SREM-2, SREM-1, Chord-4, Chord-2 and Chord-1 decreases by 86.7, 63.8, 2.3, 90.3, 65.9 and 10.3 percent, respectively. The sharp decline of the matching rates of SREM-4 and Chord-4 is mainly caused by the quick increase of the average searching size. However, the matching rates of SREM-4 and Chord-4 are still much higher than those of the other approaches.
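The sketch below illustrates the pattern idea: dimensions contained in a pattern get a bounded predicate around a normally distributed center, while the remaining dimensions span the whole range. The sampling scheme, domain bounds, and parameter names are illustrative assumptions, not the exact workload generator used in the experiments.

import java.util.Random;

// Rough sketch of generating one subscription from a "pattern" (a subset of dimensions).
final class SubscriptionGenerator {
    static final double MIN = 0, MAX = 500;   // assumed attribute domain, for illustration only
    private final Random rng = new Random();

    double[][] generate(boolean[] pattern, double sigmaSub, double width) {
        double[][] predicates = new double[pattern.length][2];
        for (int d = 0; d < pattern.length; d++) {
            if (pattern[d]) {
                // Predicate center drawn from a normal distribution, clamped to the domain.
                double center = Math.max(MIN, Math.min(MAX, MAX / 2 + rng.nextGaussian() * sigmaSub));
                predicates[d][0] = Math.max(MIN, center - width / 2);
                predicates[d][1] = Math.min(MAX, center + width / 2);
            } else {
                // Dimensions not contained in the subscription span the whole range.
                predicates[d][0] = MIN;
                predicates[d][1] = MAX;
            }
        }
        return predicates;
    }
}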

In conclusion, the matching throughput of each approach decreases greatly as the skewness of subscriptions or events increases. Compared with the other approaches, the fine-grained partitioning of SREM-4 ensures higher matching throughput under various parameter settings.

6.8 Memory Overhead

Storing subscriptions is the main memory overhead of each broker. Since a subscription may fall into multiple subspaces that are managed by the same broker, each broker only needs to store one real copy of the subscription together with a group of its identifications, which reduces the memory overhead. We use the average number of subscription identifications that each broker stores, denoted by Nsid, as a criterion. Recall that a predicate specifies a continuous range of a dimension. We represent a continuous range by two numerals, each of which occupies 8 bytes, and suppose that each subscription identification uses 8 bytes. Thus, the maximal average memory overhead of managing subscriptions on each broker is 16·k·Nsub + 8·Nsid bytes, where k is the number of dimensions and Nsub is the total number of subscriptions. As mentioned in Section 4.1, HPartition-ki brings large memory cost and partitioning latency as ki increases. When we test the performance of HPartition-8, the Java heap space runs out of memory.
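To put this bound in perspective, the small helper below evaluates 16·k·Nsub + 8·Nsid for a given parameter setting; the particular values of k, Nsub and especially Nsid in the example are illustrative placeholders, not measurements from the testbed.

// Evaluates the per-broker memory bound 16*k*Nsub + 8*Nsid (in bytes) discussed above.
final class SubscriptionMemoryBound {
    static double boundInMegabytes(int k, long nSub, long nSid) {
        double bytes = 16.0 * k * nSub + 8.0 * nSid;
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Hypothetical setting: k = 8 dimensions, 100K subscriptions,
        // and roughly one million stored identifications per broker.
        System.out.printf("%.1f MB%n", boundInMegabytes(8, 100_000, 1_000_000));
    }
}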

We first evaluate the average memory cost of brokers as Nsub changes in Fig. 12a. The value of Nsid of each partitioning technique increases linearly with the growth of Nsub. As Nsub increases from 40K to 100K, the value of Nsid in HPartition-4, HPartition-2 and HPartition-1 grows by 143, 151 and 150 percent, respectively. HPartition-4 presents a higher Nsid than the other approaches, because the number of subspaces that a subscription falls into increases as each subspace shrinks. In spite of the high Nsid in HPartition-4, its maximal average overhead on each broker is only 20.2 MB when Nsub reaches 100K.

We then evaluate how the memory overhead of brokers changes with different values of σ_sub. As σ_sub increases, each predicate of a subscription spans a wider range, which leads to much more memory overhead. As σ_sub increases from 50 to 200, the value of Nsid in HPartition-4, HPartition-2 and HPartition-1 grows by 86, 60 and 27 percent, respectively. Unsurprisingly, Nsid in HPartition-4 still has the largest value. When σ_sub equals 200, the maximal average memory overhead of HPartition-4 is 6.2 MB, which is a small memory overhead for each broker.

In conclusion, the memory overhead of HPartition-ki increases slowly with the growth of Nsub and σ_sub, as long as ki is no more than 4.

7 RELATED WORK

A large body of work on broker-based pub/subs has been proposed in recent years. One method is to organize brokers into a tree overlay, such that events can be delivered to all relevant brokers without duplicate transmissions. Besides, data replication schemes [27] are employed to ensure reliable event matching. For instance, Siena [3] advertises subscriptions to the whole network. When receiving an event, each broker decides which broker to forward the event to according to its routing table. Atmosphere [4] dynamically identifies entourages of publishers and subscribers to transmit events with low latency. It is appropriate for scenarios with a small number of subscribers; as the number of subscribers increases, the overlays constructed in Atmosphere probably exhibit latency similar to Siena's. To ensure reliable routing, Kazemzadeh and Jacobsen [5] propose a d-fault-tolerance algorithm to handle concurrent crash failures of up to d brokers. Brokers are required to maintain a partial view of this tree that includes all brokers within distance d+1. The multi-hop routing techniques in these tree-based pub/subs lead to a high routing latency. Besides, skewed subscriptions and events lead to unbalanced workloads among brokers, which may severely reduce the matching throughput. In contrast, SREM uses SkipCloud to reduce the routing latency and HPartition to balance the workloads of brokers.

Fig. 11. Reliability.
Fig. 12. Memory overhead.

Another method is to divide brokers into multiple clusters through unstructured overlays, where the brokers in each cluster are connected through reliable topologies. For instance, brokers in Kyra [7] are grouped into cliques based on their network proximity. Each clique divides the whole content space into non-overlapping zones based on the number of its brokers. After that, the brokers in different cliques that are responsible for similar zones are connected by a multicast tree, and events are forwarded through the corresponding multicast tree. Sub-2-Sub [8] implements epidemic-based clustering to partition all subscriptions into disjoint subspaces. The nodes in each subspace are organized into a bidirectional ring. Due to the long delay of routing events in unstructured overlays, most of these approaches are inadequate for scalable event matching. In contrast, SREM uses SkipCloud to organize brokers, which relies on prefix routing to achieve low routing latency.

To reduce the routing hops, a number of methods organize brokers through structured overlays, which commonly need Θ(log N) hops to locate a broker. Subscriptions and events falling into the same subspace are sent to and matched on a rendezvous broker. For instance, PastryString [9] constructs a distributed index tree for each dimension to support both numerical and string dimensions. The resource discovery service proposed by Ranjan et al. [10] maps events and subscriptions into d-dimensional indexes, and hashes these indexes onto a DHT network. To ensure a reliable pub/sub service, each broker in Meghdoot [11] has a backup which is used when the primary broker fails. Compared with these DHT-based approaches, SREM ensures smaller forwarding latency through the prefix routing of SkipCloud, and higher event matching reliability through the multiple brokers in each top cluster of SkipCloud and the multiple candidate groups of HPartition.

Recently, a number of cloud providers have offered a series of pub/sub services. For instance, Move [14] provides highly available key-value storage and matching based on one-hop lookup [17]. BlueDove [15] adopts a single-dimensional partitioning technique to divide the entire space and a performance-aware forwarding scheme to select a candidate matcher for each event. Its scalability is limited by the coarse-grained clustering technique. SEMAS [16] proposes a fine-grained partitioning technique to achieve a high matching rate. However, this partitioning technique only provides one candidate for each event and may lead to a large memory cost as the number of data dimensions increases. In contrast, HPartition makes a better trade-off between matching throughput and reliability through a flexible manner of constructing the logical space.

8 CONCLUSIONS AND FUTURE WORK

This paper introduces SREM, a scalable and reliable event matching service for content-based pub/sub systems in cloud computing environments. SREM connects the brokers through a distributed overlay SkipCloud, which ensures reliable connectivity among brokers through its multi-level clusters and brings a low routing latency through a prefix routing algorithm. Through a hybrid multi-dimensional space partitioning technique, SREM reaches scalable and balanced clustering of high-dimensional skewed subscriptions, and each event is allowed to be matched on any of its candidate servers. Extensive experiments with real deployment based on a CloudStack testbed are conducted, producing results which demonstrate that SREM is effective and practical, and also presents good workload balance, scalability and reliability under various parameter settings.

Although our proposed event matching service can efficiently filter out irrelevant users from big data volumes, there are still a number of problems we need to solve. Firstly, this paper does not provide elastic resource provisioning strategies to obtain a good price-performance ratio. We plan to design and implement elastic strategies that adjust the scale of servers based on the churning workloads. Secondly, it does not guarantee that the brokers disseminate large live content with various data sizes to the corresponding subscribers in a real-time manner. For the dissemination of bulk content, the upload capacity becomes the main bottleneck. Based on our proposed event matching service, we will consider utilizing a cloud-assisted technique to realize a general and scalable data dissemination service over live content with various data sizes.

ACKNOWLEDGMENTS

This work was supported by the National Grand Fundamental Research 973 Program of China (Grant No. 2011CB302601), the National Natural Science Foundation of China (Grant No. 61379052), the National High Technology Research and Development 863 Program of China (Grant No. 2013AA01A213), the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (Grant No. 14JJ1026), and the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20124307110015). Yijie Wang is the corresponding author.

REFERENCES

[1] Data per minute. (2014). [Online]. Available: http://www.domo.com/blog/2012/06/how-much-data-is-created-every-minute/

[2] F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida, “Characterizing user behavior in online social networks,” in Proc. 9th ACM SIGCOMM Conf. Internet Meas. Conf., 2009, pp. 49–62.


[3] A. Carzaniga, “Architectures for an event notification service scalable to wide-area networks,” Ph.D. dissertation, Ingegneria Informatica e Automatica, Politecnico di Milano, Milan, Italy, 1998.

[4] P. Eugster and J. Stephen, “Universal cross-cloud communication,” IEEE Trans. Cloud Comput., vol. 2, no. 2, pp. 103–116, 2014.

[5] R. S. Kazemzadeh and H.-A. Jacobsen, “Reliable and highly available distributed publish/subscribe service,” in Proc. 28th IEEE Int. Symp. Reliable Distrib. Syst., 2009, pp. 41–50.

[6] Y. Zhao and J. Wu, “Building a reliable and high-performance content-based publish/subscribe system,” J. Parallel Distrib. Comput., vol. 73, no. 4, pp. 371–382, 2013.

[7] F. Cao and J. P. Singh, “Efficient event routing in content-based publish/subscribe service network,” in Proc. IEEE INFOCOM, 2004, pp. 929–940.

[8] S. Voulgaris, E. Riviere, A. Kermarrec, and M. Van Steen, “Sub-2-Sub: Self-organizing content-based publish and subscribe for dynamic and large scale collaborative networks,” Res. Rep. RR5772, INRIA, Rennes, France, 2005.

[9] I. Aekaterinidis and P. Triantafillou, “PastryStrings: A comprehensive content-based publish/subscribe DHT network,” in Proc. IEEE Int. Conf. Distrib. Comput. Syst., 2006, pp. 23–32.

[10] R. Ranjan, L. Chan, A. Harwood, S. Karunasekera, and R. Buyya, “Decentralised resource discovery service for large scale federated grids,” in Proc. IEEE Int. Conf. e-Sci. Grid Comput., 2007, pp. 379–387.

[11] A. Gupta, O. D. Sahin, D. Agrawal, and A. El Abbadi, “Meghdoot: Content-based publish/subscribe over p2p networks,” in Proc. 5th ACM/IFIP/USENIX Int. Conf. Middleware, 2004, pp. 254–273.

[12] X. Lu, H. Wang, J. Wang, J. Xu, and D. Li, “Internet-based virtual computing environment: Beyond the data center as a computer,” Future Gener. Comput. Syst., vol. 29, pp. 309–322, 2011.

[13] Y. Wang, X. Li, X. Li, and Y. Wang, “A survey of queries over uncertain data,” Knowl. Inf. Syst., vol. 37, no. 3, pp. 485–530, 2013.

[14] W. Rao, L. Chen, P. Hui, and S. Tarkoma, “Move: A large scale keyword-based content filtering and dissemination system,” in Proc. IEEE 32nd Int. Conf. Distrib. Comput. Syst., 2012, pp. 445–454.

[15] M. Li, F. Ye, M. Kim, H. Chen, and H. Lei, “A scalable and elastic publish/subscribe service,” in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2011, pp. 1254–1265.

[16] X. Ma, Y. Wang, Q. Qiu, W. Sun, and X. Pei, “Scalable and elastic event matching for attribute-based publish/subscribe systems,” Future Gener. Comput. Syst., vol. 36, pp. 102–119, 2013.

[17] A. Lakshman and P. Malik, “Cassandra: A decentralized structured storage system,” Oper. Syst. Rev., vol. 44, no. 2, pp. 35–40, 2010.

[18] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, “XORing elephants: Novel erasure codes for big data,” in Proc. 39th Int. Conf. Very Large Data Bases, 2013, pp. 325–336.

[19] S. Voulgaris, D. Gavidia, and M. van Steen, “Cyclon: Inexpensive membership management for unstructured p2p overlays,” J. Netw. Syst. Manage., vol. 13, no. 2, pp. 197–217, 2005.

[20] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz, “Tapestry: A resilient global-scale overlay for service deployment,” IEEE J. Sel. Areas Commun., vol. 22, no. 1, pp. 41–53, Jan. 2004.

[21] Murmurhash. (2014). [Online]. Available: http://burtleburtle.net/bob/hash/doobs.html

[22] A.-M. Kermarrec, L. Massoulié, and A. J. Ganesh, “Probabilistic reliable dissemination in large-scale systems,” IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 3, pp. 248–258, Mar. 2003.

[23] M. Jelasity, A. Montresor, and Ö. Babaoglu, “Gossip-based aggregation in large dynamic networks,” ACM Trans. Comput. Syst., vol. 23, no. 3, pp. 219–252, 2005.

[24] Cloudstack. (2014). [Online]. Available: http://cloudstack.apache.org/

[25] Zeroc. (2014). [Online]. Available: http://www.zeroc.com/

[26] B. He, D. Sun, and D. P. Agrawal, “Diffusion based distributed internet gateway load balancing in a wireless mesh network,” in Proc. IEEE Global Telecommun. Conf., 2009, pp. 1–6.

[27] Y. Wang and S. Li, “Research and performance evaluation of data replication technology in distributed storage systems,” Comput. Math. Appl., vol. 51, no. 11, pp. 1625–1632, 2006.

Xingkong Ma received the BS degree in computer science and technology from the School of Computer of Shandong University, China, in 2007, and the MS degree in computer science and technology from National University of Defense Technology, China, in 2009. He is currently working toward the PhD degree at the School of Computer of National University of Defense Technology. His current research interests lie in the areas of data dissemination and publish/subscribe. He is a student member of the IEEE.

Yijie Wang received the PhD degree in computer science and technology from the National University of Defense Technology, China, in 1998. She is currently a professor at the Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology. Her research interests include network computing, massive data processing, and parallel and distributed processing. She is a member of the IEEE.

Xiaoqiang Pei received the BS and MS degrees in computer science and technology from National University of Defense Technology, China, in 2009 and 2011, respectively. He is currently working toward the PhD degree at the School of Computer of National University of Defense Technology. His current research interests lie in the areas of fault-tolerant storage and cloud computing.

" For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.
