Independent Counter Estimation Buckets



Page 1: Independent Counter Estimation Buckets

Gil Einziger, Computer Science, [email protected]
Benny Fellman, Electrical Engineering, [email protected]
Yaron Kassner, Computer Science, [email protected]
Technion - Israel Institute of Technology

Abstract—Measurement capabilities are essential for a variety of network applications, such as load balancing, routing, fairness and intrusion detection. These capabilities require large counter arrays in order to monitor the traffic of all network flows. While commodity SRAM memories are capable of operating at line speed, they are too small to accommodate large counter arrays. Previous works suggested estimators, which trade precision for reduced space. However, in order to accurately estimate the largest counter, these methods compromise the accuracy of the rest of the counters.

In this work we present a closed form representation of the optimal estimation function. We then introduce Independent Counter Estimation Buckets (ICE-Buckets), a novel algorithm that improves estimation accuracy for all counters. This is achieved by separating the flows into buckets and configuring the optimal estimation function according to each bucket's counter scale. We prove an improved upper bound on the relative error and demonstrate an accuracy improvement of up to 57 times on real Internet packet traces.

I. INTRODUCTION

A. Background

Counter arrays are essential in network measurements and accounting. Typically, measurement applications track several million flows [1], [2], and their counters are updated with the arrival of every packet. These capabilities are an important enabling factor for networking algorithms in many fields such as load balancing, routing, fairness, network caching and intrusion detection [3]–[7]. Counter arrays are also used in popular approximate counting sketches such as multi stage filters [8] and count min sketch [9], as well as in network monitoring architectures [10]–[12]. Such architectures are used to collect and analyze statistics from many networking devices [13].

Implementation of counter arrays is particularly challenging due to the requirement to operate at line speed. Although commodity SRAM memories are fast enough for this task, they do not meet the space requirements of modern counter arrays. Implementing a counter array entirely in SRAM is therefore very expensive [14].

Counter estimation algorithms use shorter counters, e.g. 12 bits instead of 32 bits, at the cost of a small error. Upon packet arrival, a counter is only incremented with a certain probability that depends on its current value. In order to keep the relative error uniform, small values are incremented with high probability and large ones with low probability. An estimation function is used in order to determine these probabilities and estimate the true value of a counter. Estimation functions can

(a) Twelve flows are estimated with a classic counter estimation array. The error for all counters is affected by the largest flow in the array (10,000).

(b) The same twelve flows are estimated with ICE-Buckets. In this example the flows are separated into four buckets. The error for each counter is affected only by the largest flow in its bucket.

Fig. 1: High level overview of ICE-Buckets vs. previous counter estimation approaches.

be scaled to achieve higher counting capacity at the cost of a larger estimation error.

B. Contributions

In this work we present Independent Counter Estimation Buckets (ICE-Buckets), a novel counter estimation technique that reduces the overall error by efficiently utilizing multiple counter scales.

The basic idea of ICE-Buckets is illustrated in Figure 1. In this example, the largest counter (10,000) can only be estimated with a large scale and a relative error of 10%. In the traditional approach, this error applies to all counters, as illustrated in Figure 1(a). Figure 1(b) shows what happens when the array is partitioned into independent buckets. The largest counter (10,000) is still estimated with an error of 10%, but in this case the error applies only to counters within the same bucket. The other buckets are able to use smaller scales and enjoy lower relative error. As a result, the overall error is reduced.

ICE-Buckets makes use of the optimal estimation function that was previously known only in recursive form. We present an explicit representation and provide an extended analysis for this function. We also present a rigorous mathematical analysis of ICE-Buckets that includes a very attractive upper bound for the overall relative error. We show that for traffic characteristics of real workloads this upper bound is up to 14 times smaller than that of previous works. Moreover, we show that the maximum relative error of ICE-Buckets is optimal.

Additionally, we extensively evaluate ICE-Buckets with four real Internet packet traces and demonstrate an accuracy improvement of up to 57 times.

Finally, we show that ICE-Buckets can avoid global operations and still maintain similar accuracy. This configuration is more attractive for practical implementations.

C. Related Work

While all counter arrays are required to monitor traffic at line speed, their implementations differ in the availability of monitored data. Offline counter arrays can take as much as several hours to read from, while online counter arrays can be read at line speed. Naturally, offline counter arrays are used for offline tasks such as analyzing the bottlenecks of network performance. On the other hand, online counter arrays can also be used to answer low level questions such as what priority to give a certain flow, how much bandwidth it requires and where to route its packets. In this work we focus on online counter arrays.

Hybrid DRAM/SRAM counter arrays [1], [2] store only the least significant bits of each counter in SRAM and the rest of the counter in (slower) DRAM. In CounterBraids [15], counters are compressed in order to fit SRAM, but the decoding process is slow. Both of these methods are considered offline because they cannot read a counter at line speed.

Brick [16] is an online counter array that encodes variable length counters. Brick can hold more counters, since the average counter is shorter than the largest one. Unfortunately, it has limited counting capacity and becomes less efficient as the average counter value increases.

Alternatively, sampling techniques can be used to only monitor some large flows [12], [17]. However, this kind of solution is insufficient for some applications.

Estimators are able to represent large values with small symbols at the price of a small error. They can therefore be used to implement online counter arrays. This idea was first introduced by Approximate Counting [18] and was recently adapted to networking as Small Active Counters (SAC) [19]. It was later improved by DISCO [20] in order to provide better accuracy and support variable sized increments. Lately, [21] introduced a way to gradually increase the relative error as the counters grow. CEDAR [22] proved that their estimation function is optimal. Although these methods require more space than sampling techniques, they provide accurate estimation for both small and large flows, and compared to Brick they enjoy significantly higher counting capacity at the price of a small relative error.

D. Paper Organization

The optimal estimation function is presented and analyzed in Section II, followed by the presentation and analysis of ICE-Buckets in Section III. Section IV describes simulation results with real Internet packet traces. We conclude our work in Section V.

II. OPTIMAL ESTIMATION FUNCTION

A. Technical Background

Consider the problem of counting up to M packets with a counter of only log2 L < log2 (M + 1) bits. We rely on an estimation function A : {0, ..., L−1} → [0, M], which accepts a symbol l as input and returns an estimation value for that symbol. M is the required counting capacity of the estimation function.

First, the symbol l is initialized to zero. Upon arrival of a packet, we increment l with probability 1/D(l), where D(l) = A(l + 1) − A(l). It is easy to verify that the expected estimation value of l grows by one with each packet. Thus, the counter estimation is unbiased.

For example, for the estimation function A(l) = 2^l, if at a certain point in time a symbol l = 3 is used, its estimation value is A(3) = 8. If another packet arrives at the flow, we increment the symbol with probability 1/D(3) = 1/(A(4) − A(3)) = 1/8. The estimation value is expected to change from 8 to 16 after eight packet arrivals.
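The increment rule above can be made concrete with a small Monte Carlo sketch (our own; the function names are not from the paper) of the A(l) = 2^l example. Note that A(0) = 1 here, and each packet raises the expected estimate by exactly one, so after n packets the mean estimate settles near n + A(0) = n + 1:

```python
import random

# Sketch (ours) of probabilistic counting with A(l) = 2**l.
# A symbol l is incremented with probability 1/D(l), where
# D(l) = A(l+1) - A(l), so each packet raises E[A(l)] by one.

def A(l):
    return 2 ** l

def D(l):
    return A(l + 1) - A(l)

def feed(n_packets, rng):
    """Feed n_packets into one estimator and return its estimate."""
    l = 0
    for _ in range(n_packets):
        if rng.random() < 1.0 / D(l):
            l += 1
    return A(l)

rng = random.Random(42)
n, trials = 100, 20000
mean_est = sum(feed(n, rng) for _ in range(trials)) / trials
print(mean_est)  # close to n + A(0) = 101, since A(0) = 1 here
```

Averaged over many trials, the estimate is unbiased up to the A(0) offset, which is exactly why the paper's functions are constructed with A(0) = 0.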

B. Our Estimation Function

We propose the following estimation function

Aε(l) = ((1 + 2ε²)^l − 1) / (2ε²) · (1 + ε²),   (1)

where ε is a parameter of the algorithm. The benefits of this estimation function are discussed later on.
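Equation (1) transcribes directly into code; this is our own sketch, and the ε and L values below are illustrative rather than taken from the paper:

```python
# A direct transcription (ours) of the estimation function in Eq. (1);
# eps and L below are illustrative values, not taken from the paper.

def A_eps(l, eps):
    """A_eps(l) = ((1 + 2*eps^2)^l - 1) / (2*eps^2) * (1 + eps^2)."""
    e2 = eps * eps
    return ((1 + 2 * e2) ** l - 1) / (2 * e2) * (1 + e2)

def D_eps(l, eps):
    """Increment step: D(l) = A(l+1) - A(l) = 2*eps^2*A(l) + 1 + eps^2."""
    return A_eps(l + 1, eps) - A_eps(l, eps)

eps, L = 0.01, 4096
print(A_eps(0, eps))      # 0.0: the function starts at zero
print(A_eps(L - 1, eps))  # the counting capacity with 12-bit symbols
```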

C. Upscale

Upscale is a way to dynamically adjust the counter scale to the actual workload [21]. It is useful in case the maximum counter is not known in advance. We begin with a small ε that gives a counting capacity of Aε(L − 1) and dynamically increase it when necessary. That is, when a symbol approaches L − 1, we increase ε to ε′ > ε. Then, we update all symbols to maintain unbiased estimation under the new scale.

Define l′ to be the largest integer such that Aε′(l′) ≤ Aε(l). For our estimation function, this value is

l′ = ⌊ log_{1+2ε′²} (1 + 2ε′²Aε(l) / (1 + ε′²)) ⌋   (2)

The correct estimation for symbol l lies between Aε′(l′) and Aε′(l′ + 1). We update to l′ + 1 with probability proportional to the difference between Aε(l) and Aε′(l′),

(Aε(l) − Aε′(l′)) / (Aε′(l′ + 1) − Aε′(l′)),

and to l′ otherwise. Algorithm 1 describes the symbol upscale procedure.


Algorithm 1 Symbol Upscale

1: procedure SYMBOLUPSCALE(l, ε, ε′)
2:   l′ ← ⌊ log_{1+2ε′²} (1 + 2ε′²Aε(l) / (1 + ε′²)) ⌋
3:   r ← rand(0, 1)
4:   if r < (Aε(l) − Aε′(l′)) / (Aε′(l′ + 1) − Aε′(l′)) then
5:     l ← l′ + 1
6:   else
7:     l ← l′
8:   end if
9: end procedure
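A possible Python rendering of Algorithm 1 (the function and parameter names are ours). The randomized rounding in the last step keeps the expected estimation value unchanged, which the snippet also checks empirically:

```python
import math
import random

# A possible Python rendering of Algorithm 1 (names are ours).

def A_eps(l, eps):
    e2 = eps * eps
    return ((1 + 2 * e2) ** l - 1) / (2 * e2) * (1 + e2)

def symbol_upscale(l, eps, eps_new, rng):
    """Re-encode symbol l from scale eps to the coarser scale eps_new."""
    e2 = eps_new * eps_new
    # Eq. (2): the largest l' with A_eps_new(l') <= A_eps(l)
    l_new = math.floor(math.log(1 + 2 * e2 * A_eps(l, eps) / (1 + e2),
                                1 + 2 * e2))
    lo = A_eps(l_new, eps_new)
    hi = A_eps(l_new + 1, eps_new)
    # randomized rounding keeps the expected estimation value unchanged
    if rng.random() < (A_eps(l, eps) - lo) / (hi - lo):
        return l_new + 1
    return l_new

rng = random.Random(7)
eps, eps_new, l = 0.01, 0.02, 1000
target = A_eps(l, eps)
mean = sum(A_eps(symbol_upscale(l, eps, eps_new, rng), eps_new)
           for _ in range(50000)) / 50000
print(target, mean)  # the mean estimate stays close to the old value
```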

D. Analysis

1) Performance Metrics: There are several metrics for the accuracy of an estimation function. Throughout this paper we discuss the quality of estimation mainly in terms of the root mean squared relative error (RMSRE), or relative error in short. Denote by n̂ the random variable representing the estimation value of a flow after n packets have arrived. The mean square relative error (MSRE) of a flow of size n is

MSRE[n] = E[ ((n̂ − n) / n)² ]

and the root mean square relative error is

RMSRE[n] = √( E[ ((n̂ − n) / n)² ] ).

We want the maximum relative error,

εmax = max_{n ≤ M} RMSRE[n],

to be as small as possible.

When counting multiple flows, we can also measure the overall relative error. Let ni be the true value of counter i. The overall relative error is the root mean square relative error over all N counters,

εoverall = √( (1/N) Σ_i MSRE[ni] ).

Another metric for the accuracy of an estimation function is the hitting time. The hitting time is defined to be the random variable T(l) that represents the amount of traffic required for a certain counter to be estimated as A(l). The expected hitting time for symbol l is simply A(l) [22] and the Coefficient of Variation (CV) of the hitting time is

CV[T(l)] = σ[T(l)] / E[T(l)].

In Theorem 1 we will prove that our estimation function is optimal in terms of the maximum CV of the hitting time, max_{l < L} CV[T(l)].

2) Optimality: An estimation function is considered optimal if it minimizes the maximum CV of the hitting time, given M, the desired counting capacity. To show that our estimation function is optimal, we use a theorem, proved in [22], that states that a function that satisfies the following recursive formula is optimal with a maximum CV of δ:

∀l, A(l + 1) − A(l) = (1 + 2δ²A(l)) / (1 − δ²)   (3)

A(0) = 0   (4)

Furthermore, this function is unique.

Theorem 1. Aε(l) = ((1 + 2ε²)^l − 1) / (2ε²) · (1 + ε²) is an optimal estimation function.

Proof. It is clear to see that Aε(0) = 0, thus condition (4) holds. Moreover,

Aε(l + 1) = ((1 + 2ε²)^{l+1} − 1)(1 + ε²) / (2ε²)
          = ((1 + 2ε²)(1 + 2ε²)^l − (1 + 2ε²) + 2ε²)(1 + ε²) / (2ε²)
          = (1 + 2ε²)Aε(l) + (1 + ε²).   (5)

Therefore,

Aε(l + 1) − Aε(l) = 2ε²Aε(l) + (1 + ε²).   (6)

Choose

δ = √( ε² / (1 + ε²) )

to get

ε² = δ² / (1 − δ²).

Putting this into (6) we get

Aε(l + 1) − Aε(l) = 2δ²Aε(l) / (1 − δ²) + 1 + δ² / (1 − δ²) = (1 + 2δ²Aε(l)) / (1 − δ²).

Hence condition (3) holds. Our estimation function satisfies conditions (3) and (4) and is therefore optimal. ∎

Thus, our estimation function is identical to the one given in CEDAR, which was previously known only in recursive form.

This function was previously analyzed only with respect to its hitting time. With the newly discovered closed form representation, we are able to extend the analysis in order to quantify its relative error.
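As a sanity check, the closed form can be verified numerically against CEDAR's recursive conditions (3) and (4); this is our own check, with an arbitrary ε:

```python
import math

# Our numeric sanity check: the closed form A_eps satisfies CEDAR's
# recursive conditions (3)-(4) with delta = sqrt(eps^2 / (1 + eps^2)).

def A_eps(l, eps):
    e2 = eps * eps
    return ((1 + 2 * e2) ** l - 1) / (2 * e2) * (1 + e2)

eps = 0.05
delta2 = eps ** 2 / (1 + eps ** 2)    # delta^2 from Theorem 1
assert A_eps(0, eps) == 0.0           # condition (4)
for l in range(200):
    lhs = A_eps(l + 1, eps) - A_eps(l, eps)
    rhs = (1 + 2 * delta2 * A_eps(l, eps)) / (1 - delta2)  # condition (3)
    assert math.isclose(lhs, rhs, rel_tol=1e-9)
print("closed form satisfies the CEDAR recursion")
```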

3) Relative Error:

Theorem 2. The optimal estimation function (Aε) gives a relative error of

RMSRE[n] = ε, ∀n.

Proof. To prove this theorem, we use a technique similar to the one used in [23]. Let Ql(n) be the probability to have a symbol l given that exactly n packets have arrived at the flow. As mentioned before, the estimator is unbiased, thus

Σ_l Ql(n)Aε(l) = n.   (7)

In order to calculate ε we should first find the variance of the estimation value. We already know its mean, so let us find the second moment. Denote

Sn = E[n̂²] = Σ_l Ql(n)Aε²(l).

We first compute Sn+1 − Sn. Recall that if the symbol is l, when a packet arrives the symbol is incremented with probability 1/Dε(l) = 1/(Aε(l + 1) − Aε(l)) or remains unchanged with probability 1 − 1/Dε(l). Therefore,

Sn+1 − Sn = Σ_l (Aε(l + 1)² − Aε(l)²) · (1/Dε(l)) · Ql(n) = Σ_l (Aε(l) + Aε(l + 1)) Ql(n).

We can use (5) to substitute Aε(l + 1) with (1 + 2ε²)Aε(l) + (1 + ε²) and get

Sn+1 − Sn = Σ_l (1 + ε²)(2Aε(l) + 1) Ql(n).

This can be separated into a constant multiplied by the unbiased mean, and another constant times the sum of a probability vector. We get

Sn+1 − Sn = (1 + ε²) (2 Σ_l Aε(l)Ql(n) + Σ_l Ql(n)) = (1 + ε²)(2n + 1).

Now we can calculate E[n̂²]:

E[n̂²] = Σ_{i=0..n−1} (Si+1 − Si) + S0 = Σ_{i=0..n−1} (1 + ε²)(2i + 1) = (1 + ε²) · n(2n − 1 + 1)/2 = (1 + ε²)n².

Hence, the variance is

V[n̂] = E[n̂²] − E[n̂]² = ε²n²

and the relative error is

RMSRE[n] = √( V[n̂] / n² ) = ε. ∎

Note that the relative error is independent of n, and therefore εmax = ε.
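Theorem 2 can be illustrated with a small simulation (our own sketch; the parameters are arbitrary): the empirical RMSRE should come out close to ε, independent of n.

```python
import random

# Monte Carlo illustration (ours) of Theorem 2: the empirical RMSRE
# of the optimal function should be close to eps, independent of n.

def A_eps(l, eps):
    e2 = eps * eps
    return ((1 + 2 * e2) ** l - 1) / (2 * e2) * (1 + e2)

eps, n, trials = 0.2, 500, 4000
# Precompute increment steps D(l) = 2*eps^2*A(l) + 1 + eps^2 (Eq. 6).
D = [2 * eps * eps * A_eps(l, eps) + 1 + eps * eps for l in range(200)]

def simulate(rng):
    l = 0
    for _ in range(n):
        if rng.random() < 1.0 / D[l]:
            l += 1
    return A_eps(l, eps)

rng = random.Random(1)
msre = sum(((simulate(rng) - n) / n) ** 2 for _ in range(trials)) / trials
rmsre = msre ** 0.5
print(rmsre)  # should land near eps = 0.2
```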

For comparison, DISCO's [20] estimation function,

DISCO(l) = ((1 + 2ε²)^l − 1) / (2ε²),

also achieves a relative error bounded by ε. Thus, DISCO guarantees the same accuracy as the optimal estimation function, but its counting capacity is 1 + ε² times smaller. In most cases this factor is negligible.

4) Upscale Error: We now show how to use linear programming to prove, for given ε and ε′, that no upscale operation from ε to ε′ increases the relative error to more than ε′.

Consider the change in variance when an upscale from ε to ε′ occurs. Let Ql denote the probability to have a symbol l before upscale. Recall that the probability to use l′, as defined in Algorithm 1, is

(Aε′(l′ + 1) − Aε(l)) / Dε′(l′)

and the probability to use l′ + 1 is

(Aε(l) − Aε′(l′)) / Dε′(l′).

The expected estimation value remains unchanged after upscale [22]. Therefore, the change in variance is:

ΔV = Σ_l Ql · (Aε′(l′ + 1) − Aε(l)) / Dε′(l′) · (Aε′²(l′) − Aε²(l))
       + Ql · (Aε(l) − Aε′(l′)) / Dε′(l′) · (Aε′²(l′ + 1) − Aε²(l))
   = Σ_l Ql · (Aε′(l′ + 1) − Aε(l))(Aε(l) − Aε′(l′)) / Dε′(l′) · (−Aε′(l′) − Aε(l) + Aε′(l′ + 1) + Aε(l))
   = Σ_l Ql (Aε′(l′ + 1) − Aε(l))(Aε(l) − Aε′(l′)).

If we find α ≥ 0 and β ≥ 0 such that ∀ 0 ≤ l < L,

(Aε′(l′ + 1) − Aε(l))(Aε(l) − Aε′(l′)) ≤ αAε²(l) + βAε(l)   (8)

we may conclude that the MSRE is

MSRE[n] ≤ (n²ε² + ΔV) / n² ≤ (n²ε² + Σ_l Ql (αAε²(l) + βAε(l))) / n².

Recall that

Σ_l Ql Aε²(l) = E[Aε²(l)] = V[n̂] + (E[n̂])² ≤ n²(1 + ε²).


Notation               Description
S                      number of symbols per bucket.
N                      number of flows.
M                      maximum possible number of packets.
B                      number of buckets (N/S). For simplicity assume that N is a multiple of S.
L                      number of different possible symbols; a power of two.
Fij                    counter array of size B × S. i runs over the buckets and j runs over the flows in each bucket. Each flow is log2 L bits wide.
εstep                  the difference between consecutive estimation errors.
{wi}, i = 0, ..., B−1  wi is a scale parameter for bucket i. Each parameter is log2 E bits wide.
εw                     estimation error for a bucket with scale w (εstep · w).
E                      number of different possible estimation scales; a power of two.

TABLE I: Notations

Therefore, if (8) holds for every l,

MSRE[n] ≤ (n²ε² + αn²(1 + ε²) + βn) / n² = ε² + α(1 + ε²) + β/n ≤ ε² + α(1 + ε²) + β.

Define the following two-variable linear problem:

Minimize ε² + α(1 + ε²) + β
s.t. the constraints (8) hold for every 0 ≤ l < L.

We solved this simple LP for a wide range of parameters and found that the objective is minimized to ε′² in all of these cases. We conjecture that the relative error is always bounded by ε′ after an upscale operation from ε to ε′.
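The LP above has only two variables, so it can be solved by enumerating the vertices of the feasible region. The following sketch (our own construction, with illustrative parameters) builds the constraints (8) for every symbol and minimizes the objective; the paper reports that the minimum equals ε′² across the ranges it tested.

```python
import math

# Our sketch of the two-variable LP of Section II-D: build the
# constraints (8) for each symbol, then minimize the objective by
# enumerating vertices of the feasible region. Parameters are
# illustrative only.

def A(l, eps):
    e2 = eps * eps
    return ((1 + 2 * e2) ** l - 1) / (2 * e2) * (1 + e2)

L, eps, eps_new = 128, 0.05, 0.1
e2 = eps_new * eps_new

lines = []  # each constraint: alpha*x^2 + beta*x >= c, with x = A_eps(l)
for l in range(1, L):
    x = A(l, eps)
    l_new = math.floor(math.log(1 + 2 * e2 * x / (1 + e2), 1 + 2 * e2))
    a, b = A(l_new, eps_new), A(l_new + 1, eps_new)
    lines.append((x, (b - x) * (x - a)))

def feasible(al, be):
    return al >= 0 and be >= 0 and all(
        al * x * x + be * x >= c - 1e-9 for x, c in lines)

# candidate vertices: axis intercepts plus pairwise intersections
cands = [(max(c / (x * x) for x, c in lines), 0.0),
         (0.0, max(c / x for x, c in lines))]
for i in range(len(lines)):
    for j in range(i + 1, len(lines)):
        (x1, c1), (x2, c2) = lines[i], lines[j]
        al = (c1 / x1 - c2 / x2) / (x1 - x2)
        cands.append((al, c1 / x1 - al * x1))

best_al, best_be = min((p for p in cands if feasible(p[0], p[1])),
                       key=lambda p: p[0] * (1 + eps ** 2) + p[1])
best = eps ** 2 + best_al * (1 + eps ** 2) + best_be
print(best, eps_new ** 2)  # minimized MSRE bound vs. eps'^2
```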

III. ICE-BUCKETS

A. Overview

We now describe ICE-Buckets, an algorithm that uses the optimal estimation function with a different scale for each bucket. ICE-Buckets uses a base ε that is called εstep. Symbols are separated into buckets and each bucket maintains a scale parameter wi. To estimate counters in a bucket with scale w, we use the estimation function Aεstep·w. We pay special attention to the additional memory that is allocated for each bucket and make sure that this overhead is small. Figure 2 demonstrates this basic architecture.

B. Algorithm

Before we start, some notations are given in Table I. Initially, all symbols and bucket scales are set to zero.

Since each bucket's scale is different, we first associate each flow-id with a bucket. To estimate the value of flow f, we use Aεwi(Fij), where i = ⌊f/S⌋ and j = f mod S.

The estimation values for bucket i are calculated with an optimal estimation function of scale εwi. An optimal function with ε = 0 is defined as the identity function, i.e. ∀l, A0(l) = l.

Fig. 2: An ICE-Buckets data structure with four flows per bucket. Bucket i has a scale parameter wi, which is used by the estimation function Aεstep·wi to decode the symbols Fi,j. In this example, to estimate counter 2 in bucket 1, Aεstep·w1(F1,2) is computed.

The increment process is straightforward. When a packet arrives, we first find the associated bucket i and the index in the bucket, j. Then, we increment Fij with probability 1/Dεwi(Fij).

C. Analysis

In this section, we analyze ICE-Buckets and bound both its maximum relative error and its overall relative error.

1) Maximum Relative Error: Define the following function,

M(ε) = Aε(L − 1) = ((1 + 2ε²)^{L−1} − 1) / (2ε²) · (1 + ε²),

which denotes the maximum representable value with error ε.

Define ε(M) to be the smallest ε we need for an optimal estimation function with capacity at least M. For M > L − 1, ε(M) is the inverse of M(ε). For M ≤ L − 1, ε(M) is zero. M(ε) is increasing with ε and therefore ε(M) is also increasing.

We now show that ICE-Buckets' relative error is not larger than ε(M).

Theorem 3. The maximum relative error of ICE-Buckets can be bounded by ε(M) for any distribution of the counters.


Proof. Construct an ICE-Buckets structure with

εstep = ε(M) / (E − 1).

Let Mi be the value of the biggest counter in bucket i. If we choose the scale of bucket i to be at least wi = ⌈ε(Mi)/εstep⌉, we obtain an error parameter of no less than

⌈ε(Mi)/εstep⌉ · εstep ≥ ε(Mi),

which gives us a capacity of at least Mi. In other words, we need to round up each bucket's error to the nearest multiple of εstep.

Furthermore, ε(Mi) ≤ ε(M), because the maximum counter in each bucket is smaller than or equal to the total number of packets and ε is increasing with M. Thus the maximum relative error over the entire counter scale is bounded by ε(M). ∎

We conclude that ICE-Buckets has the same maximum relative error as the optimal estimation function.

2) Overall Relative Error: ICE-Buckets also improves the overall relative error. In ICE-Buckets this error is

εoverall = √( (1/N) Σ_f MSRE[nf] )   (9)

In Theorem 4, we show an upper bound for εoverall when ICE-Buckets is configured optimally. In order to prove it, we need to show that ε²(M) is concave for M ≥ L − 1, as can be seen in Figure 3. M(ε) is convex and increasing with ε². The inverse of this function (which is defined on M ≥ L − 1) is therefore concave and increasing. Similarly, ε(M) is also increasing and concave on M ≥ L − 1.

Fig. 3: ε² as a function of M for L = 4096. The dashed line shows that the function is concave for M ≥ L − 1.

Theorem 4. For any counter distribution, ICE-Buckets can be configured to have an overall relative error no greater than

ε(M/B + L − 1) + ε(M)/(E − 1).

Proof. Use the same construction as in the proof of Theorem 3, i.e. εstep = ε(M)/(E − 1) and wi = ⌈ε(Mi)/εstep⌉. With this construction, the overall relative error is no greater than

εoverall = √( (1/N) Σ_f MSRE[nf] ) ≤ √( (1/B) Σ_{i=0..B−1} ⌈ε(Mi)/εstep⌉² εstep² ).

Instead of rounding ε(Mi) up to a multiple of εstep, we can simply add εstep and keep a bound that is no smaller:

εoverall ≤ √( (1/B) Σ_{i=0..B−1} (ε(Mi) + εstep)² ).

By the concavity of ε and ε², the function (ε(x) + εstep)² is concave on x > L − 1:

α(ε(x) + εstep)² + (1 − α)(ε(y) + εstep)²
 = αε²(x) + (1 − α)ε²(y) + 2εstep(αε(x) + (1 − α)ε(y)) + εstep²
 ≤ ε²(αx + (1 − α)y) + 2εstep · ε(αx + (1 − α)y) + εstep²
 = (ε(αx + (1 − α)y) + εstep)².

To use concavity we must make sure that all of the arguments are greater than or equal to L − 1. We do so by adding L − 1 to each Mi. Since ε is an increasing function, the result is no smaller:

εoverall ≤ √( (1/B) Σ_{i=0..B−1} (ε(Mi + L − 1) + εstep)² ).

We can now apply Jensen's inequality:

εoverall ≤ √( (1/B) Σ_{i=0..B−1} (ε(Mi + L − 1) + εstep)² ) ≤ ε( (1/B) Σ_{i=0..B−1} (Mi + L − 1) ) + εstep ≤ ε(M/B + L − 1) + εstep,

where the last step uses Σ_i Mi ≤ M. Setting εstep = ε(M)/(E − 1), we get an overall relative error of no more than

εoverall ≤ ε(M/B + L − 1) + ε(M)/(E − 1). ∎

With the correct choice of parameters, this guaranteed overall relative error is far better than the ε(M) we can achieve with CEDAR.

This bound demonstrates the effect E and S have on the error. We want the bucket size S = N/B to be as small as possible to restrict the 'collateral damage' to nearby counters, as a counter's size only affects the errors of counters sharing the same bucket. As for E, we want it to be as big as possible to increase the error granularity. However, decreasing S or increasing E increases the memory overhead.

Note that while CEDAR guarantees optimal estimation in terms of the maximum CV of the hitting time, there is no such claim in terms of the overall relative error. This enables ICE-Buckets to significantly improve εoverall.
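To get a feel for the Theorem 4 bound, ε(M) can be computed by inverting M(ε) with a bisection in log space. This is our own sketch; the L, M, B and E values below are illustrative, not the paper's experimental configurations.

```python
import math

# Our sketch of the Theorem 4 bound. eps_of(M) inverts the capacity
# function M(eps) = A_eps(L-1) by bisection in log space; the L, M,
# B and E values below are illustrative only.

def capacity_at_least(eps, M, L):
    """True iff A_eps(L-1) >= M, evaluated in log space."""
    e2 = eps * eps
    return (L - 1) * math.log1p(2 * e2) >= math.log1p(2 * e2 * M / (1 + e2))

def eps_of(M, L):
    """Smallest eps whose capacity reaches M (zero if M <= L-1)."""
    if M <= L - 1:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if capacity_at_least(mid, M, L):
            hi = mid
        else:
            lo = mid
    return hi

L, M, B, E = 4096, 2 ** 32 - 1, 100_000, 64
eps_max = eps_of(M, L)                                # single-scale error
bound = eps_of(M / B + L - 1, L) + eps_max / (E - 1)  # Theorem 4
print(eps_max, bound)  # the ICE-Buckets bound is markedly smaller
```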


D. Dynamic Configuration

In Theorem 4 we have shown that there is an ICE-Buckets configuration for which the error is low. We now show how to dynamically configure ICE-Buckets to fit any workload. This process is composed of local and global upscale operations.

1) Local Upscale: The configuration of the data structure, {wi}, is dynamically adjusted to the biggest estimation value in each bucket. Whenever a symbol Fij approaches L, we increment wi and upscale bucket i to use the parameter εwi+1. This is done by upscaling all of the flows in bucket i using the symbol-upscale procedure described in Algorithm 1. A pseudo code of the local upscale procedure can be found in Algorithm 2. We note that since the number of counters per bucket S is typically small, local upscale can be efficiently implemented in hardware.

Algorithm 2 Local Upscale
1: procedure UPSCALEBUCKET(i)
2:   for j = 0 to S − 1 do
3:     SYMBOLUPSCALE(Fij, εwi, εwi+1)
4:   end for
5:   wi ← wi + 1
6: end procedure

2) Global Upscale: When a counter in a bucket with the maximum scale index (E − 1) approaches its maximum value (L − 1), we initiate a global upscale procedure in order to prevent overflow. In this procedure, we double the size of εstep. Buckets with odd scale values perform a local upscale. Then, every bucket i updates its scale index to wi/2. Pseudo code is given in Algorithm 3.

Algorithm 3 Global Upscale
1: procedure GLOBALUPSCALE
2:   for u = 0 to B − 1 do
3:     if wu mod 2 = 1 then
4:       UPSCALEBUCKET(u)
5:     end if
6:     wu ← wu / 2
7:   end for
8:   εstep ← 2 · εstep
9: end procedure

Table II demonstrates this process. In this case we use eight different possible scales (E = 8), and εstep = 0.1% before the global upscale. Buckets with odd scale parameters perform a local upscale, incrementing their scale before it is halved, while buckets with even parameters simply halve their scale. Notice that at the moment of upscale, εmax is increased only by εstep. In this case, no bucket after upscale has a scale larger than 4. In the general case, the E/2 − 1 highest scales are not utilized immediately after a global upscale. Assuming a constant counter growth rate, the time between consecutive global upscale operations increases exponentially.

Global upscale may be difficult to implement in hardware. [22] describes a method to upscale the entire counter array while continuously counting new packet arrivals, and this method also applies to ICE-Buckets' global upscale. In our case, global upscale can be completely avoided by setting the maximal error to ε(M).

Previous Estimator   Previous ε   New Estimator   New ε   Requires Upscale?
0                    0.0%         0               0.0%    No
1                    0.1%         1               0.2%    Yes
2                    0.2%         1               0.2%    No
3                    0.3%         2               0.4%    Yes
4                    0.4%         2               0.4%    No
5                    0.5%         3               0.6%    Yes
6                    0.6%         3               0.6%    No
7                    0.7%         4               0.8%    Yes
—                    —            5               1.0%    —
—                    —            6               1.2%    —
—                    —            7               1.4%    —

TABLE II: Global upscale example (E = 8); εstep is upscaled from 0.1% to 0.2%.
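The scale remapping in Table II can be reproduced with a few lines (our sketch): during a global upscale εstep doubles, odd scales first perform a local upscale (w → w + 1), and then every scale is halved.

```python
# Our sketch of the Table II remapping: during a global upscale,
# eps_step doubles, buckets with odd scales first perform a local
# upscale (w -> w + 1), and then every scale is halved.

def global_remap(w):
    """Return (new_scale, needs_local_upscale) for old scale w."""
    needs = (w % 2 == 1)
    if needs:
        w += 1                  # local upscale to the next, even scale
    return w // 2, needs

eps_step = 0.001                # 0.1% before the global upscale
for w in range(8):              # E = 8 scales, as in Table II
    new_w, needs = global_remap(w)
    print(w, w * eps_step, new_w, new_w * 2 * eps_step, needs)
```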

IV. SIMULATION RESULTS

To evaluate the performance of ICE-Buckets we counted 5-tuples from real Internet packet traces. The first trace (NZ09) consists of twenty-four hours of Internet traffic collected from an unnamed New Zealand ISP on Jan. 6th, 2009 [24]. It is a relatively large trace, containing almost a billion packets and over 32 million flows. We also used data from two Equinix data-centers in the USA, which are connected to backbone links of Tier 1 ISPs. Both use per-flow data for load balancing. One is connected to a link between Chicago, IL and Seattle, WA. We used two traces from this dataset, CHI08 from 2008 [25], and CHI14 from 2014 [26]. CHI08 was previously used to evaluate CEDAR in [22]. Trace SJ13 was taken in 2013 from an Internet data collection monitor that is connected to a link between San Jose and Los Angeles, CA [27].

We compare ICE-Buckets to two state of the art counter estimation algorithms, DISCO and CEDAR. In order to measure different memory constraints, we tested each algorithm with both 8.5 and 12.5 bits on average per counter. We use 8 and 12 bits (correspondingly) for the symbols. The remainder is used for the scale indices in ICE-Buckets and for storing the estimation values in CEDAR. Previous works were evaluated with similar symbol lengths. DISCO does not have an upscaling scheme, therefore we configured it according to the maximal expected number of packets (M), which is different for each trace, as specified in Table III. For ICE-Buckets, bucket size and scale index lengths are chosen to minimize the upper bound proved in Theorem 4, also according to M. Per-trace statistics and configurations are given in Table III.

Trace   Flows (N)    Packets       M             S (8b/12b)   E (8b/12b)
CHI08   1,420,318    26,750,712    26,750,712    10 / 14      32 / 128
CHI14   1,213,614    34,721,808    34,721,808    10 / 12      32 / 64
SJ13    3,071,187    20,803,060    20,803,060    12 / 14      64 / 128
NZ09    32,737,760   891,023,765   2³² − 1       12 / 12      64 / 64

TABLE III: Trace characteristics and ICE-Buckets configurations used in the experiments for 8- and 12-bit symbols.

Table IV presents the overall relative error of ICE-Buckets


Trace                             CHI08           CHI14           SJ13            NZ09
Bits per symbol                   8       12      8       12      8       12      8       12

CEDAR (upper bound)               16.93%  3.70%   17.10%  3.75%   16.76%  3.65%   19.61%  4.51%
ICE-Buckets (upper bound)         5.02%   0.42%   5.71%   0.50%   3.49%   0.26%   8.22%   0.94%
DISCO (actual)                    10.34%  2.24%   10.16%  2.24%   7.82%   1.71%   14.24%  3.21%
CEDAR (actual)                    12.19%  2.17%   12.40%  2.38%   13.05%  2.60%   15.01%  3.09%
ICE-Buckets (no global upscale)   1.70%   0.10%   2.14%   0.14%   1.00%   0.03%   1.78%   0.14%
ICE-Buckets (actual)              1.50%   0.06%   1.96%   0.16%   1.01%   0.03%   1.81%   0.13%

TABLE IV: Overall relative error bounds and actual overall relative error of the different algorithms on the tested traces.

(a) CHI08 with 12-bit symbols (b) NZ09 with 12-bit symbols (c) NZ09 with 8-bit symbols

Fig. 4: Comparison of the relative error per value on different traces, using 8- and 12-bit counters.

and the alternatives for the tested traces. We also present the upper bounds on the error of ICE-Buckets and CEDAR. As can be observed, for real datasets ICE-Buckets' error is much lower than this bound, since the majority of flows are small. For CHI08, ICE-Buckets achieves an overall relative error that is over 57 times smaller than that of CEDAR. Notice that for all traces, ICE-Buckets' overall relative error with 8-bit symbols is lower than that of the alternatives even with 12-bit symbols. Note that in our case, DISCO is 100% accurate when the value is 1, and is slightly less accurate than CEDAR for all other values. All in all, this results in an overall relative error similar to that of CEDAR.
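For concreteness, the following is a minimal sketch of how such an overall relative error figure can be computed, assuming the metric averages each flow's relative error |estimate - true| / true uniformly over all flows; the paper's exact weighting may differ.

```python
def overall_relative_error(true_counts, estimates):
    """Average per-flow relative error |estimate - true| / true.

    Assumes every flow contributes equally to the average (a common
    definition of overall relative error; shown for illustration).
    `true_counts` and `estimates` map each flow to its packet count.
    """
    errors = [abs(estimates[f] - c) / c for f, c in true_counts.items()]
    return sum(errors) / len(errors)

# Toy example: three flows with hypothetical estimates.
true_counts = {"a": 100, "b": 10, "c": 1}
estimates = {"a": 110, "b": 9, "c": 1}
err = overall_relative_error(true_counts, estimates)
# Per-flow errors are 0.10, 0.10 and 0.00, so the average is 0.2 / 3.
```

Because most real flows are small and therefore estimated almost exactly, an average of this form ends up far below the worst-case bound, which is exactly the gap the table exhibits.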

We also experimented with a version of ICE-Buckets that does not use global upscale, which should be simpler to implement in hardware. To do so, we pre-configured εstep to ensure that local upscales alone suffice to count 2^32 - 1 packets in any bucket. Note that the error in this case is very similar to that of ICE-Buckets with global upscale. We therefore recommend implementing ICE-Buckets without global upscale when the total number of packets can be bounded in advance.

To explain the cause of ICE-Buckets' substantial error reduction, we show in Figure 4 the relative error as a function of the real counter value. The relative error was computed from 256 runs of each algorithm on CHI08 and one run on NZ09. Note that for most counter values the relative error of ICE-Buckets is the lowest, followed by CEDAR and then DISCO. ICE-Buckets achieves an error close to zero for counters smaller than L, because an accurate counter of log2 L bits suffices to represent those values. Unfortunately, this error cannot always be zero, as some of these counters share buckets with larger counters. As the counter scale grows, the estimation error increases, until eventually the largest counter is estimated with εmax. In contrast, CEDAR estimates

Fig. 5: Comparison of the overall relative error of different algorithms as a function of the memory restriction, simulated on CHI08. Memory overheads are taken into account.

all of the counters with relative error εmax.

Figure 5 illustrates the accuracy of different algorithms for CHI08 under varying memory constraints. Overheads of all methods are taken into account and the maximal symbol size is used for each method. ICE-Buckets uses different configurations with overheads that range between 1/4 and 3/4 bits per counter. The figure shows that DISCO and CEDAR provide similar space-accuracy trade-offs, as mentioned in our analysis. Under all of the simulated memory constraints, ICE-Buckets is more accurate than both CEDAR and DISCO, and the difference between ICE-Buckets and the other algorithms grows with the memory. We believe that the reason for this is that as the number of bits per symbol (L) grows, more counters can be estimated with zero error.
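The scale-dependent error behavior discussed above can be illustrated with the classic probabilistic counter of Morris [18]. This is not the optimal estimation function used by ICE-Buckets, only a simple sketch of why the increment probability, and with it the relative error, grows with the counter scale.

```python
import random

class MorrisCounter:
    """Morris-style probabilistic counter [18] with base a > 1.

    The stored symbol c estimates (a^c - 1) / (a - 1) increments.
    The increment probability a^(-c) shrinks as c grows, so larger
    values are estimated with a larger relative error. ICE-Buckets
    uses the optimal estimation function instead; this classic
    scheme is shown only for illustration.
    """

    def __init__(self, a=2.0):
        self.a = a
        self.c = 0  # the short stored symbol

    def increment(self):
        # Increment the symbol with probability a^(-c).
        if random.random() < self.a ** (-self.c):
            self.c += 1

    def estimate(self):
        # Estimate of the number of increments seen so far.
        return (self.a ** self.c - 1) / (self.a - 1)

m = MorrisCounter(a=2.0)
m.increment()  # with c = 0 the increment probability is 1, so c becomes 1
```

Lowering the base a reduces the relative error but also the counting capacity; separating flows into buckets, each with its own scale, is exactly how ICE-Buckets escapes this trade-off for the small flows.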

Figure 6 describes the overall relative error of CEDAR and


Fig. 6: Average relative error through trace progress, with 8-bit symbols, as simulated on NZ09.

ICE-Buckets throughout the NZ09 trace's progress. To adapt to the growing counter scale, both ICE-Buckets and CEDAR use an upscale mechanism that gradually increases the error. Note that the overall relative error of ICE-Buckets is almost constant throughout an entire day of real Internet traffic. In addition, CEDAR's multiple global upscales are clearly visible in the figure. In contrast, ICE-Buckets' upscales are mostly local and cause a smoother increase in the relative error.

V. CONCLUSION

In this work we introduced ICE-Buckets, a novel counter estimation algorithm that minimizes the relative error. ICE-Buckets uses the optimal estimation function, with a scale that is optimized independently for each bucket.

We first described an explicit representation of this function, which was previously known only in recursive form. We extended its analysis and showed a method to measure the effect of upscale operations on the relative error. This function is used in ICE-Buckets to minimize the error in each bucket.

ICE-Buckets is dynamically configured to adapt to the growing counter scale. For practical deployments, it can be implemented without global operations at similar accuracy.

We proved an upper bound on ICE-Buckets' overall relative error, which is significantly smaller than that of previous estimation algorithms. In particular, we demonstrated a reduction of up to 14 times in this upper bound when applied to the traffic characteristics of real workloads. ICE-Buckets also achieves the same maximum relative error as the optimal function.

Additionally, we extensively evaluated ICE-Buckets on four Internet packet traces and demonstrated a reduction of up to 57 times in the overall error. ICE-Buckets achieves an improvement in accuracy even when it is given considerably less space than the alternatives. Finally, we have shown that ICE-Buckets is significantly more accurate than the leading alternatives for a wide range of memory constraints.

Acknowledgements: We would like to thank Roy Friedman, Isaac Keslassy, Olivier Marin and Erez Tsidon for their insights and helpful advice. This work was partially funded by the Israeli Ministry of Science and Technology grant 3-10886 and the Technion HPI center.

REFERENCES

[1] D. Shah, S. Iyer, B. Prabhakar, and N. McKeown, "Maintaining statistics counters in router line cards," IEEE Micro, vol. 22, no. 1, pp. 76–81, 2002.

[2] S. Ramabhadran and G. Varghese, "Efficient implementation of a statistics counter architecture," ACM SIGMETRICS 2003, vol. 31, no. 1, pp. 261–271. [Online]. Available: http://dx.doi.org/10.1145/885651.781060

[3] A. Kabbani, M. Alizadeh, M. Yasuda, R. Pan, and B. Prabhakar, "AF-QCN: Approximate fairness with quantized congestion notification for multi-tenanted data centers," in Proc. of the 18th IEEE Symposium on High Performance Interconnects, ser. HOTI '10, pp. 58–65. [Online]. Available: http://dx.doi.org/10.1109/HOTI.2010.26

[4] B. Mukherjee, L. Heberlein, and K. Levitt, "Network intrusion detection," IEEE Network, vol. 8, no. 3, pp. 26–41, 1994.

[5] P. Garcia-Teodoro, J. E. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, "Anomaly-based network intrusion detection: Techniques, systems and challenges," Computers & Security, pp. 18–28, 2009.

[6] G. Dittmann and A. Herkersdorf, "Network processor load balancing for high-speed links," in Proc. of the 2002 Int. Symposium on Performance Evaluation of Computer and Telecommunication Systems.

[7] G. Einziger and R. Friedman, "TinyLFU: A highly efficient cache admission policy," in Euromicro PDP 2014, pp. 146–153.

[8] C. Estan and G. Varghese, "New directions in traffic measurement and accounting," ACM SIGCOMM 2002.

[9] G. Cormode and S. Muthukrishnan, "An improved data stream summary: The count-min sketch and its applications," J. Algorithms, vol. 55, pp. 29–38, 2004.

[10] A. Ciuffoletti, "Architecture of a network monitoring element," Tech. Rep., 2006.

[11] V. Paxson, J. Mahdavi, A. Adams, and M. Mathis, "An architecture for large scale internet measurement," IEEE Communications Magazine, vol. 36, no. 8, pp. 48–54, 1998.

[12] C. Estan, K. Keys, D. Moore, and G. Varghese, "Building a better NetFlow," in ACM SIGCOMM 2004, pp. 245–256. [Online]. Available: http://doi.acm.org/10.1145/1015467.1015495

[13] Y. Lee and Y. Lee, "Toward scalable internet traffic measurement and analysis with Hadoop," ACM SIGCOMM Comput. Commun. Rev., vol. 43, no. 1, pp. 5–13, Jan. 2013. [Online]. Available: http://doi.acm.org/10.1145/2427036.2427038

[14] H. Wang, H. Zhao, B. Lin, and J. Xu, "DRAM-based statistics counter array architecture with performance guarantee," IEEE/ACM Transactions on Networking, vol. 20, no. 4, pp. 1040–1053, Aug. 2012.

[15] Y. Lu, A. Montanari, B. Prabhakar, S. Dharmapurikar, and A. Kabbani,“Counter braids: a novel counter architecture for per-flow measurement,”in ACM SIGMETRICS 2008, pp. 121–132.

[16] N. Hua, B. Lin, J. J. Xu, and H. C. Zhao, "BRICK: A novel exact active statistics counter architecture," ser. ACM/IEEE ANCS '08, pp. 89–98.

[17] B.-Y. Choi, J. Park, and Z.-L. Zhang, "Adaptive random sampling for load change detection," ACM SIGMETRICS 2002, pp. 272–273.

[18] R. Morris, "Counting large numbers of events in small registers," Commun. ACM, vol. 21, no. 10, pp. 840–842, 1978.

[19] R. Stanojevic, "Small active counters," in IEEE INFOCOM 2007, pp. 2153–2161.

[20] C. Hu, B. Liu, H. Zhao, K. Chen, Y. Chen, C. Wu, and Y. Cheng, "DISCO: Memory efficient and accurate flow statistics for network measurement," in IEEE ICDCS 2010, pp. 665–674.

[21] C. Hu and B. Liu, "Self-tuning the parameter of adaptive non-linear sampling method for flow statistics," in International Conf. on Computational Science and Engineering, CSE 2009.

[22] E. Tsidon, I. Hanniel, and I. Keslassy, "Estimators also need shared values to grow together," in IEEE INFOCOM 2012, pp. 1889–1897.

[23] C. Hu, S. Wang, J. Tian, B. Liu, Y. Cheng, and Y. Chen, "Accurate and efficient traffic monitoring using adaptive non-linear sampling method," in IEEE INFOCOM 2008, pp. 26–30.

[24] WAND Network Research Group, "Waikato Internet Traffic Storage 2009-01-06 00:00–23:30 UTC, ISPDSL-I," http://wand.net.nz/wits/ispdsl/1/.

[25] P. Hick, "CAIDA Anonymized 2008 Internet Trace, equinix-chicago 2008-03-19 19:00–20:00 UTC, Direction A."

[26] ——, “CAIDA Anonymized 2014 Internet Trace, equinix-chicago 2014-03-20 13:55 UTC, Direction B.”

[27] ——, “CAIDA Anonymized 2013 Internet Trace, equinix-sanjose 2013-1-17 13:55 UTC, Direction B.”