
ON THE ROBUST ESTIMATION OF NETWORK-WIDE FAILURE PROBABILITY FROM PASSIVE MEASUREMENTS

Angelo Coluccia1, Peter Romirer-Maierhofer2, Fabio Ricciato1,2

1Univ. of Salento, Lecce, Italy 2FTW, Vienna, Austria

COST TMA Meeting, Zagreb, 26th January 2011

Motivations

  Third-generation (3G) cellular networks
  -  functionally complex, fast changing
  -  exposed to errors, failures and attacks

  Early detection of network problems
  -  is fundamental to network operation
  -  often based on passive traffic monitoring: packet capture, log data

  Key Performance Indicators (KPIs)
  -  a common tool to concisely quantify network performance
  -  for comparison/benchmarking between different network areas
  -  for detecting drifts and/or sudden changes


Motivations

  Global Percentages of “what goes wrong”
  -  the most common class of KPI
  -  refer to events with a binary outcome: success or failure
  -  Examples:
     -  % of lost packets (over transmitted ones)
     -  % of late packets with delay > Th (over received ones)
     -  % of unanswered TCP connection attempts (over total SYN openings)
     -  % of failed attach_requests (over total attach_requests)
     -  …


Global Percentages often give noisy signals

Our goal: get a cleaner signal

System Modeling

[diagram: a USER issues REQUESTs (packet, SYN, attach_request, …); each request ends either in SUCCESS or in FAILURE (lost, late, failed, unanswered, …)]

  Notation
  -  in a generic timebin t, user i (i = 1 … I)
  -  generates n_i “requests” (e.g. packets)
  -  out of which m_i “fail” (e.g. are lost)

  Assumptions
  -  failures are independent and occur with (unknown) probability a_i
  -  the a_i's are i.i.d. random variables with mean ā
  -  independence between the traffic volume n_i and the failure probability a_i

  Goal: estimate ā given a set of measurements {n_i, m_i}
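To make the model concrete, here is a minimal Python sketch that generates synthetic {n_i, m_i} under exactly these assumptions; the Zipf-distributed volumes and the Beta choice for p(a) are illustrative assumptions of mine, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

I = 10_000                  # users in one timebin
a_bar = 0.02                # true mean failure probability (the quantity to estimate)

# Traffic volumes n_i: heavy-tailed across users (illustrative Zipf choice)
n = rng.zipf(a=1.8, size=I)

# Per-user failure probabilities a_i: i.i.d. with mean a_bar, drawn independently
# of n_i (illustrative Beta choice whose mean equals a_bar)
alpha, beta = 2.0, 2.0 * (1 - a_bar) / a_bar
a = rng.beta(alpha, beta, size=I)

# Each of the n_i requests of user i fails independently with probability a_i
m = rng.binomial(n, a)

print("empirical mean of the a_i's:", a.mean())     # close to a_bar
print("global failure ratio:       ", m.sum() / n.sum())
```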

System Modeling

[figure: dedicated resources vs. shared resources in the request/failure model]

Estimators

  Goal: estimate ā given {n_i, m_i}

  The Ideal Estimator
  -  is unbiased
  -  has minimum variance
  -  is simple: fast to compute, easy to implement, easy to understand by practitioners!
  -  is general: not bound to a specific (class of) traffic distribution (the n_i's)

[chart: estimator variance vs. complexity]
  -  Global Percentage (current practice)
  -  EPWR empirical estimator [BWA'09]
  -  Bayesian estimator [PERFORM'10]

EGR

  Empirical Global Ratio (EGR)
  -  aka Global Percentage

  The simplest to implement
  -  no need to associate events to users!


$$\mathrm{EGR} \;=\; \frac{\#\ \text{of total failures}}{\#\ \text{of total requests}} \;=\; \frac{\sum_i m_i}{\sum_i n_i}$$

  Problem:
  -  a few “big” users with many failures
  -  sporadically inflate the EGR and increase its variance
  -  a problem when the traffic volumes (the n_i's) are heavy-tailed!
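A minimal sketch of this effect with hypothetical numbers (mine, not from the talk): a single heavy user going through a bad patch moves the EGR of the entire timebin.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1000 "normal" users: 10 requests each, ~1% failure probability
n = np.full(1000, 10)
m = rng.binomial(n, 0.01)
egr = m.sum() / n.sum()                        # Empirical Global Ratio

# One "big" user with 50,000 requests and a locally bad 10% failure rate
n_big, m_big = 50_000, 5_000
egr_with_big = (m.sum() + m_big) / (n.sum() + n_big)

print(f"EGR without the big user: {egr:.4f}")           # ~0.01
print(f"EGR with the big user:    {egr_with_big:.4f}")  # dominated by a single user
```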

EMR

  Empirical Mean Ratio (EMR)
  -  the arithmetic mean of the individual ratios r_i

  Sometimes better than EGR
  -  but requires per-user data

  Problem:
  -  “small” users with only one or a few packets
  -  bring very inaccurate estimates of their a_i's
  -  yet weigh the same as larger users with more reliable estimates
  -  a problem when the traffic volumes (the n_i's) are heavy-tailed!


$$\mathrm{EMR} \;=\; \frac{1}{I}\sum_i r_i \;=\; \frac{1}{I}\sum_i \frac{m_i}{n_i}, \qquad \text{with } r_i = \frac{m_i}{n_i}$$

NB: r_i = m_i / n_i is the minimum variance unbiased estimator (MVUE) of a_i
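Again a minimal sketch with hypothetical numbers (mine): single-packet users contribute ratios of exactly 0 or 1, so the unweighted average is far noisier than the information in the data would allow.

```python
import numpy as np

rng = np.random.default_rng(2)

# 100 "large" users (1000 requests each) and 900 "small" users (1 request each),
# all with the same true failure probability of 1%
n = np.concatenate([np.full(100, 1000), np.full(900, 1)])
m = rng.binomial(n, 0.01)

emr = np.mean(m / n)          # Empirical Mean Ratio: every user weighs 1/I
egr = m.sum() / n.sum()       # Empirical Global Ratio, for comparison

print(f"EMR: {emr:.4f}")      # unbiased, but its variance is driven by the 0/1 ratios
print(f"EGR: {egr:.4f}")      # here dominated by the large users
```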

EWR - definition

  Both EGR and EMR can be “corrected” by filtering away very big (for EGR) or very small (for EMR) users
  -  but this discards lots of data, especially for long-tailed n_i's

  Can we do something more clever than discarding data?
  -  YES: differentially weighting the data!

  Empirical Weighted Ratio (EWR)
  -  a generalization of EGR and EMR
  -  constant weights w_i = 1/I  ⇒  EWR = EMR
  -  weights proportional to traffic, w_i = n_i/N with N = Σ_i n_i  ⇒  EWR = EGR

$$\mathrm{EWR} \;=\; \sum_i w_i \frac{m_i}{n_i} \;=\; \sum_i w_i\, r_i \;=\; w^{T} r, \qquad \text{with } w_i > 0,\ \sum_i w_i = 1$$
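A minimal sketch of the EWR family (function name and example counters are mine), showing how the two earlier estimators fall out of specific weight choices:

```python
import numpy as np

def ewr(n, m, w):
    """Empirical Weighted Ratio: sum_i w_i * (m_i / n_i), with w_i > 0, sum_i w_i = 1."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # enforce the normalization constraint
    return np.sum(w * (m / n))

# Hypothetical per-user counters for one timebin
n = np.array([1, 5, 200, 10_000])
m = np.array([0, 1,   3,    150])
I = len(n)

print("EMR:", ewr(n, m, np.ones(I) / I))   # constant weights w_i = 1/I
print("EGR:", ewr(n, m, n / n.sum()))      # weights proportional to n_i
```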

EWR - optimization

  Problem: find the optimal weights w_i
  -  i.e. those that minimize the variance of the estimator EWR = wᵀr

  Methodology:
  -  compute the variance of the estimator as a function of the weights, VAR(w)
  -  constrained optimization: minimize VAR(w) under the constraint ‖w‖₁ = Σ_i w_i = 1
  -  solve by Lagrange multipliers

[BWA'09] A. Coluccia, F. Ricciato, P. Romirer-Maierhofer, "On Robust Estimation of Network-wide Packet Loss in 3G Cellular Networks", IEEE BWA 2009, Honolulu, 30 November 2009
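To make the optimization concrete, here is a short derivation of mine under the model above; it is consistent with the claims on the next slide, but the exact expressions in [BWA'09] may be stated differently. Since the r_i are independent across users,

$$\operatorname{Var}(r_i) = \mathbb{E}\big[\operatorname{Var}(r_i \mid a_i)\big] + \operatorname{Var}\big(\mathbb{E}[r_i \mid a_i]\big) = \frac{\bar a(1-\bar a) - \sigma_a^2}{n_i} + \sigma_a^2, \qquad \mathrm{VAR}(w) = \sum_i w_i^2 \operatorname{Var}(r_i),$$

and minimizing VAR(w) under Σ_i w_i = 1 with a Lagrange multiplier gives the familiar inverse-variance weights

$$w_i^{\star} \;\propto\; \frac{1}{\operatorname{Var}(r_i)} \;=\; \frac{n_i}{\bar a(1-\bar a) - \sigma_a^2 + n_i\,\sigma_a^2},$$

which grow roughly linearly with n_i for small users and saturate for large ones: precisely the shape that the piece-wise linear approximation with a knee-point θ mimics, and the reason why only the first two moments of p(a) matter.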

EWR - approximation

[plot: optimal weight w_i as a function of n_i, showing the exact optimal solution and its piece-wise linear approximation with knee-point θ]

  The optimal knee-point θ depends on the mean and variance of the a_i's
  -  no dependency on the distribution of the n_i's → generality
  -  but it depends on the first two moments of p(a), which remain unknown

  How to set the knee-point θ?

EPWR - setting

  Algorithm:
  -  preliminary coarse estimation of the mean and variance of the a_i's
  -  (better after filtering out very small n_i's)

  Heuristic setting (“educated guess”)
  -  the final estimator performance is weakly sensitive to the exact location of the knee-point, as long as “extreme” settings (very low or very high) are avoided
  -  the simplest solution

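A minimal sketch of what an EPWR computation could look like; the specific weight shape w_i ∝ min(n_i, θ) is my reading of the piece-wise linear approximation (linear below the knee-point, constant above it), not a formula quoted from the talk.

```python
import numpy as np

def epwr(n, m, theta):
    """Empirical Piece-wise linearly Weighted Ratio (sketch).

    Assumes weights w_i proportional to min(n_i, theta): growing linearly with
    the user's traffic below the knee-point theta and capped above it, so that
    neither tiny nor huge users dominate the estimate.
    """
    n = np.asarray(n, dtype=float)
    m = np.asarray(m, dtype=float)
    w = np.minimum(n, theta)
    w = w / w.sum()                      # normalize: sum_i w_i = 1
    return np.sum(w * (m / n))

# Hypothetical per-user counters; theta set heuristically ("educated guess")
n = np.array([1, 3, 40, 500, 20_000])
m = np.array([1, 0,  1,   6,    180])
print("EPWR (theta=100):", epwr(n, m, theta=100))
```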

Bayesian Estimators

  Empirical Piece-wise Linearly Weighted Ratio (EPWR)
  -  single parameter θ (to be set heuristically)
  -  very simple conceptually and computationally
  -  requires individual per-user counters

  Why then go for Bayesian estimators?
  -  they provide an optimal reference for EPWR
  -  preferred in applications where accuracy matters more than simplicity

[chart: estimator variance vs. complexity, placing EGR, EMR, the EPWR empirical estimator [BWA'09] and the Bayesian estimator [PERFORM'10]]

Bayesian approach

  Bayesian hierarchical model
  -  p(a) → {a_i} → {m_i}

  Approach: empirical parametric Bayes + conjugate prior
  -  choose a parametric distribution family for p(a)
  -  such that the prior p(a) and the posterior p(a|m) belong to the same family (conjugate prior)
  -  and estimate its (hyper-)parameters from the data

Bayesian approach

  Conjugate prior for the Binomial: the Beta distribution
  -  very versatile
  -  two (hyper-)parameters α and β

  With a Beta prior on a_i and a Binomial likelihood for m_i, prior, posterior and marginal all stay in closed form:

$$p(a_i) = \mathrm{Beta}(a_i;\alpha,\beta) = \frac{a_i^{\alpha-1}(1-a_i)^{\beta-1}}{B(\alpha,\beta)}, \qquad p(m_i \mid a_i) = \mathrm{Bin}(m_i;\, n_i, a_i)$$

$$p(a_i \mid m_i) = \mathrm{Beta}(a_i;\ \alpha+m_i,\ \beta+n_i-m_i), \qquad p(m_i) = \mathrm{BetaBin}(m_i;\, n_i, \alpha, \beta)$$

  where B(·,·) is the Beta function.

Bayesian approach

  ML estimates of the hyper-parameters:

$$(\hat\alpha, \hat\beta) \;=\; \arg\max_{\alpha,\beta}\ p(m \mid \alpha,\beta), \qquad \text{with } p(m \mid \alpha,\beta) = \prod_{i=1}^{I} \mathrm{BetaBin}(m_i;\, n_i, \alpha, \beta)$$

  Numerical solution is straightforward
  -  p(m|α,β) is unimodal and log-convex
  -  well-known numerical procedures lead to an accurate and fast resolution (e.g. simplex)
  -  example: resolution time with MATLAB for I = 10^5 users: < 10 sec (for EPWR a few msec)

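A minimal numerical sketch of this fit in Python/SciPy (the talk used MATLAB); the Nelder-Mead choice mirrors the "simplex" procedure named on the slide, while the synthetic data and starting point are illustrative assumptions of mine.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln

def betabin_logpmf(m, n, alpha, beta):
    """log BetaBin(m; n, alpha, beta)."""
    return (gammaln(n + 1) - gammaln(m + 1) - gammaln(n - m + 1)
            + betaln(m + alpha, n - m + beta) - betaln(alpha, beta))

def fit_hyperparameters(n, m):
    """ML estimate of (alpha, beta): maximize prod_i BetaBin(m_i; n_i, alpha, beta)."""
    def neg_loglik(log_params):
        alpha, beta = np.exp(log_params)     # optimize in log-space to keep alpha, beta > 0
        return -np.sum(betabin_logpmf(m, n, alpha, beta))
    res = minimize(neg_loglik, x0=np.log([1.0, 50.0]), method="Nelder-Mead")
    return np.exp(res.x)

# Illustrative synthetic data: heavy-tailed volumes, a_i ~ Beta(2, 198) (mean 1%)
rng = np.random.default_rng(3)
n = rng.zipf(1.8, size=5_000)
a = rng.beta(2.0, 198.0, size=n.size)
m = rng.binomial(n, a)

alpha_hat, beta_hat = fit_hyperparameters(n, m)
print("fitted (alpha, beta):", alpha_hat, beta_hat)
print("estimated mean failure probability:", alpha_hat / (alpha_hat + beta_hat))
```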

Bayesian Estimators

  OK, we got (α, β), but how do we get ā?

  MAP
  -  not good when small n_i's are present, as in real traffic

  Arithmetic mean of the MMSE estimates
  -  the MMSE estimate of a_i coincides with the conditional (posterior) mean: E[a_i | m_i] = (α + m_i) / (α + β + n_i)
  -  QMMSE := the arithmetic mean of these per-user estimates

  Directly from the hyper-parameters
  -  QHYP := α / (α + β)

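A small self-contained sketch of these two estimators, using the QMMSE/QHYP names that appear in the results; the hyper-parameter values and counters below are hypothetical.

```python
import numpy as np

def q_mmse(n, m, alpha, beta):
    """Arithmetic mean of the per-user MMSE estimates E[a_i | m_i] = (alpha+m_i)/(alpha+beta+n_i)."""
    return np.mean((alpha + m) / (alpha + beta + n))

def q_hyp(alpha, beta):
    """Mean failure probability taken directly from the hyper-parameters: alpha/(alpha+beta)."""
    return alpha / (alpha + beta)

# Hypothetical fitted hyper-parameters and per-user counters
alpha_hat, beta_hat = 2.1, 205.0
n = np.array([1, 3, 40, 500, 20_000])
m = np.array([1, 0,  1,   6,    180])

print("QMMSE:", q_mmse(n, m, alpha_hat, beta_hat))
print("QHYP: ", q_hyp(alpha_hat, beta_hat))
```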

Simulations

  Compare different estimators for different distributions of the n_i's
  -  changing the family, varying its parameter
  -  comparing the normalized variance of the estimators

  Findings:
  -  EGR or EMR can perform badly in some cases
  -  EMR with data pre-filtering improves, but is still bad in some cases
  -  EPWR is always very close to the Bayesian estimator


Results from a real dataset

  Datasets extracted with METAWIN from a real UMTS/HSPA network

  DATA:INV
  -  REQUEST := every SYNACK in DL | SUCCESS := unambiguous ACK in UL

  DATA:RTT
  -  REQUEST := unambiguous SYNACK/ACK pair | SUCCESS := RTT < 500 ms


[diagram: TCP three-way handshake over time (1. SYN, 2. SYNACK, 3. ACK); the client-side Semi-RTT is measured between the SYNACK in DL and the matching ACK in UL]

Results DATA:INV


Results DATA:RTT


Summary of Results

  For these datasets…
  -  the Bayesian estimators QMMSE and QHYP are almost equivalent
  -  EGR is a poor choice: high noise, bias
  -  EPWR is very close to QMMSE (but much simpler)
  -  EMR is not bad, only slightly worse than EPWR (but same complexity)


Independency check

  Independence between failure level and traffic volume (the a_i's and the n_i's)
  -  is a major simplification of the model
  -  an a-posteriori check via semi-simulation shows it can be tolerated

  Semi-simulation procedure
  -  model p(a) as a Beta distribution, estimating its parameters from the real data
  -  for each real datapoint (n_i, m_i), build a semi-simulated datapoint (n_i, m'_i): keep the real n_i, extract a_i from p(a) independently of n_i, then extract m'_i from Bin(n_i, a_i)
  -  compare real and semi-simulated datapoints
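A minimal sketch of this semi-simulation check; the method-of-moments Beta fit on the raw per-user ratios is my choice for brevity (one could equally reuse the ML fit from the Bayesian part), and the "real" data below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)

def semi_simulate(n, m):
    """Keep the real n_i, but redraw the failures with a_i independent of n_i."""
    # Model p(a) as a Beta distribution, estimating its parameters from the real data
    r = m / n                              # per-user ratios from the real dataset
    mu, var = r.mean(), r.var()
    common = mu * (1 - mu) / var - 1       # crude method-of-moments Beta fit
    alpha, beta = mu * common, (1 - mu) * common
    # Extract a_i from p(a) independently of n_i, then m'_i from Bin(n_i, a_i)
    a = rng.beta(alpha, beta, size=n.size)
    return rng.binomial(n, a)

# Hypothetical "real" datapoints (n_i, m_i)
n = rng.zipf(1.8, size=2_000)
m = rng.binomial(n, 0.02)

m_semi = semi_simulate(n, m)
# Compare real (n_i, m_i) against semi-simulated (n_i, m'_i), e.g. via scatter plots
print("real global ratio:          ", m.sum() / n.sum())
print("semi-simulated global ratio:", m_semi.sum() / n.sum())
```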

Conclusions

  Mean failure estimation: a very general problem
  -  applies to any binary-outcome event
  -  generated by a user population

  Default practice: Global Percentages
  -  OK if the traffic distribution across users has low variance
  -  problems with heavy-tailed distributions

  Bayesian estimator
  -  provides the optimal reference
  -  solvable in reasonable time even for large datasets (seconds)

  EPWR estimator
  -  extremely simple (computationally and conceptually)
  -  near-optimal performance
  -  one parameter to be set heuristically
  -  could become a standard KPI!

  Thanks for your attention!

  for follow-up write to: [email protected]
