Distributed Clustering for Robust Aggregation in Large Networks
Ittay Eyal, Idit Keidar, Raphi Rom
Technion, Israel
Aggregation in Sensor Networks – Applications
Temperature sensors thrown in the woods
Seismic sensors
Grid computing load
• Large networks, light nodes, low bandwidth
• Fault-prone sensors and network
• Multi-dimensional data (location × temperature)
• Target is a function of all sensed data: average temperature, max location, majority, …
What has been done?
Tree Aggregation
Hierarchical solution
Fast: O(height of tree)
[Figure: leaf values 1, 3, 9, 11 are averaged pairwise into 2 and 10 at the inner nodes, and into 6 at the root.]
Limited to static topology; no failure robustness
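A minimal sketch of tree aggregation computing an average, using the slide's example values (leaves 1, 3, 9, 11). It assumes each node forwards a (sum, count) pair to its parent rather than a plain average, so subtrees of different sizes are weighted correctly; `Node`, `aggregate` and `average` are hypothetical names. The last lines show the lack of failure robustness: one crashed inner node silently drops its whole subtree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    value: Optional[float] = None           # sensed value (None for relay-only inner nodes)
    children: List["Node"] = field(default_factory=list)
    alive: bool = True                       # a crashed node forwards nothing


def aggregate(node: Node) -> Tuple[float, int]:
    """Return (sum, count) of all values in the subtree rooted at `node`.

    Each level sends one message upward, so the root holds the result
    after O(height of tree) communication steps.
    """
    if not node.alive:
        return 0.0, 0                        # the whole subtree is lost
    total, count = (node.value, 1) if node.value is not None else (0.0, 0)
    for child in node.children:
        s, c = aggregate(child)
        total += s
        count += c
    return total, count


def average(root: Node) -> float:
    total, count = aggregate(root)
    return total / count


left = Node(children=[Node(1.0), Node(3.0)])
right = Node(children=[Node(9.0), Node(11.0)])
root = Node(children=[left, right])

print(average(root))   # 6.0
right.alive = False
print(average(root))   # 2.0: the crash hides samples 9 and 11
```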
Gossip
• D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS, 2003.
• S. Nath, P. B. Gibbons, S. Seshan, and Z. R. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys, 2004.
Gossip:
Each node maintains a synopsis
Occasionally, each node contacts a neighbor and they improve their synopses
Indifferent to topology changes; crash robust
No data error robustness
Proven convergence
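[Figure: nodes starting at 1, 3, 9 and 11 exchange and average values (for example, 1 and 9 both become 5, and 3 and 11 both become 7) until every node holds the average, 6.]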
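A minimal sketch of gossip averaging, assuming the simplest pairwise rule (two nodes replace their estimates with their mean) on a fully connected network. This is not the exact protocol of Kempe et al. or of synopsis diffusion, and `gossip_average` is a hypothetical name. Because every exchange preserves the sum, all estimates converge to the global average; for the same reason, erroneous readings shift every node's estimate.

```python
import random
from typing import List


def gossip_average(values: List[float], steps: int, seed: int = 0) -> List[float]:
    """Run `steps` pairwise gossip exchanges and return every node's estimate."""
    rng = random.Random(seed)
    est = list(values)                         # each node's synopsis: its current estimate
    for _ in range(steps):
        i, j = rng.sample(range(len(est)), 2)  # node i contacts neighbor j
        mean = (est[i] + est[j]) / 2           # both improve their synopses
        est[i] = est[j] = mean                 # total sum unchanged (up to rounding)
    return est


print(gossip_average([1.0, 3.0, 9.0, 11.0], steps=500))
# every estimate is 6.0 up to floating-point rounding

temps = [27.0, 25.0, 26.0, 25.0, 28.0, 98.0, 120.0, 27.0]
print(gossip_average(temps, steps=2000)[0])
# about 47.0: the two outliers bias every node's estimate
```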
A closer look at the problem
The Implications of Irregular Data
A single erroneous sample can radically offset the data.
Samples: 27°, 25°, 26°, 25°, 28°, 98°, 120°, 27°
The average (47°) doesn't tell the whole story.
Sources of Irregular Data
• Sensor malfunction: short circuit in a seismic sensor
• Sensing error: an animal sitting on a temperature sensor
• Software bugs: in grid computing, a machine reports negative CPU usage
Interesting info:
• DDoS: irregular load on some machines in a grid
• Fire outbreak: extremely high temperature in a certain area of the woods
• Intrusion: a truck driving by a seismic detector
It Would Help to Know The Data Distribution
Samples: 27°, 25°, 26°, 25°, 28°, 98°, 120°, 27°
The average is 47°.
Bimodal distribution with peaks at 26.3° and 109°.
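The numbers on this slide can be checked directly. A short sketch, assuming the split the bimodal picture suggests: the six readings near 26° form one group and 98° and 120° the other (the 60° cut-off below is an illustrative assumption).

```python
samples = [27, 25, 26, 25, 28, 98, 120, 27]   # degrees, as listed on the slide

# Illustrative split at 60 degrees (an assumption, not part of the slides)
bulk = [s for s in samples if s < 60]
high = [s for s in samples if s >= 60]

print(sum(samples) / len(samples))        # 47.0: the overall average
print(round(sum(bulk) / len(bulk), 1))    # 26.3: peak of the regular readings
print(sum(high) / len(high))              # 109.0: peak of the outliers
```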
Existing Distribution Estimation Solutions
Estimate a range of distributions [1,2] or cluster data according to values [3,4]
• Fast aggregation [1,2]
• Tolerate crash failures and dynamic networks [1,2]
Drawbacks:
• High bandwidth [3,4], multi-epoch [2,3,4], or one-dimensional data only [1,2]
• No data error robustness [1,2]
1. M. Haridasan and R. van Renesse. Gossip-based distribution estimation in peer-to-peer networks. In International Workshop on Peer-to-Peer Systems (IPTPS 08), February 2008.
2. J. Sacha, J. Napper, C. Stratan, and G. Pierre. Reliable distribution estimation in decentralised environments. Submitted for publication, 2009.
3. W. Kowalczyk and N. A. Vlassis. Newscast EM. In Neural Information Processing Systems, 2004.
4. N. A. Vlassis, Y. Sfakianakis, and W. Kowalczyk. Gossip-based greedy Gaussian mixture learning. In Panhellenic Conference on Informatics, 2005.
Our Solution
Outliers: samples deviating from the distribution of the bulk of the data.
Estimate a range of distributions by clustering data according to values:
• Fast aggregation
• Tolerate crash failures and dynamic networks
• Low bandwidth, single epoch
• Multi-dimensional data
• Data error robustness by outlier detection
Outlier Detection Challenge
Samples: 27°, 25°, 26°, 25°, 28°, 98°, 120°, 27°
A double bind:
• Regular data distribution: ~26°
• Outliers: {98°, 120°}
Recognizing the outliers requires knowing the regular data distribution, and estimating that distribution requires excluding the outliers.
No single node in the system has enough information.
Aggregating Data Into Clusters
• Each cluster has its own mean and mass
• A bounded number (k) of clusters is maintained
Here k = 2.
[Figure: original samples a = 1, b = 3, c = 5, d = 10, shown in successive panels: clustering a and b; clustering a, b and c; clustering all.]
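A sketch of the bounded clustering on this slide. It assumes that whenever there are more than k clusters, the two with the closest means are merged into one whose mass is the sum of the masses and whose mean is the mass-weighted mean; the closest-means rule is an assumption, since the slides do not state the merge criterion, and `merge_to_k` is a hypothetical name. With k = 2 and samples 1, 3, 5, 10 it merges a with b, then that cluster with c, leaving d alone.

```python
from typing import List, Tuple

Cluster = Tuple[float, float]   # (mean, mass)


def merge_to_k(clusters: List[Cluster], k: int) -> List[Cluster]:
    """Repeatedly merge the two clusters with the closest means until at most k remain."""
    clusters = list(clusters)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = abs(clusters[i][0] - clusters[j][0])
                if best is None or d < best[0]:   # ties keep the first pair found
                    best = (d, i, j)
        _, i, j = best
        (m1, w1), (m2, w2) = clusters[i], clusters[j]
        clusters[i] = ((w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2)  # mass-weighted mean
        del clusters[j]                           # j > i, so index i stays valid
    return clusters


samples = [(1.0, 1.0), (3.0, 1.0), (5.0, 1.0), (10.0, 1.0)]   # a, b, c, d
print(merge_to_k(samples, k=2))   # [(3.0, 3.0), (10.0, 1.0)]
```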
But What Does The Mean Mean?
[Figure: a new sample lying between Mean A and Mean B of two Gaussians, Gaussian A and Gaussian B.]
The variance must be taken into account.
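A small numeric illustration of why the variance matters, with made-up parameters that are not from the slides: Gaussian A has mean 0 and variance 1, Gaussian B has mean 10 and variance 25, and a new sample arrives at 4. The sample is closer to Mean A, yet it is far more likely under Gaussian B.

```python
import math


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-d Gaussian with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


gaussians = {"A": (0.0, 1.0), "B": (10.0, 25.0)}   # name -> (mean, variance), illustrative
x = 4.0

by_mean = min(gaussians, key=lambda g: abs(x - gaussians[g][0]))
by_likelihood = max(gaussians, key=lambda g: normal_pdf(x, *gaussians[g]))

print(by_mean)         # A
print(by_likelihood)   # B
```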
Gossip Aggregation of Gaussian Clusters
Distribution is described as k clusters.
Each cluster is described by:
• Mass
• Mean
• Covariance matrix (variance for 1-d data)
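The slides do not give the formula for combining two such clusters. One standard choice, assumed in this sketch, is moment matching: with masses m1, m2, means μ1, μ2 and covariances Σ1, Σ2, the merged cluster has m = m1 + m2, μ = (m1 μ1 + m2 μ2) / m and Σ = [m1 (Σ1 + (μ1 − μ)(μ1 − μ)ᵀ) + m2 (Σ2 + (μ2 − μ)(μ2 − μ)ᵀ)] / m. `GaussianCluster` and `merge` are hypothetical names; plain lists are used so the sketch has no dependencies.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class GaussianCluster:
    mass: float
    mean: Vector
    cov: Matrix      # d x d covariance; [[variance]] for 1-d data


def merge(a: GaussianCluster, b: GaussianCluster) -> GaussianCluster:
    """Moment-matching merge: combined mass, mass-weighted mean, pooled covariance."""
    m = a.mass + b.mass
    d = len(a.mean)
    mean = [(a.mass * a.mean[i] + b.mass * b.mean[i]) / m for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for c in (a, b):
        off = [c.mean[i] - mean[i] for i in range(d)]    # offset from the merged mean
        for i in range(d):
            for j in range(d):
                cov[i][j] += c.mass * (c.cov[i][j] + off[i] * off[j]) / m
    return GaussianCluster(m, mean, cov)


# 1-d check: clusters summarizing {1, 3} and {5} merge into the statistics of {1, 3, 5}
ab = GaussianCluster(2.0, [2.0], [[1.0]])
c = GaussianCluster(1.0, [5.0], [[0.0]])
merged = merge(ab, c)
print(merged.mass, merged.mean, merged.cov[0][0])   # 3.0 [3.0] and a variance of 8/3
```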
[Figure: clusters exchanged between nodes a and b, labeled "Keep half, Send half" and "Merge".]
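A sketch of one gossip exchange under two stated assumptions: "keep half, send half" halves the mass of every cluster (means and variances unchanged), and the receiver merges its closest-mean pair of clusters until at most k remain. Clusters are 1-d for brevity; `split_half`, `receive` and the closest-means rule are hypothetical, not taken from the slides. Since mass is only split or added, every exchange conserves the total mass in the system.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass
class Cluster:
    mass: float
    mean: float
    var: float


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Moment-matching merge of two 1-d Gaussian clusters."""
    m = a.mass + b.mass
    mu = (a.mass * a.mean + b.mass * b.mean) / m
    var = (a.mass * (a.var + (a.mean - mu) ** 2) + b.mass * (b.var + (b.mean - mu) ** 2)) / m
    return Cluster(m, mu, var)


def split_half(clusters: List[Cluster]) -> Tuple[List[Cluster], List[Cluster]]:
    """Return (kept, sent): each cluster's mass is halved, mean and variance unchanged."""
    halves = [replace(c, mass=c.mass / 2) for c in clusters]
    return halves, [replace(c) for c in halves]   # sent list holds independent copies


def receive(own: List[Cluster], incoming: List[Cluster], k: int) -> List[Cluster]:
    """Union both cluster lists, then merge the closest means until at most k remain."""
    clusters = own + incoming
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: abs(clusters[p[0]].mean - clusters[p[1]].mean))
        clusters[i] = merge(clusters[i], clusters[j])
        del clusters[j]                            # j > i, so index i stays valid
    return clusters


node_a = [Cluster(1.0, 1.0, 0.0), Cluster(1.0, 10.0, 0.0)]
node_b = [Cluster(1.0, 3.0, 0.0)]

node_a, sent = split_half(node_a)      # a keeps half ...
node_b = receive(node_b, sent, k=2)    # ... and b merges the other half into its clusters
print(node_a)
print(node_b)
```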
Distributed Clustering for Robust Aggregation
Our solution:
• Aggregate a mixture of Gaussian clusters
• Merge when necessary (exceeding k)
Recognize outliers: by the time we need to merge, we can estimate the distribution.
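The slides do not spell out how outliers are recognized once the distribution has been estimated. A purely hypothetical rule, for illustration only: treat any cluster holding less than a fraction `min_fraction` of the total mass as outliers, and report the mass-weighted mean of the remaining clusters as the robust aggregate. Applied to the temperature example (six readings around 26.3° and two around 109°), it recovers the regular readings' average instead of 47°.

```python
from typing import List, Tuple

Cluster = Tuple[float, float]   # (mean, mass)


def robust_average(clusters: List[Cluster], min_fraction: float) -> Tuple[float, List[Cluster]]:
    """Return (mean of the non-outlier clusters, list of outlier clusters).

    Hypothetical rule: a cluster is an outlier if its share of the total mass
    is below `min_fraction`. Assumes at least one cluster passes the threshold.
    """
    total = sum(mass for _, mass in clusters)
    regular = [c for c in clusters if c[1] / total >= min_fraction]
    outliers = [c for c in clusters if c[1] / total < min_fraction]
    regular_mass = sum(mass for _, mass in regular)
    avg = sum(mean * mass for mean, mass in regular) / regular_mass
    return avg, outliers


clusters = [(158 / 6, 6.0), (109.0, 2.0)]   # summaries of {27, 25, 26, 25, 28, 27} and {98, 120}
avg, outliers = robust_average(clusters, min_fraction=0.3)
print(round(avg, 1), outliers)   # 26.3 [(109.0, 2.0)]
```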
Simulation Results
1. Data error robustness
2. Crash robustness
3. Elaborate multidimensional data
It Works Where It Matters
[Figure: regions labeled "Not Interesting" and "Easy".]
[Plot: error (0 to 1) with and without outlier detection; the x-axis spans 0 to 25, its label is not recoverable.]
Protocol is Crash Robust
[Plot: average error (0 to 1.6) vs. round (0 to 50) for three runs: no outlier detection with 5% crash probability; no outlier detection, no crashes; outlier detection.]
Describe Elaborate Data
[Figure: temperature vs. distance, with regions labeled "Fire" and "No Fire".]
Theoretical Results (In Progress)
The algorithm converges:
• Eventually all nodes have the same clusters forever
• Note: this holds even without atomic actions
• The invariant is preserved by both send and receive
… to the “right” output:
• If outliers are “far enough” from other samples, then they are never mixed into non-outlier clusters
• They are discovered
• They do not bias the good samples’ aggregate (where it matters)
Summary
Robust Aggregation requires outlier detection
[Figure: samples 27°, 27°, 27°, 27°, 27°, 98°, 120°.]
We present outlier detection by Gaussian clustering.
[Figure: clusters being merged.]
Summary – Our Protocol
• Elaborate data
• Crash robustness
• Outlier detection (where it matters)
[Plot: average error (0 to 1.6) vs. round (0 to 50).]