High Performance Discrete Fourier Transforms on Graphics Processors
Tree-Based Density Clustering using Graphics Processors
description
Transcript of Tree-Based Density Clustering using Graphics Processors
Paradyn Project
Paradyn / Dyninst WeekCollege Park, Maryland
March 26-28, 2012
Tree-Based Density Clustering using Graphics Processors
Evan Samanas and Ben WeltonA First Marriage of MRNet and GPUs
The Tweet Stream
2Tree-Based Density Clustering using Graphics Processors
`
`
vv
Source: Twitter, Map: About.com
3Tree-Based Density Clustering using Graphics Processors
Tree-Based Overlay Networks (TBONs)
o Scalable multicast
o Scalable gather
o Scalable data aggregation
FE
…… …BE
appappappapp
BE
appappappapp
BE
appappappapp
BE
appappappapp
CP CP
CP CP CP CP
4Tree-Based Density Clustering using Graphics Processors
MRNet – Multicast / Reduction Networko General-purpose
TBON APIo Network: user-defined
topologyo Stream: logical data channel
o to a set of back-endso multicast, gather, and custom
reductiono Packet: collection of datao Filter: stream data operator
o synchronizationo transformation
o Widely adopted by HPC toolso CEPBA toolkit o Cray ATP & CCDBo Open|SpeedShop & CBTFo STATo TAU
FE
…… …BE
appappappapp
BE
appappappapp
BE
appappappapp
BE
appappappapp
CP CP
CP CP CP CP
F(x1,…,xn)
TBON Computation
5Tree-Based Density Clustering using Graphics Processors
FE
BE
appappappapp
BE
appappappapp
BE
appappappapp
CP CP
BE
appappappapp
Ideal Characteristics:o Filter output size constant or decreasing
o Computation rate similar across levels
o Adjustable for load balance
Data Size: 10MB per BE
Packet Size: ≤10 MB
Packet Size:≤10 MB
~10 sec
~40 sec
…4x
~10 sec
~10 sec
~10 sec
…Data to
process: e.g. 40 MBTotal Time: ~30
secTotal Time: ~60
sec
Why GPUs?
6Tree-Based Density Clustering using Graphics Processors
FE
…… …BE
appappappapp
BE
appappappapp
BE
appappappapp
CP CP
CP CP CP CP
BE
appappappapp
o Natural fito Increase compute powero Trade computation for bandwidtho Derived summarieso Compute and send ∆ data
o Compression algorithms (e.g. LZO, zlib, etc.)
F(x1,…,xn)
Goal: Find regions that meet minimum density and spatial
distance characteristics
The two parameters that determine if a point is in a cluster
is Epsilon (Eps), and MinPtsIf the number of points in Eps is > MinPts, the point is a core point.For every discovered point, this
same calculation is performed until the cluster is fully expanded
Clustering Example (DBSCAN[1])
7Tree-Based Density Clustering using Graphics Processors
EpsMin Points
Min Pts: 3
[1] M. Ester et. al., A density-based algorithm for discovering clusters in large spatial databases with noise, (1996)
Scaling DBSCANo PDBSCAN[2]
oQuality equivalent to single DBSCANo Linear speedup up to 8 nodes
o DBDC[3]
o Sacrifices qualityo~30x speedup on 15 nodes
o CUDA-Dclust[4]
oQuality equivalent to DBSCANo~15x faster on 1 node
8Tree-Based Density Clustering using Graphics Processors
[2] X. Xu et. al., A fast Parallel Clustering Algorithm for Large Spatial Databases (1999)[3] E. Januzaj et. al., DBDC: Density Based Distributed Clustering (2004)[4] C. Bohm et al., Density-based clustering using graphics processors (2009)
Tree-Based Clustering: Mr. Scan
9Tree-Based Density Clustering using Graphics Processors
FE
BE BE BE
CP CP
BE
DBSCAN
Algorithm Steps
SpatialDecomp: CPU(@ FE)
DBSCAN: CPU or GPU(@ BE)
DrawBoundBox: CPU or GPU
MergeCluster: CPU (x #levels)
MergeCluster
Spatial Decomposition
10Tree-Based Density Clustering using Graphics Processors
1. Start with an input of Spatially Referenced points
2. Partition the region into equal sized density regions across one dimension
3. Add the shadow region area of one Epsilon to all density regions
Partition #1 Partition #2 Partition #3
Eps
DBSCAN - CPUo Run on local slice
for each BEo R* treeo Start at random
pointo Cluster w/ respect
to Eps, MinPtso Complexity: O(n
log n)11Tree-Based Density Clustering using Graphics
Processors
A B
C D E
R* Tree Example
GPU DBSCAN Filter
12Tree-Based Density Clustering using Graphics Processors
Candidate #1
Candidate #2
.
.Candidate #N
Candidate Clusters are potential clusters being explored concurrently in
the GPU
Number of cluster candidates is limited by GPU Characteristics
State array stores the current state of points in the search space
States
GPU DBSCAN operates similarly to the CPU version with two exceptions
Coarse grain search space
Chain Collisions causing cluster merges
CUDA-DCLUST [09 – Böhm]
DrawBoundBox – CPU | | GPU
13Tree-Based Density Clustering using Graphics Processors
MergeBoundBox - CPUo Checks for merge if box within
shadowo At least one point MUST be in
commono Iterate through ALL points in right
cluster
14Tree-Based Density Clustering using Graphics Processors
Match!
The Tweet Stream
15Tree-Based Density Clustering using Graphics Processors
`
`
Source: Twitter, Map: About.com
Evaluationo Dataset: 1-3 “Tweet Days”o Measuring:
o Time to completionoQuality compared to single-threaded
DBSCANo Algorithms:
o Single-Threaded DBSCANoDBDCoMRNet w/DBSCAN filteroMRNet w/DBSCAN GPU filter
16Tree-Based Density Clustering using Graphics Processors
Results
One Day Three Day0
5
10
15
20
25
30
ELKISingle ThreadCPU - MR. ScanGPU - MR. Scan
Tim
e (H
ours
)
17Tree-Based Density Clustering using Graphics Processors
Results
1x4 1x16 1x2x160
5000
10000
15000
20000
25000
DecompGPU RunCPU Run
Topology
Runn
ing
Tim
e (s
ec)
18Tree-Based Density Clustering using Graphics Processors
Results
19Tree-Based Density Clustering using Graphics Processors
Quality
20Tree-Based Density Clustering using Graphics Processors
0 10 20 30 40 50 6070%
75%
80%
85%
90%
95%
100%Quality of 1 tweet day
Mr. Scan w/CPU; 0 in-ternal nodesMr. Scan w/CPU; 2 in-ternal nodesMr. Scan w/GPU; 0 in-ternal nodesMr. Scan w/GPU; 2 in-ternal nodesDBDC
# Backend Nodes
Qua
lity
Future Worko Scaling Issues
o Spatial DecompositionoMerging Algorithm
21Tree-Based Density Clustering using Graphics Processors
2D Spatial Decomposition
22Tree-Based Density Clustering using Graphics Processors
- 1D Spatial Decomposition has some severe limitations
7pts 11pts
- Splits can have wildly differing point counts- Number of splits limited by Epsilon
Eps
- 2D Spatial Decomposition would allow for more Splits with more equal point counts
Merging Algorithmo Bounding Box
oNo quality degradation
o Limits iteration, but still iterates
23Tree-Based Density Clustering using Graphics Processors
oPossible Alternative: Concave HulloDBSCAN on border
pointsoAvoids iterationoConstant output
sizeoO(n2) on BEs
Use Caseso Twitter Data
o “Flu Tweets”oMood/Topic clusteringoRiot prediction
o Any Spatial dataoCurrently limited to
2D
24Tree-Based Density Clustering using Graphics Processors
Conclusiono DBSCAN performance scaleso Quality able to be maintainedo Goal: Scale O(100,000 nodes)
25Tree-Based Density Clustering using Graphics Processors