K-Means with BSP
![Page 1: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/1.jpg)
K-Means Clustering with BSP
Thomas Jungblut, Testberichte.de, 2012
Study assignment 4th semester, HWR Berlin
![Page 2: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/2.jpg)
Content
What is K-Means Clustering?
What is BSP?
K-Means with BSP
![Page 3: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/3.jpg)
What is K-Means Clustering?
![Page 4: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/4.jpg)
What is K-Means Clustering?
![Page 5: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/5.jpg)
![Page 6: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/6.jpg)
![Page 7: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/7.jpg)
![Page 8: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/8.jpg)
![Page 9: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/9.jpg)
Unsupervised Learning
Huge number of input vectors
k initial centers
Two step iterative algorithm
Assignment
Update
What is K-Means Clustering?
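The two-step iteration listed on this slide can be sketched in a few lines of plain Python. This is a minimal, sequential illustration and not the implementation from the talk; the function names `assign`, `update`, and `kmeans`, the Euclidean distance, and the "stop when the centers no longer change" rule are assumptions made for the example.

```python
import math

def assign(vectors, centers):
    """Assignment step: index of the nearest center for every vector."""
    return [
        min(range(len(centers)), key=lambda c: math.dist(v, centers[c]))
        for v in vectors
    ]

def update(vectors, assignment, centers):
    """Update step: each center becomes the mean of its assigned vectors."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [v for v, a in zip(vectors, assignment) if a == c]
        if not members:
            new_centers.append(old)  # keep a center that attracted no vectors
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers

def kmeans(vectors, centers, max_iterations=100):
    """Alternate assignment and update until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iterations):
        assignment = assign(vectors, centers)
        new_centers = update(vectors, assignment, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(vectors, centers)
```

Calling `kmeans(vectors, initial_centers)` returns the final centers together with the index of the center each vector ended up with.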
![Page 10: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/10.jpg)
How do we parallelize K-Means?
![Page 11: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/11.jpg)
BSP = Bulk Synchronous Parallel
Paradigm to design parallel algorithms
Two basic operations
Send message
Barrier synchronization
What is BSP?
![Page 12: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/12.jpg)
What is BSP?
[Diagram: processes P1, P2 and P3; a superstep consists of a computation phase and a communication phase, and is closed by a barrier sync]
![Page 13: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/13.jpg)
Messages are queued during the computation phase
Between two barrier synchronizations, messages are exchanged in bulk
Messages sent in one superstep are available in the next superstep
What is BSP?
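The message semantics described here (queue during computation, deliver in bulk at the barrier, read in the next superstep) can be illustrated with a toy, single-threaded simulation. `BSPSimulator`, `run_superstep`, and the two compute functions are hypothetical names invented for this sketch; they are not the API of any real BSP framework.

```python
class BSPSimulator:
    """Toy, single-threaded stand-in for a BSP runtime (hypothetical API)."""

    def __init__(self, num_peers):
        self.num_peers = num_peers
        self.inbox = [[] for _ in range(num_peers)]   # readable in this superstep
        self.outbox = [[] for _ in range(num_peers)]  # queued for the next one

    def send(self, target, message):
        """Queue a message; the target cannot see it before sync()."""
        self.outbox[target].append(message)

    def sync(self):
        """Barrier: deliver all queued messages in bulk and start a new superstep."""
        self.inbox = self.outbox
        self.outbox = [[] for _ in range(self.num_peers)]

def run_superstep(bsp, compute):
    """Run compute(peer, received_messages) on every peer, then synchronize."""
    for peer in range(bsp.num_peers):
        compute(peer, bsp.inbox[peer])
    bsp.sync()

bsp = BSPSimulator(num_peers=3)
seen = {}

def broadcast_id(peer, received):
    # Superstep 0: every peer sends its own id to all peers.
    for target in range(bsp.num_peers):
        bsp.send(target, peer)

def record(peer, received):
    # Superstep 1: the messages from superstep 0 are now available.
    seen[peer] = sorted(received)

run_superstep(bsp, broadcast_id)
run_superstep(bsp, record)
```

A real runtime would execute the peers in parallel and block each of them at the barrier; the sequential loop only mimics the visibility rules.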
![Page 14: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/14.jpg)
K-Means with BSP
Partition the dataset into equal sized blocks
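One way to produce such blocks is sketched below. The helper name `partition` and the choice of contiguous blocks whose sizes differ by at most one are assumptions for illustration; the talk does not say how the split is done.

```python
def partition(vectors, num_tasks):
    """Split vectors into num_tasks contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(vectors), num_tasks)
    blocks, start = [], 0
    for task in range(num_tasks):
        size = base + (1 if task < extra else 0)  # the first `extra` blocks get one more
        blocks.append(vectors[start:start + size])
        start += size
    return blocks
```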
![Page 15: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/15.jpg)
K-Means with BSP
Put centers into RAM on each process
Iterate sequentially over vectors on disk
Sum assigned vectors to a new temporary center object
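A rough sketch of this per-process work follows, assuming Euclidean distance and a block that can be streamed once (for example a generator reading from disk). The function name `local_assignment` and the `(sums, counts)` return shape are hypothetical choices for the example.

```python
import math

def local_assignment(block, centers):
    """Stream over one block and accumulate, per center, the sum of the
    assigned vectors and how many vectors were added to that sum."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for vector in block:  # a single sequential pass, nothing is kept in memory
        nearest = min(range(len(centers)), key=lambda c: math.dist(vector, centers[c]))
        for d in range(dim):
            sums[nearest][d] += vector[d]
        counts[nearest] += 1
    return sums, counts
```

Only the centers and the running sums stay in memory, so the block itself can be far larger than RAM.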
![Page 16: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/16.jpg)
K-Means with BSP
[Diagram: every process holds its own copy of the centers]
![Page 17: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/17.jpg)
K-Means with BSP
Centers
Sums
• Center 1: sum = 25, summed 5 times
• Center 2: sum = 50, summed 10 times
• Center 3: sum = 10, summed 5 times
![Page 18: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/18.jpg)
K-Means with BSP
[Diagram: four processes, each holding the centers and its local sum]
Send the sum
![Page 19: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/19.jpg)
![Page 20: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/20.jpg)
K-Means with BSP
[Diagram: the received sums are added to a total sum, divided by the total increments to get the means, and the means become the new centers]
• The same calculation runs on every process
• Floating point error can be corrected by synchronizing when it exceeds a given threshold
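The update calculation can be sketched as follows. The function name `merge_and_average` and the message layout (one `(sums, counts)` pair per process) are assumptions for illustration. In the usage below, the first message reuses the one-dimensional numbers from the earlier slide (25/5, 50/10, 10/5); the second message is made up to show the merge.

```python
def merge_and_average(messages, old_centers):
    """messages: one (sums, counts) pair per process, both indexed by center."""
    k, dim = len(old_centers), len(old_centers[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in messages:
        for c in range(k):
            for d in range(dim):
                total_sums[c][d] += sums[c][d]
            total_counts[c] += counts[c]
    new_centers = []
    for c in range(k):
        if total_counts[c] == 0:
            new_centers.append(tuple(old_centers[c]))  # no vector assigned anywhere
        else:
            # Divide the total sum by the total number of increments.
            new_centers.append(tuple(s / total_counts[c] for s in total_sums[c]))
    return new_centers

messages = [
    ([[25.0], [50.0], [10.0]], [5, 10, 5]),  # numbers from the earlier slide
    ([[5.0], [0.0], [10.0]], [5, 0, 5]),     # made-up second process
]
new_centers = merge_and_average(messages, [(4.0,), (5.0,), (1.0,)])
```

Because every process receives the same messages, each one can run this function locally and arrive at the same centers, up to floating point differences if the messages are summed in a different order.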
![Page 21: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/21.jpg)
K-Means with BSP
Assignment
Sync
Update
![Page 22: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/22.jpg)
Partition vectors into equal-sized blocks (# blocks = # tasks)
Put centers in RAM
Assignment phase
• Iterate over the vectors on disk sequentially
• Sum up temporary centers with assigned vectors
• Message all tasks with the sum and how often something was summed
Update phase
• Calculate the total sum over all received messages and average it
• Replace old centers with new centers and calculate convergence
K-Means with BSP
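Putting the phases of this summary together, a single-process simulation might look like the sketch below. It is not the KMeansBSP implementation: the tasks run one after another instead of in parallel, the barrier is only marked by a comment, and the function name `kmeans_bsp_simulated`, the `tolerance` parameter, and the "largest center shift" convergence test are assumptions.

```python
import math

def nearest_center(vector, centers):
    return min(range(len(centers)), key=lambda c: math.dist(vector, centers[c]))

def kmeans_bsp_simulated(blocks, centers, max_supersteps=50, tolerance=1e-9):
    """blocks: one list of vectors per task; centers: initial centers."""
    centers = [tuple(c) for c in centers]
    k, dim = len(centers), len(centers[0])
    for _ in range(max_supersteps):
        # Assignment phase: each task builds one (sums, counts) message.
        messages = []
        for block in blocks:
            sums = [[0.0] * dim for _ in range(k)]
            counts = [0] * k
            for vector in block:
                c = nearest_center(vector, centers)
                counts[c] += 1
                for d in range(dim):
                    sums[c][d] += vector[d]
            messages.append((sums, counts))
        # A barrier sync would happen here; afterwards every task holds all messages.
        # Update phase: the same calculation on every task, shown once.
        new_centers = []
        for c in range(k):
            total = sum(cnt[c] for _, cnt in messages)
            if total == 0:
                new_centers.append(centers[c])
                continue
            new_centers.append(
                tuple(sum(s[c][d] for s, _ in messages) / total for d in range(dim))
            )
        # Convergence: stop once no center moved more than the tolerance.
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tolerance:
            break
    return centers
```

Each loop iteration corresponds to one assignment phase, one sync, and one update phase.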
![Page 23: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/23.jpg)
16 servers, 256 cores, 10G network
Benchmark
80 seconds!
Possible starvation: add more servers
![Page 24: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/24.jpg)
Logarithmic scaling
Much better than linear scaling of MapReduce
Benchmark
![Page 25: K-Means with BSP](https://reader034.fdocuments.net/reader034/viewer/2022052321/55626329d8b42ae87d8b4e52/html5/thumbnails/25.jpg)
Implementation on Github
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/clustering/KMeansBSP.java
Will be committed to Hama's ML package soon
https://issues.apache.org/jira/browse/HAMA-547
Misc