In-place Super Scalar Samplesort - KIT · algo2.iti.kit.edu/sanders/courses/algen17/IPSSSSo.pdf ·...
In-place (Parallel) Super Scalar Samplesort
Algorithm Engineering Lecture · June 13, 2017
Michael Axtmann, Sascha Witt, Daniel Ferizovic, Peter Sanders
KIT – University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
Institute of Theoretical Informatics · Algorithmics Group
www.kit.edu
M. Axtmann, S. Witt, D. Ferizovic, P. Sanders – In-place Super Scalar Samplesort · Institute of Theoretical Informatics, Algorithmics Group
Overview
                                  Quicksort   BlockQuicksort   SSSS        ISSSS
Decoupled control and data flow   no          yes              yes         yes
Conditional branches              yes         no               no          no
Data transfers / element          ≈ log_2 n   ≈ log_2 n        ≈ log_k n   ≈ log_k n
Additional space                  O(1)        O(b)             O(n)        O(kb)
Parallelization                   yes         no               no          yes
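To make the "data transfers per element" row concrete, the following small sketch compares the number of distribution levels for binary and k-way splitting. The values n = 2^30 and k = 256 are illustrative assumptions, not figures from the slides, and the helper name `levels` is hypothetical.

```python
import math

def levels(n, fanout):
    """Approximate number of distribution levels, log_fanout(n)."""
    return math.log2(n) / math.log2(fanout)

n = 2 ** 30                 # assumed input size, for illustration only
print(levels(n, 2))         # 30.0 levels with 2-way partitioning
print(levels(n, 256))       # 3.75 levels with an assumed k = 256
```

Under these assumptions each element is moved roughly eight times less often with k-way distribution than with binary partitioning.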
BlockQuicksort
Goals:
- Partially decoupling control flow from data flow
- Avoid conditional branches
- In-place: O(b) additional space
Drawbacks:
- $O\left(\frac{n}{b}\log_2\frac{n}{n_0}\right)$ block transfers
[Figure labels: Pivot; b Elements]
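The block-wise idea behind these goals can be sketched as follows. This is a simplified illustration, not the authors' code: the function name `block_partition`, the default block size `B = 4`, partitioning around a pivot value, and the scalar fallback for the remainder are all assumptions made for this example. The scan loops record offsets and advance a counter by the result of a comparison, so classification involves no data-dependent branch; Python itself gives no performance benefit here.

```python
def block_partition(a, pivot, B=4):
    """Partition list a in place around the value pivot.

    Returns p such that all of a[:p] <= pivot and all of a[p:] >= pivot.
    Uses two offset buffers of B entries each, i.e. O(B) extra space.
    """
    l, r = 0, len(a) - 1
    offs_l, offs_r = [0] * B, [0] * B
    num_l = num_r = start_l = start_r = 0

    while r - l + 1 > 2 * B:
        if num_l == 0:                      # scan a fresh left block
            start_l = 0
            for i in range(B):
                offs_l[num_l] = i           # always store the offset ...
                num_l += a[l + i] >= pivot  # ... keep it only if misplaced
        if num_r == 0:                      # scan a fresh right block
            start_r = 0
            for i in range(B):
                offs_r[num_r] = i
                num_r += a[r - i] <= pivot
        num = min(num_l, num_r)
        for j in range(num):                # swap misplaced pairs
            x = l + offs_l[start_l + j]
            y = r - offs_r[start_r + j]
            a[x], a[y] = a[y], a[x]
        num_l -= num
        num_r -= num
        start_l += num
        start_r += num
        if num_l == 0:
            l += B                          # left block fully processed
        if num_r == 0:
            r -= B                          # right block fully processed

    # Assumed simplification: finish the remaining range with a plain
    # scalar partition instead of the block-wise cleanup.
    i, j = l, r
    while i <= j:
        if a[i] < pivot:
            i += 1
        elif a[j] > pivot:
            j -= 1
        else:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    return i
```

For example, `p = block_partition(data, 50)` rearranges `data` so that every element before index `p` is at most 50 and every element from `p` on is at least 50.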
In-place Super Scalar Samplesort
Goals:
- Partially decoupling control flow from data flow
- Avoid conditional branches
- k-way distribution
- Cache/IO-efficient: $O\left(\frac{n}{tb}\log_k\frac{n}{n_0}\right)$ block transfers
- In-place: O(kb) additional space
- Easy to parallelize
In-place Super Scalar Samplesort
[Figure: the input array after each of the three phases: local classification, block permutation, cleanup]
In-place Super Scalar Samplesort
Local classification:
- k-way decision tree without branches
- k buffer blocks of size b
- Flush a buffer block when it is full
[Figure labels: k buffer blocks; Flush?]
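The local classification step can be sketched as below. This is a minimal sequential illustration under stated assumptions, not the authors' implementation: the function names, the requirement that k be a power of two, and the returned tuple are hypothetical. The splitters are stored as an implicit binary tree, and the descent `i = 2*i + (x > tree[i])` replaces a conditional branch with index arithmetic. Full buffers are flushed back to the front of the input, which is safe because at most as many elements are written as have already been read.

```python
def build_tree(splitters):
    """Store k-1 sorted splitters as an implicit tree (root at index 1)."""
    k = len(splitters) + 1
    tree = [None] * k

    def fill(node, lo, hi):
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        tree[node] = splitters[mid]
        fill(2 * node, lo, mid)
        fill(2 * node + 1, mid + 1, hi)

    fill(1, 0, k - 1)
    return tree

def local_classification(a, splitters, b):
    """Classify a in place into blocks of b elements.

    Returns (write, block_bucket, buffers): a[:write] holds full blocks,
    block_bucket[j] is the bucket of block j, and buffers holds the
    partially filled buffer blocks left over.
    """
    k = len(splitters) + 1            # assumed to be a power of two
    levels = k.bit_length() - 1       # log2(k) tree levels
    tree = build_tree(splitters)
    buffers = [[] for _ in range(k)]
    block_bucket = []
    write = 0
    for read in range(len(a)):
        x = a[read]
        i = 1
        for _ in range(levels):
            i = 2 * i + (x > tree[i])  # comparison result used as 0/1
        bucket = i - k
        buf = buffers[bucket]
        buf.append(x)
        if len(buf) == b:              # buffer block is full: flush it
            a[write:write + b] = buf
            write += b
            block_bucket.append(bucket)
            buf.clear()
    return write, block_bucket, buffers
```

With splitters `[10, 20, 30]`, bucket j receives the elements in (s_{j-1}, s_j], so an element equal to a splitter goes to the bucket on its left.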
In-place Super Scalar Samplesort
Block permutation
Invariant:
- Permuted blocks [b_i, w_i)
- Unpermuted blocks [w_i, r_i]
- Empty blocks (r_i, b_{i+1})
Two buffers to swap blocks
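The effect of block permutation can be illustrated with a much simplified sequential sketch. It keeps only the bucket boundaries b_i and write pointers w_i and swaps whole blocks directly. Empty blocks, the read pointers r_i, and the two swap buffers from the slide are deliberately left out, so this is an assumption-laden stand-in rather than the algorithm presented here. The names `permute_blocks` and `bucket_of` are hypothetical.

```python
def permute_blocks(blocks, bucket_of, k):
    """Rearrange blocks in place so that buckets 0..k-1 are contiguous.

    blocks is a list of blocks, each holding elements of one bucket;
    bucket_of(block) returns that bucket. Returns the boundaries b,
    where bucket i occupies blocks[b[i]:b[i+1]].
    """
    counts = [0] * k
    for blk in blocks:
        counts[bucket_of(blk)] += 1
    b = [0] * (k + 1)
    for i in range(k):
        b[i + 1] = b[i] + counts[i]

    w = b[:k]                          # write pointer per bucket
    for i in range(k):
        while w[i] < b[i + 1]:
            dest = bucket_of(blocks[w[i]])
            if dest == i:
                w[i] += 1              # block is already in its bucket
            else:
                j = w[dest]            # next free slot of the target bucket
                blocks[w[i]], blocks[j] = blocks[j], blocks[w[i]]
                w[dest] += 1
    return b
```

Throughout the loop, blocks in [b[i], w[i]) already belong to bucket i, which mirrors the "permuted blocks" part of the invariant.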
[Figure: animation of the block permutation with pointers b1 to b5, w1 to w4, r1 to r4, and the swap buffers]
In-place Super Scalar Samplesort
Discussion
Why blocks?
- Just n + n/b classifications per level
- Reduced TLB misses
  - Local classification: buffers on same page
- Write and read whole blocks
  - Better prefetching and fewer cache misses
In-place Super Scalar Samplesort
[Figure: running time / (n log_2 n) in µs (0 to 6) versus item count n (2^8 to 2^29). Sequential on two Xeon E5-2683 v4 16-core processors, Uniform input. Algorithms: IS4o, BlockQ, std-sort, s3-sort, DualPivot]
In-place Parallel Super Scalar Samplesort
Shared-memory parallelism with t threads
- Local classification: divide input into stripes, one for each thread
- Block permutation: fetch blocks atomically
  - Atomic read pointers and end pointers
  - Access with fetch-and-add operations
- Blocks of size Ω(t) to avoid starvation
- Buffers for each thread
- Call sequential subroutines in parallel if n ≤ n_init/t
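Fetching blocks with fetch-and-add can be sketched as follows. Python has no built-in atomic fetch-and-add, so this example emulates one with a lock. The class `AtomicCounter` and the function `claim_blocks` are hypothetical names for illustration, and the sketch only shows how threads claim disjoint block indices, not the permutation itself.

```python
import threading

class AtomicCounter:
    """Lock-based stand-in for a hardware fetch-and-add (an assumption)."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def fetch_add(self, delta):
        """Add delta and return the value held before the addition."""
        with self._lock:
            old = self._value
            self._value += delta
            return old

def claim_blocks(num_blocks, t):
    """Let t threads claim block indices 0..num_blocks-1, each exactly once."""
    read = AtomicCounter(0)            # shared read pointer
    claimed = [[] for _ in range(t)]   # per-thread list of claimed blocks

    def worker(tid):
        while True:
            idx = read.fetch_add(1)    # atomically fetch the next block
            if idx >= num_blocks:
                break
            claimed[tid].append(idx)

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(t)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return claimed
```

Because every index is handed out by a single atomic increment, no two threads ever process the same block, regardless of scheduling.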
In-place Parallel Super Scalar Samplesort
[Figure: running time / (n log_2 n) in µs (0 to 0.6) versus item count n (2^15 to 2^30). One of two Intel Xeon E5-2683 v4 16-core processors, Uniform input. Algorithms: IPS4o, MCSTLubq, MCSTLbq, PBBS, MCSTLmwm, TBB]
In-place Parallel Super Scalar Samplesort
[Figure: running time / (n log_2 n) in µs (0 to 0.6) versus item count n (2^15 to 2^30). Two Intel Xeon E5-2683 v4 16-core processors, Uniform input. Algorithms: IPS4o, MCSTLubq, MCSTLbq, PBBS, MCSTLmwm, TBB]
In-place Parallel Super Scalar Samplesort
[Figure: running time / (n log_2 n) in µs (0 to 0.6) versus item count n (2^15 to 2^30). Two Intel Xeon E5-2683 v4 16-core processors, Exponential input. Algorithms: IPS4o, MCSTLubq, MCSTLbq, PBBS, MCSTLmwm, TBB]
In-place Parallel Super Scalar Samplesort
[Figure: running time / (n log_2 n) in µs (0 to 0.6) versus item count n (2^15 to 2^30). Two Intel Xeon E5-2683 v4 16-core processors, RootDup input. Algorithms: IPS4o, MCSTLubq, MCSTLbq, PBBS, MCSTLmwm, TBB]
In-place Parallel Super Scalar Samplesort
[Figure: running time / (n log_2 n) in µs (0 to 0.6) versus item count n (2^15 to 2^30). Two Intel Xeon E5-2683 v4 16-core processors, TwoDup input. Algorithms: IPS4o, MCSTLubq, MCSTLbq, PBBS, MCSTLmwm, TBB]
In-place Parallel Super Scalar Samplesort
[Figure: running time / (n log_2 n) in µs (0 to 0.6) versus item count n (2^15 to 2^30). Two Intel Xeon E5-2683 v4 16-core processors, AlmostSorted input. Algorithms: IPS4o, MCSTLubq, MCSTLbq, PBBS, MCSTLmwm, TBB]