Recursive Design of Hardware Priority Queues
Liron Schiff* (TAU)
Joint work with Yehuda Afek (TAU) and Anat Bremler-Barr (IDC)
*Supported by European Research Council (ERC) Starting Grant no. 259085
Priority Queue (PQ)
• Interface:
  – PQ.Insert(): insert an item
  – PQ.GetMin(): remove and return the minimum item
  – PQ.Delete(): just remove the minimum item
  – PQ.Peek(): just return the minimum item
• The higher an item's priority, the smaller its value
[Figure: a priority queue with Insert and GetMin operations]
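For reference, the four interface operations can be sketched in software with a binary heap (Python's `heapq`). This is only a stand-in to fix the semantics: the snake_case method names are hypothetical, and a heap is exactly the kind of slow, software-only solution the hardware designs aim to replace.

```python
import heapq
from typing import List, Optional


class PriorityQueue:
    """Software PQ exposing the four operations from the slide.

    Smaller values mean higher priority, so the minimum is the most urgent item.
    """

    def __init__(self) -> None:
        self._heap: List[int] = []

    def insert(self, x: int) -> None:
        heapq.heappush(self._heap, x)

    def get_min(self) -> int:
        # Remove and return the minimum item.
        return heapq.heappop(self._heap)

    def delete(self) -> None:
        # Remove the minimum item without returning it.
        heapq.heappop(self._heap)

    def peek(self) -> Optional[int]:
        # Return the minimum item without removing it (None if empty).
        return self._heap[0] if self._heap else None

    def __len__(self) -> int:
        return len(self._heap)


pq = PriorityQueue()
for v in [14, 33, 9, 13]:
    pq.insert(v)
print(pq.peek())     # 9
print(pq.get_min())  # 9
pq.delete()          # removes 13
print(pq.peek())     # 14
```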
Priority Queue Applications
• Networking: scheduling packets
  – Many flows (1M)
  – High rate (100 Mpps)
• More applications: scientific simulators, databases
[Figure: a priority queue acting as a packet scheduler]
Two Existing Approaches
• Dedicated hardware solutions: fast, but non-scalable
• Common software solutions: scalable, but slow
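Our Approach: The Powering Technique
• Merge-sort concept: sort short lists, then merge them
• Base Priority Queue (BPQ): a HW PQ of size √N
• 3 × BPQ (size √N) + RAM (size N) = PQ of size N
[Figure: the Sort and Merge phases of the construction]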
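The merge-sort concept behind powering can be sketched in software: sort blocks of about √N items (the job of a full Input BPQ), then merge the sorted blocks with a small heap that holds only the current head of each block (the job of the Exit BPQ). This is an illustrative sketch with Python's `heapq` standing in for the hardware BPQs; the function name `sort_then_merge` is hypothetical.

```python
import heapq
import math
from typing import List


def sort_then_merge(items: List[int]) -> List[int]:
    """Sort sqrt(N)-sized blocks, then k-way merge them by their heads."""
    n = len(items)
    block = max(1, math.isqrt(n))  # block size ~ sqrt(N)
    # "Sort" phase: each block plays the role of one full Input BPQ.
    runs = [sorted(items[i:i + block]) for i in range(0, n, block)]
    # "Merge" phase: a small heap holds only the current head of each run,
    # like the Exit BPQ holding one candidate per sorted list.
    heads = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heads)
    out: List[int] = []
    while heads:
        value, r, pos = heapq.heappop(heads)
        out.append(value)
        if pos + 1 < len(runs[r]):
            heapq.heappush(heads, (runs[r][pos + 1], r, pos + 1))
    return out


print(sort_then_merge([0, 3, 5, 8, 4, 7, 2, 6, 1]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The merge heap never holds more entries than there are blocks (about √N), which is why a small BPQ suffices for the Exit side.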
The Powering Technique
• Insert(x) uses the Input BPQ.
• When the Input BPQ gets full (√N items), its items move to the Exit side as a sorted list.
• Get_min() extracts the minimum of the Exit BPQ and the Input BPQ, and then updates the Exit BPQ (if needed).
[Figure: running example with √N = 3; full Inputs become the sorted lists (0, 3, 5), (4, 7, 8), and (1, 2, 6), while 9 waits in the Input BPQ]
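A minimal software sketch of this simple (non-recursive) idea follows, with `heapq` heaps standing in for the Input and Exit BPQs and Python lists standing in for RAM. The class and field names are hypothetical. It deliberately keeps the two weaknesses discussed next: RAM lists are never merged or reclaimed, and draining a full Input takes √N steps inside a single Insert.

```python
import heapq
from typing import List, Tuple


class SimplePoweredPQ:
    """Simple powering idea with heaps as stand-in BPQs.

    - input_bpq: holds at most `block` newly inserted items.
    - lists: sorted lists in RAM, one per Input BPQ that filled up.
    - exit_bpq: (head value, list index, position) for each non-empty list.
    """

    def __init__(self, block: int) -> None:
        self.block = block
        self.input_bpq: List[int] = []
        self.lists: List[List[int]] = []
        self.exit_bpq: List[Tuple[int, int, int]] = []

    def insert(self, x: int) -> None:
        heapq.heappush(self.input_bpq, x)
        if len(self.input_bpq) == self.block:
            # Input is full: drain it in sorted order into a new RAM list
            # and expose that list's head to the Exit BPQ.
            run = [heapq.heappop(self.input_bpq) for _ in range(self.block)]
            self.lists.append(run)
            heapq.heappush(self.exit_bpq, (run[0], len(self.lists) - 1, 0))

    def get_min(self) -> int:
        # The minimum is the smaller of the Exit head and the Input head.
        # Raises IndexError if the queue is empty.
        exit_wins = self.exit_bpq and (
            not self.input_bpq or self.exit_bpq[0][0] <= self.input_bpq[0])
        if exit_wins:
            value, r, pos = heapq.heappop(self.exit_bpq)
            # Update the Exit BPQ with the next item of the same list, if any.
            if pos + 1 < len(self.lists[r]):
                heapq.heappush(self.exit_bpq, (self.lists[r][pos + 1], r, pos + 1))
            return value
        return heapq.heappop(self.input_bpq)


pq = SimplePoweredPQ(block=3)
for v in [3, 0, 5, 8, 7, 4, 6, 2, 1, 9]:
    pq.insert(v)
result = [pq.get_min() for _ in range(10)]
print(result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```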
Outline
• Difficulties with the simple idea
• Applying the construction recursively
• Exemplifying on TCAM base units
• Evaluation
Two Difficulties with the Simple Idea
1. More than √N lists in the Exit module (as lists are emptied while a capacity of N is maintained)
2. Moving a list from Input to Exit in O(1) operations
Difficulty 1
• Maintaining capacity N while lists are shrinking
• We continually merge inactive lists during Insert, so the number of lists tracked by the Exit BPQ stays bounded
[Figure: running example; lists shrink as minima are extracted, new items (9, 10, 11) fill the Input BPQ, and inactive lists are merged]
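"Continually merge during Insert" means spreading a merge over many operations instead of doing it all at once. The sketch below shows only that deamortization pattern for two sorted lists: each call to `step()` (imagined as piggybacked on one Insert) moves at most `steps_per_call` items. The class name and `steps_per_call` are hypothetical, and the sketch ignores Get_min() reading from lists while they are being merged.

```python
from typing import List


class IncrementalMerge:
    """Merge two sorted lists a few steps at a time.

    `steps_per_call` is a hypothetical tuning parameter: how much merge work
    is piggybacked on each Insert.
    """

    def __init__(self, a: List[int], b: List[int], steps_per_call: int = 2) -> None:
        self.a, self.b = a, b
        self.i = self.j = 0
        self.out: List[int] = []
        self.steps = steps_per_call

    def done(self) -> bool:
        return self.i == len(self.a) and self.j == len(self.b)

    def step(self) -> None:
        # Called once per Insert: move up to `steps` items into the output.
        for _ in range(self.steps):
            if self.done():
                return
            take_a = self.j == len(self.b) or (
                self.i < len(self.a) and self.a[self.i] <= self.b[self.j])
            if take_a:
                self.out.append(self.a[self.i])
                self.i += 1
            else:
                self.out.append(self.b[self.j])
                self.j += 1


m = IncrementalMerge([3, 5], [7, 8], steps_per_call=2)
inserts = 0
while not m.done():
    m.step()
    inserts += 1
print(m.out, inserts)  # [3, 5, 7, 8] 2
```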
Difficulty 2
• Moving all items from the Input BPQ to RAM in O(1) time
  – Use two Input BPQs and switch between them
[Figure: an Exit BPQ, two Input BPQs, and RAM buffers]
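The two-Input trick can be sketched as double buffering: while one Input BPQ accepts new items, the other (already full) is drained into a RAM buffer one item per Insert. Because both hold `block` items, the draining one is empty exactly when the active one fills, so each Insert does O(1) BPQ work. This is an assumption-laden sketch with hypothetical names; a full design must also let Get_min() see items still in the draining BPQ and the partial buffer, which is omitted here.

```python
import heapq
from typing import List


class DoubleInput:
    """Two Input BPQs of capacity `block`, used alternately."""

    def __init__(self, block: int) -> None:
        self.block = block
        self.active: List[int] = []     # receives new items
        self.draining: List[int] = []   # full Input being moved to RAM
        self.ram_lists: List[List[int]] = []
        self.buffer: List[int] = []     # RAM list under construction

    def insert(self, x: int) -> None:
        # O(1) drain step: move the smallest item of the full Input to RAM.
        if self.draining:
            self.buffer.append(heapq.heappop(self.draining))
            if not self.draining:
                self.ram_lists.append(self.buffer)
                self.buffer = []
        heapq.heappush(self.active, x)
        if len(self.active) == self.block:
            # The draining Input got one drain step per insert, so it is
            # empty by now; just switch the roles of the two Inputs.
            assert not self.draining
            self.active, self.draining = self.draining, self.active


d = DoubleInput(block=3)
for v in [3, 0, 5, 8, 7, 4]:
    d.insert(v)
print(d.ram_lists, sorted(d.draining))  # [[0, 3, 5]] [4, 7, 8]
```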
Block Size – Time Tradeoff
• Apply the construction recursively
  – We used Exit and Input BPQs of size √N
  – We can use an Exit BPQ of size ∛N and an Input of size ∛(N²)
  – We can build each Input recursively, from an Exit BPQ and two Input BPQs of size ∛N
[Figure: the recursive structure; Insert enters at the innermost Input BPQ]
• A systolic-array-like design:
[Figure: a chain of stages, each with an Exit BPQ of size x, buffers, and RAM, ending in two Input BPQs of size x; items enter at the innermost stage]
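The effect of recursion on block size is simple arithmetic. Assuming, as on the slides, that one level of powering uses BPQs of size about √N and two levels about ∛N, k levels use BPQs of size about N^(1/(k+1)). The hypothetical helper below computes the smallest integer x with x^(k+1) ≥ N, using binary search to avoid floating-point root errors.

```python
def bpq_size(n: int, levels: int) -> int:
    """Smallest x with x ** (levels + 1) >= n (levels = recursion levels)."""
    if n < 1 or levels < 1:
        raise ValueError("n and levels must be positive")
    e = levels + 1
    lo, hi = 1, n  # hi always satisfies hi ** e >= n
    while lo < hi:
        mid = (lo + hi) // 2
        if mid ** e >= n:
            hi = mid
        else:
            lo = mid + 1
    return lo


for k in (1, 2, 3):
    print(k, bpq_size(1_000_000, k))
# 1 1000
# 2 100
# 3 32
```

For N = 10^6, each extra level shrinks the BPQs from 1000 to 100 to 32 entries, which is the block-size side of the tradeoff.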
Resulting Tradeoffs
[Table: for each number of recursion levels, #Queues × Size, #BPQ Ops. (per op.), and Parallel Op. Time (Latency)]
TCAM example

Ternary CAMs (TCAMs)
• Associative memory chips
• Properties:
  – Ternary values ('0', '1', and '*')
  – Already used in routers (IP lookup, classification)
  – High throughput (300M ops per sec for a 1 Mb TCAM)
  – Latency and cost increase dramatically with size
[Figure: a TCAM lookup; the input key is compared with every entry (e.g., 0*10**1*), and the index of the matching entry is output]
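The lookup semantics can be modeled in a few lines: each entry is a ternary string where '*' matches either bit, and the TCAM returns the index of the first (highest-priority) matching entry. A real TCAM compares the key against all entries in parallel in one operation; this sequential model (with hypothetical function names) reproduces only the result, not the timing.

```python
from typing import List, Optional


def ternary_match(entry: str, key: str) -> bool:
    # '*' matches either bit; '0'/'1' must match exactly.
    return len(entry) == len(key) and all(e == "*" or e == k for e, k in zip(entry, key))


def tcam_lookup(entries: List[str], key: str) -> Optional[int]:
    """Return the index of the first matching entry, or None if none match."""
    for index, entry in enumerate(entries):
        if ternary_match(entry, key):
            return index
    return None


table = ["0*10", "1*1*", "****"]
print(tcam_lookup(table, "0110"))  # 0
print(tcam_lookup(table, "1011"))  # 1
print(tcam_lookup(table, "1000"))  # 2
```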
TCAM-based Priority Queue
• Implied by Panigrahy & Sharma (2003)
• Three versions:
  A. O(1) time but O(w) entries per item (where w is the width of a priority value in bits)
  B. O(log w) time
  C. "Empirical O(1)" time but O(w) in the worst case
TCAM-based Priority Queue: Our Results
[Table: space (TCAM bits), time (TCAM ops.), and latency (TCAM ops.) of the original BPQ, implied by Panigrahy & Sharma (2003), versus its powered versions]
Powering the TCAM BPQ
• Using small TCAM-based PQs
  – Faster TCAM access
  – Feasible even when N is large
• Well suited to backbone routers
  – TCAMs are already used for IP lookup
Results for TCAM-based PQ
[Figure: (a) TCAM space (Kb) vs. N (thousands of items), with the TCAM size limit marked; (b) throughput (Mpps) vs. N (thousands of items). Curves: versions A, B, C and powering with k = 1 and k = 2.]
Applying to Shift-Registers
• Considering a HW PQ implementation of R. Chandra and O. Sinnen
[Figure: throughput (Mpps) vs. N (thousands of items, 1,000 to 1,024,000) for SR-BPQ, SR-PPQ(2), and SR-PPQ(3) (legend: Original, K=1, K=2), with the size limit marked]
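To see why a shift-register PQ is fast but size-limited, here is a software model of a generic shift-register priority queue (not necessarily the exact Chandra & Sinnen design). Cells stay sorted; on Insert, every cell decides locally, from its own value and its left neighbor's, whether to keep its value, take the new item, or shift right, and in hardware all cells do this in the same clock cycle. Every cell needs its own comparator, which is why such queues are hard to scale. Class and method names are hypothetical.

```python
from typing import List, Optional


class ShiftRegisterPQ:
    """Fixed-capacity shift-register PQ model; cell 0 holds the minimum."""

    def __init__(self, capacity: int) -> None:
        self.cells: List[Optional[int]] = [None] * capacity  # None = empty cell

    def insert(self, x: int) -> None:
        if self.cells[-1] is not None:
            raise OverflowError("queue is full")
        old = self.cells[:]  # register values before the clock edge
        for i in range(len(old)):
            mine_bigger = old[i] is None or old[i] > x
            left_bigger = i > 0 and (old[i - 1] is None or old[i - 1] > x)
            if not mine_bigger:
                continue                    # keep own value
            elif left_bigger:
                self.cells[i] = old[i - 1]  # shift right
            else:
                self.cells[i] = x           # x lands here

    def get_min(self) -> int:
        if self.cells[0] is None:
            raise IndexError("get_min from empty queue")
        value = self.cells[0]
        # Every cell takes its right neighbor's value in one cycle.
        self.cells = self.cells[1:] + [None]
        return value


sr = ShiftRegisterPQ(capacity=4)
for v in [5, 2, 9, 3]:
    sr.insert(v)
print(sr.cells)  # [2, 3, 5, 9]
first = sr.get_min()
print(first, sr.cells)  # 2 [3, 5, 9, None]
```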
Summary
• The Powering Technique
  – Combines small HW queues and RAM
  – Allows space-time tradeoffs
• Powering TCAMs
  – Smaller TCAMs, shorter operation time
  – Matches the lower bound for sorting with TCAMs
  – Also works for shift registers