How Emerging Memory Technologies Will Have You Rethinking Algorithm Design
Phillip B. Gibbons
Carnegie Mellon University
PODC'16 Keynote Talk, July 28, 2016
© Phillip B. Gibbons, How Emerging Memory Technologies Will Have You Rethinking Algorithm Design
50 Years of Algorithms Research
…has focused on settings in which reads & writes to memory have equal cost
But what if they have very DIFFERENT costs? How would that impact Algorithm Design?
Key Take-Aways
• Main memory will be persistent and asymmetric
– Reads much cheaper than Writes
• Very little work to date on Asymmetric Memory
– Not quite: space complexity, CRQW, RMRs, Flash, …
• Highlights of our results to date:
– Models: (M,ω)-ARAM; with parallel & block variants
– Asymmetric memory is not like symmetric memory
– New techniques for old problems
– Lower bounds for block variant are depressing
Emerging Memory Technologies
Motivation:
– DRAM (today's main memory) is volatile
– DRAM energy cost is significant (~35% of data center energy)
– DRAM density (bits/area) is limited
Promising candidates:
– Phase-Change Memory (PCM)
– Spin-Torque Transfer Magnetic RAM (STT-RAM)
– Memristor-based Resistive RAM (ReRAM)
– Conductive-bridging RAM (CBRAM)
Key properties:
– Persistent, significantly lower energy, can be higher density
– Read latencies approaching DRAM, byte-addressable
3D XPoint
Another Key Property: Writes More Costly than Reads
In these emerging memory technologies, bits are stored as "states" of the given material
• No energy to retain state
• Small energy to read state
– Low current for short duration
• Large energy to change state
– High current for long duration
Writes incur higher energy costs, higher latency, lower per-DIMM bandwidth (power envelope constraints), and endurance problems
[Figure: PCM]
Cost Examples from Literature (Speculative)
• PCM: writes 15x slower, 15x less BW, 10x more energy
• PCM L3 Cache: writes up to 40x slower, 17x more energy
• STT-RAM cell: writes 71x slower, 1000x more energy @ material level
• ReRAM DIMM: writes 117x slower, 125x more energy
• CBRAM: writes 50x more energy
Sources: [KPMWEVNBPA14] [DJX09] [XDJX11] [GZDCH13]
Costs are a well-kept secret by Vendors
Are We There Yet?
• 3D XPoint will first come out in SSD form factor
– No date announced: expectation is 2017
• Later will come out in DIMM form factor
– Main memory: Loads/Stores on memory bus
– No date announced: perhaps 2018
• Energy/density/persistence advantages: projected to become dominant main memory
In near future: Main memory will be persistent & asymmetric
Write-Efficient Algorithm Design
Goal: Design write-efficient algorithms (write-limited, write-avoiding)
• Fewer writes → lower energy, faster
How we model the asymmetry: In asymmetric memory, writes are ω times more costly than reads
Warm up: Write-efficient Sorting
How does one sort $n$ elements using $O(n \log n)$ instructions (reads) but only $O(n)$ writes?
• Swap-based sorting (e.g., quicksort, heap sort) does $O(n \log n)$ writes
• Mergesort requires $n$ writes for $\log n$ rounds
Solution:
• Insert each key in random order into a binary search tree
• An in-order tree traversal yields the sorted array
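The warm-up solution can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the `Node` class, the `write_efficient_sort` name and the read/write accounting are assumptions. It counts only writes to the tree and the output array; the shuffle and the traversal stack are not charged. Each insertion writes one node and one child pointer, so total writes are linear even though reads follow the search path.

```python
import random


class Node:
    """Binary search tree node (hypothetical helper for this sketch)."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def write_efficient_sort(keys, seed=0):
    """Sort via random-order BST insertion; return (sorted_list, reads, writes)."""
    order = list(keys)
    random.Random(seed).shuffle(order)  # assumption: shuffle cost is not counted
    reads = writes = 0
    root = None
    for key in order:
        node = Node(key)
        writes += 1                      # write the new node
        if root is None:
            root = node
            continue
        cur = root
        while True:
            reads += 1                   # read one node on the search path
            if key < cur.key:
                if cur.left is None:
                    cur.left = node
                    writes += 1          # write one child pointer
                    break
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = node
                    writes += 1          # write one child pointer
                    break
                cur = cur.right
    # Iterative in-order traversal: one output write per key.
    out, stack, cur = [], [], root
    while stack or cur is not None:
        while cur is not None:
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()
        reads += 1
        out.append(cur.key)
        writes += 1
        cur = cur.right
    return out, reads, writes
```

Under this accounting, sorting $n \ge 1$ keys performs exactly $3n - 1$ writes, while the number of reads depends on the depth of the randomly built tree.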
Asymmetric Read-Write Costs: Prior Work (1)
• Space complexity classes such as L
– Can repeatedly read input
– Only limited amount of working space
What's missing: Doesn't charge for number of writes
– OK to write every step
• Similarly, streaming algorithms have limited space
– But OK to write every step
Asymmetric Read-Write Costs: Prior Work (2)
• Reducing writes to contended shared memory vars
– Multiprocessor cache coherence serializes writes, but reads can occur in parallel
– Concurrent-read-queue-write (CRQW) model [GMR98]
– Contention in asynchronous shared memory algs [DHW97]
– Etc.
What's missing: Cost of writes to even un-contended vars
– OK to write every step to disjoint vars (disjoint cache lines)
• Similarly, reducing writes to minimize locking/synch
– But OK for sequential code to write like a maniac!
Asymmetric Read-Write Costs: Prior Work (3)
• Remote Memory References (RMR) [YA95]
– Only charge for remote memory references, i.e., references that require an interconnect traversal
– In cache-coherent multiprocessors, only charge for:
  • A read(x) that gets its value from a write(x) by another process
  • A write(x) that invalidates a copy of x at another process
– Thus, writes make it costly
What's missing: Doesn't charge for number of writes
Asymmetric Read-Write Costs: Prior Work (4)
• NAND Flash. This work focused on:
– Asymmetric granularity of writes (must erase large blocks)
– Asymmetric endurance of writes [GT05, EGMP14]
No granularity issue for emerging NVM
– Byte-addressable for both reads and writes
Individual cell endurance not big issue for emerging NVM
– Can be handled by system software
Key Take-Aways
• Main memory will be persistent and asymmetric
– Reads much cheaper than Writes
• Very little work to date on Asymmetric Memory
– Not quite: space complexity, CRQW, RMRs, Flash, …
• Highlights of our results to date:
– Models: (M,ω)-ARAM; with parallel & block variants
– Asymmetric memory is not like symmetric memory
– New techniques for old problems
– Lower bounds for block variant are depressing
You Are Here
(M,ω)-Asymmetric RAM (ARAM) [BFGGS16]
• Comprised of:
– processor executing RAM instructions on $\Theta(\log n)$-bit words
– a symmetric memory of $M$ words
– an asymmetric memory of unbounded size, with write cost $\omega$
• ARAM cost $Q(n)$:
• Time $T(n) = Q(n)$ + # of instructions
[Figure: CPU attached to a symmetric memory of $M$ words and to an asymmetric memory; edge labels 1, ω, 1, with the write to asymmetric memory labeled "write cost ω"]
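The cost accounting above can be made concrete with a small counter. This sketch rests on an assumption the slide text does not spell out: that $Q$ charges 1 per read from and $\omega$ per write to the asymmetric memory. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AramCounter:
    """Tally of ARAM costs; assumes Q = reads + omega * writes (asymmetric memory only)."""
    omega: int             # write cost of the asymmetric memory
    reads: int = 0         # reads from asymmetric memory
    writes: int = 0        # writes to asymmetric memory
    instructions: int = 0  # RAM instructions executed

    def read(self, count=1):
        self.reads += count

    def write(self, count=1):
        self.writes += count

    def instr(self, count=1):
        self.instructions += count

    @property
    def Q(self):
        # Assumed ARAM cost: unit-cost reads, omega-cost writes.
        return self.reads + self.omega * self.writes

    @property
    def T(self):
        # Time as stated on the slide: Q plus the number of instructions.
        return self.Q + self.instructions
```

For example, with `omega=10`, five reads, two writes and seven instructions give `Q == 25` and `T == 32`.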
Write-efficient Algorithms

| Problem | Read (unchanged) | Previous write | Current write | Reduction ratio |
|---|---|---|---|---|
| Comparison sort | $\Theta(n \log n)$ | $O(n \log n)$ | $\Theta(n)$ | $O(\log n)$ |
| Search tree, priority queue | $\Theta(\log n)$ | $O(\log n)$ | $\Theta(1)$ | $O(\log n)$ |
| 2D convex hull, triangulation | $\Theta(n \log n)$ | $O(n \log n)$ | $\Theta(n)$ | $O(\log n)$ |
| BFS, DFS, topological sort, bi-CC, SCC | $\Theta(n + m)$ | $O(n + m)$ | $\Theta(n)$ | $O(m/n)$ |

• Trivial
• Significant reduction. M can be O(1)
Reduction in Writes depends on M, ω, input size

| Problem | ARAM cost $Q(n,m)$: classic algorithms | ARAM cost $Q(n,m)$: new algorithms |
|---|---|---|
| Single-source shortest-path | $O(\omega(m + n \log n))$ | $O(\min(n(\omega + m/M),\ \omega(m + n \log n),\ m(\omega + \log n)))$ |
| Minimum spanning tree | $O(m\omega)$ | $O(\min(m\omega,\ m \min(\log n, n/M) + \omega n))$ |

• SSSP: Phased Dijkstra that uses phases & keeps a truncated priority queue in symmetric memory
• Write-efficient bookkeeping is often challenging
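To see how the bounds on this slide trade off against M and ω, one can evaluate them numerically. This sketch drops all constant factors and assumes base-2 logarithms; the function names are hypothetical and the numbers are only illustrative.

```python
import math


def sssp_classic(n, m, omega):
    """Classic SSSP bound: omega * (m + n log n), constants dropped."""
    return omega * (m + n * math.log2(n))


def sssp_new(n, m, omega, M):
    """New SSSP bound: minimum of the three terms listed on the slide."""
    return min(
        n * (omega + m / M),
        omega * (m + n * math.log2(n)),
        m * (omega + math.log2(n)),
    )


def mst_classic(n, m, omega):
    """Classic MST bound: m * omega."""
    return m * omega


def mst_new(n, m, omega, M):
    """New MST bound: min(m*omega, m*min(log n, n/M) + omega*n)."""
    return min(m * omega, m * min(math.log2(n), n / M) + omega * n)
```

Because the classic bound appears as one of the terms inside each minimum, the new bounds are never worse; which term wins depends on the relative sizes of n, m, M and ω.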
No (significant) improvement with cheaper reads
• New FFT lower bound technique (generalizes [HK81])
• Gap between comparison sorting & sorting networks
– No gap for classic RAM setting, PRAM, etc.

| Problem | ARAM cost $Q(n)$: classic algorithm | ARAM cost $Q(n)$: new lower bound |
|---|---|---|
| Sorting networks and Fast Fourier Transform | $\Theta\left(\omega n \frac{\log n}{\log M}\right)$ | $\Theta\left(\omega n \frac{\log n}{\log \omega M}\right)$ |
| Diamond DAG (à la LCS, edit distance) | $\Theta\left(\frac{n^2 \omega}{M}\right)$ | $\Theta\left(\frac{n^2 \omega}{M}\right)$ |
An Example of a Diamond DAG: Longest Common Subsequence (LCS)

|   | A | C | G | T | A | T |
|---|---|---|---|---|---|---|
| A | 1 | 1 | 1 | 1 | 1 | 1 |
| T | 1 | 1 | 1 | 2 | 2 | 2 |
| C | 1 | 2 | 2 | 2 | 2 | 2 |
| G | 1 | 2 | 3 | 3 | 3 | 3 |
| A | 1 | 2 | 3 | 3 | 4 | 4 |
| T | 1 | 2 | 3 | 4 | 4 | 5 |
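The table above is the standard LCS dynamic program, in which each cell depends on its upper, left and upper-left neighbors (the diamond DAG). A minimal sketch of that textbook recurrence follows; the function name is hypothetical, and the code makes no attempt to be write-efficient.

```python
def lcs_table(x, y):
    """Return the len(x) x len(y) table of LCS lengths over prefixes of x and y."""
    rows, cols = len(x), len(y)
    # dp carries an extra zero row and column for the empty prefixes.
    dp = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if x[i - 1] == y[j - 1]:
                # Matching characters extend the diagonal predecessor.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise take the better of the upper and left cells.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Strip the padding row and column to match the slide's layout.
    return [row[1:] for row in dp[1:]]
```

Calling `lcs_table("ATCGAT", "ACGTAT")` reproduces the grid shown on the slide, with rows indexed by the first string and columns by the second.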
Proof sketch of $\Theta\left(\frac{\omega n^2}{M}\right)$ diamond DAG lower bound
• $k \times k$ diamond requires $k$ storage to compute [CS76]
• Computing any $2M \times 2M$ diamond requires $M$ writes to the asymmetric memory
– $2M$ storage space, $M$ from symmetric memory
• Tiling with $2M \times 2M$ sub-DAGs yields $n^2 / (2M)^2$ tiles
[Figure: a tile with $2M$ rows]
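Multiplying out the counts in the proof sketch gives the stated bound. The following is only the arithmetic implied by the bullets above, with each write charged ω:

```latex
\[
\underbrace{\frac{n^2}{(2M)^2}}_{\text{tiles}}
\cdot \underbrace{M}_{\text{writes per tile}}
\cdot \underbrace{\omega}_{\text{cost per write}}
= \frac{\omega n^2}{4M}
= \Omega\!\left(\frac{\omega n^2}{M}\right).
\]
```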
Asymmetric Memory is not like Symmetric Memory
• DAG rule: Compute a node after all its inputs ready
• By breaking this rule: LCS cost reduced by a factor of $O(\omega^{1/3})$
– New "path sketch" technique
• Classic RAM: No gap between Diamond DAG & LCS/Edit Distance
• Classic RAM: No gap between Sorting Networks & Comparison Sort

| Problem | ARAM cost $Q(n)$: classic algorithm | ARAM cost $Q(n)$: new lower bound |
|---|---|---|
| Sorting networks and Fast Fourier Transform | $\Theta\left(\omega n \frac{\log n}{\log M}\right)$ | $\Theta\left(\omega n \frac{\log n}{\log \omega M}\right)$ |
| Diamond DAG (à la LCS, edit distance) | $\Theta\left(\frac{n^2 \omega}{M}\right)$ | $\Theta\left(\frac{n^2 \omega}{M}\right)$ |
Asymmetric Shared Memory
• (M,ω)-Asymmetric PRAM (machine model)
– P processors, each with local memory of size M
– Unbounded asymmetric shared memory, write cost ω
• Asymmetric Nested-Parallel Model
– Processor oblivious
– Provably good with work-stealing schedulers
[BBFGGMS16] [BFGGS15]
Reduce on Asymmetric PRAM Model
Reduce(list L, function F, identity I)
    if (L.length == 0)
        return I;
    if (L.length == 1)
        return L[0];
    L1, L2 = split(L);
    R1 = Reduce(L1, F, I);
    R2 = Reduce(L2, F, I);
    return F(R1, R2);
Assume $O(1)$ work for F
Each write costs ω
Split takes $O(\omega)$ work
Work: $W(n) = 2W(n/2) + O(\omega)$, so $W(n) = O(\omega n)$
Span: $D(n) = D(n/2) + O(\omega)$, so $D(n) = O(\omega \log n)$
Too conservative: All intermediate results written to shared memory
Must explicitly schedule computation on processors
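The two recurrences can be unrolled directly to check the closed forms. This sketch assumes n is a power of two, replaces each $O(\omega)$ term by exactly ω, and uses a unit base cost $W(1) = D(1) = 1$; all three are simplifying assumptions, and the function name is hypothetical.

```python
def pram_reduce_costs(n, omega):
    """Unroll W(n) = 2W(n/2) + omega and D(n) = D(n/2) + omega from size 1 up to n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    work, span = 1, 1  # assumed unit cost for a one-element list
    size = 1
    while size < n:
        work = 2 * work + omega  # two half-size subproblems plus one omega-cost step
        span = span + omega      # subproblems run in parallel; one omega-cost step per level
        size *= 2
    return work, span
```

Under these assumptions the work is $n + \omega(n - 1)$ and the span is $1 + \omega \log_2 n$, which agrees with the $O(\omega n)$ and $O(\omega \log n)$ bounds.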
Asymmetric Nested-Parallel (NP) Model: Fork-Join
[Figure: fork-join task tree rooted at Root; a Parent task forks Child 1 and Child 2 and is Suspended until they Join]
Asymmetric NP Model: Memory Model
CPU
• Unit cost per instruction
Stack Memory (read cost 1, write cost 1)
• Stack Allocated
• Symmetric
• Limited Size
Heap Memory (read cost 1, write cost ω)
• Heap Allocated
• Asymmetric
• Infinite Size
Key feature: Algorithmic cost model, processor-oblivious
Asymmetric NP Model: Stack Memory
[Figure: task trees rooted at Root with tasks A, B, C, D; non-leaf tasks are labeled "Constant Stack Memory" and a leaf task is labeled "$M_L$ Stack Memory"]
Asymmetric NP Model: Work Stealing Issues
Thieves need access to stack memory of stolen task
Good news: Non-leaf stacks are $O(1)$ size
Approach 1: Write out all stack memory every fork
Have to pay $O(\omega)$ for each fork!
Approach 2: Write stack memory only on steal
Challenge: Need to limit number of stacks written per steal
[Figure: a shared Heap and three processors, each with its own stack (Stack 1 / Proc 1, Stack 2 / Proc 2, Stack 3 / Proc 3)]
Write Stack Memory Only on Steal: Problem Scenario
Thread 1 started at root, now executing G
[Figure: task tree with nodes Root, A, B, C, D, E, F, G, H; Thread 2 and Thread 3 each perform a steal]
Asymmetric NP Model: Work Stealing Theorem
A computation with binary branching factor on the Asymmetric NP model can be simulated on the (M, ω)-Asymmetric PRAM machine model in $O\left(\frac{W}{P} + \omega D\right)$ expected time, where:
• $P$ = processors
• $D$ = span
• $W$ = work
• $\delta$ = nesting depth
• $M_L$ = leaf stack memory
• $M = O(\delta + M_L)$
Reduce: Asymmetric NP Model
Reduce(list L, function F, identity I)
    if (L.length == 0)
        return I;
    if (L.length == 1)
        return L[0];
    L1, L2 = split(L);
    R1 = Reduce(L1, F, I);
    R2 = Reduce(L2, F, I);
    return F(R1, R2);
Assume $O(1)$ work for F
Minimize writes to large memory
Children are forked tasks
Tasks store list start & end
Only write final answer
Work: $W(n) = O(n + \omega)$
Span: $D(n) = O(\log n + \omega)$
Intermediate results NOT written to memory
Scheduler handles inter-processor communication & its costs
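A sequential sketch of this variant shows where the single heap write comes from. It is an illustration under stated assumptions, not the authors' code: recursion stands in for forked tasks, Python local variables stand in for stack memory, each task carries only a `(lo, hi)` index pair, and the function name and cost tally are hypothetical.

```python
import operator


def np_reduce(data, f, identity, omega):
    """Reduce data with f; return (result, heap_reads, heap_writes, cost)."""
    heap_reads = 0

    def rec(lo, hi):
        # Task state is just the list bounds, held in (simulated) stack memory.
        nonlocal heap_reads
        if hi - lo == 0:
            return identity
        if hi - lo == 1:
            heap_reads += 1          # read one input element from the heap
            return data[lo]
        mid = (lo + hi) // 2
        left = rec(lo, mid)          # would be a forked child task
        right = rec(mid, hi)         # would be a forked child task
        return f(left, right)        # combined in stack memory, no heap write

    result = rec(0, len(data))
    heap_writes = 1                  # only the final answer is written out
    cost = heap_reads + omega * heap_writes
    return result, heap_reads, heap_writes, cost
```

For instance, `np_reduce(list(range(100)), operator.add, 0, 16)` performs 100 heap reads and one heap write, consistent with the $O(n + \omega)$ work bound.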
Write-Efficient Shared Memory Algorithms [BBFGGMS16]

| Problem | Work (W) | Span (D) | Reduction of Writes |
|---|---|---|---|
| Reduce | $O(n + \omega)$ | $O(\log n + \omega)$ | $O(\omega)$ |
| Ordered Filter | $O(n + \omega k)$ † | $O(\omega \log n)$ † | $O(\log n)$ |
| List Contraction | $O(n + \omega)$ | $O(\omega \log n)$ † | $\omega$ |
| Tree Contraction | $O(n + \omega)$ | $O(\omega \log n)$ † | $\omega$ |
| Minimum Spanning Tree | $O\left(\alpha(n)\, m + \omega n \log \min\left(\frac{m}{n}, \omega\right)\right)$ | $O(\omega\, \mathrm{polylog}\, n)$ † | $\frac{m}{n \cdot \log \min\left(\frac{m}{n}, \omega\right)}$ |
| 2D Convex Hull | $O(n \log k + \omega n \log \log k)$ ^ | $O(\omega \log^2 n)$ † | Output sensitive |
| BFS Tree | $O(\omega n + m)$ ^ | $O(\omega \delta \log n)$ † | $\frac{m}{n}$ |

$k$ = output size
$\delta$ = graph diameter
$\alpha$ = inverse Ackermann function
† = with high probability
^ = expected
Tree Contraction: Current Methods
Rake leaves
Compress chains
Each rake or compress operation costs a write
Total number of rakes and compresses is $O(n)$
Work is $O(\omega n)$
Span is $O(\omega \log n)$
[Figure: expression tree with + and × internal nodes and leaves 3, 2, 1, 4]
Tree Contraction: High-level Approach
Assume that $M_L$ is $\Omega(\omega)$
Partition the tree into $O\left(\frac{n}{\omega}\right)$ components of size $O(\omega)$
Sequentially contract each component
Use a classic parallel algorithm to contract the resulting tree of size $O\left(\frac{n}{\omega}\right)$
Tree Contraction: Classic Partitioning
Follow the Euler Tour
Generate subtree size
Find the m-critical points
Partition the tree
Requires a write for each node
[Figure: binary tree annotated with subtree sizes: 15; 7, 7; 3, 3, 3, 3; 1, 1, 1, 1, 1, 1, 1, 1]
Tree Contraction: Write-Efficient Partitioning
Mark each node with probability $\frac{1}{\omega}$
Traverse the Euler Tour from each marked node and mark every ωth node
Mark the highest node on each path between marked nodes
Each marked node starts a new component
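The first two marking steps can be illustrated on a flattened tour. This is a simplified sketch under strong assumptions: the Euler tour is modeled as a linear array of positions, position 0 is always marked so that every walk has a starting point, and the third step (marking the highest node on each path) is omitted because it needs the actual tree structure. The function names are hypothetical.

```python
import random


def mark_tour(length, omega, seed=0):
    """Return a list of booleans: which tour positions are marked after steps 1 and 2."""
    rng = random.Random(seed)
    # Step 1: mark each position independently with probability 1/omega.
    marked = [rng.random() < 1.0 / omega for _ in range(length)]
    marked[0] = True  # assumption: force one mark so the first walk has a start
    starts = [i for i, is_marked in enumerate(marked) if is_marked]
    # Step 2: walk forward from each random mark, marking every omega-th position
    # until the next random mark (or the end of the tour) is reached.
    for idx, start in enumerate(starts):
        stop = starts[idx + 1] if idx + 1 < len(starts) else length
        for pos in range(start + omega, stop, omega):
            marked[pos] = True
    return marked


def max_gap(marked):
    """Largest distance between consecutive marks, counting the end of the tour."""
    positions = [i for i, is_marked in enumerate(marked) if is_marked]
    positions.append(len(marked))
    return max(b - a for a, b in zip(positions, positions[1:]))
```

In this model no stretch of the tour longer than ω positions is left unmarked, which is what bounds the component size, while the number of marks (and hence writes) stays proportional to the tour length divided by ω in expectation.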
Tree Contraction: Contract-ability of Partitions
[Figure: partitioned tree. Legend: unmarked node; node marked in step 1; node marked in step 3]
Tree Contraction: A New Approach
Assume that $M_L$ is $\Omega(\omega)$
Partition the tree into $O\left(\frac{n}{\omega}\right)$ components of size $O(\omega)$
Sequentially contract each component
Use a classic algorithm to contract the resulting tree of size $O\left(\frac{n}{\omega}\right)$
Work: $O\left(n + \frac{n}{\omega} \cdot \omega\right) + O\left(n + \frac{n}{\omega} \cdot \omega\right) + O(\omega) = O(n + \omega)$
Span: $O(\omega \log n) + O(\omega) + O\left(\omega \log \frac{n}{\omega}\right) = O(\omega \log n)$
The Asymmetric External Memory (AEM) model [BFGGS15]
• AEM has two memory transfer instructions:
– Read transfer: load a block from large-memory
– Write transfer: write a block to large-memory
• The complexity of an algorithm on the AEM model (I/O complexity) is measured by:
  # read transfers + ω · # write transfers
[Figure: CPU with a small-memory of $M/B$ blocks, each of size $B$, and an asymmetric large-memory; edge labels 0, 1, ω]
Sorting algorithms on the Asymmetric EM model
• Sorting $n$ records in the AEM model with I/O complexity
  $O\left(\frac{\omega n}{B} \log_{\frac{\omega M}{B}} \frac{n}{B}\right)$
  can be achieved by:
– Multi-way mergesort
– Sample sort
– Heap sort based on buffer trees
• Matching lower bound [Sitchinava16]
– No asymptotic advantage whenever ω is $O(M^c)$ for a constant $c$
– Depressing…because so many problems can't beat an EM sorting lower bound
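Both the AEM cost metric and the sorting bound are easy to evaluate numerically. This sketch drops constant factors and assumes that $\omega M / B > 1$ and $n \ge B$ so the logarithm is well defined; the function names are hypothetical.

```python
import math


def aem_cost(read_transfers, write_transfers, omega):
    """I/O complexity in the AEM model: reads cost 1 each, writes cost omega each."""
    return read_transfers + omega * write_transfers


def aem_sort_bound(n, M, B, omega):
    """Evaluate (omega * n / B) * log_{omega*M/B}(n / B), constants dropped."""
    base = omega * M / B
    if base <= 1 or n < B:
        raise ValueError("requires omega * M / B > 1 and n >= B")
    return (omega * n / B) * math.log(n / B, base)
```

As an illustration, with $n = 2^{30}$, $M = 2^{20}$, $B = 2^{10}$ and $\omega = 2^4$ the logarithm has base $2^{14}$ and argument $2^{20}$, so the bound evaluates to $2^{24} \cdot 20/14$.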
Key Take-Aways
• Main memory will be persistent and asymmetric
– Reads much cheaper than Writes
• Very little work to date on Asymmetric Memory
– Not quite: space complexity, CRQW, RMRs, Flash, …
• Highlights of our results to date:
– Models: (M,ω)-ARAM; with parallel & block variants
– Asymmetric memory is not like symmetric memory
– New techniques for old problems
– Lower bounds for block variant are depressing
You Are Here
Thanks to Collaborators & Sponsors
Collaborators:
• Naama Ben-David
• Guy Blelloch
• Jeremy Fineman
• Yan Gu
• Charles McGuffey
• Julian Shun
Sponsors:
• National Science Foundation
• Natural Sciences and Engineering Research Council of Canada
• Miller Institute for Basic Research in Sciences at UC Berkeley
• Intel (via ISTC for Cloud Computing & new ISTC for Visual Cloud)
(Credit to Yan and Charlie for some of these slides)
References (in order of first appearance)
[KPMWEVNBPA14] Ioannis Koltsidas, Roman Pletka, Peter Mueller, Thomas Weigold, Evangelos Eleftheriou, Maria Varsamou, Athina Ntalla, Elina Bougioukou, Aspasia Palli, and Theodore Antonakopoulos. PSS: A Prototype Storage Subsystem based on PCM. NVMW, 2014.
[DJX09] Xiangyu Dong, Norman P. Jouppi, and Yuan Xie. PCRAMsim: System-level performance, energy, and area modeling for phase-change RAM. ACM ICCAD, 2009.
[XDJX11] Cong Xu, Xiangyu Dong, Norman P. Jouppi, and Yuan Xie. Design implications of memristor-based RRAM cross-point structures. IEEE DATE, 2011.
[GZDCH13] Nad Gilbert, Yanqing Zhang, John Dinh, Benton Calhoun, and Shane Hollmer. A 0.6V 8 pJ/write non-volatile CBRAM macro embedded in a body sensor node for ultra low energy applications. IEEE VLSIC, 2013.
[GMR98] Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms. SIAM J. on Computing 28(2), 1998.
[DHW97] Cynthia Dwork, Maurice Herlihy, and Orli Waarts. Contention in Shared Memory Algorithms. ACM STOC, 1993.
[YA95] Jae-Heon Yang and James H. Anderson. A Fast, Scalable Mutual Exclusion Algorithm. Distributed Computing 9(1), 1995.
[GT05] Eran Gal and Sivan Toledo. Algorithms and data structures for flash memories. ACM Computing Surveys 37(2), 2005.
[EGMP14] David Eppstein, Michael T. Goodrich, Michael Mitzenmacher, and Pawel Pszona. Wear minimization for cuckoo hashing: How not to throw a lot of eggs into one basket. ACM SEA, 2014.
[BFGGS16] Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, and Julian Shun. Efficient Algorithms with Asymmetric Read and Write Costs. ESA, 2016.
[HK81] Jia-Wei Hong and H. T. Kung. I/O complexity: The red-blue pebble game. ACM STOC, 1981.
[CS76] Stephen Cook and Ravi Sethi. Storage requirements for deterministic polynomial time recognizable languages. JCSS 13(1), 1976.
References (cont.)
[BFGGS15] Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, and Julian Shun. Sorting with Asymmetric Read and Write Costs. ACM SPAA, 2015.
[BBFGGMS16] Naama Ben-David, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Yan Gu, Charles McGuffey, and Julian Shun. Parallel Algorithms for Asymmetric Read-Write Costs. ACM SPAA, 2016.
[Sitchinava16] Nodari Sitchinava, personal communication, June 2016.
Some additional related work:
[CDGKKSS16] Erin Carson, James Demmel, Laura Grigori, Nicholas Knight, Penporn Koanantakool, Oded Schwartz, and Harsha Vardhan Simhadri. Write-Avoiding Algorithms. IEEE IPDPS, 2016.
[CGN11] Shimin Chen, Phillip B. Gibbons, and Suman Nath. Rethinking Database Algorithms for Phase Change Memory. CIDR, 2011.
[Viglas14] Stratis D. Viglas. Write-limited sorts and joins for persistent memory. VLDB 7(5), 2014.