Processing in Memory Advanced Seminar Computer Engineering … · 2017. 4. 28. · 11 Advanced...
Processing in Memory
Advanced Seminar Computer Engineering
ZITI CAG – University of Heidelberg
Felix Kaiser
6.2.2017
2/6/17Advanced Seminar – Processing in Memory2
Processing in Memory
The term "Processing in Memory" covers a very wide range of designs
Outline
Computation Walls
Near-Data-Processing
Processing-In-Memory
Systems
Results and Challenges
Conclusion
Walls
Peak FLOPS/Socket increasing at 50%-60% per Year
Memory Bandwidth increasing at 23% per Year
Memory Latency increasing at 4% per Year
Interconnect Bandwidth increasing at 20% per Year
Interconnect Latency decreasing at 20% per Year
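The growth rates above compound, so the gap between compute and memory widens quickly. The following minimal sketch computes how much the ratio of peak FLOPS to memory bandwidth drifts over a number of years. It assumes 55% per year for peak FLOPS (the midpoint of the quoted 50%-60% range); the names `RATES`, `growth_factor` and `flops_per_byte_drift` are illustrative only.

```python
# Compound annual growth rates as quoted on the "Walls" slide.
RATES = {
    "peak_flops_per_socket": 0.55,   # assumption: midpoint of the quoted 50%-60% per year
    "memory_bandwidth": 0.23,
    "memory_latency": 0.04,          # latency gets worse (increases)
    "interconnect_bandwidth": 0.20,
    "interconnect_latency": -0.20,   # latency improves (decreases)
}


def growth_factor(rate: float, years: int) -> float:
    """Multiplicative change after `years` of compound growth at `rate` per year."""
    return (1.0 + rate) ** years


def flops_per_byte_drift(years: int) -> float:
    """Factor by which peak FLOPS / memory bandwidth grows over `years`."""
    return growth_factor(RATES["peak_flops_per_socket"], years) / growth_factor(
        RATES["memory_bandwidth"], years
    )


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"{years:2d} years: FLOPS per byte of bandwidth x{flops_per_byte_drift(years):.1f}")
```

Under these assumptions, the ratio grows by roughly a factor of ten within a decade. Each additional FLOP therefore has correspondingly less memory bandwidth behind it. This is the "wall" that near-data processing aims to get around.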
Taxonomy of NDP (Near-Data Processing)
Computing in Caches
Near-Memory Processing/Processing-in-Memory
Processing in Flash
Computing on Disk
Intelligent Network
Subsets of PIM
In front of the Sense Amplifier (with reservations)
Between Sense Amplifier and Column Decoder
Memory embedded into Pipeline of a Processor
In front of a Bus/Crossbar Switch
In 3D-Stack
• In each Vault
• Processing Dies in Memory Stack
Subsets of PIM
In front of the Sense Amplifier (with reservations)
Between Sense Amplifier and Column Decoder
Memory embedded into Pipeline of a Processor
In front of a Bus/Crossbar Switch
In 3D-Stack
• In each Vault
• Processing Dies between Memory Dies
[Figure: Sense Amplifier with Enable signal, connected to cells A, B and C]
Subsets of PIM
[Figure: Sense Amplifier with Enable signal, connected to cells A, B and C]
Final State: $AB + BC + AC = C(A + B) + \overline{C}(AB)$
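The final-state equation describes a bitwise majority of the three cells. Fixing C turns it into a logic operation on A and B: with C = 0 the result is AB (AND), and with C = 1 it is A + B (OR). This is the idea behind triple-row activation (as used, for example, in the Ambit proposal). The following is a behavioural sketch only, not a circuit model. It assumes a row is modelled as an 8-bit integer with one bit per bitline/sense amplifier, and all function names are hypothetical.

```python
MASK = 0xFF  # assumption: a "row" is 8 bitlines, one bit per column


def triple_row_activation(a: int, b: int, c: int) -> int:
    """Value each sense amplifier settles to when rows A, B and C are opened
    together: the bitwise majority AB + BC + AC."""
    return ((a & b) | (b & c) | (a & c)) & MASK


def row_and(a: int, b: int) -> int:
    """Row C preset to all zeros: MAJ(A, B, 0) = AB."""
    return triple_row_activation(a, b, 0)


def row_or(a: int, b: int) -> int:
    """Row C preset to all ones: MAJ(A, B, 1) = A + B."""
    return triple_row_activation(a, b, MASK)


def identity_holds(a: int, b: int, c: int) -> bool:
    """Check AB + BC + AC == C(A + B) + ~C(AB) for one triple of rows."""
    lhs = triple_row_activation(a, b, c)
    rhs = ((c & (a | b)) | (~c & a & b)) & MASK
    return lhs == rhs


if __name__ == "__main__":
    # The columns of these three rows cover all eight input combinations,
    # so a single call checks the full truth table bit by bit.
    a, b, c = 0b1111_0000, 0b1100_1100, 0b1010_1010
    print(f"MAJ: {triple_row_activation(a, b, c):08b}")
    print(f"identity holds: {identity_holds(a, b, c)}")
    print(f"AND: {row_and(0b1100_1010, 0b1010_0110):08b}")
    print(f"OR:  {row_or(0b1100_1010, 0b1010_0110):08b}")
```

Because every column is computed independently, one activation operates on an entire row at once. This is where the large internal parallelism of in-DRAM computing would come from.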
Computing near Memory
In front of the Sense Amplifier (with reservations)
Between Sense Amplifier and Column Decoder
Memory embedded into Pipeline of a Processor
In front of a Bus/Crossbar Switch
In 3D-Stack
• In each Vault
• Processing Dies between Memory Dies
DEEP-ER (NAM: Network Attached Memory)
The HMC (3D-stacked memory) and the FPGA are connected by wide data paths
The FPGA logic can compute while using the full memory bandwidth
Results can be written back to memory or sent to the host
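The benefit of placing logic next to the memory is that only the results, not the raw data, have to leave the device. The following sketch counts the bytes crossing the host link for a simple reduction (a sum) computed either on the host or next to the memory. `NearMemoryDevice`, `host_sum` and the 8-byte word size are hypothetical. This is not the DEEP-ER software interface, and it models only data volume, not bandwidth or latency.

```python
WORD_BYTES = 8  # assumption: 64-bit values


class NearMemoryDevice:
    """Hypothetical stand-in for a memory stack with attached logic (e.g. HMC + FPGA).

    Counts the bytes that cross the link between the device and the host."""

    def __init__(self, values: list[float]) -> None:
        self.values = values
        self.link_bytes = 0

    def read_all(self) -> list[float]:
        # Host-centric path: every value is shipped over the link.
        self.link_bytes += len(self.values) * WORD_BYTES
        return list(self.values)

    def sum_near_memory(self) -> float:
        # Near-memory path: the attached logic reads the values internally,
        # and only the single result is sent to the host.
        result = sum(self.values)
        self.link_bytes += WORD_BYTES
        return result


def host_sum(dev: NearMemoryDevice) -> float:
    """Compute the sum on the host after fetching all the data."""
    return sum(dev.read_all())


if __name__ == "__main__":
    data = [1.0] * 1_000_000
    host_dev, nm_dev = NearMemoryDevice(data), NearMemoryDevice(data)
    print("host:        ", host_sum(host_dev), "bytes over link:", host_dev.link_bytes)
    print("near memory: ", nm_dev.sum_near_memory(), "bytes over link:", nm_dev.link_bytes)
```

Both paths return the same sum. The link traffic differs by a factor equal to the number of elements, which is why reductions, filters and searches are typical candidates for near-memory execution.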
Stacking
Idea:
Stacking components that are traditionally placed side by side on PCBs
Memory stacking in particular makes sense:
• higher Density
• more Capacity
• less Power Consumption
Subsets of PIM
In front of the Sense Amplifier (with reservations)
Between Sense Amplifier and Column Decoder
Memory embedded into Pipeline of a Processor (GPP)
In front of a Bus/Crossbar Switch
In 3D-Stack
• In each Vault
• Processing Dies between Memory Dies
Simulation System
Host:
• 2 Cortex-A15 cores at 2 GHz
Memory:
• Hybrid Memory Cube
• 512 MB
• 16 Vaults
• 4 Dies stacked
PIM:
• Cores similar to the Host, with Voltage and Frequency Scaling
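In a Hybrid Memory Cube, a vault is a vertical slice through all stacked DRAM dies and is served by its own vault controller in the logic base die. The portion of a vault on one die is called a partition. The following sketch assumes capacity is split evenly across vaults and dies and computes how much memory one vault and one partition hold for the simulated configuration. The function names are illustrative only.

```python
# Parameters taken from the "Simulation System" slide.
HMC_CAPACITY_MB = 512
HMC_VAULTS = 16
HMC_DIES = 4


def per_vault_mb(capacity_mb: int, vaults: int) -> float:
    """Capacity served by one vault (and its vault controller)."""
    return capacity_mb / vaults


def per_partition_mb(capacity_mb: int, vaults: int, dies: int) -> float:
    """Capacity of one vault's slice (partition) on a single DRAM die."""
    return per_vault_mb(capacity_mb, vaults) / dies


if __name__ == "__main__":
    print(f"per vault:     {per_vault_mb(HMC_CAPACITY_MB, HMC_VAULTS):.0f} MB")
    print(f"per partition: {per_partition_mb(HMC_CAPACITY_MB, HMC_VAULTS, HMC_DIES):.0f} MB")
```

With an even split, each of the 16 vaults holds 32 MB, spread as 8 MB partitions over the 4 dies. Processing placed "in each vault" would therefore operate on a comparatively small local slice of the total memory.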
Results
Heat Problems with 3D-Stacking
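No actual implementations of processing logic in 3D memory stacks could be found
One of the biggest problems is heat: the added logic dissipates power inside the stack, where the inner dies are hard to cool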
Conclusion
NDP is a sensible way to overcome the Walls
Traditional PIM Approaches have Problems
• Reason: DRAM and Logic manufacturing Processes differ too much
The Bus/Crossbar Approach looks realistic
3D-Stacking Approaches seem promising
But:
• Not verified (only Simulations)
• Potential Heat Problems