Evaluation of Memory Sub-system Performance with...
+ Presenter
Evaluation of Memory Sub-system Performance with FPGAs
Shih-Lien Lu (1), Eriko Nurvitadhi (2), Jumnit Hong (3), +Steen Larsen (1)
(1) Intel Corp., (2) Dept. of ECE, CMU, (3) Dept. of ECE, OSU
© Intel Corp. WARFP2005
Recent and Current Projects
PHA$E – Programmable Hardware Assisted Cache Emulator
ACE – Active Cache Emulator
PMI – Programmable Memory Interface
Programmable Hardware Assisted Cache Simulator (PHA$E)
Real-time FPGA-based cache simulator.
Collected cache organization studies for several server workloads on PIII.
Results published (FPL’03, WWC’03, CEACW’04).
[Figure labels: 2P, 4P]
System Block Diagram
[Figure: block diagram. Labels: system under test, CPU0, CPU1, CPU2, CPU3, MC, System Memory, DRAM, PHA$E, TEK LAI, Bus Interface FPGA, CC-FPGA, FPGA, DRAM, PCI interface, Host system.]
Sample PHA$E Results (TPCC)
[Figure: miss ratio (axis ticks 0.01 to 1) versus cache size (MB). Series: sh direct, sh 2-way, sh 4-way, sh 8-way, CPU0_pv, CPU1_pv, CPU2_pv, CPU3_pv.]
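The PHA$E results compare miss ratios for direct-mapped and 2-, 4-, and 8-way cache organizations. As a rough software illustration of how such a miss ratio can be computed from an address trace, here is a minimal sketch. It is not the PHA$E hardware design: the function name `miss_ratio`, the LRU replacement policy, the 64-byte default line size, and the synthetic trace are all assumptions made for the example.

```python
from collections import OrderedDict


def miss_ratio(addresses, cache_bytes, line_bytes=64, ways=1):
    """Miss ratio of a set-associative cache over an address trace.

    Hypothetical helper for illustration only. LRU replacement is an
    assumption; the slides do not state the policy PHA$E models.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    if num_sets < 1:
        raise ValueError("cache too small for this line size and associativity")

    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0

    for addr in addresses:
        total += 1
        line = addr // line_bytes          # cache-line number
        cache_set = sets[line % num_sets]  # set index from the low line bits
        tag = line // num_sets             # remaining bits identify the line

        if tag in cache_set:
            cache_set.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = True

    return misses / total if total else 0.0


# Synthetic trace: two addresses that collide in a 1 KB direct-mapped cache.
trace = [0, 1024] * 4
direct_mapped = miss_ratio(trace, cache_bytes=1024, ways=1)
two_way = miss_ratio(trace, cache_bytes=1024, ways=2)
```

With this trace the direct-mapped configuration misses on every access, while the 2-way configuration misses only on the first touch of each line, which is the kind of gap the associativity curves in the figure are meant to expose.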
Active Cache Emulator (ACE)
Active emulation on processor bus
System with variable time dilation
Provide performance analysis and miss ratios
[Figure: photo of the ACE board. Labels: CPU, System DRAM, Xilinx programming cable.]
ACE Block Diagram
CPU: 5 GHz
Memory Controller
DRAM: 100 ns access time
L3 added: 10 ns access time
ACE Time Dilation Example
CPU: 500 MHz
Memory Controller
DRAM: 1000 ns access time (FPGA-injected snoop stalls)
L3 added: 100 ns access time (FPGA allows normal access)
Sample ACE Results
[Figure: Quake 3 Performance, 800x600. Normalized performance (axis ticks 0.4 to 1) versus cache size (MB, axis ticks 1 to 100). Series: 32B, 64B, 128B, 256B, 512B.]
Memory Flexible Interface
Orion (450GX) Memory controller.
Synthesis complete, some simulation
Board ready to tapeout
Chipset pre-fetch and memory studies
                       Altera – Cyclone C12         Xilinx – Spartan3 S1500
                       (12K LE = ~200K gates)       (26.6K LUT = ~1.5M gates)
Estimated logic size   9K LE                        8K LUT
Max frequency          74 MHz / 13 ns               69 MHz / 14 ns
Estimated gate count   ~150K                        ~480K
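To put the synthesis estimates in perspective, the short sketch below derives device utilization and clock period from the figures in the table. The dictionary layout and the helper names `utilization` and `period_ns` are assumptions for illustration. The periods it computes are about 13.5 ns and 14.5 ns, so the 13 ns and 14 ns values on the slide appear to be truncated.

```python
# Figures copied from the table; "K" is taken as 1000 cells.
devices = {
    "Altera Cyclone C12": {"capacity": 12_000, "used": 9_000, "fmax_mhz": 74},
    "Xilinx Spartan3 S1500": {"capacity": 26_600, "used": 8_000, "fmax_mhz": 69},
}


def utilization(used, capacity):
    """Fraction of the device's logic cells (LEs or LUTs) consumed."""
    return used / capacity


def period_ns(fmax_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / fmax_mhz


for name, d in devices.items():
    print(f"{name}: {utilization(d['used'], d['capacity']):.0%} of logic used, "
          f"{period_ns(d['fmax_mhz']):.1f} ns clock period")
```

On these numbers the design would fill roughly three quarters of the Cyclone part but under a third of the Spartan3 part, leaving more headroom on the latter for added pre-fetch logic.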