Evaluation of Memory Subsystem Performance with FPGAs

Shih-Lien Lu¹, Eriko Nurvitadhi², Jumnit Hong³, +Steen Larsen¹
+ Presenter

¹Intel Corp., ²Dept. of ECE, CMU, ³Dept. of ECE, OSU

© Intel Corp. WARFP 2005

Recent and Current Projects

PHA$E – Programmable Hardware Assisted Cache Emulator
ACE – Active Cache Emulator
PMI – Programmable Memory Interface

Programmable Hardware Assisted Cache Simulator (PHA$E)

Real-time FPGA-based cache simulator
Collected cache-organization studies for several server workloads on PIII
Results published (FPL’03, WWC’03, CEACW’04)

[Figure panels labeled 2P and 4P]
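PHA$E performs its cache simulation in FPGA hardware, in real time, on a live bus. As a rough software analogue only, the minimal Python sketch below models one set-associative cache and reports its miss ratio for an address trace. It assumes LRU replacement and a fixed line size, neither of which the slides specify, and the names `SetAssociativeCache` and `run_trace` are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Software model of a set-associative cache (assumes LRU replacement)."""

    def __init__(self, size_bytes: int, ways: int, line_bytes: int = 64) -> None:
        num_lines = size_bytes // line_bytes
        if ways < 1 or num_lines == 0 or num_lines % ways != 0:
            raise ValueError("cache must hold a whole, non-zero number of sets")
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = num_lines // ways
        # One OrderedDict per set; key order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.accesses = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Look up one byte address; return True on a hit."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        self.accesses += 1
        if tag in cache_set:
            cache_set.move_to_end(tag)  # refresh: now most recently used
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used tag
        cache_set[tag] = None
        return False

    def miss_ratio(self) -> float:
        return self.misses / self.accesses if self.accesses else 0.0


def run_trace(size_bytes: int, ways: int, trace, line_bytes: int = 64) -> float:
    """Replay an address trace through a fresh cache and return its miss ratio."""
    cache = SetAssociativeCache(size_bytes, ways, line_bytes)
    for address in trace:
        cache.access(address)
    return cache.miss_ratio()


if __name__ == "__main__":
    # Hypothetical trace: two addresses that map to the same set of a
    # 512-byte direct-mapped cache with 64-byte lines.
    trace = [0, 512] * 4
    for ways in (1, 2, 4, 8):
        print(f"{ways}-way: miss ratio {run_trace(512, ways, trace):.2f}")
```

On this conflicting trace the direct-mapped configuration misses on every access (1.00), while the 2-, 4-, and 8-way configurations miss only on the two cold accesses (0.25). That is the kind of associativity effect a miss-ratio-versus-size study compares; PHA$E gathers such ratios from real workloads rather than from a synthetic trace.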

System Block Diagram

[Figure: system block diagram. Labeled components: CPU0-CPU3, MC, and system memory (DRAM) in the system under test; the PHA$E hardware (bus-interface FPGA, CC-FPGA, and further FPGAs with attached DRAM); TEK LAI; PCI interface; host system.]

Sample PHA$E Results (TPCC)
[Figure: miss ratio (log scale) vs. cache size in MB (log scale); series: sh direct, sh 2-way, sh 4-way, sh 8-way, CPU0_pv, CPU1_pv, CPU2_pv, CPU3_pv.]

Active Cache Emulator (ACE)
Active emulation on the processor bus
System with variable time dilation
Provides performance analysis and miss ratios

[Figure: ACE board, with labeled CPU, system DRAM, and Xilinx programming cable]

ACE Block Diagram
[Figure: CPU at 5 GHz; memory controller; DRAM with 100 ns access time; an added L3 with 10 ns access time.]
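Expressing the diagram's access times in cycles of the 5 GHz target CPU shows the quantity an emulator running on slower hardware has to preserve. A minimal worked example using only the figures on the slide:

```latex
t_{\text{cycle}} = \frac{1}{5\,\text{GHz}} = 0.2\,\text{ns}, \qquad
\frac{100\,\text{ns}}{0.2\,\text{ns}} = 500\ \text{cycles (DRAM)}, \qquad
\frac{10\,\text{ns}}{0.2\,\text{ns}} = 50\ \text{cycles (L3)}
```

A slower host CPU reproduces these cycle counts only if every memory latency is stretched by the same factor as the clock period.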

ACE Time Dilation Example

[Figure: CPU at 500 MHz; memory controller; DRAM with 1000 ns access time (FPGA-injected snoop stalls); an added L3 with 100 ns access time (the FPGA allows a normal access).]
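The time-dilation example scales every latency by the same factor as the CPU clock: 5 GHz / 500 MHz = 10, so the 100 ns DRAM becomes 1000 ns and the 10 ns L3 becomes 100 ns. The minimal Python sketch below reproduces that arithmetic and derives how much extra stall the FPGA would have to inject on an emulated L3 hit and on a miss. It assumes that an unmodified host DRAM access takes 100 ns, which is consistent with the slide's note that an L3 hit is served by a normal access. The names `plan_dilation` and `DilationPlan` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DilationPlan:
    factor: float         # target clock / host clock
    l3_hit_ns: float      # wall-clock latency the host should see on an emulated L3 hit
    l3_miss_ns: float     # wall-clock latency the host should see on an emulated L3 miss
    hit_stall_ns: float   # extra delay beyond the native access on a hit
    miss_stall_ns: float  # extra delay beyond the native access on a miss


def plan_dilation(target_mhz: float, host_mhz: float, target_dram_ns: float,
                  target_l3_ns: float, native_dram_ns: float) -> DilationPlan:
    """Scale target latencies by the clock ratio and derive the stalls to inject.

    Assumption: the FPGA can only lengthen an access, never shorten the
    host's native DRAM latency, so a dilated L3 hit must take at least
    as long as a native DRAM access.
    """
    factor = target_mhz / host_mhz
    hit_ns = target_l3_ns * factor
    miss_ns = target_dram_ns * factor
    if hit_ns < native_dram_ns:
        raise ValueError("dilation factor too small: native DRAM is slower than a dilated L3 hit")
    return DilationPlan(
        factor=factor,
        l3_hit_ns=hit_ns,
        l3_miss_ns=miss_ns,
        hit_stall_ns=hit_ns - native_dram_ns,
        miss_stall_ns=miss_ns - native_dram_ns,
    )


if __name__ == "__main__":
    # Slide figures: 5 GHz target, 500 MHz host, 100 ns DRAM, 10 ns L3.
    # The 100 ns native host DRAM latency is an assumption.
    plan = plan_dilation(5000, 500, target_dram_ns=100, target_l3_ns=10, native_dram_ns=100)
    print(plan)
```

With these inputs the factor is 10, an emulated hit needs no extra stall, and an emulated miss needs 900 ns of injected stall on top of the native 100 ns access. The check in `plan_dilation` shows why the dilation factor cannot be chosen freely: it must be large enough that the native DRAM access is no slower than a dilated L3 hit.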

Sample ACE Results
[Figure: Quake 3 performance at 800x600. Normalized performance (0.4 to 1) vs. cache size in MB (log scale, 1 to 100); series labeled 32B, 64B, 128B, 256B, 512B.]

Memory Flexible Interface
Orion (450GX) memory controller
Synthesis complete, some simulation
Board ready for tapeout
Chipset pre-fetch and memory studies

                        Altera Cyclone C12        Xilinx Spartan3 S1500
                        (12K LE = ~200K gates)    (26.6K LUT = ~1.5M gates)
Estimated logic size    9K LE                     8K LUT
Max frequency           74 MHz / 13 ns            69 MHz / 14 ns
Estimated gate count    ~150K                     ~480K
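The table's estimates can be cross-checked with simple arithmetic. The clock period is the reciprocal of the maximum frequency, and device utilization is estimated logic divided by device capacity. A minimal sketch that uses only the table's approximate numbers; the helper names `period_ns` and `utilization` are hypothetical.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def utilization(used: float, capacity: float) -> float:
    """Fraction of a device's logic resources consumed by a design."""
    return used / capacity


if __name__ == "__main__":
    # (device, estimated logic used, approximate device capacity, max frequency in MHz)
    rows = [
        ("Altera Cyclone C12 (LE)", 9_000, 12_000, 74),
        ("Xilinx Spartan3 S1500 (LUT)", 8_000, 26_600, 69),
    ]
    for name, used, capacity, f_mhz in rows:
        print(f"{name}: {utilization(used, capacity):.0%} used, "
              f"period {period_ns(f_mhz):.1f} ns")
```

This gives roughly 75% of the Cyclone's LEs and 30% of the Spartan3's LUTs, with clock periods of about 13.5 ns and 14.5 ns, which the table lists as 13 ns and 14 ns.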

Programmable Memory Interface

Conclusion
Target FPGA emulation at a narrower, well-defined area: the memory subsystem

Some limitations
Easier to set up real workloads