An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez,...

22
An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents , Raúl Martínez, and Carlos Molina [email protected] Computer Architecture Department UPC – BarcelonaTech

description

Outline Introduction Current ruby prefetching solution Our solution Implementation details Case of study Conclusions 3

Transcript of An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez,...

Page 1: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

An Accurate and Detailed Prefetching

Simulation Framework for gem5

Martí Torrents, Raúl Martínez, and Carlos Molina

[email protected] Architecture DepartmentUPC – BarcelonaTech

Page 2: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

2

Outline

IntroductionCurrent ruby prefetching solutionOur solutionImplementation detailsCase of studyConclusions

Page 3: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

3

Outline

IntroductionCurrent ruby prefetching solutionOur solutionImplementation detailsCase of studyConclusions

Page 4: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

4

Prefetching

• Reduce memory latency• Bring to a nearest cache data required by CPU• Increase the hit ratio• Implemented in many commercial processors• Erroneous prefetching may produce slowdown• Simulation tools should include this capability

Page 5: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

5

Outline

IntroductionCurrent ruby prefetching solutionOur solutionImplementation detailsCase of studyConclusions

Page 6: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

6

L1 Cache controller

Current ruby prefetching solution

Current Prefetcher

Prefetch queue

MESI protocolM E S I

L1 Private Cache Memory

Page 7: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

7

Outline

IntroductionCurrent ruby prefetching solutionOur solutionImplementation detailsCase of studyConclusions

Page 8: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

8

Our solution

L1 Cache controller Current Prefetcher

Prefetch queue

MESI protocolM E S I

L1 Private Cache Memory

MOESI protocolM E S

I

O

Prefetch wrapper

Prefetch queue

Abstract Prefetcher

Specific PrefetchEngineTagged

Prefetcher

Global History Buffer

ReferencePrediction

Table

CurrentPrefetchEngine

Prefetch profiler

Page 9: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

9

L2 Shared Cache Bank

Prefetch wrapper

L2 Cache controller

MOESI protocolM E S

IPrefetch queue

O

Prefetch profiler

Abstract Prefetcher

Specific PrefetchEngine

Our solution

L1 Private Cache Memory

Prefetch wrapper

L1 Cache controller

MOESI protocolM E S

IPrefetch queue

O

Prefetch profiler

Abstract Prefetcher

Specific PrefetchEngine

Page 10: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

10

Outline

IntroductionCurrent ruby prefetching solutionOur solutionImplementation detailsCase of studyConclusions

Page 11: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

11

Implementation details

• Receives command line options• Creates and inits the prefetch engine• Manages communication Controller –> Prefetcher• And Prefetcher –> Prefetch Queue –> Controller• Collects statistics

Prefetch wrapper

Page 12: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

12

Implementation details

• Accumulates the statistics

Prefetch profiler

• observed_misses - numMissesObserved

• Cancelled - numDroppedPrefetches

• completed - numPrefetchAccepted

• hit• in_cache • late –

numPartialHits/numHits• overflowed

• page_faults – numPagesCrossed

• total - numPrefetchRequested

• unuseful• useful• queue_merged_requests• generated_prefetches_p

er_train - streams• numMissedPrefetchBlock

s

MISS

Page 13: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

13

Implementation details

• Collects the requests generated by the engine• Checks the page fault error• Merges repeated requests

Prefetch queue

Page 14: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

14

Implementation details

• Works as an abstract class• Must be inherited by the specific pref• Virtual functions must be redeclared

– Init: Initialization function– Prefetcher size, distance, aggressiveness, etc.

– Observe request: Called on each cache access – Hit/miss, prefetch/no_prefetch, accessed address, etc.

– Allocate: Called when data allocated in cache– Same as observe request

– Deallocate: Called when evicting from cache– Only evicted address

Abstract Prefetcher

Page 15: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

15

Implementation details

• Notifies the wrapper:– Cache accesses– Cache allocation– Cache evictions

• Reads from Prefetch Queue• Prefetch issued when no Loads/Stores• Protocol modification similar to a Load operation• Very similar to the current solution

L1 Cache controller

MOESI protocolM E S

I

O

Page 16: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

16

Implementation details

• Same as in the L1 but…

• L2 local hits– Some data that is invalid in L2 but locally allocated

• L1_GETS does not store in L2– Protocol modified to store pref requests in L2

• Pref queue generates a request for another tile– Request is forwarded to the corresponding tile

L2 Cache controller

MOESI protocolM E S

I

O

Page 17: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

17

Outline

IntroductionCurrent ruby prefetching solutionOur solutionImplementation detailsCase of studyConclusions

Page 18: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

18

Case of Study: NoC aware prefetch performance evaluation• We tested 3 classical prefetch engines:

– Tagged Prefetcher– Reference Prediction Table (RPT) – Global History Buffer (GHB)

• With the gem5 Simulator using – 16 tiled x86 CPUs – L1 prefetchers– Ruby memory system– MOESI coherency protocol– Garnet network simulator

• Parsecs 2.1

Page 19: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

19

Case of Study: Results

M. Torrents, R. Martínez, C. Molina. “Network Aware Performance Evaluation of Prefetching Techniques in CMPs”. Simulation Modeling Practice and Theory (SIMPAT), 2014.

Page 20: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

20

Outline

IntroductionCurrent ruby prefetching solutionOur solutionImplementation detailsCase of studyConclusions

Page 21: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

21

Conclusions

• Prefetcher is important and it must be simulated• Current solution is ok• Our solution goes one step farther

– Easy to change/add new prefetch engines– Detailed statistics about prefetching– Garnet can identify prefetching traffic– Useful for statistics or traffic manipulation

• Current tool can be easily included in new solution• Current solution is ok for non prefetch researchers • Our tool is better for research related with prefetch

Page 22: An Accurate and Detailed Prefetching Simulation Framework for gem5 Martí Torrents, Raúl Martínez, and Carlos Molina Computer Architecture.

An Accurate and Detailed Prefetching

Simulation Framework for gem5

Martí Torrents, Raúl Martínez, and Carlos Molina

[email protected] Architecture DepartmentUPC – BarcelonaTech