Looking for limits in branch prediction with the GTL predictor

André Seznec

Caps Team, IRISA/INRIA/HIPEAC

Motivations

Geometric history length predictors, introduced in 2004-2006: O-GEHL (CBP-1, Dec. 2004) and TAGE (JILP '06, Feb. 2006)

• Storage effective
• Exploit very long global histories
• Were defined with possible implementation in mind

What are the limits of accuracy that can be captured with these schemes?

How do they compare with unconstrained prediction schemes?

Geometric history length predictors: global history + multiple lengths

[Figure: predictor tables indexed with history lengths L(0), L(1), L(2)]

GEometric History Length predictor

$L(0) = 0, \quad L(i) = \alpha^{i-1} L(1) \text{ for } i \geq 1$

The set of history lengths forms a geometric series

What is important: L(i) - L(i-1) increases drastically with i

Most of the storage is devoted to short histories!

{0, 2, 4, 8, 16, 32, 64, 128}

Capture correlation on very long histories
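As an illustration of the geometric series, here is a minimal Python sketch that generates the history lengths from L(1) and α. The function name and the rounding to the nearest integer are assumptions made for this example, not details taken from the slides.

```python
def geometric_history_lengths(n_tables, l1, alpha):
    """Return [L(0), ..., L(n_tables - 1)] with L(0) = 0 and
    L(i) = alpha**(i - 1) * L(1), rounded to the nearest integer
    (the rounding rule is an assumption of this sketch)."""
    lengths = [0]
    for i in range(1, n_tables):
        lengths.append(int(alpha ** (i - 1) * l1 + 0.5))
    return lengths


# Reproduces the example series from the slide: L(1) = 2, alpha = 2.
lengths = geometric_history_lengths(8, 2, 2.0)

# Gaps between consecutive lengths: small for the first tables, large for the last.
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
```

With these parameters, `lengths` is the series {0, 2, 4, 8, 16, 32, 64, 128} and the gaps grow from 2 to 64, so most tables cover short histories while a few reach very long ones.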

Combining multiple predictions

Neural-inspired predictors: use a (multiply-)add tree (O-GEHL, CBP-1)

Partial matching: use tagged tables and the longest matching history (TAGE, JILP '06)

CBP-1 (2004): O-GEHL

[Figure: tables indexed with history lengths L(0), L(1), L(2) feed an adder (∑)]

Final computation through a sum: prediction = sign of the sum

256Kbits: 12 components 3.670 misp/KI
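A minimal Python sketch of a sum-based predictor in this style is given below, assuming signed saturating counters, XOR-folded indices, and a fixed update threshold. The class name `SumPredictor`, the hash, and all sizes are hypothetical choices for illustration; the actual O-GEHL indexing functions and dynamic threshold fitting are not reproduced.

```python
def fold(value, bits):
    """XOR-fold an arbitrarily long integer down to `bits` bits."""
    mask = (1 << bits) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= bits
    return folded


class SumPredictor:
    def __init__(self, history_lengths, index_bits=10, counter_bits=4):
        self.history_lengths = history_lengths
        self.index_bits = index_bits
        self.ctr_max = (1 << (counter_bits - 1)) - 1
        self.ctr_min = -(1 << (counter_bits - 1))
        self.tables = [[0] * (1 << index_bits) for _ in history_lengths]
        self.threshold = len(history_lengths)  # assumed fixed update threshold

    def _indices(self, pc, history):
        # One index per table: hash of the PC and the L(i) most recent history bits.
        pc_part = fold(pc, self.index_bits)
        return [pc_part ^ fold(history & ((1 << length) - 1), self.index_bits)
                for length in self.history_lengths]

    def predict(self, pc, history):
        total = sum(t[i] for t, i in zip(self.tables, self._indices(pc, history)))
        return total >= 0, total  # prediction is the sign of the sum

    def update(self, pc, history, taken):
        predicted, total = self.predict(pc, history)
        # Update on a misprediction or when the sum is not confident enough.
        if predicted != taken or abs(total) <= self.threshold:
            for table, i in zip(self.tables, self._indices(pc, history)):
                if taken:
                    table[i] = min(self.ctr_max, table[i] + 1)
                else:
                    table[i] = max(self.ctr_min, table[i] - 1)


# Train on one branch that alternates taken / not taken.
predictor = SumPredictor([0, 2, 4, 8])
history, late_mispredictions = 0, 0
for step in range(400):
    taken = step % 2 == 0
    predicted, _ = predictor.predict(0x4004, history)
    if step >= 200 and predicted != taken:
        late_mispredictions += 1
    predictor.update(0x4004, history, taken)
    history = (history << 1) | int(taken)
```

The table indexed with no history cannot learn the alternation, but the tables indexed with history can, and their counters dominate the sum once training settles.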

JILP '06: TAGE, longest matching history

[Figure: tagged components with tag comparisons (=?)]

256Kbits: 3.358 misp/KI
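The longest-matching-history selection can be sketched as follows, assuming each tagged component is modeled as a sparse dictionary and that the index and tag hashes are simple XOR folds. `TaggedTable`, `tage_predict`, and the hashes are hypothetical; the useful counters, alternate prediction, and allocation policy of the real TAGE are left out.

```python
def fold(value, bits):
    """XOR-fold an arbitrarily long integer down to `bits` bits."""
    mask = (1 << bits) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= bits
    return folded


class TaggedTable:
    def __init__(self, history_length, index_bits=8, tag_bits=9):
        self.history_length = history_length
        self.index_bits = index_bits
        self.tag_bits = tag_bits
        self.entries = {}  # index -> (tag, counter); sparse stand-in for an array

    def _key(self, pc, history):
        h = history & ((1 << self.history_length) - 1)
        index = fold(pc, self.index_bits) ^ fold(h, self.index_bits)
        tag = fold(pc ^ (h << 1), self.tag_bits)  # assumed tag hash
        return index, tag

    def lookup(self, pc, history):
        """Return the signed counter on a tag match, else None."""
        index, tag = self._key(pc, history)
        entry = self.entries.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1]
        return None

    def allocate(self, pc, history, taken):
        index, tag = self._key(pc, history)
        self.entries[index] = (tag, 0 if taken else -1)  # weak initial counter


def tage_predict(base_taken, tables, pc, history):
    """`tables` is ordered by increasing history length.
    Returns (prediction, index of the providing table or None)."""
    for i in reversed(range(len(tables))):
        counter = tables[i].lookup(pc, history)
        if counter is not None:
            return counter >= 0, i
    return base_taken, None


tables = [TaggedTable(2), TaggedTable(8), TaggedTable(32)]
tables[0].allocate(0x4004, 0b1011, taken=True)
tables[2].allocate(0x4004, 0b1011, taken=False)
```

With this setup, the history `0b1011` hits in the longest table, which overrides the shorter match; a history that differs only in bit 20 misses there and falls back to the 2-bit component; an unseen branch falls back to the base prediction.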

What is global history

conditional branch history: path confusion on short histories

path history: Direct hashing leads to path confusion

1. Represent all branches in branch history

2. Use path AND direction history
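To make the two points above concrete, here is a small sketch that keeps a direction history and a path history side by side and updates both on every branch. The choice of one address bit per branch (`(pc >> 2) & 1`) and the 16-bit path register are assumptions for illustration only.

```python
def update_histories(direction, path, pc, taken, path_bits=16):
    """Shift one direction bit and one address bit into the two histories.
    Called for every branch; unconditional branches are passed as taken=True."""
    direction = (direction << 1) | int(taken)
    path = ((path << 1) | ((pc >> 2) & 1)) & ((1 << path_bits) - 1)
    return direction, path


def replay(branches):
    """Replay a list of (pc, taken) pairs and return the final histories."""
    direction, path = 0, 0
    for pc, taken in branches:
        direction, path = update_histories(direction, path, pc, taken)
    return direction, path


# Two different paths with identical branch directions.
path_a = replay([(0x1000, True), (0x1010, True)])
path_b = replay([(0x1004, True), (0x1010, True)])
```

Both paths produce the same direction history, but their path histories differ, so a predictor indexed with both can tell them apart.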

Using a kernel history and a user history

Traces mix user and kernel activities: Kernel activity after exception

• Global history pollution

Solution: use two separate global histories

User history is updated only in user mode

Kernel history is updated in both modes
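The update rule for the two histories can be sketched as below. The class name `DualHistory` and the `select` method (kernel history used in kernel mode, user history in user mode) are assumptions of this example; the slides only state the update rule.

```python
class DualHistory:
    def __init__(self):
        self.user = 0    # updated only in user mode
        self.kernel = 0  # updated in both modes

    def update(self, taken, kernel_mode):
        self.kernel = (self.kernel << 1) | int(taken)
        if not kernel_mode:
            self.user = (self.user << 1) | int(taken)

    def select(self, kernel_mode):
        # Assumed policy: predict with the history that matches the current mode.
        return self.kernel if kernel_mode else self.user


histories = DualHistory()
for taken in (True, True):            # two user-mode branches
    histories.update(taken, kernel_mode=False)
for taken in (False, False, False):   # exception handler runs in kernel mode
    histories.update(taken, kernel_mode=True)
```

After the kernel excursion, the user history still holds only the two user-mode outcomes, so user code resumes with an unpolluted history.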

Accuracy limits for TAGE

Varying the predictor size, the number of components, the tag width, the history length.

Allowing multiple allocations

The best accuracy on distributed traces: 3.054 misp/KI

• History length around 1,000
• 15-20 components
• No need for tags wider than 16 bits

Accuracy limits for GEHL

Varying the predictor size, the number of components, the history length, counter width

(Slightly) improving the update policy and fitting within the two-hour simulation rule

On the distributed traces: 2.842 misp/KI

• 97 components
• 8-bit counters
• 2,000 bits of global history

GEHL vs TAGE

Realistic implementation parameters (storage budget, number of components): TAGE is more accurate than (O-)GEHL

Unlimited budget, huge number of components: GEHL is more accurate than TAGE

Will it be sufficient to win the Championship?

GEHL history length: 2,000; 97 components

2.842 misp/KI

A step further: hybrid GEHL-TAGE

On a few benchmarks, TAGE is more accurate than GEHL.

Let us try a hybrid GEHL-TAGE predictor

Hybrid GEHL-TAGE

[Figure: block diagram of the hybrid GEHL-TAGE predictor]

Inherits from: agree/bimode, YAGS, 2bcgskew

GEHL+TAGE

GEHL provides the main prediction; it is also used as the base predictor for TAGE (YAGS inspired)

TAGE records when GEHL fails: {prediction, address, history} (agree/bimode, YAGS inspired)

Meta selects between GEHL and TAGE (2bcgskew inspired)
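The selection step might look like the following sketch, assuming a single signed saturating meta counter that is trained only when the two predictions disagree (as in 2bcgskew-style choosers). The function names, the counter range, and the tie-breaking rule (TAGE is chosen when the counter is non-negative) are all assumptions, not the predictor's actual policy.

```python
def hybrid_predict(gehl_taken, tage_hit, tage_taken, meta):
    """Return the final prediction.
    GEHL is the default; TAGE overrides it only on a tag hit and when
    the meta counter favors TAGE (meta >= 0, an assumed convention)."""
    if tage_hit and meta >= 0:
        return tage_taken
    return gehl_taken


def update_meta(meta, gehl_taken, tage_taken, taken, lo=-8, hi=7):
    """Train the meta counter only when GEHL and TAGE disagree."""
    if gehl_taken != tage_taken:
        if tage_taken == taken:
            meta = min(hi, meta + 1)   # TAGE was right
        else:
            meta = max(lo, meta - 1)   # GEHL was right
    return meta


# TAGE hits and disagrees with GEHL; GEHL turns out to be right.
first = hybrid_predict(gehl_taken=True, tage_hit=True, tage_taken=False, meta=0)
meta = update_meta(0, gehl_taken=True, tage_taken=False, taken=True)
second = hybrid_predict(gehl_taken=True, tage_hit=True, tage_taken=False, meta=meta)
```

In this example the first prediction follows TAGE and is wrong, the meta counter moves toward GEHL, and the second prediction follows GEHL.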

Let us have fun !!

GEHL history length: 400

TAGE history length: 100,000

2.774 misp/KI

Might still be insufficient

GEHL history length: 400

TAGE history length: 100,000

2.774 misp/KI

Adding a loop predictor

The loop predictor captures the number of iterations of a loop.

When it successively encounters the same number of iterations 8 times, the loop predictor provides the prediction.

Advantage: very reliable
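A minimal sketch of such a loop predictor follows, assuming the loop branch is taken on every iteration and not taken on exit, and that entries are kept in an unbounded dictionary keyed by branch address. The class names and these conventions are assumptions; only the threshold of 8 identical trip counts comes from the slide.

```python
CONFIDENCE_THRESHOLD = 8  # identical trip counts required before predicting


class LoopEntry:
    def __init__(self):
        self.trip_count = None  # iterations seen in the last completed execution
        self.current = 0        # iterations seen so far in the current execution
        self.confidence = 0     # consecutive executions with the same trip count


class LoopPredictor:
    def __init__(self):
        self.entries = {}  # pc -> LoopEntry; stand-in for a small tagged table

    def predict(self, pc):
        """Return True/False when confident, or None to defer to other predictors."""
        entry = self.entries.get(pc)
        if entry is None or entry.confidence < CONFIDENCE_THRESHOLD:
            return None
        return entry.current < entry.trip_count

    def update(self, pc, taken):
        entry = self.entries.setdefault(pc, LoopEntry())
        if taken:
            entry.current += 1
            return
        # Loop exit: compare this execution's trip count with the recorded one.
        if entry.trip_count == entry.current:
            entry.confidence += 1
        else:
            entry.trip_count = entry.current
            entry.confidence = 1
        entry.current = 0


def run_loop(predictor, pc, iterations):
    """Execute one loop instance; return a list of (prediction, outcome) pairs."""
    results = []
    for taken in [True] * iterations + [False]:
        results.append((predictor.predict(pc), taken))
        predictor.update(pc, taken)
    return results


loop_predictor = LoopPredictor()
for _ in range(8):
    run_loop(loop_predictor, 0x400, 5)
ninth = run_loop(loop_predictor, 0x400, 5)
```

After eight executions with five iterations each, the ninth execution is predicted entirely by the loop predictor, including the exit; a change in trip count resets the confidence so the predictor goes silent again.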

GTL predictor

[Figure: GTL predictor block diagram: GEHL, TAGE, loop predictor]

+ static prediction on first occurrence

Hope this will be sufficient to win the Championship!!

GEHL, 97 comp., 400 hist. + TAGE, 19 comp., 100,000 hist

+ loop predictor

2.717 misp/KI

Geometric History Length predictors and limits on branch prediction

Unlimited budget, huge number of components: GEHL is more accurate than TAGE

Very old correlation can be captured: on two benchmarks, using a history length of 10,000 really helps

There does not seem to be much potential extra benefit from local history

We did not find any interesting extra scheme apart from loop prediction

Loop prediction: very marginal benefit, except on gzip

The End
