Power and Frequency Analysis for Data and Control Independence in Embedded Processors
Farzad Samie, Sharif University of Technology
Amirali Baniasadi, University of Victoria
This Work
Goal
• Power and frequency analysis of control-independent and data-independent instructions in embedded processors
Motivation
• Embedded processors are becoming more complex
• Modern embedded processors use speculation
• Mis-speculation incurs performance and power penalties
• Power is a major concern in embedded processors
• Opportunity: save power and gain performance
This Work (cont.)
Our Approach
• Reduce the energy and time wasted on mispredictions
How?
• Identify and bypass control-independent (CI) and data-independent (DI) instructions
• CI: instructions that execute regardless of the branch outcome
• CI-DI: CI instructions that execute with the same operands
Key Result
• 12% processor energy reduction
Background
Branch Prediction
[Figure: the branch predictor takes the program counter and the branch history as inputs and produces a predicted direction and a predicted target address.]
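The slide does not say which predictor organization the processor uses. As an illustration only, the sketch below assumes a gshare-style direction predictor (2-bit saturating counters indexed by the PC XORed with a global history register) plus a branch target buffer modeled as a Python dict. The class and method names are hypothetical, and 4-byte instructions are assumed when dropping low PC bits.

```python
from typing import Dict, Optional, Tuple


class GsharePredictor:
    """Toy gshare direction predictor plus a dict-based branch target buffer (BTB)."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # 2-bit counters, start weakly not-taken
        self.history = 0                          # global branch history register
        self.btb: Dict[int, int] = {}             # branch PC -> last taken target

    def _index(self, pc: int) -> int:
        # Drop the 2 low PC bits (4-byte instructions assumed), XOR with history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> Tuple[bool, Optional[int]]:
        """Return (predicted direction, predicted target or None)."""
        taken = self.counters[self._index(pc)] >= 2
        return taken, (self.btb.get(pc) if taken else None)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train with the resolved outcome, using the pre-update history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
            self.btb[pc] = target
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the resolved direction into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


bp = GsharePredictor()
print(bp.predict(0x400))   # cold predictor: not taken, no target
for _ in range(20):        # a loop branch that is always taken
    bp.update(0x400, True, 0x3F0)
print(bp.predict(0x400))   # now predicted taken with target 0x3F0
```

A misprediction in such a predictor is exactly the event that sends the front end down the wrong path discussed next.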
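Background (cont.)
[Figure: Control-independent instructions. The branch is predicted not taken, so the wrong path (I5, I6, then I7, I8, I9) is fetched and later squashed when the misprediction is detected. The right (taken) path executes I10, I11, I12 and then I7, I8, I9 again. I7, I8, and I9 execute regardless of the branch outcome: they are the control-independent instructions (CIs).]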
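A minimal offline sketch of identifying CIs, assuming both paths are available as sequences of instruction labels (named like the slide's I5 to I12) and that the paths stay in lockstep once they rejoin. The reconvergence point is taken as the first wrong-path instruction that also appears on the right path. This is a simplification: real hardware has to predict the reconvergence point instead (compare Collins et al. under Related Work). The function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def find_reconvergence(wrong: Sequence[str], right: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (wrong_index, right_index) of the first wrong-path instruction
    that also occurs on the right path, or None if the paths never rejoin."""
    first_on_right: Dict[str, int] = {}
    for j, inst in enumerate(right):
        first_on_right.setdefault(inst, j)  # keep the earliest occurrence
    for i, inst in enumerate(wrong):
        if inst in first_on_right:
            return i, first_on_right[inst]
    return None


def control_independent(wrong: Sequence[str], right: Sequence[str]) -> List[str]:
    """Instructions executed on both paths after reconvergence (the CIs)."""
    point = find_reconvergence(wrong, right)
    if point is None:
        return []
    i, j = point
    ci: List[str] = []
    # Walk both paths in lockstep from the reconvergence point.
    for w, r in zip(wrong[i:], right[j:]):
        if w != r:
            break
        ci.append(w)
    return ci


wrong_path = ["I5", "I6", "I7", "I8", "I9"]          # squashed (not taken)
right_path = ["I10", "I11", "I12", "I7", "I8", "I9"]  # re-fetched (taken)
print(control_independent(wrong_path, right_path))   # ['I7', 'I8', 'I9']
```

Without CI reuse, I7 to I9 are fetched and executed twice; bypassing them avoids the second pass.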
Background (cont.)
[Figure: Classifying CIs. The fragment computes R1←R1+R2 and R4←R1, then branches on If (R4=0) into not-taken and taken paths. After the paths rejoin, each control-independent instruction is labeled Data Independent (CI-DI) if it executes with the same operands on both paths, or Data Dependent (CI-DD) otherwise.]
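To make the CI-DI/CI-DD distinction concrete, the sketch below runs a small hypothetical fragment (not the one in the figure) down both paths from the same starting register values and compares the operand values each CI instruction reads. Following the definition given earlier (CI instructions executing with the same operands), matching operands mean CI-DI; otherwise CI-DD. The `Inst` tuple, the register values, and the function names are illustrative assumptions, not the authors' simulator code.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Regs = Dict[str, int]


class Inst(NamedTuple):
    text: str               # human-readable form, e.g. "R5 <- R4 + 1"
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers
    fn: Callable[..., int]  # computes the result from the source values


def run(path: List[Inst], regs: Regs) -> Regs:
    """Execute a path on a copy of the register file."""
    regs = dict(regs)
    for inst in path:
        regs[inst.dest] = inst.fn(*(regs[s] for s in inst.srcs))
    return regs


def classify_ci(start: Regs, wrong: List[Inst], right: List[Inst],
                ci: List[Inst]) -> List[Tuple[str, str]]:
    """Label each CI instruction CI-DI (same operands after either path) or CI-DD."""
    w_regs, r_regs = run(wrong, start), run(right, start)
    labels = []
    for inst in ci:
        w_ops = tuple(w_regs[s] for s in inst.srcs)
        r_ops = tuple(r_regs[s] for s in inst.srcs)
        labels.append((inst.text, "CI-DI" if w_ops == r_ops else "CI-DD"))
        # Advance both register states past this CI instruction.
        w_regs[inst.dest] = inst.fn(*w_ops)
        r_regs[inst.dest] = inst.fn(*r_ops)
    return labels


start = {"R1": 5, "R2": 3, "R3": 7, "R4": 0, "R5": 0}
wrong = [Inst("R2 <- R4 - R1", "R2", ("R4", "R1"), lambda a, b: a - b)]
right = [Inst("R3 <- 0", "R3", (), lambda: 0)]
ci = [
    Inst("R5 <- R4 + 1", "R5", ("R4",), lambda a: a + 1),
    Inst("R1 <- R1 - 1", "R1", ("R1",), lambda a: a - 1),
    Inst("R5 <- R2 - R3", "R5", ("R2", "R3"), lambda a, b: a - b),
    Inst("R3 <- R3 - R4", "R3", ("R3", "R4"), lambda a, b: a - b),
]
for text, label in classify_ci(start, wrong, right, ci):
    print(f"{text:15} {label}")
```

Here the first two CI instructions read only R4 and R1, which neither path changes, so they are CI-DI; the last two read R2 or R3, which the paths set differently, so they are CI-DD.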
CI-DI vs. CI-DD
• Bypassing CI-DIs saves more energy than bypassing CI-DDs: their operands need not be read and they need not execute again
• Bypassing CI-DIs provides higher performance: no time is wasted reading operands or executing
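[Figure: the pipeline stages Fetch, Issue, Dispatch, Execute, and WriteBack, contrasting the stages a bypassed CI-DD instruction goes through with those of a bypassed CI-DI instruction.]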
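The claim that bypassing a CI-DI saves more than bypassing a CI-DD can be illustrated with a toy accounting model. Everything numeric here is an assumption: the per-stage energies are arbitrary units, not Wattch results, and the skipped-stage sets are hypothetical (a reused CI-DD is assumed to skip only Fetch and still read operands and execute, while a reused CI-DI is assumed to go straight to WriteBack).

```python
from typing import Dict, FrozenSet

STAGES = ("Fetch", "Issue", "Dispatch", "Execute", "WriteBack")

# Hypothetical per-instruction energy per stage, arbitrary units (not measured).
STAGE_ENERGY: Dict[str, float] = {
    "Fetch": 4.0, "Issue": 2.0, "Dispatch": 3.0, "Execute": 5.0, "WriteBack": 1.0,
}

# Assumed stages skipped when a squashed instruction's work is reused.
SKIPPED: Dict[str, FrozenSet[str]] = {
    "CI-DD": frozenset({"Fetch"}),                                  # must re-read operands and execute
    "CI-DI": frozenset({"Fetch", "Issue", "Dispatch", "Execute"}),  # result reused as-is
}


def energy_saved(kind: str, count: int) -> float:
    """Energy saved by bypassing `count` instructions of class `kind`."""
    return count * sum(STAGE_ENERGY[s] for s in SKIPPED[kind])


def full_energy(count: int) -> float:
    """Energy to run `count` instructions through every stage."""
    return count * sum(STAGE_ENERGY[s] for s in STAGES)


for kind in SKIPPED:
    print(f"{kind}: saves {energy_saved(kind, 100):.0f} of {full_energy(100):.0f} units")
```

Whatever the real per-stage costs, a CI-DI skips a superset of the stages a CI-DD skips under these assumptions, so its saving can never be smaller.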
Methodology
• Modified SimpleScalar simulator
• Wattch for power estimation
• MiBench embedded benchmark suite
Distribution
Wrong path: 12%, CI: 5%, CI-DI: 2%
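The slide does not state the denominator of these percentages. Assuming all three are fractions of the same instruction count, and that CIs are a subset of wrong-path instructions and CI-DIs a subset of CIs (as the definitions suggest), the figures also give relative shares: about 5/12 ≈ 42% of wrong-path instructions are CIs, and 2/5 = 40% of CIs are CI-DI. A small helper that derives these ratios from simulator counts is sketched below; the `Counts` fields are hypothetical counter names and the sample counts are invented to match the percentages.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    total: int       # all instructions counted (assumed common denominator)
    wrong_path: int  # squashed wrong-path instructions
    ci: int          # wrong-path instructions that are control independent
    ci_di: int       # CIs that are also data independent


def distribution(c: Counts) -> Dict[str, float]:
    """Absolute fractions plus nested shares (CI within wrong path, CI-DI within CI)."""
    if c.total <= 0 or not (0 <= c.ci_di <= c.ci <= c.wrong_path <= c.total):
        raise ValueError("expected 0 <= ci_di <= ci <= wrong_path <= total and total > 0")
    return {
        "wrong_path": c.wrong_path / c.total,
        "ci": c.ci / c.total,
        "ci_di": c.ci_di / c.total,
        "ci_share_of_wrong_path": c.ci / c.wrong_path if c.wrong_path else 0.0,
        "ci_di_share_of_ci": c.ci_di / c.ci if c.ci else 0.0,
    }


shares = distribution(Counts(total=1000, wrong_path=120, ci=50, ci_di=20))
print({name: round(value, 3) for name, value in shares.items()})
```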
CI Power Reduction in Different Units
Largest reduction: branch predictor unit; smallest: instruction cache
CI Power Reduction in Stages
Rijndael: low misprediction rate, hence few wrong-path instructions and few CIs
Power Sensitivity to RUU (Register Update Unit) Size
[Figure: CI and CI-DI series across RUU sizes]
Higher power dissipation with larger RUU sizes
Power Sensitivity to Execution Bandwidth
[Figure: CI and CI-DI series across execution bandwidths]
Higher power dissipation with wider execution bandwidth
Power Sensitivity to Branch Predictor Size
Little sensitivity to branch predictor size
Related Work
• Rotenberg et al.: studied control independence in superscalar processors (HPCA 1999)
• Collins et al.: proposed a mechanism to predict the reconvergence point (MICRO 2004)
• Lam and Wilson: studied the impact of CIs on instruction-level parallelism (ISCA 1992)
• Gandhi et al.: proposed selective recovery from branch mispredictions (HPCA 2004)
Conclusion
• Categorized CIs into CI-DI and CI-DD
• Potential power savings of up to 12% from bypassing CI and CI-DI instructions
• High sensitivity to RUU size
• High sensitivity to execution bandwidth
• Little sensitivity to branch predictor size
Questions
Thank you