![Page 1: INTEGRATED WCET ESTIMATION OF MULTICORE APPLICATIONSwcet2013.imag.fr/Slides/WCET2013-PotopPuaut.pdf · INTEGRATED WCET ESTIMATION OF MULTICORE APPLICATIONS Dumitru Potop-Butucaru,](https://reader030.fdocuments.net/reader030/viewer/2022020416/5c7952e309d3f294278c6c7b/html5/thumbnails/1.jpg)
INTEGRATED WCET ESTIMATION OF MULTICORE APPLICATIONS
Dumitru Potop-Butucaru, Isabelle Puaut
Motivation: Scalable timing analysis
- Real-time systems: complexity steadily increases
  - Hardware: multi-core, networks-on-chip
  - Software: parallel/concurrent software
- Safety margins used in practice after schedulability analysis are already enormous (40%-60%)
- Further static abstraction is not a solution
- How to preserve both tractability and precision?
  - Probabilistic approaches (another form of abstraction), or
  - Use "WCET-friendly" hardware and software
    - Limit/control timing interferences due to concurrency
    - Static (off-line) scheduling, non-preemptive execution, etc.
    - No shared caches, LRU caches, time-triggered execution, etc.
Static timing analysis
- Three basic sources of imprecision:
  - Application-related: input arrival dates, data-dependent behavior
  - Mapping-related: concurrency (pipelining, buses, scheduling)
  - Analysis-related: abstraction (e.g. IPET, real-time calculus, etc.)
- Our thesis: having few sources of imprecision in the application and its mapping allows scalable, precise analysis
Reducing imprecision
- Everybody is doing it (to a point)
  - Industry: space & time partitioning (among others)
    - Time-triggered standards: TTA, ARINC 653
    - Recent many-core chips: TilePro64, Kalray MPPA256, etc.
  - Research:
    - Precision-timed architectures (PRET): Lee et al.
    - CompSoC, Aethereal, etc.
    - Off-line scheduling: Fohler, Eles, Sorel, etc.
- But we go all the way: remove all application- and mapping-related imprecision sources that are not handled by classical WCET analysis
  - Possibly add some back later on (future work)
  - This paper: show that it is possible and determine the gain
Tiled MPSoC architecture
- Multi-bank RAM
- Harvard-like architecture
- Full crossbar intra-tile interconnect
- Hardware locks for synchronization (not interrupts)
- Static routing (X-first)
[Figure: tile structure with CPUs (MIPS32) and private caches (PLRU or LRU, write-through), local crossbar interconnect, NIC, buffered DMA, optional I/O, command and response routers, multi-bank RAM, program RAM/ROM and a lock unit]
Djemal et al., DASIP 2012
Based on SoCLib (UPMC/LIP6)
- Provides timing guarantees for inter-tile communications
  - Through locks and programmed arbitration (others use TDMA or other types of resource reservation)
- Tool limitation: 1 CPU/tile
[Figure: single-CPU tile (MIPS32, LRU write-back cache) with local crossbar, NIC, buffered DMA, optional I/O, command and response routers, multi-bank RAM, program RAM/ROM and lock unit; router ports South, West, East and Local]
Tiled MPSoC applications
- On each processor, sequential code
  - Non-preemptive, off-line scheduling
  - Synchronization by blocking send/recv operations over lossless FIFOs, a.k.a. Kahn process networks (G. Kahn, 1974)
- No concurrent access to RAM banks, DMA units or NoC router outputs
  - Data allocation on memory banks, use of locks to enforce a predefined schedule
- Tool limitations
  - Sampled I/O only
  - Send/recv primitives are explicitly matched
  - Send/recv only at top level (global loop), non-conditioned
Tiled MPSoC applications (example)
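```c
// Core 1 (QMF filter). send, receive, read_input, write_output, channel1 and
// channel2 are the platform's communication and I/O primitives (declared elsewhere).
extern const int h[24];               // QMF filter coefficients, defined elsewhere

void core1(void) {
    int tqmf[24] = {0};               // filter delay line
    long xa, xb;
    int i, xin1, xin2, decis_levl;
    for (;;) {                        // infinite loop
        xa = 0;
        xb = 0;
        for (i = 0; i < 12; i++) {    // 12 iterations
            xa += (long)tqmf[2 * i] * h[2 * i];
            xb += (long)tqmf[2 * i + 1] * h[2 * i + 1];
        }
        send(channel1, (int)((xa + xb) >> 15));
        xin1 = read_input();
        xin2 = read_input();
        for (i = 23; i >= 2; i--) {   // 22 iterations
            tqmf[i] = tqmf[i - 2];
        }
        tqmf[1] = xin1;
        tqmf[0] = xin2;
        decis_levl = receive(channel2);
        write_output(decis_levl);
    }
}
```

```c
// Core 2 (quantizer), compiled separately for the second core.
extern const int decis_levl[30];      // decision levels, defined elsewhere

void core2(void) {
    int q, el;
    for (;;) {                        // infinite loop
        el = receive(channel1);
        el = (el >= 0) ? el : -el;
        for (q = 0; q < 30; q++) {    // at most 30 iterations
            if (el <= decis_levl[q]) break;
        }
        send(channel2, q);            // index of the selected decision level
    }
}
```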
Traditional timing analysis
The code of each core is split into tasks: Task1_1, Task1_2 and Task1_3 on core 1, and Task2_1 on core 2.
Each task is analyzed separately, giving WCET1_1, WCET1_2, WCET1_3 and WCET2_1.
These per-task WCETs are then combined into the application latency.
- Safety considerations when analyzing subtasks in isolation: the WCET_i_j are overestimated
- Glue code between tasks is not considered: margins must be added to the WCET_i_j
Unified timing analysis
Tasks Task1_1, Task1_2, Task1_3 (core 1) and Task2_1 (core 2) are analyzed together:
1. CFG extraction (unmodified)
2. Per-core low-level analysis
  - Captures reuse across tasks (e.g. cache contents); all code is considered, so no margins are needed
3. Modeling of communications
4. WCET estimation (standard IPET)
[Figure: example control-flow graph with basic blocks a, b, c, d, e, f]
Flow constraints:

$$x_b = x_{ab} + x_{cb} = x_{bc} + x_{bd}$$

$$x_d = x_{db} = x_{df} + x_{de}$$

…

Objective function:

$$\max\,(x_a t_a + x_b t_b + 250\,x_{df})$$
Unified timing analysis (detail)
[Figure: tasks Task1_1, Task1_2, Task1_3 and Task2_1 and the combined graph (blocks a to f) on which the flow constraints and objective function are stated]
- Converts a parallel model into a sequential one for the analysis (critical path search)
- Scalability
Experimental results
- Experimental setup: 2x2 MPSoC
  - 1 CPU/tile, in-order execution, no variable-time instructions
  - Cycle-accurate simulator
- Heptane WCET analysis tool
  - Very accurate (precise) hardware model: Heptane and the SystemC simulator give the same number of cycles for simple single-path programs
[Figure: 2x2 mesh of tiles (0,0), (0,1), (1,0), (1,1)]
Two examples, 3 configurations:
- Adpcm
  - 2 cores, no (SW) pipelining
  - 4 cores, one operation/CPU, pipelined
- Load balancing (2 cores): simple filter, needs 2 CPUs to meet throughput, pipelined

[Figure: Adpcm dataflow (QMF, low-band encoder, high-band encoder, multiplexer) mapped onto CPU 0 and CPU 1]
Comparison with the isolated (traditional) timing analysis
| Configuration | Integrated (cycles) | Isolated (cycles) | Improvement (%) |
|---|---|---|---|
| Adpcm, 2 cores | 73563 | 101431 | 36.5 |
| Adpcm, 4 cores | 44568 | 55919 | 25.5 |
| Filter, 2 cores | 110825 | 112543 | 1.55 |
- Always an improvement
- The improvement depends on the amount of reuse
Comparison with the measured execution time (typical input, single run)
| Configuration | Integrated (cycles) | Measured (cycles, typical input) | Pessimism (%) |
|---|---|---|---|
| Adpcm, 2 cores | 73563 | 64944 | 13.3 |
| Adpcm, 4 cores | 44568 | 41468 | 7.5 |
| Filter, 2 cores | 110825 | 108296 | 2.3 |
- Actual pessimism is expected to be lower
- Still, the pessimism is reasonable
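The Pessimism column appears to be (integrated − measured) / measured. This is inferred from the numbers, and it reproduces all three rows. A single run on a typical input cannot exceed the real WCET. Assuming the integrated bound is safe, the pessimism with respect to the real WCET can therefore only be smaller, because W/t − 1 decreases as t grows:

$$
\text{Pessimism} = \frac{\mathrm{WCET}_{\text{int}} - T_{\text{meas}}}{T_{\text{meas}}},
\qquad \text{e.g. } \frac{73563 - 64944}{64944} = \frac{8619}{64944} \approx 13.3\%
$$

$$
T_{\text{meas}} \le \mathrm{WCET}_{\text{real}} \le \mathrm{WCET}_{\text{int}}
\;\Longrightarrow\;
\frac{\mathrm{WCET}_{\text{int}} - \mathrm{WCET}_{\text{real}}}{\mathrm{WCET}_{\text{real}}}
\le
\frac{\mathrm{WCET}_{\text{int}} - T_{\text{meas}}}{T_{\text{meas}}}
$$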
Conclusion
- Predictable architecture + integrated approach yield static, tight WCETs
- Scalable
  - Same complexity as IPET on a sequential program of the same size
- Better than traditional timing analysis
  - Captures cache reuse within one core
  - No need for safety margins to account for glue code
- Future work
  - More experiments
  - More general task/architecture model
  - Closer interaction between WCET analysis and scheduling/mapping: put WCET in the loop during scheduling/mapping