Code Optimization
Outline
• Machine-Independent Optimization
  – Code motion
  – Memory optimization
• Suggested reading
  – 5.2 ~ 5.6
Motivation
• Constant factors matter too!
  – Easily see 10:1 performance range depending on how code is written
  – Must optimize at multiple levels
    • Algorithm, data representations, procedures, and loops
Motivation
• Must understand system to optimize performance
  – How programs are compiled and executed
  – How to measure program performance and identify bottlenecks
  – How to improve performance without destroying code modularity and generality
5.3 Program Example
Vector ADT P384
    typedef int data_t;    /* element type; must be declared before use */

    typedef struct {
        int len;
        data_t *data;
    } vec_rec, *vec_ptr;
[Figure 5.3 (P385): vector header with length and data fields; data points to an array of elements indexed 0, 1, 2, ..., length–1]
Procedures P386
• vec_ptr new_vec(int len) (P386)
  – Create vector of specified length
• data_t *get_vec_start(vec_ptr v) (P392)
  – Return pointer to start of vector data
Procedures
• int get_vec_element(vec_ptr v, int index, data_t *dest)
  – Retrieve vector element, store at *dest
  – Return 0 if out of bounds, 1 if successful
• Similar to array implementations in Pascal, Java
  – E.g., always do bounds checking
Vector ADT
    vec_ptr new_vec(int len)
    {
        /* allocate header structure */
        vec_ptr result = (vec_ptr) malloc(sizeof(vec_rec));
        if (!result)
            return NULL;
        result->len = len;    /* record the length so vec_length works */

        /* allocate array */
        if (len > 0) {
            data_t *data = (data_t *) calloc(len, sizeof(data_t));
            if (!data) {
                free((void *) result);
                return NULL;    /* couldn't allocate storage */
            }
            result->data = data;
        } else
            result->data = NULL;
        return result;
    }
Vector ADT
    /*
     * Retrieve vector element and store at dest.
     * Return 0 (out of bounds) or 1 (successful)
     */
    int get_vec_element(vec_ptr v, int index, data_t *dest)
    {
        if (index < 0 || index >= v->len)
            return 0;
        *dest = v->data[index];
        return 1;
    }
Vector ADT
    /* Return length of vector */
    int vec_length(vec_ptr v)
    {
        return v->len;
    }

    /* Return pointer to start of vector data (P392) */
    data_t *get_vec_start(vec_ptr v)
    {
        return v->data;
    }
Optimization Example P385
    #ifdef ADD
    #define IDENT 0
    #define OPER +
    #else
    #define IDENT 1
    #define OPER *
    #endif
Optimization Example P387
    void combine1(vec_ptr v, data_t *dest)
    {
        int i;
        *dest = IDENT;
        for (i = 0; i < vec_length(v); i++) {
            data_t val;    /* matches the data_t * parameter of get_vec_element */
            get_vec_element(v, i, &val);
            *dest = *dest OPER val;
        }
    }
Optimization Example
• Procedure
  – Compute sum (product) of all elements of vector
  – Store result at destination location
5.2 Expressing Program Performance
Time Scales P382
• Absolute Time
  – Typically use nanoseconds
    • 10^-9 seconds
  – Time scale of computer instructions
Time Scales
• Clock Cycles
  – Most computers controlled by high frequency clock signal
  – Typical Range
    • 100 MHz
      – 10^8 cycles per second
      – Clock period = 10 ns
    • 2 GHz
      – 2 × 10^9 cycles per second
      – Clock period = 0.5 ns
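The conversion between clock frequency and clock period is a reciprocal, and cycle counts scale to absolute time by the period. A minimal C sketch of that arithmetic follows; the helper names `clock_period_ns` and `cycles_to_ns` are hypothetical and do not come from the slides.

```c
/* Clock period in nanoseconds for a clock frequency given in hertz.
 * (Hypothetical helper, for illustration only.) */
double clock_period_ns(double freq_hz)
{
    return 1e9 / freq_hz;
}

/* Absolute time in nanoseconds taken by a given number of clock cycles. */
double cycles_to_ns(double cycles, double freq_hz)
{
    return cycles * clock_period_ns(freq_hz);
}
```

For example, `clock_period_ns(100e6)` corresponds to the 100 MHz case and `clock_period_ns(2e9)` to the 2 GHz case listed above.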
CPE P383
    void vsum1(int n)
    {
        int i;

        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
CPE P383
    /* Sum vector of n elements (n must be even) */
    void vsum2(int n)
    {
        int i;

        for (i = 0; i < n; i += 2) {
            /* Compute two elements per iteration */
            c[i] = a[i] + b[i];
            c[i+1] = a[i+1] + b[i+1];
        }
    }
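vsum2 requires n to be even. One common way to lift that restriction is a short cleanup loop after the two-at-a-time loop. The sketch below is an illustration, not code from the slides: the name `vsum2_any` is hypothetical, and the arrays are passed as parameters (the slides use the globals a, b, and c) so that the block is self-contained.

```c
/* Element-wise sum c[i] = a[i] + b[i], two elements per iteration.
 * Unlike vsum2, n may be odd: a cleanup loop handles the leftover element.
 * Assumption: arrays are parameters here, not the globals used on the slides. */
void vsum2_any(const int *a, const int *b, int *c, int n)
{
    int i;

    /* Main loop: runs only while a full pair (i, i+1) remains */
    for (i = 0; i + 1 < n; i += 2) {
        c[i] = a[i] + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
    }

    /* Cleanup: at most one element is left when n is odd */
    for (; i < n; i++)
        c[i] = a[i] + b[i];
}
```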
Cycles Per Element
• Convenient way to express performance of a program that operates on vectors or lists
• Length = n
• T = CPE*n + Overhead
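Given cycle counts measured at several vector lengths, CPE and Overhead in T = CPE*n + Overhead can be estimated with a straight-line fit, where the slope is the CPE. The following is a minimal sketch assuming an ordinary least-squares fit; the function name `fit_cpe` is hypothetical, and any measurements passed to it are the caller's own.

```c
#include <stddef.h>

/* Fit cycles = cpe * n + overhead by ordinary least squares.
 * Returns 0 on success, -1 if fewer than two points are given or
 * all n values are identical (the slope is then undefined). */
int fit_cpe(const double *n, const double *cycles, size_t count,
            double *cpe, double *overhead)
{
    double sum_n = 0.0, sum_c = 0.0, sum_nn = 0.0, sum_nc = 0.0;
    double denom;
    size_t i;

    if (count < 2)
        return -1;

    for (i = 0; i < count; i++) {
        sum_n  += n[i];
        sum_c  += cycles[i];
        sum_nn += n[i] * n[i];
        sum_nc += n[i] * cycles[i];
    }

    denom = (double) count * sum_nn - sum_n * sum_n;
    if (denom == 0.0)
        return -1;

    *cpe = ((double) count * sum_nc - sum_n * sum_c) / denom;  /* slope */
    *overhead = (sum_c - *cpe * sum_n) / (double) count;       /* intercept */
    return 0;
}
```

Fitting over a range of lengths, instead of dividing one cycle count by n, keeps the fixed Overhead term from inflating the CPE estimate for short vectors.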
Cycles Per Element Figure 5.2 P383
[Figure: cycles (y-axis, 0 to 1000) versus elements (x-axis, 0 to 200); vsum1 slope = 4.0, vsum2 slope = 3.5]
5.3 Program Example
5.4 Eliminating Loop Inefficiencies
5.5 Reducing Procedure Calls
5.6 Eliminating Unneeded Memory References
Time Scales P387
    void combine1(vec_ptr v, int *dest)
    {
        int i;
        *dest = 0;
        for (i = 0; i < vec_length(v); i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }
5.3 Program Example
Time Scales P385
• Procedure
  – Compute sum of all elements of integer vector
  – Store result at destination location
  – Vector data structure and operations defined via abstract data type
• Pentium II/III Performance: CPE
  – 42.06 (Compiled -g)
  – 31.25 (Compiled -O2)
Understanding Loop
    void combine1_goto(vec_ptr v, int *dest)
    {
        int i = 0;
        int val;
        *dest = 0;
        if (i >= vec_length(v))
            goto done;
    loop:
        get_vec_element(v, i, &val);
        *dest += val;
        i++;
        if (i < vec_length(v))
            goto loop;
    done:
        return;
    }

The statements from loop: through the conditional goto make up 1 iteration.
Inefficiency
• Procedure vec_length called every iteration
• Even though result always the same
5.4 Eliminating Loop Inefficiencies
Code Motion P388
    void combine2(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        *dest = 0;
        for (i = 0; i < length; i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }
Code Motion P388
• Optimization
  – Move call to vec_length out of inner loop
    • Value does not change from one iteration to next
    • Code motion
  – CPE: 22.61 (Compiled -O2)
    • vec_length requires only constant time, but significant overhead
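Code motion matters even more when the hoisted call is not constant time. A well-known illustration, separate from the vector example in these slides, is calling strlen in a loop condition: each call scans the whole string, so the loop becomes quadratic in the string length. In the sketch below the function names `lower1` and `lower2` are illustrative, and the conversion assumes an ASCII character set.

```c
#include <string.h>

/* Convert to lower case; strlen(s) is re-evaluated at every loop test,
 * so total work grows quadratically with the string length. */
void lower1(char *s)
{
    size_t i;

    for (i = 0; i < strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}

/* Same result after code motion: the length is computed once, before
 * the loop. This is safe because the loop never changes the length. */
void lower2(char *s)
{
    size_t i;
    size_t len = strlen(s);

    for (i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}
```

A compiler often cannot make this change on its own, since the loop writes to s and it would have to prove that the writes never alter the string length.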
Reduction in Strength P392
    void combine3(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        int *data = get_vec_start(v);
        *dest = 0;
        for (i = 0; i < length; i++) {
            *dest += data[i];
        }
    }
5.5 Reducing Procedure Calls
Reduction in Strength
• Optimization
  – Avoid procedure call to retrieve each vector element
    • Get pointer to start of array before loop
    • Within loop just do pointer reference
    • Not as clean in terms of data abstraction
  – CPE: 6.00 (Compiled -O2)
    • Procedure calls are expensive!
    • Bounds checking is expensive
Eliminate Unneeded Memory References P394

    void combine4(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        int *data = get_vec_start(v);
        int sum = 0;
        for (i = 0; i < length; i++)
            sum += data[i];
        *dest = sum;
    }
5.6 Eliminating Unneeded Memory References
Eliminate Unneeded Memory References
• Optimization
  – Don't need to store in destination until end
  – Local variable sum held in register
  – Avoids 1 memory read, 1 memory write per iteration
  – CPE: 2.00 (Compiled -O2)
    • Memory references are expensive!
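A compiler generally cannot replace the per-iteration update of *dest with a register accumulator on its own, because dest might point into the vector itself (memory aliasing), in which case the two versions compute different results. The sketch below shows this with plain arrays in place of the vector ADT; the function names `sum_through_dest` and `sum_in_local` are hypothetical.

```c
/* combine3 style: every iteration reads and writes *dest in memory. */
void sum_through_dest(int *data, int length, int *dest)
{
    int i;

    *dest = 0;
    for (i = 0; i < length; i++)
        *dest += data[i];
}

/* combine4 style: accumulate in a local variable, store once at the end. */
void sum_in_local(int *data, int length, int *dest)
{
    int i;
    int sum = 0;

    for (i = 0; i < length; i++)
        sum += data[i];
    *dest = sum;
}
```

With data = {2, 3, 7} and dest pointing at the last element, the first version zeroes that element before it is read and ends with 10, while the second reads the original 7 and ends with 12. When dest does not alias the data, both produce the same sum.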
Detecting Unneeded Memory References
Combine3

    .L18:
        movl (%ecx,%edx,4),%eax
        addl %eax,(%edi)
        incl %edx
        cmpl %esi,%edx
        jl .L18

Combine4

    .L24:
        addl (%eax,%edx,4),%ecx
        incl %edx
        cmpl %esi,%edx
        jl .L24
Detecting Unneeded Memory References
• Performance
  – Combine3
    • 5 instructions in 6 clock cycles
    • addl must read and write memory
  – Combine4
    • 4 instructions in 2 clock cycles