Computer Architecture Exercises with Solutions
Uploaded by anisa-melishte
![Page 1: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/1.jpg)
Stalls and performance
• Stalls impede the progress of a pipeline and cause it to deviate
from the ideal of 1 instruction completing per clock cycle
• CPI pipelined
– = Ideal CPI + Pipeline stall cycles per instruction
– = 1 + Pipeline stall cycles per instruction
• Ignoring overhead and assuming stages are balanced:
$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$
• Ideally (no stalls), speedup equals the number of pipeline stages
Stalls occur because of hazards!
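A minimal Python sketch of the stall arithmetic above. It assumes an ideal CPI of 1, balanced stages, and no pipelining overhead, in which case the usual textbook relation is speedup = pipeline depth / (1 + stall cycles per instruction). The function names, the 5-stage pipeline, and the 0.5 stall cycles per instruction are illustrative assumptions, not values from the slides.

```python
def pipelined_cpi(stall_cycles_per_instr: float, ideal_cpi: float = 1.0) -> float:
    """CPI of a pipeline: ideal CPI plus average stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_instr


def pipeline_speedup(depth: int, stall_cycles_per_instr: float) -> float:
    """Speedup over the unpipelined machine, assuming balanced stages,
    no overhead, and an ideal CPI of 1."""
    return depth / (1.0 + stall_cycles_per_instr)


# Hypothetical 5-stage pipeline.
print(pipelined_cpi(0.5))         # CPI with 0.5 stall cycles per instruction
print(pipeline_speedup(5, 0.0))   # no stalls: speedup equals the depth
print(pipeline_speedup(5, 0.5))   # stalls reduce the speedup below the depth
```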
1Computer Architecture
![Page 2: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/2.jpg)
Computer Performance
“X is N% faster than Y” means:
$$\frac{\text{Execution Time of Y}}{\text{Execution Time of X}} = 1 + \frac{N}{100}$$
Amdahl’s law for overall speedup
$$\text{Overall Speedup} = \frac{1}{(1 - F) + \dfrac{F}{S}}$$
F = The fraction enhanced
S = The speedup of the enhanced fraction
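The two definitions on this slide can be sketched in Python as follows. The function names are hypothetical. The sample inputs are the F and S pairs from the "Using Amdahl's law" examples, plus one made-up pair of execution times.

```python
def percent_faster(time_y: float, time_x: float) -> float:
    """N such that 'X is N% faster than Y': time_y / time_x = 1 + N/100."""
    return (time_y / time_x - 1.0) * 100.0


def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


print(round(amdahl_speedup(0.9, 10), 2))    # 90% of the program, 10 times faster
print(round(amdahl_speedup(0.8, 1.2), 3))   # 80% of the program, 20% faster
print(percent_faster(15.0, 10.0))           # made-up times: Y takes 15 s, X takes 10 s
```

Even with S very large, the overall speedup cannot exceed 1 / (1 - F), which is why the unenhanced fraction dominates.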
![Page 3: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/3.jpg)
Using Amdahl’s law
Overall speedup if we make 90% of a program run 10 times faster.
F = 0.9 S = 10
$$\text{Overall Speedup} = \frac{1}{(1 - 0.9) + \dfrac{0.9}{10}} = \frac{1}{0.1 + 0.09} \approx 5.26$$
Overall speedup if we make 80% of a program run 20% faster.
F = 0.8 S = 1.2
$$\text{Overall Speedup} = \frac{1}{(1 - 0.8) + \dfrac{0.8}{1.2}} = \frac{1}{0.2 + 0.667} \approx 1.154$$
![Page 4: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/4.jpg)
[1]. You have a system that contains a special processor for doing floating-point
operations. You have determined that 50% of your computations can use the
floating-point processor. The speedup of the floating-point processor is 15.
a) Overall speedup achieved by using the floating-point processor.
F = 0.5 S = 15
$$\text{Overall speedup} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{15}} = \frac{1}{0.5 + 0.0333} = 1.875$$
b) Overall speedup achieved if you modify the compiler so that 75% of the
computations can use the floating-point processor.
F = 0.75 S = 15
$$\text{Overall speedup} = \frac{1}{(1 - 0.75) + \dfrac{0.75}{15}} = \frac{1}{0.25 + 0.05} \approx 3.33$$
![Page 5: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/5.jpg)
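c) What fraction of the computations should be able to use the floating-point
processor in order to achieve an overall speedup of 2.25?
F = ? S = 15
$$2.25 = \frac{1}{(1 - F) + \dfrac{F}{15}} = \frac{1}{\dfrac{15 - 15F + F}{15}} = \frac{15}{15 - 14F}$$
$$2.25\,(15 - 14F) = 15$$
$$33.75 - 31.5F = 15$$
$$31.5F = 18.75$$
$$F = \frac{18.75}{31.5} \approx 0.595 \text{ or about } 60\%$$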
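Part c) inverts Amdahl's law. A small Python sketch of that inversion, solving 1 / ((1 - F) + F / S) = target for F, is shown here. The function name `required_fraction` is hypothetical.

```python
def required_fraction(target_speedup: float, s: float) -> float:
    """Fraction F that must be enhanced (by factor s) to reach target_speedup.

    Derived from Amdahl's law: 1/target = 1 - F * (1 - 1/s).
    """
    if s <= 1.0:
        raise ValueError("s must be greater than 1")
    if not 1.0 <= target_speedup <= s:
        raise ValueError("target speedup must lie between 1 and s")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / s)


f = required_fraction(2.25, 15)
print(round(f, 3))   # about 0.595, i.e. roughly 60% of the computations
```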
![Page 6: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/6.jpg)
[2]. You have a system that contains a special processor for doing floating-point
operations. You have determined that 60% of your computations can use the
floating-point processor. Computations that use the floating-point processor
run 40% faster than when they do not use it.
a) Overall speedup by using the floating-point processor.
F = 0.6 S = 1.4
$$\text{Overall speedup} = \frac{1}{(1 - 0.6) + \dfrac{0.6}{1.4}} = \frac{1}{0.4 + 0.429} \approx 1.207$$
b) In order to improve the speedup you are considering two options:
• Option 1: Modifying the compiler so that 70% of the computations can use
the floating-point processor. Cost of this option is $50K.
• Option 2: Modifying the floating-point processor so that computations
that use it run 100% faster than when they do not use it. Assume
in this case that 50% of the computations can use the floating-point
processor. Cost of this option is $60K.
Which option would you recommend? Justify your answer quantitatively.
![Page 7: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/7.jpg)
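Option 1: F = 0.7 S = 1.4
$$\text{Overall speedup} = \frac{1}{(1 - 0.7) + \dfrac{0.7}{1.4}} = \frac{1}{0.3 + 0.5} = 1.25$$
Option 2: F = 0.5 S = 2
$$\text{Overall speedup} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{2}} = \frac{1}{0.5 + 0.25} \approx 1.33$$
$$\text{Option 1: } \frac{\text{Cost}}{\text{Speedup}} = \frac{\$50\text{K}}{1.25} = \$40\text{K}$$
$$\text{Option 2: } \frac{\text{Cost}}{\text{Speedup}} = \frac{\$60\text{K}}{1.33} \approx \$45.1\text{K}$$
Therefore, Option 1 is better because it has a smaller Cost/Speedup ratio.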
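The comparison can be reproduced with a short Python sketch. The dictionary layout and the names `options` and `cost_per_speedup` are assumptions made for illustration. Costs are in thousands of dollars. Because the code does not round the speedup of Option 2 to 1.33 first, it gives about 45.0 rather than 45.1 for that option, which does not change the ranking.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


# (fraction enhanced, speedup of enhanced part, cost in $K), from the exercise.
options = {
    "Option 1": (0.7, 1.4, 50.0),
    "Option 2": (0.5, 2.0, 60.0),
}

cost_per_speedup = {
    name: cost / amdahl_speedup(f, s) for name, (f, s, cost) in options.items()
}

for name, ratio in cost_per_speedup.items():
    print(f"{name}: cost/speedup = ${ratio:.1f}K")

# The smaller ratio is the better buy under this metric.
best = min(cost_per_speedup, key=cost_per_speedup.get)
print("Recommended:", best)
```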
![Page 8: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/8.jpg)
[3]. Suppose you have a load/store computer with the following instruction mix:

| Operation | Frequency | No. of clock cycles |
|---|---|---|
| ALU ops | 35% | 1 |
| Loads | 25% | 2 |
| Stores | 15% | 2 |
| Branches | 25% | 3 |

a) Compute the CPI.
$$\text{CPI}_{old} = (0.35 \times 1) + (0.25 \times 2) + (0.15 \times 2) + (0.25 \times 3) = 1.9$$
b) We observe that 35% of the ALU ops are paired with a load, and we propose to replace these ALU ops and their loads with a new instruction. The new instruction takes 1 clock cycle. With the new instruction added, branches take 5 clock cycles. Compute the CPI for the new version.
Fraction of the original instructions that are ALU ops paired with a load:
$$0.35 \times 0.35 = 0.1225$$
![Page 9: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/9.jpg)
$$\text{CPI}_{new} = \frac{(0.35 - 0.1225) \times 1 + (0.25 - 0.1225) \times 2 + 0.15 \times 2 + 0.25 \times 5 + 0.1225 \times 1}{1 - 0.1225} = \frac{2.155}{0.8775} \approx 2.46$$
c) If the clock of the old version is 20% faster than the new version, which version has faster CPU Execution time and by how much percent?
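$$\frac{\text{CCT}_{new}}{\text{CCT}_{old}} = 1.2 \quad\Rightarrow\quad \text{CCT}_{new} = 1.2 \times \text{CCT}_{old}$$
$$\text{CPU Exec. Time}_{old} = \text{IC}_{old} \times 1.9 \times \text{CCT}_{old}$$
$$\text{CPU Exec. Time}_{new} = 0.8775 \times \text{IC}_{old} \times 2.46 \times 1.2 \times \text{CCT}_{old} = 2.59 \times \text{IC}_{old} \times \text{CCT}_{old}$$
So the old version is faster:
$$\frac{2.59}{1.9} \approx 1.36$$
By 36%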
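The three parts of exercise [3] can be checked with the Python sketch below. The representation of a mix as `{name: (fraction of original instruction count, cycles)}` and all identifiers are assumptions made for illustration. Times are expressed relative to IC_old × CCT_old, so only ratios are meaningful.

```python
def cpi(mix: dict) -> float:
    """Average cycles per instruction; fractions need not sum to 1."""
    total_cycles = sum(frac * cycles for frac, cycles in mix.values())
    total_instrs = sum(frac for frac, _ in mix.values())
    return total_cycles / total_instrs


old_mix = {"alu": (0.35, 1), "load": (0.25, 2), "store": (0.15, 2), "branch": (0.25, 3)}

# 35% of the ALU ops are fused with their load into one new 1-cycle instruction.
fused = 0.35 * 0.35
new_mix = {
    "alu": (0.35 - fused, 1),
    "load": (0.25 - fused, 2),
    "store": (0.15, 2),
    "branch": (0.25, 5),   # branches slow down to 5 cycles
    "fused": (fused, 1),
}

cpi_old = cpi(old_mix)
cpi_new = cpi(new_mix)

# Instruction count shrinks by the fused fraction; new clock cycle is 1.2x longer.
ic_new_over_old = 1.0 - fused
time_old = 1.0 * cpi_old * 1.0
time_new = ic_new_over_old * cpi_new * 1.2

print(round(cpi_old, 2), round(cpi_new, 2))
print(round(time_new / time_old, 2))   # > 1 means the old version is faster
```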
![Page 10: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/10.jpg)
[4]. For the purpose of solving a given application problem, you benchmark a program on two computer systems. On system A, the object code executed 80 million Arithmetic Logic Unit operations (ALU ops), 40 million load instructions, and 25 million branch instructions. On system B, the object code executed 50 million ALU ops, 50 million loads, and 40 million branch instructions. In both systems, each ALU op takes 1 clock cycle, each load takes 3 clock cycles, and each branch takes 5 clock cycles.
a) Compute the relative frequency of occurrence of each type of instruction executed in both systems.

| | A | B |
|---|---|---|
| ALU ops | 80/145 = 0.55 | 50/140 = 0.36 |
| Loads | 40/145 = 0.28 | 50/140 = 0.36 |
| Branches | 25/145 = 0.17 | 40/140 = 0.28 |

![Page 11: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/11.jpg)
b) Find the CPI for each system.
$$\text{CPI}_A = (0.55 \times 1) + (0.28 \times 3) + (0.17 \times 5) = 2.24$$
$$\text{CPI}_B = (0.36 \times 1) + (0.36 \times 3) + (0.28 \times 5) = 2.84$$
c) Assuming that the clock on system B is 10% faster than the clock on system A, which system is faster for the given application problem and by how much percent?
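$$\frac{\text{CCT}_A}{\text{CCT}_B} = 1.1 \quad\Rightarrow\quad \text{CCT}_A = 1.1 \times \text{CCT}_B$$
$$\text{CPU Exec. Time}_A = 145 \times 10^6 \times 2.24 \times 1.1 \times \text{CCT}_B = 357.28 \times 10^6 \times \text{CCT}_B$$
$$\text{CPU Exec. Time}_B = 140 \times 10^6 \times 2.84 \times \text{CCT}_B = 397.6 \times 10^6 \times \text{CCT}_B$$
So, System A is faster:
$$\frac{397.6}{357.28} \approx 1.11$$
By 11%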
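A Python sketch of exercise [4], working directly from the instruction counts instead of the rounded frequencies. All identifiers are hypothetical, counts are in millions, and times are in units of CCT_B. Because no intermediate value is rounded, the results differ slightly from the hand calculation: CPI_B comes out near 2.86 rather than 2.84, and the time ratio near 1.12 rather than 1.11. The conclusion that system A is faster is unchanged.

```python
CYCLES = {"alu": 1, "load": 3, "branch": 5}

# Instruction counts in millions, from the exercise statement.
counts_a = {"alu": 80, "load": 40, "branch": 25}
counts_b = {"alu": 50, "load": 50, "branch": 40}


def frequencies(counts: dict) -> dict:
    """Relative frequency of each instruction type."""
    total = sum(counts.values())
    return {op: n / total for op, n in counts.items()}


def cpi(counts: dict) -> float:
    """Total cycles divided by total instructions."""
    total_cycles = sum(n * CYCLES[op] for op, n in counts.items())
    return total_cycles / sum(counts.values())


def exec_time(counts: dict, cycle_time: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return sum(counts.values()) * cpi(counts) * cycle_time


# B's clock is 10% faster, so CCT_A = 1.1 x CCT_B; take CCT_B as the unit.
time_a = exec_time(counts_a, 1.1)
time_b = exec_time(counts_b, 1.0)

print({op: round(f, 2) for op, f in frequencies(counts_a).items()})
print(round(cpi(counts_a), 2), round(cpi(counts_b), 2))
print(round(time_b / time_a, 2))   # > 1 means system A is faster
```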
![Page 12: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/12.jpg)
A common memory hierarchy

| Level | Capacity | Access time | Cost |
|---|---|---|---|
| CPU Registers | 100s Bytes | <10s ns | |
| Cache | K Bytes | 10-100 ns | 1-0.1 cents/bit |
| Main Memory | M Bytes | 200 ns - 500 ns | $.0001-.00001 cents/bit |
| Disk | G Bytes | 10 ms (10,000,000 ns) | 10^-5 - 10^-6 cents/bit |
| Tape | infinite | sec-min | 10^-8 |

Upper level (registers): faster. Lower level (tape): larger.
![Page 13: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/13.jpg)
Average Memory Access Time
AMAT = (Hit Time) + (1 - h) x (Miss Penalty)
•Hit time:
– basic time of every access.
•Hit rate (h):
– fraction of accesses that hit
•Miss penalty:
– extra time to fetch a block from the lower level, including the time
to replace it in the CPU
•Introduces caches to improve hit time.
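The AMAT formula above, as a minimal Python sketch. The function name and the sample numbers (1-cycle hit, 95% hit rate, 100-cycle miss penalty) are illustrative assumptions, not values from the slides.

```python
def amat(hit_time: float, hit_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate x miss penalty."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_time + (1.0 - hit_rate) * miss_penalty


# Hypothetical cache: 1-cycle hit, 95% hit rate, 100-cycle miss penalty.
print(round(amat(1.0, 0.95, 100.0), 2))   # about 6 cycles on average
```

Note that every access pays the hit time; the miss penalty is added only for the fraction (1 - h) of accesses that miss.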
![Page 14: Computer Architecture Exercises with Solutions](https://reader030.fdocuments.net/reader030/viewer/2022020712/5491a334b479599d2d8b534c/html5/thumbnails/14.jpg)
Second-level caches
• Introduces a new definition of AMAT:
– AMAT = Hit Time_L1 + Miss Rate_L1 × Miss Penalty_L1
– where Miss Penalty_L1 = Hit Time_L2 + Miss Rate_L2 × Miss Penalty_L2
• So the 2nd-level miss rate is measured over the 1st-level cache misses
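A sketch of the two-level formula in Python. Here `miss_rate_l2` is the local L2 miss rate, meaning the fraction of L1 misses that also miss in L2, as the last bullet describes. The function name and the sample latencies and miss rates are hypothetical.

```python
def amat_two_level(
    hit_time_l1: float,
    miss_rate_l1: float,
    hit_time_l2: float,
    miss_rate_l2: float,     # local rate: L2 misses / L1 misses
    miss_penalty_l2: float,  # time to fetch from main memory
) -> float:
    """AMAT with an L2 cache: the L1 miss penalty is itself an AMAT for L2."""
    miss_penalty_l1 = hit_time_l2 + miss_rate_l2 * miss_penalty_l2
    return hit_time_l1 + miss_rate_l1 * miss_penalty_l1


# Hypothetical numbers, in clock cycles.
print(round(amat_two_level(1.0, 0.05, 10.0, 0.2, 100.0), 2))
```

With these made-up values the L1 miss penalty is 10 + 0.2 × 100 = 30 cycles, so the average access costs 1 + 0.05 × 30 = 2.5 cycles.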