ECE 4436 / ECE 5367
Introduction to Computer Architecture and Design
Ji Chen
Section : T TH 1:00PM – 2:30PM
Prerequisites: ECE 4436
Instructor: Ji Chen
Email: [email protected]
Phone: (713)-743-4423
Office: W328
Office Hours: T TH 2:30-3:30 or by appointment
TA: None
Course Contents
1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: adders, subtracters, logic operations
4. Multiplication, division, floating-point arithmetic
5. Datapath design
6. Control design: hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O
Grading
HW/Quiz/Lab   10%
Project       15%
Exam 1        25%
Exam 2        25%
Exam 3        25%
Web: http://www.egr.uh.edu/courses/ece/ECE5367/
Academic Honesty Statement
Textbook (NOT REQUIRED): Computer Organization and Design: The Hardware/Software Interface, by David A. Patterson and John L. Hennessy, 3rd edition
Homework/quizzes: There will be several graded homework/lab assignments. Homework turned in late will be accepted only under extraordinary circumstances.
Labs: Laboratory assignments may be worked in teams of two (2); however, there should be no collaboration between teams. Lab assignments turned in late will be penalized 25 points for each calendar day. Both students in a team will receive the same grade for the lab.
Projects: Teams of four (4) will describe the computer architecture of a modern technology.
Exams: Two midterm exams and one final exam. A missed exam will result in a grade of zero; let me know immediately if any conflict arises.
Final Exam - TBD
Grading: Your final grade will be computed as follows: HW/Quiz/Lab 10%, Project 15%, Exam 1 25%, Exam 2 25%, Exam 3 25%.
• Since 1946, all computers have had 5 components:
– Control
– Datapath
– Memory
– Input
– Output
(Control and Datapath together form the Processor.)
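• TI SuperSPARC™ TMS390Z50 in Sun SPARCstation 20
Figure: block diagram. The SuperSPARC processor (Integer Unit, Floating-point Unit, Inst Cache, Data Cache, Ref MMU, Store Buffer, Bus Interface) and an L2 cache with its cache controller (CC) form an MBus Module on the Message Bus (MBus). An L64852 MBus control / M-S Adapter connects the MBus to the SBus (SBus DMA, SCSI, Ethernet, STDIO with serial, keyboard, mouse, audio and RTC, Floppy, SBus Cards); a DRAM Controller completes the system.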
Computer Architecture
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
Figure: levels of abstraction: Application; Operating System; Compiler; Firmware; Instruction Set Architecture; Instr. Set Proc. and I/O system; Datapath & Control; Digital Design; Circuit Design; Layout.
Forces on Computer Architecture
Figure: Computer Architecture shaped by Technology, Programming Languages, Operating Systems, History, Applications, and Cleverness.
Mixed-Signal
Where Are We Going?
ECE 5367, Spring 08
Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000) vs. Time (1980 to 2000). CPU ("Moore's Law"): µProc performance grows 60%/yr (2X/1.5 yr). DRAM: performance grows 9%/yr (2X/10 yrs). The processor-memory performance gap grows 50%/year.
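As a quick check of the figure's annotation, the gap's growth rate follows from the two trend lines: if processor performance grows 60% per year and DRAM performance 9% per year, their ratio grows by a factor of 1.60/1.09, about 1.47 per year, i.e. roughly the 50%/year quoted. A minimal Python sketch, assuming idealized constant annual growth rates taken from the figure labels; the function names are hypothetical.

```python
import math

CPU_GROWTH = 0.60   # µProc: 60%/yr, from the figure label
DRAM_GROWTH = 0.09  # DRAM: 9%/yr, from the figure label

def gap_growth_per_year(cpu: float = CPU_GROWTH, dram: float = DRAM_GROWTH) -> float:
    """Annual growth factor of the processor/DRAM performance ratio."""
    return (1 + cpu) / (1 + dram)

def doubling_time_years(rate: float) -> float:
    """Years needed for a quantity growing at `rate` per year to double."""
    return math.log(2) / math.log(1 + rate)

def gap_after(years: int) -> float:
    """Processor/DRAM performance ratio after `years`, both starting at 1."""
    return gap_growth_per_year() ** years

if __name__ == "__main__":
    print(f"gap grows {gap_growth_per_year() - 1:.0%} per year")
    print(f"CPU performance doubles every {doubling_time_years(CPU_GROWTH):.2f} years")
    print(f"gap after 20 years: {gap_after(20):.0f}x")
```

Because the two growth rates compound, even a modest-looking difference per year turns into a gap of three orders of magnitude over two decades.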
Figure: course roadmap.
• Arithmetic: a Booth-encoded multiplier datapath (34-bit ALU; HI and LO registers, 16x2 bits each; Multiplicand Register; Booth Encoder driven by LO[1:0] and Prev LO[1]; 34x2 MUX selecting the x1 or x2 multiplicand; 32=>34 sign extension; Sub/Add; Control Logic)
• Single/multicycle Datapaths
• Pipelining: overlapped IFetch, Dcd, Exec, Mem, WB stages
• Memory Systems
• I/O
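The multiplier in the roadmap figure appears to retire two multiplier bits per step: LO[1:0] plus the previously shifted-out bit feed the Booth encoder, which selects the multiplicand as x1 or x2 and chooses add or subtract. The recoding behind such a datapath can be sketched in software. This is a hypothetical Python model of radix-4 Booth recoding, not a description of the figure's exact control logic; it assumes a two's-complement multiplier of even width `bits` whose value fits in that width.

```python
def booth_radix4_digits(multiplier: int, bits: int = 32) -> list[int]:
    """Recode a two's-complement multiplier into radix-4 Booth digits in {-2..2}.

    Digit i is b[2i-1] + b[2i] - 2*b[2i+1], with an implicit b[-1] = 0.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    m = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # b[2i-1]; starts as the implicit 0 to the right of bit 0
    for i in range(bits // 2):
        b0 = (m >> (2 * i)) & 1
        b1 = (m >> (2 * i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)
        prev = b1  # the high bit of this pair is "Prev" for the next pair
    return digits

def booth_radix4_multiply(multiplicand: int, multiplier: int, bits: int = 32) -> int:
    """Multiply by summing digit * multiplicand * 4**i (one add/sub per 2 bits)."""
    product = 0
    for i, d in enumerate(booth_radix4_digits(multiplier, bits)):
        # d in {-2,-1,0,1,2}: add or subtract 1x or 2x the multiplicand, shifted by 2i
        product += d * multiplicand * (4 ** i)
    return product

if __name__ == "__main__":
    print(booth_radix4_digits(13, 8))
    print(booth_radix4_multiply(-7, 13))
```

A 32-bit multiplier needs only 16 add/subtract steps this way instead of 32, which is why the datapath shifts LO two bits at a time.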
• Purchasing perspective: given a collection of machines, which has the
– best performance?
– least cost?
– best performance/cost?
• Design perspective: faced with design options, which has the
– best performance improvement?
– least cost?
– best performance/cost?
• Both require
– a basis for comparison
– a metric for evaluation
• Our goal: understand the cost and performance implications of architectural choices.
Two Notions of "Performance"
Which has higher performance?
• Time to do the task (Execution Time)
– execution time, response time, latency
• Tasks per day, hour, week, sec, ns, ... (Performance)
– throughput, bandwidth
Response time and throughput are often in opposition.

Plane        Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747   610 mph    6.5 hours     470          286,700
Concorde     1350 mph   3 hours       132          178,200

(pmph = passenger-miles per hour = speed x passengers)
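The throughput column is simply speed times passengers. A minimal Python check of the table, using only the numbers above; the dictionary layout and helper name are illustrative.

```python
planes = {
    # name: (speed in mph, DC-to-Paris time in hours, passengers)
    "Boeing 747": (610, 6.5, 470),
    "Concorde": (1350, 3.0, 132),
}

def throughput_pmph(speed_mph: float, passengers: int) -> float:
    """Passenger-miles per hour: passengers moved times miles covered per hour."""
    return speed_mph * passengers

for name, (speed, hours, pax) in planes.items():
    print(f"{name}: {throughput_pmph(speed, pax):,.0f} pmph, {hours} h DC to Paris")
```

The faster plane has the shorter response time, but the larger plane delivers more passenger-miles per hour.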
Definitions
• Performance is in units of things per second: bigger is better.
• If we are primarily concerned with response time,
$$\text{Performance}(X) = \frac{1}{\text{Execution time}(X)}$$
"X is n times faster than Y" means
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution time}(Y)}{\text{Execution time}(X)}$$
Example
• Time of Concorde vs. Boeing 747?
– Concorde is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)
• Throughput of Concorde vs. Boeing 747?
– Concorde is 178,200 pmph / 286,700 pmph = 0.62 "times faster"
– Boeing is 286,700 pmph / 178,200 pmph = 1.61 "times faster"
• Boeing is 1.6 times ("60%") faster in terms of throughput.
• Concorde is 2.2 times ("120%") faster in terms of flying time.
We will focus primarily on execution time for a single job. Programs execute lots of instructions, so instruction throughput is important too!
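Both ratios come from the same definition, n = Performance(X) / Performance(Y); only the measure of performance changes. A minimal Python sketch with a hypothetical helper name, using the table values as inputs:

```python
def times_faster(perf_x: float, perf_y: float) -> float:
    """n such that X is n times faster than Y: Performance(X) / Performance(Y)."""
    return perf_x / perf_y

# Response time: performance = 1 / execution time, so the ratio is time(Y) / time(X).
concorde_vs_747_time = times_faster(1 / 3.0, 1 / 6.5)
# Throughput: performance is measured directly in passenger-miles per hour.
boeing_vs_concorde_tput = times_faster(286_700, 178_200)

print(f"Concorde vs. 747 (flying time): {concorde_vs_747_time:.2f}x")
print(f"747 vs. Concorde (throughput): {boeing_vs_concorde_tput:.2f}x")
```

The speed ratio (1350/610, about 2.21) and the time ratio (6.5/3, about 2.17) agree to the 2.2 quoted above.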
CPU performance is the inverse of CPU time:
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
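The three factors (instruction count, CPI, and clock cycle time) simply multiply. A minimal Python sketch; the instruction count, CPI, and clock rate below are hypothetical example values, not measurements.

```python
def cpu_time_seconds(instructions: float, cpi: float, clock_hz: float) -> float:
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    seconds_per_cycle = 1.0 / clock_hz
    return instructions * cpi * seconds_per_cycle

# Hypothetical program: 2 billion instructions, average CPI of 1.5, 1 GHz clock.
t = cpu_time_seconds(2e9, 1.5, 1e9)
print(f"CPU time = {t:.2f} s")  # 2e9 * 1.5 / 1e9
```

Because the factors multiply, halving any one of them while the others stay fixed halves CPU time; in practice, design changes often trade one factor against another.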
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then
$$\text{ExTime}(\text{with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$
$$\text{Speedup}(\text{with } E) = \frac{1}{(1-F) + F/S}$$
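A direct transcription of the speedup formula, with F taken as the fraction of the original execution time that the enhancement affects. A minimal Python sketch; the function name and input checks are illustrative.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when a fraction F of execution time is sped up by a factor S."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Example: speeding up half of the execution time by 2x.
print(f"{amdahl_speedup(0.5, 2):.3f}")        # 1 / (0.5 + 0.25)
# Even an enormous speedup of that half is capped near 1 / (1 - F) = 2.
print(f"{amdahl_speedup(0.5, 1e12):.3f}")
```

The second call shows the key consequence of the law: the unaffected fraction (1 - F) bounds the overall speedup no matter how large S becomes.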
Typical Mix
Base machine (CPI(i) = Freq x Cycles; % Time = CPI(i) / total CPI):
Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
Total CPI:               2.2
How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
How does this compare with using branch prediction to save a cycle off the branch time?
What if two ALU instructions could be executed at once?
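The three questions can be worked with the CPI column alone. Assuming the instruction count and clock cycle time stay the same, speedup = old CPI / new CPI. For the third question we assume, hypothetically, that every ALU instruction can be paired so each costs an effective 0.5 cycle; real dual issue would be limited by dependences and pairing rules. A minimal Python sketch with illustrative names:

```python
Mix = dict[str, tuple[float, float]]

# Base machine from the table: op -> (frequency, cycles)
BASE_MIX: Mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}

def cpi(mix: Mix) -> float:
    """Average CPI = sum over op classes of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())

def with_cycles(mix: Mix, op: str, cycles: float) -> Mix:
    """Copy of `mix` with one op class's cycle count replaced."""
    new = dict(mix)
    new[op] = (mix[op][0], cycles)
    return new

base = cpi(BASE_MIX)  # 2.2
scenarios = {
    "faster loads": with_cycles(BASE_MIX, "Load", 2),       # better data cache
    "faster branches": with_cycles(BASE_MIX, "Branch", 1),  # branch prediction
    # Assumption: pairing every ALU op halves its effective cost to 0.5 cycle.
    "dual ALU issue": with_cycles(BASE_MIX, "ALU", 0.5),
}
for name, mix in scenarios.items():
    print(f"{name}: CPI {cpi(mix):.2f}, speedup {base / cpi(mix):.3f}")
```

Under these assumptions the cache improvement (CPI 1.6) beats both branch prediction (CPI 2.0) and dual ALU issue (CPI 1.95), because loads account for the largest share of execution time.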