Computer architecture
![Page 1: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/1.jpg)
Computer architecture
Lecture 4: Processor instruction list
Piotr Bilski
![Page 2: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/2.jpg)
Execution of program
• The processor executes machine instructions after decoding them
• The programmer writes a program in a symbolic language, either low level or high level
• During compilation the symbolic language is translated into machine language instructions
![Page 3: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/3.jpg)
Elements of the machine instructions
• Operation code
• Argument references (operation input data)
• Result reference (if needed)
• Reference to the next instruction
Instruction format (16 bits): bits 0-3 hold the operation code, bits 4-15 hold the argument references
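The field layout above can be illustrated with a minimal decoding sketch. It assumes that bit 0 is the most significant bit of a 16-bit word, so the operation code occupies the top four bits; the names `decode`, `OPCODE_BITS`, and `ADDRESS_BITS` are hypothetical and not part of the lecture.

```python
# Assumed layout: bits 0-3 (most significant) = operation code,
# bits 4-15 = argument reference (a 12-bit address).
OPCODE_BITS = 4
ADDRESS_BITS = 12


def decode(word):
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = (word >> ADDRESS_BITS) & ((1 << OPCODE_BITS) - 1)
    address = word & ((1 << ADDRESS_BITS) - 1)
    return opcode, address


opcode, address = decode(0b0010_0010_0000_0001)
print(opcode, hex(address))
```

The bit numbering is the main assumption here: if bit 0 were the least significant bit, the two fields would swap ends of the word.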
![Page 4: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/4.jpg)
Arguments and results are stored in:
• Memory (main, cache, virtual)
• Processor registers (accumulator, general purpose registers)
• Input/output devices (hard drive, printer)
![Page 5: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/5.jpg)
Instruction types
• Data processing (logical and arithmetic operations)
• Data storage (instructions related to the memory access)
• Data transmission (input/output operations)
• Control (result testing, non-sequential code execution – jumps, branches)
![Page 6: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/6.jpg)
Relation between the symbolic and machine instructions
x = x + c;
LOAD 1001
ADD 1002
STORE 1001
[Figure: memory cell 1001 holds x, memory cell 1002 holds c; both are connected to the ALU]
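The translation of x = x + c into LOAD / ADD / STORE can be traced with a small accumulator-machine sketch. The function `run_accumulator` and the initial values of x and c are hypothetical, chosen only for illustration.

```python
def run_accumulator(program, memory):
    """Execute one-address instructions against a dict used as memory."""
    acc = 0  # the accumulator register
    for op, addr in program:
        if op == "LOAD":
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


# Assumed initial contents: x = 5 at address 1001, c = 3 at address 1002.
memory = {1001: 5, 1002: 3}
run_accumulator([("LOAD", 1001), ("ADD", 1002), ("STORE", 1001)], memory)
print(memory[1001])
```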
![Page 7: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/7.jpg)
Number of the addresses in the instruction
Y = (A - B) / (C + D * E)

3 addresses
Instruction    Action
SUB Y,A,B      Y ← A - B
MPY T,D,E      T ← D * E
ADD T,T,C      T ← T + C
DIV Y,Y,T      Y ← Y / T

2 addresses
Instruction    Action
MOVE Y,A       Y ← A
SUB Y,B        Y ← Y - B
MOVE T,D       T ← D
MPY T,E        T ← T * E
ADD T,C        T ← T + C
DIV Y,T        Y ← Y / T

1 address
Instruction    Action
LOAD D         AC ← D
MPY E          AC ← AC * E
ADD C          AC ← AC + C
STOR Y         Y ← AC
LOAD A         AC ← A
SUB B          AC ← AC - B
DIV Y          AC ← AC / Y
STOR Y         Y ← AC
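As a rough check of the three-address sequence for Y = (A - B) / (C + D * E), the following sketch interprets each instruction as "destination ← source1 op source2". The interpreter `run_three_address` and the operand values are assumptions made for illustration.

```python
import operator

# Mnemonics from the slide mapped to Python operators.
OPERATIONS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MPY": operator.mul,
    "DIV": operator.truediv,
}


def run_three_address(program, variables):
    """Each instruction is (op, destination, source1, source2)."""
    for op, dst, src1, src2 in program:
        variables[dst] = OPERATIONS[op](variables[src1], variables[src2])
    return variables


# Hypothetical operand values.
variables = {"A": 20.0, "B": 4.0, "C": 2.0, "D": 3.0, "E": 2.0}
program = [
    ("SUB", "Y", "A", "B"),  # Y <- A - B
    ("MPY", "T", "D", "E"),  # T <- D * E
    ("ADD", "T", "T", "C"),  # T <- T + C
    ("DIV", "Y", "Y", "T"),  # Y <- Y / T
]
run_three_address(program, variables)
print(variables["Y"])
```

Fewer addresses per instruction mean shorter instructions but longer programs, which is the trade-off the three listings illustrate.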
![Page 8: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/8.jpg)
Number of the addresses in the instruction (cont.)
a = b + c
• Three addresses: ADD a,b,c
• Two addresses: MOVE a,b; ADD a,c
• One address: LOAD b; ADD c; STOR a
![Page 9: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/9.jpg)
Instruction list design problems
• How many operations should the processor execute, and which ones?
• Which data types (arguments, results)?
• Which instruction format (length, number of addresses)?
• How many (and which) registers?
• Which addressing modes?
![Page 10: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/10.jpg)
Operands
• Addresses (unsigned integers)
• Numbers (numerical data) – fixed point, floating point, decimal
• Characters (ASCII / IRA, EBCDIC codes etc.)
• Logical data (single bits)
![Page 11: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/11.jpg)
Computer as the data storage
• Multi-byte data can be written to memory in little-endian, big-endian, or bi-endian order
• The storage models differ in the order of the bytes in memory. For example, the hexadecimal number 76859432 can be written in two ways:
Address    Big endian    Little endian
263        76            32
264        85            94
265        94            85
266        32            76
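The two byte orders for the value 76859432 (hexadecimal) can be reproduced with Python's built-in `int.to_bytes`. The starting address 263 follows the slide; the variable names are only illustrative.

```python
value = 0x76859432

# Serialize the same 32-bit value in both byte orders.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

# Show which byte lands at which address, starting at 263 as on the slide.
for offset in range(4):
    print(263 + offset, f"{big[offset]:02x}", f"{little[offset]:02x}")
```

Reading the bytes back with `int.from_bytes` and the matching `byteorder` recovers the original value in either case.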
![Page 12: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/12.jpg)
Little and big endian
Big endian
• Easy to sort character sequences (strings)
• Allows printing ASCII characters without any conversions
• Integers and characters are in the same order
• Used in: Sun SPARC, RISC processors, Motorola 680x0
Little endian
• Easy to convert a longer number to a shorter one
• Arithmetic operations are easier to execute
• Used in: Intel 80x86, Pentium, Alpha
Bi-endian
• Understands both standards
• Used in: PowerPC
![Page 13: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/13.jpg)
Examples of little and big endian in the file types
Big endian:
• Adobe Photoshop
• IMG (GEM Raster)
• JPEG
• MacPaint
• SGI (Silicon Graphics)
• Sun Raster
Little endian:
• BMP (Windows, OS/2 Bitmaps)
• GIF
• PCX (PC Paintbrush)
• TGA (Targa)
• Microsoft RTF (Rich Text Format)
Bi-endian:
• Microsoft RIFF (.WAV & .AVI)
• TIFF
• XWD (X Window Dump)
![Page 14: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/14.jpg)
Pentium data types
• Data are organized in multiples of a byte (byte – 1 B, word – 2 B, double word – 4 B, etc.)
• Floating point formats comply with the IEEE 754 standard
• Data do not have to be stored at even-aligned addresses
• Unsigned integers (8, 16, 32, 64 bits), including addresses
• Signed integers (8, 16, 32, 64 bits), two’s complement representation
• Floating point numbers (single, double, and extended double precision)
![Page 15: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/15.jpg)
Pentium data types (cont.)
• Generic (any content 16,32 or 64 bits long)
• Unpacked decimal number binary representation (one digit in a byte)
• Packed decimal number binary representation (two digits in a byte)
• Pointer (32-bit address)
• Bit field
• Byte string
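The difference between the unpacked and packed decimal representations can be sketched as follows. The helper names are hypothetical, and the digit order (most significant digit first) is an assumption made for readability, not a statement about how the processor lays the bytes out in memory.

```python
def to_unpacked_bcd(number):
    """One decimal digit per byte."""
    return bytes(int(digit) for digit in str(number))


def to_packed_bcd(number):
    """Two decimal digits per byte: high nibble first, low nibble second."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits  # pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


print(to_unpacked_bcd(1234).hex(" "))
print(to_packed_bcd(1234).hex(" "))
```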
![Page 16: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/16.jpg)
PowerPC data types
• Data are 8, 16, 32, or 64 bits long
• Aligning data addresses to even bytes is not required (though sometimes used)
• PowerPC is bi-endian
• Stored types: unsigned and signed numbers (byte (8 b), half-word (16 b), word (32 b), double word (64 b)), floating point numbers (IEEE 754), byte strings (up to 128 B)
![Page 17: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/17.jpg)
Operation classification
• Data transfer (STORE, LOAD, SET, PUSH, POP)
• Arithmetic (ADD, SUB, NEG, INC, MULT)
• Logical (AND, OR, NOT, TEST, SHIFT, ROTATE)
• Control passing (JUMP, HALT, EXEC)
• Input/output (READ, WRITE)
• Conversion (TRANS, CONV)
![Page 18: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/18.jpg)
Data transfer
• Aim: to move data from one location to another
• Requires: determining the memory location (possibly a virtual address), checking the cache memory, and issuing the read/write operation
• Example instructions: LOAD, STORE (in short, long, half-word versions, etc.)
![Page 19: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/19.jpg)
Logical operations
• Operands are treated as bit strings
• The most popular operations: AND, OR, XOR, NOT
• Bit strings are treated as masks:
A1        = 10100101
A2        = 11110000
A1 AND A2 = 10100000
A1        = 10100101
A2        = 11111111
A1 XOR A2 = 01011010
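The two mask examples can be checked directly with Python's bitwise operators. The variable names are illustrative only.

```python
a1 = 0b10100101

# AND with 11110000 keeps the upper four bits and clears the lower four.
high_nibble = a1 & 0b11110000

# XOR with all ones inverts every bit of the operand.
inverted = a1 ^ 0b11111111

print(format(high_nibble, "08b"))
print(format(inverted, "08b"))
```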
![Page 20: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/20.jpg)
Logical operations (cont.)
• Logical shifting
• Arithmetic shifting
[Figure: logical and arithmetic shift diagrams]
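The difference between the two kinds of right shift can be sketched on 8-bit values: a logical shift fills the vacated positions with zeros, while an arithmetic shift replicates the sign bit. The 8-bit width and the function names are assumptions made for this example.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def logical_shift_right(value, count):
    """Shift right and fill the vacated high bits with zeros."""
    return (value & MASK) >> count


def arithmetic_shift_right(value, count):
    """Shift right and fill the vacated high bits with the sign bit."""
    value &= MASK
    sign = value >> (WIDTH - 1)
    shifted = value >> count
    if sign:
        # Set the top `count` bits to 1 to preserve the negative sign.
        shifted |= (MASK << (WIDTH - count)) & MASK
    return shifted


print(format(logical_shift_right(0b10010110, 2), "08b"))
print(format(arithmetic_shift_right(0b10010110, 2), "08b"))
```

For a two's complement number, the arithmetic version corresponds to division by a power of two rounded toward negative infinity, which is why the sign bit has to be preserved.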
![Page 21: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/21.jpg)
Changing execution order
• These instructions alter the order in which instructions are executed
• They include jumps, procedure calls, and repeated execution of an operation in a loop
• Control passing can be conditional or unconditional
![Page 22: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/22.jpg)
Conditional branches
• First method: a multi-bit condition code stores the outcome of an operation (for example the sign of the result, overflow, or a zero result), and the jump is executed depending on these bits
• Second method: the jump condition is embedded in the jump instruction itself
• Jumps can go in both directions (forward and backward)
![Page 23: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/23.jpg)
Branch example
351
352
353 SUB X, Y
354 BRZ 373
........
372 BR 353
373
........
395 Rest of the code
396
BRZ – make a jump, if the result is zero
BR – make a jump unconditionally
The condition code set by the SUB operation determines whether the BRZ operation jumps
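The control flow of this example can be traced with a small simulation. It is only a sketch: the instructions between addresses 355 and 371 are not given on the slide, so they are treated as no-ops, SUB X, Y is assumed to mean X ← X - Y, and the function name and input values are hypothetical.

```python
def run_branch_example(x, y, max_steps=1000):
    """Return (final x, number of times the loop was repeated)."""
    pc = 353          # program counter, starting at SUB X, Y
    zero_flag = False
    repeats = 0
    for _ in range(max_steps):
        if pc == 353:              # SUB X, Y sets the condition code
            x -= y
            zero_flag = (x == 0)
            pc = 354
        elif pc == 354:            # BRZ 373: jump if the result was zero
            pc = 373 if zero_flag else 355
        elif pc == 372:            # BR 353: unconditional jump back
            repeats += 1
            pc = 353
        elif pc == 373:            # rest of the code
            return x, repeats
        else:                      # unspecified loop body, treated as no-ops
            pc += 1
    raise RuntimeError("step limit reached; the loop may not terminate")


print(run_branch_example(6, 2))
```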
![Page 24: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/24.jpg)
Procedures
• Procedures are self-contained modules in the source code
• Using them makes the code more flexible
• They require two instructions: call and return
• The same procedure can be called many times from different locations
• Procedures can be nested
![Page 25: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/25.jpg)
Procedure and return location
• A procedure can be called from multiple locations in the program
• Nesting of calls is possible
• Calling a procedure requires storing the return address:
– In a register
– At the beginning of the called procedure
– On the stack (the best option, since it supports nested (recursive) procedure calls)
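Why the stack handles nested calls can be seen in a toy interpreter in which CALL pushes the return address and RET pops it. The instruction names PRINT and HALT, the addresses, and the function `run_calls` are hypothetical and exist only for this sketch.

```python
def run_calls(program, start):
    """Execute a toy program; return the list of PRINT arguments in order."""
    return_stack = []
    trace = []
    pc = start
    while True:
        op, arg = program[pc]
        if op == "CALL":
            return_stack.append(pc + 1)  # remember where to resume
            pc = arg
        elif op == "RET":
            pc = return_stack.pop()      # resume after the latest CALL
        elif op == "HALT":
            return trace
        else:                            # "PRINT"
            trace.append(arg)
            pc += 1


program = {
    0: ("PRINT", "main start"), 1: ("CALL", 10),
    2: ("PRINT", "main end"), 3: ("HALT", None),
    10: ("PRINT", "A start"), 11: ("CALL", 20),
    12: ("PRINT", "A end"), 13: ("RET", None),
    20: ("PRINT", "B"), 21: ("RET", None),
}
print(run_calls(program, 0))
```

With a single register for the return address, the call from A to B would overwrite the address needed to return from A to the main program; the LIFO order of the stack avoids this.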
![Page 26: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/26.jpg)
Procedure call
![Page 27: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/27.jpg)
Stack
• It is a separate memory area for storing data, organized as a LIFO structure
• Many processors have a register that serves as the stack pointer (for example, the Motorola 68000)
• Main stack operations: PUSH, POP
![Page 28: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/28.jpg)
Example of the stack implementation
[Figure: stack contents before and after the PUSH and POP operations, with the stack pointer and the end of the stack marked]
![Page 29: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/29.jpg)
Working with stack
• Operation: a+b-(c/d)
• The same operation in reverse Polish notation: ab+cd/-
Stack contents after each step (top of the stack on the right):
a
a, b
a+b
a+b, c
a+b, c, d
a+b, c/d
a+b-c/d
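The evaluation of ab+cd/- can be reproduced with a short stack-based evaluator. The function `eval_rpn`, the single-letter token convention, and the operand values are assumptions made for this sketch.

```python
import operator

OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression, values):
    """Evaluate a reverse Polish expression of one-character tokens."""
    stack = []
    for token in expression:
        if token in OPERATORS:
            right = stack.pop()  # the operand pushed last
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(values[token])  # operand: push its value
    return stack.pop()


# Hypothetical operand values.
result = eval_rpn("ab+cd/-", {"a": 5, "b": 3, "c": 8, "d": 4})
print(result)
```

No parentheses are needed in this notation because the order of the tokens already fixes the order of evaluation.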
![Page 30: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/30.jpg)
Stack frame
• The set of a procedure’s parameters together with the return address
• Allows nested procedure calls, with input and output parameters stored on the stack
![Page 31: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/31.jpg)
Stack frame illustration
[Figure: stack contents while procedure A runs, and after procedure A calls B. Each frame holds the parameters (x1, x2 for A; y1, y2 for B), the return point, and the previous frame pointer; FP marks the current frame and SP the top of the stack.]
![Page 32: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/32.jpg)
Stack frame in Pentium processor
• Used by the ENTER and CALL instructions
• The ENTER instruction helps compilers implement nested procedures
• The LEAVE instruction restores the previous stack state
• The frame pointer is stored in the EBP register, the stack pointer in the ESP register
• Example of the sequence performed by ENTER:
PUSH EBP
MOV EBP, ESP
SUB ESP, space_in_memory
![Page 33: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/33.jpg)
MMX instructions
• Introduced in 1996 to the Pentium processors
• In the first version there were 57 SIMD instructions
• Used to execute operations on integer numbers
• Purpose – multimedia applications (computer games, graphics and sound processing)
• MMX uses four new data types: packed byte, packed word, packed double word, packed quadword
![Page 34: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/34.jpg)
MMX instructions examples
• Arithmetic: PADD, PMUL, PMADD
• Logical: PAND, PANDN, POR, PXOR
• Comparison: PCMPEQ, PCMPGT
• Conversion: PUNPCKH, PUNPCKL
• Instructions carry suffixes that determine which data type is used in the operation: B, W, D, Q
![Page 35: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/35.jpg)
Additional MMX registers
• Eight 64-bit registers from MM0 to MM7
• For backward compatibility, the MMX registers are accessible to older software as the floating point registers
[Figure: layout of a 64-bit MMX register, bits 63 to 0, divided into eight bytes (first byte in bits 7-0, eighth byte in bits 63-56) or into four words]
![Page 36: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/36.jpg)
Exemplary MMX operation
![Page 37: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/37.jpg)
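MMX arithmetic
• Saturation instead of overflow
Overflow:
  1111 0000 0000 0000
+ 0011 0000 0000 0000
= 1 0010 0000 0000 0000 (the 17th bit does not fit, so the stored result is 0010 0000 0000 0000)
Saturation:
  1111 0000 0000 0000
+ 0011 0000 0000 0000
= 1111 1111 1111 1111 (the result is clamped to the largest representable value)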
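The two behaviours can be compared on unsigned 16-bit values with a minimal sketch. The function names are hypothetical, and only the unsigned case is shown.

```python
MAX_U16 = 0xFFFF


def add_wraparound(a, b):
    """Ordinary 16-bit addition: the carry out of bit 15 is discarded."""
    return (a + b) & MAX_U16


def add_saturate(a, b):
    """Saturating addition: results above the maximum are clamped to it."""
    return min(a + b, MAX_U16)


a = 0b1111_0000_0000_0000
b = 0b0011_0000_0000_0000
print(format(add_wraparound(a, b), "016b"))
print(format(add_saturate(a, b), "016b"))
```

Saturation suits multimedia data: a pixel sum that exceeds the maximum stays at full brightness instead of wrapping around to a dark value.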
![Page 38: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/38.jpg)
Why should we use MMX?
Operation                                       Acceleration*
Echo effect                                     5.9
Matrix transposition                            2
Arithmetic and logical operations on vectors    6
Fractal drawing (2D)                            1.5
Bilinear texture mapping (3D)                   7
Median filter                                   3.8
Haar transform 2x2                              2.2
Calculating L1 norm                             3.3
3D transformation                               3.1
* compared to C code using the traditional architecture
![Page 39: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/39.jpg)
SSE instructions
• Introduced in 1999 (Pentium 3)
• 70 new instructions for floating point operations
• Eight additional 128-bit registers, addressed directly: XMM0 – XMM7 (plus the control register MXCSR)
• Every register stores 4 32-bit floating point numbers
![Page 40: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/40.jpg)
SSE (cont.)
• New data type: 4-element vector of single precision floating point numbers
• Operations can be packed (PS – on all elements of the vector) or scalar (SS – only on the first element)
• Example:
xmm0 = [X1 X2 X3 X4]
xmm1 = [Y1 Y2 Y3 Y4]
ADDPS(xmm0,xmm1) = [X1+Y1 X2+Y2 X3+Y3 X4+Y4]
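The packed and scalar variants can be modelled on plain Python lists. This is only an illustration of the semantics, not of how the hardware works; the functions `addps` and `addss` are hypothetical stand-ins for the instructions, and the register is modelled as a four-element list with the first element at index 0.

```python
def addps(x, y):
    """Packed add: every element of x is added to the matching element of y."""
    return [a + b for a, b in zip(x, y)]


def addss(x, y):
    """Scalar add: only the first elements are added; the rest of x is kept."""
    return [x[0] + y[0]] + list(x[1:])


xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [10.0, 20.0, 30.0, 40.0]
print(addps(xmm0, xmm1))
print(addss(xmm0, xmm1))
```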
![Page 41: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/41.jpg)
3DNow! Instructions
• Introduced in 1998 by the AMD corporation
• Provide a set of 21 new SIMD instructions for floating point calculations
• Used in multimedia applications (high resolution graphics, computer games, CAD/CAM)
• Extensions exist: Enhanced 3DNow!, 3DNow! Professional
![Page 42: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/42.jpg)
SSE2 instructions
• Introduced in 2001 (Intel Pentium IV, Athlon 64, Sempron 754, Transmeta Efficeon)
• Set of the additional 144 instructions, supported by 16 128-bit registers (XMM0 – XMM15)
• Operates on 64-bit floating point numbers (the x87 coprocessors work with 80-bit numbers) and on 128-bit integer data
![Page 43: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/43.jpg)
Next Sets of Instructions
• SSE3 (Prescott New Instructions) – 13 new instructions, including complex number arithmetic (since 2004, Pentium IV Prescott, Athlon 64 E)
• SSSE3 (Supplemental Streaming SIMD Extension 3) – 16 new instructions operating on integers (since 2005 Xeon, Intel Core 2, AMD Phenom)
• SSE4 – 54 new instructions in two groups (47 and 7), including integer number instructions modifying EFLAGS register (new!), implemented in Intel Core 2, Celeron Conroe, Penryn
![Page 44: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/44.jpg)
Next Sets of Instructions (cont.)
• SSE5 – planned to be implemented by AMD in 2009 as a competitor to Intel’s SSE4. Finally replaced by three groups: XOP, FMA4, CVT16 (AVX compatible), implemented in Bulldozer processors in 2011. Instructions can have as many as 4 arguments
• AVX (Advanced Vector Extensions) – implemented by Intel in 2011: 16 new 256-bit registers (YMM0-YMM15) + 19 instructions working exclusively on these registers
![Page 45: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/45.jpg)
Assembler
• Low level programming language
• Uses both instructions and symbolic pointers to data
• Every processor family has its own assembly language
![Page 46: Computer architecture](https://reader036.fdocuments.net/reader036/viewer/2022062315/568157f2550346895dc56c47/html5/thumbnails/46.jpg)
Example of the assembly program
L = I + J + K

Machine language program
101  0010 0010 0000 0001
102  0001 0010 0000 0010
103  0001 0010 0000 0011
104  0011 0010 0000 0100
201  0000 0000 0000 0010
202  0000 0000 0000 0011
203  0000 0000 0000 0100
204  0000 0000 0000 0000

Symbolic program
101  LDA 201
102  ADD 202
103  ADD 203
104  STA 204
201  DAT 2
202  DAT 3
203  DAT 4
204  DAT 0

Assembler program
FORMUL  LDA I
        ADD J
        ADD K
        STA L
I       DATA 2
J       DATA 3
K       DATA 4
L       DATA 0
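The correspondence between the symbolic and the machine language listings can be sketched as a tiny translator. The opcode values are read off the binary listing (LDA = 0010, ADD = 0001, STA = 0011), the addresses are taken to be hexadecimal, and the function `assemble` is a hypothetical helper rather than a real assembler.

```python
# Opcode values inferred from the binary listing on the slide.
OPCODES = {"LDA": 0b0010, "ADD": 0b0001, "STA": 0b0011}


def assemble(mnemonic, operand):
    """Translate one symbolic line into a 16-bit machine word."""
    if mnemonic == "DAT":
        return operand & 0xFFFF          # a data word holds the value itself
    return (OPCODES[mnemonic] << 12) | (operand & 0xFFF)


symbolic_program = [
    (0x101, "LDA", 0x201),
    (0x102, "ADD", 0x202),
    (0x103, "ADD", 0x203),
    (0x104, "STA", 0x204),
    (0x201, "DAT", 2),
    (0x202, "DAT", 3),
    (0x203, "DAT", 4),
    (0x204, "DAT", 0),
]

machine_code = {addr: assemble(op, arg) for addr, op, arg in symbolic_program}
for addr, word in machine_code.items():
    bits = format(word, "016b")
    # Print the address and the word in groups of four bits, as on the slide.
    print(format(addr, "x"), " ".join(bits[i:i + 4] for i in range(0, 16, 4)))
```

A full assembler would additionally resolve the symbolic names I, J, K, L, and FORMUL to addresses, typically in a separate pass over the program.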