MIPS Registers

registers of the MIPS architecture

  • 2008-2009

    Informatics 3 - Computer Architecture 40

    Additional notes:

    Inf3 Computer Architecture - 2007-2008 40

    Register Usage in MIPS ABI

    Register   Soft     ABI function for this register
    number     name
    $0                  always contains zero
    $1         at       reserved for assembler
    $2-$3      v0,v1    integer function result (out) or static link (in)
    $4-$7      a0-a3    first 4 integer-type function arguments
    $8-$15     t0-t7    temporary registers for expression evaluation
    $16-$23    s0-s7    registers preserved across function call
    $24-$25    t8,t9    temporary registers for expression evaluation
    $28        gp       global pointer
    $29        sp       stack pointer
    $30        fp       frame pointer
    $31        ra       return address

    The ABI gives well-understood functions to each of the registers in the general purpose register set. There are obvious uses, such as the stack pointer. There are also three other special registers: the return address (ra), the frame pointer (fp) and the global pointer (gp). The ra register is assigned the return address when a function call is made. Software will put this value on the stack if the called function itself calls further functions. The fp register points to the base of the stack frame for the current function. We'll see that in the next slide. The gp register, when used, points to a pool of global data that can be commonly referenced by all functions. This may include variables with file or global scope.

    A function can use registers t0-t9 freely, but if it calls another function they may be overwritten. A function may use s0-s7 only if it saves their original contents first and restores them before it returns. Hence, s0-s7 are callee-saved, whereas t0-t9 are caller-saved registers.
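
    The register table and the caller-saved / callee-saved split can be captured in a few lines of C. This is only a sketch: the function names are hypothetical, and the names "zero", "k0" and "k1" do not appear in the table above; they are the conventional MIPS assembler names for $0, $26 and $27 and are included here as an assumption so that the array covers all 32 registers.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Conventional software names for $0-$31. "zero", "k0" and "k1" are not
       listed on the slide; they are the usual assembler names (assumption). */
    static const char *const reg_names[32] = {
        "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
        "t0",   "t1", "t2", "t3", "t4", "t5", "t6", "t7",
        "s0",   "s1", "s2", "s3", "s4", "s5", "s6", "s7",
        "t8",   "t9", "k0", "k1", "gp", "sp", "fp", "ra"
    };

    /* Returns the software name of register $n, or NULL if n is out of range. */
    const char *mips_reg_name(unsigned n)
    {
        return n < 32 ? reg_names[n] : NULL;
    }

    /* s0-s7 ($16-$23): the callee must preserve these. */
    int is_callee_saved(unsigned n)
    {
        assert(n < 32);
        return n >= 16 && n <= 23;
    }

    /* t0-t9 ($8-$15, $24-$25): the caller must save these if it needs them
       after a call. */
    int is_caller_saved(unsigned n)
    {
        assert(n < 32);
        return (n >= 8 && n <= 15) || n == 24 || n == 25;
    }
    ```

    For example, mips_reg_name(29) gives "sp", and is_callee_saved(16) is true because $16 is s0.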

  • Functions and Stack Frames

    int bar (int n);
    int foo (int i) { return bar (i); }
    int bar (int n) { int a = n+1, b = n-1; return (a*b); }

    Each function has a dynamically allocated stack frame

    Frame contents normally accessed by addresses that are relative to either the stack pointer $sp or the frame pointer $fp

    [Diagram: memory drawn with high addresses at the top and low addresses at the bottom. From the top: stack frame for foo, stack frame for bar, free stack space. $fp and $sp are marked against bar's frame. The stack usually grows downwards.]

    Stacks usually grow downwards in memory. Can you think why this might be?

  • Anatomy of a Stack Frame

    int bar (int n);
    int foo (int i) { return bar (i); }
    int bar (int n) { int a = n+1, b = n-1; return (a*b); }

    Positive offsets from $fp = args
    Negative offsets from $fp = locals
    Not all portions of frame are needed by all functions
    Callee save space holds previous $fp, $ra, and any $s0-$s7 that are modified by function bar

    [Diagram: high addresses at the top, low addresses at the bottom. From the top: stack frame for foo; stack frame for bar, divided into incoming args (marked by $fp), callee-save space, local variables and outgoing args (marked by $sp); free stack space.]

    The incoming arguments are values passed from foo to bar. Some of the args may be passed in registers and may not need space on the stack. The callee save space is a region that bar can use to save any of $s0-$s7 that may be modified in bar. Local variables in bar may require some storage space on the stack. The outgoing args space is where args for functions that bar calls will be stored. This space will become the incoming args space of functions that bar calls (if any). If bar calls several functions, then the outgoing args space would typically be the maximum space needed by any such function, allowing it to be allocated once.
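
    The sizing rule in the notes (one outgoing args area, as large as the largest callee needs) can be sketched in C. The struct, the function names and the alignment parameter are hypothetical and only illustrate the arithmetic; a real compiler follows the alignment rules of its ABI.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical summary of one function's frame requirements, in bytes. */
    struct frame_needs {
        size_t callee_save_bytes;  /* previous $fp, $ra and any modified $s0-$s7 */
        size_t local_bytes;        /* local variables that live in memory */
    };

    /* Outgoing args area: the maximum needed by any callee, so that it can be
       allocated once in the prologue. */
    size_t outgoing_area(const size_t *callee_arg_bytes, size_t n_callees)
    {
        size_t max = 0;
        for (size_t i = 0; i < n_callees; i++)
            if (callee_arg_bytes[i] > max)
                max = callee_arg_bytes[i];
        return max;
    }

    /* Total frame size, rounded up to `align` bytes. The alignment is an
       assumption of this sketch; it must be a power of two. */
    size_t frame_size(struct frame_needs f, const size_t *callee_arg_bytes,
                      size_t n_callees, size_t align)
    {
        assert(align != 0 && (align & (align - 1)) == 0);
        size_t raw = f.callee_save_bytes + f.local_bytes
                   + outgoing_area(callee_arg_bytes, n_callees);
        return (raw + align - 1) & ~(align - 1);
    }
    ```

    With 8 bytes of callee-save space, 8 bytes of locals and callees that need 16, 24 and 0 bytes of arguments, the outgoing area is 24 bytes and the frame is 40 bytes at 8-byte alignment.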

  • Call Return Sequencing

    Call sequence
        Save caller-saved registers
        Copy arguments to stack or regs
        Call the function

    Function Prologue
        Allocate callee's stack frame
        Reposition frame pointer
        Save callee-saved registers

    < execute body of function >

    Function Epilogue
        Restore callee-saved registers
        Restore frame pointer
        De-allocate callee's stack frame
        Return to caller

    Return sequence
        Restore caller-saved registers

    Exercise: take the foo() and bar() code shown earlier. Compile it using gcc on your workstation to produce an assembler file, and identify the four sequences listed in this slide. To do this type:

    gcc -O -S -o assembler.lis program.c

    Where assembler.lis is the output file in which your assembler code will be produced, and program.c is the name of your C source file containing foo() and bar().

  • Categorising Data by Location and Access

    C programs contain several categories of data, according to where they live and how they are created

    The way addresses are computed depends on the category of access

    Classification                     How created                  Where data is located                            Addressing mode
    Embedded constants                 Static, read-only            Often in a constant pool in the .text section    $pc + signed offset
    Global and static variables        Static, read or write        .bss section                                     $gp + signed offset
    Dynamically allocated variables    Dynamic, malloc(), free()    On the heap                                      GPR + offset
    Automatic variables                Dynamic, function scope      On stack, below frame pointer                    $fp + negative offset
    Function arguments                 Dynamic, function scope      On stack, above frame pointer                    $fp + positive offset

    Each category of data, whether a function argument or an automatic variable, is allocated in a different way, and is therefore accessed in a different way. There are well-defined regions, such as the stack, the heap and the global data area. Each may have its own pointer (e.g. $sp, $gp) or may be accessed relative to $pc or a general-purpose register.
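
    A short C function that touches each category may help when reading compiler output. The names are hypothetical, and where each object actually ends up is up to the compiler: an optimising compiler may keep the local in a register or fold the constant into an immediate, so treat the comments as the typical case rather than a guarantee.

    ```c
    #include <stdlib.h>

    static const int scale = 3;   /* static, read-only: embedded constant */
    int counter;                  /* static, read or write: global variable */

    /* arg is a function argument; it may arrive in a register ($a0) or on the
       stack above the frame pointer. */
    int categories(int arg)
    {
        int local = arg + scale;            /* automatic variable */
        int *heap = malloc(sizeof *heap);   /* dynamically allocated variable */
        if (heap == NULL)
            return -1;
        *heap = local;                      /* accessed as GPR + offset */
        counter += *heap;                   /* typically $gp + signed offset */
        int result = counter;
        free(heap);
        return result;
    }
    ```

    Compiling this with gcc -S and locating each of the five objects in the assembler output is a quick way to check the table against a real compiler.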

  • Addressing Mode Frequency

    Bottom-line: few addressing modes account for most of the instructions in programs

    Frequency of the addressing mode (%), from H&P Fig. 2.7:

    Addressing mode    TeX    spice    gcc
    Indirect             1        6      1
    Scaled               0       16      6
    Register            24        3     11
    Immediate           43       17     39
    Displacement        32       55     40

    In practice, compilers usually convert complex address calculations into unsigned integer computations and then use very simple addressing modes based on computed addresses. Many memory references are to variables located on the stack. These always use [sp + offset] addressing modes, making the Displacement mode one of the most common.

    Try compiling a simple piece of C code into assembler and look at the addressing modes obtained for each variable accessed by the code.

    Hint: gcc -S foo.c

  • Displacement Addressing and Data Classification

    Stack pointer and Frame pointer relative
        Compiler can often eliminate frame pointer
            Function must not call alloca()
        5 to 10 bits of offset is sufficient in most cases

    Register + offset
        Generic form for accessing via pointers
        Multi-dimensional arrays require address calculations

    PC relative addresses
        Useful for locating commonly-used constants in a pool of constants located in the .text section

    Exercise: add a call to alloca() in both foo() and bar() to see the effect on how the code gets compiled. Try "man alloca" if unsure how to use it.
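
    The address calculation that a multi-dimensional array access requires can be written out by hand. This is a sketch for a row-major two-dimensional int array; the array dimensions and function names are made up for the example. The second function checks the hand-computed address against the one the compiler produces for &a[i][j].

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define ROWS 4
    #define COLS 6

    /* Byte offset of element [i][j] from the start of a row-major array
       with `cols` columns of `elem_size` bytes each. */
    size_t element_offset(size_t i, size_t j, size_t cols, size_t elem_size)
    {
        return (i * cols + j) * elem_size;
    }

    /* Returns 1 if base + offset equals the compiler's own &a[i][j]. */
    int offset_matches(size_t i, size_t j)
    {
        static int a[ROWS][COLS];
        assert(i < ROWS && j < COLS);
        const unsigned char *base = (const unsigned char *)a;
        const unsigned char *by_hand =
            base + element_offset(i, j, COLS, sizeof(int));
        return by_hand == (const unsigned char *)&a[i][j];
    }
    ```

    Once the offset has been computed into a register, the access itself is an ordinary register + offset load or store.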

  • Floating point arithmetic

    Usually based on the IEEE 754 floating point standard
    Useful when greater range of number is required
        Integer (m bits): \( -2^{m-1} \) .. \( +2^{m-1}-1 \)
        Floating point (largest value):
                                Binary                                 Decimal
            Single precision    \( (2-2^{-23}) \times 2^{127} \)       \( \approx 10^{38.53} \)
            Double precision    \( (2-2^{-52}) \times 2^{1023} \)      \( \approx 10^{308.25} \)

    See Hennessy & Patterson appendix for details of formats and operations
    Set aside an hour to read their appendix and become familiar with the overall structure of the FP standard (don't memorise details; you can always refer back to the standard if you ever need to use it)

    Key points for instruction sets:
        Integer and Floating Point never mixed in same operation
        Separate register sets for integer and FP operations are therefore common
        Floating point operations often optional or omitted from embedded processors
        Other ways to represent fractional values, e.g. fixed-point types

    Follow the suggested reading on Hennessy and Patterson from the second bullet point. Make summary notes here.
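
    The two range formulas on the slide can be checked against the limits in <float.h>. The sketch below assumes that float and double are the IEEE 754 single and double precision formats, which is true on most current machines but is not guaranteed by the C standard. The helper names are hypothetical, and repeated doubling is used in place of ldexp() so that no maths library is needed.

    ```c
    #include <assert.h>
    #include <float.h>

    /* Multiply x by 2^e (e >= 0) by repeated doubling. Each doubling is exact
       as long as the result stays within the range of double. */
    double scale_pow2(double x, int e)
    {
        assert(e >= 0);
        while (e-- > 0)
            x *= 2.0;
        return x;
    }

    /* Largest finite single precision value: (2 - 2^-23) * 2^127. */
    double max_single(void)
    {
        return scale_pow2(2.0 - (double)FLT_EPSILON, 127);
    }

    /* Largest finite double precision value: (2 - 2^-52) * 2^1023. */
    double max_double(void)
    {
        return scale_pow2(2.0 - DBL_EPSILON, 1023);
    }
    ```

    Under the IEEE 754 assumption, FLT_EPSILON is 2^-23 and DBL_EPSILON is 2^-52, so max_single() equals FLT_MAX and max_double() equals DBL_MAX.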

  • Encoding the Instruction Set

    How many bits per instruction?
        Fixed-length 32-bit RISC encoding
        Variable-length encoding (e.g. Intel x86)
        Compact 16-bit RISC encodings
            ARM Thumb
            MIPS16
            ARCompact

    Formats define instruction groups with a common set of operands

    An instruction format defines a set of operands that are used in common by a group of instructions. An instruction set is simply a collection of formats and the operations defined for each format.

  • Design considerations for ISA encoding

    How compact is the encoding?
    Is the encoding orthogonal?
    How easy is it to extract operands unambiguously?
        Register specifiers should be aligned in all formats (ideally)
        Implicitly defined registers will complicate decode
        How are the literals aligned and/or extended?
    Are control transfers easily identifiable?
        If not, slow decoding of branches may increase CPI
    Op-code assignment:
        Minimise Hamming distance between codes that perform similar operations.
        Leads to simpler and faster decode logic

    If you don't know what Hamming distance is, see page 193 of Andrew Tanenbaum, Computer Networks, 4th edition (a standard text in communications). A Google search will also find the definition. Think about why this is useful in instruction set design, and then make notes here as a reminder.
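
    Hamming distance is simply the number of bit positions in which two codes differ. A minimal C sketch follows; the function name is hypothetical and the 6-bit limit matches the MIPS opcode and funct fields.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Number of bit positions in which two 6-bit op-codes differ. */
    unsigned hamming_distance6(uint32_t a, uint32_t b)
    {
        assert(a < 64 && b < 64);
        uint32_t diff = a ^ b;      /* 1 bits mark the differing positions */
        unsigned count = 0;
        while (diff != 0) {
            diff &= diff - 1;       /* clear the lowest set bit */
            count++;
        }
        return count;
    }
    ```

    As an illustration, the MIPS funct codes for add (0x20) and sub (0x22) differ in a single bit, so hamming_distance6(0x20, 0x22) is 1.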

  • MIPS 32-bit Instruction Formats

    R-type (register to register)
        three register operands
        most arithmetic, logical and shift instructions

    I-type (register with immediate)
        instructions which use two registers and a constant
        arithmetic/logical with immediate operand
        load and store
        branch instructions with relative branch distance

    J-type (jump)
        jump instructions with a 26 bit address

    At this point you will find it helpful to read Appendix B from Hennessy and Patterson (4/e): "Putting it all together: The MIPS Architecture", p. B-32. Appendix B is all about ISA design issues, using the MIPS architecture as a teaching vehicle.

  • MIPS R-type instruction format

    Field             opcode     reg rs    reg rt    reg rd    shamt    funct
    Bits              6          5         5         5         5        6

    add $1, $2, $3    special    $2        $3        $1                 add
    sll $4, $5, 16    special              $5        $4        16       sll

    Make your own list of instructions that follow this format.
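
    Packing the six R-type fields into a 32-bit word takes one shift per field. The sketch below uses a hypothetical function name; it relies on two facts from the MIPS32 encoding that are not on the slide: "special" is opcode 0, and the funct codes for add and sll are 0x20 and 0x00. Unused fields are encoded as zero.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Encode an R-type instruction: opcode is "special" (0), followed by
       rs, rt, rd, shamt (5 bits each) and funct (6 bits). */
    uint32_t encode_r(unsigned rs, unsigned rt, unsigned rd,
                      unsigned shamt, unsigned funct)
    {
        assert(rs < 32 && rt < 32 && rd < 32 && shamt < 32 && funct < 64);
        return ((uint32_t)rs    << 21) |
               ((uint32_t)rt    << 16) |
               ((uint32_t)rd    << 11) |
               ((uint32_t)shamt <<  6) |
                (uint32_t)funct;
    }
    ```

    For the two examples on the slide, encode_r(2, 3, 1, 0, 0x20) gives 0x00430820 for add $1, $2, $3, and encode_r(0, 5, 4, 16, 0x00) gives 0x00052400 for sll $4, $5, 16.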

  • MIPS I-type instruction format

    Field                opcode    reg rs    reg rt    immediate value/addr
    Bits                 6         5         5         16

    lw $1, offset($2)    lw        $2        $1        address offset
    beq $4, $5, .L001    beq       $4        $5        (.L001 - (PC + 4)) >> 2
    addi $1, $2, -10     addi      $2        $1        0xfff6

    Find more examples of instructions that follow this format and write them here.
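
    The I-type fields can be packed in the same way as the R-type fields, with the 16-bit immediate stored in two's complement. The function names below are hypothetical; the opcode values used in the usage note (addi = 8, lw = 0x23, beq = 4) come from the MIPS32 encoding rather than from the slide. The branch helper assumes the standard MIPS rule that the offset is counted in words from the instruction after the branch.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Encode an I-type instruction: opcode (6 bits), rs, rt (5 bits each)
       and a signed 16-bit immediate. */
    uint32_t encode_i(unsigned opcode, unsigned rs, unsigned rt, int32_t imm)
    {
        assert(opcode < 64 && rs < 32 && rt < 32);
        assert(imm >= -32768 && imm <= 32767);
        return ((uint32_t)opcode << 26) |
               ((uint32_t)rs     << 21) |
               ((uint32_t)rt     << 16) |
               ((uint32_t)imm & 0xFFFFu);   /* keep the low 16 bits */
    }

    /* Branch displacement in words, relative to the instruction that
       follows the branch. Both addresses must be word aligned. */
    int32_t branch_offset(uint32_t branch_pc, uint32_t target)
    {
        int64_t bytes = (int64_t)target - ((int64_t)branch_pc + 4);
        assert(bytes % 4 == 0);
        return (int32_t)(bytes / 4);
    }
    ```

    With these, encode_i(8, 2, 1, -10) gives 0x2041fff6 for addi $1, $2, -10, whose low 16 bits are the 0xfff6 shown on the slide.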

  • MIPS J-type instruction format

    Field        opcode    address
    Bits         6         26

    call func    call      absolute func address >> 2

    Again, find other examples of MIPS instructions that use this format.
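
    A J-type word holds only the opcode and the target address shifted right by two. The sketch below uses a hypothetical function name. It assumes that the slide's "call" corresponds to the MIPS jal instruction (opcode 3, with j as opcode 2), and it ignores the fact that the top four bits of the target are taken from the PC at run time.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Encode a J-type instruction: opcode (6 bits) and the word address of
       the target (26 bits). The target must be word aligned. */
    uint32_t encode_j(unsigned opcode, uint32_t target)
    {
        assert(opcode < 64);
        assert((target & 3u) == 0);
        return ((uint32_t)opcode << 26) | ((target >> 2) & 0x03FFFFFFu);
    }
    ```

    For example, encode_j(3, 0x00400000) gives 0x0c100000, a jal to address 0x00400000.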

  • Code density optimisations

    Prologue and Epilogue

    Constant pools and PC relative loads

    2-register formats

    Restricted register sets

    Non-orthogonality and implicit register operands

    Read section B.10, Fallacies and Pitfalls, on page B-39 of Hennessy & Patterson. Make brief notes here to remind you of the main points.

  • Examples:

    Instruction Set Architecture    Instruction Size       GP registers              Special Features
    ARCompact                       Mixed 16 and 32 bit    8 direct, 32 available    Freely-mixed compact and 32-bit instructions; long-immediate data
    ARM Thumb                       16 bit                 8                         push and pop for stack frame support
    MIPS16                          16 bit                 8                         Some special ABI registers still accessible

    Most 32-bit architectures used in embedded systems have acquired a subset that is encoded in 16 bits. These instructions still operate on 32-bit data, but are encoded more efficiently. Generally speaking they all use two register operands rather than three, and also restrict the number of general purpose registers to 8. The ARCompact instruction set allows a free mixing of the original 32-bit instructions and the compact 16-bit instructions. This is not permitted in ARM Thumb or MIPS16, where each function must be compiled into the 32-bit or the 16-bit instruction set. Recently, ARM introduced the Thumb-2 instruction set, which removes that restriction.

  • ARM Thumb Push and Pop instructions

    Particularly effective for encoding function entry and exit code in a compact form.

    Operand is a bit vector, with each bit specifying whether one of the callee saved registers should be pushed or popped.

    Push may also save the link register (equiv. to MIPS $ra)
    Pop may then pop that value directly into PC, causing the function to return to the caller. E.g.

        push { r4, r5, r6, r7, lr }
        pop { r4, r5, r6, r7, pc }

    These are multi-cycle operations, performing up to 5 memory reads or writes.

    Complex to implement, but highly effective in terms of code density
        Prologue and epilogue can account for 10-15% of the code space

    Try to find other Instruction Set Architectures that support multi-register move operations. List them here:
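
    The bit-vector operand is easy to see in the 16-bit encodings. The sketch below uses hypothetical function names and the Thumb encodings for PUSH (0xB400 plus an lr bit and an 8-bit register list) and POP (0xBC00 plus a pc bit and an 8-bit register list); those base values come from the ARM architecture documentation, not from the slide.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* 16-bit Thumb PUSH: bit i of low_regs selects register ri (r0-r7),
       and bit 8 of the instruction adds lr to the list. */
    uint16_t thumb_push(unsigned low_regs, int with_lr)
    {
        assert(low_regs <= 0xFFu);
        return (uint16_t)(0xB400u | (with_lr ? 0x0100u : 0u) | low_regs);
    }

    /* 16-bit Thumb POP: same register list, bit 8 adds pc to the list. */
    uint16_t thumb_pop(unsigned low_regs, int with_pc)
    {
        assert(low_regs <= 0xFFu);
        return (uint16_t)(0xBC00u | (with_pc ? 0x0100u : 0u) | low_regs);
    }

    /* Number of memory transfers = number of registers in the list. */
    unsigned transfer_count(uint16_t insn)
    {
        unsigned bits = insn & 0x01FFu;   /* 8 list bits plus the lr/pc bit */
        unsigned n = 0;
        while (bits != 0) {
            bits &= bits - 1;             /* clear the lowest set bit */
            n++;
        }
        return n;
    }
    ```

    The slide's example uses registers r4-r7, which is the bit vector 0xF0: thumb_push(0xF0, 1) gives 0xb5f0, thumb_pop(0xF0, 1) gives 0xbdf0, and each performs 5 memory transfers.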

  • Instruction Frequency

    Bottom-line: few instruction types account for most of the instructions executed

    80x86 instruction          Fraction (%)
    load                       22
    conditional branch         20
    compare                    16
    store                      12
    add                         8
    and                         6
    sub                         5
    move register-register      4
    call                        1
    return                      1
    Total                      96

    Source: H&P Fig. 2.16

    Bear in mind that each architecture is different, but that in general the frequencies shown above are representative of typical desktop applications.

    Embedded applications often see increasing frequencies of signal processing operations, especially 16-bit multiplications.

  • IS and Performance

    ISA and Implementation: cycle time, pipelining, CPI, instruction length
    ISA and Compiler: instruction scheduling, code motion, branch optimizations, code generation, code size, register allocation
    Implementation: instruction delays, register allocation, functional units

    [Diagram: ISA, Compiler and Implementation together determine Performance.]

    This slide summarises the relationship between ISA and Compiler, and ISA and Implementation.

  • IS Guidelines

    Regularity: operations, data types, addressing modes, and registers should be independent (orthogonal)

    Primitives, not solutions: do not attempt to match HLL constructs with special IS instructions

    Simplify tradeoffs: make it easy for compiler to make choices based on estimated performance

    Trust compiler: provide compiler with instructions and primitives that exploit knowledge at compile-time

    Instruction Sets can vary enormously from one architecture to another. However, within the set of all RISC architectures there are actually few substantial differences.

    It is also worth noting that the number of distinct desktop architectures has been decreasing year on year. In 2007 most new desktop systems shipped will have x86 processors. In the server space one can still find Sun SPARC and IBM PowerPC architectures.

    The embedded computing domain has a much greater diversity of architectures. Can you think why this might be?

  • Improving CPU Performance (H&P 2.11; A.1; A.3)

    CPU performance can be computed by the CPU performance equation:
        CPU time = IC x CPI x Clock time

    To reduce CPU time: reduce IC, clock period or CPI

    ISA influences implementation, compiler optimizations, and therefore performance

    ISA must be an easy compiler target

    No need to provide too many and too complex instructions

    Compiler has a significant role in improving performance

    Essentially, to reduce CPU time we must reduce one of the three primary contributors, or else issue more than one instruction per cycle (or both!)
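
    The CPU performance equation is a plain product of three terms, which makes the effect of each contributor easy to try out. A minimal sketch, with a hypothetical function name and made-up numbers in the usage note:

    ```c
    #include <assert.h>

    /* CPU time = IC x CPI x clock time.
       instruction_count: number of instructions executed (IC)
       cpi:               average clock cycles per instruction
       clock_period:      duration of one clock cycle
       The result is in the same time unit as clock_period. */
    double cpu_time(double instruction_count, double cpi, double clock_period)
    {
        assert(instruction_count >= 0.0 && cpi > 0.0 && clock_period > 0.0);
        return instruction_count * cpi * clock_period;
    }
    ```

    For example, 1000 instructions at a CPI of 1.5 and a clock period of 0.5 ns take 750 ns. Halving the CPI to 0.75, which requires issuing more than one instruction per cycle, halves that to 375 ns.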

  • Program Structure: Basic-Blocks (BB)

    Definition: straight-line code with single entry and single exit
    Boundaries:
        Branches and jumps
        Calls and returns
        Targets of branches, jumps, calls, and returns

            lw   r2,0(r1)
            lw   r3,4(r1)
            addi r3,r3,n
            bne  r2,r3,Label2       BB1
    Label1: lw   r4,8(r1)
            sub  r2,r2,m
            beq  r2,r0,Label1       BB2
    Label2: add  r1,r1,r3           BB3

    [Diagram: control-flow graph of the three blocks. BB1 leads to BB2 and BB3; BB2 loops back to itself and leads to BB3.]

    Note: not all basic blocks are preceded by a branch. Contrive an example instruction sequence to illustrate this point here:
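
    The boundary rules can be applied mechanically by marking "leaders", the first instruction of each basic block. The sketch below is a simplified illustration with a hypothetical instruction record: it handles only branches whose target is known as an instruction index, and it does not model calls or returns.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical minimal instruction record. target is the index of the
       branch destination, or -1 if the instruction is not a branch. */
    struct insn {
        const char *text;
        int target;
    };

    /* Sets leader[i] to 1 for every instruction that starts a basic block
       and returns the number of basic blocks. */
    size_t find_leaders(const struct insn *code, size_t n, int *leader)
    {
        size_t blocks = 0;
        for (size_t i = 0; i < n; i++)
            leader[i] = 0;
        if (n > 0)
            leader[0] = 1;                      /* entry point */
        for (size_t i = 0; i < n; i++) {
            if (code[i].target >= 0) {
                assert((size_t)code[i].target < n);
                leader[code[i].target] = 1;     /* branch target */
                if (i + 1 < n)
                    leader[i + 1] = 1;          /* instruction after a branch */
            }
        }
        for (size_t i = 0; i < n; i++)
            blocks += (size_t)leader[i];
        return blocks;
    }

    /* The example from the slide: Label1 is index 4, Label2 is index 7. */
    static const struct insn example[] = {
        {"lw r2,0(r1)", -1}, {"lw r3,4(r1)", -1}, {"addi r3,r3,n", -1},
        {"bne r2,r3,Label2", 7},
        {"lw r4,8(r1)", -1}, {"sub r2,r2,m", -1}, {"beq r2,r0,Label1", 4},
        {"add r1,r1,r3", -1},
    };
    ```

    Running find_leaders on the example marks instructions 0, 4 and 7 as leaders, which gives the three blocks BB1, BB2 and BB3.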

  • Structure of Modern Compilers

    Stage                       Dependences                                                   Function
    Front-end                   Language dependent; machine independent                       Generate intermediate representation
    High-level optimizations    Somewhat language independent; largely machine independent    Procedure inlining; loop transformations
    Global optimizer            Mostly language independent; mostly machine independent       Global + local optimizations; register allocation
    Code generator              Language independent; machine dependent                       Instruction selection; scheduling

    Representations passed between the stages, in order: HLL code, IR, Optimized IR, SSA, Machine code

    If you are taking a compiler course this year, these optimisations will be familiar. If not, you need to be at least aware of:
        1. The difference between global and local optimisations
        2. Machine dependent and machine independent optimisations

    If you need help with understanding the role of compilers, read section B.8, Crosscutting Issues: The Role of Compilers, in H&P (4/e) on page B-24

  • Compiler Optimizations

    High-level: at HLL source
        Procedure inlining

    Local: within basic-block (BB)
        Common sub-expression elimination
        Constant propagation
        Stack height reduction

    Global: across BBs
        Global common sub-expression elimination
        Copy propagation
        Code motion
        Induction variable elimination

    Machine-dependent
        Strength reduction
        Pipeline scheduling
        Branch offset optimization

    This slide summarises the essential concepts. A little reading around the subject and supplementary note-taking will help with revision.
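
    Two of the listed optimisations can be shown at source level by writing the same function twice. This is a hand-written illustration with hypothetical function names; a real compiler performs these transformations on its intermediate representation, not on the C source. Unsigned arithmetic is used so that the shift and the multiply agree for every input.

    ```c
    /* Before: (a + b) is computed twice and the multiply by 8 uses a
       general multiplication. */
    unsigned before_opt(unsigned a, unsigned b, unsigned c)
    {
        return (a + b) * c + (a + b) * 8u;
    }

    /* After common sub-expression elimination and strength reduction. */
    unsigned after_opt(unsigned a, unsigned b, unsigned c)
    {
        unsigned t = a + b;         /* common sub-expression computed once */
        return t * c + (t << 3);    /* multiply by 8 replaced by a shift */
    }
    ```

    Both functions return the same value for any arguments, for example 33 for (1, 2, 3).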