Hitachi SuperH SH-4

Hitachi SuperH SH-4 By: Herman Sheremetyev 5/10/2002

Page 1: Hitachi SuperH SH-4

Hitachi SuperH SH-4

By: Herman Sheremetyev

5/10/2002

Page 2: Hitachi SuperH SH-4

Inspiration

I was inspired to do this presentation on the Hitachi SH-4 processor because it is the processor used in the Sega Dreamcast video game system. I own a Dreamcast, and after being assigned this project I became very interested in its internal workings. In my research I found that quite a bit of software had been ported to this platform, starting with a NetBSD port and followed by a Linux port that can actually turn the Dreamcast into a usable X terminal. These ports were possible largely because Hitachi released the complete specifications, as well as a Programmer’s Manual, for the processor. What follows are excerpts from the Hitachi Hardware Manual, loosely tailored to the Dreamcast implementation, that briefly describe the SH-4’s most interesting aspects.

Page 3: Hitachi SuperH SH-4

Sources

Most of the information in this presentation is taken from the Hitachi Hardware Manual for the SH-4 family of processors.

The manual can be found at http://www.julesdcdev.com/ and is probably also available on the Hitachi website.

Page 4: Hitachi SuperH SH-4

Features Summary

The SH-4 (SH7750 Series: SH7750 and SH7750S) was developed as the top-end model in the SuperH™ RISC engine family, featuring a 128-bit graphic engine for multimedia applications and 360 MIPS performance.

Page 5: Hitachi SuperH SH-4

Features

In addition to single- and double-precision floating-point operation capability, the on-chip FPU has a 128-bit graphic engine that enables 32-bit floating-point data to be processed 128 bits at a time.

It also supports 4 × 4 array operations and inner product operations, enabling performance of up to 1.4 GFLOPS.
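The inner product and 4 × 4 array operations correspond to the SH-4’s FIPR and FTRV instructions. The sketch below is a plain Python reference model of what they compute, not of how the hardware computes it (the hardware result is approximate, as the geometric-functions slide explains). The 1.4 GFLOPS figure can be reproduced by assuming that one inner product (4 multiplications + 3 additions = 7 floating-point operations) completes every cycle at the 200 MHz operating frequency; that issue rate is an assumption, not a figure quoted in these slides. The function names are hypothetical.

```python
# Reference model (exact arithmetic) of the SH-4 inner product (FIPR) and
# 4x4 matrix-vector transform (FTRV). The real hardware approximates these.

def fipr(a, b):
    """Inner product of two 4-element vectors: 4 multiplies + 3 adds."""
    assert len(a) == len(b) == 4
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def ftrv(matrix, v):
    """4x4 matrix times a 4-element vector: four inner products (16 mul + 12 add)."""
    assert len(matrix) == 4
    return [fipr(row, v) for row in matrix]

FLOPS_PER_FIPR = 4 + 3        # 4 multiplications + 3 additions
CLOCK_HZ = 200_000_000        # 200 MHz operating frequency

# Peak rate, ASSUMING one FIPR completes every clock cycle.
peak_gflops = FLOPS_PER_FIPR * CLOCK_HZ / 1e9

if __name__ == "__main__":
    identity = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    print(ftrv(identity, [1.0, 2.0, 3.0, 4.0]))  # [1.0, 2.0, 3.0, 4.0]
    print(fipr([1, 2, 3, 4], [5, 6, 7, 8]))      # 70
    print(peak_gflops)                           # 1.4
```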

Page 6: Hitachi SuperH SH-4

Features

The operating frequency is 200 MHz.

A superscalar architecture is employed that enables simultaneous execution of two instructions (including FPU instructions).

An 8-kbyte instruction cache and 16-kbyte data cache are also provided, and the on-chip memory management unit (MMU) handles translation from the 4-Gbyte virtual address space to the physical address space.

Page 7: Hitachi SuperH SH-4

Registers

Sixteen 32-bit general registers (and eight 32-bit shadow registers)

Seven 32-bit control registers

Four 32-bit system registers

Register operands are always longwords (32 bits). When a memory operand is only a byte (8 bits) or a word (16 bits), it is sign-extended into a longword when loaded into a register.
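A minimal sketch of that sign extension, as performed by a byte load (MOV.B) or word load (MOV.W). `sign_extend` is a hypothetical helper name, and the result is shown as the unsigned 32-bit contents of the destination register.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide memory operand to a 32-bit register value."""
    value &= (1 << bits) - 1              # keep only the operand's bits
    if value & (1 << (bits - 1)):         # sign bit set: value is negative
        value -= 1 << bits
    return value & 0xFFFFFFFF             # show as 32-bit register contents

print(hex(sign_extend(0x80, 8)))      # 0xffffff80 (byte -128)
print(hex(sign_extend(0x7F, 8)))      # 0x7f       (byte 127)
print(hex(sign_extend(0xFFFE, 16)))   # 0xfffffffe (word -2)
```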

Page 8: Hitachi SuperH SH-4

Data Formats in Memory

Memory data formats are classified into bytes, words, and longwords. Memory can be accessed in 8-bit byte, 16-bit word, or 32-bit longword form. A memory operand less than 32 bits in length is sign-extended before being loaded into a register. A word operand must be accessed starting from a word boundary (an address that is a multiple of 2: address 2n), and a longword operand starting from a longword boundary (an address that is a multiple of 4: address 4n). An address error will result if this rule is not observed. A byte operand can be accessed from any address.
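The alignment rule can be sketched as a check performed before each access. `check_alignment` is a hypothetical helper; on the real chip a violation raises an address-error exception rather than a Python error.

```python
ACCESS_SIZES = {"byte": 1, "word": 2, "longword": 4}

def check_alignment(address, kind):
    """Raise ValueError if `address` breaks the alignment rule for this access size."""
    size = ACCESS_SIZES[kind]
    if address % size != 0:               # word: 2n, longword: 4n
        raise ValueError(f"address error: {kind} access at {address:#x}")
    return True

print(check_alignment(0x1003, "byte"))    # True: bytes may use any address
print(check_alignment(0x1002, "word"))    # True: even address
# check_alignment(0x1002, "longword") would raise: not a multiple of 4
```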

Page 9: Hitachi SuperH SH-4

“Endianness”

Big endian or little endian byte order can be selected for the data format. Big endian is the preferred method of operation.

The byte order cannot be changed dynamically.

Bit positions are numbered left to right from most-significant to least-significant. Thus, in a 32-bit longword, the leftmost bit, bit 31, is the most significant bit and the rightmost bit, bit 0, is the least significant bit.
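Byte order only affects how a longword is laid out across its four byte addresses; the bit numbering of the value itself does not change. A short Python illustration using the standard `struct` module (`>` for big endian, `<` for little endian); the value 0x12345678 and the helper `bit` are arbitrary choices for the example.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)      # most significant byte at the lowest address
little = struct.pack("<I", value)   # least significant byte at the lowest address

print(big.hex())     # 12345678
print(little.hex())  # 78563412

def bit(word, n):
    """Bit n of a 32-bit value: bit 31 is the MSB, bit 0 the LSB, in either byte order."""
    return (word >> n) & 1

print(bit(0x80000000, 31), bit(0x00000001, 0))  # 1 1
```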

Page 10: Hitachi SuperH SH-4

Operand and Instruction Caches

The operand cache consists of 512 cache lines, each composed of a 19-bit tag, a validity bit (V), a dirty bit (U), and 32 bytes of data.

The instruction cache consists of 256 cache lines, each composed of a 19-bit tag, a validity bit (V), and 32 bytes of data (16 instructions).

(Tag: stores the upper 19 bits of the 29-bit external memory address of the cached data line.)
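The line counts and sizes above fix how an address is split up: 512 × 32 bytes = 16 Kbytes and 256 × 32 bytes = 8 Kbytes, with a 5-bit byte offset inside each 32-byte line. The sketch assumes the line index is taken from the bits just above the offset (bits 13..5 for the operand cache, 12..5 for the instruction cache), which is what the line counts imply for the default indexing mode, and takes the tag as bits 28..10, the "upper 19 bits of the 29-bit address". The tag and index therefore overlap in their middle bits; as I understand the manual, the tag comes from the physical address while the index comes from the address used for the access, so the overlap is not redundant when address translation is on. `split_address` is a hypothetical helper.

```python
LINE_SIZE = 32    # bytes per cache line
OC_LINES = 512    # operand cache lines
IC_LINES = 256    # instruction cache lines

assert OC_LINES * LINE_SIZE == 16 * 1024   # 16-Kbyte operand cache
assert IC_LINES * LINE_SIZE == 8 * 1024    # 8-Kbyte instruction cache

OFFSET_BITS = (LINE_SIZE - 1).bit_length()   # 5

def split_address(address, lines):
    """Split a 29-bit address into (tag, line index, byte offset)."""
    address &= (1 << 29) - 1                       # 29-bit external address
    offset = address & (LINE_SIZE - 1)             # bits 4..0
    index_bits = (lines - 1).bit_length()          # 9 for OC, 8 for IC (assumed layout)
    index = (address >> OFFSET_BITS) & ((1 << index_bits) - 1)
    tag = address >> 10                            # upper 19 bits: bits 28..10
    return tag, index, offset

tag, index, offset = split_address(0x0C001234, OC_LINES)
print(hex(tag), hex(index), hex(offset))   # 0x30004 0x91 0x14
```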

Page 11: Hitachi SuperH SH-4

Cache-Memory coherence

Coherency between the cache and external memory must be maintained by software.

Several cache operation instructions are provided, including a prefetch instruction (PREF @Rn).

Page 12: Hitachi SuperH SH-4

Cache operations (operand cache only)

Invalidate instruction: OCBI @Rn (cache invalidation, no write-back)

Purge instruction: OCBP @Rn (cache invalidation with write-back)

Write-back instruction: OCBWB @Rn (cache write-back)

Allocate instruction: MOVCA.L R0,@Rn (cache allocation)
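Because coherency is left to software, a typical use of these instructions is around a DMA transfer: write back (OCBWB) the lines of a buffer before a device reads it, and invalidate (OCBI) or purge (OCBP) them before the CPU reads data a device has written. The sketch only computes which 32-byte line addresses such a loop would place in Rn, one instruction per line; the helper name and the DMA scenario are illustrative assumptions.

```python
LINE_SIZE = 32  # operand cache line size in bytes

def cache_lines(start, length):
    """Return the 32-byte-aligned line addresses covering [start, start + length)."""
    if length <= 0:
        return []
    first = start & ~(LINE_SIZE - 1)                 # round down to a line boundary
    last = (start + length - 1) & ~(LINE_SIZE - 1)   # line holding the final byte
    return list(range(first, last + 1, LINE_SIZE))

# A 64-byte buffer starting mid-line touches three cache lines.
for addr in cache_lines(0x1010, 0x40):
    print(f"OCBWB @Rn  with Rn = {addr:#x}")
```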

Page 13: Hitachi SuperH SH-4

Floating Point Unit (FPU)

Conforms to the IEEE 754 standard

32 single-precision floating-point registers (can also be referenced as 16 double-precision registers)

Two rounding modes: Round to Nearest and Round to Zero

Two denormalization modes: Flush to Zero and Treat Denormalized Number

Six exception sources: FPU Error, Invalid Operation, Divide By Zero, Overflow, Underflow, and Inexact

Comprehensive instructions: single-precision, double-precision, graphics support, and system control

Page 14: Hitachi SuperH SH-4

FPU Data Formats

A floating-point number consists of the following three fields:

Sign (s)

Exponent (e)

Fraction (f)

32-bit Single-Precision (s=1, e=8, f=23)

64-bit Double-Precision (s=1, e=11, f=52)
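The three fields can be pulled out of a value’s bit pattern with Python’s `struct` module, which packs values in the same IEEE 754 single- and double-precision formats. The helper `fields` is hypothetical; exponents are shown in their stored (biased) form, with bias 127 for single precision and 1023 for double.

```python
import struct

def fields(x, double=False):
    """Split an IEEE 754 value into its (sign, exponent, fraction) bit fields."""
    if double:
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))
        exp_bits, frac_bits = 11, 52
    else:
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        exp_bits, frac_bits = 8, 23
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

print(fields(-1.5))               # (1, 127, 4194304)
print(fields(-1.5, double=True))  # (1, 1023, 2251799813685248)
```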

Page 15: Hitachi SuperH SH-4

FPU Rounding

Round to Nearest: The value is rounded to the nearest expressible value. If the unrounded value is 2^Emax × (2 − 2^(−P)) or more, where Emax is the maximum exponent (127 for single precision, 1023 for double) and P is the number of significand bits (24 and 53, respectively), the result will be infinity with the same sign as the unrounded value.

Round to Zero: The digits below the round bit of the unrounded value are discarded. If the unrounded value is larger than the maximum expressible absolute value, the result will be the maximum expressible absolute value.
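A worked check of the overflow threshold, using exact rational arithmetic from Python’s `fractions` module. It shows that 2^Emax × (2 − 2^(−P)) lies exactly halfway between the largest finite value and 2^(Emax+1). Because the largest finite value has an odd significand (all ones), a value at that halfway point rounds to the even neighbour 2^(Emax+1), which is out of range and becomes infinity; under Round to Zero the same inputs would instead give the largest finite value. The helper names are hypothetical.

```python
from fractions import Fraction

def overflow_threshold(emax, p):
    """Smallest magnitude that Round to Nearest turns into infinity: 2^Emax * (2 - 2^-P)."""
    return Fraction(2) ** emax * (2 - Fraction(1, 2 ** p))

def max_finite(emax, p):
    """Largest finite value (all P significand bits set): 2^Emax * (2 - 2^(1-P))."""
    return Fraction(2) ** emax * (2 - Fraction(2, 2 ** p))

for name, emax, p in [("single", 127, 24), ("double", 1023, 53)]:
    t = overflow_threshold(emax, p)
    m = max_finite(emax, p)
    # The threshold is the midpoint between the largest finite value and 2^(Emax+1).
    assert t == (m + Fraction(2) ** (emax + 1)) / 2
    print(name, "largest finite value:", float(m))
```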

Page 16: Hitachi SuperH SH-4

FPU Graphics Support

The SH7750 Series supports two kinds of graphics functions:

instructions for geometric operations

pair single-precision transfer instructions that enable high-speed data transfer.

Page 17: Hitachi SuperH SH-4

FPU Geometric functions

Geometric operation instructions perform approximate-value computations. To enable high-speed computation with a minimum of hardware, the SH7750 Series ignores comparatively small values in the partial computation results of four multiplications.

Page 18: Hitachi SuperH SH-4

FPU Pair Single-Precision Data Transfer

In addition to the geometric operation instructions, the SH7750 Series also supports high-speed data transfer instructions.

These instructions transfer two single-precision (2 × 32-bit) data items at once, doubling the transfer performance.

Page 19: Hitachi SuperH SH-4

Instruction Format

The instruction set is implemented with 16-bit fixed-length instructions.

Operations are basically executed using registers.

Except for bit-manipulation operations such as logical AND that are executed directly in memory, the operands of an operation that requires memory access are loaded into registers, and the operation is executed between the registers.
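With fixed 16-bit instructions, each instruction is a single 16-bit word, and register numbers sit in 4-bit fields. The encodings below for ADD Rm,Rn (0011 nnnn mmmm 1100) and MOV #imm,Rn (1110 nnnn iiii iiii) follow my reading of the SH-4 programming manual and should be checked against it; the function names and the little-endian byte order used to pack the program are assumptions for illustration.

```python
def encode_add(m, n):
    """ADD Rm,Rn -> 0011 nnnn mmmm 1100."""
    assert 0 <= m < 16 and 0 <= n < 16
    return 0x300C | (n << 8) | (m << 4)

def encode_mov_imm(imm, n):
    """MOV #imm,Rn -> 1110 nnnn iiii iiii, with imm a signed 8-bit value."""
    assert -128 <= imm <= 127 and 0 <= n < 16
    return 0xE000 | (n << 8) | (imm & 0xFF)

code = [encode_mov_imm(-1, 3), encode_add(1, 2)]   # MOV #-1,R3 ; ADD R1,R2
# Every instruction is exactly two bytes, so N instructions occupy 2N bytes.
program = b"".join(word.to_bytes(2, "little") for word in code)
print([f"{w:04X}" for w in code])  # ['E3FF', '321C']
print(len(program))                # 4
```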

Page 20: Hitachi SuperH SH-4

Instruction Format (cont’d)

Delayed Branches: Except for the two branch instructions BF and BT, branch instructions and RTE are delayed branches. (In a delayed branch, the instruction following the branch is executed before the branch destination instruction.)
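To make the execution order concrete, here is a toy trace in Python. It models only one rule: after a delayed branch, the next instruction in memory (the delay slot) runs before control reaches the branch target. The instruction names are labels rather than real encodings, conditional branches such as BF and BT are not modelled, and the `trace` helper is hypothetical.

```python
def trace(program):
    """Return instruction names in execution order.

    program is a list of (name, target) pairs; target is the index a delayed
    branch jumps to, or None for an ordinary instruction.
    """
    executed = []
    pc = 0
    pending_target = None  # branch target waiting for its delay slot to run
    while 0 <= pc < len(program):
        name, target = program[pc]
        if pending_target is not None and target is not None:
            raise ValueError("a branch may not sit in another branch's delay slot")
        executed.append(name)
        if pending_target is not None:
            # The delay slot has executed; now the branch takes effect.
            pc, pending_target = pending_target, None
        elif target is not None:
            # Delayed branch: remember the target, but run the next instruction first.
            pending_target = target
            pc += 1
        else:
            pc += 1
    return executed

program = [
    ("BRA target", 3),  # delayed branch to index 3
    ("ADD", None),      # delay slot: executes before the target
    ("SUB", None),      # skipped
    ("MOV", None),      # branch target
]
print(trace(program))  # ['BRA target', 'ADD', 'MOV']
```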

Constant Values: An 8-bit constant can be specified as an immediate value within the instruction code. 16-bit and 32-bit constants can be defined as literal constants in memory and loaded with a PC-relative load instruction.
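The choice between an immediate and a literal constant can be sketched as follows. The category names are hypothetical. The ranges follow from MOV #imm,Rn sign-extending its 8-bit immediate and a 16-bit literal being sign-extended by the word load (MOV.W), so 0xFFFFFFFF still fits in an immediate (it is −1), while 0xFFFF needs a 32-bit literal.

```python
def constant_strategy(value):
    """Classify how a constant would be loaded into a 32-bit register."""
    if not -2**31 <= value < 2**32:
        raise ValueError("does not fit in a 32-bit register")
    if value >= 2**31:
        value -= 2**32            # reinterpret an unsigned 32-bit value as signed
    if -128 <= value <= 127:
        return "immediate"        # MOV #imm,Rn: 8-bit immediate, sign-extended
    if -32768 <= value <= 32767:
        return "literal16"        # 16-bit literal, PC-relative MOV.W, sign-extended
    return "literal32"            # 32-bit literal, PC-relative MOV.L

for v in (100, 200, 0xFFFFFFFF, 0xFFFF, 0x12345678):
    print(hex(v), constant_strategy(v))
```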

Page 21: Hitachi SuperH SH-4

Addressing Modes

Register direct

Register indirect (with post-increment, pre-decrement, and displacement variants)

Indexed register indirect, i.e. the effective address is the sum of the contents of Rn and R0

Immediate
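A sketch of how the effective address is formed for the register-indirect variants and the indexed mode. It assumes the 4-bit displacement form, in which the displacement is scaled by the access size (1, 2, or 4 bytes); the mode strings and the function name are hypothetical.

```python
def effective_address(mode, regs, n, size=4, disp=0):
    """Effective address for register-indirect style modes.

    regs is a list of 16 register values; it is updated in place for
    post-increment and pre-decrement.
    """
    if mode == "@Rn":                 # register indirect
        return regs[n]
    if mode == "@Rn+":                # post-increment: use Rn, then add the size
        ea = regs[n]
        regs[n] = (regs[n] + size) & 0xFFFFFFFF
        return ea
    if mode == "@-Rn":                # pre-decrement: subtract the size, then use Rn
        regs[n] = (regs[n] - size) & 0xFFFFFFFF
        return regs[n]
    if mode == "@(disp,Rn)":          # 4-bit displacement scaled by access size
        assert 0 <= disp <= 15
        return (regs[n] + disp * size) & 0xFFFFFFFF
    if mode == "@(R0,Rn)":            # indexed: Rn + R0
        return (regs[n] + regs[0]) & 0xFFFFFFFF
    raise ValueError(f"unsupported mode {mode}")

regs = [0] * 16
regs[0], regs[4] = 8, 0x1000
print(hex(effective_address("@Rn+", regs, 4)), hex(regs[4]))  # 0x1000 0x1004
```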

Page 22: Hitachi SuperH SH-4

Instruction Set

Over 100 different instructions, including floating-point instructions, mostly variations on MOV, ADD, etc. to accommodate different addressing modes. Instruction mnemonic format:

OP.Sz SRC,DEST

OP: Operation code

Sz: Size

SRC: Source

DEST: Source and/or destination operand
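As an illustration of the OP.Sz SRC,DEST layout, a small hypothetical parser that splits a mnemonic into its four parts. It handles only the textual form, taking care not to split on the comma inside an operand such as @(disp,Rn), and does not check that the instruction actually exists.

```python
def parse(mnemonic):
    """Split 'OP.Sz SRC,DEST' into (op, size, src, dest); missing parts are None."""
    op_size, _, operands = mnemonic.strip().partition(" ")
    op, _, size = op_size.partition(".")
    operands = operands.strip()
    # Find the last comma that is not inside parentheses, e.g. in @(disp,Rn).
    depth, split_at = 0, None
    for i, ch in enumerate(operands):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            split_at = i
    if split_at is None:
        src, dest = None, (operands or None)   # single operand acts as DEST
    else:
        src, dest = operands[:split_at].strip(), operands[split_at + 1:].strip()
    return op, (size or None), src, dest

print(parse("MOV.L @(4,R1),R2"))  # ('MOV', 'L', '@(4,R1)', 'R2')
print(parse("ADD R1,R2"))         # ('ADD', None, 'R1', 'R2')
print(parse("RTS"))               # ('RTS', None, None, None)
```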

Page 23: Hitachi SuperH SH-4

Instruction Level Parallelism

The SH7750 Series is a 2-ILP (instruction-level-parallelism) superscalar pipelining microprocessor. Instruction execution is pipelined, and two instructions can be executed in parallel. Parallel execution depends on the instructions: not all instructions can be executed in parallel with all others.

Page 24: Hitachi SuperH SH-4

Pipelining

The instruction pipeline has 5 stages:

Instruction fetch (I)

Decode and register read (D)

Execution (EX/SX/F0/F1/F2/F3)

Data access (NA/MA)

Write-back (S/FS)
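Under the idealized assumption that every adjacent pair of instructions can issue together and nothing stalls (which, as the previous slide notes, does not hold for all instruction combinations), a five-stage, two-way pipeline finishes N instructions in 5 + ⌈N/2⌉ − 1 cycles. The sketch prints a stage diagram using the integer-pipeline stage names (I, D, EX, MA, S); the helper names are hypothetical.

```python
import math

STAGES = ["I", "D", "EX", "MA", "S"]   # integer pipeline stages from the slide

def total_cycles(n, ways=2):
    """Cycles to finish n instructions with no stalls, issuing `ways` per cycle."""
    if n == 0:
        return 0
    return len(STAGES) + math.ceil(n / ways) - 1

def diagram(n, ways=2):
    """One text row per instruction showing the stage it occupies each cycle."""
    rows = []
    for i in range(n):
        start = i // ways   # pairs (0,1), (2,3), ... enter the pipeline together
        cells = ["  "] * start + [f"{s:<2}" for s in STAGES]
        rows.append(f"insn{i}: " + " ".join(cells))
    return rows

for row in diagram(4):
    print(row)
print(total_cycles(4, ways=2), total_cycles(4, ways=1))  # 6 8
```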

Page 25: Hitachi SuperH SH-4

ILP Illustration

[Figure: two-way superscalar execution; image: http://www.hitachisemiconductor.com/sic/jsp/japan/eng/products/mpumcu/32bit/image/2_way.gif]

Page 26: Hitachi SuperH SH-4

Direct Memory Access

The SH7750 Series includes an on-chip four-channel direct memory access controller (DMAC). The DMAC can be used in place of the CPU to perform high-speed data transfers among external devices equipped with DACK (DMA transfer end notification), external memories, memory-mapped external devices, and on-chip peripheral modules (except the DMAC, BSC, and UBC). Using the DMAC reduces the burden on the CPU and increases the operating efficiency of the chip.

Page 27: Hitachi SuperH SH-4

Serial Communication Interface (SCI)

The SH7750 is equipped with a single-channel serial communication interface (SCI) and a single-channel serial communication interface with built-in FIFO registers (SCI with FIFO: SCIF).

The SCI can handle both asynchronous and synchronous serial communication. A function is also provided for serial communication between processors (multiprocessor communication function).

Page 28: Hitachi SuperH SH-4

Smart Card Interface

An IC card (smart card) interface conforming to ISO/IEC 7816-3 (Identification Card) is supported as a serial communication interface (SCI) extension function.

Switching between the normal serial communication interface and the smart card interface is carried out by means of a register setting.