TI Enhanced ARM925T Core

Transcript of TI Enhanced ARM925T Core

Page 1: TI Enhanced ARM925T Core

TI Enhanced ARM925T Core

Page 2: TI Enhanced ARM925T Core

OMAP1510 Architecture

• TI Enhanced ARM925T Core

– Up to 168 MHz (maximum frequency)

– Voltage: 1.5 V nominal

– 16KB I-cache; 8KB D-cache

– 192 KB of shared internal SRAM (frame buffer)

– Support for 32-bit and 16-bit (Thumb mode) instruction sets

– Data and program MMUs

– Two 64-entry translation look-aside buffers (TLBs) for MMUs

– 17-word write buffer

Page 3: TI Enhanced ARM925T Core

TI925T – MPU SUBSYSTEM

• The ARM9TDMI core is enhanced by Texas Instruments; the enhanced core is called the TI925T

• Based on the Harvard architecture

-- Separate buses for instructions and data

-- Allows concurrent instruction and data access (reduces the CPI of the processor)

• 32- bit ARM mode and 16- bit Thumb mode

Page 4: TI Enhanced ARM925T Core

ARM9 RISC Processor

– Load/store architecture

– Fixed-length instructions and fixed-time pipelined organization

Register Organization

-- 16 GPRs under User mode
-- 5 shadow registers under FIQ mode
-- 5 SP registers for exception mode stack handling
-- 5 LR registers for exception handling
-- 5 SPSRs to save the status flag contents
-- 1 CPSR holding the ALU status flags and the current processor mode

Page 5: TI Enhanced ARM925T Core

CPU Details

• Register bank with 37 registers

• 32-bit address and data buses

• ALU

• Barrel shifter

• Multiplier

Page 6: TI Enhanced ARM925T Core

Registers

• The ARM core has a total of 37 registers.

– 31 general-purpose registers, including a program counter. These registers are 32 bits wide.

– 6 status registers. These are also 32 bits wide, but only some of the 32 bits are allocated or need to be implemented.

Page 7: TI Enhanced ARM925T Core
Page 8: TI Enhanced ARM925T Core

Saved Program Status Registers (SPSRs)

• The SPSRs are used to store the CPSR when an exception is taken. One SPSR is accessible in each of the exception-handling modes.

• User mode and System mode do not have an SPSR because they are not exception handling modes.

Page 9: TI Enhanced ARM925T Core

Current Program Status Register (CPSR)

• The CPSR holds:

• copies of the Arithmetic Logic Unit (ALU) status flags

• the current processor mode

• interrupt disable flags.

• The ALU status flags in the CPSR are used to determine whether conditional instructions are executed or not.

• On Thumb-capable processors, the CPSR also holds the current processor state (ARM or Thumb).

Page 10: TI Enhanced ARM925T Core

Program counter (pc)

• The program counter is accessed as r15 (or pc). It is incremented by one word (four bytes) for each instruction in ARM state, or by two bytes in Thumb state.

• Branch instructions load the destination address into the program counter. Data-processing instructions can also write to the program counter directly. For example, to return from a subroutine, copy the link register into the program counter using:

MOV pc,lr

• During execution, r15 does not contain the address of the currently executing instruction. The address of the currently executing instruction is typically pc - 8 for ARM, or pc - 4 for Thumb.
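The pc offsets described above can be illustrated with a small Python sketch. The function name and the `PC_READ_OFFSET` table are made up for this example; only the offsets of 8 (ARM) and 4 (Thumb) come from the slide.

```python
# Offset between the value read from r15 and the address of the
# instruction that reads it (figures taken from the slide above).
PC_READ_OFFSET = {"ARM": 8, "Thumb": 4}


def current_instruction_address(pc_value, state):
    """Return the address of the executing instruction for a given r15 read.

    `state` is "ARM" or "Thumb"; the result is wrapped to 32 bits.
    """
    return (pc_value - PC_READ_OFFSET[state]) & 0xFFFFFFFF


print(hex(current_instruction_address(0x8008, "ARM")))    # 0x8000
print(hex(current_instruction_address(0x8004, "Thumb")))  # 0x8000
```

In both states the value read from r15 is two instructions ahead of the instruction being executed.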

Page 11: TI Enhanced ARM925T Core

Memory Interface

• ARM data bus (32-bit)

- To ease connection to sub-word-sized memory systems, input data and instructions can be latched on a byte-by-byte basis.

• External data bus

- 32-bit bidirectional bus

- Separate 32-bit unidirectional data-in and data-out buses

Page 12: TI Enhanced ARM925T Core

Version 5

• Improves the efficiency of ARM/Thumb interworking in T variants

• Adds some extra instructions in both ARM and Thumb state

• Adds more instruction options for coprocessor designers

• Some instructions can only be executed unconditionally.

Page 13: TI Enhanced ARM925T Core

Additional Instructions

• BKPT

• BLX

• CLZ

• CDP2, LDC2, STC2, MCR2, MRC2

• Minor changes to LDR and LDM

Page 14: TI Enhanced ARM925T Core

BKPT Instruction (ARM)

• Causes software breakpoint to occur

• Handled by an exception handler installed on the prefetch abort vector.

• Takes a 16-bit immediate value. The ARM hardware ignores the value, but a debugger may use it to store additional information about the breakpoint.

• Unconditional instruction.

• BKPT <immediate>

Page 15: TI Enhanced ARM925T Core

BKPT Instruction (Thumb)

• Causes software breakpoint and uses prefetch abort vector.

• Hardware can optionally override this behaviour.

• Uses 8 bit immediate value.

• BKPT <immediate_8>

Page 16: TI Enhanced ARM925T Core

BLX instruction (ARM)

• Used to call a Thumb subroutine from ARM instruction set.

• Unconditional branching

• Uses a 24-bit offset, which gives a range of approximately ±32 Mbytes.

• BLX <target_address>

Page 17: TI Enhanced ARM925T Core

BLX instruction (ARM)

• Uses the address specified in a register, like the BX instruction.

• The least significant bit of the register is copied into the T bit of the CPSR.

• BLX {<cond>} <Rm>

Page 18: TI Enhanced ARM925T Core

BLX instruction (Thumb)

• Uses an 11-bit offset field and works the same way as the BL instruction.

• BLX <target address>

Page 19: TI Enhanced ARM925T Core

BLX instruction (Thumb)

• Uses the target address specified in a register.

• T flag is updated with bit 0 of register specified.

• BLX <Rm>

Page 20: TI Enhanced ARM925T Core

Condition code 0b1111

• Prior to V3, this condition code meant that the instruction was never executed (NV).

• In V3 and V4 the behaviour is unpredictable.

• In V5 this is used to encode various instructions which can only be executed unconditionally.
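As a rough illustration of the condition field, the Python sketch below extracts bits [31:28] of a 32-bit ARM instruction word and checks for 0b1111. The function names are invented for this example, and the two sample encodings are given as illustrations rather than taken from the slides.

```python
def condition_field(instruction):
    """Return bits [31:28] of a 32-bit ARM instruction word."""
    return (instruction >> 28) & 0xF


def uses_unconditional_space(instruction):
    """True when the condition field is 0b1111."""
    return condition_field(instruction) == 0b1111


# MOV pc, lr assembled with the "always" condition (0b1110).
print(uses_unconditional_space(0xE1A0F00E))  # False
# BLX <target_address> form, which has no condition field of its own.
print(uses_unconditional_space(0xFA000000))  # True
```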

Page 21: TI Enhanced ARM925T Core

CDP2, LDC2, STC2, MCR2, MRC2

• Causes the conditional field of the instruction to be set to 0b1111.

• This provides additional opcode space for coprocessor designers

• Resulting instructions can only be executed unconditionally.

Page 22: TI Enhanced ARM925T Core

CLZ instruction (ARM)

• Count Leading Zeros

• CLZ {cond} <Rd>, <Rm>

• Returns the number of binary zero bits before the first binary one bit in a register value.

• Source register is scanned from the most significant bit towards the least significant bit.

• Result is 32 if no bits are set in the source register and zero if bit 31 is set.
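The behaviour described above can be modelled in a few lines of Python. This is only a software model of the result, not a description of how the hardware computes it; the name `clz32` is chosen for this example.

```python
def clz32(value):
    """Count leading zero bits in a 32-bit value, scanning from bit 31 down."""
    value &= 0xFFFFFFFF
    count = 0
    for bit in range(31, -1, -1):
        if value & (1 << bit):
            break          # first set bit found, stop counting
        count += 1
    return count


print(clz32(0))           # 32: no bits set
print(clz32(0x80000000))  # 0: bit 31 is set
print(clz32(0x00010000))  # 15
```

A common use is normalisation: shifting a value left by `clz32(value)` moves its highest set bit to bit 31.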

Page 23: TI Enhanced ARM925T Core

LDM instruction (ARM)

• If the PC gets loaded in the process, then bit 0 of the loaded value determines whether execution continues in ARM or Thumb state.

• T bit = Value [0]

Page 24: TI Enhanced ARM925T Core

POP instruction (Thumb)

• If the PC gets loaded, then bit 0 of the loaded value determines whether execution continues after this branch in ARM state or in Thumb state

• T = bit[0]

Page 25: TI Enhanced ARM925T Core

LDR instruction (ARM)

• If the destination register is PC, then bit 0 of the loaded value determines whether the execution continues in ARM or Thumb mode.

• T bit = Value [0]
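The rule shared by LDM, POP, and LDR (T bit = bit 0 of the value loaded into the PC) can be sketched as follows. The function name is hypothetical, and clearing bit 0 to form the branch target is an assumption of this sketch rather than something stated on the slides.

```python
def load_to_pc(loaded_value):
    """Model a V5T load into the PC: return (state, branch target)."""
    t_bit = loaded_value & 1                  # T bit = Value[0]
    target = loaded_value & 0xFFFFFFFE        # assumed: bit 0 is not part of the address
    state = "Thumb" if t_bit else "ARM"
    return state, target


print(load_to_pc(0x00008001))  # ('Thumb', 32768)
print(load_to_pc(0x00009000))  # ('ARM', 36864)
```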

Page 26: TI Enhanced ARM925T Core

ARM 9 Architecture

• Based on the Harvard architecture

-- Separate buses for instructions and data

-- Allows concurrent instruction and data access (reduces the CPI of the processor)

– Normally uses separate instruction and data cache.

Page 27: TI Enhanced ARM925T Core

ARM 9 Pipeline

• Uses a five-stage pipeline

– Instruction Fetch (F)
– Instruction Decode (D)
– Execute (E)
– Data Memory Access (M)
– Register Write (W)

Page 28: TI Enhanced ARM925T Core
Page 29: TI Enhanced ARM925T Core

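Pipeline stages: Cycle 1 and cycle 2

[Figure: five-stage pipeline datapath with the stages Instruction Fetch, Instr. Decode / Reg. Fetch, Execute / Addr. Calc, Memory Access, and Write Back. Labelled elements include PC, Next PC, Next SEQ PC, adder (+4), instruction memory, register file (RS1, RS2, RD), sign extend (Imm), ALU, Zero? test, multiplexers, data memory, LMD, and WB Data.]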

Page 30: TI Enhanced ARM925T Core

Pipeline Stage 1 & 2

1. Instruction fetch cycle (IF)

load instruction

update program counter

2. Instruction decode / register fetch cycle (ID)

fetch source registers

sign-extend immediate field

Page 31: TI Enhanced ARM925T Core

Pipeline Stage 3

• The third cycle is known as the Execution/ effective address cycle (EX)

• The actions performed in this cycle depend on the type of operation.

– Loads and stores
• calculate effective address

– ALU operations
• perform ALU operation

– Branch
• compute branch target
• determine if the branch is taken

Page 32: TI Enhanced ARM925T Core

Pipeline Stage 4

• The fourth cycle is known as the Memory access / branch completion cycle (MEM)

• The only DLX instructions active in this cycle are loads, stores, and branches

– Loads
• load data from memory into the processor

– Stores
• store data into memory

– Branch
• go to branch target or next instruction

– ALU operations
• do nothing

Page 33: TI Enhanced ARM925T Core

Pipeline Stage 5

• The fifth cycle is known as the Write-back cycle (WB)

• During this cycle, results are written to the register file

– Loads
• write value from memory into register file

– ALU operations
• write ALU result into register file

– Stores and branches
• do nothing

Page 34: TI Enhanced ARM925T Core

Visualizing Pipelining

[Figure: pipeline diagram with instruction order on the vertical axis and time in clock cycles (Cycle 1 to Cycle 7) on the horizontal axis. Four instructions each pass through Ifetch, Reg, ALU, DMem, and Reg, with each instruction starting one cycle after the previous one.]
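A staggered pipeline diagram of this kind can be generated with a short Python sketch. It assumes an ideal pipeline with no stalls or hazards, uses the stage letters F, D, E, M, W from the ARM 9 pipeline slide, and the function names are invented for this example.

```python
STAGES = ["F", "D", "E", "M", "W"]


def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based).

    Returns None when the instruction is not in the pipeline in that cycle.
    """
    position = cycle - 1 - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def pipeline_diagram(n_instructions):
    """Return one text row per instruction, one column per clock cycle."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        cells = [stage_at(i, c) or "." for c in range(1, total_cycles + 1)]
        rows.append(" ".join(cells))
    return rows


for row in pipeline_diagram(4):
    print(row)
```

Once the pipeline is full, one instruction completes per cycle, so n instructions take n + 4 cycles in this idealised model.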

Page 35: TI Enhanced ARM925T Core

The MPU core incorporates:

A coprocessor 15 (CP15) and protection module

Data and program memory management units (MMUs) with translation look-aside buffers.

A separate 16K-byte instruction cache and 8K-byte data cache. Both are two-way associative with virtual index virtual tag (VIVT).

A 17-word write buffer (WB)

A local bus interface

The OMAP1510 device uses the TI925T core in little endian mode only.

Page 36: TI Enhanced ARM925T Core

Cache Memory

Cache memory minimizes external memory access and allows the use of low-cost RAM while maintaining maximum performance.

Cached cores are also ideal in systems where the processor must share limited bus bandwidth with other devices requiring high data throughput (such as streaming audio or video).

The processor operates at full speed from the cache, leaving the system bus free for use by other devices.

Page 37: TI Enhanced ARM925T Core

2 Way set-associative mapping

• Compromise between direct mapping and fully associative mapping

• Index same as in direct mapping

• But, each cache address contains content and tags of 2 or more memory address locations

• Tags of that set simultaneously compared as in fully associative mapping

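• A cache with set size N is called N-way set-associative

– 2-way, 4-way, 8-way are common

[Figure: two-way set-associative lookup. The address is split into Tag, Index, and Offset fields; each way holds Valid (V), Tag (T), and Data (D) fields, and the tags of both ways are compared (=) in parallel.]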

Page 38: TI Enhanced ARM925T Core

Instruction Cache

The 16K-byte instruction cache (I-cache) has 1024 lines of 16 bytes arranged as a two-way set-associative cache. It uses the virtual addresses generated by the processor core.

The I-cache is always reloaded one line at a time.

It can be enabled or disabled via the CP15 control register (I_CP15 bit) and is disabled and flushed upon reset.
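The I-cache geometry given above (16K bytes, 16-byte lines, two ways) fixes how many sets there are and how many address bits a conventional tag/index/offset split would use. The Python sketch below works that out; the exact bit assignment inside the TI925T is an assumption here, not something the slides state.

```python
CACHE_BYTES = 16 * 1024   # from the slide
LINE_BYTES = 16           # from the slide
WAYS = 2                  # from the slide

LINES = CACHE_BYTES // LINE_BYTES            # 1024 lines
SETS = LINES // WAYS                         # 512 sets of two lines each
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 4 bits select a byte in the line
INDEX_BITS = SETS.bit_length() - 1           # 9 bits select the set
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS     # 19 bits remain for the tag


def split_address(va):
    """Split a 32-bit virtual address into (tag, index, offset).

    Assumes the conventional layout: offset in the low bits, then index, then tag.
    """
    offset = va & (LINE_BYTES - 1)
    index = (va >> OFFSET_BITS) & (SETS - 1)
    tag = va >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(LINES, SETS, OFFSET_BITS, INDEX_BITS, TAG_BITS)  # 1024 512 4 9 19
print(split_address(0x00012345))                       # (9, 52, 5)
```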

Page 39: TI Enhanced ARM925T Core

Instruction Cache

• When the I-cache is enabled, it is searched whenever the processor requests an instruction.

• If the cache hits, data is returned to the core whether the MMU is enabled or not.

• If a cache read misses, a line fetch is performed and data is written to the cache following a least recently used (LRU) replacement algorithm.

• For best performance, enable the I-cache as soon as possible after reset.

• If the I-cache is disabled, it is not searched, and every instruction fetch generates a single 16-bit or 32-bit external access.
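The hit, miss, and LRU replacement behaviour of a two-way set-associative cache can be modelled with a small Python class. This is a toy model that only tracks tags: the class name and method are hypothetical, the default geometry (512 sets of 16-byte lines) matches a 16K-byte two-way cache, and nothing here reflects the actual TI925T implementation.

```python
class TwoWayCache:
    """Toy two-way set-associative cache with LRU replacement (tags only)."""

    def __init__(self, sets=512, line_bytes=16):
        self.sets = sets
        self.line_bytes = line_bytes
        # Each set holds up to two tags, ordered least recently used first.
        self.ways = [[] for _ in range(sets)]
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Look up `address`; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.sets
        tag = line // self.sets
        resident = self.ways[index]
        if tag in resident:
            resident.remove(tag)      # move to most recently used position
            resident.append(tag)
            self.hits += 1
            return True
        if len(resident) == 2:
            resident.pop(0)           # evict the least recently used line
        resident.append(tag)          # line fetch fills the free way
        self.misses += 1
        return False


cache = TwoWayCache()
# 0x0000, 0x2000 and 0x4000 all map to set 0 in this geometry.
for addr in (0x0000, 0x0004, 0x2000, 0x0000, 0x4000, 0x2000):
    print(hex(addr), "hit" if cache.fetch(addr) else "miss")
```

In the trace, the access to 0x4000 evicts 0x2000 rather than 0x0000, because 0x0000 was touched more recently.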

Page 40: TI Enhanced ARM925T Core

Validity of I-Cache

• The flush I-cache instruction is fetched at cycle time 0, for example, but not executed until cycle time 4 (the TI925T uses a five-stage opcode pipe).

• Thus, four additional opcodes potentially are still fetched from the I-cache before the flush I-cache opcode is executed.

• Once executed, the entire I-cache is invalidated before the next opcode executes.

• The I-cache content is not flushed when the I-cache is disabled. Its contents remain valid and are accessible again when the I-cache is reenabled.

Page 41: TI Enhanced ARM925T Core

Data Cache

The 8K-byte data cache (D-cache) has 512 lines of 16 bytes arranged as a two-way set-associative cache. It uses the virtual addresses generated by the processor.

The D-cache is always reloaded one line at a time. It always requires the MMU to be enabled.

The D-cache can operate in write-through (WT) or in copy-back (CB) mode.

The translation look-aside buffer (TLB) descriptors that are placed in memory determine which mode is used.

D-cache is disabled and flushed upon reset.

The D-cache supports byte, half-word, and word accesses.

The D-cache is always disabled when the MMU is off.

Page 42: TI Enhanced ARM925T Core

Operation of D-Cache

• If the D-cache is enabled, it is searched whenever the processor performs a data load or store.

• If the cache hits on a load, data is returned to the core regardless of the C_MMU bit.

• If a cache read misses, the C_MMU bit is examined. If it is 1, a line fetch is performed and the line is written to the cache following an LRU (least recently used) replacement algorithm.

• If C_MMU is 0, a single external access is performed and the cache is not updated.

• Stores that hit the D-cache always update it, regardless of the C_MMU bit, to keep the D-cache contents consistent with the external memory.

• Stores that miss do not update the D-cache.
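The rules above amount to a small decision table, sketched below in Python for an enabled D-cache. The function name and the returned strings are illustrative only; the branching follows the bullets above.

```python
def dcache_action(is_load, hit, c_mmu):
    """Describe what an enabled D-cache does for one access.

    is_load: True for a load, False for a store
    hit:     True if the address is present in the cache
    c_mmu:   value of the C_MMU bit for the accessed region (0 or 1)
    """
    if is_load:
        if hit:
            return "return data from cache"            # C_MMU is not consulted
        if c_mmu:
            return "line fetch, LRU replacement"
        return "single external access, cache not updated"
    if hit:
        return "update cache line"                     # regardless of C_MMU
    return "cache not updated"                         # store miss


for is_load in (True, False):
    for hit in (True, False):
        for c_mmu in (1, 0):
            kind = "load " if is_load else "store"
            print(kind, "hit " if hit else "miss", c_mmu, "->",
                  dcache_action(is_load, hit, c_mmu))
```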

Page 43: TI Enhanced ARM925T Core

Validity of D-Cache

• The D-cache always requires that the MMU be enabled.

• The CP15 register allows software to invalidate the entire D-cache.

• Disabling the D-cache and reenabling it does not invalidate it.

• If CB mode is used, software must first clean the cache to make it coherent with main memory.

• Cleaning is not the same as flushing.

• The entire D-cache can be invalidated with a single flush D-cache instruction through the CP15 cache operation register.

• The D-cache is flushed upon reset.

• If the D-cache is disabled, its content is maintained valid and is accessible when the cache is reenabled.

Page 44: TI Enhanced ARM925T Core

Write Buffer

The write buffer (WB) increases system performance and can buffer up to seventeen 32-bit words of data.

The MMU attributes B (B_MMU) and C (C_MMU) (which are part of the TLB descriptor) and the CP15 control register W bit (W_CP15) control WB behavior.

Clearing W_CP15 and C_CP15 upon reset ensures that all accesses are non-bufferable until the MMU is enabled. To use the write buffer, the MMU must be enabled.

Page 45: TI Enhanced ARM925T Core

Enabling Write buffer

• To use the write buffer, you must enable the MMU.

• However, you can enable the two functions simultaneously with a single write to the CP15 control register.

• The write buffer is always disabled when the MMU is off.

• Clearing bit 3 in the CP15 control register disables the write buffer.
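The single-write enable described above can be sketched as bit manipulation on a control register value. Only the write buffer position (bit 3) comes from the slide; the MMU enable at bit 0 is assumed from the usual ARM CP15 control register layout, and the function names are hypothetical. On the device the value would be written with a coprocessor instruction, which this Python model does not attempt to show.

```python
M_BIT = 1 << 0   # MMU enable: assumed position (usual ARM CP15 control register layout)
W_BIT = 1 << 3   # write buffer enable: bit 3, as stated on the slide


def enable_mmu_and_write_buffer(control):
    """Return a control value with both functions enabled in one update."""
    return control | M_BIT | W_BIT


def disable_write_buffer(control):
    """Return a control value with bit 3 cleared."""
    return control & ~W_BIT & 0xFFFFFFFF


def write_buffer_active(control):
    """The write buffer is only usable when the MMU is also enabled."""
    return bool(control & M_BIT) and bool(control & W_BIT)


value = enable_mmu_and_write_buffer(0)
print(hex(value), write_buffer_active(value))                   # 0x9 True
print(write_buffer_active(W_BIT))                               # False: MMU is off
print(hex(disable_write_buffer(value)))                         # 0x1
```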

Page 46: TI Enhanced ARM925T Core

Coprocessor 15

TI925T operation and configuration are controlled with coprocessor instructions, configuration pins, and the MMU translation tables.

The coprocessor instructions manipulate on-chip registers, which control the configuration of the cache memories, write buffer, and MMU.

Page 47: TI Enhanced ARM925T Core

CP15 Register summary

Page 48: TI Enhanced ARM925T Core

Memory Management Unit

The MPU MMU performs virtual-to-physical address translations and access permission checks for access to the system memory.

It provides the flexibility and security required for the OS to manage the physical memory space shared by the DSP subsystem and the MPU subsystem.

The MPU MMU provides no protection from DSP shared memory accesses.

The MMU supports memory accesses based on sections or pages. Sections represent memory blocks of 1M byte.

Three different page sizes are supported:

• Large pages consist of 64K-byte blocks of memory.
• Small pages consist of 4K-byte blocks of memory.
• Tiny pages consist of 1K-byte blocks of memory.
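Each block size determines how many low-order address bits pass through translation unchanged as an offset. The Python sketch below splits a virtual address into a block base and an offset for the four sizes listed above; the dictionary and function names are made up for this example.

```python
# Block sizes in bytes, as listed on the slide.
BLOCK_BYTES = {
    "section": 1 << 20,      # 1M byte
    "large page": 64 << 10,  # 64K bytes
    "small page": 4 << 10,   # 4K bytes
    "tiny page": 1 << 10,    # 1K byte
}


def split_virtual_address(va, kind):
    """Return (block base, offset within block) for a 32-bit address."""
    size = BLOCK_BYTES[kind]
    base = va & ~(size - 1) & 0xFFFFFFFF
    offset = va & (size - 1)
    return base, offset


for kind in BLOCK_BYTES:
    base, offset = split_virtual_address(0x12345678, kind)
    print(f"{kind:10s} base={base:#010x} offset={offset:#x}")
```

Smaller blocks give finer-grained mapping and protection at the cost of more translation entries for the same amount of memory.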

Page 49: TI Enhanced ARM925T Core

The MMU hardware required to perform these functions consists of:

A 64-entry translation look-aside buffer for instructions (I_TLB)

A 64-entry translation look-aside buffer for data (D_TLB)

Access control logic

Translation table walking logic

Page 50: TI Enhanced ARM925T Core

Translation Look-Aside Buffer

The TLB contains entries for virtual-to-physical address translation and access permission checking.

Access control logic

If the TLB contains a translated entry for the VA, the access control logic determines whether the access is permitted.

If access is permitted, the MMU generates the appropriate PA corresponding to the VA. If access is not permitted, the MMU sends an abort signal to TI925T.

Page 51: TI Enhanced ARM925T Core

Translation table walking hardware

Upon a TLB miss, the translation table walking hardware retrieves the translation and access permission information from the translation table in physical memory. Once retrieved, the page or section descriptor is stored into the TLB at a random location.

Unpredictable behavior occurs if two TLB entries correspond to overlapping areas of memory in the virtual space. This can occur if the TLB is not flushed after the memory is remapped with different-sized pages.

Page 52: TI Enhanced ARM925T Core

The translation table held in main memory has two levels:

The first-level table can hold both section translation entries and pointers to second-level tables (either fine tables or coarse tables).

The second-level tables can hold large, small, and tiny page translation entries.
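A first-level lookup can be sketched in Python as follows. The descriptor layout used here (type in bits [1:0], section base in bits [31:20], coarse and fine table bases in bits [31:10] and [31:12]) is assumed from the standard ARM V4/V5 MMU format rather than taken from these slides, and the table is modelled as a plain dictionary indexed by the top 12 bits of the virtual address.

```python
# Descriptor type codes in bits [1:0] (assumed standard ARM V4/V5 encoding).
FAULT, COARSE, SECTION, FINE = 0b00, 0b01, 0b10, 0b11


def first_level_lookup(table, va):
    """Look up `va` in a first-level table modelled as {index: descriptor}.

    Returns (kind, address): a physical address for a section, the base of a
    second-level table for coarse/fine entries, or None on a translation fault.
    """
    descriptor = table.get(va >> 20, 0)        # one entry per 1M-byte region
    kind = descriptor & 0b11
    if kind == SECTION:
        return "section", (descriptor & 0xFFF00000) | (va & 0x000FFFFF)
    if kind == COARSE:
        return "coarse table", descriptor & 0xFFFFFC00
    if kind == FINE:
        return "fine table", descriptor & 0xFFFFF000
    return "translation fault", None


table = {
    0x123: 0x80000002,   # section: VA 0x123xxxxx -> PA 0x800xxxxx
    0x001: 0x00104001,   # pointer to a coarse second-level table
}
print(first_level_lookup(table, 0x12345678))   # section, PA 0x80045678
print(first_level_lookup(table, 0x00100000))   # coarse table at 0x00104000
print(first_level_lookup(table, 0x40000000))   # translation fault
```

A full table walk would continue into the second-level table to resolve large, small, or tiny pages; that step is left out of this sketch.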

Page 53: TI Enhanced ARM925T Core

The MMU generates the following types of faults:

• Alignment fault (on data access only)

• Translation fault

• Domain fault

• Permission fault

Page 54: TI Enhanced ARM925T Core