Post on 14-Dec-2015
ARM System - On - Chip Architecture
INTRODUCTION
ARM is a RISC processor.
It is used in applications that need small size and high performance.
Its simple architecture gives low power consumption.
TIMELINE (1/2)
1985: Acorn Computer Group manufactures the first commercial RISC microprocessor.
1990: A collaboration between Acorn and Apple leads to the founding of Advanced RISC Machines (A.R.M.).
1991: ARM6, the first embeddable RISC microprocessor.
1992 – 1994: Various companies use ARM (Sharp, Samsung), while in 1993 ARM7, the first multimedia microprocessor, is introduced.
TIMELINE (2/2)
1995: Introduction of Thumb and ARM8.
1996 – 2000: Alcatel, Hyundai, Philips and Sony use ARM, while in 1999 ARM cooperates with Ericsson on the development of Bluetooth.
2000 – 2002: ARM’s share of the 32-bit embedded RISC microprocessor market is 80%. ARM Developer Suite is introduced.
THE ARM ARCHITECTURE
GENERAL INFO (1/2)
AIM: Simple design
Load-store architecture
32-bit data bus
3 addressing modes
GENERAL INFO (2/2)
Simple architecture + simple instruction set + code density
Result: small size, low power consumption
Registers
31 general-purpose 32-bit registers in total, 16 of them (r0 to r15) visible at any one time
7 modes of operation
Each mode sees a different set of registers and has a different level of control over the CPSR.
ARM Programming Model
Registers visible in each mode (banked registers replace the user-mode registers of the same number):

  user mode:       r0 to r14, r15 (PC), CPSR
  fiq mode:        r8_fiq to r14_fiq, SPSR_fiq
  svc mode:        r13_svc, r14_svc, SPSR_svc
  abort mode:      r13_abt, r14_abt, SPSR_abt
  irq mode:        r13_irq, r14_irq, SPSR_irq
  undefined mode:  r13_und, r14_und, SPSR_und

Legend: the user-mode registers are usable in user mode; the banked registers are available in system modes only.
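CPSR
Condition flags:
  N: Negative
  Z: Zero
  C: Carry
  V: Overflow
  Q: Saturation (for enhanced DSP instructions)
ARM CPSR format:
  bits 31-28:  N Z C V
  bits 27-8:   unused (bit 27 holds Q on cores with the enhanced DSP instructions)
  bit 7:       I
  bit 6:       F
  bit 5:       T
  bits 4-0:    mode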
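The field positions in the CPSR format can be checked with a short Python sketch. The function name `decode_cpsr` and the sample value `0x600000D3` are illustrative choices, not taken from the slides; the mode encoding 0x13 is the ARM supervisor-mode value.

```python
def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields of the CPSR format."""
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # IRQ disable
        "F": (cpsr >> 6) & 1,    # FIQ disable
        "T": (cpsr >> 5) & 1,    # Thumb state
        "mode": cpsr & 0x1F,     # bits 4-0
    }

# Illustrative value: Z and C set, IRQ and FIQ disabled, ARM state, mode 0x13.
fields = decode_cpsr(0x600000D3)
```

Only the bits named on the slide are decoded; the unused bits are ignored.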
Memory Organization
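[Figure: ARM memory organization. Memory is a linear array of bytes numbered from address 0 and drawn four bytes per row, from bit 31 to bit 0. Labelled examples: byte0, byte1, byte2, byte3, byte6, half-word4, half-word12, half-word14, word8, word16.]
Address bus: 32 bits
1 word = 32 bits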
Instruction Set
Three instruction types:
  Data processing
  Data transfer
  Control flow
Supervisor mode
In user mode, operations outside the user's privileges are handled by the operating system.
Using “supervisor calls”, the user program enters system level and can perform system functions.
I/O System
ARM handles peripherals as “memory-mapped devices with interrupt support”.
Interrupts:
  IRQ: normal interrupt
  FIQ: fast interrupt
Exceptions
Exception types:
  Interrupts
  Supervisor calls
  Traps
When an exception takes place:
  The value of the PC is copied to r14_exc, and the CPSR to SPSR_exc.
  The operating mode changes to the respective exception mode.
  The PC is set to the exception handler vector address.
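The exception entry sequence can be modelled in a few lines of Python. This is a simplified sketch, not a full processor model: the `cpu` dictionary layout and the function name `take_exception` are illustrative assumptions, and the return address is passed in by the caller because its exact offset from the PC depends on the exception type. The sketch also saves the CPSR in the banked SPSR, which the ARM architecture does on exception entry. The vector addresses are the standard ARM ones.

```python
# Standard ARM exception vector addresses and the mode each exception enters.
VECTOR = {"reset": 0x00, "undefined": 0x04, "swi": 0x08,
          "prefetch_abort": 0x0C, "data_abort": 0x10,
          "irq": 0x18, "fiq": 0x1C}
MODE = {"reset": "svc", "undefined": "und", "swi": "svc",
        "prefetch_abort": "abt", "data_abort": "abt",
        "irq": "irq", "fiq": "fiq"}

def take_exception(cpu, kind, return_address):
    """Apply the exception entry steps to a toy CPU state dictionary."""
    mode = MODE[kind]
    cpu["r14"][mode] = return_address   # r14_exc := return address
    cpu["spsr"][mode] = cpu["cpsr"]     # SPSR_exc := CPSR
    cpu["mode"] = mode                  # switch to the exception mode
    cpu["pc"] = VECTOR[kind]            # PC := vector address
    return cpu

# Hypothetical starting state: user mode, executing near 0x8000.
cpu = {"mode": "user", "pc": 0x8000, "cpsr": 0x10, "r14": {}, "spsr": {}}
take_exception(cpu, "irq", 0x8004)
```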
THE ARM INSTRUCTION SET
Data Processing Instructions (1/2)
Arithmetic operations:
    ADD  r0, r1, r2    ; r0 := r1 + r2, flags not updated
    ADDS r0, r1, r2    ; r0 := r1 + r2, flags updated
Logical operations:
    AND  r0, r1, r2    ; r0 := r1 AND r2
Register movement:
    MOV  r0, r2
Comparison:
    CMP  r1, r2
Data Processing Instructions (2/2)
Operands:
  Immediate operands:
    ADD r3, r3, #1
  Shifted register operands:
    ADD r3, r2, r1, LSL #3
Miscellaneous data processing instructions:
  Multiplication:
    MUL r4, r3, r2
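What the shifted-register operand and the flag-setting ADDS compute can be sketched in Python. The helper names `lsl` and `adds` and the sample register values are illustrative, not part of the slides; inputs are assumed to be unsigned 32-bit values.

```python
MASK = 0xFFFFFFFF  # registers are 32 bits wide

def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & MASK

def adds(a, b):
    """32-bit add that also returns the N, Z, C, V flags, as ADDS would set them."""
    full = a + b
    result = full & MASK
    flags = {
        "N": result >> 31,                               # sign bit of the result
        "Z": int(result == 0),                           # result is zero
        "C": int(full > MASK),                           # unsigned carry out
        "V": ((a ^ result) & (b ^ result)) >> 31,        # signed overflow
    }
    return result, flags

# ADD r3, r2, r1, LSL #3 with illustrative values r1 = 2, r2 = 100
r1, r2 = 2, 100
r3 = (r2 + lsl(r1, 3)) & MASK

# ADDS on 0xFFFFFFFF + 1 wraps to zero and sets Z and C
wrapped, wrapped_flags = adds(0xFFFFFFFF, 1)
```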
Data transfer instructions
Load and store instructions:
    LDR r0, [r1]
    STR r0, [r1]
Offset:        LDR r0, [r1, #4]
Post-indexed:  LDR r0, [r1], #16
Auto-indexed:  LDR r0, [r1, #16]!
Multiple data transfers:
    LDMIA r1, {r0,r2,r5}
Examples
PRE:
    r0 = 0x00000000
    r1 = 0x00009000
    mem32[0x00009000] = 0x01010101
    mem32[0x00009004] = 0x02020202
LDR r0, [r1, #4]!
POST:
    r0 = 0x02020202
    r1 = 0x00009004
Examples
PRE:
    r0 = 0x00000000
    r1 = 0x00009000
    mem32[0x00009000] = 0x01010101
    mem32[0x00009004] = 0x02020202
LDR r0, [r1, #4]
POST:
    r0 = 0x02020202
    r1 = 0x00009000
Examples
PRE:
    r0 = 0x00000000
    r1 = 0x00009000
    mem32[0x00009000] = 0x01010101
    mem32[0x00009004] = 0x02020202
LDR r0, [r1], #4
POST:
    r0 = 0x01010101
    r1 = 0x00009004
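The three LDR examples differ only in when the offset is applied and whether the base register is updated. A minimal Python model, assuming a dictionary for memory and for registers (the function names are illustrative), reproduces the PRE and POST values:

```python
def ldr_offset(regs, mem, rd, rn, offset):
    """LDR rd, [rn, #offset]: address = rn + offset, rn unchanged."""
    regs[rd] = mem[regs[rn] + offset]

def ldr_pre_indexed(regs, mem, rd, rn, offset):
    """LDR rd, [rn, #offset]!: rn is updated first, then used as the address."""
    regs[rn] += offset
    regs[rd] = mem[regs[rn]]

def ldr_post_indexed(regs, mem, rd, rn, offset):
    """LDR rd, [rn], #offset: rn is used as the address, then updated."""
    regs[rd] = mem[regs[rn]]
    regs[rn] += offset

def run(load):
    """Apply one addressing mode to the PRE state used in the examples."""
    regs = {"r0": 0x00000000, "r1": 0x00009000}
    mem = {0x00009000: 0x01010101, 0x00009004: 0x02020202}
    load(regs, mem, "r0", "r1", 4)
    return regs

pre = run(ldr_pre_indexed)    # LDR r0, [r1, #4]!
off = run(ldr_offset)         # LDR r0, [r1, #4]
post = run(ldr_post_indexed)  # LDR r0, [r1], #4
```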
Examples
PRE:
    mem32[0x80018] = 0x03
    mem32[0x80014] = 0x02
    mem32[0x80010] = 0x01
    r0 = 0x00080010
LDMIA r0!, {r1-r3}
POST:
    r0 = 0x0008001c
    r1 = 0x00000001
    r2 = 0x00000002
    r3 = 0x00000003
Examples
PRE:
    mem32[0x8001c] = 0x04
    mem32[0x80018] = 0x03
    mem32[0x80014] = 0x02
    mem32[0x80010] = 0x01
    r0 = 0x00080010
LDMIB r0!, {r1-r3}
POST:
    r0 = 0x0008001c
    r1 = 0x00000002
    r2 = 0x00000003
    r3 = 0x00000004
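The difference between "increment after" (IA) and "increment before" (IB) in the two LDM examples can be sketched as follows. Registers are represented by their numbers, and the function name `ldm` and its parameters are illustrative assumptions.

```python
def ldm(regs, mem, base, reg_list, before=False, writeback=True):
    """Model LDMIA (before=False) or LDMIB (before=True) on a register dict."""
    addr = regs[base]
    for r in sorted(reg_list):      # lowest register takes the lowest address
        if before:
            addr += 4               # IB: step the address, then load
        regs[r] = mem[addr]
        if not before:
            addr += 4               # IA: load, then step the address
    if writeback:                   # the "!" suffix
        regs[base] = addr
    return regs

mem = {0x80010: 1, 0x80014: 2, 0x80018: 3, 0x8001C: 4}

ia = ldm({0: 0x80010}, mem, 0, [1, 2, 3])               # LDMIA r0!, {r1-r3}
ib = ldm({0: 0x80010}, mem, 0, [1, 2, 3], before=True)  # LDMIB r0!, {r1-r3}
```

Both forms leave r0 at 0x8001c; they differ in which three words are loaded.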
Conditional execution
Instructions can be executed conditionally, without branches:
    CMP   r2, r3        ; subtract and set flags
    ADDGE r4, r5, r6    ; if r2 >= r3
    SUBLT r4, r5, r6    ; else
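How GE and LT are derived from the flags set by CMP can be sketched in Python. The function names are illustrative; operands are treated as 32-bit two's complement values, so GE and LT are signed comparisons.

```python
MASK = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags produced by CMP a, b (computes a - b and discards the result)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(a >= b),                          # set when there is no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,       # signed overflow
    }

def ge(flags):
    return flags["N"] == flags["V"]   # signed greater than or equal

def lt(flags):
    return flags["N"] != flags["V"]   # signed less than

def select(r2, r3, r5, r6):
    """CMP r2, r3 / ADDGE r4, r5, r6 / SUBLT r4, r5, r6; returns r4."""
    flags = cmp_flags(r2, r3)
    if ge(flags):
        return (r5 + r6) & MASK
    return (r5 - r6) & MASK
```

Exactly one of the two conditional instructions executes for any flag state, so the pair behaves like an if/else without a branch.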
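Conditional execution mnemonics
  EQ     equal                       Z set
  NE     not equal                   Z clear
  CS/HS  unsigned higher or same     C set
  CC/LO  unsigned lower              C clear
  MI     negative                    N set
  PL     positive or zero            N clear
  VS     overflow                    V set
  VC     no overflow                 V clear
  HI     unsigned higher             C set and Z clear
  LS     unsigned lower or same      C clear or Z set
  GE     signed greater or equal     N equals V
  LT     signed less than            N differs from V
  GT     signed greater than         Z clear and N equals V
  LE     signed less or equal        Z set or N differs from V
  AL     always                      any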
Control flow instructions
Branch instruction: B label
Conditional branch: BNE label
Branch and link: BL label
        BL   Loop
        ...
Loop    ...
        ...
        MOV  PC, r14    ; return
Example 1
        AREA    ARMex, CODE, READONLY   ; Name this block of code ARMex
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        ADD     r0, r0, r1              ; r0 = r0 + r1
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
        END                             ; Mark end of file
Example 2
        AREA    subrout, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0x123456                ; ARM semihosting SWI
doadd   ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file
ARM ORGANIZATION AND IMPLEMENTATION
3 – Stage Pipeline (ARM7 – 80MHz)
Stages: Fetch, Decode, Execute
[Figure: 3-stage pipeline ARM organization. Blocks: address register, incrementer, register bank (including the PC), multiply, barrel shifter, ALU, data out register, data in register, instruction decode & control. Buses: A[31:0], D[31:0], ALU bus, A bus, B bus.]
Throughput: 1 instruction / cycle
5 – stage pipeline (1/2)
Program execution time:
$$T_{prog} = \frac{N_{inst} \times CPI}{f_{clk}}$$
Ways to reduce T_prog:
  Increase f_clk: logic simplification
  Reduce CPI: reduce the number of multicycle instructions.
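The relation between instruction count, CPI and clock frequency (execution time = N_inst × CPI / f_clk) can be tried out numerically. In the sketch below the clock rates are the 80 MHz and 150 MHz quoted for the ARM7 and ARM9 pipelines, while the instruction count and both CPI values are hypothetical numbers chosen only to illustrate the arithmetic.

```python
def program_time(n_inst, cpi, f_clk_hz):
    """Execution time in seconds: N_inst * CPI / f_clk."""
    return n_inst * cpi / f_clk_hz

N_INST = 1_000_000                                 # hypothetical instruction count
t_3_stage = program_time(N_INST, 1.9, 80e6)        # CPI 1.9 is an assumed value
t_5_stage = program_time(N_INST, 1.5, 150e6)       # CPI 1.5 is an assumed value
speedup = t_3_stage / t_5_stage
```

With these assumed figures both levers act together: the higher clock and the lower CPI each shorten the run time.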
5 – stage pipeline (ARM9-150MHz) (2/2)
Stages: Fetch, Decode, Execute, Buffer/Data, Write-Back
[Figure: ARM9 5-stage pipeline organization. Labelled blocks and paths: I-cache, D-cache, instruction decode, register read, register write, immediate fields, reg shift, shift, mul, ALU, forwarding paths, load/store address, byte repl., rot/sgn ex, mux, next pc, +4, pc + 4, pc + 8, r15. Labelled pc update sources: B, BL, MOV pc, SUBS pc, LDR pc, LDM/STM, pre-index, post-index.]
ARM coprocessor interface
ARM supports up to 16 coprocessors, which can be software emulated.
Each coprocessor has up to 16 general-purpose registers.
ARM is a load-store architecture.
Coprocessors usually handle on-chip functions, such as cache and memory management.
ARCHITECTURAL SUPPORT FOR HIGH – LEVEL LANGUAGES
Floating - point accelerator (1/2)
For floating-point operations, ARM has the FPE software emulator and the FPA10 hardware floating-point accelerator.
FPA10 includes:
  Coprocessor interface
  Load/store unit
  Register bank (8 registers, 80 bits each)
  ALU (adder, multiplier, divider)
Floating - point accelerator (2/2)
[Figure: FPA10 organization. Blocks: coprocessor interface (coprocessor handshake, data bus), instruction issuer, pipeline control, load/store unit, register bank, arithmetic unit (add, mult, div).]
APCS (1/2)
APCS (ARM Procedure Call Standard) is a set of rules governing how C procedures are called and how they return.
It assigns specific uses to the general-purpose registers (r0 – r3: arguments, r4 – r8: variables, r10: stack limit, etc.).
Procedure call and return:
        BL   Loop
        ...
Loop    ...
        MOV  pc, lr
APCS (2/2)
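C code:
    void f1(int a) {
        f2(a);
    }
Assembly code (schematic; the stack is assumed to grow upward, as the SUB below suggests):
f1      LDR   r0, [r13]          ; fetch argument a from the stack
        STR   r14, [r13, #4]!    ; push the return address
        STR   r0, [r13, #4]!     ; push a as the argument for f2
        BL    f2
        SUB   r13, r13, #4       ; discard the pushed argument
        LDR   r15, [r13], #-4    ; pop the return address into the pc
[Figure: stack contents relative to the stack pointer, at offsets 0, 4, 8, 16.]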
THUMB PROGRAMMER’S MODEL
General information
Thumb objective: code density.
Thumb has a 16-bit instruction set.
A subset of the ARM instruction set is coded into a 16-bit space.
With appropriate use, great benefits can be achieved in terms of:
  Power efficiency
  Enhanced performance
Going in and out of Thumb mode
Thumb state is entered with the BX instruction, executed in ARM state, e.g. BX r0.
If r0[0] is 1, the T bit in the CPSR becomes 1 and the PC is set to the address obtained from the remaining bits of r0.
Instructions are assembled as 16-bit instructions with the appropriate directive.
Using the BX instruction from Thumb state, with bit 0 of the register clear, we return to ARM state.
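The effect of BX on the T bit and the PC can be sketched in Python. The function name `bx` and the sample addresses are illustrative; the T bit position (bit 5) is the one shown in the CPSR format.

```python
T_BIT = 1 << 5   # Thumb state bit in the CPSR

def bx(cpsr, target):
    """Model BX: bit 0 of the target selects the state, the rest is the new PC."""
    if target & 1:
        cpsr |= T_BIT        # enter (or stay in) Thumb state
    else:
        cpsr &= ~T_BIT       # enter (or stay in) ARM state
    pc = target & ~1         # bit 0 is not part of the address
    return cpsr, pc

# ARM state (T clear) branching to a Thumb routine at 0x8000
thumb_cpsr, thumb_pc = bx(0x10, 0x8001)
# Thumb state branching back to ARM code at 0x9000
arm_cpsr, arm_pc = bx(thumb_cpsr, 0x9000)
```

This is why Example 3 later loads `start + 1` into r0 before executing BX.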
The Thumb programmer’s model
Thumb registers:
  Lo registers: r0 to r7
  Hi registers: r8 to r12, SP (r13), LR (r14), PC (r15)
  CPSR
The registers shaded in the original figure have restricted access.
ARM vs. Thumb (1/3)
Thumb:
  Code is about 70% of the size of the equivalent ARM code
  Uses 40% more instructions
  45% faster code with 16-bit memory
  Requires about 30% less external memory
ARM:
  40% faster code when coupled with a 32-bit memory
ARM vs. Thumb (2/3)
If performance is critical: ARM
If cost and power consumption are critical: Thumb
ARM and Thumb interaction
A 32-bit ARM system can go into Thumb mode for specific routines, in order to meet power and memory constraints.
A 16-bit system can use an on-chip 32-bit memory for ARM state routines, and a 16-bit off-chip memory and Thumb code for the rest of the application.
Example 3
        AREA    ThumbSub, CODE, READONLY ; Name this block of code
        ENTRY                           ; Mark first instruction to execute
        CODE32                          ; Subsequent instructions are ARM
header  ADR     r0, start + 1           ; Processor starts in ARM state,
        BX      r0                      ; so small ARM code header used
                                        ; to call Thumb main program
        CODE16                          ; Subsequent instructions are Thumb
start   MOV     r0, #10                 ; Set up parameters
        MOV     r1, #3
        BL      doadd                   ; Call subroutine
stop    MOV     r0, #0x18               ; angel_SWIreason_ReportException
        LDR     r1, =0x20026            ; ADP_Stopped_ApplicationExit
        SWI     0xAB                    ; Thumb semihosting SWI
doadd   ADD     r0, r0, r1              ; Subroutine code
        MOV     pc, lr                  ; Return from subroutine
        END                             ; Mark end of file
Example 4
Implement the following pseudocode in ARM and Thumb assembly. Which is more efficient in terms of execution time and which in terms of code size?
    if r1 > r2 then
        r3 = r4 + r5
        r6 = r4 - r5
    else
        r3 = r4 - r5
        r6 = r4 + r5
Example 5
Write an ARM assembly program that loads data from memory location 0x40, sets bits 3 to 5, clears bits 0 to 2 and leaves the remaining bits unchanged.
Test it using 0xAD as input data.
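Before writing the assembly, the expected result for the test value can be worked out with a short Python check. This is only a reference calculation, not the requested ARM program; the mask names are illustrative. In ARM assembly the two steps would typically map onto an ORR with the set mask and a BIC with the clear mask.

```python
SET_MASK = 0b00111000     # bits 3 to 5
CLEAR_MASK = 0b00000111   # bits 0 to 2

def update_bits(value):
    """Set bits 3-5, clear bits 0-2, leave all other bits of a 32-bit word alone."""
    value |= SET_MASK                    # set bits 3 to 5
    value &= ~CLEAR_MASK & 0xFFFFFFFF    # clear bits 0 to 2
    return value

expected = update_bits(0xAD)   # 0xAD = 1010 1101 in binary
```

For the input 0xAD the word becomes 1011 1000 in binary, i.e. 0xB8.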
ARCHITECTURAL SUPPORT FOR SYSTEM DEVELOPMENT
The ARM memory interface
[Figure: a basic ARM memory system. The ARM's D[31:0] and A[31:0] buses connect to four byte-wide ROMs and four byte-wide SRAMs, one device of each kind per byte lane (D[31:24], D[23:16], D[15:8], D[7:0]). The devices are addressed by the word address lines A[n+2:2] and A[m+2:2]. Control logic generates ROMoe, RAMoe and the per-byte write enables RAMwe3 to RAMwe0.]
AMBA (1/4)
Advanced Microcontroller Bus Architecture (AMBA) buses:
  Advanced High-performance Bus (AHB)
  Advanced System Bus (ASB)
  Advanced Peripheral Bus (APB)
AMBA objectives:
  Technology independence
  To encourage modular system design
AMBA (2/4)
A typical AMBA – based system
AMBA (3/4)
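[Figure: AHB bus structure. Three masters (master1 to master3) and three slaves (slave1 to slave3) are linked by address, write data and read data paths, with an arbiter and a decoder.]
AHB bus:
  Burst transactions
  Split transactions
  Data bus 64 – 128 bits wide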
AMBA (4/4)
AMBA Design Kit (ADK)
An environment that assists designers in developing AMBA-based components and SoC designs.
Signal Processing Support (1/2)
Piccolo DSP coprocessor.
Various data memories for maximizing throughput.
Signal Processing Support (2/2)
[Figure: Piccolo organization. Labelled blocks: ARM7TDMI, Piccolo, two AMBA interfaces (AMBA i/f), decode and control, mult, I cache, register bank, ALU, input buffer, output buffer, AMBA bus.]
MEMORY HIERARCHY
Memory hierarchy
Moving down the hierarchy: larger size, lower speed.

  Memory type      Size               Speed
  Registers        32-bit             A few nsec
  On-chip cache    8 – 32 kbytes      10 nsec
  Off-chip cache   100 – 200 kbytes   10 – 30 nsec
  RAM              Mbytes             100 nsec
On-chip memory
Necessary for performance.
Some systems prefer on-chip RAM to an on-chip cache: it is simpler, cheaper and less power-hungry.
Cache types
Cache types:
  Unified cache.
  Separate instruction and data caches.
Performance: hit rate and miss rate
  Compulsory miss: the first time an address is accessed
  Capacity miss: when the cache is full
  Conflict miss: two addresses compete for the same place in the cache
Average access time, for hit rate h:
$$t_{av} = h\,t_{cache} + (1 - h)\,t_{main}$$
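The average access time relation, read here as t_av = h · t_cache + (1 − h) · t_main, can be evaluated with a few lines of Python. The 10 nsec and 100 nsec figures are the on-chip cache and RAM speeds from the memory hierarchy table; the hit rates are hypothetical values chosen for illustration.

```python
def average_access_time(hit_rate, t_cache, t_main):
    """t_av = h * t_cache + (1 - h) * t_main, in the same unit as the inputs."""
    return hit_rate * t_cache + (1 - hit_rate) * t_main

T_CACHE_NS = 10    # on-chip cache speed from the memory hierarchy table
T_MAIN_NS = 100    # RAM speed from the memory hierarchy table

t_90 = average_access_time(0.90, T_CACHE_NS, T_MAIN_NS)   # about 19 nsec
t_99 = average_access_time(0.99, T_CACHE_NS, T_MAIN_NS)   # about 10.9 nsec
```

Even a few percent of misses dominates the average, because each miss costs ten times as much as a hit in this example.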
Replacement policy - implementation
Replacement policy:
  Least Recently Used (LRU)
  Least Frequently Used (LFU)
  Data prediction
Implementation:
  Fully-associative
  Direct-mapped
  Set-associative
Direct – mapped cache (1/2)
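A line of data is stored in the data RAM together with its address tag in the tag RAM.
[Figure: direct-mapped cache organization. The address drives a tag RAM and a data RAM; a comparator checks the stored tag against the address to produce hit, and a mux selects the data.]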
Direct – mapped cache (2/2)
Each memory location has a specific place in the cache.
Tag and data can be accessed at the same time.
The tag RAM is smaller than the data RAM and has a shorter access time, allowing the tag comparison to complete within the data RAM access time.
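A direct-mapped lookup can be sketched in Python to show how each address maps to exactly one cache line. The class name, the cache dimensions (4 lines of 16 bytes) and the test addresses are illustrative assumptions; only tags are modelled, not the data.

```python
class DirectMappedCache:
    """Tag store of a direct-mapped cache; access() reports hit or miss."""

    def __init__(self, num_lines, line_bytes):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines       # one tag per cache line

    def access(self, address):
        line_number = address // self.line_bytes
        index = line_number % self.num_lines   # the only place this line can go
        tag = line_number // self.num_lines    # identifies which line is there
        hit = self.tags[index] == tag
        self.tags[index] = tag                 # on a miss the line is replaced
        return hit

cache = DirectMappedCache(num_lines=4, line_bytes=16)
results = [cache.access(a) for a in (0x00, 0x04, 0x40, 0x00)]
```

Addresses 0x00 and 0x40 share index 0 in this configuration, so they keep evicting each other: a conflict miss.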
2 – way set – associative cache. (1/3)
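[Figure: two-way set-associative cache organization. The address drives two tag RAM / data RAM pairs in parallel, each with its own comparator and mux, producing hit and data.]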
Set associative cache (2/3)
A set-associative cache is divided into a number of sets; with n places for a line in each set, it is an n-way set-associative cache.
Two addresses that would compete for the same spot in a direct-mapped cache can be stored in different locations and accessed independently.
Set associative (3/3)
Set selection:
  Random allocation
  Least recently used (LRU)
  Round-robin (cyclic)
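A set-associative lookup with LRU selection, one of the options listed, can be sketched in Python. The class name, the dimensions (4 sets, 2 ways, 16-byte lines) and the address sequence are illustrative assumptions; only tags are modelled.

```python
class SetAssociativeCache:
    """Tag store of an n-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets, ways, line_bytes):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # Each set is a list of tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        line_number = address // self.line_bytes
        tags = self.sets[line_number % self.num_sets]
        tag = line_number // self.num_sets
        hit = tag in tags
        if hit:
            tags.remove(tag)          # will be re-appended as most recent
        elif len(tags) == self.ways:
            tags.pop(0)               # evict the least recently used line
        tags.append(tag)
        return hit

cache = SetAssociativeCache(num_sets=4, ways=2, line_bytes=16)
sequence = (0x00, 0x40, 0x00, 0x40, 0x80, 0x40, 0x00)
results = [cache.access(a) for a in sequence]
```

Addresses 0x00 and 0x40 map to the same set but now coexist in its two ways; only the third competing address, 0x80, forces an eviction.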
Fully associative (1/2)
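[Figure: fully associative cache organization. The address is matched against a tag CAM; the matching entry selects a line in the data RAM and a mux selects the data. Outputs: hit and data.]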
Write strategies
Write-through: all write operations are passed to main memory.
Write-through with buffered write: write operations are passed to main memory through the write buffer.
Copy-back (write-back): write operations update only the cache.
Cache feature summary
  Organizational feature   Options
  Cache-MMU relationship   Physical cache          Virtual cache
  Cache contents           Unified instruction     Separate instruction
                           and data cache          and data caches
  Associativity            Direct-mapped           Set-associative          Fully associative
                           RAM-RAM                 RAM-RAM                  CAM-RAM
  Replacement strategy     Cyclic                  Random                   LRU
  Write strategy           Write-through           Write-through with       Copy-back
                                                   write buffer
‘Perfect’ cache performance
  Cache form                   Performance
  No cache                     1
  Instruction-only cache       1.95
  Instruction and data cache   2.5
  Data-only cache              1.13
MMU (1/3)
Two memory management approaches:
  Segmentation
  Paging
MMU (2/3)
Segmented memory management:
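[Figure: segmented memory management. The segment selector part of the logical address indexes the segment descriptor table, which supplies a base and a limit. The base is added (+) to the rest of the logical address to form the physical address; that part is also compared with the limit (>?), and an access fault is raised if it is exceeded.]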
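The segment translation in the figure, an addition plus a limit check, can be sketched in Python. The descriptor table contents, the names `translate` and `AccessFault`, and the choice that an offset strictly greater than the limit faults are assumptions made for the example.

```python
class AccessFault(Exception):
    """Raised when the offset exceeds the segment limit."""

def translate(segment_table, selector, offset):
    """Return base + offset for the selected segment, or raise AccessFault."""
    base, limit = segment_table[selector]
    if offset > limit:                  # the ">?" comparison in the figure
        raise AccessFault(f"offset {offset:#x} exceeds limit {limit:#x}")
    return base + offset                # the "+" in the figure

# Hypothetical descriptor table: selector -> (base, limit)
segments = {0: (0x10000, 0x0FFF), 1: (0x40000, 0x00FF)}

physical = translate(segments, 0, 0x0123)
try:
    translate(segments, 1, 0x0100)
    faulted = False
except AccessFault:
    faulted = True
```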
MMU (3/3)
Paging memory management:
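[Figure: paging memory management. Bits 31-22 of the logical address index the page directory, bits 21-12 index the page table, and bits 11-0 select the data within the page frame.]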
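The two-level lookup implied by the bit fields 31-22, 21-12 and 11-0 can be sketched in Python. The dictionary-based page directory, the function names and the sample addresses are illustrative assumptions; real page table entries also carry permission and status bits that are left out here.

```python
def split_logical_address(address):
    """Split a 32-bit logical address into (directory index, table index, offset)."""
    directory_index = (address >> 22) & 0x3FF   # bits 31-22
    table_index = (address >> 12) & 0x3FF       # bits 21-12
    offset = address & 0xFFF                    # bits 11-0
    return directory_index, table_index, offset

def translate(page_directory, address):
    """Walk directory -> table -> frame and add the offset within the page."""
    directory_index, table_index, offset = split_logical_address(address)
    page_table = page_directory[directory_index]
    frame_base = page_table[table_index]
    return frame_base + offset

# Hypothetical mapping: directory entry 1, table entry 3 -> frame at 0x80000000
page_directory = {1: {3: 0x80000000}}
fields = split_logical_address(0x00403ABC)
physical = translate(page_directory, 0x00403ABC)
```

The 12-bit offset corresponds to 4 Kbyte pages, and each 10-bit index selects one of 1024 entries.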
ARCHITECTURAL SUPPORT FOR OPERATING SYSTEMS
[Figure: example system-on-chip built around an ARM1136JF core. Labelled blocks: ARM1136JF core, ETM, Bus Matrix, DMAC (PL080), CLCD (PL110), VIC (PL192), MPMC (PL176), SMC (PL093), three AHB/APB bridges, SSP (PL022), SCI (PL131), UART (PL011), GPIO (PL061), Timers & RTC (PL031), W'Dog, System Control. The bus matrix has 8 AHBs, labelled 64 bits wide: 1. ARM Periph AHB, 2. ARM D Write AHB, 3. ARM D Read AHB, 4. ARM I AHB, 5. ARM DMA AHB, 6. CLCD AHB, 7. DMA 2 AHB, 8. DMA 1 AHB. External connections: 14 external interrupts, trace port analyser, CLCD display, 8 external DMA requests, 2x UARTs, smart card (UICC compliant), external reset & battery fail, external clock, 32 GPIO lines, SDRAM & DDR, static memory.]
CP15
On-chip coprocessor for MMU, cache and protection unit control.
Control takes place through registers, with instructions executed in supervisor mode.
Protection Unit
Simpler alternative to the MMU.
Requires simpler software and hardware.
Does not use translation tables, but 8 protection regions instead.
ARM DEVELOPER SUITE
ARMULATOR (1/2)
ARMulator: emulator of various ARM processors.
Allows project development in C, C++ or Assembly.
Together with the debugger, compilers and assembler, it makes up the ARM Developer Suite (ADS).
ARMULATOR (2/2)
Possible project options:
  ARM and Thumb interworking
  Mixing C, C++ and Assembly
  Code for ROM
  Exception handlers
ARMULATOR TUTORIAL CODEWARRIOR ENVIRONMENT