ARM 64bit has come!



First impressions of the A64 instruction set.


Page 1: ARM 64bit has come!


ARM 64bit has come!

Tetsuyuki Kobayashi

2014.5.23 Japan Technical Jamboree
2014.5.25 Updated for Kernel/VM Tankentai (カーネル/VM探検隊)

Page 2: ARM 64bit has come!


The latest version of these slides will be available from here

Page 3: ARM 64bit has come!


Who am I?

20+ years involved in embedded systems
  10 years in real-time OS, such as iTRON
  10 years in embedded Java Virtual Machine
  Now GCC, Linux, QEMU, Android, …

Blogs (Personal) (Corporate)

Twitter @tetsu_koba

Page 4: ARM 64bit has come!

Today's topics

Introduction to 64-bit ARM
  Does not cover everything, only what I find interesting :)

Try AArch64 using QEMU

Page 5: ARM 64bit has come!

ARMv8 terminology

AArch64: 64-bit mode
  1 instruction set: A64
  A64: 32-bit fixed-length instructions

AArch32: 32-bit mode
  Backward compatible with (a superset of) the ARMv7-A architecture
  2 instruction sets: A32, T32
  A32: ARM, 32-bit fixed-length instructions
  T32: Thumb2, 16-bit/32-bit instructions

Page 6: ARM 64bit has come!


"ARM64" is not the official name

It is the name used in the kernel source: arch/arm64

Page 7: ARM 64bit has come!

Exception level

4 levels; typical usage:
  EL0: User application
  EL1: OS kernel
  EL2: Hypervisor
  EL3: Secure monitor

The execution state (AArch64/AArch32) can change only when the exception level changes

Cf. PL0-PL2 (privilege levels) in ARMv7

Page 8: ARM 64bit has come!

AArch64 execution model

R0 – R30: 64-bit general-purpose registers
  Xn: full 64 bits
  Wn: lower 32 bits
  Register number 31 means either the zero register (XZR, WZR) or SP, depending on the instruction

SP: Stack Pointer
  Must be 16-byte aligned
  WSP: lower 32 bits

PC: Program Counter
  Cannot be used as the destination of a calculation
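
The register views above can be modelled in plain C. This is only a sketch: the helper names (w_view, write_w, align_sp) are made up for illustration and are not part of any ARM API. It assumes the usual A64 behaviour that a write to Wn clears the upper 32 bits of Xn.

```c
#include <stdint.h>

/* Hypothetical helpers that model the Xn/Wn views and SP alignment. */

/* Reading Wn yields the lower 32 bits of Xn. */
static inline uint32_t w_view(uint64_t x)
{
    return (uint32_t)x;
}

/* Writing Wn zero-extends: the upper 32 bits of Xn become 0. */
static inline uint64_t write_w(uint64_t old_x, uint32_t value)
{
    (void)old_x;                /* the previous upper half is discarded */
    return (uint64_t)value;
}

/* Round a stack pointer value down to a 16-byte boundary. */
static inline uint64_t align_sp(uint64_t sp)
{
    return sp & ~(uint64_t)15;
}
```

For example, w_view(0x1122334455667788) gives 0x55667788, and align_sp(0x1009) gives 0x1000.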

Page 9: ARM 64bit has come!

AArch64 execution model (cont.)

V0 – V31: 128-bit registers
  For floating point and SIMD
  AArch64 must have an FPU. No calling standard for soft-float.
  Scalar: Bn, Hn, Sn, Dn, Qn
  Vector: Vn.8B, Vn.16B, Vn.4H, Vn.8H, Vn.2S, Vn.4S, Vn.1D, Vn.2D

FPCR: Floating Point Control Register
FPSR: Floating Point Status Register

Page 10: ARM 64bit has come!

AArch64 addressing model

Without tag: 64-bit virtual address
With tag: 8-bit tag + 56-bit virtual address
  The tag is ignored on load/store/branch
  Good for implementing dynamically typed ("type-less") languages

Effective virtual address length is 48 bits.
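
As a rough illustration of how a language runtime might use the 8-bit tag, here is a minimal C sketch that packs a type tag into the top byte of a 64-bit address value. The names (TAG_SHIFT, tag_set, tag_get, tag_strip) are hypothetical. It assumes a user-space address whose upper bits are zero, and it works on integer values only. Dereferencing the tagged value directly would rely on the hardware ignoring the tag, which this sketch does not attempt.

```c
#include <stdint.h>

#define TAG_SHIFT 56                                /* tag lives in bits 63:56 */
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)  /* lower 56 bits */

/* Combine an address value with an 8-bit tag. */
static inline uint64_t tag_set(uint64_t addr, uint8_t tag)
{
    return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Extract the 8-bit tag. */
static inline uint8_t tag_get(uint64_t tagged)
{
    return (uint8_t)(tagged >> TAG_SHIFT);
}

/* Recover the untagged address (assumes the original upper bits were 0). */
static inline uint64_t tag_strip(uint64_t tagged)
{
    return tagged & ADDR_MASK;
}
```

A dynamically typed runtime could, for instance, keep an object's type code in the tag and still pass the value around as a single 64-bit word.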

Page 11: ARM 64bit has come!

Calling standard (AAPCS64)

R30 = LR (Link Register)
R29 = FP (Frame Pointer)

Parameter passing
  R0 – R7 for integers and pointers
  V0 – V7 for floating point

Callee must preserve
  R19 – R29, SP
  V8 – V15 (only the lower 64 bits of each)

No calling standard for soft-float

Page 12: ARM 64bit has come!

A64 instruction set

Brand-new, clean design for the 64-bit architecture
Not every instruction can be conditional: there is only a very small set of "conditional data processing" instructions
No equivalent of Thumb2's IT instruction

Page 13: ARM 64bit has come!

No multiple load/store

No instructions that load/store multiple general-purpose registers, such as LDM/STM or PUSH/POP

Instead, there are load/store pair instructions for two registers: LDP/STP

Page 14: ARM 64bit has come!

YIELD instruction

A NOP that hints that the current thread is not doing anything important
Used in spin loops; it can trigger a switch to another hardware thread on SMT (Simultaneous Multi-Threading) cores
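
A minimal sketch of a spin lock that issues YIELD while waiting, written with C11 atomics. The names demo_spinlock, cpu_relax, spin_lock and spin_unlock are hypothetical. The inline assembly is GCC/Clang syntax and is only emitted when compiling for AArch64; on other targets cpu_relax is assumed to be a no-op.

```c
#include <stdatomic.h>

/* Hint to the core that we are only spinning. */
static inline void cpu_relax(void)
{
#if defined(__aarch64__)
    __asm__ volatile("yield" ::: "memory");
#else
    /* No hint on other architectures in this sketch. */
#endif
}

typedef struct {
    atomic_flag locked;
} demo_spinlock;

/* Spin until the flag was previously clear, yielding on each failed try. */
static inline void spin_lock(demo_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

static inline void spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A lock would be declared as `demo_spinlock l = { ATOMIC_FLAG_INIT };` and used with spin_lock(&l) / spin_unlock(&l) around the critical section.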

Page 15: ARM 64bit has come!

Sample #1 source

#include <stdio.h>

int main()
{
    int i;

    /* Count down from 5 to 0. */
    for (i = 5; i >= 0; i--) {
        printf("count down: %d\n", i);
    }
    return 0;
}


Page 16: ARM 64bit has come!

Sample #1 Thumb2

000083f8 <main>:
    83f8: b570        push {r4, r5, r6, lr}
    83fa: 2405        movs r4, #5
    83fc: f248 456c   movw r5, #33900 ; 0x846c
    8400: f2c0 0500   movt r5, #0
    8404: 2601        movs r6, #1
    8406: 4630        mov r0, r6
    8408: 4629        mov r1, r5
    840a: 4622        mov r2, r4
    840c: f7ff ef7a   blx 8304 <_init+0x38>
    8410: 3c01        subs r4, #1
    8412: f1b4 3fff   cmp.w r4, #4294967295 ; 0xffffffff
    8416: d1f6        bne.n 8406 <main+0xe>
    8418: 2000        movs r0, #0
    841a: bd70        pop {r4, r5, r6, pc}

Page 17: ARM 64bit has come!

Sample #1 A64

0000000000400440 <main>:
    400440: a9be7bfd   stp x29, x30, [sp,#-32]!
    400444: 910003fd   mov x29, sp
    400448: a90153f3   stp x19, x20, [sp,#16]
    40044c: 90000014   adrp x20, 400000 <_init-0x3c0>
    400450: 528000b3   mov w19, #0x5 // #5
    400454: 911a0294   add x20, x20, #0x680
    400458: 2a1303e2   mov w2, w19
    40045c: 52800020   mov w0, #0x1 // #1
    400460: aa1403e1   mov x1, x20
    400464: 97ffffeb   bl 400410 <__printf_chk@plt>
    400468: 51000673   sub w19, w19, #0x1
    40046c: 3100067f   cmn w19, #0x1
    400470: 54ffff41   b.ne 400458 <main+0x18>
    400474: 52800000   mov w0, #0x0 // #0
    400478: a94153f3   ldp x19, x20, [sp,#16]
    40047c: a8c27bfd   ldp x29, x30, [sp],#32
    400480: d65f03c0   ret

Page 18: ARM 64bit has come!

Sample #2 source

int iaload(int *base, int index)
{
    return base[index];
}

long long laload(long long *base, int index)
{
    return base[index];
}

char ibload(char *base, int index)
{
    return base[index];
}

short isload(short *base, int index)
{
    return base[index];
}

Page 19: ARM 64bit has come!

Sample #2 Thumb2

00000000 <iaload>:
     0: f850 0021   ldr.w r0, [r0, r1, lsl #2]
     4: 4770        bx lr
     6: bf00        nop

00000008 <laload>:
     8: eb00 01c1   add.w r1, r0, r1, lsl #3
     c: e9d1 0100   ldrd r0, r1, [r1]
    10: 4770        bx lr
    12: bf00        nop

00000014 <ibload>:
    14: 5c40        ldrb r0, [r0, r1]
    16: 4770        bx lr

00000018 <isload>:
    18: f930 0011   ldrsh.w r0, [r0, r1, lsl #1]
    1c: 4770        bx lr
    1e: bf00        nop

Page 20: ARM 64bit has come!

Sample #2 A64

0000000000000000 <iaload>:
     0: b861d800   ldr w0, [x0,w1,sxtw #2]
     4: d65f03c0   ret

0000000000000008 <laload>:
     8: f861d800   ldr x0, [x0,w1,sxtw #3]
     c: d65f03c0   ret

0000000000000010 <ibload>:
    10: 3861c800   ldrb w0, [x0,w1,sxtw]
    14: d65f03c0   ret

0000000000000018 <isload>:
    18: 7861d800   ldrh w0, [x0,w1,sxtw #1]
    1c: d65f03c0   ret
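
The [x0,w1,sxtw #n] operand in the listing above does the whole index calculation in one instruction: the 32-bit index is sign-extended to 64 bits, shifted left by n, and added to the base. A small C model of that address computation is sketched below. The function name ea_sxtw is hypothetical and exists only for illustration.

```c
#include <stdint.h>

/*
 * Model of the A64 operand [Xn, Wm, SXTW #shift]:
 * sign-extend the 32-bit index, scale it, and add it to the base.
 */
static inline uint64_t ea_sxtw(uint64_t base, int32_t index, unsigned shift)
{
    uint64_t offset = (uint64_t)(int64_t)index;  /* SXTW: sign-extend to 64 bits */
    return base + (offset << shift);             /* scale by the element size */
}
```

With base 0x1000, index 3 and shift 2 (an int array), the result is 0x100c. A negative index such as -1 with shift 3 gives 0xff8, which is why the sign extension matters for a 32-bit int index on a 64-bit machine.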

Page 21: ARM 64bit has come!

Sample #3 source

double range(double x, double min, double max)
{
    /* Clamp x into [min, max]. */
    if (x < min)
        return min;
    else if (x > max)
        return max;
    else
        return x;
}


Page 22: ARM 64bit has come!

Sample #3 Thumb2

00000000 <range>:
     0: eeb4 0bc1   vcmpe.f64 d0, d1
     4: eef1 fa10   vmrs APSR_nzcv, fpscr
     8: d407        bmi.n 1a <range+0x1a>
     a: eeb4 0bc2   vcmpe.f64 d0, d2
     e: eef1 fa10   vmrs APSR_nzcv, fpscr
    12: bfc8        it gt
    14: eeb0 0b42   vmovgt.f64 d0, d2
    18: 4770        bx lr
    1a: eeb0 0b41   vmov.f64 d0, d1
    1e: 4770        bx lr

Page 23: ARM 64bit has come!

Sample #3 A64

0000000000000000 <range>:
     0: 1e612010   fcmpe d0, d1
     4: 540000a4   b.mi 18 <range+0x18>
     8: 1e622010   fcmpe d0, d2
     c: 1e604041   fmov d1, d2
    10: 5400004c   b.gt 18 <range+0x18>
    14: 1e604001   fmov d1, d0
    18: 1e604020   fmov d0, d1
    1c: d65f03c0   ret

Page 24: ARM 64bit has come!

Cache control

Application-level cache instructions
  Data cache
  Instruction cache: IC IVAU

No need to call a kernel syscall
  JIT friendly
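
For a JIT, the practical consequence is that freshly written code can be made visible to instruction fetch from user space. A minimal sketch follows, using the GCC/Clang builtin __builtin___clear_cache rather than hand-written cache instructions. The function name publish_code is hypothetical. The assumption is that on AArch64 the builtin is implemented with user-level cache maintenance instructions such as IC IVAU; how it is implemented depends on the toolchain. The buffer would also need to be mapped executable before it is called, which is not shown.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy generated machine code into a buffer and make the data and
 * instruction caches coherent for that address range.
 */
static void publish_code(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    /* Compiler builtin; takes the start and one-past-the-end addresses. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

For example, copying the four bytes c0 03 5f d6 (the little-endian encoding of the ret instruction seen in the listings, d65f03c0) publishes a one-instruction function.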

Page 25: ARM 64bit has come!

Preloading cache

PRFM <prfop>, addr|label
  <prfop>  ::= <type><target><policy>
  <type>   ::= PLD | PST | PLI
  <target> ::= L1 | L2 | L3
  <policy> ::= KEEP | STRM
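
From C, prefetch hints are usually requested through the GCC/Clang builtin __builtin_prefetch(addr, rw, locality) instead of writing PRFM by hand. The sketch below assumes that an AArch64 compiler lowers the builtin to a PRFM, with rw (0 = read, 1 = write) roughly selecting PLD or PST and locality (0 to 3) roughly selecting between STRM and KEEP. The exact mapping is up to the compiler. The function name and the prefetch distance are made up for illustration.

```c
#include <stddef.h>

#define PREFETCH_AHEAD 16   /* hypothetical distance, in elements */

/* Sum an array while hinting that upcoming elements will be read once. */
static long sum_with_prefetch(const int *a, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 0); /* read, streaming */
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it. Whether it helps depends on the access pattern and the core.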

Page 26: ARM 64bit has come!

Non-temporal load/store

LDNP/STNP
  Hint that the data is unlikely to be accessed again soon (as in streaming)

Page 27: ARM 64bit has come!


Backward compatible with (a superset of) ARMv7
Added cryptography extension
Added some other new instructions, aligned with AArch64
Removed Jazelle, ThumbEE

Page 28: ARM 64bit has come!

Let's try AArch64 using QEMU

QEMU 2.0 supports AArch64 user-mode emulation

Ubuntu 14.04 has QEMU 2.0 and a cross compiler for AArch64

$ sudo apt-get install qemu-user-static
$ sudo apt-get install g++-aarch64-linux-gnu

Page 29: ARM 64bit has come!

Prepare gdb for aarch64

$ sudo apt-get build-dep gdb
$ wget
$ tar xf gdb-7.7.1.tar.bz2
$ mkdir obj
$ cd obj
$ ../gdb-7.7.1/configure --target=aarch64-linux-gnu
$ make
$ sudo make install

Page 30: ARM 64bit has come!

Execute by qemu and connect gdb

$ aarch64-linux-gnu-gcc -g a.c
$ export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu/
$ qemu-aarch64-static -g 1234 ./a.out

$ aarch64-linux-gnu-gdb ./a.out
...
(gdb) target remote :1234
(gdb) b main
(gdb) c
(gdb) x/i $pc
=> 0x4005a0 <main>: stp x29, x30, [sp,#-48]!
(gdb)

Page 31: ARM 64bit has come!


Page 32: ARM 64bit has come!



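ARMv8 Technology Preview
ARMv8 Instruction Set Overview
ARM® Architecture Reference Manual
Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
Overview of the ARM 64-bit ARMv8 architecture (ARM 64bit ARMv8 のアーキテクチャの概要)
Compiling and running arm 64-bit (aarch64) code on Ubuntu 14.04 (Ubuntu 14.04 で arm 64bit (aarch64) のコードをコンパイルして動かしてみる)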

Page 33: ARM 64bit has come!


Any comments?


Thank you for listening!