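ARM calls the new 64-bit instruction set A64; it executes in the AArch64 state. The original instruction sets are called A32 and T32; both execute in the AArch32 state and are compatible with ARMv7.

ARMv8 also adds some enhancements to the 32-bit instruction sets. Code that uses these features is not compatible with older ARMv7 implementations.

For a more detailed description of the A64 assembly language, see the ARM Compiler armasm Reference Guide v6.01.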

The ARMv8 instruction sets

The A64 instruction set is similar to A32: instructions are 32 bits wide and the syntax is similar.

A64 brings a number of improvements:

A consistent encoding scheme

Wide range of constants

Data types are easier

Long offsets

64 bits Pointers

Conditional constructs are used instead of IT blocks

Shift and rotate behavior is more intuitive

Code generation

Fixed-length instructions

Three operands map better

Addressing improvements

Exclusive accesses

Increased PC-relative offset addressing

Unaligned address support

Bulk transfers

Load/Store

Alignment checking

The A64 instruction set

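6.1 Instruction mnemonics

The 32-bit and 64-bit forms of an instruction share one mnemonic; the operand register names (W or X) distinguish them, for example: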

ADD W0, W1, W2

ADD X0, X1, X2
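
The difference between the W and X forms can be sketched in C. This is only a model of the semantics, not code from the guide: the function names add_w and add_x are hypothetical, and the model relies on the fact that a write to a W register clears the upper 32 bits of the corresponding X register.

```c
#include <stdint.h>

/* Model of ADD Wd, Wn, Wm: only the low 32 bits of the sources are used,
 * the sum wraps modulo 2^32, and the result is zero-extended to 64 bits. */
uint64_t add_w(uint64_t xn, uint64_t xm) {
    uint32_t result = (uint32_t)xn + (uint32_t)xm;
    return (uint64_t)result;
}

/* Model of ADD Xd, Xn, Xm: full 64-bit addition, wrapping modulo 2^64. */
uint64_t add_x(uint64_t xn, uint64_t xm) {
    return xn + xm;
}
```

With 0xFFFFFFFF and 1 as inputs, the W form wraps to 0 while the X form carries into bit 32.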

6.2 Data processing instructions

Most data processing instructions use one destination register and two source operands. The general form is:

Instruction Rd, Rn, Operand2

The second operand can be a register, a modified register, or an immediate value. The registers can be X registers or W registers.

Data processing instructions include:

Arithmetic and logical operations

Move and shift operations

Instructions for sign and zero extension

Bit and bitfield manipulation

Conditional comparison and data processing

6.2.1 Arithmetic and logical operations

Arithmetic

ADD, SUB, ADC, SBC, NEG

Logic

AND, BIC, ORR, ORN, EOR, EON

Comparison

CMP, CMN, TST

Move

MOV, MVN

Some instructions take an S suffix, which means that they set the condition flags. ADC and SBC take the carry condition flag as an additional input.
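
As an illustration of why ADC reads the carry flag, here is a minimal C sketch of a 128-bit addition built from two 64-bit halves, the pattern an ADDS/ADC pair would implement. The type U128 and the function add128 are hypothetical names, and the register numbers in the comments are only an example allocation.

```c
#include <stdint.h>

typedef struct {
    uint64_t lo;  /* low 64 bits  */
    uint64_t hi;  /* high 64 bits */
} U128;

/* Adds two 128-bit values held as pairs of 64-bit halves. */
U128 add128(U128 a, U128 b) {
    U128 r;
    r.lo = a.lo + b.lo;                  /* like ADDS X0, X0, X2: sets the C flag */
    uint64_t carry = (r.lo < a.lo);      /* C flag: 1 if the low sum wrapped */
    r.hi = a.hi + b.hi + carry;          /* like ADC X1, X1, X3: adds the C flag */
    return r;
}
```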

6.2.2 Multiply and divide instruction

MADD, MNEG, MSUB, MUL, SMADDL, SMNEGL, SMSUBL ...

SDIV UDIV

6.2.3 Shift operations

Logical Shift Left (LSL)

Logical Shift Right (LSR)

Arithmetic Shift Right (ASR)

Rotate right (ROR)
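
The four shift types can be modelled in portable C as follows. This is a sketch for 64-bit operands; the function names are hypothetical. It assumes the register-controlled forms, where the shift amount is taken modulo the register width.

```c
#include <stdint.h>

/* LSL: shift left, zeros enter from the right. */
uint64_t lsl64(uint64_t x, unsigned n) { return x << (n & 63); }

/* LSR: shift right, zeros enter from the left. */
uint64_t lsr64(uint64_t x, unsigned n) { return x >> (n & 63); }

/* ASR: shift right, copies of the sign bit enter from the left. */
uint64_t asr64(uint64_t x, unsigned n) {
    n &= 63;
    uint64_t shifted = x >> n;
    if ((x >> 63) != 0 && n != 0) {
        shifted |= ~UINT64_C(0) << (64 - n);  /* fill vacated bits with ones */
    }
    return shifted;
}

/* ROR: bits shifted out on the right re-enter on the left. */
uint64_t ror64(uint64_t x, unsigned n) {
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}
```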

6.2.4 Bitfield and byte manipulation instructions

Sign extension: SXTB, SXTH, SXTW. Zero extension: UXTB, UXTH.

Bit Field Insert (BFI), signed and unsigned Bit Field Extract (S/U)BFX

BFXIL/UBFIZ/SBFIZ
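
A rough C model of three of the bitfield instructions may help in reading the mnemonics. The helper low_mask and the function names are hypothetical, and the sketch assumes 64-bit operands with 1 <= width and lsb + width <= 64.

```c
#include <stdint.h>

/* Mask with the low `width` bits set (1 <= width <= 64). */
static uint64_t low_mask(unsigned width) {
    return width >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
}

/* UBFX Xd, Xn, #lsb, #width: extract a field and zero-extend it. */
uint64_t ubfx(uint64_t xn, unsigned lsb, unsigned width) {
    return (xn >> lsb) & low_mask(width);
}

/* SBFX Xd, Xn, #lsb, #width: extract a field and sign-extend it. */
uint64_t sbfx(uint64_t xn, unsigned lsb, unsigned width) {
    uint64_t field = ubfx(xn, lsb, width);
    uint64_t sign = UINT64_C(1) << (width - 1);
    return (field ^ sign) - sign;  /* propagates the field's top bit upward */
}

/* BFI Xd, Xn, #lsb, #width: insert the low `width` bits of Xn into Xd
 * at position lsb, leaving the other bits of Xd unchanged. */
uint64_t bfi(uint64_t xd, uint64_t xn, unsigned lsb, unsigned width) {
    uint64_t mask = low_mask(width) << lsb;
    return (xd & ~mask) | ((xn << lsb) & mask);
}
```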

Other bit manipulation instructions, similar to those in ARMv7:

CLZ

RBIT

REV

REV16

REV32

6.2.5 Conditional instructions

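The CMP instruction sets the N, Z, C and V flags.

Conditional select

CSEL W1, W1, W2, EQ

CINC X0, X0, LS   If the condition is LS (unsigned lower or same), X0 = X0 + 1

Conditional set

CSET W0, EQ

Sets W0 to 1 if equal, otherwise to 0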
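
The interaction between CMP and the conditional instructions can be sketched in C. This is only a model: the Flags struct and all function names are hypothetical, and only the EQ and LS conditions are covered.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool n, z, c, v; } Flags;

/* Model of CMP Xn, Xm: computes Xn - Xm and keeps only the flags. */
Flags cmp64(uint64_t a, uint64_t b) {
    uint64_t diff = a - b;
    Flags f;
    f.n = (diff >> 63) != 0;                     /* result is negative */
    f.z = (diff == 0);                           /* result is zero */
    f.c = (a >= b);                              /* carry set means no borrow */
    f.v = (((a ^ b) & (a ^ diff)) >> 63) != 0;   /* signed overflow */
    return f;
}

bool cond_eq(Flags f) { return f.z; }            /* EQ: Z set */
bool cond_ls(Flags f) { return !f.c || f.z; }    /* LS: C clear or Z set */

/* CSEL Xd, Xn, Xm, cond */
uint64_t csel(bool cond, uint64_t xn, uint64_t xm) { return cond ? xn : xm; }

/* CINC Xd, Xn, cond */
uint64_t cinc(bool cond, uint64_t xn) { return cond ? xn + 1 : xn; }

/* CSET Xd, cond */
uint64_t cset(bool cond) { return cond ? 1 : 0; }
```

For example, cinc(cond_ls(cmp64(3, 5)), 10) models CMP followed by CINC with the LS condition and returns 11, because 3 is lower than 5 as an unsigned value.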

6.3 Memory access instructions

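Like all earlier ARM processors, ARMv8 is a Load/Store architecture. This means that no data processing instruction operates directly on memory.

ARMv8 supports accesses to unaligned data. However, exclusive accesses, load-acquire and store-release do not support unaligned addresses.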

6.3.1 Load instruction format

LDR Rt, <addr>

LDRB 8-bit, zero-extended

LDRSB 8-bit, sign-extended

LDRH 16-bit, zero-extended

LDRSH 16-bit, sign-extended

LDRSW 32-bit, sign-extended
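
The difference between the zero-extending and sign-extending loads can be modelled in C. This is a sketch with hypothetical function names that models the forms with a 64-bit (X) destination and reads memory in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Model of LDRB: load one byte and zero-extend it. */
uint64_t ldrb(const uint8_t *addr) {
    return (uint64_t)*addr;
}

/* Model of LDRSB Xt: load one byte and sign-extend it to 64 bits. */
uint64_t ldrsb(const uint8_t *addr) {
    uint64_t b = *addr;
    return (b & 0x80) ? (b | ~UINT64_C(0xFF)) : b;
}

/* Model of LDRSW Xt: load four bytes and sign-extend them to 64 bits. */
uint64_t ldrsw(const void *addr) {
    uint32_t w;
    memcpy(&w, addr, sizeof w);  /* avoids alignment and aliasing issues */
    return (w & 0x80000000u) ? ((uint64_t)w | ~UINT64_C(0xFFFFFFFF)) : (uint64_t)w;
}
```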

6.3.2 Store instruction format

STR Rn, <addr>

The size of the store can be smaller than the register; specify this with the B or H suffix.

6.3.3 Floating-point and NEON scalar loads and stores

Load and Store instructions can also access the floating-point and NEON registers,

for example LDR D0, [X0, X1]

6.3.4 Specifying the address for a Load or Store instruction

In A64, the base register of an address operand must be a 64-bit register: an X register or SP.

Offset modes

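Offset modes add an offset to the base register. The offset can be an immediate value or a register value, which can optionally be modified.

Example instruction	Description
LDR X0, [X1]	Address is X1
LDR X0, [X1, #8]	Address is X1+8
LDR X0, [X1, X2]	Address is X1+X2
LDR X0, [X1, X2, LSL #3]	Address is X1+(X2<<3)
LDR X0, [X1, W2, SXTW]	Address is X1+sign_extend(W2)
LDR X0, [X1, W2, SXTW #3]	Address is X1+(sign_extend(W2)<<3)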
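
The address arithmetic in the table above can be written out in C. This is a sketch that only computes the effective address; the function names are hypothetical.

```c
#include <stdint.h>

/* [Xn, #imm]: base plus a signed immediate offset. */
uint64_t addr_imm(uint64_t base, int64_t imm) {
    return base + (uint64_t)imm;
}

/* [Xn, Xm, LSL #shift]: base plus a shifted 64-bit index. */
uint64_t addr_reg_lsl(uint64_t base, uint64_t index, unsigned shift) {
    return base + (index << shift);
}

/* [Xn, Wm, SXTW #shift]: base plus a sign-extended, shifted 32-bit index. */
uint64_t addr_reg_sxtw(uint64_t base, uint32_t windex, unsigned shift) {
    uint64_t ext = (windex & 0x80000000u)
                       ? ((uint64_t)windex | ~UINT64_C(0xFFFFFFFF))
                       : (uint64_t)windex;
    return base + (ext << shift);
}
```

With a W index of 0xFFFFFFFF (that is, -1) and a shift of 3, the SXTW form yields base - 8, which is why this mode suits signed 32-bit array indices.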

Index modes

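Index modes are similar to offset modes, but they also update the base register.

Example instruction	Description
LDR X0, [X1, #8]!	Pre-index: first update X1 to X1+8, then load from the new address.
LDR X0, [X1], #8	Post-index: first load from the address in X1, then update X1 to X1+8.
STP X0, X1, [SP, #-16]!	Push X0 and X1 onto the stack
LDP X0, X1, [SP], #16	Pop X0 and X1 off the stack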
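
The push and pop idiom in the last two rows can be modelled with a pointer that stands in for SP. This is a minimal sketch with hypothetical function names; the stack is an array of 8-byte slots that grows downward.

```c
#include <stdint.h>

/* Model of STP Xa, Xb, [SP, #-16]!
 * Pre-index: SP is decremented by 16 bytes first, then the pair is stored. */
uint64_t *push_pair(uint64_t *sp, uint64_t a, uint64_t b) {
    sp -= 2;      /* two 8-byte slots = 16 bytes */
    sp[0] = a;
    sp[1] = b;
    return sp;    /* updated SP */
}

/* Model of LDP Xa, Xb, [SP], #16
 * Post-index: the pair is loaded first, then SP is incremented by 16 bytes. */
uint64_t *pop_pair(uint64_t *sp, uint64_t *a, uint64_t *b) {
    *a = sp[0];
    *b = sp[1];
    return sp + 2;  /* updated SP */
}
```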

PC-relative modes (load-literal)

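A64 adds another addressing mode for accessing literal pools. Literal pools are blocks of data encoded in the instruction stream. The pools are never executed, but nearby instructions can read their data using PC-relative memory addresses.

In A32 and T32 the PC can be read like a general-purpose register, so a literal pool access is simply a load that uses the PC as the base register.

Example instruction	Description
LDR W0, <label>	Load 4 bytes from <label> into W0
LDR X1, <label>	Load 8 bytes from <label> into X1
LDRSW X0, <label>	Load 4 bytes from <label> and sign-extend them into X0
LDR S0, <label>	Load 4 bytes from <label> into S0
LDR D0, <label>	Load 8 bytes from <label> into D0
LDR Q0, <label>	Load 16 bytes from <label> into Q0

Note: the label must be 4-byte aligned.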

6.3.5 Accessing multiple memory locations

A64 has no Load Multiple (LDM) or Store Multiple (STM) instructions. Instead it provides Load Pair (LDP) and Store Pair (STP), each of which transfers exactly two registers. The offset is limited to a scaled 7-bit signed immediate value, with optional pre- or post-increment.

Load and Store pair	Description
LDP W3, W7, [X0]	W3=[X0], W7=[X0+4]
LDP X8, X2, [X0, #0x10]!	X0=X0+0x10, X8=[X0], X2=[X0+8]
LDPSW X3, X4, [X0]	Load 4 bytes from X0 and sign-extend them into X3; do the same from X0+4 into X4
LDP D8, D2, [X11], #0x10	D8=[X11], D2=[X11+8], X11=X11+0x10
STP X9, X8, [X4]	[X4]=X9, [X4+8]=X8
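
To make "scaled 7-bit signed immediate" concrete, the following C sketch checks whether a byte offset fits. It assumes the immediate is scaled by the size of one transferred register (4 bytes for W, 8 bytes for X or D); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if byte_offset can be encoded in an LDP/STP whose
 * registers are reg_bytes wide. The encoded field is a signed 7-bit
 * value (-64..63) that the hardware multiplies by reg_bytes. */
bool pair_offset_encodable(int64_t byte_offset, unsigned reg_bytes) {
    int64_t scale = (int64_t)reg_bytes;
    if (byte_offset % scale != 0) {
        return false;                 /* must be a multiple of the register size */
    }
    int64_t imm7 = byte_offset / scale;
    return imm7 >= -64 && imm7 <= 63;
}
```

Under this assumption, X register pairs reach offsets from -512 to 504 bytes and W register pairs from -256 to 252 bytes.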

6.3.6 Unprivileged access

6.3.7 Prefetching memory

PRFM <prfop>, <addr> | label

Typically used to request that instructions or data be brought into the cache.

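6.3.8 Non-temporal load and store pair

This is a new concept in ARMv8. Two instructions, LDNP and STNP, hint to the memory system that caching the data is unlikely to be useful. The hint does not prohibit memory system activity such as caching of the address, preload, or gathering. It only indicates that caching is unlikely to improve performance. A typical use case is streaming data.

Non-temporal loads and stores relax the memory ordering requirements, so an explicit load barrier might be needed: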

LDR X0, [X3]

DMB nshld

LDNP X2, X1, [X0]

6.3.9 Memory access atomicity

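An aligned memory access that uses a single general-purpose register is guaranteed to be atomic.

Aligned Load Pair and Store Pair accesses that use general-purpose registers are treated as two independent atomic accesses.

Unaligned accesses are not atomic.

Floating-point and SIMD memory accesses are not guaranteed to be atomic.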

6.3.10 Memory barrier and fence instructions

ARMv7 and ARMv8 both support several kinds of barrier operation.

Data Memory Barrier (DMB)

Data Synchronization Barrier (DSB)

Instruction Synchronization Barrier (ISB)

ARMv8 introduces one-sided fences, which are related to the Release Consistency model. They are called Load-Acquire (LDAR) and Store-Release (STLR), and they are address-based synchronization primitives.
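
Acquire and release semantics are usually reached from C through the C11 atomics rather than written by hand. The sketch below shows a release store that publishes data and an acquire load that consumes it. The variable and function names are hypothetical. AArch64 compilers commonly lower such operations to STLR and LDAR, but that mapping is an assumption about the toolchain, not something the text above states.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data being handed over */
static atomic_bool ready;    /* flag guarding the data, initially false */

/* Producer: write the data, then publish it with a release store.
 * Writes before the release cannot be reordered after it. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load of the flag. If it observes true, the
 * earlier write to payload is guaranteed to be visible as well. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```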

6.3.11 Synchronization primitives

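The ARMv7 and ARMv8 architectures both support exclusive memory accesses. In A64 these are the Load/Store exclusive (LDXR/STXR) pair.

LDXR loads a value from memory and attempts to silently claim an exclusive lock on the address. A Store-Exclusive instruction then writes a new value to that address only if the lock was successfully obtained and held.

LDXR/STXR pairing is used to build standard synchronization primitives such as spinlocks. Pair forms of the exclusive instructions, LDXP and STXP, are also provided, and CLREX clears the monitors.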
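
A spinlock of the kind mentioned above can be sketched with C11 atomics. The SpinLock type and the function names are hypothetical. On an ARMv8 target the compare-and-exchange is typically compiled to a loop around exclusive load and store instructions, but that depends on the compiler and target options and is an assumption here.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} SpinLock;

/* Try once to take the lock; returns true on success. */
bool spin_trylock(SpinLock *l) {
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->locked, &expected, 1,
        memory_order_acquire,    /* ordering on success */
        memory_order_relaxed);   /* ordering on failure */
}

/* Spin until the lock is taken. */
void spin_lock(SpinLock *l) {
    while (!spin_trylock(l)) {
        /* busy-wait */
    }
}

/* Release the lock with a release store. */
void spin_unlock(SpinLock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```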

6.4 Flow control

Unconditional PC-relative branches have a range of ±128MB; conditional branches have a range of ±1MB.
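
These ranges follow from the width of the immediate field. The sketch below assumes a signed word offset of 26 bits for the unconditional branch and 19 bits for the conditional branch, each multiplied by 4 because instructions are 4 bytes wide; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if byte_offset can be encoded as a signed immediate of
 * imm_bits bits that is scaled by the 4-byte instruction size. */
bool branch_offset_in_range(int64_t byte_offset, unsigned imm_bits) {
    if (byte_offset % 4 != 0) {
        return false;                              /* targets are word aligned */
    }
    int64_t words = byte_offset / 4;
    int64_t limit = INT64_C(1) << (imm_bits - 1);  /* 2^(bits-1) */
    return words >= -limit && words < limit;
}
```

With 26 bits the reachable span is 2^25 words, or 128MB, in each direction; with 19 bits it is 2^18 words, or 1MB.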

6.5 System control and other instructions

The A64 instruction set includes instructions related to:

Exception handling

System register access

Debug

Hint instructions

6.5.1 Exception handling instructions

These instructions raise an exception in order to enter a higher Exception level: the OS (EL1), the Hypervisor (EL2), or the Secure Monitor (EL3).

SVC #imm16

HVC #imm16

SMC #imm16

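The immediate value is made available to the handler in the Exception Syndrome Register.

Use ERET to return from an exception. It restores the processor state from SPSR_ELn and branches to the address in ELR_ELn.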

6.5.2 System register access

MRS Xt, <system register>   Copies a system register into a general-purpose register

MSR <system register>, Xt   Copies a general-purpose register into a system register

6.5.3 Debug instructions

BRK #imm16   Enters monitor mode debug, where on-chip debug monitor code is available

HLT #imm16   Enters halt mode debug, where external debug hardware is connected

6.5.4 Hint instructions

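Hint instructions can be treated as NOPs. Their effect is implementation specific, and they typically relate to multi-core operation and power management.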

NOP

YIELD

WFE

WFI

SEV

SEVL

6.5.5 NEON Instructions

6.5.6 Floating-point instructions

6.5.7 Cryptographic instructions
